URBAN AND CLUSTER AGGLOMERATION ECONOMIES'S EFFECTS ON RURAL HOUSEHOLDS IN ASIA

By

Chaoran Hu

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Agricultural, Food, and Resource Economics - Doctor of Philosophy

2017

ABSTRACT

Agglomeration effects play an important role in rural households’ participation in farm and nonfarm activities. With the rapid growth of cities of different sizes and the development of food value chains, how urban agglomeration effects, networks, and food value chain clusters affect rural households’ participation in nonfarm employment and their farm behavior (technology adoption) is not yet well known. The dissertation consists of three chapters that assess the impacts of these urban and cluster agglomeration economies on rural households in Asia.

Chapter 1 links urban proximity, agricultural performance, and nonfarm employment of rural households in China to study how distance to cities of different sizes affects rural households’ income diversification. I calculate unique composite distance measures and find that proximity to cities of different sizes affects rural households’ nonfarm employment differently.

Chapter 2 extends Chapter 1 and studies how networks affect rural-urban migrants’ destination choice in Guangdong, China. I calculate origin-destination paired networks using two methods and find that network effects play a positive role in the choice of cities by rural-urban migrants in China. The results also emphasize the importance of including migration costs, and of separating physical migration costs from network effects, in studies of rural-urban domestic migration.
Beyond the effects of urban agglomeration on rural households’ nonfarm employment, the agglomeration of actors within food value chains, particularly in rural areas, also affects rural households’ farming decisions. Chapter 3 explores value-chain clusters and aquaculture innovation in Bangladesh. I test whether clustering, and thus economies of agglomeration with implied lower transaction costs, encourages and facilitates innovation by farmers. I use a unique data set from our own primary survey of the aquaculture value chain in Bangladesh and calculate an index that captures both horizontal agglomeration and vertical interconnections among actors in the value chain. I find that being in an area with a high clustering index is associated with a higher probability of farmers using more modern inputs and growing non-traditional commodity fish species, controlling for farmers’ other characteristics as well as proximity to cities.

Copyright by
CHAORAN HU
2017

ACKNOWLEDGMENTS

I would like to express my sincere gratitude and thanks to my major professor, Thomas Reardon, for his guidance and support in my graduate studies. His commitment to and enthusiasm for research have inspired me greatly, and his creative ideas have expanded my horizons. It is a great pleasure and honor to work with him. I am grateful to my dissertation committee members, Songqing Jin, Jeffrey Wooldridge, and Joseph Herriges, for their constructive advice, comments, and support.

I would also like to thank a group of people who provided valuable advice and support for my work. I thank Ricardo Hernandez, Ben Belton, Xiaobo Zhang, and Shahidur Rashid for their insights and advice on the fish value chain project. I thank Kevin Z. Chen, Duncan Boughton, and David Tschirley for their support in my graduate studies. I am also thankful to the graduate students, faculty, and staff in the Department of Agricultural, Food, and Resource Economics at MSU. I wouldn’t have made it without them.
Last, I thank my dear family in China and the USA for always being considerate and caring!

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
CHAPTER 1: LINKING PROXIMITY TO SECONDARY CITIES VS. MEGA CITIES, AGRICULTURAL PERFORMANCE, AND NONFARM EMPLOYMENT OF RURAL HOUSEHOLDS IN CHINA
  1. Introduction
  2. Survey, sample, and variables
    2.1. Farm household survey and sample
    2.2. City classification and sample
    2.3. Definition of urban proximity
  3. Descriptive statistics
    3.1. Distances of villages from cities of different sizes
    3.2. Sample characteristics
  4. Model and estimation methods
    4.1. Model
    4.2. Estimation issues and methods
  5. Empirical results
    5.1. Impacts of urban proximity
    5.2. Interacted effects from urban proximities and agricultural performance
    5.3. Robustness check
      5.3.1. Different definitions of nonfarm employment
      5.3.2. Estimations with additional controls
  6. Conclusions
  APPENDIX
  REFERENCES
CHAPTER 2: NETWORK EFFECTS AND RURAL-URBAN MIGRATION DESTINATION CHOICES: EVIDENCE FROM CHINA
  1. Introduction
  2. The model
  3. Data
    3.1. Data and sampling
    3.2. Characteristics of the sample
  4. Estimations
    4.1. Networks
    4.2. Calculation of counterfactual incomes
    4.3. Migration cost
    4.4. Amenities of cities
    4.5. Weighting used to correct for on-site sampling
  5. Estimation results
    5.1. Impacts of network effects
    5.2. Impacts of the migration cost
    5.3. Results with individual characteristics interactions
    5.4. Results with city amenities interactions
  6. Conclusions
  APPENDIX
  REFERENCES
CHAPTER 3: VALUE-CHAIN CLUSTERS AND AQUACULTURE INNOVATION IN BANGLADESH
  1. Introduction
  2. Background on Aquaculture in Bangladesh
  3. Survey and sample
  4. Clustering index and its patterns
    4.1. Calculation of clustering index
    4.2. Clustering patterns
  5. Regression estimations
    5.1. Regression specification
    5.2. Econometric issues and their resolution
    5.3. Descriptive results
  6. Results of Regressions
    6.1. Adoption of modern inputs
    6.2. Adoption of nontraditional commodity and newly commercialized aquaculture niche species
    6.3. Specialization versus diversification of fish production
    6.4. Robustness check
  7. Conclusions
  APPENDIX
  REFERENCES

LIST OF TABLES

Table 1.1. Distribution of cities and population statistics by size of cities
Table 1.2. Distance measures and village level descriptive statistics
Table 1.3. Distribution of sample villages by measure of distance
Table 1.4. Distribution of incomes of sample households, by years and sources
Table 1.5. Descriptive Statistics of Family Members (Individual Level)
Table 1.6. Distribution of nonfarm income of sample households, by year and location
Table 1.7. Descriptive Statistics of other variables
Table 1.8. Impacts of urban proximities on levels of income, by sources and by six methods of measurements of distance
Table 1.9. Impacts of urban proximities on income shares, by sources and by six methods of measurements of distance
Table 1.10. Impacts of local agricultural performance (AP) and urban proximities on migration income levels and shares, by three methods of measurements of distance
Table 1.11. Robustness check using different definitions of local nonfarm income levels
Table 1.12. Robustness check using different definitions of Local-Migration income levels
Table 1.13. Robustness check using different definitions of Migration income levels and shares
Table 1.14. Robustness check, including additional controls in the estimations
Table 1.A1. Full results of impacts of urban proximities on local income values, by sources and by six methods of measurements of distance
Table 1.A2. Full results of impacts of urban proximities on local-migration income values, by sources and by six methods of measurements of distance
Table 1.A3. Full results of impacts of urban proximities on migration income values, by sources and by six methods of measurements of distance
Table 2.1. Summary statistics of demographic variables, Individual level
Table 2.2. Average incomes and rents per month
Table 2.3. Origin-destination paired networks (%), by cities and measures
Table 2.4. Distribution of sample migrants by types
Table 2.5. Probability of individual from region r to city j, by education levels
Table 2.6. Imputed incomes for all observations by cities
Table 2.7. Travel distance and time, by destination cities
Table 2.8. Summary statistics of city attributes, city level
Table 2.9. Distribution of sample migrants in nine destination cities
Table 2.10. Estimation results of Network I on destination choice, without travel time
Table 2.11. Estimation results of Network II on destination choice, without travel time
Table 2.12. Estimation results of networks on destination choice, with travel time
Table 2.13. Estimation results of networks and education selections
Table 2.14. Estimation results of networks and micro-level nonfarm experience
Table 2.15. Estimation results with networks and city amenities interactions
Table 3.1. Farmed fish output of sample households, by species
Table 3.2. Distribution of size categories of different types of actors in the study areas
Table 3.3. Clustering in the 20 districts in four cardinal-points zones
Table 3.4. Descriptive statistics of household and district characteristics
Table 3.5. Patterns in fish farmers’ adoption of technology innovations
Table 3.6. Regression of clustering degree on adoption of modern inputs
Table 3.7. Regression of clustering degree on per acre expenditures on modern inputs
Table 3.8. Regression of clustering degree on output share, by species
Table 3.9. Regression of clustering degree and urban proximity on modern inputs adoption, by product-type farmers
Table 3.10. Regression of clustering degree on specialization
Table 3.11. Regression of clustering degree and urban proximity, by different samples
Table 3.A1. Fishes that mapped into each specie category
Table 3.A2. IV regression of clustering degree on specialization, by IV methods

LIST OF FIGURES

Figure 1.1. Trend of income structure of rural households in China, 1990-2011
Figure 3.1. 20 sample districts in six major aquaculture areas of Bangladesh
Figure 3.2. Distribution of actors per 1,000 rural people in 2013 by district
Figure 3.3. Clustering degrees of Sample Districts, 2013

INTRODUCTION

Agglomeration effects play an important role in rural households’ participation in farm and nonfarm activities. On the one hand, the geography and migration literature has shown that urban proximity can provide push effects that shift rural labor supply from farm to nonfarm employment. On the other hand, the industrial organization literature has shown that network and clustering effects can help actors within clusters benefit from agglomeration externalities. However, gaps remaining in the literature motivate this dissertation.

The first gap is that little is known about whether distance to cities of different sizes affects rural households’ nonfarm employment differently. In the first chapter, I link urban proximity, agricultural performance, and nonfarm employment of rural households in China to study how distance to cities of different sizes affects rural households’ income diversification. To capture the potential impacts from all cities, I calculate unique composite distance measures. I then analyze the impacts of these distance measures on rural employment. Several main findings stand out. First, proximity to cities of different sizes affects rural households’ nonfarm employment (including agricultural wage work) and agricultural employment (own farming only) differently.
Specifically, proximity to smaller cities increases both income levels and shares in total income from agriculture and local nonfarm activities, while proximity to mega cities increases nonfarm income from migratory activities. Second, rural households seeking nonfarm employment undertaken at shorter distances (within their province) or by commuting tend to be attracted by service sectors in cities of medium size. By contrast, the agglomeration effects and manufacturing sectors in smaller cities are important to households undertaking rural local nonfarm employment. Manufacturing in mega cities induces rural households to undertake out-of-province migration. Third, in contrast to the common view, local agricultural performance spurs migration by rural households, particularly for people living far from mega cities.

The second chapter is an extension of the first chapter. This paper studies how networks in the destination city affect rural-urban migrants’ location choice. Existing studies measure network effects from a relatively narrow perspective, and the impacts so measured may be limited and may even decrease over time. In this paper, I calculate origin-destination paired networks using two methods and study their impacts on rural-urban migrants’ destination choices in Guangdong, China. I fill these gaps by allowing migration costs to vary by location and by calculating counterfactual expected incomes to account for selection concerns, and I add further evidence on education-related selection to a migration literature with mixed results. The results show that network effects play a positive role in the choice of cities by rural-urban migrants in China, and I find positive selection of networks in terms of education. The results emphasize the importance of including migration costs, and of separating physical migration costs from network effects, in studies of rural-urban domestic migration.
Beyond the effects of urban agglomeration on rural households’ nonfarm employment, the agglomeration of actors within food value chains, particularly in rural areas, also affects rural households’ farming decisions. Chapter three explores value-chain clusters and aquaculture innovation in Bangladesh. Farmers’ adoption and implementation of innovations, such as new technologies and new products, often requires “collaborative inter-segment innovation” by actors in other segments of the value chain, such as wholesalers implementing new product innovations (for example, supplying commercial fish feed and chemicals, and buying and marketing non-traditional fish species). I test whether clustering, and thus economies of agglomeration with implied lower transaction costs, encourages and facilitates innovation by farmers. That potential determinant of farmer choices has not been studied in agriculture or aquaculture or indeed the food sector. I use a unique data set from our own primary survey of the aquaculture value chain in Bangladesh, including micro-level data for 1,500 fish-farming households and meso-level data for 20 districts. I calculate an index that captures both horizontal agglomeration and vertical interconnections among actors in the value chain. I find that being in an area with a high clustering index is associated with a higher probability of farmers using more modern inputs and growing non-traditional commodity fish species, controlling for farmers’ other characteristics as well as proximity to cities.

CHAPTER 1: LINKING PROXIMITY TO SECONDARY CITIES VS. MEGA CITIES, AGRICULTURAL PERFORMANCE, AND NONFARM EMPLOYMENT OF RURAL HOUSEHOLDS IN CHINA

1. Introduction

Nonfarm income from manufacturing or services, in rural or urban areas, is important to rural households in developing countries. The share of nonfarm income in total rural household income averages 34%, 51%, and 47% in developing Africa, Asia, and Latin America, respectively (Haggblade et al., 2007).
The National Bureau of Statistics of China reports that the share of nonfarm income in total income of rural households increased from 34% to 52% in the decade from 2000 to 2010 (see Figure 1.1).

Three strands of literature have explored the determinants of rural households’ participation in nonfarm activities. A first strand, starting in the 1950s, focused on decisions of rural households (who were considered to be autarchic farmers before migrating) to migrate to urban areas and internationally. This strand conceived of the drivers of migration as the relative rural/urban wage (Lewis, 1954), the expected urban wage and employment (Todaro, 1969; Harris and Todaro, 1970), and perceptions of risks and capacities to migrate (in general, Stark and Lucas, 1988; for China, Rozelle et al., 1999; Mu and Giles, 2014; and Du et al., 2015).

A second strand of literature focused on activity portfolio selection by rural households among local rural nonfarm employment, farm work, and migration (e.g., Hymer and Resnick, 1969, in general; Barrett et al., 2001, and Yunez-Naude and Taylor, 2001, for Africa and Mexico; and Zhao, 1999a, and Zhao, 1999b, for China). This literature modeled the determinants of this multisectoral, multi-spatial activity choice as a function of the characteristics of the labor-supply origin zone, such as rural infrastructure and local agricultural performance, which can create demand for local nonfarm employment via production and consumption linkages (Mellor and Lele, 1973).

A third strand of literature introduced spatial determinants into the modeling of local and migratory employment. This strand started with von Thünen's (1826) observation that different degrees of proximity to an urban center affect the spatial pattern of agricultural activities as a function of transaction costs and factor costs.
Recently this was extended to model the spatial distribution of sectoral activity (e.g., Desmet and Fafchamps, 2005, for the US) and rural households’ participation in nonfarm employment as a function of distance to the nearest urban center (e.g., Lucas, 2001; Fafchamps and Shilpi, 2003, for Nepal; De Janvry et al., 2005, for China; and Fafchamps et al., 2016, for Ghana). Some work has crossed distance to the nearest city with agricultural performance of the rural household’s zone (Deichmann et al., 2009, for Bangladesh).

However, I perceive a gap in the third strand of literature. Understanding of how city distance affects rural household labor choices could be improved by differentiating distance to cities of different sizes, rather than distance to a city per se. This has not been done in the literature and is addressed by the present paper. The importance of this gap is argued in a general way by Christiaensen and Todo (2014) and Christiaensen et al. (2017) for Africa and by Berdegué et al. (2015) for Latin America. They show that the distinction between mega cities and secondary cities is important in terms of how the city influences the rural hinterland’s poverty rates and territorial development, respectively. Both works argue that secondary cities may have more direct dependence on and interactions with rural areas, and that secondary and mega cities have different draw potentials for rural migrants. For the US, Partridge et al. (2008) found stronger effects of proximity to larger urban centers, compared with small urban centers, on population growth of U.S. hinterland counties.

But one can go further than just distinguishing distance from cities of different sizes, and I do so in this paper. I also bring in the basic concept from Deichmann et al. (2009) by crossing the distance/city size effect with agricultural performance of the labor-origin zone. This interaction has also not been done before in the literature.
Agricultural performance conditions the pull effect of a city on a rural household in one of two ways. It may be so risky or poor that, through a push effect, it encourages the household to send migrants or commuters to work in the city (given the distance). Or it may be so good that it creates a pull to stay in the farm area, either for direct farm employment or for employment in local nonfarm activities tied to local agriculture through consumption or production linkages. The idea of including rural agricultural performance and interacting it with distance to cities follows Henderson and Wang (2005). In their model, they allow for technological improvement in agriculture. Thus, agricultural performance of the rural areas may act not only as a push effect for labor to migrate from the rural area but also as a pull effect for rural-urban migration, and these effects may be influenced by surrounding urban sectors.

Besides modeling the impact on rural household labor choices of the distance to cities of different sizes crossed with local rural agricultural performance, I aim to make a methodological contribution. Most studies that include a measure of the distance to urban areas measure urban proximity as the distance to the nearest city or as the distance to the largest city. Such studies do not, however, take into account the potential effects of proximity to other cities besides the nearest city. I thus measure urban proximity as a composite distance by summing the distances to all cities of different size categories with two different weights (population and sectoral GDPs) that are surmised to have impacts on household decisions. To my knowledge, no other study uses this composite measure.

I examine the above research questions using data from five provinces in a balanced two-year panel, 2005 and 2006, from a nation-wide survey collected annually by China’s Research Center for the Rural Economy (RCRE) of the Ministry of Agriculture.
The paper is organized as follows. Section 2 discusses the survey, sampling, and definitions of key variables. Section 3 provides descriptive statistics on nonfarm activities of rural households in the sample. It also shows distances of survey villages from cities of different sizes, depicting the different measures used in this paper. Section 4 presents the conceptual and empirical approach. Section 5 shows the regression results. Section 6 concludes.

2. Survey, sample, and variables

2.1. Farm household survey and sample

The data sets come from a nation-wide survey collected annually by China’s Research Center for the Rural Economy (RCRE) of the Ministry of Agriculture. The sampling method was as follows. In each province, RCRE stratified counties into three groups by income level. In each selected county, villages were randomly selected; the number of villages varied by the size of the province. Per village, 40-120 households were randomly selected; the number varied by the size of the village (Benjamin et al., 2005).

I use data from five provinces (Heilongjiang, Shandong, Sichuan, Hunan, and Jiangxi) out of the 31 provinces surveyed annually by RCRE. I chose these five because they represent a diversity of rural settings: intensive horticulture, plains rice, and mountainous mixed-product farming. I use two years of data, 2005 and 2006. These were selected to predate the 2008 earthquake, which affected Sichuan deeply, and the 2007-08 financial crisis, which affected all the provinces. Also, sample villages in the RCRE survey were changed after 2006, so using 2005 and 2006 allows me to have a two-year balanced panel of households. For each year, the sample consists of 2,560 rural households from 59 villages.

2.2. City classification and sample

In China, a province is divided into several “prefectural cities,” the second level administrative division, also called the capital city of the prefecture.
The country is divided into 344 prefectural “cities” each of which has an urban center as well as surrounding areas that are further divided into several counties. Each county has a county center and the surrounding rural areas that are divided into townships. Typically, the urban center of a “prefectural city” is much bigger than a county center inside the same prefectural city. In this study, I use “prefectural city” as a proxy for “city,” since the data from county-level cities are very limited (Au and Henderson, 2006). To include the 8 potential impacts from all urban areas, I take all prefecture level cities into consideration. The size of the city is measured as the total urban population of the “prefectural city”. Only the registered urban population is included; the registration is urban “hukou,” a kind of citizenship certificate for urban residents.. So in a sense, the city in this study is a collection of urban areas (urban center and the county centers) within a prefectural city, not a city in a strict sense. While this is not an ideal measure of a city, I believe it is a reasonable proxy. First, migrants typically answer with the name of a “prefectural city” when they are asked about their migration destination (e.g., Dongguan, Guangzhou, Shenzhen and Wenzhou, prefectural cities, are among the most common migration destinations).1 Second, it is reasonable to argue that the core urban center of a prefecture city is a more populated area and plays a more important role in attracting migrants. Evidence shown in Fan and Scott (2003) suggests a positive correlation between a large city region, industrial agglomeration, and economic performance, and thus attraction for rural migrants. The issue for the measure however is that the county level cities are left out of the analysis. Therefore, to control for the impacts from lower (e.g. 
county) level cities, I include the distance from the sample village to the nearest railway and long-distance bus station, which tend to be located in county and township level cities.

1 Using the urban population of prefectural level cities in China as the measure of city size is common in the literature. See for example Au and Henderson (2006).

Chinese cities are officially classified into five types by resident population: small (<0.2 million); medium (0.2-0.5 million); large (0.5-1 million); extra-large (1-5 million); and extra-extra-large (>5 million), with the last category being added in 2014.2 This resident population includes rural migrants staying in the city for more than six months. The number of such migrants is endogenous. Thus, to get an exogenous definition of city size, I use the population of permanent urban residents (the hukou registered urban population). My definition is more exogenous as it is not temporary, since “hukou” is passed from mothers to children (Li and Gibson, 2013). The distribution of these cities by both the official classification (the five types noted above) and my own classification of city sizes is shown in Table 1.1. To obtain sets of comparable size, I made some combinations. I combined the officially defined sets of small and medium cities into one category comprising 152 cities, which I call the “smaller-city” group. These cities range from 0.03 million to 0.99 million, but only 20% of this group have fewer than 0.3 million (and thus would be considered “towns”). I also group the officially defined extra-large and extra-extra-large cities into what I call a “mega-city” group, since there are few cities in this group. According to my definition, overall, 44% of the cities in China are in the “smaller cities” group, but their total population is only 15% of the overall urban population.

2.3.
Definition of urban proximity

I measure urban proximity as the spherical distance (the shortest distance between two points on the earth’s surface) from the center of the village to the center of a city. This is calculated using QGIS software. I do not use travel distance, since it would be endogenous in my case, and it is hard to find a valid instrumental variable. For example, the construction of roads between villages and cities is due partly to economic or political factors, such as attracting the settlement of firms in suburban and rural areas, which will also affect rural households' participation in and earnings from nonfarm activities. (Also, as mentioned in Volpe Martincus et al. (2017), even historical roads would be unlikely to meet the exclusion restriction required of a valid instrumental variable.)

2 http://www.gov.cn/zhengce/content/2014-11/20/content_9225.htm

I examine the effects of six categories of distance variables. The first three types of distance are “unweighted physical distances.” The first is the one most commonly used in the literature: the distance to the nearest city or market, regardless of the size of the city (as in Escobal, 2001). Apart from the nearest city, researchers also use the distance to the county’s capital city (de Janvry et al., 2005), the provincial capital city (Gibson, 2000), or the largest city of the country (Deichmann et al., 2009). These city-type-specific measures assume that the pull effects from these cities are stronger than those from other cities near the household. The second category is the distance to the nearest city in each of three sets of cities (“smaller,” “larger,” and “mega” cities). This measure has been used, for example, by Jonasson and Helfand (2010). The third category is called the incremental distance method, used in Partridge et al. (2008).
It is based on the assumption that cities in the higher tier (for example, mega cities) offer goods and services that are not available in the lower tier (for example, smaller cities). The distance to the nearest city of any size is the first distance controlled for. Then, the additional distance to the nearest city in the next-highest tier is controlled for, and finally, the additional distance to the nearest city in the highest tier is controlled for. Incremental distances to larger and mega cities may be equal to zero.

However, results from Fafchamps and Wahba (2006) and de Janvry and Sadoulet (2001) imply that the nearest-distance methods above may ignore the potential impacts from other cities (beyond the nearest but within a certain perimeter). Therefore, I introduce three more measures to capture the potential effects from all cities in the country on any rural village (and household). But unlike Fafchamps and Wahba, and de Janvry and Sadoulet, I do not limit the maximum travel time, for three reasons. First, in China, several cities, in particular larger and mega cities, may have some impact on nonfarm employment of rural households in a village even when they are far from it. For example, several cities in Guangdong province (such as Shenzhen and Dongguan), rather than the nearest or capital city (Guangzhou), are attracting many rural migrants. Second, limiting the travel time or distance at any threshold is too subjective, as it might exclude cities potentially important to a certain village. Third, massive investments in trains and highways in the past several decades have made it practicable and even relatively cheap for a villager to reach any major city in China. My measures also contain distance variables to each of the three city categories (smaller/larger/mega cities).
However, instead of including a single distance from the sample village to the nearest city in a given category, I calculate composite distances using all the cities in a given category, both near to and far from the village. The composite distances are calculated using different weights. In the fourth measure, the distances are weighted by the populations of cities (with the weight being the share of a city’s population in total urban population). In the fifth measure, the distances are weighted by GDP from the secondary sector (manufacturing and construction, which for simplicity I will just call manufacturing hereafter). The sixth measure weights distances by GDP from the tertiary sector (which I term the service sector hereafter). The GDP weights are the share of a city’s GDP in the aggregate GDP of cities for that sector. I include the fifth and sixth measures to give greater weight to cities with higher GDP from the manufacturing and service sectors. To show how I have done the weighting, take for example the measurement of population-weighted distance. The composite distance from village j to the nine mega cities is

\[
\sum_{k=1}^{9} D_{jk} \times weight_k , \quad \text{where} \quad weight_k = \frac{population_k}{\sum_{l=1}^{9} population_l}
\]

and D_{jk} is the distance from village j to mega city k. The measures of distance to other types of cities and of distance weighted by sectoral GDP are constructed similarly. All distance measures are calculated at the village level. Table 1.2 summarizes the statistics of the six measures.

3. Descriptive statistics

3.1. Distances of villages from cities of different sizes

The spatial distribution of villages (in terms of distances from various types of cities) is shown in Table 1.3. The first two columns in Section I (the horizontal section) show the distribution of villages by the shortest distance to a nearby city without regard to the size of the city (M1). The remaining columns of that table section show the distribution of villages by the shortest distance to a nearby city by size (smaller, larger, and mega city) (M2).
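The nearest-city measure by size category (M2) and the population-weighted composite distance (M4) defined in Section 2.3 can be sketched in a few lines. This is a minimal illustration only, not the QGIS workflow used for the dissertation: the haversine formula stands in for the spherical distance, the Earth radius is an assumed constant, and the function names, field names, and toy coordinates are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_by_tier(village, cities):
    """M2-style measure: distance to the nearest city within each size tier."""
    nearest = {}
    for city in cities:
        d = haversine_km(village["lat"], village["lon"], city["lat"], city["lon"])
        nearest[city["tier"]] = min(d, nearest.get(city["tier"], math.inf))
    return nearest


def composite_distance(village, cities, tier, weight_key="pop"):
    """M4-style measure: distances to all cities in a tier, weighted by each
    city's share of the tier total (population here; sectoral GDP for M5/M6)."""
    members = [c for c in cities if c["tier"] == tier]
    total = sum(c[weight_key] for c in members)
    return sum(
        haversine_km(village["lat"], village["lon"], c["lat"], c["lon"])
        * c[weight_key] / total
        for c in members
    )


# Hypothetical toy data, not taken from the survey.
village = {"lat": 0.0, "lon": 0.0}
cities = [
    {"name": "A", "tier": "mega", "lat": 0.0, "lon": 1.0, "pop": 3.0},
    {"name": "B", "tier": "mega", "lat": 0.0, "lon": 3.0, "pop": 1.0},
    {"name": "C", "tier": "smaller", "lat": 1.0, "lon": 0.0, "pop": 0.4},
]
nearest = nearest_by_tier(village, cities)
m4_mega = composite_distance(village, cities, "mega")
print(nearest, m4_mega)
```

Passing a different `weight_key` (for example a hypothetical `"gdp_secondary"` field) would give the analogue of M5 or M6 under the same construction.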
The first part of section II of Table 1.3 shows the distribution of villages by distance to all cities by size, with the distance being weighted by the population of the city (M4). The following two parts show the distribution of villages by distance to all cities by size, with the distance weighted by GDP from the manufacturing sector (M5) and from the service sector (M6). The distribution of villages under the third measure (M3) is explained in the notes to the table.

Several points stand out in Table 1.2 and Table 1.3. First, contrary to the conventional wisdom that rural people live far from cities, 80% of the sample villages are within 80 kilometers of their nearest city of any size. All sample villages are within 160 kilometers, a few hours’ travel, of some city. As one moves over the distance categories from 0-40 to 120-160 kilometers, there is no clear pattern of city size at the various distances, and all four are in a rather tight range averaging from 1 to 2 million. Second, sample villages are on average closer to the nearest larger city (81 km) than to the nearest smaller city (119 km). Yet 86% of the sample villages do not have a “nearest” mega city within 160 kilometers. Third, once I relax the nearest-city condition and include all cities, the population-weighted distance shows that almost all sample villages can access a larger city or mega city within 2500 kilometers, yet over 10% of sample villages cannot reach a small city within 2500 kilometers. The reason is that, once all cities are included and grouped into three categories by size (recall Table 1.1), many of the more populous cities within the smaller-city category, which carry more weight, lie farther from the villages than the less populous cities in that category.
Fourth, the weighted composite distances (M4, M5, and M6) show that a high share of villages are within 1000 kilometers of larger cities (27% in M4, 39% in M5, and 36% in M6) and mega cities (47% in M4, 49% in M5, and 41% in M6). Fifth, results of the incremental distance measure show that 36% of the sample villages have a “nearest city” belonging to the category of smaller cities, and 63% have a “nearest city” belonging to the category of larger cities.

3.2. Sample characteristics

Table 1.4 shows household income composition. Note that average household income increased by 9% from 2005 to 2006 (compared with overall GDP per capita growth of 11%). Income from own-farming decreased slightly (2%), while nonfarm incomes (including agricultural wage earnings) increased by 21%. The average rural household has income that is highly diversified beyond agriculture: in the two years analyzed, agriculture is only 44% and 39% of household income, respectively.

Table 1.5 shows, for each of the two years, the decomposition of nonfarm employment by location, using individual (not household) level data with observations on the location of employment and on how many days each family member worked outside the home and stayed at home. Several interesting points emerge. First, the results are consistent with the main body of literature that finds a high participation rate of rural labor in nonfarm employment in developing countries. The data show that 74% of rural labor in 2005 (and 76% in 2006) participated in some type of nonfarm employment, compared with slightly lower participation in own-farming (71% in 2005, 68% in 2006). Second, while much of the literature on nonfarm employment in China focuses on out-of-province migration, my results show that only 27% of people participating in some type of nonfarm activity are working in another province.
Among 83% of people doing nonfarm activities within their own province, 57% undertake the nonfarm activities within their own village, and 43% work outside their own village but within their own province. Even among people doing nonfarm activities outside of their own villages, a substantial share of labor works locally. For instance, 39% of them work within their own county (22% in another village but within their own township, 17% in another township but within their own county), 21% work outside their own township but within their own province, and only 40% work in another province. Based on these results, I classify nonfarm income into three categories. The first is “local.” This is income earned from nonfarm activities within one’s township. The second category is “local migration,” being income earned from nonfarm activities outside of one’s own township but within the province. The third category is “migration,” being income from activities outside of one’s own province.

Table 1.6 shows nonfarm income composition based on location. Several points are salient. First, local nonfarm income is substantial: the share of local nonfarm income (36%) is similar to the share of migration income (38%) in the households’ total nonfarm income. If I add income earned in another township in the household’s own county to local nonfarm income, then local nonfarm income is 49% of total nonfarm income, 13% higher than migration income. Second, Table 1.5 shows that 61% of nonfarm laborers are employed within their own township, while Table 1.6 shows that only 36% of total nonfarm income is from within-township employment. The results suggest that participation is widespread, perhaps because entry barriers and capital requirements for this local nonfarm employment are low, but the earnings are also relatively low. Third, local-migration income is also substantial.
While only 17% of nonfarm rural laborers have jobs outside their township and within their province, income from these jobs forms fully 26% of total nonfarm income.

Table 1.7 shows the characteristics of sample households and villages. Household heads are late middle aged, and almost all are male. The highest education level in the household is 9 years on average (through middle school). The farms are small. The average size of owned land under cultivation per household is 6.2 mu (0.41 hectare). Households are generally close to transport facilities: the mean distance between the sample villages and a paved road is 1.5 kilometers, to the nearest railway station, 38 kilometers, and to the nearest long-distance bus station, 9.4 kilometers.

A village’s agricultural performance is posited to be a critical factor affecting the nonfarm economy, both locally and non-locally. This occurs either by pull effects (such as intersectoral linkages) or push effects (such as poor performance driving farm households to diversify incomes into nonfarm activities to manage risk or cope with farm sector shocks). I define a village’s agricultural performance as the village output in value terms per mu of all crops. On average, this is around 750 RMB per mu ($1125/ha at the 2006 exchange rate). But there is substantial variation across villages. Poorer performance is a push factor in local rural households seeking nonfarm employment. One could associate this case with Heilongjiang, where the crop composition focuses on lower value products (rice). By contrast, high performance with its pull factors for nonfarm employment can be associated with Shandong, with its high share of high value cropping (horticulture in particular).

Some other village level variables that are hypothesized to affect rural households’ labor choices are included, such as the degree of development of the local land rental market.
The existing evidence shows that households are more likely to obtain their main income from migration if the land rental market is developed, so that they can migrate and rent their farmland out while they are away (Deininger and Jin, 2006; Jin and Deininger, 2009). To avoid the potential reverse causality of nonfarm incomes on the household’s land renting decision, I use a proxy for the village-level land rental market in the controls.

4. Model and estimation methods

4.1. Model

The analysis spatializes a labor supply function of a rural household with n types of labor characterized by geography and sector. The explained and determinant variables are as follows. The explained variables are quantities of labor supply to various locations and sectors. Following Barrett, Reardon, & Webb (2001), the relevant n types are a cross between the spatial distribution of the work performed by the rural household (local versus migratory employment) and the sectoral distribution (agricultural versus non-agricultural employment). Non-agriculture includes services and manufactures, which I lump together in the analysis of employment. Non-agricultural employment could be further disaggregated by functional distribution (self-employment versus wage employment), but I lump those together as well. The determinants in a labor supply function are generally incentive variables (such as relative urban/rural wages) and capacity variables (such as education). I apply that general framework with specific incentives: urban employment “pull” is captured by distance to cities of various size categories, and access to earnings from own-farming and intersectoral linkage employment is controlled for by including village “agricultural performance,” measured as crop income per acre. The capacity variables are farm assets (such as land).
The basic econometric model used is an application of the above:

\[
Y_{ijt} = \alpha + \beta_1 Small_j + \beta_2 Large_j + \beta_3 Mega_j + \beta_4 AP_{jt} + \delta Z_{ijt} + \gamma_{year} + \varphi_{province} + v_j + c_{ij} + \varepsilon_{ijt} \qquad (1)
\]

The specific variables are shown in Table 1.A1 and summarized here. The Yijt are either the level of nonfarm income or the share of nonfarm income in total household income of household i in village j in year t. Nonfarm income is further divided into local nonfarm income (earnings from nonfarm activities within the household’s own township), local-migration income (from outside their own township but within their own province), and migration income (from outside their own province). The determinants include a set of distance variables, Smallj, Largej, and Megaj, measuring distances from the center of village j to a smaller city, a larger city, and a mega city. As noted above, the regressions also explore variations on the simple distances with several weighted distance measures. Other determinants are as follows. APjt is village-level agricultural performance, measured as the value of total output per acre of cultivated land in village j. Zijt is a vector of individual, household, and village level controls, the descriptive statistics for which are shown in Table 1.7. γyear are time-fixed effects and φprovince are province-fixed effects. vj and cij are village- and household-specific unobserved characteristics. εijt is a stochastic error term.

4.2. Estimation issues and methods

While one can safely assume the exogeneity of the physical distances between the village center of any given rural household and cities of different sizes,3 it cannot be ruled out that some unobserved household traits (cij), such as ability, risk aversion, and experience, and unobserved village factors (vj), such as history, shocks, and local policies, are correlated with household labor supply choices and village-level agricultural performance.
The existence of vj and cij causes the OLS estimates to be biased and inconsistent. Ideally I would use a panel fixed effects (FE) model to remove vj and cij. Unfortunately, neither a household FE nor a village FE method can be used to estimate equation (1) because the distance variables are time-constant. Hence, I use the correlated random effects (CRE) estimation method, following Chamberlain (1979) and Mundlak (1978). The CRE method relaxes the random effects model’s strong assumption that the unobserved effects are independent of the explanatory variables, and provides fixed-effects estimates for time-varying variables and random-effects estimates for time-constant variables. Specifically, it is assumed that

\[
c_{ij} = \lambda \bar{Z}_{ij} + \alpha_{ij}, \quad \text{and} \quad v_j = \eta \overline{AP}_j + \sigma \bar{Z}_{ij} + \mu_j .
\]

3 An average rural household typically does not move from one village to another village. Temporary migration has been the dominant type of migration in China, so a large majority of migrants still belong to their original villages.

The CRE model is obtained by replacing cij and vj in equation (1) with the mean values of the time-varying household and village level variables. Specifically, the CRE model estimates the following equation:

\[
Y_{ijt} = \alpha + \beta_1 Small_j + \beta_2 Large_j + \beta_3 Mega_j + \beta_4 AP_{jt} + \delta Z_{ijt} + \gamma_{year} + \varphi_{province} + \lambda \bar{Z}_{ij} + \eta \overline{AP}_j + \varepsilon_{ijt} \qquad (2)
\]

Equation (2) can now be estimated consistently by OLS. Since the key variables of interest are measured at the village level and there are multiple household observations for a given village, I follow Wooldridge (2003) and correct the standard errors by adjusting for clustering at the village level. Statistical significance of the estimates of λ and η would signal that the unobserved effects are correlated with the explanatory variables. The key coefficients of interest in equation (2) are the β’s.
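The step that turns equation (1) into equation (2) is mechanical: each time-varying regressor is averaged over time within its unit, and the average is added as an extra regressor. The sketch below illustrates only that step, on a hypothetical four-row panel; the column names (`hh`, `village`, `hh_size`, `ap`) and the helper function are assumptions made for illustration, and the pooled OLS with village-clustered standard errors that would follow is not shown.

```python
from collections import defaultdict


def add_group_means(rows, group_key, variables):
    """Return copies of rows with '<var>_bar' set to the mean of var
    within each group (the Mundlak-style mean terms)."""
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in rows:
        counts[row[group_key]] += 1
        for var in variables:
            totals[row[group_key]][var] += row[var]
    out = []
    for row in rows:
        new_row = dict(row)
        for var in variables:
            new_row[var + "_bar"] = totals[row[group_key]][var] / counts[row[group_key]]
        out.append(new_row)
    return out


# Hypothetical toy panel: two households in one village, observed in 2005 and 2006.
panel = [
    {"hh": 1, "village": "A", "year": 2005, "hh_size": 4, "ap": 700.0},
    {"hh": 1, "village": "A", "year": 2006, "hh_size": 6, "ap": 800.0},
    {"hh": 2, "village": "A", "year": 2005, "hh_size": 3, "ap": 700.0},
    {"hh": 2, "village": "A", "year": 2006, "hh_size": 3, "ap": 800.0},
]
panel = add_group_means(panel, "hh", ["hh_size"])   # household means of Z
panel = add_group_means(panel, "village", ["ap"])   # village means of AP
for row in panel:
    print(row)
```

With a balanced panel, averaging `ap` over all rows of a village gives the same value as averaging over years, because every household in the village contributes each year once.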
In a different set of estimations, equation (1) is augmented by including the interaction terms of APjt with the three distance variables (Smallj, Largej, and Megaj), to test the potential multiplier effects of village agricultural performance. As in the case of equation (1), I adopt the CRE model to obtain consistent estimates of the augmented model. The equivalent CRE model for the augmented model can be written as

\[
\begin{aligned}
Y_{ijt} = {} & \alpha + \beta_1 Small_j + \beta_2 Large_j + \beta_3 Mega_j + \beta_4 AP_{jt} \\
& + \theta_1 (Small_j \times AP_{jt}) + \theta_2 (Large_j \times AP_{jt}) + \theta_3 (Mega_j \times AP_{jt}) \\
& + \delta Z_{ijt} + \gamma_{year} + \varphi_{province} + \lambda \bar{Z}_{ij} + \eta \overline{AP}_j \\
& + \tau_1 (Small_j \times \overline{AP}_j) + \tau_2 (Large_j \times \overline{AP}_j) + \tau_3 (Mega_j \times \overline{AP}_j) + \varepsilon_{ijt} \qquad (2')
\end{aligned}
\]

Like equation (2), equation (2’) can be estimated consistently using OLS. The key coefficients of interest in equation (2’) are the β’s and θ’s.

5. Empirical results

5.1. Impacts of urban proximity

Because of space limitations, the full results for income levels are reported in the tables in the Appendix. All the estimations include year and province fixed effects, with errors clustered by village. The results for demographic variables and village level controls are consistent and robust to different measurements of distance. Table 1.A1, Table 1.A2, and Table 1.A3 show that the highest education level of family members, the size of the household, access of the village to a railway and a long-distance bus, and the village-level land rental market are all important to household income levels. Several results stand out. First, as expected, more education increases migration income; a bigger household (and thus more labor available) increases not only migration income but also local-migration income. Second, the effect of distance to the nearest long-distance bus station on local nonfarm income is significantly negative, which is robust to different measures and models (see Table 1.A1).
Third, households in villages with higher shares of rented land have higher migration incomes, consistent with land rental market studies (Deininger and Jin, 2005; Chernina et al., 2014). Rural households in villages with more active land rental markets can rent their land out easily and thus are less restricted in their labor movements. Also, land rental may generate cash for households to undertake migration.

The key results of interest are the impacts of urban proximity on income levels and shares, which are shown in Table 1.8 and Table 1.9, respectively. The results are generally robust across income levels and shares using the six different measurements of distance. In Table 1.8, the results using the physical distance measurements are shown in the first three columns (Models M1, M2, and M3), and the results using the composite weighted distance measurements are shown in Models M4, M5, and M6.

The distance variable in M1 and the first distance variable in M3 are both measured as the distance to the nearest city of any size. Their impacts are similar. But there are no significant impacts of incremental distances on local-migration incomes. Recalling the descriptive patterns in Table 1.3, section M2, I surmise that the nearest city is insignificant for local nonfarm income and agricultural income in the regressions because a large share of the sample villages have a larger city, rather than a smaller city, as their nearest city. If the nearest cities of different size categories are considered, only the nearest one of any size will affect non-local nonfarm income levels and shares. Rural households close to a city of any size have higher local-migration nonfarm incomes while having lower migration income, in both levels and shares. Thus these results emphasize the importance of the nearest cities for nonfarm employment, which is consistent with existing findings using nearest cities.
Moreover, the results also suggest that being close to a larger city, when other factors are controlled for, will increase local-migration nonfarm employment. However, from these measures, which include nearest cities only, I do not find any other linkages (e.g., between rural nonfarm employment and small cities, or between migration and mega cities).

The results using the nearest distances to cities of the three size categories are shown in model M2. The variance inflation factors of the three distance variables are 4, 2, and 7, so I am not concerned about multicollinearity when they are present together in the same regression. The results provide some supplemental points to the results in M1 and M3. Specifically, households closer to the nearest larger cities have higher local-migration income levels and shares. This is in line with the findings above. Moreover, Table 1.9 shows that once I control for distance to the nearest cities of the three different sizes, a household being closer to a smaller city will increase not only the share of local-migration income but also the share of local nonfarm income. This implies linkages between small cities and local nonfarm employment. As noted above, even if sample villages are close to large cities, being also close to small cities increases income from local nonfarm employment. I surmise that local nonfarm employment requires lower threshold investments and transaction costs, as well as lower skill requirements, compared with participation in migratory nonfarm employment. In addition, small cities tend to rely more closely on their rural hinterlands than do big cities, both for food and for concomitant nonfarm activities such as transport, wholesale, and processing (Berdegué and Proctor, 2014). The small city might itself be home to agroindustries and agricultural services closely linked to the surrounding rural areas and employing rural workers.
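For readers who want to reproduce a check like the variance inflation factors cited above for M2, one standard route is to read them off the diagonal of the inverse of the regressors' correlation matrix. The sketch below uses only the Python standard library and made-up data; the function names are hypothetical, and this is not the software or data behind the reported values of 4, 2, and 7.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    centered = [[x - m for x in col] for col, m in zip(columns, means)]
    norms = [math.sqrt(sum(x * x for x in col)) for col in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (fine for small matrices)."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vifs(columns):
    """VIF of each regressor = matching diagonal entry of the inverse
    correlation matrix, which equals 1 / (1 - R^2) from regressing that
    regressor on the others."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[i][i] for i in range(len(columns))]


# Made-up data: x1 and x2 have correlation 0.5; x3 is uncorrelated with both.
s3 = math.sqrt(3)
x1 = [1.0, -1.0, 1.0, -1.0]
x2 = [1 + s3, -1 + s3, 1 - s3, -1 - s3]
x3 = [1.0, -1.0, -1.0, 1.0]
print(vifs([x1, x2, x3]))
```

In the toy data the first two variables share a correlation of 0.5, so each has a VIF of 1 / (1 - 0.25), while the third, being uncorrelated with the others, has a VIF of 1.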
The results using population- and sectoral GDP-weighted distances, which include all cities rather than the nearest ones only, are reported in the right three columns (M4, M5, and M6) in both tables. Unlike the results in M1-M3, where only partial linkages (small city vs. local nonfarm income, or larger city vs. local-migration income) could be observed, several significant and more detailed results emerge from M4-M6.

First, the results using weighted distances show that the impacts of distance to cities of different sizes are significantly different. Table 1.8 shows that rural households close to smaller cities have higher agricultural income. In addition, being close to smaller cities increases income from local nonfarm employment, while being close to larger cities increases local-migration nonfarm income. Being closer to mega cities increases migration income. I surmise that the observed linkage between small cities and agricultural employment is similar to that discussed above for small cities and local rural nonfarm employment: the small city depends on its rural hinterland, and that market spurs agricultural production, with attendant effects on farm employment and nonfarm-farm linkage employment. Moreover, it is probable that the link observed between proximity to mega cities and migration income arises because mega city proximity reduces the transaction cost of migrating, and the mega city’s employment opportunities and relative amenities are easily observable by nearby rural households.

Second, the effects of distance differ by type of weight. Results using population-weighted (M4) and manufacturing GDP-weighted distance (M5) are identical for all types of incomes, while the effects of distances on local and migration nonfarm incomes are not significant using service GDP-weighted distance (M6).
These results suggest that the pull effects from larger cities on local-migration employment might operate through the service sector, rather than the manufacturing or construction sectors, of large cities. By contrast, the pull effects from smaller (mega) cities on local (migration) incomes come especially from the manufacturing sector of small (mega) cities.

Compared with findings for international migration, my mega city result runs opposite to the argument that the growth of global cities would increase low-paid service jobs (Sassen, 1991), supplied for example by the arrival of immigrants from poor countries into London (Gordon and Kaplanis, 2014). But my findings are consistent with studies showing that mega cities serve as the base for clusters of advanced services firms, which demand skilled workers (Taylor et al., 2014). Moreover, in the domestic migration literature, my findings concerning the effects of mega cities are consistent with those of Heath and Mobarak (2015), who find that female migrants originating from close to Dhaka are more likely to be involved in manufacturing occupations in Dhaka than those coming from far from the city.

A question remains: why, controlling for distance, does the service sector appear to be the major inducement for rural households to migrate to larger cities, while the manufacturing sector is the main factor (inferred from the effect of its weight) for small as well as mega cities? To resolve it, I would need detailed information about those sectors in the receiving cities, and I would need to analyze the capacity determinants (such as skills) of the rural households migrating to different city sizes and choosing manufacturing versus service jobs in those cities. I lack both kinds of data to pursue that analysis.
I can only note a hypothesis for future research: it may be that rural households seek low-skill agroindustrial jobs in small cities, and higher paying jobs in urban factories in mega cities. The latter jobs probably require more skill to enter. By contrast, larger cities may have more developed service sectors that at the same time require little skill; this may be the case, for example, where large cities are ports or transport hubs and the migrants go there to seek jobs as drivers and loaders.

Because the models using weighted distances (M4, M5, and M6) have better goodness-of-fit, and the nearest-city models (M1 to M3) may suffer from ignoring cities of other sizes and their potential effects, I use the weighted-distance measures (M4, M5, and M6) below to explore how distance and agricultural performance jointly affect incomes.

5.2. Interacted effects from urban proximities and agricultural performance

The results on both income levels and shares are reported in Table 1.10. Only the results for migration incomes are discussed here, as they alone among the results are significant for interacted effects. Table 1.10 shows that, using composite weighted distances, the signs and significance of the distance estimates in M4 and M5 alone are similar to the results without interactions. The exception is that the effects are diminished with respect to local agricultural performance. To wit, households have more migration income (levels and shares) if they are closer to a mega city, but if they are living in villages with better agricultural performance, the effects of proximity to mega cities are smaller. This re-emphasizes (along with the previous model’s results) the importance of proximity for rural workers’ migration to mega cities.
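To make the reading of the interaction terms concrete, the marginal effects implied by the augmented model can be written out. This is a sketch under two assumptions: that the interactions enter as each distance variable times AP_jt, as the text in Section 4.2 describes, and that the CRE mean terms are held fixed. The sign pattern mentioned afterwards is inferred from the verbal description of the results, not read from Table 1.10.

```latex
\[
\frac{\partial Y_{ijt}}{\partial Mega_j} = \beta_3 + \theta_3 \, AP_{jt},
\qquad
\frac{\partial Y_{ijt}}{\partial AP_{jt}} = \beta_4 + \theta_1 Small_j + \theta_2 Large_j + \theta_3 Mega_j .
\]
```

If distance to mega cities enters negatively for migration income (β3 < 0), a positive θ3 moves the distance effect toward zero as agricultural performance rises, which is one way to read the diminished proximity effect described for M4 and M5. By the second expression, the same positive θ3 makes the effect of agricultural performance larger for villages farther from mega cities.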
Moreover, while I expected from the traditional development literature to find a strong effect of agricultural performance on local nonfarm employment due to intersectoral linkage effects, I find instead that local agricultural performance has a positive effect on migration income (both level and share) in M4 and M5. That is, households in rural areas with better local agricultural performance have higher migration income levels and shares, and these effects are magnified by being far from mega cities. This is an interesting result for several reasons. On the one hand, much of the literature (such as Taylor et al., 2003, for China) shows the effects of migration on farming (such as investments from remittances into farm capital, which can be especially important where rural credit markets are constrained). On the other hand, some literature tests for the effect of farming conditions (land tenure and land holdings, farm productivity, and other measures of performance such as the net income from crops) on the decision to migrate. The traditional hypothesis is that poor farming conditions drive migration as a way to generate alternative income (Massey et al., 1993). Other work (such as de Janvry and Sadoulet, 2001, for Mexico) finds that farm performance actually promotes migration because better-off farmers have the initial wealth to make the investment in migration (transport, set-up at destination, and so on); having one’s own initial wealth source has been found to be important where rural credit markets are constrained. Also, as most migrants are from Sichuan, Hunan, and Jiangxi, where some remote areas are very poor, better agricultural performance there might have an especially important effect on enabling migration investment. To my knowledge, this is the first study in the literature to test for a relation between distance from cities and the effect of agricultural performance on migration.
Interestingly, distances and agricultural performance are jointly significant in M6, which uses distance weighted by GDP from the service sector. However, the results in M6 are contrary to those in M4 and M5, which may be because migration employment is driven mainly by the manufacturing sector in the mega cities, rather than the service sector, as noted above.

5.3. Robustness check

In this section, I conduct two types of robustness checks. One is to rerun the above estimations using different definitions of nonfarm incomes, and the other is to address potential concerns about omitted variables in the above estimations.

5.3.1. Different definitions of nonfarm employment

It could be argued that the results are sensitive to how nonfarm employment is divided into three sources (local vs. local-migration vs. migration). I therefore explore the results under three alternative definitions used in some of the literature. First, instead of defining local nonfarm employment as within the township, I also count as “local” nonfarm employment in another township but within the county. Second, for local-migration, instead of including all people doing nonfarm work outside their township but within their province, I define local-migration income as income earned by commuters. As in the local-migration definition used above, commuters are rural people working outside their own township and within their own province; unlike that definition, I class as commuters only those rural workers who return home every evening (zaochu wangui, or “morning leave evening return”). Third, instead of defining all people working outside their own province as migrants, I separate migrants into two subsamples based on the number of months per year the rural worker stays at home.
The first and second types of migrants are those working outside their own province and staying away for more than three and six months, respectively.

Table 1.11 compares the results for local nonfarm income under different definitions of local nonfarm employment. The upper panel of the table is based on the within-township definition and the bottom panel on the within-county definition. Despite the varying definitions, the signs and significance of the M4 and M5 results discussed above remain unchanged.

Table 1.12 compares the results for local-migration income levels. The upper panel repeats the previous results, where I defined local-migration as working outside one’s own township but within one’s own province (local-migrants). The middle and bottom panels are the new results. The middle panel includes commuters only and the bottom panel includes non-commuters only (local migrants excluding commuters). The results show that the service sector GDP effects (M6) from larger cities are significant only for commuters. Aggregation effects (the model using composite distance weighted by population, M4) and manufacturing sector weight effects (M5) from larger cities are significant only for non-commuters. The situation of non-commuters in larger cities may be similar to that of rural migrants in mega cities, and thus they are more likely to take jobs in the manufacturing sector, which may be relatively undemanding in skills. Commuters, however, may be more likely to take jobs that involve transit between rural and urban areas as part of the service itself, such as jobs in transport and commerce. Rural residents might also commute rather than take city jobs that require long hours and inflexible schedules, such as in factories or on construction sites, because they are bound in some way to the rural areas.
The latter might be due to having to tend their farmland or care for children and elderly parents.

Table 1.13 compares the results of population-weighted (M4) and manufacturing GDP-weighted (M5) distances on migration income (levels and shares). Income levels are in the upper part of the table and income shares in the bottom; within each pair of columns, the first is M4 and the second is M5. The first two columns show the previous results, which include all migrants working outside their own province. The middle two columns include migrants working outside their own province for more than three months, and the right two columns include those working outside their own province for more than six months. The signs, significance, and even magnitudes are robust across all types of migrants and both methods. The distance to mega cities has a significantly negative effect on migration income (levels and shares) for all types of migrants.

5.3.2. Estimations with additional controls

A potential concern is that even though I controlled for city effects by size and for potential urban effects from all cities using a composite distance, I may underestimate the effects of nearby cities. To avoid missing these effects, I add specifications with three additional dummy variables in the regressions: whether there is a small city within 100 km; whether there is a large city within 100 km; and whether there is a mega city within 300 km. Table 1.14 compares the results of population-weighted (M4), manufacturing GDP-weighted (M5), and service GDP-weighted (M6) distances on income levels from the four sources. Results both with and without the three additional dummies are reported.

The robustness check columns for agricultural, local nonfarm, and local-migration nonfarm incomes show that, controlling for the composite distances, having a city nearby decreases non-migratory income levels. This implies that for non-migratory employment, it is not the nearby city alone but the total urban effects around the rural areas that matter.
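As a sketch of how the three additional controls could be constructed, the snippet below turns a village's distances to its nearest small, large, and mega city into the three dummies. The thresholds (100, 100, and 300 km) are the ones stated in the text. The function name, the dictionary keys, and the choice to count a distance exactly at the threshold as "within" are assumptions made for illustration. The example input uses the mean M2 distances from Table 1.2 and assumes they are measured in kilometers.

```python
from typing import Dict

# Distance thresholds (km) for the three "nearby city" dummies described in the text.
THRESHOLDS_KM: Dict[str, float] = {"small": 100.0, "large": 100.0, "mega": 300.0}


def nearby_city_dummies(nearest_km: Dict[str, float]) -> Dict[str, int]:
    """Return 0/1 indicators for having a small, large, or mega city nearby.

    nearest_km maps each city size to the distance (km) from the village to
    the nearest city of that size. A distance equal to the threshold counts
    as "within" (an assumption; the text does not specify).
    """
    return {
        f"has_{size}_city": int(nearest_km[size] <= limit)
        for size, limit in THRESHOLDS_KM.items()
    }


if __name__ == "__main__":
    # Hypothetical village located at the mean M2 distances in Table 1.2.
    village = {"small": 119.0, "large": 81.0, "mega": 493.0}
    print(nearby_city_dummies(village))
```

In a regression, indicators of this kind would enter alongside the composite distances, as in the robustness check columns of Table 1.14.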
Specifically, most of the results on composite distances are consistent with my previous findings. Being close to smaller cities increases incomes from agriculture and local nonfarm employment, due to population and manufacturing aggregation effects. Being close to large cities increases income from local-migration nonfarm employment, due to population and services aggregation effects.

The results on migration nonfarm incomes are not robust in the two models. Models with the three additional dummy variables suggest that once we have controlled for the composite distances, having a small or a large city within 100 km increases rural households’ migration incomes. However, once we add the interactions between local agricultural performance and distances, the results in the bottom part of Table 1.14 remain consistent with the previous results. This shows that, controlling for the nearest cities by size, local agricultural performance still has positive multiplier effects on migration incomes. Specifically, agricultural performance increases migration incomes, particularly for rural people far away from mega cities. The reasons are discussed above.

6. Conclusions

This paper seeks to contribute to the literature by examining the labor supply choices of rural households in China as a function of distance from mega versus secondary cities, controlling for the agricultural performance of the zones and household characteristics. It also seeks to contribute by moving beyond physical distances to distances that reflect a city’s economic influence on household decisions, including population and sectoral GDPs.

Three results stand out. First, access to cities, measured in various ways related to distance, affects rural households’ decisions to supply labor to agriculture and non-agriculture. However, the impacts differ by income source and city size. In summary, I find three significant linkages.
The first set of linkages is between smaller cities and agriculture as well as local nonfarm employment, which may be due to production and expenditure linkages between rural areas and smaller cities. The second linkage is between larger cities and local-migration employment. The third linkage is between mega cities and migration employment, which is probably due to expected wage differences.

Second, pull effects from cities of different sizes on nonfarm income levels may come from distinct sectors. Specifically, pull effects from the manufacturing and construction sectors are important to all types of nonfarm incomes. However, it seems that rural commuting nonfarm labor is more likely to be attracted by the pull effects from the services sector in larger cities. This might be due to the types of jobs undertaken by commuters, who probably prefer jobs linking rural and urban areas, such as trading, or jobs with flexible schedules because of demands on their time in the rural areas.

Third, I find strong multiplier effects of local agricultural performance on migration employment. This emphasizes the importance of not only urban proximity but also local agriculture, which helps the poor overcome credit constraints to diversify income.

My results shed some light on how important it is to take into consideration all cities that may influence households’ income diversification, rather than the nearest city only. The work also points to possible directions for future research. Due to the limitations of the data, I do not separate nonfarm employment further, for example by industry or by low versus high returns. More detailed information on the jobs chosen by rural labor would make it feasible to clarify the linkages between city effects, by size and sector, and income diversification.

APPENDIX

Table 1.1.
Distribution of cities and population statistics by size of cities

City Type   City Population   Number of   Share of     Total Population   Population
            (million)         Cities      Cities (%)   (million)          Share (%)
Government’s Classification
Small       0-0.5             64          18.6         18.24              3.3
Medium      0.5-1             88          25.6         66.82              12.1
Large       1-5               183         53.2         380.56             68.6
XL          5-10              6           1.7          40.75              7.3
XXL         >10               3           0.9          48.2               8.7
Total                         344         100          554.57             100.00
Our Classification
Smaller     0-1               152         44.2         85.06              15.3
Larger      1-5               183         53.2         380.56             68.6
Mega        >5                9           2.6          88.95              16.1
Total                         344         100          554.57             100.00

Note: population is number of people registered as urban citizen (chengshi hukou)

Table 1.2. Distance measures and village level descriptive statistics (N=59)

Method (M)   Measures of Distance                             Mean    Std. Dev.   Min     Max
I. Physical Distance (including nearest cities only)
M1           to nearest city of any size                      54      34          10      142
M2           to nearest Smaller city                          119     78          10      324
M2           to nearest Larger city                           81      49          12      313
M2           to nearest Mega city                             493     424         42      1,896
M3           to nearest city of any size (same as M1)         54      34          10      142
M3           incremental distance to Larger city              20      37          0       219
M3           incremental distance to Mega city                418     414         0       1,824
II. Composite Distance (including all cities, with different weights)
M4           to all Smaller cities (Population weighted)      1,587   486         1,222   3,067
M4           to all Larger cities (Population weighted)       1,320   411         881     2,534
M4           to all Mega cities (Population weighted)         1,243   471         864     2,653
M5           to all Smaller cities (Manufacturing GDP wgt)    1,504   448         1,172   2,905
M5           to all Larger cities (Manufacturing GDP wgt)     1,275   426         788     2,418
M5           to all Mega cities (Manufacturing GDP wgt)       1,246   473         845     2,629
M6           to all Smaller cities (Service GDP weighted)     1,465   533         1,071   3,012
M6           to all Larger cities (Service GDP weighted)      1,309   418         851     2,437
M6           to all Mega cities (Service GDP weighted)        1,265   430         824     2,518

Table 1.3. Distribution of sample villages by measure of distance City observations (Village #=59) All Cities N=344 Village % Ave. Pop. Smaller Cities N=152 Village Cum. % % Larger Cities N=183 Village % Cum. % Mega Cities N=9 Village % Cum.
% I. Physical Distance (Including Nearest Cities Only) M1. Distance to the nearest city M2. Distance to the nearest city, by size 34 1.4 14 14 20 20 0 0-40 44 1.9 25 39 38 58 3 40-80 15 1.1 17 56 18 78 5 80-120 7 1.9 10 66 15 92 5 120-160 0 / 34 100 9 100 86 >160 0 3 8 14 100 II. Composite Distance (Including All Cities) M4. Distance weighted by population / / 0 <1000 / / 69 1000-1500 / / 8 1500-2000 / / 7 2000-2500 / / 14 2500-3000 / / 2 >3000 0 69 78 85 98 100 27 42 15 14 2 0 27 69 85 98 100 100 47 29 3 19 2 0 47 76 80 98 100 100 M5. Distance weighted by GDP from manufacture sector <1000 / / 0 0 39 1000-1500 / / 75 75 22 1500-2000 / / 3 78 27 2000-2500 / / 15 93 12 2500-3000 / / 7 100 0 >3000 / / 0 100 0 39 61 88 100 100 100 49 27 3 19 2 0 49 76 80 98 100 100 M6. Distance weighted by GDP from service sector <1000 / / 0 0 36 36 41 41 1000-1500 / / 76 76 24 59 34 75 1500-2000 / / 2 78 29 88 12 86 2000-2500 / / 8 86 12 100 12 98 2500-3000 / / 12 98 0 100 2 100 >3000 / / 2 100 0 100 0 100 Notes: Results from incremental distance measurement (M3) show that 21 (36%) sample villages have a nearest city belonging to a smaller city, 37 (63%) have a nearest city belong to a larger city, and one village can reach a mega city as its nearest city. 38 Table 1.4. Distribution of incomes of sample households, by years and sources Year=2005 (N=2,560) HH total income (RMB) ...Farm Income % ...Nonfarm Income % ...Transfer income % ...Other income % Mean 23,121 10,046 10,953 782 1,340 Std. Dev. 20,025 14,888 10,831 1,828 9,882 Min 1,929 0 0 0 0 Max 317,250 241,500 225,000 36,050 303,000 Share of total 100.0 43.5 47.4 3.4 5.8 Year=2006 (N=2,560) Mean Std. Dev. 
Min Max Share of total HH total income (RMB) 25,348 20,612 1,675 351,512 100.0 ...Farm Income % 9,854 14,733 0 244,000 38.9 ...Nonfarm Income % 13,245 12,419 0 241,000 52.3 ...Transfer income % 955 2,289 0 40,100 3.8 ...Other income % 1,294 8,549 0 306,320 5.1 Notes: Nonfarm income includes income earned from all types of nonfarm activities in any place. Transfer income includes earnings from remittances and cash gifts. Other income includes earnings from interest, insurance, and pensions. 39 Table 1.5. Descriptive Statistics of Family Members (Individual Level) Group of people (Col. A) Days at Home Days doing farm prod. Days doing NFA invil. Days doing NFA outvil. N. Share of people (in col. A) doing: Share (%) Obs. Age Total individual 10,388 Any types of work (labor) 73.1 7,592 40 270 94 40 112 ...Total labor 7,592 70.7 5,364 43 332 133 44 52 ...Total labor 7,592 Any farm productions Any nonfarm activities (NFA) 74.1 5,623 37 240 64 54 152 ......NFA labor 5,623 Any within village NFA 46.6 2,622 42 339 106 116 28 ......NFA labor 5,623 62.7 3,528 .........Out-village NFA 3,528 Any out-village NFA Any NFA in another village 22.3 787 38 321 76 22 173 .........Out-village NFA 3,528 17.6 620 37 263 57 13 215 .........Out-village NFA 3,528 1.8 64 40 250 98 20 173 .........Out-village NFA 3,528 7.7 271 33 135 27 4 252 .........Out-village NFA 3,528 7.5 263 32 167 35 17 234 .........Out-village NFA 3,528 4.1 143 34 73 11 1 291 .........Out-village NFA 3,528 38.9 1,374 29 59 10 2 292 .........Out-village NFA 3,528 Any NFA in another town Any NFA in another county-rural Any NFA in another county-urban Any NFA in provincial capital city Any NFA in another province-rural Any NFA in another province-urban Any NFA in another country 0.2 6 28 79 2 2 293 Total individual 10,324 Any types of work (labor) 73.4 7,577 40 268 91 41 116 ...Total labor 7,577 67.9 5,141 44 329 128 44 56 ...Total labor 7,577 Any farm productions Any nonfarm activities (NFA) 76.4 5,786 38 238 61 55 157 
......NFA labor 5,786 Any within village NFA 46.7 2,702 43 335 103 118 35 ......NFA labor 5,786 63.2 3,657 .........Out-village NFA 3,657 Any out-village NFA Any NFA in another village 21.9 800 39 324 66 19 188 .........Out-village NFA 3,657 16.6 608 38 268 52 10 224 .........Out-village NFA 3,657 1.8 65 42 252 99 23 176 .........Out-village NFA 3,657 8.4 306 34 139 37 7 247 .........Out-village NFA 3,657 6.9 253 32 148 31 19 244 .........Out-village NFA 3,657 5.0 182 34 63 8 3 292 .........Out-village NFA 3,657 39.2 1,434 30 66 9 9 290 .........Out-village NFA 3,657 Any NFA in another town Any NFA in another county-rural Any NFA in another county-urban Any NFA in provincial capital city Any NFA in another province-rural Any NFA in another province-urban Any NFA in another country 0.2 9 28 74 4 3 284 Year=2005 Year=2006 40 Table 1.6. Distribution of nonfarm income of sample households, by year and location Year (N=2,560 for each year) HH Nonfarm income 2005 Mean Share 10,953 100 2006 Mean Share 13,245 100 1. Local ... within village ... outside village within township …sub total 2,622 1,297 3,919 23.9 11.8 36 3,111 1,655 4,766 23.5 12.5 36 2. Local-Migration ... outside township within county ... outside county (rural) within province ... outside county (urban) within province ... provincial capital city ... sub total 1,445 221 626 604 2,896 13.2 2.0 5.7 5.5 26 1,815 159 762 686 3,422 13.7 1.2 5.8 5.2 26 3. Migration ... outside province (rural) ... outside province (urban) ... outside country ... sub total 451 3,663 24 4,138 4.1 33.4 0.2 38 482 4,514 61 5,056 3.6 34.1 0.5 38 41 Table 1.7. 
Descriptive Statistics of other variables N=2,560 for each year Variable Individual/HH level Age of HH head Share of household head male Highest education level of HH member HH size Children and elderly share within household Cultivated land areas owned by household Village level Distance to nearest railway station Distance to nearest long-distance bus station Distance to main road Share of cultivated land rented in/out Share of labor working out of village Share of HH with electricity Number of phones per person Agricultural performance (AP) Note: 1 mu=0.0667 hectare. 2005 2006 Unit Mean Std. Dev. years % years people % mu 48.8 96.9 8.9 4.1 18.1 6.2 10.5 17.4 2.3 1.4 20.2 9.9 49.7 96.8 9.0 4.0 17.8 6.2 10.4 17.5 2.4 1.4 20.2 9.7 km km km % % % number RMB/m u 38.0 9.4 1.4 7.7 33.2 99.4 0.27 53.2 6.7 1.9 11.8 19.8 3.2 0.11 38.0 9.4 1.5 8.1 33.6 99.3 0.31 53.2 6.7 2.4 11.8 20.3 3.4 0.12 710 511 735 541 42 Mean Std. Dev. Table 1.8. Impacts of urban proximities on levels of income, by sources and by six methods of measurements of distance Composite Distance (weighted (Wgt.)) Physical Distance (M1-M3) M2 Nearest city by size M3 M4 Increment Distance Wgt. by Population 5,120 2,560 5,120 2,560 Agricultural income (ln) Dis_nearest city -0.743 Dis_smaller city / Dis_larger city / Dis_mega city / / 0.247 -0.073 -0.582 Local nonfarm income (ln) Dis_nearest city 0.466 Dis_smaller city / Dis_larger city / Dis_mega city / / 0.128 2.138** -0.809 Distance measurements Observations Number of HH M1 Nearest city 5,120 2,560 Local-Migration nonfarm income (ln) Dis_nearest city -2.882*** / Dis_smaller city / -1.322 Dis_larger city / -2.209** Dis_mega city / 0.656 M5 Wgt. by GDP from Manuf. M6 Wgt. 
by GDP from Service 5,120 2,560 5,120 2,560 5,120 2,560 -0.759 / -0.056 -0.197 / -36.747*** -39.293 54.303 / -62.638*** -25.840 62.498** / -21.786** -1.553 8.720 0.415 / 0.008 -0.487 / -64.009*** -6.868 53.748 / -107.118*** -43.350 125.844*** / -17.892 35.697 -13.728 -2.840*** / -0.442 0.064 / -8.112 -128.254** 129.353** / -43.783 -73.111** 104.217** / -9.369 -89.180*** 91.384** Migration nonfarm income (ln) Dis_nearest city 1.753* / 1.789* / / Dis_smaller city / 1.970* / 26.567* 75.668*** Dis_larger city / 0.494 0.241 95.073* 75.743** Dis_mega city / 0.115 0.530 -115.785** -136.679*** Note: In column M3, "Dis_larger city" and "Dis_mega city" are incremental distances *** p<0.01, ** p<0.05, * p<0.1 43 / -10.792 -4.325 2.094 Table 1.9. Impacts of urban proximities on income shares, by sources and by six methods of measurements of distance Composite Distance (weighted (Wgt.)) M5 Wgt. by GDP from Manuf. M6 Wgt. by GDP from Service 5,120 2,560 5,120 2,560 / -3.609*** -1.992** 4.535*** / -0.655 0.407 0.082 Local nonfarm income as share of total households income (%) Dis_nearest city -0.017 / -0.018 / Dis_smaller city / -0.045** / -0.478 Dis_larger city / 0.007 -0.002 0.742 Dis_mega city / -0.018 -0.012 -0.335 / -0.377 0.247 0.121 / 0.154 1.259 -1.172 Local-Migration nonfarm income as share of total income (%) Dis_nearest city -0.075*** / -0.073*** Dis_smaller city / -0.065** / Dis_larger city / -0.062** -0.002 Dis_mega city / 0.054 0.017 / 0.376 -0.972 0.608 / 0.206 -1.870** 1.545 Distance measurements Observations Number of HH Physical Distance (M1-M3) M1 M2 M3 Nearest Nearest Increment city by city Distance size 5,120 2,560 5,120 2,560 5,120 2,560 M4 Wgt. 
by Population 5,120 2,560 Agricultural income as share of total households income (%) Dis_nearest city Dis_smaller city Dis_larger city Dis_mega city -0.009 / / / / 0.028 0.013 -0.017 -0.010 / 0.000 -0.010 / -2.069*** -3.551** 4.731*** households / 0.559 -2.013 1.490 Migration nonfarm income as share of total households income (%) Dis_nearest city 0.078*** / 0.079*** / / Dis_smaller city / 0.072** / 1.755*** 3.101*** Dis_larger city / 0.030 0.009 2.425 1.898* Dis_mega city / -0.004 0.011 -3.642** -4.293** Note: In column M3, "Dis_larger city" and "Dis_mega city" are incremental distances *** p<0.01, ** p<0.05, * p<0.1 / 0.408 -0.192 -0.340

Table 1.10. Impacts of local agricultural performance (AP) and urban proximities on migration income levels and shares, by three methods of measurements of distance

Composite Distance (weighted (Wgt.))
Distance measurements                 M4               M5                  M6
(Observations=5,120)                  Wgt. by          Wgt. by GDP from    Wgt. by GDP from
(Number of HH=2,560)                  Population       Manufacturing       Service

Migration nonfarm income (ln)
Dis1_smaller city (ln)                233.594**        373.592*            -66.479
Dis2_larger city (ln)                 419.469***       281.883             -252.082
Dis3_mega city (ln)                   -549.558***      -558.585*           300.760*
AP*Dis1                               -22.403*         -43.743             3.184
AP*Dis2                               -60.252**        -41.305             -1.937
AP*Dis3                               65.048**         67.247              -3.984
AP (Ag. Performance, ln(RMB/Mu))      135.471**        136.997             19.713

Migration nonfarm income as share of total households income (%) Non-Linear
Dis1_smaller city (ln)                9.451**          11.149              -1.080
Dis2_larger city (ln)                 14.167**         6.572               -8.501
Dis3_mega city (ln)                   -19.271**        -14.646             9.501*
AP*Dis1                               -0.947**         -1.202              0.111
AP*Dis2                               -2.073**         -0.963              0.466
AP*Dis3                               2.353**          1.665               -0.640
AP (Ag. Performance, ln(RMB/Mu))      5.125**          3.829               0.420

Note: AP and Distance variables are jointly significant at 1% and 5% in M4 and M5, respectively, in migration income level (upper) regressions. AP and Distance variables are jointly significant at 1% in M4, M5 and M6 in migration income share (bottom) regressions.
*** p<0.01, ** p<0.05, * p<0.1

Table 1.11.
Robustness check using different definitions of local nonfarm income levels Distance measurements Composite Distance (weighted (Wgt.)) M4 M5 M6 (Observations=5,120) (Number of HH=2,560) Wgt. by Population Wgt. by GDP from Manuf. Wgt. by GDP from Service Local: within own township Dis_nearest city (ln) Dis_smaller city (ln) Dis_larger city (ln) Dis_mega city (ln) / -64.009*** -6.868 53.748 / -107.118*** -43.350 125.844*** / -17.892 35.697 -13.728 Local: within own county (Including within within county) Dis_nearest city (ln) / Dis_smaller city (ln) -56.013*** Dis_larger city (ln) -20.619 Dis_mega city (ln) 59.897 *** p<0.01, ** p<0.05, * p<0.1 46 township & outside township but / -88.340*** -38.344 105.667** / -21.628 5.918 14.729 Table 1.12. Robustness check using different definitions of Local-Migration income levels Distance measurements Composite Distance (weighted (Wgt.)) M4 M5 M6 (Observations=5,120) (Number of HH=2,560) Wgt. By Popul. Wgt. By GDP from Manuf. Base: all people doing local-migration nonfarm activities (NFA) Dis_nearest city (ln) / / Dis_smaller city (ln) -8.112 -43.783 Dis_larger city (ln) -128.254** -73.111** Dis_mega city (ln) 129.353** 104.217** Wgt. By GDP from Service / -9.369 -89.180*** 91.384** Alternative 1: commuters only (doing local-migration NFA and staying at home everyday) Dis_nearest city (ln) / / / Dis_smaller city (ln) 9.833 -10.702 -10.849 Dis_larger city (ln) -41.926 -35.198 -72.394** Dis_mega city (ln) 29.580 36.963 69.641** Alternative 2: Non-commuters only (doing local-migration NFA but do not stay at home everyday) Dis_nearest city (ln) / / / Dis_smaller city (ln) -18.570 -35.827* 0.843 Dis_larger city (ln) -93.372*** -41.484** -20.028 Dis_mega city (ln) 105.520*** 71.240** 23.775 *** p<0.01, ** p<0.05, * p<0.1 47 Table 1.13. Robustness check using different definitions of Migration income levels and shares All Migrants (Obs. =5,120) (HH =2,560) M4 M5 Wgt. by Popul. Wgt. by GDP from Manuf. 
Migration nonfarm income levels (ln) Dis1 26.567* 75.668*** Dis2 95.073* 75.743** Dis3 -115.785** -136.679*** AP 0.123 0.123 Migrants: Out home > 3 Months M4 M5 Migrants: Out home > 6 Months M4 M5 Wgt. by Popul. Wgt. by GDP from Manuf. Wgt. By Popul. Wgt. by GDP from Manuf. 23.898 101.764** -118.922** -0.037 73.385*** 78.127** -135.729*** -0.037 29.030* 102.389* -123.510** 0.009 79.745*** 79.297** -142.127*** 0.009 Migration nonfarm income as share of total households income (%) Dis1 1.755*** 3.101*** 1.693*** 3.016*** 1.742*** Dis2 2.425 1.898* 2.447 1.890* 2.347 Dis3 -3.642** -4.293** -3.590** -4.190** -3.536** AP -0.007 -0.007 -0.015 -0.015 -0.013 *** p<0.01, ** p<0.05, * p<0.1 48 3.033*** 1.835* -4.160** -0.013 Table 1.14. Robustness check, including additional controls in the estimations (Obs=5,120) (HH=2,560) M4. Wgt. by Population M5. Wgt. by GDP from Manuf. Main Robust Model Check Main Robust Model Check Agricultural income levels (ln) Dis1 -36.747*** -22.080** -62.638*** Dis2 -39.293 -24.437 -25.840 Dis3 54.303 25.554 62.498** Has a small city / -0.235 / Has a large city / -1.648** / Has a mega city / -1.341** / Local nonfarm income levels (ln) Dis1 -64.009*** -52.219*** -107.118*** Dis2 -6.868 5.624 -43.350 Dis3 53.748 23.950 125.844*** Has a small city / 1.601 / Has a large city / -2.797** / Has a mega city / -1.500 / Local-Migration nonfarm income levels (ln) Dis1 -8.112 41.322* -43.783 Dis2 -128.254** -91.530* -73.111** Dis3 129.353** 68.914 104.217** Has a small city / -5.289*** / Has a large city / -3.329** / Has a mega city / 0.861 / Migration nonfarm income levels (ln) Dis1 26.567* -12.208 75.668*** Dis2 95.073* 69.225 75.743** Dis3 -115.785** -69.075 -136.679*** Has a small city / 2.860** / Has a large city / 4.084*** / Has a mega city / -1.582 / Migration nonfarm income levels (ln), with interactions Dis1 233.594** 193.912** 373.592* Dis2 419.469*** 345.823** 281.882 Dis3 -549.559*** -481.640** -558.583* AP*Dis1 -22.403* -20.414* -43.743 AP*Dis2 
-60.252** -56.002** -41.305 AP*Dis3 65.048** 59.976** 67.247 AP 135.471** 126.395** 136.997 Has a small city / 3.039*** / Has a large city / 4.629*** / Has a mega city / -1.782* / *** p<0.01, ** p<0.05, * p<0.1 49 M6. Wgt. by GDP from Service Main Robustnes Model s Check -24.287 -0.403 5.664 -0.462 -1.692* -1.280** -21.786** -1.553 8.720 / / / -0.221 37.562* -48.032* -0.808 -2.072*** -1.845** -67.784* -11.838 57.799 1.354 -2.680* -1.280 -17.892 35.697 -13.728 / / / 9.836 91.550*** -96.935** 0.383 -3.958*** -2.749** 42.353 -41.548 19.383 -5.641*** -3.592*** 0.955 -9.369 -89.180*** 91.384** / / / 15.075 -99.652*** 93.887** -5.194*** -3.135** 1.881 10.797 51.756 -71.500 3.039** 3.947*** -1.849* -10.792 -4.325 2.094 / / / -38.262** -9.260 23.280 3.494*** 4.726*** -0.931 258.713 186.329 -405.906 -33.344 -33.144 52.127 110.253 3.397*** 4.499*** -1.890* -66.479 -252.081 300.760* 3.183 -1.937 -3.984 19.713 / / / -83.901** -349.961** 372.560*** 3.183 -1.938 -3.983 19.714 3.510*** 5.631*** -1.364 Figure 1.1. Trend of income structure of rural households in China, 1990-2011 Note1: SE Farm shows share of income from self-employment farm activities; Wage Income shows share of income from wage-employment; SE Non-Farm shows the share of income from nonfarm self-employment nonfarm; Other income shows the share of income from other sources apart from the above three activities. Note2: Data is from National Bureau of Statistics of China, available on-line: http://www.stats.gov.cn 50 Table 1.A1. Full results of impacts of urban proximities on local income values, by sources and by six methods of measurements of distance Distance measurements Physical Distance (M1-M3) M1 M2 M3 Nearest Nearest Increment city by city Distance size Composite Distance (weighted (Wgt.)) M4 M5 M6 Wgt. By Wgt. By Wgt. By GDP from GDP from Population Manuf. 
Service Observations Number of HH 5,120 2,560 5,120 2,560 5,120 2,560 5,120 2,560 5,120 2,560 5,120 2,560 Dis_nearest city (ln) 0.466 (0.864) / / / / / / / / 0.128 (0.876) 2.138** (0.907) -0.809 (1.651) 0.415 (0.855) / / 0.008 (0.285) -0.487 (0.696) / / -64.009*** (21.623) -6.868 (45.455) 53.748 (40.324) / / -107.118*** (25.995) -43.350 (31.856) 125.844*** (42.880) / / -17.892 (18.309) 35.697 (28.601) -13.728 (28.721) 0.987** (0.473) 0.987** (0.473) 0.987** (0.473) 0.987** (0.473) 0.987** (0.473) 0.987** (0.473) 0.010 (0.034) 0.029 (0.019) -0.316*** (0.118) 0.046 (0.204) -0.003 (0.012) -0.004 (0.044) 0.010 (0.034) 0.029 (0.019) -0.316*** (0.118) 0.046 (0.204) -0.003 (0.012) -0.004 (0.044) 0.010 (0.034) 0.029 (0.019) -0.316*** (0.118) 0.046 (0.204) -0.003 (0.012) -0.004 (0.044) 0.010 (0.034) 0.029 (0.019) -0.316*** (0.118) 0.046 (0.204) -0.003 (0.012) -0.004 (0.044) 0.010 (0.034) 0.029 (0.019) -0.316*** (0.118) 0.046 (0.204) -0.003 (0.012) -0.004 (0.044) 0.010 (0.034) 0.029 (0.019) -0.316*** (0.118) 0.046 (0.204) -0.003 (0.012) -0.004 (0.044) 0.014** (0.006) -0.173** (0.078) -0.077 (0.078) -0.121 (0.115) 0.030 (0.028) 0.147** 0.012** (0.006) -0.193*** (0.068) -0.077 (0.078) -0.121 (0.115) 0.030 (0.028) 0.147** 0.015** (0.007) -0.175** (0.074) -0.077 (0.078) -0.121 (0.115) 0.030 (0.028) 0.147** 0.013** (0.006) -0.185*** (0.068) -0.077 (0.078) -0.121 (0.115) 0.030 (0.028) 0.147** 0.009 (0.006) -0.192*** (0.066) -0.077 (0.078) -0.121 (0.115) 0.030 (0.028) 0.147** 0.017*** (0.006) -0.173** (0.072) -0.077 (0.078) -0.121 (0.115) 0.030 (0.028) 0.147** Dis_smaller city (ln) Dis_larger city (ln) Dis_mega city (ln) Ag. Performance HH/Individual Var. Age of HH head % of HH head is male Highest edu. level HH size % of kids and elder Cultivated land areas owned by household Village level Var. 
Distance to nearest railway station Distance to nearest long-distance bus station Distance to main road % of cultivated land are rented in/out % of labor working out of village % of HH with electricity

Table 1.A1. (cont'd)

| Variable | M1 | M2 | M3 | M4 | M5 | M6 |
|---|---|---|---|---|---|---|
| % of HH with electricity (standard error) | (0.071) | (0.071) | (0.071) | (0.071) | (0.071) | (0.071) |
| Number of phones per person | 2.605 (3.236) | 2.605 (3.237) | 2.605 (3.237) | 2.605 (3.237) | 2.605 (3.237) | 2.605 (3.237) |
| Time and province Var. | | | | | | |
| Year=2006 | 0.222 (0.210) | 0.222 (0.210) | 0.222 (0.210) | 0.222 (0.210) | 0.222 (0.210) | 0.222 (0.210) |
| Province=Jiangxi | -1.143 (1.987) | -1.823 (2.795) | -2.071 (2.285) | -2.169 (6.497) | 2.000 (5.492) | -1.685 (8.912) |
| Province=Shandong | 0.920 (1.648) | 1.091 (2.754) | 0.261 (1.936) | 4.825 (5.186) | 7.403 (5.116) | 5.318 (7.217) |
| Province=Hunan | -0.963 (2.136) | -0.482 (3.367) | -1.795 (2.491) | -6.773 (8.277) | -0.371 (6.847) | -4.849 (9.891) |
| Province=Sichuan | -0.692 (2.040) | -2.784 (4.228) | -2.276 (3.160) | -17.852 (12.434) | -7.596 (8.337) | -10.297 (9.213) |
| Constant | -6.119 (12.391) | -14.482 (15.954) | -3.216 (12.672) | 129.539 (88.507) | 182.754** (73.173) | -36.307 (96.662) |
| R2 (overall) | 0.195 | 0.215 | 0.196 | 0.222 | 0.223 | 0.208 |
| P | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| Chi2 | 873.5 | 1025 | 980.5 | 1364 | 1498 | 729.6 |

Note: In column M3, "Dis_larger city" and "Dis_mega city" are incremental distances.

Table 1.A2. Full results of impacts of urban proximities on local-migration income values, by sources and by six methods of measurements of distance

Distance measurements: M1-M3 use physical distance (M1: nearest city; M2: nearest by city size; M3: increment distance). M4-M6 use composite distance, weighted (Wgt.) (M4: Wgt. by population; M5: Wgt. by GDP from manuf.; M6: Wgt. by GDP from service). Standard errors are in parentheses.

| Variable | M1 | M2 | M3 | M4 | M5 | M6 |
|---|---|---|---|---|---|---|
| Observations | 5,120 | 5,120 | 5,120 | 5,120 | 5,120 | 5,120 |
| Number of HH | 2,560 | 2,560 | 2,560 | 2,560 | 2,560 | 2,560 |
| Dis_nearest city (ln) | -2.882*** (0.921) | / | -2.840*** (0.985) | / | / | / |
| Dis_smaller city (ln) | / | -1.322 (1.259) | / | -8.112 (22.482) | -43.783 (32.882) | -9.369 (16.709) |
| Dis_larger city (ln) | / | -2.209** (0.963) | -0.442 (0.304) | -128.254** (55.414) | -73.111** (35.591) | -89.180*** (34.516) |
| Dis_mega city (ln) | / | 0.656 (1.300) | 0.064 (0.584) | 129.353** (53.246) | 104.217** (52.205) | 91.384** (36.684) |
| Ag. Performance | -1.940*** (0.438) | -1.940*** (0.438) | -1.940*** (0.438) | -1.940*** (0.438) | -1.940*** (0.438) | -1.940*** (0.438) |
| HH/Individual Var. | | | | | | |
| Age of HH head | -0.023 (0.039) | -0.023 (0.039) | -0.023 (0.039) | -0.023 (0.039) | -0.023 (0.039) | -0.023 (0.039) |
| % of HH head is male | 0.011 (0.015) | 0.011 (0.015) | 0.011 (0.015) | 0.011 (0.015) | 0.011 (0.015) | 0.011 (0.015) |
| Highest edu. level | 0.158 (0.144) | 0.158 (0.144) | 0.158 (0.144) | 0.158 (0.144) | 0.158 (0.144) | 0.158 (0.144) |
| HH size | 0.358** (0.153) | 0.358** (0.154) | 0.358** (0.154) | 0.358** (0.154) | 0.358** (0.154) | 0.358** (0.154) |
| % of kids and elder | -0.011 (0.009) | -0.011 (0.009) | -0.011 (0.009) | -0.011 (0.009) | -0.011 (0.009) | -0.011 (0.009) |
| Cultivated land areas owned by household | 0.044 (0.057) | 0.044 (0.057) | 0.044 (0.057) | 0.044 (0.057) | 0.044 (0.057) | 0.044 (0.057) |
| Village level Var. | | | | | | |
| Distance to nearest railway station | -0.001 (0.007) | -0.006 (0.008) | 0.001 (0.007) | -0.023*** (0.008) | -0.021*** (0.007) | -0.017** (0.007) |
| Distance to nearest long-distance bus station | -0.007 (0.070) | -0.035 (0.080) | -0.024 (0.076) | -0.044 (0.088) | -0.068 (0.091) | -0.031 (0.085) |
| Distance to main road | 0.003 (0.075) | 0.003 (0.075) | 0.003 (0.075) | 0.003 (0.075) | 0.003 (0.075) | 0.003 (0.075) |
| % of cultivated land are rented in/out | -0.014 (0.106) | -0.014 (0.106) | -0.014 (0.106) | -0.014 (0.106) | -0.014 (0.106) | -0.014 (0.106) |
| % of labor working out of village | 0.019 (0.040) | 0.019 (0.040) | 0.019 (0.040) | 0.019 (0.040) | 0.019 (0.040) | 0.019 (0.040) |
| % of HH with electricity | -0.565*** (0.076) | -0.565*** (0.076) | -0.565*** (0.076) | -0.565*** (0.076) | -0.565*** (0.076) | -0.565*** (0.076) |
| Number of phones per person | -8.837** (3.561) | -8.837** (3.562) | -8.837** (3.562) | -8.837** (3.562) | -8.837** (3.562) | -8.837** (3.562) |
| Time and province Var. | | | | | | |
| Year=2006 | 0.440** (0.220) | 0.440** (0.220) | 0.440** (0.220) | 0.440** (0.220) | 0.440** (0.220) | 0.440** (0.220) |
| Province=Jiangxi | -5.372** (2.230) | -3.935 (2.719) | -4.434 (2.719) | 11.674 (8.552) | 3.882 (6.680) | -3.808 (6.847) |
| Province=Shandong | -6.885*** (2.042) | -6.414** (2.570) | -6.541*** (2.216) | 1.218 (6.515) | -3.630 (6.407) | -1.637 (6.489) |
| Province=Hunan | -4.845** (2.310) | -3.109 (2.825) | -4.788* (2.591) | 14.841 (10.259) | 7.718 (8.197) | -0.222 (7.963) |
| Province=Sichuan | -4.807** (2.406) | -2.358 (3.595) | -3.841 (3.492) | 20.425 (14.088) | 9.695 (8.795) | 5.943 (8.056) |
| Constant | 9.974 (12.279) | 12.822 (18.706) | 10.057 (13.723) | 40.931 (90.868) | 89.278 (103.187) | 54.671 (82.277) |
| R2 (overall) | 0.115 | 0.102 | 0.120 | 0.102 | 0.0984 | 0.0999 |
| P | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| Chi2 | 1546 | 1553 | 1840 | 4090 | 4304 | 2325 |

Note: In column M3, "Dis_larger city" and "Dis_mega city" are incremental distances.

Table 1.A3. Full results of impacts of urban proximities on migration income values, by sources and by six methods of measurements of distance

Distance measurements: M1-M3 use physical distance (M1: nearest city; M2: nearest by city size; M3: increment distance). M4-M6 use composite distance, weighted (Wgt.) (M4: Wgt. by population; M5: Wgt. by GDP from manuf.; M6: Wgt. by GDP from service). Standard errors are in parentheses.

| Variable | M1 | M2 | M3 | M4 | M5 | M6 |
|---|---|---|---|---|---|---|
| Observations | 5,120 | 5,120 | 5,120 | 5,120 | 5,120 | 5,120 |
| Number of HH | 2,560 | 2,560 | 2,560 | 2,560 | 2,560 | 2,560 |
| Dis_nearest city (ln) | 1.753* (0.912) | / | 1.789* (0.963) | / | / | / |
| Dis_smaller city (ln) | / | 1.970* (1.084) | 1.789* -0.963 | 26.567* (15.379) | 75.668*** (26.703) | -10.792 (13.108) |
| Dis_larger city (ln) | / | 0.494 (0.711) | 0.241 (0.320) | 95.073* (51.805) | 75.743** (30.302) | -4.325 (25.174) |
| Dis_mega city (ln) | / | 0.115 (1.511) | 0.530 (0.608) | -115.785** (53.228) | -136.679*** (49.771) | 2.094 (29.831) |
| Ag. Performance | 0.123 (0.351) | 0.123 (0.352) | 0.123 (0.352) | 0.123 (0.352) | 0.123 (0.352) | 0.123 (0.352) |
| HH/Individual Var. | | | | | | |
| Age of HH head | 0.059** (0.026) | 0.059** (0.026) | 0.059** (0.026) | 0.059** (0.026) | 0.059** (0.026) | 0.059** (0.026) |
| % of HH head is male | -0.016 (0.014) | -0.016 (0.014) | -0.016 (0.014) | -0.016 (0.014) | -0.016 (0.014) | -0.016 (0.014) |
| Highest edu. level | 0.312** (0.147) | 0.312** (0.147) | 0.312** (0.147) | 0.312** (0.147) | 0.312** (0.147) | 0.312** (0.147) |
| HH size | 0.420** (0.201) | 0.420** (0.201) | 0.420** (0.201) | 0.420** (0.201) | 0.420** (0.201) | 0.420** (0.201) |
| % of kids and elder | -0.011 (0.009) | -0.011 (0.009) | -0.011 (0.009) | -0.011 (0.009) | -0.011 (0.009) | -0.011 (0.009) |
| Cultivated land areas owned by household | -0.027 (0.027) | -0.027 (0.027) | -0.027 (0.027) | -0.027 (0.027) | -0.027 (0.027) | -0.027 (0.027) |
| Village level Var. | | | | | | |
| Distance to nearest railway station | -0.015 (0.012) | -0.015 (0.013) | -0.017 (0.012) | 0.001 (0.011) | 0.002 (0.011) | -0.006 (0.011) |
| Distance to nearest long-distance bus station | 0.070 (0.072) | 0.098 (0.072) | 0.081 (0.073) | 0.096 (0.071) | 0.119* (0.071) | 0.106 (0.074) |
| Distance to main road | -0.062 (0.064) | -0.062 (0.064) | -0.062 (0.064) | -0.062 (0.064) | -0.062 (0.064) | -0.062 (0.064) |
| % of cultivated land are rented in/out | 0.182** (0.084) | 0.182** (0.084) | 0.182** (0.084) | 0.182** (0.084) | 0.182** (0.084) | 0.182** (0.084) |
| % of labor working out of village | 0.015 (0.018) | 0.015 (0.018) | 0.015 (0.018) | 0.015 (0.018) | 0.015 (0.018) | 0.015 (0.018) |
| % of HH with electricity | -0.071 (0.046) | -0.071 (0.046) | -0.071 (0.046) | -0.071 (0.046) | -0.071 (0.046) | -0.071 (0.046) |
| Number of phones per person | -2.343 (2.014) | -2.343 (2.014) | -2.343 (2.014) | -2.343 (2.014) | -2.343 (2.014) | -2.343 (2.014) |
| Time and province Var. | | | | | | |
| Year=2006 | 0.320** (0.147) | 0.320** (0.147) | 0.320** (0.147) | 0.320** (0.147) | 0.320** (0.147) | 0.320** (0.147) |
| Province=Jiangxi | 5.442*** (1.868) | 6.488** (2.650) | 5.991** (2.372) | -10.140 (7.644) | -6.712 (6.106) | -6.225 (7.367) |
| Province=Shandong | 3.040* (1.811) | 3.622 (2.508) | 3.611* (2.095) | -8.016 (6.186) | -5.800 (6.198) | -6.510 (7.473) |
| Province=Hunan | 4.992** (2.404) | 4.195 (2.995) | 5.927** (2.679) | -10.902 (8.717) | -8.700 (6.793) | -7.114 (7.636) |
| Province=Sichuan | 4.876** (2.458) | 6.692 (4.207) | 6.170* (3.668) | -10.111 (10.580) | -6.095 (6.028) | -4.262 (6.079) |
| Constant | -32.423*** (12.324) | -36.622** (15.334) | -35.843*** (13.725) | -60.808 (58.407) | -130.663** (59.731) | 82.245 (84.166) |
| R2 (overall) | 0.334 | 0.335 | 0.336 | 0.337 | 0.339 | 0.328 |
| P | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| Chi2 | 725.4 | 657.6 | 772.9 | 1218 | 1112 | 800.5 |

Note: In column M3, "Dis_larger city" and "Dis_mega city" are incremental distances.

REFERENCES

Au CC and Henderson JV (2006) Are Chinese cities too small? The Review of Economic Studies 73(3): 549–576.
Au C-C and Henderson JV (2006) How migration restrictions limit agglomeration and productivity in China. Journal of Development Economics 80(2): 350–388.
Barrett CB, Reardon T and Webb P (2001) Nonfarm income diversification and household livelihood strategies in rural Africa: concepts, dynamics, and policy implications. Food Policy 26(4): 315–331.
Benjamin D, Brandt L and Giles J (2005) The evolution of income inequality in rural China. Economic Development and Cultural Change 53(4): 769–824.
Berdegué JA, Carriazo F, Jara B, et al.
(2015) Cities, Territories, and Inclusive Growth: Unraveling Urban–Rural Linkages in Chile, Colombia, and Mexico. World Development.
Berdegué JA and Proctor F (2014) Inclusive Rural-Urban Linkages. Doc no 123, Working Group: Development with Territorial Cohesion. Santiago, Chile: Rimisp.
Chernina E, Castañeda Dower P and Markevich A (2014) Property rights, land liquidity, and internal migration. Journal of Development Economics 110: 191–215.
Christiaensen L and Todo Y (2014) Poverty reduction during the rural–urban transformation–the role of the missing middle. World Development 63: 43–58.
Christiaensen L, De Weerdt J, Ingelaere B and Kanbur R (2017) Why Secondary Towns Can Be Important for Poverty Reduction-A Migrant's Perspective. No. 12193. CEPR Discussion Papers.
De Janvry A and Sadoulet E (2001) Income strategies among rural households in Mexico: The role of off-farm activities. World Development 29(3): 467–480.
De Janvry A, Sadoulet E and Zhu N (2005) The role of non-farm incomes in reducing rural poverty and inequality in China. Department of Agricultural & Resource Economics, UCB.
Deichmann U, Shilpi F and Vakis R (2009) Urban proximity, agricultural potential and rural non-farm employment: Evidence from Bangladesh. World Development 37(3): 645–660.
Deininger K and Jin S (2005) The potential of land rental markets in the process of economic development: Evidence from China. Journal of Development Economics 78(1): 241–270.
Deininger K and Jin S (2006) Dynamics of temporary migration in China: exploring the changing role of networks. World Bank Policy Research Working Paper.
Desmet K and Fafchamps M (2005) Changes in the spatial concentration of employment across US counties: a sectoral analysis 1972–2000. Journal of Economic Geography 5(3): 261–284.
Du X, Ifft J, Lu L, et al. (2015) Marketing Contracts and Crop Insurance. American Journal of Agricultural Economics.
Escobal J (2001) The benefits of roads in rural Peru: a transaction costs approach. Grupo de Análisis para el Desarrollo (GRADE), Lima.
Fafchamps M and Shilpi F (2003) The spatial division of labour in Nepal. Journal of Development Studies 39(6): 23–66.
Fafchamps M and Wahba J (2006) Child labor, urban proximity, and household composition. Journal of Development Economics 79(2): 374–397.
Fafchamps M, Koelle M and Shilpi F (2016) Gold mining and proto-urbanization: recent evidence from Ghana. Journal of Economic Geography.
Fan CC and Scott AJ (2003) Industrial Agglomeration and Development: A Survey of Spatial Economic Issues in East Asia and a Statistical Analysis of Chinese Regions. Economic Geography 79(3): 295–319.
Gibson J (2000) A Poverty Profile of Cambodia, 1999. A Report to the World Bank and the Cambodian Ministry of Planning, Phnom Penh.
Gordon IR and Kaplanis I (2014) Accounting for Big-City Growth in Low-Paid Occupations: Immigration and/or Service-Class Consumption. Economic Geography 90(1): 67–90.
Haggblade S, Hazell PB and Reardon T (2007) Transforming the rural nonfarm economy: Opportunities and threats in the developing world. Intl Food Policy Res Inst.
Harris JR and Todaro MP (1970) Migration, unemployment and development: a two-sector analysis. The American Economic Review: 126–142.
Heath R and Mushfiq Mobarak A (2015) Manufacturing growth and the lives of Bangladeshi women. Journal of Development Economics 115: 1–15.
Henderson JV and Wang HG (2005) Aspects of the rural-urban transformation of countries. Journal of Economic Geography 5(1): 23–42.
Hymer S and Resnick S (1969) A model of an agrarian economy with nonagricultural activities. The American Economic Review: 493–506.
Jin S and Deininger K (2009) Land rental markets in the process of rural structural transformation: Productivity and equity impacts from China. Journal of Comparative Economics 37(4): 629–646.
Jonasson E and Helfand SM (2010) How important are locational characteristics for rural non-agricultural employment? Lessons from Brazil. World Development 38(5): 727–741.
Lewis WA (1954) Unlimited Supplies of Labour. Manchester School.
Li C and Gibson J (2013) Rising regional inequality in China: Fact or artifact? World Development 47: 16–29.
Lucas RE (2001) The effects of proximity and transportation on developing country population migrations. Journal of Economic Geography 1(3): 323–339.
Massey DS, Arango J, Hugo G, Kouaouci A, Pellegrino A and Taylor JE (1993) Theories of International Migration: A Review and Appraisal. Population and Development Review 19(3): 431–466.
Mellor JW and Lele UJ (1973) Growth linkages of the new foodgrain technologies. Indian Journal of Agricultural Economics 28(1): 35.
Mu R and Giles J (2014) Village political economy, land tenure insecurity, and the rural to urban migration decision: evidence from China. World Bank Policy Research Working Paper (7080).
Partridge MD, Rickman DS, Ali K, et al. (2008) Lost in space: population growth in the American hinterlands and small cities. Journal of Economic Geography: lbn038.
Rozelle S, Guo L, Shen M, et al. (1999) Leaving China's farms: survey results of new paths and remaining hurdles to rural migration. The China Quarterly 158: 367–393.
Sassen S (1991) The Global City: New York, London, Tokyo. Princeton, NJ: Princeton University Press.
Stark O and Lucas RE (1988) Migration, remittances, and the family. Economic Development and Cultural Change: 465–481.
Taylor JE, Rozelle S and de Brauw A (2003) Migration and Incomes in Source Communities: A New Economics of Migration Perspective from China. Economic Development and Cultural Change 51(1): 75–101.
Taylor PJ, Derudder B, Faulconbridge J, et al. (2014) Advanced Producer Service Firms as Strategic Networks, Global Cities as Strategic Places. Economic Geography 90(3): 267–291.
Todaro MP (1969) A model of labor migration and urban unemployment in less developed countries. The American Economic Review: 138–148.
Volpe Martincus C, Carballo J and Cusolito A (2017) Roads, exports and employment: Evidence from a developing country. Journal of Development Economics 125: 21–39.
von Thünen JH (1826) Der isolierte Staat in Beziehung auf Landwirtschaft und Nationalökonomie. Jena: Gustav Fischer.
Wooldridge JM (2003) Cluster-sample methods in applied econometrics.
Yunez-Naude A and Taylor JE (2001) The determinants of nonfarm activities and incomes of rural households in Mexico, with emphasis on education. World Development 29(3): 561–572.
Zhao Y (1999a) Labor migration and earnings differences: the case of rural China. Economic Development and Cultural Change 47(4): 767–782.
Zhao Y (1999b) Leaving the countryside: rural-to-urban migration decisions in China. The American Economic Review 89(2): 281–286.

CHAPTER 2: NETWORK EFFECTS AND RURAL-URBAN MIGRATION DESTINATION CHOICES: EVIDENCE FROM CHINA

1. Introduction

Networks, or social connections, are important in social capital theory for explaining whether to migrate (the migration decision) and where to migrate (the destination choice) (Massey et al., 1994; Stark and Jakubek, 2013). Networks can help migrants reduce the costs of moving, credit, settlement and establishment, and job search, and thus affect the utility a migrant derives from whether and where to migrate (Bauer et al., 2009). In this paper, I study the impacts of network effects on rural-urban migrants' destination choice, both alone and as a substitute for or complement to other migrant attributes, such as education, and city amenities, such as city size. The gaps in the literature discussed below motivate the goal of this paper.

In the migration literature, incentive and capacity factors determine whether rural people supply migrant labor and where they go. The incentives can be push effects, like poor agricultural performance, or pull effects, like better amenities in the destination cities.
Pull factors can also operate through the migrant's own capital: a well-educated migrant wants to move to a place that values his education more than rural areas do, and a migrant with better network capital can move with little search cost. The literature calls the migrant labor supply choice "positive selection" if an attribute of the migrant (such as education or network capital) positively affects the person's capacity to migrate as well as where to migrate. For example, education, a network, or both may help a migrant go to a city where he or she gets a higher return to labor (Chiquiar and Hanson 2005; Grogger and Hanson 2011). An example of "negative selection", by contrast, is that lower-educated people are more likely to migrate to cities in search of jobs with low entry barriers (skill requirements) (Borjas 1987; Durand et al. 1996; Moraga 2011).

The "capital" above is really a vector: for example, it can include education as well as network capital at the destination. Logically, these can be complements or substitutes. Their effects on migration decisions have been tested in some recent literature (Pedersen et al. 2008; Bertoli 2010; McKenzie and Rapoport 2010; Beine et al. 2011), but not in all the settings of interest. Specifically, whether they are complements or substitutes has been tested mainly in the international migration literature, not in the domestic (internal) migration literature. One can argue that internal migration resembles international migration, because going from a mountain village to a big coastal city involves costs and threshold investments just as going from a Mexican village to Los Angeles does. But the situation may also differ, at least in degree: the poorest tend not to migrate internationally but do tend to migrate internally, and there have been few tests of the effects of their network capital and education in that setting.
Moreover, whether set in a domestic or an international context, studies of education and network capital effects have seldom taken specific destination cities as the focus of the "where" (as opposed to the general question of whether to migrate domestically or internationally, as in Yunez-Naude and Taylor (2001)). The city-specific "where" is interesting in the analysis of domestic migration because one would expect the heterogeneity of cities (size, economies, amenities) to significantly affect not only the individual coefficients on education and on network capital, but also their interaction as substitutes or complements.

This analysis therefore addresses two major research questions. First, how do network effects alone impact rural-urban migrants' destination choices, once moving costs are controlled for? The literature usually proxies search and moving costs by whether the migrant has a network to help him or her move and to receive him or her at the destination. Here, as a contribution to the literature, I separate the network from an explicit representation of the search and moving costs. I hypothesize that rural-urban migrants prefer a city where they have a more developed network (from their sending area), once search and moving costs are controlled for.

Second, in what ways do rural migrants' networks and education (individually and jointly) affect their urban destination choice? The hypothesis here is ambiguous. Networks can help less-educated rural people migrate if the network helps the would-be migrant find and land a low-skill job in the destination city. But a network can also help well-educated rural people find commensurate jobs in the destination cities. For either pool of migrants, poorly or well educated, a network could reduce asymmetric information risks in the job market.
These two research questions are addressed with a detailed data set on rural migrants in nine destination cities in Guangdong province. The paper is organized as follows. Section 2 explains the model. Section 3 summarizes the data and the statistics. Section 4 describes the estimation. Section 5 discusses the results. Section 6 concludes.

2. The model

Following the migration literature, the utility of rural migrant i from choosing city j among all available cities depends on the expected net income that person i could obtain in location j, the networks of i in j, the migration cost from the origin to destination j, and a vector of location-specific amenities $\boldsymbol{X}_j$:

$$U_{ij} = U_i\left(\mathit{ExpIncome}_{ij},\, NW_{ij},\, MC_{ij},\, \boldsymbol{X}_j\right) = V_{ij} + \varepsilon_{ij}, \quad (1)$$

where $V_{ij}$ is an indirect utility and

$$\ln V_{ij} = \beta_I \ln \mathit{ExpIncome}_{ij} + \beta_{NW} NW_{ij} - \beta_{MC} MC_{ij} + \boldsymbol{\beta}_X \ln \boldsymbol{X}_j. \quad (2)$$

Two points should be emphasized here. First, I separate the network effects (NW) from the physical migration cost (MC). The two are usually lumped together in the literature (Bayer et al., 2009; Fafchamps and Shilpi, 2013), but I assume that they could have differently signed impacts on destination choice. I use travel distance in the model to measure moving cost. Second, the net income of migrant i is observed only in the chosen city j; we cannot observe the income that this individual would have earned had he gone to one of the other cities. I therefore generate counterfactual expected net incomes for the individual in all destinations, following the semi-parametric method in Dahl (2002). The estimation methods for all the independent variables are discussed in detail in Section 4.

In equation (1), $\varepsilon_{ij}$ captures a vector of unobservables. If the error term $\varepsilon_{ij}$ is independently and identically distributed with a type I extreme value distribution, the probability of migrant i choosing city j is

$$P_{ij} = \frac{\exp\left(\beta_I \ln \mathit{ExpIncome}_{ij} + \beta_{NW} NW_{ij} - \beta_{MC} MC_{ij} + \boldsymbol{\beta}_X \ln \boldsymbol{X}_j\right)}{\sum_{k=1}^{J} \exp\left(\beta_I \ln \mathit{ExpIncome}_{ik} + \beta_{NW} NW_{ik} - \beta_{MC} MC_{ik} + \boldsymbol{\beta}_X \ln \boldsymbol{X}_k\right)}, \quad (3)$$

where J is the number of available destination cities. To recover the parameters in (3), a logit estimation is used. To explore the joint effects of networks and other controls, interaction terms between individual characteristics, city-level amenities, and network effects are included later in the analysis, which allows individual characteristics to enter the model.

3. Data

3.1. Data and sampling

The primary data set used in this study was collected in the China Labor-force Dynamics Survey (CLDS) by Sun Yat-sen University in 2009. The survey interviewed rural-urban workers with college degrees or lower who had migrated from another province, or from a rural area within Guangdong province, into one of nine cities[4] in Guangdong. The survey collected the migrants' origin and destination places, detailed information on employment and living situation (including wages and rents), and demographics.

[4] The nine cities are Guangzhou, Shenzhen, Zhuhai, Foshan, Zhaoqing, Dongguan, Huizhou, Zhongshan, and Jiangmen.

The sampling method of the CLDS is as follows. First, they used the 2000 census to obtain the number of rural migrants (including migrants from another province and migrants from another city within Guangdong province). Second, they computed the share of migrants in city j in the total number of migrants in the nine cities to obtain the distribution of the migrant population over the sample cities. Third, they used stratified sampling to randomly sample migrants in the nine cities. In total, 1,544 migrants are used in my analysis[5].

[5] 22 observations from urban areas (with Chinese urban hukou registration) collected by CLDS are excluded from the analysis, since I study only the destination choice of rural-urban migrants.

Because CLDS selected the sample migrants within the nine cities, the sample is likely to be nonrandom: the probability of selecting a migrant in city j is conditional on the probability of that migrant's choosing city j, which results in biased estimators (Shaw, 1988). To address this endogenous stratification concern, I use weighted exogenous stratification maximum likelihood estimation in the later analysis.

3.2. Characteristics of the sample

The sample migrants' birth provinces are spatially concentrated. 28% of the migrants are from rural Guangdong. The others are mainly from four nearby, relatively poor provinces located inland from the coast and in the north-south middle belt of China (Hunan, Guangxi, Sichuan, and Hubei), which account for another 47%. The remaining 25% are mainly from the North. The migrants tend not to be from the richer coastal provinces.

The demographics of the sample are shown in Table 2.1. The average age of the sample migrants is 29; 54% of them are male, and 48% are married. Only 42% have attended high school or received higher-level degrees. This is not surprising, since the survey interviewed only migrants with education at or below the college level. Table 2.2 shows that the average monthly wage is 1730 RMB (around $251 in 2017 USD), which, according to the Guangdong Statistical Yearbook, is lower than the average wage earned by laborers in Guangdong in the manufacturing sector (2126 RMB) and the service sector (3456 RMB), but higher than that in the agriculture sector (1087 RMB). The migrants pay only small amounts of rent, either because of poor living conditions or because accommodation is provided by employers. Average net income (gross income minus the small housing rent paid) is 1647 RMB.

4. Estimations

To recover the coefficients in (3), I discuss the specifics of the independent variables in the model, including network effects, counterfactual incomes, migration cost, and city amenities.

4.1. Networks

The information used to calculate network effects is obtained from the 2010 population census of Guangdong province[6].
The census reports, for each of the (prefectural-level) cities of Guangdong province, the number of migrants by origin province and the number of migrants from another city.

[6] The population census is available only every five years, so I use the information for 2009, which was collected in 2010, although the survey was launched in 2009 and collected information for 2008. I assume that the distribution of migrants from a certain city over total migrants in the destination city remains similar in 2008 and 2009.

I calculate the network using two methods. First, the network of each individual from region r (r refers to a province if the individual is not from Guangdong, and to a prefectural-level city within Guangdong province if the individual is from Guangdong province) and living in city j is the share of people in city j from origin r over total migrants from region r in the 21 cities of Guangdong, which can be specified as

$$NW^{1}_{rj} = \frac{\mathit{People}_{rj}}{\sum_{k=1}^{21} \mathit{People}_{rk}}, \quad j = 1, \dots, 9,\; r = 1, \dots, 23. \quad (4)$$

In total, there are 23 origin provinces and nine destination cities. The networks vary by origin-destination pair but not by individual within the same origin. The number of cities indexed by k does not equal the number of destination cities indexed by j, because there are 21 cities in Guangdong while only nine are covered in the survey. Although only nine cities are included in the survey, 93% of the rural-urban out-of-province migrants have chosen one of these nine cities as their destination.

The second type of network is calculated as follows. The network of each individual from region r and living in city j is the share of people in city j from origin r over total migrants in city j from all other regions o, which can be specified as

$$NW^{2}_{rj} = \frac{\mathit{People}_{rj}}{\sum_{o=1}^{32} \mathit{People}_{oj}}, \quad j = 1, \dots, 9,\; r = 1, \dots, 23. \quad (5)$$

Like the first type of network effect, the second varies by origin-destination pair but not by individual within the same origin. There are 32 regions o, including 31 provinces other than Guangdong province in China, and the Hong Kong, Macao, and Taiwan regions.

Table 2.3 reports the statistics for the network variables. Sample migrants are from 23 provinces, so the networks take 207 different values (23 origins × 9 destinations). The upper panel shows that the average value of the first measure is 10.42%, and that it varies from 0.37% to 46.75%. The higher the value, the larger the network the sample migrants face. Specifically, the highest value in Shenzhen city (46.75%) is for Jilin province; that is, 46.75% of the rural migrants from Jilin province in Guangdong province are concentrated in Shenzhen city. The lower panel reports the second measure of networks. Its average value is 5.67%, and it varies widely, from 0.15% to 76.56%. Again, the higher the value, the larger the network the sample migrants face. Specifically, the lowest value in Zhongshan city (0.15%) is for Shanxi province; that is, only 0.15% of the out-of-province rural-urban migrants in Zhongshan city are from Shanxi province.

4.2. Calculation of counterfactual incomes

The income of migrant i in city j is observed only if the migrant chose city j; we cannot observe the incomes that this individual would have earned in other cities. Unobserved heterogeneity of the individual (such as risk aversion) affects whether he has chosen city j and has thus entered our sample for city j. To solve this self-selection problem in a choice model, I generate counterfactual expected net incomes for the individual in all destinations, following the semi-parametric method in Dahl (2002).

First, I use the following functional form to regress the location-specific realized income on individual attributes:

$$\ln \mathit{ExpIncome}_{ij} = \alpha + \alpha_{age,j} \ln \mathit{Age} + \alpha_{male,j} \mathit{Male} + \alpha_{married,j} \mathit{Married} + \alpha_{edu\_high,j} \mathit{Edu\_high} + \alpha_{P1,j} P(r,j \mid \mathit{Edu}) + \alpha_{P2,j} \left[P(r,j \mid \mathit{Edu})\right]^{2} + \varepsilon_{ij}. \quad (6)$$

The term $P(r,j \mid \mathit{Edu})$ measures the percentage of individuals with a given education level who were born in region r and live in city j. It is included to improve the accuracy of the estimators on the independent variables and thus of the counterfactual incomes. By including these terms, I assume that unobserved heterogeneity is relatively small among migrants who move from the same birth location to the same destination j with the same level of education. $P(r,j \mid \mathit{Edu})$ is obtained as

$$P(r,j \mid \mathit{Edu}) = \mathit{Edu\_low}_i \cdot P(r,j \mid \mathit{Edu\_low}) + \mathit{Edu\_high}_i \cdot P(r,j \mid \mathit{Edu\_high}). \quad (7)$$

To calculate this, we need to divide the observations into mutually exclusive cells based on education level. For simplicity, I combine some provinces, which gives three regions of origin: Guangdong province, neighbor provinces (the four provinces sharing borders with Guangdong province: Guangxi, Hunan, Jiangxi, and Fujian), and non-neighbor provinces.

In total, 54 cells are generated. The average cell contains 54 migrants, with cell sizes ranging from 1 to 104. The distribution of the sample migrants by type is shown in Table 2.4. We can then generate the percentage of individuals with a given education level who move from region r to city j following (7). The calculated values of $P(r,j \mid \mathit{Edu})$ are shown in Table 2.5. Using these probabilities in regression (6) yields the estimated coefficients for each variable. With these coefficients, I can then impute the expected incomes for each individual in all cities, including the cities the migrant did not choose. The imputed incomes for the sample can be found in Table 2.6.

4.3. Migration cost

To calculate the migration cost, I use Google Maps to obtain the travel distance and time from the migrant's origin to the nine destination cities. Since we know from which location and to which city each migrant goes, I first extract the geographic coordinates of the urban centers of these cities.
Then I use Google Maps to calculate pairwise travel times from 174 origin cities to the nine destination cities. The results are summarized in Table 2.7. It is not surprising that the travel distance and time from a given sending location are similar across the destination cities, since the nine cities are geographically close to one another. But the distances and travel times between the center of each of the nine study cities and the migrants' origins vary a lot, as these origins stretch across China. The maximum travel distance is 3555 km and the longest travel time by car is 36 hours.

4.4. Amenities of cities

City amenities hypothesized to affect rural-urban migration are obtained from the 2009 Guangdong Statistical Yearbook of the Guangdong Bureau of Statistics. The amenities are summarized in Table 2.8. Urban population shows the size of the city (the urban center of the prefecture city). Per capita government expenditure is controlled for to capture investment in local amenities. I expect both to positively affect the choice of city j, since these two variables are pull effects. The numbers of hospitals and of primary and middle schools per person are included. These matter for migrants' children and elderly family members as well as for migrants' own work injuries. Other employment-related variables are also included. The average unemployment rate is 2.3%, with small variation among the nine cities, and is expected to be negatively correlated with the destination choice. The numbers of industrial and construction firms, which vary a lot among the nine cities, are included in the regressions. Since rural-urban migrants are likely to work in industrial and construction-related occupations, I expect cities with more of these firms to attract more rural-urban migrants.

4.5. Weighting used to correct for on-site sampling

The last estimation issue to consider is the on-site sampling problem.
If ignored, it will bias the estimators. Following Wooldridge (2002), I use weighted exogenous stratification maximum likelihood estimation to address this endogenous stratification problem. The weights used in the analysis are shown in the last column of Table 2.9 and are calculated as the ratio of the population share to the sample share (weight in column (5) = column (2) / column (4)).

The distributions of city populations and sample migrants across the nine destination cities are shown in Table 2.9. All nine destination cities are in Guangdong, and the distributions are quite concentrated in a few cities. Around 31% of the migrants are located in Dongguan city, followed by Shenzhen (23%), Guangzhou (17%), and Foshan (9%); these four cities together account for 80% of the sample migrants. The remaining 20% of the sample migrants work in the other five cities, with fewer than 100 sampled migrants in each city.

5. Estimation results

5.1. Impacts of network effects

I estimate the probability that rural migrant i chooses city j in Guangdong province. The results are estimated using conditional logit. To allow for correlation within individuals, the error terms are clustered at the individual level. The results are shown starting from Table 2.10. The regression appears in several variations in Tables 2.10, 2.11, and 2.12: I begin by estimating the equation with city attributes, and then control for city fixed effects. Results both with and without the interaction between income and the unemployment rate are reported. The summary statistics for the city attributes included in the estimation are shown in Table 2.8. Tables 2.10 and 2.11 do not include migration costs (hours of travel) and report the results using the two different measures of network effects, respectively. Table 2.12 has all the same models as the tables above, but also includes travel time from the migrant's origin to the destination city as an independent variable.
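The weighting in Section 4.5 and the conditional logit in Section 5.1 can be sketched together in a few lines. The code below is only an illustration under stated assumptions: the three-city shares, the utility values, and the function names (`stratum_weights`, `choice_probabilities`, `weighted_loglik`) are hypothetical placeholders, not the values in Table 2.9 and not the code used for the reported estimates. It computes each city's weight as population share over sample share, evaluates logit choice probabilities of the form in equation (3), and sums the weighted log-likelihood contributions of the chosen cities.

```python
import numpy as np


def stratum_weights(pop_share, sample_share):
    """Weight per destination city: population share / sample share."""
    pop_share = np.asarray(pop_share, dtype=float)
    sample_share = np.asarray(sample_share, dtype=float)
    return pop_share / sample_share


def choice_probabilities(v):
    """Logit choice probabilities from an (n_migrants, n_cities) array
    holding the linear index of equation (3) for every migrant-city pair."""
    v = np.asarray(v, dtype=float)
    v = v - v.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    exp_v = np.exp(v)
    return exp_v / exp_v.sum(axis=1, keepdims=True)


def weighted_loglik(v, chosen, weights):
    """Weighted log-likelihood: each migrant's log-probability of the chosen
    city is multiplied by the weight of that city (the sampling stratum)."""
    chosen = np.asarray(chosen)
    weights = np.asarray(weights, dtype=float)
    p = choice_probabilities(v)
    p_chosen = p[np.arange(len(chosen)), chosen]
    return float(np.sum(weights[chosen] * np.log(p_chosen)))


# Hypothetical example with three cities and two migrants.
pop_share = [0.5, 0.3, 0.2]        # assumed population shares of migrants
sample_share = [0.25, 0.5, 0.25]   # assumed sample shares of migrants
w = stratum_weights(pop_share, sample_share)

v = np.array([[1.0, 0.0, -1.0],    # assumed utility index, migrant 0
              [0.0, 0.0, 0.0]])    # assumed utility index, migrant 1
chosen = np.array([0, 2])          # index of the city each migrant chose
ll = weighted_loglik(v, chosen, w)
```

In an actual estimation, the utility index would be a function of the parameters in equation (3), and the weighted log-likelihood would be maximized over those parameters with a numerical optimizer; cities that are over-sampled relative to their population share receive weights below one.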
Recall that our first research question concerns the network effect on rural-urban migrants’ destination choice. The regression results in Table 2.10 and Table 2.11 show that the network effects are positive and significant, controlling for expected income and city amenities. The estimate is somewhat smaller after controlling for city fixed effects in Table 2.10, but its sign and significance level remain the same. The positive effect of networks on migrants’ destination choice is consistent with the existing literature discussed in the introduction, but this finding concerns the “meso network” (a cluster of people the migrant does not know but who come from the same birth province), whereas the earlier literature focused on the “micro network” of the migrant’s family and local community at a domestic destination. Our results are similar to those of Fafchamps and Shilpi (2013), who test the impact of social proximity on the choice of migration destination in Nepal.

The results suggest that rural migrants prefer to work and live in a city with a higher share of population coming from the same origin (a rural location within or outside the province). The reason is that cities with larger networks may reduce moving costs and may give migrants better access to jobs, settlement, and other information.

Surprisingly, models (1) and (2) in Table 2.10 and Table 2.11 do not show a significant impact of income, whether or not I control for city fixed effects. I expected income to be a key variable that positively affects migrants’ choices, as found in Bertoli (2010). However, these insignificant results might be due to the fact that the sampled migrants are rural-urban migrants, who are more likely or able to take occupations with poorer conditions or lower wages. This assumption can be partially confirmed by the following.
The estimator of the unemployment rate is positive and significant: rural-urban migrants prefer cities with higher unemployment rates. This implies that rural migrants may have a better chance of being employed in occupations with a higher risk of dismissal and higher turnover, such as construction jobs. I further test the hypothesis by interacting income and unemployment. The results are shown in columns (3) and (4) of Table 2.10 and Table 2.11. The estimators of income and of the interaction term are both significant. Rural-urban migrants prefer cities where they can earn higher incomes, but this effect decreases with the unemployment rate of the city. This is reasonable: in cities with low unemployment rates, the goal of the rural-urban migrant is to find a job with a higher wage. In cities where unemployment is high, however, the migrant’s first goal is to find a job that guarantees some earnings, and only the second step is to seek a job with a higher income.

5.2. Impacts of the migration cost

The second research question addressed by this paper is how the migration cost affects the choice of destination city. I use travel time as the index of physical migration cost, and the results are robust to using travel distance instead. The results with the two measures of networks are shown in Table 2.12. As expected, a longer travel time, and thus a higher migration cost, significantly decreases the probability of choosing a destination in all the models.

Comparing the results in Table 2.12 with Table 2.10 and Table 2.11, the estimator of the first type of network is almost the same with or without the migration cost, whereas the estimators of the second type of network are not significant once city fixed effects are controlled for (models 2 and 4 in the lower panel). Two implications follow from this result.
First, it emphasizes the importance of including physical migration cost and social networks separately in the migration equation; the two are generally lumped together in the previous literature. Second, it shows that different measurements of the meso-level networks lead to different results. The origin-orientated measure (first measure: share of migrants in destination city j from origin r over total migrants from r) seems to have more significant impacts than the destination-orientated measure (second measure: share of migrants in city j from origin r over total migrants in destination city j) when we control for the physical migration cost.

The estimators on city amenities in Table 2.12, which are not reported for reasons of space, also differ from those in Table 2.10 and Table 2.11 when the migration cost is controlled for. This finding is consistent with Bayer et al. (2009), who show that ignoring the mobility cost makes a large difference in the estimated value of clean air (a city amenity) at the destination.

5.3. Results with individual characteristics interactions

Table 2.13 and Table 2.14 show the results with interaction terms between network effects and education level, and between network effects and micro-level nonfarm experience, respectively. All the models from here on include city fixed effects and migration cost. For comparison, results without interaction terms are shown in columns (1) and (4), which are the same as the results in column (4) of Table 2.12.

Recall that the third research question concerns whether the network effect conditions the effect of education on the choice of destination city. As discussed in McKenzie and Rapoport (2010), networks could shape self-selection by decreasing migration cost. However, they use the migrant’s network as an index of migration cost to test this hypothesis. In this paper, I separate the network effect from the moving cost (travel time) and compare the effects of the two on selection.
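The two meso-level network measures compared in Section 5.2 differ only in their denominator. The sketch below illustrates this with made-up counts; the origin and city labels, the numbers, and the function name network_measures are hypothetical and are not taken from the survey data. It assumes that both denominators are computed over the counts supplied, which simplifies how the totals are defined in the chapter.

```python
from collections import defaultdict


def network_measures(counts):
    """Compute both network shares (in %) from migrant counts.

    counts maps (origin r, city j) -> number of migrants (hypothetical input).
    Network I  = migrants from r in j / all migrants from r (origin-orientated).
    Network II = migrants from r in j / all migrants in j (destination-orientated).
    """
    total_by_origin = defaultdict(int)
    total_by_city = defaultdict(int)
    for (origin, city), n in counts.items():
        total_by_origin[origin] += n
        total_by_city[city] += n

    network_1 = {
        (origin, city): 100 * n / total_by_origin[origin]
        for (origin, city), n in counts.items()
    }
    network_2 = {
        (origin, city): 100 * n / total_by_city[city]
        for (origin, city), n in counts.items()
    }
    return network_1, network_2


# Made-up counts for two origins and two destination cities.
TOY_COUNTS = {
    ("Origin A", "City X"): 60,
    ("Origin A", "City Y"): 40,
    ("Origin B", "City X"): 20,
    ("Origin B", "City Y"): 80,
}

network_1, network_2 = network_measures(TOY_COUNTS)
print(network_1[("Origin A", "City X")])  # share of Origin A's migrants who are in City X
print(network_2[("Origin A", "City X")])  # share of City X's migrants who come from Origin A
```

In this toy example the same origin-city pair has a Network I value of 60% and a Network II value of 75%, which shows how the two measures can rank the same pair differently.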
Columns (3) and (6) in Table 2.13 show that the estimator of the interaction between education and migration cost is positive, while the estimator of the migration cost is negative, regardless of which measurement is used to calculate networks. Taken together, these show that although being close to a city increases the probability of choosing it, this effect is weaker for migrants with a higher education level. This is consistent with high migration costs resulting in positive self-selection, as in Chiquiar and Hanson (2005). The reason for this positive selection is that education can increase access to information and decrease the other nonmonetary costs of settling and finding a job.

Interestingly, columns (2) and (5) also show that the estimators of the education-network interaction are significantly positive. In particular, since the estimator of the network effect is significantly positive, this says that cities with stronger network effects attract more highly educated rural-urban migrants. This result is opposite to that of McKenzie and Rapoport (2010), who find that education and networks are substitutes, meaning that lower-educated migrants in communities with stronger networks for Mexico-U.S. migration are more likely to migrate.

The difference from the Mexico result can be explained in two ways. First, it might be due to their not separating networks and migration costs. McKenzie and Rapoport (2010) suggest that migration cost is a function of networks, and use the network as the index of migration cost in their estimation model, without further controlling for physical migration costs. However, our results show that once we control for the physical migration costs (travel time), the network effect alone does result in positive selection on education.

Second, the difference may be due to the different characteristics of Chinese rural-urban domestic migration and Mexico-U.S. international migration.
Since Mexico-U.S. international migrants take low-skill employment, such as wage employment in the agricultural sector or the service sector, they do not need a high level of education; networks of Mexican immigrants could help them find these types of low-skill jobs. However, in the case of Chinese internal migration, recall that the average income of sample migrants, shown in the summary statistics, is much higher than that of the agricultural sector and similar to that of the manufacturing industry. It is thus possible that the jobs taken by these rural-urban migrants require some skill. Thus, even with the help of the network, education has a multiplier effect. This implies that the networks may help laborers overcome the asymmetric information problem in the job market, while education is still required by employers in the cities.

Again, the results suggest that it may not be appropriate to use network effects directly as the index of migration cost. Highly educated rural-urban migrants may face weaker credit constraints and choose cities far from their origins. Meanwhile, they choose cities with higher network effects once travel time is controlled for. Both findings show positive selection on education, as education could decrease both types of costs, physical and psychological. The results are consistent with those in Chiswick (1999).

Table 2.14 shows the results with interactions between networks and whether the first nonfarm job was referred by a family member or friend in columns (1) and (4), whether the current nonfarm job was referred by a family member or friend in columns (2) and (5), and years of nonfarm employment in columns (3) and (6). We do not find any significant effects of these micro-level social networks once we control for the meso-level networks.

5.4.
Results with city amenities interactions

I further explore how the results differ if the network effects are allowed to vary with city amenities. The idea of including these terms is to see whether network effects act as substitutes for, or multipliers of, city attributes. Specifically, columns (1) and (4), columns (2) and (5), and columns (3) and (6) in Table 2.15 show the results with network interactions with city size (urban population), number of industrial firms, and number of construction firms, respectively. The Wald tests show that all the terms with interactions are jointly significant. Several results stand out.

First, the significant and opposite-signed estimates on networks and on the network-city size interaction suggest that network effects are more important for a rural-urban migrant’s choice of city when the city is small. For a rural migrant facing the same level of networks in two cities, it is much easier to get in contact with someone in the network, or to benefit from the externalities of the network, in the smaller city. When the network effects are low, by contrast, people prefer larger cities, where the urban pull effects should be larger.

Second, the results in columns (2) and (3) using origin-orientated networks (measured by the first method) show substitution effects between networks and the numbers of industrial and construction firms. Specifically, migrants prefer cities with higher levels of networks, yet these effects diminish if cities have a greater number of industrial and construction firms. These results make sense, as migrants are more likely to be hired in cities with more firms if other factors are the same, and they can likely find a job more easily in these cities even with no or low levels of networks. The results in columns (5) and (6) using destination-orientated networks (measured by the second method) show opposite results.
Similar to the results using origin-orientated networks, migrants still prefer cities with more industrial and construction firms; however, networks are complementary in these two models, meaning that migrants prefer cities with more firms, especially those with larger networks. These different results show the importance of distinguishing between measurements of networks when studying the impacts of network effects, as different measures may give very different results.

6. Conclusions

Network effects are important for rural-urban migration. In this paper, I show that networks, calculated in two ways, increase the probability that rural-urban migrants in Guangdong, China choose a given destination. The paper also shows that physical migration costs (travel time) have a significantly negative impact on the choice. Ignoring these physical costs would result in biased estimators of city attributes. In addition, the paper emphasizes the importance of separating the physical costs (migration costs) from the non-monetary costs (networks) when studying domestic rural-urban migration in China.

I find that migration costs and education are substitutes, while education and network effects are complements in the destination choice of rural-urban domestic migrants. The difference between my results and those for international migration suggests that education and networks may have different effects for different types of employment. These results highlight the importance of decreasing migration costs and increasing network effects if a city would like to attract more rural-urban migrants. Meanwhile, developing the education and skills of rural-urban migrants in China should also help people facing credit constraints, and would multiply the network effects of the cities.
The paper also finds that the effects of origin-orientated networks, measured as the share of people from origin r in city j over total migrants in Guangdong from origin r, differ from those of destination-orientated networks, measured as the share of people from origin r in city j over total migrants in city j from all other provinces. The origin-orientated networks show more significant effects in most cases and act as substitutes for the number of industrial and construction firms in the city, while the destination-orientated networks show the opposite. This suggests the importance of distinguishing between measurements of network effects in future studies.

APPENDIX

Table 2.1. Summary statistics of demographic variables, individual level (N=1,544)

Variable    Description                                                      Mean     Std. Dev.
Age         Age (years)                                                      28.55    8.87
Male        Gender (=1 if male, 0 if female)                                 0.54     0.50
Edu_high    Education level (=1 if equal to or higher than high school,      0.42     0.49
            =0 if lower than high school)
Married     Marital status (=1 if married)                                   0.48     0.50

Table 2.2. Average incomes and rents per month

Variable (N=1,544)                          Mean     Std. Dev.   Min    Max
Wage received from the firm                 1,730    946         240    15,000
Housing subsidies received from the firm    42       95          0      800
Total income (wage + housing subsidies)     1,772    952         250    15,000
Rental cost                                 125      163         0      1,200
Income (total income - rental cost)         1,647    914         230    15,000
Note: RMB in 2009/Month

Table 2.3. Origin-destination paired networks (%), by cities and measures

                N.     Mean     Std. Dev.   Min     Max
Network I
  All           207    10.42    11.43       0.37    46.75
  By cities
  Guangzhou     23     18.61    6.36        9.74    30.23
  Shenzhen      23     33.88    8.04        19.31   46.75
  Zhuhai        23     2.35     1.42        0.89    5.21
  Foshan        23     7.99     3.65        4.29    19.64
  Zhaoqing      23     0.87     0.54        0.37    2.25
  Dongguan      23     19.09    7.38        8.01    32.39
  Huizhou       23     4.97     1.42        3.04    8.30
  Zhongshan     23     4.17     1.88        2.07    9.81
  Jiangmen      23     1.90     1.17        0.75    4.73
Network II
  All           207    5.67     9.25        0.15    76.56
  By cities
  Guangzhou     23     5.77     8.68        0.48    34.45
  Shenzhen      23     7.62     15.92       0.43    76.56
  Zhuhai        23     6.12     9.70        0.80    44.03
  Foshan        23     5.57     8.94        0.20    29.67
  Zhaoqing      23     4.64     6.42        0.25    25.47
  Dongguan      23     6.07     9.52        0.16    39.92
  Huizhou       23     5.23     6.52        0.26    21.22
  Zhongshan     23     5.24     7.79        0.15    27.10
  Jiangmen      23     4.73     7.49        0.16    33.13

Table 2.4. Distribution of sample migrants by types

                      Education Level=Low                     Education Level=High
Three Regions (r)     Guangdong   Neighbor   Non-Neighbor     Guangdong   Neighbor   Non-Neighbor
Observations          219         344        331              211         244        195
City (j)
Dongguan              23.74       28.2       31.42            24.64       33.61      43.08
Shenzhen              24.2        15.12      22.66            32.7        23.36      22.56
Guangzhou             10.5        22.38      18.43            15.64       18.44      14.87
Foshan                17.35       12.79      6.04             9.95        5.74       2.05
Zhongshan             2.74        9.01       5.74             2.84        8.61       2.05
Zhuhai                6.85        3.78       3.63             4.74        5.74       5.13
Jiangmen              4.11        4.07       5.44             2.84        0.41       3.08
Huizhou               6.39        2.33       3.93             3.79        0.82       3.08
Zhaoqing              4.11        2.33       2.72             2.84        3.28       4.1
Total                 100         100        100              100         100        100

Table 2.5. Probability of individual from region r to city j, by education levels

                      Education Level=Low                     Education Level=High
Three Regions (r)     Guangdong   Neighbor   Non-Neighbor     Guangdong   Neighbor   Non-Neighbor
City (j)
Dongguan              5.82        10.85      11.63            8.00        12.62      12.92
Shenzhen              5.93        5.82       8.39             10.62       8.77       6.77
Guangzhou             2.57        8.61       6.82             5.08        6.92       4.46
Foshan                4.25        4.92       2.24             3.23        2.15       0.62
Zhongshan             0.67        3.47       2.13             0.92        3.23       0.62
Zhuhai                1.68        1.45       1.34             1.54        2.15       1.54
Jiangmen              1.01        1.57       2.01             0.92        0.15       0.92
Huizhou               1.57        0.89       1.45             1.23        0.31       0.92
Zhaoqing              1.01        0.89       1.01             0.92        1.23       1.23
Total                 24.50       38.48      37.02            32.46       37.54      30.00

Table 2.6.
Imputed incomes for all observations by cities

City (N=1,544)   Mean     Std. Dev.   Min.     Max.
Dongguan         1,443    215         1,094    1,915
Shenzhen         1,706    259         1,215    2,375
Guangzhou        1,502    306         972      2,104
Foshan           1,311    259         738      1,805
Zhongshan        1,419    307         711      2,608
Zhuhai           1,571    335         816      3,006
Jiangmen         1,657    433         801      3,057
Huizhou          1,531    242         1,065    2,313
Zhaoqing         1,501    395         855      2,381

Table 2.7. Travel distance and time, by destination cities (N=1,544)

                 Distance (km)               Time (hour)
City             Mean    Min     Max         Mean    Min    Max
Dongguan         800     4.4     3,463       9.3     0.2    34.8
Shenzhen         852     75.7    3,459       9.8     1.3    35.1
Guangzhou        756     8.7     3,438       8.8     0.3    34.5
Foshan           767     1.6     3,459       8.9     0.1    34.7
Zhongshan        833     5.0     3,526       9.5     0.3    35.5
Zhuhai           856     0.0     3,555       9.9     0.0    36.0
Jiangmen         802     0.5     3,518       9.1     0.0    35.3
Huizhou          822     2.3     3,369       9.6     0.1    34.1
Zhaoqing         771     1.6     3,514       8.9     0.1    35.1

Table 2.8. Summary statistics of city attributes, city level

Variable (N=9)      Description                                                        Mean    Std. Dev.   Min.    Max.
Urban_pop           Population of urban resident (million)                             4.2     3.0         1.3     8.8
Gov_exp_pc          Government expenditure per capita (billion in 2009 RMB/person)     4.1     2.6         1.1     9.1
Hospital_num_pc     Number of hospital and health institution per capita               197     67          125     318
PriMidSch_num_pc    Number of primary and middle school per capita                     133     94          40      322
Unemployment        Share of labor unemployed (%)                                      2.3     0.4         1.6     2.8
Industry_num_pc     Number of industrial firms per capita                              4763    3014        1035    8930
Construc_num_pc     Number of constructional firms per capita                          366     269         123     807

Table 2.9. Distribution of sample migrants in nine destination cities

              Population N.   Pop. Share (%)   Sample N.   Sample Share (%)   Weight
City          (1)             (2)              (3)         (4)                (5)
Dongguan      4,922,608       25.3             471         30.5               0.828
Shenzhen      5,848,539       30.0             350         22.7               1.324
Guangzhou     3,312,887       17.0             268         17.4               0.980
Foshan        2,206,538       11.3             141         9.1                1.240
Zhongshan     1,044,861       5.4              87          5.6                0.952
Zhuhai        581,476         3.0              74          4.8                0.623
Jiangmen      463,298         2.4              54          3.5                0.680
Huizhou       913,038         4.7              51          3.3                1.419
Zhaoqing      186,533         1.0              48          3.1                0.308
Total         19,479,778      100.0            1,544       100.0

Table 2.10.
Estimation results of Network I on destination choice, without travel time

                        W/O Interaction                           W/ Interaction
VARIABLES               (1)                  (2)                  (3)                  (4)
Observations            13,896               13,896               13,896               13,896
Networks (%)            0.032*** (0.005)     0.029*** (0.005)     0.030*** (0.005)     0.029*** (0.005)
Income (ln)             0.267 (0.224)        0.210 (0.230)        3.419*** (0.897)     3.163*** (0.924)
Income*Unemployment     /                    /                    -1.409*** (0.384)    -1.320*** (0.396)
Unemployment            10.443*** (1.759)    /                    21.039*** (3.476)    3.373 (3.017)
Industry_num_th         -0.400*** (0.068)    /                    -0.394*** (0.068)    /
Construc_num_th         17.147*** (2.758)    /                    17.458*** (2.784)    /
Hospital_num_th         -6.959*** (1.084)    /                    -7.148*** (1.087)    /
PriMidSch_num_th        -6.898*** (1.025)    /                    -7.073*** (1.036)    /
Gov_exp_pc              -1.564*** (0.252)    /                    -1.607*** (0.255)    /
Urban_pop               2.456*** (0.352)     /                    2.521*** (0.353)     /
City Fixed Effects      No                   Yes                  No                   Yes
Log likelihood          -2760                -2756                -2754                -2751
*** p<0.01, ** p<0.05, * p<0.1

Table 2.11. Estimation results of Network II on destination choice, without travel time

                        W/O Interaction                           W/ Interaction
VARIABLES               (1)                  (2)                  (3)                  (4)
Observations            13,896               13,896               13,896               13,896
Networks (%)            0.012*** (0.003)     0.012*** (0.003)     0.013*** (0.003)     0.013*** (0.003)
Income (ln)             0.289 (0.220)        0.205 (0.229)        4.007*** (0.868)     3.637*** (0.904)
Income*Unemployment     /                    /                    -1.674*** (0.375)    -1.545*** (0.392)
Unemployment            9.743*** (1.743)     /                    22.410*** (3.424)    3.436 (2.982)
Industry_num_th         -0.418*** (0.067)    /                    -0.409*** (0.068)    /
Construc_num_th         15.840*** (2.727)    /                    16.342*** (2.759)    /
Hospital_num_th         -6.526*** (1.073)    /                    -6.799*** (1.077)    /
PriMidSch_num_th        -6.690*** (1.011)    /                    -6.928*** (1.024)    /
Gov_exp_pc              -1.485*** (0.250)    /                    -1.548*** (0.252)    /
Urban_pop               2.464*** (0.351)     /                    2.548*** (0.352)     /
City Fixed Effects      No                   Yes                  No                   Yes
Log likelihood          -2765                -2759                -2757                -2752
*** p<0.01, ** p<0.05, * p<0.1

Table 2.12.
Estimation results of networks on destination choice, with travel time

                                         W/O Interaction                           W/ Interaction
VARIABLES                                (1)                  (2)                  (3)                  (4)
Observations                             13,896               13,896               13,896               13,896
First Type of Network
Migration cost (Travel time in hours)    -0.698*** (0.068)    -0.783*** (0.073)    -0.708*** (0.068)    -0.784*** (0.072)
Network I (%)                            0.033*** (0.005)     0.028*** (0.005)     0.032*** (0.005)     0.028*** (0.005)
Income (ln)                              0.334 (0.222)        0.216 (0.231)        3.879*** (0.892)     3.343*** (0.935)
Income*Unemployment                      /                    /                    -1.581*** (0.380)    -1.398*** (0.401)
Unemployment                             11.482*** (1.777)    /                    23.407*** (3.471)    4.207 (3.053)
City Fixed Effects                       No                   Yes                  No                   Yes
Log likelihood                           -2702                -2690                -2695                -2685
Second Type of Network
Migration cost (Travel time in hours)    -0.671*** (0.078)    -0.784*** (0.082)    -0.678*** (0.078)    -0.780*** (0.082)
Network II (%)                           0.005 (0.003)        0.004 (0.003)        0.005* (0.003)       0.004 (0.003)
Income (ln)                              0.331 (0.218)        0.185 (0.229)        4.255*** (0.863)     3.589*** (0.916)
Income*Unemployment                      /                    /                    -1.762*** (0.371)    -1.531*** (0.396)
Unemployment                             10.379*** (1.767)    /                    23.742*** (3.417)    3.620 (3.016)
City Fixed Effects                       No                   Yes                  No                   Yes
Log likelihood                           -2720                -2703                -2711                -2696
*** p<0.01, ** p<0.05, * p<0.1

Table 2.13. Estimation results of networks and education selections

                      Network I                                                      Network II
VARIABLES             (1)                  (2)                  (3)                  (4)                  (5)                  (6)
Observations          13,896               13,896               13,896               13,896               13,896               13,896
Networks (%)          0.028*** (0.005)     0.020*** (0.006)     0.028*** (0.005)     0.004 (0.003)        -0.000 (0.004)       0.004 (0.003)
Edu_high*Network      /                    0.016*** (0.006)     /                    /                    0.010** (0.005)      /
Edu_high*Mig.cost     /                    /                    0.299*** (0.106)     /                    /                    0.286*** (0.108)
Migration cost        -0.784*** (0.072)    -0.786*** (0.073)    -0.905*** (0.087)    -0.780*** (0.082)    -0.790*** (0.082)    -0.900*** (0.096)
Income (ln)           3.343*** (0.935)     2.854*** (0.965)     3.194*** (0.946)     3.589*** (0.916)     3.789*** (0.922)     3.438*** (0.925)
Income*Unempl.        -1.398*** (0.401)    -1.258*** (0.409)    -1.373*** (0.403)    -1.531*** (0.396)    -1.616*** (0.398)    -1.502*** (0.398)
Unemployment          4.207 (3.053)        3.156 (3.081)        4.032 (3.062)        3.620 (3.016)        4.277 (3.029)        3.394 (3.023)
City Fixed Effects    Yes                  Yes                  Yes                  Yes                  Yes                  Yes
Log likelihood        -2685                -2681                -2680                -2696                -2694                -2692
*** p<0.01, ** p<0.05, * p<0.1

Table 2.14. Estimation results of networks and micro-level nonfarm experience

                      Network I                                                      Network II
VARIABLES             (1)                  (2)                  (3)                  (4)                  (5)                  (6)
Observations          13,896               13,896               13,896               13,896               13,896               13,896
Networks (NW)         0.030*** (0.006)     0.028*** (0.006)     0.036*** (0.006)     0.002 (0.004)        -0.000 (0.004)       0.007 (0.004)
NW*Initial refer      -0.004 (0.006)       /                    /                    0.003 (0.005)        /                    /
NW*Current refer      /                    -0.000 (0.006)       /                    /                    0.008 (0.005)        /
NW*Years NFA          /                    /                    -0.001** (0.000)     /                    /                    -0.000 (0.000)
Migration cost        -0.785*** (0.072)    -0.784*** (0.072)    -0.781*** (0.072)    -0.779*** (0.082)    -0.781*** (0.082)    -0.777*** (0.082)
Income (ln)           3.320*** (0.936)     3.342*** (0.935)     3.367*** (0.937)     3.576*** (0.916)     3.557*** (0.917)     3.592*** (0.916)
Income*Unempl.        -1.393*** (0.402)    -1.398*** (0.401)    -1.420*** (0.402)    -1.526*** (0.397)    -1.516*** (0.397)    -1.529*** (0.397)
Unemployment          4.161 (3.054)        4.205 (3.052)        4.308 (3.044)        3.576 (3.018)        3.504 (3.019)        3.606 (3.016)
City Fixed Effects    Yes                  Yes                  Yes                  Yes                  Yes                  Yes
Log likelihood        -2684                -2685                -2682                -2696                -2695                -2696
*** p<0.01, ** p<0.05, * p<0.1

Table 2.15.
Estimation results with networks and city amenities interactions

                                     Network I                                                      Network II
VARIABLES                            (1)                  (2)                  (3)                  (4)                  (5)                  (6)
Observations                         13,896               13,896               13,896               13,896               13,896               13,896
Networks (NW)                        0.126*** (0.028)     0.049* (0.028)       0.059*** (0.017)     0.014* (0.008)       -0.008 (0.009)       0.003 (0.007)
NW*urban pop.                        -0.014*** (0.004)    /                    /                    -0.001 (0.001)       /                    /
Urban population                     0.212*** (0.037)     /                    /                    0.176*** (0.022)     /                    /
NW*Number of Industrial firms        /                    -0.003 (0.004)       /                    /                    0.001 (0.001)        /
Industry firm num.                   /                    0.239*** (0.050)     /                    /                    0.320*** (0.039)     /
NW*Number of Construction firms      /                    /                    -0.054* (0.028)      /                    /                    0.002 (0.008)
Constr. firm num.                    /                    /                    1.702*** (0.368)     /                    /                    1.706*** (0.231)
Migration cost                       -0.729*** (0.073)    -0.780*** (0.072)    -0.759*** (0.072)    -0.765*** (0.083)    -0.792*** (0.082)    -0.782*** (0.083)
Income (ln)                          3.142*** (0.940)     3.304*** (0.933)     3.184*** (0.937)     3.613*** (0.918)     3.488*** (0.919)     3.581*** (0.916)
Income*Unempl.                       -1.345*** (0.405)    -1.385*** (0.401)    -1.354*** (0.403)    -1.547*** (0.398)    -1.492*** (0.398)    -1.526*** (0.397)
Unemployment                         7.279** (3.037)      8.191*** (3.022)     7.290** (3.019)      7.979*** (2.985)     9.300*** (2.993)     8.236*** (2.984)
City Fixed Effects                   Yes                  Yes                  Yes                  Yes                  Yes                  Yes
Log likelihood                       -2678                -2684                -2683                -2696                -2695                -2696
Note: all the variables with interactions are jointly significant
*** p<0.01, ** p<0.05, * p<0.1

REFERENCES

Bauer, T., Epstein, G.S., Gang, I.N., 2009. Measuring ethnic linkages among migrants. International Journal of Manpower 30, 56–69.
Bayer, P., Keohane, N., Timmins, C., 2009. Migration and hedonic valuation: The case of air quality. Journal of Environmental Economics and Management 58, 1–14.
Beine, M., Docquier, F., Özden, Ç., 2011. Diasporas. Journal of Development Economics, Symposium on Globalization and Brain Drain 95, 30–41.
Bertoli, S., 2010. Networks, sorting and self-selection of Ecuadorian migrants. Annals of Economics and Statistics/Annales d’Économie et de Statistique 261–288.
Borjas, G.J., 1987. Self-selection and the earnings of immigrants. The American Economic Review 531–553.
Chiquiar, D., Hanson, G.H., 2005. International migration, self-selection, and the distribution of wages: Evidence from Mexico and the United States. Journal of Political Economy 113, 239–281.
Chiswick, B.R., 1999. Are immigrants favorably self-selected? The American Economic Review 89, 181–185.
Dahl, G.B., 2002. Mobility and the return to education: Testing a Roy model with multiple markets. Econometrica 70, 2367–2420.
Durand, J., Kandel, W., Parrado, E.A., Massey, D.S., 1996. International migration and development in Mexican communities. Demography 33, 249–264.
Fafchamps, M., Shilpi, F., 2013. Determinants of the choice of migration destination. Oxford Bulletin of Economics and Statistics 75, 388–409.
Grogger, J., Hanson, G.H., 2011. Income maximization and the selection and sorting of international migrants. Journal of Development Economics 95, 42–57.
Massey, D.S., Goldring, L., Durand, J., 1994. Continuities in transnational migration: An analysis of nineteen Mexican communities. American Journal of Sociology 99, 1492–1533.
McKenzie, D., Rapoport, H., 2010. Self-selection patterns in Mexico-U.S. migration: The role of migration networks. Review of Economics and Statistics 92, 811–821.
Moraga, J.F.-H., 2011. New evidence on emigrant selection. The Review of Economics and Statistics 93, 72–96.
Pedersen, P.J., Pytlikova, M., Smith, N., 2008. Selection and network effects: Migration flows into OECD countries 1990–2000. European Economic Review 52, 1160–1186.
Shaw, D., 1988. On-site samples’ regression: Problems of non-negative integers, truncation, and endogenous stratification. Journal of Econometrics 37, 211–223.
Stark, O., Jakubek, M., 2013. Migration networks as a response to financial constraints: Onset, and endogenous dynamics. Journal of Development Economics 101, 1–7.
Wooldridge, J.M., 2002.
Inverse probability weighted M-estimators for sample selection, attrition, and stratification. Portuguese Economic Journal 1, 117–139.
Yunez-Naude, A., Taylor, J.E., 2001. The determinants of nonfarm activities and incomes of rural households in Mexico, with emphasis on education. World Development 29, 561–572.

CHAPTER 3: VALUE-CHAIN CLUSTERS AND AQUACULTURE INNOVATION IN BANGLADESH

1. Introduction

Creators of economic innovations, such as in technologies, products, production processes, business models, or new locations for markets, generate an advantage for themselves that usually translates into higher than average profits. This is called the “middleman model” (Lerner 1934, Just et al., 1979). Zilberman et al. (2017, this volume) note that innovators make choices to design, as much as possible, the value chain that “implements” their innovation, in order to sustain and protect their middleman advantage. This involves choosing whether to make or buy inputs or marketing services or indeed the product itself; what technology to use if the innovator chooses to “make”; and what institutional and organizational approach to use if the innovator buys upstream or downstream goods or services.

Sometimes an innovator needs upstream or downstream actors to innovate themselves in order to supply the specific goods or services needed – in volumes or timing or other attributes such as variety and quality. We can call this “collaborative inter-segment innovation.” An example is where McDonald’s innovated in Argentina in the 1990s by standardizing large volumes of cooked French fries; it demanded collaborative innovations by French fry producing companies (such as McCain co-locating in Argentina) and the Argentine farmers supplying them. The innovation of the processor and farmers was to form contract farming for a new variety of potato, the Atlantic variety, which produces better processed fries (Ghezan et al. 2002).
By extension, farmers in developing countries who innovate, either by adopting a new input or by producing a new product, benefit from input suppliers innovating in terms of the input provided (e.g., setting up input dealerships to supply the new inputs farmers then adopt) or the location of the market (such as by co-location with the farmers). Innovating farmers also benefit from suppliers of output marketing services innovating by adapting to market new products and by locating their services near farmers. Moreover, farmers implicitly benefit from these co-innovators making their goods and services available in a timely manner and at low transaction cost and risk of access.

It is evident that the way input suppliers, farmers, and wholesalers could most easily co-innovate with the least risk and transaction cost is to take advantage of economies of agglomeration by forming clusters. That formation can be a conscious group strategy or an accretive and spontaneous act.

Apart from lower transaction costs, informational advantage also facilitates adoption of new technologies or better practices. Marshall (1920) notes that technology spillover is easier in clusters than outside clusters because one can see what others in the cluster are doing and imitate them. The prominent work by Porter (1998, 2000) draws from Marshall’s (1920) idea of “external economies” of co-located firms. Porter notes:

A cluster is a geographically proximate group of interconnected companies and associated institutions in a particular field, linked by commonalities and complementarities…. The geographic scope of a cluster relates to the distance over which informational, transactional, incentive, and other efficiencies occur. More than single industries, clusters encompass an array of linked industries and other entities important to competition.
(Porter, 2000:16)

The above conceptual foundation has driven a first, substantial strand of literature on the formation of clusters and on clusters’ effects on the technology (and product) choices of actors in the clusters. Much of this literature concerns the non-agricultural sector: furniture (Sandee and Rietveld, 2001), metalworking (Kelley and Helper, 1999), software (Forman et al., 2005), garments (Visser, 1996), and shoes (Schmitz, 1995). Relatively little has been done on food and agricultural products, with a few exceptions (e.g., Zhang and Hu, 2014 for potatoes in China).

This literature shows that clusters affect productivity as well as innovation and upgrading by actors in the cluster (such as climbing the product “value ladder” and changing technology to produce higher quality or more consistent output). This can take the form of both “process upgrading” (e.g., in the production technology of an aquaculture farm) and “product upgrading” (such as, in my case, a shift from a local niche to non-traditional commodity fish species and newly commercialized niche species). The inducement and facilitation of upgrading is a function of the ease of access to needed inputs and services where actors in clusters are complementary (Porter 2000; Humphrey and Schmitz, 2002; Sonobe and Otsuka, 2011), and where intra-cluster vertical and horizontal relationships create “collective efficiency” (Schmitz, 1995). Schmitz (1999) distinguishes between “incidental external economies” and “consciously pursued joint action” such as contracts between traders and subcontractors, or business associations.

A second strand of literature joins the literatures on clusters and value chains. Examples include Humphrey and Schmitz (2000) and Pietrobelli and Rabellotti (2006).
The latter note that, beyond the first strand’s emphasis on local interrelations among firms in the cluster, many of the actors in the cluster are engaged in domestic or global value chains and also respond to price signals and to governance signals (such as contracts and standards) from upstream and downstream in the value chain. They present non-agricultural cases and three agrifood cluster cases in Latin America (the dairy, fruit, and salmon clusters) and focus on small and medium enterprises and farms linking to large-scale international processors and wholesalers. The cases combine study of conscious joint action in the small firm/farm clusters with study of institutional coordination through contracts and standards by the large-scale actors.

While this second strand of literature is rich in case studies of cluster actors, there are two gaps in the analysis. First, the studies tend to cover one or several clusters, but not multiple clusters that differ in intensity of clustering (low versus high) and in distance from markets. Second, the studies tend not to examine differences among actors within a segment, and thus do not control for how the cluster affects actors with different asset profiles. Both sets of factors may substantially alter the effect of the cluster on actor behavior.

To address these two gaps, as well as to add to a literature with scant agrifood cases, my paper tests for the effects of clusters of different intensities on farm technology and product upgrading, controlling for micro-level variables reflecting differential asset levels as well as for meso variables other than clustering, such as distance to urban markets, the target markets for the value chain. While there is a rich literature on farm technology adoption (see Feder et al.
1985; Liu 2013), and some of that literature models the impact of social learning from a density of adopters in a location or from networks (Bandiera and Rasul 2006; Conley and Udry 2010), the degree of clustering in the farmer’s location has not figured in the literature as a determinant of farmer choices in agriculture, aquaculture, or indeed the food sector. To study that relationship, I calculate an index that captures both horizontal agglomeration and vertical interconnections among actors in the value chain, and I use it as a determinant of farmer choice.

The case is aquaculture farmers, traders of feed and other inputs (e.g., chemicals), and fish traders (wholesalers) in Bangladesh. I rely on a unique data set that includes meso-level information on the density of actors of different segments over time and across districts, as well as micro-level data from a large survey of aquaculture farm households. To my knowledge this is the first time in the agricultural economics literature that such a combined data set has been collected.

I study three innovations by farmers. The first is the adoption of “modern inputs” central to intensification of aquaculture (commercial feeds, fuel for generators for pumps, fertilizer, pesticides, lime, and vitamins). The second is the adoption of non-traditional (exotic) commodity fish species (tilapia and pangas catfish) that grow very rapidly and densely in ponds, especially in combination with the modern inputs. This corresponds to the “commoditization” phase of the product cycle. The third is the adoption in aquaculture production (instead of wild capture as traditionally done) of various niche species. This corresponds to the “product differentiation” phase of the product cycle. These are innovations relative to the traditional “extensive”, low-input aquaculture production technology that has focused on indigenous, slow-growing, low-density carps (Ali et al., 2013).
I proceed as follows. Section 2 briefly provides background on the aquaculture sector in Bangladesh. Section 3 discusses the survey and sampling. Section 4 explains the clustering index. Section 5 discusses the econometric approach and provides descriptive statistics. Section 6 shows the regression results and robustness checks. Section 7 concludes.

2. Background on Aquaculture in Bangladesh

Several points are important background to the subsequent discussion. First, demand for and supply of fish have grown fast. Rapid urbanization and increases in incomes have spurred a strong increase in fish consumption (Belton et al., 2011; Reardon et al. 2014). Moreover, the aquaculture sector is growing fast in Bangladesh. From 1984 to 2014, Bangladesh’s farmed fish output jumped from 124,000 tons to 1.96 million tons, about 15.8 times (1,580% of) its 1984 level. As a result, aquaculture now accounts for 55% of Bangladesh’s fish supply, up from just 16% three decades ago (Department of Fisheries of Bangladesh, 1994; 1997; 2006; 2015). The great majority (94%) of aquaculture production is destined for domestic consumption. (Nearly all the aquaculture product exports are of shrimp.) The farmed fish market grew by a factor of 25 in three decades to nearly 2 million tons today. At most 10% of farmed fish are home-consumed; the rest are marketed. Of marketed farmed fish, 42% is consumed in urban areas, and that share is growing quickly (Hernandez et al. 2017).

Second, aquaculture fish product composition has changed substantially over the past two decades. Carps are the traditional species in Bangladesh. In response to limited pond water resources (for extensive aquaculture) and to disease outbreaks among shrimp and carp (Hussain, 2009), two species were introduced into Bangladesh aquaculture that allowed for intensification based on high densities of fish combined with use of various external inputs such as feed and chemicals.
Pangasius catfish were introduced in 1989 from Thailand and spread quickly across farms thereafter. Tilapia production increased more than threefold from 2005 to 2007 due to the rapid expansion of tilapia hatcheries (Hussain, 2009). Both of these newly introduced species grow fast, perform well under a range of aquaculture conditions, and tolerate poor water quality. They thus can be farmed at higher intensities and attain higher yields than carps. Moreover, niche varieties that used to be wild-capture fish have been introduced into Bangladesh aquaculture, mainly over the past decade, and are now marketed. This is part of the long-term transition from hunting/fishing to farming, and of the short/medium-term transition over the product cycle, from initial traditional niche local products in aquaculture (different kinds of carp), to exotic commodities (tilapia, pangas), to differentiated products that include diverse tasty fish from wild capture now laboriously “domesticated” into ponds.

At the all-sector level of fish farming in Bangladesh, tilapia and pangasius experienced dramatic diffusion in the 2000s (Ali et al., 2013). Our survey corroborates this. Table 3.1 shows that from 2008 to 2013, the share of carp in total farmed fish tonnage produced by the sample households fell markedly from 50.8% to 39.8%, while the shares of tilapia, pangas, and niche fish species increased sharply. Carp output doubled, but tilapia and niche varieties tripled and pangasius quadrupled, all within five years.

Third, farmers have moved from a traditional system, in which there is no added feed or merely manure or fertilizer to induce the growth of phytoplankton and thence zooplankton on which fish feed, to supplementary feeding (with rice bran and other simple agricultural processing byproducts) and formulated feed concentrates. Mamun-Ur-Rashid et al.
(2013) estimate that the production of pelleted feeds increased by 32% per year over 2008-2012, reaching 1.07 million tons in 2012. Another 0.3-0.4 million tons of farm-made feeds are estimated to have been produced in 2012. They note that sinking feeds⁷ accounted for 81% of feed output, but the remainder, the more efficient floating feed, grew at 89% per year over that period.

⁷ Sinking feed sinks to the bottom of the pond. Floating feed is produced by extrusion in such a way as to float. Floating feed is much more efficient than sinking feed as a much higher share of it is consumed rather than wasted.

3. Survey and sample

I draw on three sets of data. The first is secondary data on the geography of the clusters. The second is meso-level (district level) primary survey data on changes in the numbers of supply chain segment actors. The third is a primary survey of aquaculture farm households. Teams from IFPRI and MSU undertook the two primary surveys in 2014.

Five main segments of the aquaculture value chain in Bangladesh are studied at the meso level (fish farmers, rural fish traders (wholesalers), input dealers, hatcheries, and feed mills). Aquaculture farm households, surveyed in 2014 with recall for 2013 and 2008, are the only segment examined in the micro analysis. Both the meso and micro data come from six broad aquaculture areas in Bangladesh, which are the country’s major areas of fish farmer concentration and fish production.

The sample was chosen in the following steps. First, the six main aquaculture areas were identified using official statistics and our own analysis of the 2008 agricultural census. We then selected all the districts (first-level administrative division of Bangladesh, analogous to the “county” level in the US or UK) in these six areas. Each area contains between one and five districts.
The details can be found in the footnote.⁸

Second, within each district, we eliminated upazilas (townships) that have few aquaculture ponds and/or few fish farmers. We thus eliminated 46% of the upazilas, but kept the upazilas representing 84% of the total production area in all clusters.

Third, after eliminating upazilas with few ponds, we randomly selected upazilas from the remaining universe using proportional probability sampling. Once upazilas were selected, we listed all mouzas (village level, which is below the township) within the selected upazilas. We then eliminated from the universe of mouzas all those with fewer than 20 aquaculture farmers. Once the list of mouzas was trimmed, we randomly selected two to three mouzas per selected upazila.

Fourth, once mouzas (PSUs) were randomly selected, we sent a team of enumerators to each mouza to do a rapid census of aquaculture farmers within the mouza. Once we collected the list of farmers in each mouza, we randomly selected 25 farmers per mouza (20 farmers for our sample, plus 5 replacement farmers in case we failed to find any of the 20 selected farmers).

⁸ Area 1: Khulna, Jessore, Satkhira, Bagerhat, and Gopalganj; Area 2: Barisal, Bhola, Comilla, Chandpur, and Noakhali; Area 3: Narsingdi, Gazipur, Brahmanbaria, and Mymensingh; Area 4: Chittagong and Cox’s Bazar; Area 5: Bogra, Natore, and Dinajpur; Area 6: Sylhet.

In total, we obtained 1,540 aquaculture households from 32 upazilas and 77 mouzas. Since I focus on fish farmers, I dropped farmers producing shrimp only, and farmers producing no fish in 2013 and 2008. The resulting micro-level sample is 1,514 fish farming households, distributed over 20 districts, 32 upazilas, and 77 mouzas (villages). See Figure 3.1 for the location of the 20 sample districts. The selected mouzas are representative of 86% of the fish pond areas in the districts selected.
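The staged selection described above can be sketched in a few lines. This is only an illustration under stated assumptions: the size measure for the proportional draw (`pond_area`), the field names, and the sequential weighted draw without replacement are hypothetical choices, not details reported for the survey.

```python
import random

def select_sample(upazilas, n_upazilas, min_farmers=20, mouzas_per_upazila=2,
                  farmers_per_mouza=25, n_primary=20, seed=0):
    """Multi-stage draw: upazilas -> mouzas -> farmers (illustrative only)."""
    rng = random.Random(seed)

    # Stage 1: draw upazilas with probability proportional to a size measure.
    # ASSUMPTION: pond area is the size measure; draws are sequential and
    # without replacement, which only approximates strict PPS sampling.
    pool = list(upazilas)
    chosen = []
    for _ in range(min(n_upazilas, len(pool))):
        weights = [u["pond_area"] for u in pool]
        pick = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(pick))

    sample = []
    for u in chosen:
        # Stage 2: drop mouzas with fewer than `min_farmers` farmers,
        # then draw mouzas at random (the text reports two to three).
        eligible = [m for m in u["mouzas"] if len(m["farmers"]) >= min_farmers]
        k = min(mouzas_per_upazila, len(eligible))
        for m in rng.sample(eligible, k):
            # Stage 3: draw 25 farmers; the first 20 form the sample and
            # the remaining 5 are held as replacements.
            drawn = rng.sample(m["farmers"],
                               min(farmers_per_mouza, len(m["farmers"])))
            sample.append({
                "upazila": u["name"],
                "mouza": m["name"],
                "primary": drawn[:n_primary],
                "replacements": drawn[n_primary:],
            })
    return sample
```

With 77 mouzas and 20 primary farmers each, a draw of this shape yields the 1,540 households reported above.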
In turn, the districts selected constitute 61% of all fish pond production in the country, based on data for 2014 from the Bangladesh Bureau of Statistics.

The sample design for the inventory of value chain actors over the years (noted above) is based on the 77 mouzas selected for the household sample. In each mouza (village), the survey conducted 2-5 interviews with key stakeholders concerning the number and size distribution of value chain actors at the upazila and district levels in 2003, 2008, and 2013. The survey was conducted in 2014 to collect data for 2013, and the same information was recalled for 2003 and 2008. To limit measurement error from recall, I use only the 2013 cross-section in my regressions, while the descriptive statistics section reports both 2008 and 2013 to compare patterns.

The meso survey collected three categories for “size of the actor”: small, medium, and large. Table 3.2 presents the definitions of the size categories by type of actor. Based on the average volume of transactions or size of actors (column 2) and their size categories (column 3), I generate district-level total transactions/sizes of actors, which are used to calculate the clustering index at the district level. Thus in total I have 20 clustering levels for the 20 sample districts. The results concerning the cluster indexes are discussed below.

4. Clustering index and its patterns

4.1. Calculation of clustering index

Various indexes are used to characterize economic activity in a given area. The Hirschman-Herfindahl index is used for concentration; the Krugman Specialization Index for sectoral specialization; and the Agglomeration Index (Uchida and Nelson, 2009) for a combination of population size (of a city) and density of the population.
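For reference, the first two indexes named above have simple textbook forms, sketched here with made-up shares. The function names and example numbers are illustrative assumptions; the Agglomeration Index is omitted because it requires spatial inputs not described in this chapter.

```python
def herfindahl(shares):
    """Hirschman-Herfindahl index: sum of squared shares (shares sum to 1)."""
    return sum(s * s for s in shares)

def krugman_specialization(region_shares, reference_shares):
    """Krugman index: sum of absolute gaps between a region's sector shares
    and the reference (e.g., national) sector shares."""
    return sum(abs(r - b) for r, b in zip(region_shares, reference_shares))

# Hypothetical sector shares for one district and for the country.
district = [0.5, 0.3, 0.2]
national = [0.4, 0.4, 0.2]

hhi = herfindahl(district)                        # concentration
ksi = krugman_specialization(district, national)  # specialization
```

Both measures work from shares alone, without using information on how actors in different segments relate to one another.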
None of these indexes captures the spatial agglomeration of diverse value chain segments, and thus how readily value chain actors in the same locality can access each other for interconnection (Long and Zhang, 2012). To fill this gap I calculate a clustering index at the district level to capture the number as well as the relatedness of actors in the different segments of the aquaculture value chain in each district. My index is based on Long and Zhang (2011) and Delgado et al. (2015).

Figure 3.2 shows the share of number of actors by segment per thousand rural people in the sample districts. One can see that fish farmers, input dealers, and fish wholesalers tend to co-locate in the same areas (northern and southern areas circled in yellow). However, feed mills tend to be relatively concentrated in the North, the most developed aquaculture area in terms of buildup over time. Following industrial agglomeration economy theory (Henderson, 1974), the geographic concentration of feed mills is due to economies of scale in the feed industry, which sets it apart from the other segments of the aquaculture value chain. Hatcheries tend to be spatially scattered, with high densities in three places (Mymensingh in the North, Cox’s Bazar in the East, and Jessore in the Southwest). This locational pattern could be explained in two ways. First, hatcheries, especially shrimp hatcheries, depend heavily on natural resources such as saline seawater, and thus not many places are suitable for them. Second, path dependency is a key factor in the development of hatcheries. Hatcheries started in places where there was a history of harvesting, nursing, and trading wild fry (baby fish captured from rivers rather than produced in hatcheries).

Based on these dissimilarities in the spatial distribution of segments, and to capture relatedness of actors within the cluster, I do not include hatcheries and feed mills in the calculation of the cluster index.
Instead I include the three segments (farmers, wholesalers, and input dealers) with similar spatial concentration patterns. Specifically, I use the following six variables for 2008 (the lagged period five years before the survey) from the meso-level data to calculate the clustering index. I use Principal Component Analysis (PCA) to identify the underlying relationship among the three segments.

V1: Total number of fish farmers in each district normalized by district rural population
V2: Total area of fish ponds in each district normalized by total area of the district
V3: Total number of traders in each district normalized by district rural population
V4: Total volume of fish transactions in each district normalized by district rural population
V5: Total number of input dealers in each district normalized by district rural population
V6: Total volume of input transactions in each district normalized by district rural population

To avoid the concern that absolute values will favor large aquaculture districts, which naturally have more people working in the aquaculture value chain, the numbers of actors and total quantities of transactions are normalized by the district’s rural population. From the PCA results, I adopt the first component, with an eigenvalue greater than one, to calculate the 2008 clustering index as a composite value: the weighted sum of the six variables. To compare the clustering levels and changes across 2003, 2008, and 2013, I apply the weights obtained for the six variables in 2008 to the same normalized variables in 2003 and 2013.

The calculated composite indicators from the PCA range from -2.05 to 5.91. To make the indicators more comparable and meaningful, I first generate district-level (d) z-scores based on the indicators pooled over the three years (t),

$$ \text{z\_score}_{dt} = \frac{\text{Indicator}_{dt} - \text{Indicator}_{\text{mean}}}{\text{Indicator}_{\text{sd}}} $$

where $\text{Indicator}_{\text{mean}}$ and $\text{Indicator}_{\text{sd}}$ are the mean and standard deviation of the pooled indicators. Then, I normalize the z-scores to the 0-1 range to avoid negative values.
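The construction in this subsection (first-component weights estimated on 2008, the same weights applied to all years, pooled z-scores, and rescaling of the z-scores to the unit interval) can be sketched as follows. Several details are assumptions made only for illustration: the PCA is run on the correlation matrix, all years are standardized with the 2008 means and standard deviations, and the leading eigenvector is found by power iteration using the standard library only.

```python
import math

def column_stats(rows):
    """Column means and sample standard deviations."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return means, sds

def first_component_weights(z_rows, iters=500):
    """Leading eigenvector of the correlation matrix via power iteration."""
    n, p = len(z_rows), len(z_rows[0])
    corr = [[sum(r[a] * r[b] for r in z_rows) / (n - 1) for b in range(p)]
            for a in range(p)]
    w = [1.0] * p
    for _ in range(iters):
        w = [sum(corr[a][b] * w[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    return w

def clustering_index(base_year_rows, other_year_rows):
    """Index in [0, 1] for every district-year; base-year rows come first."""
    # ASSUMPTION: every year is standardized with base-year (2008) statistics.
    means, sds = column_stats(base_year_rows)

    def standardize(rows):
        return [[(r[j] - means[j]) / sds[j] for j in range(len(means))]
                for r in rows]

    w = first_component_weights(standardize(base_year_rows))
    pooled = list(base_year_rows)
    for rows in other_year_rows:
        pooled.extend(rows)
    # Composite indicator: base-year weights applied to all district-years.
    indicator = [sum(wj * zj for wj, zj in zip(w, z))
                 for z in standardize(pooled)]
    # Pooled z-scores, then min-max rescaling to the unit interval.
    mean = sum(indicator) / len(indicator)
    sd = math.sqrt(sum((v - mean) ** 2 for v in indicator)
                   / (len(indicator) - 1))
    z_scores = [(v - mean) / sd for v in indicator]
    lo, hi = min(z_scores), max(z_scores)
    return [(z - lo) / (hi - lo) for z in z_scores]
```

Because min-max rescaling is unchanged by a positive affine transformation, rescaling the raw composite indicators directly would give the same index values; the z-score step matters for interpreting the intermediate indicators, not for the final index. The sign of a principal component is arbitrary in general, so in practice one would check that the weights are oriented so that more actors imply a higher index.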
The clustering index (CI) is calculated as follows,

$$ CI_{dt} = \frac{\text{z\_score}_{dt} - \text{z\_score}_{\min}}{\text{z\_score}_{\max} - \text{z\_score}_{\min}} $$

The calculated clustering index ranges from zero to one. The greater the value, the higher the clustering level. Specifically, the Mymensingh district has a value of one in 2013, which means that its degree of clustering in 2013 is the highest among all sample districts in the three years. It is a relative value, and does not imply that the clustering level of Mymensingh could not be higher if more actors established themselves there. In total, I have 20 district-clusters (called clusters hereafter). Each district is regarded as a cluster with a unique clustering index.

4.2. Clustering patterns

Table 3.3 reports the patterns and changes of the clustering index by district over the three years spanning a decade. The results suggest that the degree of clustering varies greatly among the 20 districts well known for fish farming. Recalling that the 20 sample districts were selected from six aquaculture areas in Bangladesh, the results show great variation in the degree of clustering among districts even within the same area. Here, to simplify the description of the patterns, I group the 20 sample districts into four geographic zones based on cardinal points (North, Southwest, South center, and East) that correspond to the regional development debate in Bangladesh, rather than into the six sampling areas discussed in the sample section.

Specifically, among all the districts, Mymensingh, in the North, has the highest degree of clustering in all three years (2003, 2008, and 2013), yet the degrees of clustering of the other districts in the North are relatively low. This indicates a large but spatially uneven concentration of fish farming in the North. This result is explicable, as Mymensingh is a traditionally important area for aquaculture.
With the availability of numerous hatcheries (Ali et al., 2013), large feed mills (Belton et al., 2011), favorable agro-ecological conditions, and access to aquaculture research institutions (Ahmed, 2009), Mymensingh is ranked first among aquaculture districts in Bangladesh (Ahmed and Toufique, 2015). Unlike the spatially uneven concentration of clusters in the North, all four districts in the Southwest have a relatively high degree of clustering. This is particularly the case for Khulna, whose degree of clustering ranks second, behind Mymensingh, in each of the three years measured over the decade. The districts in the East have low degrees of clustering.

In the three traditional fish farming districts with high levels of clustering (Mymensingh, Khulna, and Gopalganj), growth in the degree of clustering is also high. This implies that the initial clusters of fish farm value chain segments are deepening their clustering faster than the more recently established clusters. There are fast risers even among the top clustering districts: one district (Bhola in the South center) rose over the decade to become one of the five most clustered districts.

To summarize, the 2013 clustering indices of the 20 districts shown in Table 3.3 are mapped in Figure 3.3. Three zones containing districts with higher degrees of clustering, namely the North (Mymensingh), the Southwest (Khulna), and the South center (Bhola), are circled in red. Based on the measure of clustering, the populations of fish cluster actors as well as the vertical relatedness of fish producers and service providers are greater in these regions.

5. Regression estimations

5.1. Regression specification

The regressions address how the degree of district-level clustering affects fish farmers’ adoption of innovations, specifically the use of modern inputs (for farming intensification) and changes in product composition, in particular the production of non-traditional commodity species and “niche” commercial species. As discussed before, to limit measurement error I use only the 2013 cross-section in my regression analysis, while I show descriptive statistics for both 2008 and 2013 to compare the basic changes.

Apart from clustering, I include several control variables that are assumed to affect farmers’ behaviors. The increasing demand for fish, particularly in cities in Bangladesh, suggests that urban proximity may affect all three farmer behaviors that I am testing. These hypotheses give the following reduced form for the adoption decision of innovation k by farmer i in district j,

$$ \text{Adoption}^{k}_{ij} = f(CI_j, UP_{ij}, X_{ij}, V_j, Z, \nu_j, \varepsilon_{ij}) $$

Modern inputs and new species are regarded as two sets of innovations (k) in this study. First, as noted above, I define modern inputs as fuel, fertilizer, pesticide, lime, purchased feed, and vitamins. According to the survey, the share of modern inputs in a sample household’s total expenditures during fish production (from pond preparation to harvesting) is 32.6%, without imputing the cost of family labor. In one variation of the analysis, I also separate purchased feeds from total modern inputs. Two input-related variables are used as dependent variables: adoption of modern inputs, and expenditure on modern inputs per acre. The former captures the adoption decision directly, and the latter captures technology intensification through the addition of modern (variable) inputs.
Second, because 95% of the sample households produced more than one species (on average, a household farmed four fish species), rather than analyzing a binary adoption decision, I model the share of output by species set. The species sets I analyze are carps, tilapia, pangasius catfish, and (newly commercialized) niche fishes. I expect the coefficient of the degree of clustering on the carps share to be negative, and the coefficients on the tilapia and pangasius shares to be positive, for the reasons discussed above. Moreover, I include a regression explaining the number of species produced by the household as an index of specialization. I expect the effect of clustering on the number of species to be negative: the higher the clustering, the more species-specialized the household, as clustering facilitates economic gains from specialization.

$CI_j$ stands for the calculated clustering index in district j. I hypothesize that fish farmers in areas with greater clustering are more likely to adopt innovations, due to reduced risks and transaction costs, and due to the “incidental external economies” that de facto amount to collaborative innovation across segments of the supply chain.

$UP_{ij}$ captures urban proximity. With the geographic coordinates of sample households, I used Google Maps to calculate travel time from each household to the three biggest cities in Bangladesh (Dhaka, Chittagong, and Khulna). I adopt the minimum travel time of the three to measure the urban proximity of household i in district j. I expect the coefficient on travel time in the adoption equations to be negative, so that being close to a large city induces adoption of innovations. This would be similar to the effect that Fafchamps and Shilpi (2003) found for proximity to cities on farmers’ fertilizer use.
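As a small illustration of how the household-level variables just described could be built, the sketch below computes output shares by species set, the number of species grown, and the minimum-travel-time proximity measure. The species names, quantities, and travel times are hypothetical, and the function names are not taken from the study.

```python
def species_outcomes(output_kg, species_set):
    """Output share by species set and the number of species produced."""
    total = sum(output_kg.values())
    shares = {}
    for species, kg in output_kg.items():
        group = species_set[species]
        shares[group] = shares.get(group, 0.0) + kg / total
    n_species = sum(1 for kg in output_kg.values() if kg > 0)
    return shares, n_species

def urban_proximity(travel_hours):
    """Minimum travel time (hours) to any of the large cities."""
    return min(travel_hours.values())

# Hypothetical household: quantities in kg, travel times in hours.
species_set = {"rohu": "carps", "catla": "carps",
               "tilapia": "tilapia", "pangas": "pangasius"}
output_kg = {"rohu": 400, "catla": 200, "tilapia": 300, "pangas": 100}
travel_hours = {"Dhaka": 4.5, "Chittagong": 9.0, "Khulna": 2.0}

shares, n_species = species_outcomes(output_kg, species_set)
proximity = urban_proximity(travel_hours)
```

Because the proximity measure is a travel time, a larger value means a household is farther from a large city, which is why a negative coefficient corresponds to a positive effect of proximity.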
The positive impact of urban proximity may arise from the payoff to modern input use, due to lower transaction costs to access markets and meet their demand for increasing volumes, and from the payoff to non-traditional species adoption to meet the changing diet patterns in cities, consistent with Bennett’s Law.

$X_{ij}$ is a vector of household controls including gender and education of the household head, family size, total area of crop lands and ponds, and nonfarm employment. The latter is expected to favor adoption of modern inputs, as nonfarm employment is a cash source for inputs in the presence of credit constraints. It also permits overall income risk management (Reardon et al. 1994). The hypothesis on the land variable is, however, ambiguous. Wealth can reduce risk aversion and thus spur adoption of innovations. But it may be the smaller aquaculture farmers who intensify with modern inputs and maximize revenue with profitable non-traditional species that are best for densification⁹ of fish stocks in a pond.

⁹ Densification is a subset of what in economics is intensification. One cannot increase the density of fish per acre, but adding non-land inputs is intensification. Holding non-land inputs apart from seed constant, increasing seed increases density as an outcome, which is also intensification.

$V_j$ is a district-level variable: the total length of paved roads, used as a proxy for road access. To avoid the potential endogeneity of roads (such as unobserved district-level economic factors affecting both adoption and roads), I use the five-year lagged value. To capture the dissimilarities among zones, I control for locational fixed effects by adding zone (Z) dummy variables. $\nu_j$ contains district-level unobservables, and $\varepsilon_{ij}$ is an idiosyncratic error term.

I estimate the adoption regressions with Probit, and the other equations with OLS. Heteroskedasticity-robust standard errors are reported.
Where the dependent variables are in fractional (share) form, my results are robust to Tobit and fractional-response estimation.

5.2. Econometric issues and their resolution

There are two main econometric issues; for each I note how I address it. First, a potential concern in the above analysis is the existence of unobserved district-level variables ($\nu_j$), which may affect both the degree of district clustering and the farmer’s adoption decision. Unobserved aquaculture-related natural advantages, like water quality, and disadvantages, like floods, could affect the number of actors or their linkages in the cluster, or reduce the incentive for technology adoption by farmers. In that case, my estimated coefficient on the clustering index would be biased.

To address this concern, I construct a set of district-level instrumental variables based on the characteristics of crop clusters. As rice and fish are the two fundamental agricultural sectors in Bangladesh, districts with more aquaculture clustering may have lower (substitution) or higher (rice-fish joint production) levels of crop clustering. That is, district-level crop clusters are closely correlated with fish clusters. Moreover, I assume that the level of crop clustering does not directly affect fish farmers’ technology or species adoption decisions, as it is very unlikely that crop input dealers sell fish feeds (based on the fish trader survey, only 4% of traders also trade agricultural products other than fish). Based on these two conditions, crop clustering is used as the instrument (IV) for fish clustering, consistent with the relevance and exclusion requirements for IVs.
Similar to the calculation of the fish clustering index, I use two normalized district-level variables as the IVs to capture the degree of crop clustering in a district: 1) the quantity of inputs used for crop production, to capture input provider-crop producer relations; and 2) the quantity of crops sold, to capture output trader-crop producer relations. These variables are generated from the nationally representative Household Income and Expenditure Surveys (HIES) conducted in 2005 and 2010 by the Bangladesh Bureau of Statistics.

Since the adoption equation is estimated using a Probit (non-linear) model, I use control function estimation instead of two-stage least squares. However, to first test the endogeneity of the clustering index and the validity of the IVs, the first-stage regression results and the results from different IV methods are reported and compared in Table 3.A2. The Hausman tests imply the need to use IVs. The significant effects of the two IVs on the clustering index show that the IVs are not weak. The negative signs show that districts with higher clustering of aquaculture value chain actors have fewer crop input and crop output transactions, that is, lower levels of crop clustering. This is as expected: fish production requires more ponds while crop production requires more crop land, so a single district is unlikely to host both types of clusters. The similar coefficients and standard errors of the estimators from three IV methods, two-stage least squares (2SLS), limited information maximum likelihood (LIML), and generalized method of moments (GMM), suggest that the IVs are proper. Although I reject the over-identification restriction at the 5% level, meaning that one of the IVs is likely to be invalid if the other is assumed valid, I report results both with and without IVs for all the models in the results section, so that the differences can be compared.
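The logic of the control function approach can be illustrated in a deliberately simplified linear setting with simulated data. This is not the Probit specification used in the chapter: it assumes one endogenous regressor and a single instrument, and all names and parameter values are hypothetical. The first-stage residual is added as a regressor in the second stage; in the linear case the resulting coefficient coincides with the simple IV estimate.

```python
import random

def ols(X, y):
    """OLS coefficients from the normal equations (Gauss-Jordan elimination)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

def control_function(y, x, z):
    """Two-step linear control function: returns (beta_x, rho)."""
    # Step 1: regress the endogenous regressor on the instrument.
    a, b = ols([[1.0, zi] for zi in z], x)
    v_hat = [xi - a - b * zi for xi, zi in zip(x, z)]
    # Step 2: add the first-stage residual as a control.
    _, beta_x, rho = ols([[1.0, xi, vi] for xi, vi in zip(x, v_hat)], y)
    return beta_x, rho

def iv_ratio(y, x, z):
    """Simple IV estimate cov(z, y) / cov(z, x) for one instrument."""
    zbar = sum(z) / len(z)
    num = sum((zi - zbar) * yi for zi, yi in zip(z, y))
    den = sum((zi - zbar) * xi for zi, xi in zip(z, x))
    return num / den

# Simulated data: u is an unobserved factor driving both x and y.
rng = random.Random(1)
n = 500
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [-0.8 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 + 0.5 * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

beta_cf, rho = control_function(y, x, z)
beta_iv = iv_ratio(y, x, z)
```

A coefficient `rho` on the residual that differs from zero signals endogeneity of the regressor, which plays the same role as the Hausman-type test mentioned above. In a Probit model the second step would be a Probit that includes the first-stage residual, with standard errors adjusted for the generated regressor.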
Second, an endogeneity problem could be caused by reverse causality. In principle, reverse causality is a minor concern, since it is unlikely that a farmer’s adoption of innovations (micro level) would affect the clustering of all actors (macro level). But to allow for the extreme monopoly situation described in the middleman model discussed in the introduction, I provide an additional robustness check using only the sub-sample of farmers who produced fish in 2013 but not in 2008. By doing so, I exclude the concern that farmers’ adoption of innovations preceded the existence of the fish clusters.

5.3. Descriptive results

Table 3.4 shows the descriptive statistics of household and district variables. Although I do not use the 2008 recall information in the regressions, I report statistics for both years here to show basic changes. The first three columns of Table 3.4 show the characteristics of sample households and districts for the two years and significance tests between them. The right six columns compare households in districts with lower and higher levels of clustering for the two years.

The degree of clustering grew significantly from 2008 to 2013, particularly for districts that have been important in aquaculture for the past two decades. This implies that the initial clusters of fish farm value chain segments are being developed and strengthened, although the growth rate is also high in the recently established clusters where the degree of clustering is still low. These findings are consistent with those shown in Table 3.3, for the reasons discussed above.

Households differ by the degree of clustering of their districts in several interesting ways. First, higher clustering districts have a significantly higher share of participation in nonfarm activities. Second, they tend to be within three hours of one of the three large cities (Dhaka, Chittagong, or Khulna); clustering of the district and proximity to a large city are correlated.
Households in the districts with lower clustering are twice as far (in travel time) from a large city as those in highly clustered districts. All else equal, it is reasonable that fish farm clusters are nearer to urban demand points. Third, infrastructure density is greater in more clustered districts, but total paved road length has increased similarly in low and high clustering districts over the past five years. Economies of agglomeration appear to be a function of commerce and transport transaction costs, controlling for other factors like the natural endowment of water of the zone. Fourth, more than 70% of households in higher clustering areas own some ponds alone (rather than in groups of households), while that share is only 50% for households in lower clustering areas. Farmers with sole ownership may perceive a greater ability to appropriate the returns from investments in modernizing their operations. At the same time, the share of farmers renting rather than owning is higher in high cluster districts. I did not find significant differences in aquaculture farm size (operated pond area) or years of pond operation between low and high clustering areas. This suggests that in high clustering areas farms are being added and existing farms are undergoing intensification, compared with low cluster areas.

Table 3.5 provides descriptive statistics on farmers’ adoption of modern inputs. The adoption rates in high and low clustering districts are significantly different. Around 90% (average for two years) of farmers in high clustering areas adopt at least one type of modern input (commercial feed, fuel, fertilizer, pesticide, lime, vitamins), versus 70% of households in low clustering areas. The same pattern holds for farmers’ adoption of commercial feeds alone and of modern inputs other than commercial feeds.
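As a rough check on the size of this gap, the sketch below applies a two-proportion z-test to the 2013 figures in Table 3.5 (74.9% of 752 households in low clustering districts and 94.0% of 762 households in high clustering districts using any modern input). The z-test is a stand-in chosen here for simplicity and is not necessarily the test behind the T-Test column of the table; the function name `two_proportion_z` is hypothetical, and treating the reported observation counts as the group sizes is an assumption.

```python
from math import erfc, sqrt


def two_proportion_z(p1, n1, p2, n2):
    """Return the z statistic and two-sided p-value for H0: p1 == p2."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# 2013 adoption of any modern input, low versus high clustering districts
# (shares and observation counts as reported in Table 3.5).
z_stat, p_val = two_proportion_z(0.749, 752, 0.940, 762)

print(f"z = {z_stat:.2f}, two-sided p = {p_val:.3g}")
```

Under these assumptions the difference is far beyond conventional critical values, in line with the three-star significance reported for this row of the table.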
The shares of expenditures on modern inputs in total input expenditures during fish production (from pond preparation to harvesting) are also greater for farmers in high clustering districts, whether or not the imputed cost of family labor is included. These results make sense for the reasons discussed above. For carps, a lower share of households in higher clustering areas grows them, as expected. On average, the share of carps in output was 63% for households in high clustering areas, more than 10 percentage points lower than in low clustering areas. For pangas, the opposite pattern emerges. As farms are multi-species (four species on average), the share of farms growing pangas is actually higher in low cluster districts, but the share of pangas in overall output is lower there. The findings for tilapia are, however, contrary to my initial expectations. The table shows that the share of farms growing tilapia and its share of total output are both somewhat lower in higher clustering areas than in low clustering areas. There are two possible reasons for this. On the one hand, Table 3.3 showed that Mymensingh district in the North and districts in the Southwest have high degrees of clustering. But districts in the Southwest have a disproportionate focus on shrimp (in Khulna, Satkhira, Bagerhat) and carp (Jessore), and Mymensingh district is specialized in pangasius, niche species, and carps. Hence, tilapia is farmed mainly in polyculture in these high cluster districts. Also, around 16% of farmers produce shrimp in high clustering areas, while almost none do so in low cluster districts. On the other hand, according to key informants, localized pockets of intensive tilapia production (as opposed to polyculture) emerged relatively recently in the Northern and Eastern zones, but mainly on small farms outside our district-level clusters. I do not have data on these localized pockets.
Finally, the number of species per household, which measures the household’s specialization level, does not differ statistically by degree of clustering. This is consistent with the finding of species polyculture in all districts.

6. Results of Regressions

6.1. Adoption of modern inputs

Table 3.6 shows the regression results on adoption of modern inputs. In the left two columns, a household counts as adopting modern inputs if it uses any commercial feed, fuel, fertilizer, pesticide, lime, or vitamins. The middle two columns exclude commercial feeds, and the right two columns show adoption of commercial feeds only. Results are reported both with and without control function estimation, which addresses the endogeneity issues. The salient findings are as follows. First, a major finding of the paper is that regardless of how I specify the modern input set and the estimation method, the degree of clustering has a positive and significant effect on the use of modern inputs, especially commercial feed. For the clustering variable, the three Probit estimations using IVs yield much larger estimates than the models without IVs, but the signs and significance levels are consistent between the two methods. The reasons for these effects were discussed above in the conceptual framework. Second, another major finding is the significant negative coefficient on the distance to a major city in the adoption of modern inputs, again especially for commercial feeds. The estimates for travel time to the nearest big city from the IV-Probit method are smaller in magnitude than those from the regular Probit method. The impact of city distance and clustering degree can also be observed using per acre expenses on modern inputs, shown in Table 3.7. The three OLS-IV regressions may not provide good results, as the coefficients of the residuals are not significant. Therefore, I explain the results here based on the OLS estimations only.
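The choice between the two sets of estimates rests on whether the first-stage residual enters the second stage significantly. The sketch below reproduces that reading from the residual coefficients and robust standard errors printed in Tables 3.7 and 3.8. The 1.96 cutoff (a large-sample 5% normal critical value), the dictionary labels, and the helper name `preferred_estimator` are assumptions made for this illustration; the tables themselves only report significance stars.

```python
# (coefficient, robust standard error) on the first-stage residual,
# as printed in the OLS-IV columns of the tables.
residual_terms = {
    "Table 3.7: modern inputs incl. feeds": (0.147, 0.338),
    "Table 3.7: modern inputs excl. feeds": (0.627, 0.399),
    "Table 3.7: commercial feeds": (-0.523, 0.395),
    "Table 3.8: carp share": (0.321, 0.074),
    "Table 3.8: tilapia + pangas share": (-0.256, 0.067),
}

CRITICAL_5PCT = 1.96  # assumption: large-sample normal critical value


def preferred_estimator(coef, se, critical=CRITICAL_5PCT):
    """Pick OLS-IV when the residual term is significant, otherwise OLS."""
    t_ratio = coef / se
    label = "OLS-IV" if abs(t_ratio) > critical else "OLS"
    return label, t_ratio


choices = {name: preferred_estimator(*term) for name, term in residual_terms.items()}

for name, (label, t_ratio) in choices.items():
    print(f"{name}: t = {t_ratio:.2f} -> interpret {label}")
```

With these inputs the three expenditure regressions fall back on OLS, while the carp and tilapia plus pangas share regressions keep the OLS-IV estimates.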
Similar to the adoption of modern inputs, households in high clustering areas and close to a big city use intensification technologies (with more external inputs). As above, the impacts on commercial feeds are greater than on other modern inputs. Moreover, Table 3.6 and Table 3.7 show that fish farmers with larger areas of crop land are more likely to adopt modern inputs and spend more on modern inputs per acre on their fish farms. Larger farms imply more wealth and potentially better access to credit as well as cash from crop sales, so these households are less likely to face liquidity constraints on adopting and intensifying modern technology.

6.2. Adoption of nontraditional commodity and newly commercialized aquaculture niche species

Table 3.8 shows the results of regressions explaining the quantity shares of carp, tilapia, pangasius catfish and niche fishes. Because Table 3.5 shows that the share of pangas in volume is low, I combine tilapia and pangas as modern species. First, an important finding is that fish farmers in districts with higher clustering have significantly lower shares of carp production; carps are the traditional species in Bangladesh. In contrast, the clustering degree significantly increases the output share of tilapia and pangasius combined, as well as that of the newly commercialized niche fishes. The positive impacts on these non-traditional species are consistent with my assumption that clusters help adoption of or specialization in non-traditional species, probably because clusters of actors in the supply chain lower the costs of producing and selling these species. The coefficients of the residuals for the first two species sets are significant, so I interpret the OLS-IV results for the carp and the tilapia and pangas regressions, but the OLS results for the niche regression. Interestingly, I find that travel time to a big city has a significantly positive impact on the output share of traditional species and a significantly negative impact on that of non-traditional species.
This result is again consistent with my hypothesis that proximity to big cities increases the adoption of more advanced species to meet changing urban diets. The results thus emphasize the importance of both the supply side (clusters) and the demand side (urban proximity) for intensification in non-traditional farmed fish species in Bangladesh. Additionally, I explore the impacts of the clustering degree on the adoption of modern inputs conditional on the type of farmer. The hypothesis behind this is that although clusters decrease the output share of carps, carp producers in areas with higher degrees of clustering may still be more likely to adopt advanced inputs. This hypothesis is confirmed by the results shown in Table 3.9. Specifically, both carp farmers and tilapia and pangas producers are more likely to adopt modern inputs, especially commercial feeds, in districts with a higher clustering degree and close to a big city. This is particularly the case for carp farmers, for whom the results are robust to estimation with and without IVs.

6.3. Specialization versus diversification of fish production

Table 3.10 shows the regression results on specialization. A household’s specialization level is measured as the number of species farmed by the household. Several points stand out. First, clustering of the district significantly induces species specialization by farmers. It could be that output traders persuade farmers to specialize in the species they need, or that easy access to feed dealers helps farmers to specialize in non-traditional species that require high stocking density and feed, all else equal. Second, fish farmers living close to a big city are more diversified. Diversified fish farmers may lose economies of specialization but gain flexibility to serve diversified and evolving urban fish markets.

6.4. Robustness check

In this section, I show the results using a subsample to address the potential concern about reverse causality, in particular that the adoption of innovations by farmers as a first step might then induce clustering of off-farm actors to meet their needs. To avoid the situation in which adoption happened before the development of clusters, I use only fish farmers that produced fish in 2013 but not in 2008. By doing so, one can safely assume that the adoption of fish production-related innovations is affected by the degree of clustering, instead of vice versa. As before, each model is estimated using two methods: with IVs and without IVs. In Table 3.11, I show the results for the full sample and the sub-sample. Most of the key results (in bold) are consistent between the two samples and the two methods. Specifically, clusters increase the adoption of commercial feeds for all sample households, including both carp farmers and the less traditional farmers who adopt tilapia and pangasius. Clusters also increase households’ production of non-traditional commodity species and their specialization in fish species. As before, urban proximity plays a consistent and significant role in spurring adoption of commercial feeds and of non-traditional species.

7. Conclusions

Farmers adopting and implementing innovations, such as new technologies and new products, often require “collaborative inter-segment innovation” by actors in other segments of the value chain, for example wholesalers who supply commercial fish feed and chemicals and who buy and market nontraditional fish species.
I tested whether clustering, and thus economies of agglomeration with implied lower transaction costs, encourages and facilitates farmers’ innovation in technology choice (adoption of modern inputs) and product choice (adoption of non-traditional “commodity” species exotic to Bangladesh (tilapia and pangasius catfish) and of newly commercialized, pond-farmed local niche species). Clustering in the meso environment of the farm as a potential determinant of farmer choices has not been studied in agriculture, aquaculture, or indeed the food sector. I used a unique data set from our own primary survey of the aquaculture value chain in Bangladesh, including micro data for 1500 fish farm households and meso-level data for 20 districts (77 villages). I calculated an index to include both horizontal agglomeration and vertical interconnections among actors in the value chain. The analysis gave rise to several key findings. First, I found that being in an area with a high clustering index is associated with a higher probability of farmers intensifying via use of more modern inputs (especially commercial feed) and making the fish species product choices noted above. This emphasizes the importance of the off-farm components of the value chain for farm modernization. Second, I found that farmers in clusters nearer to big cities have twice the propensity to make these innovations compared to farmers far from the big cities. Third, I found that the degree of clustering spurs specialization (or a reduction of the widespread species diversification practiced), allowing relative gains from economies of specialization.
Fourth, controlling for the meso variables of clustering and distance to cities, I found that while pond area does not drive adoption of the modern inputs (hence, unlike what we expected, fish farm size is not an “exclusion” factor), there is a positive correlation between lagged crop farm size (part of the overall household farming operation of fish farmers) and rural nonfarm employment. These two sources of cash to invest in fish farming imply the possibility of local credit market constraints and thus a need to rely on own cash sources (a common finding in the crop literature, see Reardon et al. 1994). The policy and research implications of the findings are as follows. First, a key implication of the approach and findings is that research would benefit from increased attention to meso level variables, including clustering in particular and economies of agglomeration in general, and distance to urban areas, as a way of enriching the study of the microeconomics of product composition and technology choice. This will be increasingly important as large-scale urbanization as well as local small town development proceeds apace in developing regions. Second, the research reaffirms and extends the importance of studying the effects of rural nonfarm employment on farm technology and product choices. Again, this will be increasingly important as the rural nonfarm labor market continues to develop in these regions (Haggblade et al. 2010). Third, a key implication for policy is that the development of value chains in general, and of the local off-farm components of those chains, such as input dealers and output wholesalers, should be an important part of the strategies of agricultural and commerce ministries to spur modernization of farming as well as alignment of fish species adoption with patterns of urban food demand.
Part and parcel of that development is the array of infrastructural investments and other “enabling business” policies (World Bank, 2017) that facilitate the commercial environment for clusters and value chain actors to invest. As I found that older established fish farm areas had more rapid cluster development, it is important for these policies to level the playing field so that the newcomer aquaculture districts can catch up and develop rapidly. Finally, I see the findings concerning meso level determinants of farm modernization as complementary to extant policy prescriptions, such as extension and collective asset provision, for building the capacity of farms to adopt technologies and shift their product mix to take advantage of opportunities in product markets, including the opportunities afforded by the reduction in the transaction costs of reaching new markets that clustering itself brings.

APPENDIX

Table 3.1. Farmed fish output of sample households, by species

| Species | 2008 Output (Kg) | 2008 % of total pie | 2013 Output (Kg) | 2013 % of total pie | 2008-2013 growth rate of output (%) |
|---|---|---|---|---|---|
| Observations | 1084 | | 1514 | | / |
| Carps | 417,411 | 50.8 | 869,268 | 39.8 | 108 |
| Tilapia | 124,967 | 15.2 | 310,490 | 14.2 | 148 |
| Pangasius | 195,608 | 23.8 | 873,521 | 40 | 347 |
| Shrimp | 60,733 | 7.4 | 86,656 | 4 | 43 |
| Niche | 7,905 | 1 | 24,457 | 1.1 | 209 |
| Other | 15,474 | 1.9 | 18,896 | 0.9 | 22 |
| Total | 822,098 | 100 | 2,183,286 | 100 | 166 |

Note 1: the list of fish species in each category can be found in Table 3.A1.
Note 2: unbalanced sample, since farms that did not produce fish in 2008 are excluded.

Table 3.2.
Distribution of size categories of different types of actors in the study areas

| Actor | Defining characteristic | Size category | Definition |
|---|---|---|---|
| Hatcheries | Total production area in hectares | Small | Less than 0.1 ha |
| | | Medium | 0.1 to 2.0 ha |
| | | Large | More than 2 ha |
| Feed Mills | Total metric tons of feed produced per month | Small | Less than 50 MT |
| | | Medium | 50 to 300 MT |
| | | Large | More than 300 MT |
| Input Traders | Total metric tons of feed sold per month | Small | Less than 10 MT |
| | | Medium | 10 to 100 MT |
| | | Large | More than 100 MT |
| Farmers | Total pond area in hectares | Small | Less than 0.5 ha |
| | | Medium | 0.5 to 1.0 ha |
| | | Large | More than 1 ha |
| Traders | Total metric tons of fish traded per week | Small | Less than 1 MT |
| | | Medium | 1 to 5 MT |
| | | Large | More than 5 MT |

Table 3.3. Clustering in the 20 districts in four cardinal-points zones

| District | Zone | 2003 | 2008 | 2013 | Change 03-13 | Change rate (13-03)/03 | Change 08-13 | Change rate (13-08)/08 |
|---|---|---|---|---|---|---|---|---|
| Mymensingh | North | 0.528 | 0.809 | 1.000 | 0.472 | 0.894 | 0.191 | 0.236 |
| Natore | North | 0.078 | 0.122 | 0.192 | 0.114 | 1.462 | 0.070 | 0.574 |
| Bogra | North | 0.057 | 0.106 | 0.173 | 0.116 | 2.035 | 0.066 | 0.623 |
| Gazipur | North | 0.080 | 0.115 | 0.124 | 0.044 | 0.550 | 0.009 | 0.078 |
| Dinajpur | North | 0.014 | 0.023 | 0.038 | 0.024 | 1.714 | 0.015 | 0.652 |
| Narsingdi | North | 0.026 | 0.031 | 0.038 | 0.012 | 0.462 | 0.007 | 0.226 |
| Khulna | Southwest | 0.421 | 0.542 | 0.683 | 0.263 | 0.625 | 0.142 | 0.262 |
| Bagerhat | Southwest | 0.410 | 0.375 | 0.448 | 0.038 | 0.093 | 0.074 | 0.197 |
| Jessore | Southwest | 0.255 | 0.326 | 0.396 | 0.141 | 0.553 | 0.071 | 0.218 |
| Satkhira | Southwest | 0.200 | 0.280 | 0.372 | 0.172 | 0.860 | 0.092 | 0.329 |
| Gopalganj | Southcenter | 0.413 | 0.484 | 0.646 | 0.233 | 0.564 | 0.162 | 0.335 |
| Bhola | Southcenter | 0.114 | 0.263 | 0.560 | 0.446 | 3.912 | 0.297 | 1.129 |
| Barisal | Southcenter | 0.080 | 0.111 | 0.158 | 0.078 | 0.975 | 0.047 | 0.423 |
| Chandpur | Southcenter | 0.000 | 0.005 | 0.011 | 0.011 | - | 0.006 | 1.200 |
| Brahmanbaria | East | 0.146 | 0.193 | 0.253 | 0.107 | 0.733 | 0.060 | 0.311 |
| Noakhali | East | 0.088 | 0.118 | 0.169 | 0.081 | 0.920 | 0.051 | 0.432 |
| Sylhet | East | 0.032 | 0.073 | 0.105 | 0.073 | 2.281 | 0.032 | 0.438 |
| Cox's bazar | East | 0.058 | 0.066 | 0.090 | 0.032 | 0.552 | 0.025 | 0.379 |
| Chittagong | East | 0.059 | 0.069 | 0.086 | 0.027 | 0.458 | 0.017 | 0.246 |
| Comilla | East | 0.032 | 0.050 | 0.078 | 0.046 | 1.438 | 0.028 | 0.560 |

Note: The clustering measure in 2008 is generated by principal component analysis on the six variables explained in the text (Section 4.1). The measures in 2003 and 2013 are calculated using the parameters generated from the principal component analysis on the 2008 data.

Table 3.4. Descriptive statistics of household and district characteristics HH Level Observations Clustering Index (CI) 2008 2013 1084 0.27 1514 0.35 Individual & Household Variables HH Size 4.9 4.9 HHH Male (%) 92.7 92.5 HHH Education. Level 1.4 1.5 (1-5 levels) % of HH doing any 42.4 54.7 nonfarm activities Annual income from nonfarm activities 32.2 42.3 (1,000 Taka/HH) Hours to nearest large city (either Dhaka, 2.7 2.7 Chittagong, or Khulna) HH's fish farming characteristics Operated crop land area 0.83 0.98 (Acre) Operated pond area 1.23 1.26 (Acre) Ownership (% of HH owned some ponds 65.1 61.5 alone by themselves) Ownership (% of HH owned some ponds by 93.8 89.2 themselves or jointly with other households) Average years of ponds 19.2 15.8 District Variable Total length of paved roads per district (km) (four-year lag) By 2008's CI [median] and Year 2008 2013 TLow High Low High Test 527 557 752 762 0.08 0.44 *** 0.12 0.58 TTest 5.1 89.6 4.7 95.7 1.5 1.4 *** 36.6 47.9 *** 34.1 30.4 3.6 1.8 0.89 All HH 271 343 TTest *** 5.1 89.5 4.7 95.5 *** *** 1.5 1.4 ** 48.7 60.6 *** 43.7 40.9 3.5 1.9 *** 0.78 1.08 0.89 ** 1.33 1.13 1.31 1.20 * 54.5 75.2 *** 52.0 70.9 *** *** 97.3 90.5 *** 92.6 86.0 *** *** 19.6 18.7 16.2 15.5 *** 318 226 390 296 *** *** *** *** *** *** *** Note: unbalanced sample since farms that do not produce fish in 2008 are excluded *** p<0.01, ** p<0.05, * p<0.1 ***

Table 3.5. Patterns in fish farmers’ adoption of technology innovations All HH HH Level 2008(1) 2013 TTest Low By 2013's CI [median] and Year 2008 2013 THigh Low High Test 557 752 762 Observations 1084 1514 527 I.
Inputs Modern Inputs [include commercial feeds, fuel, fertilizer, pesticide, lime, vitamins, probiotic] % of HH used any 77.7 84.5 66.6 88.2 *** 74.9 94.0 *** modern inputs % of HH used modern 68.2 76.9 57.3 78.5 *** 66.8 87.0 *** inputs, excluding commercial feeds % of HH used any 64.0 73.8 54.3 73.2 *** 64.2 83.2 *** commercial feeds (purchased any feeds) Average expenses on all modern inputs per acre per HH (Taka/Acre) Average expenses on commercial feeds per acre per HH (Taka/Acre) 6,349 12,182 *** 5,805 6,863 10,819 13,527 4,784 9,422 *** 4,580 4,977 8,650 10,184 Share (%) of expenses on … over total expenses during fish production per HH (3) % on modern inputs, with family labor cost 15.0 17.5 *** 11.9 18 *** 14.4 imputed % on modern inputs, without family labor 28.1 32.6 *** 20.7 35.1 *** 25.9 cost imputed % on commercial feeds, with family 7.6 9.4 *** 7.5 7.8 9.4 labor cost imputed % on commercial feeds, without family 14.2 17.2 *** 12.7 15.5 ** 16.3 labor cost imputed % on fingerling, with family labor cost imputed 36.3 36.0 41.9 31.1 *** 42.0 % on fingerling, without family labor cost imputed 60.4 59.0 69.2 52.1 *** 67.4 % on medicines, with family labor cost imputed 0.2 0.5 * 0.3 0.2 0.7 138 TTest *** *** *** 20.5 *** 39.3 *** 9.5 18.2 30.0 *** 50.6 *** 0.3 Table 3.5. 
(cont’d) % on medicines, without family labor cost imputed % on hired labor, with family labor cost imputed % on hired labor, without family labor cost imputed % on family labor, with family labor cost imputed 0.3 0.7 * 0.4 0.2 0.8 0.5 3.3 4.3 ** 1.9 4.6 *** 3.1 5.4 *** 5.1 6.2 * 3.0 7.0 *** 4.4 8.0 *** 40.3 41.0 39.0 41.5 39.1 42.9 ** 4.2 4.4 *** 4.2 4.3 Number of Species (4) Share (%) of fish farms, by species 96.2 95.2 98.5 94.1 Carps 61.9 61.4 71.5 52.8 Tilapia 7.7 8.9 10.1 5.6 Pangasius 28.0 25.7 2.3 52.4 Shrimp 4.4 4.5 5.1 3.8 Niche 11.2 10.7 2.8 19.0 Other (3) Share (%) of fish volume over total output per HH, by species 70.0 67.4 ** 76.9 63.5 Carps 16.3 17.6 * 19.6 13.1 Tilapia 2.7 4.2 ** 2.3 3.1 Pangasius 8.8 8.5 0.5 16.7 Shrimp 0.4 0.7 0.3 0.4 Niche 1.8 1.7 0.4 3.1 Other 4.4 4.4 97.6 73.0 10.8 1.9 4.7 2.8 92.8 49.9 7.1 49.2 4.3 18.5 *** *** ** *** 72.8 22.5 3.2 0.4 0.6 0.5 62.1 12.7 5.1 16.4 0.7 2.9 *** *** ** *** II. Species Specialization *** *** *** *** *** *** *** *** *** Note (1): unbalanced sample since farms do not producing fish in 2008 are excluded Note (2): the list of fish species in each category can be found in the Appendix Note (3): Total expenses with family labor imputed = expenses on modern input + fingerling + medicine + hired labor + imputed family labor Total expenses without family labor imputed = expenses on modern input + fingerling + medicine + hired labor Note (4): this share is an average share over households; it differs from the shares in Table 3.1, which are shares of an aggregate “pie” *** p<0.01, ** p<0.05, * p<0.1 139 *** *** Table 3.6. Regression of clustering degree on adoption of modern inputs Adoption of Mod. Adoption of Mod. 
Adoption of Inputs, w/ feeds Inputs, w/o feeds commercial feeds (N=1,514) Probit Probit-IV Probit Probit-IV Probit Probit-IV Clustering Degree 0.129*** 0.322*** 0.097** 0.226* 0.150*** 0.435*** HH Level Controls HH size HHH male (dummy) HHH education level Nonfarm activities Total area of crop lands Total area of pond Pond ownership Pond years Hours to nearest big city (0.041) (0.094) (0.044) (0.116) (0.048) (0.119) 0.002 (0.004) 0.038 (0.031) -0.003 (0.007) 0.000 (0.018) 0.020*** (0.008) 0.001 (0.001) 0.030* (0.018) -0.003*** (0.001) -0.015*** (0.005) 0.001 (0.004) 0.037 (0.030) 0.000 (0.007) -0.003 (0.018) 0.020*** (0.008) 0.001 (0.001) 0.020 (0.018) -0.002*** (0.001) -0.006 (0.007) 0.003 (0.005) 0.076** (0.037) 0.007 (0.008) -0.002 (0.021) 0.020** (0.008) 0.002 (0.002) 0.018 (0.022) -0.003*** (0.001) -0.023*** (0.007) 0.002 (0.005) 0.075** (0.037) 0.009 (0.008) -0.003 (0.021) 0.020** (0.008) 0.002 (0.002) 0.011 (0.022) -0.003*** (0.001) -0.017* (0.009) 0.003 (0.005) 0.064* (0.038) -0.007 (0.008) 0.007 (0.021) 0.038*** (0.010) 0.001 (0.002) 0.035 (0.022) -0.005*** (0.001) -0.045*** (0.007) 0.001 (0.005) 0.061 (0.038) -0.002 (0.008) 0.004 (0.021) 0.039*** (0.010) 0.002 (0.002) 0.020 (0.022) -0.005*** (0.001) -0.031*** (0.009) -0.015* (0.009) -0.010 (0.012) -0.014 (0.012) -0.010 (0.012) -0.019 (0.012) 0.002 (0.032) 0.184*** (0.049) -0.071* (0.041) -0.226** (0.098) 0.192 -0.031 (0.039) 0.076* (0.043) -0.211*** (0.032) -0.025 (0.040) 0.083* (0.044) -0.163*** (0.052) -0.148 (0.123) 0.134 -0.146*** (0.040) 0.079* (0.046) -0.245*** (0.034) -0.136*** (0.041) 0.090** (0.046) -0.142*** (0.054) -0.331*** (0.125) 0.169 Other District Level Controls Total paved roads(lag) -0.009 (0.009) Zone Fixed Effects South West Zone South Center Zone East Zone Residual -0.006 (0.032) 0.177*** (0.050) -0.141*** (0.025) / / / / / / 0.189 0.133 Pseudo R2 0.165 Note: Marginal effects are reported. Heteroskedasticity robust standard errors are in parentheses. *** p<0.01, ** p<0.05, * p<0.1. 
Control function estimation is used of Probit-IV columns. 140 Table 3.7. Regression of clustering degree on per acre expenditures on modern inputs Exp. on Mod. Inputs, Exp. on Mod. Inputs, Exp. on commercial w/ feeds per acre w/o feeds per acre feeds per acre (N=1,514) OLS OLS-IV OLS OLS-IV OLS OLS-IV Clustering Degree 1.846*** 1.381 1.601** -0.377 1.854*** 3.504** HH Level Controls HH size HHH male HHH education level Nonfarm activities Area of crop lands Area of ponds Pond ownership Pond years (0.539) (1.204) (0.635) (1.377) (0.651) (1.427) -0.012 (0.080) 1.321* (0.743) -0.009 (0.080) 1.325* (0.744) 0.021 (0.085) 1.744** (0.787) 0.034 (0.085) 1.761** (0.785) -0.021 (0.091) 1.530** (0.763) -0.032 (0.091) 1.516** (0.761) -0.055 (0.114) 0.009 (0.317) 0.318*** (0.093) -0.006 (0.027) 0.757** (0.346) -0.068*** (0.015) -0.474*** (0.122) -0.063 (0.115) 0.012 (0.317) 0.316*** (0.093) -0.007 (0.027) 0.778** (0.346) -0.069*** (0.015) -0.498*** (0.137) 0.080 (0.124) 0.016 (0.351) 0.312*** (0.104) 0.014 (0.021) 0.301 (0.380) -0.065*** (0.016) -0.411*** (0.129) 0.046 (0.125) 0.031 (0.351) 0.307*** (0.104) 0.012 (0.021) 0.390 (0.381) -0.068*** (0.016) -0.509*** (0.144) -0.158 (0.133) 0.128 (0.368) 0.551*** (0.117) -0.000 (0.032) 0.937** (0.386) -0.101*** (0.017) -0.945*** (0.131) -0.130 (0.135) 0.115 (0.368) 0.555*** (0.117) 0.001 (0.032) 0.863** (0.384) -0.098*** (0.017) -0.863*** (0.144) (0.880) -2.379*** (0.895) -2.085** (0.926) -1.845* (0.942) -1.272 (0.996) -1.472 (1.015) -1.998*** (0.667) -0.247 (0.615) -4.562*** (0.771) 0.147 (0.338) 10.133*** -1.111 (0.702) 0.182 (0.684) -4.583*** (0.645) -1.210* (0.706) 0.031 (0.699) -5.331*** (0.811) 0.627 (0.399) 7.447*** -4.002*** (0.708) -0.450 (0.712) -5.403*** (0.655) -3.920*** (0.713) -0.324 (0.724) -4.778*** (0.848) -0.523 (0.395) 7.727*** Hours to nearest big city Other District Level Controls Total paved roads -2.435*** Zone Fixed Effects South West Zone South Center Zone East Zone Residual -1.975*** (0.660) -0.211 (0.601) 
-4.386*** (0.619) / / / / / / Adjusted R2 9.928*** 6.574*** 8.457*** Note: Marginal effects are reported. Heteroskedasticity robust standard errors are in parentheses. *** p<0.01, ** p<0.05, * p<0.1 141 Table 3.8. Regression of clustering degree on output share, by species % of Niche fish % of Carp output % of Til.+Pang. output output (N=1,514) OLS OLS-IV OLS OLS-IV OLS OLS-IV Clustering Degree -0.058 -0.342*** 0.048 0.274*** 0.015* -0.001 (0.036) (0.078) (0.035) (0.071) (0.009) (0.019) (0.003) 0.016 (0.029) -0.004 (0.003) 0.019 (0.029) 0.002 (0.003) -0.047* (0.026) 0.001 (0.003) -0.049* (0.026) -0.000 (0.001) -0.005 (0.005) -0.000 (0.001) -0.005 (0.005) -0.005 (0.005) 0.019 (0.015) 0.024*** (0.005) -0.006** (0.003) -0.030** (0.015) 0.001* (0.001) 0.027*** (0.005) -0.009* (0.005) 0.021 (0.015) 0.023*** (0.005) -0.006** (0.003) -0.017 (0.015) 0.001 (0.001) 0.012** (0.005) 0.005 (0.005) -0.012 (0.013) -0.015*** (0.004) 0.002 (0.001) 0.023* (0.013) -0.001*** (0.001) -0.027*** (0.004) 0.009** (0.005) -0.013 (0.013) -0.015*** (0.004) 0.002 (0.001) 0.013 (0.013) -0.001* (0.001) -0.016*** (0.005) 0.001 (0.001) 0.001 (0.003) 0.000 (0.001) -0.000** (0.000) -0.001 (0.002) 0.000** (0.000) 0.000 (0.000) 0.001 (0.001) 0.001 (0.003) 0.000 (0.001) -0.000** (0.000) -0.001 (0.002) 0.000** (0.000) -0.001 (0.001) (0.009) -0.021** (0.008) -0.027*** (0.009) 0.001 (0.001) 0.001 (0.001) -0.214*** (0.026) -0.062** (0.028) 0.048** (0.022) -0.200*** (0.025) -0.042 (0.027) 0.135*** (0.028) -0.256*** (0.067) 0.355*** (0.056) 0.123 -0.009** (0.004) -0.006* (0.004) -0.005* (0.003) -0.010** (0.004) -0.007** (0.004) -0.011* (0.006) 0.018 (0.014) 0.009 (0.011) 0.014 HH Level Controls HH size -0.005 HHH male HHH education level Nonfarm activities Area of crop lands Area of ponds Pond ownership Pond years Hours to nearest big city Other District Level Controls Total paved roads 0.035*** 0.044*** (0.009) Zone Fixed Effects South West Zone -0.023 South Center Zone East Zone (0.028) 0.081*** (0.029) 
-0.054** (0.023) Residual / / Constant 0.503*** (0.056) 0.117 -0.041 (0.027) 0.056** (0.028) -0.163*** (0.031) 0.321*** (0.074) 0.643*** (0.062) 0.123 / / 0.466*** (0.052) 0.118 Adjusted R2 Note: Heteroskedasticity robust standard errors are in parentheses. *** p<0.01, ** p<0.05, * p<0.1 142 / / 0.001 (0.008) 0.014 Table 3.9. Regression of clustering degree and urban proximity on modern inputs adoption, by product-type farmers Probit Probit-IV M1 M2 M3 M1 M2 M3 Carp Producers (N=1,441) 0.130*** 0.093** 0.153*** 0.314*** 0.205* 0.417*** Clustering (0.042) (0.045) (0.048) (0.096) (0.117) (0.118) Degree -0.008 -0.019** -0.031*** Hours to nearest -0.016*** -0.025*** -0.044*** (0.005) (0.007) (0.007) (0.007) (0.009) (0.009) big city / / / -0.214** -0.129 -0.308** Residual / / Tilapia & Pangas Producers (N=962) 0.184*** 0.149** Clustering (0.070) (0.060) Degree Hours to nearest -0.015** -0.020** (0.007) (0.008) big city / / Residual / / / 0.176** (0.068) -0.048*** (0.009) / / (0.098) 0.198 (0.169) -0.014 (0.011) -0.016 (0.164) (0.124) 0.212 (0.169) -0.016 (0.012) -0.070 (0.175) (0.123) 0.299 (0.193) -0.041*** (0.014) -0.140 (0.198) Note: M1 refers to all modern inputs; M2 refers to modern inputs outside of commercial feed; M3 includes commercial feeds only. Heteroskedasticity robust standard errors are in parentheses. *** p<0.01, ** p<0.05, * p<0.1 143 Table 3.10. 
Regression of clustering degree on specialization (N=1,514)

| | OLS | OLS-IV |
|---|---|---|
| Clustering Degree | -0.982*** (0.157) | -2.828*** (0.541) |
| HH Level Controls | | |
| HH size | 0.060*** (0.021) | 0.073*** (0.021) |
| HHH male (dummy) | 0.320** (0.143) | 0.335** (0.144) |
| HHH education level | 0.044 (0.032) | 0.013 (0.033) |
| Nonfarm activities (dummy) | 0.095 (0.084) | 0.109 (0.083) |
| Total area of crop lands (Acre) | 0.064** (0.032) | 0.060* (0.031) |
| Total area of ponds (Acre) | 0.023*** (0.008) | 0.021** (0.008) |
| Pond ownership (owned=1) | -0.039 (0.083) | 0.043 (0.085) |
| Pond years | 0.011*** (0.004) | 0.008** (0.004) |
| Hours to nearest big city | -0.071*** (0.026) | -0.164*** (0.036) |
| Other District Level Controls | | |
| Total paved roads (km) (lag) | -0.022 (0.061) | 0.034 (0.063) |
| Zone Fixed Effects | | |
| South West Zone (dummy) | 0.005 (0.166) | -0.107 (0.170) |
| South Center Zone (dummy) | -0.260 (0.170) | -0.422** (0.177) |
| East Zone (dummy) | -0.709*** (0.146) | -1.418*** (0.240) |
| Constant | 4.302*** (0.355) | 5.207*** (0.445) |
| Residual | / | 2.087*** (0.589) |
| Adjusted R2 | 0.072 | 0.082 |

Note: Heteroskedasticity robust standard errors are in parentheses. *** p<0.01, ** p<0.05, * p<0.1

Table 3.11.
Regression of clustering degree and urban proximity, by different samples Full sample (N=1,514) Sub-sample (N=430) Observations without IV with IV without IV with IV (A) Adoption of commercial feeds 0.150*** 0.435*** 0.137** 0.229** Clustering Degree Hours to nearest big city (0.048) -0.045*** (0.007) Residual / / Clustering Degree 0.153*** (0.048) -0.044*** (0.007) / / Hours to nearest big city Residual Clustering Degree Hours to nearest big city Residual Clustering Degree Hours to nearest big city 0.417*** (0.118) -0.031*** (0.009) -0.308** (0.123) 0.184*** (0.067) -0.020** (0.010) / / (0.193) -0.041*** (0.014) -0.140 (0.198) (0.090) -0.034** (0.015) (D) Output Share of Carps (%) -0.058 (0.036) 0.027*** (0.005) Clustering Degree 0.048 (0.035) -0.027*** (0.004) 0.275*** (0.103) -0.017 (0.011) -0.137 (0.118) -0.342*** (0.078) 0.012** (0.005) 0.321*** (0.074) -0.158** (0.064) 0.032*** (0.009) / 0.274*** (0.071) -0.016*** (0.005) -0.256*** (0.067) 0.116* (0.064) -0.029*** (0.009) -2.828*** (0.541) -0.164*** (0.036) 2.087*** (0.589) -0.987*** (0.276) -0.092* (0.050) (0.150) -0.030* (0.016) -0.245 (0.177) -0.374*** (0.102) 0.022** (0.010) 0.315*** (0.112) / (E) Output Share of Tilapia+ Pangas (%) Residual / / Clustering Degree -0.982*** (0.157) -0.071*** (0.026) Residual (0.107) -0.020* (0.012) -0.139 (0.130) (C) Adoption of commercial feeds, Tilapia+Pangas 0.299 producers 0.186** 0.346** 0.176** (0.068) -0.048*** (0.009) / / / / Hours to nearest big city (0.067) -0.023** (0.011) / / (B) Adoption of commercial feeds, Carp producers Residual Hours to nearest big city (0.119) -0.031*** (0.009) -0.331*** (0.125) / / (F) Number of Species per HH Note: Heteroskedasticity robust standard errors are in parentheses. *** p<0.01, ** p<0.05, * p<0.1 145 0.308*** (0.089) -0.019** (0.009) -0.279*** (0.090) -0.880* (0.489) -0.087* (0.052) -0.156 (0.611) Figure 3.1. 
20 sample districts in six major aquaculture areas of Bangladesh

[Figure 3.2 panels: Fish Farmers; Input Dealers; Feed Mills; Hatcheries; Fish Traders]
Figure 3.2. Distribution of actors per 1,000 rural people in 2013 by district
Note 1: No scale is provided because of space limitations. In each panel, the darker the blue, the higher the number of actors per 1,000 rural people in the district.
Note 2: The areas where each type of actor is concentrated are circled in yellow.

Figure 3.3. Clustering degrees of sample districts, 2013
Note: There are 20 sample districts, and each district is regarded as a cluster. Only the 20 sample districts are colored green; the other districts of Bangladesh are left uncolored. The darker the green, the greater the degree of clustering.

Table 3.A1. Fish names mapped into each species category
Carp: silver carp, grass carp, mirror carp, common carp, black carp, karfu, rui, katla, mrigel, kalibaus, pona, puti/swarputi, bata, blad cap, blad carp, gunia, big head, hangry, hanri
Other Niche: faishiya, chitol, khalse, koi, shol/gajar/taki, magur, tangra/baim, shingi, vetki, rup chanda, mola/dela, rup chnda, ayri, bowal, foli, fosholla, koral mas, mola dhela, sai plane, tablet, tampo, tblet
Pangas: pangas
Tilapia: tilapia, nailotica
Shrimp: bagda chingri, golda chingri, chaka chingri, harina chingri, deshi cingri, desi chinri, gura chingri

Table 3.A2.
IV regression of clustering degree on specialization, by IV methods
Dependent Var.=number of fishes (N=1,514)

                                            (1) First-Stage     (2) IV-2SLS         (3) IV-LIML         (4) IV-GMM
Instrument Variables
  Crop input volumes per capita (ton)       -1.991*** (0.306)   /                   /                   /
  Crop output volumes per capita (ton)      -0.371*** (0.063)   /                   /                   /
Clustering Degree                           /                   -2.828*** (0.532)   -2.889*** (0.548)   -2.994*** (0.529)
HH Level Controls
  HH size                                   0.003 (0.003)       0.073*** (0.022)    0.073*** (0.022)    0.077*** (0.022)
  HHH male (dummy)                          0.025 (0.023)       0.335** (0.151)     0.336** (0.152)     0.321** (0.153)
  HHH education level                       -0.017*** (0.004)   0.013 (0.034)       0.012 (0.034)       0.007 (0.034)
  HH doing nonfarm activities (dummy)       0.006 (0.012)       0.109 (0.085)       0.110 (0.086)       0.112 (0.086)
  Total areas of crop lands (Acre)          -0.001 (0.004)      0.060* (0.032)      0.060* (0.032)      0.052 (0.032)
  Total areas of pond (Acre)                -0.001 (0.001)      0.021*** (0.008)    0.021*** (0.008)    0.021*** (0.008)
  Pond ownership (owned=1)                  0.029** (0.012)     0.043 (0.087)       0.046 (0.088)       0.054 (0.088)
  Pond years                                -0.001** (0.001)    0.008** (0.004)     0.007** (0.004)     0.007* (0.004)
  Hour to nearest big city                  -0.034*** (0.004)   -0.164*** (0.037)   -0.167*** (0.038)   -0.178*** (0.037)
Other District Level Controls
  Total paved roads of district (km) (lag)  0.009 (0.008)       0.034 (0.064)       0.036 (0.064)       0.062 (0.063)
Zone Fixed Effects
  In South West Zone (dummy)                -0.053** (0.025)    -0.107 (0.181)      -0.111 (0.182)      -0.141 (0.183)
  In South Center Zone (dummy)              -0.222*** (0.025)   -0.422** (0.188)    -0.427** (0.189)    -0.429** (0.190)
  In East Zone (dummy)                      -0.450*** (0.019)   -1.418*** (0.237)   -1.441*** (0.243)   -1.533*** (0.233)
Constant                                    0.697*** (0.050)    5.207*** (0.450)    5.237*** (0.455)    5.251*** (0.453)
Hausman Test (H0: ci is exogenous) P-Value  /                   0.000               NA                  0.000
Overidentification Test P-Value             /                   0.017               0.018               0.017

Note: Heteroskedasticity robust standard errors are in parentheses.
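The OLS-IV column of Table 3.10 and the IV-2SLS column of Table 3.A2 report the same coefficient on clustering degree (-2.828), which is what one would expect if the OLS-IV column adds the first-stage residual to the outcome equation (a control function). The sketch below illustrates that equivalence on synthetic data only. It does not use the survey data, and every variable name, coefficient, and sample size in it is a hypothetical assumption chosen for illustration.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients for y = X b + e (X must already contain a constant)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


# Synthetic data (illustrative assumption, not the dissertation's data).
rng = np.random.default_rng(0)
n = 2000
z1 = rng.normal(size=n)      # hypothetical instrument 1
z2 = rng.normal(size=n)      # hypothetical instrument 2
w = rng.normal(size=n)       # exogenous control
u = rng.normal(size=n)       # unobserved factor that makes x endogenous
x = 0.8 * z1 + 0.5 * z2 + 0.3 * w + u + rng.normal(size=n)
y = 1.0 - 2.0 * x + 0.5 * w + 2.0 * u + rng.normal(size=n)

const = np.ones(n)

# First stage: regress the endogenous regressor on instruments and controls.
Z = np.column_stack([const, w, z1, z2])
x_hat = Z @ ols(Z, x)
v_hat = x - x_hat            # first-stage residual

# Plain OLS, ignoring endogeneity.
b_ols = ols(np.column_stack([const, x, w]), y)

# 2SLS point estimates: replace x with its first-stage fitted value.
b_2sls = ols(np.column_stack([const, x_hat, w]), y)

# Control function: keep x and add the first-stage residual as a regressor.
b_cf = ols(np.column_stack([const, x, w, v_hat]), y)

print("OLS coefficient on x:             ", round(b_ols[1], 3))
print("2SLS coefficient on x:            ", round(b_2sls[1], 3))
print("Control-function coefficient on x:", round(b_cf[1], 3))
print("Coefficient on first-stage residual:", round(b_cf[3], 3))
```

In a linear model the control-function and 2SLS point estimates on the endogenous regressor coincide, and a nonzero coefficient on the residual signals endogeneity. Standard errors from a second stage run by hand like this are generally not valid for 2SLS inference, so a dedicated IV routine should be used for anything beyond point estimates.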