MARKET ACCESS AND SMALLHOLDER DEVELOPMENT IN KENYA AND ZAMBIA

By

Jordan Chamberlin

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Agricultural, Food & Resource Economics - Doctor of Philosophy

2013

ABSTRACT

In this dissertation I examine the influence of market access on a variety of small farm management decisions and welfare outcomes in two very different settings. In Kenya, a relatively high-density country with relatively good overall levels of access, I focus on the relationship between rural infrastructure provision and participation in agricultural markets. In Zambia, a low-density country in which the majority of farms operate under conditions of economic remoteness, I explore the relationship between access to markets, population density and the role these factors play in conditioning the farm strategies pursued by small farmers in different areas. In particular, I explore how accessibility may enable or constrain area-expansion strategies in land abundant environments. In both studies, I find that economic remoteness is a critical constraint to smallholder development (where the development pathway of interest is market participation in Kenya, and extensive versus intensive production strategies in Zambia). This work is of relevance to researchers and policymakers interested in how policies and investment strategies may best target accessibility deficiencies in rural areas in order to stimulate smallholder economic growth.

ACKNOWLEDGEMENTS

I would like to thank John Coltrane and Lady Day, without whom I would not have made it.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
1 Introduction
  1.1 Smallholder access to markets in sub-Saharan Africa
  1.2 Data used in this research
  1.3 Overview of Essay 1: Rural infrastructure and smallholder market participation in Kenya
  1.4 Overview of Essay 2: Population density, remoteness and farm size: exploring the paradox of small farms amidst land abundance in Zambia
2 Rural infrastructure and smallholder market participation in Kenya
  2.1 Introduction
    2.1.1 Remoteness, market participation and small farm development
    2.1.2 Organization of this paper
  2.2 Smallholder marketing and rural infrastructure in Kenya
    2.2.1 Smallholder market participation in Kenya
    2.2.2 Rural market access conditions are generally improving
    2.2.3 A dynamic rural infrastructure landscape
    2.2.4 Previous assessments of access and market participation in Kenya
  2.3 Conceptual Framework
    2.3.1 Definitions
    2.3.2 Remoteness and market participation
    2.3.3 Expected smallholder responses to access improvements
    2.3.4 Mobile phones and distance
    2.3.5 Spatial dependence
    2.3.6 Hypotheses
  2.4 Data
    2.4.1 Summary statistics: household and village characteristics
    2.4.2 Distribution in space
  2.5 Empirical models and estimation strategy
    2.5.1 Basic empirical model
    2.5.2 Spatial model estimation strategies
    2.5.3 Panel estimators
    2.5.4 Defining neighborhoods: spatial weights
    2.5.5 Partial effects in spatial lag models
    2.5.6 Controlling for endogenous regressors
    2.5.7 Long-run and short-run changes in access
  2.6 Results
    2.6.1 Descriptive analysis
    2.6.2 Determinants of market participation
  2.7 Conclusions
3 Population density, remoteness and farm size: exploring the paradox of small farms amidst land abundance in Zambia
  3.1 Motivation
  3.2 Access to land and rural development in Zambia
    3.2.1 The paradox of land constraints under land abundance
  3.3 Conceptual model
    3.3.1 Relative factor endowments and farm management
    3.3.2 Rural population density and household welfare
    3.3.3 Land quality and the interpretation of population density
    3.3.4 Hypotheses
  3.4 Data
    3.4.1 Household panel survey data
    3.4.2 Population density estimates
    3.4.3 Other spatial datasets
    3.4.4 Land inequality
  3.5 Estimation
    3.5.1 Estimation challenges
    3.5.2 Estimation framework
  3.6 Results
    3.6.1 Descriptive results
    3.6.2 Econometric results
  3.7 Discussion
    3.7.1 Farm growth and market access
  3.8 Conclusions
APPENDICES
  APPENDIX A Access indicator selection criteria
  APPENDIX B Comparability of Fixed-Effects and Correlated Random Effects results
REFERENCES

LIST OF TABLES

Table 1 Trends in smallholder marketing in Kenya
Table 2 Trends in purchased inorganic fertilizer applied to all crops
Table 3 Provision of rural infrastructure and services in Kenya 2000-2010
Table 4 Summary household and village characteristics (2010)
Table 5 Household characteristics, output marketing and off-farm employment
Table 6 Correlation between HCI and crop-specific marketing measures
Table 7 Characteristics of marketing outcome indicator variables
Table 8 Description of access indicators used in this study
Table 9 Moran’s I calculated for household marketing decisions
Table 10 Reduced form estimates of access indicators
Table 11 Determinants of marketed share of production (HCI)
Table 12 Impacts on HCI in spatial autoregressive (SAR) model
Table 13 Determinants of log value of marketed crop production (all crops)
Table 14 Impacts on log value sold in spatial autoregressive (SAR) model
Table 15 Determinants of log fertilizer purchase amount
Table 16 Mobile-distance interactions in market participation models
Table 17: Landholdings, rural population densities and survey coverage
Table 18: Landholding size and perceptions of local land availability
Table 19: Rural population density by categories of market access and agricultural potential
Table 20: Hypothesized relationships between geographic conditions, intermediate outcomes and final outcomes of interest
Table 21: Gridded spatial datasets used in this study
Table 22: Variables used in this analysis
Table 23: Distribution of rural population densities in the household survey villages
Table 24: Rural population density growth and levels, by district
Table 25: Smallholder household characteristics by rural population density quintile
Table 26: Panel household trends in farm size and fallow, 2001-2008
Table 27: Indicators of institutional constraints, by access quartile
Table 28: Indicators of technology constraints, by access quartile
Table 29: Determinants of cultivated area
Table 30: First stage reduced form models the determinants of animal and mechanical traction usage
Table 31: Determinants of output prices and agricultural wage rates
Table 32: Determinants of input demand and output supply
Table 33: Factors affecting measures of productivity and household welfare
Table 34: Cultivated area results with alternative definitions of rural population density
Table 35: Quantile regression results for household landholding size
Table 36: Quantile regression results for household landholding size, including land concentration index as additional regressor
Table 37: Targeting land constraints
Table 38: Share of farms by category of holding size
Table 39: Factors affecting farm income growth; dependent variable: log farm income per adult equivalent
Table 40 Determinants of HCI & log value sold

LIST OF FIGURES

Figure 1 Trends in access to services and infrastructure
Figure 2 Impact of enhanced communication on distance-related marketing costs
Figure 3 Distribution of survey villages
Figure 4 Household and village locations in Nakuru District, Rift Valley
Figure 5 Production, marketing and remoteness
Figure 6 Value of high-value sales
Figure 7 High-value share of marketed output
Figure 8 Use of purchased fertilizer and distance from town
Figure 9 Transport costs and remoteness
Figure 10 Extension advice and remoteness
Figure 11 Scatter plot correlations between distances to different types of infrastructure and services from the farmgate
Figure 12 Trends in access to mobile phones
Figure 13 Rural population growth rates, 1960-2010
Figure 14 Declining arable land per capita, 1960-2010
Figure 15: Relationship between rural population density and market access
Figure 16: Average annual rainfall
Figure 17: Rural population density
Figure 18: Access to large urban centers
Figure 19: Distribution of urban settlements and road infrastructure
Figure 20: Conceptual framework showing role between population density, market access and cultivated area in farm management decisions
Figure 21: Hypothesized relationship between land expansion constraints and access conditions
Figure 22: Conceptual framework showing linkages between population density and income
Figure 23: Comparison of gridded population datasets, estimates for 2010
Figure 24: Village neighborhood definitions
Figure 25 Rural population density growth and levels, by district
Figure 26 Distributions of alternative population density measures
Figure 27 Capital-land ratios over space
Figure 28 Capital-labor ratios over space
Figure 29 Labor-land ratios over space
Figure 30: Farm size, household size and traction technology use at different levels of market access
Figure 31: Institutional indicators and traction technology usage at different levels of market access
Figure 32: Market participation and travel time to city of 50,000 or more inhabitants

1 Introduction

1.1 Smallholder access to markets in sub-Saharan Africa

A central goal of the rural development agenda in sub-Saharan Africa is an agricultural transformation from low-input/low-output smallholder systems, which are primarily oriented toward home consumption, toward market-oriented systems featuring higher productivity, greater use of inputs, and greater production specialization. Remoteness – i.e. poor physical access to markets – is frequently cited as a critical constraint to this transformation (WB 2009). Yet the degree to which infrastructure upgrades or other access improvements result in theorized responses remains an open question. Furthermore, access may be measured in different ways, and different access-oriented investments may have very different payoffs. Consequently, there is a great deal of interest in parsing out exactly how different dimensions of market access condition small farm behaviors and how changes in access may be associated with transformations in the agricultural sector.

In this dissertation I examine the influence of market access on a variety of small farm management decisions and welfare outcomes in two very different settings. In Kenya, a relatively high-density country with good overall levels of access, I focus on the relationship between rural infrastructure provision and participation in agricultural markets.
In Zambia, a low-density country in which the majority of farms operate under conditions of economic remoteness, I explore the relationship between access to markets, population density and the role these factors play in conditioning the farm strategies pursued by small farmers in different areas. In particular, I explore how accessibility may enable or constrain area-expansion strategies in land abundant environments.

This research is organized as two essays, which are entitled:

1) Rural infrastructure and smallholder market participation in Kenya
2) Population density, remoteness and farm size: exploring the paradox of small farms amidst land abundance in Zambia

Together, they provide a complementary view of the range of rural accessibility characteristics of smallholder marketing environments within the region. The essays also complement each other by framing very different policy issues within a rural accessibility framework. In Kenya, the policy question is how to target investments in rural infrastructure and services such that they most effectively stimulate market participation. In Zambia, the policy question is how to use rural accessibility conditions as a framework for targeting investments aimed at enhancing the value of land as a productive asset. In a broad sense, both essays relate rural remoteness and accessibility to the structural transformation agenda, but do so by examining different implications of accessibility for farmers in different settings. This document should provide valuable information to both policymakers and researchers concerned with the role of market access in rural economic development.

The subsequent sections of this introduction provide a description of the data used in this study and overviews of both essays.

1.2 Data used in this research

Data used in this study come from several sources.
Smallholder household data for Kenya (used in Essay 1) come from a nationwide panel survey collected by the Tegemeo Institute of Egerton University. Detailed plot and farm data were collected from 1,233 agricultural households in 1997, 2000, 2004, 2007 and 2010. The balanced panel survey contains information on household production, marketing activities, input and output, and a variety of self-reported indicators of access to markets, related infrastructure and services.

The sampling frame for the panel was prepared in consultation with the Kenya National Bureau of Statistics (KNBS) in 1997. Twenty-four (24) districts were purposively chosen to represent the broad range of agroecological zones (AEZs) and agricultural production systems in Kenya. Next, all non-urban divisions in the selected districts were assigned to one or more AEZs based on agronomic information from secondary data. Third, divisions were selected from each AEZ in proportion to population across AEZs. Fourth, within each division, villages and households were randomly selected, in that order. A total of 1,578 households were selected in 1997 in the 24 districts within the country’s eight agriculturally-oriented provinces. The sample excluded large farms with over 50 acres and two pastoral areas. The initial survey was implemented in 1997 and covered both the 1996/97 and 1995/96 cropping seasons. Subsequent follow-up surveys were conducted in 2000, 2004, 2007 and 2010. After the 2010 survey, 1,233 households were consistently interviewed in all five years.[1]

[1] There are actually 1,243 households surveyed in each wave of the panel. However, because my focus is on smallholder farmers, I restricted my analysis to households reporting average cultivated areas of 10 hectares or less; this resulted in 10 households being dropped from the sample, leaving a total of 1,233 households in each of the 5 panel waves.

Smallholder household data for Zambia (used in Essay 2) come from the Supplemental Surveys carried out by the Zambian Central Statistical Office (CSO) in association with the Zambian Ministry of Agriculture and Cooperatives (MACO) and Michigan State University’s Food Security Research Project (FSRP). These surveys are linked with the 2000 Post Harvest Survey for small and medium scale holdings. A consistent panel of 4,340 smallholder households was surveyed in each of the Supplemental Survey waves, which took place in 2001, 2004 and 2008. The survey is nationally representative and the sampling frame includes villages in 70 of Zambia’s 72 Districts.

Spatial data on infrastructure, population, terrain, land cover, and climate were used in both essays. These data come from various sources, which are detailed in the Appendices to each chapter. The data were brought into a common geographic information system (GIS) framework, where they were transformed in ways described in the essays and appendices. Several of the spatial variables used (e.g. the estimated travel time to towns of particular population sizes) were originally developed by the author in the course of carrying out the research described in this dissertation.

1.3 Overview of Essay 1: Rural infrastructure and smallholder market participation in Kenya

In this essay, I undertake an empirical assessment of how changes in rural infrastructure and services have affected household marketing behavior. I emphasize the most dynamic changes in market access over the last decade in rural Kenya, most notably the rapid expansion of mobile telephony.

A number of features set this work apart from other efforts. First, I track changes taking place over a 10-year period using panel data which include multiple indicators of access to markets and related infrastructure and services.
This stands in contrast to most household models incorporating access as an exogenous variable, in which cross-sectional variation is used to infer impacts of changes in market access, and the access indicator is a single variable which often appears to be selected in an ad hoc manner.

Second, using multiple indicators allows me to raise and explore the question of access complementarities, i.e. synergistic effects of improvements in different infrastructure types. In particular, I examine how the expansion of telecommunications is altering the structure of transactions costs imposed by physical remoteness in rural areas.

Third, I theorize the endogeneity of some types of access changes and farm behaviors (in particular, I theorize the post-liberalization expansion of private marketing services as both a driver of and a response to marketed surplus in different geographical areas). I test and control for this using an instrumental variables approach.

Fourth, I distinguish between persistent access conditions and relatively short-term changes in those conditions. To date, most household-level empirical assessments have been based on cross-sectional differences and have not been able to make such distinctions. The few panel studies that exist have opted to buy robustness to unobserved heterogeneity by differencing away the time-invariant components of access through fixed effects. I advocate an approach that allows both components to be estimated and interpreted, based on the correlated random effects estimator, which also controls for unobserved time-invariant heterogeneity.

Finally, I motivate and test for spatial structure in household marketing decisions.
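
A common first diagnostic for spatial structure of this kind is Moran's I, the statistic that Table 9 reports for household marketing decisions. The sketch below is only an illustration of how that statistic is computed, not the procedure used in this dissertation: the inverse-distance neighborhood rule, the function names and the four-household toy layout are all assumptions introduced here.

```python
import math


def inverse_distance_weights(coords, cutoff):
    """Hypothetical neighborhood rule: weight 1/d for pairs within `cutoff`.

    Households at the same location (d == 0) and the diagonal get weight 0.
    """
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.hypot(coords[i][0] - coords[j][0],
                           coords[i][1] - coords[j][1])
            if 0 < d <= cutoff:
                w[i][j] = 1.0 / d
    return w


def morans_i(values, w):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), with z the demeaned values."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)  # sum of all spatial weights
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)


# Toy layout (illustrative only): four households spaced one unit apart on a line.
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
w = inverse_distance_weights(coords, cutoff=1.0)

clustered = morans_i([1, 1, 0, 0], w)    # similar neighbors: positive I
alternating = morans_i([1, 0, 1, 0], w)  # dissimilar neighbors: negative I
```

Values of I above zero indicate that nearby households tend to have similar outcomes, and values below zero that they tend to differ. Inference on the statistic (for example by permutation) is omitted from this sketch.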
Following Manski (1993), I propose three channels which may underlie this structure: (a) endogenous interaction effects, whereby the marketing decisions made by a household influence and are influenced by the decisions made by neighboring households; (b) exogenous interaction effects, whereby the marketing decisions made by a household depend upon exogenous conditions in neighboring locations; and (c) correlated effects related to unobserved factors affecting households in nearby locations. The latter two channels may be thought of as aspects of a spatial diffusion process through which infrastructural changes act upon marketing conditions throughout a region. The first channel is primarily defined in terms of household interactions although, as I will show, these inter-household interactions also affect how the impacts of access investments percolate through an area.

The contributions of this work are conceptual, methodological, and policy-relevant. Conceptually, this work demonstrates the need for more explicit theorizing about how market access is best represented and how changes in market access actually take place. I demonstrate this by substantiating the multidimensional nature of market access in an empirical setting. For example, in distinguishing between typical measures of access (distance to town and roads) and conditions related to telecommunications and service provision, I am able to show that such non-road investments are extremely important to agricultural marketing outcomes. Furthermore, I address complementarities between different types of access changes. The policy relevance of these ideas is not trivial: investment strategies that focus overwhelmingly on roads will likely miss important opportunities for stimulating rural market participation and income growth.

Other conceptual contributions have methodological implications.
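
The three channels proposed above following Manski (1993) are often collected in a single equation in the spatial econometrics literature. The form below is a sketch in generic notation, not the specification estimated in this essay: the symbols $w_{ij}$ (spatial weight linking households $i$ and $j$), $c_i$ (household effect), $\rho$, $\theta$ and $\lambda$ are assumptions introduced here for illustration.

```latex
\begin{align}
y_{it} &= \rho \sum_{j \neq i} w_{ij}\, y_{jt}
          + x_{it}'\beta
          + \sum_{j \neq i} w_{ij}\, x_{jt}'\theta
          + c_i + u_{it}, \\
u_{it} &= \lambda \sum_{j \neq i} w_{ij}\, u_{jt} + \varepsilon_{it}.
\end{align}
```

In this notation $\rho$ captures (a) endogenous interaction effects, $\theta$ captures (b) exogenous interaction effects, and $\lambda$ captures (c) correlated effects. The textbook spatial autoregressive (SAR) model is the special case $\theta = 0$ and $\lambda = 0$.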
I provide evidence that some important kinds of access changes are endogenous to smallholder farm outcomes and appropriate modeling strategies are therefore required for the identification of impacts. Furthermore, I show that within a panel data framework, the correlated random effects estimator can be used to provide estimates of impacts of both transitory and persistent access conditions while still controlling for unobserved heterogeneity.

Perhaps the most novel feature of this work is my explicit modeling of spatial dependence in marketing outcomes across households in close proximity to one another. I show that spatial dependence is a feature of marketing outcomes, even after controlling for other geographically-specific factors. I implement this idea with a spatial panel model, testing alternative assumptions about the nature and structure of spatial dependence. Spatial panel techniques, which are only just now being described in the mainstream econometrics literature, have not before been applied to household survey data in developing countries. While I implement this idea within a specific problem context – i.e. estimating the determinants of market participation – my findings are of much broader relevance for the discipline as a whole. Specifically, I argue that rural household survey data, as they are currently collected, are unlikely to represent spatially independent outcomes. This assertion implies that econometric analyses that do not control for such dependence are probably not efficient and, depending upon the form of spatial dependence, may be inconsistent.

1.4 Overview of Essay 2: Population density, remoteness and farm size: exploring the paradox of small farms amidst land abundance in Zambia

In many ways the Zambian small farm sector represents a paradox: despite apparently abundant land resources, the overwhelming majority of farms are very small and growing smaller.
70% of family farms cultivate 2 hectares or less and frequently report that no additional land is available within their communities. Such patterns are very characteristic of high-density countries where land scarcity is widely recognized. Furthermore, small farms are associated with lower farm output, lower levels of marketing and lower household incomes. These facts are hard to reconcile with the widely promulgated idea that access to land is not a primary constraint to Zambian small farm development.

In this essay, I use nationally representative household panel data to evaluate a number of alternative explanations for the prevalence of small farm sizes: misleading measurements of rural population density, institutional constraints to land access, and limited access to labor and technologies that enable area expansion. I find that high-resolution measurements of rural population confirm that local densities are generally quite low, but that institutional constraints may limit access to land even in such low density conditions. Even more important are constraints to area-expansion technologies, such as animal and mechanical traction. I find that market access, rather than population density, is the key geographical gradient that links these findings. After discussing my findings, I conclude by proposing a geographical framework for policies and investments which aim to reduce constraints to utilizing arable land within Zambia’s small farm sector.

2 Rural infrastructure and smallholder market participation in Kenya

2.1 Introduction

2.1.1 Remoteness, market participation and small farm development

Improving market participation and performance is a key part of the agricultural transformation agenda, under which low-input/low-output smallholder systems, which are primarily oriented toward home consumption, evolve toward more commercialized systems featuring higher productivity, more rationalized use of inputs, and greater production specialization (Timmer 1988).
Such changes are critical to overall economic transformations in which rising agricultural productivity spurs growth throughout the economy (Johnston and Mellor 1961). Remoteness, i.e. poor physical access to markets, is frequently cited as a critical constraint to this transformation. There is strong evidence that remote places are poorer, less productive, more exposed to price risks and less engaged with markets (e.g. Fafchamps 2012, Stifel and Minten 2009, Barrett 2008). Yet the degree to which specific changes in infrastructure or other conditions affecting “economic remoteness” result in theorized responses remains an important empirical question. There has been increased awareness of the fact that market access may be measured in quite different ways, and alternative indicators are not always highly correlated with one another (Chamberlin and Jayne 2013, Baltenweck and Staal 2007, Wood 2007). This implies that different investment strategies may have very different payoffs. Furthermore, the various dimensions of accessibility are dynamic: conditions are a function of infrastructure investments, policies related to the provision of public goods, and changes in technology. As a consequence, recent empirical assessments of rural infrastructure impacts have shifted from cross-sectional analysis to panel studies (e.g. Muto and Yamano 2009, Yamauchi et al. 2011) and field experiments (e.g. Bernard and Torero 2010). In summary, there is a great deal of current interest in parsing out the impacts of observed changes in multidimensional access conditions on small farm behaviors and household welfare outcomes.

Empirical approaches to measuring ex post impacts of improved access have typically focused either on highly aggregated relationships (e.g. returns to transportation investments at regional scale) or on relatively simple indicators in household models (e.g. the presence or absence of an all-weather road in the surveyed village).
There are several potential problems in the standard approaches. (i) Different dimensions of access may be driving different kinds of responses, i.e., a single indicator of access, especially if selected on an ad hoc basis, may poorly reflect the true state of market access. (ii) Multiple dimensions of access may have important complementary effects. (For example, electricity may have less impact on economic behavior in the absence of roads, or vice versa.) Such effects are opaque to models using univariate access indicators. (iii) Exogeneity assumptions may not be valid for some kinds of access indicators. (iv) It is important to distinguish between transitory changes in access and more persistent access conditions. (v) Given the increasing evidence of spatial spillovers in smallholder decision making, as well as the strong spatial expression of infrastructure and other access-mediating features, some form of spatial dependence is likely to feature in the relationship between access changes and outcomes. Failure to address these problems may result in biased estimates of impacts and, consequently, misguided policy prescriptions. The high profile of access-oriented investments in current policy discussions makes this an issue of considerable practical importance.

In this paper, I undertake an empirical assessment of the most dynamic changes in market access over the last decade in rural Kenya, employing a methodological approach that addresses the five measurement challenges raised above. A number of features set this work apart from other efforts. First, I track changes taking place over a 10-year period using panel data which include multiple indicators of access to markets and related infrastructure and services.
This stands in contrast to most household models incorporating access as an exogenous variable, in which cross-sectional variation is used to infer impacts of changes in market access, and the access indicator is a single variable which often appears to be selected in an ad hoc manner. Second, using multiple indicators allows me to raise and explore the question of access complementarities, i.e. synergistic effects of improvements in different infrastructure types. In particular, I examine how the expansion of telecommunications is altering the structure of transactions costs imposed by physical remoteness in rural areas. Third, I theorize the endogeneity of some types of access changes and farm behaviors (in particular, I theorize the post-liberalization expansion of private marketing services as both a driver and a response to marketed surplus in different geographical areas). I test and control for this using an instrumental variables approach.

Fourth, I distinguish between persistent access conditions and relatively short-term changes in those conditions. To date, most household-level empirical assessments have been based on cross-sectional differences and have not been able to make such distinctions. The few panel studies that exist have opted to buy robustness to unobserved heterogeneity by differencing away the time-invariant components of access through fixed effects. I advocate an approach that allows both components to be estimated and interpreted, based on the correlated random effects estimator, which also controls for unobserved time-invariant heterogeneity.

Finally, I motivate and test for spatial structure in household marketing decisions.
Following Manski (1993), I propose three channels which may underlie this structure: (a) endogenous interaction effects, whereby the marketing decisions made by a household influence and are influenced by the decisions made by neighboring households; (b) exogenous interaction effects, whereby the marketing decisions made by a household depend upon exogenous conditions in neighboring locations; and (c) correlated effects related to unobserved factors affecting households in nearby locations. The latter two channels may be thought of as aspects of a spatial diffusion process through which infrastructural changes act upon marketing conditions throughout a region. The first channel is primarily defined in terms of household interactions although, as I will show, these inter-household interactions also affect how the impacts of access investments percolate through an area.

The contributions of this work are conceptual, methodological, and policy-relevant. Conceptually, this work demonstrates the need for more explicit theorizing about how market access is best represented and how changes in market access actually take place. I demonstrate this by substantiating the multidimensional nature of market access in an empirical setting. For example, in distinguishing between typical measures of access (distance to town and roads) and conditions related to telecommunications and service provision, I am able to show that such non-road investments are extremely important to agricultural marketing outcomes. Furthermore, I address complementarities between different types of access changes. The policy relevance of these ideas is not trivial: investment strategies that focus overwhelmingly on roads will likely miss important opportunities for stimulating rural market participation and income growth.

Other conceptual contributions have methodological implications.
I provide evidence that some important kinds of access changes are endogenous to smallholder farm outcomes, and that appropriate modeling strategies are therefore required for the identification of impacts. Furthermore, I show that within a panel data framework, the correlated random effects estimator can be used to provide estimates of impacts of both transitory and persistent access conditions while still controlling for unobserved heterogeneity.

Perhaps the most novel feature of this work is my explicit modeling of spatial dependence in marketing outcomes across households in close proximity to one another. I show that spatial dependence is a feature of marketing outcomes, even after controlling for other geographically-specific factors. I implement this idea with a spatial panel model, testing alternative assumptions about the nature and structure of spatial dependence. Spatial panel techniques, which are only just now being described in the mainstream econometrics literature, have not before been applied to household survey data in developing countries. While I implement this idea within a specific problem context (estimating the determinants of market participation), my findings are of much broader relevance for the discipline as a whole. Specifically, I argue that rural household survey data, as they are currently collected, are unlikely to represent spatially independent outcomes. This assertion implies that econometric analyses that do not control for such dependence are probably not efficient and, depending upon the form of spatial dependence, may be inconsistent.

Questions addressed by this research:
• What have been the most rapidly changing aspects of farmers’ access to infrastructure and services in rural Kenya over the past decade?
• How have these changes impacted the input and output marketing decisions made by smallholder farmers over the same period?
• To what extent are market access conditions endogenously determined, with the aggregate behavior of farmers determining the quantity of marketed output?
• Is there a spatial structure to household marketing decisions? If so, how is it best characterized? How does it affect analytical conclusions about the impacts of access investments?

2.1.2 Organization of this paper

This essay is organized as follows. Section 2 describes the major patterns of smallholder marketing and rural infrastructure in Kenya, including changes taking place over the last decade. The conceptual framework relating access conditions to marketing outcomes is presented in Section 3. Data used in this analysis are described in Section 4. The estimation strategy is given in Section 5, along with detailed notes on methodological aspects that are novel. Results are presented in Section 6: first a set of non-parametric relationships between marketing and access indicators, and then the econometric estimation results. Section 7 concludes the essay with a summary of the key findings and their implications for policy and future research.

2.2 Smallholder marketing and rural infrastructure in Kenya

2.2.1 Smallholder market participation in Kenya

Kenya has one of the most commercialized agricultural systems in sub-Saharan Africa, yet by world standards, farmer participation in markets remains low (Olwande and Mathenge 2011). This is illustrated in Table 1. Persistently low levels of market participation characterize output marketing of staples, as well as higher value commodities (ibid.). Stagnant market participation is not limited to output markets. Fertilizer adoption rates are higher now than they were in previous years, although application rates remain lower than desired (Omamo and Mose 2001, Alene et al. 2008).
Furthermore, input use is highly variable over time: many farmers adopt and disadopt in successive years (Suri 2011) or, more commonly, vary the degree of input use per hectare cultivated from one year to the next. Table 2 shows the average percentage of farmers using purchased fertilizer over the past decade, as well as the average application rates. We may observe that the share of farmers using fertilizer is fairly high, but has not grown over the past decade; the average application rate is fairly low and is declining over time.

2.2.2 Rural market access conditions are generally improving

The factors constraining market participation are various, including poor access to improved technologies, productive assets, credit markets and extension services (Barrett 2008). Household- and community-level factors may interact in complex ways, and may be further affected by such non-local factors as the costs of inter-market commerce, which condition reference prices in local markets (ibid.). Many of the non-household factors can be understood as features of economic remoteness, i.e. poor access to market infrastructure and services generally, under which many of the conditions above are intensified. Numerous empirical studies confirm the inverse relationship between remoteness from markets and participation in Kenya (e.g. Renkow et al. 2004, Omamo 1998, Alene et al. 2008, Omiti et al. 2009, Muto and Yamano 2009). Not surprisingly, the government of Kenya has placed great emphasis on rural infrastructure provision, with the aim of improving rural market access.
Table 1 Trends in smallholder marketing in Kenya

Indicator | Year | Maize | High value | Milk | All crops
Smallholders who produce crop | 2000 | 95% | 96% | 69% | -
Smallholders who produce crop | 2004 | 98% | 97% | 69% | -
Smallholders who produce crop | 2007 | 98% | 97% | 69% | -
Smallholders who produce crop | 2010 | 95% | 98% | 69% | -
Producers who sell | 2000 | 38% | 74% | 73% | -
Producers who sell | 2004 | 44% | 77% | 75% | -
Producers who sell | 2007 | 44% | 74% | 79% | -
Producers who sell | 2010 | 34% | 72% | 82% | -
Marketed % of production | 2000 | 17% | 37% | 40% | 44%
Marketed % of production | 2004 | 20% | 35% | 37% | 42%
Marketed % of production | 2007 | 21% | 37% | 45% | 45%
Marketed % of production | 2010 | 15% | 39% | 49% | 41%
% of gross income from sales | 2000 | 3% | 8% | 11% | 26%
% of gross income from sales | 2004 | 5% | 7% | 12% | 23%
% of gross income from sales | 2007 | 5% | 6% | 11% | 24%
% of gross income from sales | 2010 | 3% | 6% | 15% | 21%

Note: Marketed % of production is calculated at the household level as the marketed share of total value of production. High value includes all perennial and annual horticulture and cash crops, but not grains or other staples. (A list of high-value crops is provided in the footnote on page 31.) The summary values shown are the national averages of household level calculations for all households in the sample used in this analysis. % of gross income from sales is calculated as the share of gross household income coming from farm sales.

Table 2 Trends in purchased inorganic fertilizer applied to all crops

Year | Farmers using purchased inorganic fertilizer | Average application rate (kg/ha)
2000 | 70% | 165
2004 | 72% | 148
2007 | 76% | 149
2010 | 74% | 139
Total | 73% | 150

Note: Users of purchased inorganic fertilizer were defined as any farmer reporting a non-zero usage of any type of inorganic fertilizer, net of any amounts that were obtained in 2010 at fully or partially subsidized rates (i.e. subsidized purchases were not considered as purchases). The average application rate is calculated as the total amount of inorganic fertilizer used in the main season divided by the total cultivated area for that season. This includes inorganic fertilizer of various types, and includes fertilizer received at subsidized rates. The application rate for different fields will vary within a farm.
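The household-level calculations described in the notes to Tables 1 and 2 can be sketched as follows. This is a minimal illustration, not the procedure used to build the tables: the field names and the three example households are hypothetical, and excluding households with zero production or zero cultivated area is an assumption.

```python
from statistics import mean


def marketed_share(value_sold, value_produced):
    """Marketed share of the total value of production for one household."""
    if value_produced <= 0:
        return None  # assumption: share is undefined without production
    return value_sold / value_produced


def application_rate(fertilizer_kg, cultivated_ha):
    """Inorganic fertilizer used in the main season per hectare cultivated."""
    if cultivated_ha <= 0:
        return None  # assumption: rate is undefined without cultivated area
    return fertilizer_kg / cultivated_ha


# Hypothetical households (illustrative numbers only, not survey data).
households = [
    {"sold": 200.0, "produced": 1000.0, "fert_kg": 50.0, "ha": 1.0},
    {"sold": 0.0, "produced": 500.0, "fert_kg": 0.0, "ha": 0.5},
    {"sold": 600.0, "produced": 800.0, "fert_kg": 150.0, "ha": 1.5},
]

shares = [marketed_share(h["sold"], h["produced"]) for h in households]
rates = [application_rate(h["fert_kg"], h["ha"]) for h in households]

# Summary value: the average of the household-level shares.
mean_share = mean(s for s in shares if s is not None)
# A household counts as a fertilizer user if it reports any non-zero usage.
user_share = mean(1.0 if h["fert_kg"] > 0 else 0.0 for h in households)

print(f"mean marketed share: {mean_share:.3f}")
print(f"share of households using fertilizer: {user_share:.3f}")
```

Averaging household-level shares, rather than dividing total sales by total production, gives each household equal weight regardless of farm size, which matches the description in the note to Table 1.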
Figure 1 Trends in access to services and infrastructure

2.2.3 A dynamic rural infrastructure landscape

Over the last decade, Kenya’s rural access landscape has experienced considerable improvements. Across the country, rural households have experienced decreasing distances to roads and other physical infrastructure, as well as to services such as input retailers and veterinary clinics (Figure 1, left panel). At the same time, household level access to mobile telephones and electricity has increased (Figure 1, left panel). In the case of mobile phones, the growth has been stunning, going from nearly zero to reach 80% of rural households by 2010. Improvement in other indicators has been more modest, but has generally been positive. Table 3 summarizes the major infrastructure changes in Kenya over the past decade.[2]

Given the generally positive changes in rural infrastructure, on the one hand, and the lack of widespread growth in market participation, on the other, one might be tempted to conclude that investments in infrastructure and service provision have not had positive impacts on market participation. However, it is important to note that these are unconditional relationships, which do not control for other driving factors. Furthermore, these aggregate trends mask important variation at the community and household levels. Some households certainly are increasing their participation in input and/or output markets, and how rural infrastructure investments may have influenced such growth, after controlling for other factors, remains an open question. This question is the primary starting point for the research described in this paper.

[2] There have been important spatial variations in the growth of infrastructure provision. These trends are described in more detail in Chamberlin and Jayne (2009).
Table 3 Provision of rural infrastructure and services in Kenya 2000-2010

Type | Provision | Growth ’00-’10 | Characteristics of expansion
Mobile phones | Private | Rapid | Coverage went from <2% in 2000 to 80% in 2010.
Electricity | Public | Moderate | Steady incremental expansion since 2004.
Fertilizer supply | Private | Moderate | Significant initial expansion following market liberalization in late nineties, especially in previously underserved areas (Chamberlin and Jayne 2009). Some contraction since 2007, possibly due to civil disturbances.
Extension | Mostly public | Low | Local governmental agricultural extension offices are operated under the National Agriculture and Livestock Extension Programme, under the Ministry of Agriculture. Muyanga and Jayne (2006) note the expansion in recent years of agricultural extension services provided by non-profit entities such as non-governmental organizations, faith based initiatives and community based organizations. The TAMPA data indicate a gradual but steady expansion of extension offices throughout the period, although we cannot distinguish between expansion of governmental and nongovernmental offices.[3]
Roads | Public | Low | The national road network of tarmac and murram roads was largely in place before independence; some expansion of feeder roads into new areas

[3] The extension distance question asked in the TAMPA survey is “how far is it to the nearest agricultural extension office?” Presumably, most respondents are indicating distance to the nearest government extension office, although it is possible that respondents are also indicating distances to non-governmental project offices. If this is the case, then at least some of the changes we observe in this indicator are reflective of non-governmental service provision.

2.2.4 Previous assessments of access and market participation in Kenya

Other household-level assessments of market participation and access conditions in Kenya have generally found a strong positive relationship.
Renkow et al. (2004) use price spreads between the farmgate and the nearest market, along with information on distance and the predominant mode of transportation to estimate fixed costs of maize market participation. They conclude that fixed costs increase with distance and transportation time. Omamo (1998) calculated the opportunity cost of time spent walking to and from market centers to estimate the marginal costs of transporting crops to market; these distance-dependent variable costs were then shown to be important determinants of cash crop production and market participation. Alene et al. (2008) use dummy indicators of whether or not the nearest maize and fertilizer markets were considered “distant” (greater than 10 and 15 km, respectively) within two-stage (Heckman) models of output and input market participation. They conclude that distant markets impose both fixed and variable marketing costs and thereby constitute an important negative determinant of both participation and volume decisions. Omiti et al. (2009) classify rural and peri-urban villages into 4 categories of relative access on the basis of a mix of qualitative and quantitative characteristics and conclude very generally that better access conditions enable market participation.

Most of these studies consider only one dimension of market access, which is almost exclusively rendered in terms of distance (or imputed cost of travel) to the nearest market. Access to services or non-road infrastructure is almost universally absent (or, if included, treated as part of a village classification scheme). None consider the possible interactions between changes in different aspects of remoteness within an econometric framework. Furthermore, all of these studies use cross-sectional measures to make inferences about the impacts of market access.
Given the changes taking place over the past decade, and our ability to observe them in household panel data, it is natural to ask how cross-sectional results may differ from those of panel analysis, especially with regard to the aspects of Kenya’s rural infrastructure landscape which have been most dynamic over the past decade.

2.3 Conceptual Framework

2.3.1 Definitions

Marketing costs are defined here as the aggregation of physical transfer costs and transactions costs of exchange. This definition distinguishes the more conventional physical costs of transportation and storage from the less quantifiable transactions costs associated with searching for trading partners, obtaining market information, bargaining, contract design and enforcement, dealing with principal-agent problems and opportunistic behaviors of trading partners, etc. Although in recent years many researchers have used transaction costs as a label encompassing all marketing costs, this has diluted the integrity of the term, which was meant to refer to the costs associated with contracting and exchange per se (as consistent with Williamson 1979, North 1990, Coase 1937, Demsetz 1988, etc.).

Farmgate prices are the prices of inputs and outputs at the location of the farm, i.e. after transportation and other marketing costs have been accounted for. Farmgate prices may differ considerably from market prices, especially in remote areas where such marketing costs tend to be large.[4]

[4] Some researchers use the term effective prices to refer to farmgate prices (e.g. Mason 2011, Renkow et al. 2004, Key et al. 2000). In most cases within the applied agricultural economics literature, effective price is synonymous with farmgate price, as I define it here. However, the use of this term varies somewhat in other economic research contexts and their associated literatures.

2.3.2 Remoteness and market participation

The first way in which access enters into behavioral models for small farm managers is through its influence on the variable costs of marketing: the unit costs of transferring a good from one location to another. Holding unit costs constant, total costs obviously increase with distance: it costs more to transport a sack of grain over 100km than over 10km. However, Minten and Kyle (1999) also find evidence of marginal transfer costs increasing with remoteness, i.e. the unit costs of trader services increase at an increasing rate with remoteness. As a result, farm households face increasing prices of inputs, and decreasing prices of outputs, with greater distances from markets.

Second, remoteness from markets is also characterized by higher fixed costs of exchange. For example, the costs of obtaining market information, locating and negotiating with transaction partners, monitoring quality, etc., all increase with remoteness, especially when physical distance is compounded by such factors as weak infrastructure (e.g. telecommunication) and/or institutions (e.g. legal enforcement). Much theoretical and empirical work has shown that the fixed costs of marketing have strong effects on the market participation decisions by rural smallholders (de Janvry et al. 1991; Key et al. 2000; Croppenstedt et al. 2003; Holloway, Barrett and Ehui 2005; Bellemare and Barrett 2006; Holloway et al. 2008).

To illustrate how both fixed and variable transfer costs enter into effective price formation: the unit cost of fertilizer at the farmgate, $p_f^{fert}$, is generally taken to be some function of the price at the origin market, $p_m^{fert}$, plus the costs of intermediation by traders and transporters.
We may represent these intermediation costs as

(1) $\tau = \tau(d_{f,m}, G, M, I, Q)$

where $\tau$ is a function of the distance between farm and market ($d_{f,m}$), endowment of public assets ($G$) such as infrastructure and public services, degree of market competition ($M$),[5] availability of market information ($I$), and the quantity marketed ($Q$). Although I do not specify a functional form, nor do I distinguish between fixed and variable components of intermediation costs, we may generally expect that $\partial\tau/\partial d_{f,m} > 0$, $\partial\tau/\partial G < 0$, $\partial\tau/\partial M < 0$, and $\partial\tau/\partial Q < 0$.

[5] Note that these components may also interact with one another. In particular, as I discuss below, better access to information at the individual level may reduce information asymmetries that feature in market conduct and performance. Thus, in equation 1 above, we might render $M$ as a function of $I$, i.e. $M = M(I)$. Since $I$ may also play a direct role in lowering the fixed costs of marketing (e.g. by lowering the costs of acquiring price information), it may also appear independently, i.e. $\tau = \tau(d_{f,m}, G, M(I), I, Q)$.

The third way in which access to markets affects economic decision making is through outcome uncertainty. In particular, remote areas are often characterized by greater price uncertainty than areas with better market access. This relationship operates through several mechanisms. Minten and Randrianarison (2003; cited in Stifel and Minten 2008) document greater seasonal price variability for rice in remote areas of Madagascar, linking this with the observation that many poor farmers, faced with liquidity constraints and missing financial markets, sell at harvest time, when supplies are greatest, but buy back from urban stockpiles for consumption later in the year when rural stocks are low (Barrett and Dorosh, 1996). This use of the rice market as a capital market results in much larger seasonal price differences than would otherwise be the case and is particularly pronounced in remote areas where financial markets are weakest. Stifel and Minten (2008) also associate increased price variability with low road quality in remote areas, which increases the likelihood of interruptions in information and material flows.

We may integrate these three aspects of remoteness/accessibility into a concise conceptual definition: market access is the set of non-household factors that affects differences between farmgate and market prices, i.e. $A = \{d_{f,m}, G, M, I\}$.

2.3.3 Expected smallholder responses to access improvements

We may incorporate the above ideas very generally into a standard household market participation model, where the expected level of participation falls out of the first order conditions of a utility maximization problem (Barrett 2008). Let us define the outcome $y_i^k$ for household $i$, where $k$ indexes the following:

1. Marketed share of value of production (HCI: household commercialization index)
2. Log value of crop sales
3. Log inorganic fertilizer demand (kg)

Each of these outcomes is theoretically linked with improving access to markets. Specifically, as access conditions improve, we would expect household incomes to rise (as lower costs of market participation enable returns to farm specialization; as greater off-farm employment opportunities become available to household members), cultivated area to decrease (as land becomes relatively more costly as a factor of production), specialization to increase (as consumption needs are more efficiently met through the marketplace than through subsistence-oriented farm management), farms to participate more in markets (as the costs of participation decline), input demand to increase (as the effective costs of input use decline), land prices to increase (as the value of lower access costs is capitalized into land values) and agricultural wages to increase (as rising demand for non-farm labor increases the opportunity cost of on-farm work).
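As an illustration of the intermediation cost function in equation (1), the sketch below picks one hypothetical functional form and checks that it reproduces the expected signs. The text deliberately leaves the functional form unspecified, so the form, the coefficients, the function names and the additive mapping from market prices to farmgate prices are all assumptions made only for this example.

```python
def intermediation_cost(d, G, M, I, Q):
    """Hypothetical per-unit intermediation cost tau(d, G, M, I, Q).

    The functional form and all coefficients are illustrative assumptions.
    """
    transport = 0.5 * d / (1.0 + G)  # rises with distance, falls with public assets
    margin = 2.0 / (1.0 + M)         # trader margin falls with competition
    fixed = 10.0 / (1.0 + I)         # fixed search cost falls with information
    return transport + margin + fixed / Q  # fixed cost is spread over quantity


def farmgate_prices(p_market_input, p_market_output, tau):
    """Assumed additive form: inputs cost more and outputs fetch less at the farm."""
    return p_market_input + tau, p_market_output - tau


base = intermediation_cost(d=10.0, G=1.0, M=1.0, I=1.0, Q=5.0)

# Change in tau when one argument is increased and the others are held fixed.
changes = {
    "distance": intermediation_cost(20.0, 1.0, 1.0, 1.0, 5.0) - base,
    "public assets": intermediation_cost(10.0, 2.0, 1.0, 1.0, 5.0) - base,
    "competition": intermediation_cost(10.0, 1.0, 2.0, 1.0, 5.0) - base,
    "quantity": intermediation_cost(10.0, 1.0, 1.0, 1.0, 10.0) - base,
}
for name, change in changes.items():
    direction = "raises" if change > 0 else "lowers"
    print(f"more {name} {direction} tau")
```

Under this assumed form, a larger intermediation cost widens the gap between market and farmgate prices in both directions, which is the mechanism through which remoteness is expected to discourage both input purchases and output sales.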
Consider a simple profit-maximizing household. As a consequence of non-separability, both production- and consumption-side variables may affect demand for farm inputs and supply of marketed output. The supply of marketed output and the demand for purchased inputs are defined as:

(2) $y^k = f(p_f^{output}, p_f^{input}, z)$

$p_f^{input} = f(p_m^{input}, \tau(d_{f,m}, G, M, I, Q))$

$p_f^{output} = f(p_m^{output}, \tau(d_{f,m}, G, M, I, Q))$

where $y^k$ is marketed quantity (e.g. $y^3$ = fertilizer demand), $p_f^{input}$ is a vector of variable input prices at the farmgate, $p_f^{output}$ is a vector of expected farmgate crop prices and $z$ is a vector of other shifters, such as household landholding, livestock and durable assets such as farm equipment and vehicles, labor availability, household demographic characteristics and geographic characteristics such as biophysical production endowments. As described in the previous section, farmgate input and output prices are a function of market prices plus the costs of market intermediation $\tau(d_{f,m}, G, M, I, Q)$, which depends on distance, public assets, market competition and information. Because access is defined as $A = \{d_{f,m}, G, M, I\}$, we may identify the partial effects of interest as $\partial y^k / \partial A_k$, where the subscript $k$ emphasizes that the relevant measures of access may differ for different household marketing decisions. Given the definitions above, we would generally expect that $\partial y^k / \partial A_k > 0$ for all $k$.

2.3.4 Mobile phones and distance

Of particular interest in the present study is mobile phone access. As noted above, mobile telephony has expanded phenomenally in rural Kenya. Much recent empirical work has identified ways in which changes in telecommunications technology are affecting the transactions costs of local trade, even holding other types of infrastructure constant. Overå’s (2006) study of small traders in Ghana illustrates how mobile phones are lowering the costs of interactions over dispersed areas.
In particular, lowered costs of discovery and exchange of information, negotiation, and monitoring make it possible for traders to penetrate further into remote areas than they had previously been willing to do. Jensen (2010) has suggested that there are two main channels through which better information benefits rural producers: efficiency gains associated with better potential for spatial arbitrage, and reduced information asymmetries in rural marketing environments. He argues that several secondary impacts derive from these two main channels: (1) increased supply response, as farmgate prices tend to rise with reduced information asymmetry; (2) reduced transportation costs, as spatial trading patterns reflect more efficient allocation of farm surplus to markets; and (3) reduced price variability, also deriving from more efficient arbitrage over time and space. These latter effects can be understood as shifters of either fixed or proportional costs facing small farm marketers. In this paper, we treat the role of information in fixed and proportional transactions costs as an empirical question: we examine market participation model specifications that allow for mobile telephony to appear in both participation and supply decisions.

Evidence to date has tended to support the importance of mobile access for reducing search costs (Overå 2006, Aker 2008), reducing price volatility and exposure to price risk (Jensen 2007, Aker 2010, Aker and Fafchamps 2010, Amaya and Alwang 2012), and for increasing output market participation generally (Muto and Yamano 2009). Muto and Yamano’s (2009) study also shows that the importance of mobile phones for marketing increases with physical distance from markets. Mathematically, this conclusion is the same as that drawn by Overå (2006), i.e. that mobile phones effectively moderate the costs of commerce imposed by physical distance.
To illustrate this: if the probability of market participation is related linearly to distance from market (where increasing distance corresponds to decreasing likelihood of market participation), and better access to information enters the model as a fixed-cost reducing shock, we would expect to see a shift such as that represented graphically in Figure 2.

Figure 2 Impact of enhanced communication on distance-related marketing costs (axes: market participation, distance to market)

The cost of obtaining information about prices, availability of transaction partners, etc., is perhaps most easily characterized as a fixed cost, since it is invariant to the volume transacted. This idea frequently characterizes household market participation models that feature access to information as a determinant (e.g. Heltberg and Tarp 2002). However, Muto and Yamano (2009) argue that mobile-facilitated access to market information is likely to affect variable costs; specifically, they argue that increased information is likely to result in improved transportation efficiency, with a consequential reduction in transfer costs between the market and the farmgate. The impact channels described by Jensen (2010; summarized above) suggest that mobile phones affect both fixed and variable costs. In this paper, my emphasis is not on which type of cost is being affected (although I do offer some observations related to this point in the results section), but rather on the possible interaction effects between mobile phones and physical distance from market infrastructure.

2.3.5 Spatial dependence

2.3.5.1 Spatially dependent decision making

There is abundant empirical evidence that smallholder decision making is strongly conditioned by the decisions of neighboring agents. Much of this is rooted in the technology adoption literature (see Feder et al. 1985, Doss 2006, and Foster and Rosenzweig 2010 for detailed reviews).
A key idea is that, because many poor farmers are risk averse and the profitability of different technologies is uncertain (especially for new technologies), farmers tend to be more willing to adopt a technology when they observe their neighbors doing so. This notion is sometimes called endogenous learning. Numerous empirical studies support the idea that technology adoption decisions are strongly conditioned by social networks (e.g. Munshi 2004, Moser and Barrett 2006, Bandiera and Rasul 2006, Conley and Udry 2010). Barrett (2008) has pointed out that market participation decisions are functionally analogous to technology adoption decisions: they are components of utility maximizing household strategies, responding to economic incentives and constraints in similar ways. As such, the applicability of endogenous learning frameworks to market participation decisions is a natural extension. Other reasons which have been suggested for spatially dependent decision making would also seem applicable to market participation decisions. Examples include evidence that rural households obtain utility from social conformity or “bandwagon effects” (Leibenstein, 1950; Bernard and Torero, 2010), and the empirical regularity of “like-mindedness” or preference interdependence among similar households (Pollak 1976; Case 1992; Manski 1993), including cases where such similarity is defined by geographical nearness (Holloway and Lapar 2007).

2.3.5.2 Formalizing the channels of spatial dependence

Franzese and Hays (2009) note that while the notion of interdependence in outcomes across neighboring agents is “substantively and theoretically ubiquitous in and central to” microeconomic models of decision making, the vast majority of empirical applications omit interdependence altogether.
Even in the research contexts where spatial interdependence has been emphasized (most notably in the social network and policy diffusion literatures), econometric models often fail to fully reflect simultaneity of the outcomes across units. For reasons articulated in the literature review, it is desirable to allow a formal specification of our $K$ farm-level outcomes as equilibrium outcomes of spatial or social interaction processes, in which the value of the dependent variable for one household is jointly determined with that of neighboring households. In this study, I am interested in managerial decisions taken at the farm level. A growing body of literature is converging on the idea that smallholder decision making about technology use, market participation, etc., is strongly conditioned by the decisions taken by peers and neighbors (e.g. Holloway et al. 2002; Holloway and Lapar 2007; Langyintuo and Mekuria 2008; Conley and Udry, 2010). To formalize this conceptualization, consider three different types of interaction effects that may explain why a given outcome is dependent on nearby outcomes (Manski 1993, Le Sage and Pace 2009, Elhorst 2011). First, endogenous interaction effects describe a process whereby the decision taken by an economic agent is directly affected by the decisions taken by other neighboring agents. The idea is that observed behaviors can be thought of as equilibrium outcomes of explicit spatial interactions, e.g., farmers observing and learning from early adopting neighbors. Second, exogenous interaction effects refer to a process in which an agent’s decisions depend on the explanatory variables that drive the decisions of other spatial units. (As an example, consider a farmer’s marketing decision as a function of local access conditions as well as access conditions in neighboring areas.)
A third mechanism is correlated effects, where shared unobserved environmental characteristics result in similar behaviors or outcomes in a given neighborhood, e.g. unobserved weather-related shocks that have similar effects on all households in a given locality. I now formalize each of these channels in more detail. To accommodate the idea of endogenous interaction effects, let us start by defining a simple spatial lag model, where outcome $y_i$ depends on outcomes at neighboring locations $j$, as defined by a spatial weight $w_{ij}$:

(4) $y_i = \rho \sum_{j=1}^{N} w_{ij} y_j + \varepsilon_i$

We simplify notation to write this general model as:

(5) $y = \rho W y + \varepsilon$

The spatial weights matrix $W$ gives formal structure to the interactions between each pair of observations in a cross-section. $\rho$ is the corresponding spatial autoregressive parameter. $W$ is an $N \times N$ non-negative matrix. Each element $w_{ij}$, $\forall\, i \neq j$, indicates the intensity of the relationship between cross-sectional units $i$ and $j$. By convention, the diagonal elements $w_{ii}$ are all set to zero.⁶ The values in each element may represent nearness between neighbors (e.g. inverse distance), or may represent a binary designation (i.e. 1=neighbors, 0=not neighbors). Non-zero elements indicate that two observations are neighbors (and, if weighting elements are continuous, to what degree they are close).⁷ We incorporate non-lagged exogenous covariates into the model as:

(6) $y = \rho W y + X\beta + \varepsilon$

Note that if this model is correct but we estimate its non-spatial version (i.e. if we leave out $Wy$) then all the resulting estimates $\hat{\beta}$ are biased. I refer to the model in equation (6) as the spatial lag or spatial autoregressive (SAR) model. To incorporate exogenous interaction effects, we may define the following model:

(7) $y = X\beta + WX\theta + \varepsilon$

where $WX$ represents lagged exogenous variables.⁸
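To make the structure of the weights matrix and the spatial lag concrete, the following Python sketch builds a small inverse-distance weights matrix with a zero diagonal and solves the spatial lag model in equation (6) for a given vector of non-spatial terms. It is an illustration under stated assumptions, not the estimation code used in this chapter: the function names, the distance cutoff, the row-standardization of the weights and the toy coordinates are all hypothetical choices made for the example.

```python
import math

def inverse_distance_weights(coords, cutoff):
    """Build W with w_ij = 1/d_ij for pairs within `cutoff` and w_ii = 0.
    Rows are then scaled to sum to one (an assumption of this sketch)."""
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(coords[i], coords[j])
            if i != j and 0.0 < d <= cutoff:
                W[i][j] = 1.0 / d
        row_sum = sum(W[i])
        if row_sum > 0.0:
            W[i] = [w / row_sum for w in W[i]]
    return W

def spatial_lag(W, v):
    """Return the vector W v (a weighted average of neighbors' values)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def solve_sar(W, rho, c, iterations=500):
    """Solve y = rho * W y + c by fixed-point iteration.
    Converges when |rho| < 1 and W is row-standardized."""
    y = list(c)
    for _ in range(iterations):
        y = [rho * lag + ci for lag, ci in zip(spatial_lag(W, y), c)]
    return y

# Toy example: three nearby households and one isolated household.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
W = inverse_distance_weights(coords, cutoff=2.0)
c = [1.0, 0.0, 0.0, 0.0]   # stands in for X*beta + epsilon
y = solve_sar(W, rho=0.5, c=c)
# Households 1 and 2 have c = 0 but end up with y > 0 because they are
# neighbors of household 0; the isolated household 3 is unaffected.
```

The example shows why a coefficient in the lag model is not a partial effect on its own: a change in one household's covariates feeds through to its neighbors' outcomes and back again.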
⁶ These zero-valued diagonal elements are what allow us to write equation (5): for any observation $y_i$, the right-hand-side elements of $Wy$ exclude $y_i$ by definition. In other words, a unit is not its own neighbor.

⁷ Note that the distance between each pair of observations does not need to be defined in terms of geographical space, although it is easier to introduce the idea in these terms. I discuss the implementation of non-geographical distance in a later section of this chapter.

⁸ The spatial econometrics literature often refers to this formulation as the Spatial Durbin model when accompanied by a spatial lag (i.e. $y = \rho W y + X\beta + WX\theta + \varepsilon$) and the Spatial Durbin Error model when accompanied by a spatially autocorrelated error term (i.e. $y = X\beta + WX\theta + u$, $u = \lambda W u + \varepsilon$).

The third way of conceptualizing spatial interaction, via correlated effects, is by allowing for dependence in the innovations. Such an approach would be consistent with a process in which similar outcomes observed on neighboring agents are driven not by social interactions per se, but rather through omitted variables which are spatially autocorrelated. If this is the case, we may write a model that includes spatial dependence in the error term:

(8) $y = X\beta + u$

(9) $u = \lambda W u + \varepsilon$

Here $\lambda$ denotes the spatial autocorrelation parameter. I shall refer to this model as the spatial error model (SEM). For simplicity, the weights matrix $W$ is taken to be the same in equations (9), (10) and (11), although we note that this does not have to be the case. Furthermore, these channels of interdependence are not necessarily mutually exclusive. The so-called Manski model formalizes the case where all three channels exist as:

(10) $y = \rho W y + X\beta + WX\theta + u$

(11) $u = \lambda W u + \varepsilon$

Note that allowing interdependence in the error term (and not elsewhere in the model) does not really require a modified conceptual model. In other words, if the true form of dependence is exclusively in the error term, OLS (i.e.
estimation of the non-spatial model) will be consistent, although not efficient. However, even if this is the only channel of dependence, the coefficient estimate will still yield information with an economic interpretation (although any such interpretation is necessarily conjectural since it represents unobserved effects). In this essay, I assume the existence of some kind of spatial interdependence in marketing decisions, but I treat the mechanism of interdependence agnostically, i.e. I let the data tell me which model fits best. Regarding the assumption of spatial independence, I assert that it is simply unrealistic to assume that smallholder decisions regarding marketing and technology are made independently of one another. A priori, I hypothesize that, after controlling for all other observable factors, some form of spatial dependence is detectable at the level of the village (or somewhat broader), and that the direction of this influence is positive, i.e. neighbors have similar outcomes. Regarding the specific form of this dependence, I emphasize the spatial lag and spatial error formulations as simple dependence structures amenable to empirical verification. (I also comment on the possibility of more complicated forms of dependence in the results section.) What does a spatial model buy us? Conceptually, the central attraction is that the identification and quantification of spatial dependence gives us insights into the social structure of decision making. More prosaically, but in some ways more critically, ignoring spatial structure when it exists can give rise to a number of serious econometric implications, which I have already alluded to above and revisit in Section 5.1. Finally, the mechanism of spatial dependence (i.e. lag versus error) has implications for how we conceptualize and measure partial effects.
Specifically, in the presence of a spatial lag of the dependent variable, a linear coefficient estimate is no longer equivalent to a partial effect, since the variable of interest also operates through neighboring locations. I discuss an approach to estimating the marginal effects of covariates in spatial lag models in Section 5.4.

2.3.6 Hypotheses

Drawing from the various parts of the conceptual framework I have outlined above, I summarize the hypotheses that I test in this analysis:

i) Improved access is positively associated with market participation and these effects are detectable for both input and output markets
ii) Multiple dimensions of access are important to decision making outcomes; specifically:
   a. Access to input providers is an important factor for input market participation (but not necessarily for output market participation)
   b. Access to extension services is an important factor for input market participation (but not necessarily for output market participation)
   c. The impacts of transitory changes will be most influential for types of infrastructure which have undergone the most pronounced changes over the past decade; access indicators which have changed very gradually over time (such as roads) are more likely to affect outcomes through their absolute levels (i.e. time-averages)
   d. Access to telecommunications infrastructure is important to output marketing and, to a lesser extent, to input marketing
iii) There are interactions between information and distance that correspond with a distance-reducing effect of better information (with respect to market participation)
iv) Smallholder marketing decisions are characterized by spatial dependence even after controlling for community-level and geographical factors
   a.
This dependence may be characterized by an endogenous lag of the dependent variable and/or by spatially autocorrelated innovations

2.4 Data

Smallholder household data for Kenya come from a nationwide panel survey collected by the Tegemeo Institute of Egerton University. Detailed plot and farm data were collected from 1,233 agricultural households in 1997, 2000, 2004, 2007 and 2010. The balanced panel survey contains information on household production, marketing activities, input and output costs (from which farmgate prices were imputed), and a variety of self-reported indicators of access to markets, related infrastructure and services. The sampling frame for the panel was prepared in consultation with the Kenya National Bureau of Statistics (KNBS) in 1997. Twenty-four (24) districts were purposively chosen to represent the broad range of agro-ecological zones (AEZs) and agricultural production systems in Kenya. Next, all non-urban divisions in the selected districts were assigned to one or more AEZs based on agronomic information from secondary data. Third, proportional to population across AEZs, divisions were selected from each AEZ. Fourth, within each division, villages and then households were randomly selected. A total of 1,578 households were selected in 1997 in the 24 districts within the country’s eight agriculturally-oriented provinces. The sample excluded large farms with over 50 acres and two pastoral areas. The initial survey, implemented in 1997, covered both the 1996/97 and 1995/96 cropping seasons. Subsequent follow-up surveys were conducted in 2000, 2004, 2007 and 2010. After the 2010 survey, 1,233 households were consistently interviewed in all five years.⁹

⁹ There are actually 1,243 households surveyed in each wave of the panel.
However, because my focus is on smallholder farmers, I restricted my analysis to households reporting average cultivated areas of 10 hectares or less; this resulted in 10 households being dropped from the sample, leaving a total of 1,233 households in each of the 5 panel waves.

The household data were augmented by geospatial data on infrastructure, population, terrain, land cover, and climate. These data come from various sources, which are detailed in the Appendices to each chapter. The data were brought into a common geographic information system (GIS) framework, where they were transformed in ways described in the essays and appendices. Several of the spatial variables used (e.g. the estimated travel time to towns of particular population sizes) were originally developed by the author in the course of carrying out the research described in this dissertation.

Table 4 Summary household and village characteristics (2010)

                                                   ------------ percentile ------------
description                         unit           5th     25th    50th    75th    95th     mean    N
output marketing
  marketed % of production (total)  share          0%      14%     41%     67%     88%      41%     1,231
  marketed % of production (maize)  share          0%      0%      0%      29%     67%      15%     1,172
  maize producer                    binary         1       1       1       1       1        95%     1,233
  sells maize (if producer)         binary         0       0       0       1       1        34%     1,172
  value of maize sold               Ksh*1,000      0       0       0       4.0     45.0     8.2     1,172
  high-value producer¹⁰             binary         1       1       1       1       1        98%     1,233
  sells high-value (if producer)    binary         0       0       1       1       1        72%     1,207
  value of high-value sold          Ksh*1,000      0       0       1.5     6.8     75.4     15.5    1,207
  milk producer                     binary         0       0       1       1       1        69%     1,233
  sells milk (if producer)          binary         0       1       1       1       1        82%     851
  value of milk sold                Ksh*1,000      0       4.6     16.9    41.0    116.3    31.5    851
  high-value % of marketed output   share          0%      1%      14%     71%     100%     34%     1,073
input use
  purchased inorganic fertilizer¹¹  binary         0.0     1.0     1.0     1.0     1.0      74%     1,110
  fertilizer application rate       kg/acre        0.0     9.4     40.6    75.0    136.4    49.6    1,111
household characteristics
  land owned                        acres          0.8     1.8     3.0     5.0     15.0     4.8     1,233
  household size                    AE¹²           1.7     4.0     5.7     7.7     11.0     6.0     1,233
  female headed household           binary         0       0       0       1       1        27%     1,233
  value of productive assets        Ksh*1,000      0       2.2     7.1     23.3    215.0    49.8    1,233
  age of household head             years          39      51      61      70      82       60.6    1,233
  education of household head       years          0       3       7       10      15       6.5     1,233
  total household income            Ksh*1,000      28.8    76.3    152.0   286.0   706.0    233.0   1,233
  off-farm share of income          share          0%      12%     37%     67%     92%      40%     1,233
access to markets & infrastructure
  owns telephone                    binary         0       1       1       1       1        0.8     1,233
  distance to extension advice      km             1.5     2.3     4.0     6.0     10.5     4.8     1,233
  distance to fertilizer retailer   km             0.8     1.6     3.0     4.5     8.0      3.5     1,233
  distance to tarmac road           km             0.3     2.3     5.0     9.5     20.0     6.9     1,233
  distance to electricity           km             0.0     0.5     1.0     2.0     4.0      1.3     1,233
  distance to District town         km             3.0     6.0     9.9     16.8    30.0     12.4    1,233
  time to nearest city of 50,000+   hours          0.3     1.7     3.4     5.6     11.1     4.0     1,233
biophysical characteristics
  population density                pp/sq.km¹³     109     244     379     502     744      408     1,233
  average seasonal rainfall         mm             248     372     554     700     751      531     1,233

Notes: All values shown above are for the year 2010. Ksh is Kenyan shillings in nominal terms.

¹⁰ High-value crops include annual and perennial crops which have high value-to-weight ratios and are not generally considered staple commodities. Specifically, as defined here, high-value crops include the following (in order of frequency, using 2010 data): sukuma wiki, cowpeas leaves, bananas, avocado, amaranthus, irish potatoes, mangoes, guava, cabbage, pumpkin, pumpkin leaves, pawpaws, lugard, coffee, tea, sugarcane, oranges, passion fruit, tomatoes, onions, lemons, pigeon peas, spinach, carrots, coconuts, tree tomatoes, green peas, soya beans, macadamia nuts, Jack fruit, matomoko, poyo, mero, french beans, plums, dry peas, orange, pineapples, zambarao, pears, mulberry, cotton, citrus, snow peas, brinjals, tangerine, cashew, peppers-bell, apples, chili peppers, macadamia, squash, sunflower, peppers, straw berry, dhania, bambara beans, passion, watermelon, other leaves (beans, njahi....), coconuts, venessi, snap peas, tamarind, pyrethrum, lucerne, cucumber, tobacco, rosemary, miraa, gourds, wild berries, turnips, lettuce, cauliflower, grapes, beetroot, simsim, melon, pomegranate, ginger, artemesia, lemon grass, medicinal plants, jatropha.

¹¹ There was a fertilizer subsidy program that began in 2007/2008 and is observed in the TAMPA data during the 2010 round. About 3% of the households in the study sample reported receiving subsidized fertilizer in 2010. About 8% received subsidized fertilizer in the previous year. Since I am interested in fertilizer purchases, I subtract all subsidized fertilizer received in 2010 from the 2010 usage. Unfortunately, it is not possible to determine if subsidized fertilizer received in the previous year (2009) was applied in the 2010 agricultural seasons. I have conducted some sensitivity analyses, which included using dummy indicators of whether or not a household received any subsidy, and have also estimated the fertilizer demand models with both 2009 and 2010 quantities removed. The results do not differ substantially from those reported here, which only net out the 2010 subsidies.

¹² AE indicates adult equivalent, calculated using the OECD equivalent scale (http://www.oecd.org/social/familiesandchildren/35411111.pdf).

¹³ pp/sq.km indicates persons per square kilometer.
Table 5 Household characteristics, output marketing and off-farm employment

marketing categories       income     assets     education
maize        nonsellers    245,949    108,691    5.7
             sellers       379,294    178,842    6.6
high-value   nonsellers    224,499    141,922    5.3
             sellers       325,702    133,715    6.3
milk         nonsellers    248,270    145,964    5.5
             sellers       377,676    194,034    6.6

off-farm income dependence (percentage of income coming from off-farm sources)

                                             maize                            high-value                       marketed % of
share     income     assets     education    % marketing  avg value marketed  % marketing  avg value marketed  farm production
< 25%     265,474    149,405    5.0          45%          17,293              79%          37,132              53%
>= 25%    319,656    129,299    6.7          36%          10,186              71%          15,042              37%
< 50%     271,304    144,393    5.5          44%          15,999              79%          32,870              50%
>= 50%    342,638    125,678    6.8          33%          7,964               67%          8,836               31%

Notes: Values of income, assets and marketed output are all in real 2010 terms. Data are based on panel responses from 2000, 2004, 2007 and 2010.

2.4.1 Summary statistics: household and village characteristics

Table 4 provides summary statistics of the smallholders in the sample. Most households are maize producers, although marketed share of maize output is low (20%), as is overall marketed share of production (40%). Most households also produce high-value crops and a surprisingly large share of households market high-value output (70%, as compared with only 30% of maize producers). Use of purchased inorganic fertilizer is fairly common (80% of the sample), although application rates are still quite low (<50 kg/ha), much lower than the recommended rates. Farms are small: the average farm size is less than 5 acres (2 hectares) and the median is only 3 acres (1.2 hectares). Households are poor: the median income in 2010 was 152,000 Ksh (2,000 USD). Farms are weakly capitalized: in 2010 the median value of productive assets was only 7,000 Ksh (93 USD). 30% of the households in the survey are headed by females.
On average, 60% of household income comes from on-farm production (which includes both sales income and the value of production retained for consumption). Households which engage with output markets tend to be wealthier and have better educated heads (Table 5). This is consistently the case across years and geographical regions, although the differences are not always statistically significant. Wealthy and well-educated rural households, on the other hand, are not necessarily the households most engaged in farm output marketing. The non-farm rural economy is important throughout Kenya and wealthier and better educated households are more likely to obtain larger shares of their total income from off- and non-farm employment. Households with non-trivial off-farm income tend to rely much less on crop marketing. Table 5 shows that households obtaining more than a quarter of their total income from non-farm sources are less likely to sell maize or high-value crops, and market smaller amounts when they do sell. This is true even when normalizing by farm output: the last column of the table shows the percentage of total value of output which is marketed; this value is consistently smaller for households which receive important shares of their income from off-farm sources. As measures of output market participation, I will emphasize one in particular: the marketed share of production in value terms. For brevity, I will denote this as HCI (household commercialization index). One reason for preferring this measure to crop-specific measures is that most smallholders are producing and marketing a large number of commodities, but do not necessarily market the same commodities in every year. Therefore, constructing a balanced panel for a crop such as sugarcane or millet will suffer from a very reduced sample size.
Additionally, reduced form estimation may be more plausible for an aggregate measure than for individual crops, for which production and marketing decisions are not made in isolation. HCI is positively correlated with crop-specific indicators (as shown in Table 6).

Table 6 Correlation between HCI and crop-specific marketing measures

                        maize    beans    sorghum  millet   wheat    banana   sugarcane  milk
sold (yes=1)            0.3074   0.2054   0.4024   0.2104   0.5689   0.2370   0.2370     0.1330
value of sales (KSh)    0.2745   0.1713   0.4827   0.2022   0.5485   0.1715   0.2029     0.1165

One crop-specific measure I will look at in more detail, however, is maize, which plays an important role in the Kenyan agricultural economy. Almost all farm households produce and consume maize (Table 4), and it is the single most important staple consumption good in both rural and urban areas. The input marketing decisions I will focus on are purchases of inorganic fertilizer. I will consider both incidence of use (a binary choice) and the intensity of use, which I measure as the application rate (kg/ha). One potential challenge in estimating infrastructure impacts on marketing outcomes is the low level of variability in marketing behavior across time (as described in Section 2.2 and Tables 1 and 2). This flatness has implications for the power of significance tests associated with panel estimators, a point to which I will return later. The access indicators available in the dataset include: distance to nearest tarmac road, distance to nearest fertilizer retailer, distance to nearest district headquarters (which is usually also the nearest market town) and travel time to the nearest large urban market, defined as cities of 50,000 or more inhabitants. I also examine the impact of access to telecommunications through the observed status of mobile phone ownership. The data on mobile phones, distances to tarmac road and fertilizer stockists are time varying.
The data on distance to town and travel time to cities are fixed over the panel periods.

2.4.2 Distribution in space

The 107 villages are distributed throughout Kenya’s agricultural zones (Figure 3). While these areas do not represent the most marginal environments in Kenya (such as the arid north), they do represent the range of biophysical and infrastructural characteristics encountered by the majority of Kenyan agricultural households. Many of the villages are clustered quite closely to one another. The inset maps in Figure 3 (labeled A, B and C) illustrate the close proximity of villages in several areas. Many households belonging to a particular village may very reasonably be considered close neighbors of households in nearby villages. This is illustrated in Figure 4, which shows survey household locations in Nakuru District. Household locations are designated by symbols which indicate which village they belong to, per the survey data. Even allowing for some imprecision in the household geographic coordinates, there is clearly considerable overlap between these areas. I will refer to this proximity in the next section, when I discuss the definition of the neighborhoods used by the spatial weights matrix.

Figure 3 Distribution of survey villages

Figure 4 Household and village locations in Nakuru District, Rift Valley

2.5 Empirical models and estimation strategy

2.5.1 Basic empirical model

In order to evaluate the impact of market access changes on market participation, I implement the conceptual model in equation (2) as follows:

(3) $y^{k}_{it} = A_{it}\gamma + X_{it}\beta + \varepsilon_{it}$

where $A = \{\mathit{mobile}, \mathit{extension}, \mathit{tarmac}\}$ for the output marketing outcomes (HCI, log value of crop sales) and $A = \{\mathit{mobile}, \mathit{fertilizer}, \mathit{tarmac}\}$ for the input marketing outcome (log inorganic fertilizer purchased).
Mobile is a binary indicator of mobile phone ownership, and extension, fertilizer, and tarmac are household-reported indicators of distance to the nearest extension office, fertilizer retailer, and tarmac road, respectively. X includes all other household and community-level covariates, including the market prices (at the district level) for inputs and outputs. The coefficients $\gamma$ provide estimates of the influence of the access indicators on marketing decisions. Observations on the three principal outcomes of interest have the nature of corner solution problems: a non-negligible percentage of households report zero crop sales, fertilizer purchases and/or marketed share of production. (Additionally, a few households market all of their production, giving another pile-up at the upper bound of HCI=1.) One approach to evaluating such corner-solution problems is to model the decision as a two-part process, i.e. a binary market participation decision and a subsequent market participation amount decision. A frequently used approach to modeling this type of process is Cragg’s double hurdle model (Cragg, 1971). The advantage of the double hurdle model is that it allows different mechanisms to determine the first and second stages: model covariates in the first and second stages need not be the same and, even if they are, their coefficient estimates may differ. Cragg’s model nests the Tobit model as a special case in which the same mechanisms underlie both stages. In this study, the corner solution problems are all characterized by relatively small numbers of non-participants, and so my emphasis is on the participation amount decision (i.e. the second stage). Accordingly, I am primarily interested in the unconditional average partial effect (APE) of access indicators on the second stage decision. A Tobit model provides consistent estimates of the unconditional (and conditional) APEs.
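For reference, in the standard Tobit model with a single corner at zero (Wooldridge 2010), the unconditional mean is E(y|x) = Φ(xβ/σ)·xβ + σ·φ(xβ/σ), so the unconditional partial effect of a continuous regressor x_j is β_j·Φ(xβ/σ), and the APE is its sample average. The Python sketch below computes these quantities. It is a minimal illustration under stated assumptions: it handles only a lower corner at zero (the pile-up at HCI=1 is ignored), and the function names and input values are hypothetical rather than taken from the estimation code used here.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def tobit_unconditional_mean(xb, sigma):
    """E(y | x) for a Tobit model with a corner at zero."""
    z = xb / sigma
    return norm_cdf(z) * xb + sigma * norm_pdf(z)

def tobit_unconditional_ape(xb_values, beta_j, sigma):
    """Average over the sample of beta_j * Phi(x_i * beta / sigma)."""
    scale = sum(norm_cdf(xb / sigma) for xb in xb_values) / len(xb_values)
    return beta_j * scale

# Hypothetical fitted index values x_i * beta for four households.
xb_values = [-1.0, 0.0, 0.5, 2.0]
ape = tobit_unconditional_ape(xb_values, beta_j=0.8, sigma=1.0)
```

Because the scale factor Φ(xβ/σ) lies between zero and one, the unconditional APE is always smaller in magnitude than the Tobit coefficient itself, and the gap shrinks as fewer households sit at the corner.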
An additional reason for preferring the Tobit model is that I am dealing with multiple endogenous variables, one of which is a binary outcome, and both IV and CF strategies present challenges for finding numerical solutions in complex models. In practice, the double hurdle model is more difficult to solve than the Tobit model, and this difference is exacerbated by model complexity. Other researchers have come to similar conclusions regarding the tradeoffs of flexibility and tractability of the double hurdle versus Tobit models for two-part participation decisions (e.g. Mason and Ricker-Gilbert 2013). Finally, I note that the double hurdle estimator is not capable of taking advantage of panel structure: applications of the double hurdle model to panel data are based on pooled cross-sections (Wooldridge 2010). Whether or not this limitation is a liability is difficult to know in practice: within a double-hurdle modeling context there is no way to evaluate the poolability assumption. The poolability assumption may easily be evaluated, however, in random effects and correlated random effects Tobit models. Having established Tobit estimation as my empirical starting point, I will then evaluate how well corresponding linear models (i.e. linear CRE) approximate the unconditional APEs from the Tobit estimation. Wooldridge (2010) notes that it is reasonable in many cases to use linear panel models to estimate the determinants of a corner solution outcome: linear models are easily implemented and interpreted, and often provide reasonable approximations of average effects. The main disadvantage is that a linear model may not provide good approximations of average effects at extreme values. If linear estimation does a reasonable job at indicating average impacts, then extending the non-spatial model into a spatial panel framework is much easier.
(At the time of writing, only linear spatial panel estimators have been written; non-panel spatial Tobit models have also been written, although obtaining numerical solutions is sometimes challenging.) A priori, I have good reason to suspect that the linear models will perform well: the number of households reporting at the corner is relatively small for the output marketing indicators (Table 7). The number of households reporting no fertilizer use is a bit larger (21%), but not so large that we must discard the possibility of a reasonable linear approximation to Tobit results.

Table 7 Characteristics of marketing outcome indicator variables

variable                min   mean     median   max         obs.   % zero   skewness   kurtosis
HCI                     0     .46      .47      1           4092   6%       -.07       1.76
value of sales (KSh)    0     53,145   19,770   2,988,758   4092   6%       10.01      204.82
fertilizer use (kg)     0     205      100      4,317       1068   21%      4.04       32.05

(Note: the skewness and kurtosis of the value of sales and fertilizer use variables pose a problem for Tobit estimation, which requires approximately normally distributed values within the non-truncated range. To resolve this problem, I use log value of sales and fertilizer in my empirical specifications. Skewness and kurtosis of the log-transformed variables are within acceptable ranges and tests of the normality of errors from Tobit estimation indicate that the problem is successfully addressed.)

2.5.2 Spatial model estimation strategies

Spatial models pose special challenges for estimation. The cost of ignoring spatial dependence in the dependent variable (and/or a spatial lag in the independent variables) is high due to the simple fact that if one or more relevant explanatory variables are omitted from a regression equation, the estimator of the coefficients for the remaining variables is biased and inconsistent (i.e. the omitted variable problem; Wooldridge 2010).
In contrast, ignoring spatial dependence in the disturbances, if present, will only cause a loss of efficiency (assuming, of course, that this non-spherical spatial error term is not itself an artifact of an omitted variable). Furthermore, even when correctly specified, models with lagged dependent variables which are estimated via least squares will generally result in inconsistent parameter estimates, inconsistent estimation of the spatial parameters, and inconsistent estimation of standard errors (Le Sage and Pace 2009).

There are three main approaches described in the literature for estimating models that include spatial interaction effects. The first is based on maximum likelihood (e.g. Le Sage and Pace 2009; chapter 3). The second is based on a generalized method of moments approach that uses instrumental variables to deal with the endogeneity of spatial lags (IV/GMM;¹⁴ e.g. Kelejian and Prucha 2010). A third approach uses a Bayesian Markov Chain Monte Carlo (MCMC) approach (e.g. Le Sage and Pace 2009; chapter 5). ML estimators assume normality of errors; IV/GMM does not rely on this assumption. Both ML and IV/GMM approaches, however, assume that the $\varepsilon_{it}$ are independently and identically distributed for all $i$ and $t$ with zero mean and variance $\sigma^{2}$. Franzese and Hays (2007) compared the performance of both estimators for panel data models with a spatially lagged dependent variable in terms of unbiasedness and efficiency. They find that the ML estimator weakly dominates the IV/GMM estimator in terms of efficiency, but that the IV/GMM estimator offers more robust estimates for some ranges of $\delta$. However, Elhorst (2010) notes that they did not consider differences between spatial fixed or random effects. In this section, I will describe an ML approach to estimating a specification of the conceptual model in equation (22). I describe approaches for both spatial fixed effects and spatial random effects, where the model has a spatial lag.
Standard diagnostics for evaluating which approach to use are based on an extension of the standard Hausman test.

¹⁴ The estimator used in this approach is frequently referred to as the Generalized Spatial 2-Stage Least Squares (GS2SLS) estimator.

The spatial specific effects $\mu_i$ may be treated as fixed effects or as random effects. In the fixed effects model, a dummy variable is introduced for each spatial unit, while in the random effects model, $\mu_i$ is treated as a random variable that is independently and identically distributed with zero mean and variance $\sigma^2_\mu$. Furthermore, it is assumed that the random variables $\mu_i$ and $\varepsilon_{it}$ are independent of each other. Other convenient assumptions are that $W$ is constant over time and that the panel is balanced.

2.5.3 Panel estimators

2.5.3.1 Fixed effects spatial lag model

To implement the FE approach, we remove the time-constant effects $\mu_i$ by demeaning the equation. This leaves us with the following time-demeaned variables:

(23)  $y^*_{it} = y_{it} - \frac{1}{T}\sum_{t=1}^{T} y_{it}$  and  $x^*_{it} = x_{it} - \frac{1}{T}\sum_{t=1}^{T} x_{it}$

Now, stack the observations as successive cross-sections for $t = 1, \ldots, T$ to obtain vectors of dimension $(NT, 1)$ for $Y^*$ and $(I_T \otimes W)Y^*$, and an $(NT, K)$ matrix for $X^*$ of the demeaned variables. Note for future reference that we may write

(24)  $Y^* = \begin{bmatrix} y^*_{11} \\ \vdots \\ y^*_{N1} \\ \vdots \\ y^*_{1T} \\ \vdots \\ y^*_{NT} \end{bmatrix}$  and  $X^* = \begin{bmatrix} x^*_{11} \\ \vdots \\ x^*_{N1} \\ \vdots \\ x^*_{1T} \\ \vdots \\ x^*_{NT} \end{bmatrix}$.

As shown by Elhorst (2010; following Kelejian and Prucha [1998]), a consistent estimation procedure is as follows. First, let $b_0$ and $b_1$ denote the OLS estimators from successively regressing $Y^*$ and $(I_T \otimes W)Y^*$ on $X^*$, and let $e^*_0$ and $e^*_1$ be the corresponding residuals. Second, the ML estimate of $\delta$ is obtained by maximizing the concentrated log-likelihood function

(25)  $\log L = C - \frac{NT}{2}\log\left[(e^*_0 - \delta e^*_1)'(e^*_0 - \delta e^*_1)\right] + T\log\left|I_N - \delta W\right|$

where $C$ is a constant not depending on $\delta$.
Third, estimators for $\beta$ and $\sigma^2$ may be computed, using the numerical estimate of $\delta$, as follows:

(26)  $\beta = b_0 - \delta b_1 = (X^{*\prime}X^*)^{-1}X^{*\prime}\left[Y^* - \delta(I_T \otimes W)Y^*\right]$

(27)  $\sigma^2 = \frac{1}{NT}(e^*_0 - \delta e^*_1)'(e^*_0 - \delta e^*_1)$

The asymptotic variance matrix of the parameters is required for inference. Elhorst and Freret (2007) derive this matrix as:

(28)  $\mathrm{Avar}(\beta, \delta, \sigma^2) = \begin{bmatrix} \frac{1}{\sigma^2}X^{*\prime}X^* & & \\ \frac{1}{\sigma^2}X^{*\prime}(I_T \otimes \tilde{W})X^*\beta & \Psi & \\ 0 & \frac{T}{\sigma^2}\mathrm{tr}(\tilde{W}) & \frac{NT}{2\sigma^4} \end{bmatrix}^{-1}$

where $\Psi = T \cdot \mathrm{tr}(\tilde{W}\tilde{W} + \tilde{W}'\tilde{W}) + \frac{1}{\sigma^2}\beta' X^{*\prime}(I_T \otimes \tilde{W}'\tilde{W})X^*\beta$ and $\tilde{W} = W(I_N - \delta W)^{-1}$. Note that the elements above the diagonal are omitted because the matrix is symmetric. The differences with the asymptotic variance matrix of a spatial lag model in a cross-sectional setting are the change in dimension of the matrix $X^*$ from $N$ to $N \times T$ observations and the summation over $T$ cross-sections involving manipulations of the $(N, N)$ spatial weights matrix $W$.

2.5.3.2 Random effects spatial lag model

If the spatial effects in the model in equation (20) are assumed to be random, then we may write the log-likelihood as:

(30)  $\log L = -\frac{NT}{2}\log(2\pi\sigma^2) + T\log\left|I_N - \delta W\right| - \frac{1}{2\sigma^2}\sum_{i=1}^{N}\sum_{t=1}^{T}\left(y^{\circ}_{it} - \delta\left[\sum_{j=1}^{N} w_{ij}y_{jt}\right]^{\circ} - x^{\circ}_{it}\beta\right)^2$

where the symbol $\circ$ denotes a transformation of the variables dependent on $\theta$, i.e.

(31)  $y^{\circ}_{it} = y_{it} - (1-\theta)\frac{1}{T}\sum_{t=1}^{T} y_{it}$  and  $x^{\circ}_{it} = x_{it} - (1-\theta)\frac{1}{T}\sum_{t=1}^{T} x_{it}$

In equation (31), $\theta$ denotes the weight attached to the cross-sectional component of the data, with $\theta^2 = \sigma^2/(T\sigma^2_\mu + \sigma^2)$ and $0 < \theta \le 1$. As before, the transformed observations are stacked as successive cross-sections:

(32)  $Y^{\circ} = \begin{bmatrix} y^{\circ}_{11} \\ \vdots \\ y^{\circ}_{N1} \\ \vdots \\ y^{\circ}_{1T} \\ \vdots \\ y^{\circ}_{NT} \end{bmatrix}$  and  $X^{\circ} = \begin{bmatrix} x^{\circ}_{11} \\ \vdots \\ x^{\circ}_{N1} \\ \vdots \\ x^{\circ}_{1T} \\ \vdots \\ x^{\circ}_{NT} \end{bmatrix}$

Given $\theta$, this log-likelihood function is identical to the log-likelihood function of the fixed effects spatial lag model in (25). This implies that the same procedure can be used to estimate $\beta$, $\sigma^2$ and $\delta$ as before, except that now we denote the transformed variables with the superscript symbol $\circ$ instead of $*$.
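The procedure shared by the fixed and random effects cases (equations 23 to 27, with the transformation in equation 31) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the code used for the results reported here: the function names `quasi_demean` and `spatial_lag_ml` are hypothetical, the data are assumed to be stacked as $T$ successive cross-sections of $N$ units, and the grid search over $\delta$ is a simplification standing in for a proper numerical optimizer.

```python
import numpy as np

def quasi_demean(v, N, T, theta=0.0):
    """Subtract (1 - theta) times each unit's time mean (equation 31).

    v is stacked as T successive cross-sections of N units, with shape
    (N*T,) or (N*T, K). theta = 0 gives the full demeaning of equation (23).
    """
    a = np.asarray(v, dtype=float)
    shaped = a.reshape(T, N, -1)                # (period, unit, column)
    means = shaped.mean(axis=0, keepdims=True)  # time mean for each unit
    return (shaped - (1.0 - theta) * means).reshape(a.shape)

def spatial_lag_ml(y, X, W, T, theta=0.0, deltas=None):
    """Concentrated ML for a spatial lag panel model (hypothetical helper).

    Returns (delta, beta, sigma2). The grid over delta is an assumption
    made for simplicity; W is the (N, N) weights matrix, constant over time.
    """
    N = W.shape[0]
    if deltas is None:
        deltas = np.linspace(-0.99, 0.99, 199)
    ys = quasi_demean(y, N, T, theta)
    Xs = quasi_demean(X, N, T, theta)
    Wys = (ys.reshape(T, N) @ W.T).ravel()      # (I_T kron W) applied to ys

    # Step 1: two auxiliary OLS regressions and their residuals.
    b0 = np.linalg.lstsq(Xs, ys, rcond=None)[0]
    b1 = np.linalg.lstsq(Xs, Wys, rcond=None)[0]
    e0 = ys - Xs @ b0
    e1 = Wys - Xs @ b1

    # Step 2: maximize the concentrated log-likelihood (25) over delta.
    def loglik(d):
        sign, logdet = np.linalg.slogdet(np.eye(N) - d * W)
        if sign <= 0:
            return -np.inf
        r = e0 - d * e1
        return -(N * T / 2.0) * np.log(r @ r) + T * logdet

    values = np.array([loglik(d) for d in deltas])
    delta = float(deltas[np.argmax(values)])

    # Step 3: recover beta and sigma^2 from equations (26) and (27).
    r = e0 - delta * e1
    beta = b0 - delta * b1
    sigma2 = float(r @ r) / (N * T)
    return delta, beta, sigma2
```

Calling `spatial_lag_ml(y, X, W, T)` with the default `theta=0.0` corresponds to the fixed effects case; passing a value of `theta` between 0 and 1 applies the random effects transformation instead. Under fixed effects, `X` should contain only time-varying columns, since a constant is removed by the demeaning.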
Given $\beta$, $\sigma^2$ and $\delta$, $\theta$ can be estimated by maximizing the concentrated log-likelihood function with respect to $\theta$:

(33)  $\log L = -\frac{NT}{2}\log\left[e(\theta)'e(\theta)\right] + \frac{N}{2}\log\theta^2$

where the elements of $e(\theta)$ follow the form:

(34)  $e(\theta)_{it} = y_{it} - (1-\theta)\frac{1}{T}\sum_{t=1}^{T} y_{it} - \delta\left[\sum_{j=1}^{N} w_{ij}y_{jt} - (1-\theta)\frac{1}{T}\sum_{t=1}^{T}\sum_{j=1}^{N} w_{ij}y_{jt}\right] - \left[x_{it} - (1-\theta)\frac{1}{T}\sum_{t=1}^{T} x_{it}\right]\beta$

Here again, an iterative procedure may be used to alternately estimate the set of parameters $\beta$, $\sigma^2$ and $\delta$ and the parameter $\theta$, until convergence. Finally, the asymptotic variance matrix of the parameters is computed. Elhorst (2010) shows this matrix to be:

(35)  $\mathrm{Avar}(\beta, \delta, \theta, \sigma^2) = \begin{bmatrix} \frac{1}{\sigma^2}X^{\circ\prime}X^{\circ} & & & \\ \frac{1}{\sigma^2}X^{\circ\prime}(I_T \otimes \tilde{W})X^{\circ}\beta & \Gamma & & \\ 0 & -\frac{1}{\sigma^2}\mathrm{tr}(\tilde{W}) & N\left(T + \frac{1}{\theta^2}\right) & \\ 0 & \frac{T}{\sigma^2}\mathrm{tr}(\tilde{W}) & -\frac{N}{\sigma^2} & \frac{NT}{2\sigma^4} \end{bmatrix}^{-1}$

where $\Gamma = T \cdot \mathrm{tr}(\tilde{W}\tilde{W} + \tilde{W}'\tilde{W}) + \frac{1}{\sigma^2}\beta' X^{\circ\prime}(I_T \otimes \tilde{W}'\tilde{W})X^{\circ}\beta$ and, as before, $\tilde{W} = W(I_N - \delta W)^{-1}$.

2.5.3.3 Correlated random effects in a spatial lag model

An alternative approach to FE for dealing with unobserved heterogeneity, but which does not require the strong RE assumption of independence between covariates and $\mu_i$, is provided by correlated random effects (CRE). The extension of the spatial lag RE estimation described above to CRE is straightforward. Following Mundlak's version of the Chamberlain device, we write a linear projection of the unobserved effects $\mu_i$ as:

(40)  $\mu_i = \psi + \bar{x}_i\xi + a_i$

where $\bar{x}_i$ is the time average of $x_{it}$ over all $t$. Furthermore, we assume that $E(a_i) = 0$ and $E(x_i' a_i) = 0$. Then, plugging this expression back into our model (equation 20), we obtain:

(41)  $y_{it} = \delta\sum_{j=1}^{N} w_{ij}y_{jt} + x_{it}\beta + \psi + \bar{x}_i\xi + a_i + u_{it}$

Or equivalently:

(42)  $y_{it} = \delta\sum_{j=1}^{N} w_{ij}y_{jt} + x_{it}\beta + \psi + \bar{x}_i\xi + r_{it}$,  $r_{it} = a_i + u_{it}$

where, under the standard FE assumption, the errors $r_{it}$ satisfy $E(r_{it}) = 0$, $E(x_i' r_{it}) = 0$, $t = 1, 2, \ldots, T$.
Because of independence between $a_i$ and $u_{it}$, adjustments to the estimated variance-covariance matrix required for inference are straightforward. Controlling for the spatial lag $(I_T \otimes W)Y^*$ can then take place using the method outlined above for the RE estimation of the spatial lag model.

2.5.4 Defining neighborhoods: spatial weights

All of the spatial model specifications above require the definition of a spatial weighting matrix $W$. This matrix may be defined in a number of different ways. Because my dataset has geographic coordinates assigned at the household level, the weights I emphasize in this study are based on geographic distances between observations. In particular, I use inverse distances based on planar coordinates with a threshold of 10 kilometers imposed, such that points separated by more than this value were not considered neighbors. The weights matrix is then standardized in order to ensure that expressions involving the matrix are invertible.¹⁵

One of the issues that non-spatial econometricians sometimes raise with spatial econometric methods has to do with the a priori nature of the weights definition (Corrado and Fingleton 2010). In principle, however, this is no more or less arbitrary than the methods commonly used to define temporal lag structures in time-series analysis. In my analysis, 15 kilometers as a neighborhood threshold roughly corresponds to the diameter of many household clusters in the dataset, but also allows for households in adjacent villages to be neighbors to one another. In discussions with other researchers with field experience, I also confirmed that this structure accords with commonly held intuition about the core density of rural social networks.

¹⁵ Matrices $I - \rho W$ and $I - \lambda W$ are generally non-singular if the spatial weights matrix $W$ is not normalized. In empirical applications, it is common to normalize the spatial-weight matrix such that each row sums to unity.
Following Kelejian and Prucha (2010), who argue that row normalization may lead to model misspecification, I normalize the spatial-weight matrix using the so-called minimax approach, where each element is normalized by $\tau_N$, defined as: $\tau_N = \min\left\{\max_{1 \le i \le N}\sum_{j=1}^{N}|w_{ij,N}|,\ \max_{1 \le j \le N}\sum_{i=1}^{N}|w_{ij,N}|\right\}$.

In addition, before undertaking the analyses reported here, I evaluated several alternative weight specifications using non-nested specification tests. In particular, I evaluated (a) a simple weights matrix based on village co-membership, where co-villagers were assigned weights of 1, (b) inverse distance weights based on larger truncation thresholds (the distance value between two points at which the weight linking observations is set to zero; in particular, I compared alternatives of 10, 15, 20 and 30 kilometers), (c) modified inverse distance weights using alternative decay function coefficients, which control how fast the neighborhood relationship attenuates over linear space, and (d) a mixed weight based on geographic and economic distance, where the latter is based on the absolute difference in household income. I evaluated these alternatives using the spatial J-test outlined by Kelejian and Piras (2011). This is a modification of the Davidson-MacKinnon J-test of non-nested alternatives. In this case, the non-nested alternatives are based on spatial model specifications which differ only in terms of alternative versions of $W$. Results supported use of the simple inverse distance weights using the 15 km threshold. All results reported in this paper correspond to this weighting matrix.

2.5.5 Partial effects in spatial lag models

Estimating the partial effects of covariates in spatial lag models requires some strategy for incorporating direct and indirect spillovers governed by the spatial lag structure. In this study, our primary analytical interest is in the partial effect of a change in access on our outcomes of interest.
Unlike non-spatial linear models, where parameter estimates may be interpreted as partial effects, spatial dependence requires that we consider the dependence channels specified in the model. In particular, models containing spatial lags of the dependent variable require special interpretation of the parameters (LeSage and Pace 2009, Elhorst 2010). For a single period, we can represent the partial effect on outcome $y$ for an individual $i$ from a change in the $r$th explanatory variable $x_r$, in terms of an own derivative:

$\partial y_i / \partial x_{ir} = S_r(W)_{ii}$, where $S_r(W) = (I_n - \rho W)^{-1}(I_n\beta_r)$

(here $\rho$ denotes the spatial autoregressive parameter, written $\delta$ in Section 2.5.3). However, we might also consider the effect on observation $i$ deriving from a change in $x$ at observation $j$, i.e. $\partial y_i / \partial x_{jr} = S_r(W)_{ij}$. This expression, unlike in non-spatial models, is not necessarily zero. LeSage and Pace (2009) suggest the following summary measures of aggregate impacts:

Average Direct Impact: The impact of changes in the $i$th observation of $x_r$ on $y_i$ could be summarized by averaging over the direct impact associated with all observations $i$, i.e. by averaging the diagonal elements of $S_r(W)$. This is somewhat analogous to the standard non-spatial linear regression coefficient interpretations that represent the average response of the dependent variable to changes in the independent variables.

Average Total Impact to an Observation: The sum across the $i$th row of $S_r(W) = (I_n - \rho W)^{-1}(I_n\beta_r)$ would represent the total impact on individual observation $y_i$ resulting from changing the explanatory variable $x_r$ by the same amount across all $n$ observations.

Average Total Impact from an Observation: The sum across the $j$th column of $S_r(W) = (I_n - \rho W)^{-1}(I_n\beta_r)$ represents the total impact over all $y_i$ from changing the $r$th explanatory variable by some amount in the $j$th observation.

2.5.6 Controlling for endogenous regressors

One of the assertions made in the introduction was that some important indicators of market access may be endogenous to some of the outcomes we are interested in.¹⁶ Van de Walle (2009), discussing rural roads, noted that physical infrastructure poses specific challenges for evaluation. First, the benefits of roads and other infrastructure are both derived from and conditional on interactions with the household, community and geographic characteristics of their physical locations. Second, infrastructure investment locations are typically determined by these very same characteristics. This confounds inferences based on a comparison of places with roads versus without them.

¹⁶ Applied researchers sometimes distinguish between endogeneity arising via reverse causation (e.g. the idea that household characteristics might condition road placement) versus endogeneity arising from the presence of an unobserved third variable that determines both placement and outcomes (i.e. omitted variable bias). Econometrically, the first type implies the second type, and the resulting estimation problems are equivalent (Wooldridge 2010). According to Jacoby (2000), all remoteness/access variables should be treated as at least potentially endogenous. The basic reason is that household characteristics (e.g. asset wealth) are conditioned by the same unobservables that condition infrastructure targeting (the simplest version of this argument is that the poor tend to be found in marginal and less accessible areas to begin with). Furthermore, if we take access indicators as representative of place-specific public investments, we are forced to treat these indicators as endogenous in order to even speak about, let alone identify, any kind of treatment effect.

These are all very good reasons for treating infrastructure access as endogenous in general. In the case of Kenya, there are also good reasons for treating them as exogenous. The access indicators evaluated in this study are described in Table 8. Regarding roads, the national network, i.e.
the inter-urban and transnational network where most non-urban tarmac roads are found, has been largely in place since well before independence. There have been some important extensions, but these have mostly been in areas not represented in our sample. There have also been considerable upgrades to major inter-urban routes in recent years, but these changes are largely invisible in my dataset, which does not ask about road quality. For these reasons, it may not be unreasonable to treat distance to tarmac roads as exogenous in this study, although in principle roads are subject to targeting mechanisms that raise endogeneity issues in general.

Regarding fertilizer suppliers, since market liberalization in the mid-1990s, private fertilizer retailers have expanded considerably (Chamberlin and Jayne 2009). It is reasonable to expect that this expansion has been directed toward underserved areas of production surplus where fertilizer use is profitable. In principle, this implies a possible simultaneity argument: as communities generate larger marketable surpluses, they attract input suppliers. Improved access to inputs, in turn, drives productivity gains, surplus generation and market participation. This argument, however, applies mostly at the community level. At the household level, it may be reasonable to assume exogenous fertilizer retailer location decisions. If we appeal instead to an omitted variables issue, where unobserved local production endowments condition both household marketing outcomes and fertilizer retailer location, then endogeneity becomes more plausible, although by including controls for production endowments (such as length of growing period, soil conditions and recent rainfall patterns) we may sufficiently mitigate this concern.
With respect to information and communication technologies (ICTs), many commentators have argued that access to such technologies in developing countries is strongly conditioned by wealth, income and education (the so-called digital divide; Torero and von Braun 2006). Within the context of this research, an extension of this assertion is that if wealthier and better educated households are more likely to engage in markets and to have access to mobile telephony, then mobile access is strongly endogenous to marketing behavior. In principle this is a valid argument. However, in the case of Kenya, there are several countervailing arguments. First, productive assets and household head characteristics such as education and age are included in the market participation models because of non-separability assumptions. Thus, wealth and educational characteristics are ostensibly controlled for. Second, it is not clear that wealth and education characteristics are unambiguously associated with marketing outcomes in such a way as to make the nature of mobile-ownership endogeneity clear. It is true that mobile owners are wealthier and better educated than non-owners, and marketing households also tend to be wealthier and better educated (Table 5). However, wealthier and better educated households are also more likely to have non-negligible shares of income coming from off-farm non-agricultural sources; such households are less likely to participate in output marketing and tend to participate less in both absolute and relative terms (ibid).

Fortunately, we can both control for and test for the potential endogeneity of proximity to roads, fertilizer retailers and mobile telephones, by employing a control function approach. Consider the model $y_{it} = \beta x_{it} + \gamma z_{it} + \varepsilon_{it}$, where $z_{it}$ is the access covariate suspected of endogeneity and where there is some correlation between $z_{it}$ and $\varepsilon_{it}$.
The control function method involves estimating a reduced form model of $z_{it}$ and including its residuals as an additional covariate in the structural model of response to access changes. The inclusion of this regressor effectively breaks the correlation between $z_{it}$ and $\varepsilon_{it}$. Moreover, the significance of the coefficient of this regressor (evaluated using a test robust to heteroskedasticity) provides a test of the endogeneity hypothesis (Rivers and Vuong 1988; Smith and Blundell 1986; Papke and Wooldridge 2008). The access variables suspected of endogeneity in my study are listed in Table 8, below, along with the instruments I use as controls.

Because the control function approach entails using a generated regressor (i.e. the residuals from the control function regression, which are included as additional regressors in the main equation of interest), the resulting standard errors are not valid. Wooldridge (2010) suggests a bootstrap approach for obtaining valid standard errors. Implementing spatial models within a bootstrap routine turns out to be quite complicated. The main reason for this has to do with the definition of the weights matrix. By definition, the bootstrap generates a sample through random drawing with replacement from the original dataset. This virtually guarantees that some panel observations (i.e. households) will appear more than once. The distance between these duplicate (triplicate, etc.) observations is zero, and the inverse distance is therefore undefined. The solution that I propose is to allow replicate observations to be neighbors with everyone within their neighborhood except their doppelgängers. On occasion, the bootstrap may draw a sample containing only replicates for a given neighborhood (e.g. it may produce a village with many versions of only one household). Such cases are rare, however, and the bootstrap does go through.
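The two ideas in this section, the control function step and the treatment of replicate households in a bootstrap sample, can be illustrated with a minimal NumPy sketch. It is deliberately simplified and is not the estimation code behind the reported results: the structural model here is a plain (non-spatial) linear regression, the function names (`control_function_fit`, `bootstrap_se`, `bootstrap_weights`) are hypothetical, and the 15 km default threshold is an assumption taken from the weights discussion in Section 2.5.4.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def control_function_fit(y, x, z, instruments):
    """Two-step control function estimate of y = b0 + b1*x + g*z + e.

    Returns [b0, b1, g, rho], where rho is the coefficient on the
    first-stage residual; its significance is the endogeneity test.
    """
    const = np.ones((len(y), 1))
    # Step 1: reduced form of the suspect regressor z.
    R = np.column_stack([const, x, instruments])
    v_hat = z - R @ ols(R, z)
    # Step 2: structural equation augmented with the residual.
    S = np.column_stack([const, x, z, v_hat])
    return ols(S, y)

def bootstrap_se(y, x, z, instruments, reps=200, seed=0):
    """Bootstrap standard errors that account for the generated regressor
    by repeating both steps on every resample of households."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = []
    for _ in range(reps):
        idx = rng.integers(0, n, size=n)   # draw with replacement
        draws.append(control_function_fit(y[idx], x[idx], z[idx], instruments[idx]))
    return np.std(np.array(draws), axis=0, ddof=1)

def bootstrap_weights(coords, idx, threshold_km=15.0):
    """Inverse-distance weights for a bootstrap sample drawn by index idx.

    Replicates of the same household are never neighbors of one another,
    which avoids the undefined inverse of a zero distance.
    """
    pts = coords[idx]
    d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2))
    same_household = idx[:, None] == idx[None, :]
    neighbors = (~same_household) & (d > 0) & (d <= threshold_km)
    W = np.zeros_like(d)
    W[neighbors] = 1.0 / d[neighbors]
    return W
```

In a spatial application, `bootstrap_weights` would be called inside each bootstrap replication so that the weights matrix matches the resampled households; the sketch above keeps the two pieces separate for readability.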
2.5.7 Long-run and short-run changes in access

There is considerable enthusiasm for panel estimators that control for unobserved heterogeneity in the cross-section, as failure to account for such factors may result in severe bias (if the outcome of interest is in fact confounded by such heterogeneity). Fixed-effects (FE), first-difference (FD) and correlated random effects (CRE), as typically implemented, are all examples of such estimators.

However, there are potential drawbacks to this strategy. When there is little variation in the covariate of interest and/or there is a high degree of measurement error – i.e. when the signal-to-noise ratio is low – the results from differencing estimators may be seriously compromised, with coefficient estimates severely attenuated towards zero. This is sometimes referred to as attenuation bias (Deaton 1997: p108, Baltagi 2008: p205-208, Wooldridge 2010: p365).¹⁷ This problem is usually framed as a tradeoff: differencing approaches exacerbate measurement error bias even as they eliminate heterogeneity bias. In other words, in order to remove the inconsistency arising from unobserved heterogeneity, precision has been sacrificed. Deaton (1997) notes that “a consistent but imprecise estimate can be further from the truth than an inconsistent estimator” (p108). Furthermore, “we must also be aware of misinterpreting a decrease in efficiency as a change in parameter estimates between the differenced and undifferenced equations. If the cross-section estimate shows that $\beta$ is positive and significant, and if the differenced data yield an estimate that is insignificantly different from both zero and the cross-section estimate, it is not persuasive to claim that the cross-section result is an artifact of not “treating” the heterogeneity.” (p108).

¹⁷ The case for attenuation bias is usually made for FE and FD methods, but since CRE estimates are asymptotically equivalent to FE estimates, under the CRE assumptions, this argument applies to CRE also. See Wooldridge (2010) for comparison of CRE and FE estimator properties. Solon (1985) and Griliches and Hausman (1986) are the seminal studies of attenuation bias from measurement error in panel data.

McKinnish (2008) shows that the measurement error problem can be extended to include cases where an indicator is measured with precision, but where this indicator is an imprecise measure of the true factor relevant to the outcome of interest. She notes that time-series variation in panels – i.e. the variation that remains after removing fixed effects – often reflects idiosyncratic changes in the independent variable that have little or no influence on the dependent variable. “For example, we may know the exact value of state welfare benefits from administrative records, but not all of the variation in these benefit levels will necessarily influence behavior. In particular, we expect many outcomes to respond differently to short-term and long-term variation in conditions. This differential effect of long-term and short-term variation can generate the same bias as “true” measurement error.” (p 336). In this case, measurement error as conventionally defined is not an issue, although the resulting “measurement error problem” is the same.

To paraphrase her argument in the context of this study, consider a model of market participation:

$y_{it} = \alpha_i + \beta x_{it} + \varepsilon_{it}$
$x_{it} = z_{it} + \nu_{it}$

where $y_{it}$ is the marketing outcome of interest, $x_{it}$ is the access measurement for the corresponding period, and $z_{it}$ is the sustained component of access. Even in the absence of measurement error on $x_{it}$, $x$ may still not capture the underlying causal relationship of primary interest: if $z$ is highly correlated over time and two observations of $x$ from adjacent periods are differenced, most of the information about $z$ will be eliminated, leaving variation which is mainly associated with the noise component $\nu$.
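A small simulation can make this argument concrete. The sketch below is illustrative only and rests on assumptions that are not part of the model as written above: the outcome is assumed to respond only to the sustained component $z$ (following the spirit of McKinnish's argument), there are two periods, $z$ changes very little between them, and the household effect is drawn independently of $z$ so that the levels regression is not confounded. The function names and all parameter values are hypothetical.

```python
import numpy as np

def slope(x, y):
    """Bivariate OLS slope of y on x."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / (xc @ xc))

def simulate_attenuation(n=5000, beta=1.0, sd_shift=0.1, sd_noise=0.3, seed=0):
    """Compare levels and first-difference slopes when the sustained
    component z is highly persistent and x = z + transitory noise."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n)                 # household effect, independent of z
    z1 = rng.normal(size=n)                    # sustained access, period 1
    z2 = z1 + sd_shift * rng.normal(size=n)    # nearly unchanged in period 2
    x1 = z1 + sd_noise * rng.normal(size=n)    # observed access = z + transitory part
    x2 = z2 + sd_noise * rng.normal(size=n)
    y1 = alpha + beta * z1 + 0.5 * rng.normal(size=n)
    y2 = alpha + beta * z2 + 0.5 * rng.normal(size=n)
    pooled = slope(np.concatenate([x1, x2]), np.concatenate([y1, y2]))
    differenced = slope(x2 - x1, y2 - y1)
    return pooled, differenced

pooled, differenced = simulate_attenuation()
print(f"levels slope: {pooled:.2f}, first-difference slope: {differenced:.2f}")
```

Under these assumptions the levels slope stays close to the true $\beta$, while the first-difference slope is pulled strongly toward zero, because differencing removes most of the variation in $z$ and leaves mainly the transitory component $\nu$.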
Elsewhere in the paper I show that, although we observe several access indicators over time in our dataset, most of these indicators vary considerably more over space than over time. (The major exception to this is mobile phone access.) Furthermore, there is considerable intra-village heterogeneity of responses about distances to infrastructure. While some intra-village variation is to be expected due to dispersed settlements in Kenya as in much of east and southern Africa (e.g., households in the same village may be up to 15 km apart from one another), we observe variation in reported distances that is sometimes greater than the distances between households in the village, suggesting that some measurement error may be an issue.

In response to this, I propose a structural interpretation of the time-averages of the time-varying access indicators: these represent the persistent levels of access, whereas the time-varying component represents the transitory component, which varies across the periods observed. Because we are using a CRE framework, the time-averages are already present in our estimating equation. Normally, these terms are not interpreted directly, as their role is primarily to control for unobserved heterogeneity (along with all other time-averages). However, in this case, such variables have a natural interpretation.

Table 8 Description of access indicators used in this study

type            variable          description                              time-varying?  endogenous?  instruments
infrastructure  mobile            ownership of mobile phone                yes            yes          (a)
rural services  km to fertilizer  distance to nearest fertilizer retailer  yes            yes          (b)
rural services  km to extension   distance to nearest extension office     yes            no**         -
infrastructure  km to tarmac      distance to nearest tarmac road          yes            no**         -

Instruments:
(a) household characteristics in 1997 [education of household head; education of household head squared; radio ownership; household electricity] * dummy variable indicating more than 50% of village respondents possessed a mobile phone in current year
(b) distance to veterinary services, population density, population density squared

Notes: ** As discussed in the text, these indicators may be suspected of endogeneity, but valid instruments were not available. However, given the relatively low rates of change in these indicators over the panel period, we may be less worried about endogeneity arising from targeting in the short term. Endogeneity arising from unobserved conditions in the longer term may still be an issue, but this is mitigated by the CRE estimation strategy and the other geographical variables used as additional controls.

2.6 Results

The information presented in Section 2.2 established that smallholder marketing has been mostly stagnant over the past decade. Trends over time in the marketed share of production have been flat (for aggregate production as well as for maize), as have the trends over time in the share of producers engaging in output markets (e.g. the % selling maize, etc., in any given year). Trends over time in purchased input use have been similarly flat. On the other hand, we have seen that rural market access conditions are generally improving. Access to tarmac roads, fertilizer retailers and mobile telecommunications has improved over the last decade, especially access to mobile phones.

This section reports the results of analysis linking marketing outcomes and rural infrastructure. I first present some non-parametric estimates of the relationship between marketing outcomes and access conditions.
The purpose of this is to flesh out the general relationships. I then present econometric results which examine the same relationships in a more controlled setting and upon which I base my conclusions.

2.6.1 Descriptive analysis

2.6.1.1 Non-parametric relationships between marketing and access

Figure 5 shows non-parametric estimates of the relationship between production and marketing decisions and distance to town. The top row of the figure indicates the share of maize and beans producers who sell some portion of their output (recall that nearly 100% of the sample grows maize and beans, so these figures effectively also represent the share of the rural population that is engaging with these output markets). The spatial trends are very stable over time: in all years producers are more likely to sell the further from town they are. For maize, this spatial trend is very slight: remote growers are only slightly more likely to sell than non-remote growers. For beans, the trend is more pronounced.

The bottom row of the figure represents the share of farmers who produce high-value crops and milk. Since the majority of high-value production is sold and almost all milk is sold, these figures also effectively represent the share of the rural population engaged in high-value markets. The large share of high-value producers is surprising, but these numbers mask the amount and type of produce. With regard to the type of produce, given the stable amount of high-value engagement across the distance gradient, we may conclude that much of this production is not going into urban- and export-oriented high-value chains, but rather to local consumption.

Figure 5 Production, marketing and remoteness

With regard to the volume of sales, Figure 6 shows the declining value of marketed horticulture output with distance to town and Figure 7 shows this as a percentage of total marketed value.
The overall downward trend is not surprising; what is surprising is the relatively high importance of such sales in remote areas. The patterns across years vary little, suggesting that the fixed components of remoteness are playing a large role in these processes.

Figure 8 shows the use of purchased fertilizer (as a percentage of farmers) and the application rate (in kg/ha) over the same distance gradient. Here again, the apparent impact of remoteness is considerable, with a steep downward slope beginning at about 5km (for probability of use) and 15km (for the application rate). There is little variation over time, again suggesting that the influence of distance to town on these outcomes is fairly stable.

Figure 9 shows transport costs over the same distance gradient. The darker line is the price in 2010 to move a 90kg bag of maize from the farmgate to the nearest marketplace. The dotted line is the 2010 price to move a 50kg bag of fertilizer from the point of purchase to the farm. It is important to note that we cannot infer unit costs of transport from these graphs because the distances for which they are reported are not fixed over the gradient. In other words, it is likely that farmers further from town are also located further from input and output markets, but we do not know by how much. Still, the graph illustrates that total costs faced by farmers increase over very general measures of remoteness.

Figure 10 shows the relationship between the probability of receiving extension advice and distance from the nearest extension office. The steep slope graphically demonstrates the importance of physical distance for service acquisition. There could be confounding factors – for example, farmers further away from extension offices may be in areas less suited to crop agriculture – but the general point is that distance from the point of service provision translates into lower service usage in very real ways.
Although it is tempting to think that remoteness is captured by the distance to town variable in the graphs above, it is important to recognize that remoteness has multiple dimensions which play out over space in different ways (Chamberlin and Jayne 2013). Figure 11 shows the relationship between farmer-reported distances to the nearest town, tarmac road, electricity, fertilizer retailer and extension office. Although these multiple dimensions do generally move in the same direction, as we would expect, they also show considerable differences from one another. This means that while a generic indicator, like km to town, may capture a lot, a univariate approach to defining access will necessarily miss other dimensions, some of which may be quite important.

The different measures of accessibility also differ markedly in how they have changed over time. Figure 1 showed trends in access to a variety of services and types of infrastructure over the previous decade. The most dynamic component of this picture, by far, is mobile phones. Figure 12 shows trends over time and space in mobile ownership. In contrast to the other indicators, which vary far more in the cross section than over time, mobile phone access has grown rapidly over time and reached deep into rural Kenya. The only other dimension of access in the household data which shows any dynamism is the reduction in distance to fertilizer retailers, who expanded following liberalization of input marketing in the late 1990s.
Figure 6 Value of high-value sales

Figure 7 High-value share of marketed output

Figure 8 Use of purchased fertilizer and distance from town

Figure 9 Transport costs and remoteness
Note: costs in real 2010 KSh

Figure 10 Extension advice and remoteness
Note: data only available for 2010

Figure 11 Scatter plot correlations between distances to different types of infrastructure and services from the farmgate

Figure 12 Trends in access to mobile phones

2.6.1.2 Spatial autocorrelation in marketing outcomes

Household marketing decisions are highly correlated with the decisions of their neighbors. Table 9 shows Moran’s I test statistic calculated for a number of marketing variables. Moran's I is a measure of spatial autocorrelation. The statistic takes values ranging from −1 to 1. Positive values indicate positive spatial autocorrelation (neighboring observations tend to be similar) and negative values indicate negative spatial autocorrelation (neighboring observations tend to be dissimilar). A value of 0 indicates spatial randomness. The test statistic is evaluated against the null hypothesis of spatial randomness. The marketing variables are: marketed share of total production (which I will refer to subsequently as the household commercialization index or HCI); marketed share of maize production (HCI maize); the participation and volume decisions for maize output marketing (probability of selling; amount sold); and the participation and volume decisions for fertilizer purchases (probability of buying; application rate in kg/ha).

The test statistic is generally highly significant whether calculated for individual years or on the pooled panel. This means that household marketing outcomes are very similar to those of their neighbors. Of course, these are unconditional distributions and not surprising since our conceptual model of marketing determinants identifies spatially varying factors such as remoteness, population density and agronomic potential.
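For readers unfamiliar with the statistic, the following NumPy sketch shows how Moran's I can be computed from household coordinates using thresholded inverse-distance weights of the kind described in Section 2.5.4. It is a minimal illustration rather than the routine used to produce Table 9: the function names are hypothetical, coordinates are assumed to be planar and measured in kilometers, and the significance test against the null of spatial randomness is omitted.

```python
import numpy as np

def inverse_distance_weights(coords, threshold_km):
    """Inverse-distance weights, set to zero beyond the threshold
    (and on the diagonal)."""
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2))
    W = np.zeros_like(d)
    near = (d > 0) & (d <= threshold_km)
    W[near] = 1.0 / d[near]
    return W

def minimax_normalize(W):
    """Divide W by the smaller of its largest row sum and largest
    column sum (the minimax normalization)."""
    tau = min(np.abs(W).sum(axis=1).max(), np.abs(W).sum(axis=0).max())
    return W / tau

def morans_i(values, W):
    """Moran's I: (n / S0) * (z' W z) / (z' z), with z the deviations
    from the mean and S0 the sum of all weights."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    return float((len(z) / W.sum()) * (z @ W @ z) / (z @ z))
```

Because the statistic is a ratio in which the weights appear in both numerator and denominator, rescaling the whole matrix (as the minimax normalization does) leaves Moran's I unchanged; only the choice of neighbors and the relative weights matter.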
The question remains whether or not these strong spatial patterns in behavior are fully accounted for by including such controls in a regression framework. I evaluate that question below.

Table 9 Moran’s I calculated for household marketing decisions

                         Pooled                2010                  2007                  2004
variable                 statistic  p-value    statistic  p-value    statistic  p-value    statistic  p-value
output marketing
  HCI                    0.041      0.000***   0.320      0.000***   0.267      0.000***   0.229      0.000***
  Log value crop sales   0.032      0.000***   0.215      0.000***   0.202      0.000***   0.203      0.000***
input marketing
  Log fertilizer (kg)    0.031      0.000***   0.222      0.000***   0.287      0.000***   0.306      0.000***

Note: test calculated using inverse distance weights with neighborhood thresholds defined at 20km.
* p<0.10, ** p<0.05, *** p<0.01

2.6.2 Determinants of market participation

The previous section (2.6.1) characterized patterns in marketing outcomes over time and across space. That analysis provides some indications of the relationship between access conditions and marketing, but does not control for important household level and other factors. I now turn to the results of the econometric analysis of marketing behavior, in which such factors are explicitly controlled for.

2.6.2.1 Reduced form estimation results

Reduced form estimates of access indicators reported by households are presented in Table 10. The mobile ownership equation is estimated with a Probit-CRE estimator. The fertilizer distance equation is estimated with a linear CRE estimator. Only the estimated coefficients for IVs are reported in the table below; full estimation results are provided in the Appendix. In both model results, the IVs are highly significant and have expected signs. As previously indicated, the instruments for mobile ownership are the age and squared age of the household member who is next in age to the household head, interacted with a village-level indicator of whether or not anyone has a mobile phone in that panel round.
Both IVs are highly significant and overidentification tests reject the null of weak instruments. The coefficient on the age instrument is positive and highly significant. Distance to fertilizer retailer is instrumented by distance to veterinary services. This instrument is also positive and highly significant.

For the other two time-changing access indicators of interest (distance to extension services and to the nearest tarmac road) it was not possible to find valid instruments. However, given the lower levels of change in these indicators over the study period, I submit that endogeneity is less of a potential problem. The basic reason is that, if the principal channel of endogeneity is through omitted variables related to investment targeting, then such endogeneity is most likely to be present in the time-invariant portion. This issue is mitigated by the use of the CRE estimator, which addresses time-invariant unobserved heterogeneity.

Table 10 Reduced form estimates of access indicators
nextage*1[vil-mobile] nextage^2*1[vil-mobile] (1) mobile ownership coeff./p-value 0.059*** (0.000) -0.001*** (2) km to nearest fertilizer retailer coeff./p-value (0.000) km to piped water km to health center km to veterinary services 0.538*** (0.000) -0.016*** (0.001) 0.000*** population density population density^2 R-squared Pseudo R-squared N Estimator 0.465 4931 Probit (0.000) 0.519 4931 OLS
Notes: nextage is the age of the next-youngest household member to the head. 1[vil-mobile] is an indicator variable equaling 1 if any village respondents possessed a mobile in the survey year. Additional regressors not reported in this table include household characteristics for model (1), village level aggregates of household characteristics for model (2), geographical characteristics and time fixed effects for all models. Complete specification results are provided in the Appendix.
* p<0.10, ** p<0.05, *** p<0.01

2.6.2.2 Output marketing

In this section, I present the main estimation results for the three marketing outcomes of interest.[18] First, I show estimation results for the marketed share of crop production (HCI) in Table 11. There are four specifications shown in this table: (1) a (non-spatial) linear CRE model, (2) a (non-spatial) Tobit CRE model, (3) a linear CRE model with spatially autocorrelated errors, and (4) a linear CRE model with a spatial lag of the dependent variable. The general logic of this table, and the discussion below, is to first establish that linear models perform well even though the outcome variables are nominally corner-solution problems, and then to compare the linear non-spatial models with linear spatial model specifications. All specifications use the Chamberlain-Mundlak device to control for unobserved heterogeneity and are estimated with the full random effects error structure.[19] Furthermore, control function residuals are used in all models both to test for and to control for endogeneity of mobile ownership.

The linear CRE estimates (column 1) are very close to the unconditional average partial effects (APE) from the Tobit-CRE model (column 2). This indicates that the linear model works well as a general measure of average impacts, as we might expect given the low number of households marketing either 0% or 100%.

[18] In related work, I have also explored crop-specific market participation (e.g. for maize) as well as milk marketing. These results are not reported or discussed here because the specifications for those models are different and I was not able to implement spatial models in the same way.

[19] The Chamberlain-Mundlak device may also be used with pooled cross sections, an approach which generally yields more robust but less efficient estimates than estimation which uses the full random effects structural assumptions.
Because the “pooled CRE” estimates differ little from the results I show here, I do not report them.

Mobile phones have a significantly positive impact on marketing outcomes. Mobile ownership translates to an expected increase in HCI of 8-13 percentage points. (At the sample mean value of HCI, 43%, this is an upward shift of about 20-30%.) The control function residual for mobile ownership is highly statistically significant, supporting the supposition of endogeneity. Interestingly, the coefficient estimate for the CF residual is negative. In other words, the endogenous portion of mobile ownership (presumably having to do with wealth and related unobservables) is associated with less output market engagement. Although not what I expected a priori, this result is plausible: although marketing households are generally wealthier than subsistence households, the wealthiest rural households are also more likely to be engaged in non-farm income and, consequently, rely less on generating cash income from marketed farm production.

The impacts of short term changes in access to extension services (km extension) are not statistically different from zero. However, the impact of long-term levels of access to extension services (avg: km extn., which is measured as the time-average of this variable over all the panel rounds) is highly significant. A reduction in distance by 1km results in a 0.32-0.44 percentage point increase in expected HCI.

The impacts of short term changes in access to all-weather roads (km tarmac) are not significant. The impacts of long-term levels of access to tarmac roads (avg: km tarmac) are also not significant. On the face of it, this stands in contrast with other empirical findings that roads are important to marketing outcomes. However, recall that most empirical studies which do examine the impacts of road distance do not consider other access factors. Because distance to
Because distance to 90 roads is generally positively associated with other access indicators (although not necessarily highly correlated), if such non-road indicators are important but omitted, then their influence may be exerting bias on road coefficient estimates. In this case, however, I am controlling for access to services. (In fact, initial specification testing indicated that this is the case, i.e. when the other access variables are omitted, the apparent importance of roads increases.) Of non-access factors, landholding is an important positive determinant of HCI, significant at th the 99 percentile level. The price of maize is significant and positive. Younger household heads have larger expected HCI values. Other factors are generally not significant. To evaluate the existence of spatial dependence, I first calculated Moran’s I statistic on the residuals from the non-spatial linear CRE estimator. The test statistic of 0.101 indicates th moderately positively spatial autocorrelation, significant at the 99 percentile level. As noted earlier, failure to control for this spatial structure has implications for inference. If the spatial autocorrelation is restricted to the error terms, then how we handle it reduces to an efficiency argument: we lose efficiency by failing to correctly specify this structure, but coefficient estimates are still consistent. However, if the spatial autocorrelation we observe in the nonspatial residuals is really related to a structural interaction in the decision-making outcomes of neighboring households – i.e. the process of interest is characterized by endogenous dependence -- then ordinary panel estimators will not only be inefficient, they will be inconsistent for all covariates in the model. 91 To evaluate possible alternative spatial models, a common point of departure is to test nonspatial models for the likelihood of a spatial lag or spatial error term. 
The LM and robust LM tests developed by Anselin (1988, 2001) use the residuals from non-spatial maximum likelihood estimates to evaluate a restricted likelihood function for the spatial lag and error models where the spatial parameters in each are set to zero. The null hypothesis is no spatial structure. (The robust LM tests are called robust because the existence of dependence of one type does not bias detection of the other type.) Both standard and robust LM tests indicate strong support for spatial structure of one or both types. However, because test results do not reject either specification, I estimate both SEM and SAR specifications.

Column (3) shows estimation results for the spatial error model (linear-SEM-CRE) and column (4) shows results for the spatial lag model (linear-SAR-CRE). Spatial parameters for both models (lambda and rho, respectively) are highly significant and similar in magnitude. The coefficient on the spatial error term, denoted as 𝜆, is quite large (0.60), indicating pronounced positive spatial autocorrelation. Similarly, the coefficient on the spatial lag, denoted as 𝜌, is highly significant and large. The estimated value of 0.56 indicates non-trivial positive spillovers, the effects of which we observe in the total impacts. Robust Wald tests strongly reject the null of 𝜌 = 0 and 𝜆 = 0. The similarity of spatial parameter estimates indicates that the estimated magnitude of spatial dependence does not change much under alternative assumptions of spatial structure (lag versus error).

In the SEM model, mobile phones have a significantly positive impact on marketing outcomes. Mobile ownership translates to an expected increase in HCI of 8.3 percentage points. As with the non-spatial models, the control function residual for mobile ownership is statistically significant, confirming our suspicion of endogeneity. The impacts of long-term levels of access to extension services (avg: km extn.)
are significant in the SEM specification, although the impacts of short term changes (km extension) are not significant. As in the non-spatial models, the impacts of long-term states and short term changes in access to all-weather roads are not significant. Estimates for the non-access factors are generally very similar to the non-spatial model results. The main difference is that the household head’s age is not significant in the SEM specification.

In the SAR model, mobile phones have a significantly positive impact on marketing outcomes and the magnitude of the impacts is much larger than in the other specifications. Recall that the SAR coefficients cannot be directly interpreted as partial effects. As mentioned earlier, we may calculate partial effects in models with a spatial lag of the dependent variable in terms of average direct impacts, average indirect impacts and average total impacts. The direct, indirect and total effects of mobile ownership in the SAR model are shown in Table 12. The average direct impacts, which are analogous to the partial effects in non-lag models, are of about the same magnitude as coefficients in the non-spatial and spatial error models. The indirect impacts, which measure the average impact of a one unit change in the covariate observed at neighboring locations, are of the same sign as the direct impacts, and of somewhat larger magnitude. (Note that direct and indirect impacts do not necessarily have to have the same sign, although here they do, which is consistent with my conceptual model.) The average total impacts comprise both direct and indirect impacts, again averaged across the sample. The cumulative effect is larger and more significant than the corresponding partial effects from the non-spatial and spatial error models. After adding in the indirect effect occurring through the spatial lag, the total impact of mobile phones on HCI jumps to 23 percentage points.
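The decomposition into direct, indirect and total impacts can be sketched numerically. The weights matrix below is a hypothetical ring of ten households, not the matrix used in estimation, and the function name is illustrative; only the values of the mobile coefficient and rho are taken from the SAR results reported for HCI.

```python
import numpy as np

def sar_impacts(beta, rho, w):
    """Average direct, indirect and total impacts of one covariate in a SAR model."""
    n = w.shape[0]
    # Matrix of partial derivatives of y with respect to x: beta * (I - rho*W)^(-1)
    s = beta * np.linalg.inv(np.eye(n) - rho * w)
    direct = np.trace(s) / n   # average own-location effect (diagonal)
    total = s.sum() / n        # average row sum
    return direct, total - direct, total

# Hypothetical row-standardized weights: each household has two equally weighted neighbors.
n = 10
w = np.zeros((n, n))
for i in range(n):
    w[i, (i - 1) % n] = 0.5
    w[i, (i + 1) % n] = 0.5

beta_mobile, rho = 10.1008, 0.566  # reported SAR estimates for HCI
direct, indirect, total = sar_impacts(beta_mobile, rho, w)
print(direct, indirect, total)
```

With row-standardized weights the average total impact reduces to beta / (1 − rho), which for these values is approximately 23.3 and close to the total impact reported in Table 12. The split between direct and indirect impacts depends on the actual weights matrix, so the toy ring is not expected to reproduce those two figures.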
This is a very large increase over the non-spatial and SEM estimates. As with the other specifications, the control function residual for mobile ownership is statistically significant. As in the other specifications, the impacts of long-term levels of access to extension services (avg: km extn.) are significant. Using the total impact estimates to infer marginal effects, a reduction in distance by 1km results in a 1 percentage point increase in expected HCI. This effect is much larger than the estimates from the other specifications. As with the other specifications, after controlling for services and mobile telephones, the impacts of access to all-weather roads are not significant.

The non-access determinants are very similar to the SEM model results. Here again, the major differences from the non-spatial results are in estimates which are hard to interpret in the non-spatial models (i.e. negative coefficients on price of maize and education of household head). Because failure to estimate a spatial lag of the dependent variable in the true data generating process will result in biased estimators, one interpretation of this comparison is that the non-spatial results are biased because of specification error.

In summary of the similarities and differences across specifications, several things stand out. First, access to mobile phones is quite important, as are the long-term access conditions related to extension services. Short term or transitory changes do not appear to be important determinants of HCI outcomes in this sample. Furthermore, after controlling for access to services and mobile phones, the direct impact of roads is reduced. In interpreting these results, it is important to recall that in the presence of measurement error, attenuation bias may be present. This will have the effect of downplaying the importance of transitory changes.
However, it is also important to recall that the period being evaluated is relatively short (10 years) and changes for most indicators have been gradual over this period. Furthermore, with respect to road infrastructure in Kenya, because the national road network was fairly well established at independence, the most important changes taking place in recent years may be in terms of quality, rather than in terms of extending the tarmac network. Be that as it may, the importance of rural services and non-road infrastructure as determinants of marketing outcomes is a clear story.

A second major observation is that the spatial models appear to fit the data better than non-spatial models. The spatial parameters are large in magnitude and highly significant. The coefficient estimates from the SAR and SEM models do not differ very much from one another and, for the most part, differ only slightly from the non-spatial estimates. (This latter fact, incidentally, suggests that any bias brought about by failing to specify a spatial lag is relatively small.) However, the presence of an endogenous spatial lag implies a more complex calculation of impacts, which generally means that the importance of access variables is magnified through the endogenous spillover.

So how may we determine which spatial model to prefer? To choose between SAR and SEM specifications within a ML framework, Anselin (1998) suggests using the Akaike information criterion.[20] The Akaike and Bayesian information criteria are both reported at the bottom of the table.[21] Information criteria values are slightly lower for the SEM model than the SAR model, suggesting that the spatial error specification gives a slightly better overall goodness of fit. However, these differences are not large.
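The information criteria comparison can be checked with a few lines of arithmetic. The log likelihoods and sample size below are those reported at the bottom of Table 11; the parameter count k = 27 is an assumption backed out from the reported AIC and log likelihood values (AIC + 2 lnL = 2k), since it is not stated directly.

```python
import math

def aic(loglik, k):
    """Akaike information criterion: 2k - 2 lnL."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion: k ln(n) - 2 lnL."""
    return k * math.log(n) - 2 * loglik

N_OBS = 4520
K_PARAMS = 27  # assumption: inferred from the reported AIC and log likelihood

loglik = {"SEM": -20666.1, "SAR": -20684.8}
for name, ll in loglik.items():
    print(name, round(aic(ll, K_PARAMS), 1), round(bic(ll, K_PARAMS, N_OBS), 1))
```

Lower values indicate better fit after penalizing the number of parameters. Because both spatial models are taken here to have the same number of parameters, the ranking is driven entirely by the log likelihoods.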
Because the major difference occurs through the indirect and total impacts, the main implication of preferring the SEM when the SAR is the true specification is a downward biased assessment of the average impacts of the determinants of interest.

[20] Following the model comparison procedures outlined by Elhorst (2010), after rejecting the null of no spatial structure in the models, I estimated a spatial Durbin model and tested for whether this model could be reduced to either a spatial lag or spatial error model, as those models nest within the spatial Durbin structure. The spatial Durbin specification was rejected, but I could not reject the null of collapsing to a spatial lag or to a spatial error model. I then evaluated the SARAR model, which combines both lag and error terms under a more general structure. Under the SARAR specification, the parameters for both rho and lambda were highly significant, but the sign of lambda was negative, which is difficult to interpret because it suggests that after controlling for positive endogenous spatial lags, the innovations are negatively associated with neighboring innovations. For this reason, I show results for both the spatial lag and error models but not for the more complex models which combine both terms.

[21] I also report the log likelihood, which is related to the information criterion; the positive value of the log likelihood (and the negative AIC and BIC values) for the HCI model may seem strange, but is not an uncommon outcome in ML estimation when the density distribution of likelihoods has a very narrow dispersion. See Canette (2011) for an informal discussion of this.

Table 11 Determinants of marketed share of production (HCI)

                  (1)                   (2)                   (3)                   (4)
                  linear CRE            APEs from Tobit CRE   SEM CRE               SAR CRE
                  coeff. (p-value)      coeff. (p-value)      coeff. (p-value)      coeff. (p-value)
Access variables
mobile            11.14 (0.000***)      11.05 (0.000***)      8.257 (0.015**)       10.10 (0.006***)
km extension      0.148 (0.160)         0.142 (0.331)         0.102 (0.368)         0.122 (0.192)
avg: km extn.     -0.424 (0.004***)     -0.318 (0.021**)      -0.365 (0.057*)       -0.435 (0.071*)
km tarmac         -0.0704 (0.582)       -0.0443 (0.717)       0.0185 (0.891)        -0.0161 (0.908)
avg: km tarmac    -0.0753 (0.562)       -0.121 (0.394)        0.0453 (0.794)        -0.00850 (0.964)
CF res: mobile    -6.321 (0.000***)     -6.302 (0.000***)     -4.629 (0.023**)      -5.818 (0.009***)
Other factors
farm size         1.788 (0.000***)      1.747 (0.000***)      1.725 (0.000***)      1.725 (0.000***)
adult equiv.      0.153 (0.489)         0.220 (0.146)         0.276 (0.250)         0.196 (0.205)
female            0.442 (0.682)         0.677 (0.486)         0.622 (0.592)         0.711 (0.509)
age of head       -0.0839 (0.005***)    -0.0843 (0.010**)     -0.0541 (0.129)       -0.0297 (0.585)
education         -0.122 (0.302)        -0.124 (0.206)        0.0675 (0.649)        0.0578 (0.806)
assets            -0.00191 (0.513)      -0.00183 (0.515)      0.00269 (0.481)       0.000833 (0.817)
maize price       0.0419 (0.000***)     0.0412 (0.000***)     0.0414 (0.047**)      0.0198 (0.175)
rainfall          -0.00290 (0.261)      -0.00176 (0.422)      -0.00102 (0.880)      0.00228 (0.689)
Spatial
lambda                                                        0.602 (0.000***)
rho                                                                                 0.566 (0.000***)

Log likelihood    -20699.8              -19753.2              -20666.1              -20684.8
AIC               41453.6               39560.4               41386.2               41423.5
BIC               41626.8               39733.6               41559.5               41596.8
N                 4520                  4520                  4520                  4520

Note: Dependent variable HCI is the marketed share of output value, measured as a ratio with values in the range [0,100]. CRE controls and year dummies not shown. Tobit model in (2) has lower bound of zero and upper bound of 100.
* p<0.10, ** p<0.05, *** p<0.01

Table 12 Impacts on HCI in spatial autoregressive (SAR) model

                          mobile        km extension   avg: km extn.   km tarmac   avg: km tarmac
coefficient               10.1008 ***   0.1220         -0.4351 **      -0.0161     -0.0085
average direct impact     10.5487       0.1274         -0.4544         -0.0168     -0.0089
p-value                   0.0011        0.2653         0.0105          0.9010      0.9520
average indirect impact   12.7171       0.1536         -0.5478         -0.0203     -0.0107
average total impact      23.2658       0.2810         -1.0023         -0.0371     -0.0196

Note: Coefficients are the same as in model (4) in Table 11. Coefficients and impacts are only reported for the access variables of interest.
* p<0.10, ** p<0.05, *** p<0.01

I now turn to the second measure of output marketing: the log value of crops sold. The estimates in Table 13 are arranged as in Table 11 (Determinants of HCI). The regressors are identical; only the dependent variable is different from the previous output.

The linear CRE estimates (column 1) do a fair job at approximating the unconditional average partial effects (APE) from the Tobit-CRE model (column 2). This indicates that the linear model is a satisfactory general measure of average impacts. This is not surprising, given the relatively low number of households reporting zero sales value.

In all specifications, mobile phones have a positive and significant impact on marketing outcomes. The CF residual is also significant in all specifications, indicating that endogeneity is a valid concern. As with the determinants of HCI, the direction of this bias is negative.

Of the distance measures, none of the transitory changes are significant in any specification. Of the persistent changes, distance to extension services is not significant, although access to all-weather roads is. This difference from the HCI results is interesting. It suggests that while access to extension (and possibly other services) is the most relevant determinant of smallholder market orientation, proximity to roads is more important for marketed volume.

Of non-access factors, results are very similar to the HCI estimates.
Landholding, household size, maize wholesale market price and rainfall are the most important positive determinants.

To evaluate the existence of spatial dependence, as before, I first calculated Moran’s I statistic on the residuals from the non-spatial linear CRE estimator. The test statistic of 0.08 indicates moderately positive spatial autocorrelation, significant at the 99th percentile level. Standard and robust LM tests indicate strong support for spatial structure of one or both types. However, as with HCI, test results do not reject either specification, and so I estimate both SEM and SAR specifications separately.

Column (3) shows estimation results for the spatial error model (linear-SEM-CRE) and column (4) shows results for the spatial lag model (linear-SAR-CRE). Spatial parameters for both models (lambda and rho, respectively) are highly significant and similar in magnitude. The coefficient on the spatial error term, denoted as 𝜆, is moderately large (0.57), indicating the presence of positive spatial autocorrelation in the innovations. Similarly, the coefficient on the spatial lag, denoted as 𝜌, is highly significant and large. The estimated value of 0.52 indicates important positive endogenous spillovers in marketing outcomes. Robust Wald tests strongly reject the null of 𝜌 = 0 and 𝜆 = 0. The similarity of spatial parameter estimates indicates that the estimated magnitude of spatial dependence does not change much under alternative assumptions of spatial structure (lag versus error).

In the SEM model, mobile phones have a significantly positive impact on marketing outcomes. Mobile ownership is associated with an approximate doubling of the expected value of crops sold. The control function residual for mobile ownership is statistically significant, giving support to the idea that endogeneity is an issue to be concerned with. The impacts of short-term and long-term differences in access to extension are not significant in the SEM.
The impacts of short-term changes in access to roads are also not significant, although sustained changes (avg. km tarmac) approach significance. Estimates for the non-access factors are nearly identical to the non-spatial model results.

In the SAR model, mobile phones have a significantly positive impact on marketing outcomes. The coefficient estimate is about the same magnitude as the non-spatial models and slightly larger than the SEM result. However, when the spatial lag effects are accounted for (shown in Table 14), the resulting total impact of mobile phones nearly doubles. As with the other specifications, the control function residual for mobile ownership is highly statistically significant. None of the distance variables are significant, however. Of the non-access factors, results are almost identical to those from the other specifications.

In summary, we observe that mobile phone access is the most important determinant of crop marketing volume, a result which is consistent across alternative specifications. Distance to tarmac roads is a significant determinant in the non-spatial models but generally not significant in the spatial models. Otherwise, the spatial models give generally similar results to the non-spatial models, but the presence of an endogenous spatial lag implies much larger impacts for the significant determinants in the model.

As before, to choose between SAR and SEM specifications, we may compare Akaike and Bayesian information criteria, which are reported at the bottom of the table. In this case, information criteria values are almost identical for the SEM and SAR models. This suggests that a preference for one or the other may boil down to judgments about the sensibility of coefficient estimates, rather than goodness of fit. Because the SAR and SEM coefficient estimates are so similar, however, we do not find much decision-making traction there. In any
In any 101 case, the major difference between SAR and SEM estimation results through the indirect and total impacts. Therefore, as before, the main implication of preferring the SEM when the SAR is the true specification is a downward biased assessment of the average impacts of the determinants of interest. 102 Table 13 Determinants of log value of marketed crop production (all crops) (1) linear CRE Access variables mobile km extension avg: km extn. km tarmac avg: km tarmac CF res: mobile Other factors farm size adult equiv. female age of head education assets maize price rainfall Spatial lambda (3) SEM CRE (4) SAR CRE coeff./p-value (2) APEs from Tobit CRE coeff./p-value coeff./p-value coeff./p-value 0.908 0.007*** 0.0110 0.395 0.00616 0.724 0.00149 0.908 -0.0272 0.067* -0.515 0.011** 0.956 0.000*** 0.0110 0.438 0.0124 0.392 0.00317 0.837 -0.0309 0.074* -0.546 0.006*** 0.779 0.035** 0.00820 0.506 -0.00560 0.789 0.0185 0.212 -0.0285 0.126 -0.424 0.056* 0.902 0.008*** 0.0105 0.381 -0.0155 0.405 0.00840 0.553 -0.0192 0.214 -0.511 0.012** 0.229 0.000*** 0.0608 0.018** 0.134 0.291 -0.00126 0.717 0.00959 0.504 -0.0000524 0.883 0.00424 0.000*** 0.000580 0.046** 0.235 0.000*** 0.0668 0.004*** 0.155 0.232 -0.00170 0.619 0.00908 0.315 -0.0000572 0.877 0.00440 0.000*** 0.000653 0.062* 0.221 0.000*** 0.0708 0.007*** 0.186 0.142 -0.00246 0.528 0.00889 0.583 0.000436 0.298 0.00429 0.041** 0.00102 0.139 0.219 0.000*** 0.0615 0.019** 0.172 0.177 0.00247 0.524 0.0149 0.351 0.000222 0.601 0.00219 0.017** 0.000861 0.009*** 0.567 0.000*** rho 103 0.526 0.000*** Table 13 (cont’d) (1) linear CRE (2) (3) (4) APEs from Tobit SEM CRE SAR CRE CRE coeff./p-value coeff./p-value coeff./p-value coeff./p-value Log likelihood -10690.7 -10723.0 -10647.2 -10674.2 AIC 21435.4 21500.0 21348.3 21402.5 BIC 21608.6 21673.3 21521.5 21575.7 N 4520 4520 4520 4520 Note: Dependent variable is the log of marketed value of crop production. CRE controls and year dummies not shown. 
Tobit model in (2) has lower bound of zero. * p<0.10, ** p<0.05, *** p<0.01 Table 14 Impacts on log value sold in spatial autoregressive (SAR) model mobile km extension avg: km extn. km tarmac avg: km tarmac coefficient 0.9022 *** 0.0105 -0.0155 0.0084 -0.0192 p-value 0.0078 0.3807 0.4050 0.5532 0.2144 average direct impact 0.9349 0.0109 -0.0161 0.0087 -0.0199 average indirect impact 0.9671 0.0113 -0.0166 0.0090 -0.0206 average total impact 1.9019 0.0222 -0.0327 0.0177 -0.0405 Note: Coefficients are the same as in model (4) in Table 13. Coefficients and impacts are only reported for the access variables of interest. * p<0.10, ** p<0.05, *** p<0.01 104 2.6.2.3 Input marketing I now turn to the determinants of input market participation, where the dependent variable is the log amount of purchased inorganic fertilizer. Estimation results are shown in Table 15, below. The arrangement of the table is similar to the output marketing results shown previously: the first two columns show linear CRE and Tobit-CRE estimates, followed by SEM and SAR linear panel model estimates. Here, the access variables of interest are: mobile ownership, distance to fertilizer/input dealers, and distance to roads. Mobile phones are not significant determinants of fertilizer purchase in either of the nonspatial models. Estimates are shown using the CF residual, which is mostly not significant; however, results change little when this term is excluded from the estimating equation. Of the distance variables, the distance to fertilizer retailer is highly significant, both for transitory changes and sustained differences. In the linear model, a 1km short-term reduction in distance results in an expected increase of 1.3% in purchased fertilizer; a sustained 1km reduction in distance results in an expected increase of 12%. For the Tobit-CRE, these effects are even larger: a short-term and sustained reductions of 1km in result in expected increases of 3% and 19%. 
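The percentage interpretations above follow from the log form of the dependent variable. As a minimal sketch, assuming a plain natural-log outcome (the exact transformation used to handle zero purchases is not restated here), a coefficient b on a regressor measured in levels implies an exact change of 100 × (exp(b × Δ) − 1) percent for a change Δ in the regressor, which is close to 100 × b × Δ when b is small. The function name is illustrative; the coefficients are the linear CRE estimates for distance to fertilizer retailer.

```python
import math

def pct_change(coef, delta=1.0):
    """Exact percent change in the outcome level for a change `delta` in the regressor."""
    return 100.0 * (math.exp(coef * delta) - 1.0)

# Linear CRE coefficients on distance to fertilizer retailer.
coef_short_term = -0.0135   # km fertilizer
coef_sustained = -0.122     # avg: km fertilizer

# A 1 km reduction in distance corresponds to delta = -1.
print(round(pct_change(coef_short_term, -1.0), 2))
print(round(pct_change(coef_sustained, -1.0), 2))
```

For small coefficients the exact and approximate figures nearly coincide, which is why reading the coefficient directly as a percentage is a reasonable shorthand; the gap widens for larger coefficients such as those on the mobile ownership dummy in the log value models.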
The time-average of distance to tarmac is inversely related to fertilizer purchase in both nonspatial models. This is consistent with empirical support for the importance of roads in rural marketing outcomes. These results underscore the importance of multiple access factors and the relative importance of long-term/sustained access conditions (versus transitory changes).

Unlike the output marketing results, here we do not find strong support for spatial model specifications. Estimates of the spatial error parameter (lambda) and the spatial lag parameter (rho) are not statistically different from zero. For this reason, I do not comment further on the SEM or SAR model results here.

Table 15 Determinants of log fertilizer purchase amount

                      (1) linear CRE          (2) APEs from Tobit CRE   (3) SEM CRE             (4) SAR CRE
                      coeff. / p-value        coeff. / p-value          coeff. / p-value        coeff. / p-value
Access variables
mobile                0.260 / 0.174           0.0700 / 0.732            0.672 / 0.005***        0.664 / 0.006***
km fertilizer         -0.0135 / 0.052*        -0.0307 / 0.000***        -0.0144 / 0.099*        -0.0144 / 0.097*
avg: km fertilizer    -0.122 / 0.000***       -0.188 / 0.000***         -0.121 / 0.000***       -0.122 / 0.000***
km tarmac             0.00887 / 0.160         0.0102 / 0.258            0.00828 / 0.405         0.00821 / 0.409
avg: km tarmac        -0.0523 / 0.000***      -0.0871 / 0.000***        -0.0514 / 0.000***      -0.0512 / 0.000***
CF res: mobile        -0.0763 / 0.515         0.0533 / 0.684            -0.346 / 0.017**        -0.342 / 0.018**
Other factors
farm size             0.116 / 0.000***        0.117 / 0.000***          0.116 / 0.000***        0.116 / 0.000***
adult equiv.          0.0240 / 0.065*         0.0272 / 0.058*           0.0211 / 0.251          0.0209 / 0.255
female                0.0379 / 0.571          0.0246 / 0.813            0.0862 / 0.331          0.0879 / 0.321
age of head           0.0101 / 0.000***       0.00823 / 0.000***        0.00948 / 0.000***      0.00931 / 0.001***
education             0.0330 / 0.000***       0.0410 / 0.000***         0.0217 / 0.054*         0.0212 / 0.060*
assets                0.000141 / 0.431        0.000140 / 0.366          0.000182 / 0.543        0.000183 / 0.540
DAP price             -0.00867 / 0.326        -0.0113 / 0.074*          -0.00863 / 0.557        -0.00864 / 0.557
maize price           0.00103 / 0.011**       0.000914 / 0.020**        0.000931 / 0.149        0.000932 / 0.148
rainfall              -0.000614 / 0.000***    -0.000554 / 0.015**       -0.000336 / 0.161       -0.000337 / 0.159
Spatial
lambda                                                                  -0.00304 / 0.930
rho                                                                                             -0.0377 / 0.182
Log likelihood        -8196.2                 -7773.7                   -9100.0                 -9099.1
AIC                   16450.5                 15605.4                   18258.0                 18256.2
BIC                   16636.8                 15791.5                   18444.3                 18442.5
N                     4560                    4560                      4560                    4560

Note: Dependent variable is the log of purchased inorganic fertilizer. CRE controls and year dummies not shown. Tobit model in (2) has lower bound of zero. * p<0.10, ** p<0.05, *** p<0.01

Interactions between mobile and distance

In the conceptual model, I posited that the importance of information for marketing outcomes was increasing in remoteness. In initial specification testing, I found limited support for this. The most robust measure was a simple interaction term defined as mobile ownership interacted with a dummy variable for villages further than 5 hours from a major urban market. Partial results are shown in the table below for each of the three principal market participation outcomes (HCI, log value sold and log fertilizer purchased amount). The interaction dummy is mobile*[>5 hours].[22] In the output models, the sign is positive and significant, confirming my hypothesis that the positive impacts of mobile ownership on marketing increase with distance from cities. In the input marketing model, the interaction term is not significant.

[22] Hours to city is defined as the estimated travel time to the nearest town of 50,000 or more inhabitants. This calculation was done within a geographical information system.

Table 16 Mobile-distance interactions in market participation models marketed % of output (1) (2) coeff./p-value coeff./p-value Access variables mobile mobile*[>5 hours] km extension avg: km extn.
km fertilizer log value sold (3) coeff./p-value (4) coeff./p-value (5) coeff./p-value (6) coeff./p-value 0.908 0.007*** 0.0110 0.395 0.00616 0.724 0.722 0.030** 0.458 0.003*** 0.00913 0.395 0.00734 0.760 0.260 0.174 0.148 0.160 -0.424 0.004*** 8.624 0.004*** 6.242 0.000*** 0.123 0.202 -0.407 0.073* 0.242 0.170 0.0425 0.614 -0.0704 0.582 -0.0753 0.562 -6.321 0.000*** 4520 -0.0445 0.696 -0.127 0.383 -6.036 0.001*** 4520 0.00149 0.908 -0.0272 0.067* -0.515 0.011** 4520 0.00339 0.789 -0.0310 0.050** -0.493 0.013** 4520 -0.0135 0.052* -0.122 0.000*** 0.00887 0.160 -0.0523 0.000*** -0.0763 0.515 4560 -0.0135 0.017** -0.122 0.000*** 0.00904 0.162 -0.0526 0.000*** -0.0740 0.484 4560 11.14 0.000*** avg: km fert. km tarmac avg: km tarmac CF res: mobile N log fertilizer

Note: All models are estimated with linear CRE. mobile*[>5 hours] is defined as the interaction between the mobile ownership variable and a dummy variable indicating that the village is more than 5 hours travel time by road from the nearest urban market of 50,000 or more inhabitants. Significance levels denoted by * p<0.10, ** p<0.05, *** p<0.01

2.7 Conclusions

This work has found strong support for the importance of access to infrastructure and services for a range of market participation outcomes. While most other market access studies have focused on static aspects of access and/or indicators which focus exclusively on road infrastructure, my work has emphasized the most rapidly changing aspects of market access in rural Kenya over the last decade: access to mobile phones and rural agricultural services. Household access to mobile telecommunications has large positive impacts on both input and output marketing outcomes. A household with a mobile phone sells a greater share of total crop production, and its marketed output is larger in volume, compared with households without a mobile phone.[23]
There is limited evidence that these positive influences of mobile phones on marketing outcomes increase with remoteness, suggesting that new technologies are changing the structure of transactions costs over physical space. Access to extension services and to roads is also important to output marketing outcomes, although the nature of these impacts depends on how market participation is measured. For marketed share of production (HCI), after controlling for access to extension services, access to tarmac roads does not appear to play a large role in output marketing. However, for marketed volume, extension access is not significant, whereas proximity to tarmac roads is important. This may have to do with different kinds of market engagement strategies responding to different specific access conditions. Further work may explore these relationships in more detail.

[23] In related work not reported here, I have also found that a household with a mobile phone is more likely to engage in high-value marketing than a household without a mobile phone.

For input market participation, measured as volume of inorganic fertilizer purchases, access to fertilizer retailers is an important factor, along with improved access to roads. This finding, together with the HCI results, highlights the importance of non-infrastructural elements of access, which appear to be underemphasized in both research and policy discussions. Such discussions often focus on physical infrastructure, such as roads and electricity, which are important but may take longer to generate benefits and to affect household marketing behavior. The distinction between short-term changes and long-term or sustained differences in access conditions has also yielded some important insights.
Except for access to mobile phones, which has been the most rapidly changing component of Kenya’s rural infrastructural landscape over the last decade, transitory changes in distance to infrastructure and services have not generally been found to be significant (except for fertilizer access for input demand). In contrast, longer-term conditions are much more clearly important, both for road infrastructure and extension services. One possible explanation for this is that measurement error in household-reported distances is resulting in attenuation bias (under which the impacts of short-term changes are not detectably different from zero). Another possibility, however, is that transitory changes in infrastructure over the past decade have simply been insufficient in magnitude to bring about significant impacts. Somewhat related, a third possibility is that even substantive changes in some kinds of access conditions require a significant temporal lag before their impacts are registered.[24] This is probably particularly true for physical infrastructure such as roads.

Estimates of partial effects are significantly influenced by the strategy used to address endogeneity in access conditions. Using a control function approach, I found support for the endogeneity of mobile ownership. The direction of endogenous influence is consistently negative for mobile phones, suggesting that (a) mobile owners are more likely to be engaged in non-farm livelihood strategies and less engaged with agricultural marketing, and (b) failing to control for endogeneity will cause impact estimates for marketing outcomes to be biased downwards.

There is very clear evidence of spatial dependence in output marketing behavior, although not for input marketing. The structural nature of such spatial dependence is not clear, however. I am unable to definitively rule out either a spatial lag or a spatial error structure, although either of these models is much more likely than a non-spatial model.
Increased precision in coefficient estimates is obtained through either model. The magnitude of estimated direct impacts changes little across model specifications, suggesting that any biases resulting from failure to specify an endogenous spatial lag structure are minor. However, the total impacts accruing through the spatial lag model are considerably higher than just the direct impacts. If this structure is correct, then non-spatial assessments of the importance of access to infrastructure and services are considerably underestimated, as an important channel for their effects is through spatial spillovers in smallholder marketing behaviors. Further validation is warranted, but quantifying such spillovers may help to refine the measurement of the impacts of infrastructure investments. Such improvements in impact assessment should lead to better targeting of scarce development resources.

[24] Initial testing of dynamic specifications did not indicate that such lags were observable within this dataset, however.

Although this work has focused on the relationship between market access and market participation, the results have implications for a broader set of issues related to microeconomic models of household behavior in developing countries. This study has provided evidence that smallholder decision making processes are not independent from those of their neighbors. Further analysis is warranted on the exact structure of interdependence within different decision making contexts, but the importance of addressing spatial dependence is clear: failure to do so implies inefficient estimators, at best, and very possibly inconsistent estimates. I anticipate that evidence in support of this conjecture will continue to accumulate, and spatial models of household decision making will become increasingly mainstream in agricultural economics. This work has also indicated that endogeneity concerns are warranted for many important access indicators.
I have used a control function approach to address this endogeneity. A likely profitable area of further research would be to expand exploration of endogeneity issues in rural infrastructure and service provision and to explore ways in which survey instruments might be designed to provide better potential instruments.

A major contribution of this work has been to demonstrate the existence of spatial dependence in household-level marketing outcomes observable in rural survey data. The issue of spatial dependence goes far beyond the specific market participation questions I have been concerned with in this essay. If dependence takes the form of an endogenous spatial lag (as it seems to do in this case study), then non-spatial econometric estimates will be inconsistent. Since this study used household data that are representative in many ways of survey data used throughout the developing world, this is potentially a very broadly applicable finding. This argues for more explicit testing in a broader range of case studies to confirm the extent to which undetected spatial dependence may be an issue. In the present case study, the adoption of a spatial lag model generally implies larger estimates of the impacts of infrastructure investments. These larger estimates are attributable to the additive direct and indirect effects of spatially lagged impacts. Although the evidence for spatial dependence in my household models is strong, whether this dependence is best modeled as an endogenous spatial lag or simply as spatially autocorrelated errors is still uncertain. Model diagnostics suggest that a spatial lag structure is a slightly better fit, but how robust or generalizable this conclusion is remains unclear. More spatially-explicit empirical analysis is warranted, in Kenya and elsewhere. Georeferenced household surveys with good spatial coverage will enhance this.
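To make the distinction between direct, indirect and total impacts in a spatial lag model concrete, the sketch below computes the three summary measures for a single covariate. It is an illustration under stated assumptions rather than the procedure used for Table 14: the four-household ring weights matrix and the value rho = 0.5 are hypothetical, all function names are invented for this example, and the spatial multiplier (I − rho·W)^(-1) is approximated by its power series I + rho·W + rho²·W² + …, which converges when W is row-normalized and |rho| < 1. Only the coefficient 0.9022 (mobile, Table 14) is taken from the text.

```python
def matmul(a, b):
    """Plain-Python matrix product of two square lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def ring_weights(n):
    """Hypothetical row-normalized weights: two neighbours on a ring."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        w[i][(i - 1) % n] += 0.5
        w[i][(i + 1) % n] += 0.5
    return w

def sar_multiplier(w, rho, terms=200):
    """Approximate (I - rho*W)^-1 by I + rho*W + rho^2*W^2 + ..."""
    n = len(w)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    power = [row[:] for row in total]
    for _ in range(terms):
        power = [[rho * v for v in row] for row in matmul(power, w)]
        total = [[total[i][j] + power[i][j] for j in range(n)]
                 for i in range(n)]
    return total

def sar_impacts(beta, rho, w):
    """Average direct, indirect and total impacts of one covariate."""
    s = sar_multiplier(w, rho)
    n = len(w)
    direct = beta * sum(s[i][i] for i in range(n)) / n   # own-unit effects
    total = beta * sum(sum(row) for row in s) / n        # own + spillover
    return direct, total - direct, total

beta_mobile = 0.9022      # SAR coefficient on mobile (Table 14)
rho_assumed = 0.5         # hypothetical spatial lag parameter
direct, indirect, total = sar_impacts(beta_mobile, rho_assumed, ring_weights(4))
print(f"direct {direct:.4f}, indirect {indirect:.4f}, total {total:.4f}")
```

With a row-normalized weights matrix the average total impact reduces to beta / (1 − rho), so a positive spatial lag parameter mechanically raises total impacts above the coefficient itself, which is the pattern described above.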
A related recommendation for household survey design is to use spatial sampling frameworks that sample along gradients of key geographical factors such as population density and remoteness (perhaps defined by a coarse measure such as hours of travel to a large urban center). Geographic information systems can help enable the design and implementation of such sampling frameworks.

A final point: telecommunications, roads and the provision of inputs and supporting services very likely have synergistic effects that go far beyond what I have been able to document directly in this study. Rural town development that emphasizes a range of services and infrastructure complementarities will likely have the largest impacts on the effective market access conditions perceived in a given area. In order to better assess how such complementarities work, we need better and more frequently collected information on local service provision and other quickly changing (or potentially quickly changing) aspects of marketing environments.

3 Population density, remoteness and farm size: exploring the paradox of small farms amidst land abundance in Zambia

3.1 Motivation

Although some areas of sub-Saharan Africa are characterized by dense rural populations (e.g. parts of the East African highlands), arable land in the region is generally considered to be an abundant resource (e.g. UNDP 2012). Despite this apparent abundance, average farm sizes remain very small, are inversely related to population density, and appear to be declining over time as rural population densities rise. Rural landlessness appears to be rising in some parts of the region. Recent evidence from household surveys in the region suggests several pervasive features of the smallholder sector. First, over the past 50 years there has been a gradual but steady decline in mean farm size as rural population growth has outstripped the growth in arable land (Figures 13 and 14).
Second, half or more of Africa’s smallholder farms are below 1.5 hectares in size with limited potential for area expansion (Jayne et al. 2003). Most farmers in this bottom 50% produce very little agricultural surplus and make very little use of productivity-enhancing inputs. Third, a high proportion of farmers perceive that it is not possible for them to acquire more land through customary land allocation procedures, even in areas where a significant portion of land appears to be unutilized (Stambuli, 2002; Jayne et al., 2008). Fourth, in some areas such as Kenya, roughly a quarter of young men and women start their families without inheriting any land from their parents, forcing them to either commit themselves to off-farm employment or buy land from an increasingly active land sales market (Yamano et al., 2009). These trends suggest pervasive constraints to land expansion by small farmers, especially in high density areas (Jayne et al. 2012). Moreover, these observations strongly suggest the need for a critical re-assessment of the role of land access in African smallholder development processes, particularly in contexts of dense rural populations with high population growth rates.

What about countries with relatively low population densities? Are land access constraints also important in such contexts? I address this question by examining the case of Zambia, a country with one of the lowest overall rural population densities in the region. The conventional narrative of smallholder land constraints in sub-Saharan Africa emphasizes growing rural populations and finite land resources, usually expressed in terms of high-and-rising rural population densities. In Zambia, a land abundant country, access to arable land is generally not seen as a major constraint. Instead, the predominance of very small farm sizes is often ascribed to limited access to the labor and traction resources required for area expansion (Siegel 2008).
This explanation, however, is not consistent with several important empirical observations. First, we observe farm sizes which are, on average, much smaller than the limits imposed by technology/labor constraints alone (Table 17). As a rule of thumb, a typical household can cultivate up to about 2 hectares under family labor alone and up to about 4 hectares with animal traction (Chapoto et al. 2012). For Zambian smallholders (defined as cultivating 10 hectares or less), 70% cultivate less than 2 hectares and almost 40% cultivate 1 hectare or less. Second, farm sizes are shrinking over time and rural populations are increasing (Figures 13 and 14). This pattern is very characteristic of the high-and-rising labor-land ratio narrative, most frequently told for countries like Malawi and Rwanda. It is surprising to observe it in a country with some of the lowest rural densities in the region. Third, small farmers do report constrained access to land, even in relatively low-density areas (Table 18). Although perceptions of land access constraints do appear to increase with rural population density, the fact that farmers report land access constraints even in low-density areas suggests that there may be additional institutional issues at play.

Figure 13 Rural population growth rates, 1960-2010
Source: World Development Indicators database.

Figure 14 Declining arable land per capita, 1960-2010
Source: World Development Indicators database. Notes: Per capita calculation based on rural population, defined as the difference between the total population and the urban population. Estimates of national urban population shares are from the United Nations Population Division’s World Urbanization Prospects: The 2009 Revision. Data on national total population are World Bank estimates, compiled and produced by the Development Data Group in consultation with its Human Development Network, operational staff, and country offices.
Data on land area, permanent cropland, and arable land are from the FAO, which gathers these data from national agencies through annual questionnaires and by analyzing the results of national agricultural censuses. Arable land “includes land defined by the FAO as land under temporary crops (double-cropped areas are counted once), temporary meadows for mowing or for pasture, land under market or kitchen gardens, and land temporarily fallow.” (WB 2012: p 141).

Table 17: Landholdings, rural population densities and survey coverage

                 Median landholding size (ha)   Average rural population     Number of   Number of
                                                density in survey villages   villages    households
Province         2001     2004     2008         (persons/km2)                surveyed    surveyed
Central          2.0      1.8      1.6          10.10                        39          465
Copperbelt       1.7      1.3      1.3          16.09                        24          247
Eastern          1.6      1.7      1.7          31.45                        72          911
Luapula          1.8      1.8      1.8          32.98                        46          372
Lusaka           1.0      1.2      0.8          5.22                         13          110
Northern         2.4      1.9      1.4          12.45                        80          776
North Western    1.1      1.3      1.3          6.11                         28          250
Southern         2.0      1.9      2.4          24.06                        49          542
Western          1.0      1.4      1.2          8.56                         41          426
National         1.7      1.6      1.5          19.03                        392         11,715

Notes: Median landholding size is defined for smallholders cultivating 10 hectares or less who appear in all rounds of the CSO/MACO/FSRP supplemental surveys (2001, 2004, and 2008). Data on rural population density come from the AfriPop dataset for 2000. The last two columns refer to number of villages and households surveyed in all three panel periods of the supplemental survey. Both datasets are described further in the data section.
Table 18: Landholding size and perceptions of local land availability

                                                  Percentage of respondents reporting that…
Rural population     Average landholding          no unallocated land is     unallocated land is available,
density              size (ha)                    available in this area     but it is not available to
(persons/km2)                                                                this household
< 25                 2.22                         54%                        9%
25 - 50              2.91                         76%                        12%
50 - 100             2.46                         72%                        14%
> 100                1.58                         84%                        20%
National             2.34                         59%                        9%

Notes: Data reported in this table come from the 2008 round of the CSO/MACO/FSRP supplemental survey. The last column indicates the percentage of respondents who have reported that land is available locally, but also report that the unallocated land is not accessible to their household.

Taken together, these facts suggest that access to land in Zambia is more complex than simple explanations might suggest. But how problematic is this? How important is land for farm production and household income growth? Alternatively, if land availability is constrained beyond what rural population density metrics would suggest, is there evidence of induced innovation, i.e. intensification induced by relative land scarcity? Agricultural intensification appears to be taking place at only a modest rate in Zambia and productivity remains low (Chapoto et al. 2012). The predominant theoretical perspectives on rural development suggest that decreasing land-labor ratios should encourage intensification pathways (Boserup 1965, Hayami and Ruttan 1970, Pingali and Binswanger 1987, Binswanger and McIntire 1987). As urban populations also expand, rising demand will be reflected in higher food prices, further incentivizing intensification of available crop land (Mellor 1973).[25] Although mean densities are low in Zambia, the small holding sizes observed signal relatively high labor-land ratios at the farm level. Within this context, what evidence is there for induced innovation associated with relative land scarcity (whether or not such scarcity is related to observable population densities)?
In any event, there is little doubt that rural populations are growing rapidly: the average annual rural population growth rate for the last decade is 2.32% in Zambia, compared with 1.76% for SSA as a whole (also see Figure 13).[26] Furthermore, land acquisitions by non-local investors are on the rise: the so-called “land grab” by foreign investors and speculative land market participation by national elites are increasingly documented features of Zambia’s rural landscape (German et al. 2011). These trends can only exacerbate any existing constraints to arable land acquisition.

[25] Higher food prices will also provide net buyers with incentives to intensify, to avoid the higher costs of relying on the market.
[26] Growth rates are averaged over 2000-2010. Data come from the World Development Indicators database.

This mystery of small farms amidst land abundance in Zambia prompts the following research questions:

1. Measurement and interpretation of population density: Are rural population densities a meaningful indicator of the relative factor endowments faced by small farmers in Zambia? Or are effective labor-land ratios much higher than they appear to be from conventional rural population density statistics? Specifically:
   a. Is there a fallacy of statistical aggregation through which high local densities are masked by aggregate level statistics?
   b. Does accounting for land quality make a difference in how we interpret the economic meaning of density?
2. Institutional constraints to area expansion: Related to the above question, we might ask: are there hidden claims on land that are obscured by simple density metrics? Specifically:
   a. What are the key institutions facilitating access to land and how do they vary with population density and access to markets?
3. Technological constraints to area expansion: What technologies are most important for enabling expansion in land abundant areas? Specifically:
   a. What is the role of animal and mechanical traction in household cultivated area outcomes? What are the factors that govern access to these technologies?
4. Cultivated area, farm productivity and welfare: What is the role of farm size in farm management, performance and welfare outcomes? Specifically:
   a. Is intensification associated with small holding sizes? If so, under what conditions?
   b. How important is farm size for productivity, production and/or income generation?

The first three questions attempt to clarify the determinants of land access. The fourth question seeks to clarify the impacts of land access in the smallholder development process. It is important to acknowledge at the outset that this research addresses some of these questions better than others. For example, with the household panel data at my disposal, I am able to observe cultivated area outcomes and technology use. However, I only imperfectly observe institutional factors and therefore conclusions related to such factors are more tenuous.

The rest of this paper is laid out as follows. Section 2 reviews the major features of Zambian smallholder agriculture, rural population distributions and access to land. Section 3 describes the conceptual framework underlying the analysis and Section 4 describes the data used to implement this framework empirically. My estimation strategy is given in Section 5 and results are presented in Section 6. I discuss policy options in Section 7 and offer summary remarks in Section 8.

3.2 Access to land and rural development in Zambia

3.2.1 The paradox of land constraints under land abundance

The population of Zambia is about 13 million, 61% of whom reside in rural areas and earn their incomes primarily from agriculture (ZBS 2011).
On the face of it, this rural majority enjoys a relatively beneficent resource endowment: agricultural potential is good in most parts of the country, with adequate amounts of rainfall (Figure 16), productive soils and moderate terrain suitable for agriculture. Rural population densities are relatively low in most areas of the country (Figure 17) and agricultural land is very widely perceived to be an abundant resource in Zambia. The overwhelming majority of Zambia’s agricultural producers are smallholders, conventionally defined as farming 10 hectares or less. Of these, 50% farm less than 2 hectares and about a quarter have farms of one hectare or less. These very small farm sizes characterize the rural farming population even in the lowest density areas of the country (Jayne et al. 2008). The small farm sector is characterized by low productivity, low levels of market engagement, and high poverty (Chapoto et al. 2012). Cultivation is largely non-mechanized, relying on hand hoes and oxen, and heavily dependent on family labor. There is minimal use of purchased inputs such as hybrid seed or inorganic fertilizer. On most farms, production is largely oriented to meet household consumption needs and most small farm production heavily emphasizes staple crops such as maize, groundnuts, roots and tubers. While low-value, low-input, subsistence-oriented portfolios characterize the majority of the sector, there is also some higher-value production and marketing taking place. In addition to staple food crops, many small farmers produce higher-value cash crops, such as cotton, tobacco and paprika, and small livestock, primarily poultry and pigs for home consumption. In higher density areas close to urban centers there is also greater production and marketing of high-value perishables, like fruits and vegetables, milk and eggs.
For the most part, these market-engaged households are also farming relatively small amounts of land, although households in the 5-20 hectare category (referred to in Zambia as medium-scale farmers) tend to be relatively commercialized in their production and marketing patterns.

This paradox of small farms amidst apparent land abundance has several possible explanations. Many researchers have asserted that land is not the binding constraint to production for Zambian smallholders, but rather access to inputs including, and most critically, labor (Alwang et al. 1996, Wichern et al. 1999, Smith 2004, Ajayi et al. 2012). At the same time, there has been increasing recognition that much of the arable land in the country is effectively off-limits to smallholder expansion (Jayne et al. 2008). About half of Zambia’s total land area of 753,000 square kilometers is arable and nominally available for agriculture (47%); the remainder consists of National Parks and Game Management Areas (30%), lands unsuitable for agriculture (24%) or urban areas (2%; Jayne et al. 2008). Even the land resources classified as arable may not be truly available for agricultural expansion: other “hidden” constraints include lands under the control of mines, schools, health care facilities and other government institutions (Commission on Agriculture and Lands 2009). Furthermore, some land resources, such as some forested areas, may have high opportunity costs of conversion which effectively place these areas out of reach for small farmers.[27]

Jayne et al. (2008) also note that much of the remaining arable land which is theoretically available for smallholder cultivation is remote from markets and supporting infrastructure. Figure 18 shows the estimated travel time to large urban centers. This spatial pattern coincides with the greatest density of urban centers, road networks and other infrastructure (Figures 18 and 19).
There are two primary areas of relatively high access. The first is along the rail line that runs through the center of the country, from Livingstone in Southern province, up through Lusaka and Central provinces, to the urban centers of the Copperbelt province. The second is in densely populated Eastern Province, along the Malawian border. The majority of the rural population is spatially concentrated in these areas of better access (Haggblade et al. 2009).

[27] Spatial data on land cover, terrain and other geographical data are available and it should be possible to address this question. Such an effort, however, is beyond the scope of this essay.

In fact, the distribution of rural population density corresponds much more with access to markets than it does with most measures of agricultural production potential. To illustrate this, the distributions of rural population densities within different categories of market access and length of growing period (LGP) are shown in Table 19. While very low densities are observed in all categories of access, the upper ranges (e.g. 75th and 90th percentile values) are much higher in the most accessible areas than in the most remote areas. Such patterns are not so clearly observed with length of growing period. These general patterns are very robust to alternative measures of access and agricultural potential. The key message here is that accessibility appears to be playing a large role in the distribution of rural populations.
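A tabulation of this kind, percentiles of rural population density within travel-time classes, can be sketched in a few lines. This is only an illustration of the calculation and not the procedure used for Table 19: the (hours, density) records are made-up values, the function names are hypothetical, the percentile rule (linear interpolation between order statistics) is one of several conventions, and the treatment of class boundaries (a value of exactly 2 hours falls in "2-4") is an assumption.

```python
def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values supplied")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def access_class(hours):
    """Assign a travel time to an hours-to-city class (boundary rule assumed)."""
    for upper, label in [(2, "<2"), (4, "2-4"), (6, "4-6"), (8, "6-8")]:
        if hours < upper:
            return label
    return ">8"

def density_percentiles(records, qs=(10, 25, 50, 75, 90)):
    """Group (hours, density) pairs by access class and summarize each group."""
    groups = {}
    for hours, density in records:
        groups.setdefault(access_class(hours), []).append(density)
    return {label: [percentile(vals, q) for q in qs]
            for label, vals in groups.items()}

# Hypothetical (hours to city, persons per square km) observations.
records = [(1.0, 5.0), (1.5, 15.0), (0.5, 25.0),
           (9.0, 2.0), (10.0, 4.0), (12.0, 6.0)]

for label, values in density_percentiles(records).items():
    print(label, [round(v, 1) for v in values])
```

Comparing the upper percentiles across classes, rather than only means or medians, is what brings out the pattern described above: low densities occur everywhere, but high densities are concentrated in the most accessible classes.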
Table 19: Rural population density by categories of market access and agricultural potential

                   percentile
hours to city      10th    25th    50th    75th    90th
<2                 3.9     6.5     14.0    26.9    45.8
2-4                3.7     5.9     9.4     19.1    34.2
4-6                2.4     4.1     7.0     14.5    24.8
6-8                1.2     3.4     6.3     11.5    20.4
>8                 1.0     2.4     4.3     6.6     13.5
Total              1.1     2.8     5.2     9.1     17.3

                   percentile
LGP (days)         10th    25th    50th    75th    90th
120-140            0.6     1.3     3.8     5.8     9.1
140-160            0.7     1.7     4.0     11.3    23.7
160-180            1.1     2.6     4.3     9.2     16.7
180-200            2.0     3.6     6.2     9.0     14.1
200-220            3.0     4.7     5.6     6.6     11.4
Total              1.1     2.8     5.2     9.1     17.3

Note: hours to city is the estimated travel time to the nearest town of 50,000 or more inhabitants. LGP = length of growing period. Population density data are from AfriPop for 2010. Other data sources are described in the data section.

Figure 15: Relationship between rural population density and market access (vertical axis: persons per square km; horizontal axis: hours to city)
Note: hours to city is the estimated travel time to the nearest town of 50,000 or more inhabitants. Population density data are from AfriPop for 2010.

Perhaps not entirely unrelated to remoteness, another dimension of land access is institutional. In fact, it is virtually impossible to talk about smallholder access to land in Zambia without reference to what Sitko (2010) calls the country’s “fractured system of land governance,” comprising State and Customary lands. State lands are held under leasehold tenure (99-year leases, after which control reverts to the State), which means they are formally titled, exchangeable on an open land market, and may be used as collateral in formal credit markets. Zambia’s large scale commercial farms are all on State lands. Smallholder farmers, on the other hand, produce largely under customary tenure. Customary land is allocated under the authority of local chiefs, in accordance with locally-specific traditions and norms.
There is variation in these traditions from place to place but, in general, household landholding outcomes under customary tenure depend on factors such as the prevailing local cultural institutions governing marriage and descent, a household’s social capital and community ties, and the integrity of customary leadership in a particular area.

Figure 16: Average annual rainfall
Note: Data come from the WorldClim database (Hijmans et al. 2005). For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation.

Figure 17: Rural population density
Source: AfriPop, for 2010.

Figure 18: Access to large urban centers
Source: Author’s calculations.

Figure 19: Distribution of urban settlements and road infrastructure
Sources: Population data from Zambia Bureau of Statistics. Road and settlement data from GRUMP database.

3.3 Conceptual model

In this study, I am interested in how environmental conditioning factors such as population density and market access influence cultivated area outcomes at the household level and how these outcomes, in turn, enter into farm production choices and welfare outcomes. My conceptual model is separated into two parts: (i) the role of farm size in input demand and output supply decisions by farm managers; and (ii) the role of farm size in rural household welfare outcomes.

3.3.1 Relative factor endowments and farm management

Boserup’s classic (1965) monograph described a stylized pathway of agricultural development under increasing population density that has become a touchstone for subsequent theory: as rural population density increases, farmers are induced to intensify cultivation, first by shortening fallow periods in shifting cultivation systems, and subsequently by adopting continuously-cultivated annual cycles, eventually adopting multiple cropping cycles per year.
These changes are accompanied by increases in inputs per unit area of cultivated land. This idea was expanded upon through the theory of induced innovation, a framework developed to evaluate the relationships between resource endowments, population growth and technical change (first outlined in Hayami and Ruttan 1971 and subsequently expanded by Binswanger and Ruttan 1978, Hayami and Kikuchi 1981, Ruttan and Hayami 1984, and Hayami and Ruttan 1985). The basic idea is that changes in relative resource endowments (e.g. the labor-land ratio implied by population density) influence technical change by incentivizing the substitution of relatively more abundant factors of production for relatively scarce alternatives. Thus, under rising labor-land ratios, as labor becomes less expensive as a factor of production (relative to land), production systems will tend to become more labor intensive as farm managers substitute labor for land. Similarly, the relative costs of capital inputs will determine the feasibility of alternative capital-intensification strategies.

The foregoing discussion implies that we would generally expect to find land-extensive production strategies pursued in low density areas and intensification strategies pursued in high density areas. There are, however, additional factors that play important roles in determining the relative costs of land use strategies. Here, I highlight two issues in particular: the role of institutions governing land access, and the role of technologies that enable expansion.

Although we would generally expect that rural population density measures are meaningful indicators of relative land scarcity, there are several reasons why this assumption might be questionable. To begin with, this assumption ignores the role of local institutional mechanisms for allocating land under scarcity. Under well-functioning land markets, scarcity (as well as quality, etc.) will be reflected in land rental and sales prices.
Under traditional systems, land may be allocated by other criteria. In no way are these processes guaranteed to distribute resources in ways that are proportional to rural population density. There may be important hidden claims on land in a local area. For example, in an area with low population density and apparently abundant land resources, there may still exist large tracts of land which have claims made on them (legally or otherwise) by agents who may choose not to develop, e.g. land grabs for speculative purposes. Furthermore, as noted in the literature review, although land markets are semi-liberalized in Zambia, the functioning of these markets varies considerably from place to place. Although most smallholder farmland is held under customary tenure, considerable conversion of lands to leasehold tenure has taken place in recent years,[28] and land within a given area may consist of a mix of tenures, with different allocative mechanisms associated with each. The impact of local land resource endowments on household landholding outcomes is likely to be mediated by the prevailing local institutions, which may be market-based or nonmarket-based or some combination of the two. Complicating the picture even further, this institutional heterogeneity may interact with household characteristics: household social capital plays an important role in accessing both non-market and market land allocation mechanisms (Sitko 2010).

A second set of issues has to do with the fact that household land use decisions are conditioned not just by the (actual or implied) cost of land, but also by the costs of using land in productive activities. Even if land itself were costless, there are conditions under which expansion may not be economic beyond a certain threshold. Consider the case of animal traction, which is a key component of extensive cultivation strategies: as the fixed and/or variable costs of accessing traction services increase, such strategies become less viable.
It is likely that the costs of accessing markets for traction services increase with remoteness and decrease with population density (e.g. because of the greater costs of supplying services to thin markets). These remoteness-imposed constraints are likely to be particularly important in a country like Zambia, where so much of the country is remote from markets.

[28] Because comprehensive data on tenure status are not available, the extent of conversion is not clear. However, the rate of conversion is possibly very high. Editorials in the popular press often make assertions such as the following: “Customary land is also being turned into state land so easily and at an alarming rate” (The Post 2012).

Household model

Using these ideas as a basic framework, I define farm level management choices as a function of farm size (i.e. cultivated area) and other factors which are influenced by population density and market access. Starting with a basic model of a profit-maximizing farm household, we may derive standard input demand and output supply decisions as functions of input and output prices, conditioned by other productive assets.[29]

Because local rural populations constitute both supply and demand sources for staple commodities, we would expect that producer prices for such staples are a function of local population density. Assuming that households are net consumers, rising densities represent increased demand with consequent price increases. Similarly, we expect that agricultural wage rates are also responsive to rural population densities. If smallholder households are net suppliers of agricultural labor, then rising densities will be associated with reductions in local wage rates.

[29] Formulations of this model often assume separability of household production and consumption decisions, an assumption which implies well-functioning markets. This assumption may not be tenable in all parts of rural Kenya (Omamo 1998).
In the presence of market failures, production and consumption decisions are non-separable. When this is the case, variables that affect consumption decisions, such as the prices of consumption goods, household wealth, labor availability, and other household characteristics, also affect production decisions. Thus, separable models are embedded within non-separable models. In this study, I assume non-separability and therefore my formulation of input demand and output supply is conditional on productive assets and other household characteristics which affect both production and consumption decisions.

In terms of productive assets, arguably the most important is land. After controlling for land quality and proximity to markets, landholding size is also a function of rural population density. More concretely, to the extent that population densities reflect land scarcity, we would expect to find smaller farm sizes in areas with higher densities. Generally speaking, then, we may model prices for staple outputs as well as some inputs (such as labor) as:

$$
\begin{aligned}
P_{oj} &= f(\text{density}_j, \text{access}_j, X_{oj}) \\
P_{ij} &= f(\text{density}_j, \text{access}_j, X_{ij})
\end{aligned}
\tag{4}
$$

where $P_{oj}$ is the price of output $o$ in community $j$, $P_{ij}$ is the price of input $i$ in community $j$, density is a measure of local rural population density (which may include non-linear transformations), access is a measure of market access/remoteness, and $X_{oj}$ (or $X_{ij}$) is a vector of other factors related to the local price of the particular output or input.
Similarly, we may render landholding for farmer $i$ in community $j$ as:

$$
\begin{aligned}
\text{land}_i &= f(\text{density}_j, \text{institutions}_j, \text{technology}_i, \text{labor}_i, Z_i, C_j) \\
\text{institutions}_j &= f(\text{density}_j, \text{access}_j, \text{social}_i, Z_i, C_j) \\
\text{technology}_i &= f(\text{density}_j, \text{access}_j, Z_i, C_j)
\end{aligned}
\tag{5}
$$

where land is the cultivated area decision, institutions is a set of factors related to local land allocation institutions, technology includes variables related to land expansion-enabling technologies such as animal and mechanized traction, labor is household labor availability,[30] social is a set of household social capital characteristics which may be important to accessing institutional mechanisms, $Z_i$ is a vector of household characteristics and $C_j$ is a vector of community level characteristics which may further affect land access.

With these components in place, we may then clarify how population density and market access operate through prices and farm size decisions to enter into profit-maximizing farm management decisions. For example, let us define an output supply response function for farmer $i$ as:

$$
\text{Supply}_i = f(P_{oj}, P_{ij}, \text{land}_i, Z_i, C_j)
\tag{6}
$$

The impact of population density in this process may be derived as the total derivative:

$$
\begin{aligned}
\frac{d(\text{Supply})}{d(\text{density})} ={}& \frac{\partial f}{\partial Z} + \frac{\partial f}{\partial C}
+ \frac{\partial f}{\partial \text{land}} \cdot \frac{d(\text{land})}{d(\text{density})}
+ \frac{\partial f}{\partial \text{prices}} \cdot \frac{d(\text{prices})}{d(\text{density})} \\
&+ \frac{\partial f}{\partial \text{land}} \cdot \frac{\partial \text{land}}{\partial \text{institutions}} \cdot \frac{d(\text{institutions})}{d(\text{density})}
+ \frac{\partial f}{\partial \text{land}} \cdot \frac{\partial \text{land}}{\partial \text{technology}} \cdot \frac{d(\text{technology})}{d(\text{density})}
\end{aligned}
\tag{7}
$$

[30] Note that I treat labor availability as a household-level variable. This makes sense under the assumption of weak labor markets. In the empirical specification, however, I use both household labor availability and local wage rates as indicators of the relative cost of labor.

What if there are other mechanisms through which population density impacts management decisions? Much recent scholarship in technology adoption has emphasized the role of
information networks and social learning in conditioning technology awareness, usage and profitability (e.g. Baerenklau, 2005; Foster and Rosenzweig, 2010; Conley and Udry, 2010). Since input and output choices in our model may be considered technologies, we might suspect that their levels are conditioned not just by prices and available land but also by the availability of information relevant to their use. More densely populated rural environments may offer better opportunities to acquire such information. To allow for this, we might extend equation (6) to include an additional channel of influence, i.e.

$$
\text{Supply}_i = f(P_{oj}, P_{ij}, \text{land}_i, \text{density}_j, Z_i, C_j)
\tag{8}
$$

where now population density also enters the model directly, as $\text{density}_j$.[31]

[31] It is possible to conjecture that population density is also affecting $Z_i$ and $C_j$, and thus the indirect channels expressed in the total derivative become correspondingly more complex. For the sake of tractability, I maintain the assumption that, after allowing for the direct and indirect channels described above, any residual mechanisms of indirect impact are negligible.

The conceptual framework outlined above may be represented graphically as in the figure below (Figure 20). Note that the influence of population density on landholding outcomes is mediated through institutions governing land access, which may or may not involve functioning land markets.

Farm size outcomes (measured as cultivated area) are conditioned by population density and market access operating through several intermediate channels. It is worth unpacking this a bit now, as I will examine some of these relationships in more detail later. First of all, institutional
mechanisms of land access (be they market-based transactions of leasehold land, non-market allocations of land under customary authorities, or a hybrid such as the clandestine marketing of customary land) are likely to increase the effective costs of land acquisition in some proportion to the economic value of that land. Under very standard Ricardian concepts of land rent, we would expect that such value increases with proximity to urban markets (as well as with land quality and possibly also with other amenities of place). For convenience, I will call these institutional constraints to land expansion (although under well-functioning land markets equilibrium prices will be allocatively efficient).

Technological means of area expansion, such as animal and mechanical traction, are likely to be more costly with increasing remoteness and with decreasing population densities. The primary reason for this would be the increased costs of participation in thin markets: as the number of buyers and sellers decreases, the costs of information, coordination, and other aspects of market transactions all increase (Fafchamps 1992, 2004). Furthermore, the fixed costs of maintaining oxen or tractors increase with remoteness (because of higher costs of veterinary/mechanical services, etc.), making their availability less likely in remote areas.

We may combine these ideas in the stylized graphs shown in Figure 21. For convenience, I combine population density and market access measures on the x-axis since they tend to move together in the real world and conceptually play a similar role here. As density/access increases, the cost of accessing expansion-enabling technologies decreases. Land rents (as reflected by market or non-market institutions) are increasing, however, over the same density/access gradient.
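The two opposing gradients just described can be illustrated with a small numerical sketch. The functional forms below (exponential decay for technology costs, exponential growth for land rents, a simple sum for the aggregate constraint) and the 0-to-1 density/access index are assumptions chosen purely for illustration; they are not estimated from the data, and all names are hypothetical.

```python
import numpy as np

# Hypothetical index combining density and access: 0 = most remote, 1 = densest.
x = np.linspace(0.0, 1.0, 101)

def technology_constraint(x):
    """Assumed cost of accessing traction; falls as density/access rises."""
    return np.exp(-3.0 * x)

def institutional_constraint(x):
    """Assumed land rent (market or non-market); rises with density/access."""
    return np.exp(3.0 * (x - 1.0))

tech = technology_constraint(x)
inst = institutional_constraint(x)
aggregate = tech + inst  # assumption: the two constraints simply add up

# Which constraint is larger at each point along the gradient?
dominant = np.where(tech > inst, "technology", "institutional")

# Aggregate constraints are lowest at intermediate density/access, which is
# where an inverted U-shape for cultivated area would peak.
x_star = float(x[np.argmin(aggregate)])

print(dominant[0], dominant[-1], round(x_star, 2))
```

Under these assumed forms, technology constraints are the larger of the two at the remote end of the gradient and institutional constraints at the dense end, so the aggregate constraint is U-shaped. Different functional forms would move the turning point but not the qualitative pattern, provided one curve falls and the other rises.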
Thus, in a stylized way, the costs of obtaining usufruct rights over land may decrease to zero in very remote areas, although the exercise of such rights will be increasingly constrained by the costs of exceeding the cultivated area limits imposed by family-labor/hand-hoe cultivation.

Figure 20: Conceptual framework showing role between population density, market access and cultivated area in farm management decisions
[Diagram elements: market access; population density; animal traction; mech. traction; land prices; non-market institutions; labor; wage rates; output prices; information; cultivated area; input demand & output supply.]

Figure 21: Hypothesized relationship between land expansion constraints and access conditions
[Panel A: Technology versus institutional constraints to expansion. Panel B: Aggregate constraints to expansion, with regions labeled "technology constraints dominate" and "institutional constraints dominate". Axes in both panels: constraints to area expansion (low to high); density & access (low to high).]

3.3.2 Rural population density and household welfare

Higher population density implies smaller farm sizes. Institutions governing land access may moderate this influence significantly. These institutions may also respond to farm size distributions. Farm size is a driver of local migration patterns: bountiful land attracts migrants; scarce land promotes emigration. However, the value of farm size is understood in terms of income earning potential, which is a function of both land quality (agricultural potential) and access to markets. Thus, agricultural potential, market access and land availability operate jointly to drive population growth. Population growth, in turn, increases population density.

Figure 22: Conceptual framework showing linkages between population density and income

Public investments may amplify agricultural potential, market access, and migration patterns.
For example, investments in irrigation infrastructure may serve to elevate the agricultural potential of a given area. Similarly, investments in road infrastructure, electrification, health facilities and other types of public goods may increase production potential and/or decrease the costs of market access, thereby enhancing farm income potential (and may also indirectly affect non-farm income potential, if such public investments also increase the profitability of non-agricultural industrial activities).

Because population density may affect income through multiple channels that cannot be modeled directly for lack of data, I cannot define and measure these channels explicitly. Instead, I take a reduced form approach, building on a standard household model of utility maximization. Consider a utility-maximizing farm household along the lines of Singh, Squire and Strauss’ (1986) model, whose level of well-being at time $t$ is denoted by $Y_t$. Assuming non-separability of production and consumption decisions, which is likely to characterize many households in rural Zambia, I define $Y_t$ as a function of prices and of household- and community-level asset endowments. I write this function as:

$$
Y_t = f(P_{oj}, P_{ij}, \text{land}_i, Z_i, C_j)
\tag{6}
$$

where $P_{oj}$, $P_{ij}$, $\text{land}_i$, $Z_i$ and $C_j$ are defined as in the previous section. Also as before, let us assume that $P_{oj}$, $P_{ij}$ and $\text{land}_i$ are functions of population density and market access.

Finally, as we did with farm management decisions in the preceding sub-section, we might want to allow for population density to condition welfare outcomes directly as well as indirectly. For example, population density may modulate the value or performance of some types of public assets that have a bearing on welfare outcomes.
One way in which this might work is by conditioning the availability of information: more densely populated places may be characterized by more rapid and/or accurate diffusion of information about market prices, the availability of marketed goods and services, transportation conditions, etc. (Tiffen et al. 1994). To allow for this, I extend equation (6) to include a direct measure of the information benefits accruing through local population density in community $j$ as:

$$
Y_t = f(P_{oj}, P_{ij}, \text{land}_i, I_j(\text{density}_j), Z_i, C_j)
\tag{7}
$$

Finally, we might be interested in the effect of asset distributions on welfare outcomes. There is substantial empirical and theoretical support for the idea that the distribution of assets has strong implications for growth outcomes: populations with egalitarian asset distribution patterns have tended to experience higher rates of economic growth than populations with highly concentrated asset distributions (Johnston and Kilby 1975, Mellor 1976). The principal mechanism underlying this relationship is the multiplier effects arising from local consumption expenditures: households allocate a portion of their consumption to local non-tradable goods and services, which provide income for local producers and service providers. Because poorer households allocate larger expenditure shares to local consumption, the aggregate effect is much larger when assets are equally distributed than when they are highly concentrated. This generalization extends to land, which is the most important asset for rural households. Recent scholarship has provided empirical support for the importance of land distributions on income growth in rural communities (Quan and Koo 1985, Deininger and Squire 1998).
To implement this idea, I augment the household welfare model with a term describing local distributional equality of land:

$$
Y_t = f(P_{oj}, P_{ij}, \text{land}_i, I_j(\text{density}_j), \text{concentration}_j, Z_i, C_j)
\tag{8}
$$

where $\text{concentration}_j$ denotes the concentration or skewness of landholdings within the area. I describe how I implement this term in the data section.

3.3.3 Land quality and the interpretation of population density

The conventional measure of population density (i.e. persons per square kilometer) implicitly treats land as a homogeneous resource. However, failing to account for differences in land quality can systematically bias empirical evaluations of the determinants of productivity (Bhalla and Roy, 1988; Benjamin, 1995; Lamb, 2003; Assunção and Braido, 2007). Pingali and Binswanger (1987) made this point explicitly about comparing labor-land ratios across countries in different regions: they assert that it may be very misleading to compare population densities across areas with different production potentials. The same point can be made for comparing areas within a country. As noted above, population density is important in development theory because of what it implies about resource scarcity. However, under a given set of production technologies, geographical areas with greater production potential are capable of sustaining larger numbers of people. It stands to reason, then, that the standard definition of rural population density may under-value the effective population density in low potential areas. I address this issue primarily by using a rich set of agroecological controls in my empirical specifications. I also, however, propose and test some alternative density measurements.[32]

3.3.4 Hypotheses

I anticipate finding that population density and market access both play a role in conditioning farm size outcomes and that farm size is an important factor in farm management choices and welfare outcomes.
However, the channels of impact are complex, involving multiple indirect impacts and (possibly) direct impacts. I hypothesize that I will find the following relationships:

• Cultivated area increases with population density and market access at low levels, but decreases at higher levels, approximating an inverted U-shape
    o This is a result of the relative importance of different constraints which are functions of density and access:
        ▪ Technology constraints (such as access to animal and mechanical traction) are decreasing in density and access
        ▪ Institutional constraints (including prices) are increasing in density and access
• Output prices increase in population density and decrease in market access
• Agricultural wage rates decrease in population density and market access

[32] Yet another potential problem is that population density measures may fail to reflect skewed distributions of actual land access within the units for which they are defined. In other words, the density of a given area tells us something about the average relationship in that area, but not about the distribution within the area. To a certain extent, this is simply an issue of scale. In this study, I emphasize local estimates of population density, which are much more meaningful to local outcomes than averages taken over large areas. However, it is important to note that the population data I use are not based on observations of actual landholdings, but rather of people known to live within a given area. I address this question more fully later in the paper.

Because output supply and input demand are functions of farm size, output prices and wage rates (and may also include important direct impacts), the net impact of conditioning factors on farm management and welfare outcomes is not straightforward. In general, I expect that better market access will have positive effects on all final outcomes.
Expectations about population density are harder to form a priori but, at relatively low densities, I expect that increasing density will have positive impacts on final outcomes. In the table below, I lay out the relationships that I expect to find.

Table 20: Hypothesized relationships between geographic conditions, intermediate outcomes and final outcomes of interest
Column groups: conditioning factors (population density; market access); intermediate outcomes (land holding; output prices; agricultural wage rates)
Intermediate outcomes (cultivated area; output prices; agricultural wage rate): ∩ + - ∩ +
Farm management choices (fertilizer demand; gross output): ? ? ? ? + + + (+/-) (+/-)
Welfare outcomes (total household income; off-farm income): ? ? ? ? + - + ns + ns
Notes: ∩ indicates an expected inverted U-shaped relationship; + indicates an expected positive relationship; - indicates an expected negative relationship; +/- indicates uncertainty about the direction of impact; ns stands for “not significant”. Signs in parentheses, e.g. (+), indicate uncertainty about whether or not the effect is detectable.

3.4 Data

3.4.1 Household panel survey data

The farm household data used in this paper come from a nationally representative panel survey of rural smallholder households in Zambia. The panel consists of three waves: 2001, 2004 and 2008. A total of 6,922 households were interviewed in the first round. In the second round, 5,419 households were surveyed, of which 5,358 households were also present in the initial round. In the third round, 8,094 households were surveyed, of which 4,286 households were present in earlier rounds. Unless otherwise indicated, this analysis uses the full set of cross-sectional observations in each year.

It is important to acknowledge that the nature of the panel survey restricts my analysis to farmers cultivating less than 20 hectares of land.
To be sure, the vast majority of farms in Zambia fall into this category and the data I use are representative of rural smallholders in the country. However, to the extent that access to land and intensive versus extensive production strategies are likely to be very different for large farms than for small farms, this restriction is important to acknowledge. Furthermore, there are two other ways in which the survey sample design limits inferences drawn from analysis of these data. First, by focusing on smallholder farming areas, it is possible that the survey is not fully representative of small farms operating in State lands (in which large commercial farms dominate and which therefore were not targeted by the survey sample design). Second, the sample design is based on rural areas and therefore excludes urban and peri-urban areas by design. However, some important smallholder growth pathways, such as high-value horticulture production and marketing, are most prevalent in peri-urban areas (Chapoto et al. 2012). Thus, while this analysis is relevant to the majority of the farm sector, we do not observe some important minority categories (large farms, peri-urban farms, and farms within large contiguous State lands areas). Because these minority categories possibly represent some of the most vibrant elements of Zambian agriculture (as suggested by Chapoto et al. 2012), it is important to acknowledge the limitations in scope of the data used in this analysis.

3.4.2 Population density estimates

Rural population densities calculated for very broad areas may mask important local variations. Such aggregation bias may drive erroneous analytical conclusions.[33] It is possible that some of the “mystery” of small farms amidst land abundance is simply an artifact of overly coarse measurements of rural population distributions. To counter this possibility, I use high-resolution gridded datasets to characterize the rural population densities of the survey communities.
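The aggregation concern can be made concrete with a toy example. The sketch below uses an entirely made-up 5 x 5 grid of 1 km² cells (one hypothetical town cell, the rest sparsely settled) and a square moving window as a simple stand-in for a local neighborhood; none of the numbers or function names come from the survey or the gridded datasets.

```python
import numpy as np

def district_average(grid):
    """Density implied by dividing total population by total area (cells of 1 km^2)."""
    return float(grid.sum() / grid.size)

def local_average(grid, row, col, radius=1):
    """Mean density in a square window around a cell, clipped at the grid edge."""
    r0, r1 = max(row - radius, 0), min(row + radius + 1, grid.shape[0])
    c0, c1 = max(col - radius, 0), min(col + radius + 1, grid.shape[1])
    return float(grid[r0:r1, c0:c1].mean())

# Hypothetical district: 24 sparsely settled cells and one town cell.
grid = np.full((5, 5), 2.0)   # persons per km^2
grid[4, 4] = 452.0            # made-up town

coarse = district_average(grid)          # one number for the whole district
remote_village = local_average(grid, 0, 0)
near_town = local_average(grid, 3, 3)

print(coarse, remote_village, near_town)
```

In this constructed case a single district-wide figure sits well above the density around the remote village and well below the density near the town, which is the sense in which coarse averages can mislead. Gridded population data allow the local calculation to be made for each survey community.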
These datasets are from the Global Rural-Urban Mapping Project database (GRUMP; Balk and Yetman 2004) and from the AfriPop project (Tatem et al. 2007, Linard et al. 2011, Linard et al. 2012). Both datasets represent significant improvements over coarser estimates of population density often used in econometric work (e.g. tables of regional population density estimates), for two reasons. First, input statistical data are at fairly high levels of disaggregation (57 Districts, in the case of Zambia). Second, the reported population totals have been further disaggregated spatially to obtain more realistic estimates of local rural densities.

[33] A related concept is the ecological fallacy, in which misleading inferences about the nature of individuals are deduced from aggregate observations on the groups to which those individuals belong.

The GRUMP and AfriPop datasets do this in different ways, based on alternative assumptions. In the GRUMP data, the urban and rural components of population distributions have been disaggregated on the basis of reported populations of urban centers[34] within districts, combined with satellite-derived information on the distribution of urban centers of 5,000 people or more. The input data come from the 2000 census; estimates for 2010 were made on the basis of urban and rural growth rates defined by the Zambian Bureau of Statistics.[35] The resulting population estimates are made for 1 km² grid cells.

The AfriPop dataset uses a similar starting point, i.e. reported population counts for some administrative unit, but takes a different approach to allocating counts within reporting units. Whereas GRUMP only makes an urban-rural distinction, AfriPop defines a probabilistic relationship between settlement locations (defined by classified Landsat imagery) and the distribution of people within reporting units. Ancillary information on landcover is also used in the AfriPop allocation algorithm.
The AfriPop data are based on year 2000 census data at the ward level.[36] Estimates for 2010 are based on growth rates from the Zambian Bureau of Statistics, as with GRUMP. The resulting population estimates are made for 100 m² grid cells. See Tatem et al. (2007) for a fuller comparison of GRUMP and AfriPop.

[34] From the GRUMP website (http://sedac.ciesin.columbia.edu/data/collection/grump-v1): “Urbanized localities are defined as places with 5,000 or more inhabitants that are delineated by stable night-time lights. For poorly lit areas, alternate sources are used to estimate the extent of cities.”

[35] It is not entirely clear how the growth rates were applied in the case of Zambia. The GRUMP website (http://sedac.ciesin.columbia.edu/data/collection/grump-v1) offers this information about the population projections in the related dataset GPW3 (from which the GRUMP is derived): “The population counts that the grids are derived from are extrapolated based on a combination of subnational growth rates from census dates and national growth rates from United Nations statistics.” Presumably published rates from the Zambian Bureau of Statistics are being used, although neither the exact rates nor their level of definition (e.g. national versus province) is made explicit for Zambia.

In general, GRUMP may be seen as a more conservative set of disaggregation assumptions than AfriPop. However, the AfriPop assumptions appear very reasonable and validation assessments suggest that these assumptions perform well in practice (Tatem et al. 2007, 2011). The figure below illustrates the differences in GRUMP and AfriPop by showing estimates from each dataset for the same region of northern Zambia, between Solwezi (North Western) and Kitwe (Copperbelt). If the assumptions in AfriPop are correct, and the village locations in the survey dataset are reasonably accurate, then we may prefer the AfriPop dataset, as it will more likely give us local estimates which are closer to the truth.
If we doubt the allocation assumptions of AfriPop and/or we doubt the accuracy of our village locations, then we might prefer GRUMP as a less problematic indicator. In my research, my a priori preference is to use the AfriPop data, but I report estimation results based on densities from both datasets.

[36] Ward level information was used for all provinces except for Copperbelt, Central, Luapula and Northwestern Provinces, for which district level data were used. (http://web.clas.ufl.edu/users/atatem/pub/Population_data_200511.pdf)

Figure 23: Comparison of gridded population datasets, estimates for 2010

3.4.3 Other spatial datasets

This analysis also incorporates information from several other gridded geospatial datasets. Locally interpolated time-series data on rainfall come from the University of East Anglia’s CRU-TS 3.1 Climate Database (CRU 2011; Mitchell and Jones 2005). Length of growing period data come from the GAEZ 3.0 database (Fischer et al. 2000). Definitions of arable land were based on land cover data from GLOBCOVER 2009 (Tóth et al. 2010). Data on historical rural population were taken from the HYDE dataset (Goldewijk et al. 2011). Elevation data were obtained from NASA’s SRTM data (Rodriguez et al. 2005). Data on estimated travel time to urban markets are described in Chamberlin (2012). Details on the spatial and temporal resolutions of these datasets are provided in the table below.

Table 21: Gridded spatial datasets used in this study

Dataset          Description                       Spatial resolution                           Temporal resolution & range
GRUMP            population                        0.008333 decimal degrees (about 1km²)        fixed: 2000; 2010 estimates based on (regional?) growth rates
AfriPop          population                        0.0008333 decimal degrees (about 100m²)      fixed: 2000; 2010 estimates based on (regional?) growth rates
CRU-TS 3.1       rainfall                          0.5 decimal degrees                          monthly: 1901-2009
GAEZ 3.0         length of growing period          0.08333 decimal degrees (about 10km²)        fixed: 2000
GlobCover 2009   land cover classes                .00277778 dec. degrees (about 300m²)         fixed: 2009
HYDE             historical population estimates   0.08333 decimal degrees (about 10km²)        yearly: 1901-2000
SRTM             elevation                         90m²                                         fixed

The spatial data were merged into the household survey data by using village geographical coordinates to sample the gridded spatial datasets at the corresponding locations. The spatial data come at various spatial resolutions, but the general procedure was to calculate an average value of grid cells within a 10 kilometer radius of the village centroid. This value was then incorporated into the household database (where, by construction, all households within a village share the same values of the geographical variables).

3.4.4 Land inequality

The conceptual model for welfare outcomes includes a term measuring the equality of land distributions in different areas. To implement this idea, I define an index of land concentration at the level of a village neighborhood:

$$
I = \sum_{i=1}^{N} s_i^2
$$

where $s_i^2$ is the squared share of the (sample) total neighborhood land holdings held by farmer $i$ and $N$ is the number of farmers in the neighborhood.[37]

[37] This is an extension of the Herfindahl concentration index.

A simple way to define the neighborhood would be to use the encompassing District or other administrative boundary. In this study, however, I propose using a more refined measure, based on the three nearest neighbors to any particular village. Because I know where the villages are, I am able to implement this idea using a Geographic Information System. Figure 24 shows the village neighborhoods defined for the Zambia survey locations. (Note: The map is provided mainly to indicate the spatial extent of survey villages and the proximity of villages to neighboring villages. Because neighborhood memberships may overlap (and usually do), there is no easy way to visually distinguish neighborhood boundaries in this map.
The important thing to recall about the neighborhood definition is that the neighborhood for any particular village consists of itself plus the next three nearest villages.)

Figure 24: Village neighborhood definitions

Table 22: Variables used in this analysis
variable                       units          p5      p25     p50     p75     p95      mean    n
conditioning factors
  population density           persons/km2    4.6     7.9     14.9    27.2    72.3     22.7    3772
  hours to town                hours          1.2     2.9     5.5     9.3     26.9     7.9     3772
intermediate outcomes
  cultivated area              hectares       0.38    0.81    1.25    2.25    5.06     1.92    3772
  agricultural wage rate       ZMK*1000/day   15      25      35      50      80       41      3768
  maize wholesale price        ZMK/kg         435     511     545     638     661      568     3772
dependent variables
  used fertilizer              1=yes          0       0       0       1       1        36%     3772
  fertilizer application rate  kg/ha          32      79      133     219     400      167     1352
  maize harvest                kg             115     460     863     2,013   8,244    2,259   3260
  gross output                 ZMK*1000       170     548     1,140   2,200   7,168    2,291   3721
  maize yield                  kg/ha          284     710     1,268   2,130   4,140    1,610   3260
  per capita gross output      ZMK*1000/AE    37      112     231     462     1,305    421     3721
  per capita total income      ZMK*1000/AE    360     1,140   2,306   5,060   20,600   5,800   3772
labor, technology & capital constraints
  adult equivalents            AE             1.8     3.7     5.1     6.8     9.8      5.4     3772
  used animal traction         1=yes          0       0       0       1       1        36%     3770
  used mech. traction          1=yes          0       0       0       0       0        2%      3770
  used hired labor             1=yes          0       0       0       0       1        12%     3770
informal institutions
  chief kin                    1=yes          0       0       0       1       1        50%     3772
  local                        1=yes          0       1       1       1       1        83%     3772
  years in village             years          2       6       12      21      37       14.8    3764
formal institutions
  % with titled land           %              0       0       0       0       0        3%      3772
  within State land area       1=yes          0       0       0       0       1        8%      3772
household characteristics
  female-headed                1=yes          0       0       0       0       1        22%     3772
  age of head                  years          32      40      50      62      76       51.5    3702
  education                    years          0       3       6       7       12       5.4     3769
  productive assets            ZMK*1000       0       20      232     1,952   17,700   3,494   3772
  own radio                    1=yes          0       0       1       1       1        61%     3772
  number of oxen               count          0       0       0       0       5        0.85    3772
production potential
  average rainfall             mm             552     675     760     877     1,020    776     3772
  rainfall variability         CV             3.6     4.1     4.6     5.1     6.9      4.8     3772
  elevation                    meters         742     1,016   1,133   1,258   1,458    1,130   3772
  slope                        degree         0.6     1.0     1.5     2.1     4.3      1.8     3772
other community variables
  hours to State lands         hours          0.0     1.5     5.3     11.6    25.4     8.0     3772
  hours to city                hours          1.5     4.2     7.2     12.5    28.1     9.7     3772
  km to transport              km             0.0     0.5     3.0     8.0     40.0     8.4     3769
  fertilizer price             ZMK/kg         2,100   2,160   2,200   2,400   2,600    2,258   3772
Notes: ZMK = Zambian Kwacha (2008); AE = adult equivalent

3.5 Estimation
3.5.1 Estimation challenges
In implementing the conceptual model, there are two main estimation challenges. First, unobserved household heterogeneity (such as different levels of knowledge, know-how or other productive capital) may cause inconsistent parameter estimates if such heterogeneity is correlated with the covariates in the model being estimated. A standard approach to addressing this in panel data contexts is the fixed-effects estimator, which differences away unobserved time-constant variation. However, in our household model specifications, several important household- and community-level controls are time-invariant; a fixed-effects approach would not allow us to examine their impacts. Furthermore, some of the time-varying household-level covariates show little variation over time (e.g. level of education, gender of household head) and are best treated as time-invariant characteristics. For these reasons, I implement a correlated random effects (CRE) estimator. The CRE approach entails defining a model of the unobserved heterogeneity as a function of the time-averages of the time-varying model covariates; by including these time-averaged variables as additional regressors in the main model of interest, I control for any time-constant unobserved heterogeneity, similar to the fixed-effects (FE) estimator (Wooldridge [2010], following Mundlak [1978] and Chamberlain [1984]).
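The CRE device described above can be illustrated with a minimal sketch. It assumes a linear model, a single time-varying covariate and simulated data; the function names (`add_mundlak_means`, `ols`) and all parameter values are hypothetical and are not taken from the dissertation's estimation code. The sketch adds household time-averages of the covariate as extra regressors and compares the resulting slope with naive pooled OLS.

```python
import numpy as np

def add_mundlak_means(x, household_ids):
    """Return [x, x_bar_i]: covariates plus their household time-averages."""
    _, inverse = np.unique(household_ids, return_inverse=True)
    inverse = inverse.ravel()
    sums = np.zeros((inverse.max() + 1, x.shape[1]))
    np.add.at(sums, inverse, x)                    # sum x within each household
    means = sums / np.bincount(inverse)[:, None]   # household averages
    return np.hstack([x, means[inverse]])

def ols(y, X):
    """OLS with an intercept; returns [intercept, slopes...]."""
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

# Simulated 3-wave panel (hypothetical): the household effect c is
# correlated with the covariate x, so pooled OLS is inconsistent.
rng = np.random.default_rng(0)
n_households, n_waves = 500, 3
ids = np.repeat(np.arange(n_households), n_waves)
c = rng.normal(size=n_households)
x = (c[ids] + rng.normal(size=ids.size)).reshape(-1, 1)
y = 1.0 + 2.0 * x[:, 0] + c[ids] + rng.normal(size=ids.size)

beta_pooled = ols(y, x)                        # ignores c
beta_cre = ols(y, add_mundlak_means(x, ids))   # controls for c via x_bar_i
```

In this linear setting the CRE slope on the time-varying covariate coincides numerically with the within (fixed-effects) estimate, while time-invariant regressors could still be added alongside the household means.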
An advantage of the CRE approach over the FE approach is that it allows measurement of the effects of time-constant independent variables, as would a regular random-effects (RE) model. The advantage of the CRE estimator over the RE estimator is that, conditional on the validity of our model, I am also controlling for unobserved heterogeneity.

A second issue is the likelihood that population density is endogenous to cultivated area decisions and other household level outcomes, via an omitted variables argument. More concretely, it is likely that farm sizes (and other production choices) in a community are partly the result of factors which also affect population density, but which may not be perfectly observable. Many of these factors are likely to be time-invariant. For example, geographical factors related to land productivity will have influenced the historical attractiveness of a given area for settlement, and will also be relevant to local farm production choices. Of course, there may also be time-varying factors which influence both outcomes. For example, investments in rural infrastructure may affect the influx of settlers, as well as the farm management strategies pursued by local households. However, such causal mechanisms are likely to have very long lags in how they drive changes in population density. Therefore it is reasonable to focus on the time-invariant factors which may be affecting population density outcomes at the community level as well as farm-level outcomes. Because I draw on a rich set of geographical variables (such as rainfall, soil, terrain and access characteristics) I am arguably able to control for the full set of geographical factors which may be driving both population density and household-level outcomes. Moreover, any remaining omitted variable bias will be mitigated by use of the CRE framework, which controls for unobserved time-invariant heterogeneity.
Endogeneity issues do crop up elsewhere in my estimation, however. Specifically, cultivated area decisions will be made simultaneously with decisions about whether or not to hire in labor, to use animal traction and/or to use mechanical traction. Since I am interested in the role of these technologies in the cultivated area decision, I must address the endogeneity of these variables in the land area equation. To account for such endogeneity, I employ a control function (CF) approach (Wooldridge 2010). This approach entails specifying an auxiliary regression model of technology choices as a function of strictly exogenous household- and community-level determinants. The residuals from this model are then incorporated into subsequent regression models, in addition to the technology choice variables themselves, in order to both test and control for the endogeneity of those choices. Compared with an instrumental variables (IV) approach to dealing with endogeneity, the control function method has the advantage of being more easily implemented in a model where the suspected endogenous covariate enters in multiple ways (i.e. the variable enters normally as well as through one or more transformations such as higher powers, time-demeaned values, etc.). In such cases, the IV approach would require an additional IV for each transformation, whereas the CF approach addresses endogeneity through a single additional covariate, leading to efficiency gains over IV approaches that are often considerable. I instrument the use of animal traction with the share of households within the village neighborhood who own oxen (and thus are able to rent out services). Similarly, I instrument the use of mechanical traction with the share of households within the neighborhood who own their own tractors. I use the share of non-local households within the area as an instrument for the hiring-in decision (based on the assumption that the influx of non-local residents leads to greater local agricultural labor supply).
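The two-step logic of a control function can be sketched as follows. This is only an illustration under simplifying assumptions: both stages are linear (the technology-use variables in this study are binary, so the actual first stage may well differ), the data are simulated, and all names (`control_function`, `d`, `z`, `w`) are hypothetical. Here `d` stands in for an endogenous technology choice, `z` for an instrument such as the neighborhood share of oxen owners, and `w` for an exogenous control.

```python
import numpy as np

def ols(y, X):
    """OLS with an intercept; returns [intercept, slopes...]."""
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

def control_function(y, d, z, w):
    """Two-step control function with a linear first stage.

    Step 1: regress the endogenous variable d on instrument z and control w.
    Step 2: regress y on d, w and the first-stage residual v_hat.
    The coefficient on v_hat gives a test of endogeneity.
    """
    Z = np.column_stack([z, w])
    gamma = ols(d, Z)
    v_hat = d - np.column_stack([np.ones(len(d)), Z]) @ gamma
    beta = ols(y, np.column_stack([d, w, v_hat]))
    return beta, v_hat

# Simulated data (hypothetical): u drives both d and y, making d endogenous.
rng = np.random.default_rng(1)
n = 5000
w = rng.normal(size=n)
z = rng.normal(size=n)
u = rng.normal(size=n)
d = 0.8 * z + 0.5 * w + u + rng.normal(size=n)
y = 1.0 + 1.5 * d + 0.7 * w + u + rng.normal(size=n)

b_naive = ols(y, np.column_stack([d, w]))   # slope on d is biased upward
b_cf, v_hat = control_function(y, d, z, w)  # b_cf[1] is the slope on d
```

A coefficient on `v_hat` (here `b_cf[3]`) that differs clearly from zero indicates endogeneity. In applied work the second-stage standard errors would also need to account for the estimated first stage, for example by bootstrapping both steps together.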
I comment in more detail on these results in the next section.

Attrition Bias
In panel data studies, attrition bias can be a serious concern. If attrition is systematically related to the household characteristics which appear in the model, then the sample is no longer random and econometric results may be biased (Wooldridge 2010). In the Zambia data, the rate of attrition between each round is about 14%. Using the tests proposed by Baulch and Quisumbing (2011), I find that attrition is non-random with respect to the covariates used in the main models of interest. (Baulch and Quisumbing suggest that the pseudo R-squared from a probit model of attrition can be interpreted as the portion of attrition that is non-random; in my case, about 25% of attrition is systematically explained by household and community level covariates appearing in my estimating equations.) To ensure that my estimation results are robust to attrition bias, I weight the data using the inverse probability of retention across rounds. Conditional on appearing in the first survey round, let us denote the probability of re-interview in the second round as P2. Similarly, let us denote the probability of re-interview in the third round, conditioned on appearance in the second round, as P3. The probability of being in the full 3-wave panel, conditional on appearing in the first round, is then P2*P3, and the corresponding inverse probability weight is 1/(P2*P3). A minor additional complication is that the Zambia panel data are already weighted by the inverse probability of appearing in the sample in the first round.[38] Let us call this initial weighting 1/P1. Then, the overall weight for households in the full 3-wave panel is 1/(P1*P2*P3).
[38] See Megill (2005) for more details on the sampling scheme and weighting recommendations for the Zambia panel dataset.

3.5.2 Estimation framework
To operationalize the conceptual framework, I estimate several different farm management and household welfare outcomes, i.e.
the input demand, output supply and household welfare equations. I express these as a set of reduced form equations. Although I do not specify the structural linkages between these equations, I recognize that outcomes are unlikely to be independent of one another. I therefore estimate all the farm management and welfare equations in a Seemingly Unrelated Regression specification, which allows for contemporaneous cross-equation error correlation. To control for attrition bias, the bootstrap is defined to draw from the attrition-bias-corrected sample, using the weights described above.

3.6 Results
3.6.1 Descriptive results
3.6.1.1 Rising rural densities
Population densities in the villages in the household survey range from 4 to 212 persons per square kilometer, although the majority of villages are in low population density areas. About 75% of the sample households live in areas with densities of less than 20 persons per square kilometer (Table 23).

Table 23: Distribution of rural population densities in the household survey villages
                 percentile                                      panel households
Province         5th     25th    50th    75th    95th    mean    n       %
Lusaka           1.3     1.3     1.8     4.9     12.6    3.4     96      3%
North Western    2.7     5.0     7.2     7.7     8.9     7.1     244     7%
Western          3.5     7.1     8.3     9.4     17.4    9.2     414     12%
Central          5.5     6.9     9.1     17.1    17.5    11.3    383     11%
Northern         4.1     10.4    14.7    17.8    22.2    14.1    668     19%
Copperbelt       8.9     9.1     10.9    22.6    82.8    21.3    219     6%
Southern         8.1     9.5     24.9    33.7    53.7    25.1    403     11%
Luapula          11.6    14.2    17.9    24.4    62.6    35.5    364     10%
Eastern          5.5     21.0    43.2    44.7    64.9    35.9    771     22%
National         4.1     8.3     15.4    24.3    54.9    21.0    3562    100%
Source: AfriPop for 2010; density defined as rural persons per square kilometer of land

To see how rural densities have been growing over the past two decades, I use district level data from the Zambian Census Bureau, with rural population density estimates for 1990, 2000 and 2010.[39] The table below shows provincial summaries of rural population density over this period.
(Note: Ndola Urban and Lusaka Urban districts are not included; therefore the rural shares of the provincial population do not include these areas.)

Table 24: Rural population density growth and levels, by province
                 rural share     average rural             change in    growth in    avg annual
                 of population   population density        density      density      growth rate
Province         2010            1990    2000    2010      1990-2010    1990-2010    1990-2010
Central          90%             6.9     9.8     12.5      5.5          25%          1.3%
Copperbelt       26%             15.6    16.8    20.1      4.6          11%          0.6%
Eastern          89%             19.9    25.8    33.0      13.1         22%          1.1%
Luapula          81%             10.9    15.8    20.0      9.0          26%          1.3%
Lusaka           72%             8.2     10.8    16.1      7.9          29%          1.5%
Northern         81%             7.2     9.9     13.9      6.7          29%          1.4%
North Western    81%             3.5     4.4     5.4       1.9          18%          0.9%
Southern         76%             11.1    13.6    17.8      6.7          20%          1.0%
Western          88%             5.3     6.3     7.3       1.9          14%          0.7%
national         75%             10.2    13.0    16.6      6.4          21%          1.1%
Note: Based on district-level population data from Zambia Bureau of Statistics (1990, 2000, 2010). Ndola Urban and Lusaka Urban districts are not included in these summaries.

These data show that rural densities have been increasing at moderate rates throughout the country. If densities, despite their low levels, are proportionally reflective of land scarcity, and if such scarcity is driving relocation behavior, then we would expect to find that rural population growth is highest in the low density areas. What we actually observe is a mixed picture: density growth rates are highest in the low density areas, but this relationship is not exceptionally strong and absolute increases in density are largest in the high density areas. To see this more clearly, consider the figure below. The top left panel of the figure shows density growth rates (1990-2010) plotted against density levels (for 1990). The top right panel of the figure shows absolute growth for the same period plotted against levels. These patterns are not strongly suggestive of a convergence in rural densities over time. When density growth is plotted against access to markets (measured in terms of hours to the nearest city of 50,000 or more inhabitants), the patterns are somewhat similar: growth rates increase very moderately with remoteness, but absolute growth in rural densities is greatest in districts with better access to markets.
[39] The districts for the different years were slightly different, which required, in some instances, merging multiple districts to get a matching set (e.g. where districts had been created by splitting from an earlier district, both the child and the parent districts had to be re-merged to match corresponding districts from earlier periods). Also, urban and rural population totals were only available for 2010; the rural population shares for earlier periods were established on the basis of the relative urban and rural growth rates (defined nationally) and the district-specific overall growth rate. Densities were calculated on the basis of the total land area of the district, as calculated within a geographic information system.

Figure 25: Rural population density growth and levels, by district
[Four panels: growth rate, 1990-2010 (left) and absolute growth, 1990-2010 (right), plotted against rural population density in 1990 (top row) and against hours to city (bottom row).]
Source: District-level rural population density estimates from Zambia Bureau of Statistics (1990, 2000, 2010). Hours to city = hours travel time to the nearest city of 50,000 or more inhabitants (as of 2000).

3.6.1.2 Alternative measures of density
As noted earlier, standard measures of rural population density treat land as a homogeneous quantity. Thus, standard measures may fail to reflect important differences in land quality. One way to respond to this is by weighting land by some metric of agronomic potential.
To implement this idea, I defined an alternative measure of population density which uses the same rural population count as the density numerator, but with a different denominator: I now weight land area by a relative measure of production potential. I use a spatial database on Net Primary Productivity (NPP), measured as the mass in grams of carbon per square meter per year (Zhao et al. 2005).[40] I use an NPP value of 1,000 as a baseline (this is approximately the value of the 80th percentile of grid cells within the cropland extent of sub-Saharan Africa). Land area in grid cells with NPP values below this value is linearly weighted, such that a grid cell which is half as productive as the baseline has its effective land area reduced by 50%. This has the effect of increasing effective population densities in areas which are characterized by low production potential.
[40] NPP is a measure of the rate at which chemical energy is stored as biomass in a given period. Since almost all of this production in terrestrial ecosystems is done by vascular plants, it is a handy proxy for vegetative growth potential in a particular area.

Figure 26: Distributions of alternative population density measures

The top row of Figure 26 illustrates the different distributions of the conventional and quality-weighted rural density metrics in Zambian SEAs. The conventional measure is shown in the upper left (and is identical to the distribution described in Table 17). The effect of quality-weighting is to increase the effective density for most of the sample: mean density increases from 19 to 24 and median density increases from 13 to 15. Of course, the weighting scheme, while sensible, is fundamentally arbitrary. The purpose of showing the weighted density distribution is not to propose a definitive alternative to measuring rural density, but rather to highlight the fact that conventional density measures may mask important aspects of land pressure.
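The quality weighting described above can be written as a short sketch. The baseline of 1,000 and the linear down-weighting of cells below the baseline come from the text; leaving cells at or above the baseline at full weight is an assumption, as are the function names and the three example grid cells.

```python
import numpy as np

NPP_BASELINE = 1000.0  # g C per m^2 per year; baseline value stated in the text

def effective_area(area_km2, npp):
    """Scale each cell's land area by min(NPP / baseline, 1).

    Cells at or above the baseline keep their full area (an assumption);
    a cell with half the baseline NPP counts as half its area.
    """
    weights = np.clip(np.asarray(npp, dtype=float) / NPP_BASELINE, 0.0, 1.0)
    return np.asarray(area_km2, dtype=float) * weights

def densities(population, area_km2, npp):
    """Return (conventional, quality-weighted) persons per km2."""
    total_pop = float(np.sum(population))
    conventional = total_pop / float(np.sum(area_km2))
    weighted = total_pop / float(effective_area(area_km2, npp).sum())
    return conventional, weighted

# Three hypothetical 1 km2 grid cells around a village.
pop = np.array([10.0, 20.0, 30.0])
area = np.array([1.0, 1.0, 1.0])
npp = np.array([500.0, 1000.0, 1200.0])
conventional, weighted = densities(pop, area, npp)
```

In this made-up example the effective area falls from 3 to 2.5 square kilometers, so the density rises from 20 to 24 persons per square kilometer. Weighted density can never be lower than the conventional measure, because no cell receives a weight above one.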
Another approach to defining effective density may recognize that, for any given locality, not all land is available for cultivation (a point emphasized specifically for Zambia by Jayne et al. 2008). This may have to do with environmental reasons (e.g. marshland or steep slopes unsuitable for agriculture) or institutional factors such as ownership restrictions. To implement this idea, I use data on classified land cover to restrict the population density calculation to cultivated land only. I use the GLOBCOVER 2009 database (Bontemps et al. 2010) to derive estimates of the amount of land within the survey sample enumeration units which were classified as completely or partially cultivated in the year 2009.[41] Because the amount of arable land can never exceed the total amount of land in a grid cell, the resulting density calculations are as large as or larger than the “traditional” calculations using total land area. The distribution of these values in Zambia is shown in the bottom left quadrant of Figure 26. The change in “effective density” by this measure is very large: the mean value is 653 persons per square kilometer of arable land, as compared with 15 persons per square kilometer of total rural land. If we further weight by quality (based on NPP, as described above), the effective density values become higher still, as shown in the lower right quadrant.
[41] More concretely, I considered any of the following land cover categories to represent arable land: “Rainfed croplands”, “Mosaic cropland (50-70%) / vegetation (grassland/shrubland/forest) (20-50%)”, “Mosaic vegetation (grassland/shrubland/forest) (50-70%) / cropland (20-50%)” or “Post-flooding or irrigated croplands (or aquatic)”.

There are a number of caveats that should be borne in mind when considering these arable-only measures. First, the land cover classification may not be fully capturing actual agricultural land use.
Correct classification of agricultural land uses in smallholder systems is generally more difficult than in large-scale systems, especially in tropical areas. The main reason for this difficulty lies in the fine matrix of heterogeneous land cover that typically characterizes small farms with scattered plots. Some farming systems, e.g. banana and coffee systems, are notoriously difficult to distinguish from forest in tropical areas. Furthermore, in low input systems, land under fallow at the time of data collection may be indistinguishable from natural vegetation. The direction of bias is probably to underrepresent the actual agricultural extent. As such, the numbers above might be taken as a lower bound on the actual agricultural area. Again, the point here is not to propose a definitive alternative measure, but rather to suggest that actual land pressures may be higher (perhaps considerably higher) than conventional population density measures would suggest.

3.6.1.3 Shrinking farm sizes, and dwindling fallow
Table 25 presents farm household characteristics, including indicators of intensification, across population density quintiles.[42] Average per capita landholdings are quite small throughout the country – less than half a hectare, on average. Landholdings are remarkably consistent across density categories; this may simply be a reflection of the overall low levels of rural population density in Zambia, but may also indicate constraints to land access that are not well represented by rural population density. Similarly, fallow rates are fairly consistent across density categories. The agricultural orientation of survey households is pronounced across the board: agricultural share of household income is about 70% for most of the sample villages, increasing slightly in the highest density quintile.
Wage rates, income and value of household assets all show ambiguous patterns across population density quintiles: highest values tend to be at both high and low ends of the population density range. Although these patterns run counter to our expectations, recall that these are unconditional distributions. I control for other factors in the econometric work presented in the following section.
[42] The surveyed households are smallholder and medium-scale farm households by definition. That is, households cultivating in excess of 50 hectares are excluded from this sample (although in a very few cases the total land holding size of some of these farms may exceed 50 hectares).

What about trends over time? Table 26 shows household-level trends in total holding size, cultivated area, and fallow land between 2001 and 2008. The data are stratified by quartiles of rural population density. These data indicate that farm sizes and fallow are both decreasing everywhere. These patterns appear spatially generalized: there is no clearly observable association between the rates of shrinkage and population density (or access indicators, not shown here). Thus, a priori, these trends are not clearly being driven by rural densities. However, the rates of decline are pronounced and closely resemble household landholding trends in much more densely populated countries (e.g. Jayne et al. 2003). Arguably, if access to land is unconstrained and freely available in customary tenure areas, we might expect to see smaller rates of decline, as newly formed households could obtain new land from customary authorities rather than relying exclusively on subdivision of family farms.
Table 25: Smallholder household characteristics by rural population density quintile

Panel A. Quintiles of population density: persons per km2
                                              1-7     7-10    10-16   16-28   28-751
Landholding per adult equivalent (ha)         0.47    0.59    0.60    0.53    0.57
Share of land in fallow                       0.16    0.17    0.19    0.14    0.13
Net farm income per hectare ǂ                 824     831     820     798     903
Net farm income per adult equivalent ǂ        319     430     387     365     469
Value of productive assets ǂ                  1882    3844    1304    2658    4348
Household income per hectare ǂ                2798    2697    2384    2821    2193
Household income per adult equivalent ǂ       706     874     777     754     791
Agricultural wage rate ǂ                      33      34      31      35      37
Fertilizer expenditure per hectare ǂ          86      99      78      79      122
Fertilizer use (share of households)          0.23    0.30    0.26    0.27    0.44
Fertilizer use rate (kg/ha)                   44      52      40      41      64
Marketed % of production                      0.18    0.26    0.21    0.25    0.29
Specialization index (1=most specialized)     0.65    0.64    0.58    0.67    0.64
Agricultural share of income                  0.68    0.70    0.70    0.66    0.75

Panel B. Quintiles of population density: persons per km2 of arable land
                                              3-15    15-28   28-65   65-128  128-7035
Landholding per adult equivalent (ha)         0.54    0.55    0.49    0.60    0.58
Share of land in fallow                       0.17    0.18    0.15    0.14    0.15
Net farm income per hectare ǂ                 849     817     793     885     840
Net farm income per adult equivalent ǂ        384     361     331     462     435
Value of productive assets ǂ                  1091    1239    3355    3624    4809
Household income per hectare ǂ                2412    2597    2826    2800    2318
Household income per adult equivalent ǂ       744     737     743     888     800
Agricultural wage rate ǂ                      33      32      34      36      34
Fertilizer expenditure per hectare ǂ          70      73      85      130     109
Fertilizer use (share of households)          0.21    0.25    0.26    0.41    0.36
Fertilizer use rate (kg/ha)                   36      39      44      68      55
Marketed % of production                      0.21    0.20    0.23    0.29    0.25
Specialization index (1=most specialized)     0.61    0.58    0.68    0.65    0.66
Agricultural share of income                  0.71    0.68    0.65    0.72    0.72
Note: ǂ Values are 1000s of Zambian Kwacha.

Table 26: Panel household trends in farm size and fallow, 2001-2008
                    holding size    cultivated area   fallow area     fallow share    change in land holding   change in fallow
density quartile    2001    2008    2001    2008      2001    2008    2001    2008    2001-2008                2001-2008
least dense         3.07    2.86    2.23    2.12      0.84    0.69    17%     13%     -0.16                    -0.11
2nd                 3.08    2.79    2.26    2.06      0.82    0.66    16%     12%     -0.70                    -0.54
3rd                 2.61    2.19    2.07    1.82      0.55    0.36    13%     9%      -0.63                    -0.32
most dense          2.93    2.03    1.95    1.71      0.98    0.29    14%     8%      -0.29                    -0.19
total               2.92    2.47    2.13    1.93      0.80    0.50    15%     11%     -0.46                    -0.30
Note: Area values in hectares. Changes in land holding and fallow area are absolute values (i.e. difference in hectares between 2001 and 2008), calculated at the household level.

3.6.1.4 Indicators of intensification and extensification
Boserupian and induced innovation theories suggest that, as rural population densities increase and farm sizes decrease, agriculture becomes more capital- and labor-intensive. One way to examine this set of relationships is by looking at factor ratios over population density gradients in our dataset. To begin with, we would expect that capital-land ratios are increasing with density. Although we do not fully observe total capital expenditures in the Zambian survey data, we do observe total fertilizer expenditures, so I use this as a proxy for total expenditure. Figure 27 (a) shows a non-parametric estimate of the ratio of fertilizer expenditures to cultivated farmland as a function of population density. Even at the low density levels characterizing most of the sample, there is a discernible positive relationship. However, given the strong relationship between population density and market access, this may also simply be a market access story: proximity to fertilizer markets will lower the effective cost of fertilizer use. Figure 27 (b) shows the same intensification measure as a function of market access (measured as hours travel time to the nearest town of 50,000 or more persons). The negatively sloped relationship is what we would expect if the costs of capital inputs are increasing in remoteness at a greater rate than the real price of land. Since land rents are almost certainly decreasing in remoteness, this is what we would expect to see. We would also expect that capital-labor ratios are increasing with density and/or decreasing with remoteness. Since we do not observe actual labor demand in our dataset, I use the number of adult equivalents in the household as a proxy.
[43] Figure 28 (a) shows a non-parametric estimate of the ratio of fertilizer expenditures to adult equivalents as a function of population density. The relationship is very similar to that of capital-land ratios, and is positive throughout most of the sample. The negative relationship between capital-labor ratios and remoteness, shown in Figure 28 (b), is what we would expect if costs of capital inputs are increasing in remoteness at a greater rate than the real price of labor. Thus, here again, although intensification may be associated with a land-scarcity story, it is also very consistent with a remoteness story which may have little to do with land scarcity. Finally, theory suggests that labor-land ratios at the farm level are also increasing in population density. Figure 29 (a) shows this relationship, again using adult equivalents as a proxy for labor, and estimating the relationship of this ratio with population density by means of a local polynomial estimator. The relationship is declining over most of the range of densities in the sample. Conversely, the relationship with remoteness is positive, as shown in Figure 29 (b). On the face of it, this makes little sense, as it suggests that land intensification is inversely related to population density and positively related to remoteness. Digging deeper, however, we can unpack what is happening. Figure 30 shows that farm sizes are increasing in density up to about 40 persons per square kilometer, after which they decline (panel a). Interestingly, household sizes follow a very similar pattern (panel b). To a certain extent, these spatial trends in farm size and family size are mirrored in terms of access gradients (panels c and d).
[43] Alwang et al. (1996) say that smallholder labor is almost exclusively family labor.
This appears to be changing (I show some numbers later in the essay), but as a generalization it still seems safe to say that the majority of labor in small farm production still comes from family members.

Recall that this farm size outcome over the population density gradient is consistent with our hypothesized cultivated area outcomes, which are a result of institutional constraints playing a dominant role in good access/high density areas and technology constraints playing a dominant role in remote/low density areas. Panel a in Figure 31 shows that the probability of survey respondents indicating that arable land is available within the community increases with remoteness. Panel b shows the probability of observing leasehold land within our sample.[44] Under the assumption that the prevalence of non-customary tenure is an indicator of local land rents, this graph indicates that such rents are highest in high access areas. Panels c and d show the usage of mechanical traction and animal traction, respectively. These are both steeply declining in remoteness, suggesting that the availability of these technologies for land expansion is strongly contingent on market access. Taking these various trends into account, it can easily be seen that the patterns in labor-land ratios shown in Figure 29 do not represent intensification as much as they do a constrained ability to expand. The decreasing availability of expansion technology is exacerbated by household sizes which are declining in remoteness. This latter observation is worthy of further investigation. It may reflect spatial patterns in demography and/or public health (e.g. greater morbidity in remote areas), or simply that there are greater rates of out-migration from remote rural areas.
[44] Recall that this indicator almost certainly underrepresents the true extent of leasehold land in the area because we do not observe large commercial farms and other holders of land under non-traditional tenure.
However, under random sampling, I maintain that the observed share of lease holding in the sample is likely proportional to the unobserved total prevalence of leasehold land.

These figures provide basic support for the idea that institutional constraints to land access are decreasing in remoteness while technological constraints are increasing. Tables 27 and 28 further substantiate this stylized picture by showing a wider range of indicators by quartile of access. Of course, these are all unconditional distributions. In the next section I turn to econometric analysis which evaluates the determinants and impacts of land access in a more rigorous way.

Figure 27 Capital-land ratios over space
(a) Capital-land ratios and rural population density. Vertical axis: thousands of ZMK per hectare; horizontal axis: persons/sq.km.
(b) Capital-land ratios and market access. Vertical axis: thousands of ZMK per hectare; horizontal axis: hours to town.
Note: The graph shows fertilizer expenditures per hectare of cultivated land as a function of population density (Panel A) and of hours to the nearest town of 50,000 or more inhabitants (Panel B). Estimates are from local polynomial regressions which use the Epanechnikov kernel, degree=0, bandwidth=30. The 95% confidence interval is represented by the area shaded grey. In panel A, reference percentile values (50th, 75th and 99th) are shown for the distribution of population densities in survey villages. Household data come from the Supplemental Survey and are averaged over all three years (2001, 2004 and 2008). Population density is from AfriPop 2000.
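The estimator named in the figure notes (a local polynomial of degree 0 with an Epanechnikov kernel) is a kernel-weighted local average. The following is a minimal sketch of that idea in plain Python, not the code used to produce the figures. The kernel is written in its standard form on [-1, 1] (statistical packages may scale it differently), and the data values and function names are hypothetical, chosen only for illustration.

```python
def epanechnikov(u):
    """Standard Epanechnikov kernel: 0.75 * (1 - u^2) for |u| < 1, else 0."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def local_constant_fit(x, y, grid, bandwidth):
    """Degree-0 local polynomial (kernel-weighted mean) of y at each grid point.

    Returns None at grid points with no observations inside the bandwidth.
    """
    fitted = []
    for g in grid:
        weights = [epanechnikov((xi - g) / bandwidth) for xi in x]
        total = sum(weights)
        if total == 0.0:
            fitted.append(None)
        else:
            fitted.append(sum(w * yi for w, yi in zip(weights, y)) / total)
    return fitted


# Hypothetical data: population density (persons/sq.km) and fertilizer
# expenditure (thousands of ZMK per hectare). Not taken from the survey.
density = [5, 12, 20, 35, 50, 70, 90, 120]
fert_per_ha = [40, 55, 60, 80, 95, 110, 120, 150]

grid = [10, 40, 80, 200]
fit = local_constant_fit(density, fert_per_ha, grid, bandwidth=30)
```

With a bandwidth of 30, each fitted value averages only observations within 30 units of the evaluation point, with nearer observations weighted more heavily. This is why estimates become noisy where the data are sparse, for example above the 99th percentile of density.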
Figure 28 Capital-labor ratios over space
(a) Capital-labor ratios and rural population density. Vertical axis: thousands of ZMK per adult equivalent; horizontal axis: persons/sq.km.
(b) Capital-labor ratios and market access. Vertical axis: thousands of ZMK per adult equivalent; horizontal axis: hours to town.
Note: The graph shows fertilizer expenditures per adult equivalent as a function of population density (Panel A) and of hours to the nearest town of 50,000 or more inhabitants (Panel B). Estimates are from local polynomial regressions which use the Epanechnikov kernel, degree=0, bandwidth=30. The 95% confidence interval is represented by the area shaded grey. In panel A, reference percentile values (50th, 75th and 99th) are shown for the distribution of population densities in survey villages. Household data come from the Supplemental Survey and are averaged over all three years (2001, 2004 and 2008). Population density is from AfriPop 2000.

Figure 29 Labor-land ratios over space
(a) Labor-land ratios and rural population density. Vertical axis: adult equivalents per hectare; horizontal axis: persons/sq.km.
(b) Labor-land ratios and market access. Vertical axis: adult equivalents per hectare; horizontal axis: hours to town.
Note: The graph shows adult equivalents per hectare of cultivated land as a function of population density (Panel A) and of hours to the nearest town of 50,000 or more inhabitants (Panel B). Estimates are from local polynomial regressions which use the Epanechnikov kernel, degree=0, bandwidth=30. The 95% confidence interval is represented by the area shaded grey. In panel A, reference percentile values (50th, 75th and 99th) are shown for the distribution of population densities in survey villages.
Household data come from the Supplemental Survey and are averaged over all three years (2001, 2004 and 2008). Population density is from AfriPop 2000.

Figure 30: Farm size, household size and traction technology use at different levels of market access
Panels (a) and (b): horizontal axis in persons/sq.km. Panels (c) and (d).

Figure 31: Institutional indicators and traction technology usage at different levels of market access
Panels (a), (b), (c) and (d).

Table 27: Indicators of institutional constraints, by access quartile

Access      % reporting: no        % with         Land acquisition channel            cultivated
quartile    local land available   titled land    rented    purchased    walked in    area (ha)
best        65.2%                  6.9%           2.2%      7.7%         7.9%         2.11
2nd         62.6%                  3.4%           0.8%      3.6%         8.6%         2.08
3rd         51.9%                  3.0%           0.6%      3.0%         11.3%        1.97
worst       43.3%                  1.6%           0.4%      2.1%         19.7%        1.82
total       55.8%                  3.7%           1.0%      4.1%         11.8%        1.99

Table 28: Indicators of technology constraints, by access quartile

Access      own animal   hired animal   own mechanical   hired mechanical   hired-in   cultivated
quartile    traction     traction       traction         traction           labor      area (ha)
best        22.2%        22.3%          1.2%             2.5%               11.4%      2.11
2nd         22.5%        19.9%          0.4%             2.0%               12.9%      2.08
3rd         16.2%        14.5%          0.2%             1.0%               12.0%      1.97
worst       10.5%        11.9%          0.2%             0.5%               13.6%      1.82
total       17.9%        17.2%          0.5%             1.5%               12.5%      1.99

3.6.2 Econometric results

3.6.2.1 Determinants of farm size

Estimation results of the cultivated area model are reported in Table 29. 45 The table shows two specifications: (1) treats traction technology choices and hiring-in labor as exogenous, and (2) treats traction and hiring-in as endogenous, using the control function (CF) residuals from the reduced form estimates of the traction technology decisions (which are reported in Table 30). Both specifications include additional household and community level variables as well as provincial dummies. Year dummies and CRE controls are not reported for conciseness.
Coefficient estimates from both models are very similar, indicating that any generalized bias arising from failure to address endogeneity in traction choices is not large. In model (2), the coefficient estimates for the CF residuals are shown at the bottom of the table. These coefficients are not significant, providing further indication that any bias arising from simultaneous decision making about traction use and cultivated area is minor. (Similar results were obtained when using sub-sets of these suspected endogenous variables, e.g. when omitting the hiring-in decision.) Given these findings, my discussion of the other coefficient estimates will not distinguish between models (1) and (2) unless the coefficient estimates are very different.

45 I also considered measures of farm size which include fallow land and land which is rented or borrowed out (i.e. total land controlled, rather than just land cultivated). Unfortunately, data on fallow land were not collected in 2004 and the cost of losing a panel period was deemed to be too high to use this measure. Furthermore, the amount of rented/borrowed out land is mostly negligible, so any differences in farm size deriving from this category are minimal. For these reasons, in this analysis I consider farm size as cultivated land only.

The coefficient on population density is negative, as expected, but not significantly different from zero. Market access (measured here as elsewhere in terms of hours of travel time to the nearest urban center of 50,000 or more inhabitants) has much larger effects: cultivated area tends to increase with distance from markets, although at a declining rate. The turning point (reported at the bottom of the table) is about 21 hours, or about the 90th percentile of the access distribution in the sample. Beyond this point, cultivated area is diminishing in remoteness. The use of animal and mechanical traction is a strong positive determinant, as expected.
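The turning point mentioned above follows from the quadratic access terms: if cultivated area depends on b1 * hours + b2 * hours^2, the marginal effect of an extra hour is b1 + 2 * b2 * hours, which is zero at hours = -b1 / (2 * b2). The sketch below applies that formula to the coefficients as printed (rounded to four decimals) in Table 29. The function names are mine, and using rounded coefficients is an assumption, so the results need not match the reported values exactly.

```python
def turning_point(b_linear, b_squared):
    """Value of x at which b_linear * x + b_squared * x**2 stops increasing."""
    if b_squared == 0:
        raise ValueError("no turning point without a quadratic term")
    return -b_linear / (2.0 * b_squared)


def marginal_effect(b_linear, b_squared, x):
    """Derivative of b_linear * x + b_squared * x**2 with respect to x."""
    return b_linear + 2.0 * b_squared * x


# Rounded coefficients on hours to town and its square, as printed in Table 29.
tp_model_1 = turning_point(0.0300, -0.0007)
tp_model_2 = turning_point(0.0348, -0.0008)

# Marginal effect of one more hour of travel time at 5 and at 30 hours (model 1).
effect_near = marginal_effect(0.0300, -0.0007, 5)
effect_far = marginal_effect(0.0300, -0.0007, 30)
```

With these rounded inputs the turning points come out at roughly 21.4 and 21.75 hours. The small gap between the first figure and the 21.03 reported in the table is presumably due to rounding of the squared-term coefficient.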
Using animal traction results in about an additional half a hectare of cultivated area. Using mechanical traction has a more variable effect, depending upon whether it is treated as exogenous or endogenous. Treated as exogenous, use of mechanical traction enables an additional ¾ of a hectare to be cultivated; treated as endogenous, use of a tractor is associated with an additional 3.5 hectares of cultivated area. The implied direction of bias (downward) is suspect: a priori, we would anticipate that any bias arising from simultaneity would exert a positive influence on the measured relationship. This may be taken as further support for preferring specification (1) over the CF specification in column (2). 46

The hiring-in decision is a strong positive determinant. The coefficient estimate when treated exogenously is about the same as for animal traction. (Recall that this variable only measures the use of technology or outside labor, but not the extent to which such inputs were used.) When treated endogenously, the coefficient is negative, which is hard to interpret directly since we are already controlling for household labor and other conditions. Again, because the control function results do not indicate endogeneity, I prefer the specification in column (1).

46 In initial specification testing, I evaluated several other IV approaches, which coincided closely with the CF results reported here. This suggests that failure of the CF assumptions (which are slightly stronger than standard IV assumptions given the CRE setup and non-linear first stage models) is not a problem. I cannot, however, rule out problems with misspecification of the instruments used in the reduced form estimations.

Household size (measured as adult equivalents) is not a significant determinant of cultivated area outcomes, which is surprising given the importance of family labor in farm production in Zambia.
The negative coefficient on the agricultural wage rate is consistent with a positive role for labor in area outcomes (both directly, through the costs of hiring in labor, and indirectly, as it represents the opportunity costs of family labor); however, this variable is also not significant. Furthermore, none of the institutional variables are significant (share of titled land in the neighborhood 47, being kin to the chief or local headman, or number of years the household has been in the village). 48 Female-headed households control about 0.2 hectares less than male-headed households, on average. Because nearly a quarter of small farm households in Zambia are headed by females, this finding suggests that efforts to address land constraints must seriously address gendered differences in access. Value of productive assets is the other significant household level factor: an increase of 2% results in an additional hectare of expected cultivated land.

47 As a robustness check, I compared this indicator with alternatives: the share of titled land in the village, and in the nearest 4 villages. Results were mostly invariant to choice of indicator.

48 I evaluated a number of other household- and village-level cultural controls, such as rules of descent (matrilineal versus patrilineal) and tribal affiliations, but found these not to be significant covariates in any of the models of interest and so do not include them in the current specifications.

Larger farms are associated with lower rainfall, which suggests that extensification may be a response to poorer agricultural production endowments: areas with lower rainfall tend to be less productive and more land is required in such areas to produce a given amount of surplus. However, larger farms are also associated with lower rainfall variability, which suggests that production risk in rainfed systems may discourage extensification investments.
Higher elevation is associated with larger farm sizes, although the reasons for this are ambiguous. One explanation is that higher altitude areas in Zambia are less prone to malaria, tsetse and other diseases that may act as brakes on expansion in lower lying areas. Higher areas are also characterized by cooler temperatures and reduced exposure to water and heat stress, which may make them less vulnerable to some kinds of production risks. Steeper slopes are associated with smaller farm sizes, probably as a result of the greater land preparation investments required to prevent soil degradation in steep areas (e.g. via terracing).

Table 29: Determinants of cultivated area

                                (1) cultivated area:           (2) cultivated area:
                                no endogeneity controls        control function
                                coeff.     p-value             coeff.     p-value
density                         -0.0022    (0.181)             -0.0027    (0.173)
hours to town                    0.0300    (0.003)***           0.0348    (0.026)**
(hours to town)^2               -0.0007    (0.001)***          -0.0008    (0.021)**
animal traction                  0.4599    (0.000)***           0.6394    (0.020)**
mechanical traction              0.8869    (0.024)**            4.0687    (0.072)*
hired labor power                0.4518    (0.000)***          -0.3000    (0.538)
household adults                -0.0053    (0.869)             -0.0021    (0.948)
wage rate                       -0.0000    (0.252)             -0.0000    (0.158)
% titled (N)                     0.7152    (0.399)              0.2079    (0.749)
chief kin                        0.0200    (0.720)              0.0425    (0.535)
years in village                -0.0024    (0.324)             -0.0024    (0.264)
female head                     -0.2400    (0.000)***          -0.2474    (0.000)***
age                             -0.0006    (0.961)             -0.0021    (0.837)
age^2                            0.0000    (0.855)              0.0000    (0.642)
education                        0.0076    (0.407)              0.0127    (0.197)
log assets                       0.0203    (0.000)***           0.0201    (0.000)***
own radio                        0.0325    (0.569)             -0.0035    (0.956)
number of oxen                   0.0548    (0.094)*             0.0540    (0.098)*
rainfall                        -0.0001    (0.000)***          -0.0001    (0.177)
rainfall variability            -0.0198    (0.442)             -0.0078    (0.804)
elevation                        0.0009    (0.000)***           0.0009    (0.000)***
slope                           -0.0255    (0.151)              0.0073    (0.816)
CF res: animal                                                 -0.0867    (0.557)
CF res: mech                                                   -1.3841    (0.112)
CF res: hired labor                                             0.4111    (0.112)
Access turning point (hours)     21.03                          21.75
N                                6707                           6707
Years                            2                              2
Estimator                        CRE                            CRE

Note: Dependent variable is cultivated hectares. (N) indicates the average value for the v-village neighborhood, as defined earlier. Estimation of second stage uses a linear CRE. Significance levels, based on cluster robust p-values, are denoted: * p<0.10, ** p<0.05, *** p<0.01. CRE controls, province dummies and time dummy are included but not reported here.

Table 30: First stage reduced form models: the determinants of animal and mechanical traction usage

                      (1) probability of          (2) probability of            (3) probability of
                      animal traction use         mechanized traction use       hiring in labor
                      coeff.     p-value          coeff.     p-value            coeff.     p-value
avg. oxen (N)          0.4386    (0.000)***
% own tractor (N)                                  9.3429    (0.000)***
% local (N)                                                                     -0.2302    (0.048)**
hours to town         -0.0152    (0.000)***       -0.0105    (0.188)             0.0189    (0.033)**
(hours to town)^2                                                               -0.0004    (0.101)
% titled (N)          -1.2126    (0.008)***        0.6732    (0.374)             0.2260    (0.640)
chief kin              0.2658    (0.000)***       -0.1811    (0.031)**           0.0424    (0.350)
years in village       0.0018    (0.361)           0.0024    (0.538)             0.0027    (0.234)
adults                -0.0053    (0.716)          -0.0196    (0.434)            -0.0056    (0.744)
female head           -0.0684    (0.595)           0.1325    (0.629)            -0.1667    (0.259)
age                   -0.0151    (0.105)          -0.0093    (0.619)            -0.0226    (0.024)**
age^2                  0.0001    (0.216)           0.0001    (0.729)             0.0002    (0.013)**
education              0.0110    (0.059)*          0.0227    (0.040)**           0.0599    (0.000)***
log assets             0.0245    (0.000)***        0.0107    (0.429)             0.0039    (0.555)
own radio              0.0262    (0.652)           0.1774    (0.142)            -0.1066    (0.089)*
rainfall              -0.0004    (0.000)***       -0.0001    (0.001)***          0.0001    (0.000)***
rainfall CV           -0.0523    (0.048)**        -0.1828    (0.003)***          0.0012    (0.962)
elevation             -0.0001    (0.551)          -0.0001    (0.821)             0.0002    (0.103)
slope                 -0.1337    (0.000)***       -0.1091    (0.003)***          0.0920    (0.000)***
N                      7224                        7224                          7224
Years                  2                           2                             2
Estimator              Probit-CRE                  Probit-CRE                    Probit-CRE

Note: Dependent variables are binary indicators of whether or not the household used animal traction or mechanical traction on one or more fields. (N) indicates the average value for the v-village neighborhood, as defined earlier. Traction use is not mutually exclusive; households may use both or neither. Estimation uses Probit CRE. Significance levels, based on cluster robust p-values, are denoted: * p<0.10, ** p<0.05, *** p<0.01. CRE controls and time dummy are included but not reported here.

Table 31: Determinants of output prices and agricultural wage rates

                        (1) Maize wholesale        (2) Agricultural wages:       (3) Agricultural wages:
                        price                      cost to weed 1 ha             monthly
                        coeff.      p-value        coeff.       p-value          coeff.       p-value
density                  0.6788     (0.003)***      15.9362     (0.540)          -134.1017    (0.005)***
hours to town           -0.2670     (0.752)       -197.9126     (0.000)***       -374.6562    (0.151)
rain                     0.0096     (0.162)         -1.5295     (0.008)***         -0.3802    (0.862)
rainfall                -0.0111     (0.288)          2.9174     (0.000)***         -2.5538    (0.327)
rainfall variability    -4.5453     (0.489)       -375.4214     (0.385)           612.9902    (0.667)
Copperbelt              95.0336     (0.000)***    7199.6612     (0.022)**         1.50e+04    (0.102)
Eastern                -41.3442     (0.004)***    -1.10e+03     (0.560)          -4.54e+03    (0.293)
Luapula                 56.5173     (0.058)*      -1.25e+04     (0.000)***       -3.83e+03    (0.470)
Lusaka                   8.9468     (0.763)       9673.4745     (0.001)***       -333.0502    (0.964)
Northern                46.9489     (0.038)**     -1.12e+04     (0.000)***       3583.2824    (0.632)
Northwestern            63.6058     (0.069)*      -6.43e+03     (0.007)***       4447.8509    (0.600)
Southern               -37.6015     (0.028)**     -8.53e+03     (0.000)***       3834.8748    (0.624)
Western                113.3372     (0.000)***    1214.5676     (0.575)          -7.41e+03    (0.303)
N                        2271                       7489                           266
Years                    3                          2                              3
Estimator                RE                         RE                             RE

Note: Dependent variables are measured in nominal Zambian Kwacha. Estimation uses Random Effects. Significance levels, based on cluster robust p-values, are denoted: * p<0.10, ** p<0.05, *** p<0.01. Time dummies are included but not reported here.

Although we have seen that concerns about endogeneity of traction decisions may not be an issue here, it is still worthwhile to examine the reduced form estimates of using animal or mechanical traction. These results are shown in Table 30. The probability of using animal power is positively conditioned by the number of oxen in the neighborhood. This finding is robust to alternative aggregate measures of oxen ownership. Note that the measure of animal traction I use here does not distinguish between ownership and rental. (Note that we do observe this information in the survey data. However, the main reason for not modeling these choices separately is the difficulty in establishing separate instruments for reduced form equations of both own-animal and hired-animal traction outcomes.)
This makes it difficult to determine the exact role of oxen availability in the neighborhood. It could indicate the supply of animal traction for hire, but could also indicate lower costs of ownership (e.g. through local availability of pasture or veterinary services, or through the local incidence of disease). As expected, animal traction usage is negatively correlated with distance from market centers (hours to town). Surprisingly, the share of state land in the village neighborhood is a negative determinant (generally, we would anticipate the opposite since leasehold land is more common in high access areas). It is possible that this indicator is picking up the relative availability of draft animals versus mechanical traction services. (Another possibility is unobserved disease vectors: cattle corridor disease has been especially problematic in Central and Southern and other regions which also have large commercial farms and greater incidence of leasehold tenure.) Of the institutional factors, being a relative of the village headman or local chief is positively correlated with animal traction usage, and highly significant. This suggests that there may be some non-market institutional aspects of accessing draft animal power.

The probability of mechanized traction usage is similarly positively conditioned by the local prevalence of tractors. Here, however, hours to town is not a significant determinant (although the sign on hours to town is negative, as we expected). Given the low numbers of mechanized traction users in our dataset, this may simply reflect limited power of the estimator, rather than the true impact of access conditions. The prevalence of titled lands is positively associated with mechanized traction, which accords with the conjecture above about the relative supply of animal versus mechanized traction services in state lands areas. However, this coefficient is not significant and so any such conjectures remain very tentative.
Estimation results for the price models are reported in Table 31. For wholesale maize prices, I have used a naïve expectations model: the dependent variable (maize price) is the value of the median district maize price observed at the time of harvest in the preceding season. Estimation results, shown in column 1, indicate that maize prices are increasing in population density. This is what we would expect with a population of smallholders who are net consumers, on average. None of the other geographical determinants were significant. Surprisingly, this includes access to markets, which suggests that a different indicator may be better suited to maize price formation processes, and/or may reflect the fact that we only imperfectly observe maize prices: the prices used are defined at the District level.

I show two sets of results for agricultural wage rates: an indicator of the cost of hiring someone to weed a 1 hectare plot is the dependent variable in the model reported in column 2; the local monthly agricultural wage rate is the dependent variable in column 3. A priori, we expect that wages are rising in density. We clearly observe this relationship in the monthly agricultural wages (column 3), but not in the piece-work labor cost (column 2). Distance from markets is negatively correlated with both measures, which makes sense as the demand for agricultural labor likely declines with distance from commercial farms, which tend to be located in more accessible areas.

In the econometric analysis that follows, the wage rate used is the same as the dependent variable in column 2. The main reason for this is the number of observations that we have. Although monthly wage rates were reported for all three years, there are very few observations, and it was not deemed feasible to impute missing values. Hired labor costs for weeding a 1 hectare plot, in contrast, were widely reported at the household level, although only collected for two years.
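The naïve expectations price described above (the median district maize price at harvest in the preceding season) can be sketched as follows. This is an illustration only: the record layout, the integer season index, the district names and the prices are all hypothetical, and the dissertation's own data handling is not shown in the text.

```python
from collections import defaultdict
from statistics import median


def naive_expected_prices(observations):
    """Map (district, season) to the median price observed in season - 1.

    `observations` is an iterable of (district, season, price) tuples,
    where `season` is an integer index (an assumption of this sketch).
    """
    by_district_season = defaultdict(list)
    for district, season, price in observations:
        by_district_season[(district, season)].append(price)

    expected = {}
    for (district, season), prices in by_district_season.items():
        # The price expected for the next season is this season's median.
        expected[(district, season + 1)] = median(prices)
    return expected


# Hypothetical harvest-time price observations (not survey data).
obs = [
    ("District A", 2000, 100.0),
    ("District A", 2000, 120.0),
    ("District A", 2000, 140.0),
    ("District B", 2000, 90.0),
]
expected = naive_expected_prices(obs)
```

Using the lagged median rather than the contemporaneous price means the regressor is fixed before the current season's production decisions are made, which is the sense in which the expectation is naïve.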
3.6.2.2 Impacts of farm size

The estimated determinants of input demand are reported in columns 1 and 2 of Table 32. I use two measures of input demand: a binary indicator of inorganic fertilizer usage (fertilizer use, shown in column 1), and the application rate measured in kg/ha (column 2). Landholding is a significant and positive determinant of whether or not a farmer uses inorganic fertilizer, although the effect is quite small: an additional hectare increases the probability of use by about 2.5 percent. Holding size is a strong negative determinant of the intensity of use, indicating that smaller farms are applying more per unit area. An additional hectare of land is associated with a reduction of 1.5 kg/ha in inorganic fertilizer application. Together, these findings indicate that while smaller farms are less likely to use fertilizer, when they do use fertilizer, they do so at more intensive levels.

Neither wage rates nor output prices have detectable effects on fertilizer demand. Fertilizer price is a significant determinant of adoption (although the partial effect is vanishingly small). The application rate is not significantly influenced by price, however. 49 Better access to markets (i.e. reduced travel time as measured by hours to town) is a strong positive determinant of both fertilizer use indicators: the coefficient on hours to town is negative and highly significant in both equations. Other factors with notably positive impacts on input demand are increased rainfall (for the application rate only), reduced rainfall variability, and higher elevation. The positive impact of slope on fertilizer adoption may reflect the need to restore soil fertility in areas prone to surface erosion. Important household level controls include education, value of productive assets, and whether or not the household owns a radio, an indicator of access to information.
Households with livestock are less likely to use purchased inorganic fertilizer, which may be the result of greater availability of manure as a fertilizer, but may also be an indicator of emphasis on livestock over crop production.

49 Note that a linear probability model (LPM) was used for equation 1. A Probit or Logit model is generally preferred for binary outcomes. However, initial specification testing indicated that the linear model works fairly well at approximating average partial effects from the non-linear models. To take advantage of the efficiency gains from systems estimation, it was decided to maintain a linear functional form and include the LPM as part of the SUR system reported here. Similarly, the fertilizer application rate has a pile up of zero-valued observations at the lower end and nonlinear estimators such as Tobit are often preferred in such cases. However, for similar reasons to those just described for the fertilizer use decision, initial specification testing suggested that a linear model gave a reasonable approximation to nonlinear estimator results and so I also opted for including this model in the SUR system as a linear equation. Wooldridge (2010) provides a good discussion of the tradeoffs to using linear models in cases such as these, as well as the efficiency gains accruing to SUR estimation.

Allowing for population density to enter indirectly through the landholding and price variables we may construct a total partial derivative for fertilizer demand with respect to population density (as indicated in equation 4 in the conceptual section). Averaging across the sample, I obtain average total partial effects (ATPEs), which are presented at the bottom of the table. For both fertilizer outcomes, the estimated ATPE is insignificant. This accords with what we have already seen: although cultivated area outcomes are important, their linkage with density as conventionally measured is tenuous.
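One way to write the total partial effect described above is as a chain-rule sum over the channels through which density operates. The notation is introduced here for illustration only and is not the notation of equation 4: y_i is the outcome for household i (for example fertilizer demand), A_i is cultivated area, p_k are the price variables (such as the maize price and the wage rate), and D is population density. Any direct effect of density would enter as an additional term.

```latex
\frac{d y_i}{d D}
  = \frac{\partial y_i}{\partial A_i}\,\frac{\partial A_i}{\partial D}
  + \sum_{k} \frac{\partial y_i}{\partial p_k}\,\frac{\partial p_k}{\partial D},
\qquad
\mathrm{ATPE} = \frac{1}{N}\sum_{i=1}^{N} \frac{d y_i}{d D}
```

Under this reading, the first factor in each product comes from the outcome equation (Table 32) and the second from the cultivated area and price equations (Tables 29 and 31). The ATPE can therefore be significant even when density has no significant effect on cultivated area, provided the price channel is strong enough.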
I examine results for the supply response models in column 3 (maize output, measured in kilograms). Maize harvest quantity is strongly conditioned by landholding size: an additional tenth of a hectare is associated with an additional 135 kilograms of harvested maize. The quantity of harvested maize is also positively associated with maize wholesale prices, but not with wage rates or fertilizer prices. Of the household characteristics, education of the household head is the most significant positive determinant. Access does not appear to have direct effects on output supply. Of the other community-level variables, only elevation is a significant determinant; elevation in this case may be capturing some unobserved biophysical production endowments that rainfall alone does not reflect.

The average total partial effects of population density on output supply are positive and significant: an increase in density of 10 persons per square kilometer is associated with an additional 119 kg of maize production. The major channel of this net effect is through the increase in maize output prices.

Although not motivated explicitly in our conceptual model, I also show results of maize yields regressed against the same set of explanatory variables used in the output supply model (column 4). Landholding size is not a significant determinant of yield. Output prices are positively correlated with yield, however, as we would expect. Interestingly, wage rates and fertilizer prices are both positive determinants of yield. In the case of wage rates, we may understand this as an income channel, rather than an input cost (given the low rates of hired-in labor on small farms). Remoteness is positively correlated with maize yields, which may reflect the importance of maize production and sales in more remote areas (where high value marketing and non-farm income opportunities are scarcer). The average total partial effect of population density is not significant.
Table 32: Determinants of input demand and output supply

                   (1) fertilizer use=1     (2) fertilizer kg/ha     (3) maize output (kg)    (4) maize yield (kg/ha)
                   coeff.     p-value       coeff.     p-value       coeff.     p-value       coeff.     p-value
land                0.0227    (0.000)***    -1.5418    (0.017)**     1347.4     (0.000)***     1.7559    (0.809)
maize price        -0.0000    (0.906)       -0.0048    (0.809)       1.8751     (0.003)***     0.6342    (0.005)***
wage rate          -0.0000    (0.510)       -0.0000    (0.700)       0.0032     (0.275)        0.0027    (0.009)***
fertilizer price   -0.0001    (0.080)*      -0.0017    (0.833)       0.2366     (0.357)        0.2287    (0.013)**
hours to town      -0.0051    (0.000)***    -0.9556    (0.000)***    4.6136     (0.459)        4.6732    (0.036)**
adults              0.0022    (0.578)       -0.8422    (0.381)       -46.200    (0.127)       -1.0135    (0.926)
female              0.0419    (0.259)        4.7734    (0.596)       98.130     (0.729)        29.180    (0.774)
age                 0.0097    (0.000)***     1.9936    (0.002)***    -0.9893    (0.960)        5.3753    (0.449)
age^2              -0.0001    (0.001)***    -0.0161    (0.006)***    -0.0477    (0.796)       -0.0498    (0.452)
education           0.0217    (0.000)***     5.4734    (0.000)***    57.051     (0.000)***     32.660    (0.000)***
log assets          0.0015    (0.394)       -0.3330    (0.429)       -0.4243    (0.974)       -0.0880    (0.985)
own radio           0.0555    (0.001)***     2.7989    (0.473)       -184.41    (0.133)        47.760    (0.278)
rainfall           -0.0000    (0.574)        0.0013    (0.536)       -0.0884    (0.188)        0.0308    (0.200)
rainfall CV        -0.0215    (0.006)***    -8.8757    (0.000)***    -82.250    (0.168)        7.7486    (0.717)
elevation           0.0005    (0.000)***     0.0834    (0.000)***    0.8061     (0.004)***     0.5340    (0.000)***
slope               0.0114    (0.013)**     -0.4291    (0.699)       -30.043    (0.390)       -5.6661    (0.651)
ATPE (PD)           0.0002    (0.172)       -0.0094    (0.435)       11.853     (0.005)***     0.0638    (0.933)
N                   6374                     6374                    6374                      6374
Years               2                        2                       2                         2
Estimator           SUR-CRE                  SUR-CRE                 SUR-CRE                   SUR-CRE

Notes: ATPE = average total partial effect. Significance levels are denoted by: * p<0.10, ** p<0.05, *** p<0.01. Cluster-robust p-values are shown in parentheses.
Table 33: Factors affecting measures of productivity and household welfare

                        (1) log per capita          (2) log per capita          (3) log per capita
                        gross value of output       off-farm income             gross income
                        coeff.     p-value          coeff.     p-value          coeff.     p-value
land                     0.2551    (0.000)***       -0.0303    (0.336)           0.1862    (0.000)***
wage rate                0.0000    (0.025)**        -0.0000    (0.834)           0.0000    (0.002)***
maize price              0.0002    (0.428)           0.0023    (0.013)**        -0.0001    (0.499)
hours to town            0.0046    (0.011)**        -0.0321    (0.000)***       -0.0009    (0.624)
household adults        -0.1794    (0.000)***       -0.0980    (0.032)**        -0.1567    (0.000)***
female head             -0.0627    (0.463)           0.9617    (0.019)**         0.0072    (0.932)
age                      0.0057    (0.344)          -0.0456    (0.119)          -0.0014    (0.821)
age^2                   -0.0001    (0.323)           0.0004    (0.129)          -0.0000    (0.858)
education                0.0190    (0.000)***        0.2480    (0.000)***        0.0602    (0.000)***
log assets               0.0068    (0.086)*          0.0759    (0.000)***        0.0231    (0.000)***
own radio                0.0880    (0.019)**         0.6373    (0.000)***        0.1122    (0.003)***
rainfall                 0.0000    (0.241)           0.0004    (0.000)***        0.0001    (0.002)***
rainfall variability     0.0083    (0.629)          -0.1902    (0.021)**        -0.0595    (0.000)***
elevation                0.0006    (0.000)***       -0.0022    (0.000)***        0.0000    (0.973)
slope                   -0.0135    (0.216)          -0.0419    (0.426)           0.0030    (0.781)
ATPE (PD)                0.0025    (0.314)          -0.0010    (0.104)          -0.0006    (0.286)
N                        7304                        7304                        7304
Years                    2                           2                           2
Estimator                SUR-CRE                     SUR-CRE                     SUR-CRE

Notes: Cluster-robust p-values are shown in parentheses. Significance levels are denoted by: * p<0.10, ** p<0.05, *** p<0.01. Time and provincial dummies and CRE controls are not shown. ATPE = average total partial effect.

I now turn to productivity and welfare outcomes. Table 33 shows welfare outcomes that may also be interpreted as labor productivity measures. There are three outcome variables: log per capita gross farm output, log per capita off-farm income, and log per capita gross household income. The first is restricted to the value of farm production.
The second is restricted to the value of non-farm, non-agricultural income, and includes (non-farm) wage labor, business income and cash remittances. The third includes farm production plus off-farm agricultural and non-agricultural income. (All variables are normalized by the number of adult equivalents in the household.)

Cultivated area and the agricultural wage rate are significantly positive determinants of labor productivity (model 1). Maize prices, however, are not significant, perhaps because they imperfectly represent the variation in the full set of output prices facing producers. Interestingly, distance from town is positively associated with productivity. Of the household factors, education is an important positive determinant: an additional year of formal education is associated with a 2% increase in labor productivity. Productive assets are important positive determinants of income, as expected, as is radio ownership.

Determinants of log per capita off-farm income are shown in column 2. Neither land nor wage rates are important determinants. Maize wholesale price is a positive and significant determinant. This result is difficult to interpret causally; a more plausible explanation is that maize prices are correlated with unobserved non-farm opportunities. It is possible, for example, that areas with relatively large non-agricultural sectors have larger net demand for maize. Remoteness has a strong negative effect on off-farm income: each hour from town is associated with a 3.2% decline in per capita off-farm earnings. Surprisingly, female-headed households have greater off-farm earnings in per capita terms. Closer inspection of the data indicates that this is likely due to two factors: the smaller average sizes of female-headed households (which is exacerbated by adult equivalent weighting, which further deflates the size of such households) and relatively larger amounts of cash remittances (especially in 2008).
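The percentage interpretations above rest on the usual reading of coefficients in a model with a logged dependent variable: a coefficient b implies an exact proportional change of exp(b) - 1 per unit of the regressor, which is close to b itself when b is small. A short sketch, using two coefficients as printed in Table 33 (the function name is mine):

```python
import math


def pct_change_from_log_coef(beta, delta=1.0):
    """Exact percent change in y implied by coefficient beta in a log(y) model,
    for a change of `delta` units in the regressor."""
    return 100.0 * (math.exp(beta * delta) - 1.0)


# Coefficients as printed in Table 33.
education_output = pct_change_from_log_coef(0.0190)   # education, column 1
hours_offfarm = pct_change_from_log_coef(-0.0321)     # hours to town, column 2

# The common approximation simply reads the coefficient as a proportion.
approx_education = 100.0 * 0.0190
approx_hours = 100.0 * -0.0321
```

For coefficients of this size the exact and approximate figures differ by well under a tenth of a percentage point, so the quoted values of about 2% and 3.2% are unaffected. The gap widens as coefficients grow in absolute value.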
Other important household-level determinants are more readily interpretable. Education and assets are strong positive correlates of off-farm income. Income is also higher in areas of better rainfall and lower rainfall variability.

Determinants of log per capita gross income are shown in column 3. Both cultivated area and wage rates are important positive correlates. An additional hectare of land is associated with an expected gain of nearly 19% in per capita gross income. Neither maize wholesale price nor distance from town has significant effects. One way of interpreting the access result is to recall the role of access in models 1 and 2: gross output is increasing in distance while off-farm earnings are decreasing in distance. These effects tend to counterbalance one another in log per capita gross income, which is composed of both income sources. Of the other household-level determinants, education and assets are pronounced positive determinants. Gross income is also higher in areas of better rainfall and lower rainfall variability.

3.6.2.3 Alternative population density measures

As mentioned earlier, it is possible that standard density measures are not representative of actual land scarcity conditions because they fail to address differences in land quality. One way that I have addressed this is by using a rich set of geographical controls for local biophysical production endowments in the regression specifications shown thus far. I now extend my evaluation of this issue by evaluating the robustness of the landholding model results to alternative density measurements. In Table 34, I repeat the baseline results in column 1 and compare these to results for specifications using alternative measures. In column 2, density is defined as persons per unit of quality-weighted land area, as described earlier in the descriptive results section. (Recall that NPP values are used to weight land in terms of latent productivity.)
In column 3, I use density defined as persons per km² of arable land only (where arable is defined as land currently under cultivation; see the footnote on page 178 for details). In column 4, I use this arable-land density further weighted by quality.

Results are very similar regardless of which density measure is used: none of the coefficient estimates vary much in magnitude, and the overall fit across corresponding model pairs does not change much. The quality-weighted density measures do tend to have slightly more significant coefficient estimates than the unweighted measures, which is what we would expect if the weighted measures more accurately reflect effective land scarcity. However, none of these measures is significant at conventionally accepted levels. Similar results were obtained from using weighted density measures in the other models of interest. Because the overall story does not change much as a result of using the weighted density, I do not present those results here.

My tentative conclusion is that our understanding of the role of population density in landholding outcomes is not very sensitive to alternative density measures. I emphasize that this is a tentative conclusion, however, because there are many other possible approaches to defining arable land and to establishing quality-weighting schemes. Furthermore, I may not be capturing important factors related to institutional land use restrictions, i.e. factors which may enter into the ratio of people to available agricultural land.

Table 34: Cultivated area results with alternative definitions of rural population density (1) all land density hours to town hours to town² % titled (N) animal traction mech. traction hired labor chief kin years in village rainfall rainfall CV elevation slope adults wage rate female head age age² education log assets own radio oxen owned Copperbelt Eastern Luapula coeff.
-0.0022 0.0300 -0.0007 p-value (0.181) (0.003)*** (0.001)*** (2) all land (weighted) coeff. p-value -0.0017 (0.140) 0.0297 (0.004)*** -0.0007 (0.001)*** 0.7152 0.4599 0.8869 0.4518 0.0200 -0.0024 -0.0001 -0.0198 0.0009 -0.0255 -0.0053 -0.0000 -0.2400 -0.0006 0.0000 (0.399) (0.000)*** (0.024)** (0.000)*** (0.720) (0.324) (0.000)*** (0.442) (0.000)*** (0.151) (0.869) (0.252) (0.000)*** (0.961) (0.855) 0.6996 0.4603 0.8868 0.4519 0.0200 -0.0024 -0.0001 -0.0207 0.0009 -0.0257 -0.0053 -0.0000 -0.2402 -0.0006 0.0000 (0.411) (0.000)*** (0.024)** (0.000)*** (0.719) (0.333) (0.000)*** (0.425) (0.000)*** (0.146) (0.869) (0.251) (0.000)*** (0.961) (0.857) 0.5634 0.4551 0.8818 0.4285 0.0279 -0.0029 -0.0001 -0.0253 0.0009 -0.0298 -0.0048 -0.0000 -0.2318 -0.0021 0.0000 (0.520) (0.000)*** (0.025)** (0.000)*** (0.619) (0.240) (0.000)*** (0.341) (0.000)*** (0.095)* (0.883) (0.234) (0.000)*** (0.862) (0.742) 0.5632 0.4551 0.8819 0.4285 0.0279 -0.0029 -0.0001 -0.0254 0.0009 -0.0298 -0.0048 -0.0000 -0.2318 -0.0021 0.0000 (0.520) (0.000)*** (0.025)** (0.000)*** (0.618) (0.241) (0.000)*** (0.340) (0.000)*** (0.095)* (0.883) (0.234) (0.000)*** (0.862) (0.742) 0.0076 0.0203 0.0325 0.0548 -0.1476 0.0142 -0.0058 (0.407) (0.000)*** (0.569) (0.094)* (0.354) (0.921) (0.963) 0.0074 0.0204 0.0325 0.0548 -0.1511 0.0269 -0.0085 (0.414) (0.000)*** (0.569) (0.094)* (0.343) (0.853) (0.946) 0.0092 0.0207 0.0383 0.0543 -0.1343 -0.0601 -0.0273 (0.318) (0.000)*** (0.510) (0.097)* (0.417) (0.649) (0.831) 0.0092 0.0207 0.0383 0.0543 -0.1323 -0.0604 -0.0276 (0.318) (0.000)*** (0.510) (0.097)* (0.424) (0.647) (0.829) 222 (3) arable land coeff. -0.0000 0.0358 -0.0008 p-value (0.565) (0.001)*** (0.001)*** (4) arable land (weighted) coeff. p-value -0.0000 (0.506) 0.0359 (0.001)*** -0.0008 (0.001)*** Table 34 (cont’d) (1) all land Luapula Lusaka Northern Northwestern Southern Western N Years Estimator coeff. 
-0.0058 -1.0541 0.0626 -0.1976 -0.6919 -0.7952 6707 2 CRE p-value (0.963) (0.000)*** (0.518) (0.130) (0.000)*** (0.000)*** (2) all land (weighted) coeff. p-value -0.0085 (0.946) -1.0495 (0.000)*** 0.0590 (0.542) -0.1964 (0.133) -0.6883 (0.000)*** -0.7909 (0.000)*** 6707 2 CRE (3) arable land coeff. -0.0273 -1.0408 0.0315 -0.1983 -0.7280 -0.8055 6561 2 CRE p-value (0.831) (0.000)*** (0.757) (0.154) (0.000)*** (0.000)*** (4) arable land (weighted) coeff. p-value -0.0276 (0.829) -1.0396 (0.000)*** 0.0309 (0.762) -0.1984 (0.154) -0.7278 (0.000)*** -0.8051 (0.000)*** 6561 2 CRE
Notes: Robust p-values shown in parentheses. Significance levels denoted by * p<0.10, ** p<0.05, *** p<0.01.

3.6.2.4 A closer look at landholding

This section addresses the question of whether or not farms of different sizes face the same constraints in expanding. In my baseline analysis, which looked at the determinants of holding size across the entire sample, I obtained several general results. After correcting for endogeneity, landholding is negatively associated with population density, even at the relatively low population densities found in rural Zambia. Household size and the availability of animal traction were very strong determinants of farm size, signaling the importance of labor availability and technology constraints. I also found that social capital variables and tenure status were significant determinants, confirming that local institutions play a very important role in household landholding outcomes.

In this section, I look more closely at the determinants of landholding by using a quantile regression approach to estimate the same model. This should help to understand how population density and institutional conditioners may affect landholdings of different sizes. Results are shown in Table 35. The five columns show quantile regression results for the 10th, 25th, 50th, 75th and 90th percentiles of landholding, respectively.
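Table 35 also reports an access turning point, in hours, for each quantile. In a specification that is quadratic in hours to town, with coefficient b1 on the linear term and b2 on the squared term, the fitted relationship peaks (or bottoms out) at −b1/(2·b2). The sketch below applies that formula to the coefficients as printed in Table 35. Because the printed values are rounded to four decimals, the output only approximates the turning points reported in the table, which were presumably computed from unrounded estimates. The function and variable names are illustrative.

```python
def turning_point(b_linear, b_squared):
    """Value of x at which b_linear * x + b_squared * x**2 reaches its extremum."""
    if b_squared == 0:
        raise ValueError("no turning point when the squared term is zero")
    return -b_linear / (2.0 * b_squared)


# (hours to town, hours to town squared) as printed in Table 35, columns 1-5.
printed_coefs = {
    "column 1": (0.0094, -0.0002),
    "column 2": (0.0117, -0.0003),
    "column 3": (0.0132, -0.0003),
    "column 4": (0.0200, -0.0005),
    "column 5": (0.0155, -0.0004),
}

for column, (b1, b2) in printed_coefs.items():
    hours = turning_point(b1, b2)
    # A negative squared term means farm size rises with remoteness up to
    # this point and declines beyond it.
    print(f"{column}: turning point at about {hours:.1f} hours")
```

Small changes in the fourth decimal of the squared term move the turning point by several hours, so the rounded coefficients should not be used to reproduce the reported values exactly.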
In all regressions, I use the same set of regressors used in the model reported on in Table 29. Results indicate that population density is important for outcomes across most of the distribution, but is of most importance at the lower end of the farm size distribution. One interpretation of this result is that population density may be more indicative of actual land constraints for the smallest farmers than it is for medium-scale farmers. Further unpacking this idea, it is possible that the labor/land ratios implied by population density measures are mediated through institutional (or other) factors which affect small farmers differently than they affect medium-scale farmers.

The role of access appears to be fairly consistent across the distribution: farm sizes increase with remoteness up to about 20 hours from town, after which they decline. This is consistent with the different factors constraining agricultural expansion, as laid out in the conceptual framework. Animal traction is important everywhere, although mechanical traction only approaches significance in the 25th and 50th percentile range. These results confirm the importance of devising ways to overcome labor and capital constraints related to expansion.

Of the institutional indicators, the local share of leasehold land is a negative determinant everywhere, but impacts of this indicator are largest and most significant for the larger farm sizes. This is interesting as it suggests that, to the extent that the prevalence of leasehold tenure is an indicator of unobserved land rents, such increased valuation affects larger smallholders more than the smallest farmers. Relationship to the village headman or local chief is a strong positive correlate of holding size for the smallest holders. This is consistent with the observation that social capital is often important in negotiating customary mechanisms of land access by small farmers (e.g. Sitko 2010).
Furthermore, the diminished importance of kinship in larger holding size outcomes is consistent with the idea that outsiders with sufficient capital may acquire land outside their home communities; to the extent that such acquisitions are visible at all in our data, they are likely to be at the larger end of the holding spectrum. Another way to interpret this result is that small farmers without locally relevant social capital may find it difficult to obtain access to land in communities where they are not local. Such factors appear to play smaller roles at higher holding sizes and, presumably, for wealthier would-be migrants.

Female-headed households are more likely to have smaller holdings in all parts of the distribution, as we saw in the baseline model. Landholding increases at a diminishing rate with age of household head, which likely indicates a life-cycle effect: farmers are able to expand as their farming experience and household assets increase (including household labor availability); however, as children leave to start their own households, farms become smaller. This effect is most pronounced at the smaller holding sizes; it is not significant for the largest holdings, which may explain why this effect was not detectable in the baseline results, which did not distinguish between holding sizes.

The idea of distributional equality is perhaps also of importance to landholding outcomes. Adding the land concentration index as an additional regressor in the quantile regressions yields the estimates shown in Table 36. The land concentration index is negative for the 25th and 50th percentiles, but only significant at the 25th percentile. This provides tentative evidence that more uniform local land distributions are associated with marginally larger holdings in the lower-middle of the farm size distribution.
The other coefficient estimates do not change significantly after introducing the concentration term, which suggests that the impact of land concentration on household outcomes is not closely related to the controls already in the model.

To be sure, this is a tenuous result. Further specification testing may help to refine the measure. Additionally, an important limitation of the present dataset is that we only observe farm sizes within the small and medium farm sector; larger farms within the area are not observed. Therefore, our measure of land concentration is almost certainly biased downward. Furthermore, aside from detection issues, the question of whether or not we can ascribe a causal role to a land concentration index is debatable. The main reason for this is that local land concentration is probably endogenous to household-level landholding outcomes. These caveats aside, the results shown above are suggestive of deeper structures of land access wherein unequal access to assets, or to the institutions which allocate access, strongly conditions outcomes.

Table 35: Quantile regression results for household landholding size (1) (2) (3) (4) 10 coeff. p-value -0.0009 (0.212) 0.0094 (0.028)** -0.0002 (0.058)* 25 coeff. p-value -0.0019 (0.000)*** 0.0117 (0.011)** -0.0003 (0.003)*** 50 coeff. p-value -0.0023 (0.002)*** 0.0132 (0.008)*** -0.0003 (0.001)*** 75 coeff. p-value -0.0025 (0.037)** 0.0200 (0.012)** -0.0005 (0.004)*** 95 coeff.
p-value 0.0003 (0.838) 0.0155 (0.068)* -0.0004 (0.089)* 0.3210 0.2056 0.2129 -0.0091 -0.0000 -0.0783 0.0454 0.0005 -0.1818 0.0113 -0.0001 (0.000)*** (0.081)* (0.000)*** (0.247) (0.594) (0.700) (0.007)*** (0.593) (0.000)*** (0.040)** (0.023)** 0.4200 0.3000 0.2642 0.0063 0.0000 -0.6680 0.0753 0.0010 -0.2617 0.0047 -0.0000 (0.000)*** (0.425) (0.000)*** (0.652) (0.548) (0.016)** (0.024)** (0.475) (0.005)*** (0.499) (0.539) 0.5792 0.5404 0.5636 -0.0045 -0.0000 -0.7040 0.0097 -0.0018 -0.4019 0.0178 -0.0001 (0.000)*** (0.360) (0.000)*** (0.851) (0.401) (0.017)** (0.838) (0.318) (0.003)*** (0.044)** (0.100) 0.5711 2.0946 0.9514 0.0057 0.0000 -1.2451 0.0172 -0.0031 -0.5948 0.0052 -0.0000 (0.002)*** (0.274) (0.003)*** (0.869) (0.985) (0.098)* (0.801) (0.454) (0.009)*** (0.689) (0.900) 0.0048 0.0122 0.0409 0.0449 -0.0000 -0.0032 0.0004 0.0061 (0.158) (0.000)*** (0.246) (0.146) (0.013)** (0.738) (0.000)*** (0.506) 0.0056 0.0081 0.0599 0.0654 -0.0001 -0.0394 0.0005 -0.0022 (0.239) (0.029)** (0.170) (0.012)** (0.000)*** (0.003)*** (0.000)*** (0.854) 0.0187 0.0122 0.0911 0.1589 -0.0002 -0.0121 0.0010 -0.0254 (0.061)* (0.032)** (0.222) (0.003)*** (0.000)*** (0.579) (0.000)*** (0.071)* 0.0337 0.0196 0.0709 0.1709 -0.0002 0.0314 0.0013 -0.0565 (0.004)*** (0.056)* (0.457) (0.114) (0.000)*** (0.379) (0.000)*** (0.118) th density hours to town 2 hours to town animal traction 0.2819 (0.000)*** mechanical traction 0.1409 (0.044)** hired labor power 0.1540 (0.001)*** household adults -0.0014 (0.871) wage rate 0.0000 (0.772) % titled (N) -0.0900 (0.723) chief kin 0.0572 (0.006)*** years in village 0.0017 (0.043)** female head -0.1706 (0.000)*** age 0.0135 (0.006)*** 2 -0.0001 (0.002)*** age education log assets own radio oxen owned rainfall rainfall CV elevation slope Turning point: access (hours) -0.0013 0.0083 0.0716 0.0095 -0.0000 0.0045 0.0003 0.0011 19.92 (0.691) (0.004)*** (0.011)** (0.741) (0.007)*** (0.592) (0.000)*** (0.899) th th 19.23 20.60 228 th 22.29 (5) th 21.45 Table 35 
(cont’d) (1) th N Years 10 coeff. p-value 6707 2 (2) (3) th th 25 coeff. p-value 6707 2 50 coeff. p-value 6707 2 (4) (5) 75 coeff. p-value 6707 2 95 coeff. p-value 6707 2 th th Note: Province and year controls included in model but not reported here. CRE controls (time averages of all time-varying covariates) included but not reported here. Clustered p-value in parentheses: *** p<0.01, ** p<0.05, * p<0.1 229 Table 36: Quantile regression results for household landholding size, including land concentration index as additional regressor (1) th 10 coeff. density -0.0009 land concentration (N) 0.0146 hours to town 0.0095 2 -0.0002 hours to town animal traction mechanical traction hired labor power household adults % titled (N) chief kin years in village female head age 2 age education log assets own radio oxen owned rainfall rainfall CV elevation slope Turning point: access (hours) (2) (3) th th (4) th (5) th p-value (0.105) (0.923) (0.015)** (0.048)** 25 coeff. p-value -0.0019 (0.003)*** -0.3088 (0.025)** 0.0128 (0.005)*** -0.0003 (0.002)*** 50 coeff. p-value -0.0029 (0.004)*** -0.2522 (0.277) 0.0140 (0.025)** -0.0004 (0.006)*** 75 coeff. p-value -0.0029 (0.051)* -0.1755 (0.615) 0.0207 (0.002)*** -0.0005 (0.000)*** 95 coeff. 
p-value -0.0001 (0.981) 0.6241 (0.335) 0.0125 (0.173) -0.0003 (0.118) 0.2784 0.1468 0.1325 -0.0010 -0.2467 0.0636 0.0017 -0.1730 0.0128 -0.0001 (0.000)*** (0.227) (0.035)** (0.907) (0.270) (0.000)*** (0.039)** (0.000)*** (0.001)*** (0.001)*** 0.3175 0.1811 0.2049 -0.0085 -0.1540 0.0513 0.0007 -0.1799 0.0112 -0.0001 (0.000)*** (0.115) (0.000)*** (0.410) (0.443) (0.006)*** (0.514) (0.000)*** (0.062)* (0.072)* 0.4296 0.3534 0.2743 0.0084 -0.6745 0.0849 0.0005 -0.2750 0.0071 -0.0001 (0.000)*** (0.130) (0.000)*** (0.526) (0.029)** (0.011)** (0.777) (0.000)*** (0.286) (0.353) 0.5757 0.5595 0.5602 -0.0032 -0.7019 0.0317 -0.0005 -0.4069 0.0204 -0.0002 (0.000)*** (0.386) (0.000)*** (0.872) (0.097)* (0.502) (0.761) (0.000)*** (0.041)** (0.067)* 0.5923 2.1253 0.9794 0.0009 -1.3173 0.0331 -0.0032 -0.6598 0.0077 -0.0000 (0.000)*** (0.200) (0.000)*** (0.977) (0.149) (0.694) (0.412) (0.000)*** (0.680) (0.849) -0.0022 0.0095 0.0709 0.0114 -0.0000 0.0055 0.0004 0.0044 (0.537) (0.000)*** (0.001)*** (0.680) (0.109) (0.577) (0.000)*** (0.538) 0.0038 0.0119 0.0447 0.0431 -0.0000 -0.0026 0.0004 0.0056 (0.292) (0.002)*** (0.076)* (0.048)** (0.012)** (0.833) (0.000)*** (0.545) 0.0055 0.0073 0.0619 0.0626 -0.0001 -0.0368 0.0005 -0.0010 (0.362) (0.137) (0.019)** (0.010)*** (0.000)*** (0.029)** (0.000)*** (0.920) 0.0177 0.0118 0.0703 0.1619 -0.0002 -0.0122 0.0010 -0.0225 (0.035)** (0.048)** (0.130) (0.012)** (0.000)*** (0.635) (0.000)*** (0.205) 0.0349 0.0171 0.0603 0.1604 -0.0003 0.0436 0.0014 -0.0559 (0.017)** (0.035)** (0.510) (0.226) (0.000)*** (0.324) (0.000)*** (0.102) 19.92 19.23 20.60 230 22.29 21.45 Table 36 (cont’d) (1) th N Years 10 coeff. p-value 6711 2 (2) (3) (4) (5) 50 coeff. p-value 6711 2 th 75 coeff. p-value 6711 2 95 coeff. p-value 6711 2 th 25 coeff. p-value 6711 2 th th Note: Province and year controls included in model but not reported here. CRE controls (time averages of all time-varying covariates) included but not reported here. 
Clustered p-value in parentheses: *** p<0.01, ** p<0.05, * p<0.1

3.7 Discussion

3.7.1 Farm growth and market access

Policies and investments aimed at increasing farm productivity are best understood when they are contextualized within the institutional, infrastructural and other conditions that govern market access. In their study of smallholder growth in Zambia, Chapoto et al. (2012) make the general point that:

"Productivity gains enable farmers to generate surpluses for sale and reduce unit production costs. Market access provides the conduit for monetizing productivity gains, permitting household specialization and kick starting the structural transformation process. Yet one component without the other will not suffice. Productivity gains without markets lead to temporary production surges and price collapses. Markets without increased farm productivity remain moribund, with farm households unable to generate surpluses for sale at competitive prices." (p.3)

They further note that, in the case of Zambia, successful growth trajectories have tended to follow certain patterns which are strongly conditioned by market access conditions of different locations (as well as by household endowments, etc.). For example, horticulture marketers are located in districts near major urban centers, which also tend to have better access to water and other infrastructure. Geographical context also matters for successful maize marketers, but in a different way: growers located closer to FRA buying stations are more likely to expand commercialized maize output. FRA buying stations are located throughout the maize producing areas of the country, but tend to be located in rural centers. Access to reliable rural road infrastructure and relative proximity to rural market towns are of similar importance to other kinds of staples and non-perishable cash crop marketing. Others have made similar assessments (e.g. Alwang et al. 1996, Seigel 2008).
Rural areas relatively well articulated with the main transportation arteries and close to major border crossings may also be well positioned to supply food demand in the high density areas of southern DRC, southern Tanzania, Malawi and Zimbabwe (Haggblade et al. 2009).

Chapoto et al. (2012) define two predominant pathways out of the low-input, low-output subsistence traps, both of which critically involve increased market orientation. The “high road” involves focusing on high-value agriculture, such as horticulture, dairy and poultry. This trajectory, while only accessible to a small share of farmers, allows for fairly rapid income growth and asset accumulation. It also does not require land expansion. It does, however, require good access to markets, infrastructure and supporting services. The “low road” is less steep, i.e. more accessible, but takes longer to reach the same asset accumulation outcomes. This road is characterized by gradual expansion of production and marketing of lower-value cash crops, such as cotton. The availability of land for expansion does play a role in this pathway (as do viable input markets and supporting services, etc.).

Even for the majority of farmers for whom agricultural market participation is unlikely to lead to radical wealth accumulation, the ability to generate production surpluses is critical to their participation in the agricultural transformation process (through which this majority will eventually exit the agricultural sector). For many in this group, addressing constraints to land expansion is important and will likely become more so as population densities and the outside demand for land both increase.

Figure 32: Market participation and travel time to city of 50,000 or more inhabitants

Although market participation levels among Zambian smallholders are quite low on average, the participation that does take place is strongly conditioned by access to markets.
Figure 32 shows the relationships between market participation for various commodities and travel time to the nearest town of 50,000 people or more. Unsurprisingly, the probability of engaging in high-value markets, such as horticulture or milk, drops precipitously with distance. The market participation for maize also drops relatively quickly, although the curve for staples in general is much flatter. The marketed share of value of production, aggregated across all crops, mirrors these patterns. Many other indicators of engagement with input and output markets (not shown here) also adhere to this general pattern: with increasing remoteness, farmers use less fertilizer and other inputs, have production portfolios which are tilted more heavily toward staples production, and satisfy more of their consumption needs through own-farm production rather than through the market.

Exploratory analysis did not reveal any clearly defined thresholds in access gradients that were common to multiple marketing outcomes. In other words, market participation does not change abruptly at some particular distance from market, but rather declines continuously with distance (a result which is robust to choice of market access indicator). Nonetheless, based on the narrative descriptions of typical marketing-based growth pathways in Chapoto et al. (2012), I define three market access regimes: (a) accessible areas, defined as being within two hours travel of the railway and associated road transport corridor and/or within two hours of a city of 50,000 or more inhabitants; (b) semi-accessible areas, defined as being within four hours of a rural market town of 20,000 or more (but not meeting the criteria for class a); and (c) remote areas, encompassing everywhere else. High-value market participation is probably only feasible within the accessible areas.
Marketing of staples and low-value cash crops is viable under a broader range of conditions, and is the strategy with comparative advantage in the semi-accessible areas. Market participation will be most constrained in the remote areas, but staples marketing for local and regional consumption will be viable in many areas.

I use this set of generalized market access environments as a framework for considering the role of land access in the growth processes accessible by farmers in different areas. The growth trajectories associated with the different environments, along with the role of land in these trajectories, are laid out in Table 37. Growth trajectories which are least dependent upon land expansion are mostly confined to the accessible areas. Promotion of intensification in these areas, linked with appropriate market opportunities, will have higher chances of success given the lower costs of inputs and of accessing supporting services, such as extension and credit markets. Attention to land may still focus on the very bottom end (i.e. preventing landlessness), but otherwise may be secondary to a focus on growth pathways that do not involve extensification.

Removing constraints to expansion in semi-accessible and remote areas is a more important policy focus. Enhancing access to animal traction appears to be particularly important, especially for those farming less than 4 hectares: under hand hoe tillage and family labor a farm manager cannot typically farm more than 2 hectares; with animal traction this limit expands to about 4 hectares. Although farmers in all regions of the country are mostly well below these farm sizes (Table 38), enhanced access to animal traction will still be valuable. Policy options aimed at easing this constraint include investments in animal health research (tsetse and cattle corridor disease have taken large tolls on animal populations), the provision of veterinary services, and credit for animal acquisition.
Another set of interventions may focus on enhancing household labor productivity through health care investments aimed at reducing the burdens imposed by chronic and epidemic diseases (malaria, HIV). Promotion of labor-saving production technologies, such as minimum tillage, may also have positive impacts on this constraint (although such approaches have their own sets of constraints; see Haggblade and Tembo 2003).

The continued expansion of rural infrastructure should expand the boundaries of the areas within which market participation is feasible. As Jayne et al. (2008) point out, the majority of rural producers are unable to benefit from a focused provision of rural infrastructure to farm blocks and commercial farm areas. Expansion and extension of such investments to what are currently non-commercial areas is important. Furthermore, the role of rural town development is often missing in discussions of rural accessibility, which are often focused on roads. However, the growth of rural towns is linked with the expansion of rural non-farm economic activity and demand for local labor. The linkages between agriculture and expansion of the non-farm sector are well articulated (Haggblade et al. 2007).

Investments in infrastructure and rural town development are costly and involve long time lags before their impacts are felt. Nonetheless, a long view of the market-based framework in Table 37 will recognize that the boundaries between these regimes are dynamic: the relative share of remote production environments should decrease as rural infrastructure expands and new urban centers develop from smaller towns.
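The three market access regimes defined earlier in this section amount to a simple classification rule on travel times. The following is a minimal sketch of that rule, assuming travel times in hours are available for each location. The function and argument names are hypothetical; the default thresholds are the ones stated in the regime definitions (two hours to the rail and road corridor or to a city of 50,000 or more; four hours to a rural market town of 20,000 or more), and they are exposed as parameters because other cutoffs could reasonably be used.

```python
def access_regime(hours_to_corridor, hours_to_city_50k, hours_to_town_20k,
                  corridor_or_city_cutoff=2.0, town_cutoff=4.0):
    """Assign a location to one of three market access regimes.

    All inputs are travel times in hours. Names and structure are
    illustrative; only the default cutoffs come from the text.
    """
    # (a) accessible: near the transport corridor and/or a city of 50,000+
    if (hours_to_corridor <= corridor_or_city_cutoff
            or hours_to_city_50k <= corridor_or_city_cutoff):
        return "accessible"
    # (b) semi-accessible: near a rural market town of 20,000+,
    #     but not meeting the criteria for class (a)
    if hours_to_town_20k <= town_cutoff:
        return "semi-accessible"
    # (c) remote: everywhere else
    return "remote"


# Hypothetical locations: (corridor, city of 50k+, town of 20k+), in hours.
examples = {
    "site A": (1.5, 6.0, 6.0),
    "site B": (5.0, 5.0, 3.0),
    "site C": (5.0, 5.0, 8.0),
}
for name, times in examples.items():
    print(name, access_regime(*times))
```

Re-running such a rule on updated travel times, as roads are built and towns grow, would move locations from the remote class toward the other two, which is the sense in which the regime boundaries are dynamic.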
Table 37: Targeting land constraints
Column headings: Access conditions; Growth trajectories; Role of land expansion in growth process; Primary constraints to land expansion; Policy/investment priorities
Good access: within an hour of a major transport corridor, and less than 3 hours from an urban center Specialization in horticulture and other high-value perishables Of secondary importance High and unequally distributed costs of accessing leasehold land Delivery of extension services Intermediate access: more than 3 hours from urban centers; close to rural towns and all-weather roads Expansion of market oriented production Remote areas: far from towns and lacking year-round road access Expansion of staples and non-perishable cash crops Increasing emphasis on non-farm employment Targeted input and credit markets Human capital development Fragmented holdings Expansion especially important for emergent farmers in transition to larger commercial production Fragmented holdings Availability of animal traction Seek scale economies in provision of mechanized power to producer organizations Of primary importance Availability of animal traction Promote access to oxen; animal disease R&D; expansion of veterinary services Cost of mechanized traction Fertilizer cost and availability Promote consolidation of holdings (possibly via enhanced participation in leasehold markets) Development of labor saving technologies Input and credit market development

Table 38: Share of farms by category of holding size

access conditions    % farming    % farming    % farming    % farming    % farming
                     < 0.5 ha     0.5-1 ha     1-2 ha       2-4 ha       4+ ha
accessible           16%          24%          30%          20%          11%
semi-accessible      15%          23%          32%          19%          11%
remote               15%          23%          32%          20%          10%

It is instructive to use this framework for examining the sources of growth in smallholder productivity over the last decade. The table below (Table 39) shows the results of a diagnostic regression where the dependent variable is change in log per capita gross farm income between 2004 and 2008.
The independent variables all measure changes taking place over the same period. To track sources of growth, measures include: per capita land holding, per capita fertilizer expenditure and binary variables measuring whether or not animal traction was used and whether or not labor was hired in. To indicate the strategies pursued, there are variables measuring the share of high value crops (horticulture) in sales and the marketed share of production. To control for shocks occurring over the same period, I include measures of distance to transportation services, the wholesale maize price and the agricultural wage rate. The dependent variable is a measure of labor productivity.

Results indicate that the role of land expansion in labor productivity growth is most important in remote areas, and least important in highly accessible areas (indicated by the coefficient on per capita cultivated area). This is consistent with the idea that high-value specialization offers growth pathways that allow for intensification (because of lower input costs, higher output prices, and entry opportunities for high-value markets). Per capita expenditure on fertilizer echoes this trend: both the magnitude and the significance of this measure of capital intensification are increasing with accessibility.

Use of animal traction and hired-in labor show very similar patterns, being more important sources of growth in semi-accessible areas than in remote areas, but most important in the highly accessible areas. Returns to market participation (marketed % of total output) are important everywhere, and the returns to high-value market specialization (horticulture % of marketed output) are particularly strong in the high access regime.
Maize prices and agricultural wage rates, while not consistently significant, do offer some further insights into the spatial disaggregation of growth pathways: coefficients on both are negative in the most accessible areas and positive in the most remote areas (and significant in the remote areas). Smallholder farmers tend to be net suppliers of agricultural labor, especially in remote areas; in high access areas, greater earning potential from market-oriented activities means a greater probability of being a net demander of labor.

Table 39: Factors affecting farm income growth; dependent variable: log farm income per adult equivalent cultivated area / AE ǂ use of animal traction (=1) hired in labor (=1) marketed % of total output horticulture % of marketed output maize wholesale price agricultural wage rate N Years Estimator ǂ (0.016) 1.31e-06*** (0.000) 6.34e-07** (0.000) 4.50e-07 (0.002) 0.428*** (0.000) 0.325** (0.011) 0.882*** (0.000) 0.043** (0.014) -.0006662 (0.179) -1.95e-07 (0.899) -.0002319*** (0.000) (0.029) 0.322*** (0.000) 0.138** (0.047) 0.542*** (0.000) 0.018*** (0.003) -.0001869 (0.419) 1.13e-06 (0.372) -.0002274*** (0.000) (0.128) 0.192*** (0.006) 0.153** (0.013) 0.541*** (0.000) 0.009*** (0.000) .0004717** (0.041) 5.26e-06*** (0.000) -.0001351*** (0.000) 877 2 FE fertilizer expenditure / AE rainfall (1) (2) (3) Accessible Semi-accessible Remote areas areas areas coeff./p-value coeff./p-value coeff./p-value 0.118** 0.807*** 0.926*** 2619 2 FE 2737 2 FE
Notes: ǂ AE = adult equivalent. Cluster robust p-values * p<0.10, ** p<0.05, *** p<0.01

3.8 Conclusions

Despite its relative abundance of arable land, Zambian agriculture is dominated by very small farms which are shrinking further over time. At the outset of the paper, I characterized this as a mystery, since many descriptive features of Zambia’s small farm sector seem more characteristic of high-density environments than low-density environments.
While we cannot discount the presence of land constraints at low density levels, the analysis presented here indicates that economic remoteness plays an important role in constraining area expansion by smallholder farmers. A major mechanism through which remoteness operates is by conditioning the accessibility of land expansion technologies such as animal and mechanical traction.

One of the questions addressed by this research was how best to measure and interpret rural population density with respect to land access questions. Using high-resolution estimates of local rural population density and controlling for institutional factors as well as market access, this analysis has shown that at the low densities found in Zambia, rural population density plays only a moderate direct role in cultivated area outcomes. However, the impact of population density on farm size outcomes is largest for smaller holding sizes, suggesting that to the extent that density reflects relative resource scarcity, such scarcity affects the smallest farmers first. These results were robust to alternative definitions of population density, including measures which were weighted for land quality.

A second, related objective was to better understand the institutional constraints to area expansion. Although our ability to directly observe the institutions facilitating access to land is limited, this analysis does support the hypotheses that non-customary land tenure is more common in high-access areas, that leasehold land is more accessible to larger holders, and that such tenure may indirectly raise the costs of land. However, assertions about such indirect effects can only be tentative. It is important to acknowledge that our ability to observe competing claims on local land resources is limited in the data currently available. In particular, we do not observe large farms or other large-scale landholdings in the survey communities.
⁵⁰ As information about the spatial location of large farms improves, it will be important to further investigate the impact they have on land availability for neighboring smallholders (both directly and indirectly, e.g. through affecting local agricultural land rents). A related research agenda of considerable importance is to empirically describe the structural linkages between large and small farms in close proximity, e.g. through employment, access to inputs and other services.

⁵⁰ Note that we also only imperfectly observe many other institutional constraints on land access, such as lands designated as protected areas or other special status land uses. I did use GIS data on the location of National Parks and Game Management Areas to define additional variables measuring SEA-level land use constraints but these were not significant and were not included in the final econometric analyses. Still, it is worth noting that as improved spatial data on such lands become available, this question should be revisited.

A third research objective was to assess some of the important technological constraints to area expansion. In this analysis I specifically emphasized the role of animal and mechanical traction, as well as hired-in labor, in household cultivated area outcomes. Each of these is shown to have large impacts on cultivated area outcomes. Access to traction resources appears to be very strongly associated with market access gradients. Policies aimed at increasing smallholder production should focus on mitigating these constraints. Interestingly, this result indicates that the same access conditions that are traditionally understood to affect intensification also affect extensification in areas like Zambia with low rural densities.

The fourth major objective of this research was to better understand the role of land in farm production, productivity and welfare outcomes.
Results strongly indicate that farm size is a major positive determinant of production volume and of land and labor productivity. Although smaller farms do apply fertilizer more intensively, such capital intensification does not generally translate into productivity gains.⁵¹ Farm size is also important for agricultural and total income generation, although not for off-farm non-agricultural income. The role of land as a productive asset appears particularly strong in more remote areas, where the prospects of intensification are more limited. This finding highlights the importance of overcoming land expansion constraints in such areas.

⁵¹ An important caveat is that this statement is made on the basis of the entire sample. For high-value specialists, such as horticulture marketers, the returns to capital intensification are likely much higher than for the sample as a whole. Such farmers, however, are a relatively small minority. A promising avenue for future work would be to disaggregate the study of land and intensification processes in Zambia by separately considering farmers who are pursuing different production and marketing strategies.

⁵² Note that we also only imperfectly observe many other institutional constraints on land access, such as lands designated as protected areas or other special status land uses.

It is important to acknowledge some of the limitations of this study. I highlight two areas in particular that were beyond the scope of this work but which are important research questions and very germane to land access issues in Zambia. First, our ability to observe competing claims on local land resources is limited in the data currently available. In particular, we do not observe large farms or other large-scale landholdings in the survey communities.⁵² As
information about the spatial location of large farms improves, it will be important to further investigate the impact they have on land availability for neighboring smallholders (both directly and indirectly, e.g. through affecting local agricultural land rents). A related research agenda of considerable importance is to empirically describe the structural linkages between large and small farms in close proximity, e.g. through employment, access to inputs and other services.

⁵² I did use GIS data on the location of National Parks and Game Management Areas to define additional variables measuring SEA-level land use constraints but these were not significant and were not included in the final econometric analyses.

A related point is the limited survey coverage of peri-urban areas and areas within well-serviced contiguous blocks of State land. Some important smallholder intensification strategies – such as high-value horticulture production and marketing – appear to be most prevalent in these areas. While such strategies are limited in both geographical scope and the share of smallholders engaged in them, it is important to acknowledge that the role of land may be quite different for smallholders in these conditions than for the smallholder population in general. Future data collection may seek to explore these high-access production and marketing contexts in more detail.

A final question has to do with rural mobility. If a household faces acute land access constraints (whether or not these constraints are well represented by population density), could it not move to another community with more accessible land? On the face of it, this is a reasonable question to ask, since Zambia appears to have such a surplus of land under customary authority. However, the social capital requirements of accessing land through informal channels may present considerable barriers to accessing land by non-local households
without ties to the community. In any case, data on rural-rural migration in Zambia (as elsewhere in sub-Saharan Africa) is very limited. There is a pronounced information gap with respect to the extent of rural-rural mobility, how such mobility is driven by land and/or employment opportunities (which may, incidentally, be strongly linked with large farm locations), and how small farmers negotiate institutional mechanisms for land access in areas where they are not local.

⁵² Still, it is worth noting that as improved spatial data on such lands become available, this question should be revisited.

In summarizing the policy implications of this research, I have proposed that the role of market access in farm growth trajectories provides a useful framework for policy options related to overcoming land constraints. Households which have better access to markets and infrastructure are able to farm larger amounts of land. However, growth pathways which involve specialization in high-value agricultural markets – and which do not critically require land for expanding production – are also most viable in areas with the best access. This argues for an integrated focus on access to land and access to markets. Expanding infrastructure and supporting services will reduce the costs of remoteness, enabling some expansion. Complementary efforts will also be required to expand access to draft power, with an emphasis mostly on animal power in remote and semi-accessible areas, but also on mechanized power in accessible and semi-accessible areas where higher densities enable economies of scale in provision via farmer groups, etc. At the same time, promotion of growth through intensification, rather than extensification, may be targeted to the most accessible areas where opportunities are greatest for participation in high-value agricultural markets.
APPENDICES

APPENDIX A

Access indicator selection criteria

I have proposed a set of access dimensions of importance in the conceptual model, which I motivated through reasoning grounded in theory and earlier literature. In this appendix, I discuss structured approaches to selecting an indicator set for market access in developing countries. I first describe the structured reasoning approach I used in more detail, and then generalize that process for application in other situations. I then discuss data-driven methods for indicator set selection.

i. Structured reasoning

The 2009 World Development Report notes that “sub-Saharan Africa today suffers from the triple disadvantages of low density, long distance, and deep division that put the continent at a developmental disadvantage” (WB 2009: 283). These “three Ds” can serve as an entry point into conceptualizing the important aspects of accessibility, from which we may then consider candidate indicators. Distance is the starting point for most conceptual treatments of economic remoteness (e.g. Jacoby 2000, Fafchamps and Shilpi 2003, 2005, Stifel and Minten 2008, Deichmann et al. 2009, Fafchamps 2012). The economic importance of distance, however, is perhaps best conceptualized as the time and/or monetary costs of travel. Similarly, density (of labor, firms, economic activities, etc.) is a grounding concept underlying agglomeration economies (e.g. Marshall 1920, Krugman 1999, McCormick 1999, 2007, Fafchamps 2012). The idea of economic division is a little more slippery but, as an access concept, generally refers to institutional and/or social barriers to exchange (WB 2009).

To a certain extent these concepts map onto other conceptualizations of economic remoteness. Consider the gravity model of trade potential, which posits that the potential flow between two locales is a function of some measure of economic mass (or attraction) normalized by distance or cost of exchange (Bergstrand 1985).
This idea has been adapted to measures of rural accessibility in which the economic weight of any given market decays over space such that the remoteness of a place is a function of the distance to markets as well as the economic importance of those markets (Deichmann 1997, Yoshida and Deichmann 2009).⁵³ Thus, distance and density (of opportunities) are seen as complementary components of a multidimensional market access situation.⁵⁴ In terms of small farms and their access to centers of economic activity, the key point is that while distance certainly matters, so does what one finds when one arrives.

In my study, variables that mediate the flow of people, goods and information are represented by mobile phones and roads, which are the most salient features of Kenya’s rural communication and transportation landscapes. In addition, I propose that the richness of local market opportunities can be expressed in terms of key public and private services. Drawing from the data available in the household survey, I identify extension services as the main service of relevance for output marketing, and the locations of fertilizer retailers as the main service of interest for input market participation. Additionally, I include distance to electricity because many rural services are directly or indirectly dependent upon electricity. For example, artificial insemination services rely on access to refrigeration, which is typically limited to areas on the electrical grid.

Chamberlin and Jayne (2013) argue that the choice of accessibility indicators should conform to the specifics of the research question and/or empirical setting. My approach here has been to suggest a set of access indicators that (a) are available in the data, (b) are of relevance to the questions being pursued, and (c) are important within the study context.
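The gravity-based notion of potential accessibility described above, in which each market's economic mass is discounted by the cost of reaching it, can be illustrated with a small numerical sketch. The Python function below is only an illustration and not the procedure used to build the access indicators in this study: it assumes an index of the form $I_i = \sum_j S_j / T_{ij}^{\alpha}$, and the function name, market populations and travel times are hypothetical values chosen for the example.

```python
import numpy as np


def potential_accessibility(mass, travel_cost, alpha=1.0):
    """Potential accessibility I_i = sum_j S_j / T_ij**alpha for each origin i.

    mass        : length-J array of market "mass" S_j (e.g. town population)
    travel_cost : (I x J) array of distance, cost or travel time T_ij
    alpha       : distance-decay parameter (assumed form; larger = faster decay)
    """
    mass = np.asarray(mass, dtype=float)
    travel_cost = np.asarray(travel_cost, dtype=float)
    if travel_cost.ndim != 2 or travel_cost.shape[1] != mass.shape[0]:
        raise ValueError("travel_cost must have one column per market")
    if np.any(travel_cost <= 0):
        raise ValueError("travel costs must be strictly positive")
    # Discount each market's mass by the cost of reaching it, then sum over markets.
    return (mass / travel_cost ** alpha).sum(axis=1)


# Hypothetical example: two villages (rows) and three market towns (columns).
market_population = np.array([50_000.0, 10_000.0, 200_000.0])
travel_hours = np.array([
    [1.0, 2.0, 8.0],   # village A: close to a mid-sized town, far from the city
    [4.0, 0.5, 2.0],   # village B: next to a small town, fairly close to the city
])

index_alpha1 = potential_accessibility(market_population, travel_hours, alpha=1.0)
index_alpha2 = potential_accessibility(market_population, travel_hours, alpha=2.0)
print(index_alpha1, index_alpha2)
```

In this toy example village B scores higher under both decay parameters because it is closer to the largest market, which mirrors the point that what one finds on arrival matters as well as distance. Raising `alpha` shrinks the contribution of distant markets relative to nearby ones.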
Nonetheless, a useful avenue of further research could explore more detailed frameworks for indicator definition and selection in a range of smallholder production and marketing environments. Such work, however, is beyond the scope of this essay.

⁵³ Drawing on the gravity model, Yoshida and Deichmann (2009) define a simple implementation of a potential accessibility index for location $i$ as: $I_i = \sum_j S_j / T_{ij}^{\alpha}$, where $S_j$ is some indicator of the economic mass at market $j$ (which might be represented, for example, by the population of an urban center), and $T_{ij}$ is the distance (or cost, or travel time) between origin $i$ and target $j$. The parameter $\alpha$ allows variation in the functional form of the impact of distance on potential accessibility.

⁵⁴ Division, the third “D”, may be mapped onto this idea as an additional, non-physical component of economic distance, e.g. barriers to exchange imposed by institutional factors.

ii. Data-driven approaches

In empirical settings where there are multiple indicators of access, one might also appeal to data-driven approaches to indicator selection. For concreteness, say we have a number of alternative indicators measuring access to different market-related services. If we have no theory to guide indicator selection a priori, there are several possible ways to allow characteristics of the data to guide our choice. First, if alternative indicators in the set under consideration are sufficiently correlated with one another, we might attempt to reduce the dimensionality of the indicator set by implementing a principal components analysis. Second, if alternative indicators are not highly correlated with one another, we might simply include them all. For intermediate cases, we might pursue a variable selection approach such as that provided by stepwise regression (Bendel and Afifi 1977). Such approaches, however, have numerous problems and are generally shunned by econometricians (Judd et al. 1989, Scribney 2011).
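As an illustration of the first data-driven option, the sketch below reduces a set of highly correlated access indicators to a single principal component using only NumPy. It is a minimal example under stated assumptions rather than an analysis carried out in this essay: the three distance indicators are simulated from a hypothetical latent "remoteness" variable, and the function name and noise levels are invented for the example.

```python
import numpy as np


def first_principal_component(indicators):
    """Return (scores, loadings, explained_share) from a PCA on standardized columns."""
    X = np.asarray(indicators, dtype=float)
    # Standardize so that indicators measured in different units are comparable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # The right singular vectors of Z are the principal component loadings.
    _, singular_values, Vt = np.linalg.svd(Z, full_matrices=False)
    explained_share = singular_values ** 2 / np.sum(singular_values ** 2)
    loadings = Vt[0]
    scores = Z @ loadings
    return scores, loadings, explained_share


# Simulated (hypothetical) data: three distance indicators that all reflect
# one underlying remoteness dimension plus indicator-specific noise.
rng = np.random.default_rng(0)
n_households = 200
remoteness = rng.normal(size=n_households)
indicators = np.column_stack([
    remoteness + 0.3 * rng.normal(size=n_households),  # e.g. km to tarmac road
    remoteness + 0.3 * rng.normal(size=n_households),  # e.g. km to extension office
    remoteness + 0.3 * rng.normal(size=n_households),  # e.g. km to fertilizer retailer
])

scores, loadings, explained_share = first_principal_component(indicators)
print("share of variance on first component:", round(float(explained_share[0]), 3))
```

When the first component carries most of the variance, as in this simulated case, a single composite score may stand in for the separate indicators. When it does not, the indicators are measuring distinct access dimensions and collapsing them would discard information.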
APPENDIX B

Comparability of Fixed-Effects and Correlated Random Effects estimation results

As a check on the robustness of the Correlated Random Effects (CRE) estimates, I also estimated models with the Fixed-Effects (FE) estimator. As discussed in the text, one downside of the FE estimator in the present context is the loss of impact estimates for time-invariant model covariates. However, the FE estimator is generally regarded as more robust than the CRE estimator (Wooldridge 2010). Therefore, it is of interest to show that FE and CRE estimates do not vary greatly from one another. Fortunately, this is the case for all the models estimated in this analysis.

To illustrate this, I show the linear (non-spatial) FE and CRE estimates for the HCI and log value sold models in the table below. The CRE and FE estimates for the determinants of HCI model are shown in columns 1 and 2, respectively. The CRE and FE estimates for the determinants of log value sold are shown in columns 3 and 4. In both cases, the estimates for the time-varying components are quite close to one another, suggesting that the CRE assumptions are valid. The estimates are almost identical for the non-access household variables. Estimates differ a bit more in the access variables, but not enough to warrant particular concern.

Comparisons of CRE and FE results for other models show similar results, i.e. estimates differ by similar or smaller amounts than the comparisons shown here. For this reason, I show only the CRE results in the main body of the dissertation, as they closely track the FE estimates and also let us examine the time-invariant model covariates of interest.
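The logic of this comparison can be illustrated with simulated data. In a purely linear model, the Mundlak form of the CRE estimator (pooled OLS with household means of the time-varying covariates added as controls) reproduces the FE coefficient on a time-varying covariate, while still returning a coefficient on a time-invariant covariate. The sketch below is a minimal illustration under those assumptions and is not the estimation code used in this dissertation, whose CRE specifications include additional terms and need not match FE exactly; all variable names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n_households, n_periods = 300, 2
hh = np.repeat(np.arange(n_households), n_periods)   # household id for each row

# Simulated two-period panel (hypothetical data-generating process).
c = rng.normal(size=n_households)                    # unobserved household effect
x = 0.8 * c[hh] + rng.normal(size=hh.size)           # time-varying, correlated with c
z = rng.normal(size=n_households)[hh]                # time-invariant covariate
y = 1.0 + 0.5 * x + 0.3 * z + c[hh] + rng.normal(scale=0.5, size=hh.size)


def group_mean(values, groups):
    """Household-level mean of `values`, repeated on every row of that household."""
    return (np.bincount(groups, weights=values) / np.bincount(groups))[groups]


# FE (within) estimator: demean y and x by household, then run OLS.
# The time-invariant covariate z is wiped out by the demeaning.
x_within = x - group_mean(x, hh)
y_within = y - group_mean(y, hh)
beta_fe = (x_within @ y_within) / (x_within @ x_within)

# CRE (Mundlak) estimator: pooled OLS of y on a constant, x, z and the household mean of x.
X_cre = np.column_stack([np.ones_like(x), x, z, group_mean(x, hh)])
coef_cre, *_ = np.linalg.lstsq(X_cre, y, rcond=None)
beta_cre, gamma_z = coef_cre[1], coef_cre[2]

print("FE  coefficient on x:", round(float(beta_fe), 4))
print("CRE coefficient on x:", round(float(beta_cre), 4))
print("CRE coefficient on time-invariant z:", round(float(gamma_z), 4))
```

The two coefficients on `x` coincide up to numerical precision because, once the household mean of `x` is controlled for, only within-household variation in `x` remains to identify its effect. The CRE regression additionally yields an estimate for `z`, which is the practical reason for preferring it when time-invariant covariates are of interest.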
Table 40: Determinants of HCI & log value sold

                    HCI                                   Log value sold
                    (1) CRE            (2) FE             (3) CRE            (4) FE
                    b/p                b/p                b/p                b/p
Access variables
mobile              11.119*** (0.000)  9.404*** (0.002)   0.820** (0.013)    0.740** (0.034)
km extension        0.146 (0.122)      0.147 (0.121)      0.009 (0.416)      0.009 (0.422)
avg: km extension   -0.351 (0.118)                        0.016 (0.506)
km tarmac           -0.049 (0.661)     -0.051 (0.649)     -0.000 (0.975)     -0.000 (0.999)
avg: km tarmac      -0.097 (0.499)                        -0.026 (0.105)
CF res: mobile      -6.176*** (0.001)  -5.057*** (0.008)  -0.452** (0.025)   -0.398* (0.062)
Other factors
farm size           1.821*** (0.000)   1.828*** (0.000)   0.238*** (0.000)   0.238*** (0.000)
adult equiv.        0.159 (0.449)      0.166 (0.428)      0.063*** (0.007)   0.064*** (0.007)
female              0.242 (0.814)      0.071 (0.945)      0.193* (0.092)     0.184 (0.112)
age of head         -0.101* (0.068)                       -0.002 (0.676)
education           -0.119 (0.574)                        0.011 (0.611)
assets              -0.002 (0.607)     -0.002 (0.574)     -0.000 (0.867)     -0.000 (0.848)
maize price         0.043*** (0.000)   0.043*** (0.000)   0.004*** (0.000)   0.004*** (0.000)
rainfall            -0.003 (0.281)     -0.002 (0.409)     0.001** (0.050)    0.000 (0.235)
Log likelihood      -2.10e+04          -1.97e+04          -1.09e+04          -9682.587
AIC                 42102.494          39393.673          21932.183          19393.174
BIC                 42276.136          39483.709          22105.884          19483.241
N                   4588               4588               4598               4598
Note: CRE controls and time dummies not shown. p-values in parentheses. * p<0.10, ** p<0.05, *** p<0.01

REFERENCES

Action Aid. 2009. “Communities fear displacement.” Action Aid/MS Zambia Newsletter June 2009. http://www.actionaid.dk/sw132091.asp
Adams, Martin. 2003. Land tenure policy and practice in Zambia: issues relating to the development of the agricultural sector. Draft version: 13 January 2003. Mokoro Ltd, Headington UK.
Ajayi, O. C., F. K. Akinnifesi, G. Sileshi, S. Mn’gomba, O. A. Ajayi, W. Kanjipite, and J. M. Ngulube. 2012. Managing conflicts over land and natural resources through collective action: A case study from rural communities in Zambia. CAPRi Working Paper No. 105. Washington, D.C.: International Food Policy Research Institute.
http://dx.doi.org/10.2499/CAPRiWP105.
Aker, J. 2008. Does Digital Divide or Provide? The Impact of Cell Phones on Grain Markets in Niger - Working Paper 154.
Aker, J. 2010. Information from Markets Near and Far: Mobile Phones and Agricultural Markets in Niger. American Economic Journal: Applied Economics 2 (July 2010): 46–59. http://www.aeaweb.org/articles.php?doi=10.1257/app.2.3.46
Aker, Jenny C. and Marcel Fafchamps. 2010. How Does Mobile Phone Coverage Affect Farm-Gate Prices? Evidence from West Africa. Unpublished mimeo. University of California, Berkeley. http://www.aeaweb.org/aea/2011conference/program/retrieve.php (2010).
Alene, A. D., V. Manyong, G. Omanya, H.D. Mignouna, M. Bokanga and G. Odhiambo. 2008. Smallholder market participation under transactions costs: Maize supply and fertilizer demand in Kenya. Food Policy, 33(4), 318-328.
Alwang, J., P. Seigel, and S. Jorgensen. 1996. Seeking Guidelines for Poverty Reduction in Rural Zambia. World Development, 24(11):1711-1723.
Amaya, N., and Alwang, J. 2011. Access to information and farmer’s market choice: The case of potato in highland Bolivia. Journal of Agriculture, Food Systems, and Community Development 1(4), 35–53. http://dx.doi.org/10.5304/jafscd.2011.014.003
Anseeuw, W., M. Boche, T. Breu, M. Giger, J. Lay, P. Messerli, and K. Nolte. 2012. Transnational land deals for agriculture in the global south: analytical report based on the land matrix database. The Land Matrix Partnership, http://landportal.info/landmatrix/media/img/analyticalreport.pdf
Anselin, L. 1988. Spatial Econometrics: Methods and Models. Dordrecht, Kluwer.
Anselin, L. 2001. Rao’s score test in spatial econometrics. Journal of Statistical Planning and Inference 97, 113-139.
Assunção, J. J., and Braido, L. H. 2007. Testing household-specific explanations for the inverse productivity relationship. American Journal of Agricultural Economics, 89(4), 980-990.
Baerenklau, K. A. 2005.
Toward an Understanding of Technology Adoption: Risk, Learning, and Neighborhood Effects. Land Economics 81 (1): 1-19.
Balk, Deborah, and Gregory Yetman. 2004. "The Global Distribution of Population: Evaluating the gains in resolution refinement." Working Paper. Center for International Earth Science Information Network (CIESIN), Columbia University.
Baltagi, Badi. 2008. Econometric analysis of panel data. Wiley.
Baltenweck, I. and S. Staal. 2007. Beyond One-Size-Fits-All: Differentiating Market Access Measures for Commodity Systems in the Kenyan Highlands. Journal of Agricultural Economics, Vol. 58, No. 3, 2007, 536–548.
Bandiera, O., and I. Rasul. 2006. Social networks and technology adoption in northern Mozambique. The Economic Journal 116 (514): 869-902.
Barrett, C. B. 2008. Smallholder market participation: concepts and evidence from eastern and southern Africa. Food Policy, 33(4), 299-317.
Barrett, C.B., and P.A. Dorosh. 1996. Farmers’ welfare and changing food prices: nonparametric evidence from rice in Madagascar. American Journal of Agricultural Economics 78 (3), 656–669.
Baulch, B. and A. Quisumbing. 2011. Testing and adjusting for attrition in household panel data. CPRC Toolkit Note. Version: 22 September 2011. Chronic Poverty Research Centre (CPRC). Available from http://www.chronicpoverty.org/publications/details/testing-and-adjusting-forattrition-in-household-panel-data
Bellemare, M.F., and C.B. Barrett. 2006. An ordered tobit model of market participation: evidence from Kenya and Ethiopia. American Journal of Agricultural Economics 88 (2), 324–337.
Bendel, R. B., and Afifi, A. A. 1977. Comparison of stopping rules in forward “stepwise” regression. Journal of the American Statistical Association, 72(357), 46-53.
Benjamin, Dwayne. 1995. Can unobserved land quality explain the inverse productivity relationship? Journal of Development Economics 46(1): 51–84.
Bergstrand, Jeffrey H. 1985.
The Gravity Equation in International Trade: Some Microeconomic Foundations and Empirical Evidence. The Review of Economics and Statistics, 67(3):474-481.
Bernard, T. and M. Torero. Unpublished mimeo. Bandwagon Effects in Poor Communities: Experimental Evidence from a Rural Electrification Program in Ethiopia. International Food Policy Research Institute, Washington DC.
Bhalla, S. S., and P. Roy. 1988. Mis-specification in farm productivity analysis: the role of land quality. Oxford Economic Papers 55-73.
Binswanger, H. P., and J. McIntire. 1987. Behavioral and material determinants of production relations in land-abundant tropical agriculture. Economic Development and Cultural Change 36(1), 73-99.
Binswanger, H. P., and V. W. Ruttan. 1978. Induced innovation: technology, institutions, and development (pp. 22-47). Baltimore: Johns Hopkins University Press.
Bontemps, Sophie, Pierre Defourny, and Eric Van Bogaert. 2010. GLOBCOVER 2009 Product description and validation report. European Space Agency and Université Catholique de Louvain. Document available at http://ionia1.esrin.esa.int.
Boserup, E. 1965. The conditions of agricultural growth: The economics of agrarian change under population pressure. Chicago: Aldine De Gruyter.
Brown, T. 2005. Contestation, confusion and corruption: Market-based land reform in Zambia. Chapter 3 in Competing Jurisdictions: Settling Land Claims in Africa, edited by Sandra Evers, Marja Spierenburg and Harry Wels. Leiden, the Netherlands: Brill Academic Publishers.
Canette, Isabel. 2011. “Positive log-likelihood values happen.” The Stata blog, 16 February 2011. http://blog.stata.com/2011/02/16/positive-log-likelihood-values-happen/
Case, A. 1992. Neighborhood influence and technological change. Regional Science and Urban Economics 22, 491–508.
Chamberlain, G. 1984. “Panel Data,” in Handbook of Econometrics, Volume 2, ed. Z. Griliches and M.D. Intriligator.
Amsterdam: North Holland, 1248-1318.
Chamberlin, J. 2012. Modeling a market access surface for Sub-Saharan Africa. Unpublished mimeo.
Chamberlin, J., and T. S. Jayne. 2009. Has Kenyan Farmers’ Access to Markets and Services Improved? Panel Survey Evidence, 1997-2007 (No. 58545). Michigan State University, Department of Agricultural, Food, and Resource Economics.
Chamberlin, J., and T. S. Jayne. 2013. Unpacking the Meaning of ‘Market Access’: Evidence from Rural Kenya. World Development 41: 245–264.
Chapoto, A., S. Haggblade, M. Hichaambwa, S. Kabwe, S. Longabaugh, N. J. Sitko and D. L. Tschirley. 2012. Agricultural Transformation in Zambia: Alternative Institutional Models for Accelerating Agricultural Productivity Growth, and Commercialization (No. 132339). Michigan State University, Department of Agricultural, Food, and Resource Economics.
Clarke, John Innes, and Leszek A. Kosiński. 1982. Redistribution of population in Africa. Heinemann Publishing.
Coase, Ronald H. 1937. The Nature of the Firm. Economica 4:386–405.
Colson, E. 1966. Land Law and Land Holdings among Valley Tonga of Zambia. Southwestern Journal of Anthropology, Vol. 22, No. 1 (Spring, 1966), pp. 1-8.
Commission on Agriculture and Lands. 2009. Second Report of the Committee on Agriculture and Lands for the Fourth Session of the Tenth National Assembly. Presented to Parliament on September 24. Parliament of Zambia, Lusaka.
Conley, T. G., and C. R. Udry. 2010. Learning about a new technology: Pineapple in Ghana. The American Economic Review 100 (1): 35-69.
Corrado, L. and B. Fingleton. 2010. Where is the economics in spatial econometrics? SIRE Discussion paper SIRE-DP-2011-02.
Cragg, J. G. 1971. Some Statistical Models for Limited Dependent Variables with Application to the Demand for Durable Goods. Econometrica, Vol. 39, No. 5 (Sep., 1971), pp. 829-844.
Croppenstedt, A., M. Demeke, and M. M. Meschi. 2003.
Technology Adoption in the Presence of Constraints: The Case of Fertilizer Demand in Ethiopia. Review of Development Economics 7: 58–70.
de Janvry, A., M. Fafchamps, and E. Sadoulet. 1991. Peasant Household Behavior with Missing Markets: Some Paradoxes Explained. Economic Journal 101(409): 1400–1417.
Deaton, A. 1997. The analysis of household surveys: a microeconometric approach to development policy. Johns Hopkins University Press.
Deichmann, Uwe, Forhad Shilpi and Renos Vakis. 2009. Urban Proximity, Agricultural Potential and Rural Non-farm Employment: Evidence from Bangladesh. World Development 37(3): 645–660.
Deichmann, Uwe. 1997. Accessibility Indicators in GIS. Department for Economic and Social Information and Policy Analysis, New York: United Nations Statistics Division.
Deininger, K. 2003. Land policies for growth and poverty reduction. A World Bank Policy Research Report. Oxford and New York: World Bank and Oxford University Press.
Deininger, K., and L. Squire. 1998. New ways of looking at old issues: inequality and growth. Journal of Development Economics, 57(2), 259-287.
Demsetz, H. 1988. Theory of the Firm Revisited. The Journal of Law, Economics and Organization 4: 141.
Doss, C. R. 2006. Analyzing technology adoption using microstudies: limitations, challenges, and opportunities for improvement. Agricultural Economics 34 (3): 207-19.
Elhorst, J. P. 2010. Applied spatial econometrics: Raising the bar. Spatial Economic Analysis 5(1): 9-28.
Elhorst, J. P. 2011. Spatial panel models. Unpublished mimeo, September 2011. University of Groningen, Department of Economics, Econometrics and Finance, Groningen, the Netherlands.
Elhorst, J.P. and S. Freret. 2007. Yardstick competition among local governments: French evidence using a two-regimes spatial panel data model. Paper presented at the European and North American RSAI Meetings, August 29-September 2, 2007, Paris, November 7-11, 2007, Savannah.
Fafchamps, Marcel and Forhad Shilpi. 2003.
The spatial division of labor in Nepal. Journal of Development Studies 39 (6), 23–66.
Fafchamps, Marcel, and Forhad Shilpi. 2005. Cities and specialization: evidence from South Asia. The Economic Journal 115 (503), 477–504.
Fafchamps, Marcel. 1992. Cash crop production, food price volatility, and rural market integration in the third world. American Journal of Agricultural Economics 74.1 (1992): 90-99.
Fafchamps, Marcel. 2004. Market institutions in Sub-Saharan Africa: theory and evidence. MIT Press, 2004.
Fafchamps, Marcel. 2012. Reprint of development, agglomeration, and the organization of work. Regional Science and Urban Economics 42.5 (2012): 765-778.
Feder, G., R. E. Just, and D. Zilberman. 1985. Adoption of agricultural innovations in developing countries: A survey. Economic Development and Cultural Change 33 (2): 255-98.
Fischer, Günther, Harrij van Velthuizen and Freddy O. Nachtergaele. 2000. Global Agro-Ecological Zones Assessment: Methodology and Results. Interim Report IR-00-064. International Institute for Applied Systems Analysis and The Food and Agricultural Organization of the United Nations.
Foster, A. D., and M. R. Rosenzweig. 2010. Microeconomics of technology adoption. Annual Review of Economics 2 (1): 395-424.
Franzese, R. J. and J. C. Hays. 2007. Spatial econometric models of cross-sectional interdependence in political science panel and time-series-cross-section data. Political Analysis 15(2):140-164.
Franzese, R. J. and J. C. Hays. 2009. The Spatial Probit Model of Interdependent Binary Outcomes: Estimation, Interpretation, and Presentation. Paper presented at the Annual Meeting of the Public Choice Society, 6 March 2009.
German, L., Schoneveld, G., and E. Mwangi. 2011. Processes of large-scale land acquisition by investors: Case studies from sub-Saharan Africa. In International Conference on Global Land Grabbing, University of Sussex (pp. 6-8).
Gluckman, M., W. Allen, D. Peters and C. Tranell. 1948.
Land Holdings and Land Usage among the Plateau Tonga of Mazabuka District: a reconnaissance survey, 1945. Oxford: Oxford University Press and Rhodes Livingstone Institute, Northern Rhodesia.
Goldewijk, K., A. Beusen, M. de Vos and G. van Drecht. 2011. The HYDE 3.1 spatially explicit database of human induced land use change over the past 12,000 years. Global Ecology and Biogeography 20(1): 73-86.
Griliches, Zvi, and Jerry A. Hausman. 1986. Errors in variables in panel data. Journal of Econometrics, 31(1):93–118.
Haggblade, S. and G. Tembo. 2003. Conservation Farming in Zambia. EPTD Discussion Paper 108. Environment and Production Technology Division, International Food Policy Research Institute, Washington, DC.
Haggblade, S., P. Hazell, and T. Reardon. 2007. Transforming the rural nonfarm economy. International Food Policy Research Institute and Johns Hopkins University Press, Washington DC.
Haggblade, Steven, Steven Longabaugh and David Tschirley. 2009. “Spatial Patterns of Food Staple Production and Marketing in South East Africa: Implications for Trade Policy and Emergency Response.” International Development Working Paper No.100. Department of Agricultural, Food and Resource Economics, Michigan State University.
Hansungule, M., P. Feeney, and R. H. Palmer. 1998. Report on land tenure insecurity on the Zambian Copperbelt. Oxfam GB in Zambia.
Hayami, Y., and M. Kikuchi. 1981. Asian village economy at the crossroads: An economic approach to institutional change. Tokyo: University of Tokyo Press.
Hayami, Y., and V. W. Ruttan. 1970. Agricultural productivity differences among countries. The American Economic Review, 895-911.
Hayami, Y., and V. W. Ruttan. 1971. Agricultural development: an international perspective. Baltimore: The Johns Hopkins University Press.
Hayami, Y., and V. W. Ruttan. 1985. Agricultural development: an international perspective, (Revised and Expanded Edition). Baltimore: The Johns Hopkins University Press.
Heltberg, R. and F. Tarp. 2002.
Agricultural supply response and poverty in Mozambique. Food Policy 27 (2002) 103–124.
Hijmans, R.J., S.E. Cameron, J.L. Parra, P.G. Jones and A. Jarvis. 2005. Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology 25: 1965-1978.
Holloway, G., and M. L. A. Lapar. 2007. How big is your neighbourhood? Spatial implications of market participation among Filipino smallholders. Journal of Agricultural Economics 58 (1): 37-60.
Holloway, G., B. Shankar, and S. Rahman. 2002. Bayesian spatial probit estimation: A primer and an application to HYV rice adoption. Agricultural Economics 27 (3): 383-402.
Holloway, G., C. Barrett and S. Ehui. 2005. The Double-Hurdle Model in the Presence of Fixed Costs. Journal of International Agricultural Trade and Development 1: 17–28.
Holloway, G., S. Ehui and Teklu. 2008. Bayes estimates of distance‐to‐market: transactions costs, cooperatives and milk‐market development in the Ethiopian highlands. Journal of Applied Econometrics, 23(5), 683-696.
Jacoby, H. G. 2000. Access to markets and the benefits of rural roads. The Economic Journal, 110(465), 713-737.
Jayne, T.S., J. Chamberlin, and M. Muyanga. 2012. “Emerging Land Issues in African Agriculture: Implications for Food Security and Poverty Reduction Strategies.” Paper presented as part of Stanford University’s Global Food Policy and Food Security Symposium Series, sponsored by the Center for Food Security and the Environment and the Freeman Spogli Institute for International Studies, January 12, 2012, Stanford, California.
Jayne, Thomas S., Ballard Zulu, Gear Kajoba, and Michael T. Weber. 2008. "Access to Land and Poverty Reduction in Rural Zambia: Connecting the Policy Issues." International Development Collaborative Working Papers (2008).
Jensen, R.T. 2007. The digital provide: Information (technology), market performance and welfare in the south Indian fisheries sector. Quarterly Journal of Economics 122(3), 879–924.
Jensen, R.T.
2010. Information, efficiency, and welfare in agricultural markets. Agricultural Economics, Volume 41, Issue Supplement s1, pages 203–216, November 2010.
Johnston, B.F., and P. Kilby. 1975. Agriculture and Structural Transformation: Economic Strategies in Late-Developing Countries. New York: Oxford University Press.
Johnston, Bruce F., and John Mellor. 1961. The Role of Agriculture in Economic Development. American Economic Review 51.4: 566-93.
Judd, C. M., McClelland, G. H., and Ryan, C. S. 1989. Data analysis: A model comparison approach. San Diego: Harcourt Brace Jovanovich.
Kajoba, G. 1994. Changing Perceptions on Agricultural Land Tenure under Commercialization among Small-scale Farmers: the Case of Chinena Village in Chibombo District (Kabwe Rural), Central Zambia. The Science Reports of the Tohoku University, 7th Series (Geography) Vol. 44 No. 1, December 1994. 43-64.
Kajoba, G. 2002. Women and Land in Zambia: A Case Study of Small-Scale Farmers in Chenena Village, Chibombo District, Central Zambia. Eastern Africa Social Science Research Review 18(1): 35-61.
Kelejian, H. H., and I. R. Prucha. 2010. Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. Journal of Econometrics 157 (1): 53-67.
Kelejian, Harry H., and Gianfranco Piras. 2011. An extension of Kelejian's J-test for non-nested spatial models. Regional Science and Urban Economics 41 (2011) 281–292.
Key, Nigel, Elisabeth Sadoulet, and Alain de Janvry. 2000. Transactions Costs and Agricultural Household Supply Response. American Journal of Agricultural Economics, Vol. 82, No. 2 (May, 2000), pp. 245-259.
Krugman, Paul. 1999. The Role of Geography in Development. International Regional Science Review, 22(2):142–61.
Lamb, Russell L. 2003. Inverse productivity: Land quality, labor markets, and measurement error. Journal of Development Economics 71(1): 71-95.
Langyintuo, A. S., and M. Mekuria. 2008.
Assessing the influence of neighborhood effects on the adoption of improved agricultural technologies in developing agriculture. African Journal of Agricultural and Resource Economics 2 (2): 151-69.
Le Gallo, J., C. Ertur, and C. Baumont. 2003. A spatial econometric analysis of convergence across European regions. European Regional Growth, Springer-Verlag, Berlin, 99-129.
Leibenstein, H. 1950. Bandwagon, snob, and Veblen effects in the theory of consumers' demand. The Quarterly Journal of Economics, 64(2), 183-207.
LeSage, J. P. and R.K. Pace. 2009. Introduction to spatial econometrics. Boca Raton, US: CRC Press Taylor and Francis Group.
Linard, C., Gilbert, M., Snow, R.W., Noor, A.M. and Tatem, A.J. 2012. Population Distribution, Settlement Patterns and Accessibility across Africa in 2010. PLoS ONE 7(2): e31743.
Linard, C., M. Gilbert and A.J. Tatem. 2011. Assessing the use of global land cover data for guiding large area population distribution modeling. GeoJournal 2011(76):525–538.
Manski, C. F. 1993. Identification of endogenous social effects: The reflection problem. The Review of Economic Studies, 60(3), 531-542.
Marshall, Alfred. 1920. Principles of Economics, London: Macmillan.
Mason, N. 2011. Marketing Boards, Fertilizer Subsidies, Prices, and Smallholder Behavior: Modeling & Policy Implications for Zambia. PhD dissertation. Department of Agricultural, Food and Resource Economics, Michigan State University, East Lansing, MI.
Mason, N. and J. Ricker-Gilbert. 2013. Disrupting Demand for Commercial Seed: Input Subsidies in Malawi and Zambia. World Development, Volume 45, May 2013, Pages 75–91.
McCormick, D. 1999. African Enterprise Clusters and Industrialization: Theory and Reality. World Development, 27(9): 1531-52.
McCormick, D. 2007.
“Industrialization through cluster upgrading: Theoretical perspectives.” Chapter 2 in Oyelaran-Oyeyinka, Banji and Dorothy McCormick, eds., Industrial Clusters and Innovation Systems in Africa: Learning Institutions and Competition. Tokyo: United Nations University Press.
McKinnish, T. 2008. Panel Data Models and Transitory Fluctuations in the Explanatory Variable. Advances in Econometrics 21 (2008): 335-358.
Megill, David J. 2005. Recommendations for Adjusting Weights for Zambia Post-Harvest Survey Data Series and Improving Estimation Methodology for Future Surveys. Working Paper No. 13, Food Security Research Project, Department of Agricultural Economics, Michigan State University: East Lansing, MI. March 2005. (Downloadable at: http://www.aec.msu.edu/agecon/fs2/zambia/index.htm )
Mellor, J. 1976. The New Economics of Growth: A Strategy for India and the Developing World. Ithaca: Cornell University Press.
Mellor, J. W. 1973. Accelerated growth in agricultural production and the intersectoral transfer of resources. Economic Development and Cultural Change, 1-16.
Minten, B. and S. Kyle. 1999. The effect of distance and road quality on food collection, marketing margins, and traders' wages: evidence from the former Zaire. Journal of Development Economics, 60(2), 467-495.
Mitchell, T. and P. Jones. 2005. An improved method of constructing a database of monthly climate observations and associated high-resolution grids. International Journal of Climatology, 25, 693-712.
Moser, C. and C. Barrett. 2006. The complex dynamics of smallholder technology adoption: the case of SRI in Madagascar. Agricultural Economics 35 (2006) 373–388.
Mpundu, Mildred. 2007. We know no other home than this: land disputes in Zambia. London: Panos Institute. Retrieved from http://www.research4development.info/PDF/Outputs/Panos/Zambialand.pdf
Mundlak, Y. 1978. On the Pooling of Time Series and Cross Section Data. Econometrica 46, 69-85.
Munshi, Kaivan. 2004.
Social learning in a heterogeneous population: technology diffusion in the Indian Green Revolution. Journal of Development Economics 73.1 (2004): 185-213.
Mutangadura, Gladys. 2007. The incidence of land tenure insecurity in Southern Africa: Policy implications for sustainable development. Natural Resources Forum 31 (2007) 176–187.
Muto, Megumi, and Takashi Yamano. 2009. The Impact of Mobile Phone Coverage Expansion on Market Participation: Panel Data Evidence from Uganda, World Development 37(12):1887-1896.
Muyanga, M. and T. S. Jayne. 2006. Agricultural Extension in Kenya: Practice and Policy Lessons. Working Paper 26 (2006). Tegemeo Institute of Agricultural Policy and Development, Egerton University.
North, D. 1990. Institutions, Institutional Change and Economic Performance. Cambridge University Press.
Olwande, John and Mary Mathenge. 2011. Market Participation Among Poor Rural Households in Kenya. Tegemeo Working Paper. WPS 42/2011. Tegemeo Institute of Agricultural Policy and Development, Nairobi, Kenya.
Omamo, Steven Were and Lawrence O. Mose. 2001. Fertilizer trade under market liberalization: preliminary evidence from Kenya. Food Policy 26(1):1–10.
Omamo, Steven Were. 1998. Transport Costs and Smallholder Cropping Choices: An Application to Siaya District, Kenya. American Journal of Agricultural Economics 80(1):116-123.
Omiti, John, David J. Otieno, Timothy O. Nyanamba and Ellen McCullough. 2009. Factors influencing the intensity of market participation by smallholder farmers: A case study of rural and peri-urban areas of Kenya. African Journal of Agricultural and Resource Economics 3(1): 57-82.
Overå, R., 2006. Networks, distance, and trust: Telecommunications development and changing trading practices in Ghana. World Development 34(7), 1301–1315.
Palmer, Robin. 2000. Land tenure insecurity on the Zambian Copperbelt, 1998: Anyone going back to the land? Social Dynamics: A Journal of African Studies, 26:2, 154-170.
Papke, L.E. and J.M. Wooldridge. 2008.
Panel Data Methods for Fractional Response Variables with an Application to Test Pass Rates. Journal of Econometrics 145, 121-133.
Pingali, P. L., and H. P. Binswanger. 1987. Population density and agricultural intensification: a study of the evolution of technologies in tropical agriculture. Chapter 2 in: Johnson, David Gale, and Ronald Demos Lee, editors. Population growth and economic development: issues and evidence. University of Wisconsin Press.
Pingali, P. L., Bigot, Y., & Binswanger, H. P. 1987. Agricultural mechanization and the evolution of farming systems in sub-Saharan Africa. Johns Hopkins University Press.
Pollak, R. A. 1976. Interdependent preferences. The American Economic Review, 66(3), 309-320.
Quan, N. and A.Y.C. Koo. 1985. Concentration of Land Holdings: An Empirical Exploration of Kuznets’ Conjecture. Journal of Development Economics 18: 101-17.
Renkow, M., D. Hallstrom, and D. Karanja. 2004. Rural infrastructure, transactions costs and market participation in Kenya. Journal of Development Economics 73 (2004) 349–367.
Rivers, D., and Q.H. Vuong. 1988. Limited Information Estimators and Exogeneity Tests for Simultaneous Probit Models. Journal of Econometrics 39: 347-366.
Rodriguez, E., C.S. Morris, J.E. Belz, E.C. Chapin, J.M. Martin, W. Daffer, and S. Hensley. 2005. An assessment of the SRTM topographic products, Technical Report JPL D-31639, Jet Propulsion Laboratory, Pasadena, California, 143 pp.
Ruttan, V. W., and Hayami, Y. 1984. Toward a theory of induced institutional innovation. The Journal of Development Studies, 20(4), 203-223.
Scribney, Bill. 2011. Problems with stepwise regression. Summary of comments from an email exchange on STAT-L/SCI.STAT.CONSULT in 1996, subsequently annotated and revised and now available on the STATA support pages. http://www.stata.com/support/faqs/statistics/stepwise-regression-problems/
Siegel, P. 2008.
Profile of Zambian Smallholders: Where and Who are the Potential Beneficiaries of Agricultural Commercialization. Africa Region Working Paper Series No. 113. The World Bank: Washington, DC, USA.
Singh, I., L. Squire, and J. Strauss. 1986. Agricultural Household Models. Baltimore: Johns Hopkins University Press.
Sitko, N. J. 2010. Fractured governance and local frictions: the exclusionary nature of a clandestine land market in southern Zambia. Africa, 80(01), 36-55.
Smith, R. E. 2004. Land Tenure, Fixed Investment, and Farm Productivity: Evidence from Zambia’s Southern Province. World Development 32(10):1641–1661.
Smith, R., and R. Blundell. 1986. An Exogeneity Test for a Simultaneous Equation Tobit Model with an Application to Labor Supply. Econometrica 54: 679-685.
Solon, G. 1985. Comment on ‘Benefits and limitations of panel data’ by C. Hsiao. Econometric Review, 4 (1985):183–186.
Stambuli, K. 2002. Elitist Land and Agricultural Policies and the Food Problem in Malawi. Journal of Malawi Society – Historical and Scientific, 55(2): 34-83.
Stifel, D. and B. Minten. 2008. Isolation and agricultural productivity. Agricultural Economics 39 (2008) 1–15.
Suri, T. 2011. Selection and Comparative Advantage In Technology Adoption. Econometrica, Vol. 79, No. 1 (January, 2011), 159–209.
Tatem, Andrew, Abdisalan Noor, Craig von Hagen, Antonio Di Gregorio, and Simon I. Hay. 2007. High Resolution Population Maps for Low Income Nations: Combining Land Cover and Census in East Africa. PLoS ONE 2(12): e1298.
Tatem, Andrew, Nicholas Campiz, Peter Gething, Robert Snow, and Catherine Linard. 2011. The effects of spatial population dataset choice on estimates of population at risk of disease. Population Health Metrics 2011, 9:4.
The Post. 2012a. “Chief Mumena's land fears” Editorial in Zambian daily The Post, Monday 18 June 2012. Available at http://www.postzambia.com/post-read_article.php?articleId=28007
The Post. 2012b.
“Ndola woman implores government to probe council over land” Article by Kabanda Chulu in Zambian daily The Post, Monday 10 April 2012. Available at http://www.postzambia.com/post-read_article.php?articleId=26735&highlight=land%20grab
The Times of Zambia. 2012. “Zambia: We'll Deal With Cadres Selling Land Illegally- Veep” Chila Namaiko, The Times of Zambia, 8 December 2012. Available at: http://allafrica.com/stories/201212080210.html
Tiffin, M., M. Mortimore, and F. Gichuki. 1994. More People, Less Erosion. Chichester, UK: Wiley.
Timmer, P. 1988. “The Agricultural Transformation.” In H. Chenery and T.N. Srinivasan, eds., Handbook of Development Economics, Vol. 1. Amsterdam: North-Holland, pp. 275-331.
Torero, Maximo, and Joachim Von Braun. 2006. Information and communication technologies for development and poverty reduction: The potential of telecommunications. Johns Hopkins University Press, 2006.
Tóth, Géza, Bartosz Kozlowski, Sylvia Prieler and David Wiberg. 2012. Global Agro-Ecological Zones (GAEZ v3.0) User’s Guide. Institute for Applied Systems Analysis (IIASA) and the Food and Agriculture Organization of the United Nations (FAO). Laxenburg, Austria and Rome, Italy. Available at: http://www.iiasa.ac.at/Research/LUC/GAEZv3.0/docs/GAEZ_User_Guide.pdf
UNDP. 2012. Africa Human Development Report 2012. United Nations Development Programme, Rome. Available at: http://www.afhdr.org/AfHDR/documents/HDR.pdf
van de Walle, Dominique. 2009. Impact evaluation of rural road projects. Journal of Development Effectiveness 1(1): 15-36.
WB. 2012. World Development Indicators 2012. World Bank, Washington, DC.
Wichern, R., U. Hausner, and D. Chiwele. 1999. Impediments to Agricultural Growth In Zambia. TMD Discussion Paper 47. Trade and Macroeconomics Division, International Food Policy Research Institute, Washington, D.C.
Williamson, Oliver E. 1979. Transaction-cost economics: the governance of contractual relations. Journal of Law and Economics, 22(2), 233-261.
Wood, S.
2007. Spatial dimensions of the regional evaluation of agricultural livelihood strategies: Insights from Uganda. Ph.D. dissertation, Department of Economics, University of London.
Wooldridge, J.M. 2010. Econometric Analysis of Cross Section and Panel Data, Second Edition. MIT Press: Cambridge, MA.
World Bank. 2009. World Development Report 2009. Washington DC: World Bank.
Yamano, T., Place, F., Nyangena, W., Wanjiku, J. and Otsuka, K. 2009. Efficiency and equity impacts of land markets in Kenya. Chapter 5 in Holden, S.T., Otsuka, K. and Place, F.M. (eds), The Emergence of Land Markets in Africa. Washington, DC: Resources for the Future.
Yamauchi, Futoshi, Megumi Muto, Shyamal Chowdhury, Reno Dewina and Sony Sumaryanto. 2011. World Development 39(12):2232–2244.
Yoshida, N. and U. Deichmann. 2009. Measurement of Accessibility and Its Applications. Journal of Infrastructure Development 1(1):1-16.
ZBS. 2011. Preliminary Population Figures. Report for the Zambia 2010 Census of Population and Housing. February, 2011. Zambia Bureau of Statistics, Lusaka.
Zhao, M., F. A. Heinsch, R. R. Nemani, and S. W. Running. 2005. Improvements of the MODIS terrestrial gross and net primary production global data set. Remote Sensing of Environment 95 (2005) 164–176.