MAIZE PRODUCTION IN ZAMBIA AND REGIONAL MARKETING: INPUT PRODUCTIVITY AND OUTPUT PRICE TRANSMISSION

By

William J. Burke

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Agricultural, Food & Resource Economics

2012

ABSTRACT

Chapter 1 is an analysis of the determinants of maize yield response to fertilizer applications using longitudinal data collected in 2004 and 2008 from 7,127 smallholder maize fields. The Instrumented Pooled Correlated Random Effects estimator is employed to control for several statistical considerations often overlooked in the social science literature on smallholder production. The model is specified such that response rates to fertilizer application are conditional on certain farmer practices and the agro-ecological conditions under which maize is grown. Findings indicate top dressing is more effective than basal fertilizer on Zambian soils with average response rates of 4.3 kg/kg and 3.0 kg/kg respectively. This however masks a wide range of variability in fertilizer’s effectiveness. Top dressing response rates, for example, can be nearly 50% lower on coarse, sandy soils and on plowed fields where the majority of the topsoil is disturbed. Basal fertilizer is vulnerable to nutrient “lockup” in the acidic soils that prevail throughout Zambia. Average marginal yield response to basal fertilizer is just 2.1 kg/kg on the highly acidic soils where 51% of our sample fields are located. On semi-neutral soils, response rates can more than triple up to 7.6 kg/kg on average. Unfortunately, only 2% of our sample (and a similar proportion of all Zambian maize fields) are in areas where semi-neutral soils prevail.
Given transportation costs and average products, this study demonstrates that fertilizer use is unprofitable for most Zambian farmers at commercial prices, which has important implications for the long-run viability of subsidy programs. Specifically, if fertilizer is unprofitable for farmers commercially, there is no possibility for a successful “phase out” of a subsidy program after which farmers would continue to use commercial fertilizer.

Chapter 2 addresses issues pertaining to marketing and trade policies. Expensive interventionist grain marketing and trade policies in many Southern African countries are frequently born from uncertainty regarding potential private sector performance. These policies have limited the activity of the private sector, which perpetuates the uncertainty over its potential performance. Indeed, many studies conclude that grain markets in Southern Africa are not integrated with each other and other world markets at least partially due to government policies and the transfer costs they impose. This study employs the price transmission model introduced by Myers and Jayne (forthcoming) using data from various sources to determine whether long-run spatial price equilibriums exist, and to measure the speed at which price shocks are transmitted. The key innovation in this research is the focus on markets that are connected through informal trade across international borders, specifically focusing on a pair of markets in Zambia and The Democratic Republic of Congo and a pair of markets in Malawi and Mozambique.
In short, this study shows that when we examine the price relationship between markets that are relatively unimpeded by interventionist trade policies and when we control, to the extent possible, for transfer costs, markets in the Southern Africa region will likely perform in accordance with economic theory; a long-run price equilibrium will exist, arbitrage will apparently be carried out competitively, and price transmission is going to be fairly rapid.

Copyright by
WILLIAM J. BURKE
2012

For my wife and son, and for my parents

ACKNOWLEDGEMENTS

I would like to thank my major professor, Thom Jayne, and acknowledge the extremely valuable comments and advice he has provided over the past several years. I would also like to acknowledge the contributions and feedback from the rest of my examination committee: Roy Black, Bob Myers and Jeff Wooldridge. I greatly appreciate the support of the Faculty of the Department of Agricultural, Food and Resource Economics, especially Antony Chapoto, Margaret Beaver, David Mather, Dave Tschirley and Mike Weber. I would like to thank many of my fellow students at MSU, especially Joleen Hadrich, Nicky Mason, Jake Ricker-Gilbert, Joshua Ariga and Dingi Banda. I am grateful to Nick Sitko and Briton Walker for numerous rounds of useful comments. This research benefitted greatly from comments received from the staffs of the Zambia Agricultural Research Institute and the Famine Early Warning System Network’s Zambia head office.

While conducting this research I was very fortunate to enjoy the support of Honorable Chance Kabaghe and the staff of the Indaba Agricultural Policy Research Institute (formerly the Food Security Research Project) with funding from the United States Agency for International Development. Additional funding was provided by the Bill and Melinda Gates Foundation through the Guiding Investments in Sustainable Agricultural Markets in Africa project. Any opinions, errors or omissions are solely my own.
TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
INTRODUCTION
CHAPTER 1: Determinants of Maize Yield Response to Fertilizer Application in Zambia: Implications for Strategies to Promote Smallholder Productivity
    1.1 INTRODUCTION
    1.2 BACKGROUND
    1.3 CONCEPTUAL FRAMEWORK
        1.3.1 Determinants of maize yield
        1.3.2 Functional form of the yield model
    1.4 ESTIMATION
        1.4.1 Statistical considerations
        1.4.2 Methods to account for omitted variables and structural endogeneity
    1.5 DATA
    1.6 RESULTS AND INTERPRETATION
        1.6.1 Determinants of yield (excluding fertilizer)
        1.6.2 The cost of estimating via OLS
        1.6.3 Yield response to fertilizer and profitability
    1.7 SUMMARY AND IMPLICATIONS
    APPENDICES
        APPENDIX A
        APPENDIX B
        APPENDIX C
        APPENDIX D
        APPENDIX E
        APPENDIX F
    REFERENCES
CHAPTER 2: Competitive and Effective: Informal trade, spatial equilibrium and rapid price transmission in Southern Africa
    2.1 INTRODUCTION
    2.2 DATA
        2.2.1 Informal trade volume
        2.2.2 Retail maize grain prices
        2.2.3 Transfer costs
    2.3 METHODS
        2.3.1 Price transmission with transfer costs
        2.3.2 Trade based thresholds
        2.3.3 Non-sequential threshold estimation
        2.3.4 Analyzing results
    2.4 RESULTS
        2.4.1 Results from Kitwe, Zambia and Kasumbalesa, The DRC
        2.4.2 Results from Cuamba, Mozambique and Liwonde, Malawi
    2.5 SUMMARY AND CONCLUSIONS
    REFERENCES

LIST OF TABLES

Table 1.1: Distribution of the number of maize fields cultivated by the household
Table 1.2: Distribution of smallholder maize yield and yield determinants in Zambia (2004, 2008)
Table 1.3: Distribution of binary yield determinants in Zambia (2004, 2008)
Table 1.4: Selected results from yield response estimates
Table 1.5: Average partial effects of fertilizer use from various conditional response model estimates (kg of maize output per kg of fertilizer input)
Table 1.6: Maize yield response to basal fertilizer over soil acidity ranges
Table 1.7: Yield response to top dressing fertilizer by tillage and soil types
Table B.1: Select results from yield response estimates un-weighted for attrition bias
Table C.1: Instrumented, Pooled Correlated Random Effects estimation results compared to Fixed Effects Instrumental Variable estimation results
Table D.1: Net revenue maximizing seed rate sensitivity analysis
Table E.1: Selected results from yield response estimates for quantities of Nitrogen and Phosphorus
Table F.1: Likelihood of successful harvest and fertilizer use by province
Table 2.1: Alternate identification assumptions for spatial price transmission models
Table 2.2: GP threshold selection for price transmission between Kitwe & Kasumbalesa
Table 2.3: Diagnostic tests for price series stochastic properties (Kitwe & Kasumbalesa)
Table 2.4: Price transmission estimation results for Kitwe & Kasumbalesa
Table 2.5: GP threshold selection for price transmission between Cuamba & Liwonde
Table 2.6: Diagnostic tests for price series stochastic properties (Cuamba & Liwonde)
Table 2.7: Price transmission estimation results for Cuamba and Liwonde

LIST OF FIGURES

Figure 1.1: Maize yields in Zambia versus global exporting countries (1961-2009)
Figure 1.2: Conditional Yield Response to Fertilizer Application
Figure 1.3: Marginal yield response to basal fertilizer application by soil pH
Figure 1.4: Simulated profitability of basal application near Choma
Figure 1.5: Simulated profitability of basal application near Choma: Actual relevant price ratio sensitivity analysis
Figure 1.6: Cumulative distribution of average product of fertilizer use (kg/kg)
Figure 1.7: Expected average product of fertilizer applications in Zambia
Figure A.1: pH and soil type maps of Zambia
Figure A.2: Soil Map of Zambia
Figure 2.1: Map of markets considered in the spatial price transmission analysis
Figure 2.2: Price difference between Kitwe and Kasumbalesa and informal trade
Figure 2.3: Simulated shock to equilibrium prices in Kitwe and Kasumbalesa
Figure 2.4: Price difference between Liwonde and Cuamba and informal trade levels
Figure 2.5: Simulated shock to equilibrium prices in Cuamba and Liwonde

LIST OF ABBREVIATIONS

ADF       Augmented Dickey-Fuller
APE       Average Partial Effect
APP       Augmented Phillips Perron
AVCR      Average Value Cost Ratio
CFS       Crop Forecast Survey
CRE       Correlated Random Effects
CSA       Census Supervisory Area
CSO       Central Statistics Office
DRC       Democratic Republic of Congo
ERB       Energy Regulation Board (Zambia)
FD        First Difference
FE        Fixed Effects
FEWSNET   Famine Early Warning System Network
FRA       Food Reserve Agency
GP        Gonzalo-Pitarakis
ha        hectare
IITA      International Institute of Tropical Agriculture
IMR       Inverse Mills Ratio
IPCRE     Instrumented Correlated Random Effects Estimator
IPW       Inverse Probability Weight
IV        Instrumental Variable
kg        kilograms
km        kilometer
KPPS      Kwiatkowski, Phillips, Schmidt, and Shin
MET       Zambia Meteorological Department
mm        millimeter
mt        metric ton
MZN       Mozambican New Metical
N         Nitrogen
OLS       Ordinary Least Squares
P         Phosphorus
PAM       Program Against Malnutrition
PBM       Parity Bounds Model
PCRE      Pooled Correlated Random Effects
pH        potential Hydrogen
RE        Random Effects
RSA       Republic of South Africa
SA        Southern Africa
SCC       Soil Carbon Content
SEA       Standard Enumeration Area
SEECM     Single Equation Error Correction Model
TAR       Threshold Autoregression
ZARI      Zambia Agricultural Research Institute
ZMK       Zambian Kwacha

INTRODUCTION

Indicators such as the food crisis of 2008, the ongoing famine in parts of the developing world and the stubbornly persistent poverty rates of the past several decades all suggest that a deep chasm still exists between the status quo and a world without hunger. According to the United Nations, the world has seen moderate progress towards accomplishing the first Millennium Development Goal’s target of halving the proportion of undernourished people in developing regions (a reduction from 20% to 16% between 1990/92 and 2005/07), but this success has been outpaced by population growth. In fact, there are more undernourished people alive today than there were in 1990. Particularly in Africa, current population trends coupled with dismal agricultural productivity growth clearly demonstrate the urgent need to better understand the vast array of hindrances to achieving food security. Africa is arguably a potential breadbasket for the world, yet thus far most countries on the continent struggle to produce enough to feed themselves. The massive amount of food imported by countries in Africa every year in turn affects the availability of food in other countries (developing or otherwise) across the globe. Informing African food and agricultural policy discussions is therefore a key component of achieving the Millennium Development Goal to end hunger and poverty.

Throughout Africa, food and agricultural policy discussions invariably revolve around two clearly linked issues: low productivity and poor market access for small farmers. These represent two fronts of the same war on poverty and solving one of these problems will ultimately be of little use without solving the other.
Land is increasingly in short supply, which means that raising land productivity through agricultural intensification is the only long-term strategy to avoid a grim Malthusian scenario. Without improved market access, though, productivity gains will do nothing more than maintain the subsistence level of poverty that prevails in the developing world. In turn, without productivity gains and production surpluses, there is not much to be gained from improved market access.

The following chapters of this dissertation address specific aspects of productivity and market access individually, respectively focusing on Zambia and the Southern African region more generally.

Chapter 1 is an analysis of the determinants of maize yield response to fertilizer applications using longitudinal data collected in 2004 and 2008 from 7,262 smallholder maize fields. The Instrumented Pooled Correlated Random Effects estimator is employed to control for several statistical considerations often overlooked in the social science literature on smallholder production. The model is specified such that response rates to fertilizer application are conditional on certain farmer practices and the agro-ecological conditions under which maize is grown. Findings indicate top dressing is more effective than basal fertilizer on Zambian soils with average response rates of 4.2 kg/kg and 3.0 kg/kg respectively. This, however, masks a wide range of variability in fertilizer’s effectiveness. Top dressing response rates, for example, can be nearly 50% lower on coarse, sandy soils and on plowed fields where the majority of the topsoil is disturbed.

Soil acidity is a substantial limiting factor in Zambian maize production, both from the direct impact it has on maize plants and the impact it has on basal fertilizer’s effectiveness. Basal fertilizer is vulnerable to nutrient “lockup” in the acidic soils that prevail throughout Zambia.
Lockup is the process whereby phosphorus intended for the plant is converted into iron and aluminum phosphates that are unavailable for plant consumption. Average marginal yield response to basal fertilizer is just 2.2 kg/kg on the highly acidic soils where 51% of our sample fields are located. On semi-neutral soils, response rates can more than triple up to 7.9 kg/kg on average. Unfortunately, only 2% of our sample (and a similar proportion of all Zambian maize fields) are in areas where semi-neutral soils prevail. On the vast majority of Zambian maize fields basal fertilizer is generally ineffective.

Given transportation costs and average products, this study demonstrates that fertilizer use is unprofitable for most Zambian farmers at commercial prices. In fact, fertilizer, if it is the only input under consideration, is only profitable for most households when it can be purchased (and maize can be sold) at subsidized prices. This finding has important implications for the long-run viability of subsidy programs. Specifically, if fertilizer is unprofitable for farmers commercially, there is no possibility for a successful “phase out” of a subsidy program after which farmers would continue to use commercial fertilizer.

This calls for a shift in the design of agricultural productivity policies and rural poverty reduction programs, away from fertilizer subsidies as the cornerstone and towards developing a more integrated program. This may include fertilizer subsidies along with other inputs and agronomic practices, but must allow for sustainable and profitable crop intensification, even after the subsidies are withdrawn. Some possible alternatives are discussed in this study, such as distributing and demonstrating the acidity mitigating effects of lime or tailoring application methods to acidic soil. The key implication, however, is that research and extension need to be given higher priority in the agricultural policy portfolio.
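To make the link between response rates and profitability concrete, the following is a minimal sketch of a value cost ratio calculation: the value of the extra maize obtained per unit spent on fertilizer. The function names and the fertilizer-to-maize price ratio of 4 are hypothetical placeholders chosen for illustration, not figures from this study; only the response rates (2.2, 3.0 and 7.9 kg/kg) are taken from the text above.

```python
def value_cost_ratio(response_kg_per_kg, maize_price, fertilizer_price):
    """Value of additional maize per unit of money spent on fertilizer.

    response_kg_per_kg: kg of maize gained per kg of fertilizer applied.
    maize_price, fertilizer_price: prices per kg in the same currency.
    A ratio below 1 means the extra maize does not cover the fertilizer cost.
    """
    return response_kg_per_kg * maize_price / fertilizer_price


def break_even_response(maize_price, fertilizer_price):
    """Response rate (kg/kg) at which the value cost ratio equals exactly 1."""
    return fertilizer_price / maize_price


# ASSUMPTION: hypothetical prices, with fertilizer costing 4 times as much
# per kg as maize. These are placeholders, not prices from the study.
MAIZE_PRICE = 1.0
FERTILIZER_PRICE = 4.0

# Response rates reported in the text above (kg of maize per kg of fertilizer).
response_rates = {
    "basal, highly acidic soils": 2.2,
    "basal, average": 3.0,
    "basal, semi-neutral soils": 7.9,
}

for label, rate in response_rates.items():
    vcr = value_cost_ratio(rate, MAIZE_PRICE, FERTILIZER_PRICE)
    print(f"{label}: value cost ratio = {vcr:.2f}")

print("break-even response:", break_even_response(MAIZE_PRICE, FERTILIZER_PRICE))
```

Under these placeholder prices, any response rate below the break-even level fails to cover its own cost, which is the sense in which a low response rate alone can make fertilizer unprofitable. Risk, transport costs and subsidized prices would shift the threshold and are left out of this sketch.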
Chapter 2 addresses issues pertaining to marketing and trade policies. Expensive interventionist grain marketing and trade policies in many Southern African countries are frequently born from uncertainty regarding potential private sector performance. These policies, including export bans, subsidized imports and license restrictions, have limited private sector activity, which perpetuates the uncertainty over its potential performance. Indeed, many studies conclude that grain markets in Southern Africa are not integrated with each other and other world markets at least partially due to government policies and the transfer costs they impose.

Chapter 2 is an analysis of maize grain price transmission between markets in Southern Africa that trade maize informally. Informal trade is unlicensed, untaxed and characterized by a large number of small scale traders (as few as 1-2 bags of maize at a time) who collectively move substantially more grain throughout the region than is traded through formal channels. The fact that these transactions are difficult to regulate suggests the relationship between informal import and export markets can provide new insights into how international markets within the region might perform in the absence of interventionist policies.

A Threshold Autoregressive Single Equation Error Correction Model is employed using data from various NGO and government sources to determine whether, and under what conditions, long-run spatial price equilibrium exists, and to measure the speed at which price shocks are transmitted between surplus and deficit markets. The key innovation in this research is the focus on markets that are connected through informal trade across international borders. In addition to focusing on these markets, the study also allows the economic relationships to potentially differ across multiple regimes according to the varying level of informal trade.
The study analyzes price transmission between a pair of markets in Zambia and The Democratic Republic of Congo (DRC) as well as a pair of markets in Malawi and Mozambique. In both cases, statistical selection criteria favor the single-regime, no threshold model for price transmission. This could partially be explained by the fact that trade between the markets in both of our models is effectively continuous (which rules out the most likely threshold value of no trade). However, this could also be an empirical issue related to the sample size (which is fairly small with 60-72 observations, depending on the market pair). Therefore, one conclusion of the study is that the research should be revisited after more data has become available.

Results from single-regime model estimation provide evidence of long-run spatial price equilibrium in both market pairings. In both cases the coefficient estimate for the long-run equilibrium suggests competitive arbitrage links the informally trading markets. That is, price ratio estimates are not significantly different from one after controlling for transfer costs. The rate of price transmission was also similar in the two models estimated. The traditional half-life measurement of a transfer is estimated to be roughly 2.7 months between the Zambia and DRC markets or 2.5 months between the Mozambique and Malawi markets. Both of these represent fairly rapid price transmission relative to results from other studies. Through simulation analysis the study demonstrates that one month after a shock to equilibrium is introduced, 67% (77%) of the total value of the shock will have transferred from Zambia to DRC (Mozambique to Malawi).
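As a rough aid to reading the half-life figures, the sketch below converts a half-life into the share of a remaining price gap that closes each month. It assumes simple geometric decay of the gap, which is a simplification introduced here and not the estimated model; the function names are hypothetical. The simulated one-month shares reported above come from the full model and are not reproduced by this simplification.

```python
import math


def monthly_adjustment_from_half_life(half_life_months):
    """Share of the remaining gap closed per month under geometric decay.

    ASSUMPTION: the gap shrinks by a constant proportion each month, so the
    remaining share after t months is 0.5 ** (t / half_life_months).
    """
    return 1.0 - 0.5 ** (1.0 / half_life_months)


def half_life_from_adjustment(adjustment):
    """Inverse mapping: months needed for the gap to halve."""
    return math.log(0.5) / math.log(1.0 - adjustment)


def remaining_gap(half_life_months, months):
    """Share of the initial gap still open after the given number of months."""
    return 0.5 ** (months / half_life_months)


# Half-lives reported in the text above, in months.
for pair, half_life in [("Zambia-DRC", 2.7), ("Mozambique-Malawi", 2.5)]:
    speed = monthly_adjustment_from_half_life(half_life)
    print(f"{pair}: about {speed:.1%} of the remaining gap closes per month")
```

The two helper functions are inverses of each other, so an adjustment coefficient reported in one study can be compared with a half-life reported in another, provided both describe the same kind of first-order adjustment.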
In short, this study shows that when we examine the price relationship between markets that are relatively unimpeded by interventionist trade policies and when we control, to the extent possible, for transfer costs, markets in the Southern Africa region will likely perform in accordance with economic theory; a long-run price equilibrium will exist, arbitrage will apparently be carried out competitively, and price transmission is going to be fairly rapid.

CHAPTER 1: Determinants of Maize Yield Response to Fertilizer Application in Zambia: Implications for Strategies to Promote Smallholder Productivity

1.1 INTRODUCTION

Limited agricultural productivity is an immediate and overwhelmingly important challenge to food security and long-run poverty alleviation in Africa. Despite several decades of targeted policies, productivity amongst rural African farmers has stagnated at levels far below world averages and too low to support transformative economic growth (Poulton et al., 2006; FAOSTAT, 2003). In policy circles this is often attributed to low adoption of productivity enhancing technologies. Although a wide array of productivity-improving inputs and technologies exist, policies often focus heavily on promoting the adoption of fertilizer, for which application rates in Africa stand at just 10-30% of those in Asia, Europe and the Americas (FAOSTAT, 2008). In recent years this has fueled debate and a resurgence of fertilizer subsidy programs in numerous countries including Malawi, Zambia, Senegal, Mali and Kenya, among others. These policies are motivated by the notion that such support, designed to compensate farmers for the marketing and resource constraints, could initiate a virtuous cycle of technology adoption, productivity growth and poverty alleviation (Crawford, Kelley and Ricker-Gilbert, 2011; Chibwana et al., 2010; Dorward et al., 2008; Xu et al., 2009a).
It is possible, however, that limited yield response rates and relative prices render fertilizer use unprofitable, which could explain low levels of adoption and call into question whether fertilizer use alone can sufficiently increase productivity. A few recent studies have demonstrated that the profitability of adoption can differ significantly between households and fields even in the same local areas with roughly similar agro-ecological conditions (Waithaka et al., 2007; Xu et al., 2009b; Marenya and Barrett, 2010; Matsumoto and Yamano, 2010). In a study of hybrid maize seed adoption in Kenya, Suri (2011) demonstrates how variable seed productivity and input transaction costs explain farmer behavior. Specifically, the study highlights that adoption and non-adoption are actually the result of rational decision making after accounting for heterogeneous costs and benefits.

The same may be true of fertilizer in Zambia. The productivity of inputs can depend on farm management, which the farmer can control, but also on factors such as weather and soil characteristics, which may be perceived as exogenous and unchangeable. If yield response to fertilizer is unprofitable at the field level, then marketing and resource constraints are secondary to fertilizer’s agronomic limitations, which will have important policy implications. Specifically, it would be appropriate to consider fertilizer as part of a broader framework of input strategies available to promote productivity.

For economists to better understand adoption behavior and input profitability, and for policy makers to better design agricultural policies to promote productivity growth, we must better understand the agronomics of the smallholder farmer’s yield function.
Modeling yield functions has been problematic in the social science literature, because estimating yield functions using household data (as social scientists tend to do, rather than using field test data as in agronomic literature) presents numerous challenges including uncertain model specification, structural endogeneity of input decisions,¹ omitted variables and unobserved heterogeneity. Failing to adequately account for these problems may lead to biased results that could ill-inform farmers and policy makers.

¹ “Structural endogeneity” is used here in reference to the endogeneity of chosen inputs that stems from the fact that the yield function itself is part of the structural model from which input demand is derived (the production constraint), as opposed to any endogeneity bias which would stem from correlation between the observed determinants and omitted variables included in the error term.

The objective of this study is to identify the determinants of variable yield response to fertilizer application and evaluate their impact on the profitability of fertilizer use. We will use longitudinal data from 7,262 fields farmed by smallholder Zambian maize producers interviewed in 2004 and 2008, as well as location specific data on weather and soil characteristics. In doing so we will address a number of agronomic and statistical considerations that have been underappreciated in economic research on household fertilizer use and we will demonstrate a method that can be used in future analyses to address these issues.

This study will inform policy makers’ efforts to promote farmer productivity and demonstrate a method for estimating yield determinants that will guide future research on similar subjects. We will demonstrate that failure to control for endogeneity and unobserved heterogeneity leads to biased estimates of the average partial effect of fertilizer use in the case of Zambian smallholder maize yields. Furthermore, we will demonstrate that ecological conditions such as soil type and acidity can cause considerable variation in response rates to top dressing and basal fertilizer applications. We demonstrate that soil acidity in particular has dramatically decreased the yield response and profitability of fertilizer use in many parts of Zambia. This has important implications for the design and implementation of input support programs and budget allocations in Zambia and perhaps other food insecure countries. More importantly, it suggests that yields could be dramatically improved through a diversified input strategy and investment in agronomic research and extension.

Section 1.2 briefly discusses the background information regarding fertilizer use in Zambia. Section 1.3 will present our conceptual framework for modeling yield determinants and the agronomic principles guiding our model. Section 1.4 discusses the statistical considerations relevant to estimation and provides an approach to account for them. Section 1.5 describes the data we will use. Section 1.6 presents and interprets results. Section 1.7 concludes with implications for policy makers.

1.2 BACKGROUND

Zambia is a nearly middle-income developing country in the Southern Africa region, sandwiched between the maize importing countries of Zimbabwe and the Democratic Republic of Congo. Maize is Zambia’s largest staple crop and the most likely agricultural commodity to provide substantial and broad-based export revenue. Most of Zambia’s neighbors, however, import maize from the Republic of South Africa (RSA) or other suppliers of the world market. This can partially be blamed on Zambia’s dismal land productivity as compared to maize exporting countries such as RSA, Argentina or the United States (Figure 1.1).
If Zambia is to take advantage of the exporting opportunities that exist in the region and beyond, it is obvious that productivity must increase (or costs of production must come down).

[Figure 1.1: Maize yields in Zambia versus global exporting countries (1961-2009). Vertical axis: average yield (kg/ha); horizontal axis: year; series: Argentina, RSA, USA, Zambia. Source: FAOSTAT accessed July, 2011. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation.]

Usage of inorganic fertilizers [2] may be one way to achieve productivity gains. However, despite a significant increase in fertilizer subsidies over the past two years, only 51% of Zambian smallholders used fertilizer during the 2010 and 2011 growing seasons (CFS 2010, 2011). Moreover, among those who did use fertilizer, the application rates were, on average, 152 kg/ha each for basal and top dressing, well below the 200 kg/ha government recommendation. Since 2006 roughly two thirds of all fertilizer users have applied less than the recommended amount per hectare (ZARI, 2002; CFS, 2006-2011).

Fertilizer adoption (and application rates among those who do use fertilizer) may be low due to low or variable effectiveness of the technology. Other heterogeneous factors of production may be limiting the effectiveness of fertilizer. Zambian Crop Forecast Survey (CFS) data indicate that smallholder fertilizer users produce a yield of 2.5 metric tons per hectare, on average. This is more than double the yield for non-users, but a third or less of what Zambian commercial farmers produce and less than 20% of the average yield for farmers in developed countries (CFS, 2006-2011; Lungu et al, 2009; Brittan et al, 2008).
To explain why smallholder yields are so dramatically lower than those of their larger scale counterparts we must better understand the determinants of smallholder yield. Modeling production functions is an old science, dating back further than the popular work of Cobb and Douglas in 1928 (see Griliches and Mairesse (1995) for a summary of this and other early work). That said, when estimating these models using survey data there are many known statistical problems which may influence results. Some or all of these issues are frequently ignored in the agricultural economics literature, but in the following sections we will outline a conceptual framework and estimation approach to easily accommodate them. Specific variables used and the method of computing them (when necessary) are discussed in the following sections.

[2] For the purpose of this study the phrase “fertilizer” refers to manufactured inorganic fertilizer.

Table 1.1: Distribution of the number of maize fields cultivated by the household

Number of        Percent of fields farmed by     Percent of households
maize fields     households cultivating          cultivating
1                69.0%                           84.0%
2                20.1%                           12.2%
3                 6.9%                            2.8%
4                 2.2%                            0.7%
5                 0.6%                            0.2%
6                 0.7%                            0.2%
more than 6       0.4%                            0.1%
Source: SS04, SS08

1.3 CONCEPTUAL FRAMEWORK

1.3.1 Determinants of maize yield

The general form of the conceptual model for yield functions has not changed much over time. Following Heady (1956), the factors determining yield (Y) can be summarized as:

(1.1)  $Y = f(F, X, S)$

where F is a vector of fertilizer applications, X is a vector of other determinants that are controlled by the farmer and S is a vector of strictly exogenous yield determinants. In agronomic literature F would be treated as a vector of nutrients themselves, primarily the quantities of nitrogen (N), or phosphorus (P) that are applied.
In this study, however, we are more interested in the perspective of the farmer, who, in Zambia, is far more likely to think in terms of quantities of fertilizer mixtures, which are classified as either basal or top dressings [3]. That said, because of their chemical make-up, it is important to treat these mixtures separately in the model in order to understand the factors conditioning yield responses. Basal fertilizer is designed to be applied early and is primarily phosphoric. The basal fertilizer most frequently used in Zambia is more commonly known as Compound D. Top dressing fertilizer, most commonly Urea, is entirely N in terms of nutrients and is designed to be applied periodically throughout the growing season. P and N, and by extension basal and top dressing, contribute to plant growth in fundamentally different ways, and have different characteristics of nutrient dispersion (Eckert, 2010; Griffiths, 2010), and so will be included as separate variables.

[3] Results of estimating this study’s model after converting “fertilizer” measurements into the quantities of N and P applied are available in Appendix E.

Whereas N applied to a field is generally consumed by plants within a few weeks of application, P applied to a field becomes soil ready over a number of years. In fact, it is estimated that only 20% of P applied is consumed by crops in the year of application (Griffiths, 2010). Therefore, the amount of “carry-over” P in the soil is at least as relevant to current yields as contemporaneous basal fertilizer application rates (Lanzer and Paris, 1981; Goedeken et al, 1998). Practically speaking this suggests that lagged values of basal fertilizer should also be included in the yield model. Ideally, we would include contemporaneous and a series of lagged values for basal fertilizer dressing, or some indicator of the pre-existing plant-ready phosphorus. Unfortunately, data to address this as thoroughly as agronomic principles dictate are not available.
In preliminary analysis we attempted to capture the carry-over effect of P fertilization by including a one survey period (3-4 years) lagged value for basal application at the household level. Recall, however, that we are not able to know whether these applications were made to the same fields from which we have our yield observations. As a result, or perhaps due to collinearity stemming from the lack of temporal variation in application rates within households, this variable was not significant and its inclusion came with high efficiency costs. We thus omit lagged basal application values from our analysis and are left only with the option of interpreting our basal fertilizer variable as representing both contemporaneous applications and a proxy for lagged applications. We will also attempt to compensate for this omission when discussing the profitability of basal application by conducting price sensitivity analysis. We will adjust the prices assigned to basal according to a range of possible absorption rates.

Following numerous other studies, factors of production controlled by the farmer, X, will include seed varieties and application rates, crop mixtures (i.e. whether maize is intercropped with a nitrogen fixing plant [4], or whether it is forced to compete with another field crop such as millet), whether the field is weeded throughout the growing season, tillage method and tillage timing (Lichtenberg and Zilberman, 1986; Paris, 1992; Rabbinge, 1993; Chambers and Lichtenberg, 1994; Chambers and Lichtenberg, 1996; Carpentier and Weaver, 1997; van Ittersum and Rabbinge, 1997; Saha, Shumway and Havenner, 1997; Berck, Geoghegan and Stohs, 2000; Oude Lansink and Carpentier, 2001; Holloway and Paris, 2002; van de Ven et al, 2003; Guan et al, 2006) [5].
With respect to timing, we control for whether planting took place prior to the first rains, when there is an annual “nitrogen flush” into the soil from organic material that has decomposed throughout the dry season (Haggblade and Plerhoples, 2010).

[4] Nitrogen fixing plants in our sample are groundnuts, soybeans, cowpeas and various beans. See Lindemann and Glover (2008) for a discussion on how these plants add nitrogen to soils.

[5] In this literature other factors identified as yield determinants are pesticides, herbicides, fungicides, lime application and irrigation; however, these practices are not prevalent in Zambia’s rural agriculture sector.

Strictly exogenous yield determinants, S, are soil characteristics such as endowment of nutrients, texture and acidity, as well as weather factors such as levels and temporal distribution of rainfall (Heady, 1956; Tolk et al, 1999; Snyder, 2010; others). Soil types for the enumeration areas in our study, which have been published by the Zambian Ministry of Agriculture and Cooperatives (Mambo and Phiri, 2003), will be classified as either Acrisols, Zambia’s most common clayish soil that covers 39% of the fields in our study, sandy and less developed soils (Arenosols, Leptosols, Podzols and Regosols accounting for 23% of the sample), rich soils that are relatively moist or abundant with organic matter (Cambisols, Gleysols, Histosols and Luvisols covering 5% of the sample) and other clayish soils (Alisols, Ferralsols, Lixisols and Vertisols on 19% of the sample fields). Other soil types account for the remaining 14% of the sample. The level of soil organic matter is critical for plant growth, both in terms of having nutrients available to plants and having sufficient soil carbon present to sustain microbe populations that process nutrients into forms available to plants (Bationo and Mokwunye, 1991; Manlay et al, 2007).
Using measurable soil carbon content (SCC), Marenya and Barrett (2009) demonstrate how this characteristic affects demand for fertilizer in Kenya. Although we lack such field specific data, our soil type data are a more general control for this important yield determinant (note that several of the “rich” soils are defined as such due to the prevailing levels of organic material). At the field level we are also able to include a contemporary dummy variable for whether plant or animal manure is applied.

Soil acidity will be controlled for using potential Hydrogen (pH) soil test results for the 341 enumeration areas used in this study. Soil tests were conducted by GRZ and reported in Mambo and Phiri (2003). According to the maize production guide published by the Zambia Agricultural Research Institute (ZARI, 2002), 4.4 is a critical pH threshold for maize production in Zambia, below which (i.e. in more acidic soils) plants will not have access to necessary soil nutrients, leading to limited root growth, wilting and diminished yields. A second critical threshold affecting growth occurs at a pH of 5.5 due to the effect acidity can have on phosphoric (basal) fertilization. In more acidic soils (pH < 5.5) phosphorus can be “locked” in the soil as it converts to iron and aluminum phosphates that are unavailable for plant consumption. The pH range of 5.5 - 7 is optimal for phosphoric fertilization since, in this range, phosphorus converts to plant-available mono- and di-calcium phosphates (Griffiths, 2010). In alkaline soils phosphorus is vulnerable to becoming locked into tri-calcium phosphates that are also unavailable to plants (Griffiths, 2010), but this risk is not relevant in our model (Zambia pH tests reveal a range of soil acidity from 3.1 to 7.1). We will therefore include pH in our model by including dummy variables for which critical pH range is prevalent in each field’s area.
One indicator variable will designate those fields where pH is between 4.4 and 5.5, and one will designate those in an area where pH is between 5.5 and 7.1, while the effect of being in more acidic soils (pH < 4.4) is subsumed into the intercept term. Since productivity is expected to be higher in less acidic soil, we expect the partial effect of these dummies to be positive. Maps for Zambia pH test sites and soil types can be found in Appendix A.

Total rainfall levels for the growing season (November to March) will be included using data from 36 Zambia Meteorological Department (MET) stations, as well as a variable for rainfall stress levels defined as the number of back to back ten day periods with less than 40 mm total rainfall during the rains (see Mason (2011) for a description of rainfall data collection). Other net weather effects will be controlled for at the provincial level using dummy variables for each province, interacted with time trends.

1.3.2 Functional form of the yield model

While the factors determining yield are largely undisputed, the functional form of the underlying data generating mechanism is less obvious. A widely accepted contention is that the effectiveness of certain factors of production is conditional on other factors. The most extreme form of this model, which has been called the “law of the minimum,” was first posited by organic chemist Justus von Liebig in 1862. Consider, for example, a simple model in which yield is determined only by fertilizer, F, and something else, x. The strictest form of von Liebig’s function for yield, y, would be:

(1.2)  $y = \min(\beta_1 F, \beta_2 x)$

If this model is correct and $\beta_2 x$ is less than $\beta_1 F$, no amount of fertilizer could be added to increase yield. More recently studies have argued there is a more sophisticated relationship between inputs and yield, but the principle of codependence remains (Berck, Geoghegan and Stohs, 2000; Guan et al, 2006; Xu et al, 2009b).
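To make the plateau implied by equation (1.2) concrete, the following minimal Python sketch evaluates a strict von Liebig yield function. The function name and all parameter values (`beta1`, `beta2`, the level of `x`) are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
def von_liebig_yield(fertilizer, x, beta1, beta2):
    """Strict 'law of the minimum': yield is set by the scarcer factor."""
    return min(beta1 * fertilizer, beta2 * x)


# Hypothetical parameters: each kg of fertilizer supports 10 kg of maize,
# and each unit of the other factor x supports 1,000 kg of maize.
BETA1, BETA2 = 10.0, 1000.0
X_LEVEL = 1.0

# Yield rises with fertilizer until x becomes the binding constraint
# (here at 100 kg of fertilizer); beyond that, extra fertilizer adds nothing.
yields = {f: von_liebig_yield(f, X_LEVEL, BETA1, BETA2) for f in (50, 100, 150, 200)}
for f, y in yields.items():
    print(f"fertilizer={f:>3} kg  yield={y:,.0f} kg")
```

Under these assumed values the response to fertilizer is 10 kg/kg below the plateau and zero above it, which is the sense in which no amount of fertilizer can raise yield once x binds.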
For example, another form may be a conditional yield response function:

(1.3)  $y = \beta_0 + \beta_1 F + \beta_2 x + \gamma_1 F^2 + \gamma_2 x^2 + \gamma_3 F \cdot x$

The debate continues on whether strict von Liebig models similar to equation (1.2) or those similar to equation (1.3), sometimes called polynomial models, better fit actual data generating mechanisms (Paris, 1992; Chambers and Lichtenberg, 1996; Berck, Geoghegan and Stohs, 2000). Tembo et al. (2008) present the most flexible of the von Liebig options in the “linear response stochastic plateau” model, but this and other von Liebig models assume that either the limiting factor of production is known, or, if unknown, is the same for all observations. One could argue, however, that heterogeneity among households renders such strict models less applicable to survey data as compared to test field data. That is, even if a strict von Liebig model where fertilizer is the limiting factor of production better represented the data generating process for a sub-set of the sample, it almost certainly wouldn’t apply to all households. Therefore, equation (1.3) is more appropriate as its flexibility incorporates all households, and presents us with a household specific yield response to fertilizer that depends on x:

(1.4)  $\frac{\partial y}{\partial F} = \beta_1 + 2\gamma_1 F + \gamma_3 x$

To demonstrate how such a model could be interpreted, suppose that for two values of x ($x'$ and $x''$) the yield response function for fertilizer appears as in Figure 1.2, where the profitable yield at the yield maximizing level of fertilizer use, $F^*$, is indicated by the solid horizontal reference line.

[Figure 1.2: Conditional Yield Response to Fertilizer Application. Vertical axis: yield; horizontal axis: fertilizer application; one response curve each for $x = x'$ and $x = x''$, with the yield maximizing application $F^*$ and the profitable yield marked for each.]

At $x'$ one might erroneously conclude that fertilizer use is unprofitable, even at the yield maximizing level of application.
Instead, Figure 1.2 demonstrates a condition where fertilizer is not the binding constraint on profitability. Clearly, if we change the value of x to $x''$, yield maximizing fertilizer application (and a range of non-maximizing application rates) is profitable even if we allow the necessary yield level to increase to the dashed horizontal reference line.

Unlike this example, of course, our model consists of two vectors of explanatory variables other than fertilizer: farmer choices and exogenous soil and weather effects. According to the general model in equation (1.3), our estimation would include a full set of interaction and quadratic terms for all right hand side variables in both F and X. Doing so, however, may introduce a large degree of collinearity. To avoid this cost, we could focus on the interaction terms relevant to the objectives of this study (those for fertilizer variables), and return to agronomic literature to identify the interactions we should expect to be important. This is sometimes described as a “bottom-up” approach to model building. An alternative would be a “top-down” approach whereby we estimate a model including all interactions, then impose exclusion restrictions based on significance levels. In this study we take a hybrid approach, estimating the full model, and then imposing exclusion restrictions based on significance levels and what we know from the agronomic literature. It is worth noting that the significant interactions in our final results were largely significant in the model with the full set of interactions, but were estimated with less precision, hence the preference for the relatively parsimonious model described in the remainder of this section.

As previously mentioned, phosphorus is vulnerable to waste in acidic soils. Specifically, below a pH level of 5.5, P converts to iron and aluminum phosphates, which are fairly useless to plants (Griffiths, 2010).
Snyder (2010) reports that roughly 77% of phosphoric fertilizer can be wasted in soil with a pH lower than 4.5. In the pH range of 5.5 to 7, on the other hand, P converts to mono- and di-calcium phosphates which maximize the availability of phosphorus to plants (Griffiths, 2010). We will therefore include interaction terms between basal application and our acidity indicator variables.

In some ways the productivity of P and N depend on each other. For example, they are both essential elements in adenosine triphosphate, the key “energy unit” that plants form during photosynthesis and use to store and expend energy (Eckert, 2010; Griffiths, 2010). That said, over time the N in Urea can increase soil acidity, meaning P and N could be working against each other. To test which scenario is prevalent in Zambia we will also include an interaction term between basal and top dressing application rates.

Nitrogen is vulnerable to a similarly unfavorable transformative process called volatilization in alkaline soils, but with no pH level in our data greater than 7.1, this interaction would not be relevant in this model. The largest threat to N fertilizer loss, rather, is through leaching. This is when nitrates dissolve in water and essentially get washed away before they can be consumed by the plant. There is a greater potential risk in certain soil types that are coarser and allow water to percolate more freely (Eckert, 2010). There is also a greater risk with tillage methods that disturb more soil, such as plowing. We will therefore include interaction terms between top dressing application rates and both soil type and a dummy variable for plow tillage.

Improved and hybrid seed varieties are specifically designed to better ingest both N and P, which suggests both basal and top dressing application should interact with our seed variety variable.
Unfortunately, including separate interactions results in efficiency losses due to collinearity, so our final model will impose the restriction that the interaction with seed type is the same for both types of fertilizer (i.e. seed type will be interacted with the sum of basal and top dressing application rates) [6].

[6] The correlation coefficient for basal and top dressing application rates is 0.91, significant at the 0.01% level. Modeling separate seed type interactions results in unstable, nonsensical parameter estimates. Including either a top dress or basal interaction term alone results in a similar interaction for each. Imposing the restriction described here, then, seems to be the best feasible method for allowing both yield responses to be a function of seed type.

1.4. ESTIMATION

1.4.1 Statistical considerations

The model described in section 1.3 has numerous characteristics that must be considered before estimation. Specifically, we must acknowledge, understand the potential impact of, and as best we can control for omitted variables, the structural endogeneity of input decisions, and unobservable heterogeneity.

Some of the most important determinants of yield are unobserved in our data (and most data used by social scientists), such as soil moisture and the pre-existing available nutrients in the soil (Griffiths, 2010; Eckert, 2010). Although we have already described how we will control for soil type and acidity and the total rainfall and stress periods, there are undoubtedly microvariations between fields with respect to soil content that will depend on past farming practices (fertilization, crop rotation, etc.).
There are several ways this omitted information has been dealt with in the literature on yield functions in developing countries, such as 1) use of trial rather than survey data (Traxler and Byerlee, 1993; Kauka et al, 1994; Rötter and van Keulen Mwato et al, 1999; Sakala et al, 2004), 2) assuming the unobserved effects are time-constant and controlling for them via the choice of estimator (Xu et al, 2009b), or 3) testing for the potential bias omission may cause and, if none is found, proceeding with interpretation of the model with known omitted variables (Guan et al, 2006).

Of course, one could also collect data on soil nutrient content, but doing so using survey data from a very large sample would be expensive and impractical. That said, two recent studies (Marenya and Barrett, 2009; Matsumoto and Yamano, 2009) were able to incorporate soil quality data (specifically, field level carbon content as a proxy for available nutrients) into fertilizer demand functions and found that farmers’ use of fertilizer is positively correlated with soil quality. This has important implications for estimation of yield models when using survey data. When an explanatory variable positively affects the dependent variable and is positively correlated with an omitted variable that also positively affects the dependent variable, we will have an upwardly biased estimate of the effect of the observed determinant (Wooldridge, 2002). In other words, if fertilizer is more likely to be used on fields with more productive soil, estimating our model with soil quality missing could make fertilizer seem more productive than it is.

Omitted variables are not the only potential source of bias in our model, because fertilizer use is endogenous in the structural model for input demand that is constrained by the yield function. That is, fertilizer demand is determined by maximizing the following [7]:

(1.5)  $\max_{F, x, l} \; \pi = p_q q - C$
       s.t.  $q = q(F, x, l)$
             $C = q \cdot c + p_F \cdot F + p_x \cdot x + p_l \cdot l + p_T \cdot T$

where q is quantity produced, $p_j$ are unit prices of inputs and outputs, T is the transfer cost of inputs and outputs, c is unit cost of production, and l is hectares of land (assume this is one-period rented land for simplicity). As before, F is fertilizer used and x is the vector of other inputs and factors of production as described above.

[7] One could argue that fertilizer demand is not the result of profit maximization and that farmers should rather be considered income and credit constrained utility maximizers, which would complicate derivation of the demand function from this conceptual framework. In this discussion, however, we are simply attempting to demonstrate the endogeneity of input selection in the yield function. Moreover, even if farmers are more appropriately considered utility maximizers, inputs are endogenous since a shock to yield would relax (or tighten) the income constraint.

Since q/l = yield, dividing both sides of the first constraint by l, we can see that profit maximization is subject to the yield function. In other words, this demonstrates that while the choice of fertilizer used will determine the yield outcome, it is the functional form of yield itself that partially determines the level of fertilizer used. Moreover, if we are to treat the production function as a stochastic process, as it clearly is in any weather-dependent agricultural system, we can add the disturbance term to the production constraint and see that input demand will be a function of shocks to yield, presenting a clear endogeneity problem. When this is ignored, results are known to be inconsistent and, once again, inputs have upwardly biased coefficients [8].

Finally, unobserved heterogeneity also exists in our model in the form of farmer ability and management skill.
As with the omission of soil nutrients, this will put upward bias on our estimate of yield response to fertilizer if it is positively correlated with fertilizer use (as one would expect). This too could be controlled for via proxy variables or through our choice of estimator, if we are willing to assume ability is time-constant.

[8] The claim of upward bias assumes that the correlation between input demand and any production shock is positive, and that the effect of using the input is positive.

1.4.2 Methods to account for omitted variables and structural endogeneity

We can allow for the possibility of a fundamental difference in the soil quality of fertilizer users by including an intercept shifter for those applying it as a proxy variable. This will also allow us to test for whether fertilizer is more likely to be used on depleted soils. If the coefficient estimate on the intercept shifter (the fertilizer use dummy variable) is negative and significant, it would suggest that the yields of fertilizer users would have been lower than those of non-users if fertilizer had not been applied. In other words, it would suggest that fertilizer is more likely to be used on depleted soils. By using this proxy indicator variable for soil quality we need not impose the assumption that it is time-constant, and we allow for the fact that fields may degrade (or improve) over time. In exchange for this flexibility, admittedly, a structurally endogenous explanatory variable has been added to the model, but we will control for this when we control for the endogeneity of application rates.

Other omitted variables we’ve discussed (e.g. farmer ability) can be treated as time-constant determinants (as in Guan et al, 2006; Xu et al, 2009b), and thus be controlled for using estimators available when we have panel data. These estimators, however, would be inconsistent due to the structural endogeneity that remains in our model.
We must, therefore, first deal with structural endogeneity. The endogeneity of inputs in production function estimation has been one of the key criticisms of production analyses since the seminal work of Cobb and Douglas in 1928. Traditionally this endogeneity has been dealt with in one of several ways. Most, despite wide acknowledgement of the problem, continue to simply ignore it implicitly, or do so explicitly with some attempt at justification (see Griliches and Mairesse (1995) for a summary of this and other criticisms).

Initially, attempts to control for endogeneity were made by claiming inputs were contemporaneously exogenous (or pre-determined) and the omitted endogeneity effect was time-constant and employing the fixed effects (FE) estimator (Hoch, 1955; Mundlak, 1961; Mundlak and Hoch, 1965). As Chamberlain (1982) pointed out, however, the necessary assumption for the consistency of the FE estimator is strict, not contemporaneous, exogeneity, so this method has since fallen from favor. More recently intermediate inputs have been used as proxy variables to control for the effect of shocks (Olley and Pakes, 1996; Levinsohn and Petrin, 2003). This is not a particularly attractive option in our case, since 1) no such input is readily available, and 2) a more direct approach is available.

Specifically, we can use instruments to control for this endogeneity via 2SLS [9]. Due to the interaction terms outlined in section 1.3, implementation will be slightly more complicated than standard 2SLS, but manageable.
To illustrate, recall the simplified version of our model in equation (1.3), adding the stochastic error term, u, and indexes for individuals, i, and time, t:

(1.6)  $y_{it} = \beta_0 + \beta_1 F_{it} + \beta_2 x_{it} + \gamma_1 F_{it}^2 + \gamma_2 x_{it}^2 + \gamma_3 F_{it} \cdot x_{it} + u_{it}$

The structural model in section 1.4.1 (equation (1.5)) demonstrates that a very attractive instrument for the endogenous F would be exogenous determinants of fertilizer demand, such as the price of fertilizer, $p_F$, if we are willing to assume $E(u \mid p_F) = 0$ (an arguably weak assumption). That said, if F is endogenous, then so are $F^2$ and $F \cdot x$, but we can still estimate the model consistently using only $p_F$. First, we regress F on the instruments $(1, x, p_F)$ to obtain the predicted value $\hat{F}$. Since $E(u \mid p_F) = 0$, any function of $\hat{F}$ is also exogenous in equation (1.6), so we can estimate via standard 2SLS allowing F, $F^2$ and $F \cdot x$ to be endogenous using $\hat{F}$, $\hat{F}^2$ and $\hat{F} \cdot x$ as instruments (Wooldridge, 2002). This method can be extended to multiple endogenous explanatory variables, provided that we have a unique exogenous variable for each that can be used to generate fitted values to use as instruments [10].

[9] Control function estimation is an alternative to 2SLS that is generally preferred when estimating non-linear models, but is difficult to implement with multiple endogenous explanatory variables. Since our model is linear in parameters and we will be treating multiple endogenous explanatory variables, 2SLS is preferred.

Equation (1.5) demonstrates that, in fact, all choice explanatory variables are endogenous in equation (1.6), not just F. Allowing all choice inputs to be endogenous, however, is not always going to be feasible due either to a lack of exogenous instruments or the loss of efficiency that may come when a large number of variables are treated as endogenous. In our model we will allow all fertilizer variables and their interactions to be endogenous, as well as all seed variables.
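The mechanics of using first-stage fitted values as instruments for F, its square and its interaction with x can be sketched in plain Python. This is an illustration under stated assumptions, not the estimation code used in this study: the first stage is ordinary least squares, the system is exactly identified (so the IV formula coincides with 2SLS), and every name and number below (`iv_fitted_instruments`, `TRUE_BETA`, the simulated data) is hypothetical. The simulated outcome is deliberately noise-free, so the estimator reproduces the assumed coefficients up to rounding; the example shows how the instrument set is built rather than how the estimator performs statistically.

```python
import random


def solve(a, b):
    """Solve the square system a * z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def cross(left, right):
    """Return left'right for two row-major data matrices with equal row counts."""
    return [[sum(lrow[i] * rrow[j] for lrow, rrow in zip(left, right))
             for j in range(len(right[0]))] for i in range(len(left[0]))]


def cross_vec(left, y):
    """Return left'y for a row-major data matrix and an outcome list."""
    return [sum(lrow[i] * yi for lrow, yi in zip(left, y)) for i in range(len(left[0]))]


def iv_fitted_instruments(F, x, p_F, y):
    """Estimate the quadratic yield model using fitted-value instruments."""
    # First stage (assumed OLS): regress F on (1, x, p_F) and keep fitted values.
    W = [[1.0, xi, pi] for xi, pi in zip(x, p_F)]
    delta = solve(cross(W, W), cross_vec(W, F))
    F_hat = [sum(d * w for d, w in zip(delta, row)) for row in W]
    # Regressors use actual F; instruments replace F by F_hat in every term.
    X = [[1.0, f, xi, f * f, xi * xi, f * xi] for f, xi in zip(F, x)]
    Z = [[1.0, fh, xi, fh * fh, xi * xi, fh * xi] for fh, xi in zip(F_hat, x)]
    # Exactly identified IV estimator: (Z'X)^{-1} Z'y.
    return solve(cross(Z, X), cross_vec(Z, y))


# Hypothetical simulated data; coefficient order is (1, F, x, F^2, x^2, F*x).
random.seed(0)
TRUE_BETA = [1.0, 0.8, 0.5, -0.1, -0.05, 0.3]
n_obs = 200
x = [random.uniform(0, 1) for _ in range(n_obs)]
p_F = [random.uniform(1, 2) for _ in range(n_obs)]
F = [2.0 - 0.8 * p + 0.5 * xi + random.gauss(0, 0.2) for p, xi in zip(p_F, x)]
y = [sum(b * v for b, v in zip(TRUE_BETA, [1.0, f, xi, f * f, xi * xi, f * xi]))
     for f, xi in zip(F, x)]
beta_hat = iv_fitted_instruments(F, x, p_F, y)
```

With several endogenous inputs, the same construction would be repeated for each one, with all exogenous variables included in every first-stage regression.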
We will use the prices of fertilizer, distance to the nearest town, distance to the nearest fertilizer retailer, and the education level of the household head as our exogenous instruments. To gain efficiency we will exploit the corner solution nature of fertilizer application rates and obtain the fitted values to be used in 2SLS by fitting Cragg’s (1971) double-hurdle model. Fitted values for the binary indicators of fertilizer and improved seed use will be obtained by fitting the OLS linear probability model. Seed application rate will be fitted via OLS.

[10] Note that all exogenous variables should be included in every regression estimated to obtain the fitted values that will be transformed into multiple instruments.

Finally, acknowledging time-constant unobserved heterogeneity, $c_i$, we can re-write $u_{it} = c_i + \varepsilon_{it}$, so that the simplified representation of our model becomes:

(1.7)  $y_{it} = \beta_0 + \beta_1 F_{it} + \beta_2 x_{it} + \gamma_1 F_{it}^2 + \gamma_2 x_{it}^2 + \gamma_3 F_{it} \cdot x_{it} + c_i + \varepsilon_{it}$

Following Mundlak (1978) and Chamberlain (1982, 1984), we will employ the Correlated Random Effects (CRE) estimator, which controls for $c_i$ by modeling its correlation with time-variant regressors as a function of the temporal means for each observation and estimating the reduced form. See Wooldridge (2010) for a thorough description of this estimator and its relative costs and benefits compared to other options such as FE, Random Effects (RE) or First Difference (FD) estimators. In our case this selection is motivated by the need to relax the assumption that $c_i$ is uncorrelated with other regressors (making CRE preferable to the RE estimator) and the fact that we wish to observe and interpret the effects of time-constant regressors (making CRE preferable to the FE and FD estimators). We will estimate via Pooled CRE (PCRE), which requires less strict exogeneity assumptions for consistency.
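In practice, the CRE device described above amounts to appending, for each panel unit, the time average of every time-varying regressor before running the pooled regression. A minimal sketch of that data step follows; the helper name `add_time_averages`, the `_bar` suffix, the variable names and the toy panel are all hypothetical, and averaging over every observation of a household (rather than, say, over household-year means) is an assumption of this sketch.

```python
from collections import defaultdict


def add_time_averages(rows, unit_key, variables):
    """Append the unit-level time average of each variable to every row.

    rows: list of dicts, one per observation (e.g. a field in a survey year).
    unit_key: name of the key identifying the panel unit (e.g. household).
    variables: names of the time-varying regressors to average.
    """
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in rows:
        counts[row[unit_key]] += 1
        for var in variables:
            totals[row[unit_key]][var] += row[var]
    augmented = []
    for row in rows:
        unit = row[unit_key]
        new_row = dict(row)  # copy so the input rows are left untouched
        for var in variables:
            new_row[var + "_bar"] = totals[unit][var] / counts[unit]
        augmented.append(new_row)
    return augmented


# Hypothetical two-wave panel with application rates in kg/ha.
panel = [
    {"hh": 1, "year": 2004, "basal": 0.0, "top": 0.0},
    {"hh": 1, "year": 2008, "basal": 100.0, "top": 50.0},
    {"hh": 2, "year": 2004, "basal": 200.0, "top": 200.0},
    {"hh": 2, "year": 2008, "basal": 100.0, "top": 150.0},
]
augmented_panel = add_time_averages(panel, "hh", ["basal", "top"])
```

The `_bar` columns are constant within a household, so in the pooled regression they absorb the between-household correlation with $c_i$ while the original columns identify the within-household effect.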
One drawback is that the CRE estimator adds the time averaged values of endogenous regressors to the model, which must also be treated as endogenous, but time averages of fitted values and their interactions can simply be added as instruments to accommodate them.

All together, our approach for estimating conditional yield response to fertilizer is to first specify a model where the partial effect of fertilizer application is a function of other factors based on the knowledge in agronomic literature. Second, using demand determinants as instruments, conduct tests for the endogeneity of key inputs (or all inputs chosen by the farmer, if possible). We will use the Hausman joint endogeneity test described in Wooldridge (2003, page 507). If endogeneity is not empirically significant it can be ignored and estimation can proceed via standard methods to control for unobserved heterogeneity. If it is significant, the third step is to compute fitted values for each endogenous variable and the product of fitted values and the variables with which endogenous variables are interacted. Fourth, we compute the time average of all time-variant explanatory variables and fitted values. Finally, we add the time averaged values to the model and estimate via pooled IV using fitted values as instruments. We will call this the Instrumented Pooled Correlated Random Effects (IPCRE) estimator. In order to examine the costs of ignoring the statistical considerations outlined in section 1.4.1, we will report results from OLS, IV and PCRE estimators alongside the IPCRE results.

For this study we will make one adjustment to the strict application of the PCRE and IPCRE estimators. Recall that one of our explanatory variables is the interaction of the basal fertilizer application rate and an indicator variable for whether pH is greater than 5.5. As will be shown below, there are relatively few observations in this pH range, and very few in this range who use basal fertilizer.
The PCRE and IPCRE estimators effectively double the number of parameters one must estimate when we add the time averages of each regressor, meaning many observations may be needed to estimate parameters with any degree of efficiency. Given the relatively small number of basal fertilizer users in the high pH range, we will ease the burden on the estimator by omitting the time-averaged interaction term from our estimation. This is tantamount to assuming that, conditional on having controlled for the structural endogeneity of fertilizer use, the correlation between unobserved heterogeneity and fertilizer use is no different for farmers in the high pH range than it is for farmers in the low pH range. [11]

[11] If we relax this assumption, the only meaningful difference in results is that our estimate of the interaction term's effect is greater and more significant between households (the coefficient on the time-averaged regressor) than within a household (the coefficient on the time-variant regressor).

1.5. DATA

This study will be carried out using longitudinal household data from two surveys conducted in 2004 and 2008 that were supplements to the 1999/2000 post harvest survey conducted by the Zambian Central Statistics Office (CSO). Data from the 1999/2000 survey cannot be employed (i.e. we will have a 2-wave panel, rather than 3 waves) because key variables such as seed variety are missing from the earlier data. There are 4,221 field level observations for maize yield from 2004 and 4,431 from 2008. Many farmers have more than one field but, because field numbers were not consistently assigned, we can only claim our panel is balanced at the household (not field) level. Where farmers had more than one field under maize cultivation in a given year, data will be stacked so that each household may potentially have more than one observation per year (note that 69% of the fields in our sample are held by the 84% of households that planted only 1 maize field, Table 1.1).
Of the 8,652 fields with available data, 683 were held by farmers who only planted maize in one of the two survey years. These fields are omitted to maintain the balanced panel. To control for the potential selection bias these omissions may introduce, we will include an inverse Mills ratio (IMR) as a regressor (Heckman, 1976). The IMR will be computed based on a probit regression for the likelihood of growing maize in both years with the gender of the household head (male=1), the farm size and the number of adult equivalents as explanatory variables, along with the yield determinants. Each of these additional regressors is positively and significantly related to continuous maize production. The fact that we are using household data presents numerous statistical issues that have already been discussed, but it also introduces two types of data challenges: 1) cases where valid observations contribute to misleading results and 2) suspected cases of measurement error. In the first case, for example, a portion of the sample will experience either full or partial crop loss due to flood or drought. This will lower our estimated response to fertilizer application, which, in the discussion of the determinants of variations in yield response, would be misleading. When crop loss occurs early in the rainy season, farmers frequently "re-plant" all or a portion of their fields. This additional quantity of seed applied will be reflected in our data, but the fact that the field has been re-seeded will not. This is another example of how valid responses could cause misleading results. The second data challenge is measurement error. For example, to estimate our model we standardize many input and harvest figures at the per hectare level. This is important for our econometric analysis, but a side effect is that any measurement errors on very small fields become amplified. To address both of these challenges, we put the data through a series of filters.
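For readers less familiar with the selection correction, the inverse Mills ratio for a retained observation is the standard normal density divided by the standard normal distribution function, both evaluated at the fitted probit index. The sketch below shows only that final calculation. The probit itself is not estimated here, and the index values passed in are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills_ratio(probit_index):
    """IMR for an observation that is selected into the sample."""
    return normal_pdf(probit_index) / normal_cdf(probit_index)


# Hypothetical fitted probit indices for three households.
imr_values = [inverse_mills_ratio(z) for z in (-1.0, 0.0, 1.0)]
```

The ratio falls as the probit index rises, so households that were very likely to grow maize in both years receive a correction term close to zero.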
First, we omit 356 fields that were farmed by households where we observe more seed planted than maize harvested, likely reflecting crop loss or abandonment and high levels of re-seeding. There are several observations with yields higher than seed rates that nonetheless report extremely high seed applications. Once again these are likely fields that were re-seeded (or measurement error), so we omit 194 fields that were in the top 1% of the distribution of seed application rate (greater than 138.3 kg of seed per hectare). In order to omit cases where crop loss wasn't reflected in the seed data, we also omit the 131 fields whose yield is in the bottom 1% of the remaining yield distribution as "failed" crops (harvesting less than 115 kg/ha). To address the problems stemming from small field measurement error, we omit 153 of the fields that were in the lowest 1% of the remaining distribution of field size (fields smaller than 0.1 hectares). Finally, 4 observations with missing data were omitted, leaving us with a panel, balanced at the household level, of 7,127 maize fields (3,448 from 2004 and 3,679 from 2008). All of this said, while considering fields where the crop was lost could be misleading in the yield response variance discussion, to ignore those fields in the profitability discussion could be equally misleading since farmers consider the risk of crop failure when making investment choices. So, after discussing the profitability of fertilizer use given a successful harvest, we will briefly discuss the expected profitability.

Each observation has been assigned an analytical weight by the CSO to make the sample nationally representative at the time it was selected. These weights were developed following the recommendations described in detail in Megill (2004), and are computed using population distribution data from the 2000 census.
In short, these are the product of the inverse probabilities of being selected into the sample at each stage of stratification (within each district, the selection stages in this study are the census supervisory area (CSA), the standard enumeration area (SEA) and the household levels) (Megill, 2004). We will also use the inverse probability of re-interview weight (IPW) to correct for potential attrition bias. The IPW used was developed by Mather et al (2011). Results un-weighted for attrition bias are reported in Appendix B, which demonstrates that our results are robust.

The variables used in our regression analysis are described in Tables 1.2 and 1.3. Table 1.2 shows the percentile distribution and mean of the dependent variable, maize yield measured in kilograms per hectare (kg/ha), as well as the continuous explanatory variables, which include fertilization and seed rates (kg/ha), the total growing season (November to March) rainfall (mm) and the number of 20-day periods during which less than 40 mm of rain fell, or "stress periods". Among other things, Table 1.2 highlights the fact that the distribution of yields is fairly skewed amongst Zambian smallholders.

Table 1.2: Distribution of smallholder maize yield and yield determinants in Zambia (2004, 2008)

| Yield and determinants | 5th percentile | 25th percentile | 50th percentile | 75th percentile | 95th percentile | Mean |
|---|---|---|---|---|---|---|
| Maize yield (kg/ha) | 268 | 710 | 1,241 | 2,070 | 4,020 | 1,562.2 |
| Basal application rate (kg/ha) | 0 | 0 | 0 | 83 | 200 | 51.9 |
| Top dressing application rate (kg/ha) | 0 | 0 | 0 | 100 | 225 | 53.8 |
| Seed application rate (kg/ha) | 10 | 19 | 23 | 35 | 57 | 28.3 |
| Growing season rainfall (mm) (a) | 698 | 838 | 938 | 1,053 | 1,335 | 971.1 |
| Number of rainfall stress periods (b) | 0 | 1 | 2 | 2 | 4 | 1.8 |

Source: Supplemental Surveys to the 1999/2000 Post Harvest Survey: 2004, 2008
Notes: a- "Growing season" is November to March. b- "Rainfall stress" is the number of 20-day periods during the growing season with less than 40 mm total rainfall.
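The weighting scheme described above can be sketched as follows. This is a minimal illustration, assuming that the attrition correction simply multiplies the sampling weight by the inverse of the re-interview probability. That combination rule, the function names and the probabilities used are assumptions for illustration, not details reported by the CSO or Mather et al (2011).

```python
def sampling_weight(stage_probabilities):
    """Product of inverse selection probabilities across sampling stages."""
    weight = 1.0
    for probability in stage_probabilities:
        if not 0.0 < probability <= 1.0:
            raise ValueError("selection probabilities must lie in (0, 1]")
        weight *= 1.0 / probability
    return weight


def attrition_adjusted_weight(stage_probabilities, reinterview_probability):
    """Sampling weight scaled by the inverse probability of re-interview."""
    if not 0.0 < reinterview_probability <= 1.0:
        raise ValueError("re-interview probability must lie in (0, 1]")
    return sampling_weight(stage_probabilities) / reinterview_probability


# Hypothetical selection probabilities at the CSA, SEA and household stages.
base = sampling_weight([0.5, 0.25, 0.1])
adjusted = attrition_adjusted_weight([0.5, 0.25, 0.1], 0.5)
```

With these hypothetical inputs each sampled household stands in for 80 households before the attrition adjustment and 160 after it.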
In 2004 and 2008 the mean yield was over 1.5 metric tons (mt) per ha, but note that the majority of fields harvested less than 1.3 mt/ha, and 5% harvested more than 4 mt/ha. Table 1.2 also illustrates that fewer than half of Zambian smallholders used fertilizer during the period of our study, with the average field (including nonfertilized fields) having roughly 106 kg/ha of basal and top dressing fertilizers applied combined. The seed application rate on the average field is 28 kg/ha, which is 40% higher than the recommended rate of 20 kg/ha (ZARI, 2002). The median seed rate is closer to the recommendation (23 kg/ha), but only 25% of the fields are seeded in the 19-23 kg/ha range. In fact, according to ZARI (2002) and the results in Table 1.2, the majority of fields are too densely seeded and a fourth of all fields are planted with at least 75% more seed than recommended.

Table 1.3 shows the distribution of the binary determinants of yield that will be used in our regression analysis, both in terms of the share of fields and the share of total hectares planted, by year.

Table 1.3: Distribution of binary yield determinants in Zambia (2004, 2008)

| Yield determinants | Share of fields, 2003/04 | Share of fields, 2007/08 | Share of hectares, 2003/04 | Share of hectares, 2007/08 |
|---|---|---|---|---|
| Animal or green manure applied | .10 | .12 | .12 | .12 |
| Planted with nitrogen fixer (a) | .05 | .03 | .03 | .02 |
| Planted with non-N-fixing crop (b) | .02 | .01 | .02 | .00 |
| Tillage done before the rains | .38 | .34 | .36 | .30 |
| Tillage done using traditional hand hoe | .49 | .30 | .40 | .23 |
| Tillage done using planting basins | .02 | .01 | .02 | .01 |
| Zero tillage used to prepare field | .02 | .03 | .02 | .03 |
| Tillage done using plow | .34 | .41 | .44 | .52 |
| Tillage done using ripping | .01 | .01 | .01 | .01 |
| Tillage done using ridging | .12 | .23 | .11 | .21 |
| Weeded once | .40 | .41 | .37 | .40 |
| Weeded twice | .50 | .50 | .53 | .52 |
| Weeded three or more times | .09 | .09 | .10 | .08 |
| pH below 4.4 | .52 | .50 | .46 | .46 |
| pH between 4.4 and 5.5 | .46 | .48 | .53 | .53 |
| pH between 5.5 and 7.1 | .02 | .02 | .01 | .01 |
| Acrisol soil | .38 | .37 | .35 | .35 |
| Other clayish soils (c) | .19 | .18 | .24 | .23 |
| Sandy/undeveloped soils (d) | .24 | .25 | .23 | .23 |
| Developed/organic soils (e) | .05 | .05 | .05 | .06 |
| Other soils | .14 | .15 | .13 | .13 |
| Fertilizer applied | .36 | .38 | .45 | .50 |
| Planted with hybrid or open pollinated seed variety | .40 | .45 | .48 | .58 |

Source: Supplemental Surveys to the 1999/2000 Post Harvest Survey: 2004, 2008
Notes: a- Nitrogen fixing crops are groundnuts, soybeans, cowpeas and various beans. b- Any crop other than those listed as nitrogen fixing is planted in the field. c- "Clayish" soil types include Alisols, Ferralsols, Lixisols and Vertisols. d- "Sandy/undeveloped" soil types include Arenosols, Leptosols, Podzols and Regosols. e- "Developed/organic" soil types include Cambisols, Gleysols, Histosols and Luvisols.

A determinant is more commonly found on smaller fields when its share of fields is greater than its share of hectares. For example, we are equally likely to find conservation tillage methods such as ripping, planting basins or zero tillage on any field size, but these three methods apply to only 5% of all maize fields combined in both 2004 and 2008. Hand hoeing, which is more common on smaller plots, became a less common practice between our survey periods. By 2008 some 30% of all fields (down from 49%), accounting for 23% of the total area planted with maize (down from 40%), were tilled by hand hoe. The most common tillage method in 2008 (and more common on large fields) was plowing, which was the chosen method on 41% of all fields (accounting for 52% of the total maize hectares planted). We can also note from Table 1.3 that, while fertilizer (either basal or top dressing) was applied on only about 37% of Zambian maize fields in 2004 and 2008, these fields account for nearly half of the total area planted over the same period, meaning fertilizer use was more common on larger fields.
Finally, note that only 2% of the fields in Zambia (accounting for just 1% of the area under maize cultivation) are on semi-neutral soils (pH in the range of 5.5 to 7.1), while roughly 47% (or 53% of total area) are on acidic soils (pH in the range of 4.4 to 5.5) and 51% (or 46% of total area) are on very acidic soil (pH below 4.4). To understand how these factors affect yield and yield responsiveness to fertilizer, we now turn to regression results.

1.6. RESULTS AND INTERPRETATION

Selected regression results are reported in Table 1.4. [12] Column (i) reports results from OLS estimation in order to demonstrate the difference, if any, when we ignore potential endogeneity and unobserved effects. Column (ii) reports results of IV estimation, ignoring unobserved heterogeneity. When obtaining the fitted values to be used in computing instruments, the exogenous price, distance and education variables were jointly significant at the 0.01% level or lower in each of the 5 models for basal application rate, top dressing application rate, the fertilizer use dummy variable, the improved seed use variable and the seed application rate, which supports their selection as instruments. The Hausman joint test rejects exogeneity (p-value = 0.00). After allowing for the endogeneity of these inputs, the Hausman test for the significance of unobserved heterogeneity rejects no effect (p-value = 0.00). Results estimated via PCRE (ignoring endogeneity) are reported in column (iii), and IPCRE results are reported in column (iv). [13]

We will first discuss the effects of inputs other than fertilizer and the robustness of our results to the choice of estimator. We will then evaluate the cost of ignoring endogeneity and unobserved effects, focusing on the average partial effects (APE) of fertilizer use from each estimator. We will then discuss the variation of the APEs and their determinants and the profitability of fertilizer use.

[12] Coefficients on the provincial level net weather effect control variables (i.e. provincial dummies interacted with time trends) and the time-averaged components of the CRE estimators are not reported here. Full results are available upon request.

[13] An alternative method of combining an IV and unobserved effects estimator that allows for time-constant variables would be to analyze estimation results from the regression based Hausman test for whether Fixed Effects IV (FEIV) is preferable to Random Effects IV. This yields the FEIV estimator on time variant regressors as well as an estimate for "between household" effects of time constant regressors. These results are reported in Appendix C alongside our IPCRE results to demonstrate robustness. Notably, the IPCRE estimator is able to explain slightly more variation in the data.

Table 1.4: Selected results from yield response estimates

| Explanatory variables | OLS (i) | IV (ii) | PCRE (iii) | IPCRE (iv) |
|---|---|---|---|---|
| Fertilizer use | | | | |
| Fertilizer user (1=yes) (y) | -332.556*** (0.00) | -329.3706*** (0.00) | -337.7956*** (0.00) | -345.4989*** (0.00) |
| Basal application rate (kg/ha) (y) | 2.209*** (0.01) | 2.1215** (0.01) | 2.5520*** (0.01) | 2.5625*** (0.01) |
| Basal rate squared (y) | -0.005*** (0.01) | -0.0046** (0.01) | -0.0041** (0.02) | -0.0039** (0.03) |
| Top dress application (kg/ha) (y) | 6.053*** (0.00) | 6.0623*** (0.00) | 5.4690*** (0.00) | 5.4721*** (0.00) |
| Top dress rate squared (y) | -0.003*** (0.00) | -0.0028*** (0.00) | -0.0020*** (0.00) | -0.0019*** (0.00) |
| Other inputs and tillage | | | | |
| Improved seed use (1=yes) (y) | 221.958*** (0.00) | 217.4828*** (0.00) | 181.2930*** (0.00) | 183.1064*** (0.00) |
| Seed application rate (kg/ha) (y) | 20.189*** (0.00) | 19.8225*** (0.00) | 21.4929*** (0.00) | 20.7972*** (0.00) |
| Seed rate squared (y) | -0.117*** (0.00) | -0.1125*** (0.00) | -0.1135*** (0.00) | -0.1062*** (0.00) |
| Planted with an N fixer (1=yes) | 183.065** (0.05) | 173.9976* (0.06) | 382.9672*** (0.00) | 377.0041*** (0.00) |
| Other mixed crop (1=yes) | -24.788 (0.85) | -28.3596 (0.83) | 192.0790 (0.15) | 190.6695 (0.15) |
| Applied plant or animal manure (1=yes) | 39.047 (0.44) | 39.5826 (0.43) | 21.2478 (0.74) | 23.1003 (0.72) |
| Planting before the rains (1=yes) | -10.635 (0.76) | -11.6556 (0.74) | 21.3072 (0.65) | 20.8432 (0.66) |
| Planting basins (1=yes) | 44.576 (0.69) | 42.5153 (0.71) | 140.8068 (0.26) | 141.8314 (0.26) |
| Zero tillage (1=yes) | 145.710* (0.06) | 142.1479* (0.07) | 56.5342 (0.63) | 51.7677 (0.66) |
| Plowing (1=yes) | 234.180*** (0.00) | 228.3830*** (0.00) | 178.9874** (0.02) | 171.7717** (0.03) |
| Ripping (1=yes) | 2.921 (0.99) | 9.7704 (0.95) | 183.0193 (0.43) | 190.4936 (0.41) |
| Ridging (1=yes) | 44.719 (0.36) | 44.2043 (0.36) | 127.7026** (0.04) | 126.7231** (0.04) |
| Bunding (1=yes) | -249.508 (0.16) | -243.3199 (0.16) | -77.0254 (0.74) | -72.2638 (0.76) |
| Weed once (1=yes) | 330.174** (0.03) | 320.2510** (0.04) | 350.5080* (0.06) | 341.8989* (0.06) |
| Weed twice (1=yes) | 368.419** (0.02) | 357.8286** (0.02) | 389.1368** (0.04) | 380.0669** (0.04) |
| Weed three times (1=yes) | 397.149** (0.02) | 383.7406** (0.02) | 447.4851** (0.02) | 438.4622** (0.03) |
| 4.4 ≤ pH < 5.5 (1=yes) | 166.993*** (0.00) | 163.2562*** (0.00) | 177.6978*** (0.00) | 177.7879*** (0.00) |
| 5.5 ≤ pH ≤ 7.1 (1=yes) | 413.063*** (0.00) | 400.1734*** (0.00) | 430.6062*** (0.00) | 419.5940*** (0.00) |
| Clayish soils (excl. Acrisols) | -241.507*** (0.00) | -243.2450*** (0.00) | -240.5323*** (0.00) | -242.3512*** (0.00) |
| Sandy/undeveloped soils | -164.7682** (0.02) | -157.4276** (0.03) | -163.3278** (0.03) | -156.3618** (0.04) |
| Developed/organic soils | 30.9856 (0.78) | 21.6836 (0.84) | 17.6884 (0.87) | 4.9952 (0.96) |
| Other soils | -227.991*** (0.00) | -211.0373*** (0.00) | -227.1971*** (0.00) | -208.5235*** (0.00) |
| Weather | | | | |
| Rainfall (mm) | 0.2925 (0.51) | 0.2829 (0.52) | -0.2482 (0.63) | -0.2604 (0.62) |
| Rainfall squared | -0.0002 (0.33) | -0.0002 (0.33) | 0.0000 (0.87) | 0.0000 (0.86) |
| Stress periods (<20mm/dekad) | -24.3191 (0.16) | -25.3915 (0.15) | -2.5405 (0.93) | -3.0743 (0.92) |
| Provincial weather effect | Yes | Yes | Yes | Yes |
| Fertilizer interactions | | | | |
| Basal rate*1[pH ∈ [4.4,5.5)] (y) | 1.0425 (0.12) | 1.2032* (0.07) | 1.2337 (0.15) | 1.4766* (0.08) |
| Basal rate*1[pH ∈ [5.5,7.1]] (y) | 4.5741*** (0.00) | 4.5844*** (0.00) | 4.9442*** (0.00) | 4.9971*** (0.00) |
| Basal rate*top dress rate (y) | 0.0003 (0.87) | 0.0001 (0.94) | -0.0013 (0.39) | -0.0015 (0.32) |
| Top dress rate*clay soil (y) | -0.7777 (0.23) | -0.8125 (0.22) | -0.8002 (0.24) | -0.8527 (0.22) |
| Top dress rate*sandy soil (y) | -1.8421** (0.03) | -2.0587** (0.02) | -1.8564** (0.04) | -2.0764** (0.02) |
| Top dress rate*rich soil (y) | 0.2898 (0.82) | 0.5131 (0.68) | 0.3019 (0.81) | 0.5431 (0.67) |
| Top dress rate*other soil (y) | -0.4695 (0.61) | -0.8788 (0.34) | -0.4048 (0.66) | -0.8496 (0.36) |
| Top dress rate*plow tillage (y) | -1.1272** (0.02) | -1.0830** (0.03) | -0.8280 (0.24) | -0.7364 (0.29) |
| Fertilization rate*improved seed use (y) | 0.6049** (0.02) | 0.5680** (0.04) | 0.4164 (0.20) | 0.3451 (0.31) |
| Inverse Mills ratio for growing maize both years | 84.8048 (0.80) | 33.6675 (0.92) | 109.7845 (0.74) | 72.9007 (0.83) |
| Constant | 303.2773 (0.35) | 339.3642 (0.30) | -159.8328 (0.83) | -97.9246 (0.89) |
| Observations | 7127 | 7127 | 7127 | 7127 |
| R-squared | 0.260 | 0.259 | 0.265 | 0.265 |

Robust p-values in parentheses. Notes: (y) treated as endogenous in IV regressions. *, **, *** significant at the 10%, 5% and 1% levels.

1.6.1 Determinants of yield (excluding fertilizer)

Use of improved seed does have a positive independent effect on yields, though the estimate of its effect declines by roughly 15% when we control for the time-constant heterogeneity. All else equal, fields planted with hybrid or OPV seeds will yield an additional 194 kg of maize per hectare, on average. Seed application rate (and the diminishing returns thereof) is significant at the 1% level, with this result very robust to estimator selection. The current price of hybrid seed in Zambia ranges as high as 11,000 ZMK/kg and, for example, in Choma (one of Zambia's most productive districts) the real (2010) price of maize during the 2004 and 2008 harvest and marketing months was 782 ZMK/kg.
At these prices and ignoring transportation costs, the IPCRE results suggest the net revenue maximizing seed application rate is 32 kg/ha (p-value = 0.00). This is greater than the recommended application rate of 20 kg/ha (ZARI, 2002), but very much in line with our observations. Sensitivity analysis of the net revenue maximizing seed rate with respect to prices and transportation costs is reported in Appendix D and indicates that the rate will exceed the ZARI recommendation under most conditions, which may explain the distribution of seed rates observed in Table 1.2. Maize that shares its field with a nitrogen fixing plant produces significantly more per hectare than mono-cropped maize. IPCRE results suggest this alone could increase yields by nearly 368 kg/ha. This result is highly statistically significant regardless of estimator choice, but notice that the benefits of intercropping with an N-fixing plant are estimated to be twice as great when unobservable time-constant effects are controlled for. Soil acidity is highly significant and confirms that plants in less acidic soils will produce greater yields. All else equal, yield on fields where pH is in the 4.4 to 5.5 range will be 191 kg/ha greater than yield on more acidic fields, while yield on semi-neutral fields (pH between 5.5 and 7.1) will be 384 kg/ha greater. Compared to the mean yield on the very acidic (pH less than 4.4) fields, this represents a 13% to 26% difference in yield. Notice also that this result is very robust to choice of estimator. Yields tend to be worst on clayish soils (excluding Acrisols) and our general "other" soil types. Yields on undeveloped or sandy soils are also worse than those on Acrisols (the omitted soil type), while those on richer soils with more organic material are greater, although the latter difference is not statistically significant. Weeding is a significant yield improving practice, unsurprisingly.
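The net revenue maximizing seed rate discussed earlier in this subsection can be reproduced with a short calculation. The sketch assumes that seed enters the yield function only through the linear and squared seed rate terms, so that net revenue per hectare is the maize price times (b1*s + b2*s^2) minus the seed price times s, which is maximized where b1 + 2*b2*s equals the seed-to-maize price ratio. The coefficients are the IPCRE estimates from Table 1.4 and the prices are those quoted in the text. The function name is illustrative.

```python
def revenue_maximizing_seed_rate(b_linear, b_squared, seed_price, maize_price):
    """Seed rate (kg/ha) at which marginal revenue equals the seed price.

    Solves maize_price * (b_linear + 2 * b_squared * s) = seed_price for s.
    Requires diminishing returns (b_squared < 0) for an interior maximum.
    """
    if b_squared >= 0:
        raise ValueError("an interior maximum requires b_squared < 0")
    return (seed_price / maize_price - b_linear) / (2.0 * b_squared)


# IPCRE coefficients on seed rate and seed rate squared (Table 1.4),
# hybrid seed at 11,000 ZMK/kg and maize at 782 ZMK/kg.
optimal_rate = revenue_maximizing_seed_rate(20.7972, -0.1062, 11000.0, 782.0)
```

Under these assumptions the calculation gives a rate of about 31.7 kg/ha, which rounds to the 32 kg/ha reported above.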
It is somewhat surprising that all three weeding variables are significant, considering the fact that 99.6% of the fields are weeded at least once (i.e., we might have expected one of these to have been dropped due to near perfect collinearity with the intercept term). Most likely, the 0.4% of fields that were not weeded represent those which were partially abandoned. According to our IPCRE results, yields on fertilized fields would have been roughly 344 kg/ha lower than those on unfertilized fields had fertilizers not been used. This result is consistent with anecdotal evidence from Zambia that more productive land is desired by smallholders precisely because it doesn't "need" fertilization, and that fertilizer is necessary only after soils have been extensively farmed for a number of years. [14] Economically, it stands to reason that demand for fertilizer would be greater on depleted soils, where the average and marginal product of application would be greater (assuming there are diminishing returns to N and P availability, which our results confirm). Notably, when we omit the intercept shifting dummy variable for fertilization, the APE estimates for both basal and top dressing are 24% lower than the IPCRE estimates.

[14] Anecdotal evidence comes from farmer interviews conducted for the analysis in Sitko (2010), which were shared by the author.

1.6.2 The cost of estimating via OLS

The APEs of basal and top dressing applications for each of these estimators are reported in Table 1.5. In short, the results are mixed. When we control for the endogeneity of seed and fertilizer as well as the unobserved household heterogeneity (Table 1.5, column iv), our estimate for the APE of top dressing is 14% lower than that estimated via OLS (column i). The effect of basal application, on the other hand, is 20% greater when we control for unobserved heterogeneity and endogeneity. This may suggest that better farmers with better knowledge are
more likely to use top dressing because it is more effective, leading to differences of opposite sign for each type of fertilizer between IPCRE and OLS estimation.

Table 1.5: Average partial effects of fertilizer use from various conditional response model estimates (kg of maize output per kg of fertilizer input)

| Average partial effect | OLS (i) | IV (ii) | PCRE (iii) | IPCRE (iv) |
|---|---|---|---|---|
| Basal dressing (y) | 2.519*** (0.00) | 2.520*** (0.00) | 2.875*** (0.00) | 2.983*** (0.00) |
| Top dressing (y) | 4.942*** (0.00) | 4.846*** (0.00) | 4.402*** (0.00) | 4.296*** (0.00) |

Robust p-values in parentheses. Notes: (y) treated as endogenous in IV regressions. *, **, *** significant at the 10%, 5% and 1% levels.

1.6.3 Yield response to fertilizer and profitability

The effect of basal fertilizer is relatively small in magnitude compared to top dressing (Table 1.5). This may partially be explained by the slow absorption rate of P, the primary chemical in basal fertilizer, discussed in Section 1.3. This will be addressed further in the following subsection. The difference in yield response might also be explained by the factors determining the variation in yield response to basal application, which our results show to be significantly conditional on soil acidity. Figure 1.3 plots a Lowess regression (with a 0.3 bandwidth) and a scatter plot of our estimates of the incremental yield gains from basal fertilizer application on the vertical axis against soil pH levels on the horizontal axis. Notice that in the pH range below 4.4 yield response to basal application is minimal (2.14 incremental kg of maize per kg of fertilizer, Table 1.6).
We see slightly higher response to basal application over the pH range from 4.4 to 5.5, at which levels the average effect is 3.74 kg/kg (Table 1.6). Above the 5.5 pH level yield response increases considerably, more than doubling, on average, to 7.55 kg/kg. Admittedly the latter is a thin estimate (recall that only 5% of these observations use fertilizer), but this result is certainly within reason by international standards. For example, field trials in Kenya conducted by Smaling et al (1992, Table 3) showed, on average, 45 additional kg of maize per kg of P applied, with outcomes ranging as high as 127.3 kg/kg on one plot. In Cameroon, The et al (2001) present field trial data from 1997 to 2000 showing that mean yields on fields where local seed varieties were planted were 9.1 kg of maize greater per kg of P applied. Soil pH in those fields was reported at 4.5. Where P-fertilization was supplemented with lime application to mitigate low pH effects, yields were nearly double those where fertilizer alone was applied, on average. Furthermore, this result is highly consistent with the agronomic principle that P-fertilizers are very vulnerable to being immobilized into iron and aluminum phosphates at pH levels below 5.5 (Griffiths, 2010).

[Figure 1.3: Marginal yield response to basal fertilizer application by soil pH. Vertical axis: yield response to basal dressing (kg/kg). Horizontal axis: soil pH. Series: partial effect (PE) and Lowess of PE on pH. Note: Marginal yield response computed conditional on application rate and soil acidity following regression results in Table 1.4.]

Table 1.6: Maize yield response to basal fertilizer over soil acidity ranges

| Soil pH | 3.1 - 4.3 | 4.4 - 5.4 | 5.5 - 7.1 |
|---|---|---|---|
| Average partial effect of basal fertilizer application (kg/kg) | 2.140** (0.01) | 3.735*** (0.00) | 7.552*** (0.00) |
| % of sample | 51% | 47% | 2% |
There is a vast agronomic literature on the limited effectiveness of phosphoric fertilizers on acidic soils for maize and other crops which corroborates these findings (examples include Omenyo et al, 2010; Gudu et al, 2005; Anetor and Akinrinde, 2007; Zhang et al, n.d.; and many others). The unfortunate fact for Zambia, however, is that very few of its maize fields are on neutral to semi-neutral soils. In fact, the majority of Zambian maize fields are on acidic soils where basal fertilizer is relatively ineffective (Table 1.5). Note also that there is a negative and significant interaction effect between basal and top dressing applications. As can be seen in Figure 1.3, this (and perhaps over-fertilization with basal itself) has resulted in a negative estimate for the partial effect of basal application on a small share of the fields in our study (1%). On pH neutral soils this result would be rather surprising, because P and N are both constituent elements of adenosine triphosphate, the key "energy unit" that plants form during photosynthesis and use to store and release energy (Eckert, 2010; Griffiths, 2010). In other words, we would expect a positive relationship between P and N fertilizers. In Zambia, however, N fertilization is apparently compounding the P consumption problems that are experienced on acidic soils.
In other words, field level variations in acidity are causing an even greater impact on the (lack of) effectiveness of both P and N fertilization. Soil acidity has important implications for the profitability of fertilizer adoption and may contribute to the explanation as to why many Zambian farmers do not use or "under-use" fertilizer. Consider, for example, the effect of soil acidity on profitability in Choma, one of Zambia's most productive districts. The average real cost of basal fertilizer at planting time (October to December) in 2003 and 2007 was 3,556 ZMK/kg (measured in 2010 Kwacha), and the real marketing season (May to July) price for maize in Choma was 782 ZMK/kg. [15] According to Sitko et al (forthcoming), the cost of transporting maize in Zambia is 7.1 Kwacha per kilogram per kilometer (ZMK/kg/km), and analysis from Burke et al (2011) indicates that the cost of transporting fertilizer is 10.6 ZMK/kg/km. [16] Our data show the median distance from farm gate to Choma is 33.5 km in that district. Several simulations of the average value cost ratio (AVCR) a farmer would face based on these parameters are illustrated in Figure 1.4.

[Figure 1.4: Simulated profitability of basal application near Choma. Vertical axis: average value cost ratio of basal application (0 to 6). Horizontal axis: simulation number (1 to 4). Series: pH below 4.4; pH range 4.4 - 5.4; pH range 5.5 - 7.1.]

| Simulation parameters | Simulation 1 | Simulation 2 | Simulation 3 | Simulation 4 |
|---|---|---|---|---|
| Price of maize (ZMK/kg) | 782 | 782 | 1300 | 1300 |
| Price of basal (ZMK/kg) | 3556 | 1000 | 3556 | 1000 |
| Distance to maize (km) | 33.5 | 33.5 | 33.5 | 33.5 |
| Distance to fertilizer (km) | 33.5 | 33.5 | 33.5 | 33.5 |
| Price to transport maize (ZMK/km/kg) | 7.1 | 7.1 | 7.1 | 7.1 |
| Price to transport basal (ZMK/km/kg) | 10.6 | 10.6 | 10.6 | 10.6 |
| Basal application rate (kg/ha) | 150 | 150 | 150 | 150 |
| Top dressing application rate (kg/ha) | 150 | 150 | 150 | 150 |
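The farm gate prices implied by the simulation 1 parameters can be sketched as follows. The sketch assumes that the transport cost is subtracted from the market price of maize and added to the market price of fertilizer, and that the AVCR is the average product of fertilizer times the ratio of those two farm gate prices. This is our reading of the simulations rather than a formula stated in the text, and the function names are illustrative. The average products used in Figure 1.4 are not reported in the text, so the sketch computes the break-even average product instead.

```python
def farm_gate_prices(maize_price, fertilizer_price, km_to_maize_market,
                     km_to_fertilizer, maize_transport, fertilizer_transport):
    """Return (net maize price, gross fertilizer cost) in ZMK/kg at the farm gate."""
    maize_net = maize_price - maize_transport * km_to_maize_market
    fertilizer_gross = fertilizer_price + fertilizer_transport * km_to_fertilizer
    return maize_net, fertilizer_gross


def average_value_cost_ratio(average_product, maize_net, fertilizer_gross):
    """Value of maize gained per ZMK spent on fertilizer."""
    return average_product * maize_net / fertilizer_gross


def break_even_average_product(maize_net, fertilizer_gross):
    """Average product (kg maize per kg fertilizer) at which AVCR equals 1."""
    return fertilizer_gross / maize_net


# Simulation 1: commercial prices for both maize and basal fertilizer.
maize_net, fertilizer_gross = farm_gate_prices(782.0, 3556.0, 33.5, 33.5, 7.1, 10.6)
break_even = break_even_average_product(maize_net, fertilizer_gross)
```

Under these assumptions the farm gate maize price is about 544 ZMK/kg, the delivered cost of basal fertilizer is about 3,911 ZMK/kg, and a farmer would need roughly 7.2 kg of maize per kg of basal fertilizer just to break even.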
If we assume these transport costs are factored into commercial farm gate prices for the farmer's inputs and output (simulation 1 in Figure 1.4), basal fertilizer use in the district for this representative example would not be profitable [17] at any pH level, even if we ignore all non-transportation transfer costs. If we assume all fertilizer on the field is subsidized at a price of 1,000 ZMK/kg (simulation 2), basal application is reasonably profitable at pH levels above 5.5, marginally profitable in the pH range of 4.4 to 5.5, but still not profitable where pH is below 4.4. In simulation 3 we return fertilizer to its commercial price, but assume the household is able to sell its maize at the subsidized price of 1,300 ZMK/kg (e.g. by selling to the Food Reserve Agency (FRA)). In this scenario basal application would be profitable on semi-neutral soil (AVCR=1.9), but not on any soil where pH is below 5.5. Finally, in simulation 4 we assume our farmer receives both subsidized fertilizer and a subsidized sale price. In this scenario basal application is clearly profitable for the high pH range (AVCR=5.5) and, considering we've accounted for transport costs, the 4.4 to 5.5 pH range as well (AVCR=2.5).

[15] Nominal prices are from the Agricultural Market Information Center (AMIC) inflated to real 2010 prices using the Central Statistics Office's published Consumer Price Index (CPI).

[16] It stands to reason that the cost of transporting fertilizer would be greater per kg/km because it is generally transported on a smaller scale, by the farmer rather than a medium or large scale grain assembler.

[17] "Profitable" is defined as a value cost ratio greater than 1.0. Crawford and Kelly (2002) and others suggest defining profitable as an AVCR greater than 2 to account for transfer costs. We use 1 since we include transportation in our simulations. Given the certain existence of other transfer costs (e.g. information gathering), we consider ours a liberal definition.
In the lowest pH range, however, basal use is only marginally profitable even if both input and output prices are subsidized and non-transport transfer costs are ignored (AVCR=1.3). Sixty-four percent of the maize fields farmed in Choma are in an area where prevailing pH is between 5 and 5.4, meaning use is not profitable at commercial prices, marginally profitable if fertilizer is subsidized, and only clearly worth the farmer's investment if input and output prices are both subsidized. The other fields in Choma are on soils where pH is below 4.4. Only 2% of our nationally representative sample is in an area where pH is greater than 5.5, meaning acidity, if unchecked, is likely to render basal fertilizer application unprofitable on nearly all Zambian soils. [18]

[18] An interactive Excel spreadsheet is available upon request to test the sensitivity of this finding. The finding that basal use is not profitable on the overwhelming majority of maize fields, barring subsidized inputs or outputs, is quite robust.

In this discussion, we must acknowledge that the profitability analysis for basal fertilizer assumes the full real price of fertilizer should be applied. As discussed previously, many of the benefits of phosphoric fertilization are not realized in the year of application due to the time it takes for phosphorus to convert to consumable phosphates (if it is going to do so). So, the "actual" real relevant price is the real price times the ratio of nutrients that are consumed in the first year (and the carry-over nutrients captured by proxy) to the total fertilizer nutrients that will eventually be consumed by one of the farmer's crops. Estimating this ratio presents a significant challenge, but we can establish a reasonable range of assumptions for testing the sensitivity of our profitability conclusions. The high end assumption is that all of the nutrients are consumed in the first year. This is the assumption imposed in Figure 1.4.
If farmers are unaware of the physics of phosphoric fertilization (i.e. the carry-over effects), then this is the perceived relevant price, and applying this assumption allows us to demonstrate the perceived profitability of basal fertilizer application. If farmers are aware of carry-over effects, they would know that real profitability will depend on some lower share of the fertilizer’s price. The low end assumption for this ratio can be derived from knowledge of basal fertilizer’s composition. Phosphorus constitutes roughly half of the major nutrients delivered in basal dressing. The other half includes nitrogen, potassium and some minor nutrients that are either consumed or washed away in the year of application. According to Griffiths (2010), at a minimum, 20% of the phosphorus that will ever be consumed will be taken in the year of application. We will also assume that, on average, an additional 20% of the phosphorus in basal fertilizer will never be consumed by the farmer’s crops. On Zambia’s primarily acidic soils, this is a very conservative assumption. Therefore, the minimum share of the price of basal that we should consider relevant to our profitability analysis is 0.7, or 70 percent (i.e. the 50% attributed to other nutrients, plus 40% of the half that is attributed to phosphorus).
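The arithmetic behind this 70 percent floor can be written out explicitly. The short Python sketch below is purely illustrative: the function and argument names are hypothetical, and the 3,556 ZMK/kg figure is the commercial basal price listed in the parameter table that accompanies Figure 1.5. It restates the assumptions above (half of the nutrients are non-phosphorus and fully charged to the year of application; 20% of the phosphorus is taken up in the first year and a further 20% is never taken up) and then shows what the relevant price would be at each share considered in the sensitivity analysis.

```python
def relevant_price_share(non_p_share=0.5, p_first_year=0.2, p_never_used=0.2):
    """Share of the basal price charged to the year of application.

    non_p_share  -- share of nutrients that are not phosphorus (hypothetical name)
    p_first_year -- share of phosphorus taken up in the year of application
    p_never_used -- share of phosphorus never taken up by the farmer's crops
    """
    p_share = 1.0 - non_p_share
    return non_p_share + p_share * (p_first_year + p_never_used)


# Commercial basal price from the Figure 1.5 parameter table (ZMK/kg).
COMMERCIAL_BASAL_PRICE = 3556

low_end = relevant_price_share()  # 0.5 + 0.5 * (0.2 + 0.2)
for share in (low_end, 0.8, 0.9, 1.0):
    price = share * COMMERCIAL_BASAL_PRICE
    print(f"share {share:.0%}: relevant price {price:,.0f} ZMK/kg")
```

Setting `p_first_year=1.0` and `p_never_used=0.0` returns a share of 1, which corresponds to the high end assumption that all nutrients are consumed in the first year.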
Figure 1.5: Simulated profitability of basal application near Choma: Actual relevant price ratio sensitivity analysis
[Figure: three panels, one for each assumed share of the basal price (70%, 80%, 90%). Vertical axis: Average Value Cost Ratio of basal application (0 to 8). Horizontal axis: Simulation number (1 to 4).]

Simulation parameters                     Sim. 1   Sim. 2   Sim. 3   Sim. 4
Price of maize (ZMK/kg)                      782      782     1300     1300
Price of basal (ZMK/kg)                     3556     1000     3556     1000
Distance to maize (km)                      33.5     33.5     33.5     33.5
Distance to fertilizer (km)                 33.5     33.5     33.5     33.5
Price to transport maize (ZMK/km/kg)         7.1      7.1      7.1      7.1
Price to transport basal (ZMK/km/kg)        10.6     10.6     10.6     10.6
Basal application rate (kg/ha)               150      150      150      150
Top dressing application rate (kg/ha)        150      150      150      150

Again based on the characteristics of maize marketing in the Choma region, sensitivity analysis to this ratio assumption is presented in Figure 1.5. Here we apply a range of assumptions on actual real prices in the range of 70-90% of the perceived real price. Notably, even at the most conservative assumption for relevant price ratio (70%), basal fertilizer application at commercial prices is only marginally profitable on semi-neutral soils and unprofitable on the acidic soils that prevail in the region.

Fortunately, the situation illustrated in Figure 1.3 is not fixed, and there are measures that can be taken to make basal fertilizer use profitable on acidic soils and increase Zambian productivity. First, certain fertilizer application methods could shift up the response rates in Figure 1.3, so that higher response levels are seen at pH levels more commonly found in Zambia. For example, it has been shown that applying small bands of fertilizer very near, around or under the seed makes P-fertilization more effective in acidic soil (Boman et al., 1992). This is known as “banding” application, as opposed to evenly spreading fertilizer over the entire field, or “broadcasting,” as is commonly practiced in Zambia.
Also, private firms in developed countries produce “phosphorus enhancing” fertilizer supplements which claim to harmlessly alter the soil chemistry near the fertilizer to protect it from becoming unavailable to plants.¹⁹ These fertilizer supplements have been tested extensively on U.S. soil, where they increase yields by 15-20%, but the benefits may be greater on more acidic soils.

¹⁹ For example, see http://www.chooseavail.com/Science.aspx

Second, it is important to realize that fertilizer is one of many available maize production inputs. Specifically, other inputs could be used to shift the distribution of fields in Table 1.5 up to the pH levels at which P-fertilization is naturally more productive. This is the primary reason farmers throughout the world apply lime to their fields. Moreover, certain types of lime have the added benefit of adding Calcium and Magnesium (useful elements) to the soil, while neutralizing Manganese and Aluminum (harmful elements) (Snyder, 2010). Alternatively, management practices could be designed to increase both pH and basal fertilizer’s resistance to acidity, in order to find a productive middle ground. That said, the optimal solution is not obvious. Finding this solution will require substantial research, and will require more funding allocated to institutions such as the government funded Zambia Agricultural Research Institute (ZARI, in the Ministry of Agriculture and Cooperatives) or the donor funded International Institute of Tropical Agriculture (IITA), which recently selected Zambia to be the home of its regional offices.²⁰

Top dressing is more effective on most Zambian fields with an APE of 4.3 incremental kgs of maize per kg of fertilizer applied (p-value = 0.00, Table 1.5). With respect to acidity, this is not surprising since the N in top dressing is more vulnerable to volatilization in alkaline soils, which are generally not found in Zambia.
Note from regression results, however, that this figure, too, masks a range of yield response rates. For example, fields on one of Zambia’s sandy/undeveloped soil types will see, on average and all else equal, a 2.1 kg/kg lower response rate to top dressing fertilizer than fields on the clayish Acrisol soil most common in Zambia (p-value = 0.02, Table 1.4). This is likely due to the fact that water more readily percolates through these soils, causing the N provided by top dressing to leach away from the plant faster. Plow tillage and improved seed variety use also have expected impacts on the effectiveness of top dressing, though these are only significant at the 29% and 31% levels respectively when our model is estimated via IPCRE (notably, both effects are of greater magnitude and statistical significance if we ignore unobserved heterogeneity). Farmers who plow their fields will see yield response to top dressing 0.74 kg/kg lower than farmers using less soil-disruptive tillage such as ripping, basin planting or hand hoeing. Again, this is likely caused by increased nutrient loss as water is more able to seep through loosened soil. Together these and other effects can cause considerable variance in yield responses. Table 1.7 presents the APE of top dressing fertilizer according to whether plow tillage is employed and whether the field is in a region of sandy/undeveloped soil or some other soil type.

Table 1.7: Yield response to top dressing fertilizer by tillage and soil types
Average partial effect of top dressing (kg/kg), by tillage method

Soil type            Plowing             Other tillage methods
Sandy soils          2.625** (0.02)      3.285*** (0.00)
Other soil types     4.197*** (0.00)     4.978*** (0.00)

Source: regression results reported in Table 1.4
Note: robust p-values in parentheses. *** significant at 1%, ** significant at 5%

²⁰ See http://www.daily-mail.co.zm/media/news/viewnews.cgi?category=19&id=1256193183
²¹ While the average effects of top dressing are statistically significantly different from zero at the 2% level or lower in each case, response rate is 90% higher in non-sandy soils which are not tilled via plow (4.98 kg/kg) compared to that in plowed, sandy soil (2.63 kg/kg). When soils are either sandy and unplowed or plowed and not sandy, response to top dressing fertilizer is 3.29 kg/kg and 4.20 kg/kg respectively, on average.

²¹ Note from regression results that other soil interactions indicate response rates on them were not statistically significantly different from those on Acrisols, so they are grouped together here.

Figure 1.6: Cumulative distribution of average product of fertilizer use (kg/kg)
[Figure: cumulative distribution curves for Basal Application and Top Dressing. Vertical axis: Average product of fertilizer (kg/kg). Horizontal axis: Cumulative share of users (0 to 1). Marked values: .37 and .93.]
Source: Author’s calculations

As with basal fertilizer, although yield response to top dressing is statistically significant, it is not necessarily profitable to use it in Zambia. Based on average real wholesale prices for maize in several towns throughout Zambia during harvest months in 2004 and 2008, and national fertilizer prices during the planting months in 2003 and 2007,²² a farmer would require, on average, a yield response rate of 4.1 kg/kg in order to achieve an AVCR greater than 1 (i.e. be profitable if we ignore all transfer costs). Figure 1.6 shows the cumulative distribution of the average product of basal and top dressing fertilizer estimated for the fertilizer users in our sample with a horizontal reference line at an average product of 4.1.

²² Wholesale maize prices for Kabwe, Ndola, Choma, Lusaka and Chipata for the post-harvest months of May to July 2003 and 2007 (881 ZMK/kg of maize, on average), and fertilizer prices from October to December 2004 and 2008 (3591 ZMK/kg of fertilizer) are from AMIC. Prices are inflated to 2010 values using the CSO Consumer Price Index.
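The 4.1 kg/kg reference value follows directly from the two average prices reported in the footnote on price sources (881 ZMK/kg of maize and 3,591 ZMK/kg of fertilizer). The Python sketch below is a minimal illustration of that break-even calculation, assuming that the AVCR is simply the value of incremental maize divided by the cost of the fertilizer and that all transfer costs are ignored; the function name is hypothetical.

```python
def break_even_response(fert_price, maize_price, target_avcr=1.0):
    """Response rate (kg maize per kg fertilizer) needed to reach a target AVCR.

    Assumes AVCR = response * maize_price / fert_price, with no transfer costs.
    """
    return target_avcr * fert_price / maize_price


MAIZE_PRICE = 881.0   # ZMK/kg, average real wholesale maize price
FERT_PRICE = 3591.0   # ZMK/kg, average real fertilizer price

at_avcr_1 = break_even_response(FERT_PRICE, MAIZE_PRICE)
at_avcr_2 = break_even_response(FERT_PRICE, MAIZE_PRICE, target_avcr=2.0)

print(f"AVCR = 1 requires {at_avcr_1:.1f} kg/kg")
print(f"AVCR = 2 requires {at_avcr_2:.1f} kg/kg")
```

Doubling the target AVCR doubles the required response rate, which is why the rule-of-thumb threshold of 2 is so much harder to reach than the liberal threshold of 1.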
Thus, observations to the left of where the reference line bisects the distribution represent fields where fertilizer use is not profitable. At these commercial prices, fertilizer use was not profitable on 39% of all fields with respect to top dressing, and 92% of all fields with respect to basal fertilizer. To achieve an AVCR of 2 (i.e. the “rule of thumb” profitable value that would account for transfer costs (Crawford and Kelly, 2002)), we require an average product of 8.2. This is estimated for just 1 fertilized field in our sample with respect to basal fertilizer and 2 fertilized fields with respect to top dressing.

While this figure may be surprising, bear in mind these results are based on the assumption that there will be a successful harvest (i.e. these estimates are based on data that filtered out crop failures). If we factor in the regionally variant likelihood of crop failure²³ (computed as described in the data section), expected profitability is even lower. Figure 1.7 presents the cumulative distribution of the expected AP of fertilizer application among users in our sample. Using the same commercial prices described for Figure 1.6, top dressing fertilizer would be expected to be unprofitable for 42% of those applying top dressing, and basal fertilizer for 99% of those applying basal.

Beyond explaining why some farmers choose not to use fertilizer, the profitability analysis conducted here, particularly for basal dressing, makes one wonder why fertilizer is being used at all in Zambia. There are a number of possible explanations. First of all, for farmers who don’t sell maize, and particularly for those who supplement their own production with maize purchases, this AVCR is an irrelevant measurement of fertilizer’s usefulness.
For these farmers the per kg value of what they grow should be measured by the amount they save from not having to purchase maize, rather than how much they earn from selling. Moreover, farmers may not trust the market’s reliability to provide maize, and thus place an unmeasured value on having their own production.

Figure 1.7: Expected average product of fertilizer applications in Zambia
[Figure: cumulative distribution curves for Basal Application and Top Dressing. Vertical axis: Expected average product (kg/kg). Horizontal axis: Cumulative share of users (0 to 1). Marked values: .42 and .99.]
Source: Author’s calculations

²³ The means of the probability of successful harvest are presented by province in Appendix F.

Secondly, as our previous simulation analysis demonstrates, subsidized input and output prices affect farmer incentives. For example, if we apply the subsidized fertilizer price (1,000 ZMK/kg) to our analysis in Figure 1.6, we only need a response rate of 2.8 for use to be profitable (ignoring transfer costs). Thus, at the subsidized fertilizer price basal fertilizer use is not profitable on just 54% of the fertilized fields in our study and top dressing is not profitable for only 11%. This raises the question of whether government policies that distort the market induce farmers to employ economically inefficient production technologies. At the least, the fact that fertilizer use is only profitable when subsidized on the majority of Zambian maize fields has implications for the long-run viability of subsidy programs. Specifically, this suggests subsidies will not successfully induce adoption that will last after subsidies have been removed.

Finally, farmers may not differentiate between basal and top dressing fertilizers, and thus commingle the benefits when considering fertilizer decisions. That is, the farmer may not expect a 5 kg/kg response from top dressing and a 2 kg/kg response to basal fertilizer, but rather a 3.5 kg/kg response from each when they’re applied simultaneously.
At subsidized prices, this may well be considered a profitable response for both fertilizers. To more rigorously answer this question is beyond our scope, but what is clear from these results is that low profitability of fertilizer use is prevalent in Zambia.

1.7 SUMMARY AND IMPLICATIONS

Indicators such as the food crisis of 2008, the ongoing famine in parts of the developing world and the stubbornly persistent poverty rates of the past several decades all suggest that a deep chasm still exists between the status quo and a world without hunger. Particularly in Africa, current population trends in terms of growth and concentration imply that increased smallholder productivity through improved yields will be absolutely necessary if we are to achieve the goal of ending global poverty and hunger. Many African countries in recent years have returned to input subsidies, particularly for fertilizer, in an effort to achieve this goal, with underwhelming results. To better inform agricultural policies designed to benefit small farmers, one must understand the determinants of their yields and how they respond to input investments.

In this study we estimate the determinants of maize yields in Zambia using data from 7,262 fields from households that grew maize and were interviewed in 2000, 2004 and 2008. Following known agronomic principles, we allowed the effects of some inputs, particularly fertilizer, to be conditional on other factors of the maize’s growing condition. Our discussion highlights that many unobserved but very important factors are frequently omitted from yield models that use household data, such as soil quality and moisture content. We recommend an approach for estimating these models to mitigate the potential bias that could stem from said omissions and from the structural endogeneity of input use.
Our results indicate that top dressing is generally more effective on Zambian maize crops, but significantly less so on sandy and plowed fields, where more of the nutrients in the fertilizer will be vulnerable to leaching. Consistent with agronomic principles, we find that phosphoric basal fertilizer is significantly less effective on acidic soils. Both types of fertilizer are more effective when used in conjunction with improved seed varieties, but for the vast majority of our sample fertilizer use would not be profitable at commercial prices.

The results of this study have several important implications for researchers and policy makers alike. First, while the method used here seems to have successfully controlled for many of the omitted variables and structural endogeneity inherent in our model, one of the key lessons learned is that household economic models would benefit greatly from improved agronomic data availability. Note, for example, our results showed no significant relationship between rainfall and yield. Obviously, this does not come from the lack of rainfall’s influence on rain-fed maize crops, but rather from the fact that total rainfall data in the growing season is not always a good proxy for the soil moisture conditions throughout the growing season, the true determinant of yield. The distribution of rainfall is likely to be a more important determinant of crop yield than the total quantity of rainfall. We would also very likely find a much stronger relationship between soil acidity and yields if our data were at the field level. As collecting this information becomes more affordable, it would be extremely valuable to incorporate it into household survey data. Beyond affecting yield itself, these factors could potentially add considerable explanatory power to models of input demand (as in Marenya and Barrett, 2009) and output market participation.
Following the approach to circumvent this lack of data, our results demonstrate that failing to account for unobserved heterogeneity and endogeneity increased our estimate of yield response to top dressing fertilizer and decreased our estimate of the response to basal dressing. This suggests that the farmers in our sample with better unobservable productivity are more likely to have top dressing applied and less likely to have basal applied. This could reflect the fact that farmers with better knowledge and skill are aware that top dressing is more effective than basal dressing on most of Zambia’s acidic soils.

We find that top dressing fertilizer, which provides nitrogen, is more effective than phosphoric basal fertilizer on Zambian soils, but response rates are less attractive in coarse, sandy soils and on plowed fields where the majority of the topsoil is disturbed. Moreover, given transportation costs and response rates (4.2 kg/kg on average), adoption of top dressing fertilizer would be unprofitable for many Zambian farmers at commercial prices.

Soil acidity is a substantial limiting factor in Zambian maize production, both from the direct impact on maize plants and the impact it has on fertilizer effectiveness. Basal fertilizer in particular is vulnerable to nutrient “lockup” in the acidic soils that prevail throughout Zambia, rendering its nutrients largely unavailable to plants and its application even less profitable for farmers, at least at commercial prices. In fact, fertilizer (if it is the only input under consideration) is only profitable for many households when it can be purchased at subsidized prices, which has important implications for the long-run viability of subsidy programs. Specifically, this essentially eliminates the possibility of a successful “phase out” of the program after which farmers would continue to use fertilizer at commercial prices.
This calls for a shift in the design of agricultural and rural poverty reduction programs away from fertilizer subsidies and distribution as the cornerstone to a more integrated program that may include fertilizer subsidies along with other inputs and agronomic practices to allow for sustainable and profitable crop intensification even after the fertilizer subsidies are withdrawn.

In Zambia, ZARI is the government funded agricultural research institute where this development would most likely take place. In a follow-up to this analysis, several researchers and officials were interviewed to discuss the results presented here. Every agronomist and official interviewed was fully aware of the scientific fact that acidity affects the productivity of maize plants both directly and through its impact on the effectiveness of fertilizer, and not surprised in the least by our results. In the opinion of senior ZARI researchers, one approach to address this and other productivity limitations is to develop acid resistant seed varieties that are specifically designed to prosper on Zambian soils. There are, however, very limited resources dedicated to such endeavors. According to one official, the budget allocated to improved plant development for all of Zambia is less than $100,000 annually, and the laboratories most often actually receive less than half of that. The remainder is frequently re-allocated to the FISP or FRA programs that respectively subsidize input purchases and output sales in the maize market.

That said, additional research may not be necessary to improve yields in the short term because existing Zambia-specific research has generated knowledge that has never been fully exploited. Specifically, it is known to Zambian agronomists that lime application is the most direct management practice to solve the problem of soil acidity.
So, if Zambian officials are aware of both the acidity problem and the solution to it, one must wonder why lime is not being applied to smallholder fields (less than 2% of our sample applied lime, and those who did applied at just 5-10% of the recommended rate, on average). According to officials, there are two primary constraints.

First is the cost of getting the appropriate amount of lime on to the field. The product itself is relatively inexpensive (the per kilogram retail price is approximately 10% that of basal fertilizer), but ZARI application recommendations are 1-2 tonnes per hectare (or 2.5-5 times greater than the recommended fertilization rate). Moving that quantity of lime, according to officials, has been cost prohibitive. For example, while a 50 kg bag of lime may cost 25,000 ZMK at the retailer, officials estimate the cost at the farm gate can be as high as 80,000 ZMK. If we accept this as fact, distributing sufficient lime would indeed cost more than twice the cost of distributing the current amount of basal fertilizer. However, the added benefit would, by all anecdotal accounts, outweigh the added cost. Unfortunately, we are unable to do a proper analysis of the profitability of lime use because there simply isn’t enough data. For example, less than 2% of the fields in our sample had lime applied, and on the rare occasions it was, rates were very low. In a 2001 publication, The et al. demonstrated that lime alone could more than double yields on acidic soils in Cameroon, while phosphoric fertilization alone will have extremely limited impact. ZARI has experimental results demonstrating the same is true in Zambia, but their results have never been made easily accessible.

The second constraint is the lack of farmer awareness regarding the negative impacts of soil acidity and the mitigating effects of lime application.
There was a brief period during which the Program Against Malnutrition (PAM, a government funded safety net to provide resources to the poorest Zambian smallholders) included lime in the package of goods distributed to its beneficiaries. Officials say that the majority of the recipients, not realizing the potential benefits or in some cases believing lime would damage rather than enhance their soil, either disposed of the input or mixed it with water to use as paint for their houses. At current budget allocations, extension officials claim they are not adequately equipped to convince farmers to alter their input strategies. For example, ZARI has produced “production guides” for several crops, including one for maize which discusses the importance of liming. Funding, however, has limited production of these guides to about 2,000 copies per crop. In a nation of more than 1,500,000 small farming households, this is staggeringly inadequate.

That said, several members of the Ministry of Agriculture and Cooperatives claim that putting a guide in every household would not change behavior without sustained extension efforts. Many claim on-the-ground evidence will be necessary through example plots that iteratively demonstrate the benefits of soil pH management for several years. Resources for maintaining such plots, however, have not been made available. It is estimated that one plot per District Camp would be sufficient to sensitize farmers to the benefits of managing pH and other soil characteristics. There are roughly 1,700 such Camps in Zambia, each with a resident extension agent in place. Similar to the distribution in our sample, officials estimate that roughly half of them are in areas where highly acidic soil is prevalent. By their reckoning, the additional cost of installing and managing such a plot would be ZMK 2.5-5 million per year.
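Scaling this per-plot estimate to a program is simple multiplication, and a short sketch makes the orders of magnitude easy to check. The Python below is illustrative only: the function name is hypothetical, one plot per Camp is assumed, and the Camp counts (a 100-Camp pilot over 3 years, and the roughly 800 Camps where soil acidity is worst) and the 5.2 billion Kwacha Agricultural Show allocation are the figures used in this section.

```python
# Officials' estimated cost range per demonstration plot (ZMK per year).
PLOT_COST_RANGE_ZMK = (2.5e6, 5.0e6)


def program_cost(camps, years=1, cost_per_plot=PLOT_COST_RANGE_ZMK):
    """Return the (low, high) total cost in ZMK, assuming one plot per Camp."""
    low, high = cost_per_plot
    return camps * years * low, camps * years * high


pilot_low, pilot_high = program_cost(camps=100, years=3)
rollout_low, rollout_high = program_cost(camps=800)

AGRICULTURAL_SHOW_ZMK = 5.2e9  # annual allocation cited in this section

print(f"Pilot, 100 Camps over 3 years: {pilot_low:,.0f} to {pilot_high:,.0f} ZMK")
print(f"800 Camps for one year:        {rollout_low:,.0f} to {rollout_high:,.0f} ZMK")
print(f"High-end rollout as a share of the Agricultural Show budget: "
      f"{rollout_high / AGRICULTURAL_SHOW_ZMK:.0%}")
```

Using the high end of the cost range for the 800-Camp case reproduces the 4 billion Kwacha per year figure, which is the sense in which that estimate is conservative.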
For a pilot program of 100 Camps over 3 years, this would be a total commitment of less than about 1% of the government’s allocation to FISP in 2011 alone. By the conservative cost estimation, to manage one of these demonstration plots in every one of the 800 Camps where soil acidity is the worst would cost 4 billion Kwacha per year. This is less than the 5.2 billion allocated to the annual Agricultural Show, a three day event held in the capital every year, which, incidentally, is partially meant to be a farmer outreach program.

Liming, of course, is not the only solution to the productivity limitations of acidic soil. For example, it has been shown that applying small bands of fertilizer very near, around or under the seed at the time of planting makes P-fertilization more effective in acidic soil (Boman et al., 1992). This is known as “banding” application, as opposed to evenly spreading fertilizer over the entire field, or “broadcasting,” as is commonly practiced in Zambia. Extension officials say farmers are willing to try the new application method, but insist on waiting until after germination to avoid potentially wasting resources. This is unfortunate, because the nutrients in basal fertilizer, which are critical in the early stages of plant life, require the time between planting and germination to become ready for the plant’s consumption. Late application, therefore, may dramatically reduce fertilizer effectiveness. Again, more extension work and demonstrations may be required to combat this problem.

Also, private firms in developed countries produce “phosphorus enhancing” fertilizer supplements which claim to harmlessly alter the soil chemistry near the fertilizer to protect it from becoming unavailable to plants. These fertilizer supplements have been tested extensively on U.S. soil, where they increase yields by 15-20%, but the benefits may be greater on more acidic soils.
Resources should be allocated so that such alternatives can be tested in Zambia. Alternatively, some combination of inputs and management practices could be designed to increase both pH and basal fertilizer’s resistance to acidity in order to find the most productive maize farming practices. Finding the optimal solution will require research and funding allocated to institutions like ZARI.

Fertilizer use is often cited as one of the key differences between African countries and the Green Revolution success stories like India, where application rates are many times higher. The recent resurgence of fertilizer subsidy programs throughout Africa confirms that policy makers are keen to promote fertilizer use among their smallholders at almost any cost. In separate studies the Economist Intelligence Unit (2008) and the International Food Policy Research Institute (Fan et al., 2007) ranked fertilizer subsidies during the Green Revolution among the least effective policies in terms of benefit/cost ratio and poverty reduction. Consistently at the top of these rankings is agricultural research and development.

Fertilizer (along with irrigation and tailored hybrid seed use) may have been the appropriate technology for the agricultural systems of Asia’s Green Revolution. This does not imply, however, that a mono-focus on fertilizer alone is the appropriate technology for many areas of Africa, even though increasing fertilizer use will be required. The challenge is to find packages of inputs and agronomic practices that will be profitable for farmers to adopt sustainably. Our results demonstrate that in countries farming on acidic soil, alternative measures will need to be taken if yields are to increase to the levels required to support economic transformation. This may be through tailored application methods and tillage practices, or through the use of supplementary inputs such as lime and phosphorus enhancers.
The optimal prescription is unknown and finding it will require an investment in agronomic research specific to each country or region’s agricultural system. It is clear, on the other hand, that uninformed subsidy policies are not likely to succeed in producing long-term economic growth.

APPENDICES

APPENDIX A

Figure A.1: pH and soil type maps of Zambia
[Map: Soil Reaction. Legend classes (pH): 3.2-3.8, 3.8-4.4, 4.4-4.8, 4.8-5.4, 5.4-7.1.]
Compiled by Mambo and Phiri (2003)

Figure A.2: Soil Map of Zambia
[Map: Soils of Zambia, with district labels. Legend, soil types: Acrisols, Alisols, Arenosols, Cambisols, Ferralsols, Fluvisols, Gleysols, Histosols, Leptosols, Lixisols, Luvisols, Nitisols, Phaeozems, Planosols, Podzols, Regosols, Solonchaks, Solonetz, Vertisols, Associations.]
Source: Soil Research Team, Mt. Makulu, Chilanga (September 2002)
Makulu, Chilanga 67 APPENDIX B Table B.1: Select results from yield response estimates un-weighted for attrition bias OLS IV PCRE IPCRE (i) (ii) (iii) (iv) Fertilizer use y -321.292*** -320.8636*** -323.2965*** -332.0191*** Fertilizer user (1=yes) (0.00) (0.00) (0.00) (0.00) Basal application rate 2.1643** 2.0875** 3.1121*** 3.1090*** y (0.01) (0.02) (0.00) (0.00) (kg/ha) y -0.0045** -0.0041** -0.0046*** -0.0043*** Top dress application y (kg/ha) (0.02) 6.2809*** (0.00) (0.02) 6.3275*** (0.00) (0.00) 5.6336*** (0.00) (0.00) 5.6851*** (0.00) y -0.0029*** -0.0029*** -0.0020*** -0.0020*** (0.00) (0.00) (0.00) (0.00) 243.723*** (0.00) 240.2565*** (0.00) 216.6267*** (0.00) 218.5080*** (0.00) 22.0193*** (0.00) 21.6226*** (0.00) 22.5534*** (0.00) 21.8066*** (0.00) -0.1336*** -0.1295*** -0.1218*** -0.1141*** (0.00) 219.036*** (0.01) -56.2685 (0.59) 73.6087 (0.14) 18.0235 (0.58) 51.6056 (0.66) 136.9593* (0.06) 264.517*** (0.00) 44.5209 (0.79) 55.2042 (0.21) -170.4319 (0.30) (0.00) 218.8710*** (0.01) -57.3608 (0.59) 73.7688 (0.13) 18.0287 (0.58) 50.6198 (0.66) 134.9510* (0.06) 263.7214*** (0.00) 46.8629 (0.77) 56.2856 (0.20) -166.8851 (0.31) (0.00) 358.8799*** (0.00) 121.7717 (0.30) 65.9419 (0.28) 40.1343 (0.37) 138.7752 (0.27) 42.1195 (0.71) 196.8665*** (0.00) 221.4980 (0.28) 113.4178* (0.05) 90.9730 (0.67) (0.00) 358.6640*** (0.00) 121.9097 (0.30) 66.9046 (0.27) 40.2256 (0.36) 138.2669 (0.27) 38.5208 (0.73) 193.5208*** (0.00) 225.2132 (0.27) 113.4221* (0.05) 92.6726 (0.66) Basal rate squared Top dress rate squared Other inputs and tillage Improved seed use y (1=yes) Seed application rate y (kg/ha) Seed rate squared y Planted with an N fixer (1=yes) Other mixed crop (1=yes) Applied plant or animal manure (1=yes) Planting before the rains (1=yes) Planting basins (1=yes) Zero tillage (1=yes) Plowing (1=yes) Ripping (1=yes) Ridging (1=yes) Bunding (1=yes) 68 Table B.1: (cont’d) OLS (i) 425.107*** (0.00) 459.430*** (0.00) 477.403*** (0.00) IV (ii) 423.3512*** (0.00) 457.3851*** 
(0.00) 474.6010*** (0.00) PCRE (iii) 437.2456** (0.01) 457.4049*** (0.01) 519.1867*** (0.00) PCRE IV (iv) 435.5213** (0.01) 455.2685*** (0.01) 518.2911*** (0.00) 125.230*** (0.01) 5.5  pH  7.1 (1=yes) 318.3120** (0.01) Clayish soils (excl. -197.908*** acrisols) (0.00) Sandy/undeveloped soils -136.2594** (0.04) Developed/organic soils 74.0828 (0.49) Other soils -152.5237** (0.02) Weather Rainfall (mm) 0.0724 (0.86) Rainfall squared -0.0001 (0.56) Stress periods -23.0017 (<20mm/dekad) (0.18) Provincial weather effect Yes Fertilizer interactions Basal 0.9973 y (0.14) rate*1[pH  [ 4.4,5.5) ] Basal 4.6162*** y (0.00) rate*1[pH  [5.5,7.1] ] y -0.0001 Basal rate*top dress rate 124.3582*** (0.01) 312.7045** (0.01) -198.5336*** 126.2784*** (0.01) 343.4986*** (0.01) -207.0411*** 126.9407*** (0.01) 337.2888*** (0.01) -207.3878*** (0.00) -131.9179* (0.05) 59.4174 (0.57) -142.9610** (0.03) (0.00) -135.0685* (0.05) 63.3488 (0.56) -156.2458** (0.02) (0.00) -130.0181* (0.06) 47.7495 (0.65) -144.5014** (0.04) 0.0655 (0.87) -0.0001 (0.57) -23.6562 -0.1403 (0.78) -0.0001 (0.81) -20.5949 -0.1554 (0.75) -0.0000 (0.83) -20.6126 (0.16) Yes (0.47) Yes (0.47) Yes 1.0975 (0.11) 0.7448 (0.38) 0.9374 (0.27) 4.4737*** (0.01) 4.8063*** (0.00) 4.7160*** (0.00) -0.0002 -0.0013 -0.0015 Weed once (1=yes) Weed twice (1=yes) Weed three times (1=yes) Soil characteristics 4.4  pH  5.5 (1=yes) Top dress rate*clay soil Top dress rate*sandy soil y (0.98) -0.6225 (0.90) -0.6697 (0.36) -0.6178 (0.27) -0.6840 y (0.32) -1.5272* (0.30) -1.7074* (0.33) -1.5428* (0.29) -1.7383* (0.09) (0.06) (0.09) (0.06) 69 Table B.1: (cont’d) Top dress rate*rich soil Top dress rate*other soil y OLS (i) 0.2817 IV (ii) 0.5071 PCRE (iii) 0.3275 PCRE IV (iv) 0.5553 y (0.83) -0.6051 (0.70) -0.8903 (0.80) -0.5101 (0.68) -0.8239 (0.53) -1.0478** (0.04) (0.36) -1.0232** (0.05) (0.60) -0.7326 (0.30) (0.40) -0.6649 (0.35) 0.4754* (0.07) 0.4517 (0.12) 0.2142 (0.50) 0.1512 (0.66) 268.2505 (0.35) 7262 0.261 285.3525 (0.32) 7262 0.261 
35.0534 (0.96) 7262 0.265 69.4362 (0.92) 7262 0.265 Top dress rate*Plow y tillage Fertilization y rate*improved seed use Constant Observations R-squared

APPENDIX C

Table C.1: Instrumented, Pooled Correlated Random Effects estimation results compared to Fixed Effects Instrumental Variable estimation results

Explanatory variables | IPCRE (i) | FEIV (ii)
Fertilizer use
Fertilizer user (1=yes)^y | -345.4989*** (0.00) | -344.8795*** (0.00)
Basal application rate (kg/ha)^y | 2.5625*** (0.01) | 2.6583*** (0.01)
Basal rate squared^y | -0.0039** (0.03) | -0.0041** (0.02)
Top dress application (kg/ha)^y | 5.4721*** (0.00) | 5.5416*** (0.00)
Top dress rate squared^y | -0.0019*** (0.00) | -0.0020*** (0.00)
Other inputs and tillage
Improved seed use (1=yes)^y | 183.1064*** (0.00) | 184.8419*** (0.00)
Seed application rate (kg/ha)^y | 20.7972*** (0.00) | 20.5921*** (0.00)
Seed rate squared^y | -0.1062*** (0.00) | -0.1032*** (0.00)
Planted with an N fixer (1=yes) | 377.0041*** (0.00) | 179.6058* (0.05)
Other mixed crop (1=yes) | 190.6695 (0.15) | -37.4358 (0.76)
Applied plant or animal manure (1=yes) | 23.1003 (0.72) | 44.3713 (0.37)
Planting before the rains (1=yes) | 20.8432 (0.66) | -10.2484 (0.77)
Planting basins (1=yes) | 141.8314 (0.26) | 44.4658 (0.69)
Zero tillage (1=yes) | 51.7677 (0.66) | 145.7218* (0.06)
Plowing (1=yes) | 171.7717** (0.03) | 227.4446*** (0.00)
Ripping (1=yes) | 190.4936 (0.41) | 22.4789 (0.89)
Ridging (1=yes) | 126.7231** (0.04) | 52.4627 (0.27)
Bunding (1=yes) | -72.2638 (0.76) | -232.2502 (0.17)
Weed once (1=yes) | 341.8989* (0.06) | 328.3955** (0.03)
Weed twice (1=yes) | 380.0669** (0.04) | 367.8507** (0.02)
Weed three times (1=yes) | 438.4622** (0.03) | 395.2880** (0.02)
Soil characteristics
4.4 ≤ pH < 5.5 (1=yes) | 177.7879*** (0.00) | 161.2311*** (0.00)
5.5 ≤ pH ≤ 7.1 (1=yes) | 419.5940*** (0.00) | 400.8690*** (0.00)
Clayish soils (excl. acrisols) | -242.3512*** (0.00) | -243.7009*** (0.00)
Sandy/undeveloped soils | -156.3618** (0.04) | -154.4745** (0.04)
Developed/organic soils | 4.9952 (0.96) | 18.8047 (0.86)
Other soils | -208.5235*** (0.00) | -213.1332*** (0.00)
Weather
Rainfall (mm) | -0.2604 (0.62) | 0.2566 (0.56)
Rainfall squared | 0.0000 (0.86) | -0.0002 (0.37)
Stress periods (<20mm/dekad) | -3.0743 (0.92) | -26.8415 (0.13)
Provincial weather effect | Yes | Yes
Fertilizer interactions
Basal rate*1[pH ∈ [4.4,5.5)]^y | 1.4766* (0.08) | 1.2768* (0.06)
Basal rate*1[pH ∈ [5.5,7.1]]^y | 4.9971*** (0.00) | 4.8421*** (0.00)
Basal rate*top dress rate^y | -0.0015 (0.32) | -0.0015 (0.35)
Top dress rate*clay soil^y | -0.8527 (0.22) | -0.7748 (0.25)
Top dress rate*sandy soil^y | -2.0764** (0.02) | -2.1184** (0.02)
Top dress rate*rich soil^y | 0.5431 (0.67) | 0.4389 (0.73)
Top dress rate*other soil^y | -0.8496 (0.36) | -0.9157 (0.32)
Top dress rate*Plow tillage^y | -0.7364 (0.29) | -0.8210 (0.22)
Fertilization rate*improved seed use^y | 0.3451 (0.31) | 0.3563 (0.29)
Inverse Mills ratio for growing maize both years | 72.9007 (0.83) | 64.8138 (0.85)
Constant | -97.9246 (0.89) | 338.6581 (0.30)
Observations | 7127 | 7127
R-squared | 0.265 | 0.261

APPENDIX D

Table D.1: Net revenue maximizing seed rate sensitivity analysis

Cell entries are net revenue maximizing seed rates; columns are maize output prices (ZMK/kg).

Seed input prices (ZMK/kg) | Commercial (782) | Commercial less transport (569) | FRA (1300) | FRA less transport (1087)
Recycled/local (5000) | 68.7 | 57.3 | 80.9 | 77.3
Recycled plus transport (5213) | 67.4 | 55.5 | 80.1 | 76.4
Cheap hybrid (7000) | 56.5 | 40.5 | 73.6 | 68.5
Cheap hybrid plus transport (7213) | 55.2 | 38.7 | 72.8 | 67.6
Expensive hybrid (11000) | 32.1 | 6.9 | 58.9 | 50.9
Expensive hybrid plus transport (11213) | 30.8 | 5.1 | 58.1 | 50.0

Source: Author’s calculation based on regression results in Table 1.4, column iv. Distance to seed seller and maize purchaser is assumed to be 30 kilometers with transportation costs of 7.1 ZMK per kilogram per kilometer for both. Interactive Excel spreadsheet is available upon request for further sensitivity analysis.
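The sensitivity analysis in Table D.1 can be illustrated with a short calculation. The sketch below assumes yield is quadratic in the seed rate, so that net revenue is maximized where the marginal value product of seed equals the seed price. The function names are hypothetical, and the coefficients used in the example are the IPCRE seed-rate estimates from Table C.1, not those from Table 1.4, column iv, on which Table D.1 is based. The result (about 67.8) is therefore close to, but not the same as, the 68.7 reported in the first cell of Table D.1.

```python
# Transport assumptions taken from the note to Table D.1.
TRANSPORT_COST_PER_KG_KM = 7.1   # ZMK per kilogram per kilometer
DISTANCE_KM = 30                 # assumed distance to seed seller / maize buyer


def transport_cost_per_kg(distance_km=DISTANCE_KM, rate=TRANSPORT_COST_PER_KG_KM):
    """Per-kilogram transport cost added to seed prices or deducted from maize prices."""
    return distance_km * rate


def optimal_seed_rate(b_linear, b_squared, seed_price, maize_price):
    """Seed rate s maximizing maize_price*(b_linear*s + b_squared*s**2) - seed_price*s.

    The first-order condition is maize_price*(b_linear + 2*b_squared*s) = seed_price.
    Terms of the yield function that do not involve the seed rate drop out.
    """
    if b_squared >= 0:
        raise ValueError("a maximum requires a negative squared-term coefficient")
    rate = (seed_price / maize_price - b_linear) / (2.0 * b_squared)
    return max(rate, 0.0)  # a negative optimum means planting is unprofitable


# Illustrative coefficients: Table C.1, IPCRE column (an assumption of this sketch).
B_SEED, B_SEED_SQ = 20.7972, -0.1062

# Recycled/local seed (5000 ZMK/kg) at the commercial maize price (782 ZMK/kg).
example_rate = optimal_seed_rate(B_SEED, B_SEED_SQ, seed_price=5000, maize_price=782)

# The same seed bought 30 km away and maize sold 30 km away.
example_rate_with_transport = optimal_seed_rate(
    B_SEED, B_SEED_SQ,
    seed_price=5000 + transport_cost_per_kg(),
    maize_price=782 - transport_cost_per_kg(),
)
```

Higher seed prices or lower maize prices reduce the optimal rate, which is the pattern visible down the rows and across the columns of Table D.1.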
APPENDIX E

Table E.1: Selected results from yield response estimates for quantities of Nitrogen and Phosphorus

Explanatory variables | OLS (i) | IV (ii) | PCRE (iii) | IPCRE (iv)
Fertilizer use
Fertilizer user (1=yes)^y | -333.958*** (0.00) | -331.1359*** (0.00) | -338.3046*** (0.00) | -346.1022*** (0.00)
Phosphorus rate (kg/ha)^y | 5.7178 (0.24) | 5.2571 (0.28) | 7.6456 (0.17) | 7.6909 (0.17)
Phosphorus rate squared^y | -0.1142** (0.04) | -0.1042* (0.05) | -0.0844* (0.10) | -0.0767 (0.14)
Nitrogen rate (kg/ha)^y | 12.7779*** (0.00) | 12.8134*** (0.00) | 11.6151*** (0.00) | 11.6275*** (0.00)
Nitrogen rate squared^y | -0.0133*** (0.00) | -0.0132*** (0.00) | -0.0096*** (0.00) | -0.0092*** (0.00)
Other inputs and tillage
Improved seed use (1=yes)^y | 223.198*** (0.00) | 218.9245*** (0.00) | 180.2142*** (0.00) | 182.3408*** (0.00)
Seed application rate (kg/ha)^y | 20.1287*** (0.00) | 19.7589*** (0.00) | 21.4384*** (0.00) | 20.7406*** (0.00)
Seed rate squared^y | -0.1162*** (0.00) | -0.1118*** (0.00) | -0.1129*** (0.00) | -0.1056*** (0.00)
Planted with an N fixer (1=yes) | 183.0391** (0.05) | 174.1369* (0.06) | 382.7085*** (0.00) | 376.9016*** (0.00)
Other mixed crop (1=yes) | -25.3239 (0.85) | -28.7940 (0.83) | 191.6873 (0.15) | 190.4080 (0.15)
Applied plant or animal manure (1=yes) | 38.5694 (0.45) | 39.0638 (0.44) | 20.7332 (0.75) | 22.5477 (0.73)
Planting before the rains (1=yes) | -10.3577 (0.77) | -11.3667 (0.75) | 21.4012 (0.65) | 20.9489 (0.66)
Planting basins (1=yes) | 44.5066 (0.69) | 42.5595 (0.71) | 140.7991 (0.26) | 141.8740 (0.26)
Zero tillage (1=yes) | 145.1570* (0.06) | 141.5658* (0.07) | 56.5381 (0.63) | 51.8175 (0.66)
Plowing (1=yes) | 233.959*** (0.00) | 228.5035*** (0.00) | 180.0028** (0.02) | 173.0150** (0.03)
Ripping (1=yes) | 3.2847 (0.98) | 10.2106 (0.95) | 183.6101 (0.43) | 191.1802 (0.41)
Ridging (1=yes) | 44.7112 (0.36) | 44.2000 (0.36) | 127.5074** (0.04) | 126.4577** (0.04)
Bunding (1=yes) | -249.1835 (0.16) | -242.9858 (0.16) | -77.5609 (0.74) | -72.9201 (0.75)
Weed once (1=yes) | 330.0369** (0.03) | 320.2822** (0.04) | 350.0009* (0.06) | 341.5968* (0.06)
Weed twice (1=yes) | 368.3921** (0.02) | 357.9670** (0.02) | 388.6086** (0.04) | 379.6600** (0.04)
Weed three times (1=yes) | 397.1221** (0.02) | 384.0413** (0.02) | 447.0500** (0.02) | 438.2906** (0.03)
Soil characteristics
4.4 ≤ pH < 5.5 (1=yes) | 164.5800*** (0.00) | 160.5193*** (0.00) | 175.0973*** (0.00) | 174.8300*** (0.00)
5.5 ≤ pH ≤ 7.1 (1=yes) | 410.4628*** (0.00) | 397.3711*** (0.00) | 427.9988*** (0.00) | 416.6282*** (0.00)
Clayish soils (excl. acrisols) | -239.5355*** (0.00) | -240.9423*** (0.00) | -238.0959*** (0.00) | -239.5104*** (0.00)
Sandy/undeveloped soils | -161.7011** (0.03) | -154.2497** (0.04) | -159.9352** (0.03) | -152.8558** (0.04)
Developed/organic soils | 33.2519 (0.76) | 24.1157 (0.83) | 20.5983 (0.85) | 8.0954 (0.94)
Other soils | -225.3741*** (0.00) | -208.3845*** (0.00) | -224.4293*** (0.00) | -205.6920*** (0.01)
Weather
Rainfall (mm) | 0.2905 (0.51) | 0.2813 (0.52) | -0.2464 (0.64) | -0.2586 (0.62)
Rainfall squared | -0.0002 (0.33) | -0.0002 (0.33) | 0.0000 (0.88) | 0.0000 (0.87)
Stress periods (<20mm/dekad) | -24.4845 (0.16) | -25.5716 (0.14) | -3.0180 (0.92) | -3.5836 (0.90)
Provincial weather effect | Yes | Yes | Yes | Yes
Fertilizer interactions
P-rate * 1[pH ∈ [4.4,5.5)]^y | 5.5736 (0.10) | 6.4282* (0.06) | 6.5662 (0.13) | 7.8338* (0.07)
P-rate * 1[pH ∈ [5.5,7.1]]^y | 23.2858*** (0.00) | 23.4128*** (0.00) | 25.1482*** (0.00) | 25.5038*** (0.00)
P-rate * N-rate^y | 0.0114 (0.53) | 0.0093 (0.60) | -0.0096 (0.55) | -0.0126 (0.44)
N-rate * clay soil^y | -1.4995 (0.21) | -1.5785 (0.19) | -1.5491 (0.22) | -1.6631 (0.19)
N-rate * sandy soil^y | -3.4489** (0.03) | -3.8479** (0.02) | -3.4836** (0.03) | -3.8899** (0.02)
N-rate * rich soil^y | 0.4462 (0.84) | 0.8356 (0.72) | 0.4572 (0.84) | 0.8785 (0.71)
N-rate * other soil^y | -0.9926 (0.55) | -1.7415 (0.30) | -0.8839 (0.60) | -1.6999 (0.32)
N-rate * Plow tillage^y | -2.0186** (0.02) | -1.9470** (0.03) | -1.5253 (0.23) | -1.3660 (0.28)
Fertilization rate*improved seed use^y | 1.5477** (0.02) | 1.4450* (0.05) | 1.1064 (0.19) | 0.9127 (0.31)
Inverse Mills ratio for growing maize both years | 83.7789 (0.80) | 33.4891 (0.92) | 109.9451 (0.74) | 73.8904 (0.82)
Constant | 305.0357 (0.35) | 340.7693 (0.30) | -160.6450 (0.83) | -100.1533 (0.89)
Observations | 7127 | 7127 | 7127 | 7127
R-squared | 0.260 | 0.259 | 0.265 | 0.265

APPENDIX F

Table F.1: Likelihood of successful harvest and fertilizer use by province

Province | Successful harvests over time (share of fields) | Fertilizer used (share of fields) | Basal rates (kg/ha among users) | Top dress rates (kg/ha among users)
Central | 0.96 | 0.57 | 140.5 | 148.2
Copperbelt | 0.99 | 0.50 | 144.9 | 157.4
Eastern | 0.98 | 0.32 | 102.8 | 114.8
Luapula | 0.96 | 0.32 | 186.3 | 175.1
Lusaka | 0.91 | 0.61 | 141.6 | 146.6
Northern | 0.97 | 0.43 | 191.2 | 191.3
North Western | 0.96 | 0.24 | 153.6 | 151.3
Southern | 0.90 | 0.36 | 125.4 | 127.7
Western | 0.80 | 0.07 | 134.0 | 124.1
Zambia | 0.93 | 0.35 | 142.2 | 146.5

Note: “Successful harvests over time” is the percentage of fields from 2006-2011 harvesting 115 kg/ha or more of maize, based on CSO Crop Forecast Surveys. All other data come from the full sample of maize fields from the 2004 and 2008 Supplemental Surveys, not the selected observations described in this study’s data section.

REFERENCES

Berck, P., J. Geoghegan and S. Stohs. 2000. “A Strong Test of the von Liebig Hypothesis.” American Journal of Agricultural Economics 82: 948-55.
Boman, R.K., R.L. Westerman, G.V. Johnson, M.E. Jojola. 1992. “Phosphorus fertilization effects on winter wheat production in acid soils.” p. 195-200. In J.L. Havlin (ed.) Proc. Great Plains Soil Fert. Conf. Denver, CO. Vol. 4.
Brittan, K.L., J.L. Schmierer, D.J. Munier, K.M. Klonsky, P. Livingston. 2008. “Sample costs to produce field corn on mineral soils in the Sacramento valley.”
Burke, W.J., M. Hichaambwa, D. Banda and T.S. Jayne. 2011. “The Cost of Maize Production by Smallholder Farmers in Zambia.” Food Security Research Project Working Paper 50. Lusaka, Zambia.
Carpentier, A. and R.D. Weaver. 1997.
“Damage Control Productivity: Why Econometrics Matters.” American Journal of Agricultural Economics 79: 47-61.
Chamberlain, G. 1982. “Multivariate Regression Models for Panel Data.” Journal of Econometrics 18: 5-46.
Chambers, R.G., and E. Lichtenberg. 1994. “Simple Econometrics of Pesticide Use.” American Journal of Agricultural Economics 76: 406-17.
Chambers, R.G., and E. Lichtenberg. 1996. “A Nonparametric Approach to the von Liebig-Paris Technology.” American Journal of Agricultural Economics 78: 373-86.
Cobb, C.W., and P.H. Douglas. 1928. “A Theory of Production.” American Economic Review 18: 139-72.
Crawford, E., V. Kelly, T.S. Jayne and J. Howard. 2003. “Input Use and Market Development in Sub-Saharan Africa: An Overview.” Food Policy 28: 277-92.
Eckert, D. 2010. “Efficient Fertilizer Use - Nitrogen.” The Efficient Fertilizer Use Manual, accessed June 7, 2010: http://www.rainbowplantfood.com/agronomics/efu/nitrogen.pdf
Goedeken, M.W., G.V. Johnson, W.R. Raun, and S.B. Phillips. 1988. “Soil Test Phosphorus Crop Response Projections to Variable Rate Application in Winter Wheat.” Communications in Soil Science and Plant Analysis 29: 1731-38.
Griffith, B. 2010. “Efficient Fertilizer Use - Phosphorus.” The Efficient Fertilizer Use Manual, accessed June 7, 2010: http://www.rainbowplantfood.com/agronomics/efu/phosphorus.pdf
Griliches, Z. and J. Mairesse. 1995. “Production Functions: The search for identification.” National Bureau of Economic Research Working Paper #5067. Cambridge, Massachusetts.
GRZ. 2002. “Zambia: Poverty Reduction Strategy Paper.” Government of Zambia. Available at: http://www.imf.org/External/NP/prsp/2002/zmb/01/
Guan, Z., A. Oude Lansink, M.K. van Ittersum and A. Wossink. 2006. “Integrating Agronomic Principles Into Production Function Specifications: A Dichotomy of Growth Inputs and Facilitating Inputs.” American Journal of Agricultural Economics 88: 203-14.
Haggblade, S. and T. Plerhoples. 2010. “Productivity Impact of Conservation Farming on Smallholder Cotton Farmers in Zambia.” Food Security Research Project Working Paper no. 47, Lusaka, Zambia.
Heckman, J. J. 1976. “The Common Structure of Statistical Models of Truncation, Sample Selection, and Limited Dependent Variables and a Simple Estimator for Such Models.” Annals of Economic and Social Measurement 5: 475–492.
Hoch, I. 1955. “Estimation of Production Function Parameters and Testing for Efficiency.” Econometrica 23: 325-26.
Holloway, G. and Q. Paris. 2002. “Production Efficiency in the von Liebig Model.” American Journal of Agricultural Economics 84: 1271-78.
Lanzer, E.A. and Q. Paris. 1981. “A New Analytical Framework for the Fertilization Problem.” American Journal of Agricultural Economics 63: 93-103.
Levinsohn, J. and A. Petrin. 2003. “Estimating Production Functions Using Inputs to Control for Unobservables.” Review of Economic Studies 70: 317-42.
Lichtenberg, E. and D. Zilberman. 1986. “The Econometrics of Damage Control: Why Specification Matters.” American Journal of Agricultural Economics 68: 261-73.
Mambo, A. and L.K. Phiri. 2003. “Soil reaction (pH) in the soils of Zambia: Memoir accompanying the soil reaction (pH) map of Zambia.” The Republic of Zambia, Mt. Makulu Central Research Station (available upon request).
Mason, N. 2011. “Growing season total rainfall and rainfall ‘stress’ district-level lookup file for 1990/91-2009/10 agricultural years.” Food Security Research Project data documentation (available upon request).
Mather, D., D. Boughton, and T.S. Jayne. 2011. “Smallholder Heterogeneity and Maize Market Participation in Southern and Eastern Africa: Implications for Investment Strategies to Increase Marketed Food Staple Supply.” International Development Working Paper (number forthcoming), Michigan State University, East Lansing.
Megill, D.J. 2004. “Recommendations for Sample Design for Post-Harvest Surveys in Zambia Based on the 2000 Census.” Food Security Research Project Working Paper number 11. Lusaka, Zambia.
Mundlak, Y. 1961. “Empirical Production Function Free of Management Bias.” Journal of Farm Economics 43: 44-56.
Mundlak, Y. and I. Hoch. 1965. “Consequences of Alternative Specifications of Cobb-Douglas Production Functions.” Econometrica 33: 814-28.
Olley, G.S. and A. Pakes. 1996. “The Dynamics of Productivity in the Telecommunications Equipment Industry.” Econometrica 64: 1263-97.
Oude Lansink, A. and A. Carpentier. 2001. “Damage Control Productivity: An Input Damage Abatement Approach.” Journal of Agricultural Economics 52: 11-22.
Paris, Q. 1992. “The von Liebig Hypothesis.” American Journal of Agricultural Economics 74: 1019-28.
Rabbinge, R. 1993. “The Ecological Background of Food Production.” In D. J. Chadwick and J. Marsh, eds. Crop Protection and Sustainable Agriculture. Ciba Foundation Symposium 177, Chichester, UK: John Wiley, pp 2-29.
Saha, A., C.R. Shumway and A. Havenner. 1997. “The Econometrics of Damage Control.” American Journal of Agricultural Economics 79: 773-85.
Sitko, N.J. 2010. “Fractured governance and local frictions: the exclusionary nature of a clandestine land market in southern Zambia.” Africa: The Journal of the International African Institute 80(1): 36-55.
Sitko, N., T.S. Jayne, J. Mangisoni, L. Kirimi, F. Karin, D. Tschirley, D. Boughton, C. Donovan, H. Zevale, D. Banda. forthcoming. “Maize Value Chains in Eastern and Southern Africa.” International Development Working Paper, Michigan State University, East Lansing.
Suri, T. 2011. “Selection and Comparative Advantage in Technology Adoption.” Econometrica 79(1): 159-209.
The, C., H. Calba, W.J. Horst and C. Zonkeng. 2001. “Maize Grain Yield Correlated Responses to Change in Acid Soil Characteristics After Three Years of Soil Amendments.” Paper prepared for the Seventh Eastern and Southern Africa Regional Maize Conference.
11th to 15th February, 2001.
Tolk, J.A., T.A. Howell and S.R. Evett. 1999. “Effect of mulch, irrigation and soil type on water use and yield of maize.” Soil and Tillage Research 50(2): 137-147.
van de Ven, G.W.J., N. de Ridder, H. van Keulen and M.K. van Ittersum. 2003. “Concepts in Production Ecology for Analysis and Design of Animal and Plant-Animal Production Systems.” Agricultural Systems 76: 507-25.
Van Ittersum, M.K. and R. Rabbinge. 1997. “Concepts in Production Ecology for Analysis and Quantification of Agricultural Input-Output Combinations.” Field Crops Research 52: 197-208.
Von Liebig, J. 1862. “Chemistry in its application to Agriculture and Physiology.” 7e, volume II, Brunswick: F. Vieweg and son.
Xu, Z., W.J. Burke, T.S. Jayne and J. Govereh. 2009a. “Do Input Subsidy Programs ‘Crowd In’ or ‘Crowd Out’ Commercial Market Development? Modeling Fertilizer Demand in a Two-Channel Marketing System.” Agricultural Economics 40: 79-94.
Xu, Z., Z. Guan, T.S. Jayne and R. Black. 2009b. “Factors Influencing the Profitability of Fertilizer Use on Maize in Zambia.” Agricultural Economics 40: 437-46.
Zambia Agricultural Research Institute (ZARI). 2002. “Maize Production Guide.” Publication prepared by the Soils and Crops Research Branch, Ministry of Agriculture and Cooperatives. G.C. Kaitisha, P. Gondwe, M.V. Mukwavi and G.M. Kaula (eds).

CHAPTER 2: Competitive and Effective: Informal trade, spatial equilibrium and rapid price transmission in Southern Africa

2.1 INTRODUCTION

Expensive interventionist grain marketing policies in many countries in Southern Africa (SA) are frequently born of uncertainty regarding potential private sector performance (Tschirley and Jayne, 2008; Mwanawamo et al., 2005; Tschirley et al., 2004; and others). These policies, including export bans, exclusively state-owned import rights and other license restrictions, have limited private sector activity and perpetuated the uncertainty over its potential performance.
Many studies conclude that grain markets in SA are not integrated with each other or with world markets, at least partially due to government policies (both market interventions and infrastructural investments) and the transfer costs they impose (Myers and Jayne, forthcoming; Rashid and Minot, 2010; Keats et al., 2010; and others). Despite these policies, however, data collected in recent years show that a considerable amount of staple grain is traded across borders throughout the SA region through informal channels (FEWSNET, 2009). Informal traders deal in small quantities (usually just 50 to 100 kilograms at a time), without trading licenses and with no official record of their transactions. Often, for example, the trader is a farmer on a bicycle, crossing the border to fetch a higher price when liquidating some of the household’s stored grain. With thousands of small informal traders operating, however, the aggregate volume of informal trade within the region is substantial. [24] The fact that these transactions are difficult to regulate suggests that the relationship between informal import and export markets can provide new insights into how international markets within the region might perform in the absence of interventionist policies.

[24] Relatively speaking, formal (i.e. licensed) trade occurs very rarely and data on these transactions are largely unavailable. In the few instances where they are available (e.g. Zambia to DRC data from COMESA), the data suggest that informal trade accounts for at least 78% of the total traded maize in a given month, but usually more than 90% and frequently 100%.

The objective of this essay is to analyze intra-regional price transmission in Southern Africa by determining whether, and under what conditions, long-run spatial price equilibrium exists, and by measuring the speed at which price shocks are transmitted between surplus and deficit markets. A key innovation in this research is the focus on markets that are connected through informal trade across international borders, and the use of data on the volume of this informal trade to inform the analysis. This study will improve the overall understanding of intra-regional price transmission so that more informed policies can be developed.

The objective will be met by employing the Myers and Jayne (forthcoming) extension of the threshold autoregressive (TAR) model. Specifically, we use the amount of inter-regional informal trade as a threshold variable and allow the long-run spatial price equilibrium and the speed of adjustment to differ across potentially multiple trade regimes. As in Myers and Jayne (forthcoming), transfer costs will be explicitly incorporated into the model, rather than assumed constant as in most previous price transmission analyses (e.g. Obstfeld and Taylor, 1997; Goodwin and Piggott, 2001; Sephton, 2003; Balcombe, Bailey and Brooks, 2007; Aker, 2007; others). The number and value of regime-defining thresholds will be selected based on the Gonzalo and Pitarakis (2002) penalty function approach.

We focus on two informal trade routes: 1) Kitwe in Zambia to Kasumbalesa in the Democratic Republic of Congo (DRC) and 2) Cuamba in Mozambique to Liwonde in Malawi. In both cases we find evidence of long-run equilibrium and price transmission that is fairly rapid compared to results from other studies. We find no evidence of multiple regimes, which further indicates that the functioning of informal markets is not as vulnerable to exogenous limitations to trade, such as policy restrictions and transport capacity constraints, as the more formal trade flows studied by Myers and Jayne (forthcoming).

The next section will describe the data to be used in this study, followed by a section detailing the method employed. Then results will be presented and the final section will detail the conclusions of this study.
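The regime-selection idea outlined in this introduction can be sketched in a few lines of code. The sketch below is only an illustration under stated assumptions, not the procedure applied in this study: it compares a single-regime regression with a two-regime regression split on a trade-volume threshold, and it uses a BIC-type penalty as a stand-in for the penalty function of Gonzalo and Pitarakis (2002). The function names, the 15% trimming rule and the restriction to at most one threshold are all hypothetical choices made for brevity.

```python
import numpy as np


def ssr_ols(y, X):
    """Sum of squared residuals from an OLS fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def select_trade_threshold(y, X, trade, trim=0.15):
    """Choose between one regime and two regimes split on trade volume.

    Returns (criterion, threshold). The threshold is None when the
    single-regime model has the lower penalized criterion.
    """
    T, k = X.shape
    # Single-regime benchmark with a BIC-type penalty (assumed penalty form).
    best_ic = np.log(ssr_ols(y, X) / T) + k * np.log(T) / T
    best_threshold = None

    # Candidate thresholds: observed trade volumes, excluding the tails.
    ordered = np.sort(trade)
    candidates = np.unique(ordered[int(trim * T):int((1 - trim) * T)])

    for g in candidates:
        low = trade <= g
        if low.sum() <= k or (~low).sum() <= k:
            continue  # too few observations to fit a regime
        ssr = ssr_ols(y[low], X[low]) + ssr_ols(y[~low], X[~low])
        ic = np.log(ssr / T) + 2 * k * np.log(T) / T  # two regimes, 2k parameters
        if ic < best_ic:
            best_ic, best_threshold = ic, float(g)
    return best_ic, best_threshold
```

In an application, y would hold the dependent variable of the price transmission regression, X its regressors, and trade the monthly informal trade volume. A returned threshold of None corresponds to finding no evidence of multiple regimes under this criterion.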
2.2 DATA

This study will combine data on informal trade volumes, maize grain prices, diesel fuel prices and exchange rates from several sources.

2.2.1 Informal trade volumes

Informal trade flow data are collected and reported as national level statistics by the Famine Early Warning Systems Network (FEWSNET) in the monthly bulletin series “Informal Cross Border Food Trade in Southern Africa” (for example, see FEWSNET, 2009). [25] Due to the nature of this trade and the porous borders throughout the region, collecting accurate informal trade data presents numerous challenges. The FEWSNET method is to station “border monitors” at border points that a consultant study identified as the locations where the largest amounts of informally traded maize cross. These enumerators take a daily count of the number of bags informal traders carry across borders, usually on bicycles. These counts are then converted to tonnage in the monthly reports based on bag weights (the bags used to transport the maize are designed to hold dry maize at specific size/weight ratios; the most common, for example, is the “50 kg bag”). There are two monitored border locations in the area between Kitwe and Kasumbalesa. These account for all the data on maize traded between these countries. On the border between Mozambique and Malawi there are 11 monitor stations, 6 of which are in the area between Cuamba and Liwonde. These 6 stations account for the majority of the informal trade between these countries.

[25] In addition to the bulletins, much of the information in Section 2.2.1 comes from personal communication with Chansa Mushinge, Country FEWSNET Representative for Zambia, and other members of the FEWSNET/Zambia staff, whose assistance is greatly appreciated.

Admittedly, this cannot be considered an exhaustive measurement of all the maize grain informally traded between these countries.
FEWSNET estimates that its monitors count roughly 80% of the trade between Mozambique and Malawi and about a third of the trade from Zambia to DRC. That said, based on consultant studies, FEWSNET officials believe the share of total trade captured through their data collection is consistent over time, so we consider the reported figures to be a good proxy for actual trade volume. Potential measurement error should thus not affect our ability to test for threshold effects based on trade volumes. The data cover a period from July 2004 to August 2010, providing 74 monthly observations.

2.2.2 Retail maize grain prices

Prices from numerous markets in Zambia are collected weekly by the Central Statistical Office (CSO) and reported as monthly averages in nominal Zambian Kwacha (ZMK) through FEWSNET. Time series data covering the same period as the informal trade data are available for over 30 different markets in Zambia. [26] This study will focus on Kitwe, a large town near the DRC border from which informally traded maize is exported. Price data are available for only one market in the DRC, Kasumbalesa, near its southern border with Zambia. Travelling on the tarmac road that connects them, Kasumbalesa is 98 kilometers from Kitwe. Kasumbalesa prices are collected by FEWSNET and reported in nominal US dollars. Price data collection in DRC did not begin until July 2005 (one year after the collection of trade data began), so the price transmission model for Kitwe and Kasumbalesa will be estimated using 62 monthly observations.

[26] A map of all 30 markets is available upon request.

Figure 2.1: Map of markets considered in the spatial price transmission analysis
[Map showing the markets Kasumbalesa, Kitwe, Lusaka, Lilongwe, Liwonde, Blantyre, Cuamba, Nampula and Chimoio, with Zambia, D.R.C., Tanzania, Malawi, Mozambique and Zimbabwe labeled.]

The Ministry of Agriculture and Rural Development in Mozambique collects weekly retail maize grain prices in several northern markets near the Malawi border through its Agricultural Market Information System. This study will focus on Cuamba, in northeastern Mozambique’s surplus-producing region, where much of the informally traded maize is grown. In Malawi, weekly retail maize prices are collected by the Ministry of Agriculture (MOA) as part of the Retail Price Survey. This study will focus on Liwonde, near Malawi’s eastern border, 258 kilometers from Cuamba and along the road from Cuamba to Blantyre (Malawi’s largest southern city). The road between these towns is a combination of graded soil and tarmac. A map of the markets used in this study can be found in Figure 2.1. There is one missing value in the Cuamba price series. This was replaced with an imputed value using the best subset regression of all other available prices, including many not included in the price transmission analysis.

All prices were converted to the currency of the exporting market (either Zambian Kwacha (ZMK) or Mozambican New Metical (MZN)) using monthly averages of daily exchange rates reported at oanda.com and fxtop.com. [27] Although the DRC currency is the Congolese Franc, Kasumbalesa prices are reported in USD by FEWSNET. These prices were converted to Kwacha using the product of the Dollar-to-Franc and Franc-to-Kwacha exchange rates.

[27] Oanda.com does not provide exchange rates for Congolese francs prior to 2006, hence the use of fxtop.com for supplemental data.

2.2.3 Transfer costs

Diesel price is an important time-varying component of transfer costs and will be included using the per-liter diesel price in Lusaka and Nampula for the Zambian and Mozambican models, respectively. [28]
Zambian diesel prices are reported by the Energy Regulation Board (ERB), and Mozambican diesel prices come from Direccao Nacional de Energia. [29] Since fuel prices are tightly regulated in both countries, the Lusaka and Nampula prices should track diesel price movements in the exporting markets of our study very closely. Although diesel prices are an important component of transfer costs, it is unlikely that they alone will control for all transfer costs. For example, costs unrelated to transportation, such as uncertainty premiums or search and price discovery costs, are likely independent of diesel prices. Such costs, however, are often difficult or impossible to observe. To the extent possible, unobserved transfer costs will be controlled for through model specification, as will be discussed in the following section.

[28] Though most informally traded maize moves across borders on bicycles or other human- and animal-powered vehicles, it will have otherwise been moved throughout the region on diesel-powered trucks or mini-vans.
[29] Many thanks to the authors of Tostao and Brorson (2005) for sharing the Mozambique diesel price data.

2.3 METHODS

The study of market integration and price transmission has evolved substantially over the past few decades. [30] This class of models has grown from early work that focused on correlation coefficients between price series (Cummings, 1967; Lele, 1967; Blyn, 1973), to simple cointegration models (Harriss, 1979, and references therein), and more recently to non-linear models that allow for the presence of transfer costs that had previously been ignored. Two main modeling approaches have emerged: the parity bounds model (PBM) and the threshold autoregressive (TAR) models introduced by Balke and Fomby (1997). Traditional TAR models are estimated with a single threshold based on the size of the price margin between the two markets.
Price transmission is allowed to occur at different speeds depending on whether the current price difference is above or below the threshold. The threshold value in such models is interpreted as the transfer cost of getting product from one market to the other. Van Campenhout (2007) argues convincingly for the superiority of TAR, citing, among other concerns, the PBM’s questionable distributional assumptions for changes in price margins that are initially lower than transfer costs. This study will employ a variation on the TAR model.

[30] See Aker (2007) or Van Campenhout (2007) for a more thorough review of the methods for measuring price transmission and market efficiency over the past few decades. Also see Rashid and Minot (2010) for a review of price transmission literature focused on Africa.

As with the PBM, the TAR model has been popular in numerous prior applications (Obstfeld and Taylor, 1997; Abdulai, 2000; Goodwin and Piggott, 2001; Sephton, 2003; Balcombe, Bailey, and Brooks, 2007; Van Campenhout, 2007; Aker, 2007; others). However, traditional TAR models also have their drawbacks. First, nearly all applications impose the (frequently implicit) assumption that transfer costs are constant over time in order to estimate the threshold parameter. [31] This simplifying assumption is often applied with the justification that many of the factors driving transfer costs are unobservable, and it is better to assume transfer costs are constant than to ignore them altogether. While this may be true, traditional TAR models leave no room to include time-varying factors driving transfer costs that are observable, such as the price of fuel. Second, the traditional TAR model allows for a long-run equilibrium when the price margin is below the estimated transfer costs. According to the economic theory motivating such models, however, there is no spatial arbitrage opportunity when transfer costs exceed price margins.
In other words, traditional TAR models assume spatial price equilibrium when economic theory suggests that none should exist. This model misspecification is a potentially costly source of bias in parameter estimates. Third, traditional TAR models are estimated using observations on prices only, while quantities traded are implicitly assumed to exist or not depending on whether price transmission occurs. If data are available, however, trade volume would provide a more theoretically sound variable upon which to base a threshold estimate, since trade (or the possibility of trade) is the actual mechanism through which price transmission occurs.

2.3.1 Price transmission with transfer costs

Myers and Jayne (forthcoming) address many of these shortcomings by introducing a multiple-regime price transmission model with explicit inclusion of transfer costs and using the volume of trade as the threshold variable. To start, they consider a model that explicitly controls for transportation costs.

[31] A notable exception is Van Campenhout (2007), who allows the transfer cost threshold to adjust according to a linear time trend.

Following that work, we first introduce the equilibrium relationship (assuming trade is occurring):

(2.1) $p_t^A = \beta_0 + \beta_1 p_t^B + \beta_2 k_t + u_{At}$

where $p_t^A$ is the price in market A in time t, $p_t^B$ is the price in market B in time t, $k_t$ is the unit cost of transferring a good from market B to market A, and $u_{At}$ is the random shock to equilibrium. If unit costs are unobservable, $k_t$ could alternatively be considered as a vector of observable determinants of transfer costs. Note that this equilibrium equation implicitly assumes market A is the importer and B the exporter. As Myers and Jayne (forthcoming) explain, if the stochastic term is serially uncorrelated, then adjustment to equilibrium is immediate after a shock.
If there is autocorrelation, on the other hand, adjustment is a dynamic process whose duration depends on the structure of autocorrelation in $u_{At}$. We would have perfect spatial arbitrage if $\beta_0 = 0$ and $\beta_1 = \beta_2 = 1$, but this is seldom observed empirically because there are a variety of unobserved factors that may cause deviations from the perfect spatial arbitrage conditions (Myers and Jayne, forthcoming). Moreover, allowing $\beta_0 \neq 0$ helps address some of the difficulties of fully measuring transfer costs by enabling the model to control for any time-constant unobservable transfer costs that may exist.

The long-run relationship between prices is represented by the value of $\beta_1$, but equation (2.1) does not tell us anything about the speed of adjustment if a dynamic process does exist. To understand this, again following Myers and Jayne (forthcoming), we must further specify equations for the origin market price, the unit transfer costs, and the potential autocorrelation structure:

(2.2)  $p_t^B = \alpha_0 + u_{Bt}$

(2.3)  $k_t = \kappa_0 + u_{kt}$

(2.4)  $a(L) u_t = \varepsilon_t$

where $u_t = (u_{At}, u_{Bt}, u_{kt})'$, $\varepsilon_t = (\varepsilon_{At}, \varepsilon_{Bt}, \varepsilon_{kt})'$ is an i.i.d. $(0, \Omega)$ error vector and $a(L) = \sum_{i=0}^{n} c_i L^i$ is a matrix polynomial in the lag operator with $c_0 = I$. Notice that no assumptions have yet been made on the roots of $a(L)$ (that is, on whether prices are stationary or nonstationary). Then, following the derivation outlined in Myers and Jayne (forthcoming), equations (2.1) - (2.4) can be written in a single equation error correction form (SEECM) as follows [32]:

(2.5)  $\Delta p_t^A = \mu + \beta_1 \Delta p_t^B + \beta_2 \Delta k_t + \lambda \left( p_{t-1}^A - \beta_1 p_{t-1}^B - \beta_2 k_{t-1} \right) + \phi_1 p_{t-1}^B + \phi_2 k_{t-1} + \sum_{i=1}^{n} b_i \left( \Delta p_{t-i}^A - \beta_1 \Delta p_{t-i}^B - \beta_2 \Delta k_{t-i} \right) + \delta_1 \Delta p_t^B + \sum_{i=1}^{n} c_i \Delta p_{t-i}^B + \delta_2 \Delta k_t + \sum_{i=1}^{n} d_i \Delta k_{t-i} + \varepsilon_t$

[32] There are a number of valid ways to estimate this system depending on the stochastic properties of the data series; however, the SEECM is most convenient when considering the possibility of multiple regimes.
where the parameters $\lambda$, $b_i$, $c_i$, $d_i$, $\delta_1$, $\delta_2$, $\phi_1$ and $\phi_2$ are functions of the parameters in the error structure defined in equation (2.4). See Myers and Jayne (2010) for a more thorough derivation of these parameters, but note that $\delta_1$ and $\delta_2$ represent the direct correlation between the error terms $\varepsilon_{At}$, $\varepsilon_{Bt}$ and $\varepsilon_{kt}$. The composite intercept term, $\mu$, is a function of $\beta_0$, $\alpha_0$ and $\kappa_0$. As before, the long-run relationship between prices is represented by the value of $\beta_1$, with variable transfer costs controlled for by allowing $\beta_2 \neq 0$, and now we have $\lambda$, which measures whether and how quickly prices will return to equilibrium after a shock.

Equation (2.5) represents a model for spatial price transmission in its general form. It is quite flexible in that variables can be either stationary or nonstationary and cointegrated, but it is under-identified. Identification depends on assumptions regarding the stochastic and cointegrating characteristics of the underlying data. Myers and Jayne (2010) present different sets of assumptions and their corresponding empirical models, summarized in Table 2.1. For now ignoring the possibility of trade thresholds, suppose we assume all prices are nonstationary and a cointegrating vector exists (the cointegration model). Then we can restrict $\phi_1 = \phi_2 = 0$, and allow $\lambda \neq 0$, in which case equation (2.5) is now just identified. In the identified model, n is chosen to eliminate autocorrelation in the residual term, and fitting with NLS provides optimal asymptotic Gaussian inference (Phillips and Loretan, 1991).

Table 2.1: Alternate identification assumptions for spatial price transmission models

Version   Stationary variables        Exogenous variables   Parametric assumptions        Model name
i         None                        None                  $\phi_1 = \phi_2 = 0$         Cointegration
ii        $p_t^A$, $p_t^B$, $k_t$     $p_t^B$, $k_t$        $\delta_1 = \delta_2 = 0$     Stationary

Summarized from Myers and Jayne (2010)
Note that $\delta_1$ and $\delta_2$ are derived from cross correlations in the structural error terms (see Myers and Jayne, 2010). Therefore, allowing $\delta_1 \neq 0$ and $\delta_2 \neq 0$ implies that we need not make exogeneity assumptions regarding $p_t^B$ and $k_t$ [33]. If all variables are nonstationary but not cointegrated, then no long run equilibrium relationship exists and $\lambda = 0$. If there is evidence that all variables are stationary we can restrict $\delta_1 = \delta_2 = 0$ and estimate equation (2.5) allowing $\phi_1 \neq 0$ and $\phi_2 \neq 0$. In this case we do assume export market prices and transfer costs are exogenous (Myers and Jayne, 2010). The characteristics of other potential identifying assumptions are discussed in Myers and Jayne (2010).

[33] In other words, the cointegration form of this model is robust to violation of the “central market” assumption that is made by Ravallion (1986) and many others.

2.3.2 Trade-based thresholds

Part of the objective of this paper is to allow for informal trade based thresholds that allow for structural differences in spatial equilibrium and price transmission, which is not addressed in equation (2.5). There are several reasons one might expect there to be trade-based thresholds for price transmission. For example, we might find a low-level trade threshold if policy restricts trade that would otherwise be encouraged by price incentives. On the other hand, we might find a higher-level regime change as trade volume approaches the capacity limit for transportation between markets (Coleman, 2009; Myers and Jayne, forthcoming).
To introduce the potential for trade-based thresholds we re-write equation (2.5) as $\Delta p_t^A = f(X_t, \theta)$, where $X_t$ is the vector of the relevant prices from equation (2.5) and their lags, and $\theta$ is the relevant parameter vector. The possibility of thresholds can then be expressed as:

(2.6)  $\Delta p_t^A = f(X_t, \theta_m)$, if $\gamma_{m-1} < q_t \leq \gamma_m$, for all $m = 1, \ldots, m^* + 1$

where $q_t$ is the variable upon which the threshold is based (the quantity of maize informally traded in our case). The threshold parameters, $\gamma$, represent the levels of that variable at which equilibrium and price transmission structurally change. The $\theta$ parameters with an m subscript, therefore, are unique to each trading regime. There are $m^*$ thresholds and $m^* + 1$ trading regimes. Note that when trade is unidirectional $\gamma_0 = 0$, and in any case $\gamma_{m^*+1} = \infty$.

Estimating parameters within each regime has already been discussed. Equation (2.6), however, introduces $m^* + 1$ new parameters to identify: $(\gamma_1, \ldots, \gamma_{m^*})$ and $m^*$ itself (the number of thresholds). For any given m, the threshold parameters can be identified employing the Gonzalo-Pitarakis (GP) penalized criterion function approach, where the threshold parameter(s) is (are) chosen to maximize the objective function:

(2.7)  $Q_T(m) = \max_{\gamma} \left[ \ln \left( \frac{SS_T}{SS_T(\gamma)} \right) - \frac{g(T)}{T} K m \right]$

where K is the number of parameters to estimate in the single-regime (i.e. no threshold) model [34], $SS_T$ is the sum of squared residuals for that model and $SS_T(\gamma)$ is the sum of squared residuals of the m-threshold model. The function $g(T)$ is defined to finalize the criterion function. Gonzalo and Pitarakis (2002) simulate results for 5 alternative specifications where BIC, AIC, HQ, BIC2 and BIC3 respectively define $g(T)$ as $\ln T$, $2$, $2 \ln \ln T$, $2 \ln T$ and $3 \ln T$.

[34] For example, in our case, if we’re estimating the n=1 model, K=9.
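As a minimal sketch of how the criterion in equation (2.7) could be evaluated for one candidate set of thresholds, the following Python function computes the penalized value from two sums of squared residuals. The function name `gp_value`, its argument names and the residual sums used in the usage note are illustrative assumptions rather than part of the original analysis. The HQ penalty is taken here as 2 ln ln T, which is an assumption about the form used.

```python
import math

# Penalty terms g(T) for the five criteria; the HQ form is an assumption.
PENALTIES = {
    "BIC": lambda T: math.log(T),
    "AIC": lambda T: 2.0,
    "HQ": lambda T: 2.0 * math.log(math.log(T)),
    "BIC2": lambda T: 2.0 * math.log(T),
    "BIC3": lambda T: 3.0 * math.log(T),
}


def gp_value(ssr_single, ssr_threshold, T, K, m, criterion="BIC"):
    """Penalized criterion for one candidate set of m thresholds.

    ssr_single    : sum of squared residuals of the single-regime model
    ssr_threshold : sum of squared residuals of the m-threshold model
    T             : number of usable observations
    K             : number of parameters in the single-regime model
    """
    g = PENALTIES[criterion](T)
    return math.log(ssr_single / ssr_threshold) - (g / T) * K * m
```

For instance, with hypothetical residual sums of 100 and 50, `gp_value(100.0, 50.0, T=60, K=9, m=1, criterion="AIC")` equals ln(2) minus 0.3. A positive value favors the threshold model over the single-regime model under the chosen penalty.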
The GP approach is based on the fact that the distribution of a conventional likelihood-ratio test statistic to compare such alternatives is unknown due to the nuisance parameter issue under the null hypothesis of no thresholds [35] (Hansen, 1996; Gonzalo and Pitarakis, 2002). For any given m, the GP criterion function approach is analogous to the sup-Wald grid search employed by most single threshold studies for identifying the optimal threshold. Over a range of values of m, the right hand term of the GP criterion penalizes the objective function for model overparameterization. Though there is not an explicit test available for the existence, number or value of multiple thresholds, the GP criterion is the most direct method for allowing the data to discover the “best” multiple-threshold model.

[35] Hansen (2000) describes a Monte-Carlo approach for regime testing when m=1; however, implementation of that test has proven infeasible with our data, likely due to collinearity resulting from small within-regime sample sizes (Hansen, 2011). This was true even when the threshold was forced to be held at the median value of the threshold variable (i.e. when the sample was split in half).

At this stage, however, it may appear as though this method faces a paradox. To identify the optimal thresholds, we need some functional form of the price relationship (i.e. we need to be able to compute $SS_T(\gamma)$ to employ the GP criterion). However, in order to properly identify the functional form, we need to know which value of $q_t$ denotes the optimal threshold (i.e. so that we can examine the stochastic properties of prices within each regime). Fortunately, although each would provide different coefficient estimates for long-run equilibrium and speed of price transmission, each model discussed in Table 2.1 is just identified, and thus for any given $\gamma$ each provides the same $SS_T(\gamma)$ value. Therefore, we can optimize equation (2.7) to discover
optimal threshold values by applying naïve identification assumptions to equation (2.5), at no cost. In fact, the “Stationary” model from Table 2.1 reduces to a linear form, so we can employ the OLS regression of $p_t^A$ on $p_t^B$, $k_t$ and the appropriate number of differences and lags to execute maximization of equation (2.7) (Myers and Jayne, forthcoming).

Finally, to identify $m^*$, first note that it will be some integer such that $m^* \in \{0, 1, \ldots, M\}$, where

$M = \mathrm{int}\left[ \frac{T - n - 1}{K + 1 + \pi (T - n - 1)} \right] - 1$.

The numerator within the integer function, $T - n - 1$, is the number of usable observations. The denominator, $K + 1 + \pi (T - n - 1)$, is the minimum number of observations we will allow within each regime, where K+1 is the number of observations needed for each regime to have enough degrees of freedom to estimate with inference and $\pi \in (0, 1)$ determines the additional share of the sample which will be included in each regime to avoid over-parameterization. Following the recommendation of Balke (2000) we will set $\pi = 0.15$. The resulting integer is thus the maximum number of regimes into which we could split our sample. We subtract 1 because there is one less threshold than there are regimes. Then, the GP criterion suggests that we can identify the optimal number of thresholds according to:

(2.8)  $m^* = \arg\max_{0 \leq m \leq M} Q_T(m)$

In practice this approach can be employed either sequentially or non-sequentially. It should be noted that in our case we must assume $q_t$ is contemporaneously exogenous. We argue $q_t$ can be considered exogenous because market information faces delays when travelling across borders in informal markets and informal traders may face delays in getting to market. Therefore, it is reasonable to assume trading decisions in time t are made based on prices in time t-1 [36].

2.3.3 Non-sequential threshold estimation

Sequential threshold estimation as described in Gonzalo and Pitarakis (2002) is asymptotically consistent and less computationally demanding.
However, due to our small sample size we can apply non-sequential estimation, which should lead to more accurate small sample results. The approach used for non-sequential estimation is:

1. Identify the GP-optimal threshold of a 2 regime model, initially assuming n=1.
2. Identify the GP-optimal thresholds of a 3 regime model, disregarding the information learned in step 1, and re-examining all possible GP criterion values for a lower and upper threshold.
3. Repeat step 2, disregarding previous estimates and adding a potential regime each time. Stop after estimating the model with M+1 regimes.
4. Compare the GP criterion values of the optimal multiple-threshold models identified in steps 1-3, and choose the model with the highest value (again, if no GP values are greater than 0, all multiple-threshold models should be rejected in favor of the single-regime model).
5. Analyze stochastic properties of variables within each regime.
6. Estimate the appropriate SEECM within each regime, again restricting n=1.
7. Test for autocorrelation in the residuals.
8. If autocorrelation persists, add a lag and return to step 1.
9. Analyze results.

[36] Obviously, this argument would be even stronger with higher frequency data. If this is not convincing or accurate, it may be more appropriate to choose the threshold parameter using lagged values of q.

The advantage of non-sequential selection is that it will result in a model that is at least no “worse”, and may identify a “better” model than the sequential estimation, which maintains previously identified threshold values when additional regimes are considered. This is because all potential models that could be identified sequentially will also be considered when identifying models non-sequentially. The cost is that non-sequential selection is more computationally demanding, potentially requiring the comparison of many thousands more models.
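The threshold search in steps 1 through 4 above can be sketched as an exhaustive grid search. The Python code below is only an illustration under strong simplifying assumptions: the within-regime model is a constant mean (a stand-in for the SEECM regression), the BIC penalty ln T is used by default, and all function and argument names are hypothetical. For each number of thresholds it re-examines every admissible combination of candidate values, rather than keeping thresholds found in an earlier round.

```python
import math
from itertools import combinations


def ssr_of_means(values):
    """SSR from fitting a constant; a stand-in for the within-regime SEECM fit."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_ssr(y, q, thresholds):
    """Total SSR and smallest regime size when q is split at the thresholds."""
    bounds = [-math.inf, *thresholds, math.inf]
    total, sizes = 0.0, []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        # Regime membership follows the rule lo < q_t <= hi.
        regime = [yi for yi, qi in zip(y, q) if lo < qi <= hi]
        sizes.append(len(regime))
        total += ssr_of_means(regime)
    return total, min(sizes)


def nonsequential_search(y, q, K, M, min_obs, g=math.log):
    """Return (best GP value, thresholds); () means the single-regime model."""
    T = len(y)
    ssr_single = ssr_of_means(y)
    best = (0.0, ())  # the single-regime model has a GP value of 0
    candidates = sorted(set(q))
    for m in range(1, M + 1):
        # Every combination is re-examined for each m (non-sequential).
        for thresholds in combinations(candidates, m):
            ssr, smallest = split_ssr(y, q, thresholds)
            if smallest < min_obs or ssr <= 0.0:
                continue
            gp = math.log(ssr_single / ssr) - (g(T) / T) * K * m
            if gp > best[0]:
                best = (gp, thresholds)
    return best
```

In this sketch the number of candidate models grows combinatorially with the number of thresholds, which illustrates the computational cost noted above.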
In the univariate-based multiple-threshold model this is not an overwhelming requirement. When estimating a multivariate-based multiple-threshold model, on the other hand, the demands of this approach grow exponentially with the number of variables allowed to determine thresholds.

2.3.4 Analyzing Results

In price transmission studies the primary unit of analysis is often the half-life, h, of a price shock, or the amount of time it takes for half of the adjustment back to long-run equilibrium to occur after a shock. Half lives are computed either as a function of regression results, where $h = \ln(0.5)/\ln(1 + \lambda)$ (Van Campenhout, 2007, and others), or via simulation using regression results (Myers, 2009, and others). It is also useful to examine the effect of a shock graphically, where we can see how simulated markets in equilibrium would react to a one-time permanent shock in the price of the exporting market. In addition to quantifying the overall time it takes for shocks to transfer, graphical simulations can show us the path of adjustment back to equilibrium, accounting for the dynamic processes not explicitly represented by $\beta_1$ and $\lambda$.

2.4. RESULTS

2.4.1 Results from Kitwe, Zambia and Kasumbalesa, The DRC

We first look at the relationship between prices in Kitwe, an urban center in northern Zambia, and Kasumbalesa, a smaller urban center located 98 kilometers away and across the border in The DRC. The road from Zambia’s surplus producing regions to the export market in Kasumbalesa runs through Kitwe. Therefore, barring some other prohibitive market conditions, it would seem quite reasonable to find price transmission between these markets when the opportunity for spatial arbitrage exists. Prices and trade data are examined descriptively in Figure 2.2, where we plot time on the horizontal axis, trade volume on the left vertical axis, and price difference on the right vertical axis with a horizontal reference line at zero.
Figure 2.2: Price difference between Kitwe and Kasumbalesa and informal trade. [Time series plot, 2005m1 to 2010m1. Left axis: trade volume (mt). Right axis: price difference, import minus export (ZMK/kg). Series: trade volume and price gap.]

Graphically, it appears there is generally comovement between price difference and informal trade volume, though quantities vary greatly. In the period from the end of 2005 into 2006, it seems that as price difference dropped to zero, trade volume decreased at a similar rate. Throughout 2006 spikes in the price difference appear to have been met with similar spikes in trade, though evidently not enough to pull the price difference down to close to zero. Then, around March/April in 2007 (the beginning of a good harvest season in Zambia) informal trade spikes and the previously positive price difference quickly drops to zero. When it does, trade drops, then price differences slightly rise, followed by trade which nudges the difference back to around zero, and so on into 2008. During 2008 there is a prolonged period where trade is relatively low, reaching a minimum of nearly zero exports in March 2009, and absolute price differences grow fairly consistently. Notice, however, the price difference is negative (DRC prices are lower) and since DRC is not a surplus producer it is reasonable that the trade would be low and prices may have a different long run equilibrium. DRC is not a self-sufficient maize producer, so this was likely a period when that country was importing from elsewhere. There is no empirical evidence suggesting maize trade (formal or informal) ever flowed from DRC to Zambia during this period, which anecdotal evidence confirms (Mushinge, 2011). Around mid-2009 price difference again jumps above zero, which corresponds to a similar spike in informal trade volume.
After a brief dip back to nearly zero, the price difference between these markets bounces around 10 cents per 50kg bag, with corresponding movements in the volume of informal trade throughout the remainder of the period covered by our data. While this apparent co-movement is interesting, it does not directly address our objective of determining whether there are long-run equilibrium states between these markets and whether price transmission actually occurs. To that end, we now turn to estimating the multiple-threshold price transmission model.

With the initial one lag model there are 9 parameters to estimate and 60 usable observations. Based on the criterion described in Section 2.3, this implies we can have two thresholds and three regimes at most. Results from the GP selection process are presented in Table 2.2. The first potential threshold identified is at the trade level of 1095 metric tons of maize. According to the BIC criterion (GP = 0.03) this model is slightly superior to the single regime model, while the AIC and HQ criteria lend a bit more support with GP values of 0.34 and 0.22 respectively. On the other hand, the BIC2 and BIC3 criteria, which more heavily penalize over-parameterization, strongly favor the single-regime model (respectively returning GP values of -0.59 and -1.20).

The three regime model identifies optimal thresholds at 555 and 1095.3 metric tons of informally traded maize [37]. The BIC, BIC2 and BIC3 criteria all have their lowest GP values for this model (-0.09, -1.31 and -2.54 respectively), and all would favor the single-regime over the triple-regime. On the other hand, the AIC and HQ criteria both have the highest GP values for the three-regime model (0.54 and 0.30 respectively).

[37] Interestingly, we notice that, if we were performing this analysis sequentially and if we had accepted the first threshold, our results would not have differed from the non-sequential estimation presented here.
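The bound of two thresholds quoted above follows from the expression for M in Section 2.3.2. A minimal Python sketch of that arithmetic is given below. The function name `max_thresholds` and the argument name `share` are illustrative assumptions, and `share` defaults to the value of 0.15 adopted from Balke (2000).

```python
def max_thresholds(usable_obs, K, share=0.15):
    """Maximum number of thresholds M given the usable observations (T - n - 1).

    Each regime must hold at least K + 1 observations plus an additional
    share of the usable sample; M is one less than the number of regimes
    that fit into the sample.
    """
    min_per_regime = K + 1 + share * usable_obs
    return int(usable_obs / min_per_regime) - 1
```

With 9 parameters and 60 usable observations, each regime needs at least 19 observations, so at most three regimes (two thresholds) fit: `max_thresholds(60, 9)` returns 2.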
Table 2.2: GP threshold selection for price transmission between Kitwe & Kasumbalesa

                Threshold value(s)               Penalty criterion function (GP values)
Model           (metric tons of traded maize)    BIC        AIC       HQ        BIC2       BIC3
Two regime      1095.3                            0.0268    0.3409    0.2181    -0.5874    -1.2015
Three regime    555.0, 1095.3                    -0.0858    0.5425    0.2968    -1.3141    -2.5424

Unfortunately, these results are not definitive. Each of the models being considered, the single regime model and both potential threshold models, is supported by at least one of these criteria, leaving us to decide which criterion to trust. In simulated examples presented by Gonzalo and Pitarakis (2002) the BIC2 and BIC3 criteria have the best performance by far when their DGM has no threshold (correctly identifying the model in nearly 100% of the simulations). In the same simulations the AIC and HQ criteria perform rather poorly. When their DGM has two regimes, the criteria that most frequently identify the model correctly are AIC, HQ and BIC (in that order), but the BIC2 criterion also performs fairly well. Based on these results, they conclude “BIC and to a lesser extent BIC2 display the best overall performance, with an excellent ability to point to the true model even for moderately small sample sizes,” such as the one used in this study.

On an un-weighted average the BIC and BIC2 criteria favor the single-regime model. Even if we were to only consider the BIC criterion, support for the threshold model is fairly weak (i.e. the GP is very close to zero). In such a case Hansen’s (2000) bootstrap test would be an informative addition to the evidence on which model is best, but performing this test was not feasible with our data [38]. This is likely due to within regime collinearity stemming from our fairly small sample size (Hansen, 2011). So, based on the sum of evidence we will proceed estimating our model without a trade threshold.
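The comparison across criteria described above can be tabulated with a short Python sketch. The GP values are those reported in Table 2.2, with the single-regime model assigned a value of 0 because all threshold models are rejected when no GP value exceeds zero. The dictionary and function names are illustrative assumptions.

```python
# GP values from Table 2.2; the single-regime benchmark is 0 for every criterion.
GP_VALUES = {
    "Single regime": {"BIC": 0.0, "AIC": 0.0, "HQ": 0.0, "BIC2": 0.0, "BIC3": 0.0},
    "Two regime": {"BIC": 0.0268, "AIC": 0.3409, "HQ": 0.2181,
                   "BIC2": -0.5874, "BIC3": -1.2015},
    "Three regime": {"BIC": -0.0858, "AIC": 0.5425, "HQ": 0.2968,
                     "BIC2": -1.3141, "BIC3": -2.5424},
}


def preferred_model(criterion):
    """Model with the highest GP value under a single criterion."""
    return max(GP_VALUES, key=lambda model: GP_VALUES[model][criterion])


def preferred_by_average(criteria):
    """Model with the highest un-weighted average GP value over several criteria."""
    def average(model):
        return sum(GP_VALUES[model][c] for c in criteria) / len(criteria)
    return max(GP_VALUES, key=average)
```

Under these values BIC selects the two-regime model, AIC and HQ select the three-regime model, and BIC2 and BIC3 select the single-regime model. Averaging BIC and BIC2 gives about -0.28 for the two-regime model and -0.70 for the three-regime model, so the single-regime model is preferred.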
[38] Specifically, when running Hansen’s Gauss code for the bootstrap test the program returned the error message “matrix not positive definite.” Personal communication with Hansen (2011) confirms this “occurs most typically when you try to estimate a regression with too few observations in each ‘regime’ defined by the threshold.” This occurred for every minimum within-regime sample size setting attempted.

Next we examine the stochastic properties of the prices in our model using the full sample under the single-regime model. Table 2.3 summarizes the p-values from augmented Dickey-Fuller (ADF), augmented Phillips-Perron (APP) and Kwiatkowski, Phillips, Schmidt, and Shin (KPSS) unit root tests as well as the Engle Granger cointegration test.

Table 2.3: Diagnostic tests for price series stochastic properties (Kitwe & Kasumbalesa)

Test                               Kasumbalesa   Kitwe     Diesel
Unit root (Non-stationary null)
  ADF                              0.78          0.40      0.20
  ADF, trend                       0.55          0.18      0.24
  APP                              0.43          0.34      0.47
  APP, trend                       0.17          0.53      0.48
Unit root (Stationary null)
  KPSS                             <0.03         <0.03     <0.05
  KPSS, trend                      <0.01         <0.10     <0.101
Cointegration
  EG                               0.00

Notes: MacKinnon (1994) approximate p-values reported for ADF and APP tests. Relative p-values for KPSS tests are based on approximate critical values reported in Kwiatkowski et al. (1992).

In each case these tests fail to reject the non-stationary (or non-trend-stationary) null hypothesis using either the ADF or APP tests. Furthermore, the majority of the KPSS tests reject the stationary null hypothesis at the 10% level or lower, again with or without including a trend. The only exception is the diesel price, which narrowly fails to reject the trend stationary null hypothesis at the 10% level and would reject only at roughly the 10.1% level (the test statistic is 0.118 against the 10% critical value of 0.119). The Engle Granger cointegration test of all regressors in the dynamic model strongly rejects the hypothesis of nonstationary residuals, indicating a cointegration relationship exists.
All together, these results indicate we should estimate equation (2.5) for these markets applying the “cointegration model” assumptions described in Table 2.1. These results are presented in Table 2.4. First note from the Ljung-Box tests reported at the bottom of this table that there is no evidence of autocorrelation in the residual terms, and thus no need to add further lags to the model. Our estimate of $\beta_1$ is 0.9988 with a $\beta_1 = 0$ null hypothesis p-value of 0.02. The t-test for whether this estimate is significantly different from 1 yields a p-value of 0.998 (i.e. this estimate is significantly different from zero, but not significantly different from one). The 95% confidence interval for this estimate is 0.13 to 1.86. It is noteworthy that the coefficient estimate is almost exactly what one would expect if we have controlled for transfer costs sufficiently and price transmission occurred over the long-run through competitive arbitrage.

The estimate for the speed of price transmission parameter, $\lambda$, is -0.2236, which translates into a half life of 2.74 months. The 95% confidence interval for this estimate is -0.424 to -0.023, which translates to a half life interval of 1.26 to 29.89 months. Surprisingly our estimate of $\beta_2$ is not statistically significant at any meaningful level, suggesting that diesel prices do not explain any of the difference between Kitwe and Kasumbalesa maize prices. This may be a reflection of the fact that, although they are separated by a national border, these markets are separated by fewer than 100 km of tarmac. Therefore, diesel costs for transportation between them are not very high and will not change much with diesel price changes. Furthermore, at least a portion of the
transportation is frequently done on the back of a bicycle, the cost of which is not directly associated with diesel price. Other parameter estimates have no significant economic interpretation individually (Myers and Jayne, forthcoming).

Table 2.4: Price transmission estimation results for Kitwe & Kasumbalesa

Parameter                           SEECM under cointegration   (Std. error)   [95% Confidence Interval]
µ: Constant                         129.776                     (212.85)       [-297.55, 557.10]
β1: Long-run relationship           0.9988                      (0.43)         [0.13, 1.86]
β2: Diesel prices                   -0.0727                     (0.16)         [-0.40, 0.25]
λ: Speed of transmission            -0.2236                     (0.10)         [-0.42, -0.02]
b1                                  -0.0617                     (0.14)         [-0.34, 0.22]
δ1                                  -0.6859                     (0.41)         [-1.50, 0.13]
c1                                  0.1633                      (0.21)         [-0.26, 0.59]
δ2                                  0.1162                      (0.18)         [-0.24, 0.47]
d1                                  0.0614                      (0.10)         [-0.14, 0.26]
Half-life (months) (a)              2.74
Goodness of fit:
  R2                                0.19
  Adjusted R2                       0.06
Residual autocorrelation:
  Q(1)                              0.86
  Q(3)                              0.62
  Q(5)                              0.49
  Q(7)                              0.43

Note: Standard errors in parentheses. (a) Half life is calculated as ln(0.5)/ln(1+λ). Residual autocorrelation tests (Q(j)) are p-values for portmanteau tests for j-th degree white noise in the residuals. Insignificant results suggest white noise (i.e. no autocorrelation).

Figure 2.3 simulates a year-long relationship between Kitwe and Kasumbalesa prices. The simulation is initiated holding Kitwe maize price, represented by the dashed line, and Lusaka diesel price (not shown) at their data means (1,171 ZMK/kg and 5,799 ZMK/liter respectively). The predicted equilibrium price of maize in Kasumbalesa is 1,329 ZMK/kg, which is reasonably close to the actual mean price (1,316 ZMK/kg). In the third month a one-time permanent increase in Kitwe price is introduced in the amount of 150 ZMK/kg (the average month-to-month change in Kitwe price over the sample period).

Figure 2.3: Simulated shock to equilibrium prices in Kitwe and Kasumbalesa. [Line plot over months 1 to 12. Vertical axis: price ($/kg), 1000 to 1500. Series: Kitwe maize and Kasumbalesa maize.]
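The half-life reported in Table 2.4 can be checked directly from the speed-of-transmission estimate using the formula h = ln(0.5)/ln(1 + λ). The short Python sketch below is illustrative only; the function name `half_life` is an assumption, and the inputs are the rounded estimates reported in the text.

```python
import math


def half_life(speed):
    """Half-life of a price shock, in periods, for a speed parameter in (-1, 0)."""
    if not -1.0 < speed < 0.0:
        raise ValueError("speed of adjustment must lie strictly between -1 and 0")
    return math.log(0.5) / math.log(1.0 + speed)


# Point estimate and the faster end of the 95% confidence interval.
print(round(half_life(-0.2236), 2))  # 2.74 months
print(round(half_life(-0.424), 2))   # 1.26 months
```

Because the half-life grows without bound as the speed parameter approaches zero, the slow end of the interval is very sensitive to rounding of the estimate, which is why the upper half-life bound is so much wider than the lower one.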
We see fairly rapid price shock transmission, which was expected based on the half-life calculation. Kasumbalesa prices begin to respond in the month of the shock itself, and by the month after the shock 67% of the shock has been transferred. From this month onward the shock adjustment gradually levels off so that by the 8th month after the shock (month 12 in Figure 2.3), Kasumbalesa price has effectively reached the new equilibrium price around 1,479 ZMK/kg.

Although there is not strong evidence of informal trade based threshold effects, it is clear that the informal trade that regularly takes place draws prices in these spatially separated markets towards an equilibrium, and that the speed of adjustment is rapid. We will compare the speed of transmission estimated here to that between Mozambique and Malawi as well as to results from other studies in the following sub-section.

2.4.2 Results from Cuamba, Mozambique and Liwonde, Malawi

Next we will examine the relationship between prices in Cuamba, Mozambique, and Liwonde, Malawi. Cuamba is situated in one of the highest surplus production regions in Mozambique, which is the source of a major trade flow into Malawi (Haggblade et al., 2008). Liwonde is inside the Malawian border along the road that runs from Cuamba to Blantyre, Southern Malawi’s largest city. The majority of the informal maize traded between Malawi and Mozambique crosses the border area between Cuamba and Liwonde (Mushinge, 2011), so one would expect to see a long-run equilibrium between these market prices and price transmission if the markets are working well.

Prices and trade data are examined descriptively in Figure 2.4. Unlike the case with Zambia and The DRC, there is occasionally reverse trade flow between Malawi and Mozambique (i.e. informal Malawian traders occasionally sell to Mozambique).
Thus, in Figure 2.4, we plot time on the horizontal axis, total trade volume and the share of total trade flowing from Malawi to Mozambique on the left vertical axis, and price difference on the right vertical axis. There is a horizontal reference line indicating zero price difference.

Figure 2.4: Price difference between Liwonde and Cuamba and informal trade levels. [Time series plot, 2004m1 to 2010m1. Left axis: trade volume (mt). Right axis: price difference, import minus export (MZN/kg). Series: total trade (mt), price gap (MZN/kg) and Malawian exports (mt).]

The fact that our data reports simultaneous bi-directional trade presents two concerns for this study. First we must explain why this would occur. Secondly we must discuss whether this has an impact on the appropriateness of our empirical approach.

In theory, trade would only occur when there is opportunity for spatial arbitrage, or when the price difference between spatially separated markets exceeds transfer costs. This obviously rules out any economic justification for simultaneous bi-directional trade between integrated competitive markets. Nevertheless, we find that informal trade flows in both directions 66% of the time. Data aggregation could partially explain this apparent paradox. That is, we are using monthly data on prices and trade flows, but prices can fluctuate within a month. Thus, on any given day price incentives may encourage trade to flow to Mozambique, even if during the majority of the month the incentive is for trade to flow to Malawi.

There are also possible explanations beyond data aggregation. For example, the theory precluding bi-directional trade assumes perfect information, which is most likely not the case for informal traders on the Malawi/Mozambique border. In fact, it would be a very strong assumption to say the quality of price information is even homogenous across traders, much less perfect.
Thus, it is feasible that some traders make the decision to export grain based on ill-informed price expectations and counter to the behavior of their colleagues. In fact, imperfect information is one of the reasons we expect to see dynamics in the price transmission between markets at all.

The second issue is whether our model is appropriate when there is bi-directional trade. One might be concerned that bi-directional trade implies bi-directional price transmission [39]. The structural model in equations (2.1) - (2.4), on the other hand, has Malawian market price being explained by Mozambican market price, but not vice versa. Recall, however, the autocorrelation structure in equation (2.4) allows us to proceed without making any exogeneity assumptions, so our model does allow for bi-directional price transmission implicitly (i.e. through the correlation in the error terms; see Myers and Jayne, forthcoming). We thus conclude that equation (2.5) is appropriate to carry out the remainder of our analysis. The only difference is that we will now use net trade as our potentially threshold-defining variable [40].

[39] In fact, bi-directional price transmission could exist even if trade was uni-directional but the importing and exporting markets were comparable in economic size. Specifically, if factors affecting domestic supply and demand in one market occur simultaneously with and independently of factors affecting the domestic supply and demand of the market with which it trades, both prices could be disturbing the spatial equilibrium.

Table 2.5: GP threshold selection for price transmission between Cuamba & Liwonde

                Threshold value(s)               Penalty criterion function (GP values)
Model           (metric tons of traded maize)    BIC        AIC        HQ         BIC2       BIC3
Two regime      2398.0                           -0.2943    -0.0097    -0.1230    -0.8289    -1.3634
Three regime    2398.0, 5608.6                   -0.4913     0.0779    -0.1487    -1.5604    -2.6296
With the initial model there are 9 parameters to estimate and we have 72 usable observations. Based on the criterion described in Section 2.3, this implies we can once again have 2 thresholds and 3 regimes at most. Results from the GP selection process are presented in Table 2.5. These results are far less ambiguous than those for the Kitwe/Kasumbalesa model and almost exclusively favor the single-regime model. The only exception comes from comparing the three-regime model to the single-regime model according to the AIC criterion. This produces a GP value of less than 0.08, suggesting the threshold model is weakly superior. Based on the results of every other criterion, however, we conclude the single-regime model is most appropriate. Once again, it would be informative to compare this conclusion to the results of a Hansen (2000) test, but this was once again not feasible due to sample size (Hansen, 2011).⁴¹

⁴⁰ Total trade is an alternative threshold variable, but we note that using this does not change the conclusions of our GP selection process described here.

⁴¹ Once again, these tests invariably returned the “matrix not positive definite” error message in Gauss, which according to Hansen (2011) is most likely caused by small within-regime sample sizes.

Table 2.6: Diagnostic tests for price series stochastic properties (Cuamba & Liwonde)

Test                               Liwonde    Cuamba     Diesel
Unit root (Non-stationary null)
  ADF                              0.32       0.05       0.35
  ADF, trend                       0.26       0.11       0.13
  APP                              0.27       0.04       0.43
  APP, trend                       0.17       0.09       0.09
Unit root (Stationary null)
  KPSS                             >0.03      >0.10      >0.10
  KPSS, trend                      >0.10      >0.10      >0.10
Cointegration
  EG                               0.00

Notes: MacKinnon (1994) approximate p-values reported for ADF and APP tests. Relative p-values for KPSS tests are based on approximate critical values reported in Kwiatkowski et al. (1992).

Next we examine the stochastic properties of the prices in our model using the full sample under the single-regime model.
Table 2.6 summarizes the p-values from ADF, APP and KPSS unit root tests as well as the EG cointegration test. For the Liwonde maize prices, tests fail to reject the non-stationary null hypothesis using either the ADF or APP tests with and without including trends, and the KPSS tests reject the stationary and trend-stationary null hypotheses at the 10% level or lower. For Cuamba maize prices, ADF and APP do reject the non-stationary null at the 4-5% level, but only reject the non-trend-stationary hypothesis at the 11% and 9% levels respectively. Conversely, the KPSS tests reject the stationary and trend-stationary null at the 10% level. Diesel price results fail to reject the non-stationary null (although the APP results reject the non-trend-stationary hypothesis at the 9% level) and reject the stationary null at the 10% level. If we accept that all variables are non-stationary, the Engle-Granger cointegration test strongly rejects the hypothesis of non-stationary residuals, indicating a cointegration relationship exists.

The sum of evidence in these results suggests we should once again continue our analysis applying the cointegration model assumptions described in Table 2.1 to estimate equation (2.5). However, the evidence is not as unambiguous as in the Kitwe/Kasumbalesa case, so we might conclude that Cuamba price is stationary while Liwonde and diesel prices are non-stationary. In such a case there could be no long-run equilibrium, but this scenario is unlikely since spatial arbitrage continuously takes place between these markets. The existence of a statistically significant long-run equilibrium with price transmission would cast further doubt on such a scenario.

Results of these estimations are presented in Table 2.7. First note from the Ljung-Box tests reported at the bottom of this table that there is no evidence of autocorrelation in the residual terms, and thus no need to add further lags to the model.
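For readers unfamiliar with the portmanteau test, the following is a minimal sketch of the Ljung-Box Q statistic and its chi-square p-value, written with the Python standard library only. The residual series is hypothetical, the function names are illustrative, and using the number of lags as the degrees of freedom is a simplifying assumption (applications to fitted models sometimes adjust the degrees of freedom for estimated parameters). It is not the code used to produce Table 2.7.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(series, max_lag):
    """Ljung-Box statistic: n(n+2) * sum over k=1..max_lag of r_k^2 / (n - k)."""
    n = len(series)
    return n * (n + 2) * sum(
        autocorrelation(series, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    half = x / 2.0
    if df % 2:                      # odd df: start from the df = 1 tail
        tail, k = math.erfc(math.sqrt(half)), 1
    else:                           # even df: start from the df = 2 tail
        tail, k = math.exp(-half), 2
    while k < df:                   # recurrence that moves from df = k to df = k + 2
        tail += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return tail


# Hypothetical residuals, for illustration only.
residuals = [0.3, -0.1, 0.2, -0.4, 0.1, 0.0, -0.2, 0.3, -0.1, 0.1, -0.3, 0.2]

for lags in (1, 3, 5, 7):
    q = ljung_box_q(residuals, lags)
    # Assumption: degrees of freedom equal the number of lags tested.
    print(f"Q({lags}) = {q:.3f}, p-value = {chi2_sf(q, lags):.3f}")
```

A large p-value means the test fails to reject white-noise residuals, which is the reading applied to the Q(1), Q(3), Q(5) and Q(7) entries in Table 2.7.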
Our estimate of β1 is 0.822 with a p-value of 0.00 and a 95% confidence interval of 0.52 to 1.12. This coefficient estimate is also close to what one would expect if we have controlled for transfer costs sufficiently and these markets were integrated over the long run through competitive arbitrage. We fail to reject the competitive arbitrage null hypothesis (i.e. H0: β1 = 1) with a p-value of 0.24. We again find fairly rapid price transmission, with a speed-of-transmission estimate of -0.246 that translates into a half life of 2.46 months. The 95% confidence interval is -0.416 to -0.076, translating to a half life range of 1.29 to 8.94 months. Note this coefficient estimate is similar to the Kitwe/Kasumbalesa model estimate; however, results are more precise in this case. This may represent a stronger relationship between Malawi and Mozambique, but we also note the 20% difference in sample size, which may also improve efficiency. Unlike the model for Kitwe and Kasumbalesa, here we find a significant estimate of β2. Given that these markets are much farther apart than Kitwe and Kasumbalesa, it is not surprising to see diesel price is more significant in this model. Other parameter estimates have no discernible economic interpretation individually (Myers and Jayne, forthcoming).

Table 2.7: Price transmission estimation results for Cuamba and Liwonde (SEECM under cointegration)

Parameter                        Estimate          [95% Confidence Interval]
µ: Constant                      -1.864 (0.76)     [-3.38, -0.35]
β1: Long-run relationship         0.822 (0.15)     [0.52, 1.12]
β2: Diesel Prices                 0.336 (0.12)     [0.11, 0.57]
Speed of transmission            -0.246 (0.09)     [-0.42, -0.07]
1                                -0.342 (0.13)     [-0.34, 0.16]
2                                -0.310 (0.14)     [-0.63, -0.05]
b1                               -0.092 (0.08)     [-0.11, 0.19]
c1                                0.040 (0.11)     [-0.54, -0.08]
d1                               -0.040 (0.08)     [-0.20, 0.12]
Half-life (months) a              2.46
Goodness of fit:
  R2                              0.62
  Adjusted R2                     0.57
Residual autocorrelation:
  Q(1)                            0.94
  Q(3)                            0.27
  Q(5)                            0.55
  Q(7)                            0.72

Note: Standard errors in parentheses. a) Half life is calculated as ln(0.5)/ln(1 + speed-of-transmission estimate). Residual autocorrelation tests (Q(j)) are p-values for portmanteau tests for jth degree white noise in the residuals. Insignificant results suggest white noise (i.e. no autocorrelation).

Figure 2.5: Simulated shock to equilibrium prices in Cuamba and Liwonde [line chart; vertical axis: price ($/kg), 5.00 to 8.50; horizontal axis: months 1 to 12; series: Mozambique maize, Malawi maize]

Figure 2.5 simulates a year-long relationship between Cuamba (dashed line) and Liwonde (solid line) prices. The simulation is initiated holding Cuamba maize price and Nampula diesel price (not shown) at their data means (5.63 MZN/kg and 27.91 MZN/liter respectively). The predicted equilibrium price of maize in Liwonde is 6.40 MZN/kg, which is reasonable but slightly higher than the actual mean price (6.18 MZN/kg). In the third month a one-time permanent increase in Cuamba price is introduced in the amount of 0.84 MZN/kg (the average month-to-month change in Cuamba price over the sample period).

Once again based on the half-life calculation, we expect to see fairly rapid price transmission. By the month after the change in Cuamba price, 77% of the shock has already transferred, which is slightly more than our estimation of Kitwe/Kasumbalesa price transmission. From the month after the shock onward the shock adjustment gradually levels off so that by the 8th month after the shock (month 12 on Figure 2.5), Liwonde price has effectively reached the new equilibrium price of around 7.09 MZN/kg.

Although there is not strong evidence of informal-trade-based threshold effects, it is clear that the informal trade that regularly takes place is drawing these spatially separated markets towards an equilibrium and that price transmission is fairly rapid. For example, Van Campenhout (2007) estimates maize grain price transmission half lives within Tanzania using the traditional TAR model.
When price difference exceeds estimated transfer costs, that study estimates half lives ranging from 3.7 weeks when markets are relatively close together (355 km) up to 11.6 weeks for markets that are farther apart (503 km). Myers and Jayne (forthcoming) estimate speed of transmission between South Africa and Zambia at a half life of 1.2 to 7.8 months, depending on trade regimes. Compared to these results and other studies, the half lives of 2.74 and 2.46 months for Kitwe/Kasumbalesa and Cuamba/Liwonde represent fairly rapid price transmission.

2.5 SUMMARY AND CONCLUSION

The objective of this study was to analyze intra-regional market price transmission in Southern Africa, focusing on informal trading partners in Zambia and the DRC as well as Mozambique and Malawi. Specifically, we sought to determine whether and under what conditions long-run spatial price equilibrium exists and the speed at which price shocks are transmitted between Kitwe and Kasumbalesa (in Zambia and DRC respectively), and Cuamba and Liwonde (in Mozambique and Malawi respectively).

The majority of the existing literature suggests Southern African markets are relatively isolated from outside price changes due to prohibitive policies (including export bans, exclusively state-owned import rights and other license restrictions) and high transfer costs. There is less evidence available on how these markets could be expected to perform in the absence of these institutional barriers. The existence of informal trading partners with transfer costs that are relatively low (compared to transfers between the region and the rest of the world) and which trade relatively outside the realm of political influence gives us the opportunity to examine the performance potential these markets have. This evidence will also inform policy decisions that must be made when trade opportunities or isolated deficits occur within the region.
Following Myers and Jayne (forthcoming) we employed a single-equation error correction price transmission model that allows for time-varying transfer costs and allows the relationship between markets to vary depending on trade levels. Although formal trade seldom takes place between countries north of South Africa, informal trade transfers a substantial amount of grain throughout the region.

Unexpectedly, we did not find evidence strong enough to support estimation of a model with trade-based threshold effects. Although this is somewhat surprising, it is also encouraging to discover that the functioning of the informal markets is not as vulnerable to exogenous limitations to trade, such as policy restrictions and transport capacity constraints, particularly in comparison to the formal trade markets that have been the focus of most previous studies.

In the single-regime models thus estimated we find significant evidence of long-run spatial price equilibrium in both market pairings. In both cases the coefficient estimate for the long-run equilibrium suggests competitive arbitrage links the informal trading markets, with price ratio estimates close to one after controlling for transfer costs.

The rate of price transmission was also similar in the two models estimated. The traditional half-life of a transfer is estimated to be roughly 2.7 months between Kitwe and Kasumbalesa or 2.5 months between Cuamba and Liwonde, both of which represent fairly rapid price transmission compared to other findings in the literature on price transmission. Through simulation analysis we demonstrate that one month after a shock to equilibrium is introduced, 67% (77%) of the total value of the shock will have transferred from Kitwe to Kasumbalesa (Cuamba to Liwonde).
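The Cuamba/Liwonde figures summarized above can be checked approximately from the reported coefficients. The sketch below applies the half-life formula stated in the note to Table 2.7, ln(0.5)/ln(1 + speed of transmission), to the point estimate and confidence bounds, and evaluates the competitive arbitrage null (long-run coefficient equal to one) with a two-sided normal approximation. The normal approximation and the function names are assumptions made for illustration, and because the inputs are rounded the results may differ slightly from the values reported in the text.

```python
import math


def half_life(speed):
    """Months for half of a deviation to dissipate: ln(0.5) / ln(1 + speed)."""
    if not -1.0 < speed < 0.0:
        raise ValueError("speed must lie strictly between -1 and 0")
    return math.log(0.5) / math.log(1.0 + speed)


def two_sided_p_normal(estimate, null_value, std_error):
    """Two-sided p-value under a normal approximation (an assumption here)."""
    z = (estimate - null_value) / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))


# Reported speed-of-transmission estimate and 95% bounds (Cuamba/Liwonde).
point = half_life(-0.246)
fastest = half_life(-0.416)   # larger adjustment in absolute value, shorter half life
slowest = half_life(-0.076)   # smaller adjustment in absolute value, longer half life

# Reported long-run coefficient (0.822) and standard error (0.15); null value 1.
p_arbitrage = two_sided_p_normal(0.822, 1.0, 0.15)

print(f"half life at point estimate: {point:.2f} months")
print(f"half life range: {fastest:.2f} to {slowest:.2f} months")
print(f"p-value for long-run coefficient equal to one: {p_arbitrage:.2f}")
```

Because the half life is a nonlinear function of the adjustment coefficient, a symmetric confidence interval for the coefficient maps into a strongly asymmetric half-life range, which is why the upper bound sits so much farther from the point estimate than the lower bound.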
In short, this study shows that when we examine the price relationship between markets that are relatively unimpeded by interventionist trade policies and when we control, to the extent possible, for transfer costs, markets in the Southern Africa region will likely perform in accordance with economic theory: a long-run price equilibrium will exist, arbitrage appears to be carried out competitively, and price transmission is fairly rapid.

REFERENCES

Aker, J.C. 2007. “Cereal Market Performance during Food Crises: The Case of Niger in 2005.” Department of Agricultural and Resource Economics, University of California-Berkeley.

Balke, N.S. and T.B. Fomby. 1997. “Threshold Cointegration.” International Economic Review 38: 627-45.

Balke, N.S. 2000. “Credit and economic activity: Credit regimes and nonlinear propagation of shocks.” The Review of Economics and Statistics 82 (2): 344-349.

Balcombe, K., A. Bailey and J. Brooks. 2007. “Threshold Effects in Price Transmission: The Case of Brazilian Wheat, Maize, and Soya Prices.” American Journal of Agricultural Economics 89: 308-23.

Blyn, G. 1973. “Price Series Correlation as a Measure of Market Integration.” Indian Journal of Agricultural Economics 28: 56-9.

Cummings, R.W. 1967. “Pricing Efficiency in the Indian Wheat Market.” Impex, New Delhi, India.

FEWSNET. 2009. “Informal Cross Border Food Trade in Southern Africa.” Issue 54, Famine Early Warning Systems Network. Other issues: http://www.fews.net/Pages/archive.aspx?pid=1&loc=3&l=en

Gonzalo, J. and J. Pitarakis. 2002. “Estimation and Model Selection Based Inference in Single and Multiple-threshold Models.” Journal of Econometrics 110: 319-52.

Goodwin, B.K. and N.E. Piggott. 2001. “Spatial Market Integration in the Presence of Threshold Effects.” American Journal of Agricultural Economics 83: 302-17.

Haggblade, S., H. Nielson, J. Govereh and P. Dorosh. 2008. “Potential Consequences of Intra-Regional Trade in Short-Term Food Security Crises in Southeastern Africa.” Report #2 prepared by Michigan State University for the World Bank.

Hansen, B. 1996. “Inference When a Nuisance Parameter is Not Identified Under the Null Hypothesis.” Econometrica 64: 413-30.

Hansen, B. 2000. “Sample Splitting and Threshold Estimation.” Econometrica 68 (3): 575-603.

Hansen, B. 2011. Personal communication with the author of “Sample Splitting and Threshold Estimation.” Econometrica 68 (3): 575-603.

Harriss, B. 1979. “There is Method in my Madness: Or is it Vice Versa? Measuring Agricultural Market Performance.” Food Research Institute Studies 17: 197-218.

Kwiatkowski, D., P.C.B. Phillips, P. Schmidt, and Y. Shin. 1992. “Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root?” Journal of Econometrics 54: 159-178.

Lele, U.J. 1967. “Market Integration: A Study of Sorghum Prices in Western India.” Journal of Farm Economics 49: 147-59.

MacKinnon, J.G. 1994. “Asymptotic Distribution Functions for Unit Root and Cointegration Tests.” Journal of Business and Economic Statistics 12: 167-76.

Mushinge, C. 2011. Personal communication with the Country FEWSNET Representative for Zambia, September 27, 2011.

Mwanaumo, A., T.S. Jayne, B. Zulu, J. Shawa, G. Mbozi, S. Haggblade and M. Nyembe. 2005. “Zambia’s 2005 Maize Import and Marketing Experience: Lessons and Implications.” Food Security Research Project Policy Synthesis #11. Lusaka.

Myers, R.J. and T.S. Jayne. 2010. “Price Transmission under Multiple-regimes and Thresholds with an Application to Maize Markets in Southern Africa.” Michigan State University, Department of Agricultural, Food, and Resource Economics. Mimeo. East Lansing.

Myers, R.J. and T.S. Jayne. forthcoming. “Multiple-regime Spatial Price Transmission with an Application to Maize Markets in Southern Africa.” American Journal of Agricultural Economics.

Obstfeld, M. and A.M. Taylor. 1997. “Nonlinear Aspects of Goods-Market Arbitrage and Adjustment: Heckscher’s Commodity Points Revisited.” Journal of the Japanese and International Economies 11: 441-79.

Phillips, P.C.B. and M. Loretan. 1991. “Estimating Long-Run Economic Equilibria.” The Review of Economic Studies 58: 407-36.

Rashid, S. and N. Minot. 2010. “Are Staple Food Markets Efficient in Africa? Spatial Price Analyses and Beyond.” Paper prepared for the COMESA policy seminar on “Food Price Variability: Causes, consequence, and policy options,” 25-26 January 2010, Maputo, Mozambique.

Ravallion, M. 1986. “Testing Market Integration.” American Journal of Agricultural Economics 68: 102-9.

Sephton, P.S. 2003. “Spatial Market Arbitrage and Threshold Cointegration.” American Journal of Agricultural Economics 85: 1041-46.

Tschirley, D. and T.S. Jayne. 2008. “Food Crises and Food Markets: Implications for Emergency Response in Southern Africa.” MSU International Development Working Paper No. 94, Michigan State University, Department of Agricultural, Food, and Resource Economics.

Tschirley, D., J. Nijhoff, P. Arlindo, B. Mwinga, M. Weber and T.S. Jayne. 2004. “Anticipating and Responding to Drought Emergencies in Southern Africa: Lessons from the 2002-2003 Experience.” Prepared for the NEPAD Conference on Successes in African Agriculture, 22-25 November 2004, Nairobi, Kenya.

Van Campenhout, B. 2007. “Modelling Trends in Food Market Integration: Method and an Application to Tanzanian Maize Markets.” Food Policy 32: 112-27.