This is to certify that the dissertation entitled

THE INFLUENCE OF DEMAND MODEL SELECTION ON HOUSEHOLD WELFARE ESTIMATES: AN APPLICATION TO SOUTH AFRICAN FOOD EXPENDITURES

presented by

LESIBA ELIAS BOPAPE

has been accepted towards fulfillment of the requirements for the Ph.D. degree in Agricultural Economics

THE INFLUENCE OF DEMAND MODEL SELECTION ON HOUSEHOLD WELFARE ESTIMATES: AN APPLICATION TO SOUTH AFRICAN FOOD EXPENDITURES

By

Lesiba Elias Bopape

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Agricultural Economics

2006

ABSTRACT

This study analyzes food expenditure patterns in South Africa, taking into account differences in demand behavior across rural and urban households, as well as across income groups. The study makes three main contributions.
First, it develops a Lagrange Multiplier (LM) test that can be used to determine whether the demand model should be specified with a quadratic or a linear expenditure term. The advantage of this test over the Wald test, which is based on the significance of the quadratic expenditure term in the demand model, is that it can be conducted without explicitly estimating the quadratic (in expenditure) demand model, which tends to be highly nonlinear. Second, the study examines the effects on household welfare of an indirect food tax reform, and evaluates the magnitude of the biases in the welfare estimates due to demand model misspecification. The tax reform evaluated is the zero-rating of value-added tax (VAT) on meat products. Lastly, the study examines the differences in consumption patterns between rural and urban households, and across households in different income groups.

The study makes use of panel data on household food consumption in South Africa, collected as part of the KwaZulu-Natal Income Dynamics Study. Results from both the LM and the Wald tests support the inclusion of the quadratic expenditure term. The implication of this finding is that popular functional forms such as the almost ideal demand system (AIDS), which have Engel curves that are linear in expenditure, would not give an accurate picture of the demand behavior of the households considered in this study. Given these findings, this study estimates the quadratic almost ideal demand system (QUAIDS), which is a generalization of AIDS that allows for a quadratic relationship between budget shares and expenditure. The QUAIDS model is used to estimate demand functions for seven food groups: grains, meat and fish, fruits and vegetables, dairy, oils and fats, sugar, and other foods. The endogeneity of expenditure in the demand model is explicitly tested and, where necessary, corrected for using the control function approach.
The model is also adjusted to account for a large fraction of observed zero expenditures using a two-step procedure appropriate for equation system estimation.

Five of the seven food groups were found to be expenditure elastic, the exceptions being meat and fish and other foods. Demand behavior differs significantly between rural and urban households, as well as across income groups, implying that an accurate analysis of expenditure patterns in South Africa requires a disaggregated analysis that takes into account this heterogeneity in demand behavior. All households gain from the removal of VAT on meat, with welfare gains being larger for high-income households. On average, the AIDS expenditure elasticity estimates tend to be larger than the estimates based on QUAIDS. The AIDS model was also found to systematically overstate the welfare gains of the tax reform considered in this study, particularly for households with large expenditure levels.

Copyright by
LESIBA ELIAS BOPAPE
2006

To My Family
For Your Encouragement and Support

ACKNOWLEDGEMENTS

This dissertation is the product of joint efforts by many individuals, to whom I owe sincere gratitude. I am greatly indebted to the chair of my guidance committee, Professor Robert Myers, whose technical and professional advice most students can only hope for. The ingenious comments and suggestions from Professor Jeffrey Wooldridge greatly improved the quality of the methods used in this study. I also thank Professors John Staatz and Kellie Raper for their invaluable comments and suggestions during the various stages of this dissertation. The technical support from Dr. Brian Poi of StataCorp greatly expedited the completion of this research. Professors Michael Carter of the University of Wisconsin, Julian May of the University of KwaZulu-Natal, and Ingrid Woolard of the University of Cape Town facilitated my access to the KIDS dataset. Dr. Jorge Agüero answered many of the questions I had on the technical details of the dataset.

This research was made possible by the financial support from the Department of Agricultural Economics at Michigan State University, the Mandela Economics Scholars Program, the National Research Foundation of South Africa, and the Agricultural Research Council of South Africa.

I am greatly indebted to my friends and colleagues whose support during the four years I spent in Michigan greatly lightened the task of completing my studies. We trekked through the heavy Michigan snow with Gerald Nyambane for stimulating and intellectual discussions at the Peanut Barrel. My study partners, Lilian Kirimi and Mary Mathenge, still have to convince me on the workings of the Law of Iterated Expectations. At last, I was able to convince my friend Elan Satriawan that I tackled endogeneity adequately in my model. I admire the ingenuity of Khathutshelo Sikhitha. Special mention also goes to the following individuals: Yanyan Liu, Anthony Chapoto, Wei Zhang, Ben Moholwa, Hikuepi Katjiuongua, Sindi Kirimi, Shaufique Sidique, Rui Benfica, Zhiying Xu, Vandana Yadav, Nomalanga Grootboom, and Annah Molosiwa.

I am deeply thankful for the love, support, and encouragement of my wife, Letta, and our two beautiful daughters, Napyadi and Pheladi. My family in South Africa made huge sacrifices during the long years of my graduate studies. My mother and late father have always believed in the power of education to transform lives. The weekend calls to my brothers Simon, Jonas, and Tshepo, and my sisters Maria and Monene, sustained my motivation.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER ONE: INTRODUCTION
1.1 Introduction
1.2 Research Gap and Study Motivation
1.3 Objectives of the study
1.4 Organization of the Dissertation
CHAPTER TWO: REVIEW OF RELATED LITERATURE
2.1 Introduction
2.2 Commodity Grouping and Separability
2.3 Modeling Preferences
2.4 Demographic Variables in Demand
2.5 Observed Zero Expenditures
2.6 Chapter Summary
CHAPTER THREE: EMPIRICAL MODEL
3.1 Introduction
3.2 Empirical Model
3.2.1 Quadratic Almost Ideal Demand System: The General Form
3.2.2 Quadratic Almost Ideal Demand System: The Estimation Form
3.3 Estimation
3.3.1 A Test for Nonlinearity of the Demand Model
3.3.2 Nonlinear Estimation
3.4 Econometric Issues
3.4.1 Attrition
3.4.2 Expenditure Endogeneity
3.4.3 Observed Zero Expenditures
3.5 Chapter Summary
CHAPTER FOUR: DATA SOURCES AND DESCRIPTION
4.1 Introduction
4.2 Surveys and Data Description
4.3 Background Information: KwaZulu-Natal Province
4.4 Descriptive Statistics
4.5 Zero-Expenditure
4.6 Chapter Summary
CHAPTER FIVE: EMPIRICAL RESULTS
5.1 Introduction
5.2 Attrition
5.2.1 Difference-of-means tests for major outcomes and control variables
5.2.2 A probit model for probability of attrition
5.2.3 Difference-of-coefficient tests: attritors versus nonattritors
5.3 Nonlinearity
5.3.1 LM test results: OLS and SUR estimations
5.3.2 LM test results: IV-2SLS and 3SLS estimations
5.4 Demand Model Results
5.4.1 The effects of controlling for expenditure endogeneity
5.4.2 The effects of home production
5.4.3 The effects of attrition
5.4.4 The effects of excluding the quadratic expenditure term
5.4.5 The effect of imposing linearity
5.5 Rural-Urban and Income-Groups Differences
5.6 The Problem of Observed Zero Expenditures
5.7 Implications of demand model selection for welfare measures
5.8 Comparison with previous studies
5.9 Chapter summary
APPENDIX A
ADDITIONAL RESULTS FOR CHAPTER FIVE
CHAPTER SIX: SUMMARY AND CONCLUSIONS
6.1 Summary
6.2 Conclusions
REFERENCES

LIST OF TABLES

Table 4.1 Average Budget Shares of Broad Consumption Goods
Table 4.2 The Composition of Composite Food Commodity Groups
Table 4.3 Average Budget Shares and Prices of the Food Groups
Table 4.4 Income-groups and Rural-urban differences: Budget shares and prices
Table 4.5 Summary statistics of household composition variables
Table 4.6 Proportions of households with zero consumption for various food groups
Table 5.1 Differences-of-Means Tests between the Attritors and Nonattritors in KIDS 1993
Table 5.2 A Selection Probit Model for analyzing Attrition between KIDS Panel Waves
Table 5.3 Testing Impact of Attrition on the Coefficients of the Budget Share Equations for the Individual Food Groups
Table 5.4 Tests for Nonlinearity of the Demand System based on Statistical Significance of
Table 5.5 Results of the Test for the Endogeneity of Expenditure
Table 5.6 Tests Endogeneity of Expenditure, and for the Statistical Significance of Lambda in the demographically-extended QUAIDS Model
Table 5.7 Results of the Wald Tests for Demographic effects, Structural Change, and Seasonality
Table 5.8 Expenditure Elasticities estimated from QUAIDS Models with and without Endogeneity Adjustments
Table 5.9 Own-price elasticities estimated from QUAIDS models with and without endogeneity adjustments
Table 5.10 Expenditure Elasticities estimated from LA/AIDS and AIDS models with and without endogeneity adjustments
Table 5.11 Own-price elasticities estimated from LA/AIDS and AIDS models with and without endogeneity adjustments
Table 5.14 Estimated Expenditure Elasticities: Rural-Urban and Income Group Differences
Table 5.15 Estimated Marshallian and Hicksian Own Price Elasticities
Table 5.16 Expenditure and Own Price Elasticities for the Dairy Commodity with adjustment for the Zero-Expenditure Problem
Table 5.17 Average Welfare Effects of Zero-Rating VAT on Meat
Table 5.18 Average Welfare Gains of Zero-Rating VAT on Meat: Rural-Urban and Income Group Differences
Table 5.19 Estimates of Elasticities in South Africa: Previous Studies
Table 5.20 Comparison of elasticities from different demand model specifications
Table A1 The estimated reduced forms for expenditure and expenditure-squared
Table A2 The estimated reduced forms for individual commodity groups
Table A3 Parameter estimates for QUAIDS with and without endogeneity-adjustment
Table A4 Own and Cross Price Elasticities estimated from the QUAIDS Model with and without Endogeneity Adjustment
Table A5 (Part I) Expenditure and Price Elasticities estimated using Data from only Households who do not engage in Own-production
Table A5 (Part II) Expenditure and Price Elasticities estimated using Data on all Households including those who Attrited
Table A6 Parameter Estimates from the AIDS Model with and without Endogeneity Adjustment
Table A7 Own and Cross Price Elasticities estimated from the AIDS Model with and without Endogeneity-Adjustment
Table A8 Parameter Estimates for QUAIDS: Rural-Urban Differences
Table A9 Own and Cross Price Elasticities: Rural-Urban Differences
Table A10 Parameter Estimates of the QUAIDS Model: Income-Groups Differences
Table A11 Own and Cross Price Elasticities: Income-Group Differences

LIST OF FIGURES

Figure 5.1 Welfare Gain due to Removal of 14% VAT on Meat: Compensated Variation
Figure 5.2 Welfare Gain due to Removal of 14% VAT on Meat: Equivalent Variation
Figure 5.3 The Distribution of Welfare Gains
Figure 5.4 Bias in Welfare Gain from Using the Nonlinear AIDS Model
Figure 5.5 Bias in Welfare Gain from Using the LA/AIDS Model

CHAPTER ONE
INTRODUCTION

1.1 Introduction

Aggregate per capita availability data suggest that South Africa is food secure in almost all basic foodstuffs. Furthermore, South Africa has the highest per capita income in Sub-Saharan Africa, and is categorized as a middle-income country with average per capita gross national income of US $3,650 in 2004 (World Bank, 2004). These facts suggest that hunger and food insecurity should not be major policy issues in the country. However, these aggregate data mask a highly unequal distribution of income and a huge divide between relatively affluent urban areas and destitute conditions in many rural communities. The richest 20% of the population receives over 60% of the income while the poorest 20% receives less than 3% (World Development Report, 2002). At the household level, over 30% of the population is categorized as vulnerable to food insecurity and over 20% of the children are estimated to be stunted and vitamin A deficient (Human Science Research Council, 2004).

Policies designed to reduce income inequality, hunger, and malnutrition have had mixed results. Major social, economic, and political reforms introduced since the demise of apartheid and the emergence of democratic government in 1994 have obviously redistributed wealth. But income inequality and household food insecurity remain. One of the problems is that little is known about how food expenditure patterns differ across income groups and across geographic regions.
Without a thorough understanding of the heterogeneity of food expenditure patterns, and how these patterns are changing over time, it will continue to be difficult to design policies that improve food security effectively over a broad range of heterogeneous low-income households. An accurate assessment of the distributional impacts of policies such as commodity tax reform requires accurate estimates of price and income effects, and of how these differ across households in different socioeconomic groups.

This study seeks to improve knowledge and understanding of the heterogeneity in food expenditure patterns in South Africa. The study makes use of an unusually rich panel dataset on household food consumption, collected as part of the KwaZulu-Natal Income Dynamics Study (KIDS). The KIDS dataset contains detailed information on household socioeconomic and demographic characteristics, which permits heterogeneity effects to be analyzed. The survey followed the same households over a ten-year period, with waves in 1993, 1998, and 2004, to study changes in their incomes, expenditures, and poverty levels. Data on prices and expenditures on various food products consumed by households were also collected. This study utilizes the KIDS data to estimate demand functions for seven food groups: grains, meat and fish, fruits and vegetables, dairy, oils and fats, sugar, and all other foods. Household locations, socioeconomic characteristics, and income levels are used to explain heterogeneity in food expenditure patterns.

1.2 Research Gap and Study Motivation

This study makes three main contributions. First, to the best of our knowledge this study is the only theoretically consistent panel data study of food consumption in South Africa done to date. Previous studies on food consumption in South Africa have either been limited to examining only one commodity (e.g., Taljaard, 2003; Nieuwoudt, 1998; Poonyth et al., 2001) or have used highly aggregated composite commodity definitions, and typically ignored any impact of demographic factors on food demand (Bowmaker and Nieuwoudt, 1990; Liebenberg and Groenewald, 1997). The only theoretically consistent study of food demand in South Africa we are aware of that uses micro-level data incorporating household demographic characteristics is by Agbola (2003). However, Agbola uses cross-sectional data that were collected in 1993, one year prior to South Africa's first democratic government. Clearly, such data do not capture periods of important social and economic reforms that affect households' profiles and, hence, their food consumption patterns. As mentioned earlier, South Africa became a democracy in 1994, and as a result major policies were implemented whose implications the study by Agbola (2003) does not capture. The KIDS panel dataset allows for greater price variability across the sample, and also covers most of the period associated with major policy reforms in South Africa.

Unlike most previous food demand studies, this study explicitly tests for the endogeneity of expenditure in the budget share equations, and then controls for it. Among existing food demand studies, only LaFrance (1991) and Dhar et al. (2003) consider the problem of expenditure endogeneity. Expenditure endogeneity may arise whenever the household expenditure allocation process across food groups is correlated with other factors not captured by the explanatory variables used in demand estimation (i.e., bundled in the error term). In this case, least squares estimation of the demand model gives inconsistent parameter estimates. This study takes advantage of recent advances in econometric methods designed to overcome this problem, and to enhance demand estimation with micro-level panel data.
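The endogeneity problem and one way of correcting for it can be illustrated with a small simulation. The sketch below is not the estimation procedure used in this study. It assumes a single budget share equation, simulated data, and a hypothetical instrument (log income), and it applies the control function idea in its simplest form: residuals from a first-stage regression of log expenditure on the instrument are added to the share equation as an extra regressor, and a nonzero coefficient on those residuals signals endogeneity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Simulated data (all values are hypothetical, chosen only for illustration).
log_income = rng.normal(8.0, 0.5, n)            # instrument
v = rng.normal(0.0, 0.3, n)                     # first-stage shock
log_x = 1.0 + 0.8 * log_income + v              # log total food expenditure
u = 0.05 * v + rng.normal(0.0, 0.02, n)         # share error, correlated with v
beta_true = -0.04
w = 0.6 + beta_true * log_x + u                 # budget share of one food group


def ols(y, X):
    """Least squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


ones = np.ones(n)

# Naive OLS ignores the correlation between log_x and u.
b_ols = ols(w, np.column_stack([ones, log_x]))

# Step 1: regress log expenditure on the instrument and keep the residuals.
Z = np.column_stack([ones, log_income])
v_hat = log_x - Z @ ols(log_x, Z)

# Step 2: add the first-stage residuals to the share equation.
b_cf = ols(w, np.column_stack([ones, log_x, v_hat]))

print("true beta:            ", beta_true)
print("OLS estimate:         ", b_ols[1])
print("control function est.:", b_cf[1])
print("coef. on residuals:   ", b_cf[2])
```

In this simulation the coefficient on the first-stage residuals is nonzero by construction, so the naive estimate is biased while the second-step estimate recovers the assumed slope. In a demand system the same residuals would enter every share equation.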
We estimate a quadratic almost ideal demand system (QUAIDS), controlling for expenditure endogeneity and explicitly accounting for the problem of observed zero expenditures. Most of the previous food demand studies in South Africa use the almost ideal demand system (AIDS) model. The shortcoming of the AIDS model is that it restricts budget shares to be linear in the logarithm of expenditure, which limits how expenditure elasticities can vary across expenditure levels. Such assumptions have been shown to be restrictive, even in developing countries (examples include Meenakshi and Ray (1999) and Abdulai (2004)). Furthermore, because of high income inequality and large disparities in economic conditions between rural and urban households in South Africa, pooling data across all households obscures important information on variability in demand behavior across households in different socioeconomic and demographic groups. To determine the impact of this household heterogeneity on demand, this study separately analyzes the food expenditure patterns of rural and urban households, as well as of households in different income groups.

Second, this study builds on the work of Banks et al. (1997) to develop a test that can be used to determine whether the demand model should be specified with a quadratic (QUAIDS) or a linear (AIDS) expenditure variable. In particular, the implication of corollary 2 in Banks et al. (p. 533) is that a utility-derived demand system that is rank 3 and exactly aggregable cannot have coefficients on both the linear and the quadratic expenditure terms that are independent of prices.¹ In other words, if such a demand model has a coefficient on the linear expenditure term that is independent of prices, then it must have a coefficient on the quadratic expenditure term that is price dependent. This study uses Banks et al.'s corollary 2 to develop a Lagrange Multiplier (LM) test that allows one to determine whether or not a QUAIDS specification is necessary.
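For reference, the QUAIDS budget share equation of Banks et al. (1997) can be sketched as follows. The symbols used here (budget share w_i, prices p_j, total expenditure m, and the Greek parameters) are assumed notation for illustration and may differ from the notation adopted in chapter three.

```latex
w_i = \alpha_i + \sum_{j} \gamma_{ij} \ln p_j
      + \beta_i \ln\!\left[\frac{m}{a(p)}\right]
      + \frac{\lambda_i}{b(p)} \left\{ \ln\!\left[\frac{m}{a(p)}\right] \right\}^{2},
\qquad
\ln a(p) = \alpha_0 + \sum_{i} \alpha_i \ln p_i
      + \tfrac{1}{2} \sum_{i} \sum_{j} \gamma_{ij} \ln p_i \ln p_j,
\qquad
b(p) = \prod_{i} p_i^{\beta_i}.
```

Setting \lambda_i = 0 for every good gives the AIDS share equation. In this form the coefficient on the linear expenditure term, \beta_i, does not depend on prices, while the coefficient on the quadratic term, \lambda_i / b(p), does, which is the pattern described by corollary 2.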
No other study was found to have explicitly conducted this test, certainly not with South African data. Hence, this study provides richer information on food consumption behavior in South Africa than has been obtained from existing studies.

Finally, the study examines the effects on household welfare of zero-rating the value-added tax (VAT) on meat products.² While most of the basic food commodities such as grains, milk, fruits and vegetables are zero-rated in South Africa, meat is not. Meat is taxed at the standard VAT rate of 14%. Whether or not meat should be zero-rated has been a subject of contention between the government and lobby groups (most notably the Congress of South African Trade Unions) since the introduction of VAT in 1991 (Watkinson and Makgetla, 2002). This study contributes to this debate by providing quantitative measures of the impacts of this tax reform on household welfare. We use the QUAIDS parameter estimates to calculate indirect utilities before and after the tax reform. These are then used to compute two money metric welfare measures of the tax effect, namely compensating variation and equivalent variation. To determine the sensitivity of these welfare measures to demand model selection, and the bias that results when a restrictive functional form is used, we also estimate these welfare measures using parameters from the (nonlinear and linear) AIDS model. There are no studies we are aware of that compute these welfare measures for South Africa, certainly not with this dataset.

¹ As will be made clearer below, the rank of a demand system has implications for aggregation and the nonlinearity of Engel curves. Higher rank models are well suited to approximate the nonlinear Engel curves often found in empirical analyses. QUAIDS has a rank of 3.

² A commodity is zero-rated if it is taxable, but taxed at a rate of 0%. Zero-rating a commodity is different from exempting it; by law, exempted commodities cannot be taxed (i.e., they are not taxable).
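The welfare calculation described above can be sketched in a few lines. The example below is a minimal illustration, not the computation reported in chapter five. It assumes the standard QUAIDS indirect utility function of Banks et al. (1997), a hypothetical three-good system with made-up parameters, and full pass-through of the tax cut to the meat price. The function and variable names are invented for the example. Gains are signed so that a price fall yields positive values.

```python
import numpy as np

# Hypothetical parameters for a 3-good system (grains, meat, other foods).
# They satisfy adding-up, homogeneity, and symmetry but are NOT estimates.
PARAMS = {
    "alpha0": 0.0,
    "alpha": np.array([0.40, 0.35, 0.25]),
    "beta": np.array([-0.05, 0.03, 0.02]),
    "lam": np.array([0.010, -0.006, -0.004]),
    "gamma": np.array([[0.05, -0.03, -0.02],
                       [-0.03, 0.04, -0.01],
                       [-0.02, -0.01, 0.03]]),
}


def price_terms(p, prm):
    """Return ln a(p), b(p), and lambda(p) for a price vector p."""
    lp = np.log(p)
    ln_a = prm["alpha0"] + prm["alpha"] @ lp + 0.5 * lp @ prm["gamma"] @ lp
    b = np.exp(prm["beta"] @ lp)
    lam = prm["lam"] @ lp
    return ln_a, b, lam


def ln_indirect_utility(m, p, prm):
    """ln V = { [(ln m - ln a(p)) / b(p)]^(-1) + lambda(p) }^(-1)."""
    ln_a, b, lam = price_terms(p, prm)
    z = (np.log(m) - ln_a) / b
    return 1.0 / (1.0 / z + lam)


def expenditure(ln_u, p, prm):
    """Cost of reaching utility level ln_u at prices p (inverse of ln V)."""
    ln_a, b, lam = price_terms(p, prm)
    return float(np.exp(ln_a + b * ln_u / (1.0 - lam * ln_u)))


def welfare_gain(m, p0, p1, prm):
    """Compensating and equivalent variation of a move from p0 to p1."""
    ln_u0 = ln_indirect_utility(m, p0, prm)
    ln_u1 = ln_indirect_utility(m, p1, prm)
    cv = m - expenditure(ln_u0, p1, prm)   # old utility valued at new prices
    ev = expenditure(ln_u1, p0, prm) - m   # new utility valued at old prices
    return cv, ev


# Baseline prices normalized to one; removing a 14% VAT lowers the meat price.
p0 = np.array([1.0, 1.0, 1.0])
p1 = p0.copy()
p1[1] = p0[1] / 1.14
cv, ev = welfare_gain(100.0, p0, p1, PARAMS)
print(f"CV = {cv:.2f}, EV = {ev:.2f}")
```

Repeating the calculation with the `lam` vector set to zero gives the corresponding AIDS-based measures, which is one way to see how the choice of functional form feeds into the welfare estimates.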
1.3 Objectives of the study

The broad objective of this study is to analyze the responsiveness of South African households to food price and income changes as well as other relevant socioeconomic factors, focusing particularly on the KwaZulu-Natal Province. This objective is accomplished by estimating a food demand model using appropriate econometric techniques. The specific objectives are:

1. To estimate a household food demand model for South Africa. The model accounts for the effects of demographic and socioeconomic characteristics, and explicitly controls for expenditure endogeneity and observed zero expenditures.
2. To determine how food expenditure patterns differ across rural and urban households, as well as across income groups.
3. To estimate price and expenditure elasticities of demand for food using the model from objective (1), and to evaluate how these differ across rural and urban households as well as across households in different income groups.
4. To examine the effects on household welfare of zero-rating the value-added tax (VAT) on meat products.

The results of this study will provide important insights into food policy formulation and implementation in South Africa. In particular, accounting for household heterogeneity in demand has implications for the likely effects of alternative policies on food consumption and food safety nets. These results should be particularly useful in the implementation of the national food security strategy (the Integrated Food Security Strategy, established in 2002 by the South African cabinet), which emphasizes improvements in household-level nutrition and increases in the provision of food safety nets. These policies can be made more effective if they are based on behavioral parameters specific to particular demographic and socioeconomic groups.

1.4 Organization of the Dissertation

The rest of the dissertation is organized as follows. The next chapter presents a review of related literature.
The relationship between utility maximization theory and demand functional forms is discussed, and a review of alternative approaches to modeling preferences is provided. Chapter three presents the empirical model and discusses the various econometric tests to be implemented. Chapter four describes the survey and data sources, and then presents a descriptive analysis of expenditure patterns across time and household groups. Chapter five presents the empirical results, and Chapter six provides a summary and conclusions.

CHAPTER TWO
REVIEW OF RELATED LITERATURE

2.1 Introduction

This chapter reviews the literature on the theory and empirical estimation of consumer demand. Section 2.2 discusses various approaches to reducing the large number of goods in the consumer's utility maximization problem to a smaller number that is more manageable for empirical estimation. This is followed by section 2.3, which discusses demand system functional forms and the assumptions about their underlying preferences. The various methods for incorporating demographic variables are discussed in section 2.4. This is followed by a review of the literature on censored demand modeling. The final section summarizes the key points in this chapter.

2.2 Commodity Grouping and Separability

In the standard utility maximization problem, a consumer makes budget allocation decisions over a large number of goods with different relative prices. The solution to this problem gives the amount demanded of each good as a function of its price, the prices of other goods, and the consumer's income. However, when the number of goods involved is too large, the consumer's allocation problem becomes too complex for the empirical analyst to estimate. The problem of finding theoretically appealing approaches to reducing this large number of goods to a relatively small and more manageable number has attracted, and continues to attract, attention in the demand literature.
The literature proposes two alternative approaches: one that groups commodities based on the behavior of their relative prices (the composite commodity theorem), and another that makes assumptions about the consumers’ preferences (separability and two-stage budgeting).

Originally proposed by Hicks (1936) and Leontief (1936), the composite commodity theorem asserts that, if a group of prices move in parallel, then the corresponding group of commodities can be treated as a single good. The price and quantity indices of the commodity groups are used to derive an expenditure function that satisfies the usual properties of expenditure functions (increasing in utility and prices, concave in prices, and linearly homogeneous in prices). The usefulness of the composite commodity theorem in constructing commodity groupings for empirical analysis is limited (Deaton and Muellbauer, 1980a). One source of limitation is that relative prices fluctuate considerably in practice. Also, it would be difficult to justify some of the aggregates that are imposed. For example, a relatively volatile price of meat would prevent it from being grouped with other foods whose prices are relatively stable.

In an attempt to circumvent some of these limitations, Lewbel (1996) develops a generalized composite commodity theorem, which is an extension of the original Hicks-Leontief idea. The generalized composite commodity theorem relaxes the assumption of perfect correlation among group prices and instead allows for less than perfect co-movement among intra-group prices. It assumes that the distribution of each commodity’s price relative to its group price index is independent of the group price index, and tests for the generalized composite commodity theorem are based on cointegration relationships between each good’s price and the price index of the group to which it belongs. Interesting applications of Lewbel’s generalized composite commodity theorem are by Davis (2003) and Reed et al. (2005).
Davis extends Lewbel’s bivariate Engle-Granger testing approach to a multivariate framework, while Reed et al. apply the generalized composite commodity theorem to nonlinear demand systems.

In contrast to the composite commodity theorem, which relies on an external factor (namely, the constancy of relative prices) to define commodity groups, separability defines commodity groups using consumer preferences themselves. If preferences are weakly separable, then commodities can be partitioned into groups so that preferences within groups can be described independently of consumption in other groups. This implies that a subutility function can be defined for each group, so that the values of these subutilities can be combined to give total utility. The concept of a utility tree, proposed by Strotz (1957, 1959), allows consumers to break a decision into multiple steps. A closely related concept is two-stage budgeting, which hypothesizes that consumers first allocate total expenditure to broad groups of goods (the first stage) and then allocate each group expenditure to the individual commodities in that group (the second stage). Weak separability is both necessary and sufficient for the second stage of two-stage budgeting (Deaton and Muellbauer, 1980a).

Several studies attempt to empirically test the restrictions imposed by separability within flexible demand systems. These studies derive functional relationships that must hold between goods that belong to the same group and goods that belong to other groups, expressed in terms of price and expenditure elasticities. They then test econometrically whether these relationships are supported by the data. These studies include Eales and Unnevehr (1988), Moschini (1992), Moschini et al. (1994), Nayga and Capps (1994), Edgerton (1997), and Carpentier and Guyomard (2001).

However, there are a number of problems associated with the tests mentioned above.
One of the problems, as mentioned by Lewbel (1996), is that weak separability restrictions require that group prices depend on the parameters of the individual’s utility function. Also, separability restrictions are difficult to test powerfully due to the multicollinearity of aggregate price data. In the context of this study, a more important problem with the tests for both separability and commodity groupings is that they have been developed for time series data. Their applicability to cross-sectional or panel data is limited. Given that this study uses panel data, we follow the ‘traditional’ approach of maintaining the assumption of weak separability between foods and all other broad consumption goods. Food items that are closely related are then grouped together into composite food commodities, where a group comprises items that are closely substitutable.

2.3 Modeling Preferences

The preference-based approach to modeling choice behavior treats the individual’s tastes, summarized by a preference relation, as his or her primitive characteristic. The theory is developed by first imposing rationality axioms on the individual’s preferences and then analyzing the consequences of these preferences for his or her choice behavior.

The preference-based approach provides a useful framework for analyzing data on demand. Functional forms proposed for the econometric analysis of these data can be evaluated in terms of whether they are consistent with theory. Failure of the data to conform to theoretical predictions may indicate excessive restrictiveness of the chosen functional form. It may also mean that people do not behave as theory suggests. When the chosen functional form gives meaningful results, it becomes the basis for estimating demand behavioral parameters such as price and income elasticities.
The earliest empirical demand studies are characterized by extensive use of single equation methodology. At the center of these analyses was the measurement of elasticities. The requirement that demand systems satisfy properties such as adding-up was ignored, and was perhaps unimportant, because these early studies considered only a fraction of the total budget (Deaton and Muellbauer, 1980a). This made it tempting to choose explanatory variables pragmatically with the goal of getting better model fits.

The single equation approach to demand modeling changed with the introduction by Stone (1954) of the linear expenditure system (LES). The LES was among the first attempts to derive utility-based demand models. The derivation of the LES imposes the theoretical restrictions of adding-up, homogeneity, and symmetry. Among the implications of the LES are that goods cannot be inferior and that all goods must be substitutes. This makes it too restrictive a functional form for modeling demand, except where commodities are grouped into categories so broad that inferiority or complementarity among them is not to be expected. Also, while the imposition of the theoretical restrictions helps generate degrees of freedom, it also forces the analyst to take the restrictions as given (because they are embedded in the model). An interesting alternative is one that allows restrictions to be tested empirically.

The Rotterdam model proposed by Theil (1965) and estimated by Barten (1969) allows for restrictions to be tested statistically. In many ways, the approach followed in deriving the Rotterdam model is similar to Stone’s, except that the Rotterdam model is specified using variables in first differences. Like the LES, the derivation of the Rotterdam model emphasized the use of theoretical restrictions to generate degrees of freedom.
Among the limitations of the Rotterdam model are that it imposes constant price and expenditure elasticities and that it typically does not satisfy theoretical restrictions when applied to data (Deaton and Muellbauer, 1980a).

It became apparent that specifying functional forms using theoretical restrictions to generate degrees of freedom did not offer much promise. It was partly for this reason that research in the 1970s focused on developing flexible functional forms. This approach entailed approximating the direct utility function, the indirect utility function, or the cost function with some specific functional form that has enough parameters to be regarded as a reasonable approximation to whatever the true unknown function might be. Important contributions in this regard were the transcendental logarithmic (translog) model of Christensen, Jorgenson, and Lau (1975) and the almost ideal demand system (AIDS) model of Deaton and Muellbauer (1980b).

The indirect translog model (as originally proposed by Christensen et al. (1975)) is derived by applying Roy’s identity to a function that approximates the unknown indirect utility function by a quadratic form in the logarithms of the price to expenditure ratios. Unfortunately, the demand functions derived from this indirect utility function are complicated and difficult to estimate. Its modified version (Jorgenson et al., 1982), the direct translog model, makes the discomforting assumption that, for all goods, prices are determined by quantities rather than the other way round.

Deaton and Muellbauer’s AIDS model marked an important breakthrough in the quest for flexible functional forms. In fact, no dramatic advances have been made since its introduction in 1980, although some refinements (discussed next) have been made.
The particularly desirable properties of AIDS are that it satisfies the axioms of choice exactly and can be interpreted in terms of economic models of consumer behavior when applied to either aggregate or disaggregate (e.g., household) data. It also allows for consistent aggregation of individual demands to market demands.

Both AIDS and Jorgenson et al.’s translog models are members of the Price-Independent Generalized Logarithmic (PIGLOG) class of demand models (Muellbauer, 1976), which have budget shares that are linear functions of log total expenditure. Specifications of Engel curves (i.e., relationships between a commodity’s budget share and total expenditure) that are linear functions of log total expenditure are extensions of the earlier work by Working (1943) and Leser (1963). For many commodities, however, there is increasing evidence that Engel curve analysis based on this Working-Leser form does not provide an accurate picture of behavior. Empirical Engel curve studies indicate that further terms in total expenditure are required for some, if not all, expenditure share equations (Lewbel, 1991; Blundell et al., 1993). Also, Engel curves may vary with labor market status and region (Browning and Meghir, 1991).

Banks et al. (1997) generalize PIGLOG preferences to allow for nonlinearities in total expenditure. We discuss in general terms the differences in approach to modeling preferences using the PIGLOG class vis-a-vis other general classes. Let $w_i$ denote the expenditure share of good $i$ ($i = 1, \ldots, K$), $x$ denote total expenditure, and $a(p)$ denote a price index used to deflate total expenditure, where $p$ is a $K$-vector of prices. A general form that nests those derived from the PIGLOG class is

$$w_i = A_i(p) + B_i(p)\ln x + C_i(p)\,g(x) \qquad (2.1)$$

where $A_i(p)$, $B_i(p)$, $C_i(p)$, and $g(x)$ are differentiable functions. Equation (2.1) says that expenditure shares are linear in log total expenditure and in another function of total expenditure, represented by $g(x)$.
Thus, the $C_i(p)g(x)$ term allows for potential nonlinearity in demands. Engel curves of the PIGLOG class have $C_i(p)$ equal to zero, so that for this class of preferences, demands are modeled as linear functions of $\ln x$.

Lewbel (1991) defines the rank of a demand system as the dimension of the space spanned by its Engel curves. Based on this definition, the rank of equation system (2.1) equals the rank of the $K \times 3$ matrix of Engel curve coefficients, having rows $[A_i(p) : B_i(p) : C_i(p)]$ for good $i$ (Banks et al., 1997). This matrix has three columns, so 3 is the maximum possible rank of equation system (2.1). Exactly aggregable demand systems are defined as demand systems that are linear in functions of $x$. Gorman (1981) proved that the maximum possible rank of any exactly aggregable demand system (with any number of terms) is 3. Thus, based on this theoretical result, there would be little or no gain from adding further terms of the form $D_i(p)h(x)$ if exact aggregation is desired. In fact, Banks et al. show that all rank 3 exactly aggregable utility-derived demand systems of the form represented by equation (2.1) have $g(x) = (\ln x)^2$. So, given that rank 3 forces $g(x)$ to have this specific functional form, budget shares of form (2.1) are quadratic in $\ln x$.

Banks et al. (1997) also show that rank 3 exactly aggregable demand systems cannot have both $B_i(p)$ and $C_i(p)$ independent of prices. The AIDS model has the form of equation (2.1) with each $B_i$ constant (that is, independent of prices) and every $C_i = 0$. To allow for potential nonlinearity in expenditure, it may be tempting to extend the AIDS model by simply adding a squared expenditure term with a constant, nonzero coefficient $C_i$. In fact, a number of studies do this (Blundell et al., 1993; Labeaga and Puig, 2002; Christensen, 2004; Browning and Collado, 2004). However, as Banks et al. show (Corollary 2, p.
533), $B_i$ and $C_i$ cannot both be constants for all commodities $i$ in a rank 3 demand system.

Based on these algebraic facts and restrictions from utility theory, Banks et al. derive an exactly aggregable rank 3 demand system, the Quadratic Almost Ideal Demand System (QUAIDS). The QUAIDS model is of the form represented by equation (2.1), with the nonlinear expenditure term set to quadratic and its coefficient, $C_i(p)$, dependent on prices via the inverse of a Cobb-Douglas price aggregator. Hence, the QUAIDS model nests the AIDS model. Because AIDS restricts $B_i$ to be constant and sets $C_i = 0$, its budget shares are monotonic in log expenditure, so a good cannot switch between being a luxury and being a necessity as expenditure changes. In contrast, QUAIDS permits goods to be luxuries at some expenditure levels and necessities at others. In this study, we estimate the demand parameters and the price and income elasticities using the QUAIDS model, given its generality over AIDS and its other desirable properties, which are explained in detail in the next chapter.

Recently, several demand studies have emerged that confirm the appropriateness of QUAIDS in modeling preferences. Examples using developed country data include Abdulai (2002), who applies QUAIDS to food expenditure data from Switzerland; Moro and Sckokai (2000), who use Italian food expenditure data; Banks et al. (1997) and Blundell and Robin (1999), who both use expenditure data on broad consumption goods from the UK; and Fisher et al. (2001), who apply QUAIDS to U.S. aggregate consumption data. A number of studies in developing countries are also emerging that support QUAIDS, although they are fewer than those from developed countries. Examples include Abdulai and Aubert (2004) using Tanzanian food expenditure data, Meenakshi and Ray (1999) using Indian food expenditure data, Gould and Villarreal (2006) using food expenditure data from urban China, and Molina and Gil (2005) using aggregate consumption data from Peru.
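To make the structure described above concrete, the following Python sketch evaluates budget shares using the standard QUAIDS form published by Banks et al. (1997): a translog price index $a(p)$, a Cobb-Douglas aggregator $b(p)$, and a quadratic term in log real expenditure whose coefficient is divided by $b(p)$. This is only an illustration under stated assumptions. The function name, the three-good setup, and every parameter value are hypothetical and were chosen so that the adding-up, homogeneity, and symmetry restrictions hold; they are not estimates from this study, whose own specification is given in the empirical model chapter.

```python
import math

def quaids_shares(prices, m, alpha0, alpha, beta, gamma, lam):
    """Budget shares under the standard QUAIDS form (Banks et al., 1997).

    w_i = alpha_i + sum_j gamma_ij ln p_j
          + beta_i ln[m / a(p)] + (lam_i / b(p)) * (ln[m / a(p)])**2
    """
    k = len(prices)
    lnp = [math.log(p) for p in prices]
    # Translog price index: ln a(p)
    ln_a = alpha0 + sum(alpha[i] * lnp[i] for i in range(k))
    ln_a += 0.5 * sum(gamma[i][j] * lnp[i] * lnp[j]
                      for i in range(k) for j in range(k))
    # Cobb-Douglas price aggregator: b(p) = prod_i p_i ** beta_i
    b = math.exp(sum(beta[i] * lnp[i] for i in range(k)))
    ln_real = math.log(m) - ln_a  # log of deflated total expenditure
    return [alpha[i]
            + sum(gamma[i][j] * lnp[j] for j in range(k))
            + beta[i] * ln_real
            + (lam[i] / b) * ln_real ** 2
            for i in range(k)]

# Hypothetical parameters for three goods (illustration only).
ALPHA = [0.5, 0.3, 0.2]                  # sums to one (adding-up)
BETA = [-0.10, 0.04, 0.06]               # sums to zero
LAM = [0.02, -0.01, -0.01]               # sums to zero
GAMMA = [[0.05, -0.03, -0.02],           # symmetric, rows sum to zero
         [-0.03, 0.04, -0.01],
         [-0.02, -0.01, 0.03]]
PRICES = [1.2, 0.9, 1.5]

shares = quaids_shares(PRICES, 100.0, 0.0, ALPHA, BETA, GAMMA, LAM)
# Setting every lam_i to zero collapses the system to AIDS.
aids_shares = quaids_shares(PRICES, 100.0, 0.0, ALPHA, BETA, GAMMA, [0.0] * 3)
```

With the restrictions imposed on the hypothetical parameters, the shares sum to one at any prices and expenditure level, and setting all quadratic coefficients to zero yields shares that are linear in log expenditure, which is the nesting of AIDS within QUAIDS discussed in this section.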
2.4 Demographic Variables in Demand

Demographic variables such as household size and age composition play an important role in determining household demand patterns. The treatment of demographic effects in the context of theoretically plausible demand systems dates back to Barten (1964). Since Barten’s work, studies have proliferated that aim to find theoretically consistent techniques for incorporating demographic effects into demand analysis (for a review, see Pollak and Wales (1992)).

In principle, there are two ways to incorporate demographic effects into a demand system: one is to use unpooled data and the other is to use pooled data (Pollak and Wales, 1978). The first approach involves separating the entire dataset into sub-samples with identical demographic profiles and then estimating a demand system for each sub-sample separately. This approach allows all of the parameters of the demand system to depend on the demographic variables, so that there is no need to specify the form of the relationship between the parameters and the demographic variables. The major drawback of this approach, apart from its apparent inefficiency, is that it does not make it possible to draw inferences about households with one demographic profile from observations on the behavior of households with different profiles.

The second approach, which uses pooled data, involves three separate but interrelated steps (Pollak and Wales, 1980). The first step involves specifying a class of demand systems for every admissible demographic profile. The second involves specifying which parameters depend on the demographic variables and which do not. The third involves specifying a functional form for each parameter that depends on the demographic variables.
Earlier attempts to incorporate demographic effects into complete demand systems using pooled data have led to the development of five general procedures: (i) demographic scaling of Barten (1964); (ii) Gorman’s (1976) specification; (iii) the reverse-Gorman specification; (iv) the modified Prais-Houthakker procedure; and (v) demographic translating of Pollak and Wales (1981). The procedures are general in the sense that they do not assume the original demand system has a particular functional form, but can be used in conjunction with any complete demand system.

Demographic translation replaces the original demand system, $W_i(p, m)$, by $w_i(p, m) = \delta_i + W_i(p, m - \sum_j p_j \delta_j)$, where the $\delta$’s are translation parameters. Hence, specifying demographic variables using translation can be viewed as allowing “subsistence” (typically the intercept) parameters of a demand system to depend on the demographic variables. Linear demographic translation specifies demographic variables as intercept shifters:

$$\delta_i(z) = \sum_{s=1}^{S} \delta_{is} z_s \qquad (2.2)$$

where $z = (z_1, \ldots, z_S)$ is a vector of demographic variables, and the $\delta$’s are parameters to be estimated. Linear demographic translation is the most common specification in empirical demand studies.

Demographic scaling involves applying scaling functions to prices and quantities. The scaling functions depend on demographic variables, and are interpreted as reflecting the number of “equivalent adults” in the household (when the scaling functions are the same for all goods), or as measuring the number of equivalent adults on a scale appropriate to each good (when the scaling functions differ from one good to another). This procedure leads to the interpretation of the household’s preferences as depending not on the quantity of the raw commodities it consumes, but on the quantity per equivalent adult. The challenge when using demographic scaling lies in the criteria for choosing scaling functions or numerical scaling values.
The criteria for specifying scaling functions have been based on such factors as nutritional and physiological needs, poverty measures, and the expenditure behavior of households. While such criteria may be intuitively appealing, they are often inconsistent with theory. Scaling procedures that are made to be theoretically consistent typically lead to the imposition of implausible behavioral assumptions, such as zero substitution possibilities among goods (Deaton and Muellbauer, 1980a).

Gorman (1976) proposed a general form that incorporates demographic translating and scaling. Gorman’s specification is obtained from the original demand system by first scaling and then translating. The “reverse Gorman” specification is very similar to Gorman’s specification, except that it is obtained by first translating and then scaling. Given that the form proposed by Gorman and its “reverse” version nest demographic scaling, they inherit the weaknesses associated with scaling.

Proposed in its original form by Prais and Houthakker (1955), the modified Prais-Houthakker procedure incorporates demographic variables into demand equations using a single income scale and a specific scale for each good. The Prais-Houthakker procedure replaces the original demand system by $w_i(p, m) = s_i W_i(p, m/s_0)$, where the $s_i$’s are “specific scales” for commodities, which depend on the demographic variables, and $s_0$ is an “income scale” implicitly defined by the budget constraint $\sum_i p_i s_i W_i(p, m/s_0) = m$. However, Prais and Houthakker never reconciled their technique with an overall budget constraint (Pollak and Wales, 1981). The main limitation in applying the Prais-Houthakker procedure is that it does not yield a theoretically consistent demand system. Pollak and Wales (1981) show that this procedure yields theoretically plausible demand systems only in the very special case where the original demand system corresponds to an additive direct utility function.
The limitation imposed by this additivity restriction is quite severe. The implication of the additivity restriction is that no good (or group of goods) occupies any special position in the utility function (Deaton and Muellbauer, 1980a). Since the function is additive, new groups can always be created by combining any others, such that no particular relationship exists between pairs of goods.

Lewbel (1985) extended Gorman’s (1976) procedure by proposing a unified approach that combines the five procedures explained above. Lewbel’s procedure modifies the expenditure function by first replacing each price by a function that depends on all prices and demographic variables and then subjecting the resulting expenditure function to a further transformation that depends on all prices and demographic variables. However, Lewbel’s contribution was mainly theoretical and too general to apply empirically; hence it has rarely been used in empirical work.

A relatively recent study by Bollino et al. (2000) extends Gorman’s (1976) procedure by following an approach similar to Lewbel’s. Unlike Lewbel, Bollino et al. provide both a theoretical derivation of their technique and a procedure for empirical estimation. Unfortunately, the estimation procedure proposed in Bollino et al. is computationally complex, and it can accommodate the estimation of only a few consumption categories. In their paper, Bollino et al. applied their procedure to only three categories of goods.

An unambiguous ranking of these procedures is not possible (Pollak and Wales, 1981). One of the factors that make it difficult to rank these procedures is that not all of them are nested. Their assessment also depends on the functional form used to estimate the demand system. The theoretically more appealing technique of Bollino et al. (2000) restricts the number of consumption categories that can be analyzed.
However, the number of goods estimated in empirical demand systems is typically large, so that its usefulness in practice can at best be very limited. In this study, we estimate demand for seven food groups, which immediately rules out the use of Bollino et al.’s procedure. We estimate the QUAIDS model in its most flexible form, allowing for nonlinearity in the price index used to deflate total expenditure and allowing the coefficient of the quadratic expenditure term to depend on prices. Given that we estimate QUAIDS in this highly nonlinear form, a preferred method for introducing demographic variables is one that will not create further nonlinearities. It is for this reason that we choose to incorporate demographic variables as intercept shifters through Pollak and Wales’s (1981) linear demographic translation method.

2.5 Observed Zero Expenditures

The behavioral response of households to changes in their economic environment takes place on either the intensive or the extensive margin (Meyerhoefer, 2002). Households respond along the intensive margin when they are consuming a non-zero amount of the good, so that a change in an independent variable (such as a commodity’s price) leads them to marginally increase or decrease the amount they presently consume. The extensive margin refers to households that must decide whether or not to consume any amount of the good when its price (or some other exogenous factor) changes. These are households that are either not consuming the good initially, or that respond to the exogenous change by completely exiting the market for that good. The response to changes in exogenous factors by households on the intensive margin entails a continuous change in the dependent variable, and can be easily modeled by traditional regression techniques. However, modeling scenarios with some households on the extensive margin and others on the intensive margin requires statistical analyses based on composite distributions.
These are defined to contain a discrete probability mass on the boundary of the choice set, allowing for a positive probability of zero consumption, and a continuous density corresponding to positive consumption levels (Meyerhoefer, 2002).

The early empirical work in demand modeling estimated demand functions on aggregate time series data, or on household level data with highly aggregated commodity groupings. Demand estimation with these aggregate data allows the use of standard econometric techniques that assume the dependent variables in the system of demand equations follow a joint normal distribution, and hence do not allow for a positive probability of zero expenditure levels. When aggregate data are used, the number of observations with zero expenditure share values is typically very small, such that deleting these observations from the sample and carrying out estimation on only the positive observations consistently identifies the demand function (Meyerhoefer, 2002).

Subsequent work on demand modeling made increasing use of micro data on highly disaggregate commodity groups. When micro data are used, it becomes increasingly likely that non-consumption of some commodities is observed for a large number of households. This makes the strategy of deleting non-consuming households unattractive, particularly given the large number of degrees of freedom that is lost. Also, the exclusion of a large number of observations in this nonrandom manner may cause selection bias.

The first attempts to develop estimation techniques that explicitly capture consumer behavior on the extensive and intensive margins were made in a single equation context. In this context, several limited dependent variable models have been developed to deal with zero expenditure values generated by different underlying processes.
One of the reasons for observed zero expenditures is that the market price for a given commodity exceeds the household’s reservation price, leaving the household at a corner solution and censoring the expenditure distribution at the point of non-consumption. This reasoning has motivated the use of the Tobit model (Tobin, 1958) to estimate censored expenditure relationships. Under the Tobit formulation, the same variables are assumed to determine both the value of the continuous observations and the discrete switch to non-consumption at zero, making it appropriate only in cases where consumers are rationed out of the market by prices higher than they are willing to pay. Other models have been developed that are appropriate for situations where zero expenditure values are a result of infrequency of purchase, which occurs when the purchase of some commodities is not observed due to the short span of the survey period (Deaton and Irish, 1984; Blundell and Meghir, 1987). Popular among these is the “double hurdle model” of Cragg (1971).

The study by Wales and Woodland (1983) was among the first attempts to derive econometric techniques to estimate a theoretically plausible demand system in the presence of zero expenditures. Wales and Woodland propose two alternative models to estimate censored systems of equations, based on assumptions about preferences. The first model assumes that preferences are randomly distributed in the population, so that each individual’s direct marginal utility function for each good can be additively augmented with a normally distributed error term. These stochastic marginal utility functions are then substituted into the Kuhn-Tucker conditions to determine the set of goods with zero consumption and define the commodity demand functions.
A multivariate normal density function for the vector of commodity demands is derived through a change of variables transformation, and used to assign a probability to each possible combination of consumption and non-consumption. The number of integrations to be performed on the density is equal to the number of non-consumption realizations. Unfortunately, the derivation of the maximum likelihood estimates in this case involves the evaluation of multiple integrals, which can be computationally infeasible for large equation systems.

The second model proposed by Wales and Woodland assumes that commodity demands are the result of individual nonrandom utility maximization subject to a budget constraint. This second model essentially extends Amemiya’s (1974) Tobit estimator for a system of simultaneous equations to account for the budget constraint during estimation. An error term, assumed to follow a truncated multivariate normal distribution, is added to the demand share equations. As is the case with the first model, the truncated density is obtained by integrating non-consumed goods out of the joint normal density, and the likelihood function is constructed as the product of the individual truncated density functions.

The main difference between the two approaches lies in the assumptions each makes about the processes generating the zero consumption values. The first model assumes that zero consumption is determined by the Kuhn-Tucker conditions, so that stochasticity enters the model through random preferences, while the second model incorporates stochasticity through additive disturbances on the share equations, so that the possibility of zero consumption occurs because the disturbances follow a truncated joint normal density. The similarity between these models lies in the fact that both assume zero expenditures represent corner solutions where consumers are rationed out of the market by prices higher than they are willing to pay.
The drawback of both models is that their empirical implementation is virtually infeasible for larger systems of equations, given the difficulty of performing multiple numerical integrations.

Building on Wales and Woodland’s first model, Lee and Pitt (1986) develop a method for estimating censored demand systems that is dual to the Kuhn-Tucker approach. Lee and Pitt also assume that preferences are randomly distributed over the population but, unlike Wales and Woodland, they use the indirect utility function resulting from utility maximization without non-negativity constraints. Application of Roy’s identity to the indirect utility function defines what Lee and Pitt call unconstrained “latent notional demands”, each of which is a function of market prices. They call the demands notional because they result from utility maximization with respect to the budget constraint only, allowing them to take on negative values, and latent in the sense that only nonnegative realizations are observable (Meyerhoefer, 2002). The notional demand functions can be related to observed demands by finding positive shadow prices, which are themselves functions of observable market prices, that support the zero-valued demand levels. Households compare shadow prices to market prices to select a demand regime, so that if the shadow price of a good is less than its market price, consumption is zero.

The likelihood function is constructed as the product of the conditional density of disturbances for the consumed goods, given the non-consumed disturbances, with the density of disturbances for the non-consumed goods. The likelihood must be integrated over the domain of shadow price values. The drawback of Lee and Pitt’s approach is the same as that of Wales and Woodland, namely that it can only be used where a small number of commodities is involved.
Hence, despite the theoretical attractiveness of the Wales and Woodland and Lee and Pitt approaches, their computational infeasibility limits their usefulness in practice. Arndt (1999) proposed a methodology that attempts to overcome this limitation. Arndt follows Lee and Pitt’s formulation of estimating equations by solving for the reservation prices and substituting them into the demand functions. But instead of specifying a likelihood function to carry out the estimation, Arndt proposes maximizing an entropy function subject to restrictions. These restrictions include prior distributions imposed on elasticity estimates, symmetry restrictions, constraints that the reservation prices are less than or equal to market prices, and the requirement that all outcome probabilities sum to one. Since entropy maximization problems do not involve the evaluation of numerical integrals, they can be solved using standard nonlinear optimization packages. However, whether the maximum entropy estimator meets the requirements of economic theory, particularly the curvature restrictions, has not been fully investigated (Meyerhoefer, 2002). Motivated by the work of Heckman (1976), alternative two-step procedures have been developed to reduce the computational burden of multiple integrals that bedevils one-step procedures. Heien and Wessells (1990) proposed a two-step estimation procedure for a system of demand equations with limited dependent variables. In the first step, a probit model is used to calculate the inverse Mills ratio (IMR) for each commodity. The IMR is then used as a selectivity regressor in each equation during the second stage. The system in the second stage is estimated with seemingly unrelated regression (SUR). Heien and Wessells’s procedure has seen widespread use in empirical food demand studies (see Yen et al. (2002) for examples).
However, it was later shown that Heien and Wessells’s procedure is inconsistent due to the presence of a mathematical error in its derivation (Shonkwiler and Yen, 1999). One feature to note about one-step estimation procedures, such as those due to Wales and Woodland (1983) and Lee and Pitt (1986), is that they are only appropriate for modeling corner solutions. This is because they assume that the same process that governs the positive observed demand also governs the consumption decision itself. But this is not the only explanation for the presence of zero expenditure levels. A natural alternative would be a procedure that captures other phenomena such as infrequency of purchases. Shonkwiler and Yen (1999) derive such an estimator as a multivariate generalization of Amemiya’s (1985) type 2 Tobit model. Their model contains a separate binary censor used to predict the probability of consumption for each good in the system. This probability is then multiplied by the expectation of demand for the respective good conditional on positive consumption to generate the unconditional censored demand equations. The Shonkwiler and Yen procedure is carried out in two steps. In the first step, single equation probit models are used to forecast the probability of consumption and construct the second stage demand system, which is subsequently estimated by either maximum likelihood or seemingly unrelated regression. This procedure is particularly suited to situations where large equation systems are involved, given the reduced computational burden associated with it. In this study, we estimate a demand system involving seven food groups. Given the suitability of Shonkwiler and Yen’s procedure for large equation systems, and its consistency, it is our chosen approach to model the zero-expenditure problem. A detailed discussion of this procedure follows in the empirical model chapter.
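The two-step logic described above can be illustrated with a minimal Python sketch of the second-step mean equation. It assumes the form commonly attributed to Shonkwiler and Yen (1999), in which the unconditional mean of a censored share equals the standard normal CDF of the first-step probit index times the demand equation, plus a coefficient times the standard normal density of that index. The function name, argument names, and numerical values are hypothetical and serve only as an illustration; the first-step probit index is taken as given rather than estimated.

```python
from statistics import NormalDist

# Standard normal distribution used for the probit CDF and density.
_STD_NORMAL = NormalDist()


def censored_share_mean(probit_index, latent_share, delta):
    """Unconditional mean of a censored budget share (illustrative sketch).

    probit_index : fitted first-step probit index for the good (assumed given)
    latent_share : value of the demand (share) equation for the good
    delta        : coefficient on the normal density term (hypothetical value)
    """
    prob_consume = _STD_NORMAL.cdf(probit_index)   # probability of consumption
    density_term = _STD_NORMAL.pdf(probit_index)   # selectivity correction term
    return prob_consume * latent_share + delta * density_term


# A household that is almost certain to consume the good is barely affected
# by censoring, while one with a zero probit index is affected substantially.
mean_certain = censored_share_mean(8.0, 0.2, 0.1)
mean_even_odds = censored_share_mean(0.0, 0.2, 0.1)
```

In an actual application the second-step equations would be estimated jointly as a system; the sketch only shows how the first-step output enters each equation.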
2.6 Chapter Summary

This chapter reviewed literature on the relationship between utility maximization theory and demand functional forms, and discussed alternative approaches to modeling preferences. In empirical demand estimation, a large number of goods are involved, and to reduce this large number of goods into a manageable few, the assumption of weak separability is typically invoked. The developments in the literature on econometric tests for separability have focused mainly on time series applications. The AIDS model has been the most widely used functional form for empirically estimating price and expenditure elasticities. However, its assumptions that budget shares are linear in expenditure, and that expenditure elasticities are constant regardless of the point in the expenditure spectrum, are limiting. This study uses QUAIDS, which is a generalization of AIDS that allows for nonlinearity in expenditure and allows the goods to be luxuries at some expenditure levels and necessities at others. The effects of demographic variables are incorporated into the demand model using demographic translation. The methods that have been developed in the literature to model observed zero-expenditures are, for the most part, not suitable for large equation systems due to their computational complexity. Due to the large number of equations estimated in this study, we use a two-step procedure developed by Shonkwiler and Yen (1999).

CHAPTER THREE

EMPIRICAL MODEL

3.1 Introduction

This chapter discusses the specification and estimation of the empirical model. The general form of the model is discussed first, followed by the estimation form. Section 3.3 discusses estimation issues, paying particular attention to implications of the nonlinearity of the model, and deriving an LM test for nonlinearity. Econometric issues associated with QUAIDS estimation are the focus of section 3.4.
In this section, the problems of expenditure endogeneity and non-consumption are discussed in more detail, and strategies for modeling them are provided. The final section, section 3.5, is a summary of the chapter.

3.2 Empirical Model

Popular functional forms such as the almost ideal demand system (AIDS) of Deaton and Muellbauer (1980a) and the translog model of Jorgenson et al. (1982) have budget shares that are linear functions of log total expenditure. However, as discussed in the previous chapter, further terms in total expenditure may be required for some, if not all, budget share equations. Banks et al. (1997) show that if some commodities require these extra terms, then parsimony, coupled with utility theory, restricts the nonlinear term to be quadratic in log income. Based on this restriction, they derive an extension of the AIDS model, the quadratic almost ideal demand system (QUAIDS), whose budget share equations have log total expenditure as the leading term, together with a higher order (quadratic) term in log total expenditure.

3.2.1 Quadratic Almost Ideal Demand System: The General Form

The QUAIDS model assumes that household preferences belong to the following quadratic logarithmic family of expenditure functions:

$$\ln c(u,p) = \ln a(p) + \frac{u\,b(p)}{1 - \lambda(p)\,b(p)\,u} \tag{3.1}$$

where u is utility, p is a vector of prices, a(p) is a function that is homogeneous of degree one in prices, and b(p) and $\lambda(p)$ are functions that are homogeneous of degree zero in prices. The corresponding indirect utility (V) function is:

$$\ln V = \left\{ \left[ \frac{\ln x - \ln a(p)}{b(p)} \right]^{-1} + \lambda(p) \right\}^{-1} \tag{3.2}$$

where x is total expenditure. The specific functional form for $\lambda(p)$ is:

$$\lambda(p) = \sum_{i=1}^{K} \lambda_i \ln p_i, \quad \text{where } \sum_{i=1}^{K} \lambda_i = 0 \tag{3.3}$$

and where i = 1, ..., K indexes the goods entering the demand model. Deaton and Muellbauer’s AIDS model has an indirect utility function given by equation (3.2), but with $\lambda(p)$ set to zero.
The specification of the functional forms for a(p) and b(p) in QUAIDS is similar to their specification in AIDS, in which they are made to be sufficiently flexible to represent any arbitrary set of first and second derivatives of the cost function. Application of Shephard’s lemma to the cost function (3.1) or Roy’s identity to the indirect utility function (3.2) gives the QUAIDS model in budget shares form:

$$w_i = \alpha_i + \sum_{j=1}^{K} \gamma_{ij} \ln p_j + \beta_i \ln\left[\frac{x}{a(p)}\right] + \frac{\lambda_i}{b(p)} \left\{ \ln\left[\frac{x}{a(p)}\right] \right\}^2 \tag{3.4}$$

where $\alpha$, $\beta$, $\gamma$, and $\lambda$ are parameters. As can be seen from the budget shares (3.4), the QUAIDS model specializes to AIDS when all of the $\lambda$’s are zero across all equations. Hence, the AIDS model is nested within QUAIDS, and the AIDS specification can be tested based on the statistical significance of the $\lambda$’s. As with the original AIDS model, the theoretical restrictions of adding-up, homogeneity, and symmetry in the QUAIDS model are expressed in terms of its parameters. Adding-up requires $\sum_i w_i = 1$, and can be expressed in terms of model parameters as:

$$\sum_{i=1}^{K} \alpha_i = 1, \quad \sum_{i=1}^{K} \beta_i = 0, \quad \sum_{i=1}^{K} \lambda_i = 0, \quad \sum_{i=1}^{K} \gamma_{ij} = 0 \;\; \forall j. \tag{3.5}$$

Since Marshallian demands are homogeneous of degree zero in (p, x),

$$\sum_{j=1}^{K} \gamma_{ij} = 0 \;\; \forall i. \tag{3.6}$$

Slutsky symmetry implies that:

$$\gamma_{ij} = \gamma_{ji} \;\; \forall i, j. \tag{3.7}$$

The parameter $\alpha_i$ in the QUAIDS model can be interpreted as the share of an item in the budget of a subsistence household (i.e., the case of u = 0) at the base year prices (Meenakshi and Ray, 1999). The expression $\beta_i + 2(\lambda_i / b(p)) \ln[x/a(p)]$ measures the impact of a 1% increase in real expenditure on the budget share of commodity i. Unlike in the AIDS model, where $\lambda_i = 0 \;\forall i$, this expression is capable of changing sign depending on the point in the expenditure spectrum. In other words, the QUAIDS model allows a good to be a luxury at some expenditure levels and a necessity at others, as one moves along the expenditure spectrum of households. In contrast, in the AIDS model this response reduces to the constant $\beta_i$, so its sign is the same at every expenditure level.
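The parameter restrictions (3.5) to (3.7) are easy to verify numerically for any candidate parameter set. The following Python sketch does so for a three-good system; the function name and the parameter values are hypothetical and chosen only so that the restrictions hold, and they are not estimates from this study.

```python
def check_restrictions(alpha, beta, lam, gamma, tol=1e-10):
    """Check adding-up (3.5), homogeneity (3.6) and symmetry (3.7).

    alpha, beta, lam : lists of length K
    gamma            : K x K nested list, gamma[i][j] is the price coefficient
    Returns a dict of booleans, one per restriction.
    """
    k = len(alpha)
    # Adding-up: alphas sum to one; betas, lambdas and gamma columns sum to zero.
    adding_up = (
        abs(sum(alpha) - 1.0) < tol
        and abs(sum(beta)) < tol
        and abs(sum(lam)) < tol
        and all(abs(sum(gamma[i][j] for i in range(k))) < tol for j in range(k))
    )
    # Homogeneity: every row of gamma sums to zero.
    homogeneity = all(abs(sum(gamma[i])) < tol for i in range(k))
    # Slutsky symmetry: gamma equals its transpose.
    symmetry = all(
        abs(gamma[i][j] - gamma[j][i]) < tol for i in range(k) for j in range(k)
    )
    return {"adding_up": adding_up, "homogeneity": homogeneity, "symmetry": symmetry}


# Hypothetical parameter values that satisfy all three restrictions.
alpha = [0.5, 0.3, 0.2]
beta = [0.05, -0.02, -0.03]
lam = [0.01, -0.004, -0.006]
gamma = [
    [0.10, -0.06, -0.04],
    [-0.06, 0.08, -0.02],
    [-0.04, -0.02, 0.06],
]
result_valid = check_restrictions(alpha, beta, lam, gamma)

# Perturbing a single off-diagonal element breaks the restrictions.
gamma_bad = [row[:] for row in gamma]
gamma_bad[0][1] = -0.05
result_invalid = check_restrictions(alpha, beta, lam, gamma_bad)
```

In estimation these restrictions are normally imposed rather than checked, but a check of this kind is a cheap safeguard when parameters of a dropped equation are recovered by hand.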
Formulas for the QUAIDS expenditure and price elasticities are derived by differentiating the budget share equations with respect to ln x and ln $p_j$, respectively. Following Banks et al. (1997), we simplify the expressions for the elasticity formulas by using the intermediate results:

$$\mu_i \equiv \frac{\partial w_i}{\partial \ln x} = \beta_i + \frac{2\lambda_i}{b(p)} \left\{ \ln\left[\frac{x}{a(p)}\right] \right\} \tag{3.8}$$

$$\mu_{ij} \equiv \frac{\partial w_i}{\partial \ln p_j} = \gamma_{ij} - \mu_i \left( \alpha_j + \sum_{k=1}^{K} \gamma_{jk} \ln p_k \right) - \frac{\lambda_i \beta_j}{b(p)} \left\{ \ln\left[\frac{x}{a(p)}\right] \right\}^2 \tag{3.9}$$

In terms of the $\mu_i$, the formula for expenditure elasticities can be written as:

$$e_i = \frac{\mu_i}{w_i} + 1. \tag{3.10}$$

Using the expression for $\mu_{ij}$, the formula for the Marshallian or uncompensated price elasticities can be written as:

$$e_{ij}^{u} = \frac{\mu_{ij}}{w_i} - \delta_{ij} \tag{3.11}$$

where $\delta_{ij}$ is the Kronecker delta taking the value $\delta_{ij} = 1$ if i = j and $\delta_{ij} = 0$ if i ≠ j. The Hicksian or compensated price elasticities are calculated by invoking the Slutsky equation:

$$e_{ij}^{c} = e_{ij}^{u} + w_j e_i \tag{3.12}$$

The QUAIDS model is used in this study to estimate price and expenditure elasticities using panel data on households. The next subsection focuses on the empirical estimation of the QUAIDS model.

3.2.2 Quadratic Almost Ideal Demand System: The Estimation Form

As before, denote commodities (and therefore, equations) by i, where i = 1, ..., K, and let h = 1, ..., N denote households, and t = 1, ..., T index time periods. The empirical specification of the QUAIDS model is

$$w_{it}^{h} = \alpha_i + \sum_{j=1}^{K} \gamma_{ij} \ln p_{jt}^{h} + \beta_i \ln\left[\frac{x_t^h}{a(p_t^h)}\right] + \frac{\lambda_i}{b(p_t^h)} \left\{ \ln\left[\frac{x_t^h}{a(p_t^h)}\right] \right\}^2 + \sum_{s=1}^{S} \delta_{is} z_{st}^{h} + \varepsilon_{it}^{h} \tag{3.13}$$

where $Z_t^h = (z_{1t}^h, \ldots, z_{St}^h)$ is a set of demographic variables for household h at time t, $\ln a(p_t^h)$ is the price index defined as

$$\ln a(p_t^h) = \alpha_0 + \sum_{j=1}^{K} \alpha_j \ln p_{jt}^{h} + \frac{1}{2} \sum_{j=1}^{K} \sum_{k=1}^{K} \gamma_{jk} \ln p_{jt}^{h} \ln p_{kt}^{h} \tag{3.14}$$

and $b(p_t^h)$ is the Cobb-Douglas price aggregator

$$b(p_t^h) = \prod_{i=1}^{K} (p_{it}^{h})^{\beta_i}. \tag{3.15}$$

The $\alpha$’s, $\gamma$’s, and $\beta$’s in the budget share equations (3.13) are restricted by theory to be the same as those in equations (3.14) and (3.15). In estimating the QUAIDS model, total expenditure, x, is defined as expenditure on all food items consumed by the household.
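As an illustration of how the share equations and the elasticity formulas fit together, the following Python sketch evaluates the QUAIDS budget shares, the translog price index, the Cobb-Douglas aggregator, and the expenditure, Marshallian, and Hicksian elasticities at a single price and expenditure point. It is a sketch under stated assumptions: demographic terms and the error term are omitted, the function name is hypothetical, and all parameter values (including the value of alpha_0) are made up for illustration and are not estimates from this study.

```python
import math


def quaids_point(alpha0, alpha, beta, lam, gamma, prices, x):
    """Shares and elasticities of a QUAIDS system at one (prices, x) point.

    Demographic shifters and disturbances are left out (illustrative only).
    Returns (shares, expenditure elasticities, Marshallian, Hicksian).
    """
    k = len(alpha)
    lnp = [math.log(p) for p in prices]
    # Translog price index ln a(p).
    ln_a = alpha0 + sum(alpha[j] * lnp[j] for j in range(k)) + 0.5 * sum(
        gamma[j][l] * lnp[j] * lnp[l] for j in range(k) for l in range(k)
    )
    # Cobb-Douglas price aggregator b(p).
    b = math.exp(sum(beta[i] * lnp[i] for i in range(k)))
    ln_m = math.log(x) - ln_a  # log real expenditure ln[x / a(p)]
    shares = [
        alpha[i]
        + sum(gamma[i][j] * lnp[j] for j in range(k))
        + beta[i] * ln_m
        + lam[i] / b * ln_m ** 2
        for i in range(k)
    ]
    # mu_i: derivative of share i with respect to ln x.
    mu = [beta[i] + 2.0 * lam[i] / b * ln_m for i in range(k)]
    # Derivative of ln a(p) with respect to ln p_j.
    dlna = [alpha[j] + sum(gamma[j][l] * lnp[l] for l in range(k)) for j in range(k)]
    # mu_ij: derivative of share i with respect to ln p_j.
    mu_ij = [
        [gamma[i][j] - mu[i] * dlna[j] - lam[i] * beta[j] / b * ln_m ** 2
         for j in range(k)]
        for i in range(k)
    ]
    e_exp = [mu[i] / shares[i] + 1.0 for i in range(k)]
    e_u = [
        [mu_ij[i][j] / shares[i] - (1.0 if i == j else 0.0) for j in range(k)]
        for i in range(k)
    ]
    # Slutsky equation gives the compensated elasticities.
    e_c = [[e_u[i][j] + shares[j] * e_exp[i] for j in range(k)] for i in range(k)]
    return shares, e_exp, e_u, e_c


# Hypothetical three-good parameter set satisfying adding-up, homogeneity
# and symmetry; alpha0 = 4.0 is an arbitrary illustrative value.
alpha = [0.5, 0.3, 0.2]
beta = [0.05, -0.02, -0.03]
lam = [0.01, -0.004, -0.006]
gamma = [[0.10, -0.06, -0.04], [-0.06, 0.08, -0.02], [-0.04, -0.02, 0.06]]
shares, e_exp, e_u, e_c = quaids_point(4.0, alpha, beta, lam, gamma,
                                       [1.2, 0.9, 1.5], 100.0)
```

Because the illustrative parameters satisfy the theoretical restrictions, the computed shares sum to one, the share-weighted expenditure elasticities sum to one (Engel aggregation), and each row of Marshallian elasticities plus the expenditure elasticity sums to zero (homogeneity). These identities provide a convenient check on any implementation of the formulas.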
Price data are at the cluster level, which in most cases means at the village level for rural areas or at the level of magisterial districts for urban areas. So, households in different clusters face different prices, and this is the reason commodity prices are indexed with the h (household) superscript. To control for varying preference structures and heterogeneity across households, demographic variables are incorporated in budget share equations through the linear demographic translation method of Pollak and Wales (1978). This method specifies observed household heterogeneity as a linear combination of socio-demographic variables observed in the data ($\sum_s \delta_{is} z_{st}^h$). The socio-demographic variables considered here are household size, rural-urban dummy, race, and education of the household head. Dummy variables for the year of survey are included among the z variables to control for structural change in consumers’ preferences and other aggregate time effects that may influence expenditure patterns (such as those related to the overall macroeconomic environment). The month of the survey is also included to control for the likely effects of seasonality on consumption behavior. Given that there are existing food demand studies in South Africa based on the AIDS model, it is instructive to also estimate the AIDS model in this study, so that elasticity estimates can be compared with those obtained from QUAIDS. After all, once the unrestricted QUAIDS model has been estimated, the estimation of AIDS becomes a trivial task because it only requires restricting the coefficient on the quadratic expenditure term to zero. The empirical specification of AIDS is:

$$w_{it}^{h} = \alpha_i + \sum_{j=1}^{K} \gamma_{ij} \ln p_{jt}^{h} + \beta_i \ln\left[\frac{x_t^h}{a(p_t^h)}\right] + \sum_{s=1}^{S} \delta_{is} z_{st}^{h} + u_{it}^{h} \tag{3.16}$$

The $\beta$ parameters of the AIDS model determine whether goods are luxuries or necessities (Deaton and Muellbauer, 1980a). When $\beta_i > 0$, an increase in x leads to an increase in $w_i$ so that good i is a luxury.
Similarly, $\beta_i < 0$ for necessities. The $\gamma_{ij}$ parameters measure the change in the ith budget share following a unit proportional change in $p_j$ with x/a(p) held constant. The formula for the AIDS expenditure elasticity is given by3:

$$e_i = \frac{\beta_i}{w_i} + 1 \tag{3.17}$$

and the Marshallian price elasticities by:

$$e_{ij}^{u} = \frac{\gamma_{ij}}{w_i} - \frac{\beta_i}{w_i}\left[ w_j - \beta_j \ln\left(\frac{x}{a(p)}\right) \right] - \delta_{ij} \tag{3.18}$$

where $\delta_{ij}$ is the Kronecker delta taking a value of one if i = j, and zero if i ≠ j. The Hicksian price elasticities are obtained by invoking the Slutsky equation:

$$e_{ij}^{c} = e_{ij}^{u} + w_j e_i. \tag{3.19}$$

The elasticities in both the QUAIDS and AIDS models are estimated at sample means of prices, expenditures, and budget shares.

3 The household superscript and time subscripts are not included here because the elasticities are calculated at time- and household-pooled sample means.

3.3 Estimation

For purposes of estimation, an error term, $\varepsilon_{it}^h$, is added to each of the commodity share equations. The errors $\varepsilon_t^h = [\varepsilon_{1t}^h, \varepsilon_{2t}^h, \ldots, \varepsilon_{Kt}^h]$ are assumed to have a multivariate normal distribution with covariance matrix $\Sigma$. However, due to the adding-up condition, direct estimation of the full equation system is not possible because $\Sigma$ is singular. To get around this problem, one of the K demand equations is dropped from the system during estimation; the remaining (K-1) equations are estimated by maximum likelihood. The question of which among the K equations to drop is irrelevant because, as Barten (1969) shows, such a choice does not influence the demand parameter estimates. The full covariance matrix, together with the parameters of the Kth equation, are recovered by applying the delta method (Barten, 1969). An interesting econometric feature of both the QUAIDS and AIDS models is that they are both conditionally linear in the price aggregators $\ln a(p_t^h)$ and $b(p_t^h)$. This conditional linearity has been used in the AIDS model (in which the only source of nonlinearity is the $\ln a(p_t^h)$ price aggregator) to simplify empirical estimation.
In particular, most demand studies approximate the nonlinear price aggregator $\ln a(p_t^h)$ by a linear index, which leads to a specification in which budget shares are linear in all parameters and therefore can be estimated in a straightforward way. The most commonly used approximations for $\ln a(p_t^h)$ are the Laspeyres index, the Stone index, or the Stone index as modified by Moschini (1995). Another reason for the linear approximation is that in practical applications, prices are relatively collinear, so that $\ln a(p_t^h)$ is approximately proportional to any appropriately defined price index (Deaton and Muellbauer, 1980a). This latter reason is particularly relevant in demand studies that use time series data. However, the imposition of a linear structure on variables whose true relationship is nonlinear can have undesirable consequences for the reliability of parameter estimates. Pashardes (1993) argues that the linearization of AIDS causes an omitted variables problem. Based on analytical expressions and empirical results, Pashardes shows that linearization of AIDS can understate own price elasticities and cross price elasticities of goods that are either luxuries or necessities, and overstate the cross price elasticities of the other goods. Buse (1994) also views linearization of AIDS as an omitted variable problem, and shows through Monte Carlo analyses that linearization may lead to inconsistency of the widely used seemingly unrelated regression (SUR) estimator. Both Pashardes’s and Buse’s assessments of linearized AIDS are based on the original Stone price index. Moschini (1995) shows that the Stone index fails to satisfy the commensurability property of index numbers; in other words, it is not invariant to changes in the units of measurement. Using the Laspeyres price index as a starting point, Moschini develops a price index, the modified Stone price index, which is invariant to units of measurement and which, he argues, approximates the nonlinear AIDS model well.
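The units-of-measurement problem can be seen in a small numerical example. The Python sketch below compares the change in a Stone-type index, which weights log prices by current budget shares, with the change in a fixed-weight log index that uses base-period shares, before and after the price of one good is re-expressed in different units. The fixed-weight index is used here only as a stand-in in the spirit of the modified index; the function name and all shares and prices are hypothetical.

```python
import math


def log_price_index(weights, prices):
    """Share-weighted sum of log prices."""
    return sum(w * math.log(p) for w, p in zip(weights, prices))


# Hypothetical budget shares and prices in a base and a current period.
shares_base = [0.5, 0.3, 0.2]
shares_now = [0.4, 0.35, 0.25]
prices_base = [2.0, 5.0, 1.0]
prices_now = [2.4, 5.5, 1.1]

# Re-express the price of good 1 in units one tenth the size (e.g. per 100 g
# instead of per kg); nothing real has changed.
unit_scale = [0.1, 1.0, 1.0]


def rescale(prices):
    return [c * p for c, p in zip(unit_scale, prices)]


# Stone-type index: current-period shares as weights in each period.
stone_change = (log_price_index(shares_now, prices_now)
                - log_price_index(shares_base, prices_base))
stone_change_rescaled = (log_price_index(shares_now, rescale(prices_now))
                         - log_price_index(shares_base, rescale(prices_base)))

# Fixed-weight index: base-period shares as weights in both periods.
fixed_change = (log_price_index(shares_base, prices_now)
                - log_price_index(shares_base, prices_base))
fixed_change_rescaled = (log_price_index(shares_base, rescale(prices_now))
                         - log_price_index(shares_base, rescale(prices_base)))
```

The measured change in the Stone-type index shifts by the change in the share of good 1 times the log of the rescaling factor, whereas the fixed-weight index gives the same change in either set of units.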
We are not aware of studies that evaluate Moschini’s modified Stone price index in a manner similar to those used by Buse and Pashardes to evaluate the original Stone index. While approximating $\ln a(p_t^h)$ by a linear price index solves the nonlinearity problem in the AIDS model, it does not solve the nonlinearity problem in the QUAIDS model due to the division by the price aggregator $b(p_t^h)$ in the coefficient of the quadratic expenditure term. The temptation to include a linear (in parameters) quadratic term in expenditure may be natural. In fact, a number of demand studies force the coefficient of the quadratic expenditure term to be constant (included in these studies are Blundell et al. (1993), Christensen (2004), Labeaga and Puig (2002), and Browning and Collado (2004)). In the terminology of equation (2.1) of chapter 2, imposing a constant coefficient on the quadratic expenditure term is equivalent to assuming that $C_i(p)$ is independent of prices. However, as Banks et al. (1997) show, no rank 3 exactly aggregable utility-derived demand system exists that has both the coefficients on the linear and the quadratic expenditure terms independent of prices (Corollary 2 on p. 533 of Banks et al. (1997)). Some of these studies (specifically Christensen (2004)) acknowledge that forcing $b(p_t^h)$ to be constant means giving up the integrability property of the demand system.4 In this study, we build on Banks et al.’s (1997) study (particularly Corollary 2) and develop a formal test for the statistical significance of prices in the coefficient of the quadratic expenditure term.
The logic behind this test is that if this coefficient does not depend on prices, then there is no need to include a quadratic expenditure term in the model once the linear term has been included, because the coefficient of the quadratic expenditure term must depend on prices. Higher order expenditure terms are also unnecessary because utility theory restricts the nonlinear expenditure term to be quadratic. So, a test of the statistical significance of the quadratic expenditure term is, in effect, a specification test of the AIDS versus the QUAIDS model. But because AIDS can be approximated linearly and QUAIDS cannot, a test for the statistical significance of prices can also be viewed as a test for nonlinearity of the demand model.

4 Integrability as used here means that for a given system of demand functions (which have a symmetric, negative semidefinite matrix), there should be a utility function from which these demand functions can be derived.

3.3.1 A Test for Nonlinearity of the Demand Model

To derive this test, it is necessary to relax the theoretical constraint in the QUAIDS model that the $\beta_i$ parameters in the Cobb-Douglas price aggregator $b(p_t^h)$ are the same as the coefficients on the linear expenditure terms (that is, the coefficients on $\ln[x_t^h / a(p_t^h)]$). This is because if we maintain the restriction that the $\beta_i$’s in $b(p_t^h)$ are the same as the $\beta_i$ coefficients on the linear expenditure term, then the null hypothesis that the $\beta_i$’s in $b(p_t^h)$ are all zero will make the second term, $\ln[x_t^h / a(p_t^h)]$, in the budget share equations disappear. This will make the demand system a function only of the quadratic expenditure term, which is inappropriate. To avoid this problem, define a new price aggregator $b(p_t^h) = \prod_{i=1}^{K} (p_{it}^h)^{\theta_i}$, where $\theta_i$ and $\beta_i$ are allowed to differ from each other.
For ease of exposition, we suppress the household (h) and time (t) indices, and absorb all the terms not involving the quadratic expenditure term into the vector q and their associated parameters (i.e., parameters not involving the $\theta_i$’s) into the vector $\phi$. With this new notation, the expenditure share equations (3.13) can now be expressed as:

$$w_i = g_i(q, \phi) + \lambda_i \left[ \prod_{k=1}^{K} p_k^{\theta_k} \right]^{-1} \left\{ \ln\left[\frac{x}{a(p)}\right] \right\}^2 + \varepsilon_i \tag{3.20}$$

We want to test the null hypothesis that the vector of coefficients $\theta$ in $\left[\prod_{k=1}^{K} p_k^{\theta_k}\right]^{-1}$ is identically zero (i.e., $H_0$: $\theta = 0$). The restricted model (with $\theta = 0$) is easier to estimate than the unrestricted model, which makes the Lagrange Multiplier (LM) test an attractive approach. Consider maximization of the log-likelihood subject to a set of constraints $c(\theta) - r = 0$. Let $\kappa$ be the Lagrange multiplier and define the Lagrangean function:

$$\Lambda = \ln L + \kappa\,(c(\theta) - r) \tag{3.21}$$

where ln L is the log-likelihood function for commodity i given by:

$$\ln L = -\frac{N}{2}\ln(2\pi) - \frac{N}{2}\ln \sigma^2 - \frac{1}{2}\sum_{h=1}^{N} \frac{(\varepsilon_i^h)^2}{\sigma^2} \tag{3.22}$$

with the sum running over the N observations and $\varepsilon_i^h$ denoting the disturbance in (3.20) for observation h. Up to the residual weight $\varepsilon_i/\sigma^2$ and summation over observations, the first derivative of the log-likelihood function with respect to $\theta_j$ is proportional to the derivative of the share equation (3.20) with respect to $\theta_j$:

$$\frac{\partial \ln L}{\partial \theta_j} \propto -\lambda_i \left[ \prod_{k=1}^{K} p_k^{\theta_k} \right]^{-2} \left( \prod_{k=1}^{K} p_k^{\theta_k} \ln p_j \right) \left\{ \ln\left[\frac{x}{a(p)}\right] \right\}^2. \tag{3.23}$$

Evaluated at the null, $H_0$: $\theta = 0$, the first derivative (3.23) becomes:

$$\frac{\partial \ln L}{\partial \theta_j} \propto -\lambda_i \ln p_j \left\{ \ln\left[\frac{x}{a(p)}\right] \right\}^2. \tag{3.24}$$

Based on equation (3.24), a test for statistical significance of prices in b(p) reduces to adding price times expenditure-squared interaction terms ($\sum_{j=1}^{K} \ln p_j \cdot \{\ln[x/a(p)]\}^2$) to the demand model that is linear in expenditure (i.e., equation (3.13) with $\lambda_i = 0$), which gives the unrestricted model, and comparing it with the restricted model, which is just the QUAIDS expenditure share equations (3.13). To carry out this test, we first estimate the restricted model and obtain the residuals. These residuals are then regressed on all variables, including the price times expenditure-squared interaction terms. The R-squared, $R_u^2$, from this regression is used to compute the LM statistic, $LM = N \cdot R_u^2$.
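The mechanics of the auxiliary regression can be sketched in Python as follows. This is a minimal single-equation illustration of the generic N times R-squared form of the LM statistic, using ordinary least squares via the normal equations; it is not the estimation code used in this study, the function names are hypothetical, and the regressor rows are assumed to contain a constant together with the original regressors and the added interaction terms.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols_fitted(rows, y):
    """Fitted values from an OLS regression of y on the columns of rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve_linear(xtx, xty)
    return [sum(c * v for c, v in zip(coef, r)) for r in rows]


def lm_statistic(restricted_residuals, rows):
    """LM = N * R-squared from regressing restricted residuals on rows.

    rows must include a constant, the original regressors, and the added
    price times expenditure-squared interaction terms.
    """
    n = len(restricted_residuals)
    fitted = ols_fitted(rows, restricted_residuals)
    mean = sum(restricted_residuals) / n
    total = sum((e - mean) ** 2 for e in restricted_residuals)
    unexplained = sum((e - f) ** 2 for e, f in zip(restricted_residuals, fitted))
    return n * (1.0 - unexplained / total)
```

When the restricted residuals are unrelated to the added terms, the auxiliary R-squared and hence the statistic are close to zero; when the added terms explain the residuals well, the statistic approaches N.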
This LM statistic follows a chi-squared distribution with degrees of freedom equal to the number of restrictions being tested. For testing purposes, the translog price aggregator, $\ln a(p_t^h)$, is approximated by the modified Stone price index suggested by Moschini (1995), $\ln a(p_t^h) \approx \sum_{i=1}^{K} \bar{w}_i^0 \ln(p_{it}^h)$, where $\bar{w}_i^0 = N^{-1}\sum_{h=1}^{N} w_{i0}^h$ is the mean budget share across households in the base period. The LM test just discussed is useful for preliminary analysis of the data to determine whether the demand model should be specified with a quadratic (QUAIDS) or a linear (AIDS) expenditure variable. An obvious alternative would be to estimate the QUAIDS model and test for the statistical significance of the quadratic expenditure term.5 However, the QUAIDS model is highly nonlinear and difficult to estimate. This LM test is a useful contribution because it allows one to test parametrically whether or not the quadratic expenditure term is necessary, without having to estimate the highly nonlinear QUAIDS model.

3.3.2 Nonlinear Estimation

Given the speed and power of the nonlinear algorithms available today, maximum likelihood estimation of more flexible functional forms of the demand model is feasible. This can improve the precision with which income and price elasticities are measured. For this reason, we estimate the QUAIDS demand model (3.13)-(3.15) in its nonlinear form, allowing flexibility in the price aggregators $\ln a(p_t^h)$ and $b(p_t^h)$. When modeling $\ln a(p_t^h)$ in the flexible form (3.14), one of the problems is that it is virtually impossible to estimate $\alpha_0$ empirically. Deaton and Muellbauer (1980b) suggest assigning a value to $\alpha_0$ prior to estimation. In particular, they propose interpreting $\alpha_0$ as the outlay required for a minimal standard of living when prices are unity (see Deaton and Muellbauer (1980b, p. 316)). However, as Moschini et al.
(1994) observe, the likelihood function is flat in $\alpha_0$, so that the actual choice of $\alpha_0$ does not matter for the approximation properties of the demand model. This implies that the computed elasticities are not affected by the choice of $\alpha_0$. Moschini et al. choose a value of $\alpha_0 = 0$, which proved useful in their context given that it simplified the formulas for their separability tests. Our choice of $\alpha_0$ in this study is based on the suggestion by Deaton and Muellbauer, primarily because choosing $\alpha_0$ in this way has relevance to economic theory.

5 Also, one can use nonparametric methods to analyze the shape of the Engel curves (i.e., relationships between a commodity’s budget share and total expenditure), as did Banks et al. (1997).

Apart from its inherent nonlinearity, the QUAIDS model has a very large number of parameters. To reduce the total number of parameters to be estimated, cross-equation restrictions are imposed during estimation. All the nonlinear AIDS and QUAIDS models are estimated by maximum likelihood using Stata, extending the programs written by Poi (2002) for estimating a four-equation demand system with no demographic variables to those that allow for a seven-equation system with demographic variables.

3.4 Econometric Issues

3.4.1 Attrition

A typical concern when using household panel data involves the extent of sample attrition and the degree to which attrition is nonrandom. While attrition is a common concern in any longitudinal study, it is particularly serious for studies conducted in developing countries, due to the generally poor communication infrastructures. Furthermore, the high levels of mobility and long distance migration associated with development are likely to complicate longitudinal survey work in developing countries.
Partly offsetting these concerns, however, are the much lower refusal rates typical in developing countries, perhaps reflecting lower opportunity costs of time and possibly different cultural attitudes toward the interviewing process (Deaton, 1997). While a large literature exists in developed countries on the implications of nonrandom attrition, only a few studies have considered this topic in developing countries, perhaps reflecting the relative paucity of panel datasets in developing countries. These studies include those by Alderman et al. (2000), Maluccio (2000), and Thomas et al. (2001). In theory, three factors underlie the level of attrition in a survey: (1) the mobility of the target population, (2) the success with which those who move are followed and re-interviewed, and (3) the number of refusals. Thus, attrition is often closely linked to migration behavior (Maluccio, 2000). In the field, poor effort by enumerators and fieldworkers can also exacerbate attrition. Attrition in panel surveys can be viewed as a specific type of nonresponse and, from a conceptual viewpoint, many of the insights regarding nonresponse in cross-sectional surveys carry over to panels. Fitzgerald et al. (1998) provide a statistical framework for the analysis of attrition bias. They distinguish between two types of sample selection: selection on variables that are observed in the data, and selection on variables that are unobserved. They develop tests for attrition using the two selection types. While neither of the two attrition/selection types necessarily imposes a bias on estimates, selection on observables is more amenable to statistical solutions. In particular, if one finds that there is attrition in the data, then one can determine whether or not there is selection on observables. Selection on observables basically means sample selection based on variables that are observed prior to attrition (e.g., in the first round of the survey).
Even if there is selection on observables, this does not necessarily bias the estimates of interest. Thus, one needs to test for possible attrition bias in the estimates of interest as well. More formally, assume that what is of interest is a conditional population density f(y|q), where y is a scalar dependent variable and q is a scalar independent variable (an extension to make q a vector does not change the results of the discussion). The model takes the form

$$y = \pi_0 + \pi_1 q + \varepsilon, \quad y \text{ observed if } A = 0 \tag{3.25}$$

where A is an attrition indicator equal to 1 if an observation is missing its value of y because of attrition, and equal to zero if an observation is not missing its value of y. Since (3.25) can be estimated only if A = 0 (that is, one can only determine g(y|q, A = 0)), one needs additional information or restrictions to infer f(.) from g(.). These can come from the probability of attrition, Prob(A = 0|y, q, z), where z is an auxiliary variable (or vector of variables) that is observable for all units but not included in q. This implies estimations of the form

$$A^{*} = \delta_0 + \delta_1 q + \delta_2 z + v \tag{3.26}$$

$$A = \begin{cases} 1 & \text{if } A^{*} \geq 0 \\ 0 & \text{if } A^{*} < 0 \end{cases} \tag{3.27}$$

Selection on unobservables occurs if z is independent of $\varepsilon|q$ but v is not independent of $\varepsilon|q$. Selection on observables is the reverse: it occurs if z is not independent of $\varepsilon|q$ but v is independent of $\varepsilon|q$. That is, selection on observables occurs if Prob(A = 0|y, q, z) = Prob(A = 0|q, z); selection on unobservables occurs if this equality fails to hold, so that the attrition function cannot be reduced from Prob(A = 0|y, q, z). Selection on unobservables is often presented as dependent on the estimation of the attrition index equation. Identification, however, usually relies on nonlinearities in the index equation or an exclusion restriction, i.e., some z that is not in q.
It is difficult to rationalize most such exclusion restrictions because, for example, personal characteristics that affect attrition might also directly affect the outcome variable, i.e., they should be in q (Alderman et al., 2000). There may be some such identifying variables that are external to individuals and not under their control, such as characteristics of the interviewer in the various rounds. However, identifying restrictions are generally not available, which makes selection on unobservables an obstacle to accurate parameter estimation. If there is selection on observables, the critical variable is z, a variable that affects attrition probability and that is also related to the density of y conditional on q. Two sufficient conditions for the absence of attrition bias due to attrition on observables are either (1) z does not affect A or (2) z is independent of y conditional on q. Attrition tests can be based on either of these two conditions. One test is simply to determine whether candidate variables for z significantly affect A. Another test is based on Becketti, Gould, Lillard, and Welch (BGLW) (1988). In the BGLW test, the value of y at the initial wave of the survey ($y_0$) is regressed on q and on A. The test for attrition is based on the significance of A in that equation. The analysis of attrition in this study follows the approaches suggested by Becketti et al. (1988) and Fitzgerald et al. (1998). In particular, we test for whether or not attrition significantly affects estimated multivariate relations. Our analysis of attrition begins with a comparison of the means of a selected number of key household and community variables. Besides being informative, the comparison of means is also intuitively appealing, because the idea that attrition is likely to bias estimates is often made on the basis of such univariate comparisons (Alderman et al., 2000).
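The BGLW test described above amounts to a t-test on the attrition indicator in a first-wave regression. The Python sketch below illustrates this for a single scalar regressor q, using ordinary least squares with conventional (homoskedastic) standard errors; the function names and the data are hypothetical, and an actual application would include the full set of covariates and possibly robust standard errors.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def bglw_attrition_test(y0, q, attrit):
    """Regress first-wave y0 on a constant, q and the attrition dummy.

    Returns the coefficient on the attrition dummy and its t-statistic.
    """
    rows = [[1.0, qi, float(ai)] for qi, ai in zip(q, attrit)]
    n, k = len(rows), 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y0)) for i in range(k)]
    coef = solve_linear(xtx, xty)
    resid = [yi - sum(c * v for c, v in zip(coef, r)) for yi, r in zip(y0, rows)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    # Last column of (X'X)^{-1}; its last element scales the variance of the
    # attrition coefficient.
    inv_last_col = solve_linear(xtx, [0.0, 0.0, 1.0])
    std_err = math.sqrt(sigma2 * inv_last_col[2])
    return coef[2], coef[2] / std_err


# Hypothetical balanced design: first-wave outcomes differ by 3 units between
# households that later attrit (1) and those that are re-interviewed (0).
q = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
attrit = [0, 0, 1, 1, 0, 0, 1, 1]
noise = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]
y0 = [1.0 + 2.0 * qi + 3.0 * ai + e for qi, ai, e in zip(q, attrit, noise)]
coef_attrit, t_attrit = bglw_attrition_test(y0, q, attrit)
```

A large t-statistic on the attrition dummy indicates that households that later leave the panel already differed in the first wave, conditional on q.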
We then estimate probits for the probability of attrition in order to investigate which variables predict attrition and to determine whether or not the probability of attrition can be explained significantly by observable variables. Finally, we test whether coefficient estimates differ for the two subsamples, one that attrits and one that is re-interviewed. The results from these attrition tests will help us in deciding how to deal with attrition.

3.4.2 Expenditure Endogeneity

Most empirical demand analyses do not cover all products and services that households purchase. Data limitations, finite computer memory, and the increased complexity and time required for estimating large models make it necessary to abstract from a completely specified demand system containing a different equation for each of the myriad goods available in the market (LaFrance, 1991). The practice is typically to assume that preferences are separable and to estimate a set of conditional demands for the goods of interest as functions of prices and total expenditure on these goods (Pollak, 1969). However, such a practice raises questions regarding the possibility of simultaneity bias in the budget share equations. Total expenditure may be determined jointly with the expenditure shares of the individual commodities being analyzed, making it endogenous in the expenditure share equations. Also, expenditure endogeneity issues may arise whenever the household expenditure allocation process is correlated with other unobserved behavior not captured by the explanatory variables in the budget share equations. In this case, these unobserved effects would be bundled in the error term.

Estimation ignoring expenditure endogeneity may lead to inconsistent demand parameter estimates. In cross-sectional demand studies, the common procedure to control for expenditure endogeneity is instrumental variables.
With panel data, a number of possibilities to correct for unobserved heterogeneity are available, including linear transformations of the original model, such as fixed effects and first differencing, which remove the unobserved heterogeneity component of the error term. However, such transformations are difficult to implement with nonlinear models such as QUAIDS derived from consumer utility maximization theory. In this study, we follow Blundell and Robin (1999) and control for endogeneity using an extension of the limited information augmented regression technique suggested by Hausman (1978). This procedure is also known as the control function approach.

To illustrate how the augmented regression technique works, consider the regression of $y_1$, the dependent variable, on a set of exogenous explanatory variables, $z$, and an endogenous explanatory variable, $y_2$, i.e., $y_1 = z'\rho + \kappa y_2$.[6] Also, suppose an instrumental variable, $z_2$, exists for $y_2$. Correction for the endogeneity of $y_2$ using the control function approach proceeds in two steps. The first step involves estimating a reduced form regression of the endogenous variable on a set of instrumental variables, where the set of instrumental variables includes all the other exogenous explanatory variables (i.e., regress $y_2$ on $z$ and $z_2$). The residuals, $\hat{v}$, from this first-stage regression are then included as an additional explanatory variable in the original $y_1$ equation. The OLS estimates of the parameters $\rho$ and $\kappa$ in this augmented regression are identical to the Two-Stage Least Squares (2SLS) estimator (Blundell and Robin, 1999). Moreover, testing for the significance of the coefficient on $\hat{v}$ is a test for the exogeneity of $y_2$.

[6] For illustration purposes, we consider the case of one endogenous variable and one instrumental variable. The case of multiple endogenous variables and multiple instruments can be handled in a straightforward way using the basic framework explained here.

Following Banks et al.
(1997), we use total household income and its square as instruments for expenditure (and expenditure squared).

3.4.3 Observed Zero Expenditures

In each equation of the QUAIDS model, the dependent variable $w_{it}^h$ is observed with nonnegative values. When micro data are used, it is very likely that non-consumption of some commodities will be observed, due to purchase infrequency and corner solutions. If a nonnegligible proportion of the $w_{it}^h$ values are identically zero, then the $w_{it}^h$ variable becomes partly continuous with a positive probability mass at zero. OLS regression using the subsample for which $w_{it}^h > 0$ estimates the demand parameters inconsistently due to a nonrandom sample selection problem, while OLS using all of the data will not consistently estimate the demand parameters due to the nonlinearity in the conditional mean of $w_{it}^h$ (for a general discussion see Wooldridge (2002), pages 524-525).

In the case of a single-equation demand model, censoring in the dependent variable can be handled in a straightforward way by applying a maximum likelihood (ML) Tobit model. However, when estimation is for systems of equations, and censoring occurs in multiple equations, direct estimation by ML becomes difficult because of the need to evaluate multiple integrals in the likelihood function. To a large extent, this problem of having to evaluate multiple integrals explains why many of the theoretical models discussed in chapter 2 have seen virtually no use in empirical demand studies. The two-step procedure proposed by Heien and Wessells (1990) offered great promise as a solution to the computational infeasibility of these models, but as was shown by Shonkwiler and Yen (1999), it is inconsistent due to a mathematical error in its derivation. In this study, we apply the consistent two-step procedure developed by Shonkwiler and Yen, which corrects for the inconsistency associated with the Heien and Wessells procedure.
To introduce the Shonkwiler and Yen procedure, consider a structure in which censoring of each commodity $i$ at time $t$ is governed by a separate stochastic process $z_{it}^{h\prime}\tau_i + v_{it}^h$ such that

$$w_{it}^h = \begin{cases} w_{it}^h(p_t^h, m_t^h; \psi) + \varepsilon_{it}^h & \text{if } z_{it}^{h\prime}\tau_i + v_{it}^h > 0 \\ 0 & \text{otherwise} \end{cases} \tag{3.28}$$

where $w_{it}^h$ is the observed expenditure share for the $h$th household, $\psi$ is a vector containing all parameters in a particular demand equation, $z_{it}^h$ is a vector of exogenous variables, $\tau_i$ is a conformable vector of parameters, and $\varepsilon_{it}^h$ and $v_{it}^h$ are random errors; $p$ and $m$ are interpreted as before. Assume that the vectors of disturbances $\varepsilon_t^h = [\varepsilon_{1t}^h, \varepsilon_{2t}^h, \ldots, \varepsilon_{nt}^h]$ and $v_t^h = [v_{1t}^h, v_{2t}^h, \ldots, v_{nt}^h]$ are normally distributed. Correlation is allowed only between $z_{it}^{h\prime}\tau_i + v_{it}^h$ and $w_{it}^h(p_t^h, m_t^h; \psi)$ for each commodity, and among $w_{it}^h(p_t^h, m_t^h; \psi)$ and $w_{jt}^h(p_t^h, m_t^h; \psi)$, $i \neq j$.[7] Using equation (3.28) and the bivariate normality of $[\varepsilon_{it}^h, v_{it}^h]'$, the mean of $w_{it}^h$ conditional on a positive observation is

$$E\left(w_{it}^h \mid v_{it}^h > -z_{it}^{h\prime}\tau_i\right) = w_{it}^h(p_t^h, m_t^h; \psi) + \delta_i \frac{\phi(z_{it}^{h\prime}\tau_i)}{\Phi(z_{it}^{h\prime}\tau_i)} \tag{3.29}$$

where $\phi(\cdot)$ and $\Phi(\cdot)$ are the standard normal probability density and distribution functions, respectively. Based on the facts that $\mathrm{Prob}(v_{it}^h > -z_{it}^{h\prime}\tau_i) = \Phi(z_{it}^{h\prime}\tau_i)$ and $E(w_{it}^h \mid v_{it}^h < -z_{it}^{h\prime}\tau_i) = 0$, the unconditional mean of $w_{it}^h$ is $E(w_{it}^h) = \Phi(z_{it}^{h\prime}\tau_i)\, w_{it}^h(p_t^h, m_t^h; \psi) + \delta_i \phi(z_{it}^{h\prime}\tau_i)$. Based on $E(w_{it}^h)$, the system of share equations can be written as

$$w_{it}^h = \Phi(z_{it}^{h\prime}\tau_i)\, w_{it}^h(p_t^h, m_t^h; \psi) + \delta_i \phi(z_{it}^{h\prime}\tau_i) + \xi_{it}^h \tag{3.30}$$

where $\xi_{it}^h = w_{it}^h - E(w_{it}^h \mid p_t^h, m_t^h, z_{it}^h)$. System (3.30) can be estimated in two steps: (i) first, obtain the maximum-likelihood probit estimates $\hat{\tau}_i$ of $\tau_i$ using the binary outcomes $w_{it}^h = 0$ and $w_{it}^h > 0$, and then (ii) calculate $\phi(z_{it}^{h\prime}\hat{\tau}_i)$ and ...

Chi2 164.08 [0.0000]
Standard errors in parentheses. *** indicates significance at the 1 percent level; ** and * indicate significance at 5% and 10%, respectively.

These variables are also not jointly significant (p=0.4297).
Households that owned the dwellings they lived in in 1993 were more likely to be re-interviewed, and those who had been victims of crime in the 12 months prior to the 1993 survey were less likely to be re-interviewed. The Chi2 statistic for the overall significance of the relation is significant, which implies that attrition between 1993 and 2004 was not purely random. The issue of whether or not attrition affects the consistent estimation of the parameters of interest in this study is dealt with in the next section.

5.2.3 Difference-of-coefficient tests: attritors versus nonattritors

The question addressed in this section is whether or not the coefficients for the outcome variables that are relevant to this study are affected by attrition. In other words, the task is to find out whether or not the coefficients of interest in this study can be estimated consistently using only the sample of households that were re-interviewed. To accomplish this task, we examine whether or not the households who subsequently leave the sample differ in their initial behavioral relationships from those who stay. Following Becketti et al. (1988) and Alderman et al. (2000), we regress an outcome variable at the initial wave of the survey on predetermined variables and test whether the coefficients of the predetermined variables differ for those respondents who are subsequently lost due to attrition versus those who are re-interviewed. The outcome variables chosen for this analysis are the budget shares of the food groups that enter the demand model. We estimate reduced forms for the budget share equations, and test whether there are statistically significant differences in the estimated coefficients of nonattritor and attritor households.[16]

To carry out the above difference-of-coefficients test, we divide the 1993 sample into two groups, group 1 comprising households who are interviewed in all three panel surveys, and group 2 comprising households who subsequently leave the sample.
Two indicator variables are then created, corresponding to each of the two groups. The first indicator variable, $G_1$, takes on the value 1 if the household belongs to the first group and 0 otherwise, and the second indicator variable, $G_2$, takes on the value 1 if a household belongs to the second group and 0 otherwise. The budget shares are then regressed on a set of household and community variables and their interactions with the $G_2$ variable. The test for equality of coefficients involves testing whether the coefficient on each non-interacted variable is equal to the coefficient of the corresponding variable interacted with $G_2$.

[16] In the next sections, we estimate two demand models separately, one using all households and another using only households in the panel, to further determine the impact of attrition on coefficient estimates.

To fix ideas, let $z^h$ represent the set of household and community variables included as explanatory variables in the reduced-form food expenditure function, and let $z_G^h$ represent a full set of interaction terms involving $G_2$ and the $z^h$ variables. The following reduced form regressions are estimated:

$$w_i^h = \alpha + z^{h\prime}\beta + z_G^{h\prime}\gamma + v^h$$

The test for equality of coefficients between attritors and nonattritors is simply an F-test of the null hypothesis that $\gamma = 0$. The same household characteristics that enter the selection probit model are used in the reduced form for household food expenditure, except that the property ownership variable is left out, given that it is not expected to directly affect food consumption decisions. Also, the aggregate food price index is replaced by the individual commodity prices. The results of the F tests are reported in Table 5.3 (to conserve space, the full set of results is not reported). We report two F statistics at the bottom of Table 5.3.
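The interaction regression and the two F statistics can be sketched as follows. This is a minimal illustration on simulated data, assuming the classical F statistic built from restricted and unrestricted sums of squared residuals; the function names, the two-regressor design, and all coefficient values are hypothetical.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared OLS residuals from regressing y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def attrition_f_tests(w, Z, G2):
    """F statistics for equality of coefficients between attritors and stayers.

    Returns (F including the constant, F excluding the constant).
    """
    n = len(w)
    ones = np.ones(n)
    X_u = np.column_stack([ones, Z, G2, Z * G2[:, None]])  # unrestricted model
    X_r_all = np.column_stack([ones, Z])                   # no attritor terms at all
    X_r_slopes = np.column_stack([ones, Z, G2])            # attritor intercept shift only
    df_u = n - X_u.shape[1]
    ssr_u = ssr(X_u, w)

    def f_stat(X_r):
        r = X_u.shape[1] - X_r.shape[1]                    # number of restrictions
        return ((ssr(X_r, w) - ssr_u) / r) / (ssr_u / df_u)

    return f_stat(X_r_all), f_stat(X_r_slopes)

# Simulated (hypothetical) first-wave budget shares in which attritors
# respond differently to the first explanatory variable.
rng = np.random.default_rng(1)
n = 1000
Z = rng.normal(size=(n, 2))
G2 = (rng.uniform(size=n) < 0.4).astype(float)             # 1 = later attritor
w = (0.3 + Z @ np.array([0.02, -0.01]) + 0.1 * G2 * Z[:, 0]
     + rng.normal(scale=0.05, size=n))

F_all, F_slopes = attrition_f_tests(w, Z, G2)
print(f"F (incl. constant) = {F_all:.1f}, F (slopes only) = {F_slopes:.1f}")
```

In this simulated case both statistics are large because a slope difference was built in; small F statistics, as in the results that follow, would instead be consistent with equal coefficients for the two groups.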
The first F statistic is for the joint equality of all coefficient pairs including the constant term, and the other is for the joint equality of all coefficients excluding the constant term. None of these F statistics is statistically significant at conventional levels, implying that we cannot reject the null hypothesis of equality of coefficients for attritors and nonattritors. In other words, it can be concluded that the coefficients on the variables in these budget share equations can be estimated consistently using the sample of only nonattriting households.

Table 5.3 Testing Impact of Attrition on the Coefficients of the Budget Share Equations for the Individual Food Groups

F tests for equality of coefficients
                                      Grains    Meats/Fish  Fruits/Veg  Dairy     Oils/fats  Sugar     Other
Joint effect of attrition on all      0.75      0.67        1.31        0.84      0.98       1.02      1.31
estimates including the constant      (0.6816)  (0.7494)    (0.2172)    (0.5867)  (0.4611)   (0.4238)  (0.2172)
Joint effect of attrition on all      0.43      0.68        1.07        0.93      1.08       1.12      1.34
coefficients but not the constant     (0.9209)  (0.7249)    (0.3790)    (0.5008)  (0.3764)   (0.3422)  (0.2092)
p-values (for prob > F) in parentheses

Hence, based on these F tests, it can be concluded that, overall, attrition does not have a significant impact on the slope coefficients of the budget share equations.[17]

5.3 Nonlinearity

5.3.1 LM test results: OLS and SUR estimations

Our test for nonlinearity, developed in chapter three, builds on the work by Banks et al. (1997). In particular, the implication of corollary 2 in Banks et al. (p. 533) is that, in a utility-derived demand system that is rank 3 and exactly aggregable, the coefficients on the linear and the quadratic expenditure terms cannot both be independent of prices. In other words, if such a demand model has a coefficient on the linear expenditure term that is independent of prices, then it must have a coefficient on the quadratic expenditure term that is price dependent.
Hence, our test for nonlinearity involves testing for the statistical significance of prices in the coefficient on the quadratic expenditure term. The top part of Table 5.4 reports results of the LM tests for nonlinearity in each of the budget share equations; the first column reports the results of these tests from pooled OLS estimation.

[17] Another way to check for the impact of attrition on coefficient estimates is to estimate the model of interest without attrition correction and with attrition correction (using a procedure such as inverse probability weighting), and then examine if there are large differences in the coefficient estimates. However, this is computationally burdensome when using nonlinear models that require difficult user-defined programs.

Table 5.4 Tests for Nonlinearity of the Demand System based on Statistical Significance of Prices in the Coefficient on the Quadratic Expenditure Term

Nonlinearity tests in the individual budget share equations
Significance of Prices: LM Statistics (Heteroskedasticity-robust)
Commodity                      OLS (p-value)     IV-2SLS (p-value)
Grains                         22.03 (0.0025)    24.64 (0.0009)
Meat and fish                   5.23 (0.6314)    19.63 (0.0064)
Fruits and vegetables          23.99 (0.0011)    21.59 (0.0030)
Dairy                          23.77 (0.0012)     7.63 (0.3660)
Oils, butter, and other fats   12.43 (0.0372)     6.98 (0.4310)
Sugar                          19.63 (0.0064)    11.34 (0.1244)
Other                          75.46 (0.0000)    41.87 (0.0000)

System-wide test for nonlinearity (all budget share equations)
                               SUR               3SLS
Chi-square (p-value)           158.63 (0.0000)   87.82 (0.0000)

Based on these results, the null hypothesis that the coefficient on the quadratic expenditure term is independent of prices is rejected (at the 10% significance level) in all individual budget share equations, except that for meat and fish. The implication of these results is that the individual budget share equation for meat and fish requires only a linear expenditure term, so that the inclusion of the quadratic expenditure term is unnecessary in this case.
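The general logic of an LM test, which needs only the restricted model, can be sketched with the textbook $N \cdot R^2$ form. This is an illustration under stated assumptions, not the statistic derived in chapter three: it is not heteroskedasticity-robust, the restricted regressor set (constant, log prices, log expenditure and its square) is an assumption, and the data, names, and coefficient values are hypothetical.

```python
import numpy as np

def lm_omitted(y, X_r, X_extra):
    """N*R^2 LM statistic for adding X_extra to a model with regressors X_r.

    X_r must contain a constant. Returns (statistic, degrees of freedom).
    """
    n = len(y)
    b_r, *_ = np.linalg.lstsq(X_r, y, rcond=None)
    u = y - X_r @ b_r                       # residuals from the restricted model
    X_all = np.column_stack([X_r, X_extra])
    g, *_ = np.linalg.lstsq(X_all, u, rcond=None)
    e = u - X_all @ g
    r2 = 1.0 - (e @ e) / (u @ u)            # u has mean zero because X_r has a constant
    return n * r2, X_extra.shape[1]

# Simulated (hypothetical) budget share whose quadratic expenditure
# coefficient depends on the first log price.
rng = np.random.default_rng(2)
n = 1500
ln_p = rng.normal(scale=0.3, size=(n, 3))   # three log prices
ln_x = rng.normal(size=n)                   # log expenditure
w = (0.2 + ln_p @ np.array([0.03, -0.02, 0.01]) + 0.05 * ln_x
     + 0.05 * ln_p[:, 0] * ln_x**2 + rng.normal(scale=0.02, size=n))

X_r = np.column_stack([np.ones(n), ln_p, ln_x, ln_x**2])
X_extra = ln_p * (ln_x**2)[:, None]         # price x squared-expenditure interactions
lm, df = lm_omitted(w, X_r, X_extra)
print(f"LM = {lm:.1f} with {df} degrees of freedom")
```

Under the null, the statistic is asymptotically chi-square with degrees of freedom equal to the number of excluded terms; with three restrictions the 5% critical value is 7.815.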
To allow for cross-equation correlations, we also estimate the equations jointly in a systems framework and test the null hypothesis that the coefficients on the quadratic expenditure terms across all equations do not depend on prices. Results of the Chi2 test computed from the SUR estimation of budget share equations are reported in the bottom part (first column) of Table 5.4. This test provides strong evidence against the null of price-independence of the coefficient on the quadratic expenditure term.

5.3.2 LM test results: IV-2SLS and 3SLS estimations

As discussed in chapter three, total household food expenditure is likely to be correlated with the error term, and this may affect the results of the LM tests. In this subsection, we explicitly test for the endogeneity of expenditure to determine whether or not the LM test needs to be adjusted. Total household income is used as an identifying instrumental variable (IV) for total household food expenditure.[18] For income to be a good IV for expenditure, it must meet two conditions: the relevance condition, which requires that income be sufficiently correlated with expenditure (the endogenous variable), and the exogeneity condition, which requires that income not be correlated with the error term in the demand model. The former condition is testable, whereas the latter is not (see Wooldridge, 2002, pages 118-122). A test for the relevance condition involves determining whether income is partially correlated with expenditure. This test involves determining whether log total household income ($\ln m$) is statistically significant in the reduced form regression for log expenditure ($\ln x$). The middle column of Table A1 in the appendix to this chapter reports parameter estimates of the reduced form regression for $\ln x$.

[18] For brevity, the term expenditure will be used to refer to total household expenditure, and income will be used to refer to total household income.
Based on simple t-tests, the coefficient on $(\ln m)^2$ is significantly different from zero, while the coefficient on $\ln m$ is not, mainly due to the collinearity between $\ln m$ and $(\ln m)^2$. A formal test for the relevance condition involves testing for the joint significance of the coefficients on both instrumental variables in the reduced form. The results of the F test for the joint significance of $\ln m$ and $(\ln m)^2$ are presented in the bottom row of Table 5.5. This test provides evidence of a strong partial correlation between $\ln x$ and the pair $\ln m$ and $(\ln m)^2$. Thus, based on the results of this test, it can be concluded that income and income squared are relevant instruments for expenditure, and hence, the former will be used as instrumental variables for the latter in the analyses that follow (the exogeneity assumption is, of course, maintained).

Table 5.5 also reports the results of the tests for expenditure exogeneity. The procedure for carrying out endogeneity tests is as discussed in chapter three; it involves augmenting the budget share equation for each food group with residuals from the reduced form for expenditure ($\hat{v}$), then testing for their statistical significance. Table A2 at the end of this chapter presents estimates of the reduced forms for each budget share equation. The results of the tests for the significance of $\hat{v}$ (and hence, of the null hypothesis that expenditure is exogenous) are reported in the top portion of Table 5.5. As can be seen from these test results, there is limited statistical evidence against the null hypothesis of expenditure exogeneity. The hypothesis is rejected (at the 10% significance level) only in the budget share equations for grains, meat and fish, and dairy. The strong statistical evidence of expenditure endogeneity in the grains budget share equation is somewhat counterintuitive.
One would have expected that, because grains comprise food items that are major staples in South African diets, their relation with total food expenditure would be somewhat 'fixed.' The problem with testing for endogeneity in the individual budget share equations is that it ignores correlations among the equations. To allow for these correlations, we estimate the budget share equations as a system using seemingly unrelated regressions (SUR) and then test the null hypothesis that the coefficients on $\hat{v}$ are jointly zero across all the equations. Results of the $\chi^2$ tests for the statistical significance of $\hat{v}$ are reported in the middle part of Table 5.5 (to conserve space, not all the coefficient estimates of the SUR estimation are reported). The first $\chi^2$ statistic is computed from the unrestricted SUR estimation, while the second one is calculated from the theory-restricted SUR estimation; the demand theory restrictions imposed are homogeneity and symmetry. Both the restricted and unrestricted SUR-based tests reject the null hypothesis that the coefficients on $\hat{v}$ are jointly zero across all equations, providing evidence that expenditure is endogenous in the system.

Since we are using two identifying instruments ($\ln m$ and $(\ln m)^2$) for $\ln x$, we have one overidentifying restriction. Our test for the overidentifying restriction follows Wooldridge (2002), pp. 122-124. This test (also known as the Sargan test) involves estimating each budget share equation by IV-2SLS (using $\ln m$ and $(\ln m)^2$ as instruments) and obtaining residuals, regressing these residuals on all exogenous variables and obtaining the R-squared statistic (call this $R_u^2$), and then computing the $\chi^2(1)$ statistic as the product of the number of observations and $R_u^2$. The results of these tests are also presented in the top portion (second column) of Table 5.5.
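The equation-by-equation steps described above (2SLS, the residual-augmented regression used to test exogeneity, and the $N \cdot R_u^2$ overidentification statistic) can be sketched as follows. This is a minimal single-equation illustration on simulated data; the function names, the single demographic regressor `d`, and all coefficient values are hypothetical assumptions and do not reproduce the study's specification.

```python
import numpy as np

def ols(X, y):
    """OLS coefficient vector."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def first_stage(x_endog, X_exog, Z_excl):
    """Fitted values and residuals from the reduced form for x_endog."""
    W = np.column_stack([X_exog, Z_excl])
    fitted = W @ ols(W, x_endog)
    return fitted, x_endog - fitted

def tsls(y, X_exog, x_endog, Z_excl):
    """2SLS coefficients (exogenous first, endogenous last) and residuals."""
    x_hat, _ = first_stage(x_endog, X_exog, Z_excl)
    beta = ols(np.column_stack([X_exog, x_hat]), y)
    resid = y - np.column_stack([X_exog, x_endog]) @ beta  # uses actual x
    return beta, resid

def control_function(y, X_exog, x_endog, Z_excl):
    """OLS of y on exogenous variables, x_endog, and first-stage residuals."""
    _, v_hat = first_stage(x_endog, X_exog, Z_excl)
    return ols(np.column_stack([X_exog, x_endog, v_hat]), y)

def sargan_stat(resid, X_exog, Z_excl):
    """N * R^2 from regressing 2SLS residuals on all exogenous variables."""
    W = np.column_stack([X_exog, Z_excl])
    e = resid - W @ ols(W, resid)
    centered = resid - resid.mean()
    return len(resid) * (1.0 - (e @ e) / (centered @ centered))

# Simulated (hypothetical) data: ln_x is endogenous because the share error
# u is correlated with the reduced-form error v.
rng = np.random.default_rng(3)
n = 5000
ln_m, d, v = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
ln_x = 0.5 + 0.8 * ln_m + 0.1 * ln_m**2 + 0.2 * d + v
u = 0.5 * v + rng.normal(scale=0.1, size=n)
w = 0.4 - 0.1 * ln_x + 0.05 * d + u

X_exog = np.column_stack([np.ones(n), d])
Z_excl = np.column_stack([ln_m, ln_m**2])   # two instruments, one endogenous regressor
beta_iv, resid_iv = tsls(w, X_exog, ln_x, Z_excl)
beta_cf = control_function(w, X_exog, ln_x, Z_excl)
sargan = sargan_stat(resid_iv, X_exog, Z_excl)
print(beta_iv, beta_cf, sargan)
```

The first three control-function coefficients coincide with the 2SLS estimates, and the last one (on the first-stage residuals) is the quantity whose significance is used to test exogeneity. With two instruments and one endogenous regressor, the overidentification statistic is compared with a chi-square distribution with one degree of freedom.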
Table 5.5 Results of the Test for the Endogeneity of Expenditure

Tests for endogeneity of expenditure in the individual budget share equations
Commodity                      Endogeneity tests     Overidentification tests
                               t stat (p-value)      χ² (p-value)
Grains                          3.59 (0.000)          0.01 (0.958)
Meat and fish                  -4.11 (0.000)          8.40 (0.004)
Fruits and vegetables           1.37 (0.171)          0.27 (0.600)
Dairy                          -1.85 (0.064)          4.58 (0.032)
Oils, butter, and other fats   -0.26 (0.796)          2.17 (0.141)
Sugar and sugar products        1.13 (0.258)         10.03 (0.001)
Other                           0.22 (0.825)         11.12 (0.000)

Tests for endogeneity of expenditure across all budget share equations in the system
SUR (Unrestricted)             26.73 (0.0002)
SUR (Restricted)*              17.64 (0.0072)

Test for the relevance of income and income-squared as IVs for expenditure
F stat. (p-value)              116.90 (0.0000)

* The demand theory restrictions imposed during estimation are symmetry and homogeneity (additivity is satisfied automatically by the data).

As with the findings from the expenditure exogeneity tests, statistical evidence on the validity of the instruments based on the overidentification tests is mixed across budget share equations. Statistical evidence in support of the exogeneity of $\ln m$ and $(\ln m)^2$ is strong in the budget share equations for grains, fruits and vegetables, and oils and fats. There is no clear relationship between findings from the exogeneity tests and the overidentification tests (i.e., the budget share equations that pass the overidentification tests are not the same as those that pass (or do not pass) the expenditure exogeneity tests). The finding that not all the budget share equations pass the overidentification tests indicates that one of the instruments, $\ln m$ or $(\ln m)^2$, may not be completely exogenous in these equations.

In summary, the equation-by-equation exogeneity tests indicate that expenditure is endogenous only in the budget share equations for grains, meat and fish, and dairy, whereas the system-wide null hypothesis that expenditure is exogenous across all budget share equations is rejected.
Given these findings, it is necessary to determine whether or not the LM test results stay the same when expenditure endogeneity is adjusted for, particularly for the grains, meat and fish, and dairy commodities. The results in the second column of Table 5.4 are computed from budget share equations estimated using instrumental variables two-stage least squares (IV-2SLS), with income used as an IV for expenditure. As can be seen from these results, after adjusting for expenditure endogeneity, the null hypothesis that the budget share equation for meat and fish is linear in expenditure is rejected. We consider these results (based on IV-2SLS) to be more reliable in the case of meat and fish, given the finding from the endogeneity tests that expenditure is endogenous in the budget share equation for meat and fish. Contrary to the LM test results based on the equation-by-equation OLS estimations, the null hypotheses that expenditure enters linearly in the budget share equations for dairy and for oils and fats are not rejected. The null hypothesis of expenditure exogeneity was not rejected in the budget share equation for oils and fats, which implies that, if the individual budget share equations are considered, the OLS results may be more reliable for that equation.

To account for cross-equation correlations, we also estimated the equations as a system, and tested the null hypothesis that the coefficient on the quadratic expenditure term does not depend on prices. The results of this test are reported in the bottom part (second column) of Table 5.4. The $\chi^2$ statistic for this test is computed from three-stage least squares (3SLS) estimation with expenditure instrumented by income. Similar to the results of the SUR test, this test provides strong evidence against the null of price-independence of the coefficient on the quadratic expenditure term.
In summary, the null hypothesis that expenditure enters the demand model linearly is rejected in six of the seven budget share equations based on the endogeneity-unadjusted OLS estimations, and in four budget share equations when expenditure endogeneity is corrected for using IV-2SLS. When cross-equation correlations are allowed for using SUR and 3SLS, we find strong evidence in favor of the QUAIDS model specification. Given these results of the nonlinearity tests, and the finding in the previous subsection that expenditure is endogenous in some budget share equations and in the equation system, we will proceed by estimating the demand models that include the quadratic expenditure term (that is, the QUAIDS model) with expenditure endogeneity adjusted for. Results of the endogeneity-unadjusted models will also be presented, so as to determine the extent to which ignoring endogeneity biases the parameter estimates.

5.4 Demand Model Results

This section reports estimation results of the demand models. All of the demand models are estimated using pooled maximum likelihood (ML), with convergence occurring at 0.000001, the default tolerance level for MLE in Stata. Theoretical restrictions of adding-up, homogeneity, and symmetry are maintained during estimation. The budget share equation for the 'other' food group was deleted during estimation to avoid singularity of the variance-covariance matrix when all seven equations are included. Parameters of the deleted equation are recovered through the adding-up restrictions. Due to the large number of parameters estimated, not all estimation results will be presented. Presentation will concentrate mainly on the elasticity estimates, and how these change when different model specifications are considered. This section starts by conducting model specification tests.
The LM tests for QUAIDS versus AIDS model specifications based on the statistical significance of prices in the coefficient on the quadratic expenditure term (Table 5.4) supported the QUAIDS specification. In this section, we conduct the likelihood ratio (LR) tests for AIDS versus QUAIDS specifications. The LR testing approach differs slightly from the LM-based tests of the previous section in that the LR tests are based on the observation that the only difference between the AIDS and QUAIDS models is that AIDS contains only the linear expenditure term, while QUAIDS contains both the linear and quadratic expenditure terms. In other words, QUAIDS nests AIDS, so that once QUAIDS has been estimated, a test for whether or not AIDS is the appropriate model specification involves simply checking for the statistical significance of the quadratic expenditure term. The coefficient on the quadratic expenditure term is $\lambda_i / b(p) = \lambda_i / \prod_{j} p_j^{\beta_j}$, so that testing for its significance involves simply testing for the statistical significance of lambda ($\lambda_i$). The main difference between this test and the LM tests is that this test requires the estimation of the QUAIDS (unrestricted) model, as opposed to the LM test, for which only the estimation of the restricted model is needed. Also, the approaches followed in deriving these tests are different in that the LM statistic tests for the statistical significance of prices in $b(p)$, while the LR test checks for the significance of $\lambda_i$ in the coefficient on the quadratic expenditure term.

The tests for significance of $\lambda_i$ are conducted in the endogeneity-uncorrected QUAIDS model. The reason for using the endogeneity-uncorrected model, as opposed to the model with endogeneity corrected for, is to avoid problems of inferential invalidity caused by generated regressors (Wooldridge, 2002; pp. 115-118).
This problem arises here because the use of the control function procedure involves including the residuals from the reduced form regressions as regressors in the demand model. Because these residuals are generated using the same data used for demand estimation, their inclusion as regressors raises questions about the asymptotic validity of the estimated standard errors and test statistics on other regressors (parameters can still be estimated consistently).

The first column of Table 5.6 reports results of the tests of the null hypothesis that lambda ($\lambda_i$) is zero in the budget share equation of each of the food groups. This hypothesis is rejected in the budget share equations of four of the seven food groups, thus providing some evidence in support of the QUAIDS model specification. So, in addition to the budget share equation for meat and fish, which was also found to require only the linear expenditure term by the LM tests (Table 5.4), the LR tests indicate further that the budget share equations for fruits and vegetables and for oils and fats require only the linear expenditure term. The findings in the case of oils and fats are the same as those from IV-2SLS. As explained above, the approaches followed in constructing the two tests differ, so it is not necessarily unexpected that they lead to different conclusions regarding which budget share equations are linear in expenditure. However, what we can conclude is that both provide evidence against the AIDS model as a system specification. The null hypothesis that $\lambda_i$ is zero in all the budget share equations is strongly rejected (Chi2 = 72.56, p=0.0000).

Table 5.6 Tests for Endogeneity of Expenditure, and for the Statistical Significance of Lambda in the Demographically-Extended QUAIDS Model

Tests based on the budget share equations for each commodity
                            Significance         Expenditure Endogeneity Tests
                            of Lambda            QUAIDS              Nonlin. AIDS        LA/AIDS
Commodity                   χ² tests (p-val.)    t-tests (p-val.)    t-tests (p-val.)    t-tests (p-val.)
Grains                      12.42 (0.0004)       -0.05 (0.964)       -0.11 (0.911)        0.87 (0.385)
Meat and Fish                0.01 (0.9501)       -1.53 (0.126)       -1.75 (0.081)       -2.60 (0.009)
Fruits and Vegetables        0.73 (0.3929)        1.17 (0.241)        1.06 (0.288)        1.81 (0.071)
Dairy                       12.37 (0.0004)        0.31 (0.753)       -0.02 (0.986)       -0.67 (0.502)
Oils, butter, other fats     1.97 (0.1600)       -0.99 (0.321)       -1.12 (0.261)       -0.68 (0.498)
Sugar                       11.99 (0.0005)       -0.24 (0.807)       -0.83 (0.405)       -0.09 (0.929)
Other                       50.94 (0.0000)        1.07 (0.284)        1.93 (0.053)        1.12 (0.263)

System-wide (all equations) test
χ² (p-value)                72.56 (0.0000)        5.03 (0.5398)       8.22 (0.2226)      10.22 (0.1158)

The next step in the analysis in this section is to test for the exogeneity of total household food expenditure in the QUAIDS model. As discussed in chapter three, expenditure endogeneity is adjusted for using the control function approach, which involves augmenting each of the budget share equations in the demand system with the residuals from the reduced form regression for expenditure. Once the budget share equations have been augmented with the reduced form residuals, testing for exogeneity becomes straightforward, as it involves simply testing for the statistical significance of the coefficient on the residuals. We follow Blundell and Robin (1999) and include only the residuals from the reduced form for $\ln x$, using income and income squared as instruments for $\ln x$. Results of the tests for expenditure exogeneity in the QUAIDS-estimated budget share equations of the individual food groups are reported in the second column of Table 5.6. As is clear from these results, statistical evidence against expenditure exogeneity in the individual budget share equations is very weak, much weaker than was found with the OLS-based tests. The results of the test of the null hypothesis that expenditure is exogenous across all budget share equations in the demand model are presented in the last row of Table 5.6.
This test gives a Chi2 statistic of 5.03 (p=0.5398), implying that in the case of system estimation of the budget share equations, it may not be necessary to control for expenditure endogeneity. Thus, based on these results, tests of the null hypothesis of expenditure exogeneity in the individual budget share equations are more likely to reject the null when they are based on OLS estimation than when they are based on the nonlinear QUAIDS estimation. The third and fourth columns of Table 5.6 also report results of the tests for expenditure endogeneity based on the AIDS and LA/AIDS models. These will be discussed in detail in later sections. For now, it suffices to mention that exogeneity tests in the nonlinear AIDS model lead to the same conclusion as tests based on the QUAIDS model: that expenditure is not endogenous across all the budget share equations. Tests based on the LA/AIDS model provide some evidence, although weak (χ² = 10.22, p = 0.1158), of expenditure endogeneity.

To summarize the results in this section, we found that both the LM and LR specification tests support the QUAIDS, as opposed to the AIDS, model. Tests for expenditure exogeneity based on the QUAIDS estimation led to failure to reject the null hypothesis of expenditure exogeneity. Hence, based on these test results, our preferred estimates are the results of the QUAIDS model without adjustment for expenditure endogeneity. However, given the findings from the OLS-based tests that expenditure is endogenous, we also report the results of the endogeneity-adjusted QUAIDS model.

Maximum likelihood parameter estimates of the QUAIDS model are presented in Table A3 at the end of this chapter. The third column reports parameter estimates of the endogeneity-unadjusted QUAIDS model, while the fifth column reports those estimated from QUAIDS controlling for expenditure endogeneity.
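The p-values attached to the Chi2 statistics in Table 5.6 can be checked with a few lines of code. The sketch below is illustrative only and is not the routine used in this study: it relies on closed-form chi-square tail probabilities, and the choice of 6 degrees of freedom for the system-wide tests is an assumption (seven budget share equations with one dropped for adding-up) that is inferred from the reported p-values rather than stated in the text.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability P(X > stat) for a chi-square variable.

    Closed forms are used, so this sketch handles only df = 1 or an even df.
    """
    if stat < 0:
        raise ValueError("test statistic must be non-negative")
    half = stat / 2.0
    if df == 1:
        # A chi-square(1) variable is a squared standard normal.
        return math.erfc(math.sqrt(half))
    if df > 0 and df % 2 == 0:
        # exp(-x/2) * sum over j < df/2 of (x/2)^j / j!
        term, total = 1.0, 0.0
        for j in range(df // 2):
            total += term
            term *= half / (j + 1)
        return math.exp(-half) * total
    raise ValueError("only df = 1 or an even df is handled in this sketch")


# System-wide exogeneity tests from Table 5.6; df = 6 is an assumption.
for model, stat in [("QUAIDS", 5.03), ("Nonlinear AIDS", 8.22), ("LA/AIDS", 10.22)]:
    print(f"{model}: p = {chi2_sf(stat, 6):.4f}")

# Single-equation lambda test for grains from Table 5.6 (one restriction).
print(f"Grains lambda test: p = {chi2_sf(12.42, 1):.4f}")
```

Under these assumptions the computed tail probabilities agree with the reported p-values to within about 0.001, which is the order of rounding in the published statistics.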
Only 9 of the 28 price effects are significantly different from zero at the 10% significance level, suggesting that there is not much quantity response to movements in relative prices, possibly due to the level of aggregation in the commodity groups. Most (34 out of 49) of the coefficient estimates on the demographic variables are statistically different from zero. Larger households consume more grains and dairy products, while smaller households consume more meat and fish and fruits and vegetables. These results are as expected, given that grains provide a relatively cheap source of calories compared to such foods as meat and fish, and given that large households are likely to have more children who consume milk and other dairy products. Formal tests of the effects of household demographic characteristics and time effects on expenditure patterns are presented in Table 5.7. As is clear from these results, the null hypothesis of no demographic effects is strongly rejected. The null hypothesis that preferences have remained stable over time (i.e., that there was no structural change) is also rejected, as shown by highly significant year dummies. The month of the survey is also significant across the budget share equations, indicating the importance of seasonality in food purchase and consumption patterns.

Table 5.7 Results of the Wald Tests for Demographic Effects, Structural Change, and Seasonality

                                              χ²         Degrees of freedom   p-value
Demographic effects                           1244.97    24                   0.0000
Structural change (aggregate time effects)     125.08    12                   0.0000
Month/seasonality effects                       45.24     6                   0.0000

The interpretation of price and income effects is best discussed in terms of elasticities. Estimates of expenditure elasticities based on both the endogeneity-adjusted and endogeneity-unadjusted QUAIDS models are reported in Table 5.8. We focus in this section on the elasticity estimates from the endogeneity-unadjusted models.
All expenditure elasticity estimates are statistically significant at less than the 1% level. The estimates are all positive, as would be expected for broadly defined commodities like the ones considered here. Hence, all of the seven food groups can be classified as normal, which implies that their demand increases as total household expenditure increases. Meat and fish are luxuries, with expenditure elasticities in excess of unity.

Table 5.8 Expenditure Elasticities Estimated from QUAIDS Models with and without Endogeneity Adjustments

Commodity             Expenditure Elasticity   Endog. Adjusted
Grains                0.3881 (0.1254)          0.3881 (0.1254)
Meat, fish            1.1363 (0.1326)          1.1363 (0.1326)
Fruits, vegetables    0.9739 (0.1095)          0.9739 (0.1095)
Dairy                 0.6509 (0.1518)          0.6509 (0.1518)
Oils, butter, fats    0.5572 (0.0989)          0.5572 (0.0989)
Sugar                 0.2924 (0.0886)          0.2924 (0.0886)
Other                 2.9761 (0.1818)          2.9761 (0.1818)
Standard errors in parentheses

Table 5.9 presents estimates of the own price elasticities. Estimates of all own and cross price elasticities are reported in Table A4 at the end of this chapter. As can be seen from Table 5.9, all estimates of the own price elasticities are highly significant (at the 1% level). These estimates are negative, as expected. Apart from dairy, all food groups are either (approximately) unitary elastic or price elastic based on the estimated Marshallian price elasticities. This indicates the degree of responsiveness that households have to changes in the prices of these foods. However, when only the substitution effects are considered, grains and meat and fish become less price elastic, as shown by the inelastic compensated own-price elasticities.

Table 5.9 Own-Price Elasticities Estimated from QUAIDS Models with and without Endogeneity Adjustments

Marshallian/uncompensated own-price elasticities
Commodity             Unadjusted           Endog. Adjusted
Grains                -1.0296 (0.0742)     -1.0458 (0.0856)
Meat, fish            -1.1077 (0.1586)     -1.1058 (0.1943)
Fruits, vegetables    -0.9803 (0.0957)     -0.9739 (0.0967)
Dairy                 -0.8143 (0.0788)     -0.8238 (0.0977)
Oils, butter, fats    -1.0604 (0.0745)     -1.0679 (0.0755)
Sugar                 -1.0463 (0.0749)     -1.0477 (0.0842)
Other                 -4.6426 (0.6605)     -4.9186 (0.7423)

Hicksian/compensated own-price elasticities
Commodity             Unadjusted           Endog. Adjusted
Grains                -0.9032 (0.0408)     -0.9330 (0.0456)
Meat, fish            -0.8455 (0.0306)     -0.8312 (0.0342)
Fruits, vegetables    -0.8087 (0.0193)     -0.8082 (0.0217)
Dairy                 -0.7715 (0.0100)     -0.7842 (0.0118)
Oils, butter, fats    -1.0307 (0.0053)     -1.0334 (0.0064)
Sugar                 -1.0320 (0.0043)     -1.0333 (0.0053)
Other                 -4.3462 (0.0181)     -4.6264 (0.0210)
Standard errors in parentheses

5.4.1. The effects of controlling for expenditure endogeneity

Given the findings from the OLS-based tests for expenditure endogeneity, we also report results of the QUAIDS model with adjustment for expenditure endogeneity. The fifth column of Table A3 at the end of this chapter reports the parameter estimates of the endogeneity-corrected QUAIDS model. The estimates of most interest in comparing the two QUAIDS specifications (one with, and the other without, endogeneity adjustment) are the expenditure elasticities. Expenditure elasticities estimated from the endogeneity-adjusted QUAIDS model are reported in the second column of Table 5.8. These estimates remain virtually unchanged across all equations after controlling for expenditure endogeneity. Their signs are the same, and their magnitudes and standard errors are very similar. The own-price elasticity estimates computed from the endogeneity-adjusted QUAIDS model are reported in the second column of Table 5.9. As can be seen, controlling for expenditure endogeneity does not change the estimates of the price elasticities significantly. Given the finding that expenditure is not endogenous in the QUAIDS model, our preferred estimates are those from QUAIDS without endogeneity adjustment.

5.4.2. The effects of home production

To determine the possible effects of home production on expenditure patterns, the elasticities were also estimated excluding the households who reported engaging in home production. The elasticity estimates are reported in Table A5 (Part I) at the end of the chapter. We compare these elasticity estimates with those estimated from QUAIDS with demographic variables. Apart from the meat and fish food group, the elasticity estimates are similar to those obtained when all households are used. The exclusion of households who report home production changes the classification of meat and fish from luxuries to necessities. However, these changes in magnitudes are also the result of the elimination of a large number of observations from the main dataset. About 31% of the observations have nonzero own production. The elimination of this large number of households is expected to change results, so these changes cannot all be attributed solely to own production.

5.4.3. The effects of attrition

In addition to the attrition tests of section 5.2, we investigate the possible effect of attrition on demand parameters by estimating the elasticities using data on all households, including those who attrited. The elasticities are estimated using the QUAIDS model with demographics. The results are reported in Table A5 (Part II) at the end of the chapter. Comparing these estimates with the panel data-based estimates, it can be seen that, apart from dairy, the expenditure elasticities estimated using the two datasets are very similar. The own-price elasticities, particularly the Hicksian elasticities, are also very similar. These findings are consistent with those from the attrition tests in section 5.2.

5.4.4. The effects of excluding the quadratic expenditure term

This subsection examines the impact of excluding the quadratic expenditure term in the demand model.
As explained in chapter three, the exclusion of the quadratic expenditure term from the QUAIDS model results in the AIDS model, so that the AIDS is nested within QUAIDS. If no assumptions are made about the translog price aggregator, ln a(p) (equation (3.14), chapter 3), then the AIDS model is nonlinear in parameters, which makes it hard to estimate empirically. However, once the QUAIDS model has been estimated, estimation of the AIDS model is straightforward, because this requires simply restricting the coefficient on the quadratic expenditure term to zero. In this subsection, we estimate the nonlinear AIDS model, and compare elasticities computed from it with those computed from the QUAIDS model. Table A6 at the end of this chapter presents the maximum likelihood parameter estimates of the AIDS model. Estimates of the AIDS expenditure and own-price elasticities are reported in Table 5.10 and Table 5.11, respectively.

Table 5.10 Expenditure Elasticities Estimated from LA/AIDS and AIDS Models with and without Endogeneity Adjustments

                      LA/AIDS                                Nonlinear AIDS
Commodity             Unadjusted         Endog. Adjusted     Unadjusted         Endog. Adjusted
Grains                0.8237 (0.0138)    0.7937 (0.0372)     0.8288 (0.0137)    0.8325 (0.0346)
Meat, fish            1.1482 (0.0185)    1.2671 (0.0494)     1.1434 (0.0183)    1.2153 (0.0452)
Fruits, vegetables    0.8783 (0.0176)    0.7993 (0.0471)     0.8827 (0.0174)    0.8408 (0.0432)
Dairy                 1.1799 (0.0378)    1.2438 (0.1024)     1.1704 (0.0374)    1.1720 (0.0940)
Oils, butter, fats    0.6846 (0.0270)    0.7333 (0.0771)     0.6899 (0.0268)    0.7627 (0.0704)
Sugar                 0.5800 (0.0249)    0.5860 (0.0727)     0.5874 (0.0246)    0.6378 (0.0657)
Other                 1.7038 (0.0383)    1.5950 (0.1049)     1.6900 (0.0379)    1.5208 (0.0955)
Standard errors in parentheses

Estimated expenditure elasticities are all significant at the 1% level. Given the finding in the previous sections that the QUAIDS elasticity estimates are also significant at the 1% level, the choice between QUAIDS and AIDS expenditure elasticity estimates cannot be based on their statistical significance.
On average, the AIDS expenditure elasticity estimates are larger than the QUAIDS estimates, which is consistent with the AIDS model not allowing for adequate curvature in the expenditure response of households. On average, the expenditure elasticity estimates differ by about 53% between the two models, with most of the AIDS estimates being larger. The largest differences occur in the elasticity estimates for the grains, dairy, and sugar food groups, with expenditure elasticity estimates from the AIDS model being about 113%, 80%, and 101%, respectively, larger than those based on the QUAIDS model. These are also the three food groups whose budget share equations were found to require the quadratic expenditure term based on both the LM and LR tests, which partly explains why exclusion of the quadratic expenditure term has the largest effect on their elasticity estimates. Exclusion of the quadratic expenditure term has a larger effect on the magnitudes of expenditure elasticities than on those of price elasticities. This is to be expected, given that the main difference between the AIDS and QUAIDS models lies in the specification of the expenditure term.

Estimates of the own price elasticities are reported in Table 5.11. The differences in the magnitudes of the estimated own price elasticities are also substantial for grains and sugar. The expenditure and own-price elasticities do not change significantly when correction is made for endogeneity. This is to be expected because the null hypothesis of expenditure exogeneity was not rejected in the AIDS model (Table 5.6).

Table 5.11 Own-Price Elasticities Estimated from LA/AIDS and AIDS Models with and without Endogeneity Adjustments

Marshallian/uncompensated own-price elasticities
                      LA/AIDS                                  Nonlinear AIDS
Commodity             Unadjusted          Endog. Adjusted      Unadjusted          Endog. Adjusted
Grains                -0.8828 (0.0384)    -0.8451 (0.0593)     -0.9279 (0.0356)    -0.9274 (0.0395)
Meat, fish            -1.0822 (0.0461)    -1.0364 (0.0532)     -1.0996 (0.0472)    -1.1129 (0.0476)
Fruits, vegetables    -0.8694 (0.0348)    -0.8169 (0.0486)     -0.9015 (0.0339)    -0.8935 (0.0346)
Dairy                 -0.8545 (0.0784)    -0.8441 (0.0810)     -0.8797 (0.0774)    -0.8775 (0.0778)
Oils, butter, fats    -1.0293 (0.0747)    -1.0522 (0.0828)     -1.0668 (0.0733)    -1.0845 (0.0761)
Sugar                 -0.8986 (0.0613)    -0.8973 (0.0614)     -0.9455 (0.0622)    -0.9312 (0.0641)
Other                 -0.8830 (0.1094)    -0.9873 (0.1387)     -1.2065 (0.1029)    -1.2240 (0.1037)

Hicksian/compensated own-price elasticities
                      LA/AIDS                                  Nonlinear AIDS
Commodity             Unadjusted          Endog. Adjusted      Unadjusted          Endog. Adjusted
Grains                -0.6146 (0.0368)    -0.5866 (0.0500)     -0.6580 (0.0350)    -0.6563 (0.0357)
Meat, fish            -0.8172 (0.0466)    -0.7440 (0.0600)     -0.8357 (0.0471)    -0.8325 (0.0470)
Fruits, vegetables    -0.7146 (0.0340)    -0.6761 (0.0429)     -0.7459 (0.0338)    -0.7454 (0.0338)
Dairy                 -0.7769 (0.0786)    -0.7623 (0.0828)     -0.8027 (0.0773)    -0.8003 (0.0774)
Oils, butter, fats    -0.9928 (0.0745)    -1.0131 (0.0809)     -1.0300 (0.0731)    -1.0439 (0.0751)
Sugar                 -0.8704 (0.0613)    -0.8688 (0.0615)     -0.9169 (0.0623)    -0.9001 (0.0650)
Other                 -0.7133 (0.1108)    -0.8284 (0.1459)     -1.0382 (0.1031)    -1.0726 (0.1051)
Standard errors in parentheses

5.4.5. The effect of imposing linearity

Nonlinearity in the AIDS model is caused by the translog price index, ln a(p) (equation (3.14), chapter 3). It is common in applied demand analysis to linearize the AIDS model by replacing this translog price index with a share-weighted price index. Moschini (1995) develops an index which enhances the approximation abilities of the linear AIDS model compared to the commonly used indices (such as Stone’s, Laspeyres, and Paasche indices). The replacement of the translog index with Moschini’s or any of these linear indices leads to a model that is linear in parameters, which greatly simplifies empirical estimation.
In fact, the only existing theory-based food demand study in South Africa that considers an exhaustive list of food commodity groups similar to the one considered in this study uses the linearized AIDS model (LA/AIDS model). This subsection examines the effect of imposing linearity on the AIDS model on the price and expenditure elasticities. To accomplish this, an LA/AIDS model is estimated with the translog index replaced by Moschini’s price index, and then the LA/AIDS elasticity estimates are compared with those calculated from the nonlinear AIDS and QUAIDS models. The estimates of the LA/AIDS expenditure and own-price elasticities are reported in the first two columns of Tables 5.10 and 5.11, respectively. The expenditure elasticity estimates and standard errors of the nonlinear AIDS and LA/AIDS models are very similar; on average, the elasticity estimates differ by less than 1% and the standard errors by less than 2%. However, the differences in the magnitudes of the elasticity estimates from the two models are larger (about 5%) when expenditure endogeneity is corrected for. A possible explanation is the finding of some evidence of expenditure endogeneity in the LA/AIDS model. The differences in the magnitudes of own price elasticity estimates from the two models are also small. On average, the own price elasticity estimates from LA/AIDS tend to be smaller in absolute value than those from the nonlinear AIDS model. There is an important difference between the elasticity estimates from the endogeneity-corrected LA/AIDS and AIDS models: the estimates of the Marshallian and Hicksian own-price elasticities calculated from the LA/AIDS model are smaller in absolute value than those computed from the nonlinear AIDS model for all food commodities. Hence, at least for the current sample, the linearization of the AIDS model leads to a downward bias in the estimates of own-price elasticities.
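The percentage differences quoted in subsections 5.4.4 and 5.4.5 can be reproduced from the reported expenditure elasticities. The sketch below is one plausible way to do so, not necessarily the calculation used in this study: it measures each difference as an absolute percentage of the base model's estimate, and it assumes that the first two columns of Table 5.10 are the LA/AIDS estimates and the last two the nonlinear AIDS estimates. The variable and function names are illustrative.

```python
# Expenditure elasticities transcribed from Tables 5.8 and 5.10, in the order:
# grains, meat/fish, fruits/vegetables, dairy, oils/fats, sugar, other.
QUAIDS = [0.3881, 1.1363, 0.9739, 0.6509, 0.5572, 0.2924, 2.9761]
AIDS = [0.8288, 1.1434, 0.8827, 1.1704, 0.6899, 0.5874, 1.6900]
AIDS_ADJ = [0.8325, 1.2153, 0.8408, 1.1720, 0.7627, 0.6378, 1.5208]
LA_AIDS = [0.8237, 1.1482, 0.8783, 1.1799, 0.6846, 0.5800, 1.7038]
LA_AIDS_ADJ = [0.7937, 1.2671, 0.7993, 1.2438, 0.7333, 0.5860, 1.5950]


def pct_diffs(base, other):
    """Absolute percentage difference of each estimate relative to `base`."""
    return [100.0 * abs(o - b) / abs(b) for b, o in zip(base, other)]


def average(values):
    return sum(values) / len(values)


quaids_vs_aids = pct_diffs(QUAIDS, AIDS)
print(f"QUAIDS vs AIDS, average difference: {average(quaids_vs_aids):.1f}%")
print(f"  grains {quaids_vs_aids[0]:.1f}%, dairy {quaids_vs_aids[3]:.1f}%, "
      f"sugar {quaids_vs_aids[5]:.1f}%")
print(f"AIDS vs LA/AIDS, unadjusted: {average(pct_diffs(AIDS, LA_AIDS)):.2f}%")
print(f"AIDS vs LA/AIDS, endog. adjusted: "
      f"{average(pct_diffs(AIDS_ADJ, LA_AIDS_ADJ)):.2f}%")
```

With these inputs, the average QUAIDS-AIDS difference comes out at roughly 53%, and the LA/AIDS and nonlinear AIDS estimates differ on average by under 1% without endogeneity correction and by about 5% with it, in line with the figures reported in the text.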
Similar to the findings in the previous subsection, where nonlinear AIDS was compared with QUAIDS, the differences in the estimated expenditure elasticities between the LA/AIDS and QUAIDS models are largest for grains (112%) and sugar (108%). The reason for these large differences is that, in addition to excluding the quadratic expenditure term, the LA/AIDS model further imposes linearity on the price index, ln a(p). As will be shown later, welfare measures that require estimates of expenditure elasticities will differ substantially, depending on which model (QUAIDS or AIDS) is used.

5.5 Rural-Urban and Income-Groups Differences

This section examines whether or not there are any significant disparities in food expenditure patterns between rural and urban households, as well as among households in different income groups. We estimate demand models separately for rural and urban households, as well as for households in each of the three income groups. To create the income groups, the CPI-deflated income of each household is averaged over the three panel years, and then households are ranked from lowest to highest based on their averaged incomes. Households are then divided into three income groups, with each income group comprising about one-third of the total sample.

Although the inclusion of the quadratic expenditure term may be appropriate for the pooled data (i.e., pooled across all households), it may not be necessary when specific household groups are considered. For this reason, we start the analyses in this section by testing whether or not the quadratic expenditure term is significant in the QUAIDS model estimated for each of the sub-samples (i.e., in each of the rural, urban, and income group samples). As before, these tests are conducted in the endogeneity-unadjusted models. Results of these tests, based on the statistical significance of lambda, are reported in Table 5.12.
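The construction of the three income groups described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this study: the input layout (a dictionary of household identifiers mapped to CPI-deflated incomes, one per panel year), the function name, and the example incomes are all hypothetical.

```python
from statistics import mean


def assign_income_groups(real_income_by_household):
    """Map each household id to 'low', 'middle', or 'high'.

    real_income_by_household: dict of household id -> list of CPI-deflated
    incomes, one entry per panel year (a hypothetical input layout).
    """
    # Average each household's real income over the panel years.
    averaged = {hh: mean(incomes) for hh, incomes in real_income_by_household.items()}
    # Rank households from lowest to highest averaged income.
    ranked = sorted(averaged, key=averaged.get)
    labels = ("low", "middle", "high")
    n = len(ranked)
    # Integer arithmetic splits the ranking into three nearly equal groups.
    return {hh: labels[rank * 3 // n] for rank, hh in enumerate(ranked)}


# Made-up incomes for six households over three panel years.
example = {
    "hh1": [400, 450, 500],
    "hh2": [900, 950, 1000],
    "hh3": [2500, 2600, 2700],
    "hh4": [300, 320, 310],
    "hh5": [1200, 1100, 1300],
    "hh6": [5000, 5200, 5100],
}
groups = assign_income_groups(example)
print(groups)
```

When the number of households is not divisible by three, the groups differ in size by at most one household, which matches the description of each group as comprising about one-third of the sample.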
Similar to the findings with pooled data, evidence in support of the quadratic expenditure specification in the budget share equations for the individual commodities is mixed. However, when all the budget share equations are considered jointly, the evidence in favor of QUAIDS is robust across all five sub-samples. An interesting observation about these test results is that there is a clear rural-urban difference, with the statistical evidence in support of QUAIDS stronger for the urban than for the rural sample. The statistical evidence in favor of QUAIDS also tends to be weaker in the individual income groups data than in the pooled data. A possible explanation is that because the role of the quadratic expenditure term is to capture the curvature of the expenditure responses of households, grouping households with similar incomes together homogenizes the sample, thereby playing the role that is supposed to be played by the quadratic expenditure term. Meenakshi and Ray (1999) found a similar result in India, where statistical evidence in support of QUAIDS was weak when data on the individual Indian states were considered, but strong in the data pooled across states.

Table 5.12 Tests for Quadratic Expenditure Specification based on the Statistical Significance of Lambda in the QUAIDS Model

Statistical Significance of Lambda: χ² tests (1)
Commodity             Rural      Urban      Low        Middle     High
Grains                26.28      4.23       1.10       9.08       0.25
                      (0.0000)   (0.0397)   (0.2951)   (0.0026)   (0.6165)
Meat, fish            11.30      32.75      1.34       1.73       6.78
                      (0.0008)   (0.0000)   (0.2473)   (0.1886)   (0.0092)
Fruits, vegetables    0.97       1.11       0.01       0.01       2.17
                      (0.3245)   (0.2923)   (0.9861)   (0.9775)   (0.1404)
Dairy                 2.70       8.88       2.10       0.29       15.06
                      (0.1004)   (0.0029)   (0.1470)   (0.5924)   (0.0001)
Oils, butter, fats    0.19       3.89       2.02       2.76       0.15
                      (0.6661)   (0.0486)   (0.1556)   (0.0964)   (0.7018)
Sugar                 12.52      3.11       6.53       8.84       4.08
                      (0.0004)   (0.0777)   (0.0106)   (0.0029)   (0.0433)
Other                 17.37      25.35      5.53       15.70      7.78
                      (0.0000)   (0.0000)   (0.0187)   (0.0002)   (0.0053)
All equations         49.10      64.93      14.20      28.04      32.82
                      (0.0000)   (0.0000)   (0.0275)   (0.0001)   (0.0000)
(1) p-values in parentheses

Another interesting observation about these test results is that the statistical evidence in support of the inclusion of the quadratic expenditure term strengthens with income. The main result derived from the tests in Table 5.12 is that it is appropriate to include the quadratic expenditure term when estimating the demand model for each of the five sub-samples, and hence all of the analyses in this section are based on QUAIDS. Since it is possible that the results of the tests for expenditure endogeneity may change depending on the specific sub-sample being considered, the next task in our analysis is to test for the endogeneity of expenditure in each of the five sub-samples.

Table 5.13 Tests for Expenditure Endogeneity in the Rural-Urban and Income Groups Samples

Relevance of Instruments: F-tests
                      Rural      Urban      Low        Middle     High
Expenditure           59.94      77.68      14.40      14.81      16.71
                      (0.0000)   (0.0000)   (0.0000)   (0.0000)   (0.0000)

Expenditure Endogeneity: χ² tests based on QUAIDS estimation (1)
Commodity             Rural      Urban      Low        Middle     High
Grains                0.06       30.86      1.36       0.27       1.07
                      (0.8121)   (0.0000)   (0.2443)   (0.6012)   (0.3012)
Meat, fish            3.57       0.45       0.28       1.87       0.58
                      (0.0590)   (0.5010)   (0.5990)   (0.1711)   (0.4473)
Fruits, vegetables    1.59       4.10       5.12       0.31       0.38
                      (0.2075)   (0.0428)   (0.0237)   (0.5749)   (0.5355)
Dairy                 0.01       14.65      1.14       0.01       0.20
                      (0.9417)   (0.0001)   (0.2857)   (0.9988)   (0.6557)
Oils, butter, fats    3.19       3.19       0.61       0.38       0.01
                      (0.0740)   (0.0740)   (0.4364)   (0.5366)   (0.9873)
Sugar                 2.09       7.58       2.52       0.66       1.62
                      (0.1479)   (0.0059)   (0.1121)   (0.4158)   (0.2028)
Other                 3.09       0.60       0.39       7.51       0.39
                      (0.0786)   (0.4372)   (0.5302)   (0.0061)   (0.5328)
All equations         9.89       51.06      10.73      8.95       3.28
                      (0.1295)   (0.0000)   (0.0972)   (0.1767)   (0.7559)

Overidentification: χ² tests
Commodity             Rural      Urban      Low        Middle     High
Grains                0.10       3.43       0.19       2.57       1.11
                      (0.7423)   (0.0640)   (0.6611)   (0.1089)   (0.2919)
Meat, fish            0.05       14.72      1.41       1.30       1.71
                      (0.05)     (0.0001)   (0.2343)   (0.2540)   (0.1912)
Fruits, vegetables    0.95       11.23      0.37       32.46      0.43
                      (0.3290)   (0.0008)   (0.5439)   (0.0000)   (0.5106)
Dairy                 2.75       1.83       2.46       3.33       1.18
                      (0.0971)   (0.1759)   (0.1166)   (0.0682)   (0.2771)
Oils, butter, fats    0.12       2.28       0.16       3.99       0.04
                      (0.7299)   (0.1311)   (0.6895)   (0.0457)   (0.8482)
Sugar                 5.58       9.57       4.65       6.65       0.04
                      (0.0182)   (0.0020)   (0.0310)   (0.0099)   (0.8410)
Other                 4.39       4.08       2.40       1.12       7.37
                      (0.0384)   (0.0433)   (0.1215)   (0.2904)   (0.0066)
(1) p-values in parentheses

The results of the tests for the relevance of income and income squared as instrumental variables for expenditure are reported in the top part of Table 5.13. Consistent with the previous findings based on the pooled sample, total household income explains a significant portion of the variation in total household food expenditure, so that, assuming the exogeneity condition is satisfied, income can be used as a valid instrument for expenditure. Since the QUAIDS model has already been estimated, testing for expenditure endogeneity simply requires augmenting each of the budget share equations in the demand model with the residuals from the reduced form regressions for expenditure, and testing for their statistical significance. The middle part of Table 5.13 reports results of these χ² tests, first in the individual budget share equations, and then across all budget share equations in the QUAIDS equation system. Overall, statistical evidence against the null hypothesis of expenditure exogeneity is very weak in the sub-samples. It is only in the urban sub-sample that statistical evidence strongly supports the alternative of expenditure endogeneity. The null of expenditure exogeneity is also rejected in the low income sub-sample, albeit weakly so (p = 0.0972).
Similar to the findings with pooled data, results of the overidentification tests (reported in the bottom portion of Table 5.13) are mixed in the individual budget share equations. There appears to be a problem with exogeneity of the instruments in the urban sub-sample, which, unfortunately, happens to be the sub-sample that requires correction for expenditure endogeneity. There do not appear to be problems with the exogeneity of instruments in the budget share equations estimated using the low income sub-sample.

In accordance with the preceding model specification and endogeneity test results, the analyses that follow will be based on the QUAIDS model, with correction for expenditure endogeneity in the urban and low income sub-samples. The maximum likelihood parameter estimates of the QUAIDS model for rural and urban samples are reported in Table A8 at the end of this chapter. Given the finding of expenditure endogeneity in the urban sample, we report parameter estimates for both endogeneity-corrected and uncorrected models for the urban sample in Table A8. From the parameter estimates, it can be seen that in both rural and urban areas, larger households consume more grains and less meat and fish and fruits and vegetables.

The estimates of expenditure elasticities are presented in Table 5.14. For some food groups, the difference in the estimated expenditure elasticities between rural and urban samples is quite substantial. For urban households, a 1% increase in total food expenditure leads to only a 0.02% increase in the quantity consumed of meat and fish. It is very different with the rural households, where the same 1% expenditure increase leads to about a 1.73% increase in consumption of meat and fish. This is one of the reasons why it is necessary to examine expenditure patterns of rural households separately from urban households.
The finding that expenditure on grains is less responsive to increases in total household expenditure in rural areas than in urban areas is counterintuitive. The unresponsiveness of grain purchases to expenditure increases among rural households could be due to their engagement in home production (90% of households who reported nonzero home production reside in the rural areas). Endogeneity correction does not significantly impact the expenditure elasticity estimates of urban households. Estimates of all price and expenditure elasticities are reported in Table A9 at the end of this chapter.

Table 5.14 Estimated Expenditure Elasticities: Rural-Urban and Income Group Differences

                      Rural      Urban           Low             Middle     High
Commodity                        Endog.Adjust    Endog.Adjust
Grains                0.0662     0.9338          0.2122          0.1294     0.8626
                      (0.1524)   (0.1937)        (0.6568)        (0.2370)   (0.2566)
Meat, fish            1.7359     0.1164          1.8501          1.5624     0.3554
                      (0.1650)   (0.1863)        (0.6344)        (0.2632)   (0.2754)
Fruits, vegetables    1.0300     1.0548          0.7247          0.8229     1.2257
                      (0.1285)   (0.1743)        (0.5033)        (0.2139)   (0.2135)
Dairy                 0.9512     0.7034          0.3841          1.0100     0.0249
                      (0.1991)   (0.2277)        (0.5209)        (0.3445)   (0.2626)
Oils, butter, fats    0.8042     0.3323          0.4615          0.4310     0.7425
                      (0.1194)   (0.1822)        (0.3627)        (0.1429)   (0.2365)
Sugar                 0.2423     0.1524          0.0332          0.1945     0.7116
                      (0.1051)   (0.1664)        (0.3796)        (0.1425)   (0.1448)
Other                 2.6091     3.1482          3.4620          3.2054     2.6957
                      (0.2251)   (0.2942)        (0.8175)        (0.3584)   (0.3541)
Standard errors in parentheses

We focus now on the income-group differences. The maximum likelihood estimates of the QUAIDS model for each income group are presented in Table A10 at the end of this chapter. As in the rural-urban samples, larger households consume more grains and less meat and fish and fruits and vegetables in all the income groups. Estimated expenditure elasticities for each of the income groups are also presented in Table 5.14. Grains are a necessity across income groups, while meat and fish are luxuries in the low and middle income groups.
High income households spend more of their total expenditure increases on fruits and vegetables than middle and low income households. Again, these differences in demand behavior reaffirm the need for disaggregated analysis of expenditure patterns by income groups in a country with high income disparities like South Africa. Nevertheless, the estimated expenditure elasticities are as one would expect a priori. Given that the budget share of oils and fats is about the same among all income groups (approximately 6%), the finding that low income households respond to total expenditure increases by consuming more oils and fats is of policy interest, given the likely negative effects if the consumption levels of these products exceed levels recommended for healthy diets. Correction for expenditure endogeneity in the low income sub-sample has the largest impact on grains and dairy. Interestingly, expenditure endogeneity was not found to be a problem in these commodities.

Table 5.15 reports estimates of the Marshallian and Hicksian price elasticities. Own-price elasticities are all negative, as expected. Urban households are less responsive to increases in the prices of grains and meat and fish than rural households. This behavioral pattern is also shared by high income households in comparison to low and middle income households. The response to a price increase for the meat and fish food group makes the differences in demand behavior among the rural-urban and income groups more vivid. High income households respond to a 1% increase in the price of meat and fish by decreasing consumption by 0.59%. This is very different from low income households, for whom the same 1% price increase leads to a decrease of about 2.48% in the consumption of meat and fish. This further underscores the need to undertake disaggregated analysis of demand behavior in South Africa.
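The link between the Marshallian and Hicksian own-price elasticities in Table 5.15 is the Slutsky equation in elasticity form: the compensated elasticity equals the uncompensated elasticity plus the budget share times the expenditure elasticity. The sketch below illustrates this relation with the meat and fish estimates for low and high income households. The function names are illustrative, and the budget shares it backs out are implied values, assumed here to be the shares at which the elasticities were evaluated; they are not figures reported in this section.

```python
def hicksian_own_price(marshallian, budget_share, expenditure_elasticity):
    """Slutsky equation in elasticity form: e_H = e_M + w * e_x."""
    return marshallian + budget_share * expenditure_elasticity


def implied_budget_share(marshallian, hicksian, expenditure_elasticity):
    """Budget share w that reconciles a Marshallian/Hicksian pair."""
    return (hicksian - marshallian) / expenditure_elasticity


# Meat and fish: Marshallian and Hicksian own-price elasticities (Table 5.15)
# and expenditure elasticities (Table 5.14).
low = {"marshallian": -2.4796, "hicksian": -2.0963, "expenditure": 1.8501}
high = {"marshallian": -0.5898, "hicksian": -0.4964, "expenditure": 0.3554}

for label, e in (("low income", low), ("high income", high)):
    w = implied_budget_share(e["marshallian"], e["hicksian"], e["expenditure"])
    check = hicksian_own_price(e["marshallian"], w, e["expenditure"])
    print(f"{label}: implied budget share {w:.3f}, Hicksian elasticity {check:.4f}")
```

Because the income effect term is the product of the budget share and the expenditure elasticity, the gap between the two own-price elasticities is widest for commodities that are both a large part of the food budget and highly expenditure elastic, as is the case for meat and fish among low income households.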
Table 5.15 Estimated Marshallian and Hicksian Own Price Elasticities

Marshallian Own Price Elasticities
                      Rural      Urban           Low             Middle     High
Commodity                        Endog.Adjust    Endog.Adjust
Grains                -1.0083    -0.7474         -0.8889         -1.1924    -0.8770
                      (0.1077)   (0.0998)        (0.1868)        (0.1987)   (0.0668)
Meat, fish            -1.9868    -0.5248         -2.4796         -1.6434    -0.5898
                      (0.3020)   (0.1889)        (1.4364)        (0.4451)   (0.1550)
Fruits, vegetables    -1.0614    -0.9598         -0.9468         -0.8690    -1.2072
                      (0.1287)   (0.1568)        (0.2343)        (0.1374)   (0.2764)
Dairy                 -0.9247    -0.7245         -0.9815         -0.7310    -1.1427
                      (0.0993)   (0.1383)        (0.3596)        (0.1742)   (0.3081)
Oils, butter, fats    -1.1463    -0.8409         -0.9035         -1.2419    -0.9912
                      (0.0918)   (0.1440)        (0.1712)        (0.0966)   (0.1720)
Sugar                 -1.0731    -1.1547         -1.1357         -1.0992    -0.9203
                      (0.0905)   (0.1502)        (0.4463)        (0.1241)   (0.1075)
Other                 -2.8749    -5.1792         -7.2895         -5.1092    -3.9698
                      (0.5953)   (1.1354)        (3.6996)        (1.3480)   (1.2688)

Hicksian Own Price Elasticities
                      Rural      Urban           Low             Middle     High
Commodity                        Endog.Adjust    Endog.Adjust
Grains                -0.9840    -0.5191         -0.8111         -1.1471    -0.6515
                      (0.0559)   (0.0474)        (0.2406)        (0.0828)   (0.0671)
Meat, fish            -1.6203    -0.4934         -2.0963         -1.2967    -0.4964
                      (0.0348)   (0.0502)        (0.1314)        (0.0584)   (0.0724)
Fruits, vegetables    -0.8751    -0.7835         -0.8119         -0.7226    -1.0054
                      (0.0232)   (0.0291)        (0.0937)        (0.0381)   (0.0351)
Dairy                 -0.8719    -0.6640         -0.9617         -0.6669    -1.1407
                      (0.0110)   (0.0196)        (0.0268)        (0.0219)   (0.0215)
Oils, butter, fats    -1.1040    -0.8227         -0.8781         -1.2191    -0.9524
                      (0.0063)   (0.0100)        (0.0200)        (0.0076)   (0.0123)
Sugar                 -1.0597    -1.1492         -1.1337         -1.0890    -0.8952
                      (0.0058)   (0.0061)        (0.0221)        (0.0075)   (0.0051)
Other                 -2.6709    -4.7331         -7.0282         -4.8478    -3.5879
                      (0.0176)   (0.0417)        (0.0617)        (0.0292)   (0.0502)
Standard errors in parentheses

5.6 The Problem of Observed Zero Expenditures

For reasons such as purchase infrequency and corner solution outcomes from agents’ optimization problems, it is common to observe zero budget shares for some commodities in household expenditure data.
Estimation of a demand system with a large percentage of zero expenditure shares may lead to inconsistent parameter estimates unless econometric techniques appropriate for such data structures are used. As indicated in chapter four, the problem of zero expenditure shares is severe for the dairy commodity (see Table 4.6). Apart from the dairy commodity, the percentage of observations with zero expenditure shares is very low for the other six commodities (4% at most). Hence, it would not be appropriate to model the budget shares for these commodities using econometric techniques meant for data structures with a large number of zeros in the dependent variable. In the pooled sample, about 14% of the dairy budget shares are zeros. Non-purchase of dairy products is higher among rural households (18%) and among households in the lower income brackets (24% for low-income households and 12% for middle-income households). Hence, adjustment for zero expenditure shares is made in the estimation of the dairy budget share equations for these household groups.

In this study, the adjustment for zero expenditure shares follows the two-step procedure proposed by Shonkwiler and Yen (1999). In the first step, a single-equation probit model is estimated to compute the probability density and cumulative density values. The dependent variable in the probit models is a binary variable taking a value of one if a positive purchase occurs and zero otherwise. The independent variables are the exogenous variables income and income-squared, the household demographics that enter the demand model (household size, urbanization, and education of household head), year dummies, and a survey month variable. Four additional variables not included in the demand model are included in the probit models, namely household size-squared, gender and age of household head, and a dummy variable for household ownership of a refrigerator.
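The first step can be illustrated with a minimal sketch. It is not the specification estimated in this study: it uses synthetic data, a single hypothetical regressor (a standardized income measure) in place of the full set of probit covariates, and a hand-written Fisher scoring routine so that only the Python standard library is needed. All names and parameter values are assumptions made for illustration.

```python
import math
import random


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fit_probit(x, y, max_iter=100, tol=1e-10):
    """Fisher scoring for the probit model P(y = 1 | x) = Phi(a + b * x)."""
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for xi, yi in zip(x, y):
            z = a + b * xi
            p = min(max(norm_cdf(z), 1e-12), 1.0 - 1e-12)
            d = norm_pdf(z)
            score = d * (yi - p) / (p * (1.0 - p))  # score contribution
            weight = d * d / (p * (1.0 - p))        # information contribution
            g0 += score
            g1 += score * xi
            i00 += weight
            i01 += weight * xi
            i11 += weight * xi * xi
        det = i00 * i11 - i01 * i01
        step_a = (i11 * g0 - i01 * g1) / det  # 2x2 solve of information * step = score
        step_b = (i00 * g1 - i01 * g0) / det
        a += step_a
        b += step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return a, b


# Synthetic households (hypothetical): purchase occurs when a latent index is positive.
rng = random.Random(0)
TRUE_A, TRUE_B = 0.8, 0.6
income = [rng.gauss(0.0, 1.0) for _ in range(2000)]
buys_dairy = [1 if TRUE_A + TRUE_B * v + rng.gauss(0.0, 1.0) > 0 else 0 for v in income]

a_hat, b_hat = fit_probit(income, buys_dairy)

# Per-household density and cumulative values evaluated at the fitted probit index.
index = [a_hat + b_hat * v for v in income]
pdf_values = [norm_pdf(z) for z in index]
cdf_values = [norm_cdf(z) for z in index]
```

The lists `pdf_values` and `cdf_values` play the role of the first-step outputs that the procedure carries into the second step. In applied work a statistical package's probit routine would normally replace the hand-written estimator used here.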
In the second step, equation (3.30) is re-specified with the following changes made: (1) the budget shares of the dairy commodity are multiplied by the cumulative density values (