Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

This dissertation consists of three chapters concerning both empirical studies and estimation methodologies of discrete choice models in the area of demand estimation. The first chapter is a purely empirical study that estimates Chinese outbound tourism demand under a discrete choice framework. The second chapter considers a mixture discrete choice model in which consumers have unobservable and heterogeneous choice sets, and proposes a corresponding two-step mixture estimation approach. The third chapter contains a set of simulation studies of the two-step mixture approach proposed in the second chapter.

More specifically, the first chapter implements a discrete choice approach to estimate the determinants of Chinese outbound tourism demand after 2004, when Chinese citizens became able to travel to most major overseas destinations without political restrictions. Starting from travelers' utility specifications, this chapter uses basic linear regressions to estimate Chinese tourists' sensitivity to the cost of travel and to other characteristics of the destinations. The price and income elasticities are estimated as well. This chapter also proposes a strategy to quantify the welfare gains of Chinese tourists from the opening of Taiwan (to mainland China) as a new destination.

The second chapter proposes a two-step mixture approach to estimate discrete choice models in which consumers' heterogeneous choice sets are unobserved. Different choice sets are viewed as different consumer types. Each type of consumer has distinct criteria on the attributes of products, according to which the choice set is formed. After a choice set formation process is assumed, the distribution of choice sets and the preference parameters can be jointly estimated by the two-step mixture approach. A key insight is that the approach can be applied to store-level data.
While individual-level data are not required, they can provide guidance on the formation of choice sets. The effectiveness of the proposed mixture approach is demonstrated via a set of Monte Carlo simulations and three empirical applications to the markets for milk, potato chips and hotdogs using the IRI marketing data.

The third chapter is a follow-up to Chapter 2 and is based on further simulation studies. In this chapter I review the data generation process (DGP) of my mixture model, discuss the failure, under my DGP setup, of another estimation method that depends on the BLP-type inversion, and then conduct Monte Carlo simulation experiments to examine the validity of the two-step mixture approach and demonstrate its superiority over other traditional estimation methods under various scenarios.

I would like to express special thanks to my family, especially my wife, for all their love and trust. They gave me everything they had to help me achieve my goal. I would not have been able to finish this dissertation without their support.
Table 1.1: New ADS Recipients by Year
Table 1.2: Number of Chinese Tourists' Arrivals in Destinations
Table 1.3: Descriptive Statistics of China
Table 1.4: Descriptive Statistics of Destinations
Table 1.5: Geographical Variables of Destinations
Table 1.6: Estimates Results of Model 1
Table 1.7: Estimates Results of Model 2
Table 1.8: Estimates Results of Model 3
Table 1.9: Estimated Elasticities (Year 2014)
Table 2.1: Monte Carlo Results I: Varying Number of Markets
Table 2.2: Monte Carlo Results II: Varying Size of Choice Set
Table 2.3: Product Statistics
Table 2.4: Product Features
Table 2.5: Market Demographics
Table 2.6: Individual-level Purchasing Behavior --- Market of Milk
Table 2.7: Summary Statistics for Selected Products --- Market of Milk
Table 2.8: Results: Demand Estimation --- Milk
Table 2.9: Individual-level Purchasing Behavior --- Market of Potato Chips
Table 2.10: Summary Statistics for Selected Products --- Market of Potato Chips
Table 2.11: Results: Demand Estimation --- Potato Chips
Table 2.12: Individual-level Purchasing Behavior --- Market of Hotdogs
Table 2.13: Summary Statistics for Selected Products --- Market of Hotdogs
Table 2.14: Results: Demand Estimation --- Hotdogs
Table 3.1: Monte Carlo Results III: Varying Values of Choice Set Determinant Variable
Table 3.2: Monte Carlo Results IV: Weak Instrument Variable
Table 3.3: Monte Carlo Results V: Misspecified Cutoff Point

CHAPTER 1
ESTIMATING CHINESE OUTBOUND TOURISM DEMAND: A DISCRETE CHOICE APPROACH

1.1 Introduction

Traveling abroad has become more and more popular among Chinese people.
In the last two decades, stimulated by progressive political liberalization, rising incomes and living standards, and changing and diversifying socio-cultural values, China has witnessed exponential growth in both the number and the expenditure of its citizens' international trips. In 2014, Chinese outbound tourists reached 107 million¹ and spent 164.9 billion dollars overseas (World Bank, 2016), making China the largest source of world outbound tourism. As the economic and social factors that have facilitated this growth remain positive in the long term, Chinese outbound tourism is still in an early growth stage.

¹ Tourists to Hong Kong and Macao (special administrative regions of China) account for more than 60 percent of total Chinese outbound tourists. This study excludes these two destinations because the travel pattern to them is inconsistent with the assumptions on which the discrete choice model is based. Fortunately, excluding them does no harm, owing to a convenient property of the discrete choice model.

Given the increasing economic, socio-cultural and environmental impacts of Chinese outbound tourism on the destinations, it is of great importance to analyze the determinants of its demand. However, published studies on Chinese outbound tourism demand are relatively few, and studies that employ econometric approaches are even scarcer. Lim and Wang (2008) used the ARIMA technique to model the number of Chinese outbound tourists to Australia. However, like other studies based on time-series models, it could not account for the economic and social factors that cause variations in tourism flows. Lin et al. (2015) used the ARDL framework to model the main factors that affect Chinese outbound tourism for 11 major destinations.
But the demand for each destination was estimated separately, which does not allow one to analyze how potential travelers choose among destinations. Eilat and Einav (2004) applied discrete choice estimation to a large three-dimensional data set of tourist flows. They focused on total inflows and grouped the destinations into high-GNP and low-GNP destinations.

This paper follows the framework of Eilat and Einav (2004) and focuses on China, estimating its outbound tourism demand using data from 2005² to 2014. One can view Chinese residents as consumers, the world as a market of differentiated products, and the destination countries as discrete choices. Starting from travelers' utility specifications, this paper uses basic linear regressions to estimate Chinese tourists' sensitivity to the cost of travel and to other characteristics of the destinations. The price and income elasticities are estimated as well.

² From 2005, Chinese citizens could travel to most major overseas destinations (excluding the US and Taiwan, which became available in 2008) without political restrictions.

This paper makes a first step and proposes some extensions to be made in the future. Random coefficients can be introduced into the utility specification to allow more heterogeneity among tourists, making the discrete choice model more flexible (Berry et al., 1995). Micro-level data can be used along with the market-level data to yield more reliable estimates (Petrin, 2002). Finally, welfare analysis can be conducted. For historical and political reasons, Chinese mainland citizens could not travel to Taiwan until late 2008. Because it is based on a utility specification, the discrete choice approach makes it possible to quantify the benefits gained by Chinese tourists from the opening of Taiwan as a new destination. This paper proposes a simulation-based method for such welfare analysis.

The paper is organized as follows.
Section 1.2 provides a brief background of Chinese outbound tourism. Section 1.3 introduces the discrete choice model and the empirical strategy. Section 1.4 explains how the variables are chosen and constructed. Estimation results are presented in Section 1.5. Section 1.6 proposes future research directions and Section 1.7 concludes.

1.2 The Development of Chinese Outbound Tourism

Although it is large today, outbound tourism is still a recent phenomenon in China. Between 1949 and 1990, the Chinese government allowed its citizens to travel abroad only for official, business and education reasons, and few Chinese residents traveled to foreign destinations. After 1990, China started to allow leisure outbound travel and gradually relaxed its outbound travel policy. In 1991, Chinese residents could take leisure tours organized by the Chinese Travel Service (CTS) to Malaysia, Singapore and Thailand. In 1995, the Chinese National Tourism Administration (CNTA) introduced the Approved Destination Status (ADS) policy. Chinese travel agencies selected by the government were permitted to organize package tours to countries that had received ADS. The ADS agreement allows Chinese travel agencies to apply for visas to enter a foreign destination on behalf of all members of a tour group. By legitimizing overseas leisure travel, facilitating the process of obtaining a visa, and providing package tours, ADS agreements laid the foundation for the explosion of Chinese outbound tourism. Countries receiving ADS agreements in each year are listed in Table 1.1.³

In addition to the ADS policy, many other events contributed to the growth of Chinese outbound tourism. In late 1999, the "golden weeks" holiday system was introduced. The holiday weeks gave people enough time for long-distance travel and successfully stimulated not only the domestic but also the outbound tourism industry. In 2001, China entered the WTO.
Together with easier commerce, cooperation among international tourism systems was also strengthened. In 2004, China signed the "Memorandum of Understanding" (MoU) with the Schengen Area countries, which ensured an easier way to apply for short-duration visas and made it possible for Chinese tourists to visit multiple European destinations without specific limitations. In 2008, China and the United States reached an agreement and signed the MoU for tourism. In the same year, Taiwan started to allow leisure tourism for Chinese mainland residents, which meant that most major tourist destinations in the world were open to Chinese residents.

The above political liberalizations, together with Chinese residents' rising incomes and living standards and changing and diversifying socio-cultural values, resulted in the boom of Chinese outbound tourism. Chinese outbound tourists increased from 4.5 million in 1995 to 107 million in 2014, an annual growth rate of 18.15%. Outbound tourism expenditure increased from 3.7 billion to 164.9 billion dollars, an annual growth rate of 22.12% (World Bank, 2016). China has surpassed the United States and Germany in both the number of outbound tourists and the amount of their expenditure, making it the largest source of world outbound tourism, with great potential to continue growing in the future.

³ Note all ADS recipients are countries.

1.3 The Discrete Choice Model

As mentioned above, in each year Chinese outbound tourism can be treated as a market of differentiated products, with the different destination countries viewed as discrete choices. Starting with the simple conditional logit model, assume the conditional utility of consumer i, a resident of China, from travelling to destination country j in year t is given by:

$$u_{ijt} = x_{jt}'\beta + \xi_{jt} + \varepsilon_{ijt}, \qquad (1.1)$$

where $x_{jt}$ is a k-vector of observed characteristics of country j, which can either be fixed (e.g. distance to China, language...)
or vary across years (e.g. price level, GDP per capita...). $\xi_{jt}$ is country j's unobserved characteristic and can include attributes that attract tourists but are hard to quantify (e.g. resorts' attractiveness). $\varepsilon_{ijt}$ is an individual taste error, assumed to be distributed i.i.d. across individuals, destination countries, and time. $\beta$ is a k-vector of parameters to be estimated and is assumed to be identical across individuals and time.

Based on Equation (1.1), consumer i chooses the destination that maximizes his utility. The probability of choosing to travel to country j is

$$P_{ijt} = \Pr\left(u_{ijt} \ge u_{ilt} \ \text{for all } l = 0, 1, \dots, J_t\right), \qquad (1.2)$$

where $J_t$ is the number of destination choices in year t. Alternative zero represents the outside option of not travelling abroad. In this application, the outside option can be either to travel domestically or not to travel at all. Since domestic travel data are not available, these two cases cannot be distinguished. Without loss of generality, the utility of the outside option is normalized to zero in each market.

Assume $\varepsilon_{ijt}$ follows the type I extreme value distribution (whose CDF is $F(\varepsilon) = \exp(-\exp(-\varepsilon))$); then the individual taste errors can be integrated out, and $P_{ijt}$ becomes:

$$P_{jt} = \frac{\exp(x_{jt}'\beta + \xi_{jt})}{1 + \sum_{l=1}^{J_t} \exp(x_{lt}'\beta + \xi_{lt})}. \qquad (1.3)$$

Furthermore, assume each resident makes his travel decision only once a year, that is, he travels abroad no more than once a year. Then $P_{jt}$ becomes the predicted market share of destination j in year t, $s_{jt}$. After some transformation, the following equation can be obtained:

$$\ln s_{jt} - \ln s_{0t} = x_{jt}'\beta + \xi_{jt}. \qquad (1.4)$$

Equation (1.4) is the basic model to be estimated in Section 1.5, and OLS is the natural estimator. However, in this application, $\xi_{jt}$ may be of great importance. The destinations' unobserved characteristics, such as the unquantifiable attractiveness of famous sights (for example, the Statue of Liberty), play an important role in a potential traveler's decision-making process. With the panel data set, identification of the parameters of interest can be improved by introducing heterogeneity across destination countries.
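To make the step from choice probabilities to the linear estimating equation (1.4) concrete, the following Python sketch computes conditional logit market shares from mean utilities, with the outside option normalized to zero, and then inverts them by taking log share differences. The function names and the numerical mean utilities are hypothetical and chosen only for illustration; they are not estimates from this chapter.

```python
import math


def logit_shares(mean_utilities):
    """Return (inside_shares, outside_share) for a conditional logit model.

    mean_utilities holds one value per inside alternative; the utility of
    the outside option is normalized to zero, so it contributes exp(0) = 1.
    """
    exp_utils = [math.exp(d) for d in mean_utilities]
    denom = 1.0 + sum(exp_utils)
    inside = [e / denom for e in exp_utils]
    outside = 1.0 / denom
    return inside, outside


def invert_shares(inside, outside):
    """Recover mean utilities as ln(s_j) - ln(s_0)."""
    return [math.log(s) - math.log(outside) for s in inside]


# Hypothetical mean utilities for three destinations in one year.
mean_utilities = [-2.0, -3.5, -4.0]
inside, outside = logit_shares(mean_utilities)
recovered = invert_shares(inside, outside)

print("inside shares:", [round(s, 4) for s in inside])
print("outside share:", round(outside, 4))
print("recovered mean utilities:", [round(d, 4) for d in recovered])
```

Because the recovered values coincide with the mean utilities that generated the shares, regressing the log share difference on destination characteristics is a linear problem, which is why least squares applies directly.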
Assume $\xi_{jt} = \xi_j + \Delta\xi_{jt}$, where $\xi_j$ is a permanent component for country j and $\Delta\xi_{jt}$ is a temporary shock that is independent across countries and years. Then, under the corresponding assumptions, random effects (RE) and fixed effects (FE) estimators can be implemented.

One may want to include in the utility function explanatory variables that depend only on individual i and year t but do not vary across destinations. For example, year dummies (possibly interacted with income) could be included to measure the utility gain from making an international trip in each year. While such variables can be added to simple reduced-form models, in the structural model here variables that do not vary across j cancel out when the market shares are computed (in Equation (1.2)) and do not remain in Equation (1.4), so the corresponding marginal utility cannot be identified. This limitation, together with the difficulty of interacting income with price, causes trouble when estimating the effect of Chinese residents' income growth on their outbound tourism behavior. This issue is discussed in more detail in Section 1.4.

1.4 Data and Variable Construction

The tourist arrival data are those compiled and published by the World Tourism Organization (UNWTO). The information was obtained on the basis of data supplied by each of the destination countries and therefore corresponds to arrivals of Chinese resident tourists in these countries. Because of this data preparation procedure, the information sources vary from country to country. Table 1.2 lists the number of Chinese resident arrivals to top destinations, ordered by the average number of sources.

Before continuing, there are some limitations on the use of this data which are worth noting. First, the data do not distinguish a single trip to multiple destinations from multiple trips, each to a single destination. According to the model assumptions, each arrival is treated as a single choice. This would cause an overestimate of total outbound trips. An obvious example is Hong Kong and Macao.
Since they are both Special Administrative Regions of China and not far from each other, Chinese tourists typically make a bundled trip visiting both places. This is the main reason why Hong Kong and Macao are excluded from this discrete choice estimation. The problem also arises in trips to some European countries that are relatively small in area and contiguous to each other, and in trips to Asian countries such as Japan and South Korea (usually a cruise tour). But it is far less serious than in the case of the Hong Kong and Macao bundle. Single-destination trips remain the main style of travel for Chinese outbound travelers.

Second, the data cannot be classified into groups according to the purpose of travel. Chinese residents make outbound trips for different purposes: leisure tourism (sightseeing, shopping, recreation and cultural activities, summer camp, honeymooning, etc.), business or public travel, visiting relatives or friends, and other purposes. It would be more interesting and accurate if we could analyze the determinants for each purpose separately. Travelers with different purposes may have different sensitivities to different explanatory variables. For instance, leisure tourists may be more sensitive to price, while business and public travel may depend more on the economy of the destination country and the intensity of the economic relations between the two countries. However, the data document only the total number of arrivals and do not distinguish between tourists by purpose. Eilat and Einav (2004) focused on leisure tourism and proposed one strategy that might handle this issue: while the percentages of Chinese outbound tourists by purpose are not available, the fraction of total tourist arrivals (from all over the world) to any destination country that is for leisure purposes can be obtained.
The leisure tourist arrivals can be approximated by multiplying the total flow by this fraction for each destination country, presuming that for each destination the annual intensity of leisure inbound tourism from Chinese visitors is identical to the annual intensity of leisure inbound tourism from the whole world. This presumption is unrealistic here because Chinese outbound tourism is still in its growth stage and Chinese tourists behave differently from tourists from other (especially developed) countries. For some destination countries, such an approximation may cause a serious bias that is even worse than making no adjustment at all.

There are two other reasons why the data on total arrivals could be used without adjustment. First, there are no clear barriers between different purposes of travel. Travelers with business or public purposes can also engage in leisure activities (sightseeing, shopping) during their stay in the foreign country, and their leisure behavior can have strong impacts on destination countries regardless of their main purpose. Second, if travelers' tastes vary with their main travel purposes, such taste variation can be introduced into the model by allowing random coefficients, which will be discussed in Section 1.6.

Another shortcoming of the data source is that some values are missing. Fortunately, this is not a big problem. The missing values are mainly for small countries with quite small numbers of Chinese tourist arrivals. Besides, the conditional logit specification is quite flexible: a destination with missing data in a specific year can be treated as part of the outside option, and we can use the available data for the estimation without worrying about creating bias. This property also makes it feasible to exclude Hong Kong and Macao from the estimation.

To create the market share for each destination country, the number of arrivals is divided by the Chinese urban population, presuming that only Chinese urban residents travel internationally, which is quite reasonable.
While Chinese rural residents account for nearly half of China's total population, their income is very low compared with that of urban residents (see Table 1.3 for details).

Turning to the explanatory variables, we first need variables that measure the cost of international trips. This is not straightforward, since the market for outbound tourism is very different from traditional markets (e.g. the automobile market) in which products have clear prices. Travelers can choose their length of stay freely, and data on length of stay are available for only a limited number of destinations. The travel cost also depends largely on the activities undertaken by travelers and can vary a lot. In addition, the time spent on outbound travel has an opportunity cost that is hard to quantify. Given these difficulties, it is almost impossible to measure the cost of travel in monetary value. In this application, two variables are used: the relative price of living and the distance from China.

The relative price of living is based on "the ratio of Purchasing Power Parity conversion factor to market exchange rate" (World Bank, 2016), which tells how many dollars are needed to buy a dollar's worth of goods in the country as compared to the United States, and is also referred to as the national price level. The relative price is obtained by dividing the "price level" of the destination country by the "price level" of China, hereafter denoted as [...], and can be interpreted as how many units of a good a consumer has to give up in China in order to purchase one unit of the good in country j. This variable has the convenient property that it captures the variation of the real exchange rate over time and can be used for cross-sectional comparison at the same time.

The distance to China is an important geographic variable. It is highly correlated with the transportation cost and the time needed for travel.
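As a rough illustration of the variable construction described in this section, the sketch below computes a destination's market share (arrivals divided by the Chinese urban population) and the relative price of living (the destination's national price level divided by China's). All numbers and function names are hypothetical placeholders, not values from the data used here.

```python
def national_price_level(ppp_conversion_factor, market_exchange_rate):
    """Ratio of the PPP conversion factor to the market exchange rate.

    Both inputs are in local currency units per US dollar, so the ratio is
    the price level relative to the United States.
    """
    return ppp_conversion_factor / market_exchange_rate


def relative_price(dest_price_level, china_price_level):
    """Units of a good given up in China to buy one unit in the destination."""
    return dest_price_level / china_price_level


def market_share(arrivals, urban_population):
    """Share of the potential market (urban residents) choosing a destination."""
    return arrivals / urban_population


# Hypothetical inputs for one destination in one year.
china_level = national_price_level(3.5, 6.2)
dest_level = national_price_level(105.0, 110.0)
rel_price = relative_price(dest_level, china_level)
share = market_share(2_400_000, 750_000_000)

print("relative price of living:", round(rel_price, 3))
print("market share:", round(share, 5))
```

A relative price above one means that living costs in the destination exceed those in China. Because the ratio is rebuilt each year, it moves with both the nominal exchange rate and the two countries' price levels.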
The weighted distance between major cities of China and the destination country (hereafter denoted as [...]) is used in this application (source: the GeoDist database). Twenty-five Chinese cities were used to compute the weighted distances.

To control for the economic relationship between China and the destination country, two variables concerning international trade are used: the destination country's shares of China's total import and export volumes (source: World Integrated Trade Solution⁴). Travelers whose main purpose is business are more sensitive to these shares. On the other hand, as China imports from a country, that country's products are exposed to Chinese consumers. The products may reflect the history, culture and technology standards of the country of origin. This may inspire Chinese consumers to explore the country of origin.

The variable used to describe the economy of the destination is PPP-adjusted GDP per capita (source: World Bank, 2016⁵), which enters the utility function in natural logarithm. In addition, two dummies are included: a dummy for common language (whether Chinese is spoken), which controls for cultural similarity, and a dummy for a common border. Table 1.4 and Table 1.5 show the variables discussed above for major Chinese tourist destinations.

One may be interested in how the income level of potential travelers affects their travel decisions: whether to travel abroad and where to go. However, one limitation for this application is that [...] Here the variables describing the price are the relative price level and distance, indicating that [...] However, as mentioned at the end of Section 1.3, if one adds income to the reduced form to be estimated (Equation (1.4)) without interacting it with variables that vary across destinations, it would violate the basic setting of the structural approach. Two different
Two different 4 The source of data for Taiwan is Bureau of Foreign trade, Taiwan 5 The source of data for Taiwan is National Statistics, Taiwan 12 adjustments are made in order to include income into the utility specification, the detail of which is in next section. 1.5 Estimation eferred as Model 1): w here includes all other explanatory variables. Table 1. 6 reports the estimated parameters from three methods: ordinary least squares (OLS), Ra ndom effects and Fixed effects. All the coefficients are of expected signs in all three estimations. For the OLS, all parameter estimates are significant except for coeffi- cients of relative price level and export share. The estimate of becomes significant once I include the destination heterogeneity and implement RE and FE. Coefficients of international trade suggest that imports have more influence than exports on Chinese tour- consuming imported products would inspire consumers to travel to the country of origin. As mentioned above, one may suppose the income level would affect potential travel- , one may care less about the cost. Utility function is modified (referred as Model 2) as an attempt to capture this intuition: w here 13 denotes the income of Chinese urban resident. More sophisticated estimate method is needed due to the variation of . Some simplification could be made if a representative consumer is used: w here is the per capita disposable income of Chinese urban households in year t , which went beyond 20000 Chinese Yuan since year 2011. With such simplification, linear estimations can be implemented. The estimates results are listed in Table 1. 7 OLS reports insignificant estimates of coefficients of price level. The RE estimat e is quite inter- esting: the estimate of is significant, indicating Chinese travelers are sensitive to the dest ination ' s price level when they have relatively low income. 
As their income increases, they pay less attention to the living cost of the destination (the estimate of [...] is smaller in absolute value than the estimate of [...] and is insignificant). The FE estimate gives a similar result. The estimated coefficients on distance are significant in both the OLS and RE estimates, and decrease in absolute value when income is higher. The results suggest that Chinese travelers are sensitive to the transportation cost and the opportunity cost of time, and that this sensitivity decreases gently as their income rises.

Given the above evidence that income does affect travel choices, the utility specification can be adjusted to include income in a more natural way (referred to as Model 3), which is convenient for post-estimation analysis:

[...]

As in the simplification made in Model 2, a representative consumer is used, and then [...] equals [...] for all i. Table 1.8 lists the results. The estimated coefficients related to cost are significant in both the RE and FE specifications. Based on this utility function, one can calculate the own- and cross-price elasticities:

[...]

The price here refers to the relative price level, or real exchange rate, which can be affected by variation in the nominal exchange rate and in the price level of either country, China or the destination. The income elasticity can be calculated as

[...]

and the income elasticity of making an international trip is:

[...]

Table 1.9 reports the estimated own-price and income elasticities for 2014.

1.6 Future Research Directions

A. Random Coefficients Estimation

While the simple logit model is convenient for computation, it has some implausible limitations. Consumer heterogeneity comes only from the idiosyncratic logit error, which causes the "independence of irrelevant alternatives" (IIA) problem: the ratio of the probabilities (market shares) of two choices does not depend on the set of other choices that are available.
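The IIA property just described can be checked numerically. The sketch below, with hypothetical mean utilities for three generic alternatives A, B and C, computes logit shares before and after C is added to the choice set and compares the ratio of the shares of A and B. The names and numbers are illustrative assumptions only.

```python
import math


def logit_shares(mean_utilities):
    """Logit shares for named alternatives; outside option utility is zero."""
    denom = 1.0 + sum(math.exp(v) for v in mean_utilities.values())
    return {name: math.exp(v) / denom for name, v in mean_utilities.items()}


# Hypothetical mean utilities; C is the newly available alternative.
before = logit_shares({"A": -3.0, "B": -3.5})
after = logit_shares({"A": -3.0, "B": -3.5, "C": -3.2})

ratio_before = before["A"] / before["B"]
ratio_after = after["A"] / after["B"]

# Proportional loss of share for each incumbent alternative.
loss_a = 1.0 - after["A"] / before["A"]
loss_b = 1.0 - after["B"] / before["B"]

print("share ratio A/B before:", round(ratio_before, 6))
print("share ratio A/B after: ", round(ratio_after, 6))
print("proportional loss A, B:", round(loss_a, 6), round(loss_b, 6))
```

Both ratios equal the exponential of the difference in mean utilities, so the new alternative draws share from A and B in exactly the same proportion, regardless of how similar it is to either of them.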
For example, suppose that the market shares of Japan and the United States were the same before Taiwan became an available choice. After Taiwan joins the market, IIA implies that the market shares of Japan and the United States will still be the same. Intuitively, however, Japan's market share would decrease more than that of the United States, since Japan and Taiwan are both close to China, so Taiwan is more likely to take market share from Japan. The unrealistic substitution patterns caused by IIA also show up in the cross-price elasticities, violating the intuition that tourists who substitute away from a certain destination would be more likely to choose a new destination with similar characteristics.

The IIA problem can be eased by allowing random coefficients in the utility function, which makes the model more flexible. Travelers have individual-specific tastes for observed characteristics of destinations according to their main purposes of travel. Berry et al. (1995) proposed a method to estimate the random coefficients (mean and standard deviation). Petrin (2002) extended it by augmenting market share data with information relating consumer demographics to the characteristics of the products they purchase. Income is also part of the random coefficients, and one can expect that the way income enters the utility specification matters for the estimation results: the coefficients, the elasticities, and the welfare changes.

B. Quantifying the welfare gains from the opening of a new destination: Taiwan

An advantage of the discrete choice model over other models is that it is based on the utility function, so welfare analysis is possible. Petrin (2002) quantifies how the introduction of the minivan changed total (consumer and producer) welfare (measured by compensating variation) in the United States.
A similar attempt (based on equivalent variation) 6 can be made to estimate the benefits gained by Chinese (mainland) tourists from the opening of Taiwan as a new destination choice. For a traveler to Taiwan, the welfare benefit is how much income he would be willing to give up to keep Taiwan as an available choice. To be more specific, this is a simulation-based analysis:

1) Make R random draws, each of which represents an individual and contains his person-specific coefficients (including his income ) and his logit taste errors for all feasible choices ( );
2) For each individual i, compute his utilities from all choices (the utility of choosing the outside good is normalized to zero). He chooses to travel to Taiwan if that yields him the maximal utility, denoted as ;
3) For each individual whose first-best choice is Taiwan, find his second-best choice and the corresponding utility . This is the choice he would make if Taiwan were not available in the market;
4) Find such that . Given that utility is increasing in income, should be negative and is i's welfare benefit;
5) The sum of 's over all individuals whose first-best choice is Taiwan is the total welfare gain for a population of R.

Petrin (2002) showed that models estimated without micro data yield much larger welfare numbers than the model using them and depend largely on the idiosyncratic logit taste error. This suggests that consumer-level data are needed to make the welfare analysis more reliable.

6 Petrin (2002) his approach due to a problem related to outside goods. However, equivalent variations can be calculated, avoiding that problem.

1.7 Conclusions

This paper uses a discrete choice approach to estimate the determinants of Chinese outbound tourism demand after 2004, since when Chinese citizens have been able to travel to most major overseas destinations without political restrictions. The destinations are viewed as differentiated products.
Given the specificity of the tourism market, the relative (to China) cost of living and the distance from China are chosen to measure the "price" of traveling to a destination. Attempts are made to include income in the utility specification to examine how changes in income affect people's travel decisions. Simple logit models and general linear regression estimates (OLS, RE, FE) are implemented. The estimation results indicate that Chinese travelers are sensitive to the destination's price level when they have relatively low income. As their income increases, this sensitivity decreases and becomes insignificant. Travelers prefer destinations that are close to home, and this preference decreases gently as their income rises. The estimated own-price and income elasticities of the most popular destinations are reported. The results also support the intuition that tourists prefer destinations that are more developed, are culturally similar (Chinese-speaking) and share a common border. The destination's economic relationship with China is not likely to be among travelers' considerations.

As for future research directions, random coefficients estimation can be implemented to ease the limitations of the simple logit model. If available, consumer-level data could be used along with the market-level data to yield more reliable estimates. An equivalent-variation-based method is proposed to quantify the welfare gains of Chinese tourists from the opening of Taiwan as a new destination.

APPENDIX FOR CHAPTER 1

Destinations      Own-Price Elasticities     Income Elasticities
                  RE         FE              RE         FE
South Korea       -0.82      -0.58           0.10       0.09
Thailand          -0.39      -0.27           0.10       0.13
Taiwan            -0.50      -0.35           0.08       0.08
Japan             -1.01      -0.71           0.14       0.14
United States     -1.02      -0.72           0.36       0.50

BIBLIOGRAPHY

Arita, S., S. Croix, and J. Mak (2012). How big? The impact of approved destination status on mainland Chinese travel abroad.
Working Paper, University of Hawaii Economic Research Organization, University of Hawaii at Manoa 2012/3.
Berry, S. T. (1994). Estimating discrete-choice models of product differentiation. The RAND Journal of Economics, 242-262.
Eilat, Y., and L. Einav (2004). Determinants of international tourism: a three-dimensional panel data analysis. Applied Economics, 36(12), 1315-1327.
Hicks, J. R. (1945). The generalized theory of consumer's surplus. The Review of Economic Studies, 13(2), 68-74.
Jin, X., and Y. Wang (2016). Chinese outbound tourism research: A review. Journal of Travel Research, 55(4), 440-453.
Lim, C. (1999). A meta-analytic review of international tourism demand. Journal of Travel Research, 37(3), 273-284.
Lim, C., and Y. Wang (2008). China's post-1978 experience in outbound tourism. Mathematics and Computers in Simulation, 78(2-3), 450-458.
Lin, V. S., A. Liu and H. Song (2015). Modeling and forecasting Chinese outbound tourism: An econometric approach. Journal of Travel & Tourism Marketing, 32(1-2), 34-49.
Mayer, T., and S. Zignago database.
National Bureau of Statistics of China, China Statistical Yearbook 2015 [Electronic].
Nevo, A. (2010). Empirical models of consumer behavior. No. w16511. National Bureau of Economic Research, 2010.
Berry, S., J. Levinsohn, and A. Pakes (1995). Automobile prices in market equilibrium. Econometrica: Journal of the Econometric Society, 841-890.
Peng, B., H. Song, G. I. Crouch and S. F. Witt (2015). A meta-analysis of international tourism demand elasticities. Journal of Travel Research, 54(5), 611-633.
Petrin, A. (2002). Quantifying the benefits of new products: The case of the minivan. Journal of Political Economy, 110(4), 705-729.
Rodrigues, V., and Z. Breda (2014). Chinese Outbound Tourism Market. 7th World Conference for Graduate Research in Tourism, Hospitality and Leisure: 693-698.
World Bank (2016). Indicators: Data.
Retrieved from http://data.worldbank.org/indicator.
World Tourism Organization (2016). Data on Outbound Tourism (calculated on basis of arrivals in destination countries) dataset [Electronic], UNWTO, Madrid, data updated on 10/01/2016.

CHAPTER 2

2.1 Introduction

The discrete choice model has long been utilized in the fields of economics and marketing to conduct demand estimation, understand consumer preferences and purchasing behavior, and analyze and predict the market shares of differentiated products. As a structural model, the central idea of the discrete choice model is utility maximization: among a variety of differentiated products in a certain market, a consumer chooses the one product that gives her the highest utility level.

The traditional literature on the discrete choice model (Train (2009) provides comprehensive coverage of the topic) pays great attention to modeling the utility function while treating the choice set, the set of products a consumer considers purchasing, as exogenously given. In most empirical studies, researchers assign all consumers a universal choice set that contains all available products in a given market to estimate the model. However, the choice set chosen by the econometrician may not be the one a consumer actually considers; choice sets can be heterogeneous for a variety of reasons, including but not limited to insufficient information, search costs, time constraints, advertising exposure and commitment. In a paper presenting a formal analysis of the distributional structure of random utility models, Manski (1977) points out that a stochastic choice set formation model, which assigns a realization probability to each alternative choice set faced by a consumer, is an essential part of modeling the choice decision process. It has been suggested not only by theoretical derivations (Eliaz and Spiegler (2011), Masatlioglu, Nakajima and Ozbay (2012), Manzini and Mariotti (2014), etc.) but also by empirical examinations (Goeree (2008), Pires (2012), Paola and Marco (2013), Lu (2018), etc.)
that ignoring the choice set heterogeneity can cause considerable bias in the estimates and in the quantities (e.g. elasticities) that rely on them.

There are several strands of literature dealing with choice set heterogeneity. The first 7 shows that preference parameters can be consistently estimated based on a subsample which is drawn from the true choice set according to an appropriate probability distribution. Such a subsample is referred to as a sufficient set and is constructed from consumers' historical choices. Lu (2018) proposes a similar estimation approach based on bounds on choice sets. When the true choice set is bounded by two observed sets, respectively the largest and smallest possible choice sets, the bounds combined with a monotonicity property derived from utility maximization imply a system of inequality restrictions on observed choice probabilities, which generates a set of moment conditions that can be used to identify and estimate the preference parameters.

7 See, e.g., Fox (2007) and Crawford, Griffith and Iaria (2016).

Such approaches have several limitations. Firstly, the construction of choice sets depends on consumers' purchase histories, which imposes demanding requirements on the data sets. Researchers must have individual-level panel data covering a relatively long time period to apply such estimation methods. In many cases where only industry-level data is available, or the panel is not long enough, such approaches are not applicable. Secondly, such approaches need to assume that a consumer has a stable choice set in terms of products over time: if a consumer ever purchases a product, then this product stays in her choice set for all time periods. This assumption is generally true but is likely to be violated in certain circumstances. For example, if a consumer of breakfast bread usually shops at the grocery store in a hurry and only chooses from the products on display, her choice set would not be stable in terms of products, since the store regularly switches the items on display.
Another limitation, and the most serious one, is that only the preference parameters can be consistently estimated, given satisfactory data and assumptions. Elasticities and many other post-estimation implications, which industry decision makers care about the most, cannot be pinned down; only ranges can be predicted. Depending on how the sufficient sets (or the smallest and largest possible choice sets in Lu (2018)) are constructed, the ranges can sometimes be considerably large and not instructive.

Another category of existing approaches extends the random utility framework by modeling the choice set formation process and simultaneously estimating both the choice set formation and the preference parameters. 8 Swait and Ben-Akiva (1985) propose a constraint-based view of choice set formation and corresponding approaches to structure and parameterise choice set models, which have been widely adopted by empirical studies. The assumptions on choice set formation vary with the industry of interest and can be quite plausible in certain applications. For example, Goeree (2008) focuses on the personal computer industry and specifies the choice set formation based on advertising exposure. The probabilities can be estimated from industry-level sales data. However, the specification of choice set formation for a certain application is usually restrictive and generally cannot be duplicated in other industries. For instance, the assumption in Goeree (2008) that the choice set formation is based on advertising exposure seems reasonable only for industries in which the products are refreshed rapidly and consumers have limited information on what products are available in the market. In addition, the more detailed the specification of the choice set formation the econometrician makes, the higher the risk of misspecification she faces.

8 See, e.g., Ben-Akiva and Boccara (1995), Goeree (2008), Hortacsu and Syverson (2004).
This paper proposes a mixture model in which the choice sets and preference parameters can be jointly estimated. The specific setup of the choice set formation process makes this paper different from the existing literature utilizing choice set formation models. Different choice sets are viewed as different consumer types. Each type of consumer has distinct criteria on the product attributes according to which their choice sets are formed. A choice set consists of the products that meet these criteria. I assume the type of consumer, which is equivalently the type of choice set, follows a multinomial distribution that is unknown to econometricians and needs to be estimated. A control function method is implemented to deal with the product heterogeneity in a two-step estimation process. Under this setup, choice sets do not need to be stable in terms of products, only in terms of attributes, which is more realistic. For example, this setup is applicable to the situation in which one type of consumer chooses only from the products that are on display or promoted (display and promotion status would be indicator variables in the model).

Like Goeree (2008) discussed above, my approach makes assumptions on the formation of choice sets. I propose a choice set formation model which is different from the existing ones and fits reality better in certain circumstances. A potential problem is that when we increase the number of attributes in the choice set formation process, the number of different choice sets grows exponentially, which could make the estimation computationally demanding and increase the difficulty of identification. Basically, there is a tradeoff between increasing the accuracy of identification and reducing the risk of misspecifying the choice set formation. I propose several strategies to control the number of different choice set types. Those strategies can be implemented flexibly depending on the specific industry in focus, which will be discussed later in this paper.

The mixture approach is applied to the IRI marketing data set for demand estimation.
The IRI marketing data set contains store-level and household-level weekly panel data on products (available in supermarkets) in 30 categories over 10 years. Specifically, I examine the approach on the markets for milk, potato chips and hotdogs. Compared with the simple logit and BLP estimation methods, which assume a universal choice set for every consumer, the mixture approach yields significantly different preference parameters (on price) and a different distribution of price elasticities across products.

9 I would like to thank IRI for making the data available. All estimates and analysis in this paper based on data provided by IRI are by the author and not by IRI.

The rest of the paper proceeds as follows: in section 2.2 I describe the full setup of the model. Section 2.3 discusses the identification and estimation method. Section 2.4 reports the results of Monte Carlo simulations. An introduction to the IRI marketing data set and the results of three empirical applications are reported in section 2.5. Section 2.6 concludes.

2.2 The Model

The mixture model extends the basic discrete choice model by allowing for heterogeneous choice sets. This idea was proposed by earlier researchers (e.g. Manski (1977)) and is reflected here in equation (2.6). The contribution of this paper is its specification of the choice set formation process and the corresponding estimation strategy. This section first introduces the primitives of the discrete choice model and then specifies the choice set formation process. A discussion of the difference between an existing representative choice set formation model and mine follows at the end.

Assume that a market lasts for T periods and consists of a set of differentiated products . In the market there are I consumers, each of whom chooses one product from in each period.
The indirect utility to consumer i from choosing product j (> 0) at time t is

The utility from choosing the outside option is

where is a vector consisting of product attributes, consumer demographics and their interactions. is the random coefficient and can be written as

where is the mean part and is the random part for consumer i, , is a standard normal error. and are vectors of parameters (see footnote 10). is the unobserved (to the researcher) product heterogeneity. is an i.i.d. stochastic term following the Type-I extreme value distribution across i, j (including 0) and t.

Each consumer i chooses the product that gives him/her the maximal utility from his/her choice set . The choice indicator of whether consumer i chooses product j at time t conditional on his/her choice set is

The basic discrete choice model assumes an identical universal choice set for all consumers, which is for . Actually is heterogeneous and can be any subset of . One way to proceed is to specify a choice set formation process which generates a probability distribution over the differentiated choice sets:

where is the power set of .

10 See Berry, Levinsohn, and Pakes (2004) for details about the random coefficients logit model.

Define , the conditional probability of consumer i choosing product j

where the integral is over the distributions of and , and the sum is over all the alternative choice sets. The existing models of choice set formation can be viewed as different specifications imposed on (2.5). In the following subsection I propose my specification.

In this model I assume the choice set formation process is based on certain product attributes that consumers care about while they are considering what to purchase. Hereafter I call them choice set determinant attributes. The formation process is illustrated first by a simple example, and then the general formation process is proposed.

2.2.2.1 A Simple Example

To keep the example simple, assume there is only one choice set determinant attribute.
The individual-level panel data show that in 2011, 5.41% of milk consumers purchased fat-free milk only, while 34.86% of milk consumers purchased only non-fat-free milk. These shopping patterns suggest that different consumers may have different choice sets: some consumers consider fat-free milk only, while others consider only non-fat-free milk when making their purchasing decisions. We can then use fat content as the attribute that determines the types of choice sets and assume it is the only choice set determinant attribute. This attribute has two possible values: fat-free or non-fat-free (which includes reduced fat and whole milk). It therefore generates three types of choice sets:

Type 1: consists of fat-free products;
Type 2: consists of non-fat-free products;
Type 3: consists of all products of any fat content (either fat-free or non-fat-free).

The outside option naturally lies in each type of choice set. A consumer has a choice set of one of the above three types, each with a corresponding nonnegative probability. The three probabilities sum to 1. The type of choice set is also the type of the consumers who have that choice set.

Formally, assume there are D choice set determinant attributes. Denote the set that contains all choice set determinant attributes as } . Each element of represents one choice set determinant attribute. In this example D = 1 and . is the indicator of whether the milk product is fat-free and it has two possible values, denoted as . The superscript m refers to the m-th possible value of the attribute . Then the different types of choice sets can be represented by an indicator vector . The m-th (m = 1, 2) element is related to the m-th possible value of . means a consumer of this type excludes products with from his/her choice set, while means a consumer of this type includes products with in his/her choice set.
For instance, if (the choice set of) a consumer is of type : means the consumer excludes products with from his/her choice set, that is, no non-fat-free products are in his/her choice set; means the consumer includes products with in his/her choice set, that is, all fat-free products are in his/her choice set. In conclusion, this consumer has a choice set consisting of only fat-free products and belongs to Type 1 as discussed above.

The vector has possible non-zero values, each of which represents a type of choice set:

Type 1: , consists of fat-free products;
Type 2: , consists of non-fat-free products;
Type 3: , consists of all products with any fat content.

The case represents an empty choice set and is not considered, since a consumer intending to buy nothing is excluded from the market. Notice that this is very different from the outside option, which naturally lies in every type of choice set.

Let denote the choice set corresponding to , and let denote the set consisting of all possible choice sets generated by . Then in this example:

Define the probability distribution of the choice sets as:

This completes the choice set formation process.

2.2.2.2 General case

Following the notation in the simple example above, assume there are D choice set determinant attributes. Denote the set that contains all choice set determinant attributes as } . Each element of represents one choice set determinant attribute. Notice that usually overlaps with , the variables entering the utility function. Further assume the number of possible values for each attribute is finite. 11 There are possible values for attribute :

Then I use a vector of vectors, , to represent the type of differentiated choice sets. , the d-th element of , is an indicator vector consisting of zeros and ones corresponding to the choice set determinant attribute . Specifically, ) .
For , means a consumer of this type excludes products with from his/her choice set, while means a consumer of this type includes products with in his/her choice set. Notice that if and , this reduces to the simple example discussed in the previous subsection.

Let denote the choice set corresponding to , and let denote the set consisting of all possible choice sets generated by . For each attribute , the corresponding can have different non-zero values. So, the combination of D attributes will generate types of choice set without other restrictions. This is also , the number of elements of .

Define the probability distribution of the choice sets as:

where the elements of are nonnegative and sum to 1.

11 The choice set determinant attributes can also have continuous values, which is discussed in section 2.3.1.1.

There are some other choice set formation models which are preferred by researchers in certain circumstances. Swait and Ben-Akiva (1987) proposed the framework of random constraint probabilistic choice set models and described several examples. Here I illustrate a representative model which is utilized by Goeree (2008) in the demand estimation of the personal computer market. For simplicity I suppress the time subscript. The model assigns any subset of (equivalently, all elements of ) the probability:

where is the probability that product l is considered by consumer i; thus is included with probability 1. Goeree (2008) interpreted the term as the information technology, which describes the effectiveness of advertising at informing consumers about products. It is given by:

where is a vector consisting of product attributes and consumer demographics, is the unobserved consumer heterogeneity, is a pre-assumed functional form and is the corresponding parameter vector that needs to be estimated.

Denote my choice set model, which is described in (2.7), as Model 1 and the above model, which is described in (2.8), as Model 2.
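The attribute-based construction of choice set types in Model 1 can be sketched in a few lines. This is only an illustration under stated assumptions: it assumes, as the simple example suggests, that an attribute with M possible values contributes 2^M - 1 non-zero indicator vectors and that types are formed as combinations across attributes. The function names and the three milk products are hypothetical, and the outside option, which belongs to every choice set, is not listed.

```python
from itertools import product

def nonzero_indicators(n_values):
    """All 0/1 vectors of length n_values except the all-zero vector."""
    return [g for g in product((0, 1), repeat=n_values) if any(g)]

def choice_set_types(values_per_attribute):
    """Combine the non-zero indicator vectors of every attribute."""
    per_attr = [nonzero_indicators(m) for m in values_per_attribute]
    return list(product(*per_attr))

def build_choice_set(products, gamma):
    """Products whose value on every attribute is admitted by gamma.

    products: dict mapping a product name to a tuple of value indices,
              one index per choice set determinant attribute.
    """
    return [name for name, vals in products.items()
            if all(g[v] == 1 for g, v in zip(gamma, vals))]

# Milk example: one attribute (fat content), 0 = fat-free, 1 = non-fat-free
milk = {"skim": (0,), "reduced_fat": (1,), "whole": (1,)}
types = choice_set_types([2])
for gamma in types:
    print(gamma, build_choice_set(milk, gamma))

# Two attributes with 2 and 3 possible values: (2**2 - 1) * (2**3 - 1) types
n_types_two_attributes = len(choice_set_types([2, 3]))
print(n_types_two_attributes)
```

The number of types depends only on the attributes and their value counts, not on the number of products, which is the feature that keeps Model 1 small compared with a model that treats every subset of products as a candidate choice set.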
The essential difference between the two models is the procedure by which the choice sets are constructed. Model 1 first determines all types of choice sets according to criteria on selected attributes and then fills the different choice sets with the products in the market. Model 2 first looks at the products in the market; the alternative choice sets are then determined as all subsets of the universal set that contains all the products. Due to the distinct underlying choice set structures, there are two more explicit differences between the two models:

The realization probabilities of the alternative choice sets. In Model 1, the probabilities are assumed to be exogenous and are estimated directly (although with some transformations), while in Model 2, as illustrated in (2.8), the probabilities are determined by a function of variables concerning the product attributes and consumer demographics. If the number of choice set types is large in Model 1, it has more parameters to estimate than Model 2, which causes problems for identification. Some strategies for controlling the number of alternative choice sets are proposed in section 2.3.1.1. In addition, assuming a functional form for the probabilities as in Model 2 carries the risk of misspecification.

The number of alternative choice sets. In Model 1, the number of alternative choice sets is fixed across markets (periods) and can be kept relatively small. In Model 2, assuming there are J differentiated products in the market, the number of alternative choice sets is 2^J and increases at an exponential rate with respect to J. It would be extremely computationally demanding for a large value of J.

2.3 Identification and Estimation

2.3.1.1 The number of alternative choice sets

Let contain all the information on the choice set determinant attributes of all products in the market at period t.
Under the choice set distribution (2.7) and considering that follows the Type-I extreme value distribution, (2.6) can be rewritten as

(2.10)

The foundation of jointly estimating the preference parameters and the choice set distribution is to match the choice probabilities in (2.10) with the observed market shares of the products. The number of moments generated by the equation system (2.10) is the aggregate number of differentiated products in the market over periods (t). This number can be surpassed by the number of choice set types if we specify a large value of , since grows at an exponential rate, in which case the model cannot be identified.

To make identification possible, some strategies need to be adopted before the estimation. One strategy is to divide the possible values of into groups and reduce the dimension of from to . For example, if is the engine size of a vehicle, we can group the values into small, medium and large. Now means a consumer of this type includes products with belonging to the l-th group in his/her choice set. Extending this idea, we can also accommodate continuous attributes by dividing the values of the attribute into several intervals. The other strategy is to control the number of choice set determinant attributes. The selection of choice set determinant attributes should depend on common sense or on evidence from consumer purchasing records.

The example in section 2.2.2.1 is a good application of the above two strategies. Firstly, the individual-level purchasing records show that some consumers only purchase milk products of a certain fat content, suggesting that fat content is a choice set determinant attribute. Then the fat content is divided into two groups: fat-free and non-fat-free. The two groups generate only 3 types of choice set, together with 3 probability parameters to estimate.
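The idea of matching mixture choice probabilities with market shares can be made concrete with a simplified sketch for the milk example. It assumes no random coefficients, so each type's within-set probability has the closed logit form and the predicted share is the type-probability-weighted average; the full model additionally integrates over the random coefficients. The mean utilities, the membership matrix and the type probabilities below are hypothetical values, and the function name is not from the source.

```python
import numpy as np

def mixture_logit_shares(delta, membership, probs):
    """Predicted shares when each consumer type considers only its own choice set.

    delta:      (J,) mean utilities of the inside products (outside good = 0)
    membership: (R, J) 0/1 matrix, membership[r, j] = 1 if product j is in type r's set
    probs:      (R,) type probabilities, nonnegative and summing to 1
    """
    expd = np.exp(delta) * membership                  # zero for excluded products
    within = expd / (1.0 + expd.sum(axis=1, keepdims=True))
    return probs @ within                              # (J,) predicted shares

# Hypothetical milk market: product 0 is fat-free, products 1 and 2 are not
delta = np.array([0.2, 0.5, -0.1])
membership = np.array([[1, 0, 0],    # Type 1: fat-free only
                       [0, 1, 1],    # Type 2: non-fat-free only
                       [1, 1, 1]])   # Type 3: all products
probs = np.array([0.1, 0.4, 0.5])

shares = mixture_logit_shares(delta, membership, probs)
outside_share = 1.0 - shares.sum()
print(shares, outside_share)
```

Setting the probability of Type 3 to one recovers the standard logit shares with a universal choice set, which is a convenient sanity check when implementing the estimator.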
The above two strategies aim at reducing the number of possible values for a choice set determinant attribute ( ) and the number of choice set determinant attributes ( ), thus reducing the number of choice set types. Another strategy is to reduce the number of unknown parameters after all the choice sets are determined. Since we have constructed all possible choice sets, it is possible that some choice sets will never be realized or will be the true choice sets of only a very small part of the whole population. Consider the example of the potato chips market and set the indicator of whether a product is on promotion as the choice set determinant attribute. Intuitively, there would be no consumers who consider only products that are not on promotion. If we have individual-level data on purchasing history, we can examine such intuition before estimating the model and assume that those particular choice sets have zero probability of being realized, which reduces the number of unknown parameters to be estimated.

2.3.1.2 The product heterogeneity

The researcher does not observe the product heterogeneity , but the producers and consumers do. The prices are very likely to be functions of unobserved characteristics, and this causes endogeneity problems. If price is positively correlated with the unobserved quality, the price coefficients (in absolute value) would be understated by estimation methods that ignore the endogeneity.

Instrumental variables naturally come in as a solution. Berry (1994) was the first to implement instrumental variables methods to deal with the endogeneity problem in discrete choice models. BLP (1995) proposed a well-known estimation technique (hereafter referred to as BLP) that combines the contraction mapping inversion and GMM for random coefficients preference models. I had hoped the BLP inversion would work under the setup of my model, but unfortunately this turned out not to be the case. 12

Alternatively, a control function method is implemented in my estimation procedure.
I follow the spirit of Kim and Petrin (2010) 13, constructing control variates to approximate the unobserved heterogeneities. The choice of instrumental variables follows the idea of BLP. Assume any observed product attributes except price are uncorrelated with the unobserved heterogeneity . Denote as a vector containing the heterogeneities of all products in market t. Let be the set of observed characteristics, except price, of product j that affect demand and costs in market t. Then the vector contains all observed elements in market t which are relevant to the determination of the equilibrium price. The price for product j in market t is given by the price function :

12 A discussion of the invalidity of a BLP-inversion-based estimation approach is given in Chapter 3.
13 Kim and Petrin (2010) attempted to address endogenous prices while allowing for non-separability between observed and unobserved factors.

Kim and Petrin (2010) showed that when prices are additively separable in the unobserved factors, the price function can be written as:

The term is the difference between the price and its expected value conditional on observed exogenous factors. Write as the vector of residuals, which play the role of the conditioning variables. The is one-to-one with . Then can be approximated by with a linear function.

Equation (2.11) indicates that every observed characteristic of every product affects every price, implying that any product characteristic can be a valid instrument for any price. Thus, the number of instruments would be too abundant relative to the number of observations. (1994) to derive three optimal instruments with respect to each observed product characteristic: the characteristic itself, the sum of the characteristic over the same firm's products except itself, and the sum of the characteristic over products from other firms. The selection of the control variates follows a similar procedure.
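The three instruments per characteristic and the first-stage price residuals can be sketched as follows. This is a minimal single-market illustration, not the author's implementation: the function names, the firm identifiers and the numbers are hypothetical, and in practice the regression would pool observations across markets and the residuals would then enter a control function.

```python
import numpy as np

def blp_instruments(x, firm):
    """Build three instruments per exogenous characteristic for one market.

    x:    (J, K) exogenous characteristics (price excluded)
    firm: (J,) firm identifier of each product
    Returns a (J, 3K) array: the characteristic itself, its sum over the same
    firm's other products, and its sum over rival firms' products.
    """
    same = (firm[:, None] == firm[None, :]).astype(float)   # (J, J) same-firm mask
    own_firm_others = same @ x - x                          # exclude the product itself
    rivals = (1.0 - same) @ x
    return np.hstack([x, own_firm_others, rivals])

def first_stage_residuals(price, z):
    """OLS of price on a constant and the instruments; returns the residuals."""
    zc = np.column_stack([np.ones(len(price)), z])
    coef, *_ = np.linalg.lstsq(zc, price, rcond=None)
    return price - zc @ coef

# Hypothetical market with four products and two firms
x = np.array([[1.0], [2.0], [3.0], [4.0]])
firm = np.array([0, 0, 1, 1])
price = np.array([1.0, 2.5, 2.0, 4.5])

z = blp_instruments(x, firm)
resid = first_stage_residuals(price, z)
print(z)
print(resid)
```

Within a single market the three instrument columns sum to a constant, so the design matrix is rank deficient; `np.linalg.lstsq` still returns valid least squares residuals in that case, which is why it is used here instead of inverting the normal equations.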
The above steps of constructing instruments and controls are also adopted by Kim and Petrin (2010).

2.3.2.1 Preparation

Consider the set consisting of all types of choice set at period t after the simplification strategies (discussed in Section 2.3.1) have been taken, and let R denote the number of choice set types. Order the elements of this set [...] market. Notice that there is a subscript t for the choice sets. Although the criteria for choice set formation are fixed, the choice set determinant attributes of a product can vary over time, so a certain type of choice set can contain different products over time. Each choice set type has a corresponding probability, and the choice probability (predicted market share) of product j accordingly becomes a probability-weighted combination over the choice set types, given in (2.15).

Before we formally start the estimation, notice that the distribution probabilities of the choice sets are restricted: they are nonnegative and sum to 1. The transformation in (2.16) is therefore needed.

2.3.2.2 The two-step estimation

Estimation for the mixture model proceeds in two steps. In the first step, I estimate the control variates from the pricing equation (2.12). In the second step, I implement non-linear least squares to match the predicted choice probabilities given by (2.15) with the observed market shares, treating the control variates as additional regressors.

Following the discussion in Section 2.3.1.2, take all observed characteristics except price for product j and construct the instruments as in (2.17), where the same-firm sums are taken over the set of products produced by the firm that produces j. Then run OLS of price on the instruments to get the residuals. The control function is specified as in (2.18). The first step ends with the estimates of the control variates, and the heterogeneity is estimated by the control function as in (2.19).

The second step is the non-linear least squares.
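The restriction on the choice set probabilities and the mixture form of the predicted shares can be illustrated with a small sketch. The exact transformation in (2.16) is not reproduced here; the code assumes a softmax-type reparameterization, which is one common way to keep probabilities nonnegative and summing to 1. It also drops the random coefficients and the control function, so each type's shares are plain logit shares. The names `choice_set_probs`, `logit_shares` and `mixture_shares`, and all numbers, are hypothetical.

```python
import math

def choice_set_probs(gamma):
    """Map R-1 unrestricted reals to R probabilities (assumed softmax form).

    The last choice set type is the base category with its index fixed at 0.
    """
    exps = [math.exp(g) for g in gamma] + [1.0]
    total = sum(exps)
    return [e / total for e in exps]

def logit_shares(delta, choice_set):
    """Plain logit shares within one choice set; outside option has utility 0."""
    denom = 1.0 + sum(math.exp(delta[j]) for j in choice_set)
    return [math.exp(delta[j]) / denom if j in choice_set else 0.0
            for j in range(len(delta))]

def mixture_shares(delta, choice_sets, gamma):
    """Predicted share of each product: type-specific shares weighted by
    the probability of each choice set type."""
    probs = choice_set_probs(gamma)
    shares = [0.0] * len(delta)
    for prob, cs in zip(probs, choice_sets):
        for j, s in enumerate(logit_shares(delta, cs)):
            shares[j] += prob * s
    return shares

# Hypothetical example: three products, three choice set types.
delta = [0.5, -0.2, 0.1]                  # mean utilities (made up)
choice_sets = [{0, 1, 2}, {0, 1}, {2}]    # all products, and two restricted sets
shares = mixture_shares(delta, choice_sets, gamma=[1.0, -0.5])
```

In a non-linear least squares step, the unrestricted values in `gamma` would be searched over freely, and the implied probabilities would automatically satisfy the restrictions.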
Define a vector containing all the parameters that need to be estimated. Since (2.15) is an integral, the choice probability can be obtained via simulation as in (2.20), which uses the specification defined in (2.3), a set of random draws from the standard normal distribution, the realizing probability of each choice set as defined in (2.16), and the control function estimate of product heterogeneity from the first step as defined in (2.19). Following the idea of matching the predicted choice probabilities with the observed market shares, the least squares estimator minimizes the sum of squared differences between the market shares observed in the data and the predicted choice probabilities defined jointly by equations (2.20), (2.3), (2.16) and (2.19).

Remark: In certain situations, we want to divide all possible values of a choice set determinant attribute into several groups to reduce the total number of choice set types. [...] the distribution probabilities. This is attainable with the mixture approach. An example will be given in Chapter 3.

2.4 Monte Carlo Simulations

This section introduces the design of the Monte Carlo experiments and presents the simulation results.

The data set is generated in several steps. First, construct the market of differentiated goods. Then define the consumers with preferences and choice sets. Finally, each consumer chooses the one product which gives the maximal utility level. The choices of consumers are then aggregated into industry-level sales data, on which the mixture estimation can be implemented.

Assume the market includes J differentiated products and lasts for T periods. The product space consists of the J inside products and the outside option, which is indexed by zero. Each product has an unobserved heterogeneity term. There are two product attributes, defined for each product j = 1, ..., J and period t = 1, ..., T. The product heterogeneity participates in the determination of one attribute (the endogenous attribute), which also depends on the mean value of that attribute for product j across periods and on a time shock for product j at time t. The other attribute is constant for product j across periods.
I make the time-constant attribute the choice set determinant attribute; think of, for instance, fat content for the milk market. Here the attribute is a continuous variable, but there is no problem if it is a discrete variable, e.g. an indicator variable. Also, in practice it is fine for the attribute to vary over time.

Assume there are I consumers, who can be divided into three types in terms of choice sets. A percentage of consumers make their choices from products with all possible values of the attribute (all products in the market). Another percentage of consumers only consider products with attribute values on one side of a cutoff point. The remaining consumers only consider products with attribute values on the other side. Here I set the cutoff point to be 0.5, the mean of the attribute, so that [...] have the equal [...].

The utility to consumer i from choosing an inside product j at time t and the utility from the outside option are specified with a random part of a coefficient for consumer i and an i.i.d. stochastic term following the Type I extreme value distribution across consumers, products (including the outside option) and times. Each consumer chooses the option that yields the highest utility. Notice that one parameter affects the probability of choosing the outside option: a higher value of it means higher utilities for inside goods and yields a lower market share for the outside option. Given each consumer's choice set, a choice indicator records whether consumer i chooses product j at time t, and the market share of product j at time t is the average of this indicator over consumers. In this simulation study, I set [...].

Here I take the cutoff point of the choice set determinant variable, which is 0.5 in the setup, as already known. So the structures of the three types of choice sets are determined prior to the estimation. The estimation follows the two-step method discussed in Section 2.3.2. Since the data generation process is known, the selection of instrumental variables and control variates can be simplified. I use a very strong instrument for the endogenous attribute: the part of that attribute excluding the heterogeneity.
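The data generation steps described above can be sketched for a single period as follows. This is a simplified illustration, not the exact design: it assumes the mean utility of each inside product is given directly (no random coefficient), normalizes the outside option's mean utility to 0, and assumes that the two restricted types consider products with attribute values at or below the cutoff and above the cutoff, respectively. The names `gumbel` and `simulate_shares` and all numbers are hypothetical.

```python
import math
import random

def gumbel(rng):
    """One Type I extreme value draw via the inverse CDF."""
    u = rng.random()
    while u == 0.0:          # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def simulate_shares(x, delta, type_probs, cutoff=0.5, n_consumers=5000, seed=0):
    """Simulate market shares for one period.

    x          : choice set determinant attribute of each inside product
    delta      : mean utility of each inside product (assumed given)
    type_probs : probabilities of the three choice set types
    Returns shares of the J inside products followed by the outside share.
    """
    rng = random.Random(seed)
    J = len(x)
    choice_sets = [
        list(range(J)),                           # type 1: all products
        [j for j in range(J) if x[j] <= cutoff],  # type 2 (assumed direction)
        [j for j in range(J) if x[j] > cutoff],   # type 3 (assumed direction)
    ]
    counts = [0] * (J + 1)                        # index J is the outside option
    for _ in range(n_consumers):
        cs = rng.choices(choice_sets, weights=type_probs)[0]
        best, best_u = J, gumbel(rng)             # outside option utility
        for j in cs:
            u = delta[j] + gumbel(rng)
            if u > best_u:
                best, best_u = j, u
        counts[best] += 1
    return [c / n_consumers for c in counts]

x = [0.1, 0.4, 0.6, 0.9]           # made-up attribute values
delta = [0.3, 0.0, 0.2, -0.1]      # made-up mean utilities
sim = simulate_shares(x, delta, type_probs=[0.5, 0.3, 0.2])
```

Repeating this over T periods and stacking the resulting shares would give the aggregate data on which the two-step estimation is run.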
In the first step, run OLS of the endogenous attribute on the instrument and take the residual as the control variate. Then, in the second step, run non-linear least squares while treating the residual from the first step as an additional regressor.

The first set of simulations examines the convergence of the mixture estimator with various numbers of markets (periods), T. The number of markets directly affects the sample size: one additional market provides J (the number of products) additional observations. In the simulations the structure of the choice sets is assumed to be known, which is to say, we know the choice set determinant variable and that the cutoff point is 0.5. The number of products (J) is fixed at 20. The true parameter values are [...]. The results are presented in Table 2.1.

The estimators given by the mixture estimation are close to the true parameter values even with a relatively small sample size. Looking at the first case, T = 5: there are only 100 (= 20 × 5) observations, while the biases of all mixture estimators except one are less than or around 0.01. In general, the biases become smaller as T increases. There are a few counterexamples in which the bias goes up as the sample size increases, but the deviations are so tiny that the estimators can be used to calculate market shares and elasticities without concern.

The standard deviations behave as expected: a larger sample size gives a smaller standard deviation. The convergence rate of the mixture estimation is satisfactory: the standard deviation of each mixture estimator except one decreases by around 50% as the number of market periods increases from 5 to 25. The estimation result for that exceptional parameter is not as desirable as for the other parameters. Although its estimator has a very small bias on average, its standard deviation is quite large and goes down slowly as the sample size increases. When T = 50, the standard deviation is still as large as 0.2477, while the true value of the parameter is 0.5.
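As a reading aid for the bias and standard deviation figures, the sketch below shows how such summaries are typically computed across Monte Carlo replications, and what a root-n benchmark implies when the sample grows from 100 to 500 observations. It assumes that the reported standard deviation is the sample standard deviation of the estimates across replications; the function names and the example numbers are hypothetical.

```python
import math
import statistics

def summarize(estimates, true_value):
    """Bias and standard deviation of an estimator across replications."""
    bias = statistics.mean(estimates) - true_value
    sd = statistics.stdev(estimates)   # sample standard deviation (assumed)
    return bias, sd

def root_n_sd_ratio(n_small, n_large):
    """Ratio of standard deviations implied by a root-n convergence rate."""
    return math.sqrt(n_small / n_large)

# Made-up estimates of a parameter whose true value is 0.5.
bias, sd = summarize([0.4, 0.5, 0.6], true_value=0.5)

# J = 20 products: T = 5 gives 100 observations, T = 25 gives 500.
ratio = root_n_sd_ratio(20 * 5, 20 * 25)
```

Under this benchmark the standard deviation at T = 25 would be about 45% of its value at T = 5, so a drop of around 50% is in line with a root-n rate.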
On the other hand, the simple logit estimation behaves much worse, since it ignores the choice set heterogeneity. The adjusted R-squared is also listed in the table. Note that it can still give some sense of how well the estimators can be used to predict the market shares. We can see that the mixture estimation has a much higher and more stable adjusted R-squared in all experiments, which supports its superiority over the simple logit.

In the next step, I examine the performance of the estimators by varying the size of the choice set, which is controlled by the number of products (J) in the market. The other parameters are held fixed (the J = 20 column of Table 2.2 coincides with the T = 25 column of Table 2.1). The results are presented in Table 2.2.

As the number of products increases, the sample size (number of observations) increases, as does the size of the choice set. Meanwhile, as shown in Table 2.2, the probability of the outside option decreases. The reason is that with more inside products, the outside option is less likely to be chosen. The results show that as the number of products goes up, the average biases and standard deviations of all preference estimators decrease, while the choice set probabilistic distribution estimators perform slightly worse. The reason might be that there are two forces interacting with each other as the number of products increases. On one hand, the increasing sample size leads to more precise estimators. On the other hand, the structures of choice sets become more complicated, thus increasing the difficulty of accurate estimation. The choice set probabilistic distribution estimators are more affected by the latter force. Again, the mixture estimation is clearly superior to the simple logit for either choice of the number of products.

2.5 Empirical Application

This section describes how the mixture approach is applied to estimate demand. I first introduce the data source, the IRI marketing data set, and then illustrate in detail the empirical applications to the markets of milk, potato chips and hotdogs.
The IRI marketing data set¹⁴ contains store level and household panel data of products in 17 food and beverage categories and 13 non-food categories over the years 2001-2012. While the data set covers 49 markets all over the U.S., I focus on two cities, Eau Claire, WI and Pittsfield, MA, for which the individual level data is available. The household panel data is constructed from surveys taken regularly by the households who sign contracts with the IRI company in these two cities. It contains the households' purchase history, specifically the timing, store and products bought on every shopping trip to about 80 percent of all the local stores. (The other 20 percent of stores were not covered by the IRI marketing data set.) The store level data was collected weekly on quantity sold, price and promotion information of each product. In addition, for each category, there is a description file recording the key attributes for all products. Table 2.3 and Table 2.4 list some general information for a subset of 9 food categories which are included in the IRI data set.

¹⁴ See Bronnenberg, Kruger, and Mela (2008) for an overview of the IRI data set.

Each product has a unique UPC number. By store/week/UPC combination, the above three data sources can be merged into one data set that includes quantity sold, price, promotion information and key attributes for all products in each store and each week. This is the aggregate-level data set on which the mixture estimation is implemented. More details will be provided in the following subsections.

The data set also has information on demographic characteristics for each household in the panel. Table 2.5 summarizes these characteristics and provides a comparison with the U.S. Census data. We can see that the sample population is more urban, less diverse, slightly more educated and wealthier than the U.S. national average.
A significant difference between the sample and the national average is the percentage of women with children (31.4% versus 12.4% in Table 2.5). Considering the fact that the corresponding IRI panelists generally do the shopping for their whole households, we can still say that the sample of panelists is a good representation of the population of market customers.

The strategy of the empirical application is as follows. The first step is to analyze the patterns of consumers' purchase behavior using the household panel data. Then, based on the purchase patterns found in the first step, determine the product attributes that enter the choice set formation process. For instance, if we find that a significant percentage of consumers only purchase fat free milk, it would be reasonable to assume that the fat content is an attribute that determines the choice set types. The next step is to apply the mixture approach to the store level data to jointly estimate the choice set distribution and preference parameters.

The markets on which the empirical application focuses are milk, potato chips and hotdogs. To accommodate the requirement of the discrete choice model that each consumer makes only one purchase decision in a given period, assume that a consumer visits a store once per week and purchases at most one unit of product in each of the three markets (milk, potato chips, hotdogs). The evidence from the household panel data supporting these assumptions is reported in the subsections.

2.5.3.1 Market of Milk

Consider the market of milk in year 2011. I first analyze consumer purchasing behavior using the individual level data. Table 2.6 presents some summary statistics. For the year 2011, the individual level data set has 92,871 panelists who have 129,145 single shopping records. On average, each consumer visited a store 1.39 times per week and purchased 1.52 units of milk per shopping trip. This shopping pattern suggests that the assumptions of the discrete choice model are acceptable.
To analyze the consumers' purchasing behavior related to product attributes, I use the records for which the individual level data and store level data can be perfectly matched, and only consider consumers who had shopping records for more than 5 weeks in the year. This rules out those consumers who were unlikely to be in the market for milk, and there are 1,219 consumers left. The average number of different products ever purchased by each consumer was 7.4. Looking at the fat content attribute, 5.41% of consumers only purchased fat free milk, and for 15.34% of consumers fat free milk made up more than 80% of their total milk purchasing records. On the contrary, 34.86% of consumers never purchased fat free milk, and for 52.17% of consumers non-fat-free milk made up more than 80% of their total milk purchasing records. The last row of Table 2.6 considers the attribute of whether the product is on promotion, which means there is a price reduction flag for the product (when the temporary price reduction is 5% or greater).

The above evidence makes it reasonable to assume that there is a group of consumers who only consider fat free milk (or non-fat-free milk) while making purchasing decisions. I then assume the distribution of the types of choice sets in the way discussed earlier. The promotion attribute is not used in the choice set formation for the milk market, since merely 1.39% of consumers only purchased the on-promotion product.

Then I prepare the store level data for the estimation. The sample consists of milk sales records of the stores which are matched with the individual level data in year 2011. Define the outside goods to be the products whose average weekly sales were less than 25 units. Then there are 106 different products left as inside goods; they made up 94% of the total milk sales and had 3,364 product-level weekly sales records. Table 2.7 shows the summary statistics of some products that have the greatest average weekly sales.
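The purchase-pattern statistics used above (the percent of consumers who only bought fat free milk, or for whom it made up more than 80% of their records) can be computed from individual purchase records along the following lines. This is a minimal sketch on made-up records, not the IRI data; the record layout and the function names are hypothetical.

```python
from collections import defaultdict

def attribute_share_by_consumer(records):
    """Share of each consumer's purchase records that have the attribute.

    records: iterable of (consumer_id, has_attribute) pairs, one per purchase.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for consumer, has_attr in records:
        totals[consumer] += 1
        if has_attr:
            hits[consumer] += 1
    return {c: hits[c] / totals[c] for c in totals}

def percent_of_consumers(shares, threshold, strict=True):
    """Percent of consumers whose share is above (strict) or at least
    (non-strict) the threshold."""
    if strict:
        n = sum(1 for s in shares.values() if s > threshold)
    else:
        n = sum(1 for s in shares.values() if s >= threshold)
    return 100.0 * n / len(shares)

# Made-up records: True marks a fat free purchase.
records = (
    [("a", True)] * 5                          # only fat free
    + [("b", True)] * 9 + [("b", False)]       # 90% fat free
    + [("c", False)] * 4                       # never fat free
    + [("d", True), ("d", False)]              # 50% fat free
)
shares = attribute_share_by_consumer(records)
only_fat_free = percent_of_consumers(shares, 1.0, strict=False)
mostly_fat_free = percent_of_consumers(shares, 0.8)
never_fat_free = 100.0 * sum(1 for s in shares.values() if s == 0.0) / len(shares)
```

A large gap between the "only" and "more than 80%" figures, as in the milk data, suggests that a strict rule would understate the share of consumers who effectively restrict their choice set.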
In Table 2.7, the advertising variable is an indicator of whether there are advertisements, coupons or rebates for the product. [...]

Let i and t denote the consumer and week, respectively. The utility to consumer i of buying inside product j in week t is assumed to depend on the following variables: the per-gallon price; the package volume in gallons; indicators for in-store display, advertising and price reduction flag; indicators of low-fat milk (fat content between 0.5% and 5%) and whole fat milk; and the unobserved product heterogeneity. The coefficients are random with Normal distribution, and the error term is i.i.d. with Type I extreme value distribution.

The characteristics which are chosen to construct the instrumental variables are [...] and the constant term. The instrumental variables and the control function are formed in the way described in Section 2.3.3.2, as in equations (2.17) and (2.18).

Based on the above setup, the mixture estimation approach is applied to the store level data. The estimation results are reported in Table 2.8, which lists the estimators of the means of the preference coefficients and of the choice set probabilistic distribution, the estimators of the standard deviations of the preference coefficients, the coefficients in the control function, and other statistics of the estimation methods. For comparison, the results from the simple logit and the BLP approach are reported as well. The BLP estimation approach was proposed by BLP (1995); it allows for random coefficients but not for unobserved choice sets.

The mixture approach gives a much greater (more than twice in absolute value) mean estimator for the price coefficient than the simple logit does. Besides, the mixture estimation gives significant [...]. The estimators for the coefficients of the fat content indicators become insignificant when moving from the simple logit to the Mixture estimation. The reason is that the fat content is a choice set determinant attribute in the market of milk. [...] A conjecture is that allowing for random coefficients while not dealing with choice set heterogeneity will worsen the estimation results.
While the correctness of the above statement needs to be further checked, the BLP approach does perform badly in the remaining two empirical studies.

The lower part of Table 2.8 lists the estimated probability distribution of the choice set types. The estimated probabilities of a consumer purchasing only from fat-free and only from non-fat-free milk [...] purchasing data (in Table 2.6): for 15.34% of consumers, fat free milk made up more than 80% of their total milk purchasing records; 34.86% of consumers purchased only non-fat-free milk according to their milk purchasing records. We can see that the estimated probability distribution of the choice sets fits the empirical data well.

Based on the estimators, I calculate the price elasticities. The lower part of Table 2.8 reports the mean and standard error of the estimated own-price elasticities. We can see that the elasticities predicted by the Mixture model are much more elastic on average and have greater variation than those predicted by the simple logit model. This is not surprising, since the estimator of the price coefficient given by the Mixture estimation is more than twice as large as that given by the simple logit. The result is also consistent with the intuition that ignoring the unobserved product heterogeneity (which is positively correlated with price) would underestimate (in absolute value) the price elasticity. The result given by BLP lies in between the results of the simple logit and the Mixture estimation.

The last row of Table 2.8 lists the adjusted R-squared. The Mixture estimation has a much greater adjusted R-squared (.570) than the simple logit (.418) and the BLP (.470), suggesting that the Mixture model does a better job of explaining the data by modeling the choice set heterogeneity.

2.5.3.2 Market of Potato Chips

In this application I consider the market of potato chips for the years 2010-2011.
For this market a product is defined as a brand-size combination, which means that potato chips of different flavors are treated as the same product as long as they are of the same brand and package size. This simplification of excluding the flavor attribute should not cause much trouble, since the prices and marketing activities (promotion, display, advertising) of UPCs under the same brand-size combination usually move together. The UPC-level variables are aggregated to the product level: the price is the sales-weighted average of the prices of the UPCs within the product, and display (advertising / promotion) is an indicator variable that equals 1 if any UPC within the product is on display (advertised / on promotion).

Table 2.9 presents some summary statistics of the individual level purchasing records. In the years 2010-2011, there were 62,501 panelists who made 86,722 shopping trips in total. On average, each panelist made 1.38 shopping trips per week and purchased 1.35 packs of potato chips per trip. This shopping pattern supports the assumptions of the discrete choice model that each consumer makes one shopping decision in one period and purchases at most one unit of product.

As in the milk market, I keep the shopping records of consumers who purchased potato chips for more than 5 weeks in the two years to analyze their purchasing behavior related to product attributes. There were 737 consumers remaining in the sample. For these consumers who purchased potato chips frequently, the average number of different potato chips products purchased by each of them was 8.5. From the lower part of Table 2.10 [...] products [...] 90% of their total potato chips purchasing records. If we lower the threshold from 90% to 80%, the percentage of consumers rises to 60.24%.

The corresponding statistics for the other two attributes [...] Table 2.9 [...] choice set determinant attribute. The other two attributes discussed above may also be the [...] their corresponding [...] (18.32%). As discussed in the previous section, there is a tradeoff between identification and the risk of misspecification.
The next step is to prepare the store level sales data for estimation. The sample consists of the sales records of potato chips from the stores that match the individual level data in the years 2010-2011. Define the products with average weekly sales of less than 10 units to be the outside goods. Then the inside goods consisted of 59 different products and made up 97.6% of the total potato chips sales. There were 3,503 product-level weekly sales records for estimation. Table 2.10 lists the summary statistics of some products with the greatest average weekly sales.

We can see that among the top 7 best sellers there were only two brands [...] marketing activities in the market. For all the products listed in Table 2.10, the probabilities of being on display and on promotion were more than 0.4. For some products the probability of being on display (on promotion) was even greater than 0.9 (0.8). The probabilities of being advertised were not as high as those of the other two marketing activities, but still much higher than in the market of milk.

The utility to consumer i of buying inside product j in week t is assumed to depend on the following variables: the price per 12 oz; the package volume in ounces; indicators for in-store display, advertising and price reduction flag; and the unobserved product heterogeneity. The coefficients are random with Normal distribution, and the error term is i.i.d. with Type I extreme value distribution.

The characteristics which are chosen to construct the instrumental variables are [...] and the constant term. I construct the instrumental variables and the control function in the way described in Section 2.3.3.2, as in equations (2.17) and (2.18).

Table 2.11 reports the estimation results. The estimators given by the BLP approach look problematic: it gives a significantly positive estimator for the mean coefficient of Price (.542).
At the same time, the BLP estimator of the coefficient of Promotion has an extraordinarily large mean (2.557) and variation (2.843). The adjusted R-squared of the BLP estimation is merely .067. The above evidence supports the conjecture that allowing for random coefficients while not dealing with choice set heterogeneity would worsen the estimation results.

[...] The estimators given by both the mixture and simple logit estimations are significant and of expected signs. The mixture estimation gives an insignificant estimator for the coefficient [...] on the probability distribution of the choice sets. The insignificance with respect to the preference parameter can be interpreted as follows: for those consumers who considered all potato [...] significantly increase their utility level of purchasing the product. Another point worth noticing is that the mixture estimation gives a much greater estimator for the coefficient [...] 0.662. This indicates that the simple logit would largely underestimate the effect of [...] to other brands.

The estimated probability distribution of the choice sets is listed in the lower part of Table 2.12. According to the estimation, 71.0% of consumers considered all potato chip products, and another 27.7% of consumers only considered the potato chip products which were on promotion. The last type of choice set, containing only products which were not on promotion, had an insignificant estimated realizing probability, which is consistent with intuition. The lower part of Table 2.13 reports the estimated price elasticities and adjusted R-squared. On average the mixture estimation gives a more elastic result: the mean own-price elasticity given by the mixture estimation is -1.320, while for the simple logit it is -1.158. Comparing the adjusted R-squared, the mixture estimation fits the data better.

2.5.3.3 Market of Hotdogs

This subsection analyzes the market of hotdogs in year 2011.
As in the previous two markets, I first analyze consumer purchasing behavior using the individual level data. Part of the results are listed in Table 2.12. For the year 2011, there were 26,287 panelists who had 28,841 single shopping records. On average each consumer visited a store 1.1 times per week and purchased 1.6 units of hotdog products per shopping trip. Such a shopping pattern generally accords with the assumptions of the discrete choice model.

To analyze consumers' shopping behavior related to product attributes, I only keep the shopping records of panelists who purchased hotdogs in no fewer than 3 weeks in the year, since the other panelists were not likely to be in the market for hotdogs. For the remaining 766 panelists, the average number of different hotdog products purchased by each of them was 4.04. From the lower part of Table 2.12 we can see that, similar to the market of potato chips, [...] for the market of hotdogs. 19.58% of consumers only purchased hotdog products which were on promotion [...] more than [...] vacuum-packaged. On average, vacuum-packaged products made up 25% of [...] determinant attribute, since only 2.48% (3.13%) of consumers only purchased the hotdog products which were on display (vacuum-packaged).

In the next step I prepare the store level data for the estimation. The sample consists of sales records from the store-level data which were matched with the individual-level data in year 2011. There were 96 different products ever in the market in year 2011. Ordering the products according to average weekly sales from the greatest to the smallest, the products ranked below 65th are defined as the outside goods. The remaining inside goods made up 96.1% of the total sales, and there were 2,854 product-level weekly sales records. Table 2.13 lists the summary statistics of some products with the top sales.

The best seller in the market also had the highest frequency of being on display, on promotion and advertised.
Among the top 8 best sellers there were 2 products that were vacuum-packaged. In the lower part of Table 2.13, comparing the average values of different attributes among all products, the top-20-sales products and the top-10-sales products, we can find a positive [...].

The utility to consumer i of buying inside product j in week t is assumed to depend on the following variables: the price per 16 oz; the package volume in ounces; indicators for in-store display, advertising and price reduction flag; an indicator of whether the product is vacuum-packaged; and the unobserved product heterogeneity. The coefficients are random with Normal distribution, and the error term is i.i.d. with Type I extreme value distribution.

The characteristics which are chosen to construct the instrumental variables are [...] and the constant term. I construct the instrumental variables and the control function in the way described in Section 2.3.3.2, as in equations (2.17) and (2.18).

The estimation results are listed in Table 2.14. As before, the 3 columns report, respectively, the simple logit estimation, the BLP estimation approach and the Mixture estimation approach. Recall that in the previous empirical application to the potato chips market, the BLP estimation was highly questionable, as it gave a significantly positive estimator of the mean part of the price coefficient. In the current case, the mean price coefficient estimator given by the BLP estimation is -.085, much smaller (in absolute value) than those given by the simple logit (-.259) and the Mixture estimation (-.497). Intuitively, the BLP estimation result seems unreliable again. The Mixture estimation gives the largest (in absolute value) estimator of the mean part of the price coefficient, suggesting that the other estimations would underestimate the price elasticity for the hotdogs market, which is confirmed in the lower part of Table 2.14.
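The link between the size of the price coefficient and the implied elasticity can be made concrete with the textbook own-price elasticity of the simple logit model, which equals the price coefficient times price times one minus the product's market share. The sketch below applies this formula to the two reported mean price coefficients purely as an illustration: the price and share values are made up, and the Mixture model's actual elasticities require integrating over random coefficients and choice set types, which this sketch does not do.

```python
def logit_own_price_elasticity(alpha, price, share):
    """Own-price elasticity in a simple logit model: alpha * p * (1 - s)."""
    return alpha * price * (1.0 - share)

# Reported mean price coefficients for the hotdog market.
alpha_logit = -0.259
alpha_mixture = -0.497

# Hypothetical price (per 16 oz) and market share of one product.
price, share = 3.0, 0.05

e_logit = logit_own_price_elasticity(alpha_logit, price, share)
e_mixture = logit_own_price_elasticity(alpha_mixture, price, share)
```

For the same price and share, the elasticity scales one for one with the price coefficient, which is why a coefficient that is biased toward zero translates into understated elasticities.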
Similar to the market of potato chips, the Mixture estimation gives an insignificant estimator [...] the key factor in the choice set formation process. The significance of the [...]; the insignificance with respect to the preference parameter can be interpreted as follows: for those consumers who considered all hotdog products in the market, no matter whether they were on promotion [...] the product. The estimators of both the mean coefficients [...] are negative in both the simple logit and Mixture estimations, suggesting that on average consumers prefer small-size and non-vacuum-packaged products.

The middle part of Table 2.14 reports the estimated probability distribution of the choice sets for the mixture estimation. According to the results, 72.2% of consumers considered all hotdog products, and another 26.5% of consumers only considered the hotdog products that were on promotion. The last type of choice set, which contains only products that were not on promotion, had a tiny estimated realizing probability of 1.3%, which is consistent with intuition. The estimated probabilities roughly accord with the shopping pattern indicated by the individual level data. Looking back at Table 14: 19.58% of consumers [...] products made up more than 80% of their total hotdogs purchasing records.

The adjusted R-squared [...] Table 2.14. The mixture estimation has a higher adjusted R-squared, suggesting that the mixture model fits the data better.

2.6 Concluding Remarks

In this paper I propose a two-step mixture approach to estimate discrete choice models with unobserved choice sets. The approach implements a control function method to deal with product heterogeneity and a choice set formation model to deal with the unobserved choice sets. The strategy of the empirical study is to first assume the choice set formation and then jointly estimate the preference parameters and the probability distribution of choice sets. This estimation approach is designed to be applied to aggregate sales data.
When individual level data on consumer purchasing history is available, the purchasing patterns of consumers in the sample can provide guidance on the choice set formation. The validity of the mixture model can be examined by comparing the estimated choice set distribution with the actual purchasing patterns of consumers. The effectiveness of the Mixture approach is demonstrated via Monte Carlo experiments. It is then applied to IRI marketing data in three markets (milk, potato chips and hotdogs) and is shown to be useful in correcting the biases caused by assuming a universal choice set.

Currently I have implemented the mixture estimation on aggregate data, but the idea can be extended to individual level data as well, which is an ongoing project of my research. One of the empirical difficulties is that the individual level data and the aggregate sales data are usually not well matched, since they are collected separately.

In addition, the traditional BLP estimation approach performs questionably in the empirical application part of this paper, which calls for further investigation.
APPENDIX

APPENDIX FOR CHAPTER 2

Table 2.1: Monte Carlo Results I: Varying Number of Markets

               T=5                 T=25                T=50
         mixture     logit   mixture     logit   mixture     logit
Bias       .0616   -1.2251     .0158   -1.3063     .0160   -1.2651
Sd.D       .3514     .3597     .3013     .3348     .2477     .3503

Bias      -.0084     .4973    -.0104     .5090    -.0099     .5025
Sd.D       .1107     .0880     .0858     .0828     .0638     .0810

Bias      -.0034   -1.0167    -.0029    -.9628     .0011    -.9963
Sd.D       .0947     .6330     .0407     .5370     .0281     .5554

Bias      -.0001              -.0010               .0017
Sd.D       .0424               .0188               .0125

Bias      -.0154              -.0119              -.0063
Sd.D       .0691               .0355               .0330

Bias       .0029               .0039               .0023
Sd.D       .0246               .0128               .0126

Bias       .0125               .0080               .0040
Sd.D       .0493               .0268               .0240

Ave        .9947     .4428     .9949     .4670     .9949     .4831
Sd.D       .0019     .1617     .0010     .1273     .0007     .1151

Table 2.2: Monte Carlo Results II: Varying Size of Choice Set

               J=20                J=50
         mixture     logit   mixture     logit
Bias       .0158   -1.3063     .0302   -1.2707
Sd.D       .3013     .3348     .1880     .2130

Bias      -.0104     .5090    -.0055     .5071
Sd.D       .0858     .0828     .0460     .0539

Bias      -.0029    -.9628     .0011   -1.0495
Sd.D       .0407     .5370     .0365     .3751

Bias      -.0010               .0009
Sd.D       .0188               .0133

Bias      -.0119              -.0155
Sd.D       .0355               .0481

Bias       .0039               .0058
Sd.D       .0128               .0150

Bias       .0080               .0098
Sd.D       .0268               .0361

Ave        .9949     .4670     .9900     .4353
Sd.D       .0010     .1273     .0014     .0740

Ave.  .1256  .0521

Table 2.3 [15]: Product Statistics

Category                Unique Parent   Unique    Unique   Unique
                        Companies       Vendors   Brands   UPCs
Carbonated Beverages         232           249       538    10331
Cold Cereal                  135           139       671     4402
Hot Dogs                     207           233       367     1865
Margarine                     53            60       113      533
Mayonnaise                   124           129       166      744
Milk & Milk Products         280           357       636     8830
Peanut Butter                 69            72        97      513
Salty Snacks                 949           989      1762    18655
Soup                         188           201       253     4899
Yogurt                       126           146       331     4081

[15] The statistics of Table 2.3 and 2.4 were summarized by Rider (2013).

Table 2.4: Product Features

Category                                      Number of UPCs
Carbonated Beverages
    Total UPCs                                     10331
    Lower Sugar/Lower Calorie                         27
    Diet/Calorie Free                               2719
Cold Cereal
    Total UPCs                                      4402
    No Sugar/Low Sugar                               392
    Fiber/Whole Grain Claim                         3532
Hot Dogs
    Total UPCs                                      1865
    Lower or Reduced Fat/Fat Free                    171
Margarine
    Total UPCs                                       533
    Low-Cal/Low-Fat/Healthy Oil                      206
Mayonnaise
    Total UPCs                                       744
    Lower Sugar                                       13
    Low-Fat/Fat-Free                                 276
Milk & Milk Products
    Total UPCs                                      8830
    Low-fat/Skim Milk                               5254
    Whole Milk                                      1200
Peanut Butter
    Total UPCs                                       513
    Lower Sugar                                      114
    Reduced Sodium/Sodium Free                        83
Salty Snacks
    Total UPCs                                     18655
    Reduced-Fat/Fat-Free/Light                      3370
    Lower/Reduced Sodium                             478
Yogurt
    Total UPCs                                      4081
    Low-Fat/Fat-Free                                3549
    Reduced/Low-Calorie                              518

Table 2.5: Market Demographics

Variable                                   IRI DATA (%)   US Census (%)
General Characteristics
    Percent Urban                              85.2           80.3
    Percent Over 65 Y.O.                       11.9           12.4
    Percent Female                             51.2           50.9
    Percent Women with Children                31.4           12.4
Race & Ethnicity
    Percent White                              77.3           75.1
    Percent Black                              12.7           12.3
    Percent Foreign Born                        8.5           11.1
    Percent Foreign Born: Latin America         3.6            5.7
Education
    Percent of Males with HS Diploma           26.7           27.6
    Percent of Males with Bachelor's           18             16.1
    Percent of Males with Master's              6.4            6
    Percent of Females with HS Diploma         28.9           29.6
    Percent of Females with Bachelor's         16.5           15
    Percent of Females with Master's            6.1            5.8
Income
    Median household income                  $44,994        $41,994
    %Households: Less than $10,000              8.5            9.5
    %Households: $10,000 - $20,000             11.6           12.6
    %Households: $20,000 - $35,000             18.8           19.4
    %Households: $35,000 - $50,000             16.5           16.6
    %Households: $50,000 - $75,000             20.6           19.4
    %Households: $75,000 or more               24.1           22.5

Table 2.6: Individual-level Purchasing Behavior - Market of Milk

                                    Obs            Mean          Percent (%) by quantity
                                                                 1       2       3       >=4
Number of weekly trips              92,871 [16]    1.39 (.72)    70.80   22.27   4.96    1.97
Quantity purchased per trip         129,145 [17]   1.52 (.85)    61.94   29.08   5.71    3.27
Different products per consumer     1,219 [18]     7.4 (3.50)

                                    Obs            % of consumers, by % of total shopping records
                                                   100     >95         >90     >80
Fat-free                            1,219          5.41    6.32 [19]   9.84    15.34
Non-fat-free                        1,219          34.86   39.62       46.60   52.17
On promotion                        1,219          1.39    1.97        4.27    10.25

[16] This is the number of panelists in the individual level data set.
[17] This is the number of total shopping trips in the individual level data set.
[18] This is the number of consumers for whom the individual level data set and the store level data set can be matched.
[19] For 6.32% of consumers, fat-free milk products made up more than 95% of their total purchasing records in the market of milk. Other elements in the same part of the table are interpreted in a similar way.

Table 2.7: Summary Statistics for Selected Products - Market of Milk

Brand          Fat       Vol        Ave. sales   Ave. Price         Ave.      Ave.        Ave.
               content   (gallon)   per week     ($ per 1 gallon)   Display   Promotion   Feature
PRV            1%        1          1,002        3.16               0.00      0.15        0.06
KEMPS          fatfree   1            992        2.71               0.00      0.45        0.09
KEMPS          1%        1            956        2.42               0.00      0.25        0.13
PRV            2%        1            949        3.35               0.00      0.17        0.06
PRV            fatfree   1            937        2.97               0.00      0.15        0.02
KEMPS SELECT   whole     0.5          887        4.20               0.35      0.73        0.04
KEMPS SELECT   fatfree   1            797        3.45               0.00      0.04        0.00
KEMPS          2%        1            632        2.82               0.00      0.33        0.08
Market Average of
    ALL                                          5.52               0.02      0.13        0.02
    Top 20                                       3.53               0.05      0.18        0.03
    Top 10                                       3.39               0.05      0.23        0.04

Table 2.8: Results: Demand Estimation - Milk

Variable                 Logit             BLP               Mixture
Means
    Price                -.349 (.012)      -.628 (.031)      -.831 (.158)
    Vol                   .040 (.079)      -.493 (.182)       .701 (.428)
    Display               .527 (.119)     -3.934 (1.707)      .325 (.251)
    Advertising           .341 (.129)      -.160 (1.536)      .054 (.372)
    Promotion             .261 (.055)       .189 (.077)       .064 (.193)
    LowFat               -.365 (.0447)     -.204 (.053)       .022 (.294)
    Whole                -.446 (.059)      -.205 (.069)       .217 (.287)
    _Cons                -.138 (.111)      2.670 (.272)      1.483 (.870)
Choice Sets Distribution
    Pr(whole set)                                             .549 (.095)
    Pr(fatfree)                                               .157 (.042)
    Pr(-fatfree)                                              .294 (.092)
Std. Deviations
    Price                                   .000 (.644)       .135 (.162)
    Vol                                    2.909 (.355)       .640 (.350)
    Display                                4.898 (1.460)     1.294 (.419)
    Advertising                             .937 (3.331)      .479 (.546)
    Promotion                               .000 (6.734)      .931 (.427)
Control Function         -.173 (.135)      -.036 (.085)      -.076 (.082)
Price Elast. (%)
    Mean                -1.935            -3.447            -4.319
    Sd. D.                .693             1.263             1.413
Adjusted R-squared        .418              .470              .570

Table 2.9: Individual-level Purchasing Behavior - Market of Potato Chips [20]

                                    Obs       Mean          Percent (%) by quantity
                                                            1       2       3       >=4
Number of weekly trips              62,501    1.38 (.70)    70.02   23.67   4.55    1.76
Quantity purchased per trip         86,722    1.35 (.69)    71.65   24.38   1.90    2.06
Different products per consumer     737       8.5 (3.93)

                                    Obs       % of consumers, by % of total shopping records
                                              100     >95     >90     >80
On Display                          737       2.85    2.99    6.38    16.96
On Promotion                        737       18.32   21.85   36.77   60.24
LAYS                                737       4.88    5.56    8.28    16.01

[20] Refer to footnotes 16-19 (for Table 2.6) for the interpretations of some elements in the table.

Table 2.10: Summary Statistics for Selected Products - Market of Potato Chips

Product                       Ave. Sales   Ave. Price   Ave.      Ave.        Ave.
                              per week     per 10oz     Display   Promotion   Advertising
LAYS - 10oz                   467.64       3.37         0.7       0.82        0.45
LAYS NATURAL - 8.5oz          328.44       3.48         0.91      0.79        0.28
LAYS NATURAL - 10.5oz         286.87       3.09         0.66      0.64        0.43
OLD DUTCH - 10oz              239.27       3.09         0.41      0.82        0.36
OLD DUTCH - 8.5oz             209.21       3.4          0.39      0.8         0.12
LAYS KETTLE COOKED - 8.5oz    177.91       3.48         0.91      0.47        0.22
WAVY LAYS - 10.5oz            151.99       3.08         0.41      0.75        0.15
Market Average of
    ALL                                    3.31         0.44      0.64        0.12
    Top 20                                 2.95         0.66      0.75        0.18
    Top 10                                 3.14         0.69      0.85        0.24

Table 2.11: Results: Demand Estimation - Potato Chips

Variable                 Logit             BLP               Mixture
Means
    Price                -.347 (.023)       .542 (.098)      -.397 (.255)
    Vol                  -.067 (.007)       .140 (.051)      -.030 (.068)
    Display               .662 (.042)      1.462 (.380)      1.207 (.111)
    Advertising           .570 (.059)      1.152 (.202)       .654 (.163)
    Promotion             .612 (.044)      2.557 (.585)       .000 (.003)
    Lays                  .639 (.043)      1.041 (.370)       .870 (.201)
    _Cons                 .333 (.152)     -4.939 (.629)      -.021 (.325)
Choice Sets Distribution
    Pr(whole set)                                             .710 (.084)
    Pr(promotion)                                             .277 (.086)
    Pr(-promotion)                                            .013 (.003)
Std. Deviations
    Price                                   .381 (.358)       .001 (.133)
    Vol                                     .067 (.117)       .000 (.011)
    Display                                1.207 (.780)       .024 (.158)
    Advertising                             .237 (5.329)      .033 (.235)
    Promotion                              2.843 (.655)       .003 (.027)
    Lays                                    .256 (7.282)      .000 (.110)
Control Function         -.040 (.398)       .080 (.149)       .201 (.148)
Price Elast. (%)
    Mean                -1.158             1.929            -1.320
    Sd. D.                .443             1.117              .506
Adjusted R-squared        .266              .067              .400

Table 2.12: Individual-level Purchasing Behavior - Market of Hotdogs [21]

                                    Obs       Mean          Percent (%) by quantity
                                                            1       2       3       >=4
Number of weekly trips              26,287    1.10 (.34)    91.34   7.79    .72     .15
Quantity purchased per trip         28,841    1.60 (.98)    57.63   32.86   4.35    5.17
Different products per consumer     766       4.04 (2.21)

                                    Obs       % of consumers, by % of total shopping records
                                              100     >95     >90     >80
On Display                          766       2.48    2.48    2.48    4.96
On Promotion                        766       19.58   19.71   22.06   35.77
Vacuum                              766       .65     .65     .78     1.17

[21] Refer to footnotes 16-19 (for Table 2.6) for the interpretations of some elements in the table.

Table 2.13: Summary Statistics for Selected Products - Market of Hotdogs

Brand           Volume   Product   Ave. Sales   Ave. Price     Vacuum   Ave.      Ave.        Ave.
                (OZ)     type      per week     ($ per 16oz)            Display   Promotion   Advertising
OSCAR MAYER     16       WIENER    244.21       2.59           0        0.42      0.81        0.15
SCHWEIGERT      12       FRANK     155.87       1.57           0        0.12      0.33        0.04
OSCAR MAYER     16       WIENER    129.85       2.75           0        0.33      0.69        0.15
OSCAR MAYER     16       WIENER    116.46       2.76           1        0.21      0.69        0.13
OSCAR MAYER     16       FRANK     109.88       2.53           0        0.38      0.5         0
BAR S           12       FRANK      92.11       1.4            0        0.09      0.36        0.02
JOHN MORRELL    12       FRANK      73.11       1.66           0        0.11      0.4         0.09
OLD WISCONSIN   24       WIENER     71.96       4.18           1        0.13      0.31        0
Market Average of
    ALL                                         3.65           0.36     0.08      0.31        0.04
    Top 20                                      2.66           0.19     0.15      0.46        0.09
    Top 10                                      2.46           0.25     0.2       0.53        0.09

Table 2.14: Results: Demand Estimation - Hotdogs

Variable                 Logit             BLP               Mixture
Means
    Price                -.259 (.013)      -.085 (.073)      -.497 (.172)
    Vol                  -.025 (.002)      -.017 (.008)      -.025 (.017)
    Display              1.040 (.074)      1.132 (.084)      1.074 (.210)
    Advertising           .474 (.095)       .608 (.127)       .212 (.205)
    Promotion             .489 (.042)       .622 (.074)       .067 (.151)
    Vacuum               -.236 (.036)      -.325 (.557)      -.707 (.155)
    _Cons                -.261 (.074)     -1.062 (.348)       .245 (.261)
Choice Sets Distribution
    Pr(whole set)                                             .722 (.022)
    Pr(promotion)                                             .265 (.022)
    Pr(-promotion)                                            .013 (.002)
Std. Deviations
    Price                                   .000 (1.962)      .331 (.187)
    Vol                                     .001 (.281)       .001 (.010)
    Display                                 .000 (49.250)     .066 (.171)
    Advertising                             .000 (.183)       .508 (.209)
    Promotion                               .650 (.579)       .107 (.100)
    Vacuum                                  .322 (3.318)      .910 (.218)
Control Function          .023 (.144)       .149 (.101)       .119 (.088)
Price Elast. (%)
    Mean                 -.932             -.312            -1.144
    Sd. D.                .391              .122              .265
Adjusted R-squared        .373              .310              .378

BIBLIOGRAPHY

Ackerberg, D., L. Benkard, S. Berry, and A. Pakes (2007). Econometric Tools for Analyzing Market Outcomes. In J. J. Heckman and E. Leamer, eds., Handbook of Econometrics, North-Holland, chapter 63, pp. 4171-4276.

Allenby, P. M., and P. E. Rossi (1998). Marketing models of consumer heterogeneity. Journal of Econometrics, 89(1), 57-78.

Anderson, S., A. Depalma, and J. F. Thisse (1989).
Demand for Differentiated Products, Discrete Choice Models, and the Characteristics Approach. Review of Economic Studies, 56, 21-35.

Bajari, P., and C. L. Benkard (2005). Demand Estimation with Heterogeneous Consumers and Unobserved Product Characteristics: A Hedonic Approach. Journal of Political Economy, 113, 1239-1276.

Ben-Akiva, M., and B. Boccara (1995). Discrete choice models with latent choice sets. International Journal of Research in Marketing, 12, 9-24.

Berry, S. (1994). Estimating Discrete Choice Models of Product Differentiation. Rand Journal of Economics, 25, 242-262.

Berry, S., J. Levinsohn, and A. Pakes (1995). Automobile Prices in Market Equilibrium. Econometrica, 63, 841-890.

Berry, S., J. Levinsohn, and A. Pakes (1999). Voluntary Export Restraints on Automobiles: Evaluating a Trade Policy. American Economic Review, 89, 400-430.

Berry, S., J. Levinsohn, and A. Pakes (2004). Differentiated Products Demand Systems from a Combination of Micro and Macro Data: The New Car Market. Journal of Political Economy, 112, 68-105.

Bronnenberg, Bart J., M. W. Kruger, and C. F. Mela (2008). Database paper: The IRI marketing data set. Marketing Science, 27(4), 745-748.

Chamberlain, G. (1987). Asymptotic Efficiency in Estimation with Conditional Moment Restrictions. Journal of Econometrics, 34, 305-344.

Crawford, G. S., R. Griffith, and A. Iaria (2016). Demand Estimation with Unobserved Choice Set Heterogeneity. Working paper.

Eliaz, K., and R. Spiegler (2011). Consideration sets and competitive marketing. The Review of Economic Studies, 78(1), 235-262.

Fox, J. T. (2007). Semiparametric estimation of multinomial discrete choice models using a subset of choices. The RAND Journal of Economics, 38(4), 1002-1019.

Goeree, M. S. (2008). Limited Information and Advertising in the U.S. Personal Computer Industry. Econometrica, 76, 1017-1074.

Gourieroux, C., A. Monfort, E. Renault, and A. Trognon (1987). Generalized Residuals. Journal of Econometrics, 34, 5-32.

Hendel, I. (1999). Estimating Multiple-Discrete Choice Models: An Application to Computerization Returns. Review of Economic Studies, 66, 423-446.

Hortacsu, A., and C. Syverson (2004). Product Differentiation, Search Costs, and Competition in the Mutual Fund Industry: A Case Study of S&P 500 Index Funds. The Quarterly Journal of Economics, 119(2), pp. 403-456.

Kim, H., and K. I. Kim (2017). Estimating store choice with endogenous shopping bundles and price uncertainty. International Journal of Industrial Organization, 54, 1-36.

Kim, K. I., and A. Petrin (2010). Control Function Corrections for Unobserved Factors in Differentiated Product Models.

Kruger, M. W., and D. Pagni (2008). IRI Academic Data Set Description, version 2.3. Chicago: Information Resources Incorporated.

Lu, Z. (2018). Estimating Multinomial Choice Models with Unobserved Choice Sets. Working paper.

Manski, C. F. (1977). The structure of random utility models. Theory & Decision, 8(3), 229-254.

Manzini, P., and M. Mariotti (2014). Stochastic Choice and Consideration Sets. Econometrica, 82(3), 1153-1176.

Masatlioglu, Y., D. Nakajima, and E. Y. Ozbay (2012). Revealed attention. The American Economic Review, 102(5), 2183-2205.

McFadden, D. (1978). Modeling the choice of residential location. Transportation Research Record, (673).

Pakes, A., and D. Pollard (1989). Simulation and the Asymptotics of Optimization Estimators. Econometrica, 57, 1027-1057.

Paola, M., and M. Marco (2013). Stochastic Choice and Consideration Sets.

Petrin, A. (2002). Quantifying the Benefits of New Products: The Case of the Minivan. Journal of Political Economy, 110, 705-729.

Pires, T. (2012). Consideration Sets in Storable Goods Markets.
Working paper, Northwestern University.

Rider, J. K. (2013). Essays in Applied Economics. UC Berkeley Electronic Theses and Dissertations.

Stern, S. (1997). Simulation-Based Estimation. Journal of Economic Literature, 35, 2006-2039.

Stern, S. (2000). Simulation Based Inference in Econometrics: Motivation and Methods. In Simulation-Based Inference in Econometrics: Methods and Applications, ed. by R. Mariano, M. J. Weeks, and T. Scheuermann. Cambridge, U.K.: Cambridge University Press, 9-37.

Swait, J., and M. Ben-Akiva (1987). Incorporating random constraints in discrete models of choice set generation. Transportation Research Part B: Methodological, 21(2), 91-102.

Train, K. E. (2009). Discrete choice methods with simulation. Cambridge University Press.

Wooldridge, J. (2002). Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.

CHAPTER 3

3.1 Introduction

The last chapter emphasized the importance of considering choice set heterogeneity when constructing and estimating discrete choice models. I proposed a mixture model to handle this issue. The mixture model extends the basic discrete choice model by allowing for heterogeneous choice sets instead of an identical universal choice set for all consumers. In the model, consumers are assumed to have different choice sets, which can be viewed as consumer types. Each type of consumer has distinct criteria on the product attributes ac- [...] different types of potential choice sets while choosing a certain discrete alternative. The type of consumer, which is equivalently the type of choice set, follows a multinomial distribution that is unknown to econometricians and needs to be estimated. Under the special setup of my mixture model, the traditional estimation method based on the BLP inversion and GMM fails to work.
Instead, I proposed a two-step mixture approach which implements a control function method to deal with product heterogeneity and a choice set formation model to deal with the unobserved choice sets. The mixture approach was applied to the IRI marketing data in three markets and was shown to be useful in correcting the biases caused by assuming a universal choice set.

It is necessary to conduct Monte Carlo simulation experiments to examine the consistency of the estimators given by the two- [...] two sets of simulation experiments which could be treated as the benchmark cases (respec- [...] consists of more simulation experiments, the goal of which is to examine the performance of the two-step mixture approach and make comparisons with several alternative estimation methods under various scenarios.

In the benchmark case, the value of the choice set determinant variable for each product is assumed to be constant over time (e.g. the fat content of a milk product). In order to examine the applicability of the mixture approach, here I also examine the situation in which the value of the choice set determinant variable can vary over time (e.g. whether a potato chips product is on promotion) by adjusting the data generating process. On the other hand, since there is always an endogeneity problem in reality and consistently estimating the price coefficient requires appropriate instrument variables, I investigate the consistency of the estimators under a series of situations in which the instrument variables change from [...] -step mixture approach are compared with simple logit and BLP estimators. The results show that the two-step mixture estimators tend to converge to the true parameters and perform much better than the other two estimators.

The rest of this paper proceeds as follows: Section 3.2 reviews the data generation [...] .3 proposes the corresponding two-step mixture estimation method. Section 3.4 reports the results of the Monte Carlo simulation experiments. Section 3.5 concludes.

3.2 Data Generation Process

The data generating process follows the idea of Chapter 2, Section 4: first, construct the market of differentiated goods; then define the consumers with heterogeneous preferences and choice sets; finally, each consumer chooses the product which yields the maximal utility level from his/her choice set. The choices of consumers are then aggregated into industry-level sales data, on which the two-step mixture estimation approach can be implemented. The details of the data generating process were introduced in Chapter 2; here I briefly review the key setups.

Assume the market includes J differentiated products and lasts for T periods. The utility to consumer i from choosing an inside product j at time t is The utility from the outside option is where and are two product attributes, is the product heterogeneity and is its time shock. I make is generated as: The first part, , is the base price for product j. , the unobserved product heterogeneity enters the price function so the observed price is correlated with the unobserved error term. The second part, , is a time shock. is assumed to be the choice set determinant attribute. In the benchmark case, is constant over time for product j: for each period t and for each product j.

I specify a random coefficient for the price variable, with the mean part and the random part, ; is an i.i.d. stochastic term following the type-I extreme value distribution across consumers, products (including the outside option) and times.

Assume there are I consumers which can be divided into three types in terms of choice sets. A percentage of consumers make their choices from products with all possible values of (all products in the market). Another percentage of consumers only consider products with . The rest percentage of ) consumers only consider products with .
Corresponding choice sets are denoted as . Here I set the cutoff point to be 0.5, the mean of so have the equal size (in terms of the number of products inside the choice set). The outside option is always included in all types of choice sets.

Denote consumer choice set as , , define the choice indicator of whether consumer i chooses product j at time t as Then the market share of product j at time t is In the benchmark case, I set .

3.3 Estimation Strategies

Define According to the derivation in Chapter 2, the probability of a consumer choosing product j at time t, which is also the predicted market share of product j at time t, is (3.3) where is the CDF of . Noticing that the distribution probabilities of the choice sets are nonnegative and sum up to 1, we can make the following transformation: (3.4) where Let . An essential part of the estimation process is to match the predicted market share with the observed market share .

Since there are random coefficients and endogeneity in the utility function, one would naturally ask whether the BLP estimation method can be adjusted to fit this setup. If there were no heterogeneous choice sets, there would be no summation in equation (3.3), the model would simplify to the regular discrete choice model, and the BLP estimation could proceed as follows:
1) For a candidate value of , use the contraction mapping algorithm to determine the as defined in equation (3.2) for all products over all markets;
2) Use GMM to estimate equation (3.2) and obtain the GMM residuals;
3) Minimize the GMM residual over by iterating over the above steps.
In the mixture model there are two more parameters which determine the choice set distribution.
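As an illustration of step 1) of the regular BLP procedure, the share inversion can be sketched as follows. This is a minimal sketch under stated assumptions, not the code used in this chapter: it assumes a single market, a universal choice set, one random coefficient on price with standard deviation `sigma`, and an outside option with utility normalized to zero. The function and argument names are hypothetical.

```python
import numpy as np

def predicted_shares(delta, price, sigma, draws):
    """Simulated random-coefficient logit shares with a universal choice set.

    delta : (J,) mean utilities; price : (J,) prices;
    sigma : std. dev. of the random price coefficient (assumed form);
    draws : (R,) standard normal draws. Outside option utility is 0.
    """
    v = delta[:, None] + sigma * price[:, None] * draws[None, :]  # (J, R)
    expv = np.exp(v)
    probs = expv / (1.0 + expv.sum(axis=0, keepdims=True))
    return probs.mean(axis=1)

def invert_shares(shares, price, sigma, draws, tol=1e-12, max_iter=10_000):
    """Iterate delta <- delta + log(s_obs) - log(s_pred(delta)) to a fixed point."""
    delta = np.log(shares) - np.log(1.0 - shares.sum())  # plain-logit starting value
    for _ in range(max_iter):
        step = np.log(shares) - np.log(predicted_shares(delta, price, sigma, draws))
        delta = delta + step
        if np.max(np.abs(step)) < tol:
            break
    return delta
```

Given shares generated from known mean utilities, `invert_shares` should return those mean utilities up to the tolerance, which is what makes the subsequent GMM step on the mean utilities possible in the standard setup.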
An estimating approach analogous to the BLP estimation naturally follows:
1) For a candidate value of , use the contraction mapping algorithm to determine the as defined in equation (3.2) for all products over all markets;
2) Use GMM to estimate equation (3.2) and obtain the GMM residuals;
3) Minimize the GMM residual over by iterating over the above steps.

However, in the simulation study the above BLP-type estimation turned out to be invalid. Although the GMM step can recover the correct preference parameters when are at the true parameter values, the GMM residual is not minimized there. A theoretical proof regarding the failure of the BLP-type estimation is left as a topic for further research.

As proposed in Chapter 2, a valid control function approach for the mixture model proceeds in two steps. In the first step, I estimate the control variates for the endogenous variable . In the second step, I implement non-linear least squares to match the predicted choice probabilities given by equation (3.3) with the observed market shares given by equation (3.1), treating the estimated control variates as an additional regressor.

Since the data generation process is known, I can choose an appropriate instrument variable for and of the instrument variable. In the benchmark case I set the instrument variable to be: This is a very strong instrument variable since it is the part of excluding the heterogeneity . Then the first step is to run OLS of on the endogenous variable and get the residuals . The product heterogeneity is then approximated by in the second step.

The second step is the non-linear least squares. Define a vector to contain all the parameters that need to be estimated: Since (2.15) is an integral, the choice probability can be obtained via simulation: (3.5) where is a set of random draws from standard normal. Here I set the sample size ; is the realizing probability of choice set as defined in (3.4).
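The simulated choice probability of a mixture over choice sets can be sketched as below. This is a sketch under assumptions, not a reproduction of equations (3.4) and (3.5): the choice set probabilities are parameterized with a softmax over unconstrained parameters `gamma` (one common way to keep them nonnegative and summing to 1; the transformation used in the text may differ), and utility takes a hypothetical form with a mean utility `delta` plus a normally distributed price coefficient. All names are illustrative.

```python
import numpy as np

def choice_set_probs(gamma):
    """Map K-1 unconstrained parameters to K probabilities (softmax, first type as reference)."""
    z = np.concatenate(([0.0], np.asarray(gamma, dtype=float)))
    e = np.exp(z - z.max())
    return e / e.sum()

def mixture_shares(delta, price, sigma, masks, gamma, draws):
    """Simulated market shares when consumers differ in their choice sets.

    masks : list of K boolean arrays of shape (J,); masks[k][j] is True when
            product j belongs to choice set k. The outside option (utility 0)
            is in every choice set.
    """
    pi = choice_set_probs(gamma)
    v = delta[:, None] + sigma * price[:, None] * draws[None, :]  # (J, R)
    expv = np.exp(v)
    shares = np.zeros_like(delta, dtype=float)
    for pi_k, mask in zip(pi, masks):
        ev = expv * mask[:, None]               # products outside the set get weight 0
        shares += pi_k * (ev / (1.0 + ev.sum(axis=0))).mean(axis=1)
    return shares
```

When nearly all probability mass sits on the whole choice set and `sigma` is zero, the function collapses to the plain logit share formula, which is a convenient sanity check before plugging it into a least squares objective.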
Following the idea of matching the predicted choice probabilities with the observed market shares, the least squares estimator can be obtained as: where is the market share obtained from equation (3.1), is the predicted choice probability defined jointly by equation (3.4) through (3.6).

In the above procedures I take the cutoff point of the choice set determinant variable , which is 0.5 in the setup, as already known. In the ca [...] of the choice set determinant variable , it needs to be estimated together with other parameters. Denote the cutoff point as , , it determines the structure of choice sets as follows: Then just follow what I discussed above in the first situation for the estimation. The only difference is that is now an element in the parameter vector .

3.4 Simulation Results

In the simulation part of Chapter 2, I examined the performance of the mixture approach under two [...] and obtained satisfactory results. In this chapter more experiments are conducted to explore the properties of the mixture estimators. In addition to the Mixture estimators, the simple logit estimators and the BLP estimators are reported for comparison. The BLP estimators are obtained by the regular BLP estimating approach, assuming all consumers have an identical universal choice set (no choice set heterogeneity).

In the benchmark case, the value of the choice set determinant variable for each product j is assumed to be constant over time. However, as I discussed in Chapter 2, it is also possible that the choice set determinant variable takes different values for a product over time. For instance, in the study of potato chips and hotdogs markets in Chapter 2, [...] , and a product can be either on promotion or not depending on the marketing plan of the supermarket at a given time point. The first simulation experiment examines the case in which the value of the choice set determinant variable can vary over time.
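To make the role of the cutoff and of a time-varying determinant concrete, the sketch below builds the three choice sets from an attribute array and a cutoff. It is illustrative only: the attribute is drawn from a uniform distribution on [0, 1] as an assumption (the text only states that 0.5 is its mean), and which side of the cutoff receives equality is also an assumption. The function name is hypothetical.

```python
import numpy as np

def choice_set_masks(x, cutoff):
    """Return boolean masks (whole set, x < cutoff, x >= cutoff) with the shape of x.

    x may have shape (J,) for an attribute that is constant over time,
    or (J, T) for an attribute that changes from period to period.
    """
    x = np.asarray(x, dtype=float)
    return np.ones(x.shape, dtype=bool), x < cutoff, x >= cutoff

rng = np.random.default_rng(0)
J, T = 5, 3
x_const = np.repeat(rng.uniform(size=(J, 1)), T, axis=1)  # same value in every period
x_vary = rng.uniform(size=(J, T))                         # redrawn in every period

whole_c, low_c, high_c = choice_set_masks(x_const, 0.5)
whole_v, low_v, high_v = choice_set_masks(x_vary, 0.5)
```

With `x_const` every product stays in the same restricted set in all periods, while with `x_vary` a product can move between the two restricted sets over time. Note also that the masks change only when the cutoff crosses an observed attribute value, so any objective built on them is a step function of the cutoff.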
In the benchmark case (referred to as DGP I), for each period t and for each product j. [...] and i.i.d. across each period t and each product j. All the other settings including parameters and variable distri- [...] , the coefficient of the choice set determinant variable, the bias and standard deviation increase but are still at a relatively low level, which keeps the Mixture estimator acceptable. These results suggest that the Mixture approach can be applied to situations in which the value of the choice set determinant variable varies over time. For the choice set distribution pa- [...] two estimators, Logit and BLP, perform much worse than the Mixture estimator. Their estimators are largely biased with relatively large standard deviations.

The next experiment focuses on the instrument variables. The ideal situation for the mixture estimation is that the endogenous variable (the price variable) is a function of the instrument variables (observed exogenous variables) and the unobserved product heterogeneity. Then the residuals obtained in the first step of the mixture estimation can appropriately approximate the heterogeneity. In practice it is not easy to find such perfect instrument variables. The selected instrument variables might have components which are uncorrelated with the endogenous variable, which makes the control function an invalid approximation of the unobserved heterogeneity and thus makes the final estimators inaccurate. In this experiment I examine the performance of the mixture estimators when the correlation between the endogenous variable and the instrument variable varies.

In the benchmark case the instrument variable is . This is an ideal instrument variable since it is just the part of excluding the heterogeneity . Now define a new instrument variable as , where , i.i.d. across j and t, and is independent of and . The parameter determines the correlation between and . When , it is the benchmark case.
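The effect of a noisier instrument on the first-step residuals can be sketched numerically. The functional forms here are assumptions made for illustration: price is taken to be an exogenous base component plus the heterogeneity, and the weakened instrument is taken to be the base component plus `lam` times an independent standard normal noise term, so that `lam = 0` corresponds to the ideal instrument. The names `base`, `xi`, `noise` and `lam` are hypothetical.

```python
import numpy as np

def first_stage_residual(price, z):
    """OLS of price on a constant and z; returns the residuals (the control function)."""
    X = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(X, price, rcond=None)
    return price - X @ coef

def control_function_quality(lam, n=5000, seed=0):
    """Correlation between the first-stage residual and the true heterogeneity."""
    rng = np.random.default_rng(seed)
    base = rng.standard_normal(n)    # exogenous part of price (assumed)
    xi = rng.standard_normal(n)      # unobserved product heterogeneity
    noise = rng.standard_normal(n)   # irrelevant component in the instrument
    price = base + xi
    z = base + lam * noise
    resid = first_stage_residual(price, z)
    return np.corrcoef(resid, xi)[0, 1]
```

Calling `control_function_quality` for increasing values of `lam` shows the residual tracking the heterogeneity almost perfectly at `lam = 0` and progressively less well as the irrelevant component grows, which is the mechanism behind the loss of precision discussed in this experiment.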
Denote the case when as DGP III, and as DGP IV. We can see in Table 3.2 that the results are consistent with intuition. Although the biases remain at an acceptably low level, the standard deviations grow rapidly as the instrument variable varies [...] , the standard deviation of [...]

Finally, I examine the performance of the estimators under the condition that the choice set structures are unknown. For simplicity, assume there is no endogenous product heterogeneity and no random coefficients in the model. As discussed at the end of Section 3.3, assume the cutoff point of the choice determinant attribute ( ) is unknown to econometricians and needs to be estimated together with the other parameters. In this set of experiments the true cutoff point is set to be 0.5, 0.6, 0.7 and 0.8. According to the setup, a larger leads to a larger choice set and a smaller choice set . When =1, is the universal choice set and only contains the outside option.

Table 3.3 [...] estimators from assuming the cutoff point is known as 0.5. The next column labeled [...] is unknown and needs to be estimated. We can see that as the true value of deviates from 0.5 (the mean of ), [...] given by [...] le logit as gets closer to 1. [...] when the deviates from 0.5. Despite [...] , the model gives nearly unbiased estimators with acceptable standard deviations. The reason is that the model assumes is unknown and estimates it together with the other parameters. However, looking at the first panel with , which is the situation tha [...] is when we need to estimate , the objective function for the optimization is discontinuous w.r.t. , which makes it difficult to obtain precise estimators.

In practice, the above situation in which the cutoff point is unknown would not be likely to happen. First, the choice set determinant attributes usually have discrete rather than continuous values. In the majority of cases, the choice set determinant attributes are dummy variables (e.g. whether a milk product is fat-free, whether a product is on promotion/display [...] -valued attribute to form the choice set, the cutoff point is usually common sense and can be taken as given prior to the estimation. Otherwise, consumers have different opinions on the cutoff points, [...] acteristics and the structure of her/his choice set.

3.5 Concluding Remarks

This chapter continues the discussion of the two-step mixture approach to estimating discrete choice models with unobserved choice sets proposed in Chapter 2. In this chapter I review the data generation process (DGP) of my mixture model, discuss the failure of another estimation method which depends on the BLP-type inversion under my DGP setup, and then conduct three sets of Monte Carlo simulation experiments to examine the validity of the two-step mixture approach and demonstrate its superiority over other traditional estimation methods under various scenarios.

APPENDIX

APPENDIX FOR CHAPTER 3

Table 3.1: Monte Carlo Results III: Varying Values of Choice Set Determinant Variable

                    DGP I                         DGP II
         Mixture     Logit       BLP   Mixture     Logit       BLP
Bias       .0158   -1.3063               .0172   -1.4675
Sd.D       .3013     .3348               .3614    0.4005

Bias      -.0104     .5090               .0328    0.5579
Sd.D       .0858     .0828               .0935    0.0953

Bias      -.0029    -.9628              -.0732   -1.6737
Sd.D       .0407     .5370               .1081    0.7687

Bias      -.0010                        -.0013
Sd.D       .0188                         .0205

Bias      -.0119                         .0074
Sd.D       .0355                         .0541

Bias       .0039                        -.0094
Sd.D       .0128                         .0352

Bias       .0080                         .0020
Sd.D       .0268                         .0251

Table 3.2: Monte Carlo Results IV: Weak Instrument Variable

Columns: DGP I, DGP III, DGP IV. Rows: Bias and Sd.D for each parameter.

Table 3.3

              Cutoff = .5             Cutoff = .6             Cutoff = .8
         Mixture (1) Mixture (2) Mixture (1) Mixture (2) Mixture (1) Mixture (2)
Bias       -.0001      -.1648       .0882      -.1978      -.4709      -.4387
Sd.D        .0245       .1372       .1342       .1532       .4547       .2837

Bias        .0002       .0043       .0239       .0015       .0631       .0111
Sd.D        .0052       .0339       .0175       .0283       .0215       .0424

Bias       -.0010      -.1813      -.4113      -.1080       .0617       .0283
Sd.D        .0334       .4214       .3046       .3419       .5308       .2666

Bias       -.0017      -.0230       .1217      -.0145       .3262      -.0083
Sd.D        .0266       .0493       .1242       .0488       .1403       .0922

Bias        .0002       .0153      -.1191       .0169      -.2296      -.0037
Sd.D        .0106       .0236       .0753       .0267       .0529       .0531

Bias        .0015       .0077      -.0026      -.0024      -.0966       .0120
Sd.D        .0171       .0294       .0547       .0264       .0945       .0474

Bias                    .0360                   .0311                  -.0555
Sd.D                    .0566                   .0526                   .1131

BIBLIOGRAPHY

Berry, S. (1994). Estimating Discrete Choice Models of Product Differentiation. Rand Journal of Economics, 25, 242-262.

Berry, S., J. Levinsohn, and A. Pakes (1995). Automobile Prices in Market Equilibrium. Econometrica, 63, 841-890.

Berry, S., J. Levinsohn, and A. Pakes (1999). Voluntary Export Restraints on Automobiles: Evaluating a Trade Policy. American Economic Review, 89, 400-430.

Berry, S., J. Levinsohn, and A. Pakes (2004). Differentiated Products Demand Systems from a Combination of Micro and Macro Data: The New Car Market. Journal of Political Economy, 112, 68-105.

Kim, K. I., and A. Petrin (2010). Control Function Corrections for Unobserved Factors in Differentiated Product Models.

Train, K. E. (2009). Discrete choice methods with simulation. Cambridge University Press.