Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
 

This dissertation consists of three chapters concerning both empirical studies and estimation methodologies of discrete choice models in the area of demand estimation. The first chapter is a purely empirical study that estimates Chinese outbound tourism demand under a discrete choice model framework. The second chapter considers a mixture discrete choice model in which consumers have unobservable and heterogeneous choice sets and proposes a corresponding two-step mixture estimation approach. The third chapter contains a set of simulation studies regarding the two-step mixture approach proposed in the second chapter.
 
More specifically, the first chapter implements a discrete choice approach to estimate the determinants of Chinese outbound tourism demand after 2004, since when Chinese citizens have been able to travel to most major overseas destinations without political restrictions. Starting from travelers' utility specifications, this chapter implements basic linear regressions to estimate Chinese tourists' sensitivity to the cost of travel and other characteristics of the destinations. The price and income elasticities are estimated as well. This chapter also proposes a strategy to quantify the welfare gains of Chinese tourists from the opening of Taiwan (to mainland China) as a new destination.
 
The second chapter proposes a two-step mixture approach to estimate discrete choice

sets are viewed as different consumer types. Each type of consumer has distinct criteria on the attributes of products according to which its choice set is formed. After assuming the choice set formation process, the choice set distribution and preference parameters can be jointly estimated by a two-step mixture approach. A key insight is that the approach can be applied to store-level data. While having individual-level data is not a must, it can provide guidance on the formation of choice sets. The effectiveness of the proposed mixture approach is demonstrated via a set of Monte Carlo simulations and three empirical applications to the markets for milk, potato chips and hotdogs using the IRI marketing data.
 
The third chapter is a follow-up to Chapter 2 and is based on more simulation studies. In this chapter I review the data generation process (DGP) of my mixture model, discuss the failure of another estimation method, which depends on the BLP-type inversion, under my DGP setup, and then conduct Monte Carlo simulation experiments to examine the validity of the two-step mixture approach and demonstrate its superiority over other traditional estimation methods under various scenarios.
 
 
 

I would like to express special thanks to my family, especially my wife, for all their love and trust. They gave me everything they had to help me achieve my goal. I would not have been able to finish this dissertation without their support.
 
 
 

Table 1.1: New ADS Recipients by Year
Table 1.2: Number of Chinese Tourists' Arrivals in Destinations
Table 1.3: Descriptive Statistics of China
Table 1.4: Descriptive Statistics of Destinations
Table 1.5: Geographical Variables of Destinations
Table 1.6: Estimates Results of Model 1
Table 1.7: Estimates Results of Model 2
Table 1.8: Estimates Results of Model 3
Table 1.9: Estimated Elasticities (Year 2014)
Table 2.1: Monte Carlo Results I: Varying Number of Markets
Table 2.2: Monte Carlo Results II: Varying Size of Choice Set
Table 2.3: Product Statistics
Table 2.4: Product Features
Table 2.5: Market Demographics
Table 2.6: Individual-level Purchasing Behavior---Market of Milk
Table 2.7: Summary Statistics for Selected Products---Market of Milk
Table 2.8: Results: Demand Estimation---Milk
Table 2.9: Individual-level Purchasing Behavior---Market of Potato Chips
Table 2.10: Summary Statistics for Selected Products---Market of Potato Chips
Table 2.11: Results: Demand Estimation---Potato Chips
Table 2.12: Individual-level Purchasing Behavior---Market of Hotdogs
Table 2.13: Summary Statistics for Selected Products---Market of Hotdogs
Table 2.14: Results: Demand Estimation---Hotdogs
Table 3.1: Monte Carlo Results III: Varying Values of Choice Set Determinant Variable
Table 3.2: Monte Carlo Results IV: Weak Instrument Variable
Table 3.3: Monte Carlo Results V: Misspecified Cutoff Point
 
CHAPTER 1

ESTIMATING CHINESE OUTBOUND TOURISM DEMAND: A DISCRETE CHOICE APPROACH

1.1 Introduction
 
 
Traveling abroad has become more and more popular among Chinese people. In the last two decades, stimulated by progressive political liberalization, rising incomes and living standards, and changing and diversifying socio-cultural values, China has witnessed exponential growth in both the number and expenditure of its citizens' international trips. In 2014, the number of Chinese outbound tourists reached 107 million¹, with 164.9 billion dollars spent overseas (World Bank, 2016), making China the largest source of world outbound tourism. As the economic and social factors that have facilitated the growth remain positive in the long term, Chinese outbound tourism is still in the early growth stage.
 
Given the increasing economic, socio-cultural and environmental impacts of Chinese outbound tourism on the destinations, it is of great importance to analyze the determinants of its demand. However, published studies on Chinese outbound tourism demand are relatively few, and the number of studies that employ econometric approaches is even more limited. Lim and Wang (2008) used an ARIMA technique to model the number of Chinese outbound tourists to Australia. However, like other studies based on time-series models, it could not account for the economic and social factors that cause variations in tourism flows. Lin et al. (2015) used the ARDL framework to model the main factors that affect Chinese outbound tourism for 11 major destinations. But the demand for each destination was estimated separately, which does not allow one to analyze how potential travelers choose their destinations. Eilat and Einav (2004) applied discrete choice estimation to a large three-dimensional data set of tourist flows. They focused on total inflows and grouped the destinations into high-GNP and low-GNP destinations.

¹ Tourists to Hong Kong and Macao (special administrative regions of China) account for more than 60 percent of total Chinese outbound tourists. This study excludes these two destinations because the travel pattern to them is inconsistent with the assumptions on which the discrete choice model is based. Fortunately, excluding them does no harm, owing to a convenient property of the discrete choice model.
 
This paper follows the framework of Eilat and Einav (2004) and focuses on China, estimating its outbound tourism demand using data from 2005² to 2014. One can view Chinese residents as consumers, the world as a market of differentiated products, and the destination countries as discrete choices. Starting from travelers' utility specifications, this paper implements basic linear regressions to estimate Chinese tourists' sensitivity to the cost of travel and other characteristics of the destinations. The price and income elasticities are estimated as well.
 
This paper takes a first step and proposes some extensions to be made in the future. Random coefficients can be introduced into the utility specification to allow for more heterogeneity among tourists, making the discrete choice model more flexible (Berry et al., 1995). Micro-level data can be used along with the market-level data to yield more reliable estimates (Petrin, 2002).
 
Finally, welfare analysis can be conducted. For historical and political reasons, Chinese mainland citizens could not travel to Taiwan until late 2008. Based on the utility specification, the discrete choice approach makes it possible to quantify the welfare gains of Chinese tourists from the opening of Taiwan as a new destination. This paper proposes a simulation-based method for such welfare analysis.

² From 2005, Chinese citizens could travel to most major overseas destinations (excluding the US and Taiwan, which became available in 2008) without political restrictions.
 
The paper is organized as follows. Section 1.2 provides a brief background on Chinese outbound tourism. Section 1.3 introduces the discrete choice model and the empirical strategy. Section 1.4 explains how the variables are chosen and constructed. Estimation results are presented in Section 1.5. Section 1.6 proposes future research directions and Section 1.7 concludes.
 
1.2 The Development of Chinese Outbound Tourism
 
Although it is large today, outbound tourism is still a recent phenomenon in China. Between 1949 and 1990, the Chinese government allowed its citizens to travel abroad only for official, business and education reasons, and few Chinese residents traveled to foreign destinations during that period. After 1990, China started to allow leisure outbound travel and gradually relaxed its outbound travel policy. In 1991, Chinese residents could take leisure tours organized by the Chinese Travel Service (CTS) to Malaysia, Singapore and Thailand.
 
In 1995, the Chinese National Tourism Administration (CNTA) introduced the Approved Destination Status (ADS) policy. Chinese travel agencies selected by the government were permitted to organize package tours to countries that had received ADS. The ADS agreement allows Chinese travel agencies to apply for visas to enter a foreign destination on behalf of all members of a tour group. By legitimizing overseas leisure travel, facilitating the process of obtaining a visa, and providing package tours, ADS agreements laid the foundation for the explosion of Chinese outbound tourism. The countries receiving ADS agreements each year are listed in Table 1.1.³
 
In addition to the ADS policy, many other events contributed to the growth of Chinese outbound tourism. In late 1999, the "golden weeks" were implemented. These holiday weeks gave people enough time to make long-distance trips and successfully stimulated not only the domestic but also the outbound tourism industry. In 2001, China entered the WTO. Along with greater commercial convenience, cooperation between international tourism systems was also strengthened. In 2004, China signed a "Memorandum of Understanding" (MoU) with the Schengen Area countries, which provided an easier way to apply for short-duration visas and made it possible for Chinese tourists to visit multiple European destinations without specific limitations. In 2008, China and the United States reached an agreement and signed an MoU for tourism. In the same year, Taiwan started to allow leisure tourism for Chinese mainland residents, meaning that most major tourist destinations in the world were open to Chinese residents.
 
The above political liberalizations, together with Chinese residents' rising income and living standards and changing and diversifying socio-cultural values, resulted in the boom of Chinese outbound tourism. The number of Chinese outbound tourists increased from 4.5 million in 1995 to 107 million in 2014, an annual growth rate of 18.15%. Outbound tourism expenditure increased from 3.7 billion to 164.9 billion dollars, an annual growth rate of 22.12% (World Bank, 2016). China has surpassed the United States and Germany in both the number of outbound tourists and the amount of their expenditure, making it the largest source of world outbound tourism, with great potential to continue growing in the future.
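The annual growth rates quoted above are compound rates over the 19 years from 1995 to 2014. A minimal sketch of the arithmetic, using only the figures in the text (the function name is chosen here for illustration):

```python
def annual_growth_rate(start_value, end_value, years):
    """Compound annual growth rate between two observations `years` apart."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Figures quoted in the text; 1995 to 2014 spans 19 years.
years = 2014 - 1995
tourists_growth = annual_growth_rate(4.5, 107.0, years)    # millions of tourists
spending_growth = annual_growth_rate(3.7, 164.9, years)    # billions of dollars

print(f"tourists: {tourists_growth:.2%}, expenditure: {spending_growth:.2%}")
```

Both results round to the rates reported in the paragraph above.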
 
                                        
³ Note all ADS recipients are countries.
 
1.3 The Discrete Choice Model
 
As mentioned above, for each year, Chinese outbound tourism can be treated as a market of differentiated products, and the destination countries can be viewed as discrete choices. Starting with the simple conditional logit model, assume that the conditional utility of consumer $i$, a resident of China, from travelling to destination country $j$ in year $t$ is given by

$$u_{ijt} = x_{jt}\beta + \xi_{jt} + \varepsilon_{ijt} \tag{1.1}$$

where $x_{jt}$ is a $k$-vector of observed characteristics of country $j$, which can either be fixed (e.g. distance to China, language) or vary across years (e.g. price level, GDP per capita); $\xi_{jt}$ is country $j$'s unobserved characteristic, which can include attributes that attract tourists but are hard to quantify (e.g. resorts' attractiveness); $\varepsilon_{ijt}$ is an individual taste error, assumed to be i.i.d. across individuals, destination countries and time; and $\beta$ is a $k$-vector of parameters to be estimated, assumed to be identical across individuals and time.
 
Based on Equation (1.1), consumer $i$ chooses the destination that maximizes his utility. The probability of choosing to travel to country $j$ is

$$P_{jt} = \Pr\left(u_{ijt} \ge u_{ikt} \ \text{for all } k = 0, 1, \dots, J_t\right) \tag{1.2}$$

where $J_t$ is the number of destination choices in year $t$. Alternative zero represents the outside option of not travelling abroad. In this application, the outside option can be either to travel domestically or not to travel at all. Because domestic travel data are not available, these two cases cannot be distinguished. Without loss of generality, the utility of the outside option is normalized to zero in each market.
 
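Assume that $\varepsilon_{ijt}$ follows the type I extreme value distribution (whose CDF is $F(\varepsilon) = \exp(-\exp(-\varepsilon))$). Then the individual taste errors can be integrated out, and $P_{jt}$ becomes

$$P_{jt} = \frac{\exp(x_{jt}\beta + \xi_{jt})}{1 + \sum_{k=1}^{J_t} \exp(x_{kt}\beta + \xi_{kt})} \tag{1.3}$$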
Furthermore, assume that each resident makes his travel decision only once a year, that is, he travels abroad no more than once a year. Then $P_{jt}$ becomes the predicted market share of destination $j$ in year $t$, $s_{jt}$. Writing $s_{0t}$ for the share of the outside option, some transformation yields the following equation:

$$\ln(s_{jt}) - \ln(s_{0t}) = x_{jt}\beta + \xi_{jt} \tag{1.4}$$
Equation (1.4) is the basic model to be estimated in Section 1.5, and OLS is the natural estimator.
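The step from logit shares to a linear regression can be checked numerically. Under the logit form, the outside share is one over the denominator, so the log-difference between an inside share and the outside share equals the mean utility $x_{jt}\beta + \xi_{jt}$, which OLS can then fit. The sketch below simulates such a market; the dimensions, parameter values and variable names are illustrative assumptions, not the data or estimates of this chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: T markets (years), J destinations, K observed characteristics.
T, J, K = 200, 10, 2
beta_true = np.array([-1.0, 0.5])          # illustrative parameter values

x = rng.normal(size=(T, J, K))             # observed characteristics x_jt
xi = 0.1 * rng.normal(size=(T, J))         # unobserved characteristic xi_jt

# Mean utility and logit market shares, with the outside option normalised to zero.
delta = x @ beta_true + xi
exp_delta = np.exp(delta)
denom = 1.0 + exp_delta.sum(axis=1, keepdims=True)
s_inside = exp_delta / denom               # s_jt
s_outside = 1.0 / denom                    # s_0t

# Inversion: ln(s_jt) - ln(s_0t) recovers the mean utility exactly.
y = (np.log(s_inside) - np.log(s_outside)).reshape(-1)
X = x.reshape(-1, K)

# OLS of the log-share difference on the characteristics.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("OLS estimates of beta:", np.round(coef, 3))
```

In this simulation the unobserved term is independent of the characteristics, so OLS is consistent; that independence is an assumption of the sketch, not a property of the tourism data.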
 
However, in this application, $\xi_{jt}$ may be of great importance. The destinations' unobserved characteristics, or the unquantifiable attractiveness of famous attractions (such as the Statue of Liberty), play an important role in a potential traveler's decision-making process. With the panel data set, identification of the parameters of interest can be improved by introducing heterogeneity across destination countries. Assume

$$\xi_{jt} = \xi_j + \Delta\xi_{jt}$$

where $\xi_j$ is a permanent component for country $j$ and $\Delta\xi_{jt}$ is a temporary shock that is independent across countries and years. Then, under ideal assumptions, random-effects (RE) and fixed-effects (FE) estimation can be implemented.
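To illustrate why a permanent destination component matters, the following sketch simulates a panel in which that component is correlated with a regressor. Pooled OLS is then biased, while the within (fixed-effects) transformation, which subtracts each destination's mean over time, removes the permanent component. All numbers and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

J, T = 50, 10                      # hypothetical panel: destinations x years
beta_true = -1.0                   # illustrative coefficient on one regressor

xi_j = rng.normal(size=(J, 1))                 # permanent component, one per destination
x = 0.8 * xi_j + rng.normal(size=(J, T))       # regressor correlated with xi_j
d_xi = 0.1 * rng.normal(size=(J, T))           # temporary shock
y = beta_true * x + xi_j + d_xi                # plays the role of ln(s_jt) - ln(s_0t)

def slope(x_arr, y_arr):
    """OLS slope of y on x after removing the overall means."""
    xc = x_arr - x_arr.mean()
    yc = y_arr - y_arr.mean()
    return float((xc * yc).sum() / (xc * xc).sum())

# Pooled OLS ignores xi_j and is biased when xi_j is correlated with x.
beta_pooled = slope(x, y)

# Within (fixed-effects) transformation: subtract destination means over time.
x_within = x - x.mean(axis=1, keepdims=True)
y_within = y - y.mean(axis=1, keepdims=True)
beta_fe = slope(x_within, y_within)

print(f"pooled OLS: {beta_pooled:.3f}, fixed effects: {beta_fe:.3f}")
```

A random-effects estimator would additionally require the permanent component to be uncorrelated with the regressors, which this simulated design deliberately violates.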
 
One may want to include in the utility function explanatory variables that depend only on individual $i$ and year $t$ but do not vary across destinations. For example, year dummies (possibly interacted with income) can be included to measure the utility gain from making an international trip in each year. While such variables can be added in simple reduced-form models, in the structural model here, unfortunately, variables that do not vary across $j$ cancel out in the market share computation (Equation (1.2)) and do not remain in Equation (1.4), so the corresponding marginal utility cannot be identified. This limitation, together with the difficulty of interacting income with price, causes trouble when estimating the effect of Chinese residents' income growth on their outbound tourism behavior. A more detailed discussion of this issue is given in Section 1.4.
 
1.4 Data and Variable Construction
 

and published by the World Tourism Organization (UNWTO). The information was obtained on the basis of data supplied by each of the destination countries and therefore corresponds to arrivals of Chinese resident tourists in these countries. Because of this data preparation procedure, the information sources vary from country to country. Table 1.2 lists the number of Chinese resident arrivals to top destinations, ordered by the average number of

sources.
 
Before continuing, there are some limitations on the use of these data which are worth


multiple destinations and multiple trips with each one to a single destination. According to the model assumptions, each arrival is treated as a single choice. This would cause an overestimate of total outbound trips. An obvious example is Hong Kong and Macao. Since they are both Special Administrative Regions of China and not far from each other, Chinese tourists always make a bundled trip visiting both places. This is the main reason why Hong Kong and Macao are excluded from this discrete choice estimation. This problem also arises in trips to some European countries that are relatively small in area and contiguous to each other, and in trips to Asian countries such as Japan and South Korea (usually a cruise tour). But it is far less serious than the Hong Kong and Macao bundle. Single-destination tourism is the main style for Chinese outbound travelers.

Second, the data cannot be classified into groups according to the purpose of travel. Chinese residents travel abroad for different purposes: leisure tourism (sightseeing, shopping, recreation and cultural activities, summer camp, honeymooning, etc.), business or public travel, visiting relatives or friends, and other purposes. It would be more interesting and accurate if we could analyze the determinants for each purpose separately. Travelers with different purposes may have different sensitivities to different explanatory variables. For instance, leisure tourists may be more sensitive to price, while business and public travel may depend more on the economy of the destination country and the intensity of the economic relations between the two countries. However, the data only document the total number

 
between tourists based on their different purposes.
 
Eilat and Einav (2004) focused on leisure tourism and proposed one strategy that might handle this issue. While the percentages of Chinese outbound tourists by purpose are not available, the fraction of total tourist arrivals (from all over the world) to any destination country that are for leisure purposes can be obtained. The leisure tourist arrivals can be approximated by multiplying the total flow by this fraction for each destination country, presuming that for each destination, the annual intensity of leisure inbound tourism from Chinese visitors is identical to the annual intensity of leisure inbound tourism from all over the world. This presumption is unrealistic here because Chinese outbound tourism is still in the growing stage and Chinese tourists behave differently from tourists from other (especially developed) countries. Such an approximation may cause a serious bias for some destination countries, one even worse than making no adjustment at all. There are two other reasons why the data on total arrivals could be used without adjustment. First, there are no clear boundaries between different purposes of travel. Travelers with business or public purposes can also engage in leisure activities (sightseeing, shopping) during their stay in the foreign country

sure behavior can have strong impacts on destination countries, regardless of their main 

es varies 
corresponding to their main travel purposes, such taste variation can be introduced into the model by allowing random coefficients, which will be discussed in Section 1.6.
 
Another shortcoming of the data source is that some values are missing. Fortunately, this is not a big problem. The missing values are mainly for small countries with quite small numbers of Chinese tourist arrivals. Besides, the conditional logit specification is quite

in a specific year can be treated as part of the outside option, and we can use the available data for the estimation without worrying about creating biases. This property also makes it feasible to exclude Hong Kong and Macao from the estimation.
 
To create the market share for each destination country, the number of arrivals is divided by the Chinese urban population, presuming that only Chinese urban residents travel internationally, which is quite reasonable. While Chinese rural residents account for nearly half of China's total population, their income is too low compared to that of urban residents (see Table 1.3 for details).
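The share construction described above amounts to a simple division, with the outside option absorbing everyone not observed arriving at a destination in the data. A minimal sketch with made-up numbers (the population figure, destination labels and arrival counts are hypothetical, not taken from the tables):

```python
# Hypothetical numbers for one year, for illustration only (not taken from the data).
urban_population = 700_000_000          # market size: Chinese urban residents
arrivals = {                            # Chinese tourist arrivals by destination
    "Destination A": 3_500_000,
    "Destination B": 1_400_000,
    "Destination C": 700_000,
}

# Market share of each destination: arrivals divided by market size.
shares = {dest: n / urban_population for dest, n in arrivals.items()}

# Outside option: everyone who did not travel to a destination in the data.
outside_share = 1.0 - sum(shares.values())

for dest, s in shares.items():
    print(f"{dest}: {s:.4%}")
print(f"outside option: {outside_share:.4%}")
```

Destinations with missing data in a given year simply drop out of `arrivals`, so their travelers are counted in the outside share.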
 
Turning to the explanatory variables, we first need variables that measure the cost of international trips. This is not straightforward, since the market for outbound tourism is very different from traditional markets (e.g. the automobile market) in which the products have clear prices. Travelers can choose their length of stay freely, and data on length of stay are available only for a limited number of destinations. The travel cost also depends largely on the activities undertaken by travelers and can vary a lot. There is also an opportunity cost of the time spent on outbound travel, which is hard to quantify. Given such difficulties, it is almost impossible to measure the cost of travel in monetary value. In this specific application, two variables are used: the relative price of living and the distance from China.
 
The relative price of living is based on "the ratio of Purchasing Power Parity conversion factor to market exchange rate" (World Bank, 2016), which tells how many dollars are needed to buy a dollar's worth of goods in the country as compared to the United States, and is also referred to as the national price level. The relative price is obtained by dividing the "price level" of the destination country by the "price level" of China, hereafter denoted as


, and can be interpreted as how many units of goods a consumer has to give up in China in order to purchase one unit of goods in country $j$. This variable has the convenient property that it captures the variation of the real exchange rate over time and can be used for cross-sectional comparison at the same time.
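As a small worked example of this construction, the sketch below divides one national price level by another. The function name and the two price levels are hypothetical values chosen for easy arithmetic.

```python
def relative_price(price_level_destination, price_level_china):
    """Destination's price level relative to China's.

    Each argument is a national price level, i.e. the ratio of the PPP
    conversion factor to the market exchange rate (United States = 1).
    """
    return price_level_destination / price_level_china

# Hypothetical price levels for illustration only (not taken from the data).
p_china = 0.5
p_dest = 1.0
print(relative_price(p_dest, p_china))
```

With these values the ratio is 2: a consumer gives up two units of goods in China to buy one unit in the destination.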
 
The distance to China is an important geographic variable. It is highly correlated with the transportation cost and the time needed for the trip. The weighted distance between major cities of China and the destination country (hereafter denoted as


) is used in this application (source: the GeoDist database). 25 cities in China were used to compute the weighted distances.
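The idea of a weighted distance can be sketched as a weighted average of great-circle distances from several origin cities to a destination point. This is a simplified illustration and not necessarily the formula used by the GeoDist database; the two cities, their approximate coordinates and the equal weights are assumptions made here.

```python
import math

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def weighted_distance(origin_cities, dest_lat, dest_lon):
    """Weighted average distance from several origin cities to one destination point.

    `origin_cities` is a list of (lat, lon, weight); weights are normalised here.
    """
    total_weight = sum(w for _, _, w in origin_cities)
    return sum(
        w * great_circle_km(lat, lon, dest_lat, dest_lon) for lat, lon, w in origin_cities
    ) / total_weight

# Approximate coordinates; the weights are made up for illustration.
cities = [
    (39.90, 116.41, 0.5),   # Beijing
    (31.23, 121.47, 0.5),   # Shanghai
]
tokyo = (35.68, 139.65)
print(round(weighted_distance(cities, *tokyo)), "km")
```

In practice the weights would reflect something like each city's population share, and the list would cover all 25 cities.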
 
 
To control for the economic relationship between China and the destination country, two variables concerning international trade are used: the shares of China's trade with the destination country in China's total import and export volumes (source: World Integrated Trade Solution⁴). Travelers whose main purpose is business are more sensitive to these shares. On the other hand, as China imports from a country, that country's products are exposed to Chinese consumers. The products may embody the history, culture and technology standards of the country of origin. This may inspire Chinese consumers' aspiration to explore the country of origin.
 
The variable used to describe the economy of the destination is PPP-adjusted GDP per capita (source: World Bank, 2016⁵), which enters the utility function in natural logarithm. In addition, two dummies are included: a dummy for common language (whether Chinese is spoken), which controls for cultural similarity, and a dummy for a common border. Table 1.4 and Table 1.5 show the variables discussed above for major Chinese tourist destinations.
 
 
One may be interested in how the income level of potential travelers would affect their travel decisions: whether to travel abroad and where to go. However, one limitation for this application is that

Here the variables describing the price are relative price level and distance, indicating that 

 
However, as mentioned at the end of Section 1.3, if one adds income to the reduced form to be estimated (Equation (1.4)) without interacting it with variables that vary across destinations, it would violate the basic setting of the structural approach. Two different adjustments are made in order to include income in the utility specification; the details are given in the next section.

⁴ The source of data for Taiwan is the Bureau of Foreign Trade, Taiwan.
⁵ The source of data for Taiwan is National Statistics, Taiwan.
 
1.5 Estimation
 

(referred to as Model 1):
 

where
 

includes all other explanatory variables.
 
Table 1.6 reports the estimated parameters from three methods: ordinary least squares (OLS), random effects (RE) and fixed effects (FE). All the coefficients have the expected signs in all three estimations. For OLS, all parameter estimates are significant except for the coefficients of relative price level and export share. The estimate of


becomes significant 
once I include the destination heterogeneity and implement RE and FE. Coefficients of 
international trade suggest that imports have more influence than exports on Chinese tour-

consuming imported products 
would inspire consumers to travel to the country of origin.
 
As mentioned above, one may suppose the income level would affect potential travel-

, one may care less about the cost. The utility function is modified (referred to as Model 2) in an attempt to capture this intuition:
 

where
 

denotes the income of a Chinese urban resident. A more sophisticated estimation method is needed due to the variation of

. Some simplification can be made if a representative consumer is used:

where
 
 
is the per capita disposable income of Chinese urban households in year $t$, which has exceeded 20,000 Chinese yuan since 2011.
 
With such simplification, linear estimation can be implemented. The estimation results are listed in Table 1.7. OLS reports insignificant estimates of the coefficients of price level. The RE estimate is quite interesting: the estimate of


is significant, indicating that Chinese travelers are sensitive to the destination's price level when they have relatively low income. As their income increases, they pay less attention to the living cost of the destination (the estimate of


is less than 
the estimate of 


in absolute value and is insignificant). The FE estimate gives a similar result. The estimated coefficients of distance are significant in both the OLS and RE estimates, and decrease in absolute value as income rises. The results suggest that Chinese travelers are sensitive to the transportation cost and opportunity cost of time, and that this sensitivity gently decreases as their income rises.
 
Given the above evidence that income does affect travel choices, the utility specification can be adjusted to include income in a more natural way (referred to as Model 3) for the convenience of post-estimation analysis:
 

 
As in the simplification made in Model 2, a representative consumer is used and then


equals


for all i. Table 1.8 lists the results. The estimated coefficients related to cost are significant in both the RE and FE specifications.
 
Based on this utility function, one can calculate the own- and cross-price elasticities:
 

The price here refers to the relative price level, or real exchange rate, which can be 
affected by the variation of nominal exchange rate and price level 
of either country, China 
or the destination.
 
The income elasticity can be calculated as
 

and the income elasticity of making an international trip is:
 

Table 1.9 reports the estimated own-price and income elasticities for year 2014.
 
1.6 Future Research Directions

A. Random Coefficients Estimation
 
While the simple logit model is convenient for computation, it has some implausible limitations. The heterogeneity of consumers only comes from the idiosyncratic logit error, and this causes the "independence of irrelevant alternatives (IIA)" problem: the ratio of the probabilities (market shares) of two choices does not depend on the set of choices that are available. For example, suppose that the market shares of Japan and the United States were the same before Taiwan became an available choice. After Taiwan joins the market, IIA implies that the market shares of Japan and the United States will still be the same. However, intuitively, Japan's market share would decrease more than that of the United States, since Japan and Taiwan are both close to China, so Taiwan is more likely to take over market share from Japan. The unrealistic substitution patterns caused by IIA also show up in the cross-price elasticities, violating the intuition that tourists who substitute away from a certain destination would be more likely to choose their new destination based on similar characteristics.
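The IIA property can be checked numerically. The following is a minimal sketch, assuming a simple logit with an outside good whose utility is normalized to zero; the mean utilities are hypothetical values chosen only for illustration and are not estimates from this chapter.

```python
import numpy as np

def logit_shares(delta):
    """Simple logit shares; index 0 is the outside good (utility 0)."""
    expd = np.exp(np.asarray(delta, dtype=float))
    denom = 1.0 + expd.sum()
    return np.concatenate(([1.0 / denom], expd / denom))

# Hypothetical mean utilities (assumption, for illustration only).
before = logit_shares([0.5, 0.5])        # Japan, United States
after = logit_shares([0.5, 0.5, 0.8])    # Japan, United States, Taiwan

# Ratio of Japan's share to the United States' share in each scenario.
ratio_before = before[1] / before[2]
ratio_after = after[1] / after[2]
print(ratio_before, ratio_after)
```

Both destinations lose share when the third option enters, but the ratio of their shares is unchanged, which is the substitution pattern the text describes as unrealistic.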
 
The IIA problem can be eased by allowing random coefficients in the utility function, which makes the model more flexible. Travelers have individual-specific tastes for observed characteristics of destinations according to their main purposes of travel. Berry et al. (1995) proposed a method to estimate the random coefficients (their means and standard deviations). Petrin (2002) extended the method by augmenting market share data with information relating consumer demographics to the characteristics of the products they purchase. Income is also part of the random coefficients, and one can expect that the way income enters the utility specification matters for the estimation results: the coefficients, elasticities, and the welfare changes.
 
B. Quantifying the welfare gains from the opening of a new destination: Taiwan
 
As an advantage over other models, the discrete choice model is based on the utility function, which makes it possible to conduct welfare analysis. Petrin (2002) quantifies how the introduction of the minivan changed total (consumer and producer) welfare (measured by compensating variation) in the United States. A similar attempt (based on equivalent variation)⁶ can be made to estimate the benefits gained by Chinese (mainland) tourists from the opening of Taiwan as a new destination choice. For a traveler to Taiwan, the welfare benefit is how much income he would be willing to give up to keep Taiwan as an available choice.
 
To be more specific, this is a simulation-based analysis:
 
1) Make R random draws, each of which represents an individual and contains his individual-specific coefficients (including his income) and his logit taste errors for all feasible choices;
 
2) For each individual i, compute his utilities for all choices (the utility of choosing the outside good is normalized to zero). He chooses to travel to Taiwan if that yields him the maximal utility, denoted as


3) For each individual whose first-best choice is Taiwan, find his second-best choice and the corresponding utility


. This is the choice he would make if Taiwan were not available in the market.
 
4) Find 


such that


. Given that utility is increasing in income,


should be negative and 


is i's welfare benefit.
 
5) The sum of 


's
 
for all individuals whose first-best choice is Taiwan is the total welfare gain for a population of R.
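The five steps above can be sketched in code. This is only an illustration under stated assumptions: the utility form u_ij = delta_j + alpha * ln(y_i - p_j) + eps_ij, the parameter values, and the income distribution are all hypothetical and are not the chapter's Model 3 or its estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

R = 10_000                                    # number of simulated individuals
names = ["Japan", "United States", "Taiwan"]  # illustrative destinations
delta = np.array([-9.5, -9.8, -9.4])          # hypothetical mean utilities
price = np.array([3000.0, 8000.0, 2000.0])    # hypothetical travel costs
alpha = 1.0                                   # hypothetical income coefficient
taiwan = 2                                    # index of Taiwan in `names`

# Step 1: draw incomes and logit (Type-I extreme value) taste errors.
income = rng.lognormal(mean=np.log(20000.0), sigma=0.3, size=R)
income = np.maximum(income, price.max() + 1.0)   # keep y - p positive
eps = rng.gumbel(size=(R, len(names)))

# Step 2: utilities of all choices; column 0 is the outside good (utility 0).
util = delta + alpha * np.log(income[:, None] - price) + eps
util_all = np.column_stack([np.zeros(R), util])
chooses_taiwan = util_all.argmax(axis=1) == taiwan + 1

# Step 3: best attainable utility when Taiwan is removed from the choice set.
second_best = np.delete(util_all, taiwan + 1, axis=1).max(axis=1)

# Step 4: solve u_Taiwan(y + dy) = second_best for dy (closed form here).
target = (second_best - delta[taiwan] - eps[:, taiwan]) / alpha
dy = np.exp(target) + price[taiwan] - income

# Step 5: total welfare gain over individuals whose first choice is Taiwan.
welfare_gain = -dy[chooses_taiwan]
total_gain = welfare_gain.sum()
print(chooses_taiwan.mean(), total_gain)
```

With a utility form that has no closed-form inverse in income, step 4 would instead require a one-dimensional root search per individual.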
 
Petrin (2002) showed that models estimated without micro data yield much larger welfare numbers than models using them and depend largely on the idiosyncratic logit taste error. This suggests that consumer-level data are needed to make the welfare analysis more reliable.

6 Petrin (2002)

his approach due to a problem related to outside goods. However, equivalent variations can be calculated, avoiding that problem.
 
1.7 Conclusions
 
This paper uses a discrete choice approach to estimate the determinants of Chinese outbound tourism demand after 2004, when Chinese citizens became able to travel to most major overseas destinations without political restrictions. The destinations are viewed as differentiated products.
 
Given the specificity of the tourism market, the relative (to China) cost of living and the distance from China are chosen to measure the "price" of traveling to a destination. Attempts are made to include income in the utility specification to examine how changes in income affect people's travel decisions. Simple logit models and general linear regression estimates (OLS, RE, FE) are implemented.
 
The estimation results indicate that Chinese travelers are sensitive to the destination's price level when they have relatively low income. As their income increases, this sensitivity decreases and becomes insignificant. Travelers prefer destinations that are close to home, and the preference gently decreases as their income rises. The estimated own-price and income elasticities of the most popular destinations are reported. The results also support the intuition that tourists prefer destinations that are more developed, are culturally similar (speak Chinese) and share a common border. The destination's economic relationship with China is unlikely to be among travelers' considerations.
 
As for future research directions, random coefficients estimation can be implemented to ease the limitations of the simple logit model. If available, consumer-level data could be used along with the market-level data to yield more reliable estimation results. An equivalent-variation-based method is proposed to quantify the welfare gains of Chinese tourists from the opening of Taiwan as a new destination.
 
 
APPENDIX

APPENDIX FOR CHAPTER 1

Destinations      Own-Price Elasticities      Income Elasticities
                  RE         FE               RE         FE
South Korea       -0.82      -0.58            0.10       0.09
Thailand          -0.39      -0.27            0.10       0.13
Taiwan            -0.50      -0.35            0.08       0.08
Japan             -1.01      -0.71            0.14       0.14
United States     -1.02      -0.72            0.36       0.50
 

BIBLIOGRAPHY
 
 
Arita, S., S. Croix, and J. Mak (2012). How big? The impact of approved destination status on mainland Chinese travel abroad. Working Paper, University of Hawaii Economic Research Organization, University of Hawaii at Manoa, 2012/3.

Berry, S. T. (1994). Estimating discrete-choice models of product differentiation. The RAND Journal of Economics, 242-262.

Eilat, Y., and L. Einav (2004). Determinants of international tourism: a three-dimensional panel data analysis. Applied Economics, 36(12), 1315-1327.

Hicks, J. R. (1945). The generalized theory of consumer's surplus. The Review of Economic Studies, 13(2), 68-74.

Jin, X., and Y. Wang (2016). Chinese outbound tourism research: A review. Journal of Travel Research, 55(4), 440-453.

Lim, C. (1999). A meta-analytic review of international tourism demand. Journal of Travel Research, 37(3), 273-284.

Lim, C., and Y. Wang (2008). China's post-1978 experience in outbound tourism. Mathematics and Computers in Simulation, 78(2-3), 450-458.

Lin, V. S., A. Liu, and H. Song (2015). Modeling and forecasting Chinese outbound tourism: An econometric approach. Journal of Travel & Tourism Marketing, 32(1-2), 34-49.

Mayer, T., and S. Zignago
 

database.
 
National Bureau of Statistics of China, China Statistical Yearbook 2015 [Electronic].

Nevo, A. (2010). Empirical models of consumer behavior. No. w16511. National Bureau of Economic Research, 2010.

Berry, S., J. Levinsohn, and A. Pakes (1995). Automobile prices in market equilibrium. Econometrica: Journal of the Econometric Society, 841-890.

Peng, B., H. Song, G. I. Crouch, and S. F. Witt (2015). A meta-analysis of international tourism demand elasticities. Journal of Travel Research, 54(5), 611-633.

Petrin, A. (2002). Quantifying the benefits of new products: The case of the minivan. Journal of Political Economy, 110(4), 705-729.

Rodrigues, V., and Z. Breda (2014).
 

Chinese Outbound Tourism Market. 7th World Conference for Graduate Research in Tourism, Hospitality and Leisure: 693-698.

World Bank (2016). Indicators: Data. Retrieved from http://data.worldbank.org/indicator.

World Tourism Organization (2016), Data on Outbound Tourism (calculated on basis of arrivals in destination countries) dataset [Electronic], UNWTO, Madrid, data updated on 10/01/2016.
 
 
 
CHAPTER 2
 

2.1 Introduction
 
The discrete choice model has long been used in economics and marketing to conduct demand estimation, understand consumer preferences and purchasing behavior, and analyze and predict market shares of differentiated products. As a structural model, the central idea of the discrete choice model is utility maximization: among a variety of differentiated products in a certain market, a consumer chooses the one product that gives her the highest utility level.
 
The traditional literature on the discrete choice model (Train (2009) provides comprehensive coverage of the topic) pays great attention to modeling the utility function of the

considers purchasing, as exogenously given. In most empirical studies, researchers assign all consumers a universal choice set that contains all available products in a given market to estimate the model. However, the choice set chosen by the econometrician may not be

genous because of a variety of reasons including but not limited to insufficient information, search costs, time constraints, advertising exposure, commitment and so on. In a paper presenting a formal analysis of the distributional structure of random utility models, Manski (1977) points out that a stochastic choice set formation model, which assigns a realization probability to each alternative choice set faced by a consumer, is an essential part of modeling the choice decision process. It has been suggested not only by theoretical work (Eliaz and Spiegler (2011), Masatlioglu, Nakajima and Ozbay (2012), Manzini and Mariotti (2014), etc.) but also by empirical studies (Goeree (2008), Pires (2012), Paola and Marco (2013), Lu (2018), etc.) that ignoring choice set heterogeneity can cause consid-

(e.g. elasticities) that rely 
on them.
 
Several strands of literature deal with choice set heterogeneity. The first


7
 

preference parameters can be consistently estimated based on a subsample which is drawn from the true choice set according to an appropriate probability distribution. Such a subsample is denoted

torical choices and lie 

trician. Lu (2018) proposes a similar estimating approach based on the bounds of choice 

 
by two observed sets, respectively the largest and smallest possible choice sets, the bounds combined with a monotonicity property derived from utility maximization could imply a system of inequality restrictions on observed choice probabilities, which can generate a set of moment conditions that could be used to identify and estimate the preference parameters.
 
Such approaches have several limitations. Firstly, the construction of choice sets de-

requirements on the data sets. Researchers must have individual-level panel data covering a relatively long time period to apply such estimation methods. In many cases where only industry-level data is available, or the panel is not long enough, such approaches are not applicable.

7 See, e.g., Fox (2007) and Crawford, Griffith and Iaria (2016).
Secondly, such approaches need to assume a consumer has stable choice set in terms of 
products over time, which is that, if a consumer ever purchases one product, then this prod-

 
set for all the time periods. This assumption is generally true but is likely to be violated in certain circumstances. For example, if a consumer of breakfast bread usually shops at the grocery store in a hurry and only chooses from the products on display, her choice set would not be stable in terms of products, since the store frequently switches its on-display items. Another limitation, the most serious one, is that only
the preference parameters can be consistently estimated, given satisfactory data and assum-
i


ticities and many other post-estimation implications which industry decision makers care about most. Only ranges can be predicted. Depending on how the sufficient sets (or the smallest and largest possible choice sets in Lu (2018)) are constructed, the ranges can sometimes be too wide to be informative.
 
Another category of existing approaches extends the random utility framework by modeling the choice set formation process and simultaneously estimating both the choice set formation and preference parameters.⁸ Swait and Ben-Akiva (1985) propose a constraint-based view of choice set formation and corresponding approaches to structure and parameterise choice set models, which have been widely adopted by empirical studies. The assumptions on choice set formation vary across industries of interest and can be quite plausible in certain applications. For example, Goeree (2008) focuses on the personal computer industry and specifies the choice set formation based on advertising exposure.

8 See, e.g., Ben-Akiva and Boccara (1995), Goeree (2008), Hortacsu and Syverson (2004).

The proba


dustry-level sales data. However, the specification of choice set formation for a certain application is usually restrictive and generally cannot be carried over to other industries. For instance, the assumption in Goeree (2008) that choice set formation is based on advertising exposure seems reasonable only for industries in which the products refresh rapidly and consumers have limited information on what products are available in the market. In addition, the more detailed the specification of the choice set formation the econometrician makes, the higher the risk of misspecification she faces.
 
This paper proposes a mixture model in which the choice sets and preference parameters can be jointly estimated. The specific setup of the choice set formation process distinguishes this paper from the existing literature utilizing choice set formation models. Different choice sets are viewed as different consumer types. Each type of consumer has distinct criteria on the product attributes according to which their choice sets are formed. A

teria. I assume the type of consumer, which is equivalently the type of choice set, follows a multinomial distribution that is unknown to econometricians and needs to be estimated. A control function method is implemented to deal with the product heterogeneity in a two-step estimation process.
 

need to be stabl

 
attributes, which is more realistic. For example, this setup is applicable to the situation in which one type of consumer chooses only from the products on display or being promoted (we

would be indicator variables in the model). Like Goeree (2008) discussed above, my approach makes assumptions on the formation of choice sets. I propose a choice set formation model which is different from existing ones and fits reality better in certain circumstances. A potential problem is that as we increase the number of attributes in the choice set formation process, the number of different choice sets grows exponentially, which could make the estimation computationally demanding and make identification more difficult. Basically, there is a tradeoff between increasing the accuracy of identification and reducing the risk of choice set formation misspecification. I propose several strategies to control the number of different choice set types. Those strategies can be implemented flexibly depending on the specific industry of interest, as discussed later in this paper.
 
The mixture approach is applied to the IRI marketing data set for demand estimation.⁹ The IRI marketing data set contains store-level and household-level weekly panel data on products (available in supermarkets) in 30 categories over 10 years. Specifically, I examine the approach on the markets for milk, potato chips and hot dogs. Compared with the simple logit and BLP estimation methods, which assume a universal choice set for every consumer, the mixture approach yields significantly different preference parameters (on price) and a different distribution of price elasticities across products.

9 I would like to thank IRI for making the data available. All estimates and analysis in this paper, based on data provided by IRI, are by the author and not by IRI.
 
 
The rest of the paper proceeds as follows: in Section 2.2 I describe the full setup of the model. Section 2.3 discusses the identification and estimation method. Section 2.4 reports the results of Monte Carlo simulations. An introduction to the IRI marketing data set and the results of three empirical applications are reported in Section 2.5. Section 2.6 concludes.
 
2.2 The Model
 
The mixture model extends the basic discrete choice mod
el by allowing for heteroge-


was proposed by other researchers early on (e.g. Manski (1977)) and it is reflected here in equation (2.6). The contribution of this paper is its specification of the choice set formation process and the corresponding estimation strategy. This section first introduces the primitives of the discrete choice model and then specifies the choice set formation process. A discussion of the difference between an existing representative choice set formation model and mine follows at the end.
 

A
ssume that a market lasts for T periods and consists of a set of differentiated products 


.
 

In the market there are I consumers, each of whom chooses one product from

 
in each period.
 
The indirect utility to consumer i from choosing product j 
(> 0) at time t is
 

 
The utility from choosing the 
outside option is
 

where 


is a vector consisting of product attributes, consumer demographics 
and their
 
interactions.


is the random coefficient and can be written 
as


where

 
is the mean part and
 

is the random part for consumer 
i
, 


, 


is 


standard normal error
. 

 
and 

 
are 


vectors 
of parameters.
 
10
 

is the unobserved (to the researcher) product heterogeneity. 


is 
an i.i.d. stochastic term following the Type-I extreme value distribution across i, j (including 0) and t.
 
Each consumer i chooses a product that gives him/her the maximal utility from his/her 
choice set 


. The choice indicator of whether consumer 
i
 
chooses product 
j
 
at time 
t
 
conditional on his/her choice set is
 

The basic discrete choice model assumes an identical universal choice set for all consumers, which is


for
 

. Actually 


is heterogeneous and can be any subset of

.
 
One way to proceed is to specify a choice set formation process which generates a probability distribution over the differentiated choice sets:
 

where 


is the power set of 

.
 
                                        
10 See Berry, Levinsohn, and Pakes (2004) for details about the random coefficients logit model.
 
Define 


, the conditional probability of consumer 
i 
choosing a 
product 
j
 

where the integral is over the distributions of


and 

, and the summation is over all the alternative choice sets. The existing models of choice set formation can be viewed as different specifications imposed on (2.5). In the following subsection I propose my specification.
 

In this model I assume the choice set formation process is based on certain product attributes that consumers care about when considering what to purchase. Hereafter I

formation process is illustrated first by a simple example, and then the general formation process is proposed.
 
2.2.2.1 A Simple Example
 
I

ample simple, assume there is only one choice set determinant attribute. 
 

vel panel data that in year 2011, 5.41% of milk consumers purchased fat-free milk only, while 34.86% of milk consumers purchased only non-fat-free milk. These shopping patterns suggest that different consumers may have different choice sets: some consumers consider fat-free milk only, while others consider only non-fat-free milk when making their purchasing decisions. We can therefore use fat content as the attribute that determines the types of choice sets and assume it is the only choice set determinant attribute. This attribute has two possible values: fat-free or non-fat-free (which includes reduced fat and whole milk). It therefore generates three types of choice sets:
 
Type 1: consists of fat-free products;

Type 2: consists of non-fat-free products;

Type 3: consists of all products of any fat content (either fat-free or non-fat-free).
 
The outside option naturally lies in each type of choice set. A consumer has a choice set of one of the above three types, each with a corresponding nonnegative probability. The three probabilities sum to 1. The type of choice set is also the type of the consumers who have that choice set.
 
Formally, assume there are D choice set determinant attributes. Denote the set that contains all choice set determinant attributes as


}
. Each element of 

 
represents one choice set determinant attribute. In this example 
D=
1 and 


. 


is the indicator of whether the milk product is fat-free and it has two possible values denoted as


The superscript m refers to the m-th possible value of the attribute

.
 
Then the different types of choice sets can be represented by a 


indicator vector 


.
 
The m-th (m = 1, 2) element is related to the m-th possible value of


. 


means a consumer of this type excludes products with 


out from 
his/her choice set, while 


means a consumer of this type includes products with 


in his/her choice set. For instance, if (the choice set of) a consumer is of type 


: 


means the consumer excludes products with 


from his/her choice set, which is to say, no non-fat-free products are in his/her choice set;


means the consumer includes products with  


in his/her choice set, which is to say, all fat-free products are in his/her choice set. To conclude, this consumer has a choice set consisting only of fat-free products and belongs to Type 1 as discussed above.
 
The vector


has 


possible non-zero values, each of which represents a type of choice set:
 
Type 1:


, consists of fat-free products;

Type 2:

,
 
consists of non
-
fat
-
free products;
 
T
ype 3: 


, consists of all products with any fat content.
 
The case


represents an empty choice set 
and is out of consideration since 
a consumer intending to buy nothing is excluded from the market. Notice that this is very 
different from the outside option, which naturally lies in every type of choice set.
 
Let 


denote the choice set corresponding to


,
 
and 


denote the set consisting of all possible choice sets generated by


.
 
Then in this example:
 

Define the probability distribution of the choice sets as:
 

Then the choice set formation process is complete.
 
2.2.2.2 General case
 
Following the notation in the simple example above, assume there are D choice set determinant attributes. Denote the set that contains all choice set determinant attributes as


}
. Each element of 

 
represents one choice set determinant attribute. 
 
Notice that 

 
usually has overlaps with 

 
, the variables entering the utility function. Further assume the number of possible values for each attribute is finite.¹¹ There are


possible values for attribute 


: 


Then I use a vector of vectors, 


,
 
to represent 
the type of differentiated 
choice sets. 


,
 
the 
d
-
th
 
element of 

, is an 


indicator vector consisting of zeros 
and ones corresponding to the choice set determinant attribute 


.
 
Specifically, 


)
. 
For


, 


means a consumer of this type excludes products with


out from his/her choice set, while 


means a consumer 
of this type includes products with 


in his/her choice set. Notice that if 


and 


it reduces to the simple example discussed in the previous subsection.
 
Let 


denote the choice set corresponding to


,
 
and 


denote the set consisting of all possible choice sets generated by

.
 
For each attribute 


,
 
the corresponding 


can have 


different non-zero values. So, the combination of D attributes will generate


types of 
choice set without other restrictions. This is also 


,
 
the number of elements of 

.
 
Define the probability distribution of the choice sets as:
 

where the elements of
 

are nonnegative and sum up to 1.
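The construction of choice set types can be illustrated with a short enumeration. This is a minimal sketch: the function names are hypothetical, attribute values are coded by their index, and it assumes that a product enters a choice set only if its value on every determinant attribute is admitted by the type, which is one reading of the description above.

```python
from itertools import product

def nonzero_indicator_vectors(m):
    """All 0/1 vectors of length m except the all-zero (empty choice set) vector."""
    return [v for v in product((0, 1), repeat=m) if any(v)]

def choice_set_types(values_per_attribute):
    """One non-zero indicator vector per attribute, combined across attributes."""
    per_attribute = [nonzero_indicator_vectors(m) for m in values_per_attribute]
    return list(product(*per_attribute))

def products_in_choice_set(type_vector, product_values):
    """product_values[j][d] is the index of product j's value on attribute d.
    Assumption: a product is included only if every attribute value is admitted."""
    return [j for j, vals in enumerate(product_values)
            if all(type_vector[d][v] == 1 for d, v in enumerate(vals))]

# Milk example: one attribute with two values (0 = non-fat-free, 1 = fat-free).
milk_types = choice_set_types([2])
milk = [(1,), (0,), (0,)]           # fat-free, reduced fat, whole
fat_free_only = products_in_choice_set(((0, 1),), milk)

# Two attributes with 2 and 3 possible values: (2^2 - 1) * (2^3 - 1) types.
n_types = len(choice_set_types([2, 3]))
print(milk_types, fat_free_only, n_types)
```

The count grows quickly with both the number of attributes and the number of values per attribute, which is why the number of types has to be controlled before estimation.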
 

There are some other choice set formation models which are preferred by researchers in certain circumstances. Swait and Ben-Akiva (1987) proposed the framework of random-constraint probabilistic choice set models and described several examples. Here I illustrate a representative model which is used by Goeree (2008) in the demand estimation of the personal computer market.

11 The choice set determinant attributes can also take continuous values, which is discussed in section 3.1.
 
For simplicity I suppress the time subscript. The model assumes any subset of


(equivalently, all elements of 


)
 

the probability:
 

where


is the probability that product l is considered by consumer i, thus included 

with probability 1. 
 
Goeree (2008) interpreted the 


term as the information technology which describes the effectiveness of advertising at informing consumers about products. It is given by:
 

where


is a vector consisting of product attributes and consumer demographics, 


is the unobserved consumer heterogeneity,


is a pre-assumed functional form and

 
is 
the corresponding parameter vector that needs to be estimated. 
 
Denote my choice set model, described in (2.7), as Model 1 and the above model, described in (2.8), as Model 2. The essential difference between the two models is the procedure by which the choice sets are constructed. Model 1 first determines all types of choice sets according to criteria on selected attributes and then fills the different choice sets with the products in the market. Model 2 first looks at the products in the market; the alternative choice sets are then determined as all subsets of the universal set that contains all the products.
 
Due to the distinct underlying choice set structures, there are two more explicit differences between the two models:

The realization probabilities of the alternative choice sets. In Model 1, the probabilities are assumed to be exogenous and are estimated directly (although with some transformations), while in Model 2, as illustrated in (2.8), the probabilities are determined by a function of variables concerning the product attributes and consumer demographics. If the number of choice set types is large in Model 1, it would have more parameters to estimate than Model 2, which causes problems for identification. Some strategies for controlling the number of alternative choice sets are proposed in section 3.1.1. In addition, assuming a functional form for the probabilities as in Model 2 carries the risk of misspecification.
 
The number of alternative choice sets. In Model 1, the number of alternative choice sets is fixed across markets (periods) and can be kept relatively small. In Model 2, assuming there are J differentiated products in the market, the number of alternative choice sets is


and increases at an exponential rate with respect to J. Estimation would be extremely computationally demanding for a large value of J.
 
 
2.3 Identification and Estimation
 

2.3.1.1 The number of alternative choice sets
 
Let 


contain all the information on the choice set determinant attributes of all products in the market at period t. Under the choice set distribution (2.7) and considering that


follows the Type-I extreme value distribution, (2.6) can be rewritten as
 

(2.10)
 
The foundation of jointly estimating the preference parameters and the choice set distribution is to match the choice probabilities in (2.10) with the observed market shares of the products. The number of moments generated by the equation system (2.10) is the aggregate number of differentiated products in the market over periods (t). This number can be surpassed by the number of choice set types if we specify large values of

 
and 


grows at exponential rate, in which case the model cannot be identified.
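The share-matching idea can be made concrete with a small computation of mixture shares. This is a simplified sketch, not the estimator of this chapter: it assumes no random coefficients, so that within each choice set type the choice probabilities take the plain logit form, and the mean utilities and type probabilities are hypothetical.

```python
import numpy as np

def mixture_logit_shares(delta, choice_sets, pi):
    """Predicted shares when each consumer type chooses by logit within its own set.

    delta       : mean utilities of the inside products
    choice_sets : list of lists with the product indices contained in each type
    pi          : probabilities of the types (nonnegative, summing to 1)
    Returns (outside_share, inside_shares).
    """
    delta = np.asarray(delta, dtype=float)
    shares = np.zeros(delta.size)
    outside = 0.0
    for members, prob in zip(choice_sets, pi):
        idx = np.asarray(members, dtype=int)
        expd = np.exp(delta[idx])
        denom = 1.0 + expd.sum()        # the outside option is in every set
        shares[idx] += prob * expd / denom
        outside += prob / denom
    return outside, shares

# Milk example: products 0 = fat-free, 1 = reduced fat, 2 = whole.
delta = [0.2, -0.1, 0.4]                 # hypothetical mean utilities
sets = [[0], [1, 2], [0, 1, 2]]          # Type 1, Type 2, Type 3
outside, shares = mixture_logit_shares(delta, sets, [0.1, 0.3, 0.6])

# Putting all mass on the universal set recovers the standard logit shares.
outside_u, shares_u = mixture_logit_shares(delta, sets, [0.0, 0.0, 1.0])
print(outside, shares)
```

Estimation would search over the mean utilities and type probabilities so that these predicted shares match the observed ones, which is only feasible when the number of type probabilities does not exceed the number of share equations.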
 
To make identification possible, some strategies need to be adopted before estimation. One strategy is to divide the possible values of


into 


groups and reduce the 
dimension of 


from 


to 


. For example, if 


is the engine size of a vehicle, we can group the values into small, medium and large sizes. Now


means a consumer with 
this type includes products with 


belonging to the l-th group in his/her choice set. Extending this idea, we can also accommodate continuous attributes by dividing the values of the attribute into several intervals.
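The grouping strategy can be sketched as follows. The engine sizes and cut points are hypothetical values used only to show how grouping reduces the number of choice set types generated by a single attribute.

```python
import numpy as np

engine_size = np.array([1.2, 1.6, 2.0, 3.5])   # hypothetical values (litres)
cut_points = np.array([1.5, 2.5])              # hypothetical interval boundaries

# 0 = small, 1 = medium, 2 = large
group = np.digitize(engine_size, cut_points)

n_values = len(np.unique(engine_size))
n_groups = len(cut_points) + 1
types_before = 2 ** n_values - 1    # non-zero indicator vectors before grouping
types_after = 2 ** n_groups - 1     # non-zero indicator vectors after grouping
print(group, types_before, types_after)
```

The same call handles a continuous attribute, since the cut points define intervals rather than a list of admissible values.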
 
The other strategy is to control the number of choice set determinant attributes. The selection of choice set determinant attributes should depend on common sense or evidence from consumer purchasing records.
 
The example in section 2.2.2.1 is a good application of the above two strategies. First,
the individual-level purchasing records show that some consumers only purchase milk
products of a certain fat content, suggesting that fat content is a choice set determinant
attribute. Then the fat content is divided into two groups: fat-free and non-fat-free. The

-
fat
-

only 3 types of choice set, together with 3 probability parameters to estimate.
 
The above two strategies aim to reduce the number of possible values for a
choice set determinant attribute (


)
 
and the number of choice set determinant attributes 
(

)
, thus reducing the number of choice
 

known parameters after all the choice sets are determined. Since we have constructed all
possible choice sets, some choice sets may never be realized or may be the true choice sets
of only a very small part of the population. Consider the example of the potato chips
market, with the indicator of whether a product is on promotion as the choice
set determinant attribute. Intuitively, no consumers would consider only products
that are not on promotion. If we have individual
-

chasing history, we can check this intuition before estimating the model and assume
that those particular choice sets have zero probability of being realized, which reduces the
number of unknown parameters to be estimated.
 
2.3.1.2 The product heterogeneity
 

,
 
but the producers and
consumers do. The prices are very likely to be functions of unobserved characteristics, and
this causes endogeneity problems. If price is positively correlated with the unobserved
quality, the price coefficients (in absolute value) would be understated by estimation
methods that ignore the endogeneity.
 
Instrumental variables naturally come in as a solution. Berry (1994) was the first to
implement instrumental variables methods to deal with the endogeneity problem in discrete
choice models. BLP (1995) proposed a well-known estimation technique (hereafter re-

ombines the contraction mapping inversion and GMM
for random coefficients preference models. I had hoped that the BLP inversion would work
under the setup of my model; unfortunately, this turned out not to be the case.
12
 
Alternatively, a control function method is implemented in my estimation procedure. I
follow the spirit of Kim and Petrin (2010)
13
, constructing control variates to approximate the 
unobserved heterogeneities. 
 
The choice of instrumental variables follows the idea of BLP. Assume that all observed
product attributes except price are uncorrelated with the unobserved heterogeneity

.
 
Denote 


as a vector containing the heterogeneities of all products in market
t
. Let 


be the set of observed characteristics, except price, of product j that
affect demand and costs
in market 
t
. Then the vector 


contains all observed elements in market 
t
 
which are relevant to the determination of equilibrium price. The price for product 
j
 
in 
market 
t 
is given by the price function
:
 

12 A discussion of the invalidity of a BLP-inversion-based estimation approach is given in Chapter 3.
 
13 Kim and Petrin (2010) attempted to address endogenous prices while allowing for non-separability
between observed and unobserved factors.
 
 
Kim and Petrin (2010) showed that when prices are additively separable in the
unobserved factors, the price function can be written as:
 

The term 


is the difference between price and its expected value conditional on 
observed exogenous factors. Write 


as the vector of residuals,
which play the role of the conditioning variables. The


is one-to-one with


.
 
Then 


can be approximated by 


with a linear function.
 
Equation (2.11) indicates that every observed characteristic of every product affects
every price, implying that any product characteristic can be a valid instrument for any price.
Thus, the number of instruments would be too large relative to the number of observa-

(1994) to derive three optimal instruments with respect to each observed product
characteristic: the characteristic itself, the sum of the characteristic over the other products
of the same firm, and the sum of the characteristic over products of other firms. The
selection of the control variates follows a similar procedure. The above steps of
constructing instruments and controls are also adopted by Kim and Petrin (2010).
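
The three instruments described above can be computed mechanically for a single characteristic within one market. The following is a minimal sketch; the function name blp_instruments, the characteristic values and the firm labels are hypothetical and serve only to illustrate the construction.

```python
def blp_instruments(x, firm):
    """For each product j in one market, return the triple
    (own characteristic,
     sum of the characteristic over the same firm's other products,
     sum of the characteristic over products of other firms)."""
    total = sum(x)
    out = []
    for j, (xj, fj) in enumerate(zip(x, firm)):
        same_firm = sum(
            xk for k, (xk, fk) in enumerate(zip(x, firm)) if fk == fj and k != j
        )
        rivals = total - xj - same_firm
        out.append((xj, same_firm, rivals))
    return out

# Hypothetical market: four products, two firms.
x = [1.0, 2.0, 3.0, 4.0]
firm = ["A", "A", "B", "B"]
z = blp_instruments(x, firm)
```

Repeating this for every exogenous characteristic and every market gives the instrument matrix; the same sums can then be reused as arguments of the control function.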
 

2.3.2.1 Preparation
 
Let 


denote the set consisting of all types of choice set at period t after the simplification
strategies (discussed in section 3.1) have been applied, and let R denote the
number of choice set types:


.
 
Order the elements of 


so the set can 
be written as:
 
 

ket. Notice that there is a subscript t for the choice sets. Although the criteria for
choice set formation are fixed, the choice set determinant attributes of a product can vary over time,
so a given type of choice set can contain different products over time.
 
The corresponding probability distribution is:
 

Accordingly, the choice probability (predicted market share) of product j becomes:
 

(2.15)
 
Before formally starting the estimation, note that the distribution probabilities of
the choice sets are restricted: they are nonnegative and sum to 1. The following
transformation is therefore needed:
 

(2.16)
 
where 


Let 


.
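
As an illustration of how such a restriction can be imposed during optimization, the sketch below uses a multinomial-logit (softmax) reparameterization with the last choice set type as the baseline. This is one common choice and is an assumption here; it is not necessarily the exact form of (2.16), and the name choice_set_probs is hypothetical.

```python
import math

def choice_set_probs(gamma):
    """Map R-1 unrestricted real numbers to R probabilities.
    The last type is the baseline with its parameter normalized to 0 (assumption)."""
    expg = [math.exp(g) for g in gamma] + [1.0]
    denom = sum(expg)
    return [e / denom for e in expg]

# Two free parameters give three probabilities that sum to one.
probs = choice_set_probs([0.5, -1.0])
```

Under this mapping every probability is strictly positive, so a probability of exactly zero is reached only in the limit; choice set types assumed to have zero probability would be dropped before estimation rather than estimated.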
 
2.3.2.2 The two-step estimation
 
Estimation for the mixture model proceeds in two steps. In the first step, I estimate the 
control 
variates 


from the pricing equation (2.12). In the second step, I implement
non-linear least squares to match the predicted choice probabilities given by (2.15) with the
observed market shares, treating the control variates as additional regressors.
 
Following the discussion in section 2.3.1.2, let


contain all observed characteristics
except price for product j, and construct the instrument


as:
 

where 


is the set of products produced by the same firm that produces j. Then we
define
 

and run OLS of 


on 


to get the residuals 


.
 
The control function is specified as
 

Define 
 

and 
 

The first-step estimation ends with estimates of the control variates


,
 
and the heterogeneity is estimated by the control function as:
 

The second step is non-linear least squares. Define a vector

 
to contain all the
parameters that need to be estimated:
 

Since (2.15) is an integral, the choice probability can be obtained via simulation:
 

(2.20)
 
where 


as defined in (2.3), 


is a set of random
draws from the standard normal distribution.


is the realization probability of choice set


as defined in (2.16).


is the control function estimate of product heterogeneity from the first step,
as defined in (2.19).
 
Following the idea of matching the predicted choice probabilities with the observed
market shares, the least squares estimator can be obtained as:
 

where 


is the market share observed from the data, 


is the predicted
choice probability defined jointly by equations (2.20), (2.3), (2.16) and (2.19).
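
The second step can be sketched as follows: simulate the mixture choice probabilities by averaging logit probabilities over random-coefficient draws and over choice set types, then evaluate the sum of squared differences from observed shares. This is a simplified sketch, not the exact specification of (2.20): it assumes a single random coefficient, an outside option with mean utility normalized to zero, and hypothetical names (mixture_shares, nls_objective, delta, sigma).

```python
import math
import random

def mixture_shares(delta, x, sigma, choice_sets, probs, draws):
    """Simulated market shares of inside products in one market.
    delta: mean utilities (control-function term assumed already included)
    x, sigma: attribute with a random coefficient and its standard deviation
    choice_sets: one list of product indices per choice set type
    probs: realization probability of each choice set type
    draws: standard normal draws for the random coefficient"""
    J = len(delta)
    shares = [0.0] * J
    for nu in draws:
        ev = [math.exp(delta[j] + sigma * nu * x[j]) for j in range(J)]
        for cs, pr in zip(choice_sets, probs):
            denom = 1.0 + sum(ev[k] for k in cs)  # 1.0 is the outside option
            for j in cs:
                shares[j] += pr * ev[j] / denom / len(draws)
    return shares

def nls_objective(observed, predicted):
    """Sum of squared differences between observed and predicted shares."""
    return sum((s - p) ** 2 for s, p in zip(observed, predicted))

# Hypothetical example: three products, three choice set types.
rng = random.Random(0)
draws = [rng.gauss(0.0, 1.0) for _ in range(200)]
example = mixture_shares(
    [0.5, 0.2, -0.1], [0.3, 0.6, 0.9], 0.5,
    [[0, 1, 2], [1, 2], [0]], [0.6, 0.3, 0.1], draws,
)
```

In an actual estimation the objective would be summed over all markets and minimized over the preference parameters and the (transformed) choice set probabilities with a numerical optimizer, holding the simulation draws fixed across iterations.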
 
Remark: In certain situations, we want to divide all possible values of a choice set
determinant attribute into several groups to reduce the total number of choice set types.

the distribution probabilities. This is attainable with the mixture approach. An example
will be given in Chapter 3.
 
2.4 Monte Carlo Simulations
 
This section introduces the designs of the Monte Carlo experiments and presents the
simulation results. The data set is generated in several steps. First, the market
of differentiated goods is constructed. Then consumers with preferences and choice sets are
defined. Finally, each consumer chooses the product that gives the maximal utility level. The choices
of consumers are then aggregated into industry-level sales data, on which the mixture
estimation can be implemented.
 

Assume the market includes J differentiated products and lasts for T periods. Let


be the product space, where zero represents the outside option. Assume the
unobserved product heterogeneity to be


.
 
There are two product attributes: 
 

,
 
 
(
j
 

J
; 
t
 

T
)
 
where 


; The product heterogeneity 


participates in the determination 
of 


,
 

is the mean value of attribute 


for product 
j
 
across periods, 


, is the time shock of 


for product 
j
 
at time 
t
.
 

, is con-
stant for product 
j
 
across periods.
 
I
 
make 


m
ake 


the 
choice set determinant attribute.
 

instance, fat content for the milk market. Here the 


is a continuous variable, but nothing changes
if it is a discrete variable, e.g. an indicator variable. Also, in practice
it is fine for


to vary over time.
 
Assume there are I consumers, who can be divided into three types in terms of choice
sets. A percentage of


consumers choose among products with all possible
values of


(all products in the market). Another percentage of


consumers only consider 
products with 


. The remaining percentage of


)
 
consumers only 
 
consider products with 


.
 
Corresponding choice sets are denoted as 


.
 
Here I set the cutoff point to be 0.5, the mean of 


so 


have the 
equal 

 
The utility to consumer 
i
 
from choosing an inside product 
j
 
at time 
t
 
is
 

The utility from the outside option is
 

where


is the random part of the coefficient of 


for consumer 
i
,
 

;
 

is an i.i.d. stochastic term following the Type-I extreme value distribution
across consumers, products (including the outside option) and time periods. Each consumer
chooses the option that yields the highest utility. Notice that

 
is a parameter that affects
the probability of choosing the outside option. A higher

 
means higher utilities for inside
goods and yields a lower market share for the outside option.
 
Denote consumer 

choice set as 


, the choice indicator of whether 
consumer 
i 
chooses product 
j
 
at time 
t
 
is 
 

Then the market share of product 
j 
at time 
t 
is 
 

In this simulation study, I set 


.
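
The data-generating steps above can be sketched as a small simulation for a single period. This is a minimal sketch under assumptions that are not taken from the text: the utility is linear with hypothetical coefficients beta, the random coefficient is placed on the first attribute, the second consumer type is assumed to consider products at or above the cutoff and the third type those below it, and the outside option has mean utility zero.

```python
import math
import random

def gumbel(rng):
    """One Type-I extreme value draw by inverse transform."""
    u = rng.random()
    while u == 0.0:          # guard against log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def simulate_shares(x1, x2, beta, sigma, type_probs, n_consumers, cutoff=0.5, seed=0):
    """Return market shares [inside products..., outside option] for one period."""
    rng = random.Random(seed)
    J = len(x1)
    sets = [
        list(range(J)),                             # type 1: all products
        [j for j in range(J) if x2[j] >= cutoff],   # type 2 (assumed direction)
        [j for j in range(J) if x2[j] < cutoff],    # type 3 (assumed direction)
    ]
    counts = [0] * (J + 1)                          # index J is the outside option
    for _ in range(n_consumers):
        r = rng.choices(range(3), weights=type_probs)[0]
        b1 = beta[1] + sigma * rng.gauss(0.0, 1.0)  # random coefficient on x1
        best, best_u = J, gumbel(rng)               # outside option utility
        for j in sets[r]:
            u = beta[0] + b1 * x1[j] + beta[2] * x2[j] + gumbel(rng)
            if u > best_u:
                best, best_u = j, u
        counts[best] += 1
    return [c / n_consumers for c in counts]

# Hypothetical market with three products.
shares = simulate_shares([0.1, 0.2, 0.3], [0.2, 0.8, 0.4],
                         [0.5, 1.0, 1.0], 0.5, [0.6, 0.2, 0.2], 2000)
```

Repeating the call for t = 1, ..., T with period-specific attributes yields the J x T panel of market shares used in the estimation.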
 
 
 

Here I take the cutoff point of the choice set determinant variable 


, which is 0.5 in
the setup, as already known. So, the structures of the three types of choice sets are
determined prior to the estimation.
 
The estimation follows the two-step method discussed in section 3.2. Since the data
generation process is known, the selection of instrumental variables and control variates can
be simplified. I use a very strong instrument for


:
 
 
Notice that 


is the part of 


excluding the heterogeneity 


. 
In the first step,
run OLS of


on 


and take the
residual as the control variate. Then, in the second
step, run non-linear least squares, treating the residual from the first step as an
additional regressor.
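
The first step amounts to a univariate regression with an intercept. A minimal sketch follows; the function name ols_residuals and the toy numbers are hypothetical, and including an intercept is an assumption.

```python
def ols_residuals(y, z):
    """Residuals from an OLS regression of y on a constant and one regressor z."""
    n = len(y)
    zbar = sum(z) / n
    ybar = sum(y) / n
    szz = sum((zi - zbar) ** 2 for zi in z)
    slope = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y)) / szz
    intercept = ybar - slope * zbar
    return [yi - intercept - slope * zi for yi, zi in zip(y, z)]

# Toy example: endogenous attribute regressed on the instrument.
attribute = [1.0, 2.0, 2.0, 4.0]
instrument = [0.0, 1.0, 2.0, 3.0]
control_variate = ols_residuals(attribute, instrument)
```

By construction the residuals sum to zero and are orthogonal to the instrument, so they capture only the variation in the attribute that the instrument does not explain.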
 
The first set of simulations examines the convergence of the mixture estimator with
various numbers of markets (periods), T. The number of markets directly affects the sample
size. One additional market provides J (number of products) additional observations. In the
simulations the structure of the choice sets is assumed to be known; that is, we
know the choice set determinant variable is


and the cutoff point is 0.5. The number of
products (J) is fixed at 20. The true parameter values are


.
 
 
The results are presented in Table 1. The estimators given by the mixture estimation 


rage of 

 
standard deviation. 
 

are close to the true parameter values even with a
relatively small sample size. Consider the first case, T = 5: there are only


observations, while the biases of all mixture estimators except 

 
are
less than or around 0.01. In general, the biases become smaller as T increases. There are a few
counterexamples in which the biases go up as the sample size increases, but the deviations are
so small that the estimators can be used to calculate market shares and elasticities without
concern. The standard deviations behave as expected: a larger sample size gives a
smaller standard deviation. The convergence rate of the mixture estimation is satisfactory:
the standard deviation of each mixture estimator except

 
decreases by around 50%
as the number of market periods increases from 5 to 25. The estimation result for

 
is not as desirable 
as other parameters. Altho


has a very small bias on average,
its standard deviation is quite large and goes down slowly as the sample size increases.
When T = 50, the standard deviation of


is still as large as 0.2477, while the true value
of parameter

 
is 0.5. On the other hand, the simple logit estimation behaves much worse 

set heterogeneity.
 
The adjusted R-squared (labeled as


) is also listed in the table. Note


can still give some sense of how well
the estimators predict the market shares. We can see that the mixture
estimation has a much higher and more stable


in all experiments, supporting its advantage
over the simple logit.
 
In the next step, I examine the performance of the estimators by varying the size of the
choice set, which is controlled by the number of products (J) in the market. The other
parameters are fixed at


.
 
The results are presented in Table 2. As the number of products increases, the sample
size (number of observations) increases, as does the size of the choice set. Meanwhile,
as shown in Table 2, the probability of the outside option decreases. The reason is that

chosen. The results show that as the number of products goes up, the average biases and
standard deviations of all preference estimators decrease, while the choice set probability
distribution estimators perform slightly worse. The reason might be that two
forces interact as the number of products increases. On the one hand, the
increasing sample size leads to more precise estimators. On the other hand, the structures
of the choice sets become more complicated, which makes accurate estimation harder.
The choice set probability distribution estimators are more affected by the latter force.
Again, the mixture estimation clearly outperforms the simple logit for
every choice of the number of products.
 
2.5 Empirical Application
 
This section describes how the mixture approach is applied to estimate the demand

the data source, the IRI marketing data set, and then illustrate in detail the empirical
applications to the markets of milk, potato chips and hotdogs.
 
 

The IRI marketing data set
14
 
contains store level and household panel data on products
in 17 food and beverage categories and 13 non-food categories over the years 2001-2012.
While the data set covers 49 markets all over the U.S., I focus on two cities, Eau Claire, WI
and Pittsfield, MA, for which the individual level data are available. The household panel
data are constructed from surveys taken regularly by the households who sign contracts
with the IRI company in these two cities. They contain the households' purchase histories,
specifically the timing, store and products bought on every shopping trip to about 80 percent of
all the local stores. (The other 20 percent of stores are not covered by the IRI marketing
data set.)
 
The store level data were collected weekly on quantity sold, price and promotion
information for each product. In addition, for each category, there is a description file
recording the key attributes of all products. Table 3 and Table 4 list some general
information for a subset of 9 food categories which are included in the IRI data set. Each product
has a unique UPC number.
 
By store/week/UPC combination, the above three data sources can be merged into one 
data set that includes quantity sold
, price, promotion information and key attributes for all 
products in each store and each week. This is the aggregate-level data set on which the
mixture estimation is implemented. More details are provided in the following subsections.
 
The data set also has information on demographic characteristics for each household
in the panel

teristics and provides a comparison with the U.S. Census data. We can see that the sample
                                        
          
14 See Bronnenberg, Kruger, and Mela (2008) for an overview of the IRI data set.
 
 
population is more urban, less diverse, slightly more educated and wealthier than the U.S.
national average. A significant difference between the sample and the national average is

sidering the fact that the corresponding IRI panelists generally do the shopping for their whole
households, we can still say that the sample of panelists is a good representation of the
population of market customers.
 

The strategy for conducting the empirical application is as follows. The first step is to
analyze the patterns of consumers' purchase behavior using the household panel data. Then,
based on the purchase patterns found in the first step, determine the product attributes
that enter the choice set formation process. For instance, if we find that a significant
percentage of consumers only purchase fat free milk, it would be reasonable to assume that
fat content is an attribute that determines the choice set types. The next step is to apply
the mixture approach to the store level data to jointly estimate the choice set distribution
and preference parameters. The markets on which the empirical application focuses are
milk, potato chips and hotdogs.
 
To accommodate the requirement of the discrete choice model that each consumer
makes only one purchase decision in a given period, assume that a consumer visits a store once
per week and purchases at most one unit of product in each of the three markets (milk,
potato chips, hotdogs). The evidence from the household panel data supporting these
assumptions is reported in the subsections.
 
 
 

2.5.3.1 Market of Milk
 
Consider the market of milk in year 2011. I first analyze consumer purchasing
behavior using the individual level data. Table 2.6 presents some summary statistics.
 
For the year 2011, the individual level data set has 92,871 panelists who have 129,145
single shopping records. On average, each consumer visited a store 1.39 times per week,
and purchased 1.52 units of milk per shopping trip. This shopping pattern suggests the
assumptions of the discrete choice model are acceptable.
 
To analyze consumers' purchasing behavior related to product attributes, I use the
records for which the individual level data and store level data can be perfectly matched, and
only consider consumers who had shopping records for more than 5 weeks in the year. This
rules out consumers who were unlikely to be in the market for milk, leaving 1219
consumers. The average number of different products ever purchased by each consumer
was 7.4. Looking at the fat content attribute, 5.41% of consumers only purchased fat
free milk, and for 15.34% of consumers, fat free milk made up more than 80% of their total
milk purchasing records. By contrast, 34.86% of consumers never purchased fat free milk,
and for 52.17% of consumers non-fat-free milk made up more than 80% of their total milk
purchasing records. The last row of Table 2.6 considers the attribute of whether the product
is on promotion, which means there is a price reduction flag for the product (when the
temporary price reduction is 5% or greater).
 
The above evidence makes it reasonable to assume that there is a group of consumers
who only consider fat free (or non-fat-free) milk while making purchasing decisions. I then
assume the distribution of the types of choice sets in the way that has been discussed in
 

choice set formation for the milk market, since merely 1.39% of consumers only purchased
on-promotion products.
 
Then I prepare the store level data for the estimation. The sample consists of milk sales
records of the stores which are matched with the individual level data in year 2011. Define
the outside goods to be the products whose average weekly sales were less than 25
units. Then there are 106 different products left as inside goods; they made up 94% of
the total milk sales and had 3,364 product-level weekly sales records. Table 2.7 shows the
summary statistics of the products that have the greatest average weekly sales.
 

ole 

is an indicator of whether there are advertisements, coupons or rebates for the product. 

than the 

 
Let i and t denote the consumer and week, respectively. Assume the utility to consumer
i of buying inside product j in week t to be
 

where


denotes the per-gallon price;


measures the package volume in gallons;


,
 

and 


are indicators for in-store display, advertising and price reduction flag;


and 


are indicators of low-fat milk (fat content
between 0.5% and 5%) and whole fat milk;


is the unobserved product heterogeneity; 
 

to 


are random coefficients with Normal distribution; 


is an i.i.d error with 
Type I extreme value distribution.
 
The characteristics which are chosen to construct the instrument variables are 


and the constant term. The instrumental variables and the control function
are formed in the way described in section 2.3.3.2, as in equations (2.17) and (2.18).
 
Based on the above setup, the mixture estimation approach is applied to the store level
data. The estimation results are reported in Table 2.8, which lists the estimators of the means
of preference coefficients and choice set probability distributions, the estimators of the
standard deviations of preference coefficients, the coefficients in the control function, and
other statistics of the estimation methods. For comparison, the results from simple logit

as well. The BLP estimation approach was proposed by BLP (1995); it allows for random

nobserved choice sets.
 

approach gives a much greater (more than twice in absolute value) mean estimator for the
price coefficient than the simple logit does. Besides, the
mixture estimation gives signifi-


-

mators for the coefficients of fat content indicators 
( 


and 


)
 
become insignificant when moving from the simple logit to the Mixture estimation.
The reason is that the fat content is a choice set determinant attribute in the market of milk. 
It can
 

-
free 

 
 

coefficients while not dealing with choice set heterogeneity will worsen the estimation
results. While this statement needs to be checked further, the BLP
approach does perform badly in the other two empirical studies.
 
The lower part of Table 2.7 lists the estimated probability distribution of the choice
set types. The estimated probabilities of a consumer purchasing only from fat-free and
non-fat-

purchasing data (in Table 2.6): for 15.34% of consumers, fat free milk made up more than
80% of their total milk purchasing records; 34.86% of consumers purchased only
non-fat-free milk according to their milk purchasing records. We can see that the estimated
probability distribution of the choice sets fits the empirical data well.
 
Based on the estimators, I calculate the price elasticities. The lower part of Table 2.8
reports the mean and standard error of the estimated own-price elasticities. We can see that the
elasticities predicted by the Mixture model are much more elastic on average and have
greater variation than those predicted by the simple logit model. This is not surprising since
the estimator of the price coefficient given by Mixture estimation is more than twice
as large as that given by the simple logit. The result is consistent with the intuition that
ignoring the unobserved product heterogeneity (which is positively correlated with price)
would underestimate (in absolute value) the price elasticity. The result given by BLP lies
in between the results of simple logit and Mixture estimation.
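
For the plain logit benchmark, the link between the price coefficient and the own-price elasticity is the standard closed form alpha * p_j * (1 - s_j). The sketch below covers only that benchmark case; elasticities under the mixture model require differentiating the mixture share and are not reproduced here. The function name and the numbers are hypothetical.

```python
def logit_own_price_elasticity(alpha, price, share):
    """Own-price elasticity in the simple logit model: alpha * p_j * (1 - s_j)."""
    return alpha * price * (1.0 - share)

# Hypothetical values: doubling the price coefficient doubles the elasticity.
e_small = logit_own_price_elasticity(-1.0, 1.0, 0.25)
e_large = logit_own_price_elasticity(-2.0, 1.0, 0.25)
```

The formula shows why a price coefficient that is understated in absolute value translates directly into demand that appears less elastic.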
 
The last row of Table 2.9 lists the adjusted R-squared. The Mixture estimation has a
much greater adjusted R-squared (.570) than the simple logit (.418) and the BLP (.470),
suggesting that the Mixture model does a better job of explaining the data by modeling
the choice set heterogeneity.
 
2.5.3.2 Market of Potato Chips
 
In this application I consider the market of potato chips for the years 2010-2011. For this

-size combination, which means that potato chips of
different flavors are treated as the same product as long as they are of the same brand and
package size. This simplification of excluding the flavor attribute is made because of a large

trouble since the prices and marketing activities (promotion, display, advertising) of UPCs
under the same brand-size combination usually move together. The UPC level variables

is the sales-weighted average of the p

(advertising / promotion) is an indicator variable that equals 1 if any UPC within the
product is on display (advertising / promotion).
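
The aggregation from UPC level to brand-size product level can be sketched as follows. The record layout (dictionary keys units, price, display) and the function name aggregate_product are hypothetical and do not reflect the actual IRI file format.

```python
def aggregate_product(upcs):
    """Aggregate the UPC records of one brand-size product in one store-week.
    Each record is a dict with hypothetical keys 'units', 'price', 'display'."""
    total_units = sum(u["units"] for u in upcs)
    # Sales-weighted average price across UPCs.
    price = sum(u["units"] * u["price"] for u in upcs) / total_units
    # Indicator equals 1 if any UPC within the product is on display.
    display = int(any(u["display"] for u in upcs))
    return {"units": total_units, "price": price, "display": display}

# Two flavors of the same brand and package size.
product = aggregate_product([
    {"units": 3, "price": 2.0, "display": False},
    {"units": 1, "price": 4.0, "display": True},
])
```

The advertising and promotion indicators would be built in the same way as the display indicator.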
 
Table 2.9 presents some summary statistics of the individual level purchasing records.
In the years 2010-2011, there were 62,501 panelists who made 86,722 shopping trips in total.
On average, each panelist made 1.38 shopping trips per week and purchased 1.35 packs of
potato chips per trip. This shopping pattern broadly supports the assumptions of the discrete
choice model that each consumer makes one shopping decision in one period and purchases at most
one unit of product.
 
As in the milk market, I keep the shopping records of consumers who purchased
potato chips in more than 5 weeks in the two years to analyze their purchasing behavior
related to product attributes. There were 737 consumers remaining in the sample. For these
consumers who purchased potato chips frequently, the average number of different
potato chips products purchased by each of them was 8.5. From the lower part of Table 2.10


products o

90% of their total potato chips purchasing records. If we lower the threshold from 90%
to 80%, the percentage of consumers rises to 60.24%. The corresponding statistics for the
other two attributes
---
 

2.
9


y and 

choice set determinant attribute. The other two attributes discussed above may also be the 

heir corresponding 

(18.32%). As discussed in the previous section, there is a tradeoff between identification and
the risk of misspecification.
 
The next step is to prepare the store level sales data for estimation. The sample consists
of the sales records of potato chips from the store that matches the individual level data in
the years 2010-2011. Define the products with average weekly sales of less than 10 units to be the
outside goods. Then the inside goods consisted of 59 different products and made up 97.6%
of the total potato chips sales. There were 3,503 product-level weekly sales records for
estimation.

Table 2.10 lists the summary statistics of the products with the greatest average weekly sales.
 
 
We can see that among the top 7 best sellers there were only two brands
---


nt marketing
activities in the market. For all the products listed in Table 2.10, the probabilities
of being on display and on promotion were more than 0.4. For some products the
probability of being on display (/on promotion) was even greater than 0.9 (/0.8). The
probabilities of being advertised were not as high as those of the other two marketing activities
but still much higher than in the market of milk.
 
The utility to consumer i of buying inside product j in week t is assumed to be
 

where


denotes the per-12oz price;


measures the package volume in ounces;


,
 

and 


are indicators for in-store display, advertising and price re-


is the unobserved product heterogeneity; 


to 


are random coefficients with Normal distribution;


is an i.i.d error with Type I extreme value 
distribution.
 
The characteristics which are chosen to construct the instrument variables are 


and the constant term. I construct the instrumental variables and
the control function in the way described in section 2.3.3.2, as in equations (2.17) and (2.18).
 
Table 2.11 reports the estimation results. The estimators given by the mixture approach

 

looks problematic: it gives a significantly positive estimator for the mean coefficient of
Price (.542). At the same time, the BLP estimator of the coefficient of Promotion has an
extraordinarily large mean (2.557) and variation (2.843). The adjusted R-

BLP estimation is merely .067. The above evidence supports the conjecture that allowing
for random coefficients while not dealing with choice set heterogeneity would worsen the
estimation results.
 

logit es

estimators given by both the mixture and simple logit estimations are significant and of the
expected signs. The mixture estimation gives an insignificant estimator for the coefficie


on the probability distribution of the choice sets. The insignificance with respect to the
preference parameter can be interpreted as follows: for those consumers who considered all potato

significantly increase their utility level of purchasing the product. Another point
worth noticing is that the mixture estimation gives a much greater estimator for the coeffi-

0.662. This indicates that the simple logit would largely underestimate the effect of dis-


to other brands.
 
 
The estimated probability distribution of the choice sets is listed in the lower part of
Table 2.12. According to the estimation, 71.0% of consumers considered all potato chip
products, and another 27.7% of consumers only considered the potato chip products which were
on promotion. The last type of choice set, containing only products which were not on
promotion, had an insignificant estimated realization probability, which is consistent with
intuition. The lower part of Table 2.13 reports the estimated price elasticities and
adjusted R-squared. On average the mixture estimation gives a more elastic result. The mean
own-price elasticity given by the mixture estimation is -1.320; for the simple logit it is
-1.158. Comparing the adjusted R-
e estimation fits the data 
better.
 
2.5.3.3 Market of Hotdogs
 
This subsection analyzes the market of hotdogs in year 2011. As in the previous two
markets, I first analyze consumer purchasing behavior using the individual level data.
Part of the results are listed in Table 2.12.

For the year 2011, there were 26,287 panelists who had 28,841 single shopping records.
On average each consumer visited a store 1.1 times per week and purchased 1.6 units of
hotdog products per shopping trip. Such a shopping pattern is broadly in accordance with the
assumptions of the discrete choice model.
 
To analyze consumers' shopping behavior related to product attributes, I only keep the
shopping records of panelists who purchased hotdogs in no fewer than 3 weeks in the year,
since the other panelists were not likely to be in the market for hotdogs. For the remaining 766
panelists, the average number of different hotdog products purchased by each of them
was 4.04. From the lower part of Table 2.12 we can see that, similar to the market of
potato
 

for the market of hotdogs. 19.58% of consumers only purchased hotdog products which were

ore than 


vacuum-packaged. On average, vacuum-packaged products made up 25% of


minant attribute since only 2.48% (/ 3.13%) of consumers only purchased the hotdog
products which were on display (/ vacuum-packaged).
 
In the next step I prepare the store-level data for the estimation. The sample consists of sales records from the store-level data that were matched with the individual-level data in 2011. There were 96 different products in the market at some point in 2011. Ordering the products by average weekly sales from the greatest to the smallest, the products ranked below 65th are defined as the outside goods. The remaining inside goods made up 96.1% of total sales, and there were 2,854 product-level weekly sales records. Table 2.13 lists the summary statistics of some of the top-selling products.
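The ranking rule used to separate inside goods from outside goods can be sketched as follows. This is only an illustrative sketch: the record layout `(product_id, week, units_sold)`, the function name `split_inside_outside`, and the choice to average over the weeks in which a product is observed are assumptions made here, not details taken from the IRI data or from the estimation code.

```python
from collections import defaultdict


def split_inside_outside(records, n_inside=65):
    """Rank products by average weekly sales and keep the top n_inside as inside goods.

    records: iterable of (product_id, week, units_sold) tuples (hypothetical layout).
    Returns (inside_ids, outside_ids, inside_share_of_total_sales).
    """
    totals = defaultdict(float)   # total units sold per product
    weeks = defaultdict(set)      # weeks in which each product is observed
    for product, week, units in records:
        totals[product] += units
        weeks[product].add(week)

    # Assumption: average weekly sales are taken over observed weeks only.
    avg = {p: totals[p] / len(weeks[p]) for p in totals}

    # Sort from greatest to smallest; ties are broken by product id.
    ranked = sorted(avg, key=lambda p: (-avg[p], p))
    inside, outside = ranked[:n_inside], ranked[n_inside:]

    grand_total = sum(totals.values())
    inside_share = sum(totals[p] for p in inside) / grand_total if grand_total else 0.0
    return inside, outside, inside_share


# Toy usage with three hypothetical products and a cutoff of two inside goods.
toy = [("A", 1, 10), ("A", 2, 14), ("B", 1, 5), ("C", 1, 1), ("C", 2, 1)]
inside, outside, share = split_inside_outside(toy, n_inside=2)
```

With the hotdog data, `n_inside=65` would correspond to the cutoff described above, and `share` would play the role of the 96.1% figure.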
 

best seller in the market also had the highest frequency of being on display, on promotion and advertised. Among the top 8 best sellers there were 2 products that were vacuum-packaged. In the lower part of Table 2.13, comparing the average values of different attributes among all products, top-20-sales products and top-10-sales products, we can find a positive



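The utility to consumer i of buying inside product j in week t is assumed to be

$$u_{ijt} = \beta_0 + \beta_{1i}\,Price_{jt} + \beta_{2i}\,Vol_{j} + \beta_{3i}\,Display_{jt} + \beta_{4i}\,Advertising_{jt} + \beta_{5i}\,Promotion_{jt} + \beta_{6i}\,Vacuum_{j} + \xi_{jt} + \varepsilon_{ijt}$$

where $Price_{jt}$ denotes the per-16oz price; $Vol_{j}$ measures the package volume in ounces; $Display_{jt}$, $Advertising_{jt}$ and $Promotion_{jt}$ are indicators for in-store display, advertising and the price reduction flag; $Vacuum_{j}$ is an indicator of whether the product is vacuum-packaged. $\xi_{jt}$ is the unobserved product heterogeneity; $\beta_{1i}$ to $\beta_{6i}$ are random coefficients with Normal distribution; $\varepsilon_{ijt}$ is an i.i.d. error with Type I extreme value distribution.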
 
The characteristics which are chosen to construct the instrument variables are 


and the constant term. I construct the instrument variables and the control function in the way described in section 3.3.2, following equations (2.17) and (2.18).
 
The estimation results are listed in Table 2.14. As before, the 3 columns report, respectively, the simple logit estimation, the BLP estimation and the Mixture estimation. Recall that in the previous empirical application to the potato chips market, the BLP estimation was highly questionable because it gave a significantly positive estimate of the mean price coefficient. In the current case, the mean price coefficient estimated by BLP is -.085, much smaller in absolute value than those given by the simple logit (-.259) and the Mixture estimation (-.497). Intuitively, the BLP estimation result again seems unreliable. The Mixture estimation gives the largest (in absolute value) estimate of the mean price coefficient, suggesting that the other estimations would underestimate the price elasticity in the hotdog market, which is confirmed in the lower part of Table 2.14.
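To make the link between the price coefficient and the elasticity concrete, the sketch below uses the textbook own-price elasticity of the simple logit model, alpha * p_j * (1 - s_j). It covers only the simple logit column; the BLP and Mixture elasticities additionally require integrating over the random coefficients and the choice sets, which is not attempted here. The function names and the market share used in the example are hypothetical.

```python
def logit_own_price_elasticity(alpha, price, share):
    """Own-price elasticity of product j in a simple logit model.

    alpha: price coefficient, price: p_j, share: market share s_j.
    """
    return alpha * price * (1.0 - share)


def mean_own_price_elasticity(alpha, prices, shares):
    """Unweighted mean of the own-price elasticities over a list of products."""
    values = [logit_own_price_elasticity(alpha, p, s) for p, s in zip(prices, shares)]
    return sum(values) / len(values)


# Illustration: logit price coefficient (-.259) and a per-16oz price of 2.59,
# combined with a hypothetical market share of 5%.
example = logit_own_price_elasticity(-0.259, 2.59, 0.05)
```

Holding prices and shares fixed, a price coefficient that is larger in absolute value translates one-for-one into a more elastic demand, which is the mechanism behind the comparison in the paragraph above.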
 
Similar to the market of potato chips, the Mixture estimation gives insignificant esti-


the
 
key factor in the choice set formation process. The significance of the 

the insignificance with respect to the preference parameter can be i
nterpreted as, for those 
consumers who considered all hotdog products in the market no matter they were on pro-

the product. 
 
The estimators of both the mean coeffic

negative in both the simple logit and Mixture estimations, suggesting that on average consumers prefer small-sized and non-vacuum-packaged products.
 
The middle part of Table 2.14 reports the estimated probability distribution of the choice sets for the mixture estimation. According to the results, 72.2% of consumers considered all hotdog products, while another 26.5% of consumers considered only the hotdog products that were on promotion. The last type of choice set, which contains only products that were not on promotion, had a tiny estimated realization probability of 1.3%, which is consistent with intuition. The estimated probabilities roughly accord with the shopping pattern indicated by the individual-level data. Looking back at Table 2.12: 19.58% of consumers

products made up more than 80% of their total hotdogs purchasing records. The adjusted 
R
-

2.14. The mixture estimation has a higher adjusted R-squared, suggesting that the mixture model fits the data better.
 
2.6 Concluding Remarks
 
In this paper I propose a two-step mixture approach to estimate discrete choice models with unobserved choice sets. The approach implements a control function method to deal with product heterogeneity and a choice set formation model to deal with the unobserved choice sets. The empirical strategy is to first assume the choice set formation process and then jointly estimate the preference parameters and the probability distribution of choice sets. This estimation approach is designed to be applied to aggregate sales data. When individual-level data on consumer purchasing history are available, the purchasing patterns of consumers in the sample can provide guidance on the choice set formation process, and the validity of the mixture model can be examined by comparing the estimated choice set distribution with the actual purchasing patterns of consumers. The effectiveness of the Mixture approach is demonstrated via Monte Carlo experiments. It is then applied to IRI marketing data in three markets (milk, potato chips and hotdogs) and is shown to be useful in correcting the biases caused by assuming a universal choice set.
 
Currently I have implemented the mixture estimation on aggregate data, but the idea can be extended to individual-level data as well, which is an ongoing project of my research. One of the empirical difficulties is that the individual-level data and the aggregate sales data are usually not well matched, since they are collected separately.
 
In addition, the traditional BLP estimation approach performs questionably in the empirical application part of this paper, which calls for further investigation.
 
 
APPENDIX

APPENDIX FOR CHAPTER 2
 
 
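Monte Carlo Results I: Varying Number of Markets

| Parameter | Statistic | T=5 mixture | T=5 logit | T=25 mixture | T=25 logit | T=50 mixture | T=50 logit |
|---|---|---|---|---|---|---|---|
| | Bias | .0616 | -1.2251 | .0158 | -1.3063 | .0160 | -1.2651 |
| | Sd.D | .3514 | .3597 | .3013 | .3348 | .2477 | .3503 |
| | Bias | -.0084 | .4973 | -.0104 | .5090 | -.0099 | .5025 |
| | Sd.D | .1107 | .0880 | .0858 | .0828 | .0638 | .0810 |
| | Bias | -.0034 | -1.0167 | -.0029 | -.9628 | .0011 | -.9963 |
| | Sd.D | .0947 | .6330 | .0407 | .5370 | .0281 | .5554 |
| | Bias | -.0001 | | -.0010 | | .0017 | |
| | Sd.D | .0424 | | .0188 | | .0125 | |
| | Bias | -.0154 | | -.0119 | | -.0063 | |
| | Sd.D | .0691 | | .0355 | | .0330 | |
| | Bias | .0029 | | .0039 | | .0023 | |
| | Sd.D | .0246 | | .0128 | | .0126 | |
| | Bias | .0125 | | .0080 | | .0040 | |
| | Sd.D | .0493 | | .0268 | | .0240 | |
| | Ave | .9947 | .4428 | .9949 | .4670 | .9949 | .4831 |
| | Sd.D | .0019 | .1617 | .0010 | .1273 | .0007 | .1151 |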
 

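Monte Carlo Results II: Varying Size of Choice Set

| Parameter | Statistic | J=20 mixture | J=20 logit | J=50 mixture | J=50 logit |
|---|---|---|---|---|---|
| | Bias | .0158 | -1.3063 | .0302 | -1.2707 |
| | Sd.D | .3013 | .3348 | .1880 | .2130 |
| | Bias | -.0104 | .5090 | -.0055 | .5071 |
| | Sd.D | .0858 | .0828 | .0460 | .0539 |
| | Bias | -.0029 | -.9628 | .0011 | -1.0495 |
| | Sd.D | .0407 | .5370 | .0365 | .3751 |
| | Bias | -.0010 | | .0009 | |
| | Sd.D | .0188 | | .0133 | |
| | Bias | -.0119 | | -.0155 | |
| | Sd.D | .0355 | | .0481 | |
| | Bias | .0039 | | .0058 | |
| | Sd.D | .0128 | | .0150 | |
| | Bias | .0080 | | .0098 | |
| | Sd.D | .0268 | | .0361 | |
| | Ave | .9949 | .4670 | .9900 | .4353 |
| | Sd.D | .0010 | .1273 | .0014 | .0740 |
| | Ave. | .1256 | | .0521 | |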
 
Table 2.3^15: Product Statistics

| Category | Unique Parent Companies | Unique Vendors | Unique Brands | Unique UPCs |
|---|---|---|---|---|
| Carbonated Beverages | 232 | 249 | 538 | 10331 |
| Cold Cereal | 135 | 139 | 671 | 4402 |
| Hot Dogs | 207 | 233 | 367 | 1865 |
| Margarine | 53 | 60 | 113 | 533 |
| Mayonnaise | 124 | 129 | 166 | 744 |
| Milk & Milk Products | 280 | 357 | 636 | 8830 |
| Peanut Butter | 69 | 72 | 97 | 513 |
| Salty Snacks | 949 | 989 | 1762 | 18655 |
| Soup | 188 | 201 | 253 | 4899 |
| Yogurt | 126 | 146 | 331 | 4081 |

^15 The statistics of Tables 2.3 and 2.4 were summarized by Rider (2013).
 
Table 2.4: Product Features

| Category | Feature | Number of UPCs |
|---|---|---|
| Carbonated Beverages | Total UPCs | 10331 |
| | Lower Sugar/Lower Calorie | 27 |
| | Diet/Calorie Free | 2719 |
| Cold Cereal | Total UPCs | 4402 |
| | No Sugar/Low Sugar | 392 |
| | Fiber/Whole Grain Claim | 3532 |
| Hot Dogs | Total UPCs | 1865 |
| | Lower or Reduced Fat/Fat Free | 171 |
| Margarine | Total UPCs | 533 |
| | Low-Cal/Low-Fat/Healthy Oil | 206 |
| Mayonnaise | Total UPCs | 744 |
| | Lower Sugar | 13 |
| | Low-Fat/Fat-Free | 276 |
| Milk & Milk Products | Total UPCs | 8830 |
| | Low-fat/Skim Milk | 5254 |
| | Whole Milk | 1200 |
| Peanut Butter | Total UPCs | 513 |
| | Lower Sugar | 114 |
| | Reduced Sodium/Sodium Free | 83 |
| Salty Snacks | Total UPCs | 18655 |
| | Reduced-Fat/Fat-Free/Light | 3370 |
| | Lower/Reduced Sodium | 478 |
| Yogurt | Total UPCs | 4081 |
| | Low-Fat/Fat-Free | 3549 |
| | Reduced/Low-Calorie | 518 |
 
Table 2.5: Market Demographics

| Variable | IRI Data (%) | US Census (%) |
|---|---|---|
| General Characteristics | | |
| Percent Urban | 85.2 | 80.3 |
| Percent Over 65 Y.O. | 11.9 | 12.4 |
| Percent Female | 51.2 | 50.9 |
| Percent Women with Children | 31.4 | 12.4 |
| Race & Ethnicity | | |
| Percent White | 77.3 | 75.1 |
| Percent Black | 12.7 | 12.3 |
| Percent Foreign Born | 8.5 | 11.1 |
| Percent Foreign Born: Latin America | 3.6 | 5.7 |
| Education | | |
| Percent of Males with HS Diploma | 26.7 | 27.6 |
| Percent of Males with Bachelor's | 18 | 16.1 |
| Percent of Males with Master's | 6.4 | 6 |
| Percent of Females with HS Diploma | 28.9 | 29.6 |
| Percent of Females with Bachelor's | 16.5 | 15 |
| Percent of Females with Master's | 6.1 | 5.8 |
| Income | | |
| Median household income | $44,994 | $41,994 |
| %Households: Less than $10,000 | 8.5 | 9.5 |
| %Households: $10,000-$20,000 | 11.6 | 12.6 |
| %Households: $20,000-$35,000 | 18.8 | 19.4 |
| %Households: $35,000-$50,000 | 16.5 | 16.6 |
| %Households: $50,000-$75,000 | 20.6 | 19.4 |
| %Households: $75,000 or more | 24.1 | 22.5 |
 

Table 2.6: Individual-level Purchasing Behavior --- Market of Milk

| | Obs | Mean (Sd.) | Quantity = 1 (%) | Quantity = 2 (%) | Quantity = 3 (%) | Quantity >= 4 (%) |
|---|---|---|---|---|---|---|
| Number of weekly trips | 92,871^16 | 1.39 (.72) | 70.80 | 22.27 | 4.96 | 1.97 |
| Quantity purchased per trip | 129,145^17 | 1.52 (.85) | 61.94 | 29.08 | 5.71 | 3.27 |
| Different products per consumer | 1,219^18 | 7.4 (3.50) | | | | |

% of consumers, by % of total shopping records:

| | Obs | 100 | >95 | >90 | >80 |
|---|---|---|---|---|---|
| Fat-free | 1,219 | 5.41 | 6.32^19 | 9.84 | 15.34 |
| Non-fat-free | 1,219 | 34.86 | 39.62 | 46.60 | 52.17 |
| On promotion | 1,219 | 1.39 | 1.97 | 4.27 | 10.25 |

^16 This is the number of panelists in the individual level data set.
^17 This is the number of total shopping trips in the individual level data set.
^18 This is the number of consumers for whom the individual level data set and the store level data set can be matched.
^19 For 6.32% of consumers, Fat-free milk products made up more than 95% of their total purchasing records in market of milk. Other elements in the same part of the table are interpreted in a similar way.
 
Table 2.7: Summary Statistics for Selected Products --- Market of Milk

| Brand | Fat content | Vol (gallon) | Ave. sales per week | Ave. Price ($ per 1 gallon) | Ave. Display | Ave. Promotion | Ave. Feature |
|---|---|---|---|---|---|---|---|
| PRV | 1% | 1 | 1,002 | 3.16 | 0.00 | 0.15 | 0.06 |
| KEMPS | fatfree | 1 | 992 | 2.71 | 0.00 | 0.45 | 0.09 |
| KEMPS | 1% | 1 | 956 | 2.42 | 0.00 | 0.25 | 0.13 |
| PRV | 2% | 1 | 949 | 3.35 | 0.00 | 0.17 | 0.06 |
| PRV | fatfree | 1 | 937 | 2.97 | 0.00 | 0.15 | 0.02 |
| KEMPS SELECT | whole | 0.5 | 887 | 4.20 | 0.35 | 0.73 | 0.04 |
| KEMPS SELECT | fatfree | 1 | 797 | 3.45 | 0.00 | 0.04 | 0.00 |
| KEMPS | 2% | 1 | 632 | 2.82 | 0.00 | 0.33 | 0.08 |
| Market average of ALL | | | | 5.52 | 0.02 | 0.13 | 0.02 |
| Market average of Top 20 | | | | 3.53 | 0.05 | 0.18 | 0.03 |
| Market average of Top 10 | | | | 3.39 | 0.05 | 0.23 | 0.04 |
 
Table 2.8: Results: Demand Estimation --- Milk

Standard errors in parentheses.

| Parameter | Variable | Logit | BLP | Mixture |
|---|---|---|---|---|
| Means | Price | -.349 (.012) | -.628 (.031) | -.831 (.158) |
| | Vol | .040 (.079) | -.493 (.182) | .701 (.428) |
| | Display | .527 (.119) | -3.934 (1.707) | .325 (.251) |
| | Advertising | .341 (.129) | -.160 (1.536) | .054 (.372) |
| | Promotion | .261 (.055) | .189 (.077) | .064 (.193) |
| | LowFat | -.365 (.0447) | -.204 (.053) | .022 (.294) |
| | Whole | -.446 (.059) | -.205 (.069) | .217 (.287) |
| | _Cons | -.138 (.111) | 2.670 (.272) | 1.483 (.870) |
| Choice Sets Distribution | Pr(whole set) | | | .549 (.095) |
| | Pr(fatfree) | | | .157 (.042) |
| | Pr(-fatfree) | | | .294 (.092) |
| Std. Deviations | Price | | .000 (.644) | .135 (.162) |
| | Vol | | 2.909 (.355) | .640 (.350) |
| | Display | | 4.898 (1.460) | 1.294 (.419) |
| | Advertising | | .937 (3.331) | .479 (.546) |
| | Promotion | | .000 (6.734) | .931 (.427) |
| Control Function | | -.173 (.135) | -.036 (.085) | -.076 (.082) |
| Price Elast. (%) | Mean | -1.935 | -3.447 | -4.319 |
| | Sd. D. | .693 | 1.263 | 1.413 |
| Adjusted R-squared | | .418 | .470 | .570 |
 
Table 2.9: Individual-level Purchasing Behavior --- Market of Potato Chips^20

| | Obs | Mean (Sd.) | Quantity = 1 (%) | Quantity = 2 (%) | Quantity = 3 (%) | Quantity >= 4 (%) |
|---|---|---|---|---|---|---|
| Number of weekly trips | 62,501 | 1.38 (.70) | 70.02 | 23.67 | 4.55 | 1.76 |
| Quantity purchased per trip | 86,722 | 1.35 (.69) | 71.65 | 24.38 | 1.90 | 2.06 |
| Different products per consumer | 737 | 8.5 (3.93) | | | | |

% of consumers, by % of total shopping records:

| | Obs | 100 | >95 | >90 | >80 |
|---|---|---|---|---|---|
| On Display | 737 | 2.85 | 2.99 | 6.38 | 16.96 |
| On Promotion | 737 | 18.32 | 21.85 | 36.77 | 60.24 |
| LAYS | 737 | 4.88 | 5.56 | 8.28 | 16.01 |

^20 Refer to footnotes 16-19 (for Table 2.6) for the interpretations of some elements in the table.
 
Table 2.10: Summary Statistics for Selected Products --- Market of Potato Chips

| Product | Ave. Sales per week | Ave. Price per 10oz | Ave. Display | Ave. Promotion | Ave. Advertising |
|---|---|---|---|---|---|
| LAYS-10oz | 467.64 | 3.37 | 0.7 | 0.82 | 0.45 |
| LAYS NATURAL-8.5oz | 328.44 | 3.48 | 0.91 | 0.79 | 0.28 |
| LAYS NATURAL-10.5oz | 286.87 | 3.09 | 0.66 | 0.64 | 0.43 |
| OLD DUTCH-10oz | 239.27 | 3.09 | 0.41 | 0.82 | 0.36 |
| OLD DUTCH-8.5oz | 209.21 | 3.4 | 0.39 | 0.8 | 0.12 |
| LAYS KETTLE COOKED-8.5oz | 177.91 | 3.48 | 0.91 | 0.47 | 0.22 |
| WAVY LAYS-10.5oz | 151.99 | 3.08 | 0.41 | 0.75 | 0.15 |
| Market Average of ALL | | 3.31 | 0.44 | 0.64 | 0.12 |
| Market Average of Top 20 | | 2.95 | 0.66 | 0.75 | 0.18 |
| Market Average of Top 10 | | 3.14 | 0.69 | 0.85 | 0.24 |
 
Table 2.11: Results: Demand Estimation --- Potato Chips

Standard errors in parentheses.

| Parameter | Variable | Logit | BLP | Mixture |
|---|---|---|---|---|
| Means | Price | -.347 (.023) | .542 (.098) | -.397 (.255) |
| | Vol | -.067 (.007) | .140 (.051) | -.030 (.068) |
| | Display | .662 (.042) | 1.462 (.380) | 1.207 (.111) |
| | Advertising | .570 (.059) | 1.152 (.202) | .654 (.163) |
| | Promotion | .612 (.044) | 2.557 (.585) | .000 (.003) |
| | Lays | .639 (.043) | 1.041 (.370) | .870 (.201) |
| | _Cons | .333 (.152) | -4.939 (.629) | -.021 (.325) |
| Choice Sets Distribution | Pr(whole set) | | | .710 (.084) |
| | Pr(promotion) | | | .277 (.086) |
| | Pr(-promotion) | | | .013 (.003) |
| Std. Deviations | Price | | .381 (.358) | .001 (.133) |
| | Vol | | .067 (.117) | .000 (.011) |
| | Display | | 1.207 (.780) | .024 (.158) |
| | Advertising | | .237 (5.329) | .033 (.235) |
| | Promotion | | 2.843 (.655) | .003 (.027) |
| | Lays | | .256 (7.282) | .000 (.110) |
| Control Function | | -.040 (.398) | .080 (.149) | .201 (.148) |
| Price Elast. (%) | Mean | -1.158 | 1.929 | -1.320 |
| | Sd. D. | .443 | 1.117 | .506 |
| Adjusted R-squared | | .266 | .067 | .400 |
 
Table 2.12: Individual-level Purchasing Behavior --- Market of Hotdogs^21

| | Obs | Mean (Sd.) | Quantity = 1 (%) | Quantity = 2 (%) | Quantity = 3 (%) | Quantity >= 4 (%) |
|---|---|---|---|---|---|---|
| Number of weekly trips | 26,287 | 1.10 (.34) | 91.34 | 7.79 | .72 | .15 |
| Quantity purchased per trip | 28,841 | 1.60 (.98) | 57.63 | 32.86 | 4.35 | 5.17 |
| Different products per consumer | 766 | 4.04 (2.21) | | | | |

% of consumers, by % of total shopping records:

| | Obs | 100 | >95 | >90 | >80 |
|---|---|---|---|---|---|
| On Display | 766 | 2.48 | 2.48 | 2.48 | 4.96 |
| On Promotion | 766 | 19.58 | 19.71 | 22.06 | 35.77 |
| Vacuum | 766 | .65 | .65 | .78 | 1.17 |

^21 Refer to footnotes 16-19 (for Table 2.6) for the interpretations of some elements in the table.
 
Table 2.13: Summary Statistics for Selected Products --- Market of Hotdogs

| Brand | Volume (OZ) | Product type | Ave. Sales per week | Ave. Price ($ per 16oz) | Vacuum | Ave. Display | Ave. Promotion | Ave. Advertising |
|---|---|---|---|---|---|---|---|---|
| OSCAR MAYER | 16 | WIENER | 244.21 | 2.59 | 0 | 0.42 | 0.81 | 0.15 |
| SCHWEIGERT | 12 | FRANK | 155.87 | 1.57 | 0 | 0.12 | 0.33 | 0.04 |
| OSCAR MAYER | 16 | WIENER | 129.85 | 2.75 | 0 | 0.33 | 0.69 | 0.15 |
| OSCAR MAYER | 16 | WIENER | 116.46 | 2.76 | 1 | 0.21 | 0.69 | 0.13 |
| OSCAR MAYER | 16 | FRANK | 109.88 | 2.53 | 0 | 0.38 | 0.5 | 0 |
| BAR S | 12 | FRANK | 92.11 | 1.4 | 0 | 0.09 | 0.36 | 0.02 |
| JOHN MORRELL | 12 | FRANK | 73.11 | 1.66 | 0 | 0.11 | 0.4 | 0.09 |
| OLD WISCONSIN | 24 | WIENER | 71.96 | 4.18 | 1 | 0.13 | 0.31 | 0 |
| Market Average of ALL | | | | 3.65 | 0.36 | 0.08 | 0.31 | 0.04 |
| Market Average of Top 20 | | | | 2.66 | 0.19 | 0.15 | 0.46 | 0.09 |
| Market Average of Top 10 | | | | 2.46 | 0.25 | 0.2 | 0.53 | 0.09 |
 

Table 2.14: Results: Demand Estimation --- Hotdogs

Standard errors in parentheses.

| Parameter | Variable | Logit | BLP | Mixture |
|---|---|---|---|---|
| Means | Price | -.259 (.013) | -.085 (.073) | -.497 (.172) |
| | Vol | -.025 (.002) | -.017 (.008) | -.025 (.017) |
| | Display | 1.040 (.074) | 1.132 (.084) | 1.074 (.210) |
| | Advertising | .474 (.095) | .608 (.127) | .212 (.205) |
| | Promotion | .489 (.042) | .622 (.074) | .067 (.151) |
| | Vacuum | -.236 (.036) | -.325 (.557) | -.707 (.155) |
| | _Cons | -.261 (.074) | -1.062 (.348) | .245 (.261) |
| Choice Sets Distribution | Pr(whole set) | | | .722 (.022) |
| | Pr(promotion) | | | .265 (.022) |
| | Pr(-promotion) | | | .013 (.002) |
| Std. Deviations | Price | | .000 (1.962) | .331 (.187) |
| | Vol | | .001 (.281) | .001 (.010) |
| | Display | | .000 (49.250) | .066 (.171) |
| | Advertising | | .000 (.183) | .508 (.209) |
| | Promotion | | .650 (.579) | .107 (.100) |
| | Vacuum | | .322 (3.318) | .910 (.218) |
| Control Function | | .023 (.144) | .149 (.101) | .119 (.088) |
| Price Elast. (%) | Mean | -.932 | -.312 | -1.144 |
| | Sd. D. | .391 | .122 | .265 |
| Adjusted R-squared | | .373 | .310 | .378 |
 
 
BIBLIOGRAPHY
 
 
Ackerberg, D., L. Benkard, S. Berry, and A. Pakes (2007). Econometric Tools for Analyzing Market Outcomes. In J. J. Heckman and E. Leamer, eds., Handbook of Econometrics, North-Holland, chapter 63, pp. 4171-4276.

Allenby, G. M., and P. E. Rossi (1998). Marketing models of consumer heterogeneity. Journal of Econometrics, 89 (1), 57-78.

Anderson, S., A. de Palma, and J. F. Thisse (1989). Demand for Differentiated Products, Discrete Choice Models, and the Characteristics Approach. Review of Economic Studies, 56, 21-35.

Bajari, P., and C. L. Benkard (2005). Demand Estimation with Heterogeneous Consumers and Unobserved Product Characteristics: A Hedonic Approach. Journal of Political Economy, 113, 1239-1276.

Ben-Akiva, M., and B. Boccara (1995). Discrete choice models with latent choice sets. International Journal of Research in Marketing, 12, 9-24.

Berry, S. (1994). Estimating Discrete Choice Models of Product Differentiation. RAND Journal of Economics, 25, 242-262.

Berry, S., J. Levinsohn, and A. Pakes (1995). Automobile Prices in Market Equilibrium. Econometrica, 63, 841-890.

Berry, S., J. Levinsohn, and A. Pakes (1999). Voluntary Export Restraints on Automobiles: Evaluating a Trade Policy. American Economic Review, 89, 400-430.

Berry, S., J. Levinsohn, and A. Pakes (2004). Differentiated Products Demand Systems from a Combination of Micro and Macro Data: The New Car Market. Journal of Political Economy, 112, 68-105.

Bronnenberg, B. J., M. W. Kruger, and C. F. Mela (2008). Database paper: The IRI marketing data set. Marketing Science, 27(4), 745-748.

Chamberlain, G. (1987). Asymptotic Efficiency in Estimation with Conditional Moment Restrictions. Journal of Econometrics, 34, 305-344.

Crawford, G. S., R. Griffith, and A. Iaria (2016). Demand Estimation with Unobserved Choice Set Heterogeneity. Working paper.

Eliaz, K., and R. Spiegler (2011). Consideration sets and competitive marketing. The Review of Economic Studies, 78(1), 235-262.
 
Fox, J. T. (2007). Semiparametric estimation of multinomial discrete choice models using a subset of choices. The RAND Journal of Economics, 38(4), 1002-1019.

Goeree, M. S. (2008). Limited Information and Advertising in the U.S. Personal Computer Industry. Econometrica, 76, 1017-1074.

Gourieroux, C., A. Monfort, E. Renault, and A. Trognon (1987). Generalized Residuals. Journal of Econometrics, 34, 5-32.

Hendel, I. (1999). Estimating Multiple-Discrete Choice Models: An Application to Computerization Returns. Review of Economic Studies, 66, 423-446.

Hortacsu, A., and C. Syverson (2004). Product Differentiation, Search Costs, and Competition in the Mutual Fund Industry: A Case Study of S&P 500 Index Funds. The Quarterly Journal of Economics, 119(2), 403-456.

Kim, H., and K. I. Kim (2017). Estimating store choice with endogenous shopping bundles and price uncertainty. International Journal of Industrial Organization, 54, 1-36.

Kim, K. I., and A. Petrin (2010). Control Function Corrections for Unobserved Factors in Differentiated Product Models.

Kruger, M. W., and D. Pagni (2008). IRI Academic Data Set Description, version 2.3. Chicago: Information Resources Incorporated.

Lu, Z. (2018). Estimating Multinomial Choice Models with Unobserved Choice Sets. Working paper.

Manski, C. F. (1977). The structure of random utility models. Theory & Decision, 8(3), 229-254.

Manzini, P., and M. Mariotti (2014). Stochastic Choice and Consideration Sets. Econometrica, 82(3), 1153-1176.

Masatlioglu, Y., D. Nakajima, and E. Y. Ozbay (2012). Revealed attention. The American Economic Review, 102(5), 2183-2205.

McFadden, D. (1978). Modeling the choice of residential location. Transportation Research Record, (673).

Pakes, A., and D. Pollard (1989). Simulation and the Asymptotics of Optimization Estimators. Econometrica, 57, 1027-1057.

Paola, M., and M. Marco (2013). Stochastic Choice and Consideration Sets.

Petrin, A. (2002). Quantifying the Benefits of New Products: The Case of the Minivan. Journal of Political Economy, 110, 705-729.

Pires, T. (2012). Consideration Sets in Storable Goods Markets. Working paper, Northwestern University.
 
Rider, J. K. (2013). Essays in Applied Economics. UC Berkeley Electronic Theses and Dissertations.

Stern, S. (1997). Simulation-Based Estimation. Journal of Economic Literature, 35, 2006-2039.

Stern, S. (2000). Simulation Based Inference in Econometrics: Motivation and Methods. In Simulation-Based Inference in Econometrics: Methods and Applications, ed. by R. Mariano, M. J. Weeks, and T. Scheuermann. Cambridge, U.K.: Cambridge University Press, 9-37.

Swait, J., and M. Ben-Akiva (1987). Incorporating random constraints in discrete models of choice set generation. Transportation Research Part B: Methodological, 21(2), 91-102.

Train, K. E. (2009). Discrete Choice Methods with Simulation. Cambridge University Press.

Wooldridge, J. (2002). Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.
 
CHAPTER 3

3.1 Introduction
 
The last chapter emphasized the importance of accounting for choice set heterogeneity when constructing and estimating discrete choice models. I proposed a mixture model to handle this issue. The mixture model extends the basic discrete choice model by allowing for heterogeneous choice sets instead of an identical universal choice set for all consumers. In the model, consumers are assumed to have different choice sets which can be viewed as
consumer types. Each type of consumer has distinct criteria on the product attributes ac-

different types of potential choice sets whil

choosing a certain discrete alternative. The type of consumer, which is equivalently the type of choice set, follows a multinomial distribution that is unknown to econometricians and needs to be estimated. Under the special setup of my mixture model, the traditional estimation method based on the BLP inversion and GMM fails to work. Instead, I proposed a two-step mixture approach, which implements a control function method to deal with product heterogeneity and a choice set formation model to deal with the unobserved choice sets. The mixture approach was applied to the IRI marketing data in three markets and was shown to be useful in correcting the biases caused by assuming a universal choice set.
 
It is necessary to conduct Monte Carlo simulation experiments to examine the consistency of the estimators given by the two-

two sets of simulation experiments which could be treated as the benchmark cases (respec-

 
consists of more simulation experiments, whose goal is to examine the performance of the two-step mixture approach and to compare it with several alternative estimation methods under various scenarios.
 
In the benchmark case, the value of the choice set determinant variable for each product is assumed to be constant over time (e.g., the fat content of a milk product). To examine the applicability of the mixture approach, here I also consider the situation in which the value of the choice set determinant variable can vary over time (e.g., whether a potato chips product is on promotion) by adjusting the data generating process. In addition, since endogeneity is always present in practice and consistent estimation of the price coefficient requires appropriate instrument variables, I investigate the consistency of the estimators under a series of situations in which the instrument variables change from


-
step 
mixture approach are compared with simple logit and BLP estimators. The results show that the two-step mixture estimators tend to converge to the true parameters and perform much better than the other two estimators.
 
 
The rest of this paper proceeds as follows: Section 3.2 reviews the data generation process. Section 3.3 proposes the corresponding two-step mixture estimation method. Section 3.4 reports the results of the Monte Carlo simulation experiments. Section 3.5 concludes.
 
3.2 Data Generation Process
 
 
The data generating process follows the idea of Chapter 2, Section 4. First, construct the market of differentiated goods; then define the consumers with heterogeneous preferences and choice sets; finally, each consumer chooses the product that yields the maximal utility level from his/her choice set. The choices of consumers are then aggregated into industry-level sales data, on which the two-step mixture estimation approach can be implemented. The details of the data generating process were introduced in Chapter 2; here I briefly recall the key setup.
 
Assume the market includes J differentiated products and lasts for T periods. The utility to consumer i from choosing an inside product j at time t is
 

The utility from the outside option is
 

w
here 


and 


are two product attributes
,
 

is the product hetero-
geneity
 
and 


is its time shock
.  I make 


is generated as:
 

The first part, 


, is the base price for product j
.
  

, the unob-
served product heterogeneity 


enters the price function so the observed price 


is 
correlated with the unobserved error term. The second part, 


, is a 
time 
shock.
 

is assumed to be the choice set determinant attribute. In the benchmark case, 


is constant over time for product j : 


for each period t and 


for each product j. 
 
I 
specif
y
 
a random coefficient 


for the price variable 


,
 
with 


the mean part 
and 


the random part, 


; 


is an i.i.d stochastic term following type
-
I 
extreme value distribution across consumers, products (including the outside
 
option) and 
times.
 

Assume there are 
I
 
consumers which can be divided into three types in terms of choice sets. 
A percentage of 


consumers make their choices out f
rom products with all possible 
values of 


(all products in market). Another percentage of 


consumers only consider 
products with 


. The rest percentage of 


)
 
consumers only 
consider products with 


.
 
Corresponding choice sets are denoted as 


.
 
Here I set the cutoff point to be 0.5, the mean of 


so 


have the 
equal size (in terms of the number of products inside the choice set). The outside option is always included in all types of choice sets.
 
Denote consumer 

choice set as  


,
  

, define the choice indicator 
of whether consumer 
i 
chooses product 
j
 
at time 
t
 
as 
 

Then the market share of product j at time t is
 

In the benchmark case, I set 


.
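The data generating process described in this section can be sketched for a single period as follows. The sketch is illustrative only: the linear utility index `beta_i * price + x2 + xi`, the default parameter values, and all function and argument names are assumptions made here for the example, not the specification or the benchmark values used in the experiments. What it does follow from the text is the three consumer types (all products, products above the cutoff, products at or below the cutoff), the random price coefficient, the Type I extreme value shocks, and an outside option that belongs to every choice set.

```python
import math
import random


def gumbel(rng):
    """One standard Type I extreme value draw via the inverse CDF."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_shares(price, x2, xi, type_probs, n_consumers=5000,
                    beta_mean=-1.0, beta_sd=0.5, cutoff=0.5, seed=0):
    """Simulate one period of choices and return market shares.

    The last element of the returned list is the share of the outside option.
    type_probs gives the weights of the three types: (all, x2 > cutoff, x2 <= cutoff).
    """
    rng = random.Random(seed)
    n_products = len(price)
    everything = list(range(n_products))
    high = [j for j in everything if x2[j] > cutoff]
    low = [j for j in everything if x2[j] <= cutoff]
    choice_sets = [everything, high, low]

    counts = [0] * (n_products + 1)
    for _ in range(n_consumers):
        # Draw the consumer's type, i.e. the choice set.
        choice_set = rng.choices(choice_sets, weights=type_probs, k=1)[0]
        # Random coefficient on price (assumed normal).
        beta_i = rng.gauss(beta_mean, beta_sd)
        # Outside option: mean utility normalised to zero, always available.
        best, best_u = n_products, gumbel(rng)
        for j in choice_set:
            u = beta_i * price[j] + x2[j] + xi[j] + gumbel(rng)
            if u > best_u:
                best, best_u = j, u
        counts[best] += 1
    return [c / n_consumers for c in counts]


# Toy usage: every consumer only considers products with x2 above the cutoff.
shares = simulate_shares(price=[1.0, 1.2, 0.8, 1.1],
                         x2=[0.2, 0.7, 0.4, 0.9],
                         xi=[0.0, 0.0, 0.0, 0.0],
                         type_probs=[0.0, 1.0, 0.0],
                         n_consumers=500)
```

In the toy usage, products 0 and 2 lie at or below the cutoff, so they are never chosen; this is exactly the kind of zero or depressed share that a model with a universal choice set would misattribute to preferences.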
 
 
3.3 Estimation Strategies
 

Define 
 

According to the derivation in Chapter 2, the probability of a consumer choosing product j at time t, which is also the predicted market share of product j at time t, is
 

(3.3)
 
where 


is the CDF of 


.
 
Noticing that the probabilities of the choice sets are nonnegative and sum to 1, we can make the following transformation:
 

(3.4)
 
where 


Let 


.
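One common way to impose the nonnegativity and adding-up restrictions during numerical optimization is a multinomial-logit reparametrization, sketched below. This is an assumption for illustration and is not necessarily the transformation in equation (3.4); the names `choice_set_probs` and `gammas_from_probs` are hypothetical. Two unrestricted real numbers are mapped to three probabilities, with the first choice set type serving as the base category.

```python
import math


def choice_set_probs(gammas):
    """Map unrestricted reals to probabilities that are positive and sum to 1.

    gammas: one real number per non-base choice set type.
    Returns [p_base, p_1, ..., p_K].
    """
    m = max(0.0, *gammas)              # subtract the maximum for numerical stability
    base = math.exp(-m)                # base type has index value 0
    expo = [math.exp(g - m) for g in gammas]
    denom = base + sum(expo)
    return [base / denom] + [e / denom for e in expo]


def gammas_from_probs(probs):
    """Inverse map: log odds of each type relative to the base type."""
    return [math.log(p / probs[0]) for p in probs[1:]]


# Usage: the optimizer searches over (gamma_1, gamma_2) without constraints.
probs = choice_set_probs([1.2, -0.7])
recovered = gammas_from_probs(probs)
```

The design choice here is that the optimizer never has to handle inequality constraints; the cost is that a probability of exactly zero can only be approached in the limit.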
 
 
An essential part in the estimation process is to match the predicted market share 


with the observed market share 


.
 
 
Since there are random coefficients and endogeneity in the utility function, one would naturally ask whether the BLP estimation method can be adjusted to suit this setup.
 
If there were no heterogeneous choice sets, there would be no summation in equation (3.3), the model would simplify to a regular discrete choice model, and the BLP estimation could proceed as follows: 1) For a candidate value of


, use the contract mapping algo-
rithm to determine the 


as defined in equation (3.2) for all products over all markets; 2) Use GMM to estimate equation (3.2) and obtain the GMM residuals; 3) Minimize the GMM residual over


by iterating over the above steps. 
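Step 1) above can be illustrated with a minimal sketch of the standard BLP contraction mapping for a single market with a universal choice set. The names `delta` (mean utilities), `mu` (simulated individual deviations from the mean utility), and the simulated data are assumptions made for illustration; they are not the notation or the data generation process of this chapter.

```python
import numpy as np

def predicted_shares(delta, mu):
    """Simulated market shares for one market.

    delta : (J,) mean utilities (hypothetical name).
    mu    : (R, J) individual-specific deviations for R simulated consumers.
    The outside option has utility normalized to zero.
    """
    expu = np.exp(delta[None, :] + mu)
    probs = expu / (1.0 + expu.sum(axis=1, keepdims=True))
    return probs.mean(axis=0)

def contraction_mapping(s_obs, mu, tol=1e-12, max_iter=10_000):
    """Iterate delta <- delta + log(s_obs) - log(s_pred(delta)) until convergence."""
    delta = np.log(s_obs) - np.log(1.0 - s_obs.sum())  # plain-logit starting value
    for _ in range(max_iter):
        delta_new = delta + np.log(s_obs) - np.log(predicted_shares(delta, mu))
        if np.max(np.abs(delta_new - delta)) < tol:
            return delta_new
        delta = delta_new
    raise RuntimeError("contraction mapping did not converge")

# Illustrative data (assumed, not the chapter's DGP).
rng = np.random.default_rng(0)
J, R = 5, 200
x = rng.uniform(size=J)
mu = 0.5 * rng.standard_normal((R, 1)) * x[None, :]
delta_true = rng.normal(size=J)
s_obs = predicted_shares(delta_true, mu)
delta_hat = contraction_mapping(s_obs, mu)
```

In this sketch the shares are generated from `delta_true` with the same simulation draws, so the inversion recovers `delta_true` up to the tolerance.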
 
 
In the mixture model there are two more parameters


which determine the choice set distribution. An estimation approach analogous to the BLP estimation would naturally proceed as follows:
 
1) For a candidate value of


, use the contraction mapping algorithm to determine the


as defined in equation (3.2) for all products over all markets; 2) Use GMM to estimate equation (3.2) and obtain the GMM residuals; 3) Minimize the GMM residual over


by iterating over the above steps. 
 
 
However, in the simulation study the above BLP-type estimation turned out to be invalid. Although when


are at their true values the GMM can recover the correct preference parameters, the GMM residual is not minimized at those values. A theoretical proof of the failure of the BLP-type estimation is left as a topic for further research.
 

As proposed in Chapter 2, a valid control function approach for the mixture model proceeds in two steps. In the first step, I estimate the control variates for the endogenous variable


.
 
In the second step, I implement non-linear least squares to match the predicted choice probabilities given by equation (3.3) with the observed market shares given by equation (3.1), treating the estimated control variates as an additional regressor.
 
Since the data generation process is known, I can choose an appropriate instrument variable for


and 

of the instrument variable. In the benchmark case I set the instrument variable to be:
 

This is a very strong instrument variable since it is the part of


excluding the heterogeneity


.
 
Then the first step is to run OLS of 


on the endogenous variable 


and get the residuals 


.
 
The product heterogeneity 


is then approximated by 


in 
the second step.
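The first step can be sketched as follows. This is a minimal illustration under assumed names and an assumed linear pricing rule (`price = 1 + 2 z + xi`), not the data generation process of this chapter: the endogenous variable is regressed on the instrument by OLS, and the residuals serve as the control variate. The noisy instrument `z_noisy`, built by adding independent noise to `z`, is also an assumption; it is included only to show how the residuals track the heterogeneity less closely when the instrument is contaminated.

```python
import numpy as np

def first_stage_residuals(price, z):
    """OLS of the endogenous variable on a constant and the instrument; returns residuals."""
    X = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(X, price, rcond=None)
    return price - X @ coef

rng = np.random.default_rng(1)
n = 5_000
z = rng.uniform(size=n)               # assumed exogenous instrument
xi = 0.3 * rng.standard_normal(n)     # assumed unobserved product heterogeneity
price = 1.0 + 2.0 * z + xi            # assumed pricing rule

# Ideal instrument: the residuals are close to xi itself.
resid_ideal = first_stage_residuals(price, z)

# Contaminated instrument (assumed additive noise, independent of z and xi).
z_noisy = z + rng.standard_normal(n)
resid_noisy = first_stage_residuals(price, z_noisy)

corr_ideal = np.corrcoef(resid_ideal, xi)[0, 1]
corr_noisy = np.corrcoef(resid_noisy, xi)[0, 1]
```

Comparing `corr_ideal` with `corr_noisy` shows how much of the heterogeneity the control variate captures in each case.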
 
The second step is non-linear least squares. Define a vector

 
to contain all the parameters that need to be estimated:
 

Since (2.15) is an integral, the choice probability can be obtained via simulation:
 

(3.5)
 
where
 

is a set of random draws from the standard normal distribution. Here I set the sample size


; 


is the probability of choice set


as defined in (3.4).
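The simulated choice probability can be sketched as a weighted sum, over choice set types, of random-coefficient logit probabilities averaged across standard normal draws. The sketch below is only an illustration under assumptions: the names `delta`, `x`, `sigma`, `pi`, `masks`, the placement of the random coefficient on a single attribute, the weights (0.5, 0.3, 0.2), the 500 draws, and the direction of the cutoff rule are all hypothetical, and the control variate is omitted.

```python
import numpy as np

def mixture_shares(delta, x, sigma, pi, masks, draws):
    """Simulated mixture choice probabilities for one market.

    delta : (J,) mean utilities.
    x     : (J,) attribute carrying the random coefficient.
    sigma : standard deviation of the random coefficient.
    pi    : (K,) probabilities of the K choice set types (sum to 1).
    masks : (K, J) booleans, True if product j belongs to choice set k.
    draws : (R,) standard normal draws.
    The outside option (utility zero) belongs to every choice set.
    """
    expu = np.exp(delta[None, :] + sigma * draws[:, None] * x[None, :])
    shares = np.zeros(delta.shape[0])
    for pi_k, mask in zip(pi, masks):
        expu_k = expu * mask[None, :]  # products outside set k get zero weight
        probs_k = expu_k / (1.0 + expu_k.sum(axis=1, keepdims=True))
        shares += pi_k * probs_k.mean(axis=0)
    return shares

rng = np.random.default_rng(2)
J, R = 6, 500
x = rng.uniform(size=J)
delta = rng.normal(size=J)
draws = rng.standard_normal(R)
# Three types: all products, products at or below the cutoff, products above it.
masks = np.vstack([np.ones(J, dtype=bool), x <= 0.5, x > 0.5])
pi = np.array([0.5, 0.3, 0.2])
shares = mixture_shares(delta, x, 0.8, pi, masks, draws)
# With all weight on the universal set the model collapses to the usual mixed logit.
shares_full = mixture_shares(delta, x, 0.8, np.array([1.0, 0.0, 0.0]), masks, draws)
```

Holding `draws` fixed across evaluations keeps the simulated probabilities smooth in the preference parameters, which matters when they are passed to a least squares routine.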
 
Following the idea of matching the predicted choice probabilities with the observed market shares, the least squares estimator can be obtained as:
 
 

where 


is the market share obtained from equation (3.1),


is the predicted choice probability defined jointly by equations (3.4) through (3.6).
 
In the above procedures I take the cutoff point of the choice set determinant variable 


, which is 0.5 in the setup, as already known.
 
I
n the ca

of the choice set determinant variable 


,
 
it needs to be estimated together with other parameters. Denote the cutoff point as


,
 

,
 
it determines the structure of choice 
sets as follows:
 

Estimation then follows the procedure discussed above for the first situation. The only difference is that


is now an element in the parameter vector 


.
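How a cutoff point maps into the three choice sets can be sketched as below. The function name `choice_set_masks` and the direction of the rule (one restricted set holds products with attribute values at or below the cutoff, the other holds those above it) are assumptions for illustration. The sketch also shows that the sets change only when the cutoff crosses a product's attribute value, so nearby cutoff values can imply identical choice sets.

```python
import numpy as np

def choice_set_masks(x, cutoff):
    """Return a (3, J) boolean array: universal set, low set (x <= cutoff), high set (x > cutoff).

    The outside option is not represented here; it belongs to every choice set.
    """
    x = np.asarray(x, dtype=float)
    low = x <= cutoff
    return np.vstack([np.ones(x.shape, dtype=bool), low, ~low])

x = np.array([0.1, 0.4, 0.6, 0.9])   # illustrative attribute values
masks_050 = choice_set_masks(x, 0.50)
masks_055 = choice_set_masks(x, 0.55)  # no product lies between 0.50 and 0.55
masks_065 = choice_set_masks(x, 0.65)  # the product at 0.6 switches sets
masks_100 = choice_set_masks(x, 1.00)  # low set is universal, high set is empty
```

Because the masks are piecewise constant in the cutoff, any criterion function built on them is a step function of the cutoff, which gradient-based optimizers handle poorly; a grid search over candidate cutoffs is one alternative.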
 
3.4 Simulation Results
 
 
In the simulation part of Chapter 2, I examined the performance of the mixture approach under two


and obtained satisfactory results. In this chapter, more experiments are conducted to explore the properties of the mixture estimators.
 
In addition to the Mixture estimators, the simple logit estimators and the BLP estimators are reported for comparison. The BLP estimators are obtained by the regular BLP estimation approach, assuming that all consumers have an identical universal choice set (no choice set heterogeneity).
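The Bias and Sd.D entries reported for each estimator summarize its estimates across Monte Carlo replications. A minimal sketch of that summary is given below; the function name, the example estimates, and the use of the sample standard deviation (`ddof=1`) are assumptions, since the text does not state which convention is used.

```python
import numpy as np

def bias_and_sd(estimates, true_value):
    """Monte Carlo bias (mean estimate minus truth) and standard deviation of the estimates."""
    estimates = np.asarray(estimates, dtype=float)
    return estimates.mean() - true_value, estimates.std(ddof=1)

# Hypothetical estimates of a parameter whose true value is 1.0.
bias, sd = bias_and_sd([0.9, 1.1, 1.0, 1.2], true_value=1.0)
```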
 
 
 
In the benchmark case, the value of the choice set determinant variable


for each product j is assumed to be constant over time. However, as discussed in Chapter 2, the choice set determinant variable may also take different values for a product over time. For instance, in the study of the potato chip and hot dog markets in Chapter 2,

, and a product can be either on promotion or not, depending on the marketing plan of the supermarket at a given time. The first simulation experiment examines the case in which the value of the choice set determinant variable can vary over time.
 
In the benchmark case (referred to as "DGP I"),


for each period 
t
 
and 


for each product 
j
. 


and i.i.d. across each 
period 
t
 
and each product 
j
. All the other settings including parameters and variable distri-

 
,
 
the 
coefficient of the choice set determinant variable,
 
the bias and standard deviation increase 

are still at a relatively low level, which keeps the Mixture estimator acceptable. These results suggest that the Mixture approach can be applied to situations in which the value
of the choice set determinant variable varies over time. For the choice set distribution pa-


two estimators, Logit and BLP, perform much worse than the Mixture estimator. Their estimates are heavily biased, with relatively large standard deviations.
 
 
The next experiment focuses on the instrument variables. The ideal situation for the mixture estimation is that the endogenous variable (the price variable) is a function of the instrument variables (observed exogenous variables) and the unobserved product heterogeneity. The residuals obtained in the first step of the mixture estimation then appropriately approximate the heterogeneity. In practice it is not easy to find such perfect instrument variables. The selected instrument variables might have components that are uncorrelated with the endogenous variable, which makes the control function an invalid approximation of the unobserved heterogeneity and thus makes the final estimators inaccurate. In this experiment I examine the performance of the mixture estimators when the correlation between the endogenous variable and the instrument variable varies.
 

In the benchmark case the instrument variable is 


. This is an ideal 
instrument variable since it is just the part of 


excluding the heterogeneity 


.
 
Now 
define a new instrument variable as 


, where 


, i.i.d. across 
j 
and 
t
, and is independent of 


and 


.
 
The parameter 

 
determines the correlation between 


and 


. When 


,
 
it is the benchmark case. 
Denote the case when 


as DGP III, and 


as DGP IV.
 
We can see in Table 3.2 that the results are consistent with intuition. Although the biases remain at an acceptably low level, the standard deviations grow rapidly as the instrument variable varies


,
 
the standard deviation of 

 
Finally, I examine the performance of the estimators when the choice set structures are unknown. For simplicity, assume there is no endogenous product heterogeneity and there are no random coefficients in the model. As discussed at the end of Section 3.3, assume the cutoff point of the choice determinant attribute (


) is unknown to econometricians and needs to be estimated together with other parameters. In this set of experiments the true cutoff point


is set to 0.5, 0.6, 0.7 and 0.8. According to the setup, a larger


leads to a larger choice set 


and a smaller choice set 


. When 


=1, 


is the universal choice set and 


only contains the outside option. 
 
Table 
3.3
 

estimators from assuming 
t
he cutoff point 


is known as 0.5. The next column labeled 


is unknown and needs to 
be estimated. We can see that as the true value of 


deviates from 0.5 (the mean of 


)
, 


given by 

le logit as 


gets 
closer to 1.  
 

when the 


deviates from 0.5. Despite 


,
 
the model gives nearly unbiased estimators with acceptable standard deviations. The reason is that the model assumes

is unknown and estimates it together with other parameters. However, looking at the first panel with


,
 
which 
is the situation tha


is
 
when we need to estimate 


, the objective function for the optimization is discontinuous with respect to


, which makes it difficult to obtain precise estimates.
 
In practice, the above situation in which the cutoff point is unknown is unlikely to arise. First, the choice set determinant attributes usually take discrete rather than continuous values. In the majority of cases, the choice set determinant attributes are dummy variables (e.g. whether a milk product is fat-free, whether a product is on promotion/display


-valued attribute to form the choice set, the cutoff point is usually common sense and can be taken as given prior to the estimation. Otherwise, consumers have different opinions on the cutoff points,


acteristics and the structure of her/his choice set.
 
3.5 Concluding Remarks
 
This chapter continues the discussion of the two-step mixture approach, proposed in Chapter 2, to estimating discrete choice models with unobserved choice sets. In this chapter I review the data generation process (DGP) of my mixture model, discuss the failure of another estimation method that depends on the BLP-type inversion under my DGP setup, and then conduct three sets of Monte Carlo simulation experiments to examine the validity of the two-step mixture approach and demonstrate its superiority over other traditional estimation methods under various scenarios.
 
 
 
 
APPENDIX
 
 
 
APPENDIX FOR CHAPTER 3
 
 
Monte Carlo Results III: Varying Values of Choice Set Determinant Variable
 
 
            DGP I                              DGP II
            Mixture     Logit      BLP         Mixture     Logit      BLP

Bias          .0158    -1.3063                   .0172    -1.4675
Sd.D          .3013      .3348                   .3614      .4005

Bias         -.0104      .5090                   .0328      .5579
Sd.D          .0858      .0828                   .0935      .0953

Bias         -.0029     -.9628                  -.0732    -1.6737
Sd.D          .0407      .5370                   .1081      .7687

Bias         -.0010                             -.0013
Sd.D          .0188                              .0205

Bias         -.0119                              .0074
Sd.D          .0355                              .0541

Bias          .0039                             -.0094
Sd.D          .0128                              .0352

Bias          .0080                              .0020
Sd.D          .0268                              .0251
 
 

Monte Carlo Results IV: Weak Instrument Variable
 
 
DGP I
 
DGP III
 
DGP IV
 

Bias
 
 
Sd.D
 
 
Bias
 
 
Sd.D
 
 
Bias
 
 
Sd.D
 
 
Bias
 
 
Sd.D
 
 
Bias
 
 
Sd.D
 
 
Bias
 
 
Sd.D
 
 
Bias
 
 
Sd.D
 
 
 

            Cutoff = .5                 Cutoff = .6                 Cutoff = .8
            Mixture (1)  Mixture (2)    Mixture (1)  Mixture (2)    Mixture (1)  Mixture (2)

Bias          -.0001       -.1648         .0882       -.1978         -.4709       -.4387
Sd.D           .0245        .1372         .1342        .1532          .4547        .2837

Bias           .0002        .0043         .0239        .0015          .0631        .0111
Sd.D           .0052        .0339         .0175        .0283          .0215        .0424

Bias          -.0010       -.1813        -.4113       -.1080          .0617        .0283
Sd.D           .0334        .4214         .3046        .3419          .5308        .2666

Bias          -.0017       -.0230         .1217       -.0145          .3262       -.0083
Sd.D           .0266        .0493         .1242        .0488          .1403        .0922

Bias           .0002        .0153        -.1191        .0169         -.2296       -.0037
Sd.D           .0106        .0236         .0753        .0267          .0529        .0531

Bias           .0015        .0077        -.0026       -.0024         -.0966        .0120
Sd.D           .0171        .0294         .0547        .0264          .0945        .0474

Bias                        .0360                      .0311                      -.0555
Sd.D                        .0566                      .0526                       .1131
 
 
 
BIBLIOGRAPHY
 
 
 

Berry, S. (1994). Estimating Discrete Choice Models of Product Differentiation. Rand Journal of Economics, 25, 242-262.

Berry, S., J. Levinsohn, and A. Pakes (1995). Automobile Prices in Market Equilibrium. Econometrica, 63, 841-890.

Berry, S., J. Levinsohn, and A. Pakes (1999). Voluntary Export Restraints on Automobiles: Evaluating a Trade Policy. American Economic Review, 89, 400-430.

Berry, S., J. Levinsohn, and A. Pakes (2004). Differentiated Products Demand Systems from a Combination of Micro and Macro Data: The New Car Market. Journal of Political Economy, 112, 68-105.

Kim, K. I., and A. Petrin (2010). Control Function Corrections for Unobserved Factors in Differentiated Product Models.

Train, K. E. (2009). Discrete Choice Methods with Simulation. Cambridge University Press.