ESSAYS IN EMPIRICAL INDUSTRIAL ORGANIZATION

By

Andrew Zeyveld

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

Economics—Doctor of Philosophy

2025

ABSTRACT

Chapter 1: Steering Consumers’ Learning: Evidence from Stockout Substitutions in Curbside Pickup

Items ordered for curbside pickup sometimes go out of stock, obliging the store to choose substitutes

on consumers’ behalf. Using novel data from a supermarket chain, I show these “stockout

substitutions” influence consumers’ future purchases through the mechanism of learning. This presents

the store with the following opportunity to increase its future profits: if the store selects substitutes

from profitable brands that consumers have never tried before, some consumers will learn that

they like the brands of their substitutes and purchase these brands’ products in the future.

However, consumers are less likely to accept such substitutes than they are to accept substitutes from

brands they have previously purchased. To quantify the trade-off between steering consumers’

learning and maximizing the probability of substitutes’ acceptance, I estimate a learning-based

model of differentiated products demand. Although steering consumers’ learning proves an

unprofitable strategy, the store can still increase profits—and consumer welfare—by individualizing

substitutions according to consumers’ past purchases and demographics.

Chapter 2: Demand Estimation When Consumers’ Preferences Vary over Time

This paper shows that workhorse demand systems fail to reproduce important substitution patterns

when individual consumers’ preferences vary over time. This failure is rooted in the independence of

preferred alternatives (IPA) properties of conditional and mixed logit, which restrict the relationship

between consumers’ purchases and their preferences among unpurchased goods. To assess the

empirical relevance of the IPA properties, I employ novel data from stockout substitutions in

curbside pickup. For the two product categories that I study, I document substitution patterns that

are inconsistent with the IPA property of conditional logit. As for mixed logit, its IPA property

proves consistent with the substitution patterns in one of the two product categories. To quantify

the benefits of relaxing the IPA property of mixed logit, I compare the model’s goodness of fit with

that of mixed probit (which does not display an IPA property). In keeping with the descriptive

evidence, the results of this comparison vary by product category.

ACKNOWLEDGEMENTS

I am grateful for the guidance and mentorship of my committee co-chairs, Mike Conlin and Kyoo

il Kim. They have been deeply invested in every aspect of my professional development, from

training me to think like an economist to extending my econometric toolkit. I would also like to

thank committee members Arijit Mukherjee and Forrest Morgeson for their thoughtful comments.

Additional thanks are extended to the retailer that supplied my data. I am especially grateful to

Kevin D., who first introduced me to the problem of stockout substitutions.

I would like to recognize generous financial support from the Michigan State University

Department of Economics, College of Social Science, and Graduate School. I am also indebted

to dunnhumby, which provided supplementary financial support; and to the Institute for

Cyber-Enabled Research at Michigan State University, which provided computational resources

and services.

Finally, I would like to thank my spouse, Anna Zeyveld Jeffries, for her kindness, compassion,

and patience throughout my studies.


TABLE OF CONTENTS

CHAPTER 1  INTRODUCTION
CHAPTER 2  STEERING CONSUMERS’ LEARNING: EVIDENCE FROM STOCKOUT SUBSTITUTIONS IN CURBSIDE PICKUP
    2.1 Introduction
    2.2 Background
    2.3 Descriptive Evidence
    2.4 Conceptual Model
    2.5 Empirical Model and Estimation
    2.6 Estimation Results
    2.7 Counterfactual Simulations
    2.8 Conclusion
    BIBLIOGRAPHY
    APPENDIX 2A  DATA STRUCTURE AND OBSERVABLE CHARACTERISTICS
    APPENDIX 2B  ADDITIONAL DESCRIPTIVE EVIDENCE
    APPENDIX 2C  ESTIMATION DETAILS
    APPENDIX 2D  ESTIMATION RESULTS FOR APPLE SAUCE CUPS
    APPENDIX 2E  SUPPLEMENTARY COUNTERFACTUAL SIMULATIONS
CHAPTER 3  DEMAND ESTIMATION WHEN CONSUMERS’ PREFERENCES VARY OVER TIME
    3.1 Introduction
    3.2 Relationship to Prior Literature
    3.3 Theory: Alternate-Choice Data in Demand Systems
    3.4 Institutional Background and Data
    3.5 Descriptive Evidence
    3.6 Structural Evidence
    3.7 Conclusion
    BIBLIOGRAPHY
    APPENDIX 3A  PROOF OF LEMMA 1
    APPENDIX 3B  COMPARISON OF THEOREM 1 WITH PRIOR THEORETICAL RESULTS
    APPENDIX 3C  MONTE CARLO TESTS OF THEOREM 1
    APPENDIX 3D  CROSS-CHARACTERISTIC CORRELATIONS IN (DIS)SIMILARITY
    APPENDIX 3E  DETAILS ON THE STRUCTURAL ESTIMATION METHOD
    APPENDIX 3F  MULTIPLE-UNIT PURCHASES OF INDIVIDUAL PRODUCTS
    APPENDIX 3G  SUPPLEMENTARY RESULTS FROM STRUCTURAL ESTIMATION


CHAPTER 1

INTRODUCTION

People often face discrete choices. For example, someone who is shopping for a pint of ice cream

might confront dozens of options which vary in brand, flavor, and price. To understand how people

approach such discrete choices, social scientists routinely employ random utility models. These

models relate people’s choices to their circumstances. Concerning ice cream, for instance, how do

consumers respond when the price of their favorite product increases?

To estimate random utility models, researchers need to make assumptions about people’s

decision-making process. One such assumption is that people possess perfect information about

all the available options. Another is that people’s preferences remain stable over time. My

dissertation explores how these assumptions can be relaxed. How should the researcher model

people’s discrete choices when they possess imperfect information? Or when their preferences

change over time? I take up these questions in the context of grocery shopping, an ideal natural

laboratory to study discrete choice. In the first place, I can follow individual consumers’ purchases

over time (provided they participate in the chain’s loyalty program). This allows me to identify

heterogeneity in consumers’ preferences, as well as shifts in individual consumers’ preferences over

time. In the next place, grocery chains collect demographic data about the consumers who frequent

their stores. This enables me to determine how characteristics like age or income are correlated with

consumers’ purchases. Neither of these features of grocery shopping data is, of course, unique to

this dissertation. In fact, a large literature within empirical industrial organization and quantitative

marketing studies grocery shopping for these reasons (such as Nevo [2001]; Erdem, Keane, and

Sun [2008]; or Backus, Conlon, and Sinkinson [2021]). My dissertation, however, benefits from a

recent development within the grocery industry: curbside pickup.

Curbside pickup is a “click-and-collect” form of shopping in which consumers order groceries

online and later pick up their groceries from the local supermarket. Importantly, requested items

sometimes go out of stock after consumers have already placed their orders, but before the store

collects them. This obliges the store to select substitutes on the affected consumers’ behalf. Once


the consumers arrive at the store, they are presented with two options: either they can “accept” the

substitute chosen by the store, or they can “reject” it and buy no such item.

These “stockout substitutions” sometimes cause consumers to try out new products for the

first time. What they learn about these products might, in turn, influence their future purchases.

In Chapter 2, I present model-free evidence that is consistent with stockout substitutions’ affecting

consumers’ future purchases through the mechanism of learning. Further, consumers appear to

learn more about their tastes for brands—meaning branded product lines, like the Ben & Jerry’s

line of ice cream—than about their tastes for other characteristics (like size or flavor). This suggests

the store could exploit stockout substitutions to increase its future profits. For, if consumers were

offered substitutes from high-margin brands that they have never tried before, some would discover

that they like their substitutes’ brands and then purchase these brands’ (profitable) products in

the future. Additional descriptive evidence, however, suggests consumers are reluctant to accept

stockout substitutes from unfamiliar brands.

To quantify the trade-off between steering consumers’ learning and maximizing the probability

of substitutes’ acceptance, I estimate a learning model of differentiated products demand.

Counterfactual simulations indicate that the store could not perceptibly increase its profits by steering

consumers’ learning. This is partly due to consumers’ disinclination to accept stockout substitutions

from unfamiliar brands. In addition, when consumers accept, they do not learn enough about their

substitutes’ brands for future profits to meaningfully change.

Although steering consumers’ learning proves an unprofitable strategy, it emerges that store

profits—as well as consumer welfare—increase when the store individualizes its choice of substitute

based on consumers’ original orders, past purchases, and demographics. Moreover, a substantial

fraction of these gains can be achieved by individualizing the choice of substitute according to

consumers’ original orders alone.

In Chapter 3, I put the stockout data to a different purpose: namely, recovering within-consumer

variation in preferences. The idea is that individual consumers’ preferences sometimes change over

time. Take the case of coffee: many consumers prefer iced coffee during the summer but hot


coffee during the winter. To what extent do workhorse demand systems accommodate within-

consumer preference variation like this? I show that conditional logit imposes independence

between consumers’ purchases and their preferences among unpurchased goods. As for mixed

logit, this more flexible model imposes conditional independence between consumers’ purchases

and their preferences among unpurchased goods, given the realizations of consumers’ random

coefficients. In other words, someone’s purchase should be uninformative of time-specific factors

that influenced both her purchase and her preferences among the goods she did not purchase. I

term these the “Independence of Preferred Alternatives” (IPA) properties of conditional and mixed

logit, respectively.
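
To make the conditional logit case concrete, the following Monte Carlo sketch checks one implication of this kind of independence. It is an illustration under stated assumptions, not the formal statement of the IPA properties (which appears in Chapter 3): taste shocks are assumed to be i.i.d. type-I extreme value, the mean utilities in `v` are hypothetical, and all function and variable names are invented for the example. Under these assumptions, the share of shopping trips on which good j is preferred to good k should be roughly the same whichever other good was purchased, and close to the binary logit benchmark.

```python
import math
import random


def gumbel(rng):
    """Draw one standard type-I extreme value (Gumbel) shock."""
    u = rng.random()
    while u <= 0.0:  # guard against log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def pairwise_share_given_purchase(v, j, k, n_draws=200_000, seed=0):
    """Estimate P(u_j > u_k | good i purchased) for each purchased good i other than j, k."""
    rng = random.Random(seed)
    wins = {i: 0 for i in range(len(v)) if i not in (j, k)}
    counts = dict.fromkeys(wins, 0)
    for _ in range(n_draws):
        # Utility on this trip = mean utility + trip-specific shock.
        u = [v_g + gumbel(rng) for v_g in v]
        purchased = max(range(len(v)), key=u.__getitem__)
        if purchased in counts:
            counts[purchased] += 1
            wins[purchased] += u[j] > u[k]
    return {i: wins[i] / counts[i] for i in counts}


v = [1.0, 0.5, 0.0, -0.5]  # hypothetical mean utilities for four goods
j, k = 2, 3                # two goods that were not purchased
shares = pairwise_share_given_purchase(v, j, k)
logit_benchmark = math.exp(v[j]) / (math.exp(v[j]) + math.exp(v[k]))

for purchased, share in shares.items():
    print(f"purchased good {purchased}: share preferring j to k = {share:.3f}")
print(f"binary logit benchmark = {logit_benchmark:.3f}")
```

If the shocks were instead correlated across goods, as a probit-style model allows, these conditional shares could differ according to which good was purchased.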

These theoretical results raise two empirical questions. First, can data help determine whether

consumers’ preferences in a given market are consistent with the IPA property of mixed logit? And

second, how should demand be estimated when consumers’ preferences prove inconsistent with the

property?

I answer these questions using the same dataset as in Chapter 2. On these data, the conditional

logit IPA imposes independence between consumers’ original orders and their willingness to accept

specific products as stockout substitutes. This restriction proves inconsistent with the data from both

product categories that I study: namely, bottled water and flour. As for mixed logit, its IPA property

essentially dictates that individual consumers have the same preferences for stockout substitutes

on all of their shopping trips—irrespective of their original order choice. Descriptive evidence

suggests that consumers’ behavior is consistent with this prediction in only one of the two product

categories that I study: bottled water. Concerning flour, by contrast, consumers’ preferences for

substitutes seem to change across shopping trips (owing perhaps to variation in the planned recipe).

To help quantify the benefits of relaxing the IPA property of mixed logit, I compare the model’s

goodness of fit with that of mixed probit (which does not display an IPA property). Overall, mixed

probit seems to forecast consumers’ accept/reject decisions more precisely than mixed logit does.

In keeping with the descriptive evidence, this disparity appears to be more pronounced for the

product category of flour than for bottled water.


CHAPTER 2

STEERING CONSUMERS’ LEARNING: EVIDENCE FROM
STOCKOUT SUBSTITUTIONS IN CURBSIDE PICKUP

2.1 Introduction

Consumers often make decisions with imperfect information. This has motivated many studies

on government information provision. These studies frequently find that the government could

increase consumers’ welfare by providing easily accessible information. Take the case of grocery

shopping. When the government requires that unhealthy products carry warning labels, consumers

reduce their purchases of unhealthy products they had mistakenly believed to be healthy (Barahona,

Otero, and Otero 2023). Health insurance is another example: by publishing quality scores, the

government can steer consumers towards higher-quality insurance plans (Vatter 2024).1

Online platforms also steer consumers’ learning.2 However, online platforms’ incentives differ

from those of the government: whereas the government steers consumers’ learning to increase their

welfare, online platforms steer consumers’ learning to maximize profits. How, then, are consumers

affected when online platforms steer their learning? I take up this question in the context of curbside

grocery pickup (hereafter, “curbside pickup”). This is a “click-and-collect” form of shopping in

which consumers order groceries online and then pick up their groceries from the local supermarket.

Sometimes, however, the store cannot supply an ordered item because it has gone out of stock.

This obliges the store to select another item—known as a “stockout substitution”—to serve as a

replacement. Once the consumer arrives, she can either purchase this suggested substitute, or reject

it and buy no such item.3

1Concerning both food and health insurance, government information provision also causes a welfare-increasing
response on the supply side. In particular, food manufacturers formulate healthier products (Barahona, Otero, and
Otero 2023), while health insurers increase plan quality (Vatter 2024).

2Firms employ many methods to steer consumers’ learning. One is advertising, which serves to inform consumers
of a firm’s product range (Anand and Shachar 2011) as well as to signal its quality level (Ackerberg 2003). Strategic
pricing is another method of steering consumers’ learning. By reducing its prices, a firm encourages consumers to try
its own products (Osborne 2011) while discouraging them from trying its competitors’ products (Ching 2010).

3Of course, she could also go into the store to search for a different substitute. However, the data suggest that this
is quite rare.

Stockout substitutions sometimes cause consumers to try out new products. What they learn

about these products might, in turn, influence their subsequent purchases. Using novel data on

curbside pickup at a large regional supermarket chain, I supply model-free evidence that supports

this intuition; stockout substitutions do, indeed, affect consumers’ future purchases through the

mechanism of learning. Moreover, consumers seem to learn more about their preferences for

brands—meaning branded product lines, like the Häagen-Dazs line of ice cream—than about their

preferences for other characteristics (such as size). This suggests that the store could exploit

stockout substitutions to steer consumers’ learning towards high-margin brands. For, if consumers

were offered substitutes from high-margin brands they have never tried before, some consumers

would discover that they like their substitutes’ brands and then purchase these brands’ (profitable)

products in the future. However, less profitable outcomes are also possible. Consumers might

be inclined to reject substitutes from unfamiliar brands, leaving the store with zero margins on

the present transaction. Consistent with this hypothesis, reduced-form evidence suggests that

consumers prefer when stockout substitutes belong to brands they have previously purchased.

Given the uncertainty involved, could the store increase profits by steering consumers’ learning?

And would doing so increase, or decrease, consumer welfare? To answer these questions, I estimate

a learning model of differentiated products demand. Counterfactual simulations suggest that the

store cannot increase profits by steering consumers’ learning. This is because consumers are

reluctant to accept stockout substitutes from unfamiliar brands and, when they do accept, tend

to learn too little for their subsequent purchases—or the store’s future profits—to meaningfully

change. However, I find that store profits—as well as consumer welfare—increase when the store

individualizes its choice of substitute based on consumers’ original orders, past purchases, and

demographics. Much of these gains can be realized by personalizing the choice of substitute

according to the consumer’s original order alone.

From a policy perspective, my findings are significant for the following reason. Regulators

worry that online platforms like Amazon favor profitable products in their search rankings and

product recommendations (Farronato et al. [2024]). Taking as given that online platforms engage

in such behavior, would it be better for consumers if platforms steered demand towards profitable


products with, or without, the benefit of consumer microdata (like purchase histories or household

demographics)? Conditional on platforms’ maximizing variable profits (as opposed to revenue or

consumer welfare), my results suggest that consumers may benefit when platforms exploit consumer

microdata. This result is surprising in the context of curbside pickup, where the “outside option”

of procuring the relevant item elsewhere (or going without) is quite unattractive.4

The remainder of the paper proceeds as follows. In Section 2.2, I relate my analysis to prior

work in industrial organization and quantitative marketing. I also provide details about the purchase

environment and data. Briefly, I study a supermarket chain that offers three ways to shop: in-person,

home delivery, and curbside pickup (where “stockout substitutions” occur). For each such

substitution, I observe the out-of-stock item and the substitute, as well as the consumer’s decision

to accept or reject the substitute. Besides the data on stockout substitutions, I also employ “scanner

data” that record consumers’ purchases at the store. These scanner data display a household-level

panel structure thanks to the chain’s loyalty program, enabling me to compare someone’s purchases

before versus after a stockout substitution. I also have household-level demographic data that the

store commissioned from a marketing firm.

In Section 2.3, I present descriptive evidence of the trade-offs faced by the store as it chooses

stockout substitutes. I begin by characterizing when consumers are willing, or unwilling, to accept

stockout substitutes. Probit regressions indicate that the probability of acceptance is increasing

in the similarity of the substitute’s observable characteristics—such as brand or size—to those

of the out-of-stock product (as well as the consumer’s past purchases). Next, I ask whether

stockout substitutions influence consumers’ learning. To provide insight, I examine stockouts

where consumers are offered substitutes that feature a brand, flavor, or other attribute they have never

tried before. It emerges that these consumers proceed to purchase products with these hitherto-

unfamiliar attributes more often in the future than do comparable consumers who successfully

picked up before the stockout event. This pattern is consistent with consumers’ learning about

4To secure a substitute other than that offered by the store, a consumer who has suffered a stockout in curbside
pickup would need to (i) find a new parking spot that is not reserved for curbside pickup, enter the store, and search for
an alternative substitute therein; or (ii) add an extra visit to a different store.


their tastes for the substitutes’ observable characteristics. Furthermore, consumers seem to learn

more about the characteristic of brand than they do about other characteristics. This is intuitive;

consumers are unlikely to learn much from, say, purchasing a specific quantity of milk for the

first time. Finally, I examine how products’ observable characteristics affect their retail margins

(meaning the difference between the retail price and the wholesale cost). The characteristic of

brand proves a key determinant of retail margins.

These empirical patterns present the store with the following strategic problem. Although

stockout substitutions enable the store to steer consumers’ learning towards profitable brands, doing

so would increase the risk that stockout substitutes are rejected (leaving the store with zero margins

on the present transaction). In Section 2.4, I present a conceptual framework that formalizes these

trade-offs in a simplified environment. This framework shows that steering consumers’ learning

has an ambiguous effect on their welfare. The same is true when the store individualizes its choice

of substitute based on consumers’ past purchases or demographics.
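
A stylized calculation helps fix ideas about this trade-off. It is not the framework of Section 2.4, only a sketch under illustrative assumptions: a rejected substitute earns zero margin and generates no learning, an accepted substitute changes expected future profits by a fixed amount, and every name and number below is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substitute:
    name: str
    margin: float              # retail price minus wholesale cost, in dollars
    p_accept: float            # probability the consumer accepts the substitute
    future_profit_gain: float  # assumed change in future profits if accepted


def expected_profit(s, discount=0.0):
    """Expected profit from offering s; rejection yields zero margin and no learning."""
    return s.p_accept * (s.margin + discount * s.future_profit_gain)


def best_substitute(candidates, discount=0.0):
    """Pick the candidate with the highest expected (discounted) profit."""
    return max(candidates, key=lambda s: expected_profit(s, discount))


# Hypothetical candidates: the unfamiliar brand has a higher margin but is
# accepted less often.
familiar = Substitute("familiar brand", 1.00, 0.90, 0.00)
strong_learning = Substitute("unfamiliar brand, strong learning", 1.50, 0.50, 0.40)
weak_learning = Substitute("unfamiliar brand, weak learning", 1.50, 0.50, 0.05)

# discount=0.0 mimics a store that maximizes present-trip profits alone.
myopic_choice = best_substitute([familiar, strong_learning], discount=0.0)
# discount=0.95 mimics a store that also values future profits.
forward_choice = best_substitute([familiar, strong_learning], discount=0.95)
forward_choice_weak = best_substitute([familiar, weak_learning], discount=0.95)

print(myopic_choice.name)
print(forward_choice.name)
print(forward_choice_weak.name)
```

In this toy example, steering towards the unfamiliar brand pays only when acceptance is likely enough and the learning effect on future profits is large enough; with weak learning, the forward-looking store makes the same choice as the myopic one.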

Section 2.5 builds a learning model of differentiated products demand and then explains the

estimation procedure. In the model, consumers are unsure of their tastes for a given brand until they

purchase one of its products. Consumers’ prior beliefs about brands, along with their true tastes,

are heterogeneous. Brand aside, I allow for unobserved heterogeneity in consumers’ preferences

for non-brand characteristics like flavor. Consumers’ preferences for these characteristics also

vary based on the household-level demographic information observed by the store (like household

income).
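
The learning assumption can be sketched in a few lines of code. This is a simplified illustration rather than the estimated model: it keeps only the feature that a consumer acts on her prior belief about a brand until she has purchased it once, after which she acts on her true taste. The binary logit acceptance rule, the outside option normalized to zero, and all names and numbers are assumptions made for the example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Consumer:
    prior_taste: dict[str, float]  # believed taste for each brand before trying it
    true_taste: dict[str, float]   # actual taste, revealed by the first purchase
    tried: set[str] = field(default_factory=set)

    def perceived_taste(self, brand: str) -> float:
        """Taste the consumer acts on: the truth if tried, otherwise the prior."""
        if brand in self.tried:
            return self.true_taste[brand]
        return self.prior_taste[brand]

    def accept_probability(self, brand: str) -> float:
        """Hypothetical binary logit: accept the substitute or take an outside option worth 0."""
        return 1.0 / (1.0 + math.exp(-self.perceived_taste(brand)))

    def purchase(self, brand: str) -> None:
        """A single purchase reveals the true taste for the brand."""
        self.tried.add(brand)


# A shopper who is pessimistic about brand "B" but would in fact like it.
shopper = Consumer(prior_taste={"B": -0.5}, true_taste={"B": 1.0})
before = shopper.accept_probability("B")
shopper.purchase("B")  # e.g., she accepts a stockout substitute from brand "B"
after = shopper.accept_probability("B")
print(f"before trying: {before:.3f}, after trying: {after:.3f}")
```

A consumer whose prior is pessimistic relative to her true taste becomes more likely to choose the brand after trying it once; the reverse holds for an optimistic prior.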

The estimated model parameters are reported in Section 2.6. With these in hand, I simulate

profits and consumer welfare under counterfactual substitution policies. These policies vary along

two dimensions. One is the store’s objective function. I compare outcomes when the store either

(a) maximizes expected present-trip profits alone or (b) maximizes the present-discounted value

of expected profits (both present and future). The second dimension along which counterfactual

policies vary is the extent of consumer microdata used. I compare policies that exploit (i) none

of these data, (ii) just the consumer’s original order, or (iii) the consumer’s past purchases and


household demographics as well as her original order.

I find that substitution policies designed to maximize present-trip or total profits yield similar

outcomes. This pattern, which holds in multiple product categories, suggests that the store cannot

profit from steering consumers’ learning. To determine why this is the case, I simulate outcomes

under counterfactual changes to the purchase environment or the primitives of consumers’ learning.

I find that if consumers were less reluctant to accept stockout substitutes from unfamiliar brands, and

if they experienced more learning conditional on acceptance, the store could perceptibly increase

its future profits by steering consumers’ learning. As for the store’s use or disuse of the consumer

microdata, I find that most gains from individualization can be secured by conditioning the choice

of substitute on the out-of-stock product. Concerning super-premium ice cream, for example, 78%

($0.28) of the per-stockout gains from individualizing the choice of substitute can be achieved if

the store conditions that choice on the consumer’s original order alone.

2.2 Background

2.2.1 Related Literature

An emergent literature studies how online platforms steer consumers’ learning. This literature

has so far focused on search goods. These are goods whose utility can be determined prior to

purchase by inspecting the good or reading a description (see Nelson [1970]). Concerning search

goods, platforms steer consumers’ learning by manipulating the set of products that consumers

encounter on the platform. Take the case of search rankings, where consumers are more likely

to click on—and learn about—products that are highly ranked. This creates an incentive for

platforms to assign profitable goods a higher search rank than similarly-popular goods that afford

smaller margins. Consistent with this intuition, Farronato, Fradkin, and MacKay (2023) show that

Amazon’s search rankings favor its own products (which afford high margins) over the products

of third-party sellers (which afford thinner margins). Meanwhile, Reimers and Waldfogel (2023)

develop an equilibrium framework to detect bias in search rankings, which they apply to data from

Amazon, Expedia, and Spotify. Search rankings aside, platforms also steer consumers’ learning

through product recommendations. Chen and Tsai (2024) show that Amazon’s “Frequently Bought


Together” recommendations privilege Amazon’s own products over those of third-party sellers.

Unlike the foregoing studies, I explore how online platforms can steer consumers’ learning

about experience goods. These are goods for which consumers cannot learn their tastes prior

to purchase. Instead, consumers learn their tastes for these goods through usage experiences

after purchase. In contrast to the case of search goods, I find that online platforms struggle to profit from steering

consumers’ learning about experience goods. This suggests that experience goods may require

less regulatory scrutiny than search goods with respect to platforms’ (potential) influence over

consumers’ learning.

My findings also relate to a literature on the welfare effects of online personalization. Does

consumer surplus increase or decrease when platforms exploit consumer microdata? In a field

experiment, Donnelly, Kanodia, and Morozov (2024) show that the personalized search algorithm

of a large online retailer delivers higher consumer surplus than does a uniform bestseller-based

ranking—despite the former’s placing nonzero weight on products’ margins. In another field

experiment, Dubé and Misra (2023) show that personalized pricing shrinks total consumer surplus

despite benefitting most consumers. Complementary to these studies, I show that when consumers’

preferred products become unavailable, they might benefit when the online platform leverages

consumer microdata in suggesting an alternative item—even if the platform exclusively attends to

variable profits (as opposed to consumer surplus).

This study also contributes to the literature on incomplete information and consumer learning.

This literature spans many environments, from school choice (see Allende, Gallego, and Neilson

[2019]) to the demand for household appliances (see Newell and Siikamäki [2014]). Like me, a

subset of this literature studies consumers’ learning in connection to online platforms. For instance,

Allcott et al. (2025) show that Google’s dominance in web search owes partly to consumers’

imperfect information about competitors. Another subset of the consumer learning literature

shares my focus on consumer packaged goods. Using data on consumers’ TV viewing habits and

packaged good purchases, Ackerberg (2003) shows that consumers learn about the quality of yogurt

brands from the brands’ TV advertisements.


To the consumer learning literature, I contribute the first empirical characterization of a firm’s

optimal strategy to steer consumer learning about experience goods.5 The task proves unusually

tractable in the context of curbside pickup for the following reasons. First, consumers’ preferences

over groceries, along with their learning, can be distilled in a comparatively simple demand model.6

And second, general equilibrium effects are negligible so far as stockout substitutions are concerned.

That is, the focal store’s optimal substitution policy is not influenced by those of its competitors.7

2.2.2 Curbside Grocery Pickup

In curbside pickup, consumers order groceries online and later pick them up from bricks-

and-mortar supermarkets. This form of grocery shopping gained traction during the COVID-19

pandemic (Young 2023) and remains popular, with US sales exceeding $3 billion in February 2024

alone (Brick Meets Click and Mercatus 2024).

To see how curbside pickup works, picture a consumer who wants to purchase two items: ice

cream and apple sauce. She begins by visiting the store’s app or website. When she searches

for a specific item—such as “ice cream”—she sees a list of relevant products, along with prices,

images, and written descriptions. Once she identifies her preferred product—say, Häagen-Dazs

vanilla ice cream—she adds it to her virtual “shopping cart.” Having repeated this process for apple

sauce—choosing, say, Mott’s Cinnamon—she completes the order by indicating the time when she

plans to pick up her groceries (for example, “Between 8 and 9 am tomorrow morning”).

Once the consumer is ready to pick up her groceries, she drives to the store and parks in

a designated “curbside pickup” area. A store worker then brings the groceries out to her car,

where she pays for them. Importantly, the store maintains the same prices online as in-store;8 our

5I am only aware of one other study that empirically characterizes the optimal supply-side strategy to steer
consumers’ learning about goods of any description—experience or otherwise. Compiani et al. (2024) consider
how online platforms like Expedia should rank products in web searches, given that consumers possess incomplete
knowledge of products’ observable characteristics. The assumption is that consumers will learn a product’s true utility
once they have clicked on its web page, which describes the product’s observable characteristics.

6Because packaged foods are highly standardized and have just one usage case (namely, snacking), I adopt a
“one-shot” model of learning in which consumers learn their true tastes for products after trying them just once. In
addition, grocery shopping is characterized by many fast-paced but low-stakes decisions.
I therefore approximate
consumers’ behavior as being myopic, as opposed to forward-looking.

7As previously mentioned, it seems unlikely that consumers choose where to shop based on grocery stores’ handling

of stockout substitutions. See Sections 2.4 and 2.7 for further discussion.

8If a consumer places a curbside order such that the sum of the ordered items falls below a specified threshold, she
will pay a fixed fee for curbside pickup.

consumer will pay the same price for a given item as if she had physically entered the store and

purchased it there.

Stockout Substitutions.—The store is sometimes unable to supply an ordered item because it

has gone out of stock. In that event, the store will offer a similar item to serve as a substitute.

To illustrate how stockout substitutions proceed, let us revisit the (hypothetical) consumer who

has ordered ice cream and apple sauce. Sometime after she has placed her order but before her

intended pickup time, a store worker will collect the ordered items and set them aside, so that they

can be brought out immediately upon her arrival. As he does so, the worker may discover that

an ordered item has gone out of stock. Imagine, for instance, that our consumer’s preferred ice

cream (namely, Häagen-Dazs vanilla) is unavailable. To ensure that she is not left without ice

cream altogether, the worker will choose another product to serve as a substitute (say, Halo Top

vanilla ice cream).9 Then, when our consumer arrives at the store,10 she will be presented with two

options: either she can accept the substitute that the worker chose earlier on her behalf, or she can

reject it and buy no such product at all. If she accepts the substitute, she will pay the substitute’s

price (not that of the out-of-stock product).

2.2.3 Data

This study employs data from a regional supermarket chain that offers both in-person and online

shopping. Concerning the latter, consumers can choose whether they prefer curbside pickup or

home delivery.11 As far as online shopping is concerned, my analysis concentrates on curbside

9The store’s website and mobile app allow the consumer to leave item-level instructions for the store. For instance,
someone who is ordering strawberries might request “extra-ripe” berries. However, a consumer could also use this
feature to request a specific substitute if her preferred product goes out of stock. Although I do not observe whether a
consumer makes such a substitution request (or, for that matter, whether she leaves item-level instructions of any kind),
the retailer has indicated that, during the time period of my data, consumers almost never left item-level instructions.
10Since September 2021, the store has also allowed consumers to accept or reject substitutes remotely. When an
ordered item goes out of stock, the affected consumer receives a pop-up notification or text to that effect, along with
information about the substitute (such as the name and price). She can then accept or reject the substitute using her
phone or computer. (If she fails to respond electronically, she will be offered the substitute at her car as in the old
procedure.)

11Home delivery resembles curbside pickup as far as orders are concerned. Unlike curbside pickup, however, home
delivery does not require the shopper to travel to the store. Rather, her groceries are delivered directly to her home.
For this convenience, she must pay a fee. (By contrast, curbside pickup is free for sufficiently large orders.)


pickup. This is because in home delivery, consumers select stockout substitutes themselves.12

The supermarket data consist of four distinct datasets. These include a “curbside stockout”

dataset, which details stockout events in curbside pickup; a “scanner” dataset, which records

consumers’ final purchases; a “demographics” dataset, which describes the characteristics of the

consumer’s household; and the chain’s product catalog, which describes the products carried by

the chain. I will now describe each of these datasets in turn.

Curbside Stockout Data.—The first dataset describes (attempted) stockout substitutions in curb-

side pickup from February 2020 to March 2022. Each observation includes the universal product

code (UPC) of both the out-of-stock product and the substitute. I also see the price of the substi-

tute,13 and whether it is accepted or rejected by the consumer.

Importantly, each observation in the data contains the loyalty ID number of the affected con-

sumer,14 along with the date, time, and store location of pickup. This information enables me to

identify the consumer’s past and future purchases within the scanner dataset (as described below).

To see what the curbside stockout data look like in practice, turn to Table 2A.1, which depicts

the observations that would result from the stylized example in Section 2.2.2.

Scanner Data.—The second dataset records all purchases at the chain, both online and in-

person, between April 2016 and July 2023. Each observation, which consists of a single transaction,

includes the UPCs and prices of all the items that were purchased, along with the consumer’s loyalty

ID. The data also record the date, time, and store location of the transaction. Finally, I observe

the wholesale costs of each item.15 Hence, by taking the difference between purchase prices and

wholesale costs, I can recover the “retail margin” of each item carried by the store.

Where curbside pickup is concerned, the scanner data only include a stockout substitute if it

is accepted by the consumer. To illustrate, consider once more the (hypothetical) consumer from

12When an item ordered for home delivery becomes unavailable, the store phones the shopper to determine her

preferred replacement.

13The price of the out-of-stock item is obtained from the scanner data (as I will explain shortly).
14Participation in the chain’s loyalty program is required to place curbside pickup orders.
15Prior to 2021, the retailer’s cost measure included some fixed costs in addition to the wholesale cost. There are
six months during which both the old cost measure and the new one (i.e., wholesale cost alone) are recorded. For
individual products, I observe these two cost measures moving roughly in tandem during this period.


the preceding subsection. Recall that she ordered Häagen-Dazs vanilla ice cream and Mott’s apple

sauce, but that the former went out of stock. Here, the substitute ice cream (Halo Top Vanilla) would

only appear in the data if she accepted the swap. By contrast, the apple sauce would certainly appear

in the scanner data, as it is the exact product that she had originally requested. See Table 2A.2 for

a comparison of the data entries that would result from acceptance versus rejection.

Regarding stockout substitutions, the scanner data enable me to infer the price of the out-of-stock

product. To do so, I search the scanner data for purchases of the relevant product on the same day,

and at the same store, as the intended pickup—either before or after the stockout event. Provided

that I locate at least one such observation, I approximate the out-of-stock product’s price as being

the mean of the observed purchase prices.16 If I do not observe any purchases of the product on

the same day (and at the same store) as the substitution, I instead compute the mean purchase price

on the day before the substitution.17 Failing that, I approximate the out-of-stock product’s price

by taking the average purchase price on the nearest date for which observations appear in the data

(up to seven days before or after the stockout event).

If I have still not obtained the out-of-stock product’s price, I compute the average purchase price

for stores in the same (narrowly-defined) geographic area on the nearest date with observations in

the data (once more, up to seven days before or after the stockout event). The assumption is that

stores in the same geographic area will coordinate on discounts (which might be advertised through

mass mailings or billboards). To group stores by location, I rely on the most granular geographic

designation in the chain’s internal system.
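The imputation cascade described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: the record layout, the function name `impute_oos_price`, the tie-breaking rule for equally distant dates, and the application of the seven-day window to both nearest-date steps are all hypothetical choices.

```python
from datetime import date, timedelta
from statistics import mean


def impute_oos_price(records, upc, store, area, pickup_day, window_days=7):
    """Impute the out-of-stock product's price; return None if every step fails.

    records: list of (upc, store, area, purchase_date, price) tuples
    (a hypothetical layout for the scanner data).
    """
    same_store = [r for r in records if r[0] == upc and r[1] == store]
    same_area = [r for r in records if r[0] == upc and r[2] == area]

    def mean_on(rows, day):
        prices = [r[4] for r in rows if r[3] == day]
        return mean(prices) if prices else None

    def mean_on_nearest(rows):
        days = {r[3] for r in rows
                if abs((r[3] - pickup_day).days) <= window_days}
        if not days:
            return None
        # Closest date wins; ties go to the earlier date (an assumption).
        nearest = min(days, key=lambda d: (abs((d - pickup_day).days), d))
        return mean_on(rows, nearest)

    # Steps 1 and 2: same store on the pickup day, then on the day before.
    for day in (pickup_day, pickup_day - timedelta(days=1)):
        price = mean_on(same_store, day)
        if price is not None:
            return price
    # Steps 3 and 4: nearest date at the same store, then in the same area.
    for rows in (same_store, same_area):
        price = mean_on_nearest(rows)
        if price is not None:
            return price
    return None


# Toy records, invented for illustration.
records = [
    ("upc1", "store1", "areaA", date(2021, 3, 1), 4.00),
    ("upc1", "store1", "areaA", date(2021, 3, 1), 5.00),
    ("upc1", "store2", "areaA", date(2021, 3, 10), 6.00),
]
print(impute_oos_price(records, "upc1", "store1", "areaA", date(2021, 3, 2)))
```

Ordering the steps from most to least local mirrors the logic of the text: each fallback is consulted only when every more precise source of price information is empty.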

Demographic Data.—The third dataset reports demographic information about consumers’

households. These data, which are gathered by a third-party consulting firm, report the household’s

income; the size of the household; the age of the oldest resident male (or, absent male residents,

the age of the oldest female); and whether the household owns or rents its home. See Chapter 2A

for details.

16Based on conversations with chain pricing personnel, this procedure should yield a very close approximation of
the true price. (The true price may differ slightly from the imputed one due to coupons or other consumer-specific
discounts.)

17Whereas it is possible for a consumer to place an order the day before pickup, it is impossible for her to place the
order the day after! Thus, the average purchase price on the day before the pickup is likely more representative of the
price that she expected to pay than is the average purchase price on the day after.


Product Catalog.—The fourth dataset describes the products sold by the chain. For each product,

the catalog lists the universal product code (UPC) and the brand, as well as the location within the

chain’s product taxonomy. I also observe a string description of the product that characterizes its

observable characteristics. To illustrate, here is a string description of a package of apple sauce

cups:

“MOTTS APPLESAUCE CINNAMON 18/4 OZ”

This description indicates that the apple sauce is sold under the Mott’s brand, that it is cinnamon-

flavored, and that the package contains eighteen cups of apple sauce (each measuring 4 oz). I employ

so-called “regular expressions” to extract this information. Sometimes, however, a product’s string

description omits one or more characteristics of interest.

In such cases, I consult either the

manufacturer’s website or that of a retailer that carries the product.18
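To make the extraction step concrete, the following Python sketch parses the example description with a regular expression. The pattern and the function name `parse_description` are hypothetical: the chain's descriptions presumably vary in format, so the patterns actually used would need to be more elaborate. A description that does not match returns `None`, which corresponds to the cases that call for a manual lookup.

```python
import re

# Hypothetical pattern: BRAND PRODUCT FLAVOR COUNT/SIZE OZ
PATTERN = re.compile(
    r"^(?P<brand>\S+)\s+(?P<product>\S+)\s+(?P<flavor>.+?)\s+"
    r"(?P<count>\d+)/(?P<size_oz>\d+(?:\.\d+)?)\s*OZ$"
)


def parse_description(text):
    """Return the product's characteristics, or None if the pattern fails."""
    match = PATTERN.match(text.strip())
    if match is None:
        return None
    return {
        "brand": match["brand"],
        "product": match["product"],
        "flavor": match["flavor"],
        "count": int(match["count"]),        # number of cups in the package
        "size_oz": float(match["size_oz"]),  # ounces per cup
    }


print(parse_description("MOTTS APPLESAUCE CINNAMON 18/4 OZ"))
```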

2.2.4 Summary Statistics

I observe billions of transactions at the relevant supermarket chain. About forty percent of

these transactions involve a consumer who participates in the chain’s loyalty program. Because my

analysis focuses primarily on curbside pickup (where enrollment in the chain’s loyalty program is

required), I will hereafter refer to these individuals as simply “consumers.” The total number of

consumers exceeds five million.

Now consider consumers’ choice of shopping channel. Roughly one in five consumers places

at least one order for curbside pickup, so the chain fulfills millions of orders for curbside pickup.

Regarding stockout substitutions, of the hundred million–plus items ordered for curbside pickup

during the two years when I observe stockout substitutions, approximately twelve million undergo

stockout substitutions. Because curbside orders contain an average of twenty-three items during

this time period, consumers suffer an average of two stockout substitutions per order. Stockout

substitutes are accepted in 87.4% of cases.

18This step proves particularly important for the product category of ice cream, where most products’ characteristics
must be constructed completely by hand. Consequently, my analysis of ice cream focuses on the 97 top-selling
“mainstream” ice cream products and the 97 top-selling “super-premium” ice cream products.


In line with the New Empirical IO literature (e.g., Nevo [2001] and Backus, Conlon, and

Sinkinson [2021]), I do not model consumers’ demand for all packaged foods. Instead, I focus on

the following product categories: apple sauce cups, flavored milk, frozen french fries, and ice cream.

These product categories were chosen for three reasons. First, each category consists of experience

goods (as defined in Section 2.1). Second, each category contains many stockout substitutions.

This helps me identify the relationship between stockout substitutions and consumers’ learning.

Finally, the brands within these product categories afford meaningfully different retail margins. If

this were not so, the store could not profit from steering consumers’ learning towards particular

brands through stockout substitutions (which is the key strategy assessed in this paper).

Table 2.1 presents summary statistics for the four product categories studied. The number of

households suffering at least one stockout substitution ranges from roughly 7,500 (apple sauce

cups) to nearly 57,000 (ice cream). Because some consumers suffer multiple substitutions, the

overall number of substitutions per category is somewhat larger.

It is the largest for ice cream

(about 91,000) and the smallest for apple sauce cups (approximately 9,000). As for consumers’

willingness to accept stockout substitutes, the probability of acceptance ranges from 83.9% (ice

cream) to 90.6% (frozen french fries).

Table 2.1: Summary Statistics by Product Category

Statistic                                   Apple sauce   Flavored   Frozen french   Ice
                                            cups          milk       fries           cream

Panel A. Overview
No. of households with 1+ substitutions     7,508         13,014     30,588          56,762
Total substitutions                         9,001         17,484     39,397          91,139
Prob. accept (%)                            87.2          88.0       90.6            83.9

Panel B. Per consumer with 1+ substitutions
No. of shopping trips                       20.4          40.7       21.7            33.9
  . . . of which curbside pickup            6.2           9.5        5.7             7.1
  . . . with 1+ substitutions               1.2           1.3        1.3             1.4

Notes: Unless otherwise indicated, estimates are reported as means or totals.

Turning to the panel dimension of the data, Panel B characterizes the purchases of individual


consumers who suffer at least one stockout substitution. Depending on the product category, I

observe an average of twenty to forty-one shopping trips per consumer. Six to ten of these shopping

trips are curbside pickup (as opposed to in-store shopping or home delivery).

Chapter 2A presents additional summary statistics on state dependence in consumers’ purchases,

as well as the frequency and duration of stockout events.

2.3 Descriptive Evidence

In this section, I present descriptive evidence of the trade-offs faced by the store as it selects

stockout substitutes. I begin by characterizing the circumstances under which consumers are willing

to accept stockout substitutes. Next, I present model-free evidence that stockout substitutions

influence consumers’ learning. Finally, I examine the relationship between products’ observable

characteristics and their retail margins.

2.3.1 When Do Consumers Accept or Reject Stockout Substitutes?

A consumer’s decision to accept or reject a stockout substitute may impact the store’s future

profits as well as its present ones. Consider first the case where the consumer rejects. Here, the store

earns zero margins on the present shopping trip from the (unsuccessful) substitution. As for future

profits, the consumer’s rejection of the substitute suggests that she may be unhappy with the store’s

handling of the stockout. This dissatisfaction might, in principle, dent the store’s future earnings

if the consumer reduces her future patronage. However, it seems unlikely that the store’s handling

of a single stockout would affect where the consumer shops in the future.19 Now turn to the case

where the substitute is accepted. Regarding the present shopping trip, the store immediately earns

the retail margin associated with the substitute. As to future profits,20 the consumer will learn

whether she likes or dislikes the substitute product—provided that she has not already purchased it

previously, in which case she will already know whether the product is to her taste. This learning

19Other factors, like the convenience of the store’s location or the competitiveness of its prices, probably loom
larger in the consumer’s choice of grocery store. There is also a cost (in time and effort) associated with enrolling in a
rival chain’s pickup program and becoming proficient in its interface.

20The consumer might feel unhappy with the store’s handling of the stockout despite accepting the substitute (e.g., if
she accepts due to the inconvenience of procuring a replacement elsewhere). Although it appears unlikely that a single
unsatisfactory substitution could dent the consumer’s future patronage of the store (see above), it remains theoretically
possible that the store’s future profits are related to the consumer’s utility from an accepted substitute.


may affect her subsequent purchases (and ultimately the store’s future profits).

The task of this subsection is to characterize when consumers are willing to accept stockout

substitutes.

I focus on two key determinants of acceptance:

the substitute’s similarity to the

out-of-stock product, and the substitute’s similarity to products that the consumer has purchased

on previous shopping trips. Intuitively, the probability of acceptance should be increasing in the

similarity of the substitute’s observable characteristics—like its brand or size—to those of the

out-of-stock product and those of products that the consumer has purchased on previous shopping

trips. Regarding the out-of-stock product, I construct a set of indicator variables for the substitute’s

sharing a given characteristic 𝑘 (such as brand) with the out-of-stock product. Let same𝑖𝑘 = 1 if

consumer 𝑖 is offered a substitute that shares characteristic 𝑘 with the out-of-stock product, and

same𝑖𝑘 = 0 otherwise. As for the consumer’s past purchases, I include a set of indicator variables

for the substitute’s sharing a given characteristic 𝑘 with any of the products that the consumer has

bought before. Formally, let ever𝑖𝑘 = 1 if consumer 𝑖 is offered a substitute that shares characteristic

𝑘 with any of the products that she has purchased on past shopping trips, and ever𝑖𝑘 = 0 otherwise.
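The two sets of indicators can be illustrated with a short Python sketch. The dictionary representation of products and the function name `similarity_indicators` are assumptions made purely for exposition; the example products are the hypothetical ones used earlier in the chapter.

```python
def similarity_indicators(substitute, oos_product, past_purchases, characteristics):
    """Build same_ik and ever_ik for one stockout substitution.

    Products are represented as dicts mapping a characteristic name
    (e.g. "brand") to its value; this layout is hypothetical.
    """
    # same_ik = 1 if the substitute shares characteristic k with the OOS product.
    same = {k: int(substitute[k] == oos_product[k]) for k in characteristics}
    # ever_ik = 1 if the substitute shares characteristic k with any past purchase.
    ever = {
        k: int(any(p[k] == substitute[k] for p in past_purchases))
        for k in characteristics
    }
    return same, ever


oos = {"brand": "Häagen-Dazs", "flavor": "vanilla"}
sub = {"brand": "Halo Top", "flavor": "vanilla"}
history = [{"brand": "Häagen-Dazs", "flavor": "vanilla"}]
same, ever = similarity_indicators(sub, oos, history, ["brand", "flavor"])
print(same, ever)
```

In this toy case the substitute matches the out-of-stock product (and the consumer's history) on flavor but not on brand, so only the flavor indicators equal one.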

Observable characteristics aside, the prices of the substitute and out-of-stock product should

also be informative of acceptance. In particular, consumers may be reluctant to accept stockout

substitutes that are perceptibly pricier than the products they had originally ordered. One of my

empirical specifications therefore allows the probability of acceptance to depend on the difference

between the substitute’s price (𝑝𝑖,sub) and that of the out-of-stock product (𝑝𝑖,OOS). Another

specification incorporates the prices of the substitute and out-of-stock product in a more flexible

manner, including each as a separate explanatory variable.

All told, I take the following probit model to the data. Letting $a_i = 1$ if consumer $i$ accepts and

$a_i = 0$ otherwise, I estimate

\[
a_i =
\begin{cases}
1 & \text{if } a_i^{\star} \geq 0 \\
0 & \text{if } a_i^{\star} < 0,
\end{cases}
\]

where

\[
a_i^{\star} = \sum_{k=1}^{K} \left( \gamma_k \, \text{same}_{ik} + \zeta_k \, \text{ever}_{ik} \right)
+ w(p_{i,\text{sub}}, p_{i,\text{OOS}}) + \upsilon_i,
\tag{2.1}
\]

and the idiosyncratic error $\upsilon_i$ is distributed i.i.d. standard normal. As previously mentioned, I

explore several price controls: $w(p_{i,\text{sub}}, p_{i,\text{OOS}}) \in \{0,\ (p_{i,\text{sub}} - p_{i,\text{OOS}})\eta,\ \chi p_{i,\text{sub}} + \psi p_{i,\text{OOS}}\}$.
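Under Equation (2.1), the probability of acceptance equals the standard normal CDF evaluated at the latent index. The Python sketch below illustrates this, together with one common way of computing an average marginal effect for a binary indicator: the mean discrete change in the predicted probability when the indicator switches from 0 to 1. The coefficient values and data are invented, the function names are hypothetical, and the text does not say which convention underlies the reported average marginal effects.

```python
from math import erf, sqrt


def std_normal_cdf(x):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def accept_prob(same, ever, gamma, zeta, price_term=0.0):
    """P(a_i = 1): the CDF of the index in Equation (2.1), with w(.) = price_term."""
    index = sum(g * s for g, s in zip(gamma, same))
    index += sum(z * e for z, e in zip(zeta, ever))
    return std_normal_cdf(index + price_term)


def ame_of_same(k, data, gamma, zeta):
    """Average discrete change in P(accept) when same_k moves from 0 to 1."""
    total = 0.0
    for same, ever, price_term in data:
        on = list(same)
        on[k] = 1
        off = list(same)
        off[k] = 0
        total += (accept_prob(on, ever, gamma, zeta, price_term)
                  - accept_prob(off, ever, gamma, zeta, price_term))
    return total / len(data)


# Invented coefficients and observations: (same, ever, price_term).
gamma, zeta = [0.3], [0.2]
data = [([0], [1], 0.0), ([1], [0], 0.1)]
print(ame_of_same(0, data, gamma, zeta))
```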

Table 2.2 reports the average marginal effects of the variables in Equation (2.1). For three

of the four observable characteristics studied (namely, brand, flavor, and the number of mix-ins),

the probability of acceptance is significantly greater when the substitute shares the

relevant characteristic with either (a) the out-of-stock product or (b) at least one previous purchase.

As for the fourth characteristic, quantity, the substitute’s similarity in size to the out-of-stock product

(or to the consumer’s past purchases) is uninformative of the probability of acceptance. This likely

reflects the limitations of this reduced-form exercise, not indifference by the consumer as to the

substitute’s quantity.21

The results for the other product categories studied—which include apple sauce cups, flavored

milk, and frozen french fries—are qualitatively similar. In particular, consumers are more likely to

accept substitutes whose brands they have previously purchased. (See Chapter 2B for details.)

2.3.2 Stockout Substitutions and Consumers’ Learning about Brands

This subsection supplies model-free evidence that stockout substitutions influence consumers’

learning. Throughout, I adopt the simplifying assumption that consumers learn about their pref-

erences/tastes for products’ observable characteristics, as opposed to their tastes for individual

products. This simplifying assumption aligns the descriptive analysis here with the demand model

estimated in Sections 2.5 and 2.6.

How might consumers learn about their preferences for products’ observable characteristics?

Consider a (hypothetical) consumer who always orders Häagen-Dazs vanilla ice cream. On one

occasion, however, her preferred ice cream goes out of stock. In its place, she is offered Halo Top

vanilla ice cream as a substitute. If she accepts, she will learn about her tastes for a new brand:

Halo Top. However, she won’t necessarily learn anything about her preferences for flavor because

the substitute shares the same flavor (namely, vanilla) as her preferred ice cream. What if, instead,

21For instance, conditional on the substitute’s differing from the out-of-stock product with respect to a given
characteristic, I do not quantify the degree of dissimilarity. To more accurately capture the consumer’s underlying
choice problem, it helps to estimate a structural model (as I do in Sections 2.5 and 2.6).


Table 2.2: Determinants of Acceptance: Average Marginal Effects from Probit Regressions

                                                   (1)          (2)          (3)

Brand
  Sub shares OOS product’s brand                   0.015***     0.016***     0.015***
                                                   [0.004]      [0.004]      [0.004]
  Ever purchased sub’s brand before                0.011**      0.011**      0.011**
                                                   [0.003]      [0.003]      [0.003]
Flavor(s)
  Sub shares OOS product’s flavor(s)               0.041***     0.041***     0.038***
                                                   [0.003]      [0.003]      [0.003]
  Ever purchased sub’s flavor(s) before            0.041***     0.041***     0.041***
                                                   [0.003]      [0.003]      [0.003]
No. of mixins
  Sub shares OOS product’s no. of mixins (a)       0.035***     0.035***     0.035***
                                                   [0.003]      [0.003]      [0.003]
  Ever purchased sub’s no. of mixins before (a)    0.017***     0.017***     0.017***
                                                   [0.003]      [0.003]      [0.003]
Quantity (oz.)
  Sub shares OOS product’s quantity (oz.) (b)      −0.011       −0.019       −0.032*
                                                   [0.013]      [0.014]      [0.014]
  Ever purchased sub’s quantity (oz.) before (b)   0.000        0.000        −0.002
                                                   [0.004]      [0.004]      [0.004]
Sub’s price − OOS product’s price                               −0.009***
                                                                [0.002]
Sub’s price                                                                  0.004
                                                                             [0.002]
OOS product’s price                                                          −0.016***
                                                                             [0.002]

Observations                                       55,270       55,270       55,270
Pseudo R²                                          0.024        0.025        0.026

Notes: The dependent variable is whether a stockout substitute is accepted (= 1) or rejected (= 0).
The table reports average marginal effects, not coefficients. Standard errors are in brackets. As some
consumers suffer multiple stockouts, the standard errors are clustered at the consumer level.

* Significant at the 10 percent level.
** Significant at the 5 percent level.
*** Significant at the 1 percent level.

(a) The number of non–ice cream elements mixed into the ice cream (like caramel sauce or
chunks of cookie dough).

(b) Quantity is discretized as follows: 0 to 5 oz; 5 to 9 oz; 9 to 13 oz; 13 to 25 oz;
25 to 50 oz; 50 to 100 oz; more than 100 oz.

the store had offered Halo Top cookies & cream ice cream as a substitute? Then, if the consumer

accepted, she would learn her tastes for a new flavor (i.e., cookies & cream) as well as a new

brand (i.e., Halo Top). Notice that the amount of learning will vary by characteristic so far as the

consumer holds more accurate prior beliefs about her tastes for some characteristics than others.

Intuitively, ice cream buyers are more likely to learn about their preferences for brands or flavors

than they are to learn about, say, their preferences for the quantity of ice cream (which will mostly


depend on the size of their freezers and the frequency with which they eat ice cream).

The task of this subsection is, therefore, to determine whether stockout substitutions cause

consumers to learn about their preferences for observable product characteristics. I start by identi-

fying stockouts where consumers will learn about their tastes for one of the substitute’s observable

characteristics if they accept. For example, if I were interested in the characteristic of brand, I

would find stockout substitutions where consumers have never purchased the substitute’s brand be-

fore. Next, I tally how often these consumers’ future purchases share the substitute’s version of the

relevant characteristic. Intuitively, the following empirical pattern should emerge if stockout sub-

stitutions affect consumers’ learning. Of the consumers who accept the offered substitute—thereby

learning their true tastes for its version of the characteristic—some will discover that they like the

substitute’s version more than they had expected. A disproportionate share of these consumers’

future purchases should thus feature the substitute’s version of the characteristic, compared to the

counterfactual where they never purchased the substitute (and, in consequence, did not learn about

the substitute’s version of the characteristic). But how can I identify this counterfactual? That’s to

say, what would these consumers’ purchases have looked like if they had never suffered the stockout

substitution and, as a result, never learned about the substitute? To approximate this counterfactual,

I identify “control consumers” who order the same products as the focal consumers. Unlike the

focal consumers, however, these control consumers successfully pick up their preferred products

before they go out of stock and, in consequence, do not have the chance to learn about the substitute.

In spelling out my empirical strategy, it helps to focus on just one observable characteristic. I will

thus concentrate initially on the characteristic of brand and then explain how my strategy generalizes

to other characteristics. With this in mind, consider once more the hypothetical consumer who

always buys Häagen-Dazs vanilla ice cream. Now assume that she is offered Halo Top vanilla

ice cream as a stockout substitute. If she accepts, she will consume Halo Top–branded ice cream

for the first time, thereby learning whether she likes or dislikes the brand. Now assume that our

consumer does accept the substitution and that, on her subsequent shopping trips, she begins to

purchase Halo Top–branded ice cream. This shift in brands purchased, from Häagen-Dazs to Halo


Top, reflects two factors. One is the consumer’s learning about the Halo Top brand. The other is

confounding changes in the market environment; perhaps Halo Top has rolled out a new marketing

campaign at the same time as the stockout. Or, alternatively, our consumer might be tiring of the

taste of Häagen-Dazs ice cream so that, even without the stockout substitution, she would still have

switched to a different brand in the near future—like Halo Top.

To isolate the influence of the stockout substitution, I identify a “control consumer” who, like

the focal consumer, has never purchased any Halo Top–branded ice cream before. Additionally, the

control consumer has ordered the same Häagen-Dazs vanilla ice cream as the focal consumer, from

the same store, and on the same day. Unlike the focal consumer, however, the control consumer

arrives at the store just before the Häagen-Dazs vanilla ice cream goes out of stock. Hence, he does

not suffer a stockout substitution and, in consequence, cannot learn his true tastes for the Halo Top

brand on the present shopping trip. Any future purchases of the Halo Top brand will, therefore, stem

solely from confounding changes in the purchase environment—not learning. This enables me to

“difference out” confounding changes in the purchase environment: whereas the focal consumer’s

future purchases reflect both (a) her learning about Halo Top (due to the substitution) and (b)

confounding changes in the environment, the control consumer’s future purchases reflect only (b).

Thus, if the focal consumer proceeds to purchase Halo Top ice cream more often than the control

consumer does, the disparity likely stems from the former’s learning.

Having sketched the intuition of my strategy, I now spell out the specifics. As suggested by the

foregoing thought experiment, I start by identifying stockout substitutions where the consumer has

never purchased the substitute’s brand before. For each such substitution, I identify all successful

curbside pickups of the focal consumer’s preferred product before it went out of stock.22 Of these

successful pickups, I drop those where the purchaser has bought the substitute’s brand before.

Among the remaining consumers, the “control consumer” is defined as the last one to successfully

22In Chapter 2B, I repeat the same procedure for the first consumer to pick up after the stockout ends. However,
intuition suggests that stockouts may cause endogenous price changes where the store hikes the prices of products
that recently went out of stock. By contrast, purchases before the stockout are insulated from such endogenous price
adjustments. At all events, the results are quantitatively unchanged by this alternative method of selecting the “control
consumer;” see Table 2B.6.


pick up the ordered product before it goes out of stock.23 Under the null hypothesis that stockout

substitutions do not result in consumer learning, the control consumer’s future purchases should

belong to the substitute’s brand just as often as the focal consumer’s future purchases do.
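The selection of the control consumer can be sketched as follows. This is a stylized Python illustration rather than the procedure's actual implementation: the data layout, the function names, and the treatment of weeks as running from Sunday to Saturday are assumptions, and the pickups passed in are taken to be successful curbside pickups of the focal consumer's preferred product at the same store.

```python
from datetime import datetime, timedelta


def week_start(ts):
    """Date of the Sunday that begins ts's week (Sunday-to-Saturday weeks assumed)."""
    return ts.date() - timedelta(days=(ts.weekday() + 1) % 7)


def pick_control(pickups, stockout_start, focal_pickup, sub_brand, brands_bought):
    """Return the control consumer's ID, or None if the observation is dropped.

    pickups: list of (consumer_id, pickup_time) pairs.
    brands_bought: dict mapping consumer_id to the set of brands bought before.
    """
    # Keep pickups before the stockout by consumers new to the substitute's brand.
    candidates = [
        (when, cid) for cid, when in pickups
        if when < stockout_start
        and sub_brand not in brands_bought.get(cid, set())
    ]
    if not candidates:
        return None
    when, cid = max(candidates)  # the last eligible pickup before the stockout
    # Drop the observation if the control picked up in a different week.
    if week_start(when) != week_start(focal_pickup):
        return None
    return cid


# Invented example.
pickups = [
    ("c1", datetime(2021, 3, 2, 9)),
    ("c2", datetime(2021, 3, 3, 10)),
    ("c3", datetime(2021, 3, 3, 15)),
    ("c4", datetime(2021, 3, 4, 9)),
]
brands_bought = {"c2": {"Breyers"}, "c3": {"Halo Top"}}
print(pick_control(pickups, datetime(2021, 3, 3, 18), datetime(2021, 3, 4, 8),
                   "Halo Top", brands_bought))
```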

The foregoing procedure can be readily adapted to study characteristics other than brand. To do

so, I first identify stockout substitutions where the substitute’s version of the relevant characteristic

is one that the consumer has never purchased before (so that she will learn about the substitute’s

version if she accepts). Then I single out a “control consumer” from among the population

of consumers who have ordered the same product as the focal consumer, and who, like the focal

consumer, have never purchased a product with the substitute’s version of the relevant characteristic.

As with brand, I focus on the last such consumer to successfully pick up before the stockout event.

Table 2.3 presents the results of this descriptive exercise. The results bear an “intent-to-treat”

interpretation because I do not distinguish between observations where the substitute is accepted

(in which case the consumer learns about the substitute) and observations where the substitute is

rejected (in which case the consumer does not learn). This is because acceptance is endogenous;

consumers who expect to like the substitute’s observable characteristics are more likely to accept

than are consumers who expect to dislike its characteristics.
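The intent-to-treat comparison can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions: the data layout and function names are hypothetical, and the sketch averages focal-minus-control gaps across matched pairs, whereas Table 2.3 reports group means. Note that the calculation never conditions on whether the substitute was accepted, which is what gives it the intent-to-treat interpretation.

```python
def pct_with_sub_version(future_purchases, characteristic, sub_version):
    """Percent of future purchases sharing the substitute's version; None if empty."""
    if not future_purchases:
        return None
    hits = sum(1 for p in future_purchases if p.get(characteristic) == sub_version)
    return 100.0 * hits / len(future_purchases)


def intent_to_treat_gap(pairs, characteristic):
    """Mean focal-minus-control gap, in percentage points, across matched pairs.

    Each pair is a dict with keys "sub_version", "focal_future", and
    "control_future" (a hypothetical layout). Acceptance is deliberately ignored.
    """
    gaps = []
    for pair in pairs:
        focal = pct_with_sub_version(
            pair["focal_future"], characteristic, pair["sub_version"])
        control = pct_with_sub_version(
            pair["control_future"], characteristic, pair["sub_version"])
        if focal is not None and control is not None:
            gaps.append(focal - control)
    return sum(gaps) / len(gaps) if gaps else None


# Invented example with a single matched pair.
pairs = [{
    "sub_version": "Halo Top",
    "focal_future": [{"brand": "Halo Top"}, {"brand": "Häagen-Dazs"}],
    "control_future": [{"brand": "Häagen-Dazs"}, {"brand": "Häagen-Dazs"}],
}]
print(intent_to_treat_gap(pairs, "brand"))
```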

With this in mind, Table 2.3 is organized as follows. Each panel focuses on a specific observable

characteristic. The upper row of the panel corresponds to the “focal consumers,” who are offered

stockout substitutes with a version of the characteristic that the consumers have never tried. The

lower row of the panel, meanwhile, pertains to the “control consumers,” who successfully pick up

their preferred products (and do not, therefore, learn about the relevant characteristic).

Focus first on the number of purchases made by each type of consumer. For all characteristics

studied,24 the “focal consumers” (who suffer stockout substitutions) are observed making more

purchases before the stockout than are their control counterparts. The same disparity emerges for

23To ensure that the purchase environment is comparable to that experienced by the focal consumer, I drop any
observations where the “control consumer” picks up the focal consumer’s preferred product in a different week than
the focal consumer. (The store typically updates prices and discounts once per week, on Sundays.)

24I omit results for quantity because there are only seven observations where (i) a “focal consumer” is offered a

substitute with an unfamiliar quantity and (ii) a “control consumer” can be identified.


Table 2.3: Model-Free Evidence That Stockout Substitutions Affect Consumers’ Learning

Columns: no. of purchases before stockout (mean, std. dev.); no. of purchases after stockout (mean, std. dev.);
pct. of future purchases with sub’s version of characteristic (mean, std. dev.).

Rows within each panel, by consumer’s “treatment”: suffer substitution (focal group); successful pickup (control group).
In each pair of entries below, the upper entry pertains to the focal group and the lower entry to the control group.

Panel A. Characteristic of brand (292 obs.)

35.0
[0.2]
39.1
[0.2]

21.3
[0.1]
26.0
[0.1]

24.5
[0.1]
29.7
[0.2]

6.6
[0.1]
3.0
[0.0]

Panel B. Characteristic of flavor (200 obs.)

40.9
[0.3]
50.3
[0.4]

19.7
[0.1]
25.8
[0.1]

21.8
[0.1]
27.3
[0.2]

2.9
[0.0]
2.0
[0.0]

16.9
[0.1]
7.9
[0.1]

10.0
[0.2]
8.3
[0.2]

Panel C. Characteristic of no. of mix-ins (76 obs.)a

24.2
[0.5]
35.8
[0.8]

17.0
[0.2]
23.8
[0.3]

16.1
[0.2]
23.5
[0.3]

6.8
[0.2]
5.7
[0.2]

11.7
[0.2]
13.8
[0.5]

24.6
[0.1]
30.1
[0.1]

29.8
[0.2]
38.3
[0.2]

17.0
[0.3]
24.9
[0.5]

Notes: This table presents descriptive evidence that stockout substitutions influence consumers’ learning about
their preferences for observable product characteristics. Each observation consists of a stockout substitution
where the substitute does not share the relevant characteristic with any of the consumer’s past purchases (so
that, if she accepts, she will learn her tastes for the substitute’s version of the characteristic). To capture
confounding influences in the environment, results are also reported for “control consumers” who resemble the
focal consumers in most respects, but do not suffer stockout (and thus do not learn about the substitute). Each
“control consumer” is drawn from the population of consumers who have not yet purchased any products that
share the relevant characteristic with the substitute. The control consumer has also ordered the same product
as the focal consumer, from the same store, and on the same week. Unlike the focal consumer, however, she
successfully picks up her preferred product before it goes out of stock. From the pool of consumers satisfying
the foregoing criteria, I select the last one to have successfully picked up before the stockout event. Bootstrapped
standard errors are enclosed in brackets. The characteristic of quantity is omitted from this table because there
are only seven observations.

a The number of non–ice cream elements mixed into the ice cream (like caramel sauce or chunks of cookie

dough).

the number of purchases after the stockout. These discrepancies point to compositional differences

between the focal and control consumers; the results of this descriptive exercise should be taken as

suggestive, not definitive.

Consider now the percentage of future purchases that share the substitute’s version of the

characteristic.25 Concerning all three characteristics, this percentage is larger for the “focal”

consumers than for their “control” counterparts. That is consistent with a subset of the “focal”

25The corresponding percentage is omitted for past shopping trips as, by construction, none of these consumers

have ever purchased a product that shares the relevant characteristic with the substitute.


consumers’ discovering that they like their substitute’s version of the relevant characteristic and,

in consequence, purchasing products with that version of the characteristic again in the future.

Notice that the disparity between the “focal” and “control” consumers is larger for the characteristic

of brand (3.6 percentage points) than for the characteristics of flavor and mix-ins (0.9 and 1.1

percentage points, respectively). This suggests that consumers learn more about their preferences

for brands than about their preferences for flavor or the number of mix-ins.
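The disparities quoted above can be checked directly against the mean percentages reported in Table 2.3. The short sketch below simply recomputes the focal-minus-control gaps; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# Mean pct. of future purchases sharing the substitute's version of the
# characteristic, copied from Table 2.3 as (focal, control).
pct_future = {
    "brand": (6.6, 3.0),
    "flavor": (2.9, 2.0),
    "no. of mix-ins": (6.8, 5.7),
}


def disparity(focal: float, control: float) -> float:
    """Focal-minus-control gap in percentage points, rounded to one decimal."""
    return round(focal - control, 1)


gaps = {name: disparity(*pair) for name, pair in pct_future.items()}
for name, gap in gaps.items():
    print(f"{name}: {gap} percentage points")
```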

Other Product Categories.—The results for apple sauce cups, flavored milk, and frozen french

fries prove qualitatively similar to those for ice cream. Tables 2B.2 to 2B.4 report that consumers

who suffer stockouts within these categories proceed to purchase the substitute’s brand 1.9 to 7

percentage points more often than their control counterparts (depending on the product category).

And as with ice cream, buyers in these categories generally seem to learn less about non-brand

observable characteristics than about brand.

Robustness Checks.—Other mechanisms besides learning could explain these results. One such

mechanism is the “buy-it-again” feature of the store’s app and website, which enables consumers to

perform repeat purchases with a single click. Importantly, the “buy-it-again” list includes accepted

stockout substitutes. This raises the following question. Do consumers purchase stockout substitutes

on subsequent shopping trips because it is convenient, or because they have learned about the

substitutes? To adjudicate between these explanations, I modify the foregoing descriptive exercise

as follows. Rather than comparing focal and control consumers with respect to all subsequent

purchases (both online and offline), I instead focus solely on in-store purchases, which should

be unaffected by the “buy-it-again” list. The results (presented in Table 2B.5) prove reassuring.

Although sample sizes (and statistical power) shrink dramatically, consumers in two of the four

product categories still purchase their substitute’s brand much more frequently on their subsequent

in-store shopping trips than do their control counterparts.26

26Concerning apple sauce cups, the focal consumers purchase the substitute’s brand 7.1 percentage points more
often than do their control counterparts. The corresponding disparity is 4.3 percentage points for frozen french fries.
As for the remaining categories, the focal ice cream buyers proceed to purchase the substitute’s brand slightly more often than
do their control counterparts (0.5 percentage points), while the focal flavored milk buyers purchase the substitute’s
brand 0.1 percentage points less often.


2.3.3 What Determines Products’ Retail Margins?

In this subsection, I study the determinants of products’ retail margins—that is, the differences

between their retail prices and wholesale costs.27 How do observable characteristics like brand,

size, or flavor influence a product’s profitability?

To provide insight, I estimate linear regressions of the form

𝑝 𝑗𝑡𝑠 − 𝑤𝑐 𝑗𝑡𝑠 = 𝑥 𝑗 𝛾 + 𝜈 𝑗𝑡𝑠,

(2.2)

where 𝑝 𝑗𝑡𝑠 and 𝑤𝑐 𝑗𝑡𝑠 respectively denote the price and wholesale cost of good 𝑗 at time 𝑡 in store

𝑠, while 𝑥 𝑗 denotes the observable characteristics. As the data span more than seven years, I adjust

for inflation by converting both prices and wholesale costs to 2021 dollars.28

Figure 2.1 reports the results. Unlike the foregoing descriptive analysis, here I distinguish

between “mainstream” and “super-premium” ice creams. This is because my structural analysis in

Sections 2.6 and 2.7 focuses on the latter segment (which is economically important in its own right;

see Sullivan [2020]). For a given ice cream segment, the corresponding panel plots the estimated

coefficients on key observable characteristics. For discrete characteristics with many values, such

as brand or flavor, I assign the top-selling value as the base level and then report the coefficients on

the three next-most-popular values.
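To make the role of the base level concrete, the sketch below treats the special case in which brand is the only regressor in Equation (2.2). In that case the OLS coefficient on each brand dummy equals that brand's mean margin minus the mean margin of the omitted brand. The brand names and margins are hypothetical and are not drawn from the data; the full specification also includes the non-brand characteristics.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (brand, real margin in dollars) observations; not from the data.
observations = [
    ("TopSeller", 1.00), ("TopSeller", 1.20),
    ("BrandX", 1.50), ("BrandX", 1.70),
    ("BrandY", 0.90), ("BrandY", 0.70),
]


def brand_coefficients(obs, base):
    """OLS coefficients on brand dummies when brand is the only regressor.

    In this special case each coefficient is the brand's mean margin minus
    the mean margin of the omitted (base) brand.
    """
    by_brand = defaultdict(list)
    for brand, margin in obs:
        by_brand[brand].append(margin)
    base_mean = mean(by_brand[base])
    return {b: mean(m) - base_mean for b, m in by_brand.items() if b != base}


coefs = brand_coefficients(observations, base="TopSeller")
```

A positive coefficient then reads as "this brand earns a higher margin than the top-selling brand," which is the interpretation given to the brand dummies in the text.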

Consider first the coefficients on the brand dummies. These bear the following interpretation:

how do the retail margins of a product sold under the indicated brand differ from those of an

otherwise-identical product sold under the omitted brand (which is the top-selling brand in the

category)? The estimates suggest that products’ brands are, indeed, a key determinant of their

respective retail margins. Within both the mainstream and super-premium ice cream segments,

the margins of the most profitable brand exceed those of the least profitable brand by fifty cents.

Now turn to products’ non-brand characteristics. It is immediately evident that the characteristic of

flavor is unimportant for margins. Margins do appear to increase with quantity, however—especially

where super-premium ice cream is concerned.

27As mentioned previously, the store reported a hybrid cost measure (wholesale cost + some fixed costs) until 2021.

For simplicity, these descriptives focus on the time period after 2021, when wholesale costs are directly observed.

28To reduce the influence of brief fluctuations in the CPI, I normalize values using the six-month smoothed CPI.


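a. Mainstream ice cream (horizontal axis: coefficient; rows: Brand: Breyers, Brand: Edys, Brand: Regional, Flavor: Chocolate, Flavor: Mint, Flavor: Sweet Cream, No. of mixins, Quantity)

b. Super-premium ice cream (horizontal axis: coefficient; rows: Brand: Graeter, Brand: Halotop, Brand: Häagen-Dazs, Flavor: Caramel, Flavor: Chocolate, Flavor: Fruit, No. of mixins, Quantity)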

Figure 2.1: Determinants of Retail Margins
Notes: This figure plots estimates of the coefficients (𝛾) on products’ observable characteristics using the
specification in Equation (2.2). The horizontal bars provide 95% confidence intervals.

Chapter 2B presents similar descriptive regressions for the other product categories studied.

Within each category, the characteristic of brand proves to be one of the key determinants of

margins.

2.4 Conceptual Model

The preceding section highlighted three empirical patterns in the data. First, consumers prefer

stockout substitutes whose observable characteristics resemble those of the out-of-stock product (or,

at a minimum, consumers’ past purchases). Second, stockout substitutions influence consumers’

learning about their preferences for observable characteristics—particularly in relation to the char-

acteristic of brand. Last, the characteristic of brand is among the most important determinants of

products’ retail margins.

I will now distill these empirical patterns in a stylized conceptual model. The task is to formalize

the store’s strategic problem as it chooses stockout substitutes and to trace the effect on consumer

welfare. This conceptual model will inform the empirical model that I take to the data in Sections

2.5 through 2.7.

2.4.1 The Store’s Problem

Consider a store that offers three goods for curbside pickup: 𝐴, 𝐴′, and 𝐵. Let 𝑝 𝑗 and 𝑚𝑐 𝑗

denote the price and marginal cost, respectively, of good 𝑗 ∈ { 𝐴, 𝐴′, 𝐵}. Assume that good 𝐵

affords a higher retail margin than do goods 𝐴 and 𝐴′:

𝑝𝐵 − 𝑚𝑐𝐵 > max{𝑝 𝐴 − 𝑚𝑐 𝐴, 𝑝 𝐴′ − 𝑚𝑐 𝐴′ }.

The store serves a consumer who makes two shopping trips, indexed by 𝑡 ∈ {1, 2}. On each

trip, she either (i) purchases one of the three “inside goods” sold by the store; or (ii) chooses the

“outside option” of no purchase, indexed by 𝑗 = 0. She values the “inside goods” 𝑗 ∈ {𝐴, 𝐴′, 𝐵} at

𝑣 𝑗 ∈ R and the “outside option” at zero. The conditional indirect utility of good 𝑗 is the difference

between its valuation and its price:

\[
u_j =
\begin{cases}
v_j - p_j & \text{if } j \in \{A, A', B\} \\
0 & \text{if } j = 0.
\end{cases}
\]

Our consumer has imperfect information about her preferences among the three goods: although

she knows the valuations of goods 𝐴 and 𝐴′ from prior purchase experiences, she does not know

her valuation of good 𝐵. However, she expects to like good 𝐵 less than good 𝐴′ which, in turn, she

knows to be less preferable than good 𝐴. That is,

\[
\mathbb{E}[v_B] - p_B =: u_B^E < u_{A'} < u_A. \tag{2.3}
\]

In addition, the consumer prefers good 𝐴′ to the “outside option” of no purchase:

\[
u_{A'} > 0. \tag{2.4}
\]

Turning to the store, I assume that Equations (2.3) and (2.4) are common knowledge. Only
the consumer, however, knows exactly how much utility she expects good 𝐵 to afford (that is, $u_B^E$).
And neither the store nor the consumer knows the true utility $u_B$ of good 𝐵. However, the store has
microdata 𝜃 on the consumer’s past purchases and household demographics that help it forecast
both of these quantities (i.e., $u_B^E$ and $u_B$).29

Suppose that our consumer orders good 𝐴 on trip 1, only for the good to go out of stock before

pickup. Should the store offer 𝐴′ or 𝐵 as a substitute? Its decision depends on several criteria. Two

29Formally, let $H(\cdot \mid \cdot)$ and $H(\cdot)$ denote the conditional and unconditional entropy operators, respectively. Then
$H(u_B^E) - H(u_B^E \mid \theta) \in \mathbb{R}_{>0}$ and $H(u_B) - H(u_B \mid \theta) \in \mathbb{R}_{>0}$.


of these criteria concern the first shopping trip. These include (a) the potential substitute’s retail

margin and (b) the probability of acceptance. Regarding (b), our consumer will accept a substitute

𝑠 ∈ {𝐴′, 𝐵} if, and only if, she expects to prefer it to the “outside option” of no purchase (that is,

“good 0”).30

The store’s choice of substitute also affects its future profits. If the store offers good 𝐵 as the

substitute, the consumer may discover that she likes the good more than she had expected and, in

consequence, purchase it on her second trip. That would boost the store’s future profits because

good 𝐵 affords greater retail margins than do goods 𝐴 or 𝐴′.

The store’s optimal choice of substitute can be formalized as follows. Let 𝛿 denote the discount

factor for profits on trip 2. Then, given that the consumer originally ordered good 𝐴, the optimal

substitute should maximize the present-discounted sum of profits on trips 1 and 2:

\[
s_A^{\star}(p, mc; \theta) := \arg\max_{s \in \{A', B\}} \bigl\{ \mathbb{E}[\Pi_1 \mid \text{offer } s; \theta] + \delta \, \mathbb{E}[\Pi_2 \mid \text{offer } s; \theta] \bigr\}. \tag{2.5}
\]

Focus first on trip 1. If the store offers good 𝐴′ as a substitute, the consumer will certainly accept.
Thus, $\mathbb{E}[\Pi_1 \mid \text{offer } A'; \theta] = p_{A'} - mc_{A'}$. Concerning good 𝐵, by contrast, the store is unsure of
acceptance. As a result, $\mathbb{E}[\Pi_1 \mid \text{offer } B; \theta] = \Pr[u_B^E > 0 \mid \theta]\,(p_B - mc_B)$.

Turn now to trip 2. Supposing that the store offered 𝐴′ as a substitute, the consumer cannot have

learned anything from the stockout substitution (as she already knew her taste for 𝐴′). She will,

therefore, order good 𝐴 on trip 2 just as she did on trip 1. In other words, E[Π2 | accept 𝐴′; 𝜃] =

𝑝 𝐴 − 𝑚𝑐 𝐴. What if, however, the store offered good 𝐵? On the one hand, if the consumer rejected it,

she will not have learned anything and will, therefore, order good 𝐴 again on trip 2. Consequently,

E[Π2 | reject 𝐵; 𝜃] = 𝑝 𝐴 − 𝑚𝑐 𝐴. On the other hand, if the consumer accepted good 𝐵, she might

have found it preferable to good 𝐴. That’s to say,

\[
\mathbb{E}[\Pi_2 \mid \text{accept } B; \theta] = \Pr[u_B > u_A \mid \text{accept } B; \theta]\,(p_B - mc_B) + \Pr[u_B \le u_A \mid \text{accept } B; \theta]\,(p_A - mc_A).
\]

30Here, I implicitly assume that the consumer is myopic, meaning that she overlooks the (expected) value of learning
her true tastes for good 𝐵. In Section 2.5.1, I explain why this assumption is likely to provide a close approximation of
consumers’ true behavior in the context of curbside grocery pickup.


Notice that expected trip-2 profits conditional on the store’s offering good 𝐵 strictly exceed those

conditional on offering 𝐴′.
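The store's decision rule in Equation (2.5) can be sketched numerically. The code below is a minimal illustration under stated assumptions: the margins and probabilities are hypothetical inputs, and expected trip-2 profits from offering 𝐵 are obtained by weighting the "accept" and "reject" cases described above by the probability of acceptance.

```python
def expected_profits(margin, pr_accept_b, pr_prefer_b, delta):
    """Present-discounted expected profits from offering A' or B.

    margin       -- dict of retail margins p_j - mc_j for "A", "A'", "B"
    pr_accept_b  -- Pr[u^E_B > 0 | theta], the store's forecast of acceptance
    pr_prefer_b  -- Pr[u_B > u_A | accept B; theta]
    delta        -- discount factor applied to trip-2 profits
    """
    # Offer A': certain acceptance, no learning, so she reorders A on trip 2.
    offer_a_prime = margin["A'"] + delta * margin["A"]
    # Offer B: the trip-1 margin is earned only if she accepts.
    trip1_b = pr_accept_b * margin["B"]
    # Trip 2 after acceptance: B if she learned she prefers it, else A.
    trip2_accept = pr_prefer_b * margin["B"] + (1 - pr_prefer_b) * margin["A"]
    # Trip 2 after rejection: nothing learned, so she reorders A.
    trip2_b = pr_accept_b * trip2_accept + (1 - pr_accept_b) * margin["A"]
    return {"A'": offer_a_prime, "B": trip1_b + delta * trip2_b}


def optimal_substitute(margin, pr_accept_b, pr_prefer_b, delta):
    """Return the substitute that maximizes discounted expected profits."""
    profits = expected_profits(margin, pr_accept_b, pr_prefer_b, delta)
    return max(profits, key=profits.get)


# Hypothetical margins in dollars; B is the most profitable good.
margins = {"A": 1.0, "A'": 0.9, "B": 1.5}
```

With these illustrative numbers, a high forecast probability of acceptance favors offering 𝐵, whereas a low one favors the safe substitute 𝐴′, which is the trade-off the conceptual model is meant to capture.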

2.4.2 Substitution Policies and Consumer Welfare

I have hitherto assumed that the store minds the relationship between the stockout substitution

and our consumer’s learning. I have also supposed that the store exploits the microdata 𝜃 to

more accurately forecast our consumer’s prior-expected and true tastes for brand 𝐵 ($u_B^E$ and $u_B$,

respectively). In point of fact, accounting for these factors might increase the store’s administrative

and computational costs.31 How, therefore, would our consumer’s welfare change if the store

determined to disregard her (potential) learning or the consumer microdata?

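Suppose first that the store does employ the microdata, but does not account for the consumer’s
(potential) learning. It will then choose a substitute according to the rule
$s_A^{1}(p, mc; \theta) := \arg\max_{s \in \{A', B\}} \mathbb{E}[\Pi_1 \mid \text{offer } s; \theta]$.
Thus, the store will fail to offer the optimum substitute (as given by $s_A^{\star}$) if, and only if,

\[
\mathbb{E}[\Pi_1 \mid \text{offer } B; \theta] < \mathbb{E}[\Pi_1 \mid \text{offer } A'; \theta] < \mathbb{E}[\Pi_1 \mid \text{offer } B; \theta] + \delta \bigl( \mathbb{E}[\Pi_2 \mid \text{offer } B; \theta] - \mathbb{E}[\Pi_2 \mid \text{offer } A'; \theta] \bigr). \tag{2.6}
\]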

When these inequalities hold, which substitution policy is better for the consumer? The answer

depends on whether her present-discounted value of expected surplus is larger from the offer of

good 𝐵 or 𝐴′. If she is offered the latter, she will accept and enjoy the following present-discounted

surplus:

E[𝐶𝑆 | offered 𝐴′] = 𝑢 𝐴′ + 𝛿(𝑢 𝐴).

(2.7)

Now consider good 𝐵. Given our assumption that the consumer is myopic, she will accept good 𝐵
if, and only if, $u_B^E \ge 0$. Hence, her present-discounted value of expected surplus is

\[
\mathbb{E}[CS \mid \text{offered } B] =
\begin{cases}
u_B^E + \delta \max\{\mathbb{E}[u_B \mid u_B > u_A],\, u_A\} & \text{if } u_B^E \ge 0 \\
0 + \delta \cdot u_A & \text{if } u_B^E < 0.
\end{cases}
\tag{2.8}
\]

A comparison of Equations (2.7) and (2.8) reveals that the consumer benefits from the store’s

disregard of her learning whenever she is so pessimistic about good 𝐵 that she would reject it (i.e.,

31The costs associated with forecasting consumer learning and exploiting consumer microdata are probably diminishing
with time. Thus, the store might find it more profitable to mind these factors in the future than at present.


$u_B^E < 0$). The welfare effect is ambiguous, however, when the expected utility of good 𝐵 suffices for
her to accept (i.e., $u_B^E \ge 0$). On the one hand, the consumer’s expected surplus increases on trip 1
when the store overlooks her learning (and thus offers 𝐴′). This is because $u_{A'} > u_B^E$. On the other

hand, her present-discounted value of expected trip-2 surplus strictly decreases. The reason is that

she no longer learns her true taste for good 𝐵 and will, as a consequence, certainly order good 𝐴

on trip 2 (even if good 𝐵 would, in point of fact, afford greater utility).
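The comparison between Equations (2.7) and (2.8) can be illustrated with a few lines of code. This is only a sketch: the utility values are hypothetical, and the trip-2 term after learning about good 𝐵 is passed in as an input because its value depends on the distribution of $u_B$, which the conceptual model leaves unspecified.

```python
def surplus_offered_a_prime(u_a, u_a_prime, delta):
    """Equation (2.7): accept A' now, then reorder A on trip 2."""
    return u_a_prime + delta * u_a


def surplus_offered_b(u_a, u_e_b, trip2_value_if_accept, delta):
    """Equation (2.8), with the trip-2 term supplied by the caller.

    trip2_value_if_accept stands in for the expected trip-2 utility after
    the consumer learns her taste for B (the max{...} term in the text).
    """
    if u_e_b >= 0:            # the myopic consumer accepts B
        return u_e_b + delta * trip2_value_if_accept
    return 0.0 + delta * u_a  # she rejects B, learns nothing, reorders A


# Hypothetical utilities: u_A = 2, u_A' = 1, discount factor 0.9.
baseline = surplus_offered_a_prime(u_a=2.0, u_a_prime=1.0, delta=0.9)
pessimist = surplus_offered_b(u_a=2.0, u_e_b=-0.5, trip2_value_if_accept=3.0, delta=0.9)
optimist = surplus_offered_b(u_a=2.0, u_e_b=0.5, trip2_value_if_accept=3.0, delta=0.9)
```

In this example the pessimistic consumer is better off being offered 𝐴′, while the optimistic consumer gains from being offered 𝐵 because the value of learning outweighs the lower trip-1 utility; with a smaller learning payoff the ranking would reverse.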

Consider finally the effect of the microdata on our consumer’s welfare. If the store disregards

both the microdata 𝜃 and the opportunity to steer the consumer’s learning, it will select a substitute

according to the rule $\tilde{s}_A^{1}(p, mc) := \arg\max_{s \in \{A', B\}} \mathbb{E}[\Pi_1 \mid \text{offer } s]$. Here, expected trip-1 profits
conditional on offering good 𝐴′ remain unchanged because, with or without the microdata, the store
is confident that the consumer will accept. In other words, $\mathbb{E}[\Pi_1 \mid \text{offer } A'] = \mathbb{E}[\Pi_1 \mid \text{offer } A'; \theta] = p_{A'} - mc_{A'}$.
By contrast, expected trip-1 profits conditional on offering good 𝐵 may differ
insofar as $\Pr[u_B^E > 0] \ne \Pr[u_B^E > 0 \mid \theta]$. Thus, the store may be more or less likely to offer good
𝐵 as a substitute depending on the structure of its beliefs with, or without, the microdata. As far as
the consumer is concerned, the welfare effects of the store’s use or disuse of the microdata depend
once more on the relative value of the consumer’s learning her true taste for good 𝐵 compared to
the greater present-trip utility delivered by good 𝐴′.

2.5 Empirical Model and Estimation

In this section, I build a learning model of differentiated products demand. Then I describe the

estimation procedure.

2.5.1 The Model

Consider discrete choice among 𝐽𝑡 goods/products at “time” 𝑡,32 indexed by 𝑗 ∈ J𝑡 ≔

{1, . . . , 𝐽𝑡 }. These goods are sold under differentiated brands 𝑏 (such as “Häagen-Dazs” or “Halo

Top”). Let 𝐵( 𝑗) denote the brand of good 𝑗.33

The utility that consumer 𝑖 derives from good 𝑗 depends partly on her liking (or “taste”) for its

32In point of fact, 𝑡 is defined as the combination of a specific store location and time. For economy of exposition,

I focus on the temporal dimension of the index.

33Formally, the function $B : \bigcup_{t \in \mathcal{T}} \mathcal{J}_t \to \mathcal{B}$ maps from each good sold to its brand. (Here $\mathcal{T} := \{1, \ldots, T\}$ denotes
the set of all time periods, while $\mathcal{B}$ denotes the set of all brands.)


brand. This is measured by the scalar 𝑣𝑖𝐵( 𝑗) ∈ R. Utility also depends on the good’s non-brand

observable characteristics (𝑥 𝑗 ) and its price (𝑝 𝑗𝑡), which is the same online as in-store.34 Besides

these observable determinants of demand, there is an unobserved demand factor (𝜉 𝑗𝑡)35 and an i.i.d.

Gumbel error (𝜀𝑖 𝑗𝑡). In all,

𝑢𝑖 𝑗𝑡 = 𝑣𝑖𝐵( 𝑗) + 𝑥 𝑗 𝛽𝑖 − 𝛼𝑖 𝑝 𝑗𝑡 + 𝜉 𝑗𝑡 + 𝜀𝑖 𝑗𝑡 .

(2.9)

Of course, the consumer is not obliged to purchase any of the 𝐽𝑡 goods on offer. Let 𝑗 = 0 index

the “outside option” of purchasing nothing (which provides utility 𝑢𝑖0𝑡).36

Learning.—Consumers can, in principle, learn about their tastes for any observable character-

istic. However, computational limitations force me to focus on just one characteristic. I choose the

characteristic of brand for two reasons. First, descriptive evidence suggests that consumers learn

more about their tastes for brands than about their tastes for other characteristics (see Section 2.3.2).

And second, the characteristic of brand is among the primary determinants of products’ retail mar-

gins (see Section 2.3.3). The store may, therefore, profit more from steering consumers’ learning

about brands than from steering their learning about other characteristics.

I model consumers’ learning about brands as follows. If consumer 𝑖 has never purchased brand

𝑏, she holds the following (unbiased) beliefs about her tastes for the brand:

\[
v_{ib} \sim \text{Normal}\bigl(\mu_{ib}, \iota_b^2\bigr). \tag{2.10}
\]

Once she purchases one of the brand’s products (i.e., some good 𝑗 such that 𝐵( 𝑗) = 𝑏), she will

learn her true tastes 𝑣𝑖𝑏 for the brand. Specifically, 𝑣𝑖𝑏 will be randomly drawn from Equation (2.10),

with the results of the draw determining her tastes for the brand on all future trips.37

34Prices are “exogenous” in the sense that the store cannot individualize prices based on consumers’ decisions to

accept or reject substitutes.

35This term captures unobserved store-level promotional activities that temporarily shift demand for the good, such

as being featured in a flyer or being placed in a prominent location (such as an “end cap”).

36I normalize 𝑢𝑖0𝑡 = 𝜀𝑖0𝑡 , where 𝜀𝑖0𝑡 is an i.i.d. Gumbel error.
37Here, I implicitly assume that a single consumption experience suffices to obtain full knowledge of one’s true
tastes for a brand. Although this “one-shot” model of learning is more restrictive than the Bayesian one used in
much of the literature (e.g., Erdem and Keane [1996]), it affords two key advantages. First, it accommodates richer
heterogeneity in consumers’ underlying tastes than would a more complex model of learning (see Erdem, Keane, and
Sun [2008] or Che, Erdem, and Öncü [2015]). And second, “one-shot” learning is likely a close approximation of
consumers’ true learning process in this environment. (Intuitively, less experience is required to learn whether one
likes a packaged snack or drink than whether one likes a more complex good, such as a car or a computer.)


Consumers hold heterogeneous prior beliefs about their tastes for a given brand. In particu-

lar, prior expected tastes for brands (the 𝜇𝑖𝑏’s) are normally distributed across the population of

consumers such that

\[
\mu_{ib} \sim \text{Normal}\bigl(\mu_b, \sigma_b^2\bigr)
\]

for each brand 𝑏. However, all consumers’ priors are equally informative about a given brand 𝑏
(hence the absence of an 𝑖 subscript on $\iota_b^2$ in Equation [2.10]).
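The "one-shot" learning process can be sketched as a small state machine: before the first purchase the consumer acts on her prior mean, and the first purchase reveals a single draw of her true taste that is held fixed thereafter. The class and function names below are hypothetical, and the parameter values are arbitrary.

```python
import random


def draw_prior_mean(mu_b, sigma_b, rng):
    """Consumer-specific prior mean: mu_ib ~ Normal(mu_b, sigma_b^2)."""
    return rng.gauss(mu_b, sigma_b)


class BrandBelief:
    """One consumer's knowledge of one brand under one-shot learning."""

    def __init__(self, mu_ib, iota_b, rng):
        self.mu_ib = mu_ib
        self.iota_b = iota_b
        self._rng = rng
        self.true_taste = None  # unknown until the first purchase

    def expected_taste(self):
        # Prior mean before any purchase; realized taste afterwards.
        return self.mu_ib if self.true_taste is None else self.true_taste

    def purchase(self):
        # The first purchase (or accepted substitute) reveals v_ib, drawn
        # once from Normal(mu_ib, iota_b^2) and then held fixed.
        if self.true_taste is None:
            self.true_taste = self._rng.gauss(self.mu_ib, self.iota_b)
        return self.true_taste


rng = random.Random(0)
belief = BrandBelief(draw_prior_mean(0.0, 1.0, rng), iota_b=0.5, rng=rng)
```

Calling `belief.purchase()` a second time returns the same value, reflecting the assumption that a single consumption experience fully resolves the consumer's uncertainty about the brand.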

In-Store Purchases, Curbside Orders, and Stockout Substitutions.— Whether she is shopping

in-store or online, each consumer 𝑖 purchases one unit of the good with the highest expected utility.38

The source of uncertainty is her tastes for brands. Concerning goods 𝑗 whose brands 𝐵( 𝑗) she has

never purchased before, the consumer’s expected utility depends on her prior-expected tastes for its

brand, namely 𝜇𝑖𝐵( 𝑗). As for goods 𝑗 whose brands she has bought before, she knows their exact

utilities (𝑢𝑖 𝑗𝑡) because she has already learned her true brand tastes (𝑣𝑖𝐵( 𝑗)) from experience.

Let I𝑖𝑡 denote the consumer’s prior beliefs and experiential knowledge concerning her prefer-

ences for brands at time 𝑡. Regarding each brand 𝑏 that the consumer has not yet purchased, the

information set contains the parameters $\mu_{ib}$ and $\iota_b^2$ that characterize her prior beliefs. As to a brand

𝑏 that she has previously purchased, I𝑖𝑡 contains her true tastes 𝑣𝑖𝑏.

The expected utility of good 𝑗 ∈ J𝑡 \ {0} is given by

\[
\mathbb{E}[u_{ijt} \mid \mathcal{I}_{it}] = \mathbb{E}[v_{iB(j)} \mid \mathcal{I}_{it}] + x_j \beta_i - \alpha_i p_{jt} + \xi_{jt} + \varepsilon_{ijt},
\]

with

\[
\mathbb{E}[v_{iB(j)} \mid \mathcal{I}_{it}] =
\begin{cases}
v_{iB(j)} & \text{if } i \text{ has bought brand } B(j) \text{ before} \\
\mu_{iB(j)} & \text{otherwise.}
\end{cases}
\tag{2.11}
\]

If the consumer is placing an order for curbside pickup, her preferred good (say, $j^{\star}$) may go
out of stock. She will then be offered a substitute $s \in \mathcal{J}_t \setminus \{0, j^{\star}\}$, which she will accept if and only
if

\[
\mathbb{E}[u_{ist} \mid \mathcal{I}_{it}] \ge u_{i0t}. \tag{2.12}
\]

38I do not model the decision to order a good in the first place. In the data, it is difficult to distinguish between
curbside orders where (i) the consumer considered ordering a product from the relevant differentiated-products market,
but decided against it; and (ii) the consumer never considered ordering anything from the market in the first place.
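Equations (2.11) and (2.12) translate into a simple acceptance check. The sketch below assumes a dictionary representation of goods and of the consumer's information set; these data structures, the brand names, and the parameter values are hypothetical and serve only to illustrate the rule.

```python
def expected_brand_taste(brand, prior_means, known_tastes):
    """Equation (2.11): v_ib if the brand was bought before, else mu_ib."""
    return known_tastes.get(brand, prior_means[brand])


def expected_utility(good, prior_means, known_tastes, beta, alpha, xi=0.0, eps=0.0):
    """Expected utility of a good given the consumer's information set."""
    taste = expected_brand_taste(good["brand"], prior_means, known_tastes)
    x_beta = sum(x * b for x, b in zip(good["x"], beta))
    return taste + x_beta - alpha * good["price"] + xi + eps


def accepts(substitute, u_outside, **kwargs):
    """Equation (2.12): accept iff expected utility >= outside utility."""
    return expected_utility(substitute, **kwargs) >= u_outside


# Hypothetical consumer: has tried "KnownBrand" but never "NewBrand".
prior_means = {"NewBrand": -1.0, "KnownBrand": 0.5}
known_tastes = {"KnownBrand": 2.0}
params = dict(prior_means=prior_means, known_tastes=known_tastes, beta=[0.5], alpha=0.3)

new_sub = {"brand": "NewBrand", "x": [1.0], "price": 4.0}
known_sub = {"brand": "KnownBrand", "x": [1.0], "price": 4.0}
```

Here the two substitutes are identical apart from brand, yet the consumer accepts only the familiar one because her prior mean for the untried brand is low.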


Are Consumers Myopic or Forward-Looking?—Consumers’ purchases affect their expected

utility on future shopping trips as well as on the present one. The same is true of their decisions to

accept or reject stockout substitutes. This is because consumers can learn their true tastes for a brand

by either purchasing one of its products or by accepting one of them as a substitute. The resultant

learning would enable them to make more informed—and, in expectation, higher-utility—purchases

in the future.

Are consumers forward-looking, meaning that they account for the (expected) value of learning?

Or are they myopic, meaning that they do not? I assume the latter for two reasons. The first concerns

the purchase environment. When shopping for groceries, consumers typically face a multitude of

low-stakes decisions. To reduce the cognitive burden, consumers may focus on their present-

trip utility, rather than solving the dynamic maximization problem induced by learning’s impact

on future utility. Behavioral considerations aside, it is also computationally useful to assume

that consumers are myopic. In prior work where consumers are not assumed to be myopic, but

rather forward-looking, it has usually proved necessary to assume that all consumers share the

same underlying preferences among brands.39 By assuming that consumers are myopic, I can

accommodate heterogeneous underlying tastes for brands. And, in terms of forecasting consumers’

behavior under counterfactual substitution policies—the ultimate goal of this study—it is arguably

more important to capture heterogeneity in consumers’ underlying brand tastes than to model

(potentially) forward-looking behavior.40

Profits.—As far as this paper is concerned, discounted variable profits correspond to the present-

discounted sum of the retail margins associated with consumers’ purchases. With a view to

computing variable profits, let choose𝑖 𝑗𝑡 = 1 if consumer 𝑖 orders good 𝑗 online or purchases it

in-store; otherwise, let choose𝑖 𝑗𝑡 = 0. Likewise, define OOS 𝑗𝑡 as an indicator variable for good 𝑗’s

39Osborne (2011) and Shin, Misra, and Horsky (2012) provide noteworthy exceptions. Both assume that consumers
are forward-looking and that they possess heterogeneous underlying preferences. To surmount the resultant compu-
tational challenges, however, both studies resort to smaller estimation sample sizes (fewer than 700 households) than
would be ideal for this study, where heterogeneity in consumers’ past purchase histories is of direct interest.

40Concerning the Norwegian market for new books, Daljord (2022) provides quasi-experimental evidence that
consumers evince far greater impatience than the real rate of interest would imply. So, to the extent that consumers are
forward-looking while shopping for groceries—arguably, a faster-paced activity (with lower stakes per item purchased)
than that of shopping for new books—this feature of their behavior is likely of second-order importance.


undergoing a stockout substitution at time 𝑡. Lastly, let accept𝑖𝑠𝑡 be an indicator for consumer 𝑖’s

accepting good 𝑠 as a stockout substitute. Then the discounted variable profits from consumer 𝑖 at

time 𝑡 are computed as follows:

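\[
\Pi_i^t = \sum_{t'=t}^{T_i} \delta^{t'-t} \prod_{j \in \mathcal{J}_{t'}} (p_{jt'} - mc_{jt'})^{\text{choose}_{ijt'} - \text{OOS}_{jt'}} \Biggl( \prod_{s \in \mathcal{J}_{t'} \setminus \{j\}} (p_{st'} - mc_{st'})^{\text{accept}_{ist'}} \Biggr)^{\text{OOS}_{jt'}},
\]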

where 𝛿 denotes the discount factor.41
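In words, the profit measure is the discounted sum of the retail margins on whatever the consumer ends up buying on each trip. The sketch below implements that verbal definition rather than the formula's exact notation: each trip contributes the margin of the ordered good, or of the accepted substitute after a stockout, or zero if the substitute is rejected. The trip records are hypothetical; the daily discount factor of 0.9998 is the value reported in the accompanying footnote.

```python
DAILY_DELTA = 0.9998  # daily discount factor reported in the text's footnote


def discounted_profits(trips, delta=DAILY_DELTA):
    """Discounted sum of retail margins earned from one consumer.

    trips is a list of dicts with keys:
      "days_ahead" -- days elapsed since the current trip
      "margin"     -- p - mc of the good actually sold on that trip: the
                      ordered good, the accepted substitute after a
                      stockout, or 0.0 if the substitute was rejected
    """
    return sum(delta ** trip["days_ahead"] * trip["margin"] for trip in trips)


# Compounding the daily factor over a year gives the implied annual factor.
annual_factor = DAILY_DELTA ** 365

# Hypothetical history: a sale today, a rejected substitute a week later.
example_trips = [
    {"days_ahead": 0, "margin": 1.5},
    {"days_ahead": 7, "margin": 0.0},
]
example_profit = discounted_profits(example_trips)
```

Rounded to two decimals, `annual_factor` is 0.93, which matches the annual figure quoted in the footnote.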

2.5.2 Estimation Method

Several sets of parameters need to be estimated. The first set of parameters pertains to consumers’
prior expected tastes 𝜇𝑖𝑏 for brands 𝑏, as well as their true tastes 𝑣𝑖𝑏. Regarding the latter, two
distinct parameters contribute to heterogeneity in consumers’ true tastes 𝑣𝑖𝑏 for a given brand 𝑏.
One is $\sigma_b^2$, which measures heterogeneity in consumers’ prior expected tastes for the brand; while
the other is $\iota_b^2$, which gauges the amount of learning when consumers first try the brand. Summing
these two parameters yields the variance of consumers’ true tastes for a given brand.
Specifically,

\[
v_{ib} \sim \text{Normal}(\mu_b, \sigma_b^2 + \iota_b^2).
\]

This follows immediately from $v_{ib} \sim \text{Normal}(\mu_{ib}, \iota_b^2)$ and $\mu_{ib} \sim \text{Normal}(\mu_b, \sigma_b^2)$. For further

details on the brand parameters, see Chapter 2C.
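For completeness, the marginal distribution of true tastes can be derived in one step from the law of total variance. The derivation below assumes, as in the model, that the learning shock is independent of the prior mean $\mu_{ib}$; normality of the marginal then follows because $v_{ib}$ is the sum of two independent normal random variables.

```latex
\begin{align*}
\mathbb{E}[v_{ib}]
  &= \mathbb{E}\bigl[\mathbb{E}[v_{ib} \mid \mu_{ib}]\bigr]
   = \mathbb{E}[\mu_{ib}] = \mu_b, \\
\operatorname{Var}(v_{ib})
  &= \mathbb{E}\bigl[\operatorname{Var}(v_{ib} \mid \mu_{ib})\bigr]
   + \operatorname{Var}\bigl(\mathbb{E}[v_{ib} \mid \mu_{ib}]\bigr) \\
  &= \iota_b^2 + \operatorname{Var}(\mu_{ib})
   = \iota_b^2 + \sigma_b^2 .
\end{align*}
```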

The second set of parameters bears on products’ non-brand observable characteristics 𝑥 𝑗 . Let

𝑘 index specific characteristics, so that 𝑥 𝑗 = (𝑥 𝑗1, . . . , 𝑥 𝑗 𝑘 , . . . , 𝑥 𝑗 𝐾) for each good 𝑗. Because

consumers innately know their tastes 𝛽𝑖𝑘 for each non-brand characteristic 𝑘, the distribution of

consumers’ taste parameters is recovered with the same procedure as in the familiar mixed logit

model (see Arteaga et al. [2022]). For most non-brand characteristics, I assume that tastes 𝛽𝑖𝑘

follow a normal distribution, conditional on the demographics of consumers’ households:

\[
\beta_{ik} = \beta_k + \beta_k^D D_i + \sigma_k \omega_{ik},
\]

41Although 𝑡 indexes both time and location, here I abuse notation by letting 𝑡′ − 𝑡 denote the time elapsed between
trips 𝑡′ and 𝑡. As to the discount factor itself, I impose a real daily discount factor of 0.9998. This translates to a real
annual discount factor of 0.93, which falls between the discount factor of 0.9 used by Ryan (2012) and the discount factor of
0.95 used by Collard-Wexler (2013).


where 𝜔𝑖𝑘 ∼ Normal(0, 1). To reduce the computational burden of simulation, however, I estimate

fixed coefficients on a few non-brand characteristics (so that 𝛽𝑖𝑘 = 𝛽𝑘 for all consumers 𝑖).

The third set of parameters governs consumers’ price sensitivity. Conditional on household

income, I assume that the random price coefficient 𝛼𝑖 follows a log-normal distribution with shift

parameter 𝛼 and scale parameter $\sigma_\alpha^2$.
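The random coefficients described above can be simulated with a few lines of code. This is a sketch under stated assumptions: the function names are hypothetical, the dependence of the price coefficient on household income is omitted, and the shift parameter 𝛼 is treated as the mean of the logarithm of 𝛼𝑖, which is one common parameterization of the log-normal rather than necessarily the one used in estimation.

```python
import math
import random


def draw_beta(beta_k, beta_d_k, d_i, sigma_k, rng):
    """beta_ik = beta_k + beta^D_k * D_i + sigma_k * omega_ik, omega ~ N(0, 1)."""
    omega = rng.gauss(0.0, 1.0)
    return beta_k + beta_d_k * d_i + sigma_k * omega


def draw_alpha(alpha, sigma_alpha, rng):
    """Log-normal price coefficient: exp of a Normal(alpha, sigma_alpha^2) draw.

    Treating alpha as the mean of log(alpha_i) is an assumption of this sketch.
    """
    return math.exp(rng.gauss(alpha, sigma_alpha))


rng = random.Random(1)
price_coefs = [draw_alpha(-1.0, 0.5, rng) for _ in range(1000)]
```

Because the price coefficient is the exponential of a normal draw, every simulated consumer dislikes higher prices, which is the usual motivation for the log-normal specification.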

The fourth set of parameters concerns the method by which consumers accept or reject stockout

substitutes. Before September 2021, consumers learned of stockouts upon arriving at the store and

then accepted or rejected the substitute on the spot. Since September 2021, however, consumers

have been able to accept or reject remotely using the store’s app or website. Because this new

procedure may have lowered the cost of rejecting a substitute,42 I allow the utility of rejection to

differ before versus after September 2021. In particular, I assume that the consumer will accept a

substitute 𝑠 if and only if

E[𝑢𝑖𝑠𝑡 | I𝑖𝑡] ⩾ 𝑢𝑖0𝑡 − 𝛾 · 1[reject in-person],

(2.13)

where the parameter 𝛾 gauges the added cost of rejecting a substitute in-person (as opposed to

remotely).
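The role of 𝛾 in Equation (2.13) can be seen in a two-line function. The numerical values below are hypothetical; the point is only that a positive 𝛾 lowers the acceptance threshold when rejection must be made in person.

```python
def accepts_substitute(expected_u_sub, u_outside, gamma, reject_in_person):
    """Equation (2.13): in-person rejection lowers the value of rejecting by gamma."""
    threshold = u_outside - gamma * (1 if reject_in_person else 0)
    return expected_u_sub >= threshold


# Same substitute and outside utility; only the rejection channel differs.
in_person = accepts_substitute(-0.2, 0.0, gamma=0.5, reject_in_person=True)
remote = accepts_substitute(-0.2, 0.0, gamma=0.5, reject_in_person=False)
```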

All the foregoing determinants of demand are observed in the data. However, demand also

depends on unobservable factors that vary across space and time. One such factor is store- and

time-specific promotional activities, like inclusion in a flyer or placement in a prominent location

(like an “end cap”). In the utility specification, shocks of this description are represented by the

term 𝜉 𝑗𝑡.43 To recover 𝜉 𝑗𝑡, I employ the control function approach proposed by Kim and Petrin

(2019). This approach proceeds in two steps.

In the first, I estimate the reduced-form pricing

function. Besides the variables that enter the utility function—namely, the brand 𝐵( 𝑗) and the non-

brand observable characteristics 𝑥 𝑗 —the pricing function also incorporates a set of instrumental

variables that are excluded from demand. I employ products’ wholesale costs 𝑤𝑐 𝑗𝑡 as the excluded

42For a start, it may be easier for the consumer to plan a trip to an additional grocery or convenience store if
she knows of the stockout in advance. There may also be a psychological dimension; when accepting or rejecting
substitutes in-person, consumers might feel social pressure to accept the substitute.

43Although these shocks vary differently over time among consumers 𝑖, I omit an 𝑖 subscript because 𝑡 indexes

combinations of specific store locations and times.


instruments. The intuition is that wholesale costs should be correlated with retail prices, but

uncorrelated with store-level promotional activities. All told, the reduced-form pricing function

takes the following form:

$$p_{jt} = \eta_{B(j)} + x_j \varphi + \psi \cdot \mathit{wc}_{jt} + \tilde{\xi}_{jt}.$$

I estimate this equation via OLS. Because the store changed its internal cost measure in January

2021,44 I perform separate regressions before and after that date. Then, in the second step of the

control function procedure, I substitute 𝜉 𝑗𝑡 = 𝜆 ˜𝜉 𝑗𝑡 in the utility function. Here, ˜𝜉 𝑗𝑡 is the residual

from the reduced-form price regression and 𝜆 is a parameter to be estimated. Due to the change in

the store’s internal cost measure during January 2021, I estimate separate coefficients 𝜆pre-21 and

𝜆post-21 on the control functions before and after that date.
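
The two-step logic can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure's actual implementation: the non-brand characteristics are omitted from the first stage, the brand fixed effects are absorbed by within-brand demeaning, and all names and data are hypothetical. In practice the function would be run once per cost-measure regime (before and after January 2021).

```python
from collections import defaultdict


def first_stage_residuals(prices, wholesale_costs, brands):
    """OLS of price on brand fixed effects and wholesale cost.

    Simplification (assumption): the non-brand characteristics x_j are omitted.
    Returns (psi_hat, residuals); the residuals serve as the control function.
    """
    # Within-brand means absorb the brand fixed effects (Frisch-Waugh-Lovell).
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for p, wc, b in zip(prices, wholesale_costs, brands):
        totals[b][0] += p
        totals[b][1] += wc
        totals[b][2] += 1
    dp = [p - totals[b][0] / totals[b][2] for p, b in zip(prices, brands)]
    dwc = [wc - totals[b][1] / totals[b][2] for wc, b in zip(wholesale_costs, brands)]
    # Slope from regressing demeaned prices on demeaned wholesale costs.
    psi_hat = sum(x * y for x, y in zip(dwc, dp)) / sum(x * x for x in dwc)
    residuals = [y - psi_hat * x for x, y in zip(dwc, dp)]
    return psi_hat, residuals


# Hypothetical data for a single cost-measure regime.
prices = [2.1, 3.4, 5.0, 4.0, 5.6, 7.0]
wholesale_costs = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0]
brands = ["A", "A", "A", "B", "B", "B"]
psi_hat, xi_tilde = first_stage_residuals(prices, wholesale_costs, brands)

# Second step: the demand shock enters utility as lam * xi_tilde, where lam is
# estimated jointly with the other demand parameters.
```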

With the control function in hand, the parameters that govern consumers’ utility and learning

are obtained via maximum simulated likelihood estimation. My estimation code is adapted from

Arteaga et al. (2022). See Appendix 2C for details on the estimation method.

Identification.—Formal identification of the model’s parameters is beyond the scope of this

paper. Instead, I will describe how the parameter estimates depend on specific moments of the data.

Because previous work has already identified differentiated products demand in the absence of

consumer learning (see Berry and Haile [2024, 2021, 2016, 2014]), as well as random-coefficients

discrete choice more generally (see Fox et al. [2012] and Iaria and Wang [2024]), my discussion

focuses on the parameters that pertain to consumers’ learning.45

First consider 𝜇𝑏. This parameter measures how much the average consumer expects to like

brand 𝑏 before she tries it. Because consumers’ prior beliefs are assumed to be unbiased, 𝜇𝑏

also gauges how much the average consumer would actually like the brand if she tried it.46 The

parameter 𝜇𝑏 is sensitive to the following moment of the data. Are brand 𝑏’s products more or

44Before January 2021, the store included some fixed costs in its internal cost measure (as well as the wholesale

cost).

45Although Shin, Misra, and Horsky (2012) identify a Bayesian learning model of demand, the intuition differs from
the “one-shot” learning model estimated here. In a Bayesian learning model, the researcher must untangle two distinct
learning effects: bias reduction and uncertainty reduction. In a one-shot model, by contrast, a single consumption
experience suffices to eliminate both bias and uncertainty in the consumer’s beliefs.

46This is because consumers’ prior beliefs are assumed to be unbiased.


less popular than would be expected, given their respective (non-brand) observable characteristics,

prices, and unobserved demand factors? If they are more popular than expected, brand 𝑏 must be

comparatively well liked. Thus, 𝜇𝑏 should be large. On the other hand, if the brand’s products

possess smaller market shares than expected, consumers must not like the brand very much. Hence,

𝜇𝑏 should be small.

Now turn to $\sigma_b^2$, which measures heterogeneity in expected tastes for brand 𝑏 among consumers

who have not yet tried the brand. This parameter is sensitive to variation across consumers in the

number of shopping trips before they purchase the brand for the first time. To see the intuition,

suppose first that there is little variation in how long consumers wait before trying one of the brand’s

products. This suggests that consumers are similarly optimistic about their tastes for the brand, so

$\sigma_b^2$ is likely small. Now imagine, instead, that there is considerable variation in how long consumers

wait before trying the brand; whereas some consumers purchase the brand on one of their earliest

shopping trips, others wait a long time before doing so. These two groups of consumers probably

differ in their expected tastes for the brand, with the former group being more optimistic than the

latter. Thus, $\sigma_b^2$ should be large.

Finally, consider $\iota_b^2$. This parameter measures the amount of learning that consumers experience

when they try one of brand 𝑏’s products for the first time. To what extent do their true tastes for the

brand (𝑣𝑖𝑏) differ from their expected tastes (𝜇𝑖𝑏)? This parameter partly depends on the following

moment of the data.47 Consider the subset of consumers who try brand 𝑏 for the first time because

of a stockout substitution. How often do these consumers purchase brand 𝑏 in the future? If they

seldom do so, they probably did not learn much about the brand from the substitution. Rather, the

experience confirmed their pessimistic prior beliefs about their tastes for the brand. Consequently,

$\iota_b^2$ should be small. Now suppose, instead, that many consumers proceed to purchase brand 𝑏 quite

frequently. These consumers likely learned a lot from the substitution, finding brand 𝑏 more to

their tastes than they had expected. Hence, $\iota_b^2$ should be large.

47In addition to variation from stockout substitutions, the $\iota_b^2$ estimates also depend on a subtler relationship between
(i) the number of purchases before consumers first try out the brand, and (ii) the frequency with which they purchase
the brand’s products thereafter.


2.5.3 Construction of Estimation Data Set

In this subsection, I describe how I assemble the data set used to estimate the demand model

above. As the procedure closely resembles the one used by Zeyveld (2024), much of this subsection

is adapted from Section 6 of that paper.

I cannot estimate demand for all the products within a given product category due to computa-

tional constraints. For this reason, I exclude slow-selling brands and products from estimation.48

Computational constraints also prevent me from including all consumers in estimation. Rather,

within each product category, I perform estimation on the following subset of consumers. First, I

find consumers who experience stockout substitutions where both the out-of-stock product and the

substitute are popular products. (These consumers are used both in estimation and in counterfac-

tuals.) Next, to increase the sample size, I randomly sample additional consumers who have also

experienced a stockout substitution, albeit one where either the out-of-stock product or the

substitute is a slow-selling product. (These consumers are included for estimation but excluded from

counterfactuals.)

Having sampled consumers for estimation, I need to reconstruct the discrete choice problems

that they faced on each shopping trip. What products were available for purchase? And what were

their prices? Recall that the scanner data directly record the UPC and price of the item that was

purchased. However, these data also enable me to infer the UPCs and prices of goods that the

consumer did not purchase. To do so, I consult the chain’s product catalog in order to obtain the

UPCs of the store’s offerings within the relevant category. Then, turning to the scanner data, I

compare these UPCs with those of products sold at the relevant store. If I observe a given product

being purchased at the relevant store on the same day as our consumer’s shopping trip, I assume

that the product was within her choice menu. Failing that, I presume that the product was available

if it was purchased on both the day before and the day after our consumer’s trip. Otherwise, I

48Regarding super-premium ice cream, I estimate demand for products that (i) are sold under one of the top three
brands, (ii) command at least 0.5% market share among consumers who place at least one curbside pickup order, and
(iii) are not “limited edition” products (whose non-brand characteristics are not recorded in the product catalog). These
products account for 77.1% of purchases by consumers who place at least one curbside pickup order. As for apple sauce
cups, I estimate demand for products with at least 0.5% market share among consumers who place at least one curbside
pickup order. Such products cover 98.2% of purchases by consumers who place at least one curbside pickup order.


assume that the product was absent from the consumer’s choice set (either because it was out of

stock, or because the store did not carry it at all).

Given that a product appears to be available, I impute its price as being the mean purchase price

on the day of the consumer’s shopping trip (within the relevant store location).49 If no purchases are

observed on the precise day of the trip, I instead take the unweighted average of the mean purchase

prices on the days immediately before and after.
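
The availability and price-imputation rules above can be summarized in a short Python sketch. It is illustrative only; the function name and the day-indexed dictionary are hypothetical stand-ins for the scanner data of one product at one store.

```python
def impute_availability_and_price(mean_price_by_day, trip_day):
    """Return (available, imputed_price) for one product at one store.

    mean_price_by_day maps an integer day index to the mean purchase price
    observed on that day; days without any purchase are absent from the dict.
    """
    # Rule 1: purchased on the day of the trip, so use that day's mean price.
    if trip_day in mean_price_by_day:
        return True, mean_price_by_day[trip_day]
    # Rule 2: purchased on both adjacent days, so average their mean prices.
    before = mean_price_by_day.get(trip_day - 1)
    after = mean_price_by_day.get(trip_day + 1)
    if before is not None and after is not None:
        return True, (before + after) / 2.0
    # Otherwise the product is treated as absent from the choice set.
    return False, None


# Hypothetical purchase records: sales on days 10 and 12, none on day 11.
observed = {10: 4.0, 12: 5.0}
print(impute_availability_and_price(observed, 10))  # (True, 4.0)
print(impute_availability_and_price(observed, 11))  # (True, 4.5)
print(impute_availability_and_price(observed, 13))  # (False, None)
```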

Consumers’ purchases sometimes deviate from the underlying assumptions of my discrete

choice model. For a start, consumers sometimes purchase multiple distinct products on a single

shopping trip. To illustrate, a consumer shopping for super-premium ice cream might purchase

both Häagen-Dazs and Halo Top ice cream on the same trip. I drop all such observations from

estimation.50 In addition, consumers sometimes purchase multiple units of the same product on a

single shopping trip (thereby “stockpiling” the relevant product). For simplicity, I do not model

the consumer’s choice of quantity. Instead, different quantities of a given product constitute a single

option within my discrete choice framework.51

Initial Conditions Problem.—Some consumers have made purchases at the store before the

earliest date recorded in my data (April 24, 2016). This creates an initial conditions problem:

When I observe consumers’ purchases early in the data, are they experiencing brands for the first

time? Or had they purchased them previously, before coverage begins in the data?52

In order to minimize this problem, I drop consumers’ first four purchases of super-premium ice

cream or their first seven purchases of apple sauce cups. These “burn-in” periods are motivated

by the following stylized facts. After her first four (seven) shopping trips, three-quarters of super-premium

ice cream (apple sauce cups) buyers have purchased all the brands that they will ever buy

49The chain maintains a policy of uniform prices online and in-store.
50This results in the exclusion of 59.6% (25.1%) of transactions involving super-premium ice cream (apple sauce

cups).

51In the product categories of super-premium ice cream and apple sauce cups, consumers with 1+ stockout substi-

tutions purchase multiple units of a single product on 32.3% and 28.7% of shopping trips, respectively.

52A related, but distinct, concern involves purchases at other supercenter chains. If someone purchases a given brand for
the first time at another chain, then her earliest purchase of that brand within the data would not occasion learning.
However, most of the behavioral markers that identify the brand parameters are spread over many transactions. This
should reduce bias from the misattribution of learning.


at the store.

2.6 Estimation Results

In this section, I report estimates for the demand model developed in Section 2.5 and then

evaluate how well the model fits the data.

2.6.1 Parameter Estimates

Table 2.4 presents the demand estimates for super-premium ice cream. Concerning the 𝜇𝑏

estimates, the average consumer narrowly prefers Ben & Jerry’s to Häagen-Dazs, while Halo Top

comes in a distant third. As for the 𝜎𝑏 estimates, consumers display considerable heterogeneity in

their prior expected tastes for all three brands. In fact, 𝜎Halo Top exceeds the difference in mean

utility between Halo Top and Ben & Jerry’s (that is, 𝜎Halo Top > 𝜇Ben & Jerry’s − 𝜇Halo Top). Finally,

consider the 𝜄𝑏 estimates. For each brand 𝑏, this parameter is smaller than the corresponding value

of 𝜎𝑏. To see what this means, consider two randomly-selected consumers 𝑖 and 𝑖′. On average, the

disparity in our consumers’ prior expected tastes for a given brand 𝑏—that is, |𝜇𝑖𝑏 − 𝜇𝑖′𝑏 |—exceeds

the amount of learning that consumer 𝑖 or 𝑖′ would experience if one of them tried the brand for the

first time (i.e., |𝑣𝑖𝑏 − 𝜇𝑖𝑏 | or |𝑣𝑖′𝑏 − 𝜇𝑖′𝑏 |).

Consider next the products’ non-brand observables and prices (where my treatment of the

former closely follows Sullivan [2020]). According to the 𝛽 estimates, the average consumer

prefers chocolate-flavored ice cream over mint, sweet cream, or vanilla. She also favors a single

flavor of ice cream over multiple flavors (particularly when she belongs to a large household). As for

mix-ins, such as cookie dough or chocolate chips, the average consumer wants one or two mix-ins

(but not three or more, due to the negative coefficient on the quadratic term). The 𝜎𝛽 estimates,

meanwhile, point to considerable unobserved heterogeneity in consumers’ preferences for both

flavor and mix-ins. Turning to the random coefficient on price, recall that 𝛼𝑖 is assumed to follow a

log-normal distribution with shift parameter 𝛼 and scale parameter $\sigma_\alpha^2$.53 The latter exceeds the

former in absolute value, so there is considerable variation across consumers in the marginal utility

of additional income.

53Recall that the price enters the utility function negatively; see Equation (2.9).


Table 2.4: Parameter Estimates

Mean exp.
tastes (𝜇𝑏’s)
−11.448
[1.210]
−12.029
[1.062]
−15.808
[1.216]

Panel A. Brands

Heterogeneity of
exp. tastes (𝜎𝑏’s)

Amount of
learning (𝜄𝑏’s)

2.820
[0.054]
3.294
[0.075]
4.693
[0.123]

0.091
[0.034]
0.903
[0.050]
1.971
[0.082]

Panel B. Non-brand observables and prices

Means
(𝛽’s or 𝛼)
−0.380
[0.063]
−2.358
[0.136]
−1.970
[0.109]
−0.849
[0.067]
−0.387
[0.089]
0.792
[0.047]
−0.279
[0.008]
1.052
[0.076]
−0.689
[0.083]

Demographic interactions (𝛽𝐷

Std. devs.
(𝜎𝛽’s or 𝜎𝛼)

Household
income

Household
size

𝑘 ’s or 𝛼𝐷)
Age of oldest
HH malea

2.110
[0.047]
3.297
[0.121]
2.121
[0.088]
2.714
[0.060]

1.357
[0.017]

0.697
[0.039]

−0.002
[0.000]

0.004
[0.002]
−0.002
[0.001]

−0.082
[0.014]
0.014
[0.006]

−0.002
[0.003]

Ben & Jerry’s

Häagen-Dazs

Halo Top

Flavor: chocolate

Flavor: mint

Flavor: sweet cream

Flavor: vanilla

Multiple flavors

No. of mix-ins

(No. of mix-ins)2

Quantity (oz.)

Priceb

Panel C. Other explanatory variables

                                  Coefficients (𝜆’s or 𝛾)
Control function (pre-2021)^c     0.031 [0.052]
Control function (post-2021)^c    0.303 [0.043]
Reject in-person^d                1.377 [0.166]

Notes: estimates are based on 16,827 randomly-sampled observations involving 1024 households. Standard errors
(in brackets) do not correct for measurement error in the control function.

a Or oldest female (if no male in household).
b Conditional on household income, the random price coefficients 𝛼𝑖 are assumed to follow a log-normal distribution.
c The demand shocks are specified as 𝜉𝑗𝑡 = 𝜆 ˜𝜉𝑗𝑡, where ˜𝜉𝑗𝑡 is the residual from the pricing function and 𝜆 is a scaling parameter (reported here). This control function is computed separately before/after January 2021, due to a change in the store’s internal cost measure.
d Until September 2021, consumers accepted or rejected stockout substitutes upon arrival at the store. Starting September 2021, they could accept or reject substitutes remotely (using the store’s app or website).


Other Product Categories.—The estimated parameters for apple sauce cups appear in Ta-

ble 2D.1. As for flavored milk and frozen french fries, convergence issues prevent me from

estimating structural models (or performing counterfactual simulations).

2.6.2 Goodness of Fit

How accurately does the model predict the acceptance or rejection of stockout substitutes?

And how well does the model reproduce the dynamics of consumers’ subsequent purchases?

To answer these questions, I compute the predicted probability of acceptance and then perform

forward simulations of consumers’ purchases and learning thereafter.

In both steps, I compute

choice probabilities that reflect consumers’ revealed preferences and beliefs. To accomplish this,

I do not assign equal weights to all the simulation draws of the random coefficients, but rather

compute “conditional weights” that reflect the consumer’s observed choices (see Revelt and Train

[2000]). To see the intuition, consider a consumer 𝑖 who never buys chocolate-flavored ice cream.

By the logic of revealed preferences, consumer 𝑖 probably likes chocolate-flavored ice cream less

than the average consumer does. This suggests that 𝛽𝑖,chocolate < 𝛽chocolate. The same intuition

can be adapted to recover conditional distributions on consumers’ prior beliefs and true tastes for

brands (the 𝜇𝑖𝑏’s and 𝑣𝑖𝑏’s, respectively), along with their marginal utilities of additional income

(the 𝛼𝑖’s). In the results discussed below, I employ conditional distributions that reflect the entirety

of consumers’ purchases in the data—before, during, and after the stockout substitution.54
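
To make the idea of conditional weights concrete, the following Python sketch reweights simulation draws by the likelihood of a consumer's observed choices, in the spirit of Revelt and Train (2000). It is a stylized example under strong assumptions, not the code used for the results reported here: utility depends on a single scalar coefficient (a taste for chocolate), the draws are treated as equally likely a priori, and all names and values are hypothetical.

```python
import math


def logit_prob(utilities, chosen):
    """Multinomial logit probability of the chosen alternative."""
    m = max(utilities)  # subtract the maximum for numerical stability
    expu = [math.exp(u - m) for u in utilities]
    return expu[chosen] / sum(expu)


def conditional_weights(draws, trips):
    """Weights on simulation draws, proportional to the likelihood of observed choices.

    draws: candidate coefficients beta_r (here, a scalar taste for chocolate).
    trips: list of (chocolate_flags, chosen_index), one tuple per shopping trip.
    """
    likelihoods = []
    for beta in draws:
        lik = 1.0
        for flags, chosen in trips:
            lik *= logit_prob([beta * f for f in flags], chosen)
        likelihoods.append(lik)
    total = sum(likelihoods)
    return [lik / total for lik in likelihoods]


# Hypothetical consumer: on five trips she faces a chocolate option (index 0)
# and a non-chocolate option (index 1), and always picks the latter.
draws = [-2.0, 0.0, 2.0]
trips = [([1, 0], 1)] * 5
w = conditional_weights(draws, trips)
conditional_mean = sum(wr * b for wr, b in zip(w, draws))
```

Because this consumer never chooses chocolate, most of the weight falls on the negative draw, and the conditional mean of her chocolate coefficient lies below the unconditional mean of the draws (zero in this example).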

Accept/Reject Decisions.—Regarding the acceptance or rejection of stockout substitutes, the

model’s fit can be assessed in several ways. The simplest is to compare the predicted and observed

rates of acceptance. These prove extremely close; whereas the model predicts that consumers will

accept 81% of stockout substitutes, they actually accept 79%. Another measure of fit is the predicted

probability assigned to consumers’ true decisions. This amounts to the predicted probability of

acceptance when the consumer is observed to accept and the predicted probability of rejection

54In the counterfactual simulations presented in Section 2.7, I impose that the store cannot foresee consumers’
future purchases as it chooses a stockout substitute. Counterfactual substitution policies are, therefore, defined using
conditional distributions based on consumers’ past purchases only (as these are known to the store at the time of the
stockout). Ex post, however, I assess the outcomes that would be realized under a given substitution policy by using
conditional distributions that reflect all observed choices in the data.


otherwise. The model assigns consumers’ true accept/reject decisions a mean predicted probability

of 79%.
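
The second fit measure can be written compactly. The Python sketch below is illustrative only, with hypothetical names and inputs: it averages the predicted probability of acceptance over accepted substitutes and the predicted probability of rejection over rejected ones.

```python
def mean_prob_of_true_decision(pred_accept_probs, accepted):
    """Mean predicted probability assigned to the decisions actually observed."""
    # Use P(accept) when the consumer accepted, and 1 - P(accept) when she rejected.
    probs = [p if a else 1.0 - p for p, a in zip(pred_accept_probs, accepted)]
    return sum(probs) / len(probs)


# Hypothetical predictions for three substitutions (two accepted, one rejected).
fit = mean_prob_of_true_decision([0.9, 0.6, 0.3], [True, True, False])
```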

Repetition of Brand and Product Choice.—Another important moment concerns the frequency

with which consumers purchase the same brand or product on successive shopping trips. There

are several reasons why the predicted probability of repetition might be too small. One is the

misspecification of demand. Although I accommodate both observed and unobserved heterogeneity

in consumers’ preferences and beliefs, the deterministic portion of utility (that is, 𝑢𝑖 𝑗𝑡 − 𝜀𝑖 𝑗𝑡)

imperfectly captures consumers’ time-invariant tendencies to like (or dislike) certain products.55

To compensate, the model assigns undue importance to the i.i.d. Gumbel errors, which are (by

construction) uncorrelated over time. Another challenge is the finite number of shopping trips

observed per consumer. Although my predictions employ “conditional” distributions of random

coefficients which reflect consumers’ observed choices, these conditional distributions remain non-

degenerate. This is because many realizations of the random coefficients could, in principle, be

consistent with a given consumer’s observed choices.

I find that the model closely matches the true frequency of repeat brand purchases in the data.

Whereas the predicted probability that consumers purchase the same brand on successive shopping

trips is 86%, the true probability is 81%. Regarding individual products, the model performs

comparatively worse. The predicted probability that consumers purchase the same product on

successive shopping trips is 12%, but the true probability is 27%.

Endogenous Learning.—How often do consumers try out new brands on their own initiative?

On the one hand, if consumers frequently experiment with new brands, the store will gain little

from introducing a consumer to a profitable new brand through a stockout substitution. For,

even if the stockout substitution had never occurred, the consumer would likely have tried the

relevant brand soon anyway. On the other hand, if consumers rarely try out new brands, stockout

substitutions present an important opportunity for the store to encourage consumers to try out

profitable new brands. So, if the model under- (over-) predicts the frequency of endogenous

55For one thing, my model excludes behavioral phenomena like inertia or incomplete consideration that may

reinforce the tendency to make repeat purchases.


learning, the counterfactual simulations will tend to over- (under-) state the gains from steering

consumers’ learning through stockout substitutions.

To assess fit in relation to endogenous learning, I start by subtracting the number of brands

known to the consumer at the time of the stockout from the total number of brands known at the

latest date in the data (July 2023). Then I compare this measure of observed brand experimentation

with its predicted counterpart. Whereas consumers are predicted to try an average of 0.43 new

brands after the stockout, they actually try out 0.39 new brands.

Learning from Stockout Substitutions.—The profitability of steering consumers’ learning de-

pends critically on the following moment of the data. Consider a consumer who has accepted a

stockout substitute from a brand that she has never purchased before. Is she likely to purchase the

substitute’s brand on her own initiative in the future? If so, how often? My model predicts that

consumers will purchase the brand of the store’s chosen substitute on 73% of future shopping trips.

The actual proportion in the data is 76%.

2.7 Counterfactual Simulations

How much would profits increase if the store exploited stockout substitutions to steer consumers’

learning? And how would this affect consumer welfare? To answer these questions, I conduct

counterfactual simulations using the estimated primitives of consumers’ beliefs, learning, and

tastes.

In what follows, I compare profits and consumer welfare under the store’s existing policy

with the corresponding outcomes under counterfactual policies. At present, the store’s “baseline”

substitution policy leaves the choice of substitute to whichever worker happens to be assembling

the curbside order. He is asked to exercise his “best judgement” in selecting a suitable replacement

for the out-of-stock product. As for the counterfactual substitution policies, these vary along two

dimensions. One is the amount of information about consumers that the store employs. Recall that

the store knows consumers’ past purchases and household demographics, as well as their original

order choices. How does the optimal choice of substitute change when the store leverages (i) none

of this information; (ii) only its knowledge of consumers’ original orders (i.e., the sole information


available to store workers under the baseline policy); or (iii) all the information available to the

store? The other dimension along which counterfactual policies vary is the store’s objective

function. Does the store seek to maximize its expected profits on the present shopping trip alone?

Or does it also account for stockout substitutions’ influence on consumers’ learning (and future

purchases)?

2.7.1 Simulation Method

I construct the counterfactual substitution policies using the estimated model from Section 2.5,

coupled with plausible assumptions about the future evolution of products’ prices and availabilities.

Under each counterfactual policy, the store will choose substitutes that maximize (a) expected

present-trip profits, (b) the present-discounted value of total profits, or (c) the present-discounted

value of future profits alone. Observe that (a) depends on the retail margins and acceptance

probabilities of the possible substitutes on the shelf, while (b) additionally depends on the learning

that the consumer would experience if she were to accept and (c) depends only on the latter. Here,

products’ retail margins are directly observed in the data, but the other factors must be simulated.

Focus first on the probability of acceptance. Under counterfactual policies that disregard the

consumer microdata, the store will compute this probability based on the joint distribution of tastes

and beliefs over the population of consumers. Concerning the other counterfactual policies, by

contrast, the store exploits its knowledge of consumers’ original orders, purchase histories, and

household demographics to calculate more accurate acceptance probabilities. As in Section 2.6.2, I

compute “conditional” choice probabilities that reflect consumers’ revealed preferences and beliefs.

Here, however, I compute conditional probabilities based solely on the data employed by the relevant

substitution policy. One of the counterfactual policies, for example, leverages only the microdata

on consumers’ original orders (as opposed to their past purchases or household demographics).

As far as this policy is concerned, the probability of acceptance should be conditioned solely

on original orders. Another set of counterfactual policies, meanwhile, exploit all the microdata

available to the store at the time of the stockout. Regarding these policies, I compute conditional

probabilities of acceptance that reflect consumers’ household demographics, original orders, and


past purchases—not their future purchases (which the store has not yet observed at the time of the

stockout).

Now turn to future profits. How might a consumer’s acceptance (or rejection) of a substitute

influence the store’s expected future profits? In principle, the influence of a stockout substitution

might extend infinitely into the future. To avoid overstating the returns to steering consumers’

learning, I focus on a short time horizon: one year.

The store faces several sources of uncertainty where future profits are concerned. One is the

timing of consumers’ future shopping trips. Here, I assume that the store adopts a simple heuristic:

for each consumer 𝑖, the frequency of future shopping trips is imputed as being the average frequency

of her shopping trips up to (and including) the stockout substitution. The store is also unsure of

the future availabilities, prices, and wholesale costs of products within the relevant category. For

simplicity, I assume that the store does not possess “insider” knowledge about the evolution of

these factors. Instead, the store randomly samples (with replacement) from the choice sets faced by the

consumer on past shopping trips. (Each such draw consists of the entire choice menu—including

availabilities, prices, and wholesale costs—on a single shopping trip.) This allows for persistent

variation across consumers in the composition of choice sets. (Such variation might be rooted in

the size of the local store, the preferred time of day for shopping, etc.)
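
A stylized version of this resampling step appears below. The Python sketch assumes a one-year horizon and represents each past choice set as a dictionary from product to price; the function name, data layout, and seed are hypothetical, and the code is not the simulation routine used in this chapter.

```python
import random


def synthesize_future_trips(past_choice_sets, days_observed, horizon_days=365, seed=0):
    """Draw future choice sets (with replacement) from one consumer's past trips.

    The number of future trips is imputed from the consumer's average trip
    frequency over the days observed up to the stockout substitution.
    """
    rng = random.Random(seed)
    trips_per_day = len(past_choice_sets) / days_observed
    n_future = round(trips_per_day * horizon_days)
    # Each draw is an entire choice menu from a single past shopping trip.
    return [rng.choice(past_choice_sets) for _ in range(n_future)]


# Hypothetical consumer: two past trips over 73 days implies ten trips per year.
past = [{"A": 4.0}, {"A": 4.5, "B": 3.0}]
future = synthesize_future_trips(past, days_observed=73)
```

Sampling whole menus, rather than individual products, preserves the joint pattern of availabilities and prices that a given consumer tends to encounter.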

This procedure yields a synthetic dataset of future shopping trips. Next, I compute the choice

probabilities associated with the future shopping trips within this synthetic dataset. In so doing, I

account for the “endogenous learning” that occurs when consumers try new brands on their own

initiative. To see why this matters, consider a consumer who has never purchased a given brand

𝑏. Even if the store does not offer her one of 𝑏’s products as a substitute, she still might learn

her taste for the brand on a future trip if she elects to purchase one of its products. Endogenous

learning may, therefore, reduce the potential returns to steering consumers’ learning. Then, having

derived the choice probabilities associated with consumers’ subsequent shopping trips, I compute

the store’s present-discounted profits using a real daily discount factor of 0.9998.
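
For intuition about the discounting step, the Python sketch below treats 0.9998 as the per-day multiplier applied to expected profits. Compounded over 365 days, this corresponds to an annual multiplier of roughly 0.93. The function name and example profit stream are hypothetical.

```python
def present_discounted_profit(expected_profits, trip_days, daily_factor=0.9998):
    """Present-discounted value of expected profits on future shopping trips.

    expected_profits: expected profit on each future trip.
    trip_days: number of days between the stockout and each future trip.
    """
    return sum(pi * daily_factor ** d for pi, d in zip(expected_profits, trip_days))


# Hypothetical stream: one dollar of expected profit today and one a year from now.
pdv = present_discounted_profit([1.0, 1.0], [0, 365])
annual_factor = 0.9998 ** 365  # roughly 0.93
```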

Of course, this procedure reflects future profits under just one potential future state of the world.


Accordingly, I repeat the entire procedure—synthesizing data and computing choice probabilities—

several times in order to “integrate” over possible future states of the world. Finally, I average across

these simulation rounds to obtain the present-discounted value of expected future profits associated

with the acceptance or rejection of each available substitute. The “steering substitute” is then

defined as the product that maximizes the sum of (a) the expected retail margins on the present

shopping trip and (b) the present-discounted value of expected future profits.
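
The selection of the steering substitute can be sketched as follows. This Python example is a simplified illustration with hypothetical names and numbers. It assumes that a rejected substitute earns no margin on the present trip, and that the present-discounted values of future profits (conditional on acceptance and on rejection) have already been averaged across simulation rounds.

```python
def steering_substitute(candidates):
    """Pick the substitute that maximizes expected present plus future profits."""
    def value(c):
        p = c["p_accept"]
        # (a) Expected retail margin on the present trip (zero if rejected; assumption).
        present = p * c["margin"]
        # (b) Expected present-discounted future profits, by accept/reject outcome.
        future = p * c["pdv_if_accepted"] + (1.0 - p) * c["pdv_if_rejected"]
        return present + future
    return max(candidates, key=value)


# Hypothetical candidates: a familiar brand that is likely to be accepted, and an
# untried brand that is riskier today but more profitable once the consumer learns.
candidates = [
    {"name": "familiar_brand", "p_accept": 0.9, "margin": 1.0,
     "pdv_if_accepted": 10.0, "pdv_if_rejected": 9.5},
    {"name": "new_brand", "p_accept": 0.6, "margin": 1.4,
     "pdv_if_accepted": 11.0, "pdv_if_rejected": 9.5},
]
chosen = steering_substitute(candidates)
myopic = max(candidates, key=lambda c: c["p_accept"] * c["margin"])
```

In this example the present-trip objective favors the familiar brand, whereas the total-profit objective favors the untried brand. The two policies differ only when the future gains from learning outweigh the lower probability of acceptance.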

Evaluating Profits and Consumer Welfare.—Having defined the counterfactual substitution

policies, I compare expected profits and consumer welfare under these policies with those under the

“baseline” policy. Here, I exploit the entirety of the data—including consumers’ purchases after

stockout substitutions.

The profits associated with a stockout substitute depend, once more, on the retail margin, the

probability of acceptance, and the present-discounted value of expected profits (conditional on

either acceptance or rejection). Regarding the probability of acceptance, I now leverage the entirety

of the relevant consumer’s observed choices—before, during, and after the stockout substitution—

as I compute the conditional weights on the simulation draws of the random coefficients. As

for future profits, I employ a similar heuristic to the one employed to characterize the “steering”

substitution policy. Now, however, I impute the frequency of the consumer’s future shopping trips

as being the average across the entirety of her shopping trips in the data. Likewise, when simulating

products’ future availabilities, prices, and wholesale costs, I sample (with replacement) from the

entirety of her shopping trips in the data.

Having computed the choice probabilities associated with future shopping trips, I calculate

the expected future profits associated with the substitutes offered under the baseline and steering

policies. This entire process is repeated several times (again, with a view to “integrating” over

possible future states of the world). Finally, I compare the present-discounted value of expected

profits under the baseline and “steering” policies by averaging across the simulations.


2.7.2 Counterfactual Results

Table 2.5 compares profit-related outcomes under the “baseline” and counterfactual substitution

policies. Recall that the latter vary along two dimensions. One is the extent of consumer microdata

employed. I compare policies that use (a) none of the microdata, (b) only consumers’ original orders,

or (c) all available microdata (including consumers’ past purchases and household demographics).

As for the second dimension that differentiates the counterfactual policies, I assess three possible

store objective functions: (i) maximizing profits on the present shopping trip, (ii) maximizing the

present-discounted value of total profits, both present and future; or (iii) maximizing the present-

discounted value of future profits alone. Observe that objectives (ii) and (iii) account for stockout

substitutions’ influence on consumer learning, whereas objective (i) does not. For brevity, I assume

that the store adopts objective (i) if it leverages only some (or none) of the consumer microdata.56

As for (iii), this objective serves a purely illustrative function, depicting the outer limits of the

store’s ability to increase future profits by steering consumers’ learning.

These criteria translate to the following counterfactual policies. The first, which I refer to

as the “one-size-fits-all” substitution policy, exploits no consumer microdata whatsoever. Thus,

the store’s choice of substitute only depends on the availabilities, prices, and wholesale costs of

possible substitutes. By contrast, the second substitution policy (termed the

“individualized-by-original-order” policy) accounts for consumers’ original orders. To the extent that these orders reveal

consumers’ tastes and beliefs, the substitutes offered under this policy should be likelier to be

accepted than those offered under the “one-size-fits-all” policy. The remaining counterfactual

policies, styled as the “fully individualized” policies, additionally leverage the store’s knowledge of

consumers’ past purchases and household demographics. Here, I assess all three possible objective

56The following stylized example illustrates why it would be unappealing for the store to steer consumers’ learning
without exploiting its consumer microdata. Consider a consumer who has purchased the highest-margin brand—
namely, Halo Top—on one past shopping trip. However, none of her intervening purchases are sold under the Halo
Top brand. Instead, she has since purchased the brand that affords the second-highest margins: Ben & Jerry’s. Based
on her purchase history, the store should probably offer a substitute from the Ben & Jerry’s brand, not Halo Top. For
a Halo Top–branded substitute would be much likelier to be rejected than a Ben & Jerry’s–branded substitute. And
even in the (comparatively unlikely) event that the consumer accepted a substitute from the Halo Top brand, she would
not learn anything further about the brand; she already knows that she does not like it very much. Contrary to this
intuition, however, a policy that tried to steer all consumers’ learning—irrespective of purchase histories—might still
offer our consumer a Halo Top–branded substitute.


functions: namely, maximizing present-trip profits (as in the foregoing policies); maximizing the

present-discounted value of total profits; and maximizing the present-discounted value of future

profits alone.

Regarding the present shopping trip, notice that the store offers higher-margin substitutes under

all the counterfactual policies than under the baseline policy. This is unsurprising. Under the

baseline policy, store workers are asked to assess possible substitutes based on their similarity to

the out-of-stock product, not their profitability. As to the differences in retail margins among the

counterfactual substitution policies, these are partly rooted in the substitutes’ brands. Under the

“one-size-fits-all” policy, which employs none of the microdata, nearly all the substitutes (96%)

are sold under the Ben & Jerry’s brand. Here, Ben & Jerry’s proves the best substitute brand

for the “average” consumer because (a) the average consumer prefers Ben & Jerry’s to the other

two brands (see Table 2.4); and (b) the brand affords fairly high retail margins (see Figure 2.1).

When the store adopts an individualized substitution policy, by contrast, it can select substitutes

that reflect individual consumers’ brand preferences—which may differ markedly from those of

the “average” consumer. Under the “individualized-by-original-order” policy, the substitute shares

the same brand as the out-of-stock product even more often than under the baseline policy (97%

versus 92%). By comparison, a smaller proportion of the “fully-individualized” substitutes share

the brand of the out-of-stock product (specifically, 81% under the policies maximizing present-trip

or total discounted profits, and 29% under the policy maximizing discounted future profits).

Turn next to the predicted probability of acceptance. Here, outcomes under the “one-size-

fits-all” policy and the baseline are virtually indistinguishable; the former affords acceptance

probabilities of 80% and the latter 81%. By comparison, the probability of acceptance rises to

90% under the “individualized-by-original-order” policy, which exploits the store’s knowledge of

the out-of-stock product (but none of the other microdata). So, under the baseline policy, it seems

that the store workers do not fully exploit the information contained in consumers’ original orders.

This is hardly surprising. Due to tight time constraints, workers are unlikely to exhaustively study

all possible substitutes. Instead, they likely focus on products that are situated near the out-of-stock


product on the shelf. (This may explain why the store workers overwhelmingly choose substitutes

that share the same brand as the out-of-stock product; ice cream products are typically grouped by

brand.) Now consider the “fully-individualized” policies, which additionally leverage the store’s

knowledge of consumers’ past purchases and household demographics. Two of these policies—

namely, those maximizing present profits or total discounted profits—also deliver 90% predicted

acceptance probabilities. Why do the foregoing policies afford such high acceptance probabilities?

The explanation does not reside in the substitutes’ brands; as previously discussed, the “fully-

individualized” policies match the out-of-stock product’s brand less often than the baseline policy

does. Nor are these high acceptance probabilities rooted in the substitutes’ prices (which, under the

“fully-individualized” policies, tend to exceed those of the baseline substitutes).57 Instead, it seems

the individualized substitutes more closely match the non-brand observable characteristics of the

out-of-stock product, the consumer’s past purchases, or both. By contrast, the lowest probability of

acceptance is associated with the “fully-individualized” policy intended to maximize discounted

future profits. The probability of acceptance averages only 54% under this policy. This is partly

because two-thirds of the substitutes offered under this policy belong to brands that consumers have

never purchased.58

Examine now the store’s expected present-trip profits. These correspond to the product of

the substitute’s retail margin and its probability of acceptance. It emerges that the baseline and

“one-size-fits-all” policies afford identical present-trip profits: $2.19. Most of the individualized

policies, meanwhile, select substitutes that are both higher-margin and likelier to be accepted

than are their baseline counterparts. The result is higher expected present-trip profits: $2.47

under the “individualized-by-original-order” policy, and $2.55 under the “fully-individualized”

policies that maximize either present-trip or total discounted profits. However, the policy with the

57The mean (median) price of the baseline substitutes comes to $3.82 ($3.95), while that of the
“individualized-by-original-order” substitutes amounts to $3.74 ($3.90). As for the “fully-individualized” policies, substitutes’ mean (median)
prices when the store maximizes present-trip profits are $3.87 ($4.06), while mean (median) prices when the store
maximizes total discounted profits are $3.88 ($4.06).

58In particular, the modal substitution where the predicted probability of acceptance drops by more than average
(i.e., 27 percentage points) meets the following description: (i) the out-of-stock product is sold under the Häagen-Dazs brand, (ii)
all of the consumer’s past purchases are Häagen-Dazs, (iii) the baseline substitute is Häagen-Dazs, and (iv) the
“fully-individualized” substitute that maximizes discounted future profits is sold under the Ben & Jerry’s brand.


Table 2.5: Profit-Relevant Outcomes by Substitution Policy

                                                                                    “Fully individualized” by original order,
                                                                                    past purchases, and household demographics
                              Baseline          “One size         Individualized    Max. present-     Max. PDV          Max. PDV
                                                fits all”         by original order trip profits      total profits     future profits

Panel A. Present trip
Retail margin                 2.71              2.74              2.77              2.84              2.84              2.83
                              (0.33)            (0.49)            (0.50)            (0.50)            (0.50)            (0.35)
Prob. accept                  0.81              0.80              0.90              0.90              0.90              0.54
                              (0.24)            (0.22)            (0.14)            (0.14)            (0.14)            (0.30)
Expected present-trip profits 2.19              2.19              2.47              2.55              2.55              1.51
                              (0.70)            (0.63)            (0.45)            (0.44)            (0.44)            (0.84)

Panel B. Future trips
PDV future profits            17.16             17.16             17.16             17.16             17.16             17.16
                              (25.02)           (25.02)           (25.02)           (25.02)           (25.02)           (25.03)

Panel C. Overall
PDV total profits             19.34             19.34             19.62             19.70             19.70             18.67
                              (24.98)           (25.00)           (25.02)           (25.00)           (25.00)           (25.00)

Notes: This table compares profit-relevant outcomes under the store’s existing substitution policy (the “baseline”)
with outcomes under counterfactual policies. These counterfactual policies exploit the store’s knowledge of
the consumer to varying degrees, with the “one-size-fits-all” policy leveraging none of this information; the
“individualized-by-original-order” policy employing the store’s knowledge of the consumer’s original order; and
the “fully individualized” policies additionally exploiting the store’s knowledge of the consumer’s past purchases
and household demographics. Regarding the last, the “fully-individualized” policies are respectively designed to
maximize (i) expected profits on the present shopping trip, (ii) the present-discounted value of total profits (both
present and future), or (iii) the present-discounted value of future profits alone. All results are reported as means,
with standard deviations appearing in parentheses.

lowest present-trip expected profits is the “fully-individualized” policy intended to maximize future

discounted profits. Under this policy, expected present-trip profits shrink to $1.51 due to the low

predicted probability of acceptance.
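
The definition of expected present-trip profits as margin times acceptance probability applies stockout by stockout. One plausible reason why the column means in Table 2.5 need not multiply exactly (for example, 2.77 times 0.90 is about 2.49, whereas the reported mean is 2.47) is that the mean of a product differs from the product of means whenever margins and acceptance probabilities covary. The Python sketch below illustrates this point with made-up numbers; it is an assumption-laden toy example, not a reconstruction of the data.

```python
from statistics import mean

# Hypothetical (retail margin in $, predicted acceptance probability) pairs,
# one per stockout. Illustrative values only, not the study's data.
stockouts = [(3.20, 0.85), (2.60, 0.90), (2.50, 0.95)]

# Expected present-trip profit for each stockout: margin * P(accept).
per_stockout = [margin * p_accept for margin, p_accept in stockouts]

# Averaging the per-stockout products (how a table of means would be built).
mean_of_products = mean(per_stockout)

# Multiplying the two column means instead gives a different number
# when margins and acceptance probabilities are correlated.
product_of_means = (mean(m for m, _ in stockouts)
                    * mean(p for _, p in stockouts))
```

In this toy example, higher-margin substitutes are less likely to be accepted, so the mean of the products falls slightly below the product of the means.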

Consider next the profits from consumers’ future shopping trips. It transpires that the present-

discounted value of future profits is identical under all the substitution policies: $17.16. This is

true even of the “fully-individualized” policy that is designed to maximize future profits alone—

irrespective of the cost to present-trip profits. I will discuss possible explanations for this result

momentarily.

Turn last to the present-discounted value of total profits—both present and future. Here, the

“one-size-fits-all” policy again proves indistinguishable from the store’s baseline policy: each results in

total discounted profits of $19.34. The “individualized-by-original-order” policy increases profits to


$19.62, while the “fully-individualized” policies maximizing present-trip profits or total discounted

profits further increase profits to $19.70. But the “fully-individualized” policy maximizing solely

future profits delivers the lowest total profits of all ($18.67).

Discussion.—Most of the gains from individualization can be realized by conditioning the choice

of substitute on the consumer’s original order. Whereas the “individualized-by-original-order”

policy boosts the present-discounted value of total profits by $0.28 over the store’s baseline policy,

the additional gains from conditioning on consumers’ past purchases and household demographics

come to $0.08. This suggests that a single order decision is highly informative of the consumer

characteristics that affect the store’s optimal choice of substitutes, such as brand preferences and

price sensitivity.
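
The decomposition of gains quoted above follows directly from the rounded means in Table 2.5. A short Python check is given below; the variable names are mine, and because the inputs are rounded to the cent, the implied share (roughly 78% of the total gain coming from the original order alone) should be read as approximate.

```python
# PDV of total profits per stockout ($), as reported in Table 2.5.
pdv_total = {
    "baseline": 19.34,
    "by_original_order": 19.62,
    "fully_individualized": 19.70,  # max. present-trip or total profits
}

# Gain from conditioning on the original order alone.
gain_order = pdv_total["by_original_order"] - pdv_total["baseline"]

# Additional gain from also using past purchases and demographics.
gain_extra = pdv_total["fully_individualized"] - pdv_total["by_original_order"]

# Share of the total gain attributable to the original order.
share_from_order = gain_order / (gain_order + gain_extra)
```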

Now consider consumer learning. The counterfactual results suggest that the store cannot

perceptibly increase future profits by introducing consumers to new brands through stockout substi-

tutions. This remains true even if the store disregards the dent to present-trip profits associated with

offering stockout substitutes from unfamiliar brands (which consumers are quite likely to reject).

Admittedly, there is some heterogeneity across stockouts in the returns to steering consumers’ learn-

ing. For instance, there are eight stockouts where the present-discounted value of expected future

profits increases by at least five cents under the “fully-individualized” policy designed to maximize

the present-discounted value of total profits, compared to the “fully-individualized” policy tailored

to maximize expected present-trip profits alone. But eight stockouts is a small fraction of the

analysis sample (2048 stockouts).

Why is the store unable to increase profits by steering consumers’ learning? One possible

explanation is endogenous learning: absent the stockout, consumers would have still tried the

more profitable brands soon. However, counterfactual simulations presented in Table 2E.1 suggest

otherwise. Even if consumers never learned about new brands after stockout substitutions, the

store’s future profits under the optimal policy would remain unchanged from the baseline. Another

potential explanation for the null results is the assumption of imperfect foresight. What if the store

precisely knew goods’ future prices, wholesale costs, and availabilities? Simulations in Chapter 2E


show that perfect foresight would not increase the returns to steering consumers’ learning either.

Instead, the unprofitability of steering consumers’ learning stems from (i) consumers’ reluctance

to accept stockout substitutes from unfamiliar brands and (ii) the small amount of learning that

they experience when they do accept. Regarding (i), recall that the probability of acceptance dips

to barely one-half under the policy designed to maximize discounted future profits alone. The

reason is that two-thirds of these substitutes are sold under brands that consumers have never tried

before. As for (ii), the demand estimates in Section 2.6 indicate that across-consumer variation

in prior beliefs exceeds the amount of learning that individual consumers experience when they

try brands for the first time. It is thus unlikely that a consumer will discover that she prefers a

hitherto-unfamiliar brand to those she has previously purchased. This stylized fact is consistent

with the descriptive evidence presented in Section 2.3. Although consumers who try out new

brands as a result of stockout substitutions proceed to purchase those brands more frequently than

do otherwise-comparable consumers who do not suffer stockouts, the magnitude of this disparity

is modest (3.2 percentage points).

Chapter 2E supplies suggestive evidence that the lack of gains from steering consumers’ learning

stems from (a) consumers’ reluctance to accept substitutes from new brands and (b) their fairly

accurate prior beliefs. There, I perform counterfactual simulations in which consumers are forced

to accept stockout substitutions, or consumers learn three times as much as they do in actual

fact, or both. Taken in isolation, neither forcing consumers to accept nor tripling the amount of

learning translates to increased future profits. But when both occur simultaneously, the average

present-discounted value of future profits increases by a cent.

2.7.3 Consumer Welfare

How are consumers affected when the substitution policy is individualized according to their

original orders, past purchases, and household demographics? And are consumers better off

when the policy accounts for substitutions’ influence on consumer learning? To provide insight,

Table 2.6 compares consumer welfare under the baseline and counterfactual substitution policies.

Focus first on the present shopping trip. The results in Table 2.6 suggest the “one-size-fits-


Table 2.6: Changes in Consumer Welfare (Compared to Baseline Policy)

                                                                                “Fully individualized” by original order,
                                                                                past purchases, and household demographics
                                            “One size         Individualized    Max. present-     Max. PDV          Max. PDV
                                            fits all”         by original order trip profits      total profits     future profits

Expected present-trip consumer surplus ($)  −0.84             2.15              2.56              2.56              −5.18
                                            (6.72)            (5.22)            (5.59)            (5.58)            (7.77)
PDV future consumer surplus ($)             0.00              0.00              0.00              0.00              0.12
                                            (0.17)            (0.08)            (0.12)            (0.12)            (0.52)
PDV total consumer surplus ($)              −0.84             2.16              2.56              2.56              −5.12
                                            (6.73)            (5.22)            (5.59)            (5.58)            (7.79)

Notes: This table reports changes in consumer welfare when the store adopts various counterfactual substitution
policies. See notes to Table 2.5 for descriptions of these policies. All results are reported as means (with standard
deviations in parentheses).

all” policy diminishes expected present-trip consumer surplus by $0.84 compared to the store’s

baseline policy. As explained above, this disparity is probably not rooted in the substitutes’

prices. Rather, the baseline substitutes are likelier to share the out-of-stock product’s brand

or non-brand characteristics than are the “one-size-fits-all” substitutes. The “individualized-by-

original-order” policy, by contrast, increases consumers’ expected present-trip surplus by $2.15

compared to the baseline. And two of the three “fully-individualized” policies (namely, those

designed to maximize present-trip profits or the present-discounted value of total profits) secure

even larger gains in present-trip surplus ($2.56). This is likely because the counterfactual policies

select substitutes that better match consumers’ preferences for non-brand characteristics. On the

other hand, the worst policy for present-trip consumer welfare is the “fully-individualized” policy

that maximizes future discounted profits. This policy diminishes present-trip expected surplus by

more than $5 compared to the baseline. As previously discussed in relation to this policy’s low

acceptance rate, the problem is that the policy tends to offer stockout substitutes from unfamiliar

but profitable brands (about which consumers tend to hold pessimistic prior beliefs).

Now consider future shopping trips. Under all but one of the counterfactual policies, consumers’

present-discounted value of future surplus remains unchanged from the baseline. The exception

is the “fully-individualized” policy that maximizes the store’s discounted future profits. This

policy, which attempts to introduce two-thirds of consumers to a new brand, increases consumers’


discounted future surplus by $0.12 over the baseline. And this average conceals considerable

heterogeneity. Conditional on accepting the stockout substitute, sixty-seven consumers would

enjoy increases of a dollar or more in their present-discounted value of expected future surplus.

Overall, the “one-size-fits-all” policy diminishes the present-discounted value of total consumer

surplus by $0.84 compared to the baseline policy. By contrast, the “individualized-by-original-

order” policy increases total discounted surplus by $2.16. Still larger gains are afforded by the “fully-

individualized” policies that maximize the store’s expected present-trip or total discounted profits:

$2.56 in both cases. As for the “fully-individualized” policy designed to maximize discounted

future profits, the modest increase in consumers’ discounted future surplus is overwhelmed by the

slump in expected present-trip surplus. The net result is a drop of $5.12 in total (discounted)

consumer surplus relative to the baseline.

2.7.4 Counterfactual Results: Apple Sauce Cups

In this subsection, I briefly summarize the counterfactual results for apple sauce cups.59 The

results prove qualitatively similar to those for super-premium ice cream as regards both profits

and consumer welfare. Concerning the former, Table 2E.2 shows that discounted future profits

remain unchanged from the baseline under the various counterfactual policies studied. On the

present shopping trip, however, expected profits increase from the baseline by $1.48 under the

“fully-individualized” policies maximizing either present-trip profits or discounted total profits.

Most of these gains—namely, $1.45 (98%)—can be achieved under the “one-size-fits-all” policy.

As for consumer welfare, Table 2E.3 shows that average consumer surplus is higher under

the baseline policy than under any of the counterfactual policies. Among the counterfactual

policies, though, consumer surplus is higher under the “fully-individualized” policies than under

the “individualized-by-original-order” policy or the “one-size-fits-all” policy.

2.8 Conclusion

This paper shows that stockout substitutions in curbside grocery pickup enable the store to

steer consumers’ learning towards high-margin brands. However, consumers are less likely to

59Recall that structural models were not estimated for flavored milk or frozen french fries due to non-convergence.


accept substitutes from unfamiliar brands than they are to accept substitutes from familiar brands

(whose products they have purchased before). To quantify the trade-off between steering consumers’

learning and maximizing the probability of acceptance, I estimate a learning model of differen-

tiated products demand. Counterfactual simulations suggest that steering consumers’ learning

would prove an unprofitable strategy. Even so, the store could increase profits—and consumer

welfare—by individualizing substitutions according to consumers’ original orders, past purchases,

and demographics.

A natural extension to this study concerns cross-category spillovers in consumers’ learning.

Many brands sell products in multiple categories, such as the store’s private label (which competes

in nearly every category of packaged food). Concerning such brands, what a consumer learns

about the brand in one product category might also be informative of her tastes for the brand’s

products in other product categories. For instance, imagine that a stockout substitution causes a

consumer to learn that she likes private-label ice cream. If she interprets this as a positive signal

of her tastes for the private label as a whole, she might decide to try its offerings in other product

categories—such as apple sauce cups—on subsequent shopping trips. Thus, to the extent that

brands’ retail margins are correlated across product categories, cross-category learning spillovers

might increase the returns to steering consumers’ learning.

More broadly, further research is needed on the extent to which firms can steer consumers’

learning online. My findings suggest that supermarkets would struggle to profit from steering

consumers’ learning via stockout substitutions—a result that is reassuring as far as consumer

welfare is concerned. However, the internet affords many other opportunities to direct consumers’

learning. Take the case of web browsers, which are used to access important productivity software—

word processors, spreadsheets, calendars, etc.—and to casually surf the web (Taivalsaari et al.

2008). Here, Microsoft leverages the popularity of its Windows operating system to encourage

consumers to try its own browser, Edge, and to discourage them from experimenting with those

of its competitors (Krasnoff 2022; Hollister 2023).60 Another example concerns online shopping,

where Google exploits its dominance in web search to promote its eponymous shopping service

(Raedts and Evans 2024).61 Many of the affected consumers are, of course, happy with Edge or

Google Shopping. Even so, some consumers might learn that they prefer alternatives (like Firefox

or Bing Shopping, respectively) were they to try them. Future work could quantify the welfare

effects of tech giants’ efforts to steer consumers’ learning about web browsers, online shopping,

and other things.62

60Microsoft sets Edge as the default browser on Windows 11 (Krasnoff 2022), so that web links and certain file types
automatically open in Edge (unless consumers manually change the default browser). And when users try to download
the rival Chrome browser, they are first presented with a notice that Edge “. . . runs on the same tech as Chrome, with
the added trust of Microsoft,” then asked to complete a poll about their reasons for downloading Chrome (Hollister
2023).

61When consumers make shopping-related searches, Google displays its own shopping service more prominently

than those of its competitors (Raedts and Evans 2024).

62Unlike packaged foods, there are adjustment costs associated with trying out new online software/services. (For
instance, when a consumer experiments with a new web browser, she needs to determine where important functions
are located in the interface.) These adjustment costs affect welfare analysis as follows. If tech firms stopped steering
consumers’ learning, consumers might cross-shop online software/services more frequently. This would, in turn,
increase the total adjustment costs incurred by consumers.


BIBLIOGRAPHY

Abdulkadiroğlu, Atila, Nikhil Agarwal, and Parag A. Pathak. “The Welfare Effects of Coordinated
Assignment: Evidence from the New York City High School Match”. American Economic
Review 107, no. 12 (2017): 3635–3689.

Ackerberg, Daniel A. “Advertising, Learning, and Consumer Choice in Experience Good Markets:
An Empirical Examination”. International Economic Review 44, no. 3 (2003): 1007–1040.

Allcott, Hunt. “The Welfare Effects of Misperceived Product Costs: Data and Calibrations from the
Automobile Market”. American Economic Journal: Economic Policy 5, no. 3 (2013): 30–66.

Allcott, Hunt, et al. Sources of Market Power in Web Search: Evidence from a Field Experiment.

National Bureau of Economic Research, 2025.

Allende, Claudia, Francisco Gallego, and Christopher Neilson. “Approximating the Equilibrium

Effects of Informed School Choice”. Working paper, 2019. Visited on 10/28/2024.

Anand, Bharat N., and Ron Shachar. “Advertising, the Matchmaker”. The RAND Journal of

Economics 42, no. 2 (June 2011): 205–245.

Anupindi, Ravi, Maqbool Dada, and Sachin Gupta. “Estimation of Consumer Demand with Stock-
Out Based Substitution: An Application to Vending Machine Products”. Marketing Science
17, no. 4 (1998): 406–423.

Arteaga, Cristian, et al. “xlogit: An Open-Source Python Package for GPU-Accelerated Estimation

of Mixed Logit Models”. Journal of Choice Modelling 42 (2022): 100339.

Bachmann, Rüdiger, et al. “Firms and Collective Reputation: A Study of the Volkswagen Emissions

Scandal”. Journal of the European Economic Association 21, no. 2 (2023): 484–525.

Backus, Matthew, Christopher Conlon, and Michael Sinkinson. Common Ownership and Com-
petition in the Ready-to-Eat Cereal Industry. National Bureau of Economic Research, 2021.
Visited on 04/02/2025.

Bajari, Patrick, and C. Lanier Benkard. “Demand Estimation with Heterogeneous Consumers and
Unobserved Product Characteristics: A Hedonic Approach”. Journal of Political Economy
113, no. 6 (2005): 1239–1276.

Barahona, Nano, Cristóbal Otero, and Sebastián Otero. “Equilibrium Effects of Food Labeling

Policies”. Econometrica 91, no. 3 (2023): 839–868.

Beggs, Steven, Scott Cardell, and Jerry Hausman. “Assessing the Potential Demand for Electric

Cars”. Journal of Econometrics 17, no. 1 (1981): 1–19.


Berry, Steven, and Philip Haile. “Identification in Differentiated Products Markets”. Annual

Review of Economics 8, no. 1 (Oct. 31, 2016): 27–52.

Berry, Steven, James Levinsohn, and Ariel Pakes. “Automobile Prices in Market Equilibrium”.

Econometrica 63, no. 4 (1995): 841–890.

— . “Differentiated Products Demand Systems from a Combination of Micro and Macro Data:

The New Car Market”. Journal of Political Economy 112, no. 1 (2004): 68–105.

Berry, Steven T., and Philip A. Haile. “Foundations of Demand Estimation”. In Handbook of
Industrial Organization, ed. by Kate Ho, Ali Hortaçsu, and Alessandro Lizzeri, 4:1–62. 2021.

— . “Identification in Differentiated Products Markets Using Market Level Data”. Econometrica

82, no. 5 (2014): 1749–1797.

— . “Nonparametric Identification of Differentiated Products Demand Using Micro Data”. Econo-

metrica 92, no. 4 (2024): 1135–1162.

Bradbury, James, et al. JAX: Composable Transformations of Python + NumPy Programs. Version

0.3.13, 2018.

Brenkers, Randy, and Frank Verboven. “Liberalizing a Distribution System: The European Car

Market”. Journal of the European Economic Association 4, no. 1 (2006): 216–251.

Brick Meets Click and Mercatus. “February U.S. eGrocery Sales Total $7.9 Billion, Down 10%

versus Year Ago”. Brick Meets Click, Mar. 13, 2024. Press Release.

Brownstone, David, and Kenneth A. Small. “Valuing Time and Reliability: Assessing the Evidence
from Road Pricing Demonstrations”. Transportation Research Part A: Policy and Practice 39,
no. 4 (2005): 279–293.

Bruno, Hernán A., and Naufel J. Vilcassim. “Research Note—Structural Demand Estimation with

Varying Product Availability”. Marketing Science 27, no. 6 (2008): 1126–1131.

Carlsson, Fredrik, and Peter Martinsson. “Do Hypothetical and Actual Marginal Willingness to
Pay Differ in Choice Experiments?: Application to the Valuation of the Environment”. Journal
of Environmental Economics and Management 41, no. 2 (2001): 179–192.

Che, Hai, Tülin Erdem, and T. Sabri Öncü. “Consumer Learning and Evolution of Consumer Brand

Preferences”. Quantitative Marketing and Economics 13, no. 3 (Sept. 2015): 173–202.

Chen, Nan, and Hsin-Tien Tsai. “Steering Via Algorithmic Recommendations”. The RAND

Journal of Economics 55, no. 4 (Dec. 2024): 501–518.

Ching, Andrew T. “A Dynamic Oligopoly Structural Model for the Prescription Drug Market After
Patent Expiration”. International Economic Review 51, no. 4 (Nov. 2010): 1175–1207.

Collard-Wexler, Allan. “Demand Fluctuations in the Ready-Mix Concrete Industry”. Econometrica

81, no. 3 (2013): 1003–1037.

Compiani, Giovanni, et al. “Online Search and Optimal Product Rankings: An Empirical Frame-

work”. Marketing Science 43, no. 3 (May 2024): 615–636.

Conlon, Chris, Julie Mortimer, and Paul Sarkis. “Estimating Preferences and Substitution Patterns

from Second Choice Data Alone”. Preliminary and incomplete (2023).

Conlon, Christopher, and Jeff Gortmaker. “Incorporating Micro Data into Differentiated Products

Demand Estimation with PyBLP”. Working paper (2023).

Conlon, Christopher, and Julie Holland Mortimer. “Empirical Properties of Diversion Ratios”.

The RAND Journal of Economics 52, no. 4 (2021): 693–726.

Conlon, Christopher T., and Julie Holland Mortimer. “Demand Estimation under Incomplete
Product Availability”. American Economic Journal: Microeconomics 5, no. 4 (2013): 1–30.

— . “Effects of Product Availability: Experimental Evidence”. National Bureau of Economic

Research Working Paper 16506 (2010).

— . “Efficiency and Foreclosure Effects of Vertical Rebates: Empirical Evidence”. Journal of

Political Economy 129, no. 12 (Dec. 1, 2021): 3357–3404.

Czajkowski, Mikołaj, and Wiktor Budziński. “Simulation Error in Maximum Likelihood Estimation

of Discrete Choice Models”. Journal of Choice Modelling 31 (2019): 73–85.

Daljord, Øystein. “Durable Goods Adoption and the Consumer Discount Factor: A Case Study of

the Norwegian Book Market”. Management Science 68, no. 9 (2022): 6783–6796.

Deb, Partha, and Pravin K. Trivedi. “The Structure of Demand for Health Care: Latent Class

Versus Two-Part Models”. Journal of Health Economics 21, no. 4 (2002): 601–625.

Donnelly, Robert, Ayush Kanodia, and Ilya Morozov. “Welfare Effects of Personalized Rankings”.

Marketing Science 43, no. 1 (Jan. 2024): 92–113.

Dubé, Jean-Pierre, and Sanjog Misra. “Personalized Pricing and Consumer Welfare”. Journal of

Political Economy 131, no. 1 (2023): 131–189.

Erdem, Tülin, and Michael P. Keane. “Decision-Making Under Uncertainty: Capturing Dynamic
Brand Choice Processes in Turbulent Consumer Goods Markets”. Marketing Science 15, no. 1
(1996): 1–20.


Erdem, Tülin, Michael P. Keane, and Baohong Sun. “A Dynamic Model of Brand Choice When
Price and Advertising Signal Product Quality”. Marketing Science 27, no. 6 (2008): 1111–
1125.

Farronato, Chiara, and Andrey Fradkin. “The Welfare Effects of Peer Entry: The Case of Airbnb
and the Accommodation Industry”. American Economic Review 112, no. 6 (2022): 1782–1817.

Farronato, Chiara, Andrey Fradkin, and Alexander MacKay. “Self-Preferencing at Amazon:
Evidence from Search Rankings”. In AEA Papers and Proceedings, 113:239–243. American
Economic Association, 2023.

Farronato, Chiara, et al. “Understanding the Tradeoffs of the Amazon Antitrust Case”. Harvard

Business Review (Jan. 11, 2024).

Fox, Jeremy T., Kyoo il Kim, and Chenyu Yang. “A Simple Nonparametric Approach to Estimating
the Distribution of Random Coefficients in Structural Models”. Journal of Econometrics 195,
no. 2 (2016): 236–254.

Fox, Jeremy T., et al. “The Random Coefficients Logit Model Is Identified”. Journal of Economet-

rics 166, no. 2 (2012): 204–212.

Grieco, Paul L.E., et al. “Conformant and Efficient Estimation of Discrete Choice Demand Models”.

Working Paper (2023).

Grieco, Paul L.E., Charles Murry, and Ali Yurukoglu. “The Evolution of Market Power in the US

Automobile Industry”. The Quarterly Journal of Economics (2023).

Grigolon, Laura, and Frank Verboven. “Nested Logit or Random Coefficients Logit? A Comparison
of Alternative Discrete Choice Models of Product Differentiation”. Review of Economics and
Statistics 96, no. 5 (2014): 916–935.

Haener, M. K., P. C. Boxall, and W. L. Adamowicz. “Modeling Recreation Site Choice: Do
Hypothetical Choices Reflect Actual Behavior?” American Journal of Agricultural Economics
83, no. 3 (Aug. 2001): 629–642.

Hausman, Jerry A., and Paul A. Ruud. “Specifying and Testing Econometric Models for Rank-

Ordered Data”. Journal of Econometrics 34, no. 1 (1987): 83–104.

Heiss, Florian, Stephan Hetzenecker, and Maximilian Osterhaus. “Nonparametric Estimation of
the Random Coefficients Model: An Elastic Net Approach”. Journal of Econometrics 229, no.
2 (2022): 299–321.

Hollister, Sean. “Microsoft Now Thirstily Injects a Poll When You Download Google Chrome”.

The Verge, Oct. 24, 2023.


Iaria, Alessandro, and Ao Wang. “Real Analytic Discrete Choice Models of Demand: Theory and

Implications”. Econometric Theory (2024): 1–49.

Jovanovic, B. D., and P. S. Levy. “A Look at the Rule of Three”. The American Statistician 51,

no. 2 (May 1997): 137–139.

Kim, Kyoo il, and Amil Petrin. “Control Function Corrections for Unobserved Factors in Differen-

tiated Product Models”. Working paper, 2019.

Krasnoff, Barbara. “How to change your default browser in Windows 11”. The Verge, Apr. 15,

2022.

Lusk, Jayson L., and Ted C. Schroeder. “Are Choice Experiments Incentive Compatible? A Test
with Quality Differentiated Beef Steaks”. American Journal of Agricultural Economics 86, no.
2 (May 2004): 467–482.

Montag, Felix. “Mergers, Foreign Competition, and Jobs: Evidence from the US Appliance

Industry”. Working paper (2023).

Musalem, Andrés, et al. “Structural Estimation of the Effect of Out-of-Stocks”. Management

Science 56, no. 7 (2010): 1180–1197.

Nelson, Phillip. “Information and Consumer Behavior”. Journal of Political Economy 78, no. 2

(Mar. 1970): 311–329.

Nevo, Aviv. “Measuring Market Power in the Ready-to-Eat Cereal Industry”. Econometrica 69,

no. 2 (Mar. 2001): 307–342.

Newell, Richard G., and Juha Siikamäki. “Nudging Energy Efficiency Behavior: The Role of
Information Labels”. Journal of the Association of Environmental and Resource Economists
1, no. 4 (Dec. 2014): 555–598.

Osborne, Matthew. “Consumer Learning, Switching Costs, and Heterogeneity: A Structural
Examination”. Quantitative Marketing and Economics 9 (2011): 25–70.

Paetz, Friederike, and Winfried J. Steiner. “Utility Independence versus IIA Property in Indepen-

dent Probit Models”. Journal of Choice Modelling 26 (2018): 41–47.

Parady, Giancarlos, David Ory, and Joan Walker. “The Overreliance on Statistical Goodness-of-Fit
and Under-Reliance on Model Validation in Discrete Choice Models: A Review of Validation
Practices in the Transportation Academic Literature”. Journal of Choice Modelling 38 (2021):
100257.

Quaife, Matthew, et al. “How Well Do Discrete Choice Experiments Predict Health Choices? A
Systematic Review and Meta-Analysis of External Validity”. The European Journal of Health
Economics 19, no. 8 (Nov. 2018): 1053–1066.

Raedts, Elske, and Simone Evans. “Google Shopping: Self-Preferencing Can Be Abusive”. Stibbe,

Feb. 10, 2024.

Reimers, Imke, and Joel Waldfogel. A Framework for Detection, Measurement, and Welfare

Analysis of Platform Bias. National Bureau of Economic Research, 2023.

Revelt, David, and Kenneth Train. “Customer-Specific Taste Parameters and Mixed Logit”.
Working Paper No. E00-274, Department of Economics, University of California, Berkeley,
2000.

Ryan, Stephen P. “The Costs of Environmental Regulation in a Concentrated Industry”. Economet-

rica 80, no. 3 (2012): 1019–1061.

Shin, Sangwoo, Sanjog Misra, and Dan Horsky. “Disentangling Preferences and Learning in Brand

Choice Models”. Marketing Science 31, no. 1 (Jan. 2012): 115–137.

Sobol’, Il’ya Meerovich. “On the Distribution of Points in a Cube and the Approximate Evaluation
of Integrals”. Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki 7, no. 4 (1967):
784–802.

Sullivan, Christopher. “The Ice Cream Split: Empirically Distinguishing Price and Product Space

Collusion” (2020).

Taivalsaari, Antero, et al. “Web Browser as an Application Platform”. In 2008 34th Euromicro

Conference Software Engineering and Advanced Applications, 293–302. 2008.

Train, Kenneth E. Discrete Choice Methods with Simulation. Cambridge University Press, 2009.

— . “EM Algorithms for Nonparametric Estimation of Mixing Distributions”. Journal of Choice

Modelling 1, no. 1 (2008): 40–69.

Train, Kenneth E., and Clifford Winston. “Vehicle Choice Behavior and the Declining Market
Share of U.S. Automakers”. International Economic Review 48, no. 4 (Nov. 2007): 1469–1496.

Tuyl, Frank, Richard Gerlach, and Kerrie Mengersen. “The Rule of Three, its Variants and
Extensions”. International Statistical Review 77, no. 2 (Aug. 2009): 266–275.

U.S. Bureau of Labor Statistics. Consumer Price Index for All Urban Consumers (CPI-U).

U.S. Food & Drug Administration. “Bottled Water Everywhere: Keeping it Safe”. Consumer

Updates, Apr. 22, 2022.

Vatter, Benjamin. “Quality Disclosure and Regulation: Scoring Design in Medicare Advantage”.
Working paper, 2024.

Virtanen, Pauli, et al. “SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python”.

Nature Methods 17, no. 3 (2020): 261–272.

Xing, Jianwei, Benjamin Leard, and Shanjun Li. “What Does an Electric Vehicle Replace?”

Journal of Environmental Economics and Management 107 (2021): 102432.

Young, Liz. “Never Mind the Delivery, More Online Consumers Are Turning to Store Pickup”.

The Wall Street Journal (July 14, 2023).

Zeyveld, Andrew. “Demand Estimation When Consumers’ Preferences Vary over Time”. Working

Paper (2024).

Zhang, Yongli, and Yuhong Yang. “Cross-Validation for Selecting a Model Selection Procedure”.

Journal of Econometrics 187, no. 1 (2015): 95–112.


APPENDIX 2A

DATA STRUCTURE AND OBSERVABLE
CHARACTERISTICS

Illustrating the Structure of the Data.—In Section 2.2.2, I describe a hypothetical consumer who

ordered Mott’s applesauce and Häagen-Dazs, only for the latter to go out of stock. Tables 2A.1

and 2A.2 portray what the curbside stockout data and scanner data would look like in this hypo-

thetical case. Notice that the former lists the UPCs and product catalog descriptions of both the

out-of-stock item and the substitute in our stylized example. However, the price of the out-of-stock

product is missing (and must be imputed from other sales at the same store before and after the

stockout, using the procedure described in Section 2.5.3).

As for the scanner data, Panels A and B of Table 2A.2 compare the contents when the consumer
accepts and rejects the substitute ice cream, respectively.

Demographic Data Details.—I employ the following procedure to recover consumers’ demo-

graphic information. For each transaction, the scanner data report two variables concerning the

consumer: her loyalty ID, which serves as the primary panel identifier in my analysis; and her

household ID, which maps to the demographic data.1 When a given loyalty ID maps onto just

one household ID, I assume that the consumer belongs to the household in question. Sometimes,

however, a loyalty ID maps onto multiple household IDs.

In that event, I compute a weighted average of the demographics associated with the household
IDs in question (where the weight is given by the number of transactions in the scanner data
with the relevant household ID). Finally, because the demographic data were collected in 2014,
I lack demographic information on some consumers’ households. Such consumers are excluded from
the structural analysis in Sections 2.5 and 2.7.

1 I rely on the chain’s loyalty program to track consumers’ purchases over time, rather than the
household ID, because the chain judges the former to be a more reliable identifier of individual
shoppers/households.

Table 2A.1: Curbside Stockout Data (Example)

                                                                             Substitute Only
                      UPC          Description                               Price ($)  Accepted?
Out-of-Stock Product  71373312281  “HAAGEN DAZS VANILLA 14Z”
Offered Substitute    85808900305  “HALO TOP ICE CREAM VANILLA LIGHT 16 OZ”  3.79       Yes

Note: The (counterfactual) purchase price of the out-of-stock item is not recorded in the
data. I impute it using the scanner data.

Table 2A.2: Scanner Data (Example)

UPC          Product catalog description               Price ($)  Date      Store ID  Channel  Loyalty ID

Panel A. Substitute is accepted.
1480000023   “MOTTS APPLESAUCE CINNAMON 6/4 OZ”        3.35       01/01/21  21        Pickup   12345
85808900305  “HALO TOP ICE CREAM VANILLA LIGHT 16 OZ”  3.79       01/01/21  21        Pickup   12345

Panel B. Substitute is rejected.
1480000023   “MOTTS APPLESAUCE CINNAMON 6/4 OZ”        3.35       01/01/21  21        Pickup   12345

The chain transitioned to a new household ID system during the time period studied. Before

August 2017, the scanner data only report the old household ID; from August 2017 to April 2021, the

scanner data indicate both the old and new household IDs; and from May 2021 onwards, the scanner

data contain only the new household IDs. Seeing as the demographic data are organized around

the old household IDs, I adopt the following procedure to impute the household demographics

associated with a given loyalty ID. If the loyalty ID appears in one or more transactions where the
“old household IDs” are observed (i.e., before May 2021), I impute the consumer’s demographics
as a weighted average of the demographics ascribed to the relevant “old household IDs” (following
the procedure in Section 2.2.3). Rarely, a loyalty ID is observed solely in transactions from May
2021 onwards (which contain only “new IDs”), and yet the relevant “new ID(s)” themselves appear in
(other) transactions that are old enough to also have “old IDs.” In such cases, for each “new ID,”

I take a weighted average of the demographics associated with all the “old household IDs” with

which the “new ID” appears. (The weights are, once more, based on the number of transactions.)

Finally, the demographics associated with the loyalty ID are imputed as being a transaction-weighted

average of the (imputed) demographics associated with the relevant “new IDs.”
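
The two-stage imputation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the dissertation: the record layout (tuples of loyalty ID, old household ID, new household ID), the use of a single numeric demographic, and all function names are assumptions made for the example. The sketch also assumes that household IDs without demographic records are simply dropped from the weights.

```python
from collections import Counter, defaultdict

def weighted_mean(weights, values):
    """Transaction-weighted mean over the IDs in `weights` that appear in `values`.

    Assumption: IDs with no demographic record are ignored; if none remain,
    return None (the consumer would be excluded from the analysis).
    """
    total = sum(w for key, w in weights.items() if key in values)
    if total == 0:
        return None
    return sum(w * values[key] for key, w in weights.items() if key in values) / total

def impute_demographics(transactions, demo_by_old_id):
    """transactions: iterable of (loyalty_id, old_hh_id or None, new_hh_id or None)."""
    old_counts = defaultdict(Counter)  # loyalty ID -> old ID -> no. of transactions
    new_counts = defaultdict(Counter)  # loyalty ID -> new ID -> no. of transactions
    old_by_new = defaultdict(Counter)  # new ID -> old ID -> no. of joint transactions
    for loyalty, old_id, new_id in transactions:
        if old_id is not None:
            old_counts[loyalty][old_id] += 1
        if new_id is not None:
            new_counts[loyalty][new_id] += 1
        if old_id is not None and new_id is not None:
            old_by_new[new_id][old_id] += 1

    # Demographics of each "new ID": weighted average over the "old IDs"
    # with which it appears in the same transactions.
    demo_by_new_id = {}
    for new_id, counts in old_by_new.items():
        value = weighted_mean(counts, demo_by_old_id)
        if value is not None:
            demo_by_new_id[new_id] = value

    result = {}
    for loyalty in set(old_counts) | set(new_counts):
        if loyalty in old_counts:
            # Usual case: at least one transaction reports an old household ID.
            result[loyalty] = weighted_mean(old_counts[loyalty], demo_by_old_id)
        else:
            # Rare case: the loyalty ID is only ever seen with new IDs.
            result[loyalty] = weighted_mean(new_counts[loyalty], demo_by_new_id)
    return result
```

For instance, a loyalty ID observed twice with a household whose (hypothetical) demographic value is 30 and once with a household whose value is 60 would be assigned (2 × 30 + 1 × 60) / 3 = 40.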

Table 2A.3: State Dependence in Brand, Product, and Channel Choice

In consecutive trips,      Apple sauce  Flavored  Frozen french  Ice
prob. of the same...       cups         milk      fries          cream

Panel A. Overall
Product being purchased    0.612        0.603     0.364          0.271
Brand being purchased      0.805        0.771     0.692          0.542
Shopping channel           0.862        0.857     0.850          0.907

Panel B. Conditional on present trip being curbside pickup
Product being purchased    0.600        0.663     0.379          0.331
Brand being purchased      0.775        0.826     0.698          0.626
Shopping channel           0.793        0.738     0.746          0.778

Notes: Estimates are reported as means. In curbside pickup, when there is a stockout substitu-
tion, I define the “purchased product” as being the stockout substitute.

State Dependence in Product, Brand, and Channel Choice.—Do consumers tend to purchase

the same products in consecutive trips? Or at least products of the same brand? And how often do

consumers switch shopping channels (i.e., in-store shopping versus curbside pickup versus home

delivery)?

To provide insight, Table 2A.3 reports the probability of repeated product, brand, and shopping
channel choices, both overall and conditional on the present trip being curbside pickup. Focus
first on the overall results, which are presented in Panel A. There are meaningful cross-category
differences in the probability of purchasing the same product on consecutive trips: whereas there is
a 61.2% probability that a consumer purchases the same apple sauce cups on consecutive shopping
trips, there is only a 36.4% (27.1%) probability that she does the same with respect to frozen
french fries (ice cream). However, in all four categories, a consumer is likely to purchase products
that are sold under the same brands on consecutive trips, with probabilities ranging from 54.2%
(ice cream) to 80.5% (apple sauce cups). Furthermore, these purchases tend to be made through the
same shopping channel. Across the four product categories, the probability that a consumer selects
the same shopping channel on consecutive trips lies between 85% and 91%.
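
The repeat-choice probabilities in Table 2A.3 amount to shares of consecutive-trip pairs in which an attribute is unchanged. The sketch below shows one way such shares could be computed. It is illustrative only: the field names (`loyalty_id`, `date`, `product`, `brand`, `channel`) and the optional `condition_channel` argument, which conditions on the channel of the earlier trip in each pair, are assumptions introduced for the example.

```python
from collections import defaultdict

def repeat_probabilities(trips, condition_channel=None):
    """Share of consecutive-trip pairs with the same product, brand, and channel.

    trips: iterable of dicts with keys 'loyalty_id', 'date' (sortable, e.g. an
    ISO date string), 'product', 'brand', and 'channel'.
    condition_channel: if given, keep only pairs whose earlier trip used it.
    """
    by_consumer = defaultdict(list)
    for trip in trips:
        by_consumer[trip["loyalty_id"]].append(trip)

    keys = ("product", "brand", "channel")
    same = dict.fromkeys(keys, 0)
    pairs = 0
    for history in by_consumer.values():
        history.sort(key=lambda t: t["date"])
        # Compare each trip with the consumer's next trip.
        for prev, curr in zip(history, history[1:]):
            if condition_channel is not None and prev["channel"] != condition_channel:
                continue
            pairs += 1
            for key in keys:
                same[key] += prev[key] == curr[key]
    return {key: same[key] / pairs for key in keys} if pairs else {}
```

Calling the function once without `condition_channel` and once with the curbside pickup channel would yield quantities analogous to Panels A and B, respectively.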

Do consumers display more, or less, state dependence after a curbside pickup order? Panel B
suggests that consumers’ behavior evinces a similar degree of state dependence following curbside
pickup versus in-store shopping or home delivery. The most perceptible difference concerns the
choice of shopping channel. If a consumer has placed an order for curbside pickup, the probability
that her next shopping trip shares the same channel (namely, curbside pickup) drops to 79% or less
across the four product categories (compared to the unconditional probability of repeat channel
choices of 85% or more, depending on the product category).

Table 2A.4: Summary Statistics by Product Category

                                         Apple sauce  Flavored  Frozen french  Ice
Statistic                                cups         milk      fries          cream

No. of stockout events                   7,332        14,710    28,885         66,635
Median upper bound on duration (hours)   130.7        60.4      123.9          148.7

Frequency and Duration of Stockout Events.—When multiple consumers order the same product

from the same store at roughly the same time, a single stockout event can result in more than one

stockout substitution. How often do stockouts occur, and how long do they last? To answer these

questions, I join the curbside stockout data with the scanner data and then sort the combined data

set by store, product, and date. For each store-product pairing in the resulting data set, I observe

sequences of successful purchases (from the scanner data), interspersed with sequences of stockout

substitutions (from the curbside stockout data). Treating the former as evidence that the product is

in stock and the latter as evidence of stockout, I identify the last successful purchase before each

stockout event as well as the first successful purchase afterwards. By computing the time elapsed

between these two successful purchases, I obtain an upper bound on the duration of the stockout

event.
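
The bracketing procedure just described can be sketched as follows for a single store-product pair. This is an illustration under stated assumptions, not the dissertation's code: the event representation (timestamp plus a `'purchase'` or `'stockout'` label) and the function name are hypothetical, and stockout runs that are not bracketed by a successful purchase on both sides are simply dropped.

```python
from datetime import datetime

def stockout_duration_bounds(events):
    """Upper bounds (in hours) on the duration of each stockout event.

    events: iterable of (timestamp, kind) for ONE store-product pair, where kind
    is 'purchase' (scanner data) or 'stockout' (curbside stockout data).
    """
    bounds = []
    last_purchase = None   # most recent successful purchase seen so far
    in_stockout = False    # True while inside a run of stockout substitutions
    for timestamp, kind in sorted(events, key=lambda e: e[0]):
        if kind == "stockout":
            in_stockout = True
        elif kind == "purchase":
            if in_stockout and last_purchase is not None:
                # Time from the last purchase before the stockout to the
                # first purchase after it.
                elapsed = timestamp - last_purchase
                bounds.append(elapsed.total_seconds() / 3600)
            in_stockout = False
            last_purchase = timestamp
    return bounds

# Hypothetical example: two stockout substitutions between two purchases.
example = [
    (datetime(2021, 1, 1, 8), "purchase"),
    (datetime(2021, 1, 1, 12), "stockout"),
    (datetime(2021, 1, 1, 15), "stockout"),
    (datetime(2021, 1, 2, 8), "purchase"),
]
print(stockout_duration_bounds(example))
```

Because several substitutions within one run share the same bracketing purchases, the run counts as a single stockout event, in line with the observation that one event can generate more than one substitution.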

Table 2A.4 reports the results of this descriptive exercise. The total number of stockout events
varies across product categories, ranging from roughly seven thousand (apple sauce cups) to
sixty-seven thousand (ice cream). The median upper bound on the duration of an individual stockout
event is between sixty and one hundred forty-nine hours.2

2I report the median, not the mean, because some “stockouts” are of such long duration that they are probably not
stockouts per se. Rather, the store has likely dropped the product in question for several months and then reintroduced
it.


APPENDIX 2B

ADDITIONAL DESCRIPTIVE EVIDENCE

Reduced-Form Evidence on the Acceptance or Rejection of Substitutes.—Here, I characterize the

circumstances under which consumers accept or reject substitutes in the product categories of apple

sauce cups, flavored milk, and frozen french fries. For each product category, Table 2B.1 reports

the average marginal effects from Equation (2.1). Across all the product categories, consumers are

much likelier to accept substitutes whose brands they have previously purchased. As for non-brand

characteristics, some of these loom larger than others. For instance, consumers are 9 percentage
points likelier to accept substitute flavored milks that share the out-of-stock product’s
high-protein status.

Supplementary Evidence of Stockout Substitutions’ Influence on Consumers’ Learning.—The

results in Table 2.3 suggest that stockout substitutions sometimes influence consumers’ purchases

through the mechanism of learning. This is because the future purchases of the “focal consumers”

(who suffer stockout substitutions and, in consequence, can learn about the substitute’s
characteristics) differ from the future purchases of the “control consumers” (who order the same
products as the focal consumers, but successfully pick them up and thus do not learn about the
substitute).

That the focal consumers proceed to purchase the substitute’s brand more often in the future

than do their “control” counterparts is consistent with the former’s learning about the brand of the

substitute. Specifically, some focal consumers may be discovering that they like the substitute’s

brand more than they had anticipated and, as a result, purchasing that brand on subsequent shopping

trips. However, other factors could also explain the differences between focal and control consumers.

One such factor is the “buy it again” feature of the online order system. When consumers visit the
store’s website or mobile app, they are presented with a list of items that they have purchased
on previous shopping trips, any of which can be ordered again with a single click. (By contrast,

ordering an item outside this list requires multiple steps; see Section 2.2.2.) To test whether the

“buy it again” list is responsible for the disparity between focal and control consumers, I repeat the

descriptive exercise with one modification. Rather than comparing focal and control consumers


Table 2B.1: Acceptance: Average Marginal Effects from Probit Regressions

Variable

Brand

Sub shares OOS product’s brand

Ever purchased sub’s brand before

Fruit

Ever purchased sub’s fruit before

Seasoning

Ever purchased sub’s seasoning before

No. of cupsa

Sub shares OOS product’s no. of cups

Ever purchased sub’s no. of cups before

Sweetening

Sub shares OOS product’s sweetening

Ever purchased sub’s sweetening before

Pct. milkfat

Sub shares OOS product’s pct. milkfat

Ever purchased sub’s pct. milkfat before

Whether hi-protein

Sub shares OOS product’s whether hi-protein

Ever purchased sub’s whether hi-protein before

Sizeand

Sub shares OOS product’s size

Ever purchased sub’s size before

Base vegetable

Sub shares OOS product’s base vegetable

Ever purchased sub’s base vegetable before

Sub’s price

OOS product’s price

Product category

Apple sauce
cups

Flavored
milk

Frozen french
fries

−0.099***
[0.029]
0.059***
[0.012]

0.030***
[0.006]
0.044***
[0.006]

0.012***
[0.003]
0.031***
[0.003]

0.023
[0.025]

−0.002
[0.017]

−0.298***
[0.083]
0.005
[0.031]

−0.025
[0.029]
0.031*
[0.015]

0.047***
[0.006]
0.031***
[0.006]

0.087***
[0.020]
0.004
[0.020]

−0.024**
[0.008]
−0.006
[0.006]

−0.056*** −0.030***
[0.014]
−0.030***
[0.008]

[0.003]
0.013***
[0.003]

0.010
[0.008]
0.007
[0.008]

0.125***
[0.013]
−0.033***
[0.008]
−0.029***
[0.003]
0.006*
[0.003]

Observations
Pseudo 𝑅2
Notes: The dependent variable is whether a stockout substitute is accepted (=1) or rejected (=0). The
table reports average marginal effects, not coefficients. Standard errors are in brackets.

15,667
0.0577

31,157
0.0215

2,052
0.1076

a Discretized.
* Significant at the 10 percent level.

** Significant at the 5 percent level.
*** Significant at the 1 percent level.


Table 2B.2: Model-Free Evidence of Learning: Apple Sauce Cups

No. of purchases

Before stockout

After stockout

Pct. of future purchases with
sub’s version of characteristic

Consumer’s “treatment”

Mean

Std. dev. Mean

Std. dev. Mean

Std. dev.

Suffer substitution (focal group)

Successful pickup (control group)

Suffer substitution (focal group)

Successful pickup (control group)

Suffer substitution (focal group)

Successful pickup (control group)

Panel A. Characteristic of being (un)sweetened (80 obs.)

11.9
[0.2]
15.8
[0.3]

13.8
[0.3]
13.3
[0.3]

24.3
[1.4]
23.1
[1.5]

18.4
[0.4]
26.5
[0.6]

10.4
[0.2]
10.6
[0.2]

16.2
[0.3]
14.8
[0.4]

11.3
[0.3]
4.3
[0.1]

Panel B. Characteristic of brand (95 obs.)

25.2
[0.6]
28.3
[0.6]

11.3
[0.2]
12.0
[0.2]

19.2
[0.4]
22.9
[0.5]

13.7
[0.3]
11.1
[0.2]

Panel C. Characteristic of fruit (25 obs.)

35.1
[2.0]
38.4
[2.7]

15.1
[0.8]
16.1
[0.7]

21.4
[1.3]
19.8
[1.2]

3.5
[0.5]
5.3
[0.5]

25.3
[0.4]
10.8
[0.2]

27.4
[0.4]
22.8
[0.3]

11.9
[1.0]
12.5
[0.7]

Table 2B.3: Model-Free Evidence of Learning: Flavored milk

No. of purchases

Before stockout

After stockout

Pct. of future purchases with
sub’s version of characteristic

Consumer’s “treatment”

Mean

Std. dev. Mean

Std. dev. Mean

Std. dev.

Suffer substitution (focal group)

Successful pickup (control group)

Suffer substitution (focal group)

Successful pickup (control group)

Suffer substitution (focal group)

Successful pickup (control group)

Panel A. Characteristic of brand (572 obs.)

41.8
[0.2]
42.2
[0.1]

19.8
[0.0]
20.5
[0.0]

24.4
[0.1]
26.1
[0.1]

5.6
[0.0]
3.7
[0.0]

17.3
[0.1]
10.6
[0.0]

Panel B. Characteristic of pct. milkfat (195 obs.)

28.7
[0.3]
31.5
[0.4]

14.2
[0.1]
16.2
[0.1]

15.9
[0.1]
22.6
[0.2]

14.1
[0.1]
8.1
[0.1]

Panel C. Characteristic of size (150 obs.)

20.4
[0.4]
30.3
[0.6]

15.4
[0.1]
18.9
[0.2]

19.6
[0.2]
23.7
[0.2]

15.6
[0.2]
12.1
[0.1]

25.8
[0.2]
17.0
[0.1]

25.2
[0.2]
22.2
[0.2]

23.6
[0.1]
24.7
[0.1]

16.3
[0.1]
16.6
[0.2]

10.8
[0.1]
13.6
[0.2]


Table 2B.4: Model-Free Evidence of Learning: Flavored milk

No. of purchases

Before stockout

After stockout

Pct. of future purchases with
sub’s version of characteristic

Consumer’s “treatment”

Mean

Std. dev. Mean

Std. dev. Mean

Std. dev.

Suffer substitution (focal group)

Successful pickup (control group)

Suffer substitution (focal group)

Successful pickup (control group)

Suffer substitution (focal group)

Successful pickup (control group)

Panel A. Characteristic of brand (525 obs.)

21.9
[0.1]
25.7
[0.1]

11.1
[0.0]
10.3
[0.0]

12.4
[0.0]
11.0
[0.0]

8.8
[0.0]
4.9
[0.0]

Panel B. Characteristic of flavor (74 obs.)

22.0
[0.6]
16.6
[0.4]

9.5
[0.2]
13.3
[0.2]

15.2
[0.5]
17.7
[0.4]

15.4
[0.4]
19.1
[0.4]

Panel C. Characteristic of size (75 obs.)

24.2
[0.7]
39.3
[1.5]

10.6
[0.1]
11.7
[0.2]

10.7
[0.1]
14.2
[0.3]

2.4
[0.1]
0.5
[0.0]

21.0
[0.1]
15.4
[0.1]

27.4
[0.4]
32.4
[0.4]

10.4
[0.5]
2.2
[0.1]

17.5
[0.0]
18.1
[0.0]

12.5
[0.3]
11.8
[0.2]

18.1
[0.3]
25.4
[0.5]

with respect to all subsequent purchases, both online and offline, I instead focus solely on in-store
purchases. If the disparity between focal and control consumers is entirely driven by the “buy it
again” list (as opposed to learning), the disparity should disappear once analysis is confined to
in-store purchases (where the “buy it again” list is irrelevant). Table 2B.5 presents the results of
this robustness check (where, for brevity, I only report results for the characteristic of brand).
Although the sample sizes shrink dramatically, the focal consumers still purchase the substitute’s
brand more frequently than do their control counterparts.

There may also be underlying differences between the focal and control consumers. In particular,

the focal consumers have, by construction, arrived at the store later than their control counterparts

(as the stockout occurred in the interim). Could the pickup time be correlated with differential

trends in future purchases? Such a correlation might arise if, for instance, the pickup time were

associated with consumers’ inclination to try out new products. To test for the presence of any such

compositional differences between focal and control consumers, I repeat the descriptive exercise

above with one modification: I now define the control consumer as the first consumer to successfully

pick up the focal consumer’s preferred product after it goes out of stock (from among the subset


Table 2B.5: Model-Free Evidence of Learning About Brands: A Comparison of Future In-Store
Purchases

No. of purchases

Before stockout

After stockout

Pct. of future purchases with
sub’s version of characteristic

Consumer’s “treatment”

Mean

Std. dev. Mean

Std. dev. Mean

Std. dev.

Suffer substitution (focal group)

Successful pickup (control group)

Suffer substitution (focal group)

Successful pickup (control group)

Suffer substitution (focal group)

Successful pickup (control group)

Suffer substitution (focal group)

Successful pickup (control group)

13.1
[0.6]
17.0
[0.7]

27.6
[0.2]
30.0
[0.2]

22.3
[0.1]
19.0
[0.1]

25.5
[0.2]
33.2
[0.3]

16.6
[0.9]
20.5
[0.8]

43.5
[0.2]
48.7
[0.5]

25.3
[0.1]
23.9
[0.2]

32.4
[0.4]
38.6
[0.3]

Panel A. Apple sauce cups

5.6
[0.3]
3.8
[0.1]

7.3
[0.4]
3.1
[0.1]

20.8
[1.3]
13.7
[1.0]

Panel B. Flavored milk

12.9
[0.1]
13.0
[0.1]

17.9
[0.1]
17.3
[0.1]

5.2
[0.1]
5.3
[0.1]

Panel C. Frozen french fries

7.6
[0.0]
5.4
[0.0]

13.1
[0.1]
16.4
[0.2]

9.2
[0.1]
6.2
[0.0]

10.0
[0.1]
5.7
[0.1]

Panel D. Ice cream

15.4
[0.2]
22.2
[0.3]

5.5
[0.1]
5.0
[0.1]

38.9
[1.1]
28.3
[1.2]

15.4
[0.1]
16.4
[0.2]

23.5
[0.2]
16.7
[0.1]

16.0
[0.3]
14.4
[0.2]

Notes: This table checks whether the results in Tables 2B.2 and 2.3 are robust to focusing only on consumers’
future in-store purchases. (Unlike orders for curbside pickup, in-store purchases are not directly affected by the
“buy-it-again” feature of the store’s app and website.)

of consumers who, like the focal consumer, have never purchased the substitute’s version of the

relevant characteristic before).1 Thus, the focal consumer’s order must have been assembled before

the control consumer’s, so that either (a) the focal consumer placed her order earlier than did

the control consumer or (b) the focal consumer’s stated pickup time was earlier than the control

consumer’s. As a result, any compositional differences between focal and control consumers that

are rooted in order or pickup times should be reversed. Reassuringly, the results—which are

presented in Table 2B.6—prove qualitatively similar to the ones above.
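
The “first after” selection rule used in this robustness check can be sketched as follows. The sketch is purely illustrative: the data layout (pickup times paired with loyalty IDs, and a mapping from each loyalty ID to the set of characteristic values she purchased before the stockout) and the function name are assumptions, and the other criteria defining the pool of potential control consumers are taken as already applied.

```python
def first_after_control(pickups, stockout_time, substitute_value, prior_values):
    """Pick the first eligible consumer to pick up the product after a stockout.

    pickups: list of (pickup_time, loyalty_id) for successful pickups of the
        focal consumer's preferred product at the same store.
    stockout_time: time of the focal consumer's stockout substitution.
    substitute_value: the substitute's version of the characteristic (e.g. its brand).
    prior_values: dict mapping loyalty_id to the set of values of that
        characteristic the consumer had purchased before the stockout.
    """
    eligible = [
        (time, loyalty_id)
        for time, loyalty_id in pickups
        # Must pick up after the stockout and, like the focal consumer, must
        # never have purchased the substitute's version of the characteristic.
        if time > stockout_time
        and substitute_value not in prior_values.get(loyalty_id, set())
    ]
    if not eligible:
        return None
    return min(eligible)[1]  # earliest eligible pickup
```

The main descriptive exercise would differ only in which eligible pickup is chosen relative to the stockout, so the same eligibility filter applies to both.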

Determinants of Retail Margins (Additional Categories).—Figure 2B.1 summarizes the results

1In principle, this robustness check (unlike the main descriptive exercise above) is vulnerable to endogenous price
changes. Specifically, the store might respond to a product’s going out of stock by raising the price. This could cause
the control consumer to face a different price from the focal consumer.


Table 2B.6: Model-Free Evidence of Learning About Brands: Robustness Check (“First After”)

No. of purchases

Before stockout

After stockout

Pct. of future purchases with
sub’s version of characteristic

Consumer’s “treatment”

Mean

Std. dev. Mean

Std. dev. Mean

Std. dev.

Suffer substitution (focal group)

Successful pickup (control group)

Suffer substitution (focal group)

Successful pickup (control group)

Suffer substitution (focal group)

Successful pickup (control group)

Suffer substitution (focal group)

Successful pickup (control group)

8.8
[0.0]
11.0
[0.0]

20.1
[0.0]
22.3
[0.0]

15.0
[0.0]
15.9
[0.0]

22.2
[0.0]
28.4
[0.0]

16.2
[0.0]
20.7
[0.1]

36.1
[0.0]
41.6
[0.0]

22.7
[0.0]
25.0
[0.0]

36.2
[0.0]
45.8
[0.1]

Panel A. Apple sauce cups

8.9
[0.0]
8.9
[0.0]

12.7
[0.0]
12.4
[0.0]

18.9
[0.0]
13.4
[0.0]

Panel B. Flavored milk

17.2
[0.0]
18.0
[0.0]

22.6
[0.0]
24.6
[0.0]

9.1
[0.0]
6.1
[0.0]

Panel C. Frozen french fries

9.7
[0.0]
10.1
[0.0]

18.3
[0.0]
21.1
[0.0]

12.1
[0.0]
12.9
[0.0]

11.0
[0.0]
7.1
[0.0]

Panel D. Ice cream

23.6
[0.0]
27.8
[0.0]

6.2
[0.0]
4.3
[0.0]

31.1
[0.0]
25.9
[0.0]

22.3
[0.0]
17.0
[0.0]

23.4
[0.0]
18.7
[0.0]

15.8
[0.0]
13.2
[0.0]

Notes: This table examines whether the results in Table 2.3 are robust to considering a different population
of “control consumers.” Although the control consumer is drawn from the same pool of potential control
consumers as in Table 2.3, here I select the first consumer to successfully pick up after the stockout event.

a Binned (small/medium/large)
b Binned (less than 100 cal; between 100 and 200 cal; more than 200 cal)

of descriptive regressions concerning retail margins in the product categories of apple sauce cups,

flavored milk, and frozen french fries. As with the product category of ice cream, the characteristic

of brand proves to be a key determinant of retail margins.


a. Apple sauce cups

b. Flavored milk

c. Frozen french fries

Figure 2B.1: Determinants of Retail Margins
Notes: This figure plots estimates of the coefficients (𝛾) on products’ observable characteristics using the
specification in Equation (2.2). The horizontal bars provide 95% confidence intervals.

[Figure 2B.1 plots omitted. Horizontal axis in each panel: Coefficient. Characteristics plotted:
(a) Brand: Private label; Brand: Zee Zees; No. cups; Seasoning: Birthday Cake; Seasoning: Sour;
Unsweetened. (b) Brand: Fairlife; Brand: Nesquick; Brand: Trumoo; High protein; Size (oz).
(c) Brand: Alexia; Brand: Grown in Idaho; Brand: Private label; Size (oz); Sweet potato–based.]

APPENDIX 2C

ESTIMATION DETAILS

Simulated Likelihood Function.—I employ maximum simulated likelihood estimation to recover

the parameters. The likelihood function is based on the probability of the consumer’s ordering

a particular good, as well as the probability of her accepting a specific substitute. Both those

probabilities, in turn, depend on the goods’ expected utilities at time 𝑡. However, the explanatory

variables used in this learning model differ somewhat from those in a traditional mixed (or “random

coefficients”) logit model. Thus, I begin my derivation of the likelihood by showing how to

compute the goods’ expected utilities as a function of (a) the parameters indexing the distributions

of consumer tastes and learning, as discussed above; and (b) consumers’ observed choices in the

data.

Equation (2.10) gives the consumer’s expected utility of good 𝑗 at time 𝑡, conditional
on the set I𝑖𝑡 of brands for which she fully knows her taste. All quantities in Equation (2.10)
are fully known to the consumer, with the possible exception of her time-𝑡 expected taste for good
𝑗’s brand. This can be written as

$$
\mathbb{E}\left[v_{iB(j)} \mid \mathcal{I}_{it}\right]
= \underbrace{\mu_{iB(j)}}_{\text{prior expected taste}}
+ \underbrace{\left(v_{iB(j)} - \mu_{iB(j)}\right) \mathbf{1}\left[B(j) \in \mathcal{I}_{it}\right]}_{\substack{\text{learning ``correction'' (if brand} \\ \text{was previously purchased)}}}
\tag{2C.1}
$$

Here the indicator variable 1[𝐵( 𝑗) ∈ I𝑖𝑡] equals one if (and only if) the consumer knows her taste

for brand 𝐵( 𝑗) at time 𝑡. Until she purchases the brand for the first time, she does not fully know

her taste for it and must, instead, rely on her prior expected taste 𝜇𝑖𝐵( 𝑗). But upon her first purchase

of the brand, she learns the degree to which her true taste 𝑣𝑖𝐵( 𝑗) differs from her prior expected

taste 𝜇𝑖𝐵( 𝑗).

In order to take Equation (2C.1) to the data, observe that prior expected tastes 𝜇𝑖𝐵( 𝑗)
can be computed as the product of

(i) a 1 × 𝐵 vector of brand dummy variables, (1[𝐵( 𝑗) = 1], . . . , 1[𝐵( 𝑗) = 𝐵]); and

(ii) a 𝐵 × 1 vector of prior expected brand tastes, (𝜇𝑖1, . . . , 𝜇𝑖𝐵)⊺.


This is true because

$$
\mu_{iB(j)}
= \sum_{b=1}^{B} \mathbf{1}[B(j) = b] \cdot \mu_{ib}
= \begin{pmatrix} \mathbf{1}[B(j) = 1] & \cdots & \mathbf{1}[B(j) = B] \end{pmatrix}
\cdot
\begin{pmatrix} \mu_{i1} \\ \vdots \\ \mu_{iB} \end{pmatrix}
\tag{2C.2}
$$

The “learning correction” (𝑣𝑖𝐵( 𝑗) − 𝜇𝑖𝐵( 𝑗)) can be calculated similarly. Here, the explanatory

variables must account for the fact that the learning correction remains latent until the consumer

buys the brand for the first time (formally, until 𝐵( 𝑗) ∈ I𝑖𝑡).

I therefore compute the learning correction as the product of

(i) a 1 × 𝐵 vector of indicator variables, (1[𝐵( 𝑗) = 1 and 1 ∈ I𝑖𝑡], . . . , 1[𝐵( 𝑗) = 𝐵 and 𝐵 ∈ I𝑖𝑡])⊺,

such that entry 𝑏 equals one if 𝑏 is 𝑗’s brand and 𝑏 is also a brand the consumer has

previously purchased (i.e., 𝑏 ∈ I𝑖𝑡); and

(ii) a 𝐵 × 1 vector of the consumer’s “learning shocks,” (𝑣𝑖1 − 𝜇𝑖1, . . . , 𝑣𝑖𝐵 − 𝜇𝑖𝐵)⊺.

This representation is accurate because

\[
(v_{iB(j)} - \mu_{iB(j)})\,\mathbf{1}[B(j) \in \mathcal{I}_{it}]
= \sum_{b=1}^{B} \mathbf{1}[B(j) = b \text{ and } b \in \mathcal{I}_{it}]\,(v_{ib} - \mu_{ib})
= \begin{pmatrix} \mathbf{1}[B(j) = 1 \text{ and } 1 \in \mathcal{I}_{it}] & \cdots & \mathbf{1}[B(j) = B \text{ and } B \in \mathcal{I}_{it}] \end{pmatrix}
\begin{pmatrix} v_{i1} - \mu_{i1} \\ \vdots \\ v_{iB} - \mu_{iB} \end{pmatrix}
\tag{2C.3}
\]

Importantly, the learning correction (𝑣𝑖𝑏 − 𝜇𝑖𝑏) has a mean of zero for all brands 𝑏. This follows

from the fact that the consumer’s prior expectation 𝜇𝑖𝑏 on her taste for 𝑏 is unbiased. (Recall that

her true taste 𝑣𝑖𝑏 is drawn directly from her prior, which is normally distributed with mean 𝜇𝑖𝑏.) As

a result, there is only one parameter to be estimated in connection with the learning correction: its

standard deviation $\iota_b^2$.
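The dummy-vector products in Equations (2C.1) through (2C.3) can be illustrated with a minimal Python sketch. The function name, the argument names, and the numerical values are hypothetical and chosen only for illustration; they are not taken from the estimation code.

```python
import numpy as np

def expected_brand_taste(brand_of_j, mu_i, v_i, known_brands):
    """Expected taste for good j's brand, built from two dummy-vector products.

    brand_of_j   : integer index of B(j), in 0..B-1 (hypothetical encoding)
    mu_i         : length-B array of prior expected brand tastes
    v_i          : length-B array of true brand tastes
    known_brands : set of brand indices the consumer has already purchased
    """
    B = len(mu_i)
    # 1[B(j) = b] for b = 1..B
    brand_dummies = np.array([float(b == brand_of_j) for b in range(B)])
    # 1[B(j) = b and b in I_it] for b = 1..B
    learned_dummies = np.array(
        [float(b == brand_of_j and b in known_brands) for b in range(B)]
    )
    prior = brand_dummies @ mu_i                  # prior expected taste
    correction = learned_dummies @ (v_i - mu_i)   # learning correction
    return prior + correction

# Illustrative values only
mu_i = np.array([1.0, 2.0, 0.5])
v_i = np.array([1.5, 1.0, 0.5])
before_purchase = expected_brand_taste(1, mu_i, v_i, known_brands={0})
after_purchase = expected_brand_taste(1, mu_i, v_i, known_brands={0, 1})
```

In this example the consumer relies on her prior for brand 1 until she has purchased it, after which the learning correction replaces the prior with her true taste.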


Unlike the random coefficients pertaining to brands, the remaining ones can be recovered

with the usual procedure employed in mixed (or “random-coefficients”) logit, with 𝑥 𝑗 , 𝑝 𝑗𝑡 and 𝜉 𝑗𝑡 as

explanatory variables.

The complete set of explanatory variables for good 𝑗 can be represented by the vector

\[
w_{jt} :=
\begin{pmatrix}
\big(\mathbf{1}[B(j) = b]\big)_{b=1}^{B} \\
\big(\mathbf{1}[B(j) = 1 \text{ and } 1 \in \mathcal{I}_{it}],\ \cdots,\ \mathbf{1}[B(j) = B \text{ and } B \in \mathcal{I}_{it}]\big) \\
x_j \\
p_{jt} \\
\mathbf{1}[\text{before Jan. 2021}] \cdot \tilde{\xi}_{jt} \\
\mathbf{1}[\text{after Jan. 2021}] \cdot \tilde{\xi}_{jt} \\
\mathbf{1}[j = 0] \cdot \mathbf{1}[\text{reject in-person}]
\end{pmatrix}
\]

while the complete set of parameters can be written as

\[
\chi_i :=
\begin{pmatrix}
(\mu_b)_{b=1}^{B} \\
(v_b - \mu_b)_{b=1}^{B} \\
(\beta, \sigma_\beta^2) \\
(\alpha, \sigma_\alpha^2) \\
\lambda_{\text{pre-21}} \\
\lambda_{\text{post-21}} \\
\gamma
\end{pmatrix}
\]

Having written the expected utility of each good 𝑗 as a function of the parameters to be

estimated, as well as the data, I can now derive a parsimonious expression of the (simulated)

likelihood function used in estimation. My estimation code borrows from Arteaga et al. (2022),

and my exposition here draws on the same source along with Train (2009). Before elaborating on the

mechanics of estimation, I will introduce additional notation concerning an individual consumer’s

orders, substitutions, and learning. In reference to orders, let 𝑦𝑖 𝑗𝑡 equal one if consumer 𝑖 orders

good 𝑗 in trip 𝑡, and zero otherwise. Likewise, in reference to substitutions, let 𝑎𝑖 𝑗𝑡 equal one if

either (a) consumer 𝑖 accepts good 𝑗 as a substitute at time 𝑡, or (b) she is not offered 𝑗 as a substitute


at time 𝑡.1 If neither (a) nor (b) holds (in other words, if the consumer has, in fact, been offered 𝑗

as a substitute and proceeded to reject it), then 𝑎𝑖 𝑗𝑡 equals zero.

Take as given that consumer 𝑖 has taste and learning parameters 𝜒. Then, according to the

familiar conditional logit formula, the probability that she orders good 𝑗 at time 𝑡 is

\[
P_{ijt} \mid \chi := \Pr\Big[\, j = \arg\max_{j' \in \mathcal{J}_t} \mathbb{E}[u_{ij't}] \;\Big|\; w_t; \chi \Big]
= \frac{\exp(w_{jt}\chi)}{\sum_{j' \in \mathcal{J}_t} \exp(w_{j't}\chi)}
\]

while her probability of accepting the good as a substitute is given by

\[
P^{A}_{ijt} \mid \chi := \Pr\big[\, \mathbb{E}[u_{ijt}] > u_{i0t} \;\big|\; w_t; \chi \big]
= \frac{\exp(w_{jt}\chi)}{1 + \exp(w_{jt}\chi)}
\]
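As a minimal illustration of the two formulas above, the following Python sketch computes the order probabilities and the acceptance probability for a given parameter vector. The function names and the toy inputs are hypothetical; the matrix W simply stacks one row of explanatory variables per good in the choice set.

```python
import numpy as np

def order_probabilities(W, chi):
    """Conditional logit probabilities over the goods in the choice set.

    W   : (number of goods) x K matrix whose rows are the vectors w_jt
    chi : length-K parameter vector
    """
    utilities = W @ chi
    utilities = utilities - utilities.max()   # numerical safeguard; leaves the ratios unchanged
    exp_u = np.exp(utilities)
    return exp_u / exp_u.sum()

def accept_probability(w_s, chi):
    """Binary logit probability of accepting substitute s over the outside option."""
    return 1.0 / (1.0 + np.exp(-(w_s @ chi)))

# Toy inputs (illustrative only)
W = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
chi = np.array([0.5, -0.5])
p_order = order_probabilities(W, chi)
p_accept = accept_probability(W[1], chi)
```

The acceptance probability is written as 1 / (1 + exp(−wχ)), which is algebraically identical to exp(wχ) / (1 + exp(wχ)) but avoids overflow for large positive utilities.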

However, due to the panel structure of the data, the consumer may make a sequence of multiple

orders and substitution decisions. The probability of observing a given sequence takes the form

\[
P_{i} \mid \chi := \prod_{t \in \mathcal{T}} \prod_{j \in \mathcal{J}_t}
(P_{ijt} \mid \chi)^{y_{ijt}} \, (P^{A}_{ijt} \mid \chi)^{a_{ijt}}
\]

In reality, though, the consumer’s individual taste coefficients are not observed by the econo-

metrician. The unconditional choice-sequence probability 𝑃𝑖 is obtained by integrating over the

distribution of tastes across the population of consumers:

\[
P_i := \int (P_i \mid \chi) \, f_\chi(\chi) \, d\chi
\tag{2C.4}
\]

Here 𝑓𝜒 (·) denotes the probability density function (PDF) of the parameters 𝜒. (Recall that these

include the consumer’s prior expected brand tastes [the 𝜇𝑖𝑏’s], her learning shocks [the (𝑣𝑖𝑏 − 𝜇𝑖𝑏)’s],

etc.)

As I previously mentioned, Equation (2C.4) does not possess a closed form, and must

therefore be simulated. I do this with 𝑅 random draws, indexed 𝑟 ∈ {1, . . . , 𝑅}. For each draw 𝑟, I

1Either because she successfully picks up her original order (whether 𝑗 or some other good), or because she is

offered some other good 𝑗 ′ as a substitute.


draw a vector 𝜒𝑟 from 𝑓𝜒 ( 𝜒) and then compute the choice probabilities conditional on 𝜒𝑟, denoted

𝑃𝑖 | 𝜒𝑟.

After conducting 𝑅 draws and computing the resulting conditional choice probabilities, the sim-

ulated unconditional choice-sequence probability ˇ𝑃𝑖 is computed as the average of the conditional

choice probabilities:

\[
\check{P}_i = \frac{1}{R} \sum_{r=1}^{R} \big( P_i \mid \chi_r \big)
\tag{2C.5}
\]

For computational efficiency, this simulation is conducted simultaneously for all consumers

𝑖. The likelihood function is then computed as the product of the consumers’ respective choice

probabilities:

\[
\check{\mathcal{L}} = \prod_{i \in \mathcal{N}} \check{P}_i
\]
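The simulation steps can be sketched in Python as follows. This is a simplified illustration rather than the estimation code: it assumes that every element of 𝜒 is drawn from an independent normal distribution, it reuses the same 𝑅 draws for all consumers, and it includes only the order decisions (substitution decisions are omitted for brevity). All function and argument names are hypothetical.

```python
import numpy as np

def simulated_log_likelihood(panels, mean, sd, R=200, seed=0):
    """Log of the simulated likelihood: sum over i of log((1/R) * sum_r P_i | chi_r).

    panels : list with one entry per consumer; each entry is a list of trips,
             and each trip is a pair (W, ordered), where W is a goods x K matrix
             of explanatory variables and ordered is the index of the good ordered
    mean   : length-K vector of means of chi (normality is an assumption of this sketch)
    sd     : length-K vector of standard deviations of chi
    """
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    sd = np.asarray(sd, dtype=float)
    draws = mean + sd * rng.standard_normal((R, len(mean)))   # chi_r, r = 1..R

    log_likelihood = 0.0
    for trips in panels:
        seq_prob = np.ones(R)                       # P_i | chi_r for each draw r
        for W, ordered in trips:
            u = draws @ W.T                         # R x goods utilities
            u = u - u.max(axis=1, keepdims=True)    # numerical safeguard
            p = np.exp(u)
            p = p / p.sum(axis=1, keepdims=True)    # conditional logit probabilities
            seq_prob = seq_prob * p[:, ordered]
        log_likelihood += np.log(seq_prob.mean())   # average over draws, then take logs
    return log_likelihood
```

Working with the logarithm of the product avoids numerical underflow when the number of consumers is large.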
Calculating Consumer Surplus.—Suppose that consumer 𝑖 has been offered good 𝑠 as a stockout

substitute for her preferred good 𝑗★ at time 𝑡. Her expected present-trip surplus comes to

\[
\mathbb{E}[CS_{it}] = \mathbb{E}\Big[ \frac{1}{\alpha_i} \log\Big( \exp\big(r^{E}_{ist}(\mathcal{I}_{it})\big) + 1 \Big) \;\Big|\; \text{order } j^{\star}; \mathcal{H}_{it}, \mathcal{D}_i \Big] + C.
\]

In this equation, $r^{E}_{ist}$ ≔ E[𝑢𝑖𝑠𝑡 | I𝑖𝑡] − 𝜀𝑖𝑠𝑡 denotes the expected representative utility of the substitute

𝑠 at time 𝑡, while H𝑖𝑡 and D𝑖 respectively denote the consumer’s purchase history and household

demographics. Finally, 𝐶 is an unknown constant emphasizing that the absolute magnitude of

utility is not identified (Train [2009]). This expectation is simulated using a similar approach to

that employed during estimation.
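A minimal Python sketch of this simulation step appears below. It assumes that draws of the substitute's expected representative utility and of the price coefficient are already available, conditional on the consumer's original order, history, and demographics; how those conditional draws are generated is outside the sketch. The function name and the inputs are hypothetical, and the unidentified constant 𝐶 is omitted.

```python
import numpy as np

def present_trip_surplus(r_s_draws, alpha_draws):
    """Draw-average of (1 / alpha_i) * log(exp(r_ist) + 1), omitting the constant C.

    r_s_draws   : length-R array of expected representative utilities of substitute s
    alpha_draws : length-R array of (positive) price coefficients
    """
    r = np.asarray(r_s_draws, dtype=float)
    alpha = np.asarray(alpha_draws, dtype=float)
    # np.logaddexp(r, 0) computes log(exp(r) + exp(0)) = log(exp(r) + 1) stably
    return float(np.mean(np.logaddexp(r, 0.0) / alpha))

# Illustrative draws only
surplus = present_trip_surplus([0.0, 1.0, -1.0], [1.0, 2.0, 0.5])
```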

Now turn to future shopping trips. Conditional on acceptance, the present-discounted value of

expected future surplus is given by

\[
V(\text{accept } s) := \sum_{t'=t+1}^{T} \mathbb{E}\Big[ \frac{1}{\alpha_i} \log\Big( \sum_{j' \in \mathcal{J}_t} \exp\big(r^{E}_{ij't'}(\mathcal{I}_{it'})\big) \Big) \;\Big|\; \text{order } j^{\star}; \mathcal{H}_{it}, \mathcal{D}_i \Big] + C.
\]

This expectation, too, must be simulated.
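The log-sum terms for future trips can be simulated in the same way. The Python sketch below takes as given an array of simulated expected representative utilities for every draw, future trip, and good; the array layout and the function name are assumptions made for illustration. It follows the display as written, so any discounting would need to be added as trip-specific weights, and the constant 𝐶 is again omitted.

```python
import numpy as np

def future_surplus(r_draws, alpha_draws):
    """Draw-average of sum over future trips of (1 / alpha_i) * log(sum_j exp(r_ijt)).

    r_draws     : array of shape (R, T_future, J) with expected representative utilities
    alpha_draws : length-R array of (positive) price coefficients
    """
    r = np.asarray(r_draws, dtype=float)
    alpha = np.asarray(alpha_draws, dtype=float)
    r_max = r.max(axis=2, keepdims=True)
    # Stable log-sum-exp over goods, one value per draw and future trip
    log_sum = r_max[..., 0] + np.log(np.exp(r - r_max).sum(axis=2))
    return float(np.mean(log_sum.sum(axis=1) / alpha))

# Illustrative input: one draw, two future trips, three goods with zero utility
value = future_surplus(np.zeros((1, 2, 3)), [1.0])
```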


APPENDIX 2D

ESTIMATION RESULTS FOR APPLE SAUCE CUPS

Table 2D.1 reports the parameter estimates for the product category of apple sauce cups. There

are fewer demographic interactions and random coefficients than for ice cream due to challenges

with convergence.


Table 2D.1: Parameter Estimates

Panel A. Brands

                    Mean exp.         Heterogeneity of       Amount of
                    tastes (𝜇𝑏’s)     exp. tastes (𝜎𝑏’s)     learning (𝜄𝑏’s)
Mott’s              6.150 [0.088]     0.816 [0.029]          0.399 [0.016]
Private label       5.701 [0.088]     2.578 [0.025]          0.107 [0.014]

Panel B. Non-brand observables and prices

                              Means              Std. devs.
                              (𝛽’s or 𝛼)         (𝜎𝛽’s or 𝜎𝛼)
Fruit: blueberry              −5.650 [0.165]     4.037 [0.098]
Fruit: cherry                 −6.361 [0.278]     4.416 [0.149]
Fruit: mixed fruit            −4.686 [0.114]     2.803 [0.062]
Fruit: peach & mango          −6.050 [0.227]     3.178 [0.101]
Fruit: strawberry & kiwi      −3.458 [0.166]     2.479 [0.109]
Unsweetened                   −0.276 [0.010]
No. of cups                    0.179 [0.007]
Priceb                         0.171 [0.022]

[Panel B also has columns for interactions with demographics: household income, household size, and age of oldest HH malea. Two further entries, 0.595 [0.017] and 0.002 [0.001], appear in the standard-deviation or interaction columns; their exact row and column positions are not recoverable from the extracted layout.]

Panel C. Other explanatory variables

                                  Coefficients
                                  (𝜆’s or 𝛾)
Control function (pre-2021)c      0.852 [0.050]
Control function (post-2021)c     0.820 [0.043]
Reject in-persond                 0.792 [0.125]

Notes: Estimates are based on 57,811 randomly sampled observations involving 2,048 households. Standard errors
(in brackets) do not correct for measurement error in the control function.

b The random price coefficients 𝛼𝑖 are assumed to follow a log-normal distribution.
c The demand shocks are specified as 𝜉 𝑗𝑡 = 𝜆 ˜𝜉 𝑗𝑡 , where ˜𝜉 𝑗𝑡 is the residual from the pricing function and 𝜆 is a
scaling parameter (reported here). This control function is computed separately before/after January 2021, due
to a change in the store’s internal cost measure.

d Until September 2021, consumers accepted or rejected stockout substitutes upon arrival at the store. Starting
September 2021, they could accept or reject substitutes remotely (using the store’s app or website).


APPENDIX 2E

SUPPLEMENTARY COUNTERFACTUAL
SIMULATIONS

Explaining the Negligible Returns to Steering Consumers’ Learning.—To help explain the unprof-

itability of steering consumers’ learning, Table 2E.1 reports the present-discounted value of future

and total profits under counterfactual changes to the purchase environment or the primitives of

consumers’ learning. Panels A and B show that future profits would remain identical across all

stockout substitution policies if there were no endogenous learning or if the store possessed perfect

foresight about products’ future prices, wholesale costs, and availabilities. Panel C reveals that

even if consumers were willing to accept whatever substitute the store offered, the store’s expected

future profits would remain unchanged from the baseline. Panel D then considers outcomes if

consumers experienced three times as much learning as they do in reality.1 Future profits drop

overall, likely because consumers who formerly purchased Ben & Jerry’s or Halo Top “discovered”

that they actually preferred the comparatively low-margin—and inexpensive—Häagen-Dazs brand.

But the present-discounted value of future profits remains identical across the baseline policy and

the two reasonable counterfactual policies (although future profits do increase by a cent under the

purely-illustrative policy tailored to maximize future profits alone). Finally, Panel E indicates that

if consumers were guaranteed to accept and if they experienced three times more learning when

they tried new brands, the present-discounted value of expected future profits would increase by a

cent under the counterfactual policies designed to maximize present-trip profits or total discounted

profits compared to the baseline.

Counterfactual Simulations for Apple Sauce Cups.—Tables 2E.2 and 2E.3 compare profits

and consumer welfare, respectively, under the store’s baseline substitution policy and several

counterfactual ones. See Section 2.7.4 for discussion.

1Here, I triple the magnitude of the (𝑣𝑏 − 𝜇𝑏) “learning correction” parameters before performing the simulation.


Table 2E.1: Profits Under Changes to the Purchase Environment or Model Primitives

                                                     Counterfactual policies (“fully individualized” by original
                                                     order, past purchases, and household demographics)
                                    Baseline         Max. present-      Max. PDV           Max. PDV
                                                     trip profits       total profits      future profits

Panel A. No endogenous learning
PDV future profits (given reject)   17.14 (24.96)    17.14 (24.96)      17.14 (24.96)      17.14 (24.96)
PDV total profits                   19.33 (24.93)    19.69 (24.94)      19.69 (24.94)      18.96 (24.95)

Panel B. Perfect foresight
PDV future profits                  17.52 (22.32)    17.52 (22.32)      17.52 (22.32)      17.52 (22.32)
PDV total profits                   19.70 (22.29)    20.13 (22.35)      20.13 (22.35)      19.08 (22.29)

Panel C. Guaranteed acceptance
PDV future profits                  17.16 (25.02)    17.16 (25.02)      17.16 (25.03)      17.16 (25.03)
PDV total profits                   19.86 (25.01)    20.29 (25.03)      20.29 (25.03)      18.40 (25.07)

Panel D. Three times more learning
PDV future profits                  14.22 (21.08)    14.22 (21.08)      14.22 (21.08)      14.23 (21.08)
PDV total profits                   16.41 (21.09)    16.77 (21.09)      16.77 (21.09)      15.72 (21.11)

Panel E. Guaranteed acceptance and three times more learning
PDV future profits                  14.22 (21.08)    14.23 (21.08)      14.23 (21.08)      14.23 (21.09)
PDV total profits                   16.93 (21.08)    17.36 (21.08)      17.36 (21.08)      15.50 (21.13)

Notes: This table reports profit-relevant outcomes under counterfactual changes to the purchase
environments, under the store’s existing substitution policy (the “baseline”) and counterfactual
policies.


Table 2E.2: Profit-Relevant Outcomes by Substitution Policy: Apple Sauce

                                                                                “Fully individualized” by original order,
                                                                                past purchases, and household demographics
                            Baseline         “One size        Individualized    Max. present-    Max. PDV         Max. PDV
                                             fits all”        by original order trip profits     total profits    future profits

Panel A. Present trip
Retail margin               1.56 (0.68)      4.66 (2.14)      4.58 (2.07)       4.41 (2.09)      4.41 (2.09)      1.65 (0.83)
Prob. accept                0.92 (0.13)      0.69 (0.27)      0.70 (0.27)       0.73 (0.25)      0.73 (0.25)      0.83 (0.25)
Expected present-
trip profits                1.44 (0.69)      2.89 (1.51)      2.89 (1.48)       2.92 (1.47)      2.92 (1.47)      1.35 (0.69)

Panel B. Future trips
PDV future profits          11.50 (13.71)    11.50 (13.71)    11.50 (13.71)     11.50 (13.71)    11.50 (13.71)    11.50 (13.71)

Panel C. Overall
PDV total profits           12.94 (13.77)    14.39 (13.92)    14.39 (13.92)     14.42 (13.92)    14.42 (13.92)    12.85 (13.74)

Notes: This table compares profit-relevant outcomes under the store’s existing substitution policy (the “baseline”)
with outcomes under counterfactual policies. These counterfactual policies exploit the store’s knowledge of
the consumer to varying degrees, with the “one-size-fits-all” policy leveraging none of this information; the
“individualized-by-original-order” policy employing the store’s knowledge of the consumer’s original order; and
the “fully individualized” policies additionally exploiting the store’s knowledge of the consumer’s past purchases
and household demographics. Regarding the last, the “fully-individualized” policies are respectively designed to
maximize (i) expected profits on the present shopping trip, (ii) the present-discounted value of total profits (both
present and future), or (iii) the present-discounted value of future profits alone. All results are reported as means,
with standard deviations appearing in parentheses.

Table 2E.3: Changes in Consumer Welfare Compared to Baseline Policy: Apple Sauce Cups

                                                                   “Fully individualized” by original order,
                                                                   past purchases, and household demographics
                             “One size        Individualized by    Max. present-    Max. PDV         Max. PDV
                             fits all”        original order       trip profits     total profits    future profits

Expected present-
trip surplus ($)             −0.67 (2.04)     −0.63 (2.04)         −0.57 (2.03)     −0.57 (2.03)     −0.54 (2.56)
PDV future surplus ($)        0.00 (0.05)      0.00 (0.05)          0.00 (0.05)      0.00 (0.05)      0.01 (0.04)
PDV total surplus ($)        −0.67 (2.04)     −0.63 (2.04)         −0.57 (2.03)     −0.57 (2.03)     −0.53 (2.56)

Notes: This table reports changes in consumer welfare when the store adopts various counterfactual substitution
policies. See notes to Table 2.5 for descriptions of these policies. All results are reported as means (with standard
deviations in parentheses).


CHAPTER 3

DEMAND ESTIMATION WHEN CONSUMERS’ PREFERENCES VARY OVER TIME

3.1 Introduction

People’s preferences sometimes vary over time. Take the case of coffee, for instance. Many

people prefer iced coffee during the summer and hot coffee during the winter. In this paper, I show

that workhorse demand systems fail to replicate important substitution patterns in markets where

consumers’ preferences vary over time. This shortcoming is rooted in the underlying discrete choice

model: conditional or mixed logit. I show that conditional logit imposes independence between

consumers’ purchases and their pairwise preferences among unpurchased goods. As for mixed

logit, this more general model imposes conditional independence between consumers’ purchases

and their pairwise preferences among unpurchased goods, given the realizations of the consumers’

random coefficients. In other words, what someone purchases on a particular shopping trip should

be uninformative of trip-specific factors that influenced both her purchase and her preferences

among the goods she did not purchase. Hereafter, I refer to the preceding independence constraints

as the independence of preferred alternatives (IPA) properties of conditional and mixed logit,

respectively.
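The conditional logit IPA property can be illustrated with a small simulation. The Python sketch below is illustrative only: the representative utilities are arbitrary, and the function name is hypothetical. It draws i.i.d. Gumbel taste shocks, records which good is purchased, and then computes how often one unpurchased good is preferred to another, separately by purchase.

```python
import numpy as np

def pairwise_pref_given_purchase(v, k, l, n_draws=400_000, seed=0):
    """Share of simulated trips with u_k > u_l, separately by which other good was purchased."""
    rng = np.random.default_rng(seed)
    v = np.asarray(v, dtype=float)
    u = v + rng.gumbel(size=(n_draws, len(v)))   # conditional logit utilities
    purchased = u.argmax(axis=1)
    prefers_k = u[:, k] > u[:, l]
    return {j: float(prefers_k[purchased == j].mean())
            for j in range(len(v)) if j not in (k, l)}

# Arbitrary representative utilities for four goods (illustrative values)
v = [0.0, 0.5, 1.0, -0.5]
shares = pairwise_pref_given_purchase(v, k=2, l=3)
closed_form = float(np.exp(v[2]) / (np.exp(v[2]) + np.exp(v[3])))
print(shares, closed_form)
```

Under conditional logit, the simulated shares should agree (up to simulation error) with the closed-form binary logit probability exp(𝑣𝑘) / (exp(𝑣𝑘) + exp(𝑣𝑙)), regardless of which other good was purchased. In other words, the purchase carries no information about the ranking of the two unpurchased goods.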

These theoretical results raise two empirical questions. First, can data help determine whether

consumers’ preferences in a given market are consistent with the IPA properties of conditional or

mixed logit? And second, how should demand be estimated when consumers’ preferences prove

inconsistent with the IPA property of the (more flexible) mixed logit model? To provide insight, I

employ novel data from curbside grocery pickup. This is a “click-and-collect” form of shopping

where consumers order groceries online and then pick them up from their local supermarket.

Importantly, products ordered for curbside pickup sometimes go out of stock. This obliges the

store to select a “stockout substitute” on the affected consumer’s behalf. Once she arrives at the

store, the consumer is offered two choices: either she can purchase the stockout substitute, or she

can purchase nothing.1 Whether she is willing to purchase this (store-selected) substitute product

1In principle, the consumer could also enter the store in search of a better substitute. However, this is exceedingly


provides direct evidence of its substitutability for the out-of-stock product.

Focusing on the product categories of bottled water and flour, I provide descriptive evidence

that consumers’ decisions to accept (i.e., purchase) or reject (i.e., not purchase) stockout substitutes

are inconsistent with the IPA property of conditional logit. Contrary to the property, consumers’

original orders are informative of their willingness to accept a given stockout substitute. As for

mixed logit, I find that the accept/reject decisions of bottled water buyers are consistent with the

model’s IPA property, whereas those of flour buyers are not. Regarding the latter product category,

consumers’ preferences for substitute flours vary across trips—perhaps owing to variation in the

planned recipe. This kind of within-consumer preference variation is excluded by the mixed logit

IPA.

I next turn to an empirical case study. Does the IPA property of mixed logit influence demand

estimates? If so, does the extent of this influence vary by product category? To give insight,

I estimate demand for bottled water and flour using two models: mixed logit and mixed probit

(which does not exhibit an IPA property). Then I compare the models’ goodness of fit. As I do so,

I focus on the models’ fit in relation to the stockout substitution data. On these data, the mixed logit

IPA imposes the following restriction: a consumer’s original order choice should be conditionally

independent of whether she accepts the substitute (given the realization of her random coefficients).
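To see how a probit specification can relax this restriction, consider the following simulation sketch in Python. The setup is purely hypothetical and is not the model estimated in this paper: good 0 is the outside option, all mean utilities are set to zero, and the taste shocks of goods 1 and 2 are positively correlated (with a correlation of 0.8 chosen for illustration). The sketch computes the probability that good 2 would be accepted as a substitute, separately by the consumer's original order.

```python
import numpy as np

def accept_rate_by_order(cov, s, n_draws=400_000, seed=0):
    """P(u_s > u_0 | original order j) for each inside good j other than s."""
    rng = np.random.default_rng(seed)
    J = cov.shape[0]
    # Multivariate normal taste shocks; mean utilities are set to zero for simplicity
    u = rng.multivariate_normal(np.zeros(J), cov, size=n_draws)
    ordered = u.argmax(axis=1)          # the consumer's original order
    accepts = u[:, s] > u[:, 0]         # substitute s beats the outside option
    return {j: float(accepts[ordered == j].mean())
            for j in range(1, J) if j != s}

# Hypothetical covariance: shocks to goods 1 and 2 are correlated, good 3 is independent
cov = np.eye(4)
cov[1, 2] = cov[2, 1] = 0.8
rates = accept_rate_by_order(cov, s=2)
print(rates)
```

In this setup, consumers who originally ordered good 1 should accept good 2 more often than consumers who originally ordered good 3, because the shock that made good 1 attractive on this trip also tends to raise the utility of good 2. This is the kind of dependence between the original order and the accept/reject decision that the IPA property excludes.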

The results of this case study sometimes vary across product categories, model selection

strategies (i.e., within- versus out-of-sample), and methods of computing choice probabilities (i.e.,

“conditional” versus “unconditional”).2 But overall, mixed probit seems to forecast consumers’

accept/reject decisions more accurately than mixed logit does. Importantly, this disparity tends to

be larger for the product category of flour than that of bottled water. This is in keeping with the

descriptive evidence summarized above: namely, that consumers’ preferences for bottled water are

consistent with the IPA property of mixed logit, whereas their preferences for flour are not.

rare with respect to the product categories considered in this paper, namely, flour and bottled water. In 0% (0.6%)
of cases in which the consumer rejects a stockout substitute for a bottled water (flour) product, she enters the store
afterwards to purchase a different bottled water (flour) product.

2The “conditional” approach exploits individual consumers’ past purchases to supply predictions that reflect their

respective choices on past shopping trips (see Train).


My findings can inform future applied work on differentiated products demand. In markets

where consumers’ preferences are stable across shopping trips, mixed logit should accurately

reproduce the underlying substitution patterns. But in markets where consumers’ preferences vary

over time, an alternative model may be preferable (such as the mixed probit model estimated in this

paper).3 Of course, there exist markets where the amount of within-consumer preference variation

is not immediately obvious. If the researcher has data on unpurchased goods’ substitutability for

purchased ones—such as “second choice data” or data on stockout substitutions (as in this paper)—

she can adapt the formal tests and informal descriptive analyses developed here to test whether

consumers’ preferences are consistent with the IPA properties of conditional or mixed logit.

The remainder of the paper proceeds as follows. Section 3.2 relates this study to prior literature.

Section 3.3 reviews the canonical differentiated products demand model developed by Berry,

Levinsohn, and Pakes (1995),4 and then formalizes the IPA properties of conditional and mixed logit.

Section 3.4 provides institutional details about curbside pickup and introduces the data. Section 3.5

presents descriptive evidence concerning the extent to which consumer behavior coincides with

the IPA properties of conditional and mixed logit. Section 3.6 presents a demand estimation case

study, while Section 3.7 concludes.

3.2 Relationship to Prior Literature

An extensive literature within empirical industrial organization employs data on consumers’

preferences among unpurchased goods—hereafter, alternate-choice data. Both in this existing

literature and in my study, alternate-choice data help identify products’ substitutability. However,

I also use these data for a second purpose: namely, to test whether consumers’ preferences are

consistent with the IPA properties of conditional and mixed logit.

In what follows, I will elaborate on the relationship between this study and the prior literature

3The mixed probit model is impractical with large datasets. In Section 3.7, I suggest alternative methods of relaxing

the mixed logit IPA that impose a smaller computational burden.

4Unlike Berry, Levinsohn, and Pakes (1995), I abstract away from price endogeneity. I do so for two reasons.
First, unobserved “quality” is probably less important for the products studied in this paper—namely, bottled water and
flour—than it is for automobiles. And second, my demand specification is much more computationally burdensome
than BLP 1995, as I employ a semi-nonparametric estimator. It would be computationally challenging to adopt an IV
(or even control function) approach.


within empirical industrial organization (“IO”) that employs alternate-choice data. Then I will

briefly remark on two other literatures to which my work relates: the econometric literature on the

identifying power of alternate-choice data, and the empirical literature that studies stockout events.

3.2.1 Alternate-Choice Data in Empirical IO

A growing empirical literature leverages alternate-choice data to estimate demand elasticities.

The pioneering work is Berry, Levinsohn, and Pakes’s (2004) study

of the US automotive market—hereafter, BLP ’04. They estimate a mixed logit model of demand

using two types of data: aggregated data on products’ market shares, and questionnaire data from

a representative sample of new-car buyers. The latter indicate buyers’ “second choices”—that is,

the purchases they would have made if their preferred vehicle were unavailable. By requiring their

demand system to match these second-choice substitution patterns, BLP ’04 obtains more precise

estimates of the parameters that govern product substitutability in their model.

The empirical framework developed in BLP ’04 remains the most popular means of incor-

porating alternate-choice data in demand systems.5 Of the few studies that do adopt alternative

frameworks, most still share the following features with BLP ’04:

(i) The consumer’s discrete choice problem is modeled with mixed logit.

(ii) The data consist of cross-sectional data on consumers’ purchases, coupled with stated-

preference data on consumers’ rankings of unpurchased products.6

It is these features that mark my point of departure from the existing literature. Regarding (i), I

highlight the restrictions imposed by mixed logit on the substitution patterns in alternate-choice

data. Under the IPA property of mixed logit, the consumer’s purchase choice must be independent

of her pairwise preferences among unpurchased goods, conditional on her (consumer-specific)

taste coefficients. As for (ii), my data differ in important respects from the data employed in

5In addition to a series of studies on the automotive market listed below, Farronato and Fradkin (2022) also adapt
the framework of BLP ’04 in their study about the welfare effects of Airbnb on the accommodation industry. Other
recent examples include Conlon and Gortmaker (2023), who study the soda industry; as well as Montag (2023), who
studies the household appliance industry.

6These data, which are collected from questionnaires, concern consumers’ hypothetical preferences over products

they did not purchase. For example, “If product A were not available, what would you have purchased instead?”


earlier studies. Most prior work couples (a) nationally representative, but aggregated, data on

market shares with (b) highly detailed, but stated-preference, alternate-choice data. In contrast, my

data pair (a) household-level panel data on purchases at a single, regional retailer; with (b) less

comprehensive, but revealed-preference,7 alternate-choice data.

I will now elaborate on both these points of departure, explaining how they can inform future

applied work that uses alternate-choice data.

The Model.—In differentiated products demand estimation, the consumer’s discrete choice

problem is most often represented with mixed logit.8 However, mixed logit is subject to an IPA

property that may be unrealistic in some settings.

In the introduction, I used the example of a

regular flour buyer to illustrate the kind of behavior that is excluded by the mixed logit IPA. Here I

translate this constraint to the automotive market, the subject of the empirical application in BLP

’04 as well as several recent studies that integrate alternate-choice data in a mixed logit model:

Grieco, Murry, and Yurukoglu (2023); Bachmann et al. (2023); and Xing, Leard, and Li (2021).

To see the significance of the mixed logit IPA in the automotive market, picture someone who has

purchased two cars recently. The first is a large SUV (say, the Chevrolet Suburban); while the

second is a small sports car (say, the Chevrolet Camaro). Suppose that she purchased the former

about a year before the latter. Under the mixed logit IPA, our consumer’s pairwise preferences

among the unpurchased automobiles should have been essentially identical when she purchased

the SUV as when she purchased the sports car a year later. In other words, she was equally likely

to have preferred an unpurchased SUV (say, the Ford Expedition) over an unpurchased sports car

(say, the Ford Mustang) on both occasions. But this prediction is counterintuitive, as the uses of

an SUV (such as transporting bulky objects or ferrying lots of people) differ from those of a sports

car (such as pleasure driving). Thus, when our consumer made her more recent purchase—that of

the Camaro—she was probably searching specifically for a small sports car. It seems unlikely that

7These data record consumers’ decisions to purchase, or not purchase, store-selected substitute products. Con-

sumers therefore have “skin in the game:” if they accept the substitute, they will pay for it.

8Allcott (2013) represents a notable exception.

In his research into the accuracy of consumers’ beliefs on the
savings from fuel efficient vehicles, he employs a nested logit model. Another exception, albeit from outside the field
of industrial organization, is provided by Abdulkadiroğlu, Agarwal, and Pathak (2017). Their research into school
choice employs the multinomial probit model.


this search would have ended in the purchase of a second large SUV, as such a vehicle would not

fulfill the purpose she had in mind. But the mixed logit model might make just such a prediction,

because it presumes that her preferences over unpurchased vehicles remained identical between the

two shopping occasions (despite the different classes of vehicle purchased).

Even so, the mixed logit IPA remains realistic in many other settings. Take the case of household

appliances, for example. An individual consumer is unlikely to purchase a given appliance (such

as a furnace or dishwasher) more than a couple of times throughout her lifetime. And even if she

does make multiple purchases, her preferences will likely remain quite stable over time. Thus,

within-consumer preference variation is likely minimal in household appliance markets, even if

there is considerable between-consumer preference variation. In such markets, the mixed logit IPA

accurately describes consumers’ behavior.

The Data.—The data employed in this study provide a useful complement to the data used

in previous work. Within the existing literature, it is customary to couple (i) cross-sectional

data on market shares with (ii) detailed, but stated-preference, alternate-choice data. This data

combination is ideal for most applications of interest, such as recovering markups or characterizing

market responses to counterfactual policy changes. However, it would be challenging to test the

IPA property of mixed logit with cross-sectional data of this description. The reason is that the

mixed logit IPA imposes a within-panel restriction on product substitutability. To assess the extent

to which consumer behavior is consistent with this constraint, it helps to have household-level panel

data. My data—which consist of (i) household-level panel data on consumers’ purchases and (ii)

revealed-preference alternate-choice data—fit this description.

My data display two key limitations. The first concerns external validity: whereas most

existing studies employ nationally representative data, mine cover only one (regional) retailer. As

for the second limitation, my data provide less detailed information on consumers’ preferences

over unpurchased products than do the data employed in existing studies. Specifically, my data

characterize consumers’ revealed preferences between one unpurchased good—namely, a store-

selected stockout substitute—and the “outside option” of purchasing nothing. By contrast, most


existing studies leverage questionnaire data in which consumers either (i) state their second–most-

preferred product or (ii) provide a complete ranking of the unpurchased products. Although these

data describe hypothetical choices,9 they are far more detailed than my data, and will thus provide

more precise estimates of demand elasticities.

Unlike in most previous studies, my objective is not to obtain a nationally representative model of

demand for a specific market. Rather, my task is to evaluate the degree to which the IPA properties

of conditional and mixed logit coincide with consumers’ observed behavior for various product

categories. So far as this task is concerned, the limitations of my data are unlikely to prove a

substantial hindrance.

3.2.2 The Econometric Literature on Identification with Alternate-Choice Data

An emerging econometric literature documents how alternate-choice data help to identify

demand. Conlon and Mortimer (2021) show that, under certain conditions, second-choice data

identify the “Average Treatment for the Untreated” (ATUT), which may, in turn, be a good proxy

for demand elasticities. In addition, preliminary work by Conlon, Mortimer, and Sarkis (2023)

suggests that a pairing of (i) second-choice data and (ii) information on market shares can identify

demand even without data on products’ observable characteristics. Furthermore, nonparametric

estimation using such data can sometimes match observed substitution patterns better than BLP

’04–style demand systems, despite the latter exploiting additional data on product characteristics.

In particular, BLP ’04–style demand systems sometimes underpredict diversion to close substitutes

and overpredict diversion to more distant ones. This tendency could be partially explained by the

IPA property of mixed logit, which rules out within-consumer variation in preferences over product

characteristics (such as might arise from variation in purchase circumstances).

3.2.3 The Literature on Stockouts and Demand Estimation

There is a large literature in empirical industrial organization and marketing that leverages

stockout events to help estimate demand. The intuition is that the substitutability of one good—say,

9See Carlsson and Martinsson (2001) for a discussion in the context of environmental economics; Lusk and
Schroeder (2004) for one in agricultural economics; Quaife et al. (2018) for one in health economics; and Brownstone
and Small (2005) for one in transportation research.

A—for another—say, B—can be inferred from the degree to which A’s choice share increases

when B goes out of stock.

In this literature, the primary points of differentiation are (i) the

institutional environment and (ii) the cause of product unavailability. Regarding (i), some of these

papers’ environments resemble mine, being either supermarkets or convenience stores. These

include Musalem et al. (2010) and Bruno and Vilcassim (2008). Another important purchasing

environment within this literature is vending machines, the subject of Anupindi, Dada, and Gupta

(1998); Conlon and Mortimer (2021); Conlon and Mortimer (2013); and Conlon and Mortimer

(2010). As for (ii), most studies rely on endogenous (i.e., naturally occurring) stockouts. Notable

exceptions include Conlon and Mortimer’s 2021 and 2010 studies, which experimentally manipulate

product availability in vending machines.

The key difference between these studies and mine is the data. In my data, stockouts occur

after the consumer has already made her initial purchase decision. Consequently, I observe two

choices per stockout event: the consumer’s “first choice” as well as her later decision to accept or

reject a store-selected substitute (after her first choice has gone out of stock). By contrast, the

studies listed above observe only one choice per stockout event:

the consumer’s purchase from

among the available alternatives. It remains unknown what the consumer would have purchased

under full availability. Further, the aforementioned studies rely on cross-sectional data, whereas I

have panel data. For both these reasons, my data are especially suitable to test the IPA properties

of conditional and mixed logit.

One study within this literature may provide suggestive evidence of bias resulting from the mixed

logit IPA. Conlon and Mortimer (2010) find that, when a product goes out of stock, the mixed logit

model underpredicts the sales increase enjoyed by close substitutes and overpredicts that enjoyed

by more distant substitutes. Notice that this is the same pattern identified by Conlon, Mortimer, and

Sarkis (2023) in the context of the automotive market (as discussed in Section 3.2.2). Regarding

vending machines, Conlon and Mortimer propose several potential explanations for this pattern,

such as omitted product characteristics or the absence of price variation in vending machines.

However, the IPA property of mixed logit could also be responsible. Under this constraint, an

individual consumer cannot be “in the mood” for a certain type of snack on one occasion but a

different type on another.10 So if an individual consumer opts for different categories of snacks

on different occasions—such as a savory snack on one occasion and a sweet one on another—then

the mixed logit model will assume she is (largely) indifferent between the two categories. But in

actual fact, she might have had a strong preference for one category on a given occasion (e.g., “I

could really use a salty snack right now”) but a strong preference for a different category on another

occasion (e.g., “I’m craving something sweet right now”).
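The following Python sketch illustrates this "mood" story with a small simulation. It is a hypothetical data-generating process, not the model estimated in this chapter: the four goods, the equal-probability moods, and the bonus parameter `gamma` are all assumptions made for illustration. The consumer's tastes vary between occasions, and the simulation compares the probability that she prefers one unpurchased savory good (B) to a sweet one (C) with and without conditioning on her having chosen another savory good (A).

```python
import math
import random

def gumbel(rng):
    # Standard Gumbel draw: -log(E) with E ~ Exponential(1); guard against E == 0.
    return -math.log(max(rng.expovariate(1.0), 1e-300))

def simulate_mood_diversion(gamma=2.0, n_draws=200_000, seed=0):
    """Return (Pr[u_B > u_C], Pr[u_B > u_C | A chosen]) for one consumer.

    Hypothetical setup: goods A and B are savory, goods C and D are sweet.
    On each occasion the consumer is in a savory or sweet mood with equal
    probability, and goods matching her mood get a utility bonus of `gamma`.
    """
    rng = random.Random(seed)
    kinds = {"A": "savory", "B": "savory", "C": "sweet", "D": "sweet"}
    b_over_c_total = 0
    a_chosen = 0
    b_over_c_given_a = 0
    for _ in range(n_draws):
        mood = "savory" if rng.random() < 0.5 else "sweet"
        u = {g: (gamma if k == mood else 0.0) + gumbel(rng) for g, k in kinds.items()}
        b_over_c = u["B"] > u["C"]
        b_over_c_total += b_over_c
        if max(u, key=u.get) == "A":  # A is the utility-maximizing good
            a_chosen += 1
            b_over_c_given_a += b_over_c
    return b_over_c_total / n_draws, b_over_c_given_a / a_chosen

unconditional, conditional = simulate_mood_diversion()
print(f"Pr[B > C]            = {unconditional:.3f}")
print(f"Pr[B > C | A chosen] = {conditional:.3f}")
```

In this setup the unconditional probability is one half by symmetry, whereas the probability conditional on a savory first choice is roughly 0.79 when `gamma = 2`. A model that holds her tastes fixed across occasions cannot reproduce that gap.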

3.3 Theory: Alternate-Choice Data in Demand Systems

In this section, I introduce my empirical framework and then formalize the IPA properties of

conditional and mixed logit.

Consider a differentiated products market with 𝐽 goods (or “products”), along with an outside

option of no purchase (“good 0”). At time 𝑡, each consumer 𝑖 purchases the good 𝑗 ∈ J ≡

{0, 1, . . . , 𝐽} that affords the greatest conditional indirect utility 𝑢𝑖 𝑗𝑡.11

Utility is a linear index of product characteristics (𝑥 𝑗 ), price (𝑝 𝑗𝑡), and an i.i.d. Gumbel error

(𝜀𝑖 𝑗𝑡):

𝑢𝑖 𝑗𝑡 = 𝑥 𝑗 𝛽𝑖 − 𝛼𝑖 𝑝 𝑗𝑡 + 𝜀𝑖 𝑗𝑡 .

Note that the taste coefficients (𝛽𝑖, 𝛼𝑖) are specific to individual consumers 𝑖.

I will show that conditional and mixed logit each impose a form of independence between

consumers’ purchases and their preferences among unpurchased goods.

I begin by proving a

lemma about (conditional) logit utilities. Then I use this lemma to derive the IPA properties of

conditional and mixed logit.

10To see how this would bias estimates of demand elasticities, picture a consumer who orders a savory snack—say,
Lay’s potato chips—on one occasion but a sweet one—say, Kit Kat—on another. Under the mixed logit IPA, her
relative preferences among the unpurchased snacks must have remained the same on both occasions. Consider the
counterfactual where both her first-choice products were out of stock on their respective purchase occasions—that is,
Lay’s potato chips were out of stock on the first occasion and Kit Kat out of stock on the second. Under the mixed
logit IPA, she would have been no likelier to divert to a given savory snack—say, salted peanuts—on the first occasion
(when, under full availability, she would have ordered a salty snack) than on the second (when, under full availability,
she would have ordered a sweet snack).

11I assume that arg max 𝑗 ∈ J 𝑢𝑖 𝑗𝑡 is a singleton set with probability one. (In other words, there are no “ties.”)

Lemma 1 (Irrelevance of Identical Upper Bounds on Two Goods’ Logit Utilities). Assume that

all consumers share the same taste coefficients, with (𝛽𝑖, 𝛼𝑖) = (𝛽, 𝛼) for all 𝑖. Then, for any two

goods 𝐴, 𝐵 ∈ J and any constant 𝐾 ∈ R,

\[ \Pr\bigl[u_{iAt} > u_{iBt} \,\big|\, \max\{u_{iAt}, u_{iBt}\} < K\bigr] = \Pr[u_{iAt} > u_{iBt}]. \]

Proof. See Chapter 3A.

■
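For intuition, the equality in Lemma 1 can be checked directly from the Gumbel distribution function. The sketch below is illustrative and need not coincide with the proof in Chapter 3A; the symbols δ_j and S are shorthand introduced here for convenience, and f_j and F_j denote the PDF and CDF of good j's utility.

```latex
% Sketch only; \delta_j and S are shorthand introduced for this illustration.
\begin{align*}
\delta_j &\equiv x_j \beta - \alpha p_{jt}, \qquad
F_j(u) = \exp\!\bigl(-e^{-(u - \delta_j)}\bigr), \qquad
S \equiv e^{\delta_A} + e^{\delta_B}, \\
\Pr[u_{iBt} < u_{iAt} < K]
  &= \int_{-\infty}^{K} f_A(u)\, F_B(u)\, du
   = \int_{-\infty}^{K} e^{\delta_A} e^{-u} \exp\!\bigl(-S e^{-u}\bigr)\, du
   = \frac{e^{\delta_A}}{S} \exp\!\bigl(-S e^{-K}\bigr), \\
\Pr[\max\{u_{iAt}, u_{iBt}\} < K]
  &= F_A(K)\, F_B(K) = \exp\!\bigl(-S e^{-K}\bigr), \\
\Pr\bigl[u_{iAt} > u_{iBt} \,\big|\, \max\{u_{iAt}, u_{iBt}\} < K\bigr]
  &= \frac{e^{\delta_A}}{S} = \Pr[u_{iAt} > u_{iBt}].
\end{align*}
```

The bound 𝐾 enters the numerator and the denominator through the same factor, which cancels in the ratio.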

Figure 3.1: Irrelevance of Identical Upper Bounds on Two Goods’ Logit Utilities

Figure 3.1 depicts a generic example of Lemma 1. The black solid line and the gray dash-dotted

line chart the unconditional PDFs of 𝑢𝑖 𝐴𝑡 and 𝑢𝑖𝐵𝑡, respectively; while the conditional PDFs of 𝑢𝑖 𝐴𝑡

and 𝑢𝑖𝐵𝑡 respectively correspond to the black dashed and gray dotted lines.

Although both unconditional PDFs share the same shape, the unconditional PDF of 𝑢𝑖 𝐴𝑡 is

a rightwards location-transformation of 𝑢𝑖𝐵𝑡’s.

(Evidently, the representative utility of good 𝐴

exceeds that of good 𝐵: 𝑥 𝐴 𝛽 − 𝛼𝑝 𝐴𝑡 > 𝑥𝐵 𝛽 − 𝛼𝑝𝐵𝑡.) It follows that

\[ \Pr[u_{iAt} > u_{iBt}] > \tfrac{1}{2}. \]

Turning to the conditional PDFs, notice that both are bounded above by 𝐾. However, they

differ in shape, with the conditional PDF of 𝑢𝑖 𝐴𝑡 bunching more tightly around 𝐾 than does the

conditional PDF of 𝑢𝑖𝐵𝑡. It follows that

\[ \Pr\bigl[u_{iAt} > u_{iBt} \,\big|\, \max\{u_{iAt}, u_{iBt}\} < K\bigr] > \tfrac{1}{2}. \]

Conditional on being smaller than 𝐾, the random variable 𝑢𝑖 𝐴𝑡 is more likely to be “just under” the

upper bound 𝐾 than is the random variable 𝑢𝑖𝐵𝑡. Less intuitive, however, is the following result,

which is implied by Lemma 1:

\[ \Pr\bigl[u_{iAt} > u_{iBt} \,\big|\, \max\{u_{iAt}, u_{iBt}\} < K\bigr] = \Pr[u_{iAt} > u_{iBt}]. \]

That is, the probability that 𝑢𝑖 𝐴𝑡 is greater than 𝑢𝑖𝐵𝑡 remains unchanged after imposing the condition

that max{𝑢𝑖 𝐴𝑡, 𝑢𝑖𝐵𝑡 } < 𝐾.

In visual terms, the two distributions will both compress to the left

such that the probability of a draw from one distribution exceeding a draw from the other remains

unchanged.

Not all distributions display this property. For instance, if the error terms were distributed i.i.d.

standard normal (as opposed to i.i.d. Gumbel), then

\[ \Pr\bigl[u_{iAt} > u_{iBt} \,\big|\, \max\{u_{iAt}, u_{iBt}\} < K\bigr] \neq \Pr[u_{iAt} > u_{iBt}] \]

in general.12 So this property represents an unusual feature of the (conditional) logit model and,

by extension, of the Gumbel distribution.
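A quick simulation makes the contrast concrete. The Python sketch below is a rough check under assumed values (a location shift of 1 for good 𝐴, an upper bound of 𝐾 = 0, and 400,000 draws); it is separate from the Monte Carlo tests reported in Chapter 3C. It estimates the probability that 𝑢𝑖𝐴𝑡 exceeds 𝑢𝑖𝐵𝑡, with and without the upper bound, under Gumbel and under standard normal errors.

```python
import math
import random

def conditional_vs_unconditional(draw_error, shift=1.0, bound=0.0,
                                 n_draws=400_000, seed=0):
    """Estimate Pr[u_A > u_B] and Pr[u_A > u_B | max{u_A, u_B} < bound]."""
    rng = random.Random(seed)
    wins = kept = wins_kept = 0
    for _ in range(n_draws):
        u_a = shift + draw_error(rng)  # good A has higher representative utility
        u_b = draw_error(rng)
        a_wins = u_a > u_b
        wins += a_wins
        if max(u_a, u_b) < bound:      # keep draws satisfying the upper bound
            kept += 1
            wins_kept += a_wins
    return wins / n_draws, wins_kept / kept

def gumbel(rng):
    # Standard Gumbel draw: -log(E) with E ~ Exponential(1); guard against E == 0.
    return -math.log(max(rng.expovariate(1.0), 1e-300))

def normal(rng):
    return rng.gauss(0.0, 1.0)

results = {}
for name, draw in [("Gumbel", gumbel), ("normal", normal)]:
    results[name] = conditional_vs_unconditional(draw)
    uncond, cond = results[name]
    print(f"{name}: unconditional {uncond:.3f}, conditional {cond:.3f}")
```

Under Gumbel errors the two estimates should agree up to simulation noise, whereas under normal errors the conditional probability falls visibly below the unconditional one.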

I will now employ Lemma 1 to derive the IPA property of conditional logit.

Theorem 1 (Conditional Logit IPA). Assume that all consumers share the same taste coefficients,

with (𝛽𝑖, 𝛼𝑖) = (𝛽, 𝛼) for all 𝑖. Then, for any three goods 𝐴, 𝐵, 𝐶 ∈ J ,

\[ \Pr\bigl[u_{iBt} > u_{iCt} \,\big|\, u_{iAt} = \max_{j \in \mathcal{J}} u_{ijt}\bigr] = \Pr[u_{iBt} > u_{iCt}]. \]

Proof. By the law of iterated expectations,

\[ \Pr\bigl[u_{iBt} > u_{iCt} \,\big|\, u_{iAt} = \max_{j \in \mathcal{J}} u_{ijt}\bigr] = \mathbb{E}\Bigl[ \Pr\bigl[u_{iBt} > u_{iCt} \,\big|\, u_{iAt} = \max_{j \in \mathcal{J}} u_{ijt};\ (u_{ijt})_{j \in \mathcal{J} \setminus \{B, C\}}\bigr] \,\Big|\, u_{iAt} = \max_{j \in \mathcal{J}} u_{ijt} \Bigr]. \tag{3.1} \]

12By way of example, suppose 𝑢𝑖𝐴𝑡 = 1 + 𝜀𝑖𝐴 and 𝑢𝑖𝐵𝑡 = 𝜀𝑖𝐵, where the error terms are i.i.d. standard normal. Then Pr[𝑢𝑖𝐴𝑡 > 𝑢𝑖𝐵𝑡 | max{𝑢𝑖𝐴𝑡, 𝑢𝑖𝐵𝑡} < 0] ≈ 0.64 < 0.76 ≈ Pr[𝑢𝑖𝐴𝑡 > 𝑢𝑖𝐵𝑡].

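As far as the inner component of Equation (3.1) is concerned, only two goods’ utilities are random variables: those of 𝐵 and 𝐶. (The remaining goods’ utilities are held fixed.) Given these fixed utilities, the event that 𝐴 is the utility-maximizing good reduces to max{𝑢𝑖𝐵𝑡, 𝑢𝑖𝐶𝑡} < 𝐾 with 𝐾 = 𝑢𝑖𝐴𝑡; and because the errors are independent across goods, fixing the other utilities leaves the joint distribution of 𝑢𝑖𝐵𝑡 and 𝑢𝑖𝐶𝑡 unchanged. We can therefore apply Lemma 1 to the inner component of Equation (3.1), obtaining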

\[ \Pr\bigl[u_{iBt} > u_{iCt} \,\big|\, u_{iAt} = \max_{j \in \mathcal{J}} u_{ijt};\ (u_{ijt})_{j \in \mathcal{J} \setminus \{B, C\}}\bigr] = \Pr[u_{iBt} > u_{iCt}]. \tag{3.2} \]

Substituting Equation (3.2) into Equation (3.1) yields

\begin{align*}
\Pr\bigl[u_{iBt} > u_{iCt} \,\big|\, u_{iAt} = \max_{j \in \mathcal{J}} u_{ijt}\bigr] &= \mathbb{E}\bigl[ \Pr[u_{iBt} > u_{iCt}] \,\big|\, u_{iAt} = \max_{j \in \mathcal{J}} u_{ijt} \bigr] \\
&= \Pr[u_{iBt} > u_{iCt}].
\end{align*}

■

Importantly, goods 𝐴, 𝐵, and 𝐶 need not be “inside goods.” Rather, one of them could be

the outside option: good 0. Such is the case for the empirical application to curbside pickup in

Sections 3.5 and 3.6.13

I will now show that mixed logit exhibits an analogous IPA property, conditional on the

realizations of consumers’ random taste coefficients.

Corollary 1 (Mixed Logit IPA). For any three goods 𝐴, 𝐵, 𝐶 ∈ J ,

\[ \Pr\bigl[u_{iBt} > u_{iCt} \,\big|\, u_{iAt} = \max_{j \in \mathcal{J}} u_{ijt};\ \beta_i, \alpha_i\bigr] = \Pr[u_{iBt} > u_{iCt} \mid \beta_i, \alpha_i]. \]

Proof. Follows immediately from Theorem 1 and the definition of mixed logit.

■

I discuss the practical implications of Theorem 1 and Corollary 1 elsewhere.14 In addition,

Chapter 3B relates Theorem 1 to prior theoretical results in the literature, while Chapter 3C presents

Monte Carlo tests of Theorem 1.
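As a rough numerical check of Theorem 1, the Python sketch below simulates conditional logit choices among four goods and compares the probability that good 𝐵 is preferred to good 𝐶 with and without conditioning on 𝐴 being chosen. The representative utilities in `delta` are arbitrary assumed values, good 𝐶 is taken to be the outside option (good 0), and the exercise is independent of the Monte Carlo tests in Chapter 3C.

```python
import math
import random

def check_conditional_logit_ipa(delta, a, b, c, n_draws=200_000, seed=0):
    """Compare Pr[u_b > u_c | a chosen] with Pr[u_b > u_c] under conditional logit.

    `delta` maps each good to an (assumed) representative utility; errors are
    i.i.d. standard Gumbel. Returns (conditional, unconditional, closed_form).
    """
    rng = random.Random(seed)
    n_a = n_b_over_c = n_b_over_c_given_a = 0
    for _ in range(n_draws):
        # Gumbel draw: -log(E) with E ~ Exponential(1); guard against E == 0.
        u = {j: d - math.log(max(rng.expovariate(1.0), 1e-300))
             for j, d in delta.items()}
        b_over_c = u[b] > u[c]
        n_b_over_c += b_over_c
        if max(u, key=u.get) == a:  # good a is the utility-maximizing good
            n_a += 1
            n_b_over_c_given_a += b_over_c
    closed_form = math.exp(delta[b]) / (math.exp(delta[b]) + math.exp(delta[c]))
    return n_b_over_c_given_a / n_a, n_b_over_c / n_draws, closed_form

# Good 0 is the outside option; goods 1-3 are inside goods (values are illustrative).
delta = {0: 0.0, 1: 1.0, 2: 0.5, 3: -0.5}
given_a, unconditional, closed_form = check_conditional_logit_ipa(delta, a=1, b=2, c=0)
print(f"conditional {given_a:.3f}, unconditional {unconditional:.3f}, "
      f"closed form {closed_form:.3f}")
```

All three numbers should agree up to simulation noise, which is the content of the conditional logit IPA.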

13Briefly: in curbside pickup, a consumer is observed making two related choices: her initial order, and her
subsequent decision to accept or reject a stockout substitute. Regarding the former, her decision to order some good 𝑗
for curbside pickup indicates that she prefers 𝑗 to the outside option (good 0). As to the latter, she will accept an inside
good 𝑗 ′ ∈ J \ {0, 𝑗 } if and only if 𝑗 ′ is preferred to the outside option. See Sections 3.5 and 3.6 for details.

14See Section 3.5 for an application of Theorem 1 to curbside grocery pickup; and for applications of Corollary 1

to the automotive market and to curbside grocery pickup, see Sections 3.2 and 3.5, respectively.

3.4 Institutional Background and Data

This section introduces the data, which concern curbside grocery pickup at a regional super-

market chain. In what follows, I first provide an overview of curbside grocery pickup and then

catalog the contents of the data.

3.4.1 Institutional Background

Curbside grocery pickup is a form of online shopping in which a consumer orders her groceries

online and later picks them up from a bricks-and-mortar supermarket. Her shopping experience

proceeds according to the following timeline. First, she uses the supermarket’s website or its

smartphone app to place her order, indicating which items she wants as well as when she would

like to pick them up (e.g., tomorrow morning). Some time later, a supermarket worker gathers the

requested items and sets them aside to await pickup. Once the consumer arrives, the worker will

bring the items out to her car, where she will pay for them.

Sometimes, however, an item in the consumer’s order goes out of stock after she has placed

the order, but before the supermarket worker assembles it. In that event, the worker will choose

another product to serve as a substitute.15 Once the consumer arrives, she will be presented with

two choices: either she can accept the substitute that the worker chose earlier on her behalf, or she

can reject it and buy no such product at all.

3.4.2 Data

This study employs three data sets from a regional supermarket chain. The first, hereafter

referred to as the “curbside stockout” data set, concerns stockout substitutions in curbside pickup

orders. For each stockout event, these data report the universal product code (UPC) of the out-of-

stock item as well as that of the substitute offered. I also observe the price of the substitute, as well

as whether the substitute is accepted or rejected by the consumer.16 The data also assign a unique

identifier to each transaction, enabling me to match them to the second data set.

15The store’s website and mobile app allow the consumer to leave item-level instructions for the store. For instance,
someone who is ordering strawberries might request “extra-ripe” berries. However, a consumer could also use this
feature to request a specific substitute if her preferred product goes out of stock. Although I do not observe whether a
consumer makes such a substitution request (or, for that matter, whether she leaves item-level instructions of any kind),
the retailer has indicated that consumers rarely leave item-level instructions.

16The price of the out-of-stock item is obtained from the second data set. See Section 3.6 for details.

The second data set comprises “scanner data,” which characterize all purchases at the super-

market chain, irrespective of shopping channel (i.e., in-store, delivery, or curbside pickup). For

each purchased item, these data report the UPC and price. I also observe transaction IDs that follow

the same system as the curbside stockout data, enabling me to match the two data sets. In addition,

the scanner data record the loyalty program ID of the consumer making the purchase, lending the

data a panel structure.17
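The linkage between the first two data sets can be sketched as follows. All field names and records in this Python example are hypothetical placeholders rather than the retailer's actual schema; the point is only to show how a shared transaction ID attaches a household identifier to each stockout event, and how the loyalty program ID then yields a household-level panel.

```python
from collections import defaultdict

# Hypothetical records; none of these field names or values come from the data.
stockouts = [
    {"transaction_id": "T1", "oos_upc": "111", "sub_upc": "222",
     "sub_price": 3.49, "accepted": True},
    {"transaction_id": "T2", "oos_upc": "333", "sub_upc": "111",
     "sub_price": 2.99, "accepted": False},
]
scanner = [
    {"transaction_id": "T1", "loyalty_id": "H1", "upc": "222", "price": 3.49},
    {"transaction_id": "T1", "loyalty_id": "H1", "upc": "999", "price": 1.25},
    {"transaction_id": "T2", "loyalty_id": "H2", "upc": "555", "price": 4.10},
    {"transaction_id": "T3", "loyalty_id": "H1", "upc": "111", "price": 2.79},
]

# Each transaction belongs to a single loyalty-program member (household).
household_of = {row["transaction_id"]: row["loyalty_id"] for row in scanner}

# Attach the household ID to each stockout event via the shared transaction ID.
matched = [dict(event, loyalty_id=household_of[event["transaction_id"]])
           for event in stockouts]

# Group all scanner purchases by household to obtain the panel dimension.
panel = defaultdict(list)
for row in scanner:
    panel[row["loyalty_id"]].append((row["transaction_id"], row["upc"]))

print(matched[0]["loyalty_id"], len(panel["H1"]))
```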

The final data set is the chain’s “product catalog,” which characterizes all the products sold at

the chain. For each product, the catalog reports the UPC and brand, along with the location in the

chain’s product taxonomy. The catalog also provides a string description of the product, from which

I extract information on its observable characteristics (using so-called “regular expressions”). For

example, here is the string description for one of the flour products:

“GOLD MEDAL FLOUR HARVEST KING BREAD 5 LB”

This description classifies the product as a bread flour (as opposed to, say, all-purpose or wheat).

It also indicates the quantity of flour: five pounds.
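To illustrate the kind of extraction involved, the Python sketch below parses the description above with two regular expressions. The patterns, the list of flour types, and the function name `parse_flour` are assumptions made for illustration; the expressions actually used to build the data are not reproduced here.

```python
import re

# Hypothetical patterns for illustration only.
QUANTITY = re.compile(r"(\d+(?:\.\d+)?)\s*(LB|OZ)\b")
FLOUR_TYPES = {
    "bread": re.compile(r"\bBREAD\b"),
    "all-purpose": re.compile(r"\bALL[- ]PURPOSE\b"),
    "whole wheat": re.compile(r"\bWHOLE WHEAT\b"),
}

def parse_flour(description):
    """Extract the flour type and package size from a catalog description."""
    text = description.upper()
    # First flour type whose pattern appears in the description, else None.
    flour_type = next(
        (name for name, pattern in FLOUR_TYPES.items() if pattern.search(text)),
        None,
    )
    match = QUANTITY.search(text)
    quantity = (float(match.group(1)), match.group(2).lower()) if match else None
    return {"flour_type": flour_type, "quantity": quantity}

parsed = parse_flour("GOLD MEDAL FLOUR HARVEST KING BREAD 5 LB")
print(parsed)  # {'flour_type': 'bread', 'quantity': (5.0, 'lb')}
```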

Table 3.1 reports summary statistics for the two product categories studied in Sections 3.5

and 3.6: bottled water and flour. These categories were chosen for three reasons. First, I observe

many stockout substitutions for products in these categories. Second, product differentiation within

each category is fairly uncomplicated. That is to say, a given product’s utility depends on only a

few observable characteristics (a fact which simplifies the structural analysis in Section 3.6). And

third, the categories display dramatically different levels of variation in consumers’ preferences over

time. Recall that the mixed logit IPA constrains within-consumer preference variation as follows:

each consumer’s preferences among unpurchased products should remain constant across all her

trips. Thus, if consumers’ preferences remain stable over time in a given product category, the IPA

property of mixed logit will mirror consumers’ true preferences over unpurchased products. On

17Although participation in the loyalty program is not compulsory in general, it is required in order to place
curbside pickup orders. Consequently, I can match the purchases of curbside pickup patrons to their in-store and
delivery purchases.

the other hand, if consumers’ preferences vary between shopping trips, the mixed logit IPA will be

inconsistent with their true preferences over unpurchased products.

To test whether the mixed logit IPA is inconsistent with the behavior of consumers whose

preferences differ between trips, I consider a product category with considerable within-consumer

preference variation: flour. The reason that flour buyers’ preferences vary between trips is that

specific flours are suited to specific recipes. If someone plans to bake bread, she would probably

prefer bread flour; whereas if she intends to bake cupcakes, she would probably favor all-purpose

flour.18 By way of comparison, I also study a product category whose buyers likely exhibit stable

preferences over time: bottled water. Consumers’ preferences concerning this category probably

persist over time because bottled waters are functionally interchangeable.19

In consequence, a

consumer’s order choice will largely depend on (i) her subjective assessments of products’ tastes

and (b) her price sensitivity. And one would expect both (i) and (ii) to remain fairly constant

between trips.

Having explained why bottled water and flour form the focus of my empirical analysis, I now

return to the summary statistics in Table 3.1. Panel A presents an overview of these product

categories. Notice that almost three times as many households have experienced a stockout substi-

tution for bottled water (66,447) as have experienced one for flour (22,549). The categories also

differ, albeit less dramatically, with respect to the number of distinct brands and products carried

by the chain. (By “brand,” I refer to a branded product line under which many distinct products

may be sold. For instance, the Gold Medal brand encompasses many distinct flour products, such

as “Whole Wheat” and “All-Purpose Bleached.”) Specifically, there are more distinct brands of

flour—as well as individual products—than there are of bottled water. Observe also that only a

proper subset of the chain’s offerings in either category are available for curbside pickup.

Turning to the panel dimension of the data, Panel B reports that the average household (who

18Although any flour can be used in any recipe, using the “wrong” type of flour may require extra work on the

baker’s part—such as adjusting the recipe—and may also result in an inferior final product.

19All bottled waters must satisfy FDA “standard of quality” conditions (U.S. Food & Drug Administration 2022),
which regulate the maximum level of contaminants in the product. In addition, most bottled waters share the same
size: 16.9 fl oz.

Table 3.1: Summary Statistics by Product Category

Statistic                                        Bottled water    Flour

Panel A. Overview
No. of households with 1+ substitutions                 66,447   22,549
No. of distinct products purchased                          40       52
  . . . of which ordered for curbside pickup                32       38
No. of distinct brands purchased                             9       14
  . . . of which ordered for curbside pickup                 9        8

Panel B. Per household with 1+ substitutions
No. of shopping trips                                     39.4     12.0
  . . . of which curbside pickup                           7.4      3.1
  . . . of which feature 1+ substitutions                  1.6      1.1
No. of distinct products ever purchased                    5.7      4.1
  . . . of which ordered for curbside pickup               2.5      1.9
No. of distinct brands ever purchased                      3.1      2.3
  . . . of which ordered for curbside pickup               1.8      1.4

Panel C. Stockout substitutions
Prob. accept (%)                                          87.3     92.0

Notes: All estimates are reported as means or totals. By “brands,” I refer to branded product
lines, each of which may include multiple products in a given category. For instance, the Gold
Medal brand sells many types of flour, such as “Whole Wheat” and “All-Purpose Bleached.”

has experienced one or more substitutions) has made more shopping trips that involve bottled water

(39.4) than flour (12.0). A modest fraction of these trips are curbside pickup (19% and 26% for

bottled water and flour, respectively). On average, bottled water buyers have experienced slightly

more stockout substitutions (1.6) than have their flour counterparts (1.1). Perhaps in consequence

of having made more purchases, the average bottled water buyer has purchased more distinct brands

and products than has her flour counterpart.

Concerning stockout substitutions, Panel C indicates that flour buyers are likelier to accept the

substitute on offer (92.0%) than are their bottled water counterparts (87.3%).

3.5 Descriptive Evidence

In this section, I provide descriptive evidence concerning the extent to which consumer behavior

coincides with the IPA properties of conditional and mixed logit. Because the IPA property of

conditional logit is a cross-sectional independence constraint, whereas that of mixed logit is a

within-panel constraint, I examine the two properties separately.

3.5.1 The Conditional Logit IPA

The IPA property of conditional logit imposes independence between a consumer’s purchase and

her preferences among the unpurchased products. In the context of curbside pickup, the consumer’s

“purchase” corresponds to her order choice. Thus, the conditional logit IPA imposes independence

between her original order and her preferences among the goods she did not order—including the

“outside option” of buying nothing.

To see why, consider a consumer 𝑖 who is placing an order for curbside grocery pickup at time

𝑡. She must choose among 𝐽𝑡 differentiated goods and the “outside option” of no purchase (“good

0”). She will order whichever good 𝑗 ∈ J𝑡 = {0, 1, . . . , 𝐽𝑡 } affords the greatest conditional indirect

utility 𝑢𝑖 𝑗𝑡,20 given by

𝑢𝑖 𝑗𝑡 = 𝑥 𝑗 𝛽 − 𝛼𝑝 𝑗𝑡 + 𝜀𝑖 𝑗𝑡 .

In this equation, 𝑥 𝑗 is a vector of product characteristics, 𝑝 𝑗𝑡 denotes the price, and 𝜀𝑖 𝑗𝑡 is an i.i.d.

Gumbel error. Regarding the outside option, I normalize 𝑢𝑖0𝑡 = 𝜀𝑖0𝑡.

Suppose that consumer 𝑖 orders an inside good 𝑗 ∈ J𝑡 \ {0}. This suggests that she prefers 𝑗

over the other inside goods as well as the outside option: 𝑢𝑖 𝑗𝑡 = max 𝑗 ′∈J𝑡 𝑢𝑖 𝑗 ′𝑡.

Now imagine that our consumer’s ordered good 𝑗 goes out of stock. As a result, she faces a

binary choice between (i) a stockout substitute 𝑠 ∈ J𝑡 \ {0, 𝑗 } and (ii) the outside option. She will

accept the substitute 𝑠 if and only if 𝑢𝑖𝑠𝑡 ⩾ 𝑢𝑖0𝑡.21 Given her original order choice ( 𝑗), what is the

20I model “conditional” demand—that is, demand conditional on ordering one of the inside goods. There are
two reasons for adopting this approach. First, on occasions when someone visits the store but does not purchase a
product within a given product category, it is unclear whether (i) she actively considered the store’s offerings within
the category, but decided the “outside option” of no purchase was preferable; or (ii) she never examined the store’s
offerings at all, as she had no need for a product in the category. As for the second reason that I model conditional
demand, it is that the value of the “outside option” may differ within a given curbside pickup trip. When the consumer
is assembling her order at home, she may be more (or less) disposed to prefer the outside option than when she has
been offered a stockout substitute at the store. (For instance, after she has placed her order, she may be committed to
preparing a specific recipe based on the combination of items in her pickup order.)

21Without loss of generality, I normalize the utility of the outside option so that 𝑢𝑖0𝑡 = 𝜀𝑖0𝑡 .

probability that she accepts 𝑠? Under Theorem 1,

Pr[𝑖 accepts 𝑠 | 𝑖 ordered 𝑗] = Pr[𝑢𝑖𝑠𝑡 ⩾ 𝑢𝑖0𝑡 | 𝑢𝑖 𝑗𝑡 = max 𝑗 ′∈J𝑡 𝑢𝑖 𝑗 ′𝑡]

= Pr[𝑢𝑖𝑠𝑡 ⩾ 𝑢𝑖0𝑡]

= Pr[𝑖 accepts 𝑠].

In other words, the probability of acceptance should be independent of our consumer’s original

order choice.22

This independence constraint can be directly tested in the data by tallying the acceptance

probabilities for each ordered/substitute product pairing and then applying a likelihood-ratio test of

conditional independence. The null hypothesis is that a given good’s probability of being accepted

is independent of the consumer’s original order.
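One way such a test could be computed is sketched below in Python. This is a minimal illustration under stated assumptions, not the code behind the reported results: the input layout `counts[(ordered, substitute)] = (accepted, rejected)` is hypothetical, degrees of freedom are tallied per substitute from the pairings present in `counts`, and the p-value helper `chi2_sf_approx` uses the Wilson-Hilferty normal approximation rather than an exact chi-square tail (for which a routine such as `scipy.stats.chi2.sf` could be substituted).

```python
import math
from collections import defaultdict

def g_test_conditional_independence(counts):
    """Likelihood-ratio (G) statistic for independence between the ordered
    product and acceptance, conditional on the substitute offered.

    `counts[(ordered, substitute)] = (n_accepted, n_rejected)`.
    Returns (G, degrees_of_freedom).
    """
    by_substitute = defaultdict(dict)
    for (ordered, substitute), cell in counts.items():
        by_substitute[substitute][ordered] = cell

    g_stat, dof = 0.0, 0
    for rows in by_substitute.values():
        col_totals = [sum(cell[k] for cell in rows.values()) for k in (0, 1)]
        n = sum(col_totals)
        if n == 0:
            continue
        for cell in rows.values():
            row_total = sum(cell)
            for k in (0, 1):
                expected = row_total * col_totals[k] / n
                if cell[k] > 0:  # empty cells contribute zero to G
                    g_stat += 2.0 * cell[k] * math.log(cell[k] / expected)
        dof += len(rows) - 1  # (ordered products - 1) x (2 outcomes - 1)
    return g_stat, dof

def chi2_sf_approx(x, dof):
    """Approximate chi-square upper-tail probability (Wilson-Hilferty)."""
    z = ((x / dof) ** (1 / 3) - (1 - 2 / (9 * dof))) / math.sqrt(2 / (9 * dof))
    return 0.5 * math.erfc(z / math.sqrt(2))

# Toy counts: acceptance of s1 depends on the ordered product; that of s2 does not.
counts = {
    ("a", "s1"): (90, 10), ("b", "s1"): (50, 50),
    ("a", "s2"): (80, 20), ("b", "s2"): (40, 10),
}
g_stat, dof = g_test_conditional_independence(counts)
print(f"G = {g_stat:.2f}, dof = {dof}, approx. p = {chi2_sf_approx(g_stat, dof):.2e}")
```

In the toy data, all of the test statistic comes from substitute `s1`, whose acceptance rate differs across ordered products; substitute `s2` contributes nothing because its acceptance rate is the same for both.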

In taking this test to the data, I entertain two specifications. The first includes all or-

dered/substitute product pairings observed in the data. However, this specification suffers from

(potential) amelioration bias, as most substitute products are only offered as substitutes for a small

subset of out-of-stock products. Although I employ the “rule of three” correction,23 the results

should still be interpreted with caution. I therefore prefer a second specification, which focuses

on a smaller analysis sample with only the top ten products in each product category (in terms of

curbside sales among households who experience one or more stockout substitutions).

Under the first specification (which includes all product pairings), the null hypothesis of independence is rejected for bottled water (𝑝 < 10^{−300}),24 but not for flour (𝑝 = 0.976).25 However,

in both these categories, more than half of the cells in the three-way contingency table are empty.

Turning to the second specification, which attends only to pairings of the top ten products in each

22Notice that I have not conditioned on the fact that the consumer was offered 𝑠 as a stockout substitute. This
is because the store worker who chooses the substitute does not observe the consumer’s past purchase history, only
the identity of the out-of-stock product. Moreover, it seems unlikely that the worker’s choice of substitute reflects
“unobservable” product characteristics in the spirit of Berry, Levinsohn, and Pakes (1995), as the worker must choose
a substitute quickly (and is probably not an expert about the relevant product category).

23See Jovanovic and Levy (1997) or Tuyl, Gerlach, and Mengersen (2009).
24The log likelihood ratio test statistic is 4398, with 809 degrees of freedom. The latter is computed as

((no. of unique out-of-stock products) − 1)(no. of unique substitute products)

25The log likelihood ratio test statistic is 1607, with 1721 degrees of freedom.

category, the null hypothesis of conditional independence is rejected with 𝑝 < 10^{−300} in both

categories.26

Although the foregoing exercise maps straightforwardly to the conditional logit IPA (as ex-

pressed in Section 3.3), it suffers from two drawbacks. First, there are many products within each

category. This makes it difficult to discern why the conditional logit IPA is, or is not, satisfied

within a category. And second, the exercise is removed from empirical practice. It is not common

practice to estimate consumers’ tastes for individual goods (i.e., with product dummies). Rather, it

is customary to parameterize utility as a linear index of product characteristics such as brand or size.

I therefore emphasize a different descriptive exercise which focuses on product characteristics, as

opposed to specific substitute/out-of-stock product pairings. This exercise centers on the following

corollary to the conditional logit IPA (Theorem 1).

The conditional logit IPA imposes independence between the following:

(1) The identity of the out-of-stock product

(2) The decision to accept or reject a given substitute

Provided that utility is a linear index of product characteristics, the succeeding pair of factors should

also be mutually independent:27

(1A) The characteristics of the out-of-stock product

(2A) The decision to accept or reject a substitute with given characteristics

In other words, a substitute is no likelier to be accepted if its characteristics closely resemble those of

the out-of-stock product than if they are highly dissimilar. Rather, what matters is the “popularity”

of the substitute’s characteristics. Are the substitute’s characteristics—brand, size, flavor, etc.—

ones that feature in a large share of orders? If so, the substitute affords high representative utility

26The likelihood ratio test statistics are 3099 and 515 for the product categories of bottled water and flour,

respectively. There are 80 degrees of freedom in each case.

27This follows from the definition of conditional independence; factors (1A) and (2A) are more aggregate partitions

of the product space than are factors (1) and (2), respectively.

and,28 in consequence, will enjoy a comparatively high acceptance probability. On the other hand, if

the substitute’s characteristics appear in only a small fraction of orders, then its representative utility

must be relatively small, in which case it will suffer a comparatively low acceptance probability. At

all events, the extent to which the product is substitutable for the out-of-stock product—as indicated

by its (dis)similarity in observable characteristics—is irrelevant.

Table 3.2 presents the results of this test for the product categories of bottled water and flour.

For each category, the leftmost column lists the characteristics that differentiate products within the

category. (For instance, bottled water is differentiated with respect to four characteristics: brand,

the number of bottles in the case, the size of each bottle, and the type of water.) Then the second

and third columns catalog possible pairings of the out-of-stock product and substitute’s versions

of a given characteristic. Where polytomous characteristics are concerned (such as brand or bottle

count),29 there are too many versions of the characteristic to enumerate all possible pairings. I

therefore report results solely for the top two versions of each characteristic.30 (For example, the

top two brands of bottled water are Ice Mountain and the store’s private label.) Finally, for each

pairing of the substitute and out-of-stock products’ versions of the characteristic, the remaining

columns report the probability of acceptance as well as the number of observations.
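The tabulation just described can be sketched in a few lines of Python. The event records and field names below are hypothetical; the sketch simply groups substitution events by the pairing of the substitute's and the out-of-stock product's versions of one characteristic, and then reports the acceptance rate and the number of observations for each pairing.

```python
from collections import defaultdict

def acceptance_by_pairing(events, characteristic):
    """Acceptance rate and count per (substitute version, out-of-stock version)."""
    tallies = defaultdict(lambda: [0, 0])  # pairing -> [accepted, total]
    for event in events:
        pairing = (event["substitute"][characteristic],
                   event["out_of_stock"][characteristic])
        tallies[pairing][0] += event["accepted"]
        tallies[pairing][1] += 1
    return {pairing: (accepted / total, total)
            for pairing, (accepted, total) in sorted(tallies.items())}

# Hypothetical events for illustration only.
events = [
    {"out_of_stock": {"brand": "Private label"}, "substitute": {"brand": "Private label"}, "accepted": True},
    {"out_of_stock": {"brand": "Private label"}, "substitute": {"brand": "Private label"}, "accepted": True},
    {"out_of_stock": {"brand": "Ice Mountain"}, "substitute": {"brand": "Private label"}, "accepted": False},
    {"out_of_stock": {"brand": "Ice Mountain"}, "substitute": {"brand": "Private label"}, "accepted": True},
]
table = acceptance_by_pairing(events, "brand")
for (sub_version, oos_version), (rate, n_obs) in table.items():
    print(f"substitute={sub_version!r}, out-of-stock={oos_version!r}: "
          f"accept {rate:.3f} (n={n_obs})")
```

Under the conditional logit IPA, rows that share the same substitute version should show the same acceptance rate in expectation, whatever the out-of-stock product's version.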

According to the conditional logit IPA, the probability of acceptance should depend only on

the “popularity” of the substitute’s characteristics;31 whether they match the out-of-stock product’s

characteristics should be immaterial. However, Table 3.2 does not support this prediction. To

see why, consider a specific characteristic within a product category (such as brand). Notice that

the four rows corresponding to the characteristic are ordered on (i) the substitute’s version of the

characteristic and then (ii) the out-of-stock product’s version. For under the conditional logit

IPA, the probability of acceptance should only depend on the substitute’s version of the indicated

characteristic, not on the out-of-stock product’s version. Hence, among the four rows for a given

characteristic, the probability of acceptance should be the same for the first and second rows, as

28By “representative utility,” I mean the modeled portion of utility (as opposed to the error term).
29That is, characteristics with more than two distinct realizations.
30Within the analysis sample, comprising purchases by households with 1+ attempted substitutions.
31Formally, on the representative utility afforded by the substitute’s characteristics.


Table 3.2: Testing the Conditional Logit IPA

                             Out-of-stock         Substitute’s         Prob.     No. of
Characteristic               product’s version    version              accept    obs.

Panel A. Bottled water

Brand                        Private label        Private label        0.918     30,918
                             Ice Mountain         Private label        0.835     8283
                             Ice Mountain         Ice Mountain         0.890     8903
                             Private label        Ice Mountain         0.931     11,628

No. of bottles               24                   24                   0.861     69,823
                             40                   24                   0.930     17,311
                             40                   40                   n/a       0
                             24                   40                   0.918     4712

Size of individual bottles   16.9 fl oz           16.9 fl oz           0.878     84,439
                             8 fl oz              16.9 fl oz           0.789     1495
                             8 fl oz              8 fl oz              0.850     787
                             16.9 fl oz           8 fl oz              0.718     840

Water type                   Spring               Spring               0.905     33,260
                             Purified             Spring               0.845     16,346
                             Purified             Purified             0.894     37,955
                             Spring               Purified             0.801     19,619

Panel B. Flour

Brand                        Private label        Private label        0.948     4614
                             King Arthur          Private label        0.892     1719
                             King Arthur          King Arthur          0.863     3954
                             Private label        King Arthur          0.938     838

Quantity                     5 lb                 5 lb                 0.908     17,887
                             2 lb                 5 lb                 0.955     1587
                             2 lb                 2 lb                 0.938     1013
                             5 lb                 2 lb                 0.936     2008

Type of flour                All-purpose flour    All-purpose flour    0.942     19,966
                             Bread flour          All-purpose flour    0.825     1778
                             Bread flour          Bread flour          0.911     1983
                             All-purpose flour    Bread flour          0.840     344

Whether bleached or not      Bleached             Bleached             0.961     9639
                             Unbleached           Bleached             0.899     4398
                             Unbleached           Unbleached           0.898     8021
                             Bleached             Unbleached           0.929     2686

Notes: This table presents the probability of a stockout substitute being accepted, conditional
on its own version of a specific characteristic as well as that of the out-of-stock product. If
the characteristic in question takes more than two values (as is the case for “brand” in all three
product categories), only the top two versions of the characteristic are considered (based on
purchases by households with one or more curbside stockouts).


well as for the third and fourth rows. For example, the first and second (third and fourth) rows of

panel A both concern stockouts in which the substitute is sold under the private label (Ice Mountain

brand). Per the conditional logit IPA, the first and second (third and fourth) rows should thus report

identical acceptance probabilities.

In point of fact, the probability of acceptance tends to be greater when the out-of-stock product

and the substitute share the same version of the characteristic than when they feature different

versions. This is intuitive; one would expect consumers to prefer substitutes that resemble their

first-choice products.
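As a rough illustration of how far these figures depart from the conditional logit prediction, the sketch below compares the first two brand rows of Panel A in Table 3.2 (both involve a private-label substitute, with acceptance rates of 0.918 over 30,918 observations and 0.835 over 8283 observations). It computes a pooled two-proportion z-statistic. This check is not part of the original analysis: the function name is hypothetical, and the statistic treats observations as independent, so it ignores any within-household correlation.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Pooled z-statistic for H0: both groups share one acceptance probability."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / se

# Table 3.2, Panel A, brand rows 1 and 2: the substitute is private label in both,
# but the out-of-stock product is private label in row 1 and Ice Mountain in row 2.
z_brand = two_proportion_z(0.918, 30918, 0.835, 8283)
```

Under the conditional logit IPA the two rows would share a single acceptance probability, so a statistic far from zero suggests that the gap exceeds what sampling noise alone would produce (subject to the independence caveat above).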

There are several apparent departures from this pattern. For example, a comparison of the

third and fourth rows in Panel A suggests that an Ice Mountain-branded substitute is likelier to be

accepted if the consumer had originally ordered a private-label product than if she had ordered

an Ice Mountain product. Results of this kind appear to arise for two reasons. First, where

some characteristics are concerned, consumers who have ordered one version of the characteristic

are likelier to accept than consumers who have ordered the other—irrespective of the substitute’s

version. Such is the case for bottled water brands. Whether the substitute is sold under the private

label or under the Ice Mountain brand, it is likelier to be accepted if the consumer had originally

ordered the private label than if she had originally ordered Ice Mountain. As to the second

source of these discrepancies, it concerns the finitude of the product space within a particular

product category. Because the store cannot always find a substitute that exactly matches the out-of-stock

product on all characteristics, it will settle for one that matches it in some characteristics but not others.

As a result, dissimilarity between the substitute and the out-of-stock product with respect to one

characteristic is often associated with similarity with respect to another (see Table 3D.1 in Chapter 3D for a

correlation matrix). And if the first characteristic is less important to the consumer than the second,

the result will be an inverse correlation between the probability of acceptance and the substitute’s

sharing the first characteristic with the out-of-stock product.

To illustrate, consider a stockout event involving flour. For most consumers, the characteristic

of flour type matters more than the characteristic of quantity does. So, given the choice, a consumer


would probably prefer a substitute that matches the out-of-stock product’s flour type (but not its

quantity) over an alternate substitute that matches the out-of-stock product’s quantity (but not its

flour type). In addition, there is an inverse correlation between (i) being offered a substitute that

matches the out-of-stock product’s flour type and (ii) being offered a substitute that matches the

out-of-stock product’s quantity (as reported in Table 3D.1). The result is an inverse correlation

between acceptance and the substitute’s sharing the out-of-stock product’s quantity.

That the (dis)similarity of the offered substitute’s characteristics to those of the out-of-stock

product is predictive of acceptance—even conditional on the substitute’s characteristics—is incon-

sistent with the conditional logit IPA. This finding is hardly unexpected. In most differentiated

products markets, consumers exhibit heterogeneous preferences over observable characteristics.

And, in the context of curbside pickup, an individual consumer’s order choice should provide

some indication of her tastes (which may differ from the population “average”). The result is a

positive correlation between (i) the similarity of the substitute and out-of-stock product and (ii)

the probability of acceptance (with a few exceptions due to the finitude of the product space, as

described above).

3.5.2 The Mixed Logit IPA

Having tested the IPA property of conditional logit, I now turn to its mixed logit counterpart.

To see how Corollary 1 relates to curbside pickup, consider the same consumer 𝑖 as in the preceding

subsection. (Recall that she ordered good 𝑗 at time 𝑡 and, after 𝑗 went out of stock, was offered

good 𝑠 as a stockout substitute.)

Unlike conditional logit, mixed logit allows our consumer’s random taste coefficients (𝛽𝑖, 𝛼𝑖)

to differ from those of other consumers. How does this affect the probability of acceptance? Per

Corollary 1,

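\begin{align*}
\Pr[i \text{ accepts } s \mid i \text{ ordered } j;\ \beta_i, \alpha_i]
  &= \Pr[u_{ist} \geq u_{i0t} \mid u_{ijt} = \max_{j' \in \mathcal{J}} u_{ij't};\ \beta_i, \alpha_i] \\
  &= \Pr[u_{ist} \geq u_{i0t} \mid \beta_i, \alpha_i] \\
  &= \Pr[i \text{ accepts } s \mid \beta_i, \alpha_i].
\end{align*}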


In other words, our consumer’s order choice should be uninformative of her decision to accept or

reject the substitute, conditional on her time-invariant tendency to like or dislike its observable

characteristics.32

Table 3.3: Stylized Example of the Mixed Logit IPA

        Consumer P                             Consumer M
Trip    Order   Substitute   Prob. accept      Order   Substitute   Prob. accept

1       PL                                     IM
2       PL                                     IM
3       PL      PL’          𝑝𝑖                PL      PL’          𝑝𝑖𝑖𝑖
4       IM      PL’          𝑝𝑖𝑖               IM      PL’          𝑝𝑖𝑣

Note: Products PL and PL’ are sold under the private label, while good IM is sold under the Ice
Mountain brand.

To see the significance of this constraint, consider two consumers who regularly order bottled

water for curbside pickup. One of them, consumer P, usually purchases the private label;33 whereas

the other, consumer M, typically opts for Ice Mountain. Table 3.3 summarizes their orders and

stockout substitutions. On trips 1 and 2, each consumer orders her customary brand, with consumer

P choosing product PL (one of the private label’s offerings) and consumer M opting for product

IM (one of Ice Mountain’s). On trips 3 and 4, by contrast, their orders coincide exactly, with both

choosing product PL on trip 3 and product IM on trip 4. However, on trips 3 and 4, both consumers’

orders go out of stock, and they are each offered product PL’ as a substitute. Assume that PL’ shares

the same brand as PL—namely, the private label—and is generally a closer substitute for PL than

for IM.

How does the probability of acceptance vary across these four (attempted) substitutions? In-

tuitively, there are two key determinants of acceptance or rejection here:

(i) the consumer’s

time-invariant tendency to like or dislike the characteristics of the substitute, and (ii) trip-specific

considerations. To see how these factors figure in our stylized example, let 𝑝𝑖 and 𝑝𝑖𝑖 denote

32In my differentiated products demand framework, as well as that in Berry, Levinsohn, and Pakes (2004), there is
only one source of within-consumer variation in a particular good’s representative utility: price changes (for which I
include controls in the descriptive exercises below). Although some studies, such as Grieco, Murry, and Yurukoglu
(2023), accommodate secular shifts in goods’ representative utility over time, they do so at the market level (as opposed
to the household level).

33That is, the store’s eponymous brand of groceries.


the probability that consumer P accepts PL’ on her third and fourth trips, respectively. Likewise,

let 𝑝𝑖𝑖𝑖 and 𝑝𝑖𝑣 denote the probability that consumer M accepts PL’ on her third and fourth trips,

respectively.

Focus first on the consumers’ time-invariant tendencies to (dis)like the characteristics of the

substitute, PL’. Recall that consumer P tends to favor the private label over Ice Mountain, whereas

consumer M exhibits the reverse tendency. Thus, the substitute PL’ shares the same brand as

consumer P’s go-to product, but does not share the brand of consumer M’s. As a result, when the

two consumers have ordered the same product, consumer P should be likelier to accept PL’ as a

substitute than is consumer M. In other words, 𝑝𝑖 should exceed 𝑝𝑖𝑖𝑖 and 𝑝𝑖𝑖 should exceed 𝑝𝑖𝑣.

This intuition is supported by the mixed logit IPA, which allows a given substitute’s acceptance

probability to vary based on individual consumers’ (heterogeneous) time-invariant tendencies to

like or dislike the substitute’s observable characteristics (here, its brand).

Turning to trip-specific considerations, note that consumers sometimes deviate from their usual

order behavior due to unusual circumstances. Take the case of consumer P’s order on trip 4, for

example. Although consumer P usually prefers the private label, here she departs from this pattern

and orders the Ice Mountain brand instead. This departure suggests the presence of trip-specific

circumstances that make Ice Mountain more attractive than usual, relative to the private label.

Perhaps she is hosting guests who are partial to Ice Mountain, whereas on previous trips she was

shopping just for herself (and could therefore purchase the private label, which she prefers). At all

events, her decision to pass over the private label in favor of Ice Mountain suggests that she may be

less amenable to a private-label substitute than usual. One would therefore expect 𝑝𝑖𝑖 to be smaller

than 𝑝𝑖. By similar logic, consumer M’s uncharacteristic decision to order the private label in trip

3, as opposed to her go-to brand (Ice Mountain), suggests that she may be more amenable to a

private-label substitute than usual. Consequently, one would expect 𝑝𝑖𝑖𝑖 to exceed 𝑝𝑖𝑣. However,

neither of these intuitions is consistent with the mixed logit IPA, under which consumers’ order

choices should be independent of the probability of accepting a given substitute. Here, this means

that 𝑝𝑖 = 𝑝𝑖𝑖 and 𝑝𝑖𝑖𝑖 = 𝑝𝑖𝑣.


Are the foregoing predictions of the mixed logit IPA consistent with the data? To provide insight,

I estimate a probit model in which the probability of acceptance depends on (i) the extent to which

the substitute’s characteristics resemble those of the out-of-stock product, and (ii) the consumer’s

time-invariant tendency to like or dislike the characteristics of the substitute. Regarding (i), I

include a set of indicator variables for the substitute’s sharing a given characteristic 𝑘 (such as

brand) with the out-of-stock product. Let same𝑖𝑘 = 1 if consumer 𝑖 is offered a substitute that

shares characteristic 𝑘 with the out-of-stock product, and same𝑖𝑘 = 0 otherwise. As for (ii), I

proxy for the consumer’s time-invariant tendency to like (or dislike) the substitute’s characteristics

as follows. Leveraging the panel structure of the data, I compute the fraction of the consumer’s

shopping trips—past, present, and future—in which the purchased product shares the substitute’s

version of characteristic 𝑘.34 I denote the resulting fraction by frac𝑖𝑘 . The intuition is that, if the

consumer likes the substitute’s version of a given characteristic, a large fraction of her purchases

will feature it; whereas if she dislikes it, only a small fraction will. To illustrate, I return to the

stylized example about bottled water buyers in Table 3.3. Minding that this example centers on

the product characteristic of brand, consider trip 3. Both consumers’ preferred products go out of

stock on this trip, and both of them are offered PL’ as a substitute. Concentrate first on consumer

P. Of the four trips observed in the data, she chooses product PL on three and product IM on one.

Only the former product is sold under the same brand as the substitute PL’—namely, the private

label—so the proxy variable frac𝑃,brand equals three-fourths. Now turn to consumer M, who opts

for product IM on three of her four trips and product PL on the remaining one. As the latter (but

not the former) shares the brand of the substitute PL’, the variable frac𝑀,brand equals one-quarter.
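The construction of the two regressors can be sketched for the stylized example in Table 3.3. The snippet below is purely illustrative: the brand codes "PL" and "IM" and the function names are hypothetical labels, and only the characteristic of brand is considered.

```python
# Brand of the ordered product on trips 1-4, per Table 3.3.
orders = {
    "P": ["PL", "PL", "PL", "IM"],
    "M": ["IM", "IM", "PL", "IM"],
}
SUB_BRAND = "PL"  # the substitute PL' is sold under the private label

def frac_brand(order_brands, sub_brand):
    """Share of all observed trips whose ordered product carries the substitute's brand."""
    return sum(b == sub_brand for b in order_brands) / len(order_brands)

def same_brand(order_brand, sub_brand):
    """1 if the substitute shares the out-of-stock product's brand, else 0."""
    return int(order_brand == sub_brand)

# frac is computed once per consumer, using every trip (past, present, and future).
frac = {c: frac_brand(brands, SUB_BRAND) for c, brands in orders.items()}

# same is computed per stockout; stockouts occur on trips 3 and 4 (list indices 2 and 3).
same = {c: [same_brand(brands[t], SUB_BRAND) for t in (2, 3)]
        for c, brands in orders.items()}
```

Here `frac` differs across the two consumers but not across trips, whereas `same` differs across trips but not across consumers. This is the variation that separates time-invariant tastes from trip-specific considerations in the stylized example.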

Observable characteristics aside, the price of the substitute may also be informative of the

decision to accept or reject. In particular, acceptance may be less likely if the substitute is perceptibly

pricier than the out-of-stock product. For this reason, I permit the probability of acceptance to

depend on the difference between the substitute’s price (𝑝𝑖,sub) and that of the out-of-stock product

34Where curbside pickup is concerned, I define the consumer’s “purchase” as the product that she originally
ordered—even if it goes out of stock and she purchases a substitute. (In that event, her original order choice will be
more informative of her preferences than the substitute, which is chosen by the store.)


(𝑝𝑖,OOS).35

All told, I take the following probit model to the data. Letting 𝑎𝑖 = 1 if consumer 𝑖 accepts and

𝑎𝑖 = 0 otherwise, I estimate:

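\[
a_i =
\begin{cases}
1 & \text{if } a_i^{\star} \geq 0 \\
0 & \text{if } a_i^{\star} < 0,
\end{cases}
\]

where

\[
a_i^{\star} = \sum_{k=1}^{K} \left( \gamma_k \, \mathrm{same}_{ik} + \zeta_k \, \mathrm{frac}_{ik} \right) + \eta \cdot \left( p_{i,\mathrm{sub}} - p_{i,\mathrm{OOS}} \right) + \upsilon_i,
\]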

and 𝜐𝑖 is distributed i.i.d. standard normal.
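To make the mapping from the latent index to acceptance probabilities and average marginal effects concrete, the following sketch evaluates the model for a single characteristic (K = 1). It is a minimal illustration under stated assumptions rather than the estimation code used in this chapter. The observations and parameter values are made up, the function names are hypothetical, and treating the indicator's marginal effect as a discrete 0-to-1 change is one common convention that I assume here.

```python
import math

def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def latent_index(obs, gamma, zeta, eta):
    """Deterministic part of a*_i for K = 1 (no intercept, as in the equation above)."""
    return gamma * obs["same"] + zeta * obs["frac"] + eta * obs["dprice"]

def accept_prob(obs, gamma, zeta, eta):
    """Pr(a_i = 1) = Phi(index), because the error is standard normal."""
    return norm_cdf(latent_index(obs, gamma, zeta, eta))

def ame_same(data, gamma, zeta, eta):
    """Average change in Pr(accept) when same_ik switches from 0 to 1."""
    total = 0.0
    for obs in data:
        total += (accept_prob(dict(obs, same=1), gamma, zeta, eta)
                  - accept_prob(dict(obs, same=0), gamma, zeta, eta))
    return total / len(data)

def ame_frac(data, gamma, zeta, eta):
    """Average derivative of Pr(accept) with respect to frac_ik."""
    return sum(zeta * norm_pdf(latent_index(o, gamma, zeta, eta)) for o in data) / len(data)

# Made-up observations and parameter values, for illustration only.
data = [
    {"same": 1, "frac": 0.75, "dprice": 0.10},
    {"same": 0, "frac": 0.25, "dprice": -0.20},
    {"same": 1, "frac": 0.50, "dprice": 0.00},
]
params = dict(gamma=0.3, zeta=0.8, eta=-0.5)
ame_gamma = ame_same(data, **params)
ame_zeta = ame_frac(data, **params)
```

Both effects are computed observation by observation and then averaged, rather than being evaluated once at the sample means.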

Under the mixed logit IPA, whether the substitute matches the out-of-stock product’s version

of a characteristic 𝑘 (as captured by the same𝑖𝑘 variable) should be uninformative of acceptance,

conditional on how often the consumer purchases products with the substitute’s version of the

characteristic (as given by the frac𝑖𝑘 variable). So, if consumers’ behavior is consistent with the

mixed logit IPA, the 𝜁𝑘 ’s should be positive whereas the 𝛾𝑘 ’s should be indistinguishable from zero.

To illustrate how the mixed logit IPA would manifest in the data, I revisit the stylized example

about bottled water buyers in Table 3.3. Recall that the consumers were offered good PL’ as a

substitute on two occasions: trip 3, when both consumers had originally ordered good PL, and

trip 4, when both consumers had ordered good IM. Now suppose that the consumers’ behavior is

consistent with the mixed logit IPA—that is, 𝛾brand = 0. Although the substitute (PL’) shares the

same brand as the consumers’ preferred good on trip 3 (PL) but not their preferred good on trip 4

(IM), the probability of acceptance should be the same on both trips for each consumer. That is,

𝑝𝑖 = 𝑝𝑖𝑖 and 𝑝𝑖𝑖𝑖 = 𝑝𝑖𝑣.

Turning to the regression results, Table 3.4 reports the average marginal effects of the explanatory

variables.36 Notice that there are two variables for each observable characteristic: an indicator for

35As discussed in Section 3.4, I do not observe the out-of-stock product’s price. Instead, I search the data for the
nearest date on which the out-of-stock product was purchased at the store in question. Then I impute the out-of-stock
product’s price as being the average purchase price on the date in question. For details on how I impute prices, see
Section 3.6.

36Specifically, I compute the average marginal effect of a change in each variable on the probability of acceptance.
(By “average,” I mean the following. First, I compute the variables’ marginal effects for each individual observation;


Table 3.4: Testing the Mixed Logit IPA: Average Marginal Effects
from Probit Regressions

                                               Product category
Variable                                       Bottled water    Flour

Brand
  Sub shares OOS product’s version             −0.012***        −0.066***
                                               (0.004)          (0.007)
  Frac. of purchases with sub’s version        0.147***         0.062***
                                               (0.006)          (0.011)

No. of bottles
  Sub shares OOS product’s version             −0.023***
                                               (0.003)
  Frac. of purchases with sub’s version        0.038***
                                               (0.005)

Size of each bottle
  Sub shares OOS product’s version             0.032***
                                               (0.004)
  Frac. of purchases with sub’s version        0.049***
                                               (0.006)

Water type
  Sub shares OOS product’s version             0.043***
                                               (0.003)
  Frac. of purchases with sub’s version        0.092***
                                               (0.005)

Flour type
  Sub shares OOS product’s version                              0.124***
                                                                (0.008)
  Frac. of purchases with sub’s version                         0.027**
                                                                (0.010)

Quantity
  Sub shares OOS product’s version                              −0.064***
                                                                (0.009)
  Frac. of purchases with sub’s version                         0.016
                                                                (0.011)

Whether bleached or not
  Sub shares OOS product’s version                              0.003
                                                                (0.007)
  Frac. of purchases with sub’s version                         0.038***
                                                                (0.010)

Sub’s price – OOS product’s price              0.021***         −0.003
                                               (0.001)          (0.002)

Observations                                   82,001           14,181
Pseudo 𝑅2                                      0.0672           0.0720

Notes: The dependent variable is whether a stockout substitute is accepted (=1) or
rejected (=0). The table reports average marginal effects, not coefficients. Standard
errors are in parentheses. (Because some households experience multiple stockouts,
the standard errors are clustered at the household level.)

* Significant at the 10 percent level.
** Significant at the 5 percent level.
*** Significant at the 1 percent level.


the substitute’s sharing the out-of-stock product’s version of the characteristic, and a scalar variable

for the fraction of the consumer’s shopping trips where the purchased product shares the substitute’s

version of the characteristic. The table is organized so that the marginal effects of the former (corresponding to the

𝛾𝑘 ’s) are situated above the marginal effects of the latter (corresponding to the 𝜁𝑘 ’s).

As far as bottled water is concerned, consumers’ behavior seems to be consistent with the

mixed logit IPA. For all four characteristics, the marginal effect associated with the fraction of

purchases that share the substitute’s version of the characteristic is much larger in magnitude than

the marginal effect associated with the substitute’s (not) sharing the out-of-stock product’s version

of the characteristic. This pattern is particularly pronounced where brand and water type are

concerned. All else equal, acceptance is 14.7 (9.2) percentage points likelier if the consumer nearly

always purchases products with the substitute’s brand (water type) than if she virtually never does

so.

By contrast, the results for flour are difficult to reconcile with the mixed logit IPA. Whether

the substitute matches the out-of-stock product’s brand, flour type, or quantity is highly predictive of

acceptance—even conditional on the frequency with which the consumer purchases the substitute’s

versions of these characteristics. This is especially true where flour type is concerned; acceptance

is 12.4 percentage points more likely if the substitute shares the out-of-stock product’s flour type

than if it does not. Notice that this marginal effect greatly exceeds that associated with the fraction

of trips where the purchased product shares the substitute’s flour type; a consumer who almost

always purchases the substitute’s flour type is only 2.7 percentage points likelier to accept than a

consumer who virtually never purchases the substitute’s flour type.

Why is the flour type of the out-of-stock product so predictive of the substitute’s acceptance or

rejection? Recall from Section 3.1 that specific types of flour are suited to specific recipes—bread

flour for bread, all-purpose flour for cupcakes, etc. Hence, if a consumer has a particular recipe in

mind when she places her order, she will choose a flour of the corresponding type. She is therefore

likely to prefer a substitute of the out-of-stock product’s flour type over a substitute of a different

and second, I take the average across all the observations. An alternative approach, which I do not employ, is to
compute the marginal effects at the sample means.)


flour type—even a flour type that she purchases more frequently—as only the former would enable

her to bake the intended recipe (without modification).

In contrast to flour type, the marginal effect of the substitute’s sharing the brand or quantity

of the out-of-stock product is negative. At face value, this means that the substitute is likelier to

be accepted if it differs from the out-of-stock product with respect to these characteristics than

if it matches them. However, this counterintuitive result probably reflects the limitations of this

reduced-form exercise, which—among other omissions—largely abstracts from the role of price.

Discussion.—These results provide suggestive evidence that consumers’ purchases of bottled

water are consistent with the mixed logit IPA, whereas their purchases of flour are not. The key

difference between the categories is the amount of within-consumer preference variation. Regarding

bottled water, individual consumers’ preferences largely persist over time. By contrast, individual

consumers’ preferences for flour appear to vary considerably between trips, perhaps due to variation

in intended recipes (for which specific flour types may be optimal).

However, these results also highlight the limitations of reduced-form analysis with respect to

testing the IPA property of mixed logit; some determinants of acceptance are difficult to capture

without an explicit model of consumer preferences. For this reason, the next section adopts a

structural approach to testing the mixed logit IPA.

3.6 Structural Evidence

In this section, I evaluate the extent to which the mixed logit IPA causes bias in the estimation of

demand elasticities. To do so, I estimate demand for bottled water and flour using mixed probit—

which does not suffer from an IPA constraint—as well as mixed logit. The demand framework

includes consumers’ in-store purchases, curbside orders, and decisions to accept or reject stockout

substitutes. Then, with the estimated models in hand, I compare mixed probit and mixed logit’s

goodness of fit, both in-sample and out-of-sample. As I do so, I attend especially to the data on

consumers’ acceptance or rejection of stockout substitutes. This is because mixed logit, due to its

IPA property, imposes conditional independence between a given consumer’s order choice and her

subsequent decision about the substitute (given the realization of her random taste coefficients).


Mixed probit, by contrast, does not impose this independence property.

To enable mixed logit to compete with mixed probit on the best possible footing, I nonpara-

metrically estimate the joint distribution of the random coefficients. This ensures that consumers’

random taste coefficients provide the most accurate possible representation of their (time-invariant)

tendencies to like or dislike substitutes’ observable characteristics, thereby minimizing the influence

of the mixed logit IPA.37 My estimation method adapts the fixed grid approach from Fox, Kim, and

Yang (2016) and Train (2008). In the case of mixed probit, I employ a novel grid search approach

to permit (some) correlation in the error terms.

3.6.1 Model

For simplicity, the conceptual framework in Section 3.5 focused on curbside pickup. Here, I

extend this framework to include in-store purchases and home delivery as well as curbside pickup.

This provides more observations per consumer, facilitating the identification of the distribution of

random taste coefficients.

Consider a consumer 𝑖 who is shopping at time 𝑡. Irrespective of shopping channel (in-person,

home delivery, or curbside pickup),38 she faces a choice between 𝐽𝑡 differentiated goods and an

outside option of no purchase (“good 0”). She will choose whichever good 𝑗 ∈ J𝑡 ≡ {0, 1, . . . , 𝐽𝑡 }

maximizes her conditional indirect utility 𝑢𝑖 𝑗𝑡. As in Section 3.5.2, utility is a consumer-specific

function of product characteristics (𝑥 𝑗 ) and price (𝑝 𝑗𝑡):

𝑢𝑖 𝑗𝑡 = 𝑥 𝑗 𝛽𝑖 − 𝛼𝑖 𝑝 𝑗𝑡 + 𝜀𝑖 𝑗𝑡 .

Unlike in Sections 3.5.1 and 3.5.2, the distribution of the error term 𝜀𝑖 𝑗𝑡 now depends on the model.

It is i.i.d. Gumbel in mixed logit, and i.i.d. multivariate normal in mixed probit.

If the consumer has placed an order for curbside pickup, her preferred product 𝑗 may go out of

stock. In that event, the store will offer a substitute 𝑠 ∈ J \ {0, 𝑗 }. The consumer will accept the

37The mixed logit IPA only imposes independence between the order and the accept/reject decision conditional on
representative utility. Thus, misspecification of representative utility could lead to spurious failures of the mixed logit
IPA.

38In principle, some goods with a small market share may be solely offered for in-store purchase (as opposed to
home delivery or curbside pickup). However, in my empirical estimation, I drop less popular products (because discrete
choice models struggle to accommodate alternatives with negligible choice shares). And, unpopular products aside,
the online choice set should coincide with its in-store counterpart (e.g., prices should be identical).


substitution if and only if 𝑢𝑖𝑠𝑡 ⩾ 𝑢𝑖0𝑡, where 𝑢𝑖0𝑡 ≡ 𝜀𝑖0𝑡 denotes the utility of the outside option.

Identification.—In what follows, I employ a nonparametric mixture estimator for both mixed

logit and mixed probit. Do the data afford sufficient variation to support this estimation method?

Regarding mixed logit, Fox et al. (2012) prove that the model is nonparametrically identified under

fairly minimal data requirements (e.g., local variation in product characteristics). As for mixed

probit, Iaria and Wang (2023) show that the model is semi-nonparametrically identified. That is,

taking as given that the error terms are distributed i.i.d. multivariate normal, the distribution of

random coefficients is nonparametrically identified.

3.6.2 Estimation Method

I estimate the joint distribution of random coefficients (𝛽𝑖, 𝛼𝑖) nonparametrically. Following

Fox, Kim, and Yang (2016), I approximate the distribution function using a “fixed grid” estimator.

In this approach, a fixed grid of heterogeneous coefficients is selected before estimation. Then the

probability weights on the (pre-specified) grid points are estimated. In what follows, I first derive

the likelihood function for these weight parameters. (In so doing, I borrow from the exposition

in Heiss, Hetzenecker, and Osterhaus [2022].) Then I explain the expectation-maximization (EM)

algorithm employed to maximize the likelihood function, as well as the simulation required for the

mixed probit model. To keep this subsection focused, a discussion of the tuning parameters (such

as the number and location of the grid points) is relegated to Chapter 3E. I do the same with respect

to the grid-search estimator for correlated errors in the mixed probit model.

The task is to estimate the joint distribution 𝐹 (𝛽, 𝛼) of random coefficients.

I employ a

finite-dimensional sieve approximation that divides the support of (𝛽, 𝛼) into a grid of 𝑅 fixed

vectors:

\[
\mathcal{B} =
\begin{pmatrix}
(\beta_1, \alpha_1) \\
\vdots \\
(\beta_R, \alpha_R)
\end{pmatrix}.
\]

Having chosen the grid B, I estimate the probability weights 𝜃 = (𝜃1, . . . , 𝜃 𝑅) on each of the

coefficient vectors in B. The weight 𝜃𝑟 on a coefficient vector (𝛽𝑟, 𝛼𝑟) ∈ B depends on the extent to


which it is representative of tastes across the population of consumers. To derive 𝜃𝑟, focus first on

an individual consumer 𝑖. Let choose𝑖 𝑗𝑡 = 1 if good 𝑗 is her most-preferred product on trip 𝑡—that

is to say, the ordered product (online) or the purchased product (in-store)—and let choose𝑖 𝑗𝑡 = 0

otherwise.39

Supposing that trip 𝑡 is curbside pickup, consumer 𝑖’s ordered good—say,

𝑗—may go out

of stock before pickup.

In that event, she will be offered a substitute good 𝑠 ≠ 𝑗. To notate

stockout substitutions, let OOS𝑖 𝑗𝑡 = 1 if ordered good 𝑗 goes out of stock on trip 𝑡 and OOS𝑖 𝑗𝑡 = 0

otherwise.40 And, conditional on ordered good 𝑗 going out of stock, let accept𝑖𝑠𝑡 = 1 if the

consumer 𝑖 accepts good 𝑠 as a substitute on trip 𝑡 and accept𝑖𝑠𝑡 = 0 otherwise.

Due to the panel nature of the data, individual consumers are observed making repeated choices

over time. Consequently, the likelihood criterion concerns the probability of observing the entire

sequence of choices made by each consumer (Train 2009). Assuming that (𝛽𝑟, 𝛼𝑟) represents the true

tastes of consumer 𝑖, this is given by

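\[
P_i \mid \beta_r, \alpha_r \equiv
\left(
\prod_{t \in \mathcal{T}_i} \prod_{j \in \mathcal{J}_t}
\left( \Pr[\text{choose } j \mid x_t; \beta_r, \alpha_r] \right)^{\mathrm{choose}_{ijt}}
\left( \prod_{s \in \mathcal{J}_t \setminus \{j\}}
\left( \Pr[\text{accept } s \mid \text{choose } j; x_t; \beta_r, \alpha_r] \right)^{\mathrm{accept}_{ist}}
\right)^{\mathrm{OOS}_{ijt}}
\right),
\]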

where T𝑖 denotes the set of all her trips.

Of course, consumer 𝑖’s true tastes are unknown to the researcher. To recover the unconditional

probability of her observed sequence of choices, compute the weighted average of the conditional

choice probabilities (𝑃𝑖 | 𝛽𝑟, 𝛼𝑟) associated with each taste vector (𝛽𝑟, 𝛼𝑟) ∈ B:

\[
P_i \equiv \sum_{r=1}^{R} \theta_r \, (P_i \mid \beta_r, \alpha_r).
\]

In this equation, the probability weights 𝜃𝑟 measure the prevalence of tastes (𝛽𝑟, 𝛼𝑟) across the

population of consumers.

39In a slight abuse of notation, I now use 𝑡 to index an individual consumer’s trips, as opposed to time.
40Where in-store shopping and home delivery are concerned, OOS𝑖 𝑗𝑡 = 0 for all goods 𝑗.


Finally, compute the log-likelihood criterion by averaging log(𝑃𝑖) over the population of consumers:

\[
\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \log(P_i).
\]
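The weights that maximize this criterion can be found with an EM iteration. The sketch below shows the textbook EM update for mixture weights on fixed support points, which may differ in detail from the implementation used in this chapter. The conditional sequence probabilities in `cond_probs` are made-up numbers standing in for the quantities 𝑃𝑖 | 𝛽𝑟, 𝛼𝑟, and all function names are hypothetical.

```python
import math

def log_likelihood(theta, cond_probs):
    """Average over consumers of log(sum_r theta_r * P_{i|r})."""
    n = len(cond_probs)
    return sum(math.log(sum(t * p for t, p in zip(theta, row)))
               for row in cond_probs) / n

def em_step(theta, cond_probs):
    """One EM update of the grid weights, holding the grid points fixed."""
    n = len(cond_probs)
    new_theta = [0.0] * len(theta)
    for row in cond_probs:
        mix = sum(t * p for t, p in zip(theta, row))  # unconditional P_i
        for r, (t, p) in enumerate(zip(theta, row)):
            # Posterior probability that consumer i has tastes r, averaged over consumers.
            new_theta[r] += (t * p / mix) / n
    return new_theta

# Hypothetical P_{i|r} values: 4 consumers (rows) by 3 grid points (columns).
cond_probs = [
    [0.020, 0.005, 0.001],
    [0.001, 0.030, 0.004],
    [0.015, 0.010, 0.002],
    [0.002, 0.003, 0.040],
]

theta = [1.0 / 3.0] * 3          # start from uniform weights
history = [log_likelihood(theta, cond_probs)]
for _ in range(50):
    theta = em_step(theta, cond_probs)
    history.append(log_likelihood(theta, cond_probs))
```

Because the grid is fixed, each update only reweights the precomputed conditional probabilities. The weights stay on the unit simplex by construction, and the log-likelihood cannot decrease from one iteration to the next.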

Computing the Conditional Choice Probabilities.—So far, I have abstracted away from the

calculation of conditional choice probabilities. In the case of mixed logit, they take a closed form.

The probability that 𝑖 orders (purchases) good 𝑗 while shopping online (in-store) is given by

\[
\Pr[\text{order } j \mid x_t; \beta_r, \alpha_r] =
\frac{\exp(x_j \beta_r - \alpha_r p_{jt})}{\sum_{j' \in \mathcal{J}_t} \exp(x_{j'} \beta_r - \alpha_r p_{j't})}.
\]

If trip 𝑡 is curbside pickup, the probability that she accepts a substitute 𝑠 ∈ J𝑡 \ { 𝑗 } is

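\begin{align*}
\Pr[\text{accept } s \mid \text{order } j; x_t; \beta_r, \alpha_r]
  &= \Pr[\text{accept } s \mid x_t; \beta_r, \alpha_r] \\
  &= \frac{\exp(x_s \beta_r - \alpha_r p_{st})}{1 + \exp(x_s \beta_r - \alpha_r p_{st})}.
\end{align*}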

The former equality follows from the mixed logit IPA, under which a consumer’s initial order is

uninformative of her accept/reject decision about the substitute (conditional on her time-invariant

tastes).
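This independence can be checked by simulation. The sketch below fixes a single taste vector, draws i.i.d. Gumbel errors, and computes how often the substitute beats the outside option among the draws in which a particular good was ordered. The representative utilities are made-up values and the function names are hypothetical. Pseudo-random draws are used for simplicity, so the agreement with the closed form is only approximate.

```python
import math
import random

def gumbel(rng):
    """Standard Gumbel (type I extreme value) draw via the inverse CDF."""
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def simulate_accept_given_order(v, j, s, n_draws, seed=0):
    """Frequency of u_s >= u_0 among draws in which good j has the highest utility.

    v maps each good (0 = outside option) to its representative utility.
    """
    rng = random.Random(seed)
    goods = sorted(v)
    ordered = accepted = 0
    for _ in range(n_draws):
        u = {g: v[g] + gumbel(rng) for g in goods}
        if max(goods, key=u.get) == j:
            ordered += 1
            accepted += u[s] >= u[0]
    return accepted / ordered

# Hypothetical representative utilities; good 2 plays the role of the substitute.
v = {0: 0.0, 1: 1.0, 2: 0.5, 3: -0.2}
closed_form = math.exp(v[2]) / (1.0 + math.exp(v[2]))
freq_after_1 = simulate_accept_given_order(v, j=1, s=2, n_draws=200_000)
freq_after_3 = simulate_accept_given_order(v, j=3, s=2, n_draws=200_000, seed=1)
```

Whether the consumer ordered good 1 or good 3, the simulated acceptance frequency should sit close to the same closed-form value, which is exactly the restriction that the data on stockout substitutions can test.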

Where mixed probit is concerned, the conditional choice probabilities lack closed forms and

must be simulated. To improve the accuracy of the simulated probabilities, I do not draw the

simulated error terms directly from a multivariate normal distribution. Rather, I draw the error

terms from a scrambled “Sobol’ sequence” (Sobol’ 1967).41 For a given number of draws, this

quasi-Monte Carlo method should more closely approximate the underlying distribution than would

pseudo-random draws from the corresponding multivariate normal distribution.42
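To give a feel for why low-discrepancy draws help, and why the number of points matters, the sketch below builds normal draws from the base-2 van der Corput sequence, which coincides with the first coordinate of an unscrambled Sobol’ sequence. It is a deliberately simplified stand-in: one-dimensional, unscrambled, and written with the standard library only, whereas the estimation described here relies on scrambled multivariate Sobol’ draws. All function names are hypothetical.

```python
from statistics import NormalDist


def van_der_corput(n_points):
    """First n_points of the base-2 van der Corput sequence in [0, 1)."""
    points = []
    for index in range(n_points):
        value, scale, k = 0.0, 0.5, index
        while k:
            k, bit = divmod(k, 2)  # peel off the lowest binary digit
            value += bit * scale   # mirror it about the binary point
            scale /= 2.0
        points.append(value)
    return points


def quasi_normal_draws(n_points):
    """Map the sequence to standard normal draws by the inverse CDF."""
    if n_points < 1 or n_points & (n_points - 1):
        raise ValueError("n_points must be a power of two")
    shift = 0.5 / n_points  # move points to cell midpoints, away from 0
    std_normal = NormalDist()
    return [std_normal.inv_cdf(u + shift) for u in van_der_corput(n_points)]


draws = quasi_normal_draws(16)
```

When the number of points is a power of two, the points fill an evenly spaced grid on the unit interval, so the transformed draws are symmetric about zero and their sample mean vanishes up to rounding. With other sample sizes this balance is lost.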

Simulation proceeds as follows. I take $Q$ quasi-Monte Carlo draws, indexed by $q \in \mathcal{Q} \equiv \{1, \ldots, Q\}$. For each draw $q$, I draw a vector of low-discrepancy multivariate normal errors $(\varepsilon^{q}_{ijt})_{j \in \mathcal{J}_t}$ for each consumer $i$ and trip $t$.43 Then the probability of consumer $i$ ordering (purchasing) good $j$ while shopping online (in-store), conditional on having tastes $(\beta_r, \alpha_r)$, is approximated by the fraction of draws $q$ in which $j$ maximizes her conditional indirect utility $u_{ijt}$. That is,

\[
\widehat{\Pr}[\text{order } j \mid x_t; \beta_r, \alpha_r] = \frac{1}{Q} \sum_{q \in \mathcal{Q}} \mathbf{1}\Big[ u^{rq}_{ijt} = \max_{j' \in \mathcal{J}_t} \{ u^{rq}_{ij't} \} \Big],
\]

where

\[
u^{rq}_{ijt} \equiv x_j \beta_r - \alpha_r p_{jt} + \varepsilon^{q}_{ijt}.
\]

41Concerning a related simulation problem (namely, computing parametric mixed logit choice probabilities), recent work by Czajkowski and Budziński (2019) suggests that scrambled Sobol’ sequences are more efficient than alternative simulation methods, such as scrambled Halton sequences and modified Latin hypercube sampling.

42To preserve the balance properties of this quadrature rule, it is necessary that the total number of random draws, that is, the product of (i) the number of orders and (ii) the number of simulations, be a power of two (Virtanen et al. 2020). Throughout, I choose the number of simulations (as well as the number of orders modeled) so that this condition is satisfied.

Where curbside pickup is concerned, the probability of accepting a substitute depends on the consumer’s original order. To see why, suppose that consumer $i$ orders good $j$ on trip $t$, only for $j$ to go out of stock. Conditional on having true tastes $\beta_r$, the researcher knows that the error terms $(\varepsilon_{ijt})_{j \in \mathcal{J}_t}$ satisfy

\[
u^{r}_{ijt} = \max_{j' \in \mathcal{J}_t} \{ u^{r}_{ij't} \}.
\]

In other words, the consumer’s decision to order $j$ is informative of the error terms for the other goods $j' \neq j$.

How does this association between order and substitution choices affect estimation? Supposing that consumer $i$ has ordered good $j$, any draws $q$ such that

\[
u^{rq}_{ijt} < \max_{j' \in \mathcal{J}_t} \{ u^{rq}_{ij't} \}
\]

can be discarded. If $\beta_r$ represents consumer $i$’s true tastes, draws of the foregoing description would result in her placing a different order than the one observed in the data. Hence, I approximate the probability that $i$ accepts a substitute $s \in \mathcal{J}_t \setminus \{j\}$ as the fraction of the remaining draws

\[
\mathcal{Q}^{\star}(j) \equiv \Big\{ q \in \mathcal{Q} : u^{rq}_{ijt} = \max_{j' \in \mathcal{J}_t} \{ u^{rq}_{ij't} \} \Big\}
\]

for which $u^{rq}_{ist}$ exceeds the (simulated) utility of the outside option, $u^{rq}_{i0t}$. That is,

\[
\widehat{\Pr}[\text{accept } s \mid \text{order } j; x_t; \beta_r, \alpha_r] = \frac{1}{|\mathcal{Q}^{\star}(j)|} \sum_{q \in \mathcal{Q}^{\star}(j)} \mathbf{1}\big[ u^{rq}_{ist} > u^{rq}_{i0t} \big].
\]
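The accept/discard logic of this simulator can be sketched as follows. The sketch makes several simplifying assumptions that the text does not: errors are independent standard normal pseudo-random draws (not scrambled Sobol’ draws with a general covariance), the order is taken to maximize utility over the goods listed in `v`, and the outside option receives its own error draw. The function and argument names are hypothetical.

```python
import random


def simulate_order_and_accept(v, v_outside, ordered, substitute, n_draws, seed=0):
    """Frequency simulator for the order and accept probabilities.

    v[j] is the deterministic utility of good j; v_outside is that of the
    outside option. Errors are i.i.d. standard normal (an assumption).
    """
    rng = random.Random(seed)
    kept = 0      # draws in which the observed order is utility-maximizing
    accepted = 0  # kept draws in which the substitute beats the outside option
    for _ in range(n_draws):
        u = [vj + rng.gauss(0.0, 1.0) for vj in v]
        u_outside = v_outside + rng.gauss(0.0, 1.0)
        best = max(range(len(u)), key=u.__getitem__)
        if best != ordered:
            continue  # discard: this draw implies a different order
        kept += 1
        if u[substitute] > u_outside:
            accepted += 1
    order_prob = kept / n_draws
    accept_prob = accepted / kept if kept else float("nan")
    return order_prob, accept_prob, kept


# Toy example with three goods; good 0 is ordered, good 1 is the substitute.
order_prob, accept_prob, kept = simulate_order_and_accept(
    v=[1.0, 0.5, 0.0], v_outside=0.0, ordered=0, substitute=1, n_draws=4096
)
```

Only the `kept` draws enter the accept probability, which is why the effective sample size shrinks when the ordered good has a low order probability.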

43Precisely speaking, these draws are not from a multivariate normal distribution as such. Rather, they are based

on a low-discrepancy Sobol’ approximation (as described above).

Because many draws 𝑞 will be discarded, estimation requires a large total number of draws 𝑄.

To avoid unacceptably long run times, I employ the JAX Python library (Bradbury et al. 2018) to

spread computation across multiple GPUs as well as to optimize the code.

Expectation-Maximization (EM) Algorithm.—The fixed grid estimator suffers from a curse of

dimensionality rooted in the number of random coefficients. To be specific, I compute probability

weights on 78,125 fixed grid points in the estimates that follow. This multiplicity of parameters

poses a problem for gradient-based optimization. Inversion of the Hessian can fail, and optimization

may become “stuck” in regions where the likelihood function is inadequately approximated by a

quadratic (Train [2009]).

To surmount this computational difficulty, I employ an “expectation-maximization” (EM) algo-

rithm. Rather than maximizing the likelihood function directly, an EM algorithm instead maximizes

(conditional) expectations of the likelihood while holding various parameters constant by turns.

See Train (2008), Section 6 for a detailed discussion of the EM algorithm used in this paper.44
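For intuition, the sketch below implements the textbook EM update for the weights of a finite mixture whose support points are fixed: each iteration computes every consumer’s posterior probability of each grid point and then averages those posteriors across consumers. Whether this matches the cited procedure in every detail is an assumption; the function names and the toy probabilities are hypothetical.

```python
import math


def average_log_likelihood(cond_probs, weights):
    """Average over consumers of log(sum_r theta_r * P_i|r)."""
    return sum(
        math.log(sum(t * p for t, p in zip(weights, row))) for row in cond_probs
    ) / len(cond_probs)


def em_update(cond_probs, weights):
    """One EM step: average the consumers' posterior grid-point probabilities."""
    n = len(cond_probs)
    new_weights = [0.0] * len(weights)
    for row in cond_probs:
        joint = [t * p for t, p in zip(weights, row)]
        p_i = sum(joint)
        for r, value in enumerate(joint):
            new_weights[r] += value / p_i / n  # posterior of point r, averaged
    return new_weights


def run_em(cond_probs, n_iter=200):
    """Iterate the update from uniform starting weights."""
    n_points = len(cond_probs[0])
    weights = [1.0 / n_points] * n_points
    for _ in range(n_iter):
        weights = em_update(cond_probs, weights)
    return weights


# Toy example: three consumers, two grid points.
toy_probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
theta_hat = run_em(toy_probs)
```

Each step requires only weighted averages, with no Hessian to invert, which is the practical appeal when the number of grid points is large.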

3.6.3 Data Details

Whenever a consumer purchases something (whether in the store, through curbside pickup,

or via home delivery), the data report the item’s UPC and price. But what about the rest of the

consumer’s choice menu? Which alternatives did she pass over in favor of her preferred product,

and what were their prices?

To reconstruct the consumer’s choice menu, I first consult the chain’s product catalog to see

the UPCs of products in the relevant category. Then I match the resulting list against the UPCs of

products sold at the relevant store according to the scanner data. Regarding availability, I assume

that a product was in the consumer’s choice menu if a different consumer purchased it on the same

day, at the same store. Failing that, I check if the product was purchased on both the day before

and the day after (not necessarily by the same consumer). If neither of these conditions is satisfied,

I assume that the product was not in the consumer’s choice set (either because the product was out

of stock, or because the store did not carry it at all).

44In the case of mixed probit, a slight adjustment is necessary: probit kernels, not logit kernels, are computed for

each agent.

Given that a product is (presumably) available to the consumer, I impute its price as being the

mean purchase price on the day of the consumer’s shopping trip (within the relevant store location).

If no purchases are observed on the precise day of the trip, I instead take the unweighted average of

the mean purchase prices on the days immediately before and after.
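The availability and price rules for ordinary menu items can be written as two small functions. This is a sketch under stated assumptions: `other_sale_dates` is taken to hold the dates on which the product was sold to some other consumer at the same store, and `daily_mean_price` maps each date to that day’s mean purchase price at the store. Both containers and both function names are hypothetical.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def is_available(other_sale_dates, trip_date):
    """Apply the same-day rule, then the day-before-and-day-after rule."""
    if trip_date in other_sale_dates:
        return True
    return ((trip_date - ONE_DAY) in other_sale_dates
            and (trip_date + ONE_DAY) in other_sale_dates)


def impute_price(daily_mean_price, trip_date):
    """Same-day mean price, else the average of the adjacent days' means."""
    if trip_date in daily_mean_price:
        return daily_mean_price[trip_date]
    before = daily_mean_price.get(trip_date - ONE_DAY)
    after = daily_mean_price.get(trip_date + ONE_DAY)
    if before is not None and after is not None:
        return (before + after) / 2.0  # unweighted average of the two means
    return None  # product treated as unavailable, so no price is needed


# Toy example: sales observed on March 1 and March 3 only.
toy_prices = {date(2020, 3, 1): 4.0, date(2020, 3, 3): 5.0}
toy_available = is_available(set(toy_prices), date(2020, 3, 2))
toy_price = impute_price(toy_prices, date(2020, 3, 2))
```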

I employ a slightly different procedure with respect to products that were ordered for curbside

pickup but later went out of stock. Observe that a product of this description was likely on the

shelf at the time that the consumer placed her order.45 That, in turn, suggests that the product was

either available (i) the day of the attempted stockout substitution or (ii) the day before. Accordingly,

I impute the out-of-stock product’s price as being the average purchase price on the day of the

substitution or, failing that, the average purchase price on the day before. If I do not observe any

sales on either day, I impute the price as being the average purchase price on the nearest date for

which observations appear in the data (up to seven days before or after the stockout event).46
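The fallback sequence for out-of-stock products can be sketched in the same style. The text does not say how ties are broken when the nearest dates before and after the stockout are equally distant; the sketch assumes the earlier date is preferred. The mapping `daily_mean_price` (date to mean purchase price) and the function name are hypothetical.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def impute_oos_price(daily_mean_price, substitution_date, max_gap=7):
    """Price of an out-of-stock product: same day, day before, then nearest."""
    for day in (substitution_date, substitution_date - ONE_DAY):
        if day in daily_mean_price:
            return daily_mean_price[day]
    # Widen the window one day at a time, up to max_gap days either side.
    for gap in range(1, max_gap + 1):
        for day in (substitution_date - gap * ONE_DAY,
                    substitution_date + gap * ONE_DAY):
            if day in daily_mean_price:
                return daily_mean_price[day]  # earlier date wins ties (assumed)
    return None


# Toy example: sales observed on March 10 and March 14 only.
toy_oos_prices = {date(2020, 3, 10): 3.0, date(2020, 3, 14): 3.5}
price_day_before = impute_oos_price(toy_oos_prices, date(2020, 3, 11))
price_nearest = impute_oos_price(toy_oos_prices, date(2020, 3, 13))
```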

All prices are deflated to 2016 dollars using the six-month smoothed CPI (U.S. Bureau of

Labor Statistics).

Due to computational constraints, I cannot model demand for all forty bottled-water products or all sixty-one flour products. Rather, I exclude slow-selling or unusual products within each category, leaving me with six bottled water products and ten flour products.47 For the same reason, I do not perform estimation on all the available data. Instead, I focus on a random sample of households within each product category.48

45Unless a stockout was directly caused by an order for curbside pickup, there may be a delay before the store’s website indicates that a given item is out of stock.

46The structural estimation here focuses on top-selling products, whose prices are comparatively easy to infer. By contrast, the reduced-form regression in Section 3.5.2 also includes low-volume products, which may sell infrequently at a given store. If the procedure defined in the main text fails to recover the price of a low-volume out-of-stock product, I instead compute the average purchase price for stores in the same (narrowly-defined) geographic area on the nearest date with observations in the data (once more, up to seven days before or after the stockout event). The assumption is that stores in the same geographic area will coordinate on discounts (which might be advertised through mass mailings or billboards). To group stores by location, I rely on the most granular geographic designation in the chain’s internal system. At all events, the results in Section 3.5.2 are robust to the inclusion or exclusion of observations whose prices are imputed in this fashion.

47For bottled water, I estimate demand solely for the top six products. Together, these products command a 75% market share among “analysis households” (i.e., households that experience one or more curbside stockout substitutions). As for flour, I restrict attention to the top three brands (the private label, King Arthur, and Gold Medal) as well as the top two types of flour (all-purpose and bread). I further exclude products with less than 1.75% market share, along with the one organic flour with nontrivial sales. (To include that organic product, which represents 2% of analysis households’ purchases, I would need to add another explanatory variable: an “organic” dummy.) This leaves me with ten products, which together represent 75% of purchases by analysis households.

Multi-Product and Multi-Unit Purchases.—In the data, consumers’ purchases depart from

standard discrete choice frameworks in two ways. First, consumers may purchase multiple distinct

products on a single shopping trip. For instance, someone might purchase 24-packs of both Ice

Mountain and Aquafina on one trip. As for the second departure from discrete choice, consumers

may purchase multiple units of a single product on one shopping trip. For example, someone might

purchase two 24-packs of Ice Mountain bottled water on one trip. Multiple purchases of this kind

might be motivated by “stockpiling” to take advantage of discounts.

Within the product categories of bottled water and flour, purchases of more than one product

on a single shopping trip are fairly uncommon. Among analysis households, three (seven) percent

of shopping trips feature purchases of more than one bottled water (flour) product. I exclude all

such transactions from my structural estimation. By comparison, purchases of multiple units of a

single product are much more common. Roughly 25% (12%) of analysis households’ purchases of

bottled water (flour) involve multiple units.

Because standard discrete choice models (such as mixed logit and mixed probit) do not accom-

modate multi-unit purchases, the result may be biased predictions of consumers’ choices. And,

where mixed logit is concerned, these biased predictions could result in apparent violations of the

model’s IPA property that do not reflect within-consumer preference variation, but rather misspec-

ification of the underlying choice problem. To avoid such an outcome, it is important to minimize

the influence of multi-unit purchases on demand estimation.

I do so by estimating demand for a subset of households who are especially unlikely to make multi-unit purchases. In particular, I identify households with (i) zero purchases involving multiple units of a single product and (ii) ten or more purchases in total. For an additional discussion of multi-unit purchases (including a summary of results when households with multi-unit purchases are not dropped), see Chapter 3F.
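The two screening criteria amount to a simple filter over purchase records. In the sketch below, each record is assumed to be a `(household_id, quantity)` pair for a single product on a single trip; that layout and the function name are hypothetical.

```python
def eligible_households(purchases, min_purchases=10):
    """Households with no multi-unit purchase and at least min_purchases."""
    counts = {}
    has_multi_unit = set()
    for household_id, quantity in purchases:
        counts[household_id] = counts.get(household_id, 0) + 1
        if quantity > 1:
            has_multi_unit.add(household_id)
    return {
        household_id
        for household_id, n in counts.items()
        if n >= min_purchases and household_id not in has_multi_unit
    }


# Toy example: A qualifies, B has one multi-unit purchase, C buys too rarely.
toy_purchases = (
    [("A", 1)] * 10
    + [("B", 1)] * 9 + [("B", 2)]
    + [("C", 1)] * 3
)
toy_eligible = eligible_households(toy_purchases)
```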

48To ensure the balance properties of the Sobol’ sequence, it is necessary that the product of the number of sampled purchases (here, 4096) and the number of simulated error draws (here, 16,384) be a power of two. For the number of purchases to be exactly 4096, I may drop some of the later purchases made by at most one sampled household.

3.6.4 Results: Mixed Probit versus Mixed Logit

The task is to compare mixed logit’s goodness of fit with that of mixed probit, especially in

regard to the alternate-choice data on stockout substitutions. I proceed as follows. First, I draw

a random sample from the set of households with ten or more purchases (and zero multi-unit

purchases) in the data. (Recall from the preceding subsection that I cannot include the universe of

households due to memory constraints.) Then I estimate demand using both mixed logit and mixed

probit. Finally, I compare the two models with respect to the predicted probabilities assigned to the

choices of the same random subset of households that I earlier used in estimation.

In addition to the “within-sample” comparison that I have just described, it is also instructive

to perform an “out-of-sample” comparison. How accurately do the models forecast the choices of

a “holdout sample” of households, whose data were not used in estimation? I will briefly discuss

the motivations for this alternative procedure—as well as its results—later in this subsection.

Within-Sample Goodness of Fit.—Table 3.5 compares the within-sample fit of mixed logit and

mixed probit. Panel A pertains to in-store purchases, home delivery purchases, and curbside pickup

orders; while Panel B attends to stockout substitutions in curbside pickup (i.e., the alternate-choice

data). For each portion of the data, I assess model fit based on the average predicted probability

assigned to consumers’ observed choices.49 I compute these predicted probabilities in two ways.

The first, which I refer to as the “unconditional” approach (after Train [2009]), yields posterior

probabilities based on the (estimated) population distribution of random coefficients. The second

method of computing predicted probabilities, known as the “conditional” approach, leverages the

panel structure of the data to derive posterior probabilities conditional on individual consumers’

observed choices in the data.50 The motivation for reporting both types of predicted probability is

as follows. On the one hand, the “unconditional” approach is more commonly used in the literature.

49In Chapter 3G, I report results for an alternative measure of fit: the fraction of observations in which consumers’

observed choices are assigned the highest predicted probability of any alternative (sometimes termed the “hit rate”).

50When a consumer is observed making multiple decisions, it may become apparent that she likes or dislikes certain
kinds of products. For instance, if a frequent flour buyer always opts for bread flour (as opposed to all-purpose), she
probably likes bread flours more than the “average” consumer does. This intuition can be harnessed to situate the taste
coefficients (𝛽𝑖, 𝛼𝑖) of an individual consumer 𝑖 within the population distribution of random coefficients (Train 2009).
To do so in the context of a fixed-grid model, I follow the steps prescribed by Train (2008).

On the other, the “conditional” approach is closer in spirit to the statement of the mixed logit IPA

in Corollary 1 (which conditions on the realizations of consumers’ true random coefficients).
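The difference between the two approaches can be made concrete for a fixed-grid model. In the sketch below, the “unconditional” probability weights each grid point by its population weight, while the “conditional” probability first reweights the grid points by the likelihood of the consumer’s other observed choices (Bayes’ rule). Which choices enter that likelihood is a modeling decision; the list layouts, function names, and toy numbers are hypothetical.

```python
def unconditional_prob(choice_probs, weights):
    """Population-weighted probability of a choice across grid points."""
    return sum(theta * p for theta, p in zip(weights, choice_probs))


def posterior_weights(history_likelihoods, weights):
    """Consumer-specific weights: theta_r times her likelihood at point r."""
    joint = [theta * lik for theta, lik in zip(weights, history_likelihoods)]
    total = sum(joint)
    return [value / total for value in joint]


def conditional_prob(choice_probs, history_likelihoods, weights):
    """Probability of a choice given the consumer's other observed choices."""
    return unconditional_prob(
        choice_probs, posterior_weights(history_likelihoods, weights)
    )


# Toy example: two grid points with equal population weights. The consumer's
# history is nine times likelier under point 0 than under point 1.
toy_weights = [0.5, 0.5]
toy_choice_probs = [0.8, 0.2]
toy_history = [0.9, 0.1]
p_unconditional = unconditional_prob(toy_choice_probs, toy_weights)
p_conditional = conditional_prob(toy_choice_probs, toy_history, toy_weights)
```

In the toy example the history shifts weight toward the grid point under which the choice is likely, raising the predicted probability from 0.5 to 0.74.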

Table 3.5: Goodness of Fit: Mixed Logit versus Mixed Probit

                                            Bottled water               Flour
Statistic                                   Mixed logit   Mixed probit  Mixed logit   Mixed probit

Panel A. In-store purchases and online orders
No. of households                           121           121           405           405
No. of purchases^a                          4096          4096          4096          4096
No. of available products^b                 4.56          4.56          7.55          7.55
                                            (0.98)        (0.98)        (1.67)        (1.67)
Predicted probability of purchase
  ... using the “unconditional” approach^c  0.284         0.285         0.186         0.186
                                            (0.129)       (0.130)       (0.094)       (0.094)
  ... using the “conditional” approach^d    0.632         0.631         0.545         0.530
                                            (0.315)       (0.312)       (0.299)       (0.293)

Panel B. Stockout substitutions
No. of (attempted) stockout substitutions   147           147           353           353
  ... of which accepted                     125           125           330           330
True decision’s predicted probability
  ... using the “unconditional” approach^c  0.771         0.777         0.878         0.890
                                            (0.291)       (0.293)       (0.210)       (0.219)
  ... using the “conditional” approach^d    0.902         0.906         0.946         0.944
                                            (0.178)       (0.173)       (0.159)       (0.166)

Panel C. Empirical specification
No. of random coefficients                  7             7             7             7
No. of grid points                          78,125        78,125        78,125        78,125
No. of simulated error draws^a                            16,384                      16,384

Notes: This table compares the within-sample fit of mixed probit and mixed logit models for the product categories of bottled water and flour (see Sections 3.6.2 and 3.6.4 for details). Where relevant, standard deviations appear in parentheses.

^a The number of purchases and the number of draws are jointly chosen to maintain the balance properties of the Sobol’ sequence.

^b Excluding the “outside option” of no purchase.

^c This corresponds to the posterior probability based on the (estimated) population distribution of random coefficients.

^d This yields the posterior probability of the purchase, conditional on the consumer’s observed choices in the data.

The relative performance of mixed logit and mixed probit varies by data type. Focus first on

in-store purchases and online orders (Panel A). Regarding the “unconditional” choice probabilities,

the models’ fit is comparable; consumers’ observed purchases of bottled water (flour) are assigned

a 0.0 (0.1) percentage point lower predicted probability by mixed logit than by mixed probit. As

for the “conditional” approach, the models’ fit is comparable as far as bottled water is concerned;

the predicted probabilities of consumers’ observed purchases are 0.1 percentage points higher for

mixed logit than for mixed probit. Regarding flour, by contrast, there is a perceptible difference

in the models’ fit. Consumers’ observed purchases are assigned a 1.5 percentage points greater

predicted probability by mixed logit than by mixed probit.

Turn next to stockout substitutions (Panel B). Under the “unconditional” approach, the predicted

probabilities associated with consumers’ observed decisions to accept or reject substitute bottled

waters are 0.6 percentage points greater for mixed probit than for mixed logit. The disparity in

relation to flour is twice as large: 1.2 percentage points. This is consistent with the descriptive

evidence in Section 3.5.2. Since consumers’ preferences for bottled water seem to be largely

consistent with the IPA property of mixed logit, the model should forecast consumers’ acceptance

or rejection of stockout substitutes almost as accurately as mixed probit does. Regarding flour, by

contrast, descriptive evidence suggests that consumers’ preferences may be inconsistent with the

mixed logit IPA. Consequently, mixed probit (which does not suffer from an IPA constraint) should

predict acceptance or rejection more accurately than mixed logit (which does).

It is more difficult to reconcile the results under the “conditional” approach with the descriptive

evidence in Section 3.5.2. Here, mixed probit and mixed logit supply predictions of comparable ac-

curacy with respect to consumers’ accept/reject decisions. Concerning bottled water, the predicted

probabilities associated with consumers’ observed accept/reject decisions are 0.4 percentage points

greater for mixed probit than for mixed logit. As for flour, the predicted probabilities are nearly

identical, being 0.2 percentage points greater for the mixed logit model than for mixed probit.

Over-fitting may have biased this model selection exercise, however. Due to the nonparametric

estimation approach, the mixed probit and mixed logit models feature nearly eighty thousand

parameters each. This complexity enables the models to closely match random noise in the data

as well as underlying economic factors.51 Hence, to the extent that within-sample differences in

fit reflect the models’ ability to reproduce random noise (as opposed to systematic determinants

of demand), the results of a within-sample comparison may be biased. I adopt an out-of-sample

approach to address this potential source of bias.

Out-of-Sample Validation.—In contrast to within-sample methods of model selection, out-of-

sample methods assess models’ ability to forecast the choices of a “holdout sample” of consumers

whose data were not used in estimation. The intuition is as follows. To the extent that an

estimated model captures statistical noise, as opposed to systematic determinants of demand, it

will (incorrectly) project this random noise onto the consumers in the holdout sample. As a result,

the model’s accuracy in predicting the choices of the holdout sample depends solely on the extent

to which the model has captured generalizable (and economically meaningful) determinants of

consumers’ choices.52

Out-of-sample validation proceeds as follows. First, I randomly draw a “holdout sample”

of consumers whose data were not used to estimate the models above. And second, I compute

the posterior probabilities using both the “unconditional” and “conditional” approaches. For

both approaches, I rely on the empirical CDF of random coefficients from the estimation results

above. And regarding the “conditional” method, I report posterior probabilities conditional on

consumers’ original orders (as well as their in-store purchases and their orders for home delivery),

but excluding their decisions to accept or reject stockout substitutes. This ensures that, so far as

stockout substitutions are concerned, the validation exercise is predictive in nature.
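The mechanics of drawing a holdout sample and scoring a model on it can be sketched as follows. This is a generic illustration, not the procedure’s actual code: the function names, the seed, and the toy household identifiers are hypothetical, and the fit statistic shown is simply the mean predicted probability assigned to observed choices.

```python
import random


def split_households(household_ids, n_holdout, seed=0):
    """Randomly set aside n_holdout households; the rest are for estimation."""
    ids = sorted(set(household_ids))
    rng = random.Random(seed)
    holdout = set(rng.sample(ids, n_holdout))
    estimation = [h for h in ids if h not in holdout]
    return estimation, sorted(holdout)


def mean_predicted_probability(predicted_probs):
    """Average probability that a model assigned to the observed choices."""
    return sum(predicted_probs) / len(predicted_probs)


# Toy example: ten households, three of which are held out.
toy_ids = [f"hh{k}" for k in range(10)]
toy_estimation, toy_holdout = split_households(toy_ids, n_holdout=3)
toy_fit = mean_predicted_probability([0.9, 0.8, 0.4])
```

Because the holdout households play no role in estimation, a model that has merely memorized noise in the estimation sample gains nothing on this statistic.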

51With a sufficient number of parameters, models can reproduce idiosyncrasies in consumer behavior that should be attributed to the error term. To see the intuition, picture a consumer who is placing a curbside order for bottled water. She intends to order a 24-pack of Ice Mountain bottled water, but mistakenly clicks on a 6-pack of Aquafina instead (and does not spot her error). Although this mistake should be attributed to the error term, a sufficiently complicated model might nevertheless assign our consumer’s mistake a fairly high predicted probability.

52For an accessible introduction to out-of-sample validation, see Parady, Ory, and Walker (2021); while Zhang and Yang (2015) provide a more technical discussion. As far as applications are concerned, this approach has been employed in a variety of economic fields, including Health Economics (e.g., Deb and Trivedi 2002) and Agricultural Economics (e.g., Haener, Boxall, and Adamowicz 2001), as well as Industrial Organization (e.g., Bajari and Benkard 2005).

Table 3.6: Out-of-Sample Validation: Mixed Logit versus Mixed Probit

                                            Bottled water               Flour
Statistic                                   Mixed logit   Mixed probit  Mixed logit   Mixed probit

Panel A. In-store purchases and online orders
No. of households                           111           111           427           427
No. of purchases^a                          4096          4096          4096          4096
No. of available products^b                 4.60          4.60          7.53          7.53
                                            (1.01)        (1.01)        (1.69)        (1.69)
Predicted probability of purchase
  ... using the “unconditional” approach^c  0.294         0.294         0.193         0.194
                                            (0.130)       (0.130)       (0.099)       (0.100)
  ... using the “conditional” approach^d    0.544         0.537         0.523         0.504
                                            (0.293)       (0.292)       (0.287)       (0.279)

Panel B. Stockout substitutions
No. of (attempted) stockout substitutions   157           157           356           356
  ... of which accepted                     140           140           329           329
True decision’s predicted probability
  ... using the “unconditional” approach^c  0.809         0.818         0.870         0.880
                                            (0.248)       (0.249)       (0.227)       (0.233)
  ... using the “conditional” approach^d    0.862         0.859         0.890         0.896
                                            (0.263)       (0.277)       (0.252)       (0.254)

Panel C. Empirical specification
No. of random coefficients                  7             7             7             7
No. of grid points                          78,125        78,125        78,125        78,125
No. of simulated error draws^a                            16,384                      16,384

Notes: This table compares the fit of mixed probit and mixed logit models on a holdout sample (see Sections 3.6.2 and 3.6.4 for details). Where relevant, standard deviations appear in parentheses.

^a The number of purchases and the number of draws are jointly chosen to maintain the balance properties of the Sobol’ sequence.

^b Excluding the “outside option” of no purchase.

^c This corresponds to the posterior probability based on the (estimated) population distribution of random coefficients.

^d This yields the posterior probability of the purchase, conditional on the consumer’s observed choices in the data (prior to the stockout substitution).

Table 3.6 reports the results of out-of-sample validation. So far as consumers’ in-store purchases and online orders are concerned, the results remain qualitatively unchanged from the within-sample comparison. Irrespective of the product category and the method of computing probabilities (i.e., “conditional” versus “unconditional”), mixed logit supplies predictions of comparable (or superior) accuracy to those of mixed probit. Now consider stockout substitutions (Panel B). Using the “unconditional” approach, mixed probit forecasts the acceptance or rejection of stockout substitutions much more accurately than mixed logit does. (The advantage in predicted probabilities is 0.9 percentage points for bottled water and 1 percentage point for flour.) Turning to the “conditional” approach, mixed probit delivers less accurate predictions than mixed logit where bottled water is concerned (with the latter’s predicted probabilities exceeding the former’s by 0.3 percentage points). The results are more than reversed for flour, however. Here, the predicted probabilities of consumers’ observed accept/reject decisions are 0.6 percentage points greater for mixed probit than for mixed logit. This is in keeping with the descriptive evidence in Section 3.5.2: namely, that consumers’ purchases of bottled water are consistent with the mixed logit IPA, whereas their purchases of flour are not.

3.7 Conclusion

This paper shows that workhorse demand systems fail to reproduce important substitution pat-

terns when individual consumers’ preferences vary over time. This shortcoming is rooted in the

independence of preferred alternatives (IPA) properties of logit models. Conditional logit imposes

independence between a consumer’s purchase and her preferences among unpurchased goods,

while mixed logit imposes conditional independence between the same (given the realizations of

the consumer-specific random coefficients). To assess these properties’ influence on demand esti-

mates, I employ novel revealed-preference data on curbside pickup. The data concern consumers’

willingness to accept store-selected substitutes when their preferred products go out of stock.

Focusing on the product categories of bottled water and flour, I present both formal tests and

informal descriptive evidence that consumers’ preferences are inconsistent with the IPA property

of conditional logit. As for mixed logit, descriptive evidence suggests that consumers’ purchases

of bottled water are consistent with the model’s IPA property, but not their purchases of flour.

I next present a demand estimation case study. The goal is to quantify what (if any) bias results

from the IPA property of mixed logit. To this end, I estimate demand for bottled water and flour

using two models: mixed logit and mixed probit (which does not display an IPA property). Then

I compare the models’ goodness of fit in relation to the stockout substitution data. The results

of this comparison vary by product category, as well as the model selection approach (within-

versus out-of-sample) and the method of computing choice probabilities (“conditional” versus

“unconditional”). On balance, however, mixed probit seems to forecast consumers’ accept/reject

decisions more accurately than mixed logit does. Importantly, this disparity tends to be larger for

the product category of flour than that of bottled water. This is in keeping with the descriptive

evidence summarized above: namely, that consumers’ preferences for bottled water are consistent

with the IPA property of mixed logit, whereas their preferences for flour are not.

My findings can inform future applied work on differentiated products demand. In markets

where consumers’ preferences are stable across shopping trips, mixed logit should accurately

reproduce the underlying substitution patterns. But in markets where consumers’ preferences vary

over time, an alternative model may be preferable. One such model is mixed probit (as employed

in this paper). However, mixed probit is too computationally burdensome in many applications.

Another possibility, therefore, is the “random-coefficients nested logit” model estimated in Brenkers

and Verboven (2006) as well as Grigolon and Verboven (2014). Because its error terms are not

Gumbel, but rather Generalized Extreme Value (GEV), the model is unlikely to suffer from an

IPA constraint. Furthermore, existing empirical frameworks for alternate-choice data could be adapted to use this more general model in place of mixed logit. Many frameworks should be amenable to this adaptation, including those proposed by Berry, Levinsohn, and Pakes (2004); Train and Winston (2007); Bachmann et al. (2023); and Grieco et al. (2023). However, the feasibility

of incorporating random-coefficients nested logit in these frameworks depends on the conditional

choice probabilities of consumers’ alternate choices (e.g., second choices or accept/reject decisions

in stockout data). Do these probabilities take a parsimonious and predictable form as the number

of alternatives and “nests” grows? And, if so, does the resulting model match the true substitution

patterns in markets where consumers’ preferences vary over time? These questions are left for

future work.

Another method of relaxing the mixed logit IPA is to expressly model within-consumer pref-

erence variation across shopping trips. Rather than assuming that each consumer’s preferences

are characterized by a single vector of random coefficients, one could instead assume that her

preferences are given by two (or more) distinct vectors of random coefficients based on the circumstances of her shopping trip. To illustrate, consider the problem of modeling demand for flour. An individual consumer $i$ could have one vector of random coefficients for shopping trips in which she plans to bake bread, denoted by $\beta_i^{\text{bread}}$; and another vector of random coefficients for shopping trips in which she plans to bake cupcakes (for which all-purpose flour is ideal), denoted by $\beta_i^{\text{all-purpose}}$.

Such a model would constitute a discrete mixture of two mixed logit models, where the partition

of shopping trips between “bread trips” and “cupcake trips” is latent. I leave to future research

the following questions.

Is such a model identified? If so, does identification require data on

consumers’ preferences among unpurchased goods? Or does it suffice to observe purchases alone?

And how might such a model be estimated?53
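One way the likelihood contribution of such a model might be written is sketched below. It rests on assumptions that the text does not make: trip types are drawn independently across trips, and $\pi_i$ (a hypothetical parameter) denotes the probability that any given trip of consumer $i$ is a “bread trip”. Writing $y_{it}$ for the choice observed on trip $t$ and letting each probability on the right-hand side take the logit form,

\[
P_i \mid \big(\beta_i^{\text{bread}}, \beta_i^{\text{all-purpose}}, \pi_i\big) = \prod_{t \in \mathcal{T}_i} \Big[ \pi_i \Pr\big[y_{it} \mid x_t; \beta_i^{\text{bread}}\big] + (1 - \pi_i) \Pr\big[y_{it} \mid x_t; \beta_i^{\text{all-purpose}}\big] \Big].
\]

Because the mixing occurs within each trip, the product no longer factors into a single logit kernel per consumer. Whether the components of this expression are separately identified is exactly the open question posed above.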

53One promising direction is to adapt the expectation maximization (EM) algorithm from Section 5 of Train (2008).

The key modification would center on the kernel probability.

BIBLIOGRAPHY

Abdulkadiroğlu, Atila, Nikhil Agarwal, and Parag A. Pathak. “The Welfare Effects of Coordinated
Assignment: Evidence from the New York City High School Match”. American Economic
Review 107, no. 12 (2017): 3635–3689.

Ackerberg, Daniel A. “Advertising, Learning, and Consumer Choice in Experience Good Markets:
An Empirical Examination”. International Economic Review 44, no. 3 (2003): 1007–1040.

Allcott, Hunt. “The Welfare Effects of Misperceived Product Costs: Data and Calibrations from the
Automobile Market”. American Economic Journal: Economic Policy 5, no. 3 (2013): 30–66.

Allcott, Hunt, et al. Sources of Market Power in Web Search: Evidence from a Field Experiment.

National Bureau of Economic Research, 2025.

Allende, Claudia, Francisco Gallego, and Christopher Neilson. “Approximating the Equilibrium

Effects of Informed School Choice”. Working paper, 2019. Visited on 10/28/2024.

Anand, Bharat N., and Ron Shachar. “Advertising, the Matchmaker”. The RAND Journal of

Economics 42, no. 2 (June 2011): 205–245.

Anupindi, Ravi, Maqbool Dada, and Sachin Gupta. “Estimation of Consumer Demand with Stock-
Out Based Substitution: An Application to Vending Machine Products”. Marketing Science
17, no. 4 (1998): 406–423.

Arteaga, Cristian, et al. “xlogit: An Open-Source Python Package for GPU-Accelerated Estimation

of Mixed Logit Models”. Journal of Choice Modelling 42 (2022): 100339.

Bachmann, Rüdiger, et al. “Firms and Collective Reputation: A Study of the Volkswagen Emissions

Scandal”. Journal of the European Economic Association 21, no. 2 (2023): 484–525.

Backus, Matthew, Christopher Conlon, and Michael Sinkinson. Common Ownership and Com-
petition in the Ready-to-Eat Cereal Industry. National Bureau of Economic Research, 2021.
Visited on 04/02/2025.

Bajari, Patrick, and C. Lanier Benkard. “Demand Estimation with Heterogeneous Consumers and
Unobserved Product Characteristics: A Hedonic Approach”. Journal of Political Economy
113, no. 6 (2005): 1239–1276.

Barahona, Nano, Cristóbal Otero, and Sebastián Otero. “Equilibrium Effects of Food Labeling

Policies”. Econometrica 91, no. 3 (2023): 839–868.

Beggs, Steven, Scott Cardell, and Jerry Hausman. “Assessing the Potential Demand for Electric

Cars”. Journal of Econometrics 17, no. 1 (1981): 1–19.

Berry, Steven, and Philip Haile. “Identification in Differentiated Products Markets”. Annual

Review of Economics 8, no. 1 (Oct. 31, 2016): 27–52.

Berry, Steven, James Levinsohn, and Ariel Pakes. “Automobile Prices in Market Equilibrium”.

Econometrica 63, no. 4 (1995): 841–890.

— . “Differentiated Products Demand Systems from a Combination of Micro and Macro Data:

The New Car Market”. Journal of Political Economy 112, no. 1 (2004): 68–105.

Berry, Steven T., and Philip A. Haile. “Foundations of Demand Estimation”. In Handbook of
Industrial Organization, ed. by Kate Ho, Ali Hortaçsu, and Alessandro Lizzeri, 4:1–62. 2021.

— . “Identification in Differentiated Products Markets Using Market Level Data”. Econometrica

82, no. 5 (2014): 1749–1797.

— . “Nonparametric Identification of Differentiated Products Demand Using Micro Data”. Econo-

metrica 92, no. 4 (2024): 1135–1162.

Bradbury, James, et al. JAX: Composable Transformations of Python + NumPy Programs. Version

0.3.13, 2018.

Brenkers, Randy, and Frank Verboven. “Liberalizing a Distribution System: The European Car

Market”. Journal of the European Economic Association 4, no. 1 (2006): 216–251.

Brick Meets Click and Mercatus. “February U.S. eGrocery Sales Total $7.9 Billion, Down 10%

versus Year Ago”. Brick Meets Click, Mar. 13, 2024. Press Release.

Brownstone, David, and Kenneth A. Small. “Valuing Time and Reliability: Assessing the Evidence
from Road Pricing Demonstrations”. Transportation Research Part A: Policy and Practice 39,
no. 4 (2005): 279–293.

Bruno, Hernán A., and Naufel J. Vilcassim. “Research Note—Structural Demand Estimation with

Varying Product Availability”. Marketing Science 27, no. 6 (2008): 1126–1131.

Carlsson, Fredrik, and Peter Martinsson. “Do Hypothetical and Actual Marginal Willingness to
Pay Differ in Choice Experiments?: Application to the Valuation of the Environment”. Journal
of Environmental Economics and Management 41, no. 2 (2001): 179–192.

Che, Hai, Tülin Erdem, and T. Sabri Öncü. “Consumer Learning and Evolution of Consumer Brand

Preferences”. Quantitative Marketing and Economics 13, no. 3 (Sept. 2015): 173–202.

Chen, Nan, and Hsin-Tien Tsai. “Steering Via Algorithmic Recommendations”. The RAND

Journal of Economics 55, no. 4 (Dec. 2024): 501–518.

Ching, Andrew T. “A Dynamic Oligopoly Structural Model for the Prescription Drug Market After
Patent Expiration”. International Economic Review 51, no. 4 (Nov. 2010): 1175–1207.

Collard-Wexler, Allan. “Demand Fluctuations in the Ready-Mix Concrete Industry”. Econometrica

81, no. 3 (2013): 1003–1037.

Compiani, Giovanni, et al. “Online Search and Optimal Product Rankings: An Empirical Frame-

work”. Marketing Science 43, no. 3 (May 2024): 615–636.

Conlon, Chris, Julie Mortimer, and Paul Sarkis. “Estimating Preferences and Substitution Patterns

from Second Choice Data Alone”. Preliminary and incomplete (2023).

Conlon, Christopher, and Jeff Gortmaker. “Incorporating Micro Data into Differentiated Products

Demand Estimation with PyBLP”. Working paper (2023).

Conlon, Christopher, and Julie Holland Mortimer. “Empirical Properties of Diversion Ratios”.

The RAND Journal of Economics 52, no. 4 (2021): 693–726.

Conlon, Christopher T., and Julie Holland Mortimer. “Demand Estimation under Incomplete
Product Availability”. American Economic Journal: Microeconomics 5, no. 4 (2013): 1–30.

— . “Effects of Product Availability: Experimental Evidence”. National Bureau of Economic

Research Working Paper 16506 (2010).

— . “Efficiency and Foreclosure Effects of Vertical Rebates: Empirical Evidence”. Journal of

Political Economy 129, no. 12 (Dec. 1, 2021): 3357–3404.

Czajkowski, Mikołaj, and Wiktor Budziński. “Simulation Error in Maximum Likelihood Estimation

of Discrete Choice Models”. Journal of Choice Modelling 31 (2019): 73–85.

Daljord, Øystein. “Durable Goods Adoption and the Consumer Discount Factor: A Case Study of

the Norwegian Book Market”. Management Science 68, no. 9 (2022): 6783–6796.

Deb, Partha, and Pravin K. Trivedi. “The Structure of Demand for Health Care: Latent Class

Versus Two-Part Models”. Journal of Health Economics 21, no. 4 (2002): 601–625.

Donnelly, Robert, Ayush Kanodia, and Ilya Morozov. “Welfare Effects of Personalized Rankings”.

Marketing Science 43, no. 1 (Jan. 2024): 92–113.

Dubé, Jean-Pierre, and Sanjog Misra. “Personalized Pricing and Consumer Welfare”. Journal of

Political Economy 131, no. 1 (2023): 131–189.

Erdem, Tülin, and Michael P. Keane. “Decision-Making Under Uncertainty: Capturing Dynamic
Brand Choice Processes in Turbulent Consumer Goods Markets”. Marketing Science 15, no. 1
(1996): 1–20.

Erdem, Tülin, Michael P. Keane, and Baohong Sun. “A Dynamic Model of Brand Choice When
Price and Advertising Signal Product Quality”. Marketing Science 27, no. 6 (2008): 1111–
1125.

Farronato, Chiara, and Andrey Fradkin. “The Welfare Effects of Peer Entry: The Case of Airbnb
and the Accommodation Industry”. American Economic Review 112, no. 6 (2022): 1782–1817.

Farronato, Chiara, Andrey Fradkin, and Alexander MacKay. “Self-Preferencing at Amazon:
Evidence from Search Rankings”. In AEA Papers and Proceedings, 113:239–243. American
Economic Association, 2023.

Farronato, Chiara, et al. “Understanding the Tradeoffs of the Amazon Antitrust Case”. Harvard

Business Review (Jan. 11, 2024).

Fox, Jeremy T., Kyoo il Kim, and Chenyu Yang. “A Simple Nonparametric Approach to Estimating
the Distribution of Random Coefficients in Structural Models”. Journal of Econometrics 195,
no. 2 (2016): 236–254.

Fox, Jeremy T., et al. “The Random Coefficients Logit Model Is Identified”. Journal of Economet-

rics 166, no. 2 (2012): 204–212.

Grieco, Paul L.E., et al. “Conformant and Efficient Estimation of Discrete Choice Demand Models”.

Working Paper (2023).

Grieco, Paul LE, Charles Murry, and Ali Yurukoglu. “The Evolution of Market Power in the US

Automobile Industry”. The Quarterly Journal of Economics (2023).

Grigolon, Laura, and Frank Verboven. “Nested Logit or Random Coefficients Logit? A Comparison
of Alternative Discrete Choice Models of Product Differentiation”. Review of Economics and
Statistics 96, no. 5 (2014): 916–935.

Haener, M. K., P. C. Boxall, and W. L. Adamowicz. “Modeling Recreation Site Choice: Do
Hypothetical Choices Reflect Actual Behavior?” American Journal of Agricultural Economics
83, no. 3 (Aug. 2001): 629–642.

Hausman, Jerry A., and Paul A. Ruud. “Specifying and Testing Econometric Models for Rank-

Ordered Data”. Journal of Econometrics 34, no. 1 (1987): 83–104.

Heiss, Florian, Stephan Hetzenecker, and Maximilian Osterhaus. “Nonparametric Estimation of
the Random Coefficients Model: An Elastic Net Approach”. Journal of Econometrics 229, no.
2 (2022): 299–321.

Hollister, Sean. “Microsoft Now Thirstily Injects a Poll When You Download Google Chrome”.

The Verge, Oct. 24, 2023.

Iaria, Alessandro, and Ao Wang. “Real Analytic Discrete Choice Models of Demand: Theory and

Implications”. Econometric Theory (2024): 1–49.

Jovanovic, B. D., and P. S. Levy. “A Look at the Rule of Three”. The American Statistician 51,

no. 2 (May 1997): 137–139.

Kim, Kyoo il, and Amil Petrin. “Control Function Corrections for Unobserved Factors in Differen-

tiated Product Models”. Working paper, 2019.

Krasnoff, Barbara. “How to change your default browser in Windows 11”. The Verge, Apr. 15,

2022.

Lusk, Jayson L., and Ted C. Schroeder. “Are Choice Experiments Incentive Compatible? A Test
with Quality Differentiated Beef Steaks”. American Journal of Agricultural Economics 86, no.
2 (May 2004): 467–482.

Montag, Felix. “Mergers, Foreign Competition, and Jobs: Evidence from the US Appliance

Industry”. Working paper (2023).

Musalem, Andrés, et al. “Structural Estimation of the Effect of Out-of-Stocks”. Management

Science 56, no. 7 (2010): 1180–1197.

Nelson, Phillip. “Information and Consumer Behavior”. Journal of Political Economy 78, no. 2

(Mar. 1970): 311–329.

Nevo, Aviv. “Measuring Market Power in the Ready-to-Eat Cereal Industry”. Econometrica 69,

no. 2 (Mar. 2001): 307–342.

Newell, Richard G., and Juha Siikamäki. “Nudging Energy Efficiency Behavior: The Role of
Information Labels”. Journal of the Association of Environmental and Resource Economists
1, no. 4 (Dec. 2014): 555–598.

Osborne, Matthew. “Consumer Learning, Switching Costs, and Heterogeneity: A Structural
Examination”. Quantitative Marketing and Economics 9 (2011): 25–70.

Paetz, Friederike, and Winfried J. Steiner. “Utility Independence versus IIA Property in Indepen-

dent Probit Models”. Journal of Choice Modelling 26 (2018): 41–47.

Parady, Giancarlos, David Ory, and Joan Walker. “The Overreliance on Statistical Goodness-of-Fit
and Under-Reliance on Model Validation in Discrete Choice Models: A Review of Validation
Practices in the Transportation Academic Literature”. Journal of Choice Modelling 38 (2021):
100257.

Quaife, Matthew, et al. “How Well Do Discrete Choice Experiments Predict Health Choices? A
Systematic Review and Meta-Analysis of External Validity”. The European Journal of Health
Economics 19, no. 8 (Nov. 2018): 1053–1066.

Raedts, Elske, and Simone Evans. “Google Shopping: Self-Preferencing Can Be Abusive”. Stibbe,

Feb. 10, 2024.

Reimers, Imke, and Joel Waldfogel. A Framework for Detection, Measurement, and Welfare

Analysis of Platform Bias. National Bureau of Economic Research, 2023.

Revelt, David, and Kenneth Train. “Customer-Specific Taste Parameters and Mixed Logit”.
Working Paper No. E00-274, Department of Economics, University of California, Berkeley, 2000.

Ryan, Stephen P. “The Costs of Environmental Regulation in a Concentrated Industry”. Economet-

rica 80, no. 3 (2012): 1019–1061.

Shin, Sangwoo, Sanjog Misra, and Dan Horsky. “Disentangling Preferences and Learning in Brand

Choice Models”. Marketing Science 31, no. 1 (Jan. 2012): 115–137.

Sobol’, Il’ya Meerovich. “On the Distribution of Points in a Cube and the Approximate Evaluation
of Integrals”. Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki 7, no. 4 (1967):
784–802.

Sullivan, Christopher. “The Ice Cream Split: Empirically Distinguishing Price and Product Space

Collusion” (2020).

Taivalsaari, Antero, et al. “Web Browser as an Application Platform”. In 2008 34th Euromicro

Conference Software Engineering and Advanced Applications, 293–302. 2008.

Train, Kenneth E. Discrete Choice Methods with Simulation. Cambridge University Press, 2009.

— . “EM Algorithms for Nonparametric Estimation of Mixing Distributions”. Journal of Choice

Modelling 1, no. 1 (2008): 40–69.

Train, Kenneth E., and Clifford Winston. “Vehicle Choice Behavior and the Declining Market
Share of U.S. Automakers”. International Economic Review 48, no. 4 (Nov. 2007): 1469–1496.

Tuyl, Frank, Richard Gerlach, and Kerrie Mengersen. “The Rule of Three, its Variants and
Extensions”. International Statistical Review 77, no. 2 (Aug. 2009): 266–275.

U.S. Bureau of Labor Statistics. Consumer Price Index for All Urban Consumers (CPI-U).

U.S. Food & Drug Administration. “Bottled Water Everywhere: Keeping it Safe”. Consumer

Updates, Apr. 22, 2022.

Vatter, Benjamin. “Quality Disclosure and Regulation: Scoring Design in Medicare Advantage”.
Working paper, 2024.

Virtanen, Pauli, et al. “SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python”.

Nature Methods 17, no. 3 (2020): 261–272.

Xing, Jianwei, Benjamin Leard, and Shanjun Li. “What Does an Electric Vehicle Replace?”

Journal of Environmental Economics and Management 107 (2021): 102432.

Young, Liz. “Never Mind the Delivery, More Online Consumers Are Turning to Store Pickup”.

The Wall Street Journal (July 14, 2023).

Zeyveld, Andrew. “Demand Estimation When Consumers’ Preferences Vary over Time”. Working

Paper (2024).

Zhang, Yongli, and Yuhong Yang. “Cross-Validation for Selecting a Model Selection Procedure”.

Journal of Econometrics 187, no. 1 (2015): 95–112.

APPENDIX 3A

PROOF OF LEMMA 1

Denote the representative utility of good $j \in \{A, B\}$ by $v_j \equiv x_j \beta - \alpha p_{jt}$ and, without loss of generality, normalize $v_B = 0$.¹ Then

\[
\begin{aligned}
P_A &\equiv \Pr\big[u_{iAt} > u_{iBt} \,\big|\, \max\{u_{iAt}, u_{iBt}\} < K\big] \\
    &= \Pr\big[v_A + \varepsilon_{iA} > \varepsilon_{iB} \,\big|\, \max\{v_A + \varepsilon_{iA}, \varepsilon_{iB}\} < K\big] \\
    &= \mathrm{E}_{\varepsilon_{iA}}\Big[\Pr\big[v_A + \varepsilon_{iA} > \varepsilon_{iB} \,\big|\, \max\{v_A + \varepsilon_{iA}, \varepsilon_{iB}\} < K;\ \varepsilon_{iA}\big] \,\Big|\, v_A + \varepsilon_{iA} < K\Big],
\end{aligned}
\tag{3A.1}
\]

where the last equality follows from the law of iterated expectations.

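Consider the inner component of Equation (3A.1), namely, the conditional probability

\[
\begin{aligned}
P_{A \mid \varepsilon_{iA}} &\equiv \Pr\big[v_A + \varepsilon_{iA} > \varepsilon_{iB} \,\big|\, \max\{v_A + \varepsilon_{iA}, \varepsilon_{iB}\} < K;\ \varepsilon_{iA}\big] \\
    &= \Pr\big[\varepsilon_{iB} < v_A + \varepsilon_{iA} \,\big|\, \max\{v_A + \varepsilon_{iA}, \varepsilon_{iB}\} < K;\ \varepsilon_{iA}\big].
\end{aligned}
\tag{3A.2}
\]

Because $\max\{v_A + \varepsilon_{iA}, \varepsilon_{iB}\} < K$, the random variable $\varepsilon_{iB}$ possesses the support $(-\infty, K)$. Equation (3A.2) can thus be expressed as the fraction

\[
P_{A \mid \varepsilon_{iA}} = \frac{F_\varepsilon(v_A + \varepsilon_{iA})}{F_\varepsilon(K)},
\tag{3A.3}
\]

where $F_\varepsilon(\varepsilon') \equiv \exp\big(-e^{-\varepsilon'}\big)$ denotes the cumulative distribution function (CDF) of the Gumbel distribution.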

¹ To see why this assumption is without loss of generality, decompose both goods’ utilities into their respective representative utility and error terms:

\[
\Pr\big[u_{iA} > u_{iB} \,\big|\, \max\{u_{iA}, u_{iB}\} < K\big] = \Pr\big[v_A + \varepsilon_{iA} > v_B + \varepsilon_{iB} \,\big|\, \max\{v_A + \varepsilon_{iA}, v_B + \varepsilon_{iB}\} < K\big].
\]

Then subtract $v_B$ from each quantity on the right-hand side to obtain

\[
\begin{aligned}
\Pr\big[u_{iA} > u_{iB} \,\big|\, \max\{u_{iA}, u_{iB}\} < K\big] &= \Pr\big[v_A + \varepsilon_{iA} - v_B > \varepsilon_{iB} \,\big|\, \max\{v_A + \varepsilon_{iA} - v_B, \varepsilon_{iB}\} < K - v_B\big] \\
    &\equiv \Pr\big[v'_A + \varepsilon_{iA} > \varepsilon_{iB} \,\big|\, \max\{v'_A + \varepsilon_{iA}, \varepsilon_{iB}\} < K'\big],
\end{aligned}
\]

where $v'_A \equiv v_A - v_B$ and $K' \equiv K - v_B$.


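Substituting Equation (3A.3) into Equation (3A.1) yields

\[
\begin{aligned}
P_A &= \mathrm{E}_{\varepsilon_{iA}}\left[\frac{F_\varepsilon(v_A + \varepsilon_{iA})}{F_\varepsilon(K)} \,\middle|\, v_A + \varepsilon_{iA} < K\right] \\
    &= \frac{\mathrm{E}_{\varepsilon_{iA}}\big[F_\varepsilon(v_A + \varepsilon_{iA}) \,\big|\, v_A + \varepsilon_{iA} < K\big]}{F_\varepsilon(K)}.
\end{aligned}
\tag{3A.4}
\]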

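Now employ the definition of expectation to write Equation (3A.4) as an integral. Notice that $v_A + \varepsilon_{iA} < K$ implies $\varepsilon_{iA} \in (-\infty, K - v_A)$, so the conditional probability density function (PDF) of $\varepsilon_{iA}$ is given by $f_\varepsilon(\varepsilon'_{iA}) / F_\varepsilon(K - v_A)$. (Here, $f_\varepsilon(\varepsilon') \equiv \exp\big(-e^{-\varepsilon'} - \varepsilon'\big)$ denotes the PDF of the Gumbel distribution.) As a result,

\[
\begin{aligned}
P_A &= \frac{1}{F_\varepsilon(K)} \int_{\varepsilon'_{iA} = -\infty}^{K - v_A} F_\varepsilon(v_A + \varepsilon'_{iA}) \, \frac{f_\varepsilon(\varepsilon'_{iA})}{F_\varepsilon(K - v_A)} \, d\varepsilon'_{iA} \\
    &= \frac{1}{\exp\big(-e^{-K}\big)} \int_{\varepsilon'_{iA} = -\infty}^{K - v_A} \frac{\exp\big(-e^{-(v_A + \varepsilon'_{iA})}\big) \exp\big(-e^{-\varepsilon'_{iA}} - \varepsilon'_{iA}\big)}{\exp\big(-e^{-(K - v_A)}\big)} \, d\varepsilon'_{iA} \\
    &= \exp\big(e^{-K}\big) \exp\big(e^{-(K - v_A)}\big) \int_{\varepsilon'_{iA} = -\infty}^{K - v_A} \exp\big(-e^{-(v_A + \varepsilon'_{iA})}\big) \exp\big(-e^{-\varepsilon'_{iA}} - \varepsilon'_{iA}\big) \, d\varepsilon'_{iA} \\
    &= \exp\big(e^{-K} + e^{-(K - v_A)}\big) \int_{\varepsilon'_{iA} = -\infty}^{K - v_A} \exp\big(-e^{-\varepsilon'_{iA}} e^{-v_A}\big) \exp\big(-e^{-\varepsilon'_{iA}} - \varepsilon'_{iA}\big) \, d\varepsilon'_{iA} \\
    &= \exp\Big(\big(e^{-v_A} + 1\big) e^{-(K - v_A)}\Big) \int_{\varepsilon'_{iA} = -\infty}^{K - v_A} \Big(\exp\big(-e^{-\varepsilon'_{iA}}\big)\Big)^{\exp(-v_A)} \exp\big(-e^{-\varepsilon'_{iA}} - \varepsilon'_{iA}\big) \, d\varepsilon'_{iA}.
\end{aligned}
\]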

Setting $u \equiv \exp\big(-e^{-\varepsilon'_{iA}}\big)$, so that $du = \exp\big(-e^{-\varepsilon'_{iA}} - \varepsilon'_{iA}\big) \, d\varepsilon'_{iA}$, yields

\[
\begin{aligned}
P_A &= \exp\Big(\big(e^{-v_A} + 1\big) e^{-(K - v_A)}\Big) \int_{u = 0}^{\exp(-\exp(-(K - v_A)))} u^{\exp(-v_A)} \, du \\
    &= \exp\Big(\big(e^{-v_A} + 1\big) e^{-(K - v_A)}\Big) \left[\frac{u^{\exp(-v_A) + 1}}{e^{-v_A} + 1}\right]_{u = 0}^{\exp(-\exp(-(K - v_A)))} \\
    &= \exp\Big(\big(e^{-v_A} + 1\big) e^{-(K - v_A)}\Big) \, \frac{\Big(\exp\big(-e^{-(K - v_A)}\big)\Big)^{\exp(-v_A) + 1}}{e^{-v_A} + 1} \\
    &= \exp\Big(\big(e^{-v_A} + 1\big) e^{-(K - v_A)}\Big) \, \frac{\exp\Big(-\big(e^{-v_A} + 1\big) e^{-(K - v_A)}\Big)}{e^{-v_A} + 1} \\
    &= \frac{1}{e^{-v_A} + 1} \\
    &= \frac{e^{v_A}}{1 + e^{v_A}}. \qquad \blacksquare
\end{aligned}
\]
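As an informal numerical check on the result above (it is not part of the proof), the following Python sketch simulates i.i.d. Gumbel errors, keeps only the draws in which both utilities fall below the cap K, and compares the share of retained draws in which A is preferred with the logit formula. The values of v_A and K, the number of draws, the seed, and all function names are arbitrary illustrative assumptions.

```python
import math
import random


def gumbel_draw(rng):
    """Draw one standard Gumbel variate by inverting its CDF, exp(-exp(-x))."""
    u = rng.random()
    while u <= 0.0:  # random() lies in [0, 1); avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def truncated_choice_share(v_a, cap, n_draws, seed=0):
    """Share of draws with u_A > u_B among draws where max(u_A, u_B) < cap.

    The representative utility of B is normalized to zero, as in the proof.
    """
    rng = random.Random(seed)
    kept = 0
    a_preferred = 0
    for _ in range(n_draws):
        u_a = v_a + gumbel_draw(rng)
        u_b = gumbel_draw(rng)
        if max(u_a, u_b) < cap:  # condition on both utilities lying below the cap
            kept += 1
            if u_a > u_b:
                a_preferred += 1
    return a_preferred / kept


v_a, cap = 0.7, 1.0  # illustrative values only
simulated = truncated_choice_share(v_a, cap, n_draws=200_000)
logit = math.exp(v_a) / (1.0 + math.exp(v_a))
print(f"simulated share: {simulated:.4f}   logit formula: {logit:.4f}")
```

Up to simulation noise, the two printed numbers should agree for any choice of the cap, which is the content of Lemma 1: truncating both utilities from above leaves the binary logit probability unchanged.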

APPENDIX 3B

COMPARISON OF THEOREM 1 WITH PRIOR
THEORETICAL RESULTS

Beggs, Cardell, and Hausman (1981) derive a result that closely resembles Theorem 1. However, their result applies to a different type of alternate-choice data. Whereas Theorem 1 pertains to data on consumers’ pairwise preferences among unpurchased goods, Beggs, Cardell, and Hausman’s result applies to second-choice data (as well as more comprehensive rankings of the choice set).

Beggs, Cardell, and Hausman’s result is as follows. Letting $A$ and $B$ be any two goods in $\mathcal{J}$, consider the joint probability that a consumer both (i) purchases good $A$ and (ii) lists good $B$ as her second-most-preferred good. Beggs, Cardell, and Hausman show that this joint probability equals the product of (i) the probability that $A$ is the most-preferred good in $\mathcal{J}$ and (ii) the probability that $B$ is the most-preferred good in $\mathcal{J} \setminus \{A\}$. Formally,

\[
\Pr\Big[u_{iAt} = \max_{j \in \mathcal{J}} u_{ijt} \ \text{and} \ u_{iBt} = \max_{j \in \mathcal{J} \setminus \{A\}} u_{ijt}\Big] = \Pr\Big[u_{iAt} = \max_{j \in \mathcal{J}} u_{ijt}\Big] \cdot \Pr\Big[u_{iBt} = \max_{j \in \mathcal{J} \setminus \{A\}} u_{ijt}\Big].
\]

As to more comprehensive rankings of consumers’ preferences, let S ⊆ J be any subset of the

goods on offer. Then the probability of observing a given ranking of the goods in S can be written

as the product of |S| − 1 logit formulas.1

These results indicate that conditional logit restricts consumers’ second choices—as well as

more comprehensive rankings of the choice set—in a manner that resembles Theorem 1. (Whether

Beggs, Cardell, and Hausman’s findings imply Theorem 1 is not immediately clear.)

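¹ Formally, let $r \equiv (r_1, r_2, \ldots, r_S)$ be any ordinal ranking of the goods in $\mathcal{S}$ such that $u_{i r_1 t} > u_{i r_2 t} > \cdots > u_{i r_S t}$. Then

\[
\begin{aligned}
\Pr\big[u_{i r_1 t} > u_{i r_2 t} > \cdots > u_{i r_S t}\big] ={}& \Pr\Big[u_{i r_1 t} = \max_{j \in \mathcal{S}} u_{ijt}\Big] \cdot \Pr\Big[u_{i r_2 t} = \max_{j \in \mathcal{S} \setminus \{r_1\}} u_{ijt}\Big] \\
    & \cdot \Pr\Big[u_{i r_3 t} = \max_{j \in \mathcal{S} \setminus \{r_1, r_2\}} u_{ijt}\Big] \cdots \Pr\big[u_{i r_{S-1} t} > u_{i r_S t}\big].
\end{aligned}
\]

(This notation for preference rankings is borrowed from Hausman and Ruud [1987].)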
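The product structure described in this appendix can be illustrated with a short Python sketch. It computes the probability of a ranking as a product of logit formulas and confirms that the probabilities of all possible rankings sum to one. The representative utilities and the function names are hypothetical and chosen only for illustration.

```python
import math
from itertools import permutations


def logit_prob(v, chosen, available):
    """Logit probability that `chosen` has the highest utility among `available`."""
    denom = sum(math.exp(v[j]) for j in available)
    return math.exp(v[chosen]) / denom


def ranking_prob(v, ranking):
    """Probability of a ranking as a product of len(ranking) - 1 logit formulas."""
    remaining = list(ranking)
    prob = 1.0
    while len(remaining) > 1:
        # The top-ranked remaining good must be the best of what is left.
        prob *= logit_prob(v, remaining[0], remaining)
        remaining = remaining[1:]
    return prob


# Hypothetical representative utilities for a three-good market.
v = {"A": 0.5, "B": -0.2, "C": 1.1}

# "A first, B second": Pr[A is best in J] times Pr[B is best in J without A].
first_a_second_b = logit_prob(v, "A", ["A", "B", "C"]) * logit_prob(v, "B", ["B", "C"])

# Summing over every possible ranking should return one.
total = sum(ranking_prob(v, r) for r in permutations(v))
print(first_a_second_b, total)
```

With three goods, the ranking (A, B, C) is pinned down by the first and second choices, so `ranking_prob(v, ("A", "B", "C"))` coincides with `first_a_second_b`.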

APPENDIX 3C

MONTE CARLO TESTS OF THEOREM 1

In this appendix, I perform Monte Carlo simulations to verify Theorem 1.

Consider a market with $J$ goods, indexed by $j \in \mathcal{J} \equiv \{1, \ldots, J\}$.¹ Utility is specified as

\[
u_{ij} = x_j \beta - \alpha p_j + \varepsilon_{ij} \equiv v_j + \varepsilon_{ij},
\]

where $\varepsilon_{ij}$ is distributed i.i.d. Gumbel. (For simplicity, I abstract from the panel dimension of the data as well as within-product price variation over time.) The task is to ascertain whether

\[
\Pr\Big[u_{iB} > u_{iC} \,\Big|\, u_{iA} = \max_{j \in \mathcal{J}} u_{ij}\Big] = \Pr[u_{iB} > u_{iC}].
\tag{3C.1}
\]

To do so, I compare (i) the conditional probability of preferring 𝐵 over 𝐶—given 𝐴 is the most-

preferred good—with (ii) the unconditional probability of the same. In computing (i), I do not

directly impose the mixed logit IPA (i.e., Theorem 1). Rather, I randomly draw errors from the

Gumbel distribution. Then I discard any draws for which 𝐴 is not the most-preferred good. Finally,

I compute the fraction of the remaining draws in which 𝐵 is preferred to 𝐶. This comparison is

repeated for 𝑆 different random draws of the goods’ representative utilities.

Each simulation $s \in \mathcal{S} \equiv \{1, \ldots, S\}$ proceeds as follows. I begin by randomly drawing the representative utility $v_{js}$ of each good $j \in \mathcal{J}$. In so doing, I treat the goods’ representative utilities as (mutually independent) random uniform variables with support $[-4.5, 3.5]$.² With the representative utility draws in hand, I proceed to compute the probability that $B$ is preferred to $C$, both unconditionally and conditional on $A$ being the most-preferred good. The unconditional probability is given by the familiar logit formula:

\[
\Pr\big[u_{iBs} > u_{iCs} \,\big|\, (v_{js})_{j \in \mathcal{J}}\big] = \frac{\exp(v_{Bs})}{\exp(v_{Bs}) + \exp(v_{Cs})}.
\tag{3C.2}
\]

As for the conditional probability of preferring $B$ to $C$ (given $A$ is the most-preferred good), I simulate it by randomly drawing $N$ different i.i.d. Gumbel errors for each good $j$, $\{\varepsilon_{ij}\}_{i=1}^{N}$.³ Then the conditional probability is approximated by:

\[
\widehat{\Pr}\Big[u_{iBs} > u_{iCs} \,\Big|\, u_{iAs} = \max_{j \in \mathcal{J}} u_{ijs};\ (v_{js})_{j \in \mathcal{J}}\Big] = \frac{\sum_{i=1}^{N} \mathbf{1}\Big[v_{Bs} + \varepsilon_{iB} > v_{Cs} + \varepsilon_{iC} \ \text{and} \ v_{As} + \varepsilon_{iA} = \max_{j \in \mathcal{J}} \{v_{js} + \varepsilon_{ij}\}\Big]}{\sum_{i=1}^{N} \mathbf{1}\Big[v_{As} + \varepsilon_{iA} = \max_{j \in \mathcal{J}} \{v_{js} + \varepsilon_{ij}\}\Big]}.
\tag{3C.3}
\]

¹ For simplicity, I abstract from the inside/outside good distinction.
² This choice of support follows the Monte Carlo experiments in Heiss, Hetzenecker, and Osterhaus (2022).
³ Regarding the absence of an $s$ subscript: for computational simplicity, I use the same ten million Gumbel draws for all simulations.

With Equations (3C.2) and (3C.3) in hand, I proceed to compute the absolute value of the difference between them:

\[
\text{AbsDiff}_s = \Big|\, \widehat{\Pr}\Big[u_{iBs} > u_{iCs} \,\Big|\, u_{iAs} = \max_{j \in \mathcal{J}} u_{ijs};\ (v_{js})_{j \in \mathcal{J}}\Big] - \Pr\big[u_{iBs} > u_{iCs} \,\big|\, (v_{js})_{j \in \mathcal{J}}\big] \,\Big|.
\]

Having repeated this process for $S$ simulations, I compute the average absolute value of the difference between the conditional and unconditional probabilities: $S^{-1} \sum_{s=1}^{S} \text{AbsDiff}_s$.

Numerical Details and Results.—I perform the steps described above for markets of two different

sizes: three goods and four goods. For each market size, I synthesize 100 different representative

utility combinations (drawn, as described above, from the uniform distribution with support [-4.5,

3.5]). To approximate the conditional choice probabilities, I take ten million i.i.d. Gumbel draws

per good.

The results of this simulation are as follows. For the three-good market, the mean absolute

difference between the conditional and unconditional probability is 0.000265 (with a standard

deviation of 0.000483). And for the four-good market, the mean absolute difference between the

conditional and unconditional probability is 0.000457 (with a standard deviation of 0.000853).
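The procedure in this appendix can be sketched in a few lines of Python. The sketch below is a scaled-down illustration rather than the code used to produce the reported numbers: it uses only the standard library, takes far fewer Gumbel draws than the ten million described above (so its discrepancy will be noticeably larger than the reported values), and the function names, default arguments, and seed are assumptions.

```python
import math
import random


def gumbel_draw(rng):
    """Draw one standard Gumbel variate by inverting its CDF, exp(-exp(-x))."""
    u = rng.random()
    while u <= 0.0:  # random() lies in [0, 1); avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def mean_abs_diff(n_goods=3, n_sims=10, n_draws=50_000, seed=0):
    """Average |conditional - unconditional| probability that B is preferred to C."""
    rng = random.Random(seed)
    a, b, c = 0, 1, 2  # goods A, B, and C are the first three goods
    # The same Gumbel draws are reused in every simulation, as in the text.
    eps = [[gumbel_draw(rng) for _ in range(n_goods)] for _ in range(n_draws)]
    abs_diffs = []
    for _ in range(n_sims):
        # Representative utilities: independent uniform draws on [-4.5, 3.5].
        v = [rng.uniform(-4.5, 3.5) for _ in range(n_goods)]
        unconditional = math.exp(v[b]) / (math.exp(v[b]) + math.exp(v[c]))
        n_a_best = 0
        n_a_best_and_b_over_c = 0
        for e in eps:
            u = [v_j + e_j for v_j, e_j in zip(v, e)]
            if u[a] == max(u):  # keep only draws in which A is most preferred
                n_a_best += 1
                if u[b] > u[c]:
                    n_a_best_and_b_over_c += 1
        if n_a_best > 0:  # skip the unlikely case of no retained draws
            conditional = n_a_best_and_b_over_c / n_a_best
            abs_diffs.append(abs(conditional - unconditional))
    return sum(abs_diffs) / len(abs_diffs)


result = mean_abs_diff()
print(f"mean absolute difference: {result:.5f}")
```

Because draws are discarded whenever A is not the most-preferred good, simulations in which A has a low representative utility retain few draws and contribute most of the simulation error. This is one reason a large number of Gumbel draws is helpful.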

APPENDIX 3D

CROSS-CHARACTERISTIC CORRELATIONS IN
(DIS)SIMILARITY

Table 3D.1 reports cross-characteristic correlations in the substitutes’ similarity or dissimilarity

with respect to two characteristics. Letting 𝑖 index rows and 𝑗 index columns, cell entry 𝑖, 𝑗 reports

the correlation between the substitute’s (i) matching the out-of-stock product on characteristic 𝑖 and

(ii) matching the out-of-stock product on characteristic 𝑗.

For the most part, similarity between the substitute and the out-of-stock product in one charac-

teristic is inversely correlated with similarity in another. There are only a handful of exceptions.

(For instance, a substitute flour is more likely to share the same flour type as the out-of-stock

product if it also shares its “bleached” status.)
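To make the construction of each cell concrete, the following Python sketch computes the Pearson correlation between two binary "match" indicators. The indicator values are made up for illustration and are not drawn from the data; the variable and function names are likewise hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical indicators, one entry per substitution: 1 if the substitute
# matches the out-of-stock product on the characteristic, 0 otherwise.
matches = {
    "same_brand":      [1, 0, 1, 0, 1, 0, 0, 1],
    "same_flour_type": [1, 1, 0, 1, 0, 1, 1, 0],
}

names = list(matches)
corr = {(i, j): pearson(matches[i], matches[j]) for i in names for j in names}
for (i, j), value in corr.items():
    print(f"{i:16s} {j:16s} {value: .2f}")
```

In this made-up sample the off-diagonal entry is negative, meaning that substitutes matching on brand tend not to match on flour type.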

Table 3D.1: Correlation Matrices of Similarity in Characteristics between Substitute and Out-Of-Stock Product

a. Bottled water

                           Same     Similarᵃ       Similarᵃ          Same water
                           brand    bottle size    no. of bottles    type
Same brand                  1.00
Similarᵃ bottle size       −0.26     1.00
Similarᵃ no. of bottles    −0.31    −0.11           1.00
Same water type            −0.02    −0.14          −0.09              1.00

Notes: Letting 𝑖 index rows and 𝑗 index columns, the entry in cell 𝑖, 𝑗 indicates the correlation between the substitute and out-of-stock product sharing characteristic 𝑖 and their sharing characteristic 𝑗 as well. There are 106,484 observations.
ᵃ Within 10%.

b. Flour

                           Same     Same “bleached”    Similarᵃ     Same flour
                           brand    status             quantity     type
Same brand                  1.00
Same “bleached” status     −0.05     1.00
Similarᵃ quantity          −0.49    −0.26               1.00
Same flour type             0.02     0.07              −0.16         1.00

Note: 26,242 observations. (See Panel A for details.)
ᵃ Within 10%.

APPENDIX 3E

DETAILS ON THE STRUCTURAL ESTIMATION
METHOD

This appendix describes two aspects of the structural estimation process. These include (i) the esti-

mation of correlations among the mixed probit error terms and (ii) the choice of tuning parameters

(in both mixed logit and mixed probit).

Grid Search Estimator of Error Correlations in Mixed Probit.—Trip-specific circumstances

sometimes shift multiple goods’ utilities, causing their error terms to be correlated. To see the

intuition, recall the example from Section 3.1 of a baker who usually bakes bread (for which

bread flour is ideal), but who occasionally bakes cupcakes instead (for which all-purpose flour is

preferable). On the rare trips when she plans to bake cupcakes, there will be a positive shock to the

utilities of all-purpose flours but a negative shock to those of bread flours. Now consider how these

trips will figure in a discrete choice model. The positive shocks to all-purpose flours’ utilities will

appear as positive realizations of those products’ error terms, whereas the negative shocks to bread

flours’ utilities will manifest as negative realizations. Thus, the circumstances of a given shopping

trip (and, in particular, the planned recipe) cause the error terms of products of a given flour type

to be correlated with each other.

The preceding example highlights the following fact. In markets where trip-specific circum-

stances affect the utilities of multiple goods, a demand system should accommodate correlated

errors. This is especially true when alternate choice data are available. In that event, the inclusion

of correlated errors should enable the demand system to better match consumers’ observed prefer-

ences over unpurchased products (as reported in the alternate choice data). And, to the extent that

preferences over unpurchased products are indicative of product substitutability, the final result is

more accurate estimates of demand elasticities.

Unlike mixed logit,1 the mixed probit model accommodates correlated errors. However, it is

challenging to recover the structure of the correlation. One must simulate choice probabilities not

1Only generalizations of mixed logit, such as mixed nested logit, can incorporate correlated errors.

only for every point of the fixed grid, but also for each possible correlation structure. It is therefore

helpful to minimize the number of potential correlation structures considered. For this reason,

I adopt a grid search approach to estimating the correlations between error terms. This method

is popular in the machine learning literature, where it is used for a different purpose (namely,

tuning so-called “hyperparameters”). Here the method appeals for the same overarching reason: it

minimizes the number of times a computationally burdensome procedure must be repeated.

The grid search estimator proceeds as follows. First, I propose a general structure for the

correlations among the error terms. I begin by identifying a cluster of products within the category

whose error terms are especially likely to be correlated.

In so doing, I consult the descriptive

evidence in Section 3.5.2 concerning within-consumer preference variation across trips. For the

product category of flour, I focus on within-consumer preference variation with respect to flour

type. The idea is that flours of the type needed for the consumer’s intended recipe will enjoy

positive utility shocks (which manifest as positive, correlated error terms). As for bottled water,

the descriptives provide little guidance regarding which (if any) characteristics experience within-

consumer variation in tastes. Resorting to intuition, I opt to model correlation centered on bottle

count, the idea being that consumers will sometimes require more water bottles than usual due to

trip-specific circumstances (such as preparing for a long road trip).

Having identified a cluster of products whose error terms may be correlated, I compute correlated

errors as follows. Assume that the products’ error terms are distributed multivariate normal such

that (i) all the error terms’ variances equal one; (ii) the error terms corresponding to products within

the “correlation cluster” exhibit a common covariance of 𝜎 with one another, but are independent of

the error terms of products outside the “correlation cluster;” and (iii) the error terms of products

outside the “correlation cluster” are independent both of each other and of the error terms of

products within the “correlation cluster.” To see what the resulting covariance matrix might look

like, recall the stylized four-good market from Section 3.1 in which products A and B are close

substitutes for each other, but not for goods C and D (which, in turn, are close substitutes for

each other but not for A or B). In relation to this stylized market, I might consider the following covariance matrix:

\[
\begin{pmatrix}
\mathrm{Var}(\varepsilon_A) & \mathrm{Cov}(\varepsilon_A, \varepsilon_B) & \mathrm{Cov}(\varepsilon_A, \varepsilon_C) & \mathrm{Cov}(\varepsilon_A, \varepsilon_D) \\
\mathrm{Cov}(\varepsilon_B, \varepsilon_A) & \mathrm{Var}(\varepsilon_B) & \mathrm{Cov}(\varepsilon_B, \varepsilon_C) & \mathrm{Cov}(\varepsilon_B, \varepsilon_D) \\
\mathrm{Cov}(\varepsilon_C, \varepsilon_A) & \mathrm{Cov}(\varepsilon_C, \varepsilon_B) & \mathrm{Var}(\varepsilon_C) & \mathrm{Cov}(\varepsilon_C, \varepsilon_D) \\
\mathrm{Cov}(\varepsilon_D, \varepsilon_A) & \mathrm{Cov}(\varepsilon_D, \varepsilon_B) & \mathrm{Cov}(\varepsilon_D, \varepsilon_C) & \mathrm{Var}(\varepsilon_D)
\end{pmatrix}
=
\begin{pmatrix}
1 & \sigma & 0 & 0 \\
\sigma & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\]
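A minimal Python sketch of this construction follows. It builds the covariance matrix for a given cluster and covariance parameter and then draws correlated normal errors through a Cholesky factor. The function names, the value of the covariance parameter, the number of draws, and the use of a Cholesky factorization are illustrative assumptions; the text does not specify how the correlated draws are generated.

```python
import math
import random


def cluster_covariance(n_goods, cluster, sigma):
    """Unit variances, common covariance `sigma` within `cluster`, zero elsewhere."""
    cov = [[1.0 if r == c else 0.0 for c in range(n_goods)] for r in range(n_goods)]
    for r in cluster:
        for c in cluster:
            if r != c:
                cov[r][c] = sigma
    return cov


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            s = sum(lower[r][k] * lower[c][k] for k in range(c))
            if r == c:
                lower[r][c] = math.sqrt(matrix[r][r] - s)
            else:
                lower[r][c] = (matrix[r][c] - s) / lower[c][c]
    return lower


def draw_errors(lower, rng):
    """One vector of correlated normal errors, L z with z ~ N(0, I)."""
    z = [rng.gauss(0.0, 1.0) for _ in lower]
    return [sum(lower[r][k] * z[k] for k in range(r + 1)) for r in range(len(lower))]


# Goods A, B, C, and D are indexed 0 to 3; the correlation cluster is {A, B}.
cov = cluster_covariance(n_goods=4, cluster=[0, 1], sigma=0.3)
lower = cholesky(cov)
rng = random.Random(0)
draws = [draw_errors(lower, rng) for _ in range(50_000)]

# The errors have mean zero, so the average product estimates the covariance.
sample_cov_ab = sum(d[0] * d[1] for d in draws) / len(draws)
sample_cov_cd = sum(d[2] * d[3] for d in draws) / len(draws)
print(f"Cov(A, B): {sample_cov_ab:.3f}   Cov(C, D): {sample_cov_cd:.3f}")
```

For a cluster of m products with a common covariance, the matrix is positive definite only when the covariance parameter lies strictly between −1/(m − 1) and 1, so candidate values outside that range would cause the factorization in this sketch to fail.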

Notice that I only model the error correlations within one “cluster” of products within the market:

that of goods A and B. Ideally, I would also estimate the correlation between the error terms

of goods C and D (the other pair of close substitutes). However, modeling correlations for two

clusters, as opposed to one, would exponentially increase the computational burden.2 Besides,

when there are only two product clusters (as is the case here), the key qualitative patterns in the

data can be captured by modeling the correlations of just one cluster’s errors.3 Happily, such is the

case for the product categories of bottled water and flour. Regarding the former category, I model

correlations among the error terms of products with twenty-four bottles (as distinct from forty, the

other top-selling size). As for the latter category, I model the correlations in the error terms of

bread flours (as distinct from all-purpose flours, the other top-selling flour type).

Having identified a cluster of products whose error terms may be correlated, I specify a set

C = {𝜎1, . . . , 𝜎𝐶 } of possible covariance parameters. Then I estimate demand separately for each

covariance parameter 𝜎𝑐 ∈ C. Each time, I follow the steps described above. The only difference

between iterations 𝑐 = 1, . . . , 𝐶 concerns the simulated error terms. On iteration 𝑐, I assume that

the error terms are distributed multivariate normal with the covariance matrix implied by (i) the

cluster structure under consideration and (ii) the specific covariance parameter 𝜎𝑐 being evaluated.

(Notice that the estimated distribution of the random coefficients (𝛽𝑖, 𝛼𝑖) will vary across iterations

2More precisely, the computational burden is squared. For instance, if I considered five different levels of
correlation per cluster (0, 0.1, 0.2, 0.3, and 0.4), evaluating the Cartesian product of the candidate correlations would
require $5^2 = 25$ rounds of estimation.

3To see why, suppose that a consumer has purchased good A, so her second–most-preferred product is probably
B. Because the consumer purchased good A, whose errors are correlated with those of good B, the realization of good
B’s error term is probably positive. Thus, the model would likely predict that good B is the consumer’s second–most-
preferred product. Now suppose, instead, that the consumer has purchased good C, in which case D is probably her
second–most-preferred product. In this case, the realizations of A and B’s errors would be disproportionately likely to
be negative, thereby increasing the probability that the model assigns good D greater utility than A or B.


so as to maximize the likelihood function given the error draws.)
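The error draws for a given iteration can be sketched as follows. This is a minimal illustration, not the estimation code: it assumes unit variances, a common correlation 𝜎𝑐 between every pair of products in the cluster, and independent errors elsewhere, and it generates the correlation with a one-factor construction rather than a general multivariate normal sampler. The names `draw_errors` and `sample_correlation` are hypothetical.

```python
import math
import random


def draw_errors(n_products, cluster, sigma, n_draws, seed=0):
    """Draw normal errors with unit variance and correlation `sigma` within `cluster`.

    Assumption: one-factor construction, valid for 0 <= sigma < 1. Products
    outside the cluster receive independent standard normal errors.
    """
    if not 0.0 <= sigma < 1.0:
        raise ValueError("sigma must lie in [0, 1)")
    rng = random.Random(seed)
    in_cluster = set(cluster)
    draws = []
    for _ in range(n_draws):
        common = rng.gauss(0.0, 1.0)  # shock shared by all products in the cluster
        row = []
        for j in range(n_products):
            own = rng.gauss(0.0, 1.0)  # product-specific shock
            if j in in_cluster:
                # Var = sigma + (1 - sigma) = 1; Cov within the cluster = sigma
                row.append(math.sqrt(sigma) * common + math.sqrt(1.0 - sigma) * own)
            else:
                row.append(own)
        draws.append(row)
    return draws


def sample_correlation(draws, j, k):
    """Sample correlation between the errors of products j and k."""
    n = len(draws)
    mean_j = sum(row[j] for row in draws) / n
    mean_k = sum(row[k] for row in draws) / n
    cov = sum((row[j] - mean_j) * (row[k] - mean_k) for row in draws) / n
    var_j = sum((row[j] - mean_j) ** 2 for row in draws) / n
    var_k = sum((row[k] - mean_k) ** 2 for row in draws) / n
    return cov / math.sqrt(var_j * var_k)
```

For example, `draw_errors(4, [0, 1], 0.2, 20000)` would give products 0 and 1 a sample correlation near 0.2 while leaving products 2 and 3 approximately uncorrelated with everything else.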

With the estimates in hand, I identify the covariance parameter that results in the largest log

likelihood at convergence. Then I perform estimation a second time with that parameter. (Without

this step, the log likelihood for the “optimal” covariance parameter may be upwardly biased due to

random noise in the simulated probabilities.4)
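The selection step can be sketched as a simple grid search. The callable `estimate(sigma, seed)` is hypothetical: it stands in for one full round of estimation and is assumed to return the log likelihood at convergence. The sketch also assumes that the second round uses fresh simulation draws (a different seed), which the text implies but does not state.

```python
def select_covariance_parameter(candidates, estimate):
    """Pick the candidate with the largest log likelihood, then re-estimate it.

    `estimate(sigma, seed)` is a hypothetical stand-in for one round of
    estimation; it must return the log likelihood at convergence.
    """
    # First pass: one round of estimation per candidate parameter.
    first_pass = {sigma: estimate(sigma, seed=0) for sigma in candidates}
    best_sigma = max(first_pass, key=first_pass.get)
    # Second pass (assumption: fresh draws), so the reported log likelihood is
    # not selected on the same simulation noise that made best_sigma look best.
    final_loglik = estimate(best_sigma, seed=1)
    return best_sigma, final_loglik, first_pass
```

Reporting `final_loglik` rather than `first_pass[best_sigma]` is the point of the second pass: the maximum over several noisy evaluations tends to overstate the true value.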

Tuning Parameters.—It is necessary to choose both the number and location of the fixed grid

points before estimation.

Regarding the number of grid points, my approach closely resembles that employed by Train

(2008). That is to say, I begin by determining the maximum number of grid points that can fit within

the memory. Then I divide this total number of grid points evenly among the random coefficients,

so that each coefficient’s support will be discretized into the same number of values. The final

result is that the support of each random coefficient is approximated by five distinct fixed points

(which happens to be the same number as in one of Train’s [2008] specifications).

Having selected the number of distinct fixed grid values per random coefficient, it remains to

determine their locations. I follow Heiss, Hetzenecker, and Osterhaus (2022) in basing the grid

points’ locations on parametric mixed logit estimates. Specifically, I center the grid on the mean

coefficient estimates from the parametric model. Then, for each coefficient, I place the outermost

points two (estimated) standard deviations above and below the mean. In the case of mixed probit, I

divide each point by √1.6 to adjust for the difference in normalization between multinomial probit

and logit models (see Train [2009]).
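The grid construction can be sketched as below. The text fixes only the center of the grid and the outermost points, so the even spacing of the interior points is an assumption, as are the function names `grid_values` and `full_grid`.

```python
import itertools
import math

LOGIT_TO_PROBIT = math.sqrt(1.6)  # scale adjustment between logit and probit


def grid_values(mean, sd, n_points=5, probit=False):
    """Grid for one coefficient: centered on `mean`, outermost points at +/- 2 sd.

    Assumption: interior points are evenly spaced between the outermost points.
    """
    if n_points < 2:
        raise ValueError("need at least two grid points")
    lo, hi = mean - 2.0 * sd, mean + 2.0 * sd
    step = (hi - lo) / (n_points - 1)
    values = [lo + i * step for i in range(n_points)]
    if probit:
        # Rescale the logit-based grid for use in the probit model.
        values = [v / LOGIT_TO_PROBIT for v in values]
    return values


def full_grid(means, sds, n_points=5, probit=False):
    """Cartesian product of the per-coefficient grids."""
    axes = [grid_values(m, s, n_points, probit) for m, s in zip(means, sds)]
    return list(itertools.product(*axes))
```

With K random coefficients and five values each, the full grid contains 5^K points, which is why the number of values per coefficient is limited by available memory.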

4The “optimal” covariance parameter 𝜎𝑐 is chosen because it maximizes the log likelihood at convergence.
However, the log likelihood is evaluated with error because it is simulated—and sometimes the simulated probabilities
of consumers’ observed choices exceed the true probabilities (perhaps because the error draws spuriously align with
consumers’ observed choices).


APPENDIX 3F

MULTIPLE-UNIT PURCHASES OF INDIVIDUAL
PRODUCTS

Contrary to standard discrete choice frameworks, consumers sometimes purchase multiple units

of a single product on one shopping trip. This poses a problem for the model selection exercise

in Section 3.6. Recall that the IPA property of mixed logit imposes conditional independence

between consumers’ order choices and their decisions to accept or reject the substitute, given their

time-invariant tendencies to like or dislike the substitute (based on its observable characteristics).

The key assumption is that consumers’ preferences do not vary between trips in a fashion that is

correlated across products. However, if consumers’ choice sets include multiple units of individual

products, their observed behavior may be inconsistent with the mixed logit IPA for a different

reason: the model misspecifies the underlying choice problem. To see why, consider a consumer

who likes to purchase bottled water in large quantities. She might consider the following to be

her top two purchase options:

(i) a 40-pack of the private label and (ii) two 24-packs of Ice

Mountain. However, standard discrete choice models exclude option (ii), as they assume that she

will purchase only a single unit of a given product. In consequence, discrete choice models might

underestimate the probability that she purchases Ice Mountain while overestimating the probability

that she purchases other brands that offer larger packs.

One possible solution would be to treat different quantities of a product as distinct alternatives.

For instance, purchasing one 24-pack of Ice Mountain would be treated as a different alternative

from purchasing two 24-packs of Ice Mountain. However, the data on stockout substitutions do not

report the requested number of units of the out-of-stock product. Although it seems likely that the

consumer would be offered a quantity of the substitute such that the total quantity (i.e., size per

unit times number of units) would closely match the out-of-stock product’s in most situations, it

also seems probable that rejection would be especially likely in situations where the substitute’s

total quantity diverges from the out-of-stock product’s. For this reason, I do not attempt to impute

the number of units requested of the out-of-stock product based on the substitute’s total quantity.


Instead, I identify households who are especially unlikely to purchase multiple units of a single

product. To do so, I find households for whom I observe (i) zero purchases involving multiple units

and (ii) ten or more purchases in total. (In principle, I could solely drop transactions featuring

multi-unit purchases, as opposed to entire households. However, because multi-unit purchases are

so common [see Section 3.6.3], it seems plausible that a large fraction of households entertained

multi-unit purchases during trips where they ultimately purchased a single unit.)
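The household filter can be sketched as follows. The input format, a sequence of (household, units) pairs with one pair per product purchase, and the function name are illustrative assumptions rather than a description of the actual data.

```python
from collections import defaultdict


def eligible_households(purchases, min_purchases=10):
    """Households with at least `min_purchases` purchases and no multi-unit purchase.

    `purchases` is an iterable of (household_id, units) pairs (hypothetical layout).
    """
    totals = defaultdict(int)  # purchases observed per household
    multi = defaultdict(int)   # purchases with more than one unit of a product
    for household, units in purchases:
        totals[household] += 1
        if units > 1:
            multi[household] += 1
    return {
        household
        for household, n in totals.items()
        if n >= min_purchases and multi[household] == 0
    }
```

Note that the filter drops the entire household after a single multi-unit purchase, rather than dropping only the offending transaction.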

I quantify the importance of excluding households with multi-unit purchases as follows. First,

I draw a random sample of households from the universe of sample households (as opposed

to those with 10+ transactions and 0 multi-unit purchases). And second, I repeat the model

selection exercises in Section 3.6.4 on this alternative sample. The results qualitatively resemble

those presented in the main text (both within- and out-of-sample). The primary difference is that

mixed probit always delivers more accurate accept/reject predictions than does mixed logit, irrespective of

product category, model selection approach, or method of computing predicted choice probabilities.

However, the disparity still tends to be larger in relation to flour than in relation to bottled water.

This is consistent with the descriptive evidence presented in Section 3.5.2.


APPENDIX 3G

SUPPLEMENTARY RESULTS FROM STRUCTURAL
ESTIMATION

Verifying the Mixed Logit IPA.—According to Corollary 1, the conditional probability of accepting

a stockout substitute—given one’s original order choice—should be identical to the unconditional

probability of the same. To verify that this is indeed the case, Table 3G.1 compares two estimation

approaches. The first approach directly imposes the mixed logit IPA, resulting in the closed-form

likelihood presented in Section 3.6.2 of the main text. By contrast, the second approach simulates

the likelihood function without imposing the mixed logit IPA. Simulation proceeds in two steps.

First, I compute the order choice probabilities by drawing from the standard Gumbel distribution.

And second, I calculate the accept/reject probabilities based solely on the error draws that resulted

in “correct” order predictions.
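The logic of the second approach can be illustrated with a small simulation. The sketch below is not the estimator used here: it holds a single consumer's deterministic utilities fixed (so there are no random coefficients), and it represents "acceptance" as the event that one unchosen good is preferred to another, which is an illustrative assumption. It discards every draw that mispredicts the order and compares the resulting conditional frequency with the closed-form logit probability.

```python
import math
import random


def gumbel(rng):
    """One draw from the standard Gumbel distribution (inverse-CDF method)."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_conditional_preference(v, chosen, first, second,
                                    n_draws=100_000, seed=0):
    """Share of draws with u[first] > u[second] among draws where `chosen` is best."""
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(n_draws):
        u = {j: v[j] + gumbel(rng) for j in v}
        if max(u, key=u.get) != chosen:
            continue  # discard draws that mispredict the order
        kept += 1
        hits += u[first] > u[second]
    if kept == 0:
        raise ValueError("no draw reproduced the observed order")
    return hits / kept, kept


def closed_form_preference(v, first, second):
    """Binary logit probability that `first` is preferred to `second`."""
    return math.exp(v[first]) / (math.exp(v[first]) + math.exp(v[second]))
```

With `v = {"A": 0.0, "B": 1.0, "C": 2.0}`, the simulated share from `simulate_conditional_preference(v, "A", "B", "C")` should be close to `closed_form_preference(v, "B", "C")`, even though only the minority of draws in which A is chosen are retained.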

Table 3G.1: Verifying the Mixed Logit IPA by Simulation

                                                            Product category
Statistic                                                 Bottled water    Flour

Frac. of stockouts with same prediction                       0.996        0.993
Avg. absolute difference in predicted prob. accept            0.003        0.005
Root mean square difference in predicted prob. accept         0.024        0.025

Notes: This table compares the predictions of two mixed logit estimators: (i) directly imposing
the mixed logit IPA (and using the resultant closed-form likelihood), and (ii) simulating the choice
probabilities. For (ii), the accept/reject probabilities are solely based on Gumbel error draws that
result in the “correct” original online order.
(Consequently, most of the 20,000 draws used to
compute the order probabilities are discarded for the accept/reject stage.)

Table 3G.1 reports three measures of the similarity of the two estimation approaches. All

these measures pertain to the predicted probability of acceptance. The first measure is the fraction

of stockout substitutions in which both models predict the same outcome.1 As for the second

measure, I compute the average absolute difference between the two models’ predicted probabilities

of acceptance. Letting 𝑠 ∈ {1, . . . , 𝑆} index (attempted) stockout substitutions, the measure is given

1That is, I compute the fraction of substitutions in which either (i) both models assign a predicted probability of

>50% to acceptance or (ii) both assign a predicted probability of <50% to acceptance.


by

\[ \mathrm{AAD} = \frac{1}{S} \sum_{s=1}^{S} \left| P_s - \hat{P}_s \right|, \]
where 𝑃𝑠 denotes acceptance probabilities derived from the closed-form likelihood and ˆ𝑃𝑠 denotes

their simulated counterparts. The third (and final) measure is the root-mean-square difference in

predicted acceptance probabilities:

\[ \mathrm{RMSD} = \sqrt{\frac{1}{S} \sum_{s=1}^{S} \left( P_s - \hat{P}_s \right)^2}. \]

Observe that the second and third measures are similar in spirit; both gauge the average “distance”

in acceptance probabilities. However, the average absolute difference employs the 𝐿1 norm whereas

the root-mean-square difference employs the 𝐿2 norm.
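The three similarity measures can be computed as in the sketch below, where `p` and `p_hat` are equal-length lists of predicted acceptance probabilities from the two approaches. The function names are illustrative. One assumption: a probability of exactly 50% is treated as a predicted rejection, a case the definitions above leave open.

```python
import math


def same_prediction_share(p, p_hat, threshold=0.5):
    """Fraction of substitutions where both approaches predict the same outcome.

    Assumption: a probability exactly equal to `threshold` counts as a rejection.
    """
    same = sum((a > threshold) == (b > threshold) for a, b in zip(p, p_hat))
    return same / len(p)


def average_absolute_difference(p, p_hat):
    """AAD: mean of |P_s - P_hat_s| (L1-type distance)."""
    return sum(abs(a - b) for a, b in zip(p, p_hat)) / len(p)


def root_mean_square_difference(p, p_hat):
    """RMSD: square root of the mean of (P_s - P_hat_s)^2 (L2-type distance)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, p_hat)) / len(p))
```

Because the RMSD weights large discrepancies more heavily, it is never smaller than the AAD on the same data, which matches the ordering of the two rows in Table 3G.1.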

The results in Table 3G.1 indicate that the two estimation approaches arrive at very similar

predictions. This is especially true where bottled water is concerned.2

Random Coefficients.—Table 3G.2 reports summary statistics for the random coefficients (i.e.,

the 𝛽’s) in each product category. To compare the mixed logit coefficients with their mixed probit

counterparts, divide the former by √1.6. Concerning mixed probit, Table 3G.2 also indicates

the estimated correlation parameter (𝜎) for the indicated “cluster” of products. This parameter is

estimated to be 0.1 for bottled water and 0.2 for flour.3

Whether the error terms are correlated or uncorrelated, mixed probit does not display an IPA

property.4 Consequently, the model allows a consumer’s initial order to be correlated with her

decision to accept or reject a stockout substitute.

In principle, this correlation might have no

real-world economic content (being a purely mathematical property). However, it is also possible

that this correlation reflects real-world consumer behavior. Regarding the latter hypothesis, recall

2There are two reasons why the results for bottled water are more precise than those for flour. First, a larger fraction
of error draws translate to “correct” order predictions in the former category than in the latter (27% versus 20%).
This leaves more draws with which to simulate the accept/reject probabilities. And second, the utility specification
for bottled water is more flexible than the utility specification for flour. Whereas the former includes dummies for
individual products, the latter relies on observable characteristics (brand, flour type, quantity, etc.)
3I tested six potential correlation parameters in each category: {0, 0.1, 0.2, 0.3, 0.4, 0.5}.
4By way of example, consider a three-good market with goods 𝐴, 𝐵, and 𝐶. Suppose that $u_{iAt} = \varepsilon_{iA}$, $u_{iBt} = 1 + \varepsilon_{iB}$,
and $u_{iCt} = 2 + \varepsilon_{iC}$ (where the error terms are i.i.d. standard normal). Then
$\Pr[u_{iBt} > u_{iCt} \mid u_{iAt} = \max_{j \in \mathcal{J}} u_{ijt}] \approx 0.331 > 0.240 \approx \Pr[u_{iBt} > u_{iCt}]$.
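The three-good example can be checked by simulation. The sketch below assumes that each good has its own independent standard normal error and that the mean utilities of 𝐴, 𝐵, and 𝐶 are 0, 1, and 2; the function name is hypothetical. It estimates the probability that 𝐵 is preferred to 𝐶, both unconditionally and conditional on 𝐴 having the highest utility.

```python
import random


def probit_conditional_example(n_draws=200_000, seed=0):
    """Return (conditional, unconditional) estimates of Pr[u_B > u_C].

    The conditional estimate uses only draws in which good A has the highest
    utility. Errors are independent standard normal (assumption).
    """
    rng = random.Random(seed)
    b_over_c = 0
    a_is_max = 0
    a_is_max_and_b_over_c = 0
    for _ in range(n_draws):
        u_a = rng.gauss(0.0, 1.0)
        u_b = 1.0 + rng.gauss(0.0, 1.0)
        u_c = 2.0 + rng.gauss(0.0, 1.0)
        b_wins = u_b > u_c
        b_over_c += b_wins
        if u_a > u_b and u_a > u_c:
            a_is_max += 1
            a_is_max_and_b_over_c += b_wins
    return a_is_max_and_b_over_c / a_is_max, b_over_c / n_draws
```

The unconditional probability has the closed form 1 − Φ(1/√2), which is approximately 0.240. The conditional estimate should come out higher, in line with the inequality stated above.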


Table 3G.2: Summary Statistics on Structural Parameters

                                          Mixed logit             Mixed probit
Variable                               Means   Std. devs.      Means   Std. devs.

Panel A. Bottled water

Price                                  2.174     0.682         1.190     0.405
Aquafina (24 ct.)a                     8.689     2.699         6.682     1.871
Ice Mtn. (24 ct.)a                     8.838     2.962         7.538     2.272
Nestle (24 ct.)a                       8.211     2.155         6.499     1.727
Pvt. lbl. purified water (24 ct.)a     8.622     2.252         6.688     1.837
Pvt. lbl. purified water (40 ct.)a    10.207     2.957         8.697     2.207
Pvt. lbl. spring water (24 ct.)a       8.284     2.540         7.154     1.908

Error correlation cluster                                      24-packs
Correlation parameter (𝜎)                                      0.0

Panel B. Flour

Price                                  2.298     0.475         1.555     0.450
All-purpose flour                      6.801     3.045         5.077     2.411
Bread flour                            4.781     3.043         3.359     2.653
Gold Medal brand                       0.099     2.811         0.615     2.038
King Arthur brand                      2.153     5.536         1.777     3.325
Log quantity                           1.693     1.046         1.574     0.846
Unbleached                             0.085     4.007        −0.663     2.238

Error correlation cluster                                      Bread flours
Correlation parameter (𝜎)                                      0.0

Notes: This table presents summary statistics for the nonparametrically estimated distributions
of random coefficients. To compare the mixed logit coefficients with the mixed probit ones,
divide the former by √1.6. The “error correlation clusters” in mixed probit consist of products
whose error terms are correlated. See Appendix 3E for details.

a Product-specific dummy.

that multinomial probit with uncorrelated errors—hereafter, “independent probit”—does not suffer

from the familiar independence of irrelevant alternatives (IIA) property displayed by conditional

logit (Paetz and Steiner 2017). Furthermore, independent probit relaxes the IIA property in a

systematic way. Consider a market with two goods: 𝐴 and 𝐵. Without loss of generality, assume

that good 𝐴 commands a larger choice share than does good 𝐵. Simulations performed by Paetz

and Steiner (2018) suggest that the introduction of a third good—say, 𝐶—will cause the choice

share of the less popular good (𝐵) to shrink more dramatically in percentage terms than the choice

share of the more popular good (𝐴).


The “Hit Rate.”—In Section 3.6.4, I compare mixed logit and mixed probit’s goodness of fit

based on the average predicted probability assigned to consumers’ observed choices. An alternative

measure of fit is the fraction of observations in which consumers’ observed choices are assigned

the highest predicted probability of any alternative—hereafter, the “hit rate.”
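The two measures of fit can be sketched as follows. Each choice situation is represented by a dictionary mapping alternatives to predicted probabilities, which is an illustrative layout. Ties for the highest predicted probability are resolved in favor of the first alternative listed, an assumption the definition above leaves open.

```python
def hit_rate(predicted, chosen):
    """Fraction of choice situations where the observed choice has the highest
    predicted probability. Ties go to the first alternative listed (assumption).
    """
    hits = 0
    for probs, choice in zip(predicted, chosen):
        best = max(probs, key=probs.get)
        hits += best == choice
    return hits / len(chosen)


def average_chosen_probability(predicted, chosen):
    """Average predicted probability assigned to the observed choices."""
    total = sum(probs[choice] for probs, choice in zip(predicted, chosen))
    return total / len(chosen)
```

The two measures can diverge: a model that assigns 51% to every observed choice attains a hit rate of one with only two alternatives, yet its average predicted probability remains close to one half.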

The discussion in the main text focuses on the average predicted probabilities of consumers’

observed choices—as opposed to the hit rate—for two reasons. First, the predicted probability of

the chosen product directly enters the likelihood function, whereas the “hit rate” does not. And

second, the predicted probability of the chosen product is more closely related to the product’s

(estimated) cross-price elasticities than the “hit rate” is.5

Table 3G.3 compares the hit rates of mixed logit and mixed probit. Using the “unconditional”

approach, mixed logit and mixed probit deliver extremely similar hit rates—irrespective of the

product category or the model selection strategy (i.e., within- versus out-of-sample). Using the

“conditional” approach, by contrast, mixed logit performs weakly better than mixed probit. The

sole exception is out-of-sample predictions about stockout substitutions within the product category

of flour. There, mixed probit’s hit rate exceeds that of mixed logit by 0.3 percentage points.

5The cross-price elasticity of good 𝑗 with respect to good 𝑗′ is defined as $(\partial s_j / \partial p_{j'})(p_{j'} / s_j)$, where 𝑠𝑗 denotes
the market share of good 𝑗. In this equation, 𝑠𝑗 is computed as the average predicted probability of 𝑗 being purchased
(across all the observed choice situations), while $\partial s_j / \partial p_{j'}$ is defined as marginal changes in the same. See Train
(2009).


Table 3G.3: “Hit Rate:” Mixed Logit versus Mixed Probit

                                            Bottled water           Flour
Data type                                  Mixed     Mixed      Mixed     Mixed
                                           logit     probit     logit     probit

Panel A. Within sample

In-store purchases and online orders
. . . using “unconditional” approacha      0.405     0.401      0.254     0.255
. . . using “conditional” approachb        0.719     0.717      0.686     0.669

Stockout substitutions
. . . using “unconditional” approacha      0.850     0.850      0.935     0.935
. . . using “conditional” approachb        0.952     0.952      0.966     0.960

Panel B. Out of sample

In-store purchases and online orders
. . . using “unconditional” approacha      0.432     0.423      0.305     0.305
. . . using “conditional” approachb        0.646     0.638      0.681     0.659

Stockout substitutions
. . . using “unconditional” approacha      0.892     0.892      0.924     0.924
. . . using “conditional” approachb        0.898     0.892      0.916     0.919

Notes: This table compares the “hit rates” of mixed probit and mixed logit models for the
product categories of bottled water and flour. The hit rate is defined as the fraction of choice
situations for which the model assigns the consumer’s observed choice the highest predicted
probability of any alternative. (See Sections 3.6.2 and 3.6.4 for details on estimation.)

a This is the posterior probability based on the (estimated) population distribution of random
coefficients.

b This yields the posterior probability of the purchase, conditional on the consumer’s observed
choices in the data.
