VALUATION OF PUBLIC GREAT LAKES BEACHES IN MICHIGAN
By
Min Chen

A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
Agricultural, Food, and Resource Economics – Doctor of Philosophy
2013

ABSTRACT
VALUATION OF PUBLIC GREAT LAKES BEACHES IN MICHIGAN
By
Min Chen
The objective of this dissertation is to measure the monetary values of public Great Lakes
beaches using the travel cost approach. To decide which econometric model to use,
Monte Carlo simulations were developed, and results showed that the nested logit model
was robust and reliable. To collect beach use data, a two-stage survey of over 29,000
people was conducted from 2011 to 2012. A mail survey was sent in 2011 to a random
sample drawn from Michigan’s driver license list to identify people who participated in
beach recreation. Respondents who said they had visited a Great Lakes beach since June 1,
2010 were invited to a follow-up web survey about trips to public Great Lakes beaches in
the summer of 2011. A repeated nested logit model with a participation hurdle was
estimated for the day trip data. The estimated beach recreation participation rate was 58%
for adults living in the Lower Peninsula of Michigan, and an estimated 20.9 million day trips
were taken by Michigan adults to public Great Lakes beaches in the summer of 2011. The
value of access to a public beach for a day trip was estimated to be $32-$39 per person
per trip in 2011 dollars. Access to all Lake Michigan public beaches in Michigan was
estimated to be worth over $400 million per season for day trips for adults living in
the Lower Peninsula of Michigan. To value long trips of four nights or more, a model was
developed allowing people to visit combinations of single and multiple sites on a trip.
The resulting values were about $53 per person per beach day for access to a site for a
trip of four nights or longer. The more common approach of using the main destination
for multi-site trips yields larger welfare measures than the approach permitting
combinations of multiple sites to be visited.

ACKNOWLEDGEMENTS

First of all, I would like to express my special gratitude and thanks to my major professor,
Dr. Frank Lupi. He is very creative, and it is a great pleasure to work with him. I have
learned a great deal over these six years, not only about research, but also about
communication skills and how to work in a team. Second, I want to thank the NOAA team
for their efforts on the large survey over one and a half years. Dr. Michael Kaplowitz was
very supportive in many aspects and provided practical advice from the perspective of a
lawyer. My co-worker, Scott Weicksel, was in charge of almost all the work related to survey
printing and mailing, pretesting, mail survey design and scanning. I really appreciated his
hard work, which contributed a lot to the good quality of our survey.
Also, I would like to thank my committee members, Dr. John Hoehn, Dr. Patricia
Norris and Dr. Jinhua Zhao, for their constructive comments and understanding; Scott
Knoche, Richard Melstrom, Tim Komarek and Tim Hodge, for their helpful advice on
study, research and work; and all peers who are or used to be in the department and all my
friends, for giving me such a great time at Michigan State!
Finally, I want to thank my family in China. I thank my parents and grandparents
for always being considerate and caring!


TABLE OF CONTENTS

LIST OF TABLES ............................................................................................................ vii
LIST OF FIGURES .......................................................................................................... xii
INTRODUCTION ...............................................................................................................1
Chapter 1
Relative Performance of the Latent Class Model Compared to the Conditional Logit and
Nested Logit Models for Environmental Valuation.............................................................3
1 Motivation .........................................................................................................................3
2 Models...............................................................................................................................6
2.1 Conditional Logit Model....................................................................................6
2.2 Nested Logit Model ...........................................................................................8
2.3 Latent Class Model ..........................................................................................10
3 Simulations .....................................................................................................................12
3.1 True Model-Latent Class Model ......................................................................12
3.1.1 Simulation Steps ...............................................................................12
3.1.2 Simulation Results ............................................................................16
3.2 True Model-Conditional Logit Model .............................................................26
3.3 True Model-Nested Logit Model .....................................................................30
3.3.1 Simulation Steps ...............................................................................30
3.3.2 Simulation Results ............................................................................32
3.4 Sensitivity Analyses .........................................................................................36
4 Discussion and Conclusions ...........................................................................................37
Chapter 2
Estimating Use Values of Public Great Lakes Beaches in Michigan ................................41
1 Motivation .......................................................................................................................41
2 Models.............................................................................................................................45
2.1 Random Utility Models....................................................................................45
2.2 Predicted Trips .................................................................................................52
2.3 Welfare Measures ............................................................................................52
3 Survey and Data ..............................................................................................................55
3.1 Surveys.............................................................................................................55
3.1.1 Screener Mail Survey ........................................................................55
3.1.2 Follow-Up Web Survey ....................................................................56
3.2 Data ..................................................................................................................58
3.3 Model Specification .........................................................................................62

4 Estimation Results ..........................................................................................................68
5 Discussion and Conclusions ...........................................................................................80
Chapter 3
Modeling Long Overnight Trips by Chaining Recreation Sites ........................................83
1 Motivation .......................................................................................................................83
2 Models.............................................................................................................................89
3 Data .................................................................................................................................97
4 Estimation Results ........................................................................................................104
5 Discussion and Conclusions .........................................................................................110
APPENDICES .................................................................................................................112
Appendix A: Results of Sensitivity Analyses for the Monte Carlo Simulations in Chapter
1........................................................................................................................................112
Appendix B: Comparison between Driver License List and Census Data ......................140
Appendix C: Data Weights ..............................................................................................142
Appendix D: Great Lakes Beach Recreation Participation..............................................155
Appendix E: Model Sensitivity in Chapter 3 ...................................................................160
REFERENCES ................................................................................................................162


LIST OF TABLES

Table 1: Simulating One’s Choice .....................................................................................14

Table 2: Performance of Latent Class Model When It Is the True Model ........................17

Table 3: Performance of Conditional Logit and Nested Logit Models When Latent Class
Model Is the True Model ....................................................................................................21

Table 4: Estimated Values of Marginal Quality Change of Latent Class Model When It Is
the True Model ...................................................................................................................23

Table 5: Estimated Site Values of Latent Class Model When It Is the True Model..........24

Table 6: Welfare Estimates of Conditional Logit and Nested Logit Models When Latent
Class Model Is the True Model ..........................................................................................25

Table 7: Performance of Latent Class Model When Conditional Logit Model Is the True
Model .................................................................................................................................27

Table 8: Performance of Conditional Logit and Nested Logit Models When Conditional
Logit Model Is the True Model ..........................................................................................28

Table 9: Welfare Measures of Conditional Logit, Nested Logit and Latent Class Models
When Conditional Logit Model Is the True Model ...........................................................29

Table 10: Performance of Latent Class Model When Nested Logit Model Is the True
Model .................................................................................................................................33

Table 11: Performance of Conditional Logit and Nested Logit Models When Nested Logit
Model Is the True Model ......................................................................................................34

Table 12: Welfare Measures of Conditional Logit, Nested Logit and Latent Class Models
When Nested Logit Model Is the True Model.........................................................................35

Table 13: Demographic Characteristics of Users, Potential Users and Nonusers .............59

Table 14: Full Information Maximum Likelihood (FIML) Estimation Results ................71

Table 15: Welfare Estimates of Changing a Beach in 2011 Dollars at Individual Level ..73

Table 16: Welfare Estimates of Changing a Beach in 2011 Dollars (Million) at State
Level ..................................................................................................................................74

Table 17: Estimated Trips and Welfare Changes of Closing All Beaches on a Great Lake
in 2011 Dollars...................................................................................................................75

Table 18: Examples of Literature Not Differentiating Overnight Trips from Day Trips ..84

Table 19: Studies Dealing with Overnight/Multiple-Objective/Multiple-Site Trips .........85

Table 20: Demographic Characteristics of Participants with Long Overnight Trips ......101

Table 21: Full Information Maximum Likelihood (FIML) Estimation Results ..............105

Table 22: Estimated Welfare Changes per Person in 2011 Dollars .................................106

Table 23: Estimation Results of Truncated Poisson Models ...........................................109

Table A-1: Performance of Latent Class Model When It Is the True Model ..................115

Table A-2: Performance of Conditional Logit and Nested Logit Models When Latent
Class Model Is the True Model ........................................................................................116

Table A-3: Estimated Values of Marginal Quality Change of Latent Class Model When It
Is the True Model .............................................................................................................116

Table A-4: Estimated Site Values of Latent Class Model When It Is the True Model ...117

Table A-5: Welfare Estimates of Conditional Logit and Nested Logit Models When
Latent Class Model Is the True Model ............................................................................117

Table A-6: Performance of Conditional Logit, Nested Logit and Latent Class Models
When Conditional Logit Model Is the True Model .........................................................119

Table A-7: Welfare Estimates of Conditional Logit, Nested Logit and Latent Models
When Conditional Logit Model Is the True Model .........................................................120

Table A-8: Performance of Conditional Logit, Nested Logit and Latent Class Models
When Nested Logit Model Is the True Model .................................................................122

Table A-9: Welfare Estimates of Conditional Logit, Nested Logit and Latent Models
When Nested Logit Model Is the True Model .................................................................123

Table A-10: Performance of Latent Class Model When It Is the True Model ................125

Table A-11: Performance of Conditional Logit and Nested Logit Models When Latent
Class Model Is the True Model ........................................................................................126

Table A-12: Estimated Values of Marginal Quality Change of Latent Class Model When
It Is the True Model .........................................................................................................127

Table A-13: Estimated Site Values of Latent Class Model When It Is the True Model .........128

Table A-14: Welfare Estimates of Conditional Logit and Nested Logit Models When
Latent Class Model Is the True Model ............................................................................129


Table A-15: Performance of Conditional Logit, Nested Logit and Latent Class Models
When Conditional Logit Model Is the True Model .........................................................131

Table A-16: Welfare Estimates of Conditional Logit, Nested Logit and Latent Models
When Conditional Logit Model Is the True Model .........................................................132

Table A-17: Performance of Conditional Logit, Nested Logit and Latent Class Models
When Nested Logit Model Is the True Model .................................................................134

Table A-18: Welfare Estimates of Conditional Logit, Nested Logit and Latent Models
When Nested Logit Model Is the True Model .................................................................135

Table A-19: Performance of Latent Class Model When It Is the True Model ................137

Table A-20: Performance of Conditional Logit and Nested Logit Models When Latent
Class Model Is the True Model ........................................................................................138

Table A-21: Estimated Values of Marginal Quality Change of Latent Class Model When
It Is the True Model .........................................................................................................138

Table A-22: Estimated Site Values of Latent Class Model When It Is the True Model .139

Table A-23: Welfare Estimates of Conditional Logit and Nested Logit Models When
Latent Class Model Is the True Model ............................................................................139

Table B-1: Age and Gender Distribution of Census and Driver License List in Michigan
for People Age 16 or Older ..............................................................................................141

Table B-2: Age and Gender Distribution of Census and Driver License List for People
Age 16 or Older, for the Upper Peninsula and Lower Peninsula.....................................141

Table C-1: Mail Survey Sample Weights for Counties in the Lower Peninsula .............144


Table C-2: Results of a Probit Response/Nonresponse Model for the Mail Survey Using
Sample Weights ...............................................................................................................146

Table C-3: Joint Age, Gender and County Distribution of Driver License List ..............147

Table C-4: Joint Age, Gender and County Distribution of 9,591 Eligible Mail Survey
Respondents .....................................................................................................................147

Table C-5: Mail Survey Respondent Weights .................................................................148

Table C-6: Results of a Probit Response/Nonresponse Model for the Web Survey Using
Mail Survey Respondent Weights ...................................................................................149

Table C-7: Results of a Probit Response/Nonresponse Model for the Web Survey Using
Mail Survey Respondent Weights With Fewer Variables ...............................................150

Table C-8: Raking Weights for Web Survey Respondents with No Missing Data
(Non-Normalized) .....................................................................................................................151

Table C-9: Raking Weights for Web Survey Respondents with Missing Data ...............153

Table C-10: Distribution of Normalized Final Weights for Web Respondents...............154

Table D-1: Participation in Leisure Activities .................................................................157

Table D-2: Factors Influencing Participation in Great Lakes Beach Visitation ..............159

Table E-1: Parameter Estimates of Main Destination Model with and without Regional
Dummies ..........................................................................................................................161


LIST OF FIGURES

Figure 1: Travel Cost Estimates over Some Iterations ......................................................19
Figure 2: Site Quality Estimates over Some Iterations ......................................................19
Figure 3: Decision Tree of Conditional Logit Model ........................................................46
Figure 4: Decision Tree of Two-Level Nested Logit Model .............................................46
Figure 5: Decision Tree with Participation/Nonparticipation............................................48
Figure 6: Public Great Lakes Beaches for Day Trips ........................................................60
Figure 7: GLOS Points on Great Lakes in Michigan .........................................................61
Figure 8: Decision Tree of Main-Destination Model ........................................................89
Figure 9: Decision Tree of Model Allowing Multiple Sites per Trip ................................90
Figure 10: Public Great Lakes Beaches Visited On Long Overnight Trips ......................99
Figure 11: Aggregated Beach Areas in the Long Overnight Trip Model ........................100
Figure 12: GLOS Points on Great Lakes in Michigan .....................................................100


INTRODUCTION

Michigan has the longest freshwater coastline in the United States, and large numbers of
people visit public Great Lakes beaches every year. Beach recreation not only facilitates
the economic development of coastal areas, but also brings welfare to the people who use
the beaches. Although there is generally no price for public beaches, they do have economic
use values. The objectives of this dissertation are to quantify the demand for beach recreation
and measure the associated use values through Random Utility Models (RUMs) with data from
two surveys. The outcomes of our work can be applied to benefit-cost analysis in the
decision-making process. In addition, the estimated demand model structure can be
transferred to other locations for valuation of freshwater beaches.
Within the widely used random utility modeling framework, there are several
types of econometric model specifications. The latent class model assumes heterogeneity
in preferences while the nested logit model captures similarity in alternatives. The
conditional logit model is the simplest, since preferences are assumed to be the same and
alternatives are independent. The first chapter investigates the relative performance of the
latent class model compared to the conditional logit and nested logit models. Monte
Carlo simulations are used to investigate model performance under several scenarios.
Results show that the latent class model does not always work as expected, and the nested
logit model is found to be more robust than the other two. Thus, the nested logit RUM
is applied in chapters 2 and 3.


The second chapter estimates use values of public Great Lakes beaches. A mail
survey on the leisure activities of Michigan residents was conducted to identify who did
and did not participate in Great Lakes beach recreation. People who participated were
then recruited to a web survey about their trips to public Great Lakes beaches for an
entire summer season. Day trip data were used in a nested logit model to produce
estimates of the value of Great Lakes beach use. Unlike in most of the literature, nonusers,
those who had not visited Great Lakes beaches in the past two years, also enter the model to
test how this alters the way that results are generalized to the population.
The third chapter models multiple-day recreation trips by chaining recreation sites.
In the recreation demand literature, multiple-day trips are rarely modeled, but when they
are, the traditional way of modeling these trips is to assume only the primary destination
is visited (for the trips with more than one destination). In our web survey, participants
who took overnight trips of four days or more were asked to report on the multiple beaches
they visited on one randomly selected trip, which makes it possible to relax the
traditional single-site assumption and allow for visitation of a second beach on overnight
trips. The results are compared to those from the traditional model to see if the added
complexity and survey cost are warranted.


Chapter 1
Relative Performance of the Latent Class Model Compared to the Conditional Logit and
Nested Logit Models for Environmental Valuation

1 Motivation
Random Utility Models (RUMs) have been widely applied to recreation demand analysis
and valuation. Within this framework, according to Train (2003), different distribution
assumptions lead to different models such as conditional logit, nested logit (generalized
extreme value), probit and mixed logit models. Preferences over attributes are the same
across individuals in the conditional logit and nested logit models, while alternatives in
the choice set can be correlated in the latter. The probit model assumes normally
distributed errors. The mixed logit model is the most general. Random parameter (or mixed
logit) models and latent class models are both frequently used to model preference
heterogeneity. The random parameter model imposes distributional assumptions over
individual preferences. The latent class model assumes there are a number of latent groups
in the population, and people in different groups have different preferences. It can be
treated as a discrete and semi-parametric version of the random parameter model (Greene
and Hensher (2003)). Although not as flexible, the latent class model may have more power
in interpretation since it can link demographic characteristics to heterogeneous
preferences. For example, young people may value water quality more than older people and
care less about travel distance, as they are more likely to have contact with water. Hence,
many studies have valued recreation activities through the latent class model (Boxall and
Adamowicz (2002), Scarpa and Thiene (2005), Morey et al (2006), Owen and Videras (2007),
Patunru et al (2007), Scarpa et al (2007), Burton and Rigby (2009)).
Several studies have investigated how the latent class model performs against
others. Greene and Hensher (2003) compared the latent class and random parameter
models through an empirical data set from a stated choice experiment. They evaluated
willingness to pay indicators and elasticity and concluded that one was not absolutely
better than the other. Each model had its advantages and disadvantages. In their data set,
they found the latent class model was preferred statistically. Provencher and Bishop
(2004) examined the forecasting ability of the logit, random parameter and latent class
models based on salmon fishing on Lake Michigan. They showed that the latter two
perform equally well in trip prediction, and for other measures, the logit model could
have more reliable results. Hynes, Hanley and Scarpa (2008) studied preference
heterogeneity of kayakers using the latent class and random parameter models, and stated
that the latent class model might provide better interpretation. Kosenius (2010) analyzed
water quality data with the multinomial logit, random parameter and latent class models.
The author elucidated that when there were correlations among alternatives, the random
parameter model had a better fit to the data than the multinomial logit model. The latent
class model used demographic information to explain the heterogeneity in preferences.
Nonetheless, using real data, it is hard to tell whether or not the latent class model
can successfully recover the true preferences, because those true values are not known. In
the literature applying the latent class model to different areas, it is not uncommon to see
the estimated preference in one class be more than 10 times that of another class (Scarpa
and Thiene (2005), Train (2008), etc.). It is possible that discrepancies in preferences
among people are large, but it may also be that the model has drawbacks. A model that
cannot correctly reflect the real preferences could be misleading in empirical studies.
Therefore, in this chapter, Monte Carlo simulations are employed to test the reliability of
the latent class model, where the truth is known, and compare its performance to the
conditional logit and nested logit models in the context of environmental valuation. The
random parameter model is not under investigation as several studies above have
demonstrated that it performs similarly to the latent class model. The model that proves
most robust will be used for valuation in the following two chapters.


2 Models
The utility from visiting a recreation site can be expressed as:

$$U_{nj} = X_{nj}\beta + \varepsilon_{nj}$$

where subscripts n and j denote individuals and sites. The construction of the covariate
matrix X depends on the specific model. It can include variables only varying across sites,
like site characteristics, variables only varying across people, like demographic variables,
variables varying across both sites and people, like travel cost, and their interaction terms.
The parameter vector β reflects people’s preferences. It can be fixed for all people or
differ across groups. The random term ε represents individual and site factors influencing
utilities that researchers cannot observe.
Based on the utility equation, a person will go to the site that generates the highest
utility in his/her choice set. Since individual errors cannot be observed from the
perspective of researchers, each site has a probability of being visited. Different models
have different expressions for the probability because of different distribution
assumptions about the errors. Maximum likelihood estimation searches for the parameter
values that maximize the joint probability of the observed choices. Welfare measures of site
loss or characteristic change can then be computed from the parameter estimates.

2.1 Conditional Logit Model


The conditional logit model assumes that the errors are independent and follow a Type I
extreme value distribution. The parameters are constants, and variables that are invariant
to sites must be excluded or interacted with variables that vary across sites. Following
Chapter 3 of Train (2003), the probability that person n visits site j is:

$$P_n(j) = \frac{\exp(X_{nj}\beta)}{\sum_{k=1}^{J} \exp(X_{nk}\beta)}$$

Let $y_{nj}$ be the binary variable indicating people’s choices, equal to 1 if person n
chooses site j and 0 otherwise. The log-likelihood function is:

$$LL(\beta) = \sum_{n=1}^{N} \sum_{j=1}^{J} y_{nj} \ln\left(P_n(j)\right)$$

where N is the total number of people and J is the total number of sites.
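
As an illustration of the choice probability and log-likelihood described above, the following Python sketch evaluates both for a very small data set. It is a minimal sketch only: the function names, the two-person data set and the parameter values are hypothetical and are not the data or estimates used in this dissertation.

```python
import math

def conditional_logit_probs(x_n, beta):
    """Choice probabilities for one person.

    x_n:  list of J attribute vectors, one per site (hypothetical layout:
          [travel cost, site quality]).
    beta: list of coefficients, one per attribute.
    """
    v = [sum(b * x for b, x in zip(beta, x_j)) for x_j in x_n]
    v_max = max(v)  # subtract the maximum for numerical stability
    exp_v = [math.exp(v_j - v_max) for v_j in v]
    total = sum(exp_v)
    return [e / total for e in exp_v]

def log_likelihood(X, y, beta):
    """Sum over people of ln P_n(chosen site).

    X: list over people of per-site attribute vectors.
    y: list over people of the index (0-based) of the chosen site.
    """
    return sum(
        math.log(conditional_logit_probs(x_n, beta)[j]) for x_n, j in zip(X, y)
    )

# Hypothetical data: 2 people, 3 sites, attributes = (travel cost, site quality).
X = [
    [[10.0, 3.0], [20.0, 5.0], [30.0, 4.0]],
    [[25.0, 3.0], [15.0, 5.0], [35.0, 4.0]],
]
y = [0, 1]           # person 1 chose site 1, person 2 chose site 2
beta = [-0.1, 0.5]   # illustrative values only

probs = conditional_logit_probs(X[0], beta)
ll = log_likelihood(X, y, beta)
```

A maximum likelihood routine would search over `beta` to make `ll` as large as possible; the sketch only evaluates the objective at one candidate value.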
Because the model measures use value, person n only cares about the site he/she
visits, so only a loss of the chosen site or a change at that site (if it is small enough not to
affect the original choice) affects this person’s welfare. Suppose person n chooses site g;
the loss of other sites or any changes at other sites are then of no value to him/her. When
site g is closed, person n has to go to the site that gives the second highest utility, say
site f; then the reduction in utility is $(U_{ng} - U_{nf})$, and the monetary loss is
$(U_{ng} - U_{nf})/\beta_y$, where $\beta_y$ is the marginal utility of income, the absolute
value of the travel cost parameter. When a marginal change happens at site j, the change in
utility for visitors is $\beta_l$, the parameter of site characteristic l, and
$\beta_l/\beta_y$ is its monetary value.

From a researcher’s point of view, however, uncertainty exists due to the error
term. Each site has a probability of being visited by anyone. Thus, those probabilities
need to be taken into account in welfare estimates. According to Chapter 8 of Haab and
McConnell (2002), the estimated welfare change for person n caused by the loss of site j
is:

$$\ln\left(1 - \hat{P}_n(j)\right) / \hat{\beta}_y$$

and the estimated value of a marginal change in site characteristic l of site j is:

$$\hat{P}_n(j)\left(\hat{\beta}_l / \hat{\beta}_y\right)$$

where $\hat{\beta}_l$ and $\hat{\beta}_y$ are estimates of $\beta_l$ and $\beta_y$. The
calculation applies to all sites, j=1, 2, …, J.
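
The two conditional logit welfare measures can be sketched in Python as follows. This is a sketch under stated assumptions: the function names, the utilities and the parameter values are hypothetical, with `beta_y` standing for the marginal utility of income and `beta_q` for the site quality parameter. The helper `log_sum` is included only so that the site loss formula can be compared with the change in the log of the sum of exponentiated utilities.

```python
import math

def logit_probs(v):
    """Conditional logit probabilities from a list of deterministic utilities."""
    v_max = max(v)
    exp_v = [math.exp(x - v_max) for x in v]
    total = sum(exp_v)
    return [e / total for e in exp_v]

def log_sum(v):
    """ln(sum_j exp(V_j)), computed in a numerically stable way."""
    v_max = max(v)
    return v_max + math.log(sum(math.exp(x - v_max) for x in v))

def site_loss_value(p_j, beta_y):
    """Welfare change (non-positive) from losing a site visited with probability p_j."""
    return math.log(1.0 - p_j) / beta_y

def marginal_change_value(p_j, beta_l, beta_y):
    """Value of a marginal change in characteristic l at a site visited with probability p_j."""
    return p_j * beta_l / beta_y

# Hypothetical utilities for 3 sites and illustrative parameter values.
v = [0.5, 0.5, -1.0]
beta_y, beta_q = 0.1, 0.5

probs = logit_probs(v)
loss_site_0 = site_loss_value(probs[0], beta_y)
quality_value_site_0 = marginal_change_value(probs[0], beta_q, beta_y)
```

Closing the first site removes its term from the sum of exponentiated utilities, so `loss_site_0` equals `(log_sum(v[1:]) - log_sum(v)) / beta_y`.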

2.2 Nested Logit Model
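Consider the simplest form, a two-level nested logit model, where the choice set is
divided into several nests based on site similarities. Within one nest, errors are correlated;
for two sites in different nests, errors are still independent. Following Chapter 4 of Train
(2003), the probability that a site is visited becomes:

$$P_n(j) = \frac{\exp(X_{nj}\beta/\lambda_k)\left(\sum_{i \in B_k} \exp(X_{ni}\beta/\lambda_k)\right)^{\lambda_k - 1}}{\sum_{m=1}^{K}\left(\sum_{i \in B_m} \exp(X_{ni}\beta/\lambda_m)\right)^{\lambda_m}}$$

where site j belongs to nest k, $B_k$ is the set of sites in nest k, K is the number of
nests, and $\lambda_k$ measures the degree of independence in errors among sites in nest k.
This parameter is normally assumed to be the same across all nests, so we will replace
$\lambda_k$ with $\lambda$.

Compared with the conditional logit model, the calculation of estimated welfare
change from the loss of a site in the nested logit model is slightly more complicated. The
probability that person n chooses a site in nest k is:

$$P_n(k) = \frac{\left(\sum_{i \in B_k} \exp(X_{ni}\beta/\lambda)\right)^{\lambda}}{\sum_{m=1}^{K}\left(\sum_{i \in B_m} \exp(X_{ni}\beta/\lambda)\right)^{\lambda}}$$

And the probability that person n chooses site j conditional on the fact that nest k is
chosen is:

$$P_n(j|k) = \frac{\exp(X_{nj}\beta/\lambda)}{\sum_{i \in B_k} \exp(X_{ni}\beta/\lambda)}$$

According to Chapter 8 of Haab and McConnell (2002), the estimated welfare
change due to closure of site j is:

$$\frac{1}{\hat{\beta}_y}\ln\left(\left(1 - \hat{P}_n(j|k)\right)^{\hat{\lambda}}\hat{P}_n(k) + \left(1 - \hat{P}_n(k)\right)\right)$$

where $\hat{\beta}_y$ is the estimated marginal utility of income, the absolute value of the
estimated travel cost parameter.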

The estimated value of a marginal change on site characteristics has the same expression
as in the conditional logit model where the relevant site choice probabilities are from the
above nested logit formulas.
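
The nested logit probabilities and the site closure measure can be sketched in Python as follows, assuming a common dissimilarity parameter across nests. The function names, the nest structure, the utilities and the parameter values are hypothetical. The helper `inclusive_value` is included only so that the closure formula can be compared with the direct change in the log of the nested sum.

```python
import math

def nest_sums(v, nests, lam):
    """For each nest, the sum over its sites of exp(V_j / lambda)."""
    return [sum(math.exp(v[j] / lam) for j in nest) for nest in nests]

def nested_logit_probs(v, nests, lam):
    """Return (nest probabilities, conditional site probabilities keyed by site index)."""
    sums = nest_sums(v, nests, lam)
    denom = sum(s ** lam for s in sums)
    p_nest = [s ** lam / denom for s in sums]
    p_cond = {}
    for k, nest in enumerate(nests):
        for j in nest:
            p_cond[j] = math.exp(v[j] / lam) / sums[k]
    return p_nest, p_cond

def inclusive_value(v, nests, lam):
    """ln of the sum over nests of (nest sum)^lambda."""
    return math.log(sum(s ** lam for s in nest_sums(v, nests, lam)))

def site_closure_value(p_k, p_j_given_k, lam, beta_y):
    """Welfare change (non-positive) from closing site j in nest k."""
    return math.log(p_k * (1.0 - p_j_given_k) ** lam + (1.0 - p_k)) / beta_y

# Hypothetical setup: sites 0 and 1 share a nest, site 2 is alone in its nest.
v = [0.5, 0.5, -1.0]
nests = [[0, 1], [2]]
lam, beta_y = 0.5, 0.1

p_nest, p_cond = nested_logit_probs(v, nests, lam)
p_site_0 = p_nest[0] * p_cond[0]  # unconditional probability of site 0
closure_site_0 = site_closure_value(p_nest[0], p_cond[0], lam, beta_y)
```

Setting `lam = 1.0` makes the errors independent, in which case `p_nest[k] * p_cond[j]` reduces to the conditional logit probability and the closure value reduces to the conditional logit site loss formula.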

2.3 Latent Class Model
The latent class model relies on the assumption that people's preferences differ and that people can be categorized into different classes, each having its own set of parameters. Individuals know which class they are in, but researchers do not. Within one class, people behave exactly as in the conditional logit model. From the researcher's perspective, a person belongs to each class with some probability, so the probability that person n chooses site j is the weighted average of the conditional logit probabilities across all classes.
In Chapter 6 of Train (2003), the probabilities of membership in each class are the same for all people and are equal to the population shares of the classes. Suppose there are C classes in total; the choice probability is:

$$\Pr_n(j) = \sum_{c=1}^{C} s_c \frac{\exp(V^{c}_{nj})}{\sum_{i=1}^{J} \exp(V^{c}_{ni})}$$

where $V^{c}_{nj}$ is the deterministic utility of site j for person n evaluated at the parameters of class c. The shares $s_c$, c=1, 2, …, C can be estimated together with the class-specific parameters $\beta_c$, c=1, 2, …, C.

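Instead of fixed shares, researchers may assume that the probability of membership in class c has a multinomial logit form and can be predicted from individual information:

$$\pi_{nc} = \frac{\exp(\gamma_c' z_n)}{\sum_{d=1}^{C} \exp(\gamma_d' z_n)}$$

where $z_n$ is a vector of individual characteristics, and $\gamma_c$ is a vector of parameters specific to class c, which can be estimated together with the class-specific preference parameters $\beta_c$. The choice probability in this case becomes:

$$\Pr_n(j) = \sum_{c=1}^{C} \pi_{nc} \frac{\exp(V^{c}_{nj})}{\sum_{i=1}^{J} \exp(V^{c}_{ni})} = \sum_{c=1}^{C} \left(\frac{\exp(\gamma_c' z_n)}{\sum_{d=1}^{C} \exp(\gamma_d' z_n)}\right)\left(\frac{\exp(V^{c}_{nj})}{\sum_{i=1}^{J} \exp(V^{c}_{ni})}\right)$$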
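To make the two weighting schemes concrete, the sketch below computes latent class choice probabilities first with fixed shares and then with multinomial logit membership probabilities. It is only an illustration: the individual characteristics `z`, the membership parameters `gammas` and all function names are hypothetical, while the preference parameters are the two-class values used later in the simulations.

```python
import math

def softmax(values):
    """Logit probabilities from a list of utilities or membership indices."""
    m = max(values)
    expv = [math.exp(x - m) for x in values]
    total = sum(expv)
    return [e / total for e in expv]

def membership_probs(z, gammas):
    """Multinomial logit class membership; gammas[c] is the parameter vector of class c."""
    return softmax([sum(g * x for g, x in zip(gamma, z)) for gamma in gammas])

def latent_class_probs(travel_costs, qualities, betas, weights):
    """Weighted average of class-specific conditional logit probabilities.

    betas[c] = (beta_tc, beta_q); weights[c] = share or membership probability of class c.
    """
    n_sites = len(travel_costs)
    mixed = [0.0] * n_sites
    for (b_tc, b_q), w in zip(betas, weights):
        p = softmax([b_tc * tc + b_q * q for tc, q in zip(travel_costs, qualities)])
        for j in range(n_sites):
            mixed[j] += w * p[j]
    return mixed

betas = [(-0.06, 0.49), (-0.10, 0.21)]
tc, q = [7.79, 61.90, 31.95], [1.02, 0.64, 1.71]

# Fixed shares: the same weights for every person
fixed = latent_class_probs(tc, q, betas, [0.7, 0.3])

# Hypothetical individual characteristics (constant plus one covariate);
# the parameters of the second class are normalized to zero
z = [1.0, 0.4]
gammas = [[0.5, 1.0], [0.0, 0.0]]
pi = membership_probs(z, gammas)
individual = latent_class_probs(tc, q, betas, pi)
```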

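According to Boxall and Adamowicz (2002), the estimated welfare measures are calculated in a way similar to what has been discussed above. Each measure is an average of the welfare estimates from each class, weighted by the corresponding estimated shares or predicted probabilities of membership in each class. With estimated shares $\hat{s}_c$, the site-loss and marginal-change measures are:

$$\sum_{c=1}^{C} \hat{s}_c \frac{\ln\left(1 - \widehat{\Pr}{}^{c}_n(j)\right)}{\hat{\beta}^{c}_{tc}}, \qquad \sum_{c=1}^{C} \hat{s}_c \, \widehat{\Pr}{}^{c}_n(j) \frac{\hat{\beta}^{c}_l}{\hat{\beta}^{c}_{tc}}$$

and with predicted membership probabilities $\hat{\pi}_{nc}$ they are:

$$\sum_{c=1}^{C} \hat{\pi}_{nc} \frac{\ln\left(1 - \widehat{\Pr}{}^{c}_n(j)\right)}{\hat{\beta}^{c}_{tc}}, \qquad \sum_{c=1}^{C} \hat{\pi}_{nc} \, \widehat{\Pr}{}^{c}_n(j) \frac{\hat{\beta}^{c}_l}{\hat{\beta}^{c}_{tc}}$$

where $\widehat{\Pr}{}^{c}_n(j)$ is the conditional logit probability that person n visits site j, evaluated at the estimated parameters of class c.
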

3 Simulations
Monte Carlo analysis is used to compare the three econometric specifications. Three scenarios are constructed, in which the data generating process follows the latent class, conditional logit and nested logit models respectively. Under each scenario, all three models are estimated on the pseudo data. It is assumed that there are 3 sites and 1,000 people, and that the utility equation contains two explanatory variables, travel cost and site quality.

3.1 True Model-Latent Class Model
3.1.1 Simulation Steps
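Suppose there are two classes, with 700 people in the first class and 300 people in the second class. Let the true parameters and shares of the two classes be:

$$\beta_{tc,1} = -0.06, \quad \beta_{q,1} = 0.49, \quad s_1 = 0.7; \qquad \beta_{tc,2} = -0.10, \quad \beta_{q,2} = 0.21, \quad s_2 = 0.3$$

where $\beta_{tc,1}$ and $\beta_{q,1}$ are set to match the model estimates reported in Chapter 8 of Haab and McConnell (2002), and $\beta_{tc,2}$ and $\beta_{q,2}$ are assigned so that there is an obvious distinction between the two classes. The Monte Carlo simulation steps are as follows [1]: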
(Step 1) Take 3,000 random draws uniformly over the range from 0 to 100 as the travel cost variable, since it varies across both sites and people. Take 3 uniform random draws from 0 to 2 for the quality variable, which varies only across sites. Next, produce random errors for the 1,000 people from a Type I extreme value distribution with a normalized variance of $\pi^2/6$. From Chapter 9 of Train (2003), the cumulative distribution function for $\varepsilon$ is $F(\varepsilon) = \exp(-\exp(-\varepsilon))$, and its inverse function is $\varepsilon = -\ln[-\ln(F(\varepsilon))]$. Since $F(\varepsilon)$ falls between 0 and 1, we take random draws from a (0, 1) uniform distribution first and then use the inverse CDF to compute the corresponding random numbers for $\varepsilon$ (Train 2003).

(Step 2) For the 700 people who are in the first class, take their travel costs, site quality and errors to compute their utilities. For each person, pick the maximum among the three site utilities and mark it as one and the others as zero; this gives the pseudo observation for the chosen site. Table 1 shows an example of one person's randomly generated travel costs and site quality for each of the three sites. The resulting utilities imply that site 1 is the best for this person. The same approach is applied to the 300 people who are in the second class, but with different parameters in the utility equation. The choices of all 1,000 people form the data for the dependent variable.
(Step 3) Compute the true welfare measures. Since we know exactly which class each person belongs to, the calculation of individual welfare measures is the same as in the conditional logit model. Averaging site values and values of marginal quality change over the 700 people in class 1 and the 300 people in class 2 produces the true welfare measures for class 1 and class 2; averaging them over all 1,000 people produces the population's true welfare measures for each site.

[1] Simulations are programmed in R.

Table 1: Simulating One's Choice

| Site | Travel Cost | Quality | Error | Utility | Observation |
|---|---|---|---|---|---|
| 1 | 7.79 | 1.02 | -0.12 | -0.09 | 1 |
| 2 | 61.90 | 0.64 | 0.54 | -2.86 | 0 |
| 3 | 31.95 | 1.71 | 0.62 | -0.46 | 0 |

(Step 4) Regress site choices on the two explanatory variables (travel cost and site quality) to get the estimated parameters, using the conditional logit, nested logit and latent class models. When estimating the nested logit model [2], we try three nest structures: sites 1 and 2 as a nest, sites 2 and 3 as a nest, and sites 1 and 3 as a nest. Since our objective is to see whether the latent class model recovers the truth, we set the number of classes to two in the estimation, the same as the truth [3]. Also, following Scarpa and Thiene (2005), we assume the probabilities of membership in each class are:

$$s = \frac{\exp(\theta)}{1 + \exp(\theta)}, \qquad 1 - s = \frac{1}{1 + \exp(\theta)}$$

which is equivalent to fixed shares, s and 1-s. The share needs to be between 0 and 1, and the expressions above embed this constraint in the estimation process.

[2] The starting values are based on the conditional logit model estimates. For the travel cost parameter, it is the estimate minus or plus 0.01; for the quality parameter, it is the estimate minus or plus 0.1. BFGS is used to locate the maximum likelihood estimates.

(Step 5) The estimated welfare measures are then derived from those parameter estimates. For the latent class model, individual welfare estimates are averages over the classes weighted by the estimated shares. To make comparisons with the true values, since we know that 700 people are in class 1 and 300 people are in class 2, we take the means over the former as the welfare estimates for class 1 and the means over the latter as those for class 2. The means over all 1,000 people are compared to the population's true welfare measures.
(Step 6) Repeat the last part of Step 1, which generates new errors while keeping the explanatory variables the same, and Steps 2 to 5, 1,000 times. We then have a random sample of size 1,000 for each set of estimates. For each sample, compute descriptive statistics such as the mean, median, variance, quartiles and mean squared error (MSE).

[3] When estimating the latent class model, how many classes to consider is an important issue. Train (2008) illustrated how the EM algorithm can estimate parameters with three types of discrete distributions. With the latent class model, the author tried numbers of segments varying from 1 to 30 and found that 8 classes (indicated by the Bayesian Information Criterion) and 25 classes (indicated by the Akaike Information Criterion) worked best for that specific data set.

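The data-generating part of the steps above (Steps 1 and 2) can be sketched in Python as follows. The original simulations were programmed in R, so this is an illustrative re-expression rather than the author's code; the function names and the seed are hypothetical, while the parameter values, the sample sizes and the Table 1 example come from the text.

```python
import math
import random

def gumbel_draw(rng):
    """Type I extreme value draw via the inverse CDF: eps = -ln(-ln(u)), u ~ U(0, 1)."""
    u = rng.random()
    while u == 0.0:  # guard against ln(0)
        u = rng.random()
    return -math.log(-math.log(u))

def utilities(travel_costs, qualities, errors, beta_tc, beta_q):
    """Utility of each site: beta_tc * travel cost + beta_q * quality + error."""
    return [beta_tc * tc + beta_q * q + e
            for tc, q, e in zip(travel_costs, qualities, errors)]

def choice_indicator(utils):
    """Mark the site with the maximum utility as one and the others as zero."""
    best = max(range(len(utils)), key=lambda j: utils[j])
    return [1 if j == best else 0 for j in range(len(utils))]

def simulate_choices(n_people=1000, n_class1=700, n_sites=3, seed=0):
    rng = random.Random(seed)
    quality = [rng.uniform(0, 2) for _ in range(n_sites)]  # varies across sites only
    travel_cost = [[rng.uniform(0, 100) for _ in range(n_sites)]
                   for _ in range(n_people)]  # varies across people and sites
    params = [(-0.06, 0.49), (-0.10, 0.21)]  # true (travel cost, quality) parameters
    choices = []
    for n in range(n_people):
        beta_tc, beta_q = params[0] if n < n_class1 else params[1]
        eps = [gumbel_draw(rng) for _ in range(n_sites)]
        u = utilities(travel_cost[n], quality, eps, beta_tc, beta_q)
        choices.append(choice_indicator(u))
    return travel_cost, quality, choices

# The example person in Table 1, evaluated with the class 1 parameters
example = utilities([7.79, 61.90, 31.95], [1.02, 0.64, 1.71],
                    [-0.12, 0.54, 0.62], -0.06, 0.49)
example_choice = choice_indicator(example)

travel_cost, quality, choices = simulate_choices()
```

Rounded to two decimals, `example` reproduces the utilities in Table 1, and `example_choice` marks site 1 as the chosen site. In Step 6 only the error draws would be regenerated in each iteration, with the travel costs and qualities held fixed.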

3.1.2 Simulation Results
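With two classes, the probability expression of the latent class model is:

$$\Pr_n(j) = s \, \frac{\exp(V^{1}_{nj})}{\sum_{i=1}^{J} \exp(V^{1}_{ni})} + (1 - s) \, \frac{\exp(V^{2}_{nj})}{\sum_{i=1}^{J} \exp(V^{2}_{ni})}$$

where $V^{1}_{nj}$ and $V^{2}_{nj}$ are the deterministic utilities of site j for person n evaluated at the parameters of class 1 and class 2.
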
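A minimal sketch of the corresponding two-class log-likelihood is given below, using the logistic transformation of the share so that it stays between 0 and 1. The toy data, the name `theta` for the share parameter and the function names are assumptions made for illustration; they are not taken from the original estimation code.

```python
import math

def cl_probs(travel_costs, qualities, b_tc, b_q):
    """Conditional logit probabilities for one person."""
    v = [b_tc * tc + b_q * q for tc, q in zip(travel_costs, qualities)]
    m = max(v)
    expv = [math.exp(x - m) for x in v]
    total = sum(expv)
    return [e / total for e in expv]

def share(theta):
    """Share of class 1: exp(theta) / (1 + exp(theta)), always between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-theta))

def two_class_loglik(params, travel_cost, quality, chosen):
    """Log-likelihood of the two-class latent class model with a fixed share."""
    b_tc1, b_q1, b_tc2, b_q2, theta = params
    s = share(theta)
    ll = 0.0
    for tc_n, j in zip(travel_cost, chosen):
        p1 = cl_probs(tc_n, quality, b_tc1, b_q1)[j]
        p2 = cl_probs(tc_n, quality, b_tc2, b_q2)[j]
        ll += math.log(s * p1 + (1.0 - s) * p2)
    return ll

# Hypothetical toy data: travel costs of 4 people for 3 sites and chosen site indices
travel_cost = [[7.79, 61.90, 31.95], [40.0, 12.5, 80.0],
               [55.0, 60.0, 5.0], [20.0, 25.0, 30.0]]
quality = [1.02, 0.64, 1.71]
chosen = [0, 1, 2, 2]

theta = math.log(0.7 / 0.3)  # corresponds to a class 1 share of 0.7
ll = two_class_loglik((-0.06, 0.49, -0.10, 0.21, theta), travel_cost, quality, chosen)
# Swapping the two classes (and the sign of theta) leaves the likelihood unchanged
ll_swapped = two_class_loglik((-0.10, 0.21, -0.06, 0.49, -theta),
                              travel_cost, quality, chosen)
```

The equality of `ll` and `ll_swapped` illustrates why the order of the two terms in the likelihood carries no information about which set of estimates belongs to which class.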

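Table 2: Performance of Latent Class Model When It Is the True Model [4]

| Estimate | True | Mean | Var. | MSE | Min. | 1st Quartile | Median | 3rd Quartile | Max. |
|---|---|---|---|---|---|---|---|---|---|
| Travel cost, class 1 | -0.06 | -0.20 | 0.704 | 0.723 | -8.97 | -0.069 | -0.063 | -0.056 | 2.78 |
| Site quality, class 1 | 0.49 | 1.84 | 276.47 | 278.02 | -222.3 | 0.40 | 0.48 | 0.64 | 142.7 |
| Travel cost, class 2 | -0.10 | -0.46 | 2.10 | 2.23 | -8.83 | -0.14 | -0.075 | -0.068 | 6.89 |
| Site quality, class 2 | 0.21 | 0.36 | 29.38 | 29.38 | -41.09 | 0.14 | 0.36 | 0.45 | 62.98 |
| Share of class 1 [5] | 0.70 | 0.49 | 0.069 | 0.112 | 0.007 | 0.35 | 0.50 | 0.71 | 0.99 |
| Ratio (site quality / travel cost), class 1 | -8.17 | -9.16 | 65679.5 | 65614.5 | -1940 | -10.42 | -7.44 | -6.13 | 7318.0 |
| Ratio (site quality / travel cost), class 2 | -2.10 | -3.34 | 12.49 | 14.02 | -10.62 | -6.08 | -4.44 | -1.04 | 9.59 |
| Travel cost, average [6] | -0.07 | -0.147 | 0.061 | 0.066 | -1.98 | -0.093 | -0.075 | -0.068 | -0.024 |
| Site quality, average [7] | 0.41 | 0.51 | 2.08 | 2.09 | -9.45 | 0.35 | 0.42 | 0.49 | 15.44 |
| Ratio, average [8] | -6.35 | -6.12 | 70.03 | 70.01 | -42.6 | -6.83 | -6.28 | -5.65 | 246.9 |

[4] Iterations in which the estimation fails to converge are excluded. The results come from the remaining 996 iterations.
[5] This is computed from the estimated parameter of the membership probability expression, and it matches the class 1 estimates.
[6] It is the estimated travel cost parameter on average, weighted by $\hat{s}$ and $1-\hat{s}$.
[7] It is the estimated site quality parameter on average, weighted by $\hat{s}$ and $1-\hat{s}$.
[8] It is the average of the ratios of estimated parameters in the two classes, weighted by $\hat{s}$ and $1-\hat{s}$.
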

The two terms are interchangeable, i.e. we cannot tell which set of estimates is for class 1 and which is for class 2 simply from the order in which they appear in the log-likelihood function. The true parameter ratios of class 1 and class 2 are -8.17 and -2.10 respectively. Thus, we take the ratios of the two sets of estimated parameters, and treat the set whose ratio is larger in absolute value as the class 1 estimates. Table 2 shows the descriptive statistics of the class 1 estimates, the class 2 estimates, their weighted averages, and the estimated shares. For the parameter estimates and their averages, the medians are much closer to the true values than the means, which are influenced by extreme values. Variances and mean squared errors (MSE) are also affected by extreme values. The class 1 estimates perform better than the class 2 estimates, which may be attributed to the larger number of people in class 1. Judging by the interquartile ranges, the estimates are somewhat acceptable half of the time; but still, for the travel cost estimate of class 1, which is the least biased, the range is around +/- 10%; for the site quality estimate of class 1, the range grows to +/- 20%.

The share of class 1 is underestimated by 28.5%. On average, though, the latent class model performs well: there is a 4.2% downward bias in the travel cost estimate, a 2.2% upward bias in the site quality estimate and a 1.1% upward bias in the ratio. Variances and MSEs are much smaller. Although extreme values are estimated for some of the preference parameters within the classes, these extreme values receive a weight close to 0 because the class probability becomes close to 0, as shown in Figures 1 and 2. In fact, it is the weighted average of probabilities that enters the likelihood function, so the latent class model works well on average.
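The labelling rule and the share-weighted averages can be sketched as follows. This is a hedged illustration with hypothetical function names; each set of estimates is assumed to be a (travel cost, site quality) pair, and the inputs shown are the true parameter values rather than actual estimates.

```python
def label_classes(est_a, est_b, share_a):
    """Treat the set with the larger |quality / travel cost| ratio as class 1.

    est_a and est_b are (beta_tc, beta_q) pairs; share_a is the share attached to est_a.
    Returns (class 1 estimates, class 2 estimates, share of class 1).
    """
    ratio_a = est_a[1] / est_a[0]
    ratio_b = est_b[1] / est_b[0]
    if abs(ratio_a) >= abs(ratio_b):
        return est_a, est_b, share_a
    return est_b, est_a, 1.0 - share_a

def weighted_averages(class1, class2, share1):
    """Share-weighted average travel cost parameter, quality parameter and ratio."""
    avg_tc = share1 * class1[0] + (1.0 - share1) * class2[0]
    avg_q = share1 * class1[1] + (1.0 - share1) * class2[1]
    avg_ratio = (share1 * class1[1] / class1[0]
                 + (1.0 - share1) * class2[1] / class2[0])
    return avg_tc, avg_q, avg_ratio

# The two sets are deliberately passed in the "wrong" order
class1, class2, share1 = label_classes((-0.10, 0.21), (-0.06, 0.49), 0.3)
avg_tc, avg_q, avg_ratio = weighted_averages(class1, class2, share1)
```

Applied to the true values, the rule returns the (-0.06, 0.49) set as class 1 with a share of 0.7, and the weighted averages match the true average values used for comparison with the conditional logit and nested logit estimates.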

Figure 1: Travel Cost Estimates over Some Iterations
[Line chart over iterations 1 to 17. Series: class 1 travel cost estimate, class 2 travel cost estimate, estimated share of class 1.]

Figure 2: Site Quality Estimates over Some Iterations
[Line chart over iterations 1 to 17. Series: class 1 site quality estimate, class 2 site quality estimate, estimated share of class 1.]


The results above are based on two classes in the population. In empirical studies, estimation is conducted under different numbers of classes, and the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC) is used to decide which number is optimal. Scarpa and Thiene (2005) used the following expression:

$$-2\log(L) + J\kappa$$

where J is the number of estimated parameters, log(L) is the log-likelihood function evaluated at the estimated parameters, and κ is a constant. AIC has κ=2; BIC has κ=log(N), where N is the sample size. We re-estimate the data assuming three classes in each iteration. Over all iterations where both estimations converge, AIC selects two classes 93% of the time, and BIC 100% of the time. So the latent class model can detect the true number of classes using either criterion.
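The selection rule can be sketched in a few lines of Python. The log-likelihood values below are hypothetical and serve only to show the mechanics; the parameter counts assume two preference parameters per class plus the share parameters (5 for two classes, 8 for three).

```python
import math

def information_criterion(loglik, n_params, kappa):
    """-2 log(L) + J * kappa; kappa = 2 gives AIC and kappa = log(N) gives BIC."""
    return -2.0 * loglik + n_params * kappa

def select_classes(candidates, kappa):
    """candidates maps a number of classes to (log-likelihood, number of parameters)."""
    return min(candidates,
               key=lambda c: information_criterion(candidates[c][0],
                                                   candidates[c][1], kappa))

# Hypothetical log-likelihoods for the two-class and three-class models, N = 1,000
candidates = {2: (-950.0, 5), 3: (-948.5, 8)}
n_obs = 1000

best_aic = select_classes(candidates, 2.0)
best_bic = select_classes(candidates, math.log(n_obs))
```

Because log(1000) is larger than 2, BIC penalizes the three extra parameters more heavily than AIC, which is consistent with BIC choosing the smaller model more often in the simulations.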
When the true model is the latent class model, the conditional logit and nested logit models measure the average effects, so the true values of the parameters are the weighted averages of the two classes. The true value of λ in the nested logit model is 1 because the sites are all uncorrelated, and how the nests are constructed does not matter, as can be seen from Table 3. Both models produce similar and reliable estimates: about a 7% upward bias in the travel cost estimates, a 3% upward bias in the site quality estimates and a 2.5% upward bias in the ratios. Since the variances are very small, the distributions of the estimates are well described by the median, minimum and maximum.
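The descriptive statistics and bias percentages reported in this chapter can be computed along the following lines. This is a sketch with hypothetical function names and a made-up list of estimates; it uses the population variance so that the MSE equals the variance plus the squared bias exactly.

```python
import statistics

def percent_bias(estimate, true_value):
    """Bias as a percentage of the absolute true value (positive means upward)."""
    return 100.0 * (estimate - true_value) / abs(true_value)

def summarize(estimates, true_value):
    """Mean, variance, MSE, extremes and quartiles of Monte Carlo estimates."""
    q1, median, q3 = statistics.quantiles(estimates, n=4)
    return {
        "mean": statistics.fmean(estimates),
        "var": statistics.pvariance(estimates),
        "mse": statistics.fmean([(e - true_value) ** 2 for e in estimates]),
        "min": min(estimates),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(estimates),
    }

# Hypothetical travel cost estimates from five iterations; the true average is -0.072
draws = [-0.070, -0.066, -0.068, -0.064, -0.067]
summary = summarize(draws, -0.072)

# Conditional logit mean travel cost estimate in Table 3 against its true value
bias_tc = percent_bias(-0.067, -0.072)
```

For the Table 3 values, `bias_tc` is about 6.9, in line with the roughly 7% upward bias noted in the text.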


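Table 3: Performance of Conditional Logit and Nested Logit Models When Latent Class Model Is the True Model

| Model | Estimate | True [9] | Mean | Var. | MSE | Min. | Median | Max. |
|---|---|---|---|---|---|---|---|---|
| Conditional Logit | Travel cost | -0.072 | -0.067 | 1.1e-05 | 3.3e-05 | -0.080 | -0.067 | -0.057 |
| Conditional Logit | Site quality | 0.406 | 0.42 | 3.3e-03 | 3.4e-03 | 0.238 | 0.417 | 0.591 |
| Conditional Logit | Ratio | -6.35 | -6.19 | 0.69 | 0.72 | -9.15 | -6.23 | -3.52 |
| Nested Logit [10] | λ | 1.00 | 1.02 | 8.5e-03 | 8.8e-03 | 0.70 | 1.01 | 1.43 |
| Nested Logit [10] | Travel cost | -0.072 | -0.068 | 2.0e-05 | 3.6e-05 | -0.084 | -0.068 | -0.053 |
| Nested Logit [10] | Site quality | 0.406 | 0.42 | 3.5e-03 | 3.7e-03 | 0.226 | 0.419 | 0.610 |
| Nested Logit [10] | Ratio | -6.35 | -6.18 | 0.70 | 0.73 | -8.94 | -6.20 | -3.61 |
| Nested Logit [11] | λ | 1.00 | 1.01 | 8.8e-03 | 9.0e-03 | 0.78 | 1.01 | 1.38 |
| Nested Logit [11] | Travel cost | -0.072 | -0.068 | 2.1e-05 | 3.8e-05 | -0.088 | -0.068 | -0.055 |
| Nested Logit [11] | Site quality | 0.406 | 0.42 | 3.3e-03 | 3.5e-03 | 0.23 | 0.42 | 0.59 |
| Nested Logit [11] | Ratio | -6.35 | -6.17 | 0.74 | 0.77 | -8.97 | -6.16 | -5.61 |
| Nested Logit [12] | λ | 1.00 | 0.98 | 0.010 | 0.011 | 0.69 | 0.98 | 1.36 |
| Nested Logit [12] | Travel cost | -0.072 | -0.067 | 1.6e-05 | 4.3e-05 | -0.080 | -0.067 | -0.055 |
| Nested Logit [12] | Site quality | 0.406 | 0.41 | 4.0e-03 | 4.1e-03 | 0.21 | 0.41 | 0.63 |
| Nested Logit [12] | Ratio | -6.35 | -6.13 | 0.75 | 0.79 | -8.50 | -6.14 | -3.18 |

[9] The values are the averages of the true parameters in the two classes, weighted by the true shares.
[10] Sites 1 and 2 are in one nest.
[11] Sites 2 and 3 are in one nest.
[12] Sites 1 and 3 are in one nest.
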

There is a side note on the parameter λ in the nested logit model. It normally falls between 0 and 1, but according to Train (2003) it can be greater than 1, so the main concern is that λ is positive. The estimation results in Table 3 are from unconstrained maximization. When λ is constrained to be between 0 and 1, its estimate is very close to 1, and the other parameter estimates are almost identical to those of the conditional logit model. Hence, in our cases, it makes little difference whether the constraint is imposed or not.
The welfare estimates of the latent class model follow patterns similar to those of its parameter estimates. The interquartile ranges suggest somewhat acceptable performance, and extreme values from some iterations distort the means. But the average welfare measures perform well, because the extreme values within a class receive a low weight when the estimated class share is small. The conditional logit and nested logit models produce welfare measures close to the true average values, and all three nest structures lead to the same results.


Table 4: Estimated Values of Marginal Quality Change of Latent Class Model When It Is the True Model

| Class | Site | True [13] | Mean | Var. | MSE | Min. | 1st Quartile | Median | 3rd Quartile | Max. |
|---|---|---|---|---|---|---|---|---|---|---|
| Class 1 | 1 | -2.09 | -2.08 | 5446.8 | 5441.4 | -1562 | -2.13 | -1.86 | -1.60 | 1438 |
| Class 1 | 2 | -3.68 | -6.81 | 24521.6 | 24506.8 | -1938 | -4.94 | -3.32 | -2.64 | 4075 |
| Class 1 | 3 | -2.40 | -0.26 | 3314.7 | 3316.0 | -114.9 | -2.76 | -2.12 | -1.79 | 1805 |
| Class 2 | 1 | -0.64 | -0.87 | 1.10 | 1.15 | -2.51 | -1.65 | -1.26 | -0.32 | 3.92 |
| Class 2 | 2 | -0.83 | -1.50 | 2.03 | 2.47 | -5.19 | -2.64 | -1.84 | -0.39 | 2.25 |
| Class 2 | 3 | -0.62 | -0.97 | 1.18 | 1.30 | -2.92 | -1.80 | -1.34 | -0.33 | 3.42 |
| Average | 1 | -1.65 | -1.53 | 3.14 | 3.15 | -19.75 | -1.72 | -1.59 | -1.42 | 48.13 |
| Average | 2 | -2.82 | -2.90 | 24.43 | 24.41 | -39.68 | -3.29 | -2.88 | -2.56 | 138.1 |
| Average | 3 | -1.87 | -1.69 | 3.99 | 4.02 | -4.26 | -1.93 | -1.78 | -1.59 | 60.62 |

[13] The true values are averages over 1,000 iterations. The same applies to all the true welfare measures below.


Table 5: Estimated Site Values of Latent Class Model When It Is the True Model

| Class | Site | True [14] | Mean | Var. | MSE | Min. | 1st Quartile | Median | 3rd Quartile | Max. |
|---|---|---|---|---|---|---|---|---|---|---|
| Class 1 [15] | 1 | 7.28 | 3.25 | 8950.2 | 8956.7 | -2744 | 6.81 | 7.41 | 7.73 | 171.5 |
| Class 1 [15] | 2 | 16.06 | 14.58 | 129200 | 129062 | -10210 | 14.63 | 15.82 | 18.09 | 2666 |
| Class 1 [15] | 3 | 8.72 | 4.30 | 13828 | 13832 | -3551 | 8.21 | 8.53 | 8.80 | 162.5 |
| Class 2 [16] | 1 | 8.22 | 7.91 | 2.30 | 2.40 | -10.11 | 7.43 | 7.75 | 8.29 | 15.48 |
| Class 2 [16] | 2 | 11.65 | 12.61 | 8.34 | 9.24 | -11.59 | 10.72 | 13.59 | 14.65 | 19.15 |
| Class 2 [16] | 3 | 8.18 | 8.46 | 1.93 | 2.00 | -10.46 | 8.29 | 8.50 | 8.70 | 15.84 |
| Average [17] | 1 | 7.56 | 7.46 | 11.40 | 11.40 | -87.04 | 7.38 | 7.58 | 7.80 | 9.76 |
| Average [17] | 2 | 14.74 | 14.55 | 164.6 | 164.4 | -338 | 14.32 | 14.79 | 15.26 | 65.3 |
| Average [17] | 3 | 8.56 | 8.34 | 18.73 | 18.76 | -114 | 8.33 | 8.49 | 8.66 | 10.55 |

[14] The true values are averages over 1,000 iterations. The same applies to all the true welfare measures below.
[15] In some iterations, we get very abnormal estimates for both classes. A travel cost estimate that is very large in scale makes travel cost extremely important in the decision of where to go, so a person will simply go to the nearest site. If that site is closed, the welfare loss is huge, and as a result we obtain infinite site values. We therefore exclude those iterations from the analysis of welfare estimates. The results here are from 920 iterations.
[16] The results are from 881 iterations.
[17] The results are from 805 iterations.


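Table 6: Welfare Estimates of Conditional Logit and Nested Logit Models When Latent Class Model Is the True Model

| Model | Site | Site Loss/Closure: True | Site Loss/Closure: Estimate | Quality Change: True | Quality Change: Estimate |
|---|---|---|---|---|---|
| Conditional Logit | 1 | 7.56 | 7.57 | -1.65 | -1.67 |
| Conditional Logit | 2 | 14.74 | 14.62 | -2.82 | -2.70 |
| Conditional Logit | 3 | 8.56 | 8.55 | -1.87 | -1.83 |
| Nested Logit (Site 1 and 2) | 1 | 7.56 | 7.60 | -1.65 | -1.67 |
| Nested Logit (Site 1 and 2) | 2 | 14.74 | 14.64 | -2.82 | -2.69 |
| Nested Logit (Site 1 and 2) | 3 | 8.56 | 8.50 | -1.87 | -1.82 |
| Nested Logit (Site 2 and 3) | 1 | 7.56 | 7.55 | -1.65 | -1.66 |
| Nested Logit (Site 2 and 3) | 2 | 14.74 | 14.63 | -2.82 | -2.69 |
| Nested Logit (Site 2 and 3) | 3 | 8.56 | 8.57 | -1.87 | -1.82 |
| Nested Logit (Site 1 and 3) | 1 | 7.56 | 7.55 | -1.65 | -1.65 |
| Nested Logit (Site 1 and 3) | 2 | 14.74 | 14.67 | -2.82 | -2.68 |
| Nested Logit (Site 1 and 3) | 3 | 8.56 | 8.52 | -1.87 | -1.81 |
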
3.2 True Model-Conditional Logit Model
To run the simulations in the scenario where the true model is the conditional logit model, the steps are almost the same as in the previous section, except that all 1,000 people have the same preferences in the true world. The true parameters, again taken from Haab and McConnell (2002) as in class 1 above, are:

$$\beta_{tc} = -0.06, \quad \beta_{q} = 0.49$$

The simulation results are summarized in Table 7, Table 8 and Table 9.


Table 7: Performance of Latent Class Model When Conditional Logit Model Is the True Model [18]

| Estimate | True | Mean | Var. | MSE | Min. | 1st Quartile | Median | 3rd Quartile | Max. |
|---|---|---|---|---|---|---|---|---|---|
| Travel cost | -0.06 | -0.085 | 0.017 | 0.018 | -2.17 | -0.067 | -0.062 | -0.059 | -0.012 |
| Site quality | 0.49 | 0.65 | 3.26 | 3.28 | -28.78 | 0.39 | 0.52 | 0.66 | 16.85 |
| Ratio | -8.17 | -9.37 | 570.6 | 571.5 | -298.6 | -10.3 | -8.40 | -6.61 | 401.0 |

[18] The parameters are averages of each class weighted by estimated shares.


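Table 8: Performance of Conditional Logit and Nested Logit Models When Conditional Logit Model Is the True Model

| Model | Estimate | True | Mean | Var. | MSE | Min. | Median | Max. |
|---|---|---|---|---|---|---|---|---|
| Conditional Logit | Travel cost | -0.06 | -0.06 | 9.6e-06 | 9.6e-06 | -0.07 | -0.06 | -0.05 |
| Conditional Logit | Site quality | 0.49 | 0.49 | 0.020 | 0.020 | -0.01 | 0.48 | 1.00 |
| Conditional Logit | Ratio | -8.17 | -8.13 | 5.65 | 5.64 | -17.2 | -8.04 | 0.23 |
| Nested Logit (Site 1 and 2) | λ | 1.00 | 1.01 | 9.0e-03 | 9.0e-03 | 0.71 | 1.01 | 1.34 |
| Nested Logit (Site 1 and 2) | Travel cost | -0.06 | -0.06 | 1.5e-05 | 1.6e-05 | -0.08 | -0.06 | -0.05 |
| Nested Logit (Site 1 and 2) | Site quality | 0.49 | 0.49 | 0.024 | 0.024 | -0.003 | 0.49 | 1.02 |
| Nested Logit (Site 1 and 2) | Ratio | -8.17 | -8.16 | 6.05 | 6.04 | -16.57 | -8.05 | 0.06 |
| Nested Logit (Site 2 and 3) | λ | 1.00 | 1.00 | 9.8e-03 | 9.8e-03 | 0.72 | 1.00 | 1.32 |
| Nested Logit (Site 2 and 3) | Travel cost | -0.06 | -0.06 | 1.7e-05 | 1.7e-05 | -0.07 | -0.06 | -0.05 |
| Nested Logit (Site 2 and 3) | Site quality | 0.49 | 0.49 | 0.023 | 0.023 | -0.08 | 0.49 | 1.01 |
| Nested Logit (Site 2 and 3) | Ratio | -8.17 | -8.20 | 7.21 | 7.20 | -17.48 | -8.11 | 1.40 |
| Nested Logit (Site 1 and 3) | λ | 1.00 | 1.00 | 9.2e-03 | 9.2e-03 | 0.72 | 1.00 | 1.41 |
| Nested Logit (Site 1 and 3) | Travel cost | -0.06 | -0.06 | 1.6e-05 | 1.6e-05 | -0.08 | -0.06 | -0.05 |
| Nested Logit (Site 1 and 3) | Site quality | 0.49 | 0.49 | 0.023 | 0.023 | -0.05 | 0.49 | 1.10 |
| Nested Logit (Site 1 and 3) | Ratio | -8.17 | -8.13 | 5.96 | 5.96 | -17.98 | -8.07 | 1.05 |
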

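Table 9: Welfare Measures of Conditional Logit, Nested Logit and Latent Class Models When Conditional Logit Model Is the True Model

| Model | Site | Site Loss/Closure: True | Site Loss/Closure: Estimate | Quality Change: True | Quality Change: Estimate |
|---|---|---|---|---|---|
| Conditional Logit | 1 | 9.01 | 9.01 | -2.39 | -2.35 |
| Conditional Logit | 2 | 12.06 | 12.06 | -3.00 | -3.00 |
| Conditional Logit | 3 | 10.92 | 10.90 | -2.78 | -2.78 |
| Nested Logit (Site 1 and 2) | 1 | 9.01 | 9.02 | -2.39 | -2.36 |
| Nested Logit (Site 1 and 2) | 2 | 12.06 | 12.08 | -3.00 | -3.01 |
| Nested Logit (Site 1 and 2) | 3 | 10.92 | 10.89 | -2.78 | -2.78 |
| Nested Logit (Site 2 and 3) | 1 | 9.01 | 9.02 | -2.39 | -2.37 |
| Nested Logit (Site 2 and 3) | 2 | 12.06 | 12.06 | -3.00 | -3.02 |
| Nested Logit (Site 2 and 3) | 3 | 10.92 | 10.90 | -2.78 | -2.79 |
| Nested Logit (Site 1 and 3) | 1 | 9.01 | 9.01 | -2.39 | -2.35 |
| Nested Logit (Site 1 and 3) | 2 | 12.06 | 12.07 | -3.00 | -3.00 |
| Nested Logit (Site 1 and 3) | 3 | 10.92 | 10.90 | -2.78 | -2.78 |
| Latent Class [19] | 1 | 9.01 | 9.31 | -2.39 | -1.69 |
| Latent Class [19] | 2 | 12.06 | 12.04 | -3.00 | -3.78 |
| Latent Class [19] | 3 | 10.92 | 10.91 | -2.78 | -3.90 |

[19] After we exclude iterations with infinite site values, 901 iterations are used to compute the averages.
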
The latent class model performs fairly well based on the interquartile ranges. For the medians, there is a 3.3% downward bias in the travel cost estimate, a 6.1% upward bias in the site quality estimate and a 2.8% downward bias in the ratio. The means are not as good because of extreme values from some iterations. The parameter estimates of the conditional logit and nested logit models are very close to the true values. In the nested logit model, how the nests are constructed does not matter because the sites are all uncorrelated. As discussed above, if the parameter λ is constrained to be between 0 and 1, its estimate will be nearly 1, and the other estimates are almost identical to those of the conditional logit model.
The estimated site values of the latent class model perform quite well, which makes sense because the travel cost estimate has good properties. The estimated values of marginal quality change are somewhat different from the true values, which is attributed to the bias in the estimated parameter ratio. For site 1, there is a 29% upward bias; for site 2, there is a 26% downward bias; for site 3, there is a 40% downward bias. The conditional logit and nested logit models give very good welfare measures.

3.3 True Model-Nested Logit Model
3.3.1 Simulation Steps
To simulate the true world with the nested logit model as the true model, instead of generating random errors from a multivariate extreme value distribution, we follow the approach of Herriges and Kling (1997), as detailed below.
With the true parameters as $\beta_{tc} = -0.06$, $\beta_{q} = 0.49$ and $\lambda = 0.5$, we can compute the probabilities that each person visits each site, say $P_1$, $P_2$ and $P_3$. Then a number is drawn from a [0, 1] uniform distribution, denoted as x. If x is less than $P_1$, this person will choose site 1; if x is greater than $P_1$ but less than ($P_1 + P_2$), this person will choose site 2; if x is greater than ($P_1 + P_2$), this person will choose site 3. By repeating this procedure for all people we get the pseudo observations. In different iterations, the probabilities remain the same, but x is newly drawn, so the observations are different.
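The drawing rule described above amounts to comparing a uniform draw with the cumulative choice probabilities. A minimal Python sketch follows; the function name and the example probabilities are hypothetical.

```python
import random

def draw_site(probs, rng):
    """Return the chosen site (1-based) by comparing a uniform draw with cumulative probabilities."""
    x = rng.random()
    cumulative = 0.0
    for site, p in enumerate(probs, start=1):
        cumulative += p
        if x < cumulative:
            return site
    return len(probs)  # guard against rounding when x is very close to 1

# Hypothetical choice probabilities for one person over three sites
probs = [0.33, 0.28, 0.39]
rng = random.Random(1)

# New draws of x in each iteration give different pseudo observations
# while the probabilities stay the same
observations = [draw_site(probs, rng) for _ in range(1000)]
```

Over many draws the share of each site in `observations` approaches the corresponding probability, which is what makes the pseudo data consistent with the nested logit model without simulating correlated errors.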
When the true model is the nested logit model, some sites are correlated. The IIA assumption no longer holds in the true world, so both the conditional logit and latent class models produce biased parameter and welfare estimates. The site quality estimate is more biased than the travel cost estimate, so the estimated values of marginal quality change deviate more from the true values than the estimated site values do. For the latent class model, the median of the average quality estimate is more than twice the true value, and the bias in the median of the average travel cost estimate is about 45%. The bias in the means is larger.
The nested logit model recovers the truth very well if the nest structure is correct. When the nest structure is incorrect, however, the model approaches the conditional logit model. We find that with a correct nest structure, the results from unconstrained and constrained maximization are the same; with an incorrect nest structure, the estimate of λ is closer to 1 in constrained maximization than in unconstrained maximization, and the other estimates are also closer to those of the conditional logit model. Therefore, the nested logit model will perform at least as well as the conditional logit model regardless of the true nest structure.

3.3.2 Simulation Results
The simulation results are shown in Table 10, Table 11 and Table 12.

Table 10: Performance of Latent Class Model When Nested Logit Model Is the True Model

| Estimate | True | Mean | Var. | MSE | Min. | 1st Quartile | Median | 3rd Quartile | Max. |
|---|---|---|---|---|---|---|---|---|---|
| Travel cost | -0.06 | -0.099 | 9.4e-03 | 0.011 | -1.86 | -0.095 | -0.087 | -0.078 | -0.050 |
| Site quality | 0.49 | 1.86 | 12.26 | 14.13 | -23.98 | 0.82 | 1.07 | 1.61 | 62.81 |
| Ratio | -8.17 | -13.65 | 242.6 | 272.4 | -285.9 | -16.28 | -14.05 | -11.44 | 272.4 |


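Table 11: Performance of Conditional Logit and Nested Logit Models When Nested Logit Model Is the True Model

| Model | Estimate | True | Mean | Var. | MSE | Min. | Median | Max. |
|---|---|---|---|---|---|---|---|---|
| Conditional Logit | Travel cost | -0.06 | -0.07 | 1.3e-05 | 1.9e-04 | -0.09 | -0.07 | -0.06 |
| Conditional Logit | Site quality | 0.49 | 0.88 | 0.03 | 0.19 | 0.29 | 0.88 | 1.46 |
| Conditional Logit | Ratio | -8.17 | -12.05 | 6.19 | 21.30 | -19.94 | -12.04 | -3.68 |
| Nested Logit (Site 1 and 2) | λ | 0.50 | 0.50 | 3.5e-03 | 3.5e-03 | 0.34 | 0.50 | 0.73 |
| Nested Logit (Site 1 and 2) | Travel cost | -0.06 | -0.06 | 1.4e-05 | 1.4e-05 | -0.07 | -0.06 | -0.05 |
| Nested Logit (Site 1 and 2) | Site quality | 0.49 | 0.49 | 0.024 | 0.024 | 0.01 | 0.49 | 1.00 |
| Nested Logit (Site 1 and 2) | Ratio | -8.17 | -8.16 | 6.11 | 6.11 | -14.83 | -8.08 | -0.14 |
| Nested Logit (Site 2 and 3) | λ | 0.50 | 1.24 | 0.012 | 0.56 | 0.93 | 1.23 | 1.58 |
| Nested Logit (Site 2 and 3) | Travel cost | -0.06 | -0.081 | 2.9e-05 | 4.7e-04 | -0.10 | -0.081 | -0.067 |
| Nested Logit (Site 2 and 3) | Site quality | 0.49 | 0.96 | 0.042 | 0.27 | 0.30 | 0.97 | 1.57 |
| Nested Logit (Site 2 and 3) | Ratio | -8.17 | -11.93 | 6.41 | 20.55 | -19.74 | -11.94 | -3.52 |
| Nested Logit (Site 1 and 3) | λ | 0.50 | 1.28 | 0.016 | 0.62 | 0.89 | 1.28 | 1.69 |
| Nested Logit (Site 1 and 3) | Travel cost | -0.06 | -0.083 | 3.5e-05 | 5.6e-04 | -0.11 | -0.083 | -0.067 |
| Nested Logit (Site 1 and 3) | Site quality | 0.49 | 0.79 | 0.044 | 0.13 | -0.018 | 0.80 | 1.44 |
| Nested Logit (Site 1 and 3) | Ratio | -8.17 | -9.63 | 7.36 | 9.49 | -18.32 | -9.52 | 0.20 |
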

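Table 12: Welfare Measures of Conditional Logit, Nested Logit and Latent Class Models When Nested Logit Model Is the True Model

| Model | Site | Site Loss/Closure: True | Site Loss/Closure: Estimate | Quality Change: True | Quality Change: Estimate |
|---|---|---|---|---|---|
| Conditional Logit | 1 | 9.54 | 10.35 | -2.71 | -4.07 |
| Conditional Logit | 2 | 7.38 | 7.64 | -2.29 | -3.32 |
| Conditional Logit | 3 | 13.27 | 12.12 | -3.17 | -4.66 |
| Nested Logit (Site 1 and 2) | 1 | 9.54 | 9.54 | -2.71 | -2.70 |
| Nested Logit (Site 1 and 2) | 2 | 7.38 | 7.39 | -2.29 | -2.27 |
| Nested Logit (Site 1 and 2) | 3 | 13.27 | 13.28 | -3.17 | -3.19 |
| Nested Logit (Site 2 and 3) | 1 | 9.54 | 9.71 | -2.71 | -3.93 |
| Nested Logit (Site 2 and 3) | 2 | 7.38 | 8.03 | -2.29 | -3.35 |
| Nested Logit (Site 2 and 3) | 3 | 13.27 | 12.47 | -3.17 | -4.65 |
| Nested Logit (Site 1 and 3) | 1 | 9.54 | 10.81 | -2.71 | -3.31 |
| Nested Logit (Site 1 and 3) | 2 | 7.38 | 7.30 | -2.29 | -2.63 |
| Nested Logit (Site 1 and 3) | 3 | 13.27 | 12.12 | -3.17 | -3.68 |
| Latent Class [20] | 1 | 9.54 | 9.79 | -2.71 | -3.57 |
| Latent Class [20] | 2 | 7.38 | 7.72 | -2.29 | -1.70 |
| Latent Class [20] | 3 | 13.27 | 12.64 | -3.17 | -8.39 |

[20] After we exclude iterations with infinite site values, 838 iterations are used to compute the averages.
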

3.4 Sensitivity Analyses
To see how sensitive the results are to the underlying assumptions, we conduct sensitivity analyses. In the simulations above, the parameter ratio of travel cost over site quality in one class is about four times that in the other class. First, we pick two pairs of true parameters from Hynes, Hanley and Scarpa (2008) so that the difference between the parameter ratios becomes even larger, around 24 times. The preferences of the two classes are then very distinct, which might help to identify a person's membership. Second, we increase the number of sites from three to seven. With more sites, there is more variation in the site choices of people with different preferences, and it might be easier to tell which class a person belongs to from his or her observations. Third, we change the true shares of the two classes to 50% each. With equal numbers of people in the two classes, any disadvantage of having a smaller group is removed. It turns out that all results display the same pattern as before [21]. Hence, we conclude that the inherent functioning of the latent class model produces outcomes that work well on average but not necessarily for the individual classes, since the assumptions varied in these sensitivity analyses had little influence on the patterns of the results.

[21] See Appendix A for results.


4 Discussion and Conclusions
The latent class model has been broadly applied in many areas, including valuation studies and recreation demand analyses in environmental economics. In this chapter, we use Monte Carlo simulations in the context of recreation site choices to test whether the latent class model successfully recovers the truth and how it performs compared to two other widely used site choice models, the conditional logit and nested logit models. By conducting simulations under three true scenarios, we find that the latent class model performs at best as well as the conditional logit model, and is inferior to the nested logit model when alternatives are no longer independent.
The latent class model aims to capture preference heterogeneity by assuming that there are a number of latent groups in the population. However, even if this is the true scenario, we know neither the true number of groups nor each person's membership. For the former, we can try a set of group numbers and let the data tell which is optimal. Based on our findings, the two frequently used information criteria select the correct number at least 90% of the time, which indicates that the latent class model can recover the true number of preference groups in the population at an acceptable confidence level. For the latter, we either rely on the data, using fixed group shares in the estimation, or infer the probabilities of membership in each group from demographics. No matter how small a group is or how low a probability is, uncertainty exists over which group a person belongs to. Thus, it is the averages weighted by group shares or personal probabilities that enter the log-likelihood function. That is to say, the values of the preference parameters in each group and the corresponding shares do not matter individually, as long as the averages, which are their combinations, maximize the log-likelihood function. So in the simulations, we see that the latent class model performs very well on average for both parameter and welfare estimates. But for each individual class, the estimated parameters and group shares can deviate from the true values, sometimes substantially. The latent class model does not always do a good job of identifying classes of a population with distinct preferences, as it is designed to do. It could misallocate individuals to groups and produce biased group preferences. The positive finding is that 50% of the time the bias from poorly estimated class sizes or parameters may not be very large. In addition, the commonly applied information criteria are likely to detect the true number of groups when the latent class model is used.
The conditional logit model has the simplest form, yet it performs very well when the unmeasured site characteristics in the errors are truly uncorrelated with one another. When the true model is the conditional logit model itself, estimates are close to the true values, and variances and MSEs are quite small; when the true model is the latent class model, the conditional logit model does well in recovering population averages. The conditional logit parameter estimates and welfare measures sometimes have even better properties than the population average estimates of the latent class model. In fact, the conditional logit model can be viewed as a degenerate latent class model with the constraint that preferences in all groups are the same. So we may be better off imposing constraints in maximum likelihood estimation. Both models are expected to be biased if there is correlation among sites.
In all true scenarios, the nested logit model has the best performance among the three models considered. When sites are independent, how the nests are constructed is irrelevant. When the true model is the nested logit model, a correct nest structure gives estimates almost identical to the true values. If the nest structure is wrong, the results are very similar to those of the conditional logit model. Hence, the nested logit model can detect an incorrect nest structure and fall back on no nesting as a solution. As discussed in Herriges and Kling (1997), some nesting works better than no nesting, which may be attributed to the additional degrees of freedom available in the nested logit model.
In conclusion, in future uses of the latent class model, one should be cautious in interpreting the estimated class-specific parameters and the population segment sizes. If the estimates for a class seem extreme compared to those of the other classes, or if a class is of very small size, one may be better off using a conditional logit model. In addition, the robustness and reliability of the nested logit model justify its application to the Great Lakes beach survey data in the following two chapters.
For future research, it is worth considering true scenarios with more preference
groups in the population and estimating a latent class model with a variety in the number
of classes. For example, in applications, Greene and Hensher (2003) and Provencher and
Bishop (2004) had three classes, Scarpa and Thiene (2005) had four, Hynes, Hanley and
Scarpa (2008) had six, and Train (2008) had eight and twenty-five. It is possible that
having more latent groups in the truth might help the latent class model identify
individual class preferences. More variation in the true scenarios may make the
estimation more stable, and the ability of the latent class model to detect the true number
of classes can be further tested by using a variety of class numbers. Another possible
future direction would be to extend the modeling of class memberships to include a rich
set of demographic variables. Also, instead of generating explanatory variables,
simulations may be applied to survey data with real travel cost and site quality as well as
demographic information, with which individual-specific membership to each group can
be modeled.

Chapter 2
Estimating Use Values of Public Great Lakes Beaches in Michigan

1 Motivation
People often take trips to public beaches in their leisure time to participate in recreation
activities. Although there is no explicit market for pricing, through recreation demand
models, monetary use values of public beaches can be derived, which have important
policy implications. If policy-makers consider initiating environmental protection or
remediation projects related to beaches, they might apply benefit-cost analysis and weigh
costs against benefits, which come from increased trips. Moreover, establishing the
economic value of beach recreation can help policy makers think about the relative value
of various natural assets as they consider funding allocations among competing areas of
need.
Many researchers have evaluated the economics of beaches along the coastlines
of oceans. For instance, Deacon and Kolstad (2000) summarized several studies from the 1970s
and 1980s on saltwater beach valuation, the results of which ranged from $0.70 to $13.55
per beach day in 1990 dollars. Hilger and Hanemann (2006) used data from a survey of
households in Southern California about their annual beach trips, and computed an
average willingness to pay of $5.71 in 2001 dollars for an increase of one letter grade on
a water pollution rating scale. Lew and Larson (2008) conducted a telephone-mail-telephone
survey of randomly chosen households in San Diego County and asked eligible
participants about their trips to beaches. They computed the value of having access to
beaches to be between $21 and $23 per day in 2000 dollars. Parsons et al (2009) surveyed
Texas residents living within 200 miles of the Gulf of Mexico and showed that if all
Padre Island beaches were closed, the mean loss would be $20 per trip in 2008 dollars.
Nonetheless, very few studies have focused on public Great Lakes beaches, which
are located along the largest group of freshwater lakes on the Earth and have unique
characteristics of their own. Murray, Sohngen and Pendleton (2001) conducted an on-site
survey at Maumee Bay and Headlands State Park beaches on Lake Erie and calculated
the value per beach day to be $25 for the former and $15 for the latter, in 1998 dollars.
However, because Lake Erie is smaller and its coastline is quite different when compared
with Lake Michigan and Lake Huron, it is unclear if their results can be generalized to
the entire Great Lakes. Song, Lupi and Kaplowitz (2010) did a web survey on visitation
to public Great Lakes beaches using a convenience sample from a consumer web panel of
Michigan adults and concluded that the welfare loss of eliminating a beach was around
$50 per visitor in 2006 dollars. However, the web panel was not representative of the
general population and their trip location data was only for the beach visited most often.
In addition, recreation demand models are usually applied to people who
participate in the activities. Although this assumption has some efficiencies for activities
that require a license (e.g., fishing), it is worth investigating how to generalize the results
to the entire population for activities that are more general such as beach use. Shaw (1988)
addressed the issues of truncation and endogenous stratification for on-site sampling
using a Poisson model. Englin and Shonkwiler (1995) proposed the negative binomial
model with count data to improve estimation. Shonkwiler and Shaw (1996) defined three
groups of people in recreation as “nonusers”, who never participated, “potential users”,
who would participate but did not in the survey season, and “users”, who always
participated, and put single and double hurdles into the count data model. But these
solutions are for single site models. In the context of multiple sites, von Haefen et al
(2005) took the day trip data from a survey of Delaware residents on their visitation to
Mid-Atlantic ocean beaches, and integrated single and double hurdles into discrete choice
models to model the behavior of not taking a day trip in the season, where the total
number of day trips was zero over all choice occasions. Instead of distinguishing people
by the number of trips, English (2008) treated people who held licenses for shrimp
baiting as participants, the rest as nonparticipants. He derived a participation hurdle by
equating seasonal consumer surplus with the cost of license. The hurdle was added to the
nested logit model where one chose to purchase a license or not. The survey was only
sent to license holders. Information on nonparticipants was obtained in aggregate form
from the census data at ZCTA (Zip Code Tabulation Areas) levels.
If we adopt the definitions in Shonkwiler and Shaw (1996), nonusers and
potential users were pooled in von Haefen et al (2005), because both groups would have
no day trip in the season. The hurdles modeled the difference between the aggregation of
these two groups and the user group. In English (2008), nonusers were separated from the
pool by not holding the license. Potential users would pass the participation hurdle as
users and decide not to go for shrimp baiting in every choice occasion. However, there
was no survey on nonusers. Also, identification of the three groups would not be that
straightforward in beach recreation. To fill the gap, we conducted a two-stage survey of
Michigan residents where a screener mail survey was followed by a web survey. The
purpose of the mail survey was to find users and potential users of beach recreation and
collect data on nonusers at the individual level. The web survey was implemented on
users and potential users, who were asked to report seasonal trips to public Great Lakes beaches.
In this chapter, we apply the repeated nested logit model to the survey data to estimate
use values of public Great Lakes beaches in Michigan, and the model is augmented with a
participation hurdle to examine how different forms of generalizing to the population
affect the results.

2 Models
2.1 Random Utility Models
Random Utility Models are widely applied for recreation demand with multiple sites.
Following Train (2003), the utility person n receives from visiting a beach j in choice
occasion t is the sum of a deterministic term and a random term:

$$U_{njt} = V_{njt} + \varepsilon_{njt},$$

where the so-called indirect utility is $V_{njt} = \beta' x_j + \gamma' z_{nj}$. $x_j$ is a vector of beach
characteristics, or simply beach-specific constants. $z_{nj}$ varies across people and
beaches and may include travel cost and interactions between demographics and beach
characteristics. $\varepsilon_{njt}$ captures all other factors that affect utilities but cannot be observed
by researchers.
In a choice set with J beaches, person n will choose beach j in choice occasion t if
and only if:

$$U_{njt} > U_{nit} \quad \forall\, i \neq j.$$

Suppose $\varepsilon_{njt}$ is independently and identically distributed with a Type I extreme value
distribution. Then, from the researchers’ point of view, person n will have a one-level decision tree
as in Figure 3. The probability of choosing beach j in choice occasion t is:

$$P_{nt}(j) = \frac{\exp(V_{njt})}{\sum_{i=1}^{J} \exp(V_{nit})}.$$
This is called the conditional logit model, and it implicitly assumes the property of
independence from irrelevant alternatives (IIA). That is, the relative probability of
choosing beach j over beach i in every choice occasion is not influenced by the number or
attributes of other alternatives. In reality, this does not hold most of the time. If $\varepsilon_{njt}$ has
a generalized extreme value distribution, the IIA assumption will be relaxed to some
extent, and the decision tree will have two levels as in Figure 4.

Figure 3: Decision Tree of Conditional Logit Model

Figure 4: Decision Tree of Two-Level Nested Logit Model

The probability of person n going to beach j in nest k in choice occasion t is:

$$P_{nt}(j,k) = \frac{\exp(V_{njt}/\lambda_k)\left(\sum_{i=1}^{J_k} \exp(V_{nit}/\lambda_k)\right)^{\lambda_k - 1}}{\sum_{l=1}^{K}\left(\sum_{i=1}^{J_l} \exp(V_{nit}/\lambda_l)\right)^{\lambda_l}},$$

where $J_k$ is the number of alternatives in nest k and K is the number of nests. This model is referred to as the nested
logit model. IIA holds within nests, but not across nests. The parameter $\lambda_k$ measures the
degree of independence among the alternatives in nest k. The higher it is, the lower the
correlation between these alternatives and the closer the nested logit model is to the
conditional logit model. It can also be interpreted as the parameter on the lower level’s
inclusive value. It is normally assumed to be the same across all the nests so that the
model will converge, and we can replace $\lambda_k$ with $\lambda$.

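$P_{nt}(j,k)$ can be decomposed into the product of the probability of choosing beach j
conditional on nest k and the probability of choosing nest k in choice occasion t:

$$P_{nt}(j,k) = P_{nt}(j \mid k)\, P_{nt}(k) = \frac{\exp(V_{njt}/\lambda)}{\sum_{i=1}^{J_k} \exp(V_{nit}/\lambda)} \cdot \frac{\left(\sum_{i=1}^{J_k} \exp(V_{nit}/\lambda)\right)^{\lambda}}{\sum_{l=1}^{K}\left(\sum_{i=1}^{J_l} \exp(V_{nit}/\lambda)\right)^{\lambda}}.$$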
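The decomposition into a within-nest choice and a nest choice can be illustrated with a small numerical sketch. The function name, the toy utilities, and the nest labels below are hypothetical and are not taken from the survey data; the sketch only assumes the two-level nested logit form with a common nesting parameter.

```python
import math

def nested_logit_probs(V, nests, lam):
    """Two-level nested logit with a common nesting parameter lam.

    V     : dict mapping alternative -> deterministic utility (hypothetical values)
    nests : dict mapping nest name -> list of alternatives in that nest
    lam   : nesting parameter, assumed to satisfy 0 < lam <= 1
    """
    # Sum of exp(V_i / lam) within each nest
    sums = {k: sum(math.exp(V[i] / lam) for i in alts) for k, alts in nests.items()}
    denom = sum(s ** lam for s in sums.values())
    probs = {}
    for k, alts in nests.items():
        p_nest = sums[k] ** lam / denom                # P(k)
        for j in alts:
            p_cond = math.exp(V[j] / lam) / sums[k]    # P(j | k)
            probs[j] = p_cond * p_nest                 # P(j, k)
    return probs

# Toy example: three beaches, two of them on the same lake
V = {"a": -1.0, "b": -1.5, "c": -0.5}
nests = {"lake1": ["a", "b"], "lake2": ["c"]}
p_nl = nested_logit_probs(V, nests, lam=0.5)
p_cl = nested_logit_probs(V, nests, lam=1.0)  # lam = 1 gives the conditional logit
```

With `lam=1.0` the probabilities reduce to the conditional logit formula, which is one way to see that the nesting parameter controls how far the model departs from IIA.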

We use a repeated nested logit model on day trips (Morey et al (1993)) for our
analysis, since there are a number of choice occasions in one summer season. Following
English (2008), with a participation hurdle, the decision tree is illustrated in Figure 5.

Figure 5: Decision Tree with Participation/Nonparticipation
At the top level, nonusers will not participate. People who overcome this hurdle
will decide whether to take a day trip in each choice occasion. For potential users, the
status quo utility exceeds the utility of visiting a public Great Lakes beach in every choice
occasion, so they take no day trips. Otherwise, they become users and take at least one
day trip over the season. The nests are defined by the different Great Lakes, since each lake has
its own characteristics.
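For users and potential users, in choice occasion t, the probability that person n
chooses beach j conditional on going to lake k is:

$$P_{nt}(j \mid k) = \frac{\exp(V_{njkt}/\lambda)}{\sum_{i=1}^{J_k} \exp(V_{nikt}/\lambda)},$$

where $V_{njkt}$ is the indirect utility of visiting beach j on lake k, $J_k$ is the number of
beaches on lake k, and $\lambda$ is the nesting parameter at the lake level.

The probability of going to lake k conditional on taking a day trip is:

$$P_{nt}(k \mid \text{trip}) = \frac{\left(\sum_{i=1}^{J_k} \exp(V_{nikt}/\lambda)\right)^{\lambda/\sigma}}{D_{nt}}, \qquad D_{nt} = \sum_{l=1}^{K}\left(\sum_{i=1}^{J_l} \exp(V_{nilt}/\lambda)\right)^{\lambda/\sigma},$$

where K is the number of lakes and $\sigma$ is the nesting parameter at the trip/no-trip level.

Denote the indirect utility of not taking a day trip in the current occasion as $V_{n0t}$.
The probability of taking a day trip in this occasion is:

$$P_{nt}(\text{trip}) = \frac{D_{nt}^{\sigma}}{\exp(V_{n0t}) + D_{nt}^{\sigma}}.$$

Then, the unconditional probability of person n visiting beach j on lake k is:

$$P_{nt}(j,k) = P_{nt}(j \mid k)\, P_{nt}(k \mid \text{trip})\, P_{nt}(\text{trip}).$$

The probability that person n does not take a day trip in choice occasion t is:

$$P_{nt}(\text{no trip}) = \frac{\exp(V_{n0t})}{\exp(V_{n0t}) + D_{nt}^{\sigma}}.$$

For users and potential users, the so-called inclusive value, which is the expected maximum
utility person n can attain in choice occasion t, is:

$$IV_{nt} = \ln\left(\exp(V_{n0t}) + D_{nt}^{\sigma}\right).$$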

As shown in Figure 5, the participation hurdle is imposed for the overall season.
Unlike activities requiring licenses, beach recreation has no cost of entry,
although parking fees or access fees may apply at some public
beaches. Following English (2008), people who participate will have positive consumer
surplus, which means that the seasonal utility of participating is greater than the status
quo utility of not participating. The sum of every choice occasion’s inclusive value gives
person n’s seasonal maximum utility:

$$IV_n = \sum_{t=1}^{T} IV_{nt},$$

where T is the number of choice occasions in the season. Denoting the indirect utility of not
participating as $V_n^{NP}$, the behavior of participating and not participating
can be described by a logit model:

$$P_n(\text{part}) = \frac{\exp(\rho\, IV_n)}{\exp(V_n^{NP}) + \exp(\rho\, IV_n)}, \qquad P_n(\text{nonpart}) = \frac{\exp(V_n^{NP})}{\exp(V_n^{NP}) + \exp(\rho\, IV_n)},$$

where ρ is the parameter on the seasonal inclusive value.
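Hence, the log-likelihood function is:

$$LL = \sum_{s=1}^{S} w_s \ln P_s(\text{nonpart}) + \sum_{n=1}^{N} w_n \ln P_n(\text{part}) + \sum_{n=1}^{N}\sum_{t=1}^{T}\sum_{k=1}^{K}\sum_{j=1}^{J_k} w_n\, y_{njkt} \ln P_{nt}(j,k) + \sum_{n=1}^{N}\sum_{t=1}^{T} w_n \left(1 - \sum_{k=1}^{K}\sum_{j=1}^{J_k} y_{njkt}\right) \ln P_{nt}(\text{no trip}),$$

where S is the number of nonusers, N is the number of users and potential users, T is the
number of choice occasions, and w is the personal weight. $y_{njkt}$ is 1 for the beach visited
in occasion t and 0 for all other beaches. The total number of day trips taken by person n
can be computed as:

$$Y_n = \sum_{t=1}^{T}\sum_{k=1}^{K}\sum_{j=1}^{J_k} y_{njkt}.$$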
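A minimal sketch of the probabilities that enter this likelihood may help fix ideas. It assumes one common three-level nested logit form (beach within lake within trip/no trip) with a seasonal logit hurdle on top; the function names, parameter values, and utilities are hypothetical, and this is not the Matlab code used for the estimation.

```python
import math

def occasion_probs(V, V0, lam, sigma):
    """Per-occasion probabilities for a participant.

    V     : dict lake -> list of beach utilities (hypothetical values)
    V0    : utility of not taking a day trip
    lam   : nesting parameter at the lake level (assumed form)
    sigma : nesting parameter at the trip/no-trip level (assumed form)
    Returns (P(j,k) for each beach, P(no trip), inclusive value).
    """
    lake_sums = {k: sum(math.exp(v / lam) for v in vs) for k, vs in V.items()}
    lake_terms = {k: s ** (lam / sigma) for k, s in lake_sums.items()}
    d = sum(lake_terms.values())
    denom = math.exp(V0) + d ** sigma
    p_trip = d ** sigma / denom
    p_beach = {}
    for k, vs in V.items():
        p_lake = lake_terms[k] / d                      # P(k | trip)
        for j, v in enumerate(vs):
            p_cond = math.exp(v / lam) / lake_sums[k]   # P(j | k)
            p_beach[(k, j)] = p_cond * p_lake * p_trip  # P(j, k)
    return p_beach, 1.0 - p_trip, math.log(denom)

def participation_prob(iv_season, V_np, rho):
    """Logit hurdle: probability of participating over the season."""
    return 1.0 / (1.0 + math.exp(V_np - rho * iv_season))

# Toy example with identical utilities in each of T choice occasions
V = {"Michigan": [-2.0, -2.5], "Huron": [-3.0]}
V0, lam, sigma, T = 0.0, 0.6, 0.8, 10
p_beach, p_none, iv = occasion_probs(V, V0, lam, sigma)
p_part = participation_prob(T * iv, V_np=1.0, rho=0.3)
```

The beach probabilities and the no-trip probability sum to one within an occasion, and the seasonal inclusive value is simply the per-occasion value summed over occasions before it enters the hurdle.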

2.2 Predicted Trips
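With the estimated parameters, we can predict individual probabilities of taking day trips,
$\hat{P}_{nt}(\text{trip})$, and visiting certain beaches, $\hat{P}_{nt}(j,k)$. Then for person n, the predicted
total number of day trips is:

$$\hat{Y}_n = \sum_{t=1}^{T} \hat{P}_{nt}(\text{trip}).$$

The predicted total number of trips taken to beach j on lake k is:

$$\hat{Y}_n(j,k) = \sum_{t=1}^{T} \hat{P}_{nt}(j,k).$$

If beach j is closed or there is a marginal increase in the length of beach j, the changes in
total trips or the trips taken to beach j can be calculated as the difference between the
predictions after (1) and before (0) the change:

$$\Delta\hat{Y}_n = \hat{Y}_n|_{1} - \hat{Y}_n|_{0}, \qquad \Delta\hat{Y}_n(j,k) = \hat{Y}_n(j,k)|_{1} - \hat{Y}_n(j,k)|_{0}.$$

2.3 Welfare Measures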

A change at one or more beaches will cause welfare changes to users and potential users.
It is of no value to nonusers. Based on Haab and McConnell (2002) and Champ et al
(2003), for person n, in choice occasion t, the welfare change is computed as the change
in the maximum utility this person can attain in this choice occasion, i.e. the inclusive
value, before (0) and after (1) a scenario happens, divided by the marginal utility of income:

$$\hat{W}_{nt} = \frac{\widehat{IV}_{nt}|_{1} - \widehat{IV}_{nt}|_{0}}{\widehat{MU}_y},$$

where $\widehat{MU}_y$ is the estimated marginal utility of income. The seasonal welfare change will be:

$$\hat{W}_n = \sum_{t=1}^{T} \hat{W}_{nt}.$$

Taking the weighted average of $\hat{W}_n$ across all users and potential users gives
the seasonal value per person:

$$\bar{W} = \frac{\sum_{n=1}^{N} w_n \hat{W}_n}{\sum_{n=1}^{N} w_n}.$$

To make seasonal welfare estimates comparable to those from single-site demand
models, they can be normalized by two kinds of factors: changes in total trips or trips
taken to the changed site, both of which were presented in the previous section. The way
we apply the normalization is to divide the weighted sum of seasonal values by the
weighted sum of trip changes, so that the results are not distorted by near-zero
probabilities of visiting certain beaches at the individual level:

$$\frac{\sum_{n=1}^{N} w_n \hat{W}_n}{\sum_{n=1}^{N} w_n\, \Delta\hat{Y}_n} \qquad \text{and} \qquad \frac{\sum_{n=1}^{N} w_n \hat{W}_n}{\sum_{n=1}^{N} w_n\, \Delta\hat{Y}_n(j,k)}.$$

All the estimates above are per person measures. How to generalize them to the
population depends on specific models.
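The normalization described above, a ratio of weighted sums rather than an average of individual ratios, can be sketched as follows. The function names and all numbers are hypothetical illustrations, not survey results.

```python
def seasonal_value_per_person(weights, welfare):
    """Weighted average of individual seasonal welfare changes."""
    return sum(w * x for w, x in zip(weights, welfare)) / sum(weights)

def value_per_trip(weights, welfare, trip_changes):
    """Weighted sum of welfare changes divided by weighted sum of trip changes."""
    num = sum(w * x for w, x in zip(weights, welfare))
    den = sum(w * d for w, d in zip(weights, trip_changes))
    return num / den

# Hypothetical inputs for three people facing a beach closure
weights = [1.0, 2.0, 1.0]
welfare = [-10.0, -4.0, 0.0]       # seasonal welfare change in dollars
trip_changes = [-0.5, -0.1, 0.0]   # change in predicted trips

per_person = seasonal_value_per_person(weights, welfare)
per_trip = value_per_trip(weights, welfare, trip_changes)
```

The third person has no change in predicted trips, so an individual-level ratio would divide by zero; dividing the aggregated sums avoids that problem.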

3 Survey and Data
3.1 Surveys
3.1.1 Screener Mail Survey
To recruit people who might participate in beach recreation and collect data on nonusers,
a screener mail survey was sent to Michigan residents in 2011. A stratified sample was
drawn from Michigan’s driver license list, which has similar demographic characteristics
to the census data.[22] The two strata are for coastal and non-coastal counties, with 60% of
the sample drawn from coastal counties and 40% from non-coastal counties.[23] Within the
two strata, we drew randomly in proportion to each county’s population to further ensure
geographic representativeness of the sample. To manage the survey costs, people who
lived in the Upper Peninsula were excluded, as the majority of the population lives in the
Lower Peninsula. The original sample size drawn was 32,230, and the number went down
to 29,613 after removal of deceased people and those with bad mailing addresses.
The short four-page mail survey had three parts. The first part asked people about
their participation in various everyday activities, recreation activities and indoor activities.
Only one question was about Great Lakes beaches in order to reduce potential self-selection
bias that could occur if people knew the survey was aimed at identifying Great
Lakes beach-goers. The second part was about participation obstacles, such as time or
money constraints. The third part contained demographic questions like race, education,
employment status, household income, etc.

[22] See Appendix B.
[23] The ratio of 60% to 40% was decided through sensitivity analyses to balance
between recruiting as many people who participated in beach recreation as possible
within the project budget and not losing the representativeness of the general population.

From June 2011 to November 2011, three waves of survey packages were
mailed out and two waves of automated phone calls were sent to household landlines as
reminders. In total, 11,028 people returned their questionnaires for a 37.24% response rate, and
9,591 respondents were kept for data analysis according to the criteria of living in the
Lower Peninsula and being the persons to whom the mail survey was addressed, among
which 5,556 said they had visited a Great Lakes beach since June 1, 2010.[24]

3.1.2 Follow-Up Web Survey
A total of 5,476 users and potential users from the screener mail survey were invited to the
follow-up Great Lakes beach web survey.[25] In-person and on-line pretesting was implemented
to test survey instruments (see Weicksel (2012)). An additional 85 beach recreation
participants, whose responses were received after the mail survey was
closed for data collection, were chosen for a pilot survey, the purpose of which was to test the
functionality and data storage of the web survey.

[24] Please refer to Weicksel (2012) for complete mail survey details.
[25] The 80 people not invited to the web survey had given multiple answers to the
question “Where do you live”. They might own properties in the Upper Peninsula of
Michigan or other States. We decided to include them for data analysis after the web
invitation went out. All these discrepancies are taken care of through weights.

There were two sections in the follow-up web survey: the beach trip section[26] and
the choice experiment section analyzed by Weicksel (2012). In the beach trip section,
following the survey in Parsons et al (2009), trips were categorized into three types: trips
lasting a day or less (day trip), overnight trips of less than four nights (short overnight
trip), and overnight trips of four nights or more (long overnight trip). People were asked to
report trip numbers of each type during the time frame from Memorial Day weekend,
2011 to September 30, 2011 (the primary beach-going season). Detailed questions were
asked for up to two randomly selected trips, such as date, activities and the number of
adults and children. If one had not gone to any public Great Lakes beaches in Michigan
in the past two years, the beach trip section would be automatically skipped.
Four waves of contacts were sent to potential web respondents. The first wave
mail package included an invitation letter with the invitee’s unique survey website
address and a $1 cash incentive; postcard reminders with the unique survey web
addresses were used in the second and third waves, differing in size. In the last wave, a
letter invitation was sent with a completion incentive strategy. The survey started in April
2012 and closed right after the Memorial Day weekend, 2012. In total, 3,197 people
logged on to the survey and answered our initial trip questions, giving a response rate of
58.38%.[27] The overall response rate of the two-stage survey was 21.7%.
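As a quick arithmetic check, the reported rates can be reproduced from the counts given in the text. Treating the overall rate as the product of the two stage rates is an assumption made here for illustration; it happens to match the reported 21.7%. The variable names are hypothetical.

```python
mail_sent = 29613       # usable mail sample after removals
mail_returned = 11028   # returned questionnaires
web_invited = 5476      # users and potential users invited to the web survey
web_responded = 3197    # people who answered the initial trip questions

mail_rate = mail_returned / mail_sent
web_rate = web_responded / web_invited
overall_rate = mail_rate * web_rate   # assumed definition of the two-stage rate

print(f"mail {mail_rate:.2%}, web {web_rate:.2%}, overall {overall_rate:.1%}")
```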

[26] In the survey, “Great Lakes beaches” were defined with a labeled graphic along with
the following bulleted list: “For this survey, Great Lakes beaches in Michigan include
beaches on the shorelines of • Lake Michigan, • Lake Huron, • Lake Erie, • Lake
Superior • All connecting waters (Lake St. Clair, St. Clair River, Detroit River, etc.)”.
[27] Please refer to Weicksel (2012) for complete web survey details.

3.2 Data
Out of 9,591 mail survey respondents, 3,838 said they did not visit any Great Lakes
beaches, so they are defined as nonusers for beach recreation. Of the 3,197 people
who responded to the web survey, 2,544 are the persons to whom the web survey was
addressed, and are kept for data analysis. Seven of them skipped the beach trip section, which
leaves us 2,537 effective respondents as users and potential users.
This chapter follows most recreation demand studies and only day trips are used
in the model. Trips are removed where beaches are on inland lakes or Lake Superior,[28] or
out of Michigan, and where no trips are reported for the beaches. If total trip numbers in
each month exceed the upper limits,[29] excess trips are dropped. After these steps, we
have 1,538 users who took at least one day trip in the summer of 2011, and 999 potential
users with no day trip.

[28] Web respondents all live in the Lower Peninsula and it is impossible for most of them
to go to Lake Superior and come back in one day. Some people may have a second home
in the Upper Peninsula, so they report day trips to beaches on Lake Superior. Their trips
are not included in the analysis as we consider trips originating from the permanent
residence.
[29] For day trips, the upper limit in June (including Memorial Day weekend) is 34, 31 in
July and August, 30 in September.

Table 13: Demographic Characteristics of Users, Potential Users and Nonusers

                            Effective Web Survey Respondents
                            All*     Users    Potential Users*    Nonusers*[30]
Age (Mean)                  44.4     43.9     45.0                49.5
Income (Mean, $1000)        81.9     79.0     85.7                61.0
Education Years (Mean)      14.8     14.8     14.8                13.8
Male (%)                    47.8     50.0     44.8                49.7
White (%)                   90.9     90.7     91.1                80.1
Employed Full-Time (%)      52.2     54.3     49.4                40.1
Retired (%)                 19.2     17.5     21.5                29.9
Children under 17 (%)       35.0     34.2     36.0                29.2

*Note: Nonusers were significantly different at the 1% level from the group of Users and
Potential Users for each characteristic except “Male”. Nonusers were significantly
different at the 1% level from Potential Users for each characteristic. Potential Users are
significantly different at the 5% level from Users for “Income”, “Employed Full-Time” and
“Retired”.

We use demographic data from the web survey for users and potential users as it
is the most recent. It can be seen from Table 13 that nonusers have very different
characteristics from the group of users and potential users. People are more likely to
participate if they are young, with higher income, more educated, white, employed
full-time, not retired and with children under 17. Between users and potential users, we would
expect the employment status to affect the behavior of taking or not taking a day trip in
one choice occasion. Furthermore, nonusers are significantly different from potential
users for each characteristic, suggesting that pooling these two categories as in von
Haefen et al (2005) may lose some accuracy. It is worth noticing that nonusers are
identified based on the screener mail survey in this study. Although the chances are likely
small, those nonusers who responded to the mail survey before September 30, 2011 might
have taken trips to public Great Lakes beaches in the survey season.

[30] These are weighted by corresponding weights.

Figure 6: Public Great Lakes Beaches for Day Trips.[31] For interpretation of the references
to color in this and all other figures, the reader is referred to the electronic version of
this dissertation.

According to the official beach list from the Michigan Department of Environmental
Quality (DEQ), there are 588 public Great Lakes beaches in Michigan, of which 454 are on Lake Erie,
Lake St. Clair, Lake Huron and Lake Michigan. Removing 3 beaches with no length
information, we have 451 beaches as candidates in people’s choice sets (Figure 6).
Choice sets can differ among individuals based on the maximum driving distance
in one day. Following the literature, we set the cut point at 500 miles for a round
trip,[32] which means beaches more than 250 miles away from one’s permanent residence
are not available for day trip visitation. The resulting choice set is quite large compared to
previous studies on beach visitation, which often have fewer than 100 alternatives. For
instance, Murray, Sohngen and Pendleton (2001) conducted their survey on 15 Lake Erie
beaches, and Parsons et al (2009) had a maximum of 65 sites in the choice set.

[31] Figures 6, 7, 10, 11 and 12 are Google Earth images. File conversion is through the
website: http://www.earthpoint.us/ExcelToKml.aspx
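The distance cut point can be expressed as a simple filter over candidate beaches. The sketch below is illustrative only; the function name, beach identifiers, and distances are hypothetical.

```python
MAX_ONE_WAY_MILES = 250.0  # half of the 500-mile round-trip cut point

def choice_set(one_way_miles):
    """Return the beaches available for a day trip.

    one_way_miles: dict mapping a beach id to the one-way driving distance
    (in miles) from the person's permanent residence.
    """
    return sorted(b for b, d in one_way_miles.items() if d <= MAX_ONE_WAY_MILES)

# Hypothetical distances for one respondent
distances = {"beach_A": 42.0, "beach_B": 250.0, "beach_C": 310.5}
available = choice_set(distances)
```

Because distances differ by residence, the same filter yields a different choice set for each person.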

Figure 7: GLOS Points on Great Lakes in Michigan

Individual beach length[33] and the previous year’s closure information were
provided by the Michigan Department of Environmental Quality. The number of closure
days is the sum of all closure periods in 2010, the year prior to our trip data.
Data on water surface temperature in the survey season was obtained from the National
Oceanic and Atmospheric Administration (NOAA) Great Lakes Environmental Research
Laboratory (GLERL) using the Great Lakes Observing System (GLOS) Point Query tool.[34]
We selected 56 grid points on Lake Huron along the coastline, 79 on Lake Michigan and
2 on Lake Erie (there are two beaches on Lake Erie in the DEQ list), as shown in Figure 7.
Daily temperatures were retrieved at these points and averaged into monthly temperatures,
because we know the month of the trips but not the exact days. Monthly data was directly
used for Lake St. Clair as its daily data was not available. Individual beaches were
matched to the nearest location with temperature data.

[32] About 1% of people who took day trips visited beaches more than 250 miles away.
[33] It is defined as the length of shoreline reach.
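The two processing steps described here, averaging daily temperatures by month and matching each beach to its nearest grid point, might look like the following sketch. The use of great-circle (haversine) distance is an assumption, since the text does not say how "nearest" was measured, and all identifiers, coordinates, and temperatures are hypothetical.

```python
import math
from collections import defaultdict

def monthly_means(daily):
    """daily: list of (point_id, month, temperature) records.
    Returns a dict mapping (point_id, month) to the mean temperature."""
    acc = defaultdict(list)
    for pid, month, temp in daily:
        acc[(pid, month)].append(temp)
    return {key: sum(vals) / len(vals) for key, vals in acc.items()}

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (assumed distance measure)."""
    r = 3958.8  # approximate Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_point(beach, points):
    """beach: (lat, lon); points: dict mapping point id to (lat, lon)."""
    return min(points, key=lambda pid: haversine_miles(*beach, *points[pid]))

# Hypothetical records and coordinates
daily = [("p1", 6, 60.0), ("p1", 6, 62.0), ("p2", 6, 58.0)]
points = {"p1": (43.0, -86.5), "p2": (44.5, -83.0)}
means = monthly_means(daily)
match = nearest_point((43.1, -86.4), points)
```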

3.3 Model Specification
In the repeated nested logit model with a participation hurdle, the specification of the
indirect utility person n obtains from visiting beach j on lake k in choice occasion t is:

$$V_{njkt} = \beta_{TC}\, TC_{nj} + \beta_{L} \ln(Length_j) + \beta_{W}\, Temperature_{jt} + \beta_{C}\, Closure_j + \sum_{r=1}^{6} \delta_r\, Region_{rj}.$$

$V_{n0t}$ and $V_n^{NP}$ are described by the demographic variables in
Table 13.
The computation of travel cost is:

$$TC_{nj} = \$0.476 \times Distance_{nj} + \frac{1}{3} \times \frac{Income_n}{2000} \times Time_{nj}.$$

$0.476 per mile is the total driving cost minus maintenance and insurance costs for an
average size car in 2011, reported by the American Automobile Association (AAA).[35] Time
cost is the opportunity cost. A person employed full-time works approximately 2,000
hours per year, and the hourly wage can be derived. As discussed in Chapter 9 of Champ
et al (2003), for people working with a fixed time schedule, normally one third of the
hourly wage is treated as the time cost. Travel distance and travel time are calculated in
PC Miler, a logistics software package, and are measured in miles and hours respectively.[36]

[34] http://glos.us/data-tools/point-query-tool-glcfs
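The travel cost calculation can be written as a small function. This is a sketch under stated assumptions: the function and argument names are hypothetical, the income argument is taken to be the annual income from which the hourly wage is derived, and the distance and time passed in are whatever trip measure the analyst adopts.

```python
COST_PER_MILE = 0.476        # dollars per mile, from the text
WORK_HOURS_PER_YEAR = 2000   # approximate full-time hours per year
TIME_VALUE_SHARE = 1 / 3     # share of the hourly wage treated as time cost

def travel_cost(miles, hours, annual_income):
    """Per-adult travel cost: driving cost plus opportunity cost of time."""
    hourly_wage = annual_income / WORK_HOURS_PER_YEAR
    driving_cost = COST_PER_MILE * miles
    time_cost = TIME_VALUE_SHARE * hourly_wage * hours
    return driving_cost + time_cost

# Hypothetical trip: 100 miles, 2 hours, annual income of $60,000
example_cost = travel_cost(miles=100.0, hours=2.0, annual_income=60000.0)
```

In this example the driving component is $47.60 and the time component is $20.00, since one third of a $30 hourly wage is valued over two hours.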
The definition of regions is from the Center for Geographic Information in the State
of Michigan, where there are six regions in the Lower Peninsula plus one for the Upper
Peninsula. Beaches are assigned to different regions based on the counties they belong to,
which are available in the official beach list from the Michigan Department of Environmental
Quality (DEQ). Since a few Lake Michigan beaches are in the Upper Peninsula, we
include six regional dummies for the Lower Peninsula in the estimation.

[35] This is one way to compute travel cost. Another way would be the operating cost (gas,
maintenance and tires) plus depreciation caused by driving, which gives $0.2422 per
mile. Results using this travel cost are available upon request.
[36] The travel cost in this study is for each adult, not household. It does not count the
number of people in one vehicle.

In the survey data, instead of reporting beach names, people might only report the
nearest town or city to the beach. That is to say, we do not know the exact beach but only the
area. There could be multiple beaches in that area. Given that all beaches are mutually
exclusive, the probability that person n visits area a can be expressed as:

$$P_{nt}(a) = \sum_{j \in a} P_{nt}(j,k).$$

It can be inferred from the official beach list how many beaches are in certain areas and
what they are. Also, for some trips, we are not able to locate the beaches or the areas, and
have to count these trips at the level of taking or not taking a day trip. That is to say,
$P_{nt}(\text{trip})$ is used to describe the trip information. Trips in these two groups account for
about 35.3% and 8.8% of the total day trips respectively.
Since nearly half of the trip data is non-regular, the estimation is programmed in
Matlab so that the log-likelihood function can be adjusted to incorporate all available
information, although the estimation burden greatly increases. Depending on the speed of
computers, it takes 2 to 4 days to estimate the proposed model in Figure 5 with starting
values from sequential estimation. To account for the clustering of standard errors across
repeated trips by the same person, bootstrapping is applied through the High Performance Computing Center at
Michigan State University, where it is possible to execute many single-process jobs at a
time. Given the time constraint, we set the number of bootstrap runs to 100,
which still requires about four weeks before getting all the results.
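One common way to bootstrap with repeated observations per person is to resample whole individuals, so that all of a person's choice occasions stay together. The sketch below illustrates that idea with a toy estimator; it is not the Matlab procedure used in this study, and the function names and data are hypothetical.

```python
import random

def cluster_bootstrap(data_by_person, estimator, n_runs=100, seed=0):
    """Resample persons with replacement and re-estimate on each draw.

    data_by_person: dict mapping person id to that person's records
    estimator: function taking a list of per-person records, returning a number
    Returns (mean of bootstrap estimates, bootstrap standard error).
    """
    rng = random.Random(seed)
    ids = list(data_by_person)
    estimates = []
    for _ in range(n_runs):
        draw = [rng.choice(ids) for _ in ids]
        sample = [data_by_person[i] for i in draw]  # each person's trips stay together
        estimates.append(estimator(sample))
    mean = sum(estimates) / n_runs
    var = sum((e - mean) ** 2 for e in estimates) / (n_runs - 1)
    return mean, var ** 0.5

def trip_share(sample):
    """Toy estimator: share of choice occasions with a trip."""
    total = sum(len(rec) for rec in sample)
    return sum(sum(rec) for rec in sample) / total

# Hypothetical data: 1 = trip taken in a choice occasion, 0 = no trip
toy = {1: [1, 0, 0], 2: [0, 0, 0], 3: [1, 1, 0], 4: [0, 1, 0]}
boot_mean, boot_se = cluster_bootstrap(toy, trip_share, n_runs=100, seed=1)
```

In a real application the estimator would be the full maximum likelihood routine, which is why each bootstrap run is costly and the runs are best executed in parallel.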
We also estimate two traditional repeated nested logit models without the
participation hurdle for comparison. For convenience, we call them Model 1 and Model 2,
and Model 3 is the proposed model. Model 1 only uses the web survey data and
excludes nonusers, an approach normally applied with list sampling and in some studies with a
screener survey, such as Lew and Larson (2008). Model 2 and Model 3 include users,
potential users and nonusers, and individual weights are adjusted to maintain the relative
ratio of participation to nonparticipation, so the data for both models is representative of
the general population. Like the models in von Haefen et al (2005), Model 2 does not
differentiate potential users from nonusers because they all took no day trips in the survey
season. Model 3 follows the procedure in English (2008), and has a similar structure to
Model 1 except for the added participation hurdle. Nonusers do not enter the nests below
the hurdle (Figure 5).
The computation of welfare measures at the individual level in Model 1 and 2
follows the equations in Section 2.3, since Model 2 pools nonusers with potential users.
In Model 1, to calculate welfare measures at the population level, we need to take into
account the fact that these individual estimates are for users and potential users. The
participation rate inferred from the mail survey was 58.01%.[37] The total number of adults
living in the Lower Peninsula of Michigan is 7,289,085 according to the 2010 census, which
implies 4,228,398 users and potential users. Multiplying Model 1 individual estimates by
the number of users and potential users gives welfare changes for the population.
In Model 2, multiplying individual estimates by 7,289,085 will produce welfare measures
for the population.

[37] See Appendix D.
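The scaling from individual estimates to the population is simple arithmetic, sketched below. The participation rate here is rounded to four decimals as reported in the text, and the per-person value is a hypothetical placeholder, not an estimate from this study.

```python
adults_lower_peninsula = 7_289_085   # 2010 census count from the text
participation_rate = 0.5801          # inferred from the mail survey

# Implied number of users and potential users
participants = round(adults_lower_peninsula * participation_rate)

# Hypothetical per-person seasonal value in dollars, for illustration only
per_person_value = 10.0
model1_population_value = per_person_value * participants            # participants only
model2_population_value = per_person_value * adults_lower_peninsula  # all adults
```

The same per-person figure therefore scales very differently depending on whether the model's estimates apply to participants only or to the whole adult population.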
For Model 3, where there are three groups of people, changes in beaches cause
welfare loss to users and potential users, not to nonusers (because of the nature of the use
values being estimated). Although the data shows which group one belonged to during
the survey, generally, researchers will have no information on the membership. Also,
people switch between groups all the time. Therefore, we can predict one’s probability to
participate and not to participate in the status quo, and apply them to conditional estimates to
derive unconditional welfare measures. For person h in Model 3, we have the welfare
changes in choice occasion t and total estimated trips as:

$$\hat{W}_{ht} = \hat{P}_h(\text{part}) \times \frac{\widehat{IV}_{ht}|_{1} - \widehat{IV}_{ht}|_{0}}{\widehat{MU}_y},$$

$$\hat{Y}_h = \hat{P}_h(\text{part}) \sum_{t=1}^{T} \hat{P}_{ht}(\text{trip}), \qquad \hat{Y}_h(j,k) = \hat{P}_h(\text{part}) \sum_{t=1}^{T} \hat{P}_{ht}(j,k),$$

where $\hat{P}_h(\text{part})$ is the predicted probability of participating in the status quo and
$\widehat{MU}_y$ is the estimated marginal utility of income.

These will generate individual welfare measures for a random person in the population,
and the calculation of population welfare measures is the same as Model 2.

4 Estimation Results
Table 14 shows full information estimation results of two traditional repeated nested logit
models, Model 1 and Model 2, and the proposed model with a participation hurdle,
Model 3. All models display the same pattern at the beach level, and have similar
estimates, since information at this level mainly comes from users. The estimated
parameter on travel cost has a negative sign and is statistically significant at 1% level,
which is consistent with demand theory. The higher the price is, the lower the demand.
Following the literature, logarithm of beach length is used. The length matters a lot when
beaches are short, and its importance decreases for longer beaches. All else equal,
warmer beaches are preferred to colder beaches. Total closure days in the previous year
have a negative effect on beach visitation, suggesting that previous beach closures have a
lasting stigma impact on future visitation. The estimated parameters on regional dummies
indicate that all else equal, beaches on Lake Michigan are more popular compared to
Lake St. Clair, Lake Erie and Lake Huron.
The two nesting parameters at the lake level and the trip/no trip level are
statistically significant at the 1% level and lie within the unit interval, which is
consistent with utility-maximizing behavior. Thus, nesting works better than no nesting.
At the trip/no trip level, the effects of demographic variables on taking or not taking a
day trip in one choice occasion differ in Model 2 from those in Models 1 and 3. The
estimated parameters and their significance are quite different, and some even have
opposite signs, because nonusers are identified at this level together with potential users
in Model 2. Models 1 and 3 show that within the population of beach-goers, people
who are male, non-white and not employed full-time take more day trips, as Table 13
indicates that employment status may influence the behavior of taking or not taking a
day trip in one choice occasion; whereas Model 2 suggests that in the general population,
people who are more educated take more day trips.
Comparing Model 1 with the part of Model 3 that is conditional upon
participating in beach recreation shows that they produce almost identical results. Recall
that Model 3 is essentially Model 1 plus the participation hurdle; nonusers do not enter
the nests below the participation hurdle. In Model 3, the nesting parameter at the top level
(the hurdle level) is statistically significant at the 1% level and between 0 and 1, so adding
the participation hurdle to the model works better than no hurdle. The variable for being
employed full-time is dropped from the hurdle because otherwise the model would not
converge, which might be caused by its higher correlation with other demographic
variables for nonusers. In the general population, the hurdle model suggests that people who
are young, white and more educated are more likely to participate in beach recreation.
These three variables all differ significantly between nonusers and the group of users
and potential users in Table 13.
Based on the estimation results, preferences for travel cost and beach
characteristics are not affected much by the model structure or by whether the data come from
the population or a sub-population, because these preferences are revealed when people actually
take trips. What distinguishes the three models is which behaviors are being modeled.
Model 1 and Model 2 both incorporate the behavior of taking or not taking a day trip in
one choice occasion, the former in the group of beach-goers, the latter in the general
population. Model 3 separates the behavior of participating or not participating in one
season from the behavior of taking or not taking a day trip in one choice occasion through
the participation hurdle. Although it is not a conceptual hurdle derived from an objective
utility function with constraints, it can explain how people behave to some extent, and
makes use of more information than Model 2.

Table 14: Full Information Maximum Likelihood (FIML) Estimation Results

Model Levels / Variables        Model 1                     Model 2                     Model 3
                                Estimates    t Statistics   Estimates    t Statistics   Estimates    t Statistics
Beach Level
  Travel Cost                   -0.0280***   -20.0          -0.0312***   -21.1          -0.0281***   -17.3
  Log(Length)                   0.126***     4.85           0.139***     5.20           0.126***     4.93
  Temperature                   0.0589***    5.96           0.0601***    5.95           0.0581***    5.69
  Closure Days of 2010          -0.0189***   -4.47          -0.0207***   -4.54          -0.0189***   -4.03
  LP Northeast                  -0.0642      -0.173         -0.197       -0.534         -0.0621      -0.196
  LP Mid-East                   -1.29***     -3.32          -1.56***     -4.00          -1.30***     -3.57
  LP Southeast                  -1.38***     -3.23          -1.68***     -4.03          -1.39***     -3.30
  LP Northwest                  1.16***      4.70           1.15***      3.76           1.17***      4.25
  LP Mid-West                   0.901***     3.49           0.992***     3.30           0.903***     3.65
  LP Southwest                  0.321        1.20           0.406        1.22           0.325        1.23
Lake Level
  Nesting Parameter             0.644***     11.7           0.705***     12.6           0.645***     12.5
Trip/No Trip Level
  Nesting Parameter             0.547***     9.00           0.596***     9.89           0.544***     7.45
  No Trip
    Male                        -0.152*      -1.65          -0.118       -1.31          -0.151       -1.56
    Age                         -0.0027      -0.768         0.00391      1.13           -0.0026      -0.757
    White                       0.378*       1.91           0.014        0.0682         0.383        1.61
    Education Years             -0.0106      -0.483         -0.0918***   -6.30          -0.0098      -0.490
    Full-Time Employed          0.212**      2.31           0.106        1.04           0.215**      2.38
    Retired                     0.187        1.12           0.18         1.12           0.186        1.12
    Children under 17           0.133        1.54           0.097        1.09           0.136        1.52
    Constant                    5.30***      9.29           7.23***      12.5           5.23***      9.59

Note: *10% significance level; **5% significance level; *** 1% significance level

Table 14 (cont’d)

Model Levels / Variables        Model 1                     Model 2                     Model 3
                                Estimates    t Statistics   Estimates    t Statistics   Estimates    t Statistics
Participation Hurdle
  Nesting Parameter                                                                     0.00511***   21.4
  Not Participate
    Male                                                                                0.0579       0.591
    Age                                                                                 0.0148***    4.98
    White                                                                               -0.767***    -4.02
    Education Years                                                                     -0.176***    -9.16
    Retired                                                                             0.124        0.998
    Children under 17                                                                   0.0704       0.738
    Constant                                                                            5.59***      17.3

Note: *10% significance level; **5% significance level; *** 1% significance level

Table 15: Welfare Estimates of Changing a Beach in 2011 Dollars at Individual Level

                      Season                        Season/Total Trip Change      Season/Site Trip Change
                      Model 1   Model 2   Model 3   Model 1   Model 2   Model 3   Model 1   Model 2   Model 3
Closure of One Beach in the Region
  Huron North         -0.0408   -0.0254   -0.0232   37.5      33.3      38.2      12.2      13.2      12.4
  Huron South         -0.113    -0.0685   -0.0713   36.7      31.9      37.2      12.7      13.3      12.8
  St. Clair           -0.989    -0.645    -0.694    36.4      32.5      36.6      13.3      14.2      13.3
  Erie                -1.81     -1.08     -1.27     36.4      32.4      36.5      14.8      15.5      14.7
  Michigan North      -0.0600   -0.0368   -0.0278   38.1      33.5      39.0      11.9      13.3      11.7
  Michigan Central    -0.700    -0.432    -0.324    38.5      34.0      38.3      12.7      13.6      12.7
  Michigan South      -0.370    -0.228    -0.172    38.3      33.7      38.0      12.8      13.6      12.7
Marginal Increase in Length of One Beach in the Region
  Huron North         0.0262    0.0162    0.0152    39.8      33.6      42.5      13.0      13.4      14.0
  Huron South         0.0419    0.0254    0.0262    38.0      31.8      36.8      13.3      13.4      12.8
  St. Clair           0.469     0.310     0.326     36.7      32.3      36.4      14.4      15.0      14.2
  Erie                0.449     0.280     0.317     36.5      32.5      36.3      17.1      17.4      16.8
  Michigan North      0.0232    0.0144    0.0110    31.2      34.7      36.8      9.82      13.5      11.4
  Michigan Central    0.186     0.116     0.0877    38.3      33.7      38.4      12.9      13.7      12.9
  Michigan South      0.134     0.084     0.0634    38.0      33.5      38.1      12.8      13.7      12.9

Note: As described in the text, we construct 451 scenarios where one of the 451 beaches is closed in one scenario, which will give us the
value of each beach. A region has multiple beaches, so we use the average value of these beaches to represent “One Beach in the
Region”. It is the same with marginal increase in beach length.

Table 16: Welfare Estimates of Changing a Beach in 2011 Dollars (Million) at State Level

                      Season
                      Model 1   Model 2   Model 3
Closure of One Beach in the Region
  Huron North         -0.172    -0.185    -0.169
  Huron South         -0.477    -0.499    -0.520
  St. Clair           -4.18     -4.70     -5.06
  Erie                -7.65     -7.86     -9.26
  Michigan North      -0.254    -0.268    -0.203
  Michigan Central    -2.96     -3.15     -2.36
  Michigan South      -1.56     -1.66     -1.25
Marginal Increase in Length of One Beach in the Region
  Huron North         0.111     0.118     0.111
  Huron South         0.177     0.185     0.191
  St. Clair           1.98      2.26      2.38
  Erie                1.90      2.04      2.31
  Michigan North      0.098     0.105     0.0802
  Michigan Central    0.787     0.848     0.640
  Michigan South      0.569     0.613     0.462

Note: As described in Section 2.3, for the seasonal value, in Model 1, the average individual values were multiplied by the population of
adults living in the Lower Peninsula of Michigan adjusted by the participation rate 58.01%. In Model 2 and Model 3, the population
values are the average individual values multiplied by the population of adults living in the Lower Peninsula of Michigan.

Table 17: Estimated Trips and Welfare Changes of Closing All Beaches on a Great Lake in 2011 Dollars

Individual Level
            Number of Trips               Season                        Season/Total Trip Change      Season/Lake Trip Change
            Model 1   Model 2   Model 3   Model 1   Model 2   Model 3   Model 1   Model 2   Model 3   Model 1   Model 2   Model 3
Erie        0.236     0.136     0.167     -5.16     -2.84     -3.59     36.4      32.4      36.4      21.9      21.0      21.5
St. Clair   0.422     0.260     0.298     -9.43     -5.63     -6.59     36.4      32.4      36.4      22.3      21.6      22.1
Huron       0.820     0.472     0.496     -20.6     -11.2     -12.0     36.9      32.7      36.8      25.0      23.6      24.2
Michigan    3.46      2.00      1.62      -118.1    -62.0     -53.8     37.4      33.2      37.3      34.1      31.0      33.2

State Level (Million)
            Number of Trips               Season
            Model 1   Model 2   Model 3   Model 1   Model 2   Model 3
Erie        1.00      0.99      1.22      -21.8     -20.7     -26.2
St. Clair   1.79      1.90      2.17      -39.9     -41.0     -48.0
Huron       3.47      3.44      3.61      -86.9     -81.4     -87.5
Michigan    14.6      14.6      11.8      -499.2    -451.6    -391.8

To compare valuation results, three scenarios are constructed: closing one beach
in different regions, marginally increasing the length of one beach in different regions,
and closing all beaches on one Great Lake. As described in Section 2.3, there are two
measures of welfare for each scenario, per season (columns titled “Season” in Tables 15,
16 and 17) and per trip (columns titled “Season/Total Trip Change”, “Season/Site Trip
Change” and “Season/Lake Trip Change” in Tables 15, 16 and 17). The per trip measures
come from normalizing per season measures by the change in the expected number of
trips to the affected site(s), or the change in the number of trips to any site, so that results
of multiple-site demand models are comparable to those of single-site demand models or
models with different choice sets.
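The normalization can be checked against the individual-level figures in Table 17. Closing a whole lake removes all trips to it, so the change in lake trips equals the predicted number of trips to that lake. The short sketch below recomputes the “Season/Lake Trip Change” column for Model 1 from the seasonal loss and the number of trips; the variable and function names are illustrative, and small gaps relative to the reported column come from rounding in the published figures.

```python
# Model 1, individual level, copied from Table 17:
# lake -> (predicted trips per person, seasonal welfare change in 2011 dollars)
MODEL1_LAKES = {
    "Erie": (0.236, -5.16),
    "St. Clair": (0.422, -9.43),
    "Huron": (0.820, -20.6),
    "Michigan": (3.46, -118.1),
}


def per_trip_value(seasonal_change, trip_change):
    """Seasonal welfare change normalized by the change in trips."""
    return seasonal_change / trip_change


per_trip = {}
for lake, (trips, seasonal) in MODEL1_LAKES.items():
    # Closing every beach on a lake removes all predicted trips to that lake.
    per_trip[lake] = per_trip_value(seasonal, -trips)
    print(f"{lake}: {per_trip[lake]:.1f} dollars per lost lake trip")
```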
Take the per season measure as an example. When a beach is closed or there is a
marginal increase in beach length, we consider the change as happening separately at
each of the 451 beaches. In the case of beach j, we compute the welfare change for each
person in the sample, which can be denoted as $\hat{W}_{hj}$ following the previous
notation, and take the weighted average across people as the average per person welfare
estimate of beach j, $\hat{W}_j$. Then with the average per person welfare estimates
for all 451 beaches, we calculate the mean value within every region to represent a
beach in that region:

$$\hat{W}_R = \frac{1}{R}\sum_{j=1}^{R}\hat{W}_j$$

where R is the total number of individual beaches in that region. When all beaches on one
Great Lake are closed, the computation is the same except that the last step is not applied.
Aggregating per person estimates to the population gives welfare estimates at the
population level, which includes adults living in the Lower Peninsula of Michigan in our
study.
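The aggregation step can be illustrated with the Huron North closure values from Tables 15 and 16. The adult population of the Lower Peninsula is not stated in this section, so the sketch below backs out the population implied by each model rather than assuming a figure. The function name is hypothetical, and the 58.01% participation adjustment is applied to Model 1 only, as described in the note to Table 16.

```python
PARTICIPATION_RATE = 0.5801  # sample participation rate reported in the text

# Huron North closure: (individual-level dollars, state-level million dollars)
HURON_NORTH_CLOSURE = {
    "Model 1": (-0.0408, -0.172),
    "Model 2": (-0.0254, -0.185),
    "Model 3": (-0.0232, -0.169),
}


def implied_population_millions(individual, state_millions, participation=1.0):
    """Population (in millions) that maps the per-person value to the state value."""
    return state_millions / (individual * participation)


implied = {}
for model, (individual, state) in HURON_NORTH_CLOSURE.items():
    # Model 1 is estimated on users and potential users only, so its
    # per-person value is scaled by the participation rate before aggregating.
    rate = PARTICIPATION_RATE if model == "Model 1" else 1.0
    implied[model] = implied_population_millions(individual, state, rate)
    print(f"{model}: implied adult population of about {implied[model]:.2f} million")
```

The three models imply roughly the same population, which is what the aggregation rule would lead one to expect.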
Regarding per season measures, when a beach is closed, seasonal welfare losses
are larger for Lake Erie and Lake St. Clair because they have far fewer beaches
than Lake Huron and Lake Michigan. If one beach is closed, there are not many
substitutes, and utility decreases substantially. When the length of one beach is increased
by 1 mile, seasonal welfare changes are also larger for Lake Erie and Lake St. Clair. Beaches
on these two lakes are all shorter than 0.5 mile, while beaches on the other two lakes tend
to be much longer. With the logarithm, a marginal increase in length leads to a larger
utility increase for short beaches than for long beaches. Hence, the welfare gains from
changes in length at single sites are much smaller for Lake Huron and Lake Michigan.
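The effect of the logarithmic specification can be seen with a small calculation. The sketch below uses the Model 1 coefficient on Log(Length) from Table 14 and two hypothetical beach lengths (0.4 and 4 miles, chosen only for illustration) to compare the utility gain from adding one mile.

```python
import math

BETA_LOG_LENGTH = 0.126  # Model 1 coefficient on Log(Length), Table 14


def utility_gain(length_miles, added_miles=1.0):
    """Change in indirect utility when a beach is lengthened, under a log specification."""
    return BETA_LOG_LENGTH * (math.log(length_miles + added_miles) - math.log(length_miles))


short_gain = utility_gain(0.4)  # hypothetical short beach
long_gain = utility_gain(4.0)   # hypothetical long beach
print(f"short beach: {short_gain:.3f}, long beach: {long_gain:.3f}")
```

Under these assumed lengths the short beach gains several times more utility from the same added mile, which is the pattern behind the larger welfare gains for Lake Erie and Lake St. Clair.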
When one entire lake is closed, the seasonal welfare loss is the largest for Lake
Michigan, then Lake Huron. Lake St. Clair and Lake Erie have much smaller values.
Lake Michigan has the largest number of beaches, so the maximum utility one could attain
would greatly decrease if all beaches on Lake Michigan were closed. There is much less
variation in per trip measures across regions and lakes because they are normalized by
trip changes: closures at more valuable beaches or lakes also lose more trips. Hence,
these normalized measures tend to remove the difference in demand for different sites,
and are comparable to those from single-site demand models that assume only one site is
available.

Comparing across the three models, at the individual level
seasonal welfare estimates in Models 2 and 3 are about 55% to 60% of those in Model 1,
which can be explained by the fact that only users and potential users are included in
Model 1 and the participation rate in the sample is 58.01%. At the state level, when all
the results are generalized to the population, seasonal welfare changes in Model 1 are a
little smaller than in Model 2 if the change is for one beach, and bigger if the change is for
one lake. A possible explanation is that users and potential users have slightly less
elastic demand than nonusers for small changes, and more elastic demand for big
changes. Models 1 and 2 predict almost the same number of trips to each lake.
Model 3 has somewhat different patterns: higher values for beaches on Lake Erie,
Lake St. Clair and Lake Huron, and lower values for beaches on Lake Michigan. The
same holds for estimated trips. Compared to Models 1 and 2, Model 3 allocates trips
differently across lakes, with fewer trips to Lake Michigan and more trips to the other three
lakes. The total number of predicted trips is also smaller. English (2008) also found that
the hurdle model tended to smooth the variation in trip prediction for different areas. For
population estimates, we would expect models to produce similar results if the population
mean is preserved by the model. However, Model 3 loses the mean-fitting property that
Models 1 and 2 possess for the total trips and for trips by region. One reason is the use of
individual-level participation rates. With the participation hurdle, the predicted
participation rate on average is 58.06%, almost identical to that of the sample, but there may be
a lot of variation across people. When inserting the participation rate at the individual
level, differences in each person’s welfare changes and estimated trips could be enlarged
rather than being averaged out. For instance, even though the means of two random
variables are the same in two models, the product of the means in one model is not
necessarily equal to the mean of the two variables’ products in the other model. Consider
Model 1 and Model 3, which are essentially the same model except for the participation
hurdle. In Model 1, web survey respondents receive a 100% participation rate, and mail
survey respondents are not included, so their participation rates are 0. In Model 3, the
estimated participation rate is positive for each person among users, potential users and
nonusers. The noise associated with individual estimates of the participation rates may
not go away in the aggregation process.
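The point about products and means can be made concrete with a toy calculation. In the sketch below, the participation probabilities and conditional trip counts for three people are invented purely for illustration; the code compares the product of the two means with the mean of the person-by-person products.

```python
from statistics import mean

# Hypothetical values for three people (not from the survey data).
participation_prob = [0.9, 0.6, 0.3]   # predicted probability of participating
conditional_trips = [10.0, 5.0, 1.0]   # predicted trips given participation

product_of_means = mean(participation_prob) * mean(conditional_trips)
mean_of_products = mean(p * t for p, t in zip(participation_prob, conditional_trips))

print(f"product of means: {product_of_means:.2f}")
print(f"mean of products: {mean_of_products:.2f}")
```

The two numbers differ because participation and trip intensity move together across these hypothetical people; averaging them separately hides that correlation.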

5 Discussion and Conclusions
In this chapter, a repeated nested logit model is estimated with data from a two-stage
survey of the general population, providing policy makers with monetary values of public
beaches on Lake Erie, Lake St. Clair, Lake Huron and Lake Michigan. We find that 58%
of Michigan adults living in the Lower Peninsula of Michigan participate in Great Lakes
beach recreation during the summer season. In the general population, people who are
young, white and more educated are more likely to participate. Once participating, people
who are male, non-white and not employed full-time tend to take more day trips.
The value of an individual public beach is about $32-$39 per trip, depending on
the region. If the length of one beach is increased by one mile, the welfare gain is about $31-$43 per trip. About 20.9 million day trips in total are taken to public Great Lakes beaches
(excluding Lake Superior) each summer by Michigan adults from the Lower Peninsula,
with about 14.6 million for Lake Michigan. The results show that access to beaches for
day trips on Lake Michigan is worth over $400 million each year to Michigan adults
living in the Lower Peninsula of Michigan. These values are relevant to decisions on
beach issues such as quality maintenance and beach facility construction, and to policy
decisions about the value and environmental improvement of Great Lakes beaches.
This chapter also clarifies whether including nonusers and differentiating them
from potential users will make a difference. In previous studies, if only one survey was
implemented, the two groups were pooled and nonusers were treated as potential users
who took no trip during the season; if there was a screener survey, the purpose was to
recruit a sample for follow-up surveys and the data was rarely used. We follow what was
done in English (2008) with the improvement that we also collected individual-level data
for nonusers. The estimation results of three models show that pooling nonusers with
potential users will produce different parameter estimates and welfare estimates
compared to using information only from users and potential users. When the behavior of
participation/nonparticipation is explicitly modeled, it hardly influences estimated
parameters for the beaches because nonusers provide no trip information; however, it
does tell us what factors could play a role in determining whether to participate or not.
We can predict the participation rate for any individual. However, the unconditional
results for total trips and welfare measures for the hurdle model are somewhat different
because presence of the hurdle leads to different spatial allocations of trip-taking
behaviors when results are aggregated to the population level. The loss of prediction
power might also be attributed to the lack of theoretical support for the participation
hurdle. As stated in English (2008), the hurdle could inadequately capture people’s
economic response to factors other than their own demographics, like site characteristics
and possible investments for participating in beach recreation (e.g. buying a boat).
Future work may focus on deriving a participation hurdle with an objective utility
function and relevant constraints. In this chapter, the seasonal inclusive values are used to
represent the utility of participation. English (2008) also incorporated cost of licenses as
another factor in the hurdle. If data is available on beach access fees, it could be
combined with people’s leisure activities in the mail survey to derive the equations for
participating and not participating from more comprehensive utility maximizing
behaviors. It is worth further investigating the implications of losing the mean-fitting
property of the typical repeated logit models when a hurdle is incorporated. In addition,
we have only a few quality variables in this chapter. Although regional dummies can
explain site characteristics to some extent, choices among beaches will be more
accurately modeled with more data at the beach level, such as beach width, facilities,
whether a beach is located in a state park, etc.

Chapter 3
Modeling Long Overnight Trips by Chaining Recreation Sites

1 Motivation
In recreation studies, valuation often applies to trips where recreation is the single
objective and only one site is visited, so day trip data is the most widely used as it
normally meets the two requirements (Caulkins et al (1986), Lew and Larson (2005),
Moeltner and Shonkwiler (2005), Scarpa and Thiene (2005), Smith (2005), von Haefen et
al (2005), Kim et al (2007), Timmins and Murdock (2007), Parsons et al (2009), etc.).
Some studies, most of which are for fishing or hunting trips, do not explicitly
differentiate overnight trips from day trips, or give the same treatment to the two types of
trips, where the single-objective and single-site assumptions are still imposed (Morey et
al (1993), Englin and Shonkwiler (1995), Haab and Hicks (1997), Provencher and Bishop
(1997), Shrestha et al (2002), Schuhmann and Schwabe (2004), Morey et al (2006),
Cutter et al (2007), Hynes et al (2007), Haab et al (2008), von Haefen and Phaneuf
(2008), etc.).

Table 18: Examples of Literature Not Differentiating Overnight Trips from Day Trips

Papers                           Activities                              Models                   Comments
Morey et al (1993)               Fishing                                 Random Utility Models    Recode all trips to Maine rivers as day trips, and all trips to Canadian rivers as four-day trips
Englin and Shonkwiler (1995)     Boating, Swimming and Fishing           Count Data Models        -
Haab and Hicks (1997)            Visiting Beaches, Fishing and Boating   Random Utility Models    -
Provencher and Bishop (1997)     Fishing                                 Dynamic Programming      -
Shrestha et al (2002)            Fishing                                 Count Data Models        -
Schuhmann and Schwabe (2004)     Fishing                                 Random Utility Models    -
Morey et al (2006)               Fishing                                 Latent Class Model       -
Cutter et al (2007)              Visiting National Parks                 Random Utility Models    -
Hynes et al (2007)               Kayaking                                Random Utility Models    -
Haab et al (2008)                Fishing                                 Random Utility Models    -
von Haefen and Phaneuf (2008)    Hunting                                 Random Utility Models    -

Table 19: Studies Dealing with Overnight/Multiple-Objective/Multiple-Site Trips

Papers                               Activities                Models                          Welfare Measures                                 Comments
Kealy and Bishop (1986)              Fishing                   Demand Theory of Single Site    19.5 per Day in 1978 Dollars                     Explicitly model the number of recreation days
Mendelsohn et al (1992)              Visiting National Parks   System of Demand Equations      16.8 per Day in 1982 Dollars                     Combine multiple sites as one composite
Hoehn et al (1996)                   Fishing                   Random Utility Models           66.7 per Multiple-Day Trip in 1994 Dollars       Put day and overnight trips in two separate nests
McKean, Walsh and Johnson (1996)     Fishing                   Count Data Models               69.2 per Trip in 1986 Dollars                    Include price and time variables for secondary sites
Tay, McCarthy and Fletcher (1996)    Fishing                   Random Utility Models           N/A                                              Use portfolios of destination, duration and frequency
Parsons and Wilson (1997)            Fishing                   Count Data Models               58.8-76.9 per Day Trip in 1989 Dollars           Define one dummy variable for incidental consumption
Shaw and Ozog (1999)                 Fishing                   Random Utility Models           268 in 1988 Dollars on Catch Rate Improvement    Test two structures with a level of trip length
Loomis, Yorizane and Larson (2000)   Whale Watching            Count Data Models               75.0 per Day in 1993 Dollars                     Distinguish incidental trips from joint consumption
Lupi et al (2003)                    Fishing                   Random Utility Models           125.0 per Multiple-Day Trip in 1994 Dollars      Allow different preference parameters for day and overnight trips
Yeh, Haab and Sohngen (2006)         Visiting Beaches          Random Utility Models           1.45 in 1998 Dollars on Reducing One Advisory    Make an adjustment to travel cost for multiple-objective trips

Although most recreation trips are day trips, overnight trips make up a nontrivial
portion of recreation trips, and demand for recreation activities will be more accurately
modeled if these trips are accounted for. Previous studies have proposed several
approaches to deal with overnight trips. Kealy and Bishop (1986) derived the demand
equation from utility theory and used the total number of recreation days as the dependent
variable. Explanatory variables included demographic characteristics, travel cost, daily
on-site costs, daily overnight expenditures, etc. Multiple sites were not involved though.
Hoehn et al (1996) proposed a repeated nested logit model with a trip length level for
fishing trips where day trips and overnight trips were in two separate nests. Trip duration
was taken into account as well as locations and target species. In Tay, McCarthy and
Fletcher (1996), a multinomial logit model was applied to annual fishing trips. The
alternatives were not only individual sites, but also included trip duration and frequency
information. A subset of the universal set was used in estimation, and sampling
correction was applied. Shaw and Ozog (1999) specified two nesting structures in a
repeated nested multinomial logit model. One put the trip length level above the site level,
and the other had the opposite order. The first model had independence parameters within
the unit interval. Lupi et al (2003) implemented a repeated nested logit model with a trip
length level for single and multiple day trips. They allowed different parameters for day
and overnight trips, and the estimated results showed that the marginal utility of income
was lower for overnight trips.
However, these studies still assume only one site is visited on overnight trips. To
address the issue of multiple-site or multiple-objective trips, Mendelsohn et al (1992)
combined all sites people visited as composites, which were added to the system of
demand equations as additional alternatives. People could substitute between these
composites and individual sites. McKean, Walsh and Johnson (1996) included price and
time variables for secondary sites when estimating the demand function of the primary
site. Since the secondary sites were close to the primary site in their study and shared
similar characteristics, these variables were automatically dropped from estimation due to
multicollinearity. Parsons and Wilson (1997) proposed a theory to incorporate incidental
and joint consumption in count data models using a dummy variable as a proxy. It could
be interacted with site quality and demographic variables. Both multiple-objective and
multiple-site trips would be allowed in this approach. They found that incidental
consumption was a complementary good for recreation trips. Loomis, Yorizane and
Larson (2000) distinguished incidental trips from joint consumption using two sets of
dummy variables if both were incurred on a trip. They asked a screening question in the
survey to identify whether a trip was single-purpose or involved incidental and joint
consumption. Yeh, Haab and Sohngen (2006) applied a nested logit model to day and
overnight trips, and adjusted travel cost based on the proportion of time spent on the
recreation purpose for multiple-objective trips.
Nonetheless, these methods either process the data so that multiple-site trips
fit into the framework of single-site trips, or model the existence of multiple-site
trips using dummy variables. As yet, there are no applications in which the decisions on
how many sites to visit and where to go are both incorporated into a site
choice model. To fill this gap, in this chapter, we extend the traditional model where only
the main destination is visited on overnight trips, to a three-level nested logit model
which explicitly incorporates people’s decision on the number of sites and choice of sites
to visit on an overnight trip. The data is from overnight trips where the main purpose is
recreation and people may visit any combination of 49 distinct sites. We want to see
whether the proposed model does a good job of explaining people’s behavior and
produces different welfare estimates compared to the models based on the main
destination assumption.

2 Models
The traditional way to model overnight trips is to assume people only visited their main
destination. With this assumption, we will have a simple conditional logit model as in
Figure 8.

Figure 8: Decision Tree of Main-Destination Model
Following Train (2003), on the overnight trip, the utility person n obtains from
visiting site i as the main destination is:

$$U_{ni} = V_{ni} + \varepsilon_{ni}$$

where the indirect utility $V_{ni}$ may include travel cost, site characteristics, and their
interactions with demographic variables, and $\varepsilon_{ni}$ measures unobserved factors.
Person n will go to site i if and only if:

$$U_{ni} > U_{nm} \quad \forall\, m \neq i$$

When $\varepsilon_{ni}$ follows an i.i.d. Type I extreme value distribution and the choice set
contains K alternatives, the probability of visiting site i is:

$$P_n(i) = \frac{\exp(V_{ni})}{\sum_{m=1}^{K}\exp(V_{nm})}$$

All the sites are independent, and the relative probability of visiting site i over site m is not
affected by other sites. The assumption of independence from irrelevant alternatives (IIA)
holds.
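The IIA property of the conditional logit model can be verified numerically. The following sketch computes logit probabilities for three sites with hypothetical indirect utilities (the values are assumptions for illustration only), then drops one site and shows that the odds between the remaining two are unchanged.

```python
import math


def logit_probabilities(utilities):
    """Conditional logit choice probabilities from a list of indirect utilities."""
    shift = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp(v - shift) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical indirect utilities for three sites (illustrative values only).
v_all = [1.0, 0.5, -0.2]
p_all = logit_probabilities(v_all)

# Remove the third site and recompute.
p_two = logit_probabilities(v_all[:2])

ratio_all = p_all[0] / p_all[1]
ratio_two = p_two[0] / p_two[1]
print(f"odds of site 1 over site 2 with three sites: {ratio_all:.4f}")
print(f"odds of site 1 over site 2 with two sites:   {ratio_two:.4f}")
```

In both cases the odds equal the exponential of the utility difference between the two sites, regardless of which other sites are in the choice set.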

Figure 9: Decision Tree of Model Allowing Multiple Sites per Trip
To build multiple sites into the model, we propose the structure in Figure 9. A
person will simultaneously decide whether to visit one or two sites and where to go.
Within the nest of visiting two sites, the first level represents the primary site, on which
one spends the most time; starting from there, one chooses the secondary site
from the rest of the alternatives. With K sites, the number of alternatives is also K in the nest
of visiting one site, and K×(K-1) in the nest of visiting two sites. The total is K×K, which
greatly enlarges the choice set compared to the traditional model.
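The growth of the choice set is easy to quantify. The sketch below counts the alternatives for a given number of sites, using the 49 aggregated sites and the 588 individual beaches mentioned in this chapter; the function name is illustrative.

```python
def choice_set_size(num_sites):
    """Alternatives when a trip visits one site or an ordered (primary, secondary) pair."""
    one_site = num_sites
    two_sites = num_sites * (num_sites - 1)  # the secondary site cannot equal the primary
    return one_site + two_sites


for k in (49, 588):
    print(f"K = {k}: {choice_set_size(k)} alternatives")
```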
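As described in Chapters 1 and 2, if person n decides to visit one site, the
conditional probability of choosing site i is:

$$P_n(i \mid \text{one}) = \frac{\exp(V_{ni}/\theta_1)}{\sum_{m=1}^{K}\exp(V_{nm}/\theta_1)}$$

If person n decides to visit two sites, the probability of choosing j as the secondary
site conditional on k being the primary site is:

$$P_n(j \mid k, \text{two}) = \frac{\exp(V_{nkj}/\lambda)}{\sum_{m \neq k}\exp(V_{nkm}/\lambda)}$$

The sum runs over the K-1 sites other than k, since k is excluded from the candidates for the secondary site.
The conditional probability that person n chooses k as the primary site is:

$$P_n(k \mid \text{two}) = \frac{\left(\sum_{j \neq k}\exp(V_{nkj}/\lambda)\right)^{\lambda/\theta_2}}{\sum_{l=1}^{K}\left(\sum_{j \neq l}\exp(V_{nlj}/\lambda)\right)^{\lambda/\theta_2}}$$

Here $V_{ni}$ and $V_{nkj}$ are the indirect utilities of visiting only site i and of visiting primary
site k with secondary site j, and $\theta_1$, $\theta_2$ and $\lambda$ are the nesting parameters of the
one-site nest, the two-site nest and the primary-site nests. Then for person n, the inclusive
values of visiting one site and two sites are:

$$IV_{n,\text{one}} = \ln\left(\sum_{i=1}^{K}\exp(V_{ni}/\theta_1)\right)$$

$$IV_{n,\text{two}} = \ln\left(\sum_{k=1}^{K}\left(\sum_{j \neq k}\exp(V_{nkj}/\lambda)\right)^{\lambda/\theta_2}\right)$$

which represent the maximum utility person n can attain if visiting one site and two sites
respectively.
To investigate whether demographic variables have any effects on selecting the
number of sites, we put them into the indirect utility of visiting one site:

$$V_{n,\text{one}} = \gamma' Z_n + \theta_1 IV_{n,\text{one}}$$

where $Z_n$ is the vector of demographic variables. Then the maximum utility person n can attain
from taking an overnight trip is:

$$IV_n = \ln\left(\exp(\gamma' Z_n + \theta_1 IV_{n,\text{one}}) + \exp(\theta_2 IV_{n,\text{two}})\right)$$

The probabilities of visiting one site and two sites are:

$$P_n(\text{one}) = \frac{\exp(\gamma' Z_n + \theta_1 IV_{n,\text{one}})}{\exp(\gamma' Z_n + \theta_1 IV_{n,\text{one}}) + \exp(\theta_2 IV_{n,\text{two}})}$$

$$P_n(\text{two}) = \frac{\exp(\theta_2 IV_{n,\text{two}})}{\exp(\gamma' Z_n + \theta_1 IV_{n,\text{one}}) + \exp(\theta_2 IV_{n,\text{two}})}$$

Hence, the unconditional probabilities of choosing only site i or the pair of site k
and j are:

$$P_n(i) = P_n(\text{one}) \, P_n(i \mid \text{one})$$

$$P_n(k, j) = P_n(\text{two}) \, P_n(k \mid \text{two}) \, P_n(j \mid k, \text{two})$$

The log-likelihood function will be:

$$LL = \sum_{n=1}^{N_1}\sum_{i=1}^{K} y_{ni}\ln P_n(i) + \sum_{n=1}^{N_2}\sum_{k=1}^{K}\sum_{j \neq k} y_{nkj}\ln P_n(k, j)$$

where $N_1$ people visit one site and $N_2$ people visit two sites; y is the binary indicator for
the chosen alternative.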

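The indirect utilities of the one-site and two-site alternatives, $V_{ni}$ and $V_{nkj}$, are
composed of the price variable, i.e. travel cost, and quality variables. The quality
parameters for a single site, a primary site and a secondary site could be different, so that
we can test their relationships, for instance, whether the sum of the primary-site and
secondary-site parameters is equal to the single-site parameter. Unlike previous studies
where day trips are estimated together with overnight trips, in this case, the marginal
utility of income is the same no matter how many sites one visits.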
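To make the nesting structure concrete, the following is a minimal sketch of a three-level nested logit with a one-site nest and a two-site nest (primary site, then secondary site). It is not the estimation code used in this chapter: the nesting-parameter names (`theta_one`, `theta_two`, `lam`), the demographic shifter `shift_one`, and all utility values are assumptions chosen for illustration, and the standard nested logit formulas are used.

```python
import math


def nested_probabilities(v_one, v_two, theta_one, theta_two, lam, shift_one=0.0):
    """Three-level nested logit probabilities for one-site and two-site trips.

    v_one[i]    : utility of visiting only site i
    v_two[k][j] : utility of visiting primary site k and secondary site j (j != k)
    theta_one, theta_two, lam : nesting parameters (assumed notation)
    shift_one   : demographic shifter added to the one-site nest
    """
    sites = range(len(v_one))

    # One-site nest: within-nest probabilities and inclusive value.
    w_one = [math.exp(v / theta_one) for v in v_one]
    iv_one = math.log(sum(w_one))
    p_i_given_one = [w / sum(w_one) for w in w_one]

    # Two-site nest: secondary site given primary, then primary site.
    p_j_given_k, inner = {}, {}
    for k in sites:
        w = {j: math.exp(v_two[k][j] / lam) for j in sites if j != k}
        inner[k] = sum(w.values())
        for j, wj in w.items():
            p_j_given_k[(k, j)] = wj / inner[k]
    w_primary = [inner[k] ** (lam / theta_two) for k in sites]
    iv_two = math.log(sum(w_primary))
    p_k_given_two = [w / sum(w_primary) for w in w_primary]

    # Top level: one site versus two sites.
    top_one = math.exp(shift_one + theta_one * iv_one)
    top_two = math.exp(theta_two * iv_two)
    p_one = top_one / (top_one + top_two)

    probs = {(i,): p_one * p_i_given_one[i] for i in sites}
    for (k, j), p in p_j_given_k.items():
        probs[(k, j)] = (1.0 - p_one) * p_k_given_two[k] * p
    return probs


# Hypothetical utilities for K = 3 sites; diagonal entries of v_two are unused.
v_one = [0.5, 0.2, -0.1]
v_two = [[0.0, 0.8, 0.4], [0.7, 0.0, 0.3], [0.2, 0.1, 0.0]]
probs = nested_probabilities(v_one, v_two, theta_one=0.7, theta_two=0.8, lam=0.5)

# Log-likelihood contribution of two hypothetical trips: one to site 0 only,
# one with primary site 1 and secondary site 2.
log_likelihood = math.log(probs[(0,)]) + math.log(probs[(1, 2)])
print(len(probs), round(sum(probs.values()), 6), round(log_likelihood, 3))
```

With three sites the function returns 3 + 3×2 = 9 alternatives whose probabilities sum to one, and setting all three nesting parameters to 1 collapses the model to a conditional logit over those nine alternatives.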
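Welfare estimates in the conditional logit and nested logit models are per trip
measures with respect to the choice set since the “don’t go” option is not available. If one
site is closed, or there is a marginal increase in the length of one site, the estimated
welfare change for person n is:

$$\hat{W}_n = \frac{\widehat{IV}_n|_1 - \widehat{IV}_n|_0}{-\hat{\beta}_{tc}}$$

where $\widehat{IV}_n$ is the estimated inclusive value of the trip, evaluated under the changed
conditions (1) and under the status quo (0), and $\hat{\beta}_{tc}$ is the estimated travel cost parameter.
The weighted average gives the per person value:

$$\hat{W} = \frac{\sum_{n=1}^{N} w_n \hat{W}_n}{\sum_{n=1}^{N} w_n}$$

where N is the sample size for the model, and $w_n$ is person n’s weight.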
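For the conditional logit case, the welfare calculation can be sketched in a few lines. The example below uses the standard log-sum formula for a site closure and then takes a weighted average across people; the utilities, weights and travel cost coefficient are hypothetical values chosen for illustration, not estimates from this chapter.

```python
import math


def log_sum(utilities):
    """Inclusive value (expected maximum utility) of a conditional logit choice set."""
    return math.log(sum(math.exp(v) for v in utilities))


def closure_welfare_change(utilities, closed_index, beta_travel_cost):
    """Per-trip welfare change in dollars from removing one site from the choice set."""
    remaining = [v for i, v in enumerate(utilities) if i != closed_index]
    # Dividing by the negative of the travel cost coefficient (the marginal
    # utility of income) converts the utility change into dollars.
    return (log_sum(remaining) - log_sum(utilities)) / (-beta_travel_cost)


# Hypothetical people: site utilities and sampling weights (illustrative only).
people = [
    {"utilities": [1.2, 0.4, 0.1], "weight": 1.5},
    {"utilities": [0.3, 0.9, 0.2], "weight": 0.5},
]
beta_tc = -0.03  # hypothetical travel cost coefficient

changes = [closure_welfare_change(p["utilities"], 0, beta_tc) for p in people]
total_weight = sum(p["weight"] for p in people)
weighted_avg = sum(p["weight"] * w for p, w in zip(people, changes)) / total_weight
print([round(c, 2) for c in changes], round(weighted_avg, 2))
```

Each person's loss is negative because closing a site can only shrink the choice set, and the person who values the closed site more loses more.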

To facilitate comparison of the model results to those of single-site models or
models with different choice sets, welfare measures can be normalized by the
increase or decrease in the probability of visiting the changed site. Denoting the changed site
as m, this is straightforward in the conditional logit model:

$$\hat{W}^{trip} = \frac{\sum_{n=1}^{N} w_n \hat{W}_n}{\sum_{n=1}^{N} w_n \, \Delta\hat{P}_{nm}}$$

where $\Delta\hat{P}_{nm}$ is the change in person n’s estimated probability of visiting site m.
In the nested logit model, however, site m appears at multiple nodes. If it is closed, the
number of alternatives reduces to K-1 in the nest of visiting one site, and (K-1)×(K-2) in
the nest of visiting two sites. If its length increases, characteristics of more than one
alternative will be affected. In other words, $\hat{P}_{nm}$ is the sum of person n’s estimated
probabilities of visiting site m alone and of visiting alternatives that include site m:

$$\hat{P}_{nm} = \hat{P}_n(m) + \sum_{j \neq m}\hat{P}_n(m, j) + \sum_{k \neq m}\hat{P}_n(k, m)$$

In the second term, m is the primary site, and in the third term, m is the secondary site.
Following Parsons and Wilson (1997), a pooled truncated Poisson model is also
estimated. We refer to it as a pooled model because it is an ad hoc single-site demand
formulation that ignores the complexities of multiple substitute sites and models a generic
trip demand using data on the site a person visited. Because people visit different sites
and take different numbers of trips, the effects of quality can be generically entered and
identified. With main destinations, we have:

[Equation not recoverable from the extracted text: the pooled truncated Poisson trip demand based on main destinations.]

It is assumed that each site has the same demand function. This is a pooled model, so
generic site quality variables can be included. The dependent variable x is the number of
overnight trips.
For the multiple-site version of this model, denote the dummy variable for visiting
two sites as D, and the equation becomes:

[Equation not recoverable from the extracted text: the trip demand above extended with a final interaction term in 1-D.]

The last interaction term uses 1-D instead of D in order to be consistent with the nested
logit model above, so that it captures how people visiting one site differ from those
visiting two sites. The access value per trip is $1/(-\hat{\beta}_{tc})$ in the pooled count model,
where $\hat{\beta}_{tc}$ is the estimated travel cost coefficient.

3 Data
The data come from a two-stage survey we conducted in 2011 and 2012. A screener
mail survey went out to Michigan residents to recruit participants in beach recreation.
The sample was drawn from Michigan's driver license list, and the survey asked about
people's leisure activities and participation obstacles. To reduce potential self-selection
bias, the screening question was only one of many questions on the screener survey.
People who said they had visited a beach on the Great Lakes since June 1, 2010 were
invited to the follow-up web survey, which asked about trips taken to public Great Lakes
beaches in Michigan in the summer of 2011.
Following the approach in Parsons et al. (2009), the web survey categorized trips
into three types: day trip (lasting a day or less), short overnight trip (less than four nights)
and long overnight trip (four nights or more). In the web survey section on long overnight
trips, besides trip frequency information, detailed questions were presented for one
randomly selected trip. People were asked to report the beaches on which they spent the
most, second most and third most time, as well as the number of days on each
beach. With information on how many sites people visited and where they went, we are
able to apply the proposed model with multiple sites to value long overnight trips.
To construct the choice set, we start from the 588 public Great Lakes beaches in
Michigan listed by the Michigan Department of Environmental Quality (DEQ). Using
individual beaches would give a 588×588 choice set, which is extremely
computationally burdensome. Based on the literature on site aggregation (Lupi and Feather
(1998), Haener et al. (2004), etc.), we aggregate the 588 public beaches into 49
aggregated sites, where the key factors to consider are beach popularity, geographic
distribution and heterogeneity of travel cost (Figure 10 and Figure 11). A beach is more
likely to stand on its own if many people go there. Beaches with no visits are dropped.
Since the travel cost parameter is the denominator of all welfare estimates, we minimize
the distance heterogeneity within aggregated sites by keeping the average distance between
two individual beaches within one site under 18 miles. Even with aggregation, a choice
set of 49 sites is relatively large compared to the previous literature. For instance, Shaw and
Ozog (1999) aggregated 13 rivers into 8 groups, and Kaoru et al. (1995) had 29
aggregates from 80 sites. In the web survey, 447 people took long overnight trips in the
summer of 2011. Before aggregation, 337 visited one beach, 81 visited two beaches and
29 visited three beaches. After aggregation, 355 visited one site, 71 visited two sites and
21 visited three sites. Hence, although we use 49 aggregated sites to represent 588
individual beaches, little information on trips with multiple sites is lost
with aggregation.
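The computational argument for aggregation can be made concrete by counting alternatives. The sketch below assumes that a two-site alternative is an ordered pair of distinct sites (primary, secondary), so that K sites give K one-site alternatives and K(K-1) two-site alternatives; the function name is hypothetical.

```python
def choice_set_size(n_sites):
    """Count one-site, two-site and total alternatives for n_sites sites."""
    one_site = n_sites
    two_site = n_sites * (n_sites - 1)  # ordered (primary, secondary) pairs
    return one_site, two_site, one_site + two_site


before = choice_set_size(588)  # individual beaches
after = choice_set_size(49)    # aggregated sites
```

Under this counting the total is K + K(K-1) = K×K, which is 345,744 alternatives for 588 individual beaches against 2,401 for the 49 aggregated sites.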
Following the aggregation literature, characteristics of these sites are averages of
individual beach characteristics, and the number of elemental beaches within an
aggregate is included in the estimation (Ben-Akiva and Lerman (1985), Parsons and
Needelman (1992)). Individual beach length [40] was provided by the Michigan Department of
Environmental Quality. Data on water surface temperature in the survey season were
obtained from the National Oceanic and Atmospheric Administration (NOAA) Great Lakes
Environmental Research Laboratory (GLERL) using the Great Lakes Observing System
(GLOS) Point Query tool [41]. Daily temperatures were retrieved and averaged into
monthly temperatures, because we know the month of the trips but not the exact days.
Monthly data were used directly for Lake St. Clair because its daily data were not available.
Individual beaches were matched to the nearest location with temperature data. All
individual beaches' characteristics are averaged to get the quality data for aggregated
sites.

[40] It is defined as the length of shoreline reach.
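The two temperature processing steps, averaging daily readings into monthly values and matching each beach to the nearest temperature location, can be sketched as follows. This is only an illustration: the text does not say how "nearest" was measured, so the great-circle (haversine) distance is an assumption, and all names and coordinates are hypothetical.

```python
import math
from collections import defaultdict


def monthly_means(daily):
    """daily: iterable of (month, temperature) pairs -> {month: mean temperature}."""
    buckets = defaultdict(list)
    for month, temp in daily:
        buckets[month].append(temp)
    return {month: sum(temps) / len(temps) for month, temps in buckets.items()}


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (assumed distance measure)."""
    radius = 3958.8  # mean Earth radius in miles
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def nearest_point(beach, points):
    """Return the name of the temperature point closest to the beach (lat, lon)."""
    return min(points, key=lambda name: haversine_miles(beach[0], beach[1], *points[name]))


# Hypothetical inputs: daily readings and two temperature points.
daily = [(6, 60.0), (6, 64.0), (7, 70.0)]
points = {"A": (43.1, -86.4), "B": (45.0, -84.0)}
june_july = monthly_means(daily)
match = nearest_point((43.0, -86.3), points)
```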

Figure 10: Public Great Lakes Beaches Visited On Long Overnight Trips

[41] http://glos.us/data-tools/point-query-tool-glcfs

Figure 11: Aggregated Beach Areas in the Long Overnight Trip Model

Figure 12: GLOS Points on Great Lakes in Michigan

In the data analysis, for the 21 people who visited three sites, the third site was
dropped and they were pooled with the people visiting two sites, because the group is too
small for a separate three-site choice to be identified and the model may become intractable.
The descriptive statistics of participants taking long overnight trips are shown in Table 20.
People visiting two sites are not very different from people visiting one site.

Table 20: Demographic Characteristics of Participants with Long Overnight Trips [42]

                            Participants    Visiting One Site*    Visiting Two Sites*
Age (Mean)                  45.5            45.7                  44.9
Income (Mean, $1000)        95.7            95.1                  98.3
Education Years (Mean)      15.2            15.2                  15.5
Male (%)                    44.7            45.3                  42.4
White (%)                   96.8            96.2                  99.1
Employed Full-Time (%)      54.9            54.9                  55.1
Retired (%)                 18.1            18.6                  16.3
Children under 17 (%)       39.6            38.4                  44.1

*Note: People visiting two sites were not significantly different from people visiting one
site except for the race variable "White", where the difference was statistically significant
at the 5% level.

[42] These are weighted by corresponding weights.

To compute each person's travel cost, we have:

$$\text{Travel Cost} = \$0.476 \times \text{Distance} + \left( \frac{\text{Income} / 2000}{3} \right) \times \text{Time}$$
The figure of $0.476 per mile is the total driving cost minus maintenance and insurance
costs for an average-size car in 2011, as reported by the American Automobile Association
(AAA) [43]. Time cost is the opportunity cost of travel time. A person employed full-time
works approximately 2,000 hours per year, so the hourly wage can be derived from annual
income. As discussed in Chapter 9 of Champ et al. (2003), for people working a fixed time
schedule, one third of the hourly wage is normally treated as the time cost. Travel distance
and travel time are calculated in PC*Miler, a logistics software package, and are measured
in miles and hours respectively [44]. For alternatives in the nest of visiting two sites, the
round-trip travel distance and travel time are counted from the permanent residence to the
primary site, from the primary site to the secondary site, and from the secondary site back
to the permanent residence.
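The travel cost calculation described above can be written as a small Python sketch. The constants come from the text ($0.476 per mile, 2,000 working hours per year, one third of the hourly wage); the function names and the example numbers are hypothetical, and applying the wage formula to every respondent's annual income is an assumption.

```python
COST_PER_MILE = 0.476      # dollars per mile, from the text
HOURS_PER_YEAR = 2000      # approximate full-time hours per year, from the text
WAGE_FRACTION = 1.0 / 3.0  # share of the hourly wage treated as time cost


def travel_cost(distance_miles, time_hours, annual_income):
    """Driving cost plus time cost for one adult, in dollars."""
    hourly_wage = annual_income / HOURS_PER_YEAR
    return COST_PER_MILE * distance_miles + WAGE_FRACTION * hourly_wage * time_hours


def two_site_route(legs):
    """Total (miles, hours) over home->primary, primary->secondary, secondary->home."""
    miles = sum(leg_miles for leg_miles, _ in legs)
    hours = sum(leg_hours for _, leg_hours in legs)
    return miles, hours


# Hypothetical two-site trip for a person with an annual income of $60,000.
miles, hours = two_site_route([(100.0, 2.0), (30.0, 0.5), (110.0, 2.2)])
cost = travel_cost(miles, hours, 60000.0)
```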
Demographic variables included in the model are those listed in Table 20 as well
as three dummies indicating whether one's income is within the 0-25th, 25th-50th
or 50th-75th percentile. The dummy variables for income are included
to test whether income plays a role in the decision to visit multiple sites on
long overnight trips.
We estimate four models for comparison: the traditional model with main
destination, the proposed multiple-site model with and without demographics, and the
pooled truncated Poisson model. The maximum likelihood estimation of the first three
models is programmed in Matlab, and the standard errors are computed using the inverse
of the Hessian. It takes about 8-10 hours to estimate the multiple-site model with
demographics. The pooled truncated Poisson model is estimated in Stata.

[43] This is one way to compute travel cost. Another way would be to use the operating cost
(gas, maintenance and tires) plus depreciation caused by driving, which gives $0.2422 per
mile. Results using this travel cost are available upon request.
[44] The travel cost in this study is for each adult, not for each household. It does not
account for the number of people in one vehicle.

4 Estimation Results
It can be seen from Table 21 that the estimated parameters for the travel cost variable
differ across the three models. If we take into account the scale effect, for visiting one site
in the two nested logit models, β/σ is -0.00413 and -0.00404 respectively, both of which are
larger in magnitude than the estimate in the main-destination model. The length variable has
positive estimates in all the models, but it is only statistically significant for visiting one
site. The sign for water temperature is negative, which is counterintuitive and the opposite
of what we expected. In fact, water temperature is highly correlated with region. After
analyzing the data on long overnight trips, we find that more people go to Lake Superior and
the northern parts of Lake Michigan and Lake Huron, where the water is cold. Beaches in these
areas may have distinct unmeasured characteristics compared to beaches in the south, and
people who take long overnight trips might care more about such unmeasured beach
quality than about water temperature. Thus, the regional effects are confounded with the
temperature variable and influence the signs of the estimates [45]. The scale parameters in
the two nested logit models are all statistically significant and within the unit interval,
which is consistent with utility maximization and indicates that nesting with multiple
sites is better than no nesting. However, we do not find a significant difference between
people who visit one site and people who visit two sites.

[45] It is shown in Appendix E that with regional dummies to control for the unmeasured
regional beach characteristics in the main-destination model, the estimated parameter of
water temperature turns positive. However, neither multiple-site model converges
with these regional dummy variables, as explained in Appendix E.

Table 21: Full Information Maximum Likelihood (FIML) Estimation Results

                                Main-Destination Model      Multiple-Site Model         Multiple-Site Model
                                                            w/o Demographics            w/ Demographics
Variables                       Estimates     t statistics  Estimates     t statistics  Estimates     t statistics
Travel Cost                     -0.00327***   -6.47         -0.00172***   -2.96         -0.00226***   -3.51
One: Length                     0.283*        1.90          0.140***      2.76          0.187***      3.04
One: Temperature                -0.0602       -0.658        -0.0241**     -2.15         -0.0305**     -2.4
One: # of Beaches               0.0287**      2.25          0.0111*       1.89          0.0153*       1.94
Two: Primary Length                                         0.0748        0.479         0.101         0.646
Two: Primary Temperature                                    -0.102***     -4.44         -0.112***     -4.29
Two: Primary # of Beaches                                   0.0341        1.63          0.031         1.45
Two: Secondary Length                                       0.0412        1.41          0.0545        1.46
Two: Secondary Temperature                                  -0.0242**     -2.41         -0.0326***    -2.67
Two: Secondary # of Beaches                                 0.00651       1.49          0.00848       1.54
Two: Primary Level Parameter                                0.161***      2.68          0.213***      3.08
One/Two Sites Level Parameter                               0.416***      3.22          0.560***      3.69
One: Male                                                                               0.0321        0.125
One: Age                                                                                0.00553       0.544
One: White                                                                              -1.2          -1.39
One: Education Years                                                                    -0.0406       -0.85
One: Full-Time Employed                                                                 0.122         0.403
One: Retired                                                                            0.137         0.293
One: Children under 17                                                                  -0.0619       -0.231
One: 0-25% Income                                                                       0.484         1.34
One: 25%-50% Income                                                                     -0.311        -0.851
One: 50%-75% Income                                                                     0.122         0.305

Note: *10% significance level; **5% significance level; *** 1% significance level

Table 22: Estimated Welfare Changes per Person in 2011 Dollars

                        Main-Destination Model      Multiple-Site Model         Multiple-Site Model
                                                    w/o Demographics            w/ Demographics
                        Per Trip    Per Trip/Trip   Per Trip    Per Trip/Trip   Per Trip    Per Trip/Trip
                                    Change                      Change                      Change
Closing One Site        -6.31       308.7           -5.17       211.0           -5.38       218.4
Marginal Length
Increase on One Site    2.03        313.0           1.68        217.7           1.74        225.0

As discussed in the previous section, for the length variable, under the null
hypothesis that the one-site parameter is equal to the sum of the primary site parameter
and the secondary site parameter, the t statistics are 1.16 and 1.39 in the two
multiple-site models, so we cannot reject the null hypothesis at the 10% significance level.
Estimated welfare changes are shown in Table 22, including per person per trip
measures and normalized per trip measures comparable to those of single-site demand
models or models with different choice sets. We consider the change as happening at
each of the 49 sites, and compute the welfare change for each site. The weighted average
across people is the welfare estimate for each site. The numbers in the table are mean
values over the 49 sites. The two multiple-site models have similar measures, and the inclusion
of demographics makes the numbers a little bigger. The estimates of the main-destination
model are about 20% higher for per trip measures, and 40% higher for normalized
measures. The reciprocal of the scaled travel cost parameter estimate is -305.8 in the
main-destination model, -242.1 in the multiple-site model without demographics, and
-247.5 in the multiple-site model with demographics. This explains part of the difference
among the three models, since the marginal utility of income is the denominator of the welfare
estimates. Another factor causing the discrepancy is that the choice of multiple sites is
available in the two multiple-site models. If one site is closed, the maximum utility one can
attain does not decrease that much, since combinations of other sites may still give similar
utilities. The same holds for a marginal length increase. Therefore, ignoring the possibility
of people visiting multiple sites on overnight trips yields larger welfare changes
relative to the models allowing multiple sites per trip.
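The scaled travel cost parameters and their reciprocals can be reproduced approximately from the rounded estimates in Table 21. The sketch below assumes that the scale parameter used for visiting one site is the "One/Two Sites Level Parameter" and that the main-destination model has a scale of one; both are readings of the tables rather than statements from the text, and the names are hypothetical.

```python
# (travel cost coefficient, scale parameter) taken from the rounded values in Table 21.
MODELS = {
    "main-destination": (-0.00327, 1.0),  # scale of 1 assumed (no nesting)
    "multiple-site w/o demographics": (-0.00172, 0.416),
    "multiple-site w/ demographics": (-0.00226, 0.560),
}


def scaled_travel_cost(beta_tc, sigma):
    """Travel cost coefficient divided by the nest scale parameter."""
    return beta_tc / sigma


scaled = {name: scaled_travel_cost(b, s) for name, (b, s) in MODELS.items()}
reciprocal = {name: 1.0 / value for name, value in scaled.items()}
```

Because the published coefficients are rounded, the reciprocals computed this way differ slightly from the values quoted in the text, but they stay within about half a dollar of them.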

Additionally, in the survey, people who took long overnight trips spent on average
4.20 days on one beach. Dividing the normalized access values in Table 22 by 4.20
gives per beach day values for one site of $73.5 in the main-destination model, and $50.2
and $52.0 in the two multiple-site models. Considering that we are valuing long overnight
trips, these numbers are comparable to those of other beach recreation studies in Table 19.
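The per beach day values follow from a single division, shown here as a short Python check using the normalized access values for closing one site in Table 22 and the average of 4.20 days per beach. The variable names are hypothetical.

```python
DAYS_ON_ONE_BEACH = 4.20  # average days on one beach per long overnight trip

# Normalized access values (closing one site) from Table 22, in 2011 dollars.
normalized_access_value = {
    "main-destination": 308.7,
    "multiple-site w/o demographics": 211.0,
    "multiple-site w/ demographics": 218.4,
}

per_beach_day = {
    model: value / DAYS_ON_ONE_BEACH
    for model, value in normalized_access_value.items()
}
```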
As another point of comparison, we also estimate a pooled truncated Poisson
model following Parsons and Wilson (1997). The pooled model is truncated because the
data exclude people who did not take long overnight trips, but there is no need to adjust
for endogenous sampling as in Shaw (1988) because we sample from the general
population. The results in Table 23 show that people with children under the age of 17
take fewer long overnight trips. People who are employed full-time or retired might be
more likely to visit two sites. The dummy variable indicating a second site is not
statistically significant. The access values are more than twice those in the random
utility models. Thus, the pooled count data model, which assumes an ad hoc single
demand equation, does not appear to be well suited to modeling long overnight trips with
multiple sites.

Table 23: Estimation Results of Truncated Poisson Models

                                Main Destination            Multiple Sites
Variables                       Estimates     t Statistics  Estimates     t Statistics
Travel Cost                     -0.00136**    -2.26         -0.0013**     -2.276
Primary Length                  0.0511        0.316         -0.00351      -0.0248
Primary Temperature             0.00773       0.583         0.00977       0.692
Primary # of Beaches            -0.012        -0.417        -0.0223       -0.774
Secondary Dummy: D                                          1.9           0.399
D × Secondary Length                                        0.221         0.714
D × Secondary Temperature                                   -0.0248       -0.484
D × Secondary # of Beaches                                  -0.00478      -0.124
Male                            0.224         1.19          -0.268        -0.635
Age                             -0.00398      -0.537        -0.00856      -0.474
White                           0.656         1.11          -0.552        -0.535
Education Years                 -0.0425       -0.951        -0.0322       -0.337
Full-Time Employed              -0.165        -0.596        1.36*         1.75
Retired                         0.246         0.689         1.75*         1.897
Children under 17               -0.693***     -3.06         -0.978**      -2.24
0-25% Income                    -0.16         -0.405        -0.157        -0.274
25%-50% Income                  -0.358        -0.933        -0.988        -1.4
50%-75% Income                  -0.0303       -0.0724       -0.556        -0.846
(1-D) × Male                                                0.598         1.27
(1-D) × Age                                                 0.0041        0.21
(1-D) × White                                               1.4           1.15
(1-D) × Education Years                                     -0.0136       -0.125
(1-D) × Full-Time Employed                                  -1.75**       -2.13
(1-D) × Retired                                             -1.68*        -1.75
(1-D) × Children under 17                                   0.338         0.684
(1-D) × 0-25% Income                                        0.0536        0.0801
(1-D) × 25%-50% Income                                      0.769         0.953
(1-D) × 50%-75% Income                                      0.588         0.738
Constant                        0.676         0.392         0.566         0.296

Access Value per Trip
in 2011 Dollars                 737.4                       767.9

Note: *10% significance level; **5% significance level; *** 1% significance level

5 Discussion and Conclusions
In this chapter, we build a model structure for long overnight trips in which people
simultaneously decide how many sites to visit and where to go. The options of visiting
one or two sites are significantly different. If two sites are visited, unobserved
characteristics are shared among secondary sites within one primary site. We find that the
value per beach day per person is about $50-$52 for one site in 2011 dollars. The
traditional approach, which assumes that only the main destination is visited on overnight
trips, tends to give larger welfare estimates than the models in which all possible
combinations of sites are included.
Since we have trip frequency data, we originally sought to apply a repeated nested
logit model which added a level of taking or not taking a long overnight trip. It took
about 1-2 days to estimate this four-level repeated nested logit model in Matlab. However,
after many tries with different sets of explanatory variables and different nesting
structures, such as separating or integrating the primary and secondary sites, using
regional dummies and assigning different scale parameters to the nests, either the
repeated nested logit model did not converge, even with starting values from sequential
estimation, or the estimated parameter on the inclusive value for the trip was negative.
Recall that of the 447 people taking long overnight trips, 92 visited two sites, about 21% of
the data. But in the nest of visiting two sites, there are 49×48 = 2,352 alternatives, and
only 4% of them have visitation information. Therefore, it is probable that our relatively
small sample of people visiting multiple sites on long overnight trips leads to the
convergence problem.

One direction of future work would be to find more data on beach characteristics
and regional amenities, since regional amenities may be more important than
individual beach quality with aggregated sites. Factors besides length and water
temperature could also have a significant influence on people's choices of where to go, such
as facilities, the convenience of lodging and whether a beach is located in a state park.
Including them may avoid some of the regional correlations that appear especially
problematic for the estimated temperature parameter. Other detailed information on the trip
may also be included, such as activities and the number of adults and children. Another
direction might be to add short overnight trips to the model to take into account all the
information on overnight trips. In addition, more complicated models like the mixed logit
model could be applied, which are flexible in the substitution patterns across people,
alternatives and even choice occasions. Nonetheless, these all seem to greatly increase the
estimation burden, and more efficient programming may be required.

APPENDICES

Appendix A

Results of Sensitivity Analyses for the Monte Carlo Simulations in Chapter 1

Sensitivity analyses are conducted to investigate whether changing underlying factors
has significant effects on the results of the Monte Carlo simulations in Chapter 1.
We apply the simulations to three situations: (1) a new set of true parameters, (2)
seven sites in the choice set, and (3) the same number of people in each group. Based on
the baseline simulation results, in the sensitivity analyses the nested logit model has sites 1
and 2 in one nest, and there are two classes in the latent class model.
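The summary columns reported in the tables of this appendix (True, Mean, Var., MSE, Min., Median, Max.) can be computed from a vector of Monte Carlo estimates as in the sketch below. This is an illustration rather than the original simulation code: the function name and the example estimates are hypothetical, and the use of the sample variance (dividing by n - 1) is an assumption.

```python
import statistics


def summarize(estimates, true_value):
    """Summary statistics of Monte Carlo estimates of one parameter."""
    squared_errors = [(e - true_value) ** 2 for e in estimates]
    return {
        "True": true_value,
        "Mean": statistics.fmean(estimates),
        "Var.": statistics.variance(estimates),  # sample variance, n - 1 (assumed)
        "MSE": statistics.fmean(squared_errors),
        "Min.": min(estimates),
        "Median": statistics.median(estimates),
        "Max.": max(estimates),
    }


# Hypothetical estimates of a parameter whose true value is -0.12.
example = summarize([-0.11, -0.13, -0.12, -0.10, -0.15], -0.12)
```

With these definitions the mean squared error equals (n - 1)/n times the variance plus the squared bias, so the MSE can fall slightly below the variance when the bias is close to zero.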

A.1 Different True Parameters
A.1.1 True Model-Latent Class Model
Simulation results are shown in the following tables.

Table A-1: Performance of Latent Class Model When It Is the True Model [46]

                                True      Mean      Var.      MSE       Min.      Median    Max.
Class 1, parameter 1            -0.12     -0.43     1.71      1.81      -8.89     -0.12     -0.033
Class 1, parameter 2            0.15      -0.33     75.54     75.70     -106.1    0.37      56.32
Class 2, parameter 1            -0.07     -0.17     0.30      0.30      -9.11     -0.079    0.43
Class 2, parameter 2            2.15      5.82      458.8     471.7     -120.5    1.81      217.2
Class 1 membership probability  0.7       0.615     0.062     0.070     0.083     0.615     0.993
Class 1, ratio                  -1.25     -2.33     31.05     32.20     -18.6     -2.73     16.45
Class 2, ratio                  -30.71    -57.12    186278    186785    -11330    -24.41    206.5
Average, parameter 1            -0.105    -0.177    0.074     0.080     -2.39     -0.112    -0.083
Average, parameter 2            0.75      0.84      5.31      5.31      -20.98    0.84      14.41
Average, ratio                  -10.09    -10.62    21.24     21.50     -122.4    -10.36    -4.66

Note: Ratio rows report parameter 2 divided by parameter 1.

[46] Results are from 978 iterations.
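The true values in the last three rows of Table A-1 appear to be class-share-weighted averages of the class-specific values. The sketch below checks this reading; it assumes that 0.7 is the share of the first class and that the ratio is the second parameter divided by the first. Both assumptions are inferred from the numbers in the table, and the names are hypothetical.

```python
def weighted_average(share_1, value_1, value_2):
    """Average of two class-specific values, weighted by the class shares."""
    return share_1 * value_1 + (1.0 - share_1) * value_2


share_1 = 0.7            # true share of class 1 (assumed reading of Table A-1)
class_1 = (-0.12, 0.15)  # true parameters of class 1
class_2 = (-0.07, 2.15)  # true parameters of class 2

avg_first = weighted_average(share_1, class_1[0], class_2[0])
avg_second = weighted_average(share_1, class_1[1], class_2[1])
avg_ratio = weighted_average(share_1,
                             class_1[1] / class_1[0],
                             class_2[1] / class_2[0])
```

Under these assumptions the three averages round to -0.105, 0.75 and -10.09, the true values listed in the table. The averaged ratio is the weighted mean of the class ratios, not the ratio of the averaged parameters.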

Table A-2: Performance of Conditional Logit and Nested Logit Models When Latent Class Model Is the True Model

                                True      Mean      Var.      MSE       Min.      Median    Max.
Conditional Logit  Parameter 1  -0.105    -0.092    2.3e-05   2.0e-04   -0.11     -0.091    -0.078
                   Parameter 2  0.75      0.89      0.018     0.037     0.40      0.87      1.36
                   Ratio        -10.09    -9.67     2.00      2.17      -14.52    -9.62     -4.03
Nested Logit       Parameter 1  -0.105    -0.096    4.0e-05   1.3e-04   -0.123    -0.095    -0.077
                   Parameter 2  0.75      0.92      0.022     0.051     0.43      0.91      1.42
                   Ratio        -10.09    -9.64     2.02      2.22      -14.63    -9.59     -4.12

Note: Ratio rows report parameter 2 divided by parameter 1.

Table A-3: Estimated Values of Marginal Quality Change of Latent Class Model When It Is the True Model

                   True      Mean      Var.      MSE       Min.      Median    Max.
Class 1   Site 1   -0.41     -0.61     3.72      3.75      -4.69     -0.91     7.44
          Site 2   -0.40     -0.95     3.38      3.68      -7.84     -0.91     3.66
          Site 3   -0.44     -0.77     3.39      3.50      -6.07     -0.91     5.35
Class 2   Site 1   -6.62     -4.56     11.35     15.56     -37.65    -4.90     28.62
          Site 2   -15.00    -42.56    168717    169304    -10670    -10.89    183.2
          Site 3   -9.10     -10.00    474.2     474.5     -623.1    -7.82     44.85
Average   Site 1   -2.27     -2.36     0.30      0.30      -3.70     -2.41     -0.015
          Site 2   -4.78     -5.01     18.61     18.64     -110.8    -4.70     0.011
          Site 3   -3.03     -3.26     0.28      0.33      -8.86     -3.25     -1.46

Table A-4: Estimated Site Values of Latent Class Model When It Is the True Model

                        True      Mean      Var.      MSE       Min.      Median    Max.
Class 1 [47]   Site 1   9.08      8.87      0.86      0.90      6.21      8.75      14.32
               Site 2   8.33      8.83      1.62      1.86      5.03      8.78      14.88
               Site 3   9.41      8.77      0.12      0.53      8.13      8.80      14.67
Class 2 [48]   Site 1   5.30      5.41      7.24      7.24      -27.31    6.07      19.86
               Site 2   17.16     27.46     33034     33094     -86.44    14.61     4621
               Site 3   8.32      8.85      15.21     15.47     -46.73    9.20      91.99
Average [49]   Site 1   7.94      7.91      0.075     0.076     7.05      7.92      8.87
               Site 2   10.98     11.07     3.96      3.97      8.35      10.95     56.73
               Site 3   9.08      8.94      0.039     0.060     8.36      8.95      9.85

[47] After we exclude iterations with infinite site values, 884 iterations are used to compute the averages.
[48] After we exclude iterations with infinite site values, 913 iterations are used to compute the averages.
[49] After we exclude iterations with infinite site values, 819 iterations are used to compute the averages.

Table A-5: Welfare Estimates of Conditional Logit and Nested Logit Models When Latent Class Model Is the True Model

                            Site Loss             Quality Change
                   Site     True      Estimate    True      Estimate
Conditional Logit  1        7.94      7.91        -2.27     -2.86
                   2        10.98     10.64       -4.78     -3.61
                   3        9.08      9.14        -3.03     -3.20
Nested Logit       1        7.94      8.03        -2.27     -2.87
                   2        10.98     10.74       -4.78     -3.60
                   3        9.08      8.95        -3.03     -3.16

A.1.2 True Model-Conditional Logit Model
Simulation results are shown in the following tables.


Table A-6: Performance of Conditional Logit, Nested Logit and Latent Class Models When Conditional Logit Model Is the
True Model

                                 True      Mean      Var.      MSE       Min.      Median    Max.
Conditional Logit   Parameter 1  -0.07     -0.07     1.4e-05   1.4e-05   -0.087    -0.070    -0.060
                    Parameter 2  2.15      2.16      0.014     0.014     1.83      2.16      2.59
                    Ratio        -30.71    -30.79    1.51      1.51      -34.47    -30.74    -26.31
Nested Logit        Parameter 1  -0.07     -0.07     2.2e-05   2.2e-05   -0.089    -0.07     -0.058
                    Parameter 2  2.15      2.16      0.023     0.023     1.75      2.16      2.69
                    Ratio        -30.71    -30.79    1.52      1.52      -34.46    -30.75    -26.31
Latent Class [50]   Parameter 1  -0.07     -0.098    0.014     0.015     -1.53     -0.073    -0.015
                    Parameter 2  2.15      2.76      6.40      6.77      1.12      2.22      35.24
                    Ratio        -30.71    -31.28    17.28     17.59     -97.39    -30.92    3.20

Note: Ratio rows report parameter 2 divided by parameter 1.

[50] Results are from 988 iterations.

Table A-7: Welfare Estimates of Conditional Logit, Nested Logit and Latent Class Models When Conditional Logit Model Is the
True Model

                             Site Loss             Quality Change
                    Site     True      Estimate    True      Estimate
Conditional Logit   1        27.00     27.06       -19.41    -19.47
                    2        2.57      2.57        -3.67     -3.67
                    3        6.96      6.96        -7.63     -7.65
Nested Logit        1        27.00     27.06       -19.41    -19.47
                    2        2.57      2.57        -3.67     -3.67
                    3        6.96      6.96        -7.63     -7.65
Latent Class [51]   1        27.00     27.58       -19.41    -20.15
                    2        2.57      2.59        -3.67     -3.59
                    3        6.96      6.91        -7.63     -7.55

[51] After we exclude iterations with infinite site values, 876 iterations are used to compute the averages.

A.1.3 True Model-Nested Logit Model
Simulation results are shown in the following tables.


Table A-8: Performance of Conditional Logit, Nested Logit and Latent Class Models When Nested Logit Model Is the True
Model

                                 True      Mean      Var.      MSE       Min.      Median    Max.
Conditional Logit   Parameter 1  -0.07     -0.093    2.6e-05   5.3e-04   -0.11     -0.092    -0.079
                    Parameter 2  2.15      2.53      0.040     0.18      2.02      2.52      3.33
                    Ratio        -30.71    -27.33    2.71      14.18     -32.63    -27.35    -22.16
Nested Logit        Parameter 1  -0.07     -0.071    2.8e-05   2.9e-05   -0.095    -0.070    -0.058
                    Parameter 2  2.15      2.17      0.026     0.026     1.72      2.16      2.85
                    Ratio        -30.71    -30.76    3.34      3.34      -37.19    -30.77    -25.49
Latent Class [52]   Parameter 1  -0.07     -0.11     5.9e-03   3.52      -1.86     -0.10     -0.079
                    Parameter 2  2.15      3.22      7.8e-03   4.67      -14.16    2.83      26.22
                    Ratio        -30.71    -27.94    3.47      11.14     -33.61    -27.96    -22.37

Note: Ratio rows report parameter 2 divided by parameter 1.

[52] Results are from 993 iterations.

Table A-9: Welfare Estimates of Conditional Logit, Nested Logit and Latent Class Models When Nested Logit Model Is the True
Model

                             Site Loss             Quality Change
                    Site     True      Estimate    True      Estimate
Conditional Logit   1        8.56      9.38        -9.68     -9.01
                    2        16.79     16.70       -15.48    -13.55
                    3        4.32      3.74        -5.55     -4.77
Nested Logit        1        8.56      8.56        -9.68     -9.69
                    2        16.79     16.81       -15.48    -15.53
                    3        4.32      4.30        -5.55     -5.53
Latent Class [53]   1        8.56      9.04        -9.68     -9.06
                    2        16.79     16.93       -15.48    -14.92
                    3        4.32      3.87        -5.55     -3.96

[53] After we exclude iterations with infinite site values, 965 iterations are used to compute the averages.

A.2 Seven Sites
A.2.1 True Model-Latent Class Model
Simulation results are shown in the following tables.

Table A-10: Performance of Latent Class Model When It Is the True Model [54]

                                True      Mean      Var.      MSE       Min.      Median    Max.
Class 1, parameter 1            -0.06     -0.12     0.46      0.46      -11.06    -0.062    6.12
Class 1, parameter 2            0.49      2.22      292.8     295.5     -120.6    0.52      289
Class 2, parameter 1            -0.10     -0.23     0.82      0.84      -9.75     -0.073    0.27
Class 2, parameter 2            0.21      0.15      14.84     14.82     -58.55    0.33      30.95
Class 1 membership probability  0.7       0.42      0.086     0.16      0.0030    0.5       0.994
Class 1, ratio                  -8.17     -28.54    1.7e05    1.8e05    -8638     -7.93     3692
Class 2, ratio                  -2.10     -3.28     15.99     17.37     -13.48    -4.28     12.74
Average, parameter 1            -0.072    -0.097    0.015     0.015     -1.55     -0.073    -0.038
Average, parameter 2            0.406     0.44      0.74      0.74      -9.91     0.409     11.65
Average, ratio                  -6.35     -6.68     56.7      56.7      -197.1    -6.28     47.5

Note: Ratio rows report parameter 2 divided by parameter 1.

[54] Results are from 999 iterations.

Table A-11: Performance of Conditional Logit and Nested Logit Models When Latent Class Model Is the True Model

                                True      Mean      Var.      MSE       Min.      Median    Max.
Conditional Logit  Parameter 1  -0.072    -0.068    6.7e-06   2.3e-05   -0.077    -0.068    -0.061
                   Parameter 2  0.406     0.408     5.8e-03   5.8e-03   0.125     0.407     0.67
                   Ratio        -6.35     -6.01     1.23      1.34      -9.60     -6.02     -1.72
Nested Logit       Parameter 1  -0.072    -0.068    2.3e-05   3.7e-05   -0.088    -0.068    -0.055
                   Parameter 2  0.406     0.410     6.6e-03   6.6e-03   0.149     0.409     0.68
                   Ratio        -6.35     -6.01     1.23      1.35      -9.61     -6.03     -1.76

Note: Ratio rows report parameter 2 divided by parameter 1.

Table A-12: Estimated Values of Marginal Quality Change of Latent Class Model When It Is the True Model

                   True      Mean      Var.      MSE       Min.      Median    Max.
Class 1   Site 1   -0.93     -4.78     18686     18681     -4256     -0.86     404.9
          Site 2   -1.02     -0.90     34.3      34.3      -87.19    -0.95     59.48
          Site 3   -1.45     -12.88    95285     95320     -8115     -1.39     2172
          Site 4   -0.86     -1.92     2290.1    2289.0    -1415     -0.75     277.3
          Site 5   -1.17     -1.31     38.03     38.01     -81.52    -1.09     112.9
          Site 6   -1.53     -4.98     8065      8068      -1644     -1.50     1635
          Site 7   -1.22     -1.79     102.6     102.8     -142.9    -1.16     218.6
Class 2   Site 1   -0.30     -0.36     0.31      0.31      -1.23     -0.55     2.78
          Site 2   -0.28     -0.40     0.30      0.32      -1.44     -0.57     2.30
          Site 3   -0.32     -0.60     0.38      0.46      -2.85     -0.69     0.94
          Site 4   -0.24     -0.32     0.23      0.24      -1.12     -0.48     2.32
          Site 5   -0.30     -0.46     0.34      0.36      -1.78     -0.62     1.95
          Site 6   -0.37     -0.64     0.47      0.54      -3.04     -0.76     1.14
          Site 7   -0.31     -0.50     0.33      0.36      -2.12     -0.63     1.37
Average   Site 1   -0.74     -0.66     0.41      0.42      -18.28    -0.68     3.76
          Site 2   -0.80     -0.73     0.037     0.042     -2.43     -0.74     0.45
          Site 3   -1.11     -1.51     40.63     40.75     -181.5    -1.15     32.17
          Site 4   -0.67     -0.58     0.10      0.11      -6.46     -0.62     2.40
          Site 5   -0.90     -0.86     0.047     0.048     -3.87     -0.86     0.076
          Site 6   -1.18     -1.38     2.25      2.28      -26.12    -1.24     22.53
          Site 7   -0.95     -0.96     0.085     0.085     -6.34     -0.95     0.73

Table A-13: Estimated Site Values of Latent Class Model When It Is the True Model

                        True      Mean      Var.      MSE       Min.      Median    Max.
Class 1 [55]   Site 1   2.40      2.34      90.52     90.43     -224.1    2.32      58.33
               Site 2   2.69      2.70      21.44     21.42     -43.02    2.56      68.64
               Site 3   4.08      5.39      682.4     683.4     -394.9    3.99      241.7
               Site 4   2.19      2.26      31.5      31.5      -75.54    2.03      59.51
               Site 5   3.11      3.12      29.5      29.5      -51.65    2.91      83.18
               Site 6   4.46      5.46      430.2     430.7     -351.6    4.39      227.4
               Site 7   3.21      3.51      55.56     55.59     -103.3    3.10      101.0
Class 2 [56]   Site 1   2.26      2.47      0.41      0.45      -1.66     2.38      10.76
               Site 2   2.15      2.54      0.35      0.50      -1.79     2.50      10.69
               Site 3   2.74      3.11      0.79      0.93      -2.35     3.35      10.95
               Site 4   1.58      2.08      0.41      0.66      -1.41     2.02      10.62
               Site 5   2.23      2.72      0.40      0.65      -1.90     2.76      11.04
               Site 6   3.12      3.53      0.75      0.92      -1.96     3.76      11.25
               Site 7   2.37      2.69      0.49      0.59      -2.02     2.82      10.88
Average [57]   Site 1   2.36      2.38      0.028     0.029     1.14      2.37      3.79
               Site 2   2.53      2.55      0.021     0.021     1.87      2.54      4.27
               Site 3   3.68      3.71      0.28      0.28      -2.25     3.67      12.69
               Site 4   2.00      2.05      0.024     0.027     0.92      2.02      3.50
               Site 5   2.85      2.84      0.025     0.025     2.10      2.84      4.97
               Site 6   4.06      4.11      0.21      0.21      -0.42     4.08      12.53
               Site 7   2.96      2.96      0.041     0.041     1.98      2.95      5.74

[55] After we exclude iterations with infinite site values, 953 iterations are used to compute the averages.
[56] After we exclude iterations with infinite site values, 963 iterations are used to compute the averages.
[57] After we exclude iterations with infinite site values, 917 iterations are used to compute the averages.

Table A-14: Welfare Estimates of Conditional Logit and Nested Logit Models When Latent Class Model Is the True Model

                            Site Loss             Quality Change
                   Site     True      Estimate    True      Estimate
Conditional Logit  1        2.36      2.37        -0.74     -0.72
                   2        2.53      2.55        -0.80     -0.77
                   3        3.68      3.61        -1.11     -1.02
                   4        2.00      2.04        -0.67     -0.64
                   5        2.85      2.86        -0.90     -0.86
                   6        4.06      4.02        -1.18     -1.11
                   7        2.96      2.96        -0.95     -0.90
Nested Logit       1        2.36      2.37        -0.74     -0.72
                   2        2.53      2.55        -0.80     -0.77
                   3        3.68      3.61        -1.11     -1.02
                   4        2.00      2.04        -0.67     -0.64
                   5        2.85      2.86        -0.90     -0.86
                   6        4.06      4.02        -1.18     -1.11
                   7        2.96      2.96        -0.95     -0.90

A.2.2 True Model-Conditional Logit Model
Simulation results are shown in the following tables.


Table A-15: Performance of Conditional Logit, Nested Logit and Latent Class Models When Conditional Logit Model Is the
True Model

                                True      Mean      Var.      MSE       Min.      Median    Max.
Conditional Logit  Parameter 1  -0.06     -0.060    5.4e-06   5.4e-06   -0.069    -0.06     -0.054
                   Parameter 2  0.49      0.49      5.1e-03   5.1e-03   0.28      0.49      0.71
                   Ratio        -8.17     -8.18     1.44      1.44      -11.62    -8.18     -4.71
Nested Logit       Parameter 1  -0.06     -0.060    1.7e-05   1.7e-05   -0.077    -0.060    -0.047
                   Parameter 2  0.49      0.49      6.5e-03   6.5e-03   0.28      0.49      0.79
                   Ratio        -8.17     -8.18     1.46      1.46      -11.84    -8.18     -4.76
Latent Class       Parameter 1  -0.06     -0.084    0.017     0.017     -1.69     -0.061    -0.024
                   Parameter 2  0.49      0.60      0.95      0.96      -5.44     0.51      11.85
                   Ratio        -8.17     -8.32     14.86     14.86     -71.36    -8.26     59.98

Note: Ratio rows report parameter 2 divided by parameter 1.

Table A-16: Welfare Estimates of Conditional Logit, Nested Logit and Latent Class Models When Conditional Logit Model Is the
True Model

                             Site Loss             Quality Change
                    Site     True      Estimate    True      Estimate
Conditional Logit   1        3.15      3.16        -1.18     -1.18
                    2        3.56      3.55        -1.29     -1.29
                    3        3.32      3.33        -1.20     -1.21
                    4        1.86      1.87        -0.76     -0.75
                    5        2.34      2.34        -0.91     -0.90
                    6        4.54      4.55        -1.59     -1.60
                    7        3.44      3.43        -1.24     -1.24
Nested Logit        1        3.15      3.16        -1.18     -1.18
                    2        3.56      3.55        -1.29     -1.29
                    3        3.32      3.33        -1.20     -1.21
                    4        1.86      1.87        -0.76     -0.75
                    5        2.34      2.34        -0.91     -0.90
                    6        4.54      4.55        -1.59     -1.60
                    7        3.44      3.43        -1.24     -1.24
Latent Class [58]   1        3.15      3.15        -1.18     -1.19
                    2        3.56      3.55        -1.29     -1.32
                    3        3.32      3.32        -1.20     -1.22
                    4        1.86      1.89        -0.76     -0.69
                    5        2.34      2.34        -0.91     -0.87
                    6        4.54      4.62        -1.59     -1.77
                    7        3.44      3.42        -1.24     -1.27

[58] After we exclude iterations with infinite site values, 904 iterations are used to compute the averages.

A.2.3 True Model-Nested Logit Model
Simulation results are shown in the following tables.


Table A-17: Performance of Conditional Logit, Nested Logit and Latent Class Models When Nested Logit Model Is the True
Model

                                 True      Mean      Var.      MSE       Min.      Median    Max.
Conditional Logit   Parameter 1  -0.06     -0.090    1.1e-05   9.1e-04   -0.10     -0.090    -0.079
                    Parameter 2  0.49      0.64      4.2e-03   0.026     0.43      0.64      0.84
                    Ratio        -8.17     -7.10     0.47      1.61      -8.97     -7.13     -4.74
Nested Logit        Parameter 1  -0.06     -0.060    1.7e-05   1.7e-05   -0.073    -0.060    -0.057
                    Parameter 2  0.49      0.49      2.4e-03   2.4e-03   0.33      0.49      0.53
                    Ratio        -8.17     -8.17     0.52      0.52      -10.28    -8.19     -5.40
Latent Class [59]   Parameter 1  -0.06     -0.095    1.2e-03   2.5e-03   -0.95     -0.091    -0.079
                    Parameter 2  0.49      0.72      0.48      0.53      -1.33     0.65      16.07
                    Ratio        -8.17     -7.02     0.55      1.88      -9.50     -7.04     -4.32

Note: Ratio rows report parameter 2 divided by parameter 1.

[59] Results are from 999 iterations.

Table A-18: Welfare Estimates of Conditional Logit, Nested Logit and Latent Class Models When Nested Logit Model Is the True
Model

                             Site Loss             Quality Change
                    Site     True      Estimate    True      Estimate
Conditional Logit   1        3.67      3.69        -1.60     -1.38
                    2        1.84      2.14        -0.88     -0.85
                    3        2.61      2.80        -1.17     -1.07
                    4        3.59      3.66        -1.55     -1.35
                    5        2.34      2.12        -1.12     -0.91
                    6        1.66      1.56        -0.82     -0.70
                    7        2.23      1.98        -1.02     -0.84
Nested Logit        1        3.67      3.66        -1.60     -1.60
                    2        1.84      1.84        -0.88     -0.88
                    3        2.61      2.61        -1.17     -1.17
                    4        3.59      3.59        -1.55     -1.56
                    5        2.34      2.34        -1.12     -1.12
                    6        1.66      1.66        -0.82     -0.82
                    7        2.23      2.23        -1.02     -1.01
Latent Class [60]   1        3.67      3.69        -1.60     -1.47
                    2        1.84      2.15        -0.88     -0.77
                    3        2.61      2.76        -1.17     -1.07
                    4        3.59      3.67        -1.55     -1.45
                    5        2.34      2.11        -1.12     -0.85
                    6        1.66      1.58        -0.82     -0.62
                    7        2.23      1.97        -1.02     -0.79

[60] After we exclude iterations with infinite site values, 986 iterations are used to compute the averages.

A.3 Equal Probability of Membership in Latent Class Model
Simulation results are shown in the following tables.


Table A-19: Performance of Latent Class Model When It Is the True Model

Parameter   True     Mean      Var.      MSE       Min.      Median    Max.
1           -0.06    -0.27     0.909     0.953     -8.14     -0.072    3.21
2           0.49     2.98      789.1     794.5     -186.5    0.58      179.1
3           -0.10    -0.22     0.946     0.961     -8.76     -0.078    2.98
4           0.21     0.025     8.47      8.50      -46.16    0.22      24.13
5           0.50     0.33      0.062     0.091     0.0038    0.32      0.995
6           -8.17    -116.6    9.2e06    9.2e06    -68980    -7.26     22980
7           -2.10    -2.19     23.96     23.95     -22.23    -2.63     25.5
8           -0.08    -0.14     0.045     0.048     -2.02     -0.082    -0.039
9           0.35     0.47      5.52      5.52      -22.26    0.375     17.16
10          -5.13    -5.68     1213.6    1212.7    -772.4    -5.00     512.3

Note: Results are from 992 iterations.

Table A-20: Performance of Conditional Logit and Nested Logit Models When Latent Class Model Is the True Model

Model               Parameter   True     Mean      Var.      MSE       Min.      Median    Max.
Conditional Logit   1           -0.08    -0.075    1.3e-05   3.9e-05   -0.086    -0.075    -0.064
Conditional Logit   2           0.35     0.35      0.031     0.031     -0.18     0.35      0.83
Conditional Logit   3           -5.13    -4.62     5.45      5.71      -11.11    -4.61     2.35
Nested Logit        1           -0.08    -0.076    2.6e-05   4.3e-05   -0.092    -0.076    -0.061
Nested Logit        2           0.35     0.33      0.035     0.036     -0.30     0.33      0.85
Nested Logit        3           -5.13    -4.41     6.49      7.01      -13.33    -4.39     3.51

Table A-21: Estimated Values of Marginal Quality Change of Latent Class Model When It Is the True Model

Group     Site   True     Mean      Var.      MSE       Min.      Median   Max.
Class 1   1      -2.96    -101.1    5.7e06    5.7e06    -65810    -2.52    9373
Class 1   2      -2.83    -5.05     59411     59356     -3165     -2.48    6066
Class 1   3      -2.38    -10.5     2.6e06    2.6e06    -45320    -2.08    22390
Class 2   1      -0.68    -0.80     2.66      2.67      -8.63     -0.89    6.77
Class 2   2      -0.69    -0.77     2.61      2.61      -8.12     -0.88    7.16
Class 2   3      -0.73    -0.62     2.79      2.80      -5.48     -0.86    11.57
Average   1      -1.82    -2.78     777.3     777.5     -737.4    -1.87    309
Average   2      -1.76    -1.70     52.4      52.4      -36.08    -1.78    199.7
Average   3      -1.56    -1.19     149.8     149.8     -284.8    -1.22    239.6

Table A-22: Estimated Site Values of Latent Class Model When It Is the True Model

Group     Site   True     Mean     Var.      MSE       Min.      Median   Max.
Class 1   1      11.48    13.83    32094     32062     -1864     10.22    3966
Class 1   2      10.83    7.00     5240.9    5249.6    -1535     9.97     326.7
Class 1   3      8.91     1.18     633723    633058    -20840    8.93     10120
Class 2   1      8.63     9.63     1.06      2.07      6.60      9.71     25.22
Class 2   2      9.06     9.59     0.88      1.16      7.12      9.64     25.11
Class 2   3      9.67     9.67     1.38      1.38      6.27      9.50     26.4
Average   1      10.06    9.88     14.85     14.81     -61.03    10.0     53.7
Average   2      9.95     9.68     7.38      7.42      -55.06    9.82     17.57
Average   3      9.29     9.28     39.60     39.42     -121.4    9.35     117.9

Note: After we exclude iterations with infinite site values, 874 iterations are used to compute the averages for Class 1, 953 for Class 2 and 835 for the Average rows.

Table A-23: Welfare Estimates of Conditional Logit and Nested Logit Models When Latent Class Model Is the True Model

                            Site Loss           Quality Change
Model               Site    True     Estimate   True     Estimate
Conditional Logit   1       10.06    9.94       -1.82    -1.60
Conditional Logit   2       9.95     9.81       -1.76    -1.56
Conditional Logit   3       9.29     9.32       -1.56    -1.45
Nested Logit        1       10.06    9.95       -1.82    -1.53
Nested Logit        2       9.95     9.84       -1.76    -1.50
Nested Logit        3       9.29     9.29       -1.56    -1.39

Appendix B

Comparison between Driver License List and Census Data


The mail survey sample was drawn from Michigan’s driver license list (from the
Michigan Office of the Secretary of State). Its demographic statistics are compared to
2010 census data for age and gender. The age cut points are those used by the census.
Table B-1: Age and Gender Distribution of Census and Driver License List in Michigan for People Age 16 or Older

Michigan   Census     Driver     Census Male   Driver Male   Census Female   Driver Female
Age 16+    100.00%    100.00%    48.50%        49.78%        51.50%          50.22%
Age 18+    96.26%     97.76%     46.57%        48.63%        49.69%          49.13%
Age 21+    90.46%     92.78%     43.62%        46.08%        46.84%          46.70%
Age 62+    21.54%     22.48%     9.51%         10.20%        12.04%          12.27%
Age 65+    17.38%     18.20%     7.50%         8.12%         9.89%           10.08%

Table B-2: Age and Gender Distribution of Census and Driver License List for People Age 16 or Older, for the Upper Peninsula and Lower Peninsula

Region            Age       Census    Driver    Census Male   Driver Male   Census Female   Driver Female
Upper Peninsula   Age 16+   3.29%     3.09%     1.71%         1.56%         1.59%           1.53%
Upper Peninsula   Age 18+   3.20%     3.02%     1.66%         1.52%         1.54%           1.50%
Upper Peninsula   Age 21+   3.00%     2.89%     1.55%         1.46%         1.45%           1.43%
Upper Peninsula   Age 62+   0.86%     0.89%     0.40%         0.42%         0.46%           0.46%
Upper Peninsula   Age 65+   0.71%     0.73%     0.32%         0.34%         0.38%           0.39%
Lower Peninsula   Age 16+   96.71%    96.91%    46.69%        48.22%        49.92%          48.69%
Lower Peninsula   Age 18+   93.07%    94.74%    44.92%        47.11%        48.15%          47.63%
Lower Peninsula   Age 21+   87.46%    89.89%    42.07%        44.62%        45.39%          45.27%
Lower Peninsula   Age 62+   20.68%    21.59%    9.10%         9.78%         11.58%          11.81%
Lower Peninsula   Age 65+   16.68%    17.47%    7.17%         7.78%         9.50%           9.69%

As shown in the tables, the joint distribution of age and gender in the driver
license list is very close to that of the census data. Therefore, the driver license list
reasonably represents the general population of adults in the Lower Peninsula.
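One simple way to quantify "very close" is the largest absolute gap, in percentage points, between the census and driver license shares. The sketch below applies that summary to the statewide figures in Table B-1. The measure and the function name are illustrative assumptions, not a test used in the source.

```python
# Shares from Table B-1 (percent of people age 16 or older), listed in the
# order Age 16+, 18+, 21+, 62+, 65+.
census = {
    "all": [100.00, 96.26, 90.46, 21.54, 17.38],
    "male": [48.50, 46.57, 43.62, 9.51, 7.50],
    "female": [51.50, 49.69, 46.84, 12.04, 9.89],
}
driver = {
    "all": [100.00, 97.76, 92.78, 22.48, 18.20],
    "male": [49.78, 48.63, 46.08, 10.20, 8.12],
    "female": [50.22, 49.13, 46.70, 12.27, 10.08],
}


def largest_gap(first, second):
    """Largest absolute difference, in percentage points, across all cells."""
    return max(
        abs(a - b)
        for group in first
        for a, b in zip(first[group], second[group])
    )


gap = largest_gap(census, driver)
```

For these figures the largest gap is about 2.5 percentage points (males age 21 or older).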


Appendix C

Data Weights


The survey weights are constructed in stages, starting with the mail survey sample and
ending with weights for the web survey respondents. This section describes each stage of
the weights.

C.1 Mail Survey Sample Weights
The mail survey used a weighted random sample designed to recruit as many beach
recreation participants as possible, so the data need to be weighted back for the
analysis. Originally, 60% of the sample was drawn from coastal counties and 40% from
noncoastal counties in the Lower Peninsula. After people who were deceased or had
moved were removed, these proportions may no longer hold, so the weights are
calculated by county and applied to the effective sample of 29,613, where the base is
the driver license list. Weights are computed as ratios of the percentages in the driver
license list to the percentages in the sample, so that they are normalized and do not
distort the original sample size.
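The ratio rule described above can be sketched as follows. The county names and counts are made up for illustration, and the function name is hypothetical. The only thing taken from the text is the rule itself: a county's share in the driver license list divided by its share in the sample.

```python
def county_sample_weights(list_counts, sample_counts):
    """Weight per county = share in the base list / share in the sample."""
    list_total = sum(list_counts.values())
    sample_total = sum(sample_counts.values())
    weights = {}
    for county, n_sample in sample_counts.items():
        list_share = list_counts[county] / list_total
        sample_share = n_sample / sample_total
        weights[county] = list_share / sample_share
    return weights


# Hypothetical counts: the coastal county is oversampled (60% of the sample
# but only 30% of the list), so it receives a weight below one.
list_counts = {"Coastal A": 3000, "Noncoastal B": 7000}
sample_counts = {"Coastal A": 60, "Noncoastal B": 40}

weights = county_sample_weights(list_counts, sample_counts)

# Because the weights are ratios of shares, the weighted sample size equals
# the original sample size.
weighted_size = sum(weights[c] * sample_counts[c] for c in sample_counts)
```

Oversampled counties end up with weights below one and undersampled counties with weights above one, which matches the pattern in Table C-1.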

Table C-1: Mail Survey Sample Weights for Counties in the Lower Peninsula

County Code   County Name       Sample Weight
1             Alcona            0.67
3             Allegan           0.73
4             Alpena            0.69
5             Antrim            0.69
6             Arenac            0.68
8             Barry             1.12
9             Bay               0.65
10            Benzie            0.68
11            Berrien           0.85
12            Branch            1.24
13            Calhoun           1.43
14            Cass              1.32
15            Charlevoix        0.67
16            Cheboygan         0.69
18            Clare             1.30
19            Clinton           1.18
20            Crawford          1.16
23            Eaton             1.21
24            Emmet             0.72
25            Genesee           1.39
26            Gladwin           1.11
28            Grand Traverse    0.67
29            Gratiot           1.10
30            Hillsdale         1.30
32            Huron             0.69
33            Ingham            1.34
34            Ionia             1.03
35            Iosco             0.68
37            Isabella          0.96
38            Jackson           1.24
39            Kalamazoo         1.35
40            Kalkaska          1.40
41            Kent              1.40
43            Lake              1.21
44            Lapeer            1.21
45            Leelanau          0.71
46            Lenawee           1.22
47            Livingston        1.16
50            Macomb            0.71
51            Manistee          0.64
53            Mason             0.76
54            Mecosta           1.08
56            Midland           1.18
57            Missaukee         1.06
58            Monroe            0.73
59            Montcalm          1.19
60            Montmorency       1.26
61            Muskegon          0.72
62            Newaygo           1.31
63            Oakland           1.40
64            Oceana            0.81
65            Ogemaw            1.22
67            Osceola           1.41
68            Oscoda            1.15
69            Otsego            1.29
70            Ottawa            0.67
71            Presque Isle      0.68
72            Roscommon         1.27
73            Saginaw           1.36
74            St. Clair         0.71
75            St. Joseph        1.49
76            Sanilac           0.66
78            Shiawassee        1.16
79            Tuscola           0.65
80            Van Buren         0.83
81            Washtenaw         1.42
82            Wayne             0.90
83            Wexford           1.29

C.2 Mail Survey Respondent Weights
A probit response/nonresponse model is estimated over the effective sample of 29,613,
using the mail survey sample weights (Table C-1) and independent variables from the
driver license data (age, gender and county). Variables that are not statistically
significant at the 90% confidence level are not shown.
Table C-2: Results of a Probit Response/Nonresponse Model for the Mail Survey Using Sample Weights

                           Probit without County Dummies    Probit with County Dummies
Variables                  Estimates      t Statistics      Estimates      t Statistics
Age                        0.0139***      31.9              0.0138***      31.3
Gender                     0.138***       8.79              0.143***       9.05
Constant                   -0.917***      -38.4             -0.782***      -5.28
Macomb County (Coastal)    -              -                 -0.256*        1.73
Wayne County (Coastal)     -              -                 -0.412***      -2.80
Note: *10% significance level; **5% significance level; *** 1% significance level
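To show how the probit coefficients translate into response probabilities, the sketch below evaluates the standard normal CDF at the linear index, using the estimates from the specification without county dummies in Table C-2. It is illustrative only: the function name is hypothetical, the source does not say which gender is coded as 1, and the survey weights play no role in this calculation.

```python
from statistics import NormalDist

# Estimates from the specification without county dummies in Table C-2.
CONSTANT = -0.917
AGE_COEF = 0.0139
GENDER_COEF = 0.138


def response_probability(age, gender_indicator):
    """Probit probability: Phi(constant + b_age * age + b_gender * gender).

    gender_indicator is 0 or 1; the coding of gender is an assumption here
    because the source does not report it.
    """
    index = CONSTANT + AGE_COEF * age + GENDER_COEF * gender_indicator
    return NormalDist().cdf(index)


p_age_25 = response_probability(25, 0)
p_age_65 = response_probability(65, 0)
```

Under these assumptions the predicted response probability rises from roughly 0.28 at age 25 to roughly 0.49 at age 65, consistent with the positive age coefficient.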

The results above suggest demographic differences between respondents and
nonrespondents to the mail survey. To correct for possible response/nonresponse bias
together with the sampling scheme, additional weights for the 9,591 eligible mail survey
respondents are computed according to the joint distribution of age, gender and county,
where the base is still the driver license list. There are eight age ranges (16-24, 25-34,
35-44, 45-54, 55-64, 65-74, 75-84 and 85+) and four county categories (Macomb, Wayne,
other coastal counties and noncoastal counties). For the 85+ age range, there are only
two county categories, coastal and noncoastal; otherwise, some cells would contain
fewer than 30 people, which could harm the weighting.
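A minimal sketch of the two steps described above: collapsing the county categories when a cell is too small, and computing a cell weight as the share in the driver license list over the share among respondents. The collapsing rule is one reading of the text, and the respondent counts are hypothetical. Only the 30-person threshold and the two Wayne County shares (Tables C-3 and C-4) come from the source.

```python
MIN_CELL_SIZE = 30  # threshold mentioned in the text
COASTAL = ("Macomb", "Wayne", "Other Coastal")


def collapse_counties(cell_counts):
    """Merge the coastal categories if any cell has fewer than 30 people.

    cell_counts maps the four county categories to respondent counts for one
    age-gender group.
    """
    if min(cell_counts.values()) >= MIN_CELL_SIZE:
        return dict(cell_counts)
    return {
        "Coastal": sum(cell_counts[name] for name in COASTAL),
        "Noncoastal": cell_counts["Noncoastal"],
    }


def cell_weight(list_share, respondent_share):
    """Weight for one cell: share in the base list / share among respondents."""
    return list_share / respondent_share


# Hypothetical respondent counts for one sparse age-gender group.
sparse_group = {"Macomb": 12, "Wayne": 20, "Other Coastal": 45, "Noncoastal": 60}
collapsed = collapse_counties(sparse_group)

# Males age 16-24 in Wayne County: 1.43% of the list (Table C-3) and
# 0.46% of eligible respondents (Table C-4).
wayne_weight = cell_weight(1.43, 0.46)
```

The Wayne County ratio is about 3.11. Ratios computed from the published shares can differ slightly from the published weights because the shares are rounded.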

Table C-3: Joint Age, Gender and County Distribution of Driver License List*

Gender   County          Age 16-24   Age 25-34   Age 35-44   Age 45-54   Age 55-64   Age 65-74   Age 75-84   Age 85+
Male     Macomb          0.62%       0.71%       0.77%       0.84%       0.65%       0.35%       0.21%       -
Male     Wayne           1.43%       1.71%       1.81%       1.82%       1.42%       0.71%       0.42%       -
Male     Other Coastal   1.31%       1.45%       1.45%       1.73%       1.50%       0.91%       0.50%       -
Male     Coastal         -           -           -           -           -           -           -           0.73%
Male     Noncoastal      3.98%       4.82%       4.61%       4.99%       4.11%       2.26%       1.21%       0.72%
Female   Macomb          0.59%       0.69%       0.77%       0.85%       0.68%       0.41%       0.29%       -
Female   Wayne           1.36%       1.54%       1.64%       1.72%       1.46%       0.82%       0.59%       -
Female   Other Coastal   1.20%       1.31%       1.37%       1.69%       1.51%       0.96%       0.62%       -
Female   Coastal         -           -           -           -           -           -           -           1.12%
Female   Noncoastal      3.76%       4.40%       4.41%       5.00%       4.28%       2.47%       1.57%       1.15%

Table C-4: Joint Age, Gender and County Distribution of 9,591 Eligible Mail Survey Respondents*

Gender   County          Age 16-24   Age 25-34   Age 35-44   Age 45-54   Age 55-64   Age 65-74   Age 75-84   Age 85+
Male     Macomb          0.34%       0.69%       0.62%       0.97%       1.06%       0.66%       0.33%       -
Male     Wayne           0.46%       0.74%       0.82%       1.25%       1.45%       1.06%       0.47%       -
Male     Other Coastal   0.82%       1.26%       1.55%       2.77%       2.93%       2.29%       1.15%       -
Male     Coastal         -           -           -           -           -           -           -           0.50%
Male     Noncoastal      1.42%       2.37%       2.43%       4.08%       4.29%       2.76%       1.23%       0.38%
Female   Macomb          0.47%       0.78%       0.90%       1.70%       1.38%       0.65%       0.50%       -
Female   Wayne           0.71%       1.14%       1.19%       2.02%       1.91%       1.27%       0.77%       -
Female   Other Coastal   0.94%       1.65%       2.07%       3.29%       3.54%       2.25%       1.24%       -
Female   Coastal         -           -           -           -           -           -           -           0.90%
Female   Noncoastal      1.74%       3.24%       3.45%       5.89%       5.60%       3.43%       1.65%       0.57%

*The distributions use the mail survey sample weights (Table C-1).

Table C-5: Mail Survey Respondent Weights

Gender   County          Age 16-24   Age 25-34   Age 35-44   Age 45-54   Age 55-64   Age 65-74   Age 75-84   Age 85+
Male     Macomb          1.79        1.04        1.25        0.86        0.62        0.54        0.62        -
Male     Wayne           3.11        2.31        2.20        1.46        0.98        0.67        0.90        -
Male     Other Coastal   1.59        1.15        0.93        0.62        0.51        0.40        0.44        -
Male     Coastal         -           -           -           -           -           -           -           1.47
Male     Noncoastal      2.81        2.04        1.90        1.22        0.96        0.82        0.99        1.91
Female   Macomb          1.26        0.88        0.85        0.50        0.50        0.64        0.58        -
Female   Wayne           1.92        1.36        1.38        0.85        0.77        0.64        0.76        -
Female   Other Coastal   1.28        0.79        0.66        0.51        0.43        0.42        0.50        -
Female   Coastal         -           -           -           -           -           -           -           1.25
Female   Noncoastal      2.16        1.36        1.28        0.85        0.76        0.72        0.96        2.01

Note: The weights are normalized to the size of eligible mail survey respondents.

C.3 Web Survey Respondent Weights
Similarly, before the weights for the web survey data are calculated, a probit
response/nonresponse model is run over the web sample of 5,476. The dependent
variable is response/nonresponse to the web survey, and the independent variables are
gender, age, race, education and employment, which were reported in the mail survey.
The analysis is performed using the mail survey respondent weights (Table C-5).
Variables that are not statistically significant at the 90% confidence level are not shown.
Table C-6: Results of a Probit Response/Nonresponse Model for the Web Survey Using Mail Survey Respondent Weights

Variables                          Estimates     t Statistics
Age                                0.00476***    2.72
White                              0.381***      2.84
Asian                              0.564**       2.34
Some Schooling                     4.54***       15.7
High School or Equivalent          4.73***       19.9
Associate’s or Technical Degree    4.90***       20.5
College Degree                     5.19***       21.8
Advanced Degree                    5.16***       21.7
College or Equivalent              0.323***      7.07
Graduate Degree                    0.450***      7.33
Constant                           -5.52***      -12.5
Benzie County (Coastal)            1.03**        2.49
Hillsdale County (Noncoastal)      0.842*        1.95
Isabella County (Noncoastal)       0.704*        1.76
Leelanau County (Coastal)          0.797*        1.90
Roscommon County (Noncoastal)      0.647*        1.85
Note: *10% significance level; **5% significance level; *** 1% significance level

Table C-7: Results of a Probit Response/Nonresponse Model for the Web Survey Using Mail Survey Respondent Weights With Fewer Variables

Variable               Estimates     t Statistics
Age 16-24              0.401***      3.89
Age 25-34              0.338***      3.57
Age 35-44              0.438***      4.70
Age 45-54              0.561***      6.38
Age 55-64              0.751***      8.47
Age 65-74              0.587***      6.17
White                  0.289***      3.96
College degree         0.413***      9.81
Significant counties   0.446***      3.98
Constant               -0.712***     -6.65
Note: *10% significance level; **5% significance level; *** 1% significance level

If many factors are taken into account to correct the response/nonresponse bias,
the number of people in each elementary cell becomes small and the weights become
large, which could inflate variances. Therefore, to reduce the number of factors, we run
the regression reported in Table C-7. All variables that are not statistically significant in
the previous regression are dropped. Age dummy variables replace the continuous age
variable for the purpose of weighting. There are only 56 Asians among the respondents,
so the corresponding variable is not included. For education, the effects of having a
college degree and an advanced degree are very similar, so a new dummy variable is
created indicating whether a person has a college degree or not. All county dummies are
collapsed into one variable, which equals one if a person lives in one of the five
statistically significant counties in Table C-6.
Hence, four factors, age, county, race and education, have significant effects on
the weights for the 2,544 eligible web survey respondents. Since the number of people
can be quite small in some categories, raking is used to compute the weights, rather than
a comparison of joint distributions. The computation is implemented with a SAS raking
macro (developed by David Izrael, Abt Associates, June 1999), and the mail survey
respondent weights apply to both the web survey sample and the eligible respondents.
Only people with no missing data in race and education enter the computation. Missing
data could be treated as a separate category; however, the percentage of missing data is
too low to make the raking weights converge.

Table C-8: Raking Weights for Web Survey Respondents with No Missing Data (Non-Normalized)

Age Category   Significant County   College Degree   White   Web Weights
Age 16-24      0                    0                0       1.60
Age 16-24      0                    0                1       1.25
Age 16-24      0                    1                0       1.15
Age 16-24      0                    1                1       0.89
Age 16-24      1                    0                1       0.90
Age 16-24      1                    1                1       0.65
Age 25-34      0                    0                0       1.60
Age 25-34      0                    0                1       1.25
Age 25-34      0                    1                0       1.15
Age 25-34      0                    1                1       0.89
Age 25-34      1                    0                0       1.16
Age 25-34      1                    0                1       0.91
Age 25-34      1                    1                1       0.65
Age 35-44      0                    0                0       1.58
Age 35-44      0                    0                1       1.23
Age 35-44      0                    1                0       1.13
Age 35-44      0                    1                1       0.88
Age 35-44      1                    0                1       0.89
Age 35-44      1                    1                0       0.82
Age 35-44      1                    1                1       0.64
Age 45-54      0                    0                0       1.44
Age 45-54      0                    0                1       1.12
Age 45-54      0                    1                0       1.03
Age 45-54      0                    1                1       0.80
Age 45-54      1                    0                1       0.81
Age 45-54      1                    1                1       0.58
Age 55-64      0                    0                0       1.33
Age 55-64      0                    0                1       1.03
Age 55-64      0                    1                0       0.95
Age 55-64      0                    1                1       0.74
Age 55-64      1                    0                1       0.75
Age 55-64      1                    1                1       0.54
Age 65-74      0                    0                0       1.45
Age 65-74      0                    0                1       1.13
Age 65-74      0                    1                0       1.04
Age 65-74      0                    1                1       0.81
Age 65-74      1                    0                0       1.05
Age 65-74      1                    0                1       0.82
Age 65-74      1                    1                1       0.59
Age 75+        0                    0                0       3.26
Age 75+        0                    0                1       2.53
Age 75+        0                    1                0       2.33
Age 75+        0                    1                1       1.81
Age 75+        1                    0                1       1.84
Age 75+        1                    1                1       1.32

Note: The outcomes of the macro are individual-specific when the input data have
weights (in our case, the mail survey respondent weights). If the outcomes are divided by
the input weights, the results are very similar among people in the same age, county,
race and education category; the differences come from rounding errors. Therefore, we
take averages of those results in the finest category and treat them as the raking weights
for web survey respondents. The original outcomes are normalized to the total number of
people with no missing data. When we divide them by the input weights, the
normalization no longer holds.
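Raking (iterative proportional fitting) adjusts the weights one factor at a time until the weighted totals match the target totals for every factor. The sketch below is a generic, minimal version written for illustration. It is not the SAS macro used in the source, and the record fields, target totals and function name are all hypothetical.

```python
def rake(records, base_weights, targets, max_iter=100, tol=1e-8):
    """Rake base weights so weighted margins match the target totals.

    records:      list of dicts holding the categorical factors
    base_weights: starting weight for each record
    targets:      {factor: {category: target weighted total}}
    """
    weights = list(base_weights)
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for factor, target in targets.items():
            # Current weighted total in each category of this factor.
            totals = {}
            for record, weight in zip(records, weights):
                totals[record[factor]] = totals.get(record[factor], 0.0) + weight
            # Scale every record so this factor's margins hit the targets.
            for i, record in enumerate(records):
                ratio = target[record[factor]] / totals[record[factor]]
                weights[i] *= ratio
                largest_adjustment = max(largest_adjustment, abs(ratio - 1.0))
        if largest_adjustment < tol:
            break
    return weights


# Hypothetical example with two factors and four respondents.
records = [
    {"age": "young", "college": 0},
    {"age": "young", "college": 1},
    {"age": "old", "college": 0},
    {"age": "old", "college": 1},
]
targets = {
    "age": {"young": 3.0, "old": 1.0},
    "college": {0: 1.0, 1: 3.0},
}
raked = rake(records, [1.0, 1.0, 1.0, 1.0], targets)
```

Because each pass only rescales one factor's margins, raking needs far fewer respondents per cell than matching the full joint distribution, which is the motivation given in the text.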

For people with missing data in race, we match them according to their age,
county and education in Table C-8, and use a weighted average of the web weights. For
example, suppose a person is in the Age 16-24 category, with 0 for county and 0 for
college. Under these criteria, we have 12 non-White people and 106 White people in
Table C-8, with weights of 1.60 and 1.25 respectively. The weight of this person is then
calculated as the average of these two weights, weighted by the number of people in
each group. The same procedure is applied to people with missing data in education and
in both race and education.
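The example above can be worked through as a count-weighted average. This is a sketch that assumes the person counts are used as the averaging weights, which is one reading of "weighted web weights". The function name is hypothetical.

```python
def pooled_web_weight(matching_cells):
    """Count-weighted average of web weights over the matching cells.

    matching_cells: list of (number_of_people, web_weight) pairs for the
    cells that agree with the person on every non-missing factor.
    """
    total_people = sum(count for count, _ in matching_cells)
    weighted_sum = sum(count * weight for count, weight in matching_cells)
    return weighted_sum / total_people


# Age 16-24, county 0, college 0, race missing:
# 12 non-White people with weight 1.60 and 106 White people with weight 1.25.
example_weight = pooled_web_weight([(12, 1.60), (106, 1.25)])
```

With the rounded weights from Table C-8 this gives (12 x 1.60 + 106 x 1.25) / 118, or about 1.29. Values based on unrounded weights may differ in the second decimal.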
Table C-9: Raking Weights for Web Survey Respondents with Missing Data
Age Category
Age 16-24
Age 16-24
Age 16-24
Age 16-24
Age 25-34
Age 25-34
Age 25-34
Age 35-44
Age 35-44
Age 35-44
Age 45-54
Age 45-54
Age 45-54
Age 45-54
Age 45-54
Age 55-64
Age 55-64
Age 55-64
Age 55-64
Age 55-64
Age 65-74
Age 65-74
Age 65-74
Age 65-74
Age 65-74
Age 75+
Age 75+

Significant
County
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
1
0
0
0
0
0

College Degree

White

Web Weights

0
1

1.51
1.15
1.28
0.91
1.04
1.29
0.93
1.02
1.26
0.90
1.21
0.99
1.14
0.82
1.00
0.91
0.65
1.05
0.75
0.92
0.96
0.79
1.14
0.81
0.97
2.19
1.83

0
1
1
0
1
1
0
1
0
1
0
1
1
1
0
1
1
1
0
1
1
1

Once all 2,544 eligible web respondents have their web survey weights, we
multiply these by the corresponding mail survey weights and normalize the products to
the sample size of 2,544, which gives the final weights for eligible web survey respondents.
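A minimal sketch of this last step, with made-up weights for four respondents. The function name is hypothetical, and "normalize to the sample size" is taken to mean rescaling so that the weights sum to the number of respondents.

```python
def final_weights(web_weights, mail_weights):
    """Multiply web and mail weights, then rescale to sum to the sample size."""
    products = [web * mail for web, mail in zip(web_weights, mail_weights)]
    scale = len(products) / sum(products)
    return [product * scale for product in products]


# Hypothetical weights for four respondents.
web = [1.60, 0.89, 1.25, 0.65]
mail = [1.79, 0.62, 2.31, 0.85]
combined = final_weights(web, mail)
```

After rescaling, the average final weight is one, so weighted counts remain on the scale of the original sample.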


Table C-10: Distribution of Normalized Final Weights for Web Respondents

Final Weight   Count   Percent
0.2 to 0.3     5       0.20%
0.3 to 0.4     158     6.21%
0.4 to 0.5     256     10.06%
0.5 to 0.6     425     16.71%
0.6 to 0.7     179     7.04%
0.7 to 0.8     302     11.87%
0.8 to 0.9     109     4.28%
0.9 to 1       267     10.50%
1 to 1.5       389     15.29%
1.5 to 2       259     10.18%
2 to 3         156     6.13%
3 to 4         35      1.38%
4 to 5         3       0.12%
5 to 6         1       0.04%

The wide range of individual weights may distort the analysis and inflate the
variance. Therefore, we use three censoring rules to trim the weights. The first is ad hoc,
keeping the weights between 0.3 and 3; the second range is 0.4 to 2.3, where 163 people
are censored on both sides; the third range is 0.37 to 2.45, where approximately 5% of
people are censored. Trimmed weights are then normalized to the size of 2,544. The three
new sets of weights, as well as the original weights, are applied to eligible web
respondents to compare the joint distribution of age, county, education and race with
that of the web sample under the mail survey respondent weights. There are 87 possible
combinations of values of age, county, education and race for the web sample, and 72
for web respondents, because people in some categories did not respond; all missing
categories account for about 0.6% of the sample, which is negligible. Although some
discrepancies exist because of missing data, especially for older people, the differences
are very small, so all four types of weights can be used in data analysis to correct for
possible sampling and nonresponse bias. The analyses in chapters 2 and 3 use the
non-censored weights.
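The censoring rules can be sketched as clamping each weight to a range, counting how many weights were changed, and rescaling so that the weights again sum to the sample size. The weights below are made up and the function name is hypothetical. Only the 0.3 to 3 range comes from the text.

```python
def trim_and_normalize(weights, lower, upper):
    """Censor weights to [lower, upper], then rescale to sum to len(weights).

    Returns the rescaled weights and the number of weights that were censored.
    """
    trimmed = [min(max(weight, lower), upper) for weight in weights]
    censored = sum(1 for old, new in zip(weights, trimmed) if old != new)
    scale = len(trimmed) / sum(trimmed)
    return [weight * scale for weight in trimmed], censored


# Hypothetical final weights for four respondents, trimmed with the ad hoc
# range of 0.3 to 3 mentioned in the text.
raw = [0.2, 0.5, 1.0, 5.5]
trimmed_weights, n_censored = trim_and_normalize(raw, 0.3, 3.0)
```

Because the rescaling happens after the clamping, a few renormalized weights can fall slightly outside the nominal range.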

Appendix D

Great Lakes Beach Recreation Participation


D.1 Participation in Various Activities
The mail survey data on leisure activities are summarized below. The items are
presented in the same order in which they appeared in the mail survey. The Great Lakes
beach question is shown in bold in the bottom one-third of the table.

Table D-1: Participation in Leisure Activities

Activity                                   Participation Rate   Participation Rate (Mail Survey Respondent Weights)
Eat Dinner at a Restaurant                 97.18%               97.26%
Go for a Walk or a Hike                    87.50%               88.24%
Attend or Participate in Outdoor Sports    65.59%               68.17%
Swim at a Pool, Lake or River              64.27%               68.70%
Go to a Movie in a Theater                 66.95%               70.80%
Attend a Music Concert                     48.50%               49.69%
Attend a Cultural or Arts Festival/Fair    59.91%               59.35%
Visit County, City, or Township Park       73.72%               74.54%
Visit State Park or State Campground       52.09%               53.76%
Visit State Forest or State Game Area      25.04%               25.41%
Visit National Park or National Forest     20.44%               19.89%
Camping                                    30.75%               33.56%
Hunting                                    15.84%               16.77%
Fishing                                    32.30%               34.31%
Boating                                    45.96%               47.93%
Picnicking at Public Parks                 45.79%               46.30%
Visiting a Beach                           64.20%               65.34%
Driving an All-Terrain Vehicle (ATV)       14.20%               15.81%
Snowmobiling                               6.82%                7.71%
Skiing or Snowboarding                     11.01%               12.73%
Visiting a Beach on the Great Lakes        59.14%               58.01%
Fishing on the Great Lakes                 14.39%               14.22%
Boating on the Great Lakes                 21.86%               21.08%
Read Books                                 77.48%               75.29%
Indoor/Outdoor Exercise                    82.79%               83.39%
Watch Television                           96.91%               96.58%
Use the Internet                           83.56%               85.82%
Play Video Games                           21.24%               26.31%
Play a Musical Instrument                  10.56%               11.92%
Volunteer                                  37.22%               35.53%

The three Great Lakes activities have slightly lower participation rates when the
weights are applied, as expected, because coastal counties were oversampled.
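A weighted participation rate is the weighted share of respondents who report the activity. The sketch below uses made-up responses and weights to show why down-weighting oversampled participants lowers the rate. The function name is hypothetical.

```python
def participation_rate(visited, weights=None):
    """Share of respondents reporting the activity, optionally weighted.

    visited: list of 0/1 indicators; weights: one weight per respondent.
    """
    if weights is None:
        weights = [1.0] * len(visited)
    weighted_yes = sum(w * v for w, v in zip(weights, visited))
    return weighted_yes / sum(weights)


# Hypothetical data: the two beach visitors come from oversampled coastal
# counties and therefore carry weights below one.
visited = [1, 0, 1, 0]
weights = [0.5, 1.5, 0.5, 1.5]

unweighted = participation_rate(visited)
weighted = participation_rate(visited, weights)
```

In this toy example the rate falls from 0.50 unweighted to 0.25 weighted, the same direction as the Great Lakes rows of Table D-1.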

D.2 Participation in Great Lakes Beach Recreation
To investigate which factors influence participation in beach recreation, a probit model
is estimated with the mail survey respondent weights. The dependent variable is a binary
indicator of whether a person visited a Great Lakes beach, and the independent variables
include demographics and county dummies. Variables that are not statistically
significant at the 90% confidence level are not shown below.
The results indicate that younger people and couples with children age 6 to 17
are more likely to visit Great Lakes beaches, while African Americans, unemployed
people and couples with children age 5 and under are less likely to do so. The education
and income categories shown all have negative coefficients, which means that people in
the lower education and income categories are less likely to visit than people in the
omitted higher categories; that is, people with higher education and income are more
likely to visit Great Lakes beaches. Also, as expected, people living in coastal counties
are more likely to visit Great Lakes beaches than people from noncoastal counties. The
only exception is Wayne County, a highly urbanized county.

Table D-2: Factors Influencing Participation in Great Lakes Beach Visitation

Variable                                           Estimates      t Statistics
Age                                                -0.00849***    -5.44
Black/African American                             -0.597***      -4.56
Some Schooling                                     -0.430***      -3.28
High School or Equivalent                          -0.328***      -5.76
Income: Less than $25,000                          -0.432***      -6.34
Income: $25,000 to $49,999                         -0.320***      -5.47
Income: $50,000 to $99,999                         -0.0859*       -1.65
Unemployment                                       -0.227*        -1.66
Household: Couple with Children Age 5 and Under    -0.162*        -1.66
Household: Couple with Children Age 6 to 17        0.138*         1.89
Constant                                           1.21***        3.76
Arenac County (Coastal)                            1.10**         2.42
Barry County (Noncoastal)                          -0.5978        -1.79
Benzie County (Coastal)                            0.833**        2.26
Berrien County (Coastal)                           0.674**        2.42
Cheboygan County (Coastal)                         0.832*         1.95
Emmet County (Coastal)                             0.824**        2.45
Grand Traverse County (Coastal)                    0.543*         1.91
Iosco County (Coastal)                             0.617*         1.72
Jackson County (Noncoastal)                        -0.485*        -1.67
Lenawee County (Noncoastal)                        -0.752**       -2.38
Manistee County (Coastal)                          1.13***        3.39
Muskegon County (Coastal)                          0.75***        2.68
Oakland County (Noncoastal)                        -0.503*        -1.93
Oceana County (Coastal)                            1.11***        2.91
Ottawa County (Coastal)                            0.654**        2.4
Saginaw County (Noncoastal)                        -0.493*        -1.77
Washtenaw County (Noncoastal)                      -0.52*         -1.93
Wayne County (Coastal)                             -0.543**       -2.1
Note: *10% significance level; **5% significance level; *** 1% significance level

Appendix E

Model Sensitivity in Chapter 3


When the regional dummy variables are added to the traditional model with main
destination, the estimated parameters on travel cost and the number of beaches in the
aggregated site do not change much, so these two estimates are robust to the dummies.
The estimated length parameter decreases by about 43.8%, and the estimated temperature
parameter turns positive, a change of 122.6% relative to its original magnitude. Both
variables are sensitive to the regional dummies, which suggests that beach quality is
correlated with regional geographic characteristics.
Table E-1: Parameter Estimates of Main Destination Model with and without Regional Dummies

                    No Regional Dummies            Regional Dummies
Variables           Estimates      t Statistics    Estimates      t Statistics
Travel Cost         -0.00327***    -6.47           -0.00381***    -6.67
Length              0.283*         1.90            0.159**        1.96
Temperature         -0.0602        0.658           0.0136         0.679
# of Beaches        0.0287**       2.25            0.0272**       2.32
LP Northeast        -              -               -0.853***      -2.83
LP Mid-East         -              -               -1.67***       -3.66
LP Southeast        -              -               -2.28***       -4.23
LP Northwest        -              -               -0.55*         -1.84
LP Mid-West         -              -               -0.566         -1.54
LP Southwest        -              -               -1.45***       -3.25
UP Lake Michigan    -              -               -0.941**       -2.21
Note: *10% significance level; **5% significance level; *** 1% significance level
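The percentage changes quoted for the length and temperature parameters can be reproduced from the Table E-1 estimates. The sketch below measures each change relative to the absolute value of the estimate without regional dummies. That convention is an assumption chosen because it matches the reported figures, and the function name is hypothetical.

```python
def percent_change(before, after):
    """Change from before to after, as a percentage of |before|."""
    return (after - before) / abs(before) * 100.0


# Estimates without and with regional dummies (Table E-1).
length_change = percent_change(0.283, 0.159)
temperature_change = percent_change(-0.0602, 0.0136)
```

This gives roughly -43.8% for length and +122.6% for temperature. The same calculation gives about -5.2% for the number of beaches (0.0287 to 0.0272).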

However, when regional dummy variables are added to the model with multiple
sites, they appear in three different places: in the nest of visiting one site, and in both
the primary and secondary sites in the nest of visiting two sites. Whenever a model with
this formulation was attempted, the estimation did not converge. Thus, we dropped these
regional dummies from the model in chapter 3, and their effects are manifested in part
through the estimates for the length and temperature variables.

REFERENCES


Akiva, M. E. B. and S. R. Lerman (1985). Discrete choice analysis: theory and application to
predict travel demand, The MIT press.

Boxall, P. C. and W. L. Adamowicz (2002). "Understanding heterogeneous preferences in
random utility models: a latent class approach." Environmental & Resource Economics
23(4): 421-446.

Burton, M. and D. Rigby (2009). "Hurdle and latent class approaches to serial non-participation
in choice models." Environmental and Resource Economics 42(2): 211-226.

Caulkins, P. P., R. C. Bishop, et al. (1986). "The travel cost model for lake recreation: a
comparison of two methods for incorporating site quality and substitution effects."
American Journal of Agricultural Economics 68(2): 291-297.

Champ, P. A., K. J. Boyle, and T. C. Brown, eds. (2003). A primer on nonmarket
valuation. Vol. 3. Springer.

Cutter, W. B., L. Pendleton, et al. (2007). "Activities in models of recreational demand." Land
Economics 83(3): 370-381.

Deacon, R. T. and C. D. Kolstad (2000). "Valuing beach recreation lost in environmental
accidents." Journal of Water Resources Planning & Management 126(6): 374.

Englin, J., and Shonkwiler, J. S. (1995). Estimating social welfare using count data models: an
application to long-run recreation demand under conditions of endogenous stratification
and truncation. The Review of Economics and Statistics: 104-112.

Englin, J. and J. S. Shonkwiler (1995). "Modeling recreation demand in the presence of
unobservable travel costs: toward a travel price model." Journal of Environmental
Economics and Management 29(3): 368-377.


English, E. (2008). "Recreation nonparticipation as choice behavior rather than statistical
outcome." American Journal of Agricultural Economics 90(1): 186-196.

Greene, W. H. and D. A. Hensher (2003). "A latent class model for discrete choice analysis:
contrasts with mixed logit." Transportation Research Part B-Methodological 37(8): 681-698.

Haab, T. C., M. Hamilton, et al. (2008). "Small boat fishing in Hawaii: a random utility model of
ramp and ocean destinations." Marine Resource Economics 23(2): 137.

Haab, T. C. and R. L. Hicks (1997). "Accounting for choice set endogeneity in random utility
models of recreation demand." Journal of Environmental Economics and Management
34(2): 127-147.

Haab, T. C. and K. E. McConnell (2002). Valuing environmental and natural resources: the
econometrics of non-market valuation, Edward Elgar Publishing.

Haener, M. K., P. C. Boxall, et al. (2004). "Aggregation bias in recreation site choice models:
resolving the resolution problem." Land Economics 80(4): 561-574.

Herriges, J. A. and C. L. Kling (1997). "The performance of nested logit models when welfare
estimation is the goal." American Journal of Agricultural Economics 79(3): 792-802.

Hilger, J. and M. Hanemann (2006). "Heterogeneous preferences for water quality: a finite
mixture model of beach recreation in Southern California."

Hoehn, J. P., Tomasi, T., Lupi, F., & Chen, H. Z. (1996). An economic model for valuing
recreational angling resources in Michigan. Michigan State University, Report to the
Michigan Department of Environmental Quality.

Hynes, S., N. Hanley, et al. (2007). "Up the proverbial creek without a paddle: Accounting for
variable participant skill levels in recreational demand modelling." Environmental and
Resource Economics 36(4): 413-426.


Hynes, S., N. Hanley, et al. (2008). "Effects on welfare measures of alternative means of
accounting for preference heterogeneity in recreational demand models." American
Journal of Agricultural Economics 90(4): 1011-1027.

Izrael, D., D. C. Hoaglin, et al. (2000). A SAS macro for balancing a weighted sample.
Proceedings of the Twenty-Fifth Annual SAS Users Group International Conference,
Citeseer.

Kaoru, Y., V. K. Smith, et al. (1995). "Using random utility models to estimate the recreational
value of estuarine resources." American Journal of Agricultural Economics 77(1): 141-151.

Kealy, M. J. and R. C. Bishop (1986). "Theoretical and empirical specifications issues in travel
cost demand studies." American Journal of Agricultural Economics 68(3): 660-667.

Kim, H. N., W. D. Shaw, et al. (2007). "The distributional impacts of recreational fees: A
discrete choice model with incomplete data." Land Economics 83(4): 561-574.

Kosenius, A. K. (2010). "Heterogeneous preferences for water quality attributes: The Case of
eutrophication in the Gulf of Finland, the Baltic Sea." Ecological Economics 69(3): 528-538.

Leeworthy, V. R. and United States. National Ocean Service (2005). Projected participation in
marine recreation: 2005 & 2010, US Dept. of Commerce, National Oceanic and
Atmospheric Administration, National Ocean Service, Special Projects.

Lew, D. K. and D. M. Larson (2005). "Accounting for stochastic shadow values of time in
discrete-choice recreation demand models." Journal of Environmental Economics and
Management 50(2): 341-361.

Lew, D. K. and D. M. Larson (2008). "Valuing a beach day with a repeated nested logit model of
participation, site choice, and stochastic time value." Marine Resource Economics 23(3):
233.


Loomis, J. B., S. Yorizane, et al. (2000). "Testing significance of multi-destination and
multi-purpose trip effects in a travel cost method demand model for whale watching trips."
Agricultural and Resource Economics Review 29(2).

Lupi, F. and P. M. Feather (1998). "Using partial site aggregation to reduce bias in random utility
travel cost models." Water Resources Research 34(12): 3595-3603.

Lupi, F., Hoehn, J. P., & Christie, G. C. (2003). Using an economic model of recreational fishing
to evaluate the benefits of Sea Lamprey (Petromyzon marinus) Control on the St. Marys
River. Journal of Great Lakes Research 29: 742-754.

McKean, J. R., R. G. Walsh, et al. (1996). "Closely related good prices in the travel cost model."
American Journal of Agricultural Economics 78(3): 640-646.

Mendelsohn, R., J. Hof, et al. (1992). "Measuring recreation values with multiple destination
trips." American Journal of Agricultural Economics 74(4): 926-933.

Moeltner, K. and J. S. Shonkwiler (2005). "Correcting for on-site sampling in random utility
models." American Journal of Agricultural Economics 87(2): 327-339.

Morey, E. R., R. D. Rowe, et al. (1993). "A repeated nested-logit model of Atlantic salmon
fishing." American Journal of Agricultural Economics 75(3): 578-592.

Morey, E.R., J. Thacher, et al. (2006). "Using angler characteristics and attitudinal data to
identify environmental preference classes: a latent-class model." Environmental &
Resource Economics 34(1): 91-115.

Murray, C., B. Sohngen, et al. (2001). "Valuing water quality advisories and beach amenities in
the Great Lakes." Water Resources Research 37(10): 2583-2590.

NOAA GLERL, 2013. Unpublished data, Great Lakes Coastal Forecasting System. NOAA Great
Lakes Environmental Research Laboratory, Ann Arbor, MI, www.glerl.noaa.gov.


Owen, A. L. and J. R. Videras (2007). "Culture and public goods: the case of religion and the
voluntary provision of environmental quality." Journal of Environmental Economics and
Management 54(2): 162-180.

Parsons, G. R. and M. S. Needelman (1992). "Site aggregation in a random utility model of
recreation." Land Economics: 418-433.

Parsons, G. R., A. K. Kang, et al. (2009). "Valuing beach closures on the Padre Island National
Seashore." Marine Resource Economics 24(3).

Parsons, G. R. and A. J. Wilson (1997). "Incidental and joint consumption in recreation
demand." Agricultural and Resource Economics Review 26: 1-6.

Patunru, A. A., J. B. Braden, et al. (2007). "Who cares about environmental stigmas and does it
matter? a latent segmentation analysis of stated preferences for real estate." American
Journal of Agricultural Economics 89(3): 712-726.

Provencher, B. and R. C. Bishop (1997). "An estimable dynamic model of recreation behavior
with an application to Great Lakes angling." Journal of Environmental Economics and
Management 33(2): 107-127.

Provencher, B. and R. C. Bishop (2004). "Does accounting for preference heterogeneity improve
the forecasting of a random utility model? A case study." Journal of Environmental
Economics and Management 48(1): 793-810.

Scarpa, R. and M. Thiene (2005). "Destination choice models for rock climbing in the
Northeastern Alps: a latent-class approach based on intensity of preferences." Land
Economics 81(3): 426-444.

Scarpa, R., M. Thiene, et al. (2007). "Latent class count models of total visitation demand: days
out hiking in the Eastern Alps." Environmental and Resource Economics 38(4): 447-460.

Schuhmann, P. W. and K. A. Schwabe (2004). "An analysis of congestion measures and
heterogeneous angler preferences in a random utility model of recreational fishing."
Environmental and Resource Economics 27(4): 429-450.

Shaw, D. (1988). "On-site samples' regression: problems of non-negative integers, truncation, and
endogenous stratification." Journal of Econometrics 37(2): 211-223.

Shaw, W. D. and M. T. Ozog (1999). "Modeling overnight recreation trip choice: application of
a repeated nested multinomial logit model." Environmental and Resource Economics
13(4): 397-414.

Shonkwiler, J. S. and W. D. Shaw (1996). "Hurdle count-data models in recreation demand
analysis." Journal of Agricultural and Resource Economics: 210-219.

Shrestha, R. K., A. F. Seidl, et al. (2002). "Value of recreational fishing in the Brazilian Pantanal:
a travel cost analysis using count data models." Ecological Economics 42(1): 289-299.

Smith, M. D. (2005). "State dependence and heterogeneity in fishing location choice." Journal of
Environmental Economics and Management 50(2): 319-340.

Song, F., F. Lupi, and M. Kaplowitz (2010). "Valuing Great Lakes beaches." Paper prepared for
presentation at the Agricultural and Applied Economics Association Joint Annual Meeting.

Staum, P. (2007). "Fuzzy Matching using the COMPGED Function." Proceedings of the 2007
Northeastern SAS.

Tay, R., P. S. McCarthy, and J. J. Fletcher (1996). "A portfolio choice model of the demand for
recreational trips." Transportation Research Part B: Methodological 30(5): 325-337.

Timmins, C. and J. Murdock (2007). "A revealed preference approach to the measurement of
congestion in travel cost models." Journal of Environmental Economics and Management
53(2): 230-249.

Train, K. E. (2003). Discrete Choice Methods with Simulation. Cambridge University Press.

Train, K. E. (2008). "EM algorithms for nonparametric estimation of mixing distributions."
Journal of Choice Modelling 1(1): 40.

Von Haefen, R. H., D. M. Massey, et al. (2005). "Serial nonparticipation in repeated discrete
choice models." American Journal of Agricultural Economics: 1061-1076.

Von Haefen, R. H. and D. J. Phaneuf (2008). "Identifying demand parameters in the presence of
unobservables: a combined revealed and stated preference approach." Journal of
Environmental Economics and Management 56(1): 19-32.

Weicksel, S. A. "Measuring preferences for changes in water quality at Great Lakes beaches using
a choice experiment." Master's thesis, Michigan State University.

Yeh, C. Y., T. C. Haab, et al. (2006). "Modeling multiple-objective recreation trips with choices
over trip duration and alternative sites." Environmental and Resource Economics 34(2):
189-209.
