VALUATION OF PUBLIC GREAT LAKES BEACHES IN MICHIGAN

By

Min Chen

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Agricultural, Food, and Resource Economics – Doctor of Philosophy

2013

ABSTRACT

The objective of this dissertation is to measure the monetary values of public Great Lakes beaches using the travel cost approach. To decide which econometric model to use, Monte Carlo simulations were developed, and the results showed that the nested logit model was robust and reliable. To collect beach use data, a two-stage survey of over 29,000 people was conducted from 2011 to 2012. A mail survey was sent in 2011 to a random sample drawn from Michigan's driver license list to identify people who participated in beach recreation. Respondents who said they had visited a Great Lakes beach since June 1, 2010 were invited to a follow-up web survey about trips to public Great Lakes beaches in the summer of 2011.

A repeated nested logit model with a participation hurdle was estimated for the day trip data. The estimated beach recreation participation rate was 58% for adults living in the Lower Peninsula of Michigan, and an estimated 20.9 million day trips were taken by Michigan adults to public Great Lakes beaches in the summer of 2011. The value of access to a public beach for a day trip was estimated to be $32-$39 per person per trip in 2011 dollars. Access to all public Lake Michigan beaches in Michigan was estimated to be worth over $400 million per season for day trips by adults living in the Lower Peninsula of Michigan.

To value long trips of four nights or more, a model was developed that allows people to visit combinations of single and multiple sites on a trip. The resulting values were about $53 per person per beach day for access to a site on a trip of four nights or longer.
The more common approach of treating the main destination as the only site visited on a multi-site trip yields larger welfare measures than the approach that permits combinations of multiple sites to be visited.

ACKNOWLEDGEMENTS

First of all, I would like to express my special gratitude and thanks to my major professor, Dr. Frank Lupi. He is very creative, and it is a great pleasure to work with him. I have learned a great deal over the past six years, not only about research, but also about communication skills, how to work in a team, and more.

Second, I want to thank the NOAA team for its efforts on the large survey over one and a half years. Dr. Michael Kaplowitz was very supportive in many respects and provided practical advice from the perspective of a lawyer. My coworker, Scott Weicksel, was in charge of almost all the work related to survey printing and mailing, pretesting, mail survey design and scanning. I really appreciated his hard work, which contributed a lot to the good quality of our survey.

Also, I would like to thank my committee members, Dr. John Hoehn, Dr. Patricia Norris and Dr. Jinhua Zhao, for their constructive comments and understanding; Scott Knoche, Richard Melstrom, Tim Komarek and Tim Hodge, for their helpful advice on study, research and work; and all my peers who are or used to be in the department and all my friends, for giving me such a great time at Michigan State!

Finally, I want to thank my family in China. I thank my parents and grandparents for always being considerate and caring!

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
Chapter 1 Relative Performance of the Latent Class Model Compared to the Conditional Logit and Nested Logit Models for Environmental Valuation
    1 Motivation
    2 Models
        2.1 Conditional Logit Model
        2.2 Nested Logit Model
        2.3 Latent Class Model
    3 Simulations
        3.1 True Model-Latent Class Model
            3.1.1 Simulation Steps
            3.1.2 Simulation Results
        3.2 True Model-Conditional Logit Model
        3.3 True Model-Nested Logit Model
            3.3.1 Simulation Steps
            3.3.2 Simulation Results
        3.4 Sensitivity Analyses
    4 Discussion and Conclusions
Chapter 2 Estimating Use Values of Public Great Lakes Beaches in Michigan
    1 Motivation
    2 Models
        2.1 Random Utility Models
        2.2 Predicted Trips
        2.3 Welfare Measures
    3 Survey and Data
        3.1 Surveys
            3.1.1 Screener Mail Survey
            3.1.2 Follow-Up Web Survey
        3.2 Data
        3.3 Model Specification
    4 Estimation Results
    5 Discussion and Conclusions
Chapter 3 Modeling Long Overnight Trips by Chaining Recreation Sites
    1 Motivation
    2 Models
    3 Data
    4 Estimation Results
    5 Discussion and Conclusions
APPENDICES
    Appendix A: Results of Sensitivity Analyses for the Monte Carlo Simulations in Chapter 1
    Appendix B: Comparison between Driver License List and Census Data
    Appendix C: Data Weights
    Appendix D: Great Lakes Beach Recreation Participation
    Appendix E: Model Sensitivity in Chapter 3
REFERENCES

LIST OF TABLES

Table 1: Simulating One's Choice
Table 2: Performance of Latent Class Model When It Is the True Model
Table 3: Performance of Conditional Logit and Nested Logit Models When Latent Class Model Is the True Model
Table 4: Estimated Values of Marginal Quality Change of Latent Class Model When It Is the True Model
Table 5: Estimated Site Values of Latent Class Model When It Is the True Model
Table 6: Welfare Estimates of Conditional Logit and Nested Logit Models When Latent Class Model Is the True Model
Table 7: Performance of Latent Class Model When Conditional Logit Model Is the True Model
Table 8: Performance of Conditional Logit and Nested Logit Models When Conditional Logit Model Is the True Model
Table 9: Welfare Measures of Conditional Logit, Nested Logit and Latent Class Models When Conditional Logit Model Is the True Model
Table 10: Performance of Latent Class Model When Nested Logit Model Is the True Model
Table 11: Performance of Conditional Logit and Nested Logit Models When Nested Logit Model Is the True Model
Table 12: Welfare Measures of Conditional Logit, Nested Logit and Latent Class Models When Nested Logit Model Is the True Model
Table 13: Demographic Characteristics of Users, Potential Users and Nonusers
Table 14: Full Information Maximum Likelihood (FIML) Estimation Results
Table 15: Welfare Estimates of Changing a Beach in 2011 Dollars at Individual Level
Table 16: Welfare Estimates of Changing a Beach in 2011 Dollars (Million) at State Level
Table 17: Estimated Trips and Welfare Changes of Closing All Beaches on a Great Lake in 2011 Dollars
Table 18: Examples of Literature Not Differentiating Overnight Trips from Day Trips
Table 19: Studies Dealing with Overnight/Multiple-Objective/Multiple-Site Trips
Table 20: Demographic Characteristics of Participants with Long Overnight Trips
Table 21: Full Information Maximum Likelihood (FIML) Estimation Results
Table 22: Estimated Welfare Changes per Person in 2011 Dollars
Table 23: Estimation Results of Truncated Poisson Models
Table A-1: Performance of Latent Class Model When It Is the True Model
Table A-2: Performance of Conditional Logit and Nested Logit Models When Latent Class Model Is the True Model
Table A-3: Estimated Values of Marginal Quality Change of Latent Class Model When It Is the True Model
Table A-4: Estimated Site Values of Latent Class Model When It Is the True Model
Table A-5: Welfare Estimates of Conditional Logit and Nested Logit Models When Latent Class Model Is the True Model
Table A-6: Performance of Conditional Logit, Nested Logit and Latent Class Models When Conditional Logit Model Is the True Model
Table A-7: Welfare Estimates of Conditional Logit, Nested Logit and Latent Models When Conditional Logit Model Is the True Model
Table A-8: Performance of Conditional Logit, Nested Logit and Latent Class Models When Nested Logit Model Is the True Model
Table A-9: Welfare Estimates of Conditional Logit, Nested Logit and Latent Models When Nested Logit Model Is the True Model
Table A-10: Performance of Latent Class Model When It Is the True Model
Table A-11: Performance of Conditional Logit and Nested Logit Models When Latent Class Model Is the True Model
Table A-12: Estimated Values of Marginal Quality Change of Latent Class Model When It Is the True Model
Table A-13: Estimated Site Values of Latent Class Model When It Is the True Model
Table A-14: Welfare Estimates of Conditional Logit and Nested Logit Models When Latent Class Model Is the True Model
Table A-15: Performance of Conditional Logit, Nested Logit and Latent Class Models When Conditional Logit Model Is the True Model
Table A-16: Welfare Estimates of Conditional Logit, Nested Logit and Latent Models When Conditional Logit Model Is the True Model
Table A-17: Performance of Conditional Logit, Nested Logit and Latent Class Models When Nested Logit Model Is the True Model
Table A-18: Welfare Estimates of Conditional Logit, Nested Logit and Latent Models When Nested Logit Model Is the True Model
Table A-19: Performance of Latent Class Model When It Is the True Model
Table A-20: Performance of Conditional Logit and Nested Logit Models When Latent Class Model Is the True Model
Table A-21: Estimated Values of Marginal Quality Change of Latent Class Model When It Is the True Model
Table A-22: Estimated Site Values of Latent Class Model When It Is the True Model
Table A-23: Welfare Estimates of Conditional Logit and Nested Logit Models When Latent Class Model Is the True Model
Table B-1: Age and Gender Distribution of Census and Driver License List in Michigan for People Age 16 or Older
Table B-2: Age and Gender Distribution of Census and Driver License List for People Age 16 or Older, for the Upper Peninsula and Lower Peninsula
Table C-1: Mail Survey Sample Weights for Counties in the Lower Peninsula
Table C-2: Results of a Probit Response/Nonresponse Model for the Mail Survey Using Sample Weights
Table C-3: Joint Age, Gender and County Distribution of Driver License List
Table C-4: Joint Age, Gender and County Distribution of 9,591 Eligible Mail Survey Respondents
Table C-5: Mail Survey Respondent Weights
Table C-6: Results of a Probit Response/Nonresponse Model for the Web Survey Using Mail Survey Respondent Weights
Table C-7: Results of a Probit Response/Nonresponse Model for the Web Survey Using Mail Survey Respondent Weights With Fewer Variables
Table C-8: Raking Weights for Web Survey Respondents with No Missing Data (Non-Normalized)
Table C-9: Raking Weights for Web Survey Respondents with Missing Data
Table C-10: Distribution of Normalized Final Weights for Web Respondents
Table D-1: Participation in Leisure Activities
Table D-2: Factors Influencing Participation in Great Lakes Beach Visitation
Table E-1: Parameter Estimates of Main Destination Model with and without Regional Dummies

LIST OF FIGURES

Figure 1: Travel Cost Estimates over Some Iterations
Figure 2: Site Quality Estimates over Some Iterations
Figure 3: Decision Tree of Conditional Logit Model
Figure 4: Decision Tree of Two-Level Nested Logit Model
Figure 5: Decision Tree with Participation/Nonparticipation
Figure 6: Public Great Lakes Beaches for Day Trips
Figure 7: GLOS Points on Great Lakes in Michigan
Figure 8: Decision Tree of Main-Destination Model
Figure 9: Decision Tree of Model Allowing Multiple Sites per Trip
Figure 10: Public Great Lakes Beaches Visited On Long Overnight Trips
Figure 11: Aggregated Beach Areas in the Long Overnight Trip Model
Figure 12: GLOS Points on Great Lakes in Michigan

INTRODUCTION

Michigan has the longest freshwater coastline in the United States, and large numbers of people visit public Great Lakes beaches every year. Beach recreation not only facilitates the economic development of coastal areas, but also generates welfare for the people who use the beaches. Although public beaches generally have no price, they do have economic use values. The objectives of this dissertation are to quantify the demand for beach recreation and to measure the associated use values through Random Utility Models (RUMs) with data from two surveys. The outcomes of our work can be applied to benefit-cost analysis in the decision-making process. In addition, the estimated demand model structure can be transferred to other locations for the valuation of freshwater beaches.

Within the widely used random utility modeling framework, there are several types of econometric model specifications. The latent class model assumes heterogeneity in preferences, while the nested logit model captures similarity among alternatives. The conditional logit model is the simplest, since preferences are assumed to be the same across people and alternatives are assumed to be independent.
The first chapter investigates the relative performance of the latent class model compared to the conditional logit and nested logit models. Monte Carlo simulations are used to investigate model performance under several scenarios. Results show that the latent class model does not always work as expected, and the nested logit model is found to be more robust than the other two. Thus, the nested logit RUM is applied in chapters 2 and 3.

The second chapter estimates use values of public Great Lakes beaches. A mail survey on the leisure activities of Michigan residents was conducted to identify who did and did not participate in Great Lakes beach recreation. People who participated were then recruited to a web survey about their trips to public Great Lakes beaches over an entire summer season. Day trip data were used in a nested logit model to produce estimates of the value of Great Lakes beach use. Unlike in most of the literature, nonusers (those who had not visited Great Lakes beaches in the past two years) also enter the model, to test how their inclusion alters the way that results are generalized to the population.

The third chapter models multiple day recreation trips by chaining recreation sites. In the recreation demand literature, multiple day trips are rarely modeled, but when they are, the traditional approach is to assume that only the primary destination is visited (for trips with more than one destination). In our web survey, participants who took overnight trips of four days or more were asked to report the multiple beaches they visited on one randomly selected trip, which makes it possible to relax the traditional single-site assumption and allow for visitation of a second beach on overnight trips. The results are compared to those from the traditional model to see whether the added complexity and survey cost are warranted.
Chapter 1 Relative Performance of the Latent Class Model Compared to the Conditional Logit and Nested Logit Models for Environmental Valuation

1 Motivation

Random Utility Models (RUMs) have been widely applied to recreation demand analysis and valuation. Within this framework, according to Train (2003), different distributional assumptions lead to different models, such as the conditional logit, nested logit (generalized extreme value), probit and mixed logit models. Preferences over attributes are the same across people in the conditional logit and nested logit models, while alternatives in the choice set can be correlated in the latter. The probit model requires a normal distribution. The mixed logit model is the most general.

Random parameter (or mixed logit) models and latent class models are both frequently used to model preference heterogeneity. The random parameter model imposes distributional assumptions on individual preferences. The latent class model assumes there are a number of latent groups in the population, and people in different groups have different preferences. It can be treated as a discrete and semi-parametric version of the random parameter model (Greene and Hensher (2003)). Although not as flexible, the latent class model may offer more interpretive power, since it can link demographic characteristics to heterogeneous preferences. For example, young people may value water quality more than older people and care less about travel distance, as they are more likely to have contact with water. Hence, many studies have valued recreation activities through the latent class model (Boxall and Adamowicz (2002), Scarpa and Thiene (2005), Morey et al. (2006), Owen and Videras (2007), Patunru et al. (2007), Scarpa et al. (2007), Burton and Rigby (2009)).

Several studies have investigated how the latent class model performs against others.
Greene and Hensher (2003) compared the latent class and random parameter models using an empirical data set from a stated choice experiment. They evaluated willingness-to-pay indicators and elasticities and concluded that neither model was unambiguously better than the other. Each model had its advantages and disadvantages. In their data set, they found the latent class model was preferred statistically. Provencher and Bishop (2004) examined the forecasting ability of the logit, random parameter and latent class models based on salmon fishing on Lake Michigan. They showed that the latter two perform equally well in trip prediction, and that for other measures the logit model could have more reliable results. Hynes, Hanley and Scarpa (2008) studied the preference heterogeneity of kayakers using the latent class and random parameter models, and stated that the latent class model might provide better interpretation. Kosenius (2010) analyzed water quality data with the multinomial logit, random parameter and latent class models. The author showed that when there were correlations among alternatives, the random parameter model fit the data better than the multinomial logit model. The latent class model used demographic information to explain the heterogeneity in preferences.

Nonetheless, with real data it is hard to tell whether the latent class model can successfully recover the true preferences, because those true values are not known. In the literature applying the latent class model to different areas, it is not uncommon to see the estimated preference in one class be more than 10 times that of another class (Scarpa and Thiene (2005), Train (2008), etc.). It is possible that discrepancies in preferences among people are large, but it may also be that the model has drawbacks. A model that cannot correctly reflect the real preferences could be misleading in empirical studies.
Therefore, in this chapter, Monte Carlo simulations, in which the truth is known, are employed to test the reliability of the latent class model and to compare its performance to that of the conditional logit and nested logit models in the context of environmental valuation. The random parameter model is not investigated, as several of the studies above have demonstrated that it performs similarly to the latent class model. The model that proves robust will be used for valuation in the following two chapters.

2 Models

The utility from visiting a recreation site can be expressed as:

$$U_{nj} = X_{nj}\beta + \varepsilon_{nj}$$

where subscripts n and j denote individuals and sites. The construction of the covariate matrix X depends on the specific model. It can include variables that vary only across sites, like site characteristics; variables that vary only across people, like demographic variables; variables that vary across both sites and people, like travel cost; and their interaction terms. The parameter vector β reflects people's preferences. It can be the same for everyone or differ across groups. The random term ε represents individual and site factors influencing utilities.

Based on the utility equation, a person will go to the site that generates the highest utility in his/her choice set. Since individual errors cannot be observed by researchers, each site has a probability of being visited. Different models have different expressions for this probability because they make different distributional assumptions about the errors. Maximum likelihood estimation searches for the parameter values that maximize the joint probability of the observed choices. Welfare measures of site loss or characteristic change can then be computed from the parameter estimates.

2.1 Conditional Logit Model

The conditional logit model assumes that the errors are independent and follow a Type I extreme value distribution. The parameters are constants, and variables that do not vary across sites must be excluded or interacted with site-varying variables.
Following Chapter 3 of Train (2003), the probability that person n visits site j is:

$$P_n(j) = \frac{\exp(X_{nj}\beta)}{\sum_{i=1}^{J}\exp(X_{ni}\beta)}$$

Let $y_{nj}$ be the binary variable indicating people's choices, equal to one if person n chooses site j and zero otherwise. The log-likelihood function is:

$$LL(\beta) = \sum_{n=1}^{N}\sum_{j=1}^{J} y_{nj}\ln\left(P_n(j)\right)$$

where N is the total number of people and J is the total number of sites.

Because the model measures use value, person n only cares about the site he/she visits, so only a loss of the chosen site or a change at that site (if it is small enough not to affect the original choice) affects this person's welfare. Suppose person n chooses site g; the loss of other sites or any changes at other sites are of no value to him/her. When site g is closed, person n has to go to the site that gives the second highest utility, say site f; then the reduction in utility is $(U_{ng} - U_{nf})$, and the monetary loss is $(U_{ng} - U_{nf})/\beta_y$, where $\beta_y$ is the marginal utility of income, the absolute value of the travel cost parameter. When a marginal change happens at site j, the change in utility for visitors is $\beta_l$, the parameter of site characteristic l, and $\beta_l/\beta_y$ is its monetary value.

From a researcher's point of view, however, uncertainty exists due to the error term. Each site has a probability of being visited by anyone. Thus, those probabilities need to be taken into account in welfare estimates. According to Chapter 8 of Haab and McConnell (2002), the estimated welfare change for person n caused by the loss of site j is:

$$\widehat{W}_n(j) = \frac{\ln\left(1 - \hat{P}_n(j)\right)}{\hat{\beta}_y}$$

and the estimated value of a marginal change in site characteristic l of site j is $\hat{P}_n(j)\left(\hat{\beta}_l / \hat{\beta}_y\right)$, where $\hat{\beta}_l$ and $\hat{\beta}_y$ are estimates of $\beta_l$ and $\beta_y$. The calculation applies to all sites, j = 1, 2, …, J.

2.2 Nested Logit Model

Consider the simplest form, a two-level nested logit model, where the choice set is divided into several nests based on site similarities. Within one nest, errors are correlated; for two sites in different nests, errors are still independent.
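Before turning to the nested logit formula, the conditional logit expressions of Section 2.1 can be made concrete with a short sketch. It is written in Python purely for illustration (the dissertation's own simulations were programmed in R). The covariates are borrowed from the example person in Table 1, while the parameter values `-0.06` and `0.49` and all function names are assumptions introduced here, not values or code given in the text.

```python
import math

def conditional_logit_probs(x_rows, beta):
    """Choice probabilities P_n(j) for one person.

    x_rows holds one covariate list per site; beta is the parameter list.
    """
    v = [sum(b * x for b, x in zip(beta, row)) for row in x_rows]
    v_max = max(v)  # shift by the maximum so exp() cannot overflow
    expv = [math.exp(vi - v_max) for vi in v]
    total = sum(expv)
    return [e / total for e in expv]

def log_likelihood(people, beta):
    """Sum of ln P_n(j) over each person's chosen site.

    people is a list of (x_rows, chosen_index) pairs.
    """
    return sum(math.log(conditional_logit_probs(x_rows, beta)[chosen])
               for x_rows, chosen in people)

def site_loss_value(p_j, beta_y):
    """Welfare change from losing site j: ln(1 - P_n(j)) / beta_y (negative)."""
    return math.log(1.0 - p_j) / beta_y

def marginal_change_value(p_j, beta_l, beta_y):
    """Value of a marginal change in characteristic l at site j."""
    return p_j * beta_l / beta_y

# (travel cost, quality) for three sites, taken from the Table 1 example.
x_rows = [[7.79, 1.02], [61.90, 0.64], [31.95, 1.71]]
beta = [-0.06, 0.49]      # ASSUMED parameter values, for illustration only
beta_y = abs(beta[0])     # marginal utility of income

probs = conditional_logit_probs(x_rows, beta)
ll_one = log_likelihood([(x_rows, 0)], beta)   # one person who chose site 1
losses = [site_loss_value(p, beta_y) for p in probs]
quality_values = [marginal_change_value(p, beta[1], beta_y) for p in probs]
print(probs, ll_one, losses, quality_values)
```

Under these assumptions the site-loss value is always negative, and it equals the change in the expected maximum utility (the log-sum) from dropping the site, divided by the marginal utility of income, which is one way to check the formula.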
Following Chapter 4 of Train (2003), the probability that person n visits site j in nest $B_k$ becomes:

$$P_n(j) = \frac{\exp(X_{nj}\beta/\lambda_k)\left(\sum_{i \in B_k}\exp(X_{ni}\beta/\lambda_k)\right)^{\lambda_k - 1}}{\sum_{m=1}^{K}\left(\sum_{i \in B_m}\exp(X_{ni}\beta/\lambda_m)\right)^{\lambda_m}}$$

where K is the number of nests and $\lambda_k$ measures the degree of independence in errors among sites in nest k. This parameter is normally assumed to be the same across all nests, so we will replace $\lambda_k$ with $\lambda$.

Compared with the conditional logit model, the calculation of the estimated welfare change from the loss of a site in the nested logit model is slightly more complicated. The probability that person n chooses a site in nest k is:

$$P_n(B_k) = \frac{\left(\sum_{i \in B_k}\exp(X_{ni}\beta/\lambda)\right)^{\lambda}}{\sum_{m=1}^{K}\left(\sum_{i \in B_m}\exp(X_{ni}\beta/\lambda)\right)^{\lambda}}$$

And the probability that person n chooses site j, conditional on nest k being chosen, is:

$$P_n(j \mid B_k) = \frac{\exp(X_{nj}\beta/\lambda)}{\sum_{i \in B_k}\exp(X_{ni}\beta/\lambda)}$$

According to Chapter 8 of Haab and McConnell (2002), the estimated welfare change due to closure of site j is:

$$\widehat{W}_n(j) = \frac{1}{\hat{\beta}_y}\ln\left(\left(1 - \hat{P}_n(j \mid B_k)\right)^{\hat{\lambda}}\hat{P}_n(B_k) + \left(1 - \hat{P}_n(B_k)\right)\right)$$

The estimated value of a marginal change in site characteristics has the same expression as in the conditional logit model, where the relevant site choice probabilities are from the above nested logit formulas.

2.3 Latent Class Model

The latent class model relies on the assumption that people's preferences are not the same and that people can be categorized into different classes, each having its own set of parameters. Individuals know which class they are in, but researchers do not. Within one class, people behave exactly as in the conditional logit model. From the researcher's perspective, a person can belong to any class with some probability. Then the probability that person n chooses site j is the weighted average of the conditional logit probabilities in all classes.

In Chapter 6 of Train (2003), the probabilities of membership in each class are the same for all people and equal the population shares of each class. Suppose there are C classes in total; the choice probability is:

$$P_n(j) = \sum_{c=1}^{C} s_c\,\frac{\exp(X_{nj}\beta_c)}{\sum_{i=1}^{J}\exp(X_{ni}\beta_c)}$$

The shares $s_c$, c = 1, 2, …, C, can be estimated together with $\beta_c$, c = 1, 2, …, C.
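Looking back at the nested logit expressions of Section 2.2, the sketch below computes the nest, conditional and unconditional probabilities and the site-closure welfare change for one person. It is a minimal Python illustration, not the author's R code. The utilities, the nesting of sites 1 and 2 together, the value of the dissimilarity parameter and the function names are all hypothetical choices made here.

```python
import math

def nested_logit_probs(v, nests, lam):
    """Two-level nested logit with a single dissimilarity parameter lam.

    v: deterministic utilities X_nj * beta, one per site.
    nests: list of nests, each a list of site indices.
    Returns (p_nest, p_cond, p_site): nest probabilities, within-nest
    conditional probabilities, and unconditional site probabilities.
    """
    inclusive = [sum(math.exp(v[i] / lam) for i in nest) for nest in nests]
    denom = sum(a ** lam for a in inclusive)
    p_nest = [a ** lam / denom for a in inclusive]
    p_cond = [0.0] * len(v)
    p_site = [0.0] * len(v)
    for k, nest in enumerate(nests):
        for i in nest:
            p_cond[i] = math.exp(v[i] / lam) / inclusive[k]
            p_site[i] = p_cond[i] * p_nest[k]  # P(j) = P(j | B_k) * P(B_k)
    return p_nest, p_cond, p_site

def closure_value(p_cond_j, p_nest_k, lam, beta_y):
    """Welfare change from closing site j in nest k (negative)."""
    inside = (1.0 - p_cond_j) ** lam * p_nest_k + (1.0 - p_nest_k)
    return math.log(inside) / beta_y

v = [0.03, -3.40, -1.08]   # ASSUMED deterministic utilities for three sites
nests = [[0, 1], [2]]      # HYPOTHETICAL nesting: sites 1 and 2 share a nest
lam = 0.5                  # ASSUMED dissimilarity parameter
beta_y = 0.06              # ASSUMED marginal utility of income

p_nest, p_cond, p_site = nested_logit_probs(v, nests, lam)
loss_site1 = closure_value(p_cond[0], p_nest[0], lam, beta_y)
print(p_nest, p_site, loss_site1)
```

When `lam` equals 1 the closure formula collapses to the conditional logit value `ln(1 - P_n(j)) / beta_y`, which gives a quick consistency check between Sections 2.1 and 2.2.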
Instead of fixed shares, researchers may assume the probability of membership to class c has a multinomial logit form, and can be predicted by individual information:

$$\pi_{nc} = \frac{\exp(\gamma_c' z_n)}{\sum_{c'=1}^{C} \exp(\gamma_{c'}' z_n)}$$

where $z_n$ is a covariate of individual characteristics, and $\gamma_c$ is a vector of parameters specific to class c, which can be estimated together with the class preference parameters $\beta_c$. The choice probability in this case becomes:

$$P_n(j) = \sum_{c=1}^{C} \pi_{nc} \left(\frac{\exp(\beta_c' x_{nj})}{\sum_{i=1}^{J} \exp(\beta_c' x_{ni})}\right) = \sum_{c=1}^{C} \left(\frac{\exp(\gamma_c' z_n)}{\sum_{c'=1}^{C} \exp(\gamma_{c'}' z_n)}\right)\left(\frac{\exp(\beta_c' x_{nj})}{\sum_{i=1}^{J} \exp(\beta_c' x_{ni})}\right)$$

According to Boxall and Adamowicz (2002), the way to calculate the estimated welfare measures is similar to what has been discussed above. The measure is an average of welfare estimates from each class weighted by the corresponding estimated shares or predicted probabilities of membership to each class. With class-specific conditional logit probabilities $\hat{P}_n^c(j)$ and estimated weights $\hat{s}_c$ (or $\hat{\pi}_{nc}$), the site loss and marginal change measures are:

$$\sum_{c=1}^{C} \hat{s}_c \, \frac{\ln\left(1 - \hat{P}_n^c(j)\right)}{\hat{\beta}_{tc}^c} \qquad \text{and} \qquad \sum_{c=1}^{C} \hat{s}_c \, \hat{P}_n^c(j)\left(\hat{\beta}_l^c / \hat{\beta}_{tc}^c\right)$$

3 Simulations

Monte Carlo analysis will be used to compare the three possible econometric specifications. Three scenarios are constructed, where the data generating process follows the latent class, conditional logit and nested logit models respectively. Under each scenario, the pseudo data are estimated using the three models. It is assumed that there are 3 sites, 1,000 people, and the utility equation contains two explanatory variables, travel cost and site quality.

3.1 True Model-Latent Class Model

3.1.1 Simulation Steps

Suppose there are two classes with 700 people in the first class and 300 people in the second class. Let the true parameters and shares of the two classes be:

$$\beta_{tc}^1 = -0.06, \quad \beta_q^1 = 0.49, \quad s_1 = 0.7; \qquad \beta_{tc}^2 = -0.10, \quad \beta_q^2 = 0.21, \quad s_2 = 0.3$$

where $\beta_{tc}^1$ and $\beta_q^1$ are set to match the model estimates reported in Chapter 8 of Haab and McConnell (2002), and $\beta_{tc}^2$ and $\beta_q^2$ are assigned to make sure there is an obvious distinction between the two classes. The Monte Carlo simulation steps are as follows [1]:

(Step 1) Take 3,000 random draws uniformly over the range from 0 to 100 as the travel cost variable, since it varies across both sites and people. Take 3 uniform random draws for the quality variable from 0 to 2, which vary only across sites. Next, produce random errors for 1,000 people from a Type I extreme value distribution with a normalized variance of $\pi^2/6$. From Chapter 9 of Train (2003), the cumulative distribution function for $\varepsilon$ is $F(\varepsilon) = \exp(-\exp(-\varepsilon))$, and its inverse function is $\varepsilon = -\ln(-\ln(F(\varepsilon)))$. Since $F(\varepsilon)$ falls between 0 and 1, we take random draws from a (0, 1) uniform distribution first and then use the inverse CDF to compute the corresponding random numbers for $\varepsilon$ (Train 2003).

(Step 2) For the 700 people who are in the first class, extract their travel costs, site quality and errors to compute their utilities. For each person, pick the maximum among the three site utilities, mark it as one and the others as zero, and we get the pseudo observation for the chosen site. Table 1 shows an example of a person's randomly generated data for travel costs and for site quality for each of three sites. The resulting utility is computed and implies site 1 is the best for this person. The same approach is applied to the 300 people who are in the second class, but with different parameters in the utility equation. The choices of all 1,000 people form the data for the dependent variable.

Table 1: Simulating One's Choice

Site | Travel Cost | Quality | Error | Utility | Observation
1 | 7.79 | 1.02 | -0.12 | -0.09 | 1
2 | 61.90 | 0.64 | 0.54 | -2.86 | 0
3 | 31.95 | 1.71 | 0.62 | -0.46 | 0

(Step 3) Compute the true welfare measures. Since we know exactly which class each person belongs to, the calculation for individual welfare measures is the same as in the conditional logit model. Averaging site values and values of marginal quality change over 700 people in class 1 and 300 people in class 2 will produce true welfare measures in class 1 and class 2; averaging them over the entire 1,000 people will produce the population's true welfare measures for each site.

(Step 4) Regress site choices on two explanatory variables (travel cost and site quality) to get the estimated parameters, using conditional logit, nested logit and latent class models.

[1] Simulations are programmed in R.
When estimating with the nested logit model [2], we try three combinations for sites: site 1 and 2 as a nest, site 2 and 3 as a nest, and site 1 and 3 as a nest. Since our objective is to see whether the latent class model recovers the truth, we set the number of classes to be two in the estimation, the same as the truth [3]. Also, following Scarpa and Thiene (2005), we assume the probabilities of membership to each class are:

$$\pi_1 = \frac{\exp(\theta)}{1 + \exp(\theta)}, \qquad \pi_2 = \frac{1}{1 + \exp(\theta)}$$

which is equivalent to fixed shares, s and 1-s. The share needs to be between 0 and 1, and the expressions above embed the constraint in the estimation process.

(Step 5) The estimated welfare measures are then derived from those parameter estimates. When it comes to the latent class model, individual welfare estimates are averages of each class weighted by estimated shares. To make comparisons with the true values, since we know 700 people are in class 1 and 300 people are in class 2, we take the means of the former as the welfare estimates for class 1, and the means of the latter as those for class 2. The means over the entire 1,000 people are compared to the population's true welfare measures.

(Step 6) Repeat the last part of step (1), which is generating new errors while keeping explanatory variables the same, and steps (2) to (5) 1,000 times. We then have a random sample of size 1,000 for each set of estimates. For each sample, compute the descriptive statistics, such as mean, median, variance, quartiles and mean squared error (MSE).

[2] The starting values are based on the conditional logit model estimates. For the travel cost parameter, it is the estimate minus or plus 0.01; for the quality parameter, it is the estimate minus or plus 0.1. BFGS is used to locate MLE estimates.

[3] When estimating with the latent class model, how many classes should be considered is a big issue. Train (2008) illustrated how the EM algorithm would estimate parameters with three types of discrete distributions. With the latent class model, the researcher tried different numbers of segments varying from 1 to 30, and found that class numbers of 8 (indicated by the Bayesian Information Criterion) and 25 (indicated by the Akaike Information Criterion) worked the best for that specific data set.

3.1.2 Simulation Results

With two classes, the probability expression of the latent class model is:

$$P_n(j) = s \, \frac{\exp(\beta_1' x_{nj})}{\sum_{i=1}^{J} \exp(\beta_1' x_{ni})} + (1 - s) \, \frac{\exp(\beta_2' x_{nj})}{\sum_{i=1}^{J} \exp(\beta_2' x_{ni})}$$

Table 2: Performance of Latent Class Model When It Is the True Model [4]

Estimate | True | Mean | Var. | MSE | Min. | 1st Quartile | Median | 3rd Quartile | Max.
Class 1 travel cost, $\hat{\beta}_{tc}^1$ | -0.06 | -0.20 | 0.704 | 0.723 | -8.97 | -0.069 | -0.063 | -0.056 | 2.78
Class 1 site quality, $\hat{\beta}_q^1$ | 0.49 | 1.84 | 276.47 | 278.02 | -222.3 | 0.40 | 0.48 | 0.64 | 142.7
Class 2 travel cost, $\hat{\beta}_{tc}^2$ | -0.10 | -0.46 | 2.10 | 2.23 | -8.83 | -0.14 | -0.075 | -0.068 | 6.89
Class 2 site quality, $\hat{\beta}_q^2$ | 0.21 | 0.36 | 29.38 | 29.38 | -41.09 | 0.14 | 0.36 | 0.45 | 62.98
Share of class 1, $\hat{s}$ [5] | 0.70 | 0.49 | 0.069 | 0.112 | 0.007 | 0.35 | 0.50 | 0.71 | 0.99
Class 1 ratio, $\hat{\beta}_q^1/\hat{\beta}_{tc}^1$ | -8.17 | -9.16 | 65679.5 | 65614.5 | -1940 | -10.42 | -7.44 | -6.13 | 7318.0
Class 2 ratio, $\hat{\beta}_q^2/\hat{\beta}_{tc}^2$ | -2.10 | -3.34 | 12.49 | 14.02 | -10.62 | -6.08 | -4.44 | -1.04 | 9.59
Average travel cost [6] | -0.07 | -0.147 | 0.061 | 0.066 | -1.98 | -0.093 | -0.075 | -0.068 | -0.024
Average site quality [7] | 0.41 | 0.51 | 2.08 | 2.09 | -9.45 | 0.35 | 0.42 | 0.49 | 15.44
Average ratio [8] | -6.35 | -6.12 | 70.03 | 70.01 | -42.6 | -6.83 | -6.28 | -5.65 | 246.9

[4] Iterations in which the estimation fails to converge are excluded. The results come from the remaining 996 iterations.
[5] This is computed from $\hat{\theta}$, and it matches the class 1 estimates.
[6] It is the estimated travel cost parameter on average, weighted by $\hat{s}$ and $1 - \hat{s}$.
[7] It is the estimated site quality parameter on average, weighted by $\hat{s}$ and $1 - \hat{s}$.
[8] It is the ratios of estimated parameters in two classes, weighted by $\hat{s}$ and $1 - \hat{s}$.

The two terms are interchangeable, i.e. we cannot tell which set of estimates is for class 1 and which is for class 2 simply by the order in which they appear in the log-likelihood function. The true parameter ratios of class 1 and 2 are -8.17 and -2.10 respectively. Thus, we take ratios of the two sets of estimated parameters, and treat the one with a ratio larger in absolute value as class 1 estimates.
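The labelling rule just described can be written as a small helper. This is only a sketch of the rule as stated in the text: the function name and the example estimates are hypothetical, and the ratio is taken as site quality over travel cost, which is the convention that reproduces the true ratios of -8.17 and -2.10.

```python
def label_classes(estimates_a, estimates_b, share_a):
    """Order two sets of (travel_cost, quality) estimates so class 1 comes first.

    Class 1 is taken to be the set whose quality/travel-cost ratio is larger
    in absolute value; the share is flipped when the two sets are swapped.
    """
    ratio_a = estimates_a[1] / estimates_a[0]
    ratio_b = estimates_b[1] / estimates_b[0]
    if abs(ratio_a) >= abs(ratio_b):
        return estimates_a, estimates_b, share_a
    return estimates_b, estimates_a, 1.0 - share_a


# Hypothetical output of one iteration, with the classes in swapped order.
class_1, class_2, share_1 = label_classes((-0.11, 0.20), (-0.058, 0.50), 0.35)
```

In this example the second set has the larger ratio in absolute value, so it is relabelled as class 1 and the estimated share becomes one minus the share attached to the first set.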
Table 2 shows the descriptive statistics of class 1 estimates, class 2 estimates, their weighted averages, and estimated shares. For parameter estimates and their averages, the medians are much closer to true values than the means, which are influenced by extreme values. Variances and mean squared errors (MSE) are also affected by extreme values. Class 1 estimates perform better than class 2 estimates, which may be attributed to the larger number of people in class 1. If we look at the interquartile ranges, the estimates are somewhat acceptable half of the time; but still, for the travel cost estimate of class 1, $\hat{\beta}_{tc}^1$, which is the least biased, the range is around +/- 10%; for the site quality estimate of class 1, $\hat{\beta}_q^1$, the range grows to +/- 20%. The share of class 1 is underestimated by 28.5%.

On average though, the latent class model performs fine, with a 4.2% downward bias in the travel cost estimate, a 2.2% upward bias in the site quality estimate and a 1.1% upward bias in the ratio. Variances and MSEs of the averages are much smaller. Although extreme values are estimated for some of the preference parameters within the classes, these extreme values receive a weight close to 0 because the class probability becomes close to 0, as shown in Figures 1 and 2. In fact, it is the weighted average of probabilities that enters the likelihood function, so the latent class model works well on average.

[Figure 1: Travel Cost Estimates over Some Iterations. Horizontal axis: iterations 1 to 17. Series: class 1 travel cost estimate, class 2 travel cost estimate, estimated share of class 1.]

[Figure 2: Site Quality Estimates over Some Iterations. Horizontal axis: iterations 1 to 17. Series: class 1 site quality estimate, class 2 site quality estimate, estimated share of class 1.]

The results above are based on two classes in the population.
In empirical studies, estimation will be conducted under different numbers of classes, and the Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC) is used to decide which is optimal. Scarpa and Thiene (2005) used the following expression:

$$-2\log(L) + \kappa J$$

where J is the number of estimated parameters, log(L) is the log-likelihood function valued at the estimated parameters, and κ is a constant. AIC has κ=2; BIC has κ=log(N), where N is the sample size. We re-estimate the data assuming three classes in each iteration. Over all iterations where both estimations converge, AIC selects two classes 93% of the time, and BIC 100% of the time. So the latent class model can self-detect the true number of classes using both criteria.

When the true model is the latent class model, the conditional logit and nested logit models measure the average effects, so the true values of the parameters are weighted averages of the two classes. The true value of λ in the nested logit model is 1 as sites are all uncorrelated, and how the nests are constructed doesn't matter, which can be seen from Table 3. Both models produce similar and reliable estimates, with about a 7% upward bias in the travel cost estimates, a 3% upward bias in the site quality estimates and a 2.5% upward bias in the ratios. Since variances are very small, the distributions of the estimates are well described by the median, minimum and maximum.

Table 3: Performance of Conditional Logit and Nested Logit Models When Latent Class Model Is the True Model

Model | Estimate | True [9] | Mean | Var. | MSE | Min. | Median | Max.
Conditional Logit | $\hat{\beta}_{tc}$ | -0.072 | -0.067 | 1.1e-05 | 3.3e-05 | -0.080 | -0.067 | -0.057
Conditional Logit | $\hat{\beta}_q$ | 0.406 | 0.42 | 3.3e-03 | 3.4e-03 | 0.238 | 0.417 | 0.591
Conditional Logit | $\hat{\beta}_q/\hat{\beta}_{tc}$ | -6.35 | -6.19 | 0.69 | 0.72 | -9.15 | -6.23 | -3.52
Nested Logit [10] | $\hat{\lambda}$ | 1.00 | 1.02 | 8.5e-03 | 8.8e-03 | 0.70 | 1.01 | 1.43
Nested Logit [10] | $\hat{\beta}_{tc}$ | -0.072 | -0.068 | 2.0e-05 | 3.6e-05 | -0.084 | -0.068 | -0.053
Nested Logit [10] | $\hat{\beta}_q$ | 0.406 | 0.42 | 3.5e-03 | 3.7e-03 | 0.226 | 0.419 | 0.610
Nested Logit [10] | $\hat{\beta}_q/\hat{\beta}_{tc}$ | -6.35 | -6.18 | 0.70 | 0.73 | -8.94 | -6.20 | -3.61
Nested Logit [11] | $\hat{\lambda}$ | 1.00 | 1.01 | 8.8e-03 | 9.0e-03 | 0.78 | 1.01 | 1.38
Nested Logit [11] | $\hat{\beta}_{tc}$ | -0.072 | -0.068 | 2.1e-05 | 3.8e-05 | -0.088 | -0.068 | -0.055
Nested Logit [11] | $\hat{\beta}_q$ | 0.406 | 0.42 | 3.3e-03 | 3.5e-03 | 0.23 | 0.42 | 0.59
Nested Logit [11] | $\hat{\beta}_q/\hat{\beta}_{tc}$ | -6.35 | -6.17 | 0.74 | 0.77 | -8.97 | -6.16 | -5.61
Nested Logit [12] | $\hat{\lambda}$ | 1.00 | 0.98 | 0.010 | 0.011 | 0.69 | 0.98 | 1.36
Nested Logit [12] | $\hat{\beta}_{tc}$ | -0.072 | -0.067 | 1.6e-05 | 4.3e-05 | -0.080 | -0.067 | -0.055
Nested Logit [12] | $\hat{\beta}_q$ | 0.406 | 0.41 | 4.0e-03 | 4.1e-03 | 0.21 | 0.41 | 0.63
Nested Logit [12] | $\hat{\beta}_q/\hat{\beta}_{tc}$ | -6.35 | -6.13 | 0.75 | 0.79 | -8.50 | -6.14 | -3.18

[9] The values are the averages of true parameters in two classes, weighted by true shares.
[10] Site 1 and 2 are in one nest.
[11] Site 2 and 3 are in one nest.
[12] Site 1 and 3 are in one nest.

There is a side note on the parameter λ in the nested logit model. It normally falls between 0 and 1, but based on Train (2003), it can be greater than 1, so the main requirement is that λ be positive. The estimation results in Table 3 are from unconstrained maximization. When λ is constrained to be between 0 and 1, we obtain an estimate very close to 1, and the other parameter estimates are almost identical to those of the conditional logit model. Hence, in our cases, it makes little difference whether the constraint is imposed or not.

The welfare estimates of the latent class model have similar patterns to its parameter estimates. The interquartile ranges suggest somewhat acceptable performance, and extreme values from some iterations distort the means. But average welfare measures perform well because the extreme values within a class receive a low weight since the estimated class share is small. The conditional logit and nested logit models produce welfare measures close to true average values, and all three nest structures lead to the same results.
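The two welfare measures compared in this section can be sketched for a single person. The sketch assumes the conditional logit expressions attributed to Haab and McConnell (2002) in Section 2.1, with the sign convention implied by the reported tables: the site loss is ln(1 - P_j) divided by the travel cost parameter, and the value of a marginal quality change is P_j times the ratio of the quality parameter to the travel cost parameter. The inputs are the travel costs and site qualities from Table 1 and the class 1 true parameters (-0.06 and 0.49); the function names are hypothetical.

```python
import math


def logit_probs(utilities):
    """Conditional logit probabilities from deterministic utilities."""
    top = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def welfare_measures(travel_cost, quality, b_tc, b_q):
    """Return site probabilities, site loss values and marginal quality values."""
    utilities = [b_tc * tc + b_q * q for tc, q in zip(travel_cost, quality)]
    probs = logit_probs(utilities)
    # Loss from closing site j: ln(1 - P_j) / b_tc (positive because b_tc < 0).
    site_loss = [math.log(1.0 - p) / b_tc for p in probs]
    # Value of a marginal quality change at site j: P_j * b_q / b_tc.
    quality_value = [p * b_q / b_tc for p in probs]
    return probs, site_loss, quality_value


# Travel cost and quality from Table 1; class 1 true parameters.
travel_cost = [7.79, 61.90, 31.95]
quality = [1.02, 0.64, 1.71]
probs, site_loss, quality_value = welfare_measures(travel_cost, quality, -0.06, 0.49)
```

Under this convention the marginal quality values across sites add up to the parameter ratio, because the site probabilities sum to one. That is consistent with the per-site quality values reported in the tables being negative fractions of the ratio.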
Table 4: Estimated Values of Marginal Quality Change of Latent Class Model When It Is the True Model

Class | Site | True [13] | Mean | Var. | MSE | Min. | 1st Quartile | Median | 3rd Quartile | Max.
Class 1 | 1 | -2.09 | -2.08 | 5446.8 | 5441.4 | -1562 | -2.13 | -1.86 | -1.60 | 1438
Class 1 | 2 | -3.68 | -6.81 | 24521.6 | 24506.8 | -1938 | -4.94 | -3.32 | -2.64 | 4075
Class 1 | 3 | -2.40 | -0.26 | 3314.7 | 3316.0 | -114.9 | -2.76 | -2.12 | -1.79 | 1805
Class 2 | 1 | -0.64 | -0.87 | 1.10 | 1.15 | -2.51 | -1.65 | -1.26 | -0.32 | 3.92
Class 2 | 2 | -0.83 | -1.50 | 2.03 | 2.47 | -5.19 | -2.64 | -1.84 | -0.39 | 2.25
Class 2 | 3 | -0.62 | -0.97 | 1.18 | 1.30 | -2.92 | -1.80 | -1.34 | -0.33 | 3.42
Average | 1 | -1.65 | -1.53 | 3.14 | 3.15 | -19.75 | -1.72 | -1.59 | -1.42 | 48.13
Average | 2 | -2.82 | -2.90 | 24.43 | 24.41 | -39.68 | -3.29 | -2.88 | -2.56 | 138.1
Average | 3 | -1.87 | -1.69 | 3.99 | 4.02 | -4.26 | -1.93 | -1.78 | -1.59 | 60.62

[13] The true values are averages over 1,000 iterations. It is the same with all the true welfare measures below.

Table 5: Estimated Site Values of Latent Class Model When It Is the True Model

Class | Site | True [14] | Mean | Var. | MSE | Min. | 1st Quartile | Median | 3rd Quartile | Max.
Class 1 [15] | 1 | 7.28 | 3.25 | 8950.2 | 8956.7 | -2744 | 6.81 | 7.41 | 7.73 | 171.5
Class 1 [15] | 2 | 16.06 | 14.58 | 129200 | 129062 | -10210 | 14.63 | 15.82 | 18.09 | 2666
Class 1 [15] | 3 | 8.72 | 4.30 | 13828 | 13832 | -3551 | 8.21 | 8.53 | 8.80 | 162.5
Class 2 [16] | 1 | 8.22 | 7.91 | 2.30 | 2.40 | -10.11 | 7.43 | 7.75 | 8.29 | 15.48
Class 2 [16] | 2 | 11.65 | 12.61 | 8.34 | 9.24 | -11.59 | 10.72 | 13.59 | 14.65 | 19.15
Class 2 [16] | 3 | 8.18 | 8.46 | 1.93 | 2.00 | -10.46 | 8.29 | 8.50 | 8.70 | 15.84
Average [17] | 1 | 7.56 | 7.46 | 11.40 | 11.40 | -87.04 | 7.38 | 7.58 | 7.80 | 9.76
Average [17] | 2 | 14.74 | 14.55 | 164.6 | 164.4 | -338 | 14.32 | 14.79 | 15.26 | 65.3
Average [17] | 3 | 8.56 | 8.34 | 18.73 | 18.76 | -114 | 8.33 | 8.49 | 8.66 | 10.55

[14] The true values are averages over 1,000 iterations. It is the same with all the true welfare measures below.
[15] In some iterations, we could get very abnormal estimates for both classes. The very large scale of the travel cost estimate makes travel cost extremely important in the decision-making process of where to go. A person will just go to the nearest site. If that site is closed, the welfare loss is huge. As a result we will have infinite site values. So we exclude those iterations in the analysis of welfare estimates. The results here are from 920 iterations.
[16] The results are from 881 iterations.
[17] The results are from 805 iterations.

Table 6: Welfare Estimates of Conditional Logit and Nested Logit Models When Latent Class Model Is the True Model

Model | Site | Site Loss/Closure: True | Site Loss/Closure: Estimate | Quality Change: True | Quality Change: Estimate
Conditional Logit | 1 | 7.56 | 7.57 | -1.65 | -1.67
Conditional Logit | 2 | 14.74 | 14.62 | -2.82 | -2.70
Conditional Logit | 3 | 8.56 | 8.55 | -1.87 | -1.83
Nested Logit (Site 1 and 2) | 1 | 7.56 | 7.60 | -1.65 | -1.67
Nested Logit (Site 1 and 2) | 2 | 14.74 | 14.64 | -2.82 | -2.69
Nested Logit (Site 1 and 2) | 3 | 8.56 | 8.50 | -1.87 | -1.82
Nested Logit (Site 2 and 3) | 1 | 7.56 | 7.55 | -1.65 | -1.66
Nested Logit (Site 2 and 3) | 2 | 14.74 | 14.63 | -2.82 | -2.69
Nested Logit (Site 2 and 3) | 3 | 8.56 | 8.57 | -1.87 | -1.82
Nested Logit (Site 1 and 3) | 1 | 7.56 | 7.55 | -1.65 | -1.65
Nested Logit (Site 1 and 3) | 2 | 14.74 | 14.67 | -2.82 | -2.68
Nested Logit (Site 1 and 3) | 3 | 8.56 | 8.52 | -1.87 | -1.81

3.2 True Model-Conditional Logit Model

To apply simulations in the scenario where the true model is the conditional logit model, the steps are almost the same as in the previous section, except that all 1,000 people have the same preferences in the true world. The true parameters, again taken from Haab and McConnell as in class one above, are $\beta_{tc} = -0.06$ and $\beta_q = 0.49$.

The simulation results are summarized in Table 7, Table 8 and Table 9.

Table 7: Performance of Latent Class Model When Conditional Logit Model Is the True Model [18]

Estimate | True | Mean | Var. | MSE | Min. | 1st Quartile | Median | 3rd Quartile | Max.
$\hat{\beta}_{tc}$ | -0.06 | -0.085 | 0.017 | 0.018 | -2.17 | -0.067 | -0.062 | -0.059 | -0.012
$\hat{\beta}_q$ | 0.49 | 0.65 | 3.26 | 3.28 | -28.78 | 0.39 | 0.52 | 0.66 | 16.85
$\hat{\beta}_q/\hat{\beta}_{tc}$ | -8.17 | -9.37 | 570.6 | 571.5 | -298.6 | -10.3 | -8.40 | -6.61 | 401.0

[18] The parameters are averages of each class weighted by estimated shares.

Table 8: Performance of Conditional Logit and Nested Logit Models When Conditional Logit Model Is the True Model

Model | Estimate | True | Mean | Var. | MSE | Min. | Median | Max.
Conditional Logit | $\hat{\beta}_{tc}$ | -0.06 | -0.06 | 9.6e-06 | 9.6e-06 | -0.07 | -0.06 | -0.05
Conditional Logit | $\hat{\beta}_q$ | 0.49 | 0.49 | 0.020 | 0.020 | -0.01 | 0.48 | 1.00
Conditional Logit | $\hat{\beta}_q/\hat{\beta}_{tc}$ | -8.17 | -8.13 | 5.65 | 5.64 | -17.2 | -8.04 | 0.23
Nested Logit (Site 1 and 2) | $\hat{\lambda}$ | 1.00 | 1.01 | 9.0e-03 | 9.0e-03 | 0.71 | 1.01 | 1.34
Nested Logit (Site 1 and 2) | $\hat{\beta}_{tc}$ | -0.06 | -0.06 | 1.5e-05 | 1.6e-05 | -0.08 | -0.06 | -0.05
Nested Logit (Site 1 and 2) | $\hat{\beta}_q$ | 0.49 | 0.49 | 0.024 | 0.024 | -0.003 | 0.49 | 1.02
Nested Logit (Site 1 and 2) | $\hat{\beta}_q/\hat{\beta}_{tc}$ | -8.17 | -8.16 | 6.05 | 6.04 | -16.57 | -8.05 | 0.06
Nested Logit (Site 2 and 3) | $\hat{\lambda}$ | 1.00 | 1.00 | 9.8e-03 | 9.8e-03 | 0.72 | 1.00 | 1.32
Nested Logit (Site 2 and 3) | $\hat{\beta}_{tc}$ | -0.06 | -0.06 | 1.7e-05 | 1.7e-05 | -0.07 | -0.06 | -0.05
Nested Logit (Site 2 and 3) | $\hat{\beta}_q$ | 0.49 | 0.49 | 0.023 | 0.023 | -0.08 | 0.49 | 1.01
Nested Logit (Site 2 and 3) | $\hat{\beta}_q/\hat{\beta}_{tc}$ | -8.17 | -8.20 | 7.21 | 7.20 | -17.48 | -8.11 | 1.40
Nested Logit (Site 1 and 3) | $\hat{\lambda}$ | 1.00 | 1.00 | 9.2e-03 | 9.2e-03 | 0.72 | 1.00 | 1.41
Nested Logit (Site 1 and 3) | $\hat{\beta}_{tc}$ | -0.06 | -0.06 | 1.6e-05 | 1.6e-05 | -0.08 | -0.06 | -0.05
Nested Logit (Site 1 and 3) | $\hat{\beta}_q$ | 0.49 | 0.49 | 0.023 | 0.023 | -0.05 | 0.49 | 1.10
Nested Logit (Site 1 and 3) | $\hat{\beta}_q/\hat{\beta}_{tc}$ | -8.17 | -8.13 | 5.96 | 5.96 | -17.98 | -8.07 | 1.05

Table 9: Welfare Measures of Conditional Logit, Nested Logit and Latent Class Models When Conditional Logit Model Is the True Model

Model | Site | Site Loss/Closure: True | Site Loss/Closure: Estimate | Quality Change: True | Quality Change: Estimate
Conditional Logit | 1 | 9.01 | 9.01 | -2.39 | -2.35
Conditional Logit | 2 | 12.06 | 12.06 | -3.00 | -3.00
Conditional Logit | 3 | 10.92 | 10.90 | -2.78 | -2.78
Nested Logit (Site 1 and 2) | 1 | 9.01 | 9.02 | -2.39 | -2.36
Nested Logit (Site 1 and 2) | 2 | 12.06 | 12.08 | -3.00 | -3.01
Nested Logit (Site 1 and 2) | 3 | 10.92 | 10.89 | -2.78 | -2.78
Nested Logit (Site 2 and 3) | 1 | 9.01 | 9.02 | -2.39 | -2.37
Nested Logit (Site 2 and 3) | 2 | 12.06 | 12.06 | -3.00 | -3.02
Nested Logit (Site 2 and 3) | 3 | 10.92 | 10.90 | -2.78 | -2.79
Nested Logit (Site 1 and 3) | 1 | 9.01 | 9.01 | -2.39 | -2.35
Nested Logit (Site 1 and 3) | 2 | 12.06 | 12.07 | -3.00 | -3.00
Nested Logit (Site 1 and 3) | 3 | 10.92 | 10.90 | -2.78 | -2.78
Latent Class [19] | 1 | 9.01 | 9.31 | -2.39 | -1.69
Latent Class [19] | 2 | 12.06 | 12.04 | -3.00 | -3.78
Latent Class [19] | 3 | 10.92 | 10.91 | -2.78 | -3.90

[19] After we exclude iterations with infinite site values, 901 iterations are used to compute the averages.

The latent class model performs fairly well based on the interquartile ranges. For the medians, there is a 3.3% downward bias in the travel cost estimate, a 6.1% upward bias in the site quality estimate and a 2.8% downward bias in the ratio. The means are not as good due to extreme values from some iterations. Parameter estimates of the conditional logit model and nested logit model are very close to true values.
In the nested logit model, how the nests are constructed doesn't matter as sites are all uncorrelated. As discussed above, if the parameter λ is constrained to be between 0 and 1, its estimate will be nearly 1, and other estimates are almost identical to those of the conditional logit model.

The estimated site values of the latent class model perform quite well, which makes sense because the travel cost estimate has good properties. The estimated values of marginal quality change are somewhat different from true values, which is attributed to the bias in the estimated parameter ratio. For site 1, there is a 29% upward bias; for site 2, there is a 26% downward bias; for site 3, there is a 40% downward bias. The conditional logit and nested logit models give very good welfare measures.

3.3 True Model-Nested Logit Model

3.3.1 Simulation Steps

To simulate the true world with the nested logit model as the true model, instead of generating random errors from a multivariate extreme value distribution, we follow what has been done in Herriges and Kling (1997) as detailed below.

With the true parameters as $\beta_{tc} = -0.06$, $\beta_q = 0.49$ and $\lambda = 0.5$, we can compute the probabilities that each person visits each site, say $P_1$, $P_2$ and $P_3$. Then a number is drawn from a [0, 1] uniform distribution, denoted as x. If x is less than $P_1$, this person will choose site 1; if x is greater than $P_1$ but less than $(P_1 + P_2)$, this person will choose site 2; if x is greater than $(P_1 + P_2)$, this person will choose site 3. By repeating this procedure for all people we get the pseudo observations. In different iterations, the probabilities remain the same, but x is newly drawn, so the observations are different.

When the true model is the nested logit model, some sites are correlated. The IIA assumption no longer holds in the true world, so both the conditional logit and latent class models would produce biased parameter and welfare estimates.
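The uniform-draw procedure described in this subsection for generating choices from known site probabilities can be sketched as follows. The function names and the example probabilities are hypothetical; the probabilities would in practice come from the nested logit formulas evaluated at the true parameters.

```python
import random


def draw_site(probs, x):
    """Map a uniform draw x to a 1-based site index via cumulative probabilities."""
    cumulative = 0.0
    for site, p in enumerate(probs, start=1):
        cumulative += p
        if x < cumulative:
            return site
    return len(probs)  # guards against rounding when x is very close to 1


def simulate_choices(prob_rows, rng):
    """Draw one site choice per person; prob_rows holds each person's probabilities."""
    return [draw_site(probs, rng.random()) for probs in prob_rows]


# Hypothetical probabilities for two people and three sites.
prob_rows = [[0.2, 0.5, 0.3], [0.6, 0.1, 0.3]]
rng = random.Random(2011)  # a new seed or further draws give a new iteration
choices = simulate_choices(prob_rows, rng)
```

Keeping `prob_rows` fixed and drawing again from the same generator mimics a new Monte Carlo iteration: the probabilities stay the same while the observed choices change.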
The site quality estimate is more biased than the travel cost estimate, so the estimated values of marginal quality change deviate more from true values than the estimated site values. For the latent class model, the median of the average quality estimate is more than two times the true value; the bias in the median of the average travel cost estimate is about 45%. The bias in the means is larger.

The nested logit model recovers the truth very well if the nest structure is correct. When the nest structure is incorrect, however, the model approaches the conditional logit model. We find that with a correct nest structure, the results from unconstrained and constrained maximization are the same; with an incorrect nest structure, the estimate of λ is closer to 1 in constrained maximization than in unconstrained maximization, and other estimates are also closer to those of the conditional logit model. Therefore, the nested logit model will perform at least as well as the conditional logit model regardless of the true nest structure.

3.3.2 Simulation Results

The simulation results are shown in Table 10, Table 11 and Table 12.

Table 10: Performance of Latent Class Model When Nested Logit Model Is the True Model

Estimate | True | Mean | Var. | MSE | Min. | 1st Quartile | Median | 3rd Quartile | Max.
$\hat{\beta}_{tc}$ | -0.06 | -0.099 | 9.4e-03 | 0.011 | -1.86 | -0.095 | -0.087 | -0.078 | -0.050
$\hat{\beta}_q$ | 0.49 | 1.86 | 12.26 | 14.13 | -23.98 | 0.82 | 1.07 | 1.61 | 62.81
$\hat{\beta}_q/\hat{\beta}_{tc}$ | -8.17 | -13.65 | 242.6 | 272.4 | -285.9 | -16.28 | -14.05 | -11.44 | 272.4

Table 11: Performance of Conditional Logit and Nested Logit Models When Nested Logit Model Is the True Model

Model | Estimate | True | Mean | Var. | MSE | Min. | Median | Max.
Conditional Logit | $\hat{\beta}_{tc}$ | -0.06 | -0.07 | 1.3e-05 | 1.9e-04 | -0.09 | -0.07 | -0.06
Conditional Logit | $\hat{\beta}_q$ | 0.49 | 0.88 | 0.03 | 0.19 | 0.29 | 0.88 | 1.46
Conditional Logit | $\hat{\beta}_q/\hat{\beta}_{tc}$ | -8.17 | -12.05 | 6.19 | 21.30 | -19.94 | -12.04 | -3.68
Nested Logit (Site 1 and 2) | $\hat{\lambda}$ | 0.50 | 0.50 | 3.5e-03 | 3.5e-03 | 0.34 | 0.50 | 0.73
Nested Logit (Site 1 and 2) | $\hat{\beta}_{tc}$ | -0.06 | -0.06 | 1.4e-05 | 1.4e-05 | -0.07 | -0.06 | -0.05
Nested Logit (Site 1 and 2) | $\hat{\beta}_q$ | 0.49 | 0.49 | 0.024 | 0.024 | 0.01 | 0.49 | 1.00
Nested Logit (Site 1 and 2) | $\hat{\beta}_q/\hat{\beta}_{tc}$ | -8.17 | -8.16 | 6.11 | 6.11 | -14.83 | -8.08 | -0.14
Nested Logit (Site 2 and 3) | $\hat{\lambda}$ | 0.50 | 1.24 | 0.012 | 0.56 | 0.93 | 1.23 | 1.58
Nested Logit (Site 2 and 3) | $\hat{\beta}_{tc}$ | -0.06 | -0.081 | 2.9e-05 | 4.7e-04 | -0.10 | -0.081 | -0.067
Nested Logit (Site 2 and 3) | $\hat{\beta}_q$ | 0.49 | 0.96 | 0.042 | 0.27 | 0.30 | 0.97 | 1.57
Nested Logit (Site 2 and 3) | $\hat{\beta}_q/\hat{\beta}_{tc}$ | -8.17 | -11.93 | 6.41 | 20.55 | -19.74 | -11.94 | -3.52
Nested Logit (Site 1 and 3) | $\hat{\lambda}$ | 0.50 | 1.28 | 0.016 | 0.62 | 0.89 | 1.28 | 1.69
Nested Logit (Site 1 and 3) | $\hat{\beta}_{tc}$ | -0.06 | -0.083 | 3.5e-05 | 5.6e-04 | -0.11 | -0.083 | -0.067
Nested Logit (Site 1 and 3) | $\hat{\beta}_q$ | 0.49 | 0.79 | 0.044 | 0.13 | -0.018 | 0.80 | 1.44
Nested Logit (Site 1 and 3) | $\hat{\beta}_q/\hat{\beta}_{tc}$ | -8.17 | -9.63 | 7.36 | 9.49 | -18.32 | -9.52 | 0.20

Table 12: Welfare Measures of Conditional Logit, Nested Logit and Latent Class Models When Nested Logit Model Is the True Model

Model | Site | Site Loss/Closure: True | Site Loss/Closure: Estimate | Quality Change: True | Quality Change: Estimate
Conditional Logit | 1 | 9.54 | 10.35 | -2.71 | -4.07
Conditional Logit | 2 | 7.38 | 7.64 | -2.29 | -3.32
Conditional Logit | 3 | 13.27 | 12.12 | -3.17 | -4.66
Nested Logit (Site 1 and 2) | 1 | 9.54 | 9.54 | -2.71 | -2.70
Nested Logit (Site 1 and 2) | 2 | 7.38 | 7.39 | -2.29 | -2.27
Nested Logit (Site 1 and 2) | 3 | 13.27 | 13.28 | -3.17 | -3.19
Nested Logit (Site 2 and 3) | 1 | 9.54 | 9.71 | -2.71 | -3.93
Nested Logit (Site 2 and 3) | 2 | 7.38 | 8.03 | -2.29 | -3.35
Nested Logit (Site 2 and 3) | 3 | 13.27 | 12.47 | -3.17 | -4.65
Nested Logit (Site 1 and 3) | 1 | 9.54 | 10.81 | -2.71 | -3.31
Nested Logit (Site 1 and 3) | 2 | 7.38 | 7.30 | -2.29 | -2.63
Nested Logit (Site 1 and 3) | 3 | 13.27 | 12.12 | -3.17 | -3.68
Latent Class [20] | 1 | 9.54 | 9.79 | -2.71 | -3.57
Latent Class [20] | 2 | 7.38 | 7.72 | -2.29 | -1.70
Latent Class [20] | 3 | 13.27 | 12.64 | -3.17 | -8.39

[20] After we exclude iterations with infinite site values, 838 iterations are used to compute the averages.

3.4 Sensitivity Analyses

To see how sensitive the results are to underlying factors, we conduct sensitivity analyses. In the simulations above, the ratio of the site quality parameter to the travel cost parameter in one class is about four times that in the other class. First, we picked two pairs of true parameters from Hynes, Hanley, et al. (2008) so that the difference between parameter ratios becomes even larger, around 24 times.
Preferences of the two classes are very distinct, which might help to identify a person's membership. Second, we increase the number of sites from three to seven. With more sites, there is more variation in the site choices of people with different preferences, and it might be easier to tell which class one belongs to from their observations. Third, we change the true shares of the two classes to 50% and 50%. With equal numbers of people in each class, any disadvantage of having a smaller group is removed.

It turns out that all results display the same pattern as before [21]. Hence, we conclude that the inherent functioning of the latent class model produces outcomes that work well on average but not necessarily for the individual classes, since the above assumptions used in the simulations had little influence on the patterns of the results.

[21] See Appendix A for results.

4 Discussion and Conclusions

The latent class model has been broadly applied in many areas, including within environmental economics for valuation studies and for recreation demand analyses. In this chapter, we use Monte Carlo simulations in the context of recreation site choices to test whether the latent class model will successfully recover the truth and how it performs compared to two other widely used site choice models, the conditional logit and nested logit models. By conducting simulations under three true scenarios, we find that the latent class model works at best the same as the conditional logit model, and is inferior to the nested logit model when alternatives are no longer independent.

The latent class model aims to capture preference heterogeneity by assuming there are a number of latent groups in the population. However, even if this is the true scenario, we don't know the true number of groups or everyone's membership. For the former, we can try a set of group numbers and let the data tell which is optimal.
Based on our findings, the two frequently used information criteria select the correct number at least 90% of the time, which indicates that the latent class model can recover the true number of preference groups in the population at an acceptable confidence level. For the latter, we either rely on the data using fixed group shares in the estimation, or infer the probabilities of membership to each group through demographics. No matter how small a group is or how low a probability could be, uncertainty exists over which group a person belongs to. Thus, it is the averages weighted by group shares or personal probabilities that enter the log-likelihood function. That is to say, the values of preference parameters in each group and corresponding shares do not matter, as long as the averages, which are their combinations, maximize the log-likelihood function. So in the simulations, we see that the latent class model performs very well on average for both parameter and welfare estimates. But for each individual class, the estimated parameters and group shares can deviate from the true values, sometimes substantially. The latent class model does not always do a good job identifying classes of a population with distinct preferences as it is designed to do. It could misallocate individuals to groups with biased preferences. The positive finding was that 50% of the time the bias from poorly estimated class sizes or parameters may not be very large. In addition, the commonly applied information criteria are likely to self-detect the true number of groups using the latent class model.

The conditional logit model has the simplest form, yet it has very good performance when the unmeasured site characteristics in the errors are truly uncorrelated with one another.
When the true model is the conditional logit model itself, estimates are close to true values, and variances and MSEs are quite small; when the true model is the latent class model, conditional logit does well in recovering population averages. The conditional logit estimated parameters and welfare measures sometimes have even better properties than the population average estimates of the latent class model. In fact, the conditional logit model can be viewed as a degenerate latent class model with the constraint that preferences in all groups are the same. So we may be better off by imposing constraints in maximum likelihood estimation. Both models are expected to be biased if there is correlation among sites.

Across all true scenarios, the nested logit model has the best performance among the three models considered. When sites are independent, how the nests are constructed is irrelevant. When the true model is the nested logit model, a correct nest structure gives estimates almost identical to true values. If the nest structure is wrong, the results are very similar to the conditional logit model. Hence, the nested logit model can detect an incorrect nest structure and go with no nesting as a solution. As discussed in Herriges and Kling (1997), some nesting works better than no nesting, which may be attributed to the additional degrees of freedom available in the nested logit model.

In conclusion, for future use of the latent class model, one should be cautious in interpreting the meanings of estimated class-specific parameters and the population segment sizes. If the estimates seem extreme when compared to the other classes or when a class membership is of very small size, one may be better off using a conditional logit model. In addition, the robustness and reliability of the nested logit model justifies its application to the Great Lakes beach survey data in the following two chapters.
For future research, it is worth considering true scenarios with more preference groups in the population and estimating a latent class model with a variety of class numbers. For example, in applications, Greene and Hensher (2003) and Provencher and Bishop (2004) had three classes, Scarpa and Thiene (2005) had four, Hynes, Hanley and Scarpa (2008) had six, and Train (2008) had eight and twenty-five. It is possible that having more latent groups in the truth might help the latent class model identify individual class preferences. More variation in the true scenarios may make the estimation more stable, and the ability of the latent class model to detect the true number of classes can be further tested by using a variety of class numbers. Another possible future direction would be to extend the modeling of class memberships to include a rich set of demographic variables. Also, instead of generating explanatory variables, simulations may be applied to survey data with real travel cost and site quality as well as demographic information, with which individual-specific membership to each group can be modeled.

Chapter 2 Estimating Use Values of Public Great Lakes Beaches in Michigan

1 Motivation

People often take trips to public beaches in their leisure time to participate in recreation activities. Although there is no explicit market for pricing, through recreation demand models, monetary use values of public beaches can be derived, which have important policy implications. If policy-makers consider initiating environmental protection or remediation projects related to beaches, they might apply benefit-cost analysis and weigh costs against benefits, which come from increased trips. Moreover, establishing the economic value of beach recreation can help policy makers think about the relative value of various natural assets as they consider funding allocations among competing areas of need.
Many researchers have evaluated the economics of beaches along the coastlines of oceans. For instance, Deacon and Kolstad (2000) summarized several studies from the 1970s and 1980s on saltwater beach valuation, the results of which ranged from $0.70 to $13.55 per beach day in 1990 dollars. Hilger and Hanemann (2006) used data from a survey of households in Southern California about their annual beach trips, and computed an average willingness to pay of $5.71 in 2001 dollars for an increase of one letter grade on a water pollution rating scale. Lew and Larson (2008) conducted a telephone-mail-telephone survey of randomly chosen households in San Diego County and asked eligible participants about their trips to beaches. They computed the value of having access to beaches to be between $21 and $23 per day in 2000 dollars. Parsons et al (2009) surveyed Texas residents living within 200 miles of the Gulf of Mexico and showed that if all Padre Island beaches were closed, the mean loss would be $20 per trip in 2008 dollars.

Nonetheless, very few studies have focused on public Great Lakes beaches, which are located along the largest group of freshwater lakes on Earth and have unique characteristics of their own. Murray, Sohngen and Pendleton (2001) conducted an on-site survey at Maumee Bay and Headlands State Park beaches on Lake Erie and calculated the value per beach day to be $25 for the former and $15 for the latter, in 1998 dollars. However, because Lake Erie is smaller and its coastline is quite different when compared with Lake Michigan and Lake Huron, it is unclear if their results can be generalized to the entire Great Lakes. Song, Lupi and Kaplowitz (2010) conducted a web survey on visitation to public Great Lakes beaches using a convenience sample from a consumer web panel of Michigan adults and concluded that the welfare loss of eliminating a beach was around $50 per visitor in 2006 dollars.
However, the web panel was not representative of the general population, and their trip location data covered only the beach visited most often.

In addition, recreation demand models are usually applied only to people who participate in the activity. Although this practice is efficient for activities that require a license (e.g., fishing), it is worth investigating how to generalize the results to the entire population for more general activities such as beach use. Shaw (1988) addressed the issues of truncation and endogenous stratification in on-site sampling using a Poisson model. Englin and Shonkwiler (1995) proposed the negative binomial model for count data to improve estimation. Shonkwiler and Shaw (1996) defined three groups of people in recreation: “nonusers”, who never participated; “potential users”, who would participate but did not in the survey season; and “users”, who always participated. They put single and double hurdles into the count data model. But these solutions are for single-site models. In the context of multiple sites, von Haefen et al (2005) took day trip data from a survey of Delaware residents on their visitation to Mid-Atlantic ocean beaches, and integrated single and double hurdles into discrete choice models to model the behavior of not taking a day trip in the season, where the total number of day trips was zero over all choice occasions. Instead of distinguishing people by the number of trips, English (2008) treated people who held licenses for shrimp baiting as participants and the rest as nonparticipants. He derived a participation hurdle by equating seasonal consumer surplus with the cost of the license. The hurdle was added to a nested logit model in which one chose whether or not to purchase a license. The survey was sent only to license holders. Information on nonparticipants was obtained in aggregate form from census data at the ZCTA (Zip Code Tabulation Area) level.
If we adopt the definitions in Shonkwiler and Shaw (1996), nonusers and potential users were pooled in von Haefen et al (2005), because both groups would have no day trips in the season. The hurdles modeled the difference between the aggregate of these two groups and the user group. In English (2008), nonusers were separated from the pool by not holding a license. Potential users would pass the participation hurdle as users and then decide not to go shrimp baiting in every choice occasion. However, there was no survey of nonusers. Also, identifying the three groups is not as straightforward in beach recreation.

To fill the gap, we conducted a two-stage survey of Michigan residents in which a screener mail survey was followed by a web survey. The purpose of the mail survey was to find users and potential users of beach recreation and to collect data on nonusers at the individual level. The web survey was administered to users and potential users, who were asked to report seasonal trips to public Great Lakes beaches. In this chapter, we apply the repeated nested logit model to the survey data to estimate use values of public Great Lakes beaches in Michigan, and the model is augmented with a participation hurdle to examine how different forms of generalizing to the population affect the results.

2 Models

2.1 Random Utility Models

Random utility models are widely applied to recreation demand with multiple sites. Following Train (2003), the utility person n receives from visiting beach j in choice occasion t is the sum of a deterministic term and a random term:

\[ U_{njt} = V_{njt} + \varepsilon_{njt} \]

where the so-called indirect utility is \( V_{njt} = \beta' x_{j} + \gamma' z_{nj} \). \( x_{j} \) is a vector of beach characteristics, or simply beach-specific constants. \( z_{nj} \) varies across people and beaches and may include travel cost and interactions between demographics and beach characteristics. \( \varepsilon_{njt} \) captures all other factors that affect utility but cannot be observed by researchers.
In a choice set with J beaches, person n will choose beach j in choice occasion t if and only if:

\[ U_{njt} > U_{nit} \quad \text{for all } i \neq j \]

Suppose \( \varepsilon_{njt} \) is independently and identically distributed Type I extreme value. From the researchers' point of view, person n then has a one-level decision tree as in Figure 3. The probability of choosing beach j in choice occasion t is:

\[ P_{nt}(j) = \frac{\exp(V_{njt})}{\sum_{i=1}^{J} \exp(V_{nit})} \]

This is called the conditional logit model, and it implicitly assumes the property of independence from irrelevant alternatives (IIA). That is, the relative probability of choosing beach j over beach i in every choice occasion is not influenced by the number or attributes of other alternatives. In reality, this does not hold most of the time. If \( \varepsilon_{njt} \) has a generalized extreme value distribution, the IIA assumption is relaxed to some extent, and the decision tree has two levels as in Figure 4.

Figure 3: Decision Tree of Conditional Logit Model

Figure 4: Decision Tree of Two-Level Nested Logit Model

The probability of person n going to beach j in nest k in choice occasion t is:

\[ P_{nt}(j,k) = \frac{\exp(V_{njkt}/\lambda_k)\left(\sum_{i=1}^{J_k} \exp(V_{nikt}/\lambda_k)\right)^{\lambda_k - 1}}{\sum_{l=1}^{K}\left(\sum_{i=1}^{J_l} \exp(V_{nilt}/\lambda_l)\right)^{\lambda_l}} \]

where \( J_k \) is the number of alternatives in nest k and K is the number of nests. This model is referred to as the nested logit model. IIA holds within nests, but not across nests. The parameter \( \lambda_k \) measures the degree of independence among the alternatives in nest k. The higher it is, the lower the correlation between these alternatives and the closer the nested logit model is to the conditional logit model. It can also be interpreted as the parameter on the lower level's inclusive value. It is normally assumed to be the same across all nests so that the model will converge, and we can replace \( \lambda_k \) with \( \lambda \). \( P_{nt}(j,k) \) can be decomposed into the product of the probability of choosing beach j conditional on nest k and the probability of choosing nest k in choice occasion t:

\[ P_{nt}(j,k) = P_{nt}(j \mid k)\, P_{nt}(k) = \frac{\exp(V_{njkt}/\lambda)}{\sum_{i=1}^{J_k} \exp(V_{nikt}/\lambda)} \cdot \frac{\left(\sum_{i=1}^{J_k} \exp(V_{nikt}/\lambda)\right)^{\lambda}}{\sum_{l=1}^{K}\left(\sum_{i=1}^{J_l} \exp(V_{nilt}/\lambda)\right)^{\lambda}} \]

We use a repeated nested logit model on day trips (Morey et al (1993)) for our analysis, since there are a number of choice occasions in one summer season.
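As an illustration of the two-level nested logit probabilities described above, the following is a minimal sketch in Python. It assumes a common nesting parameter across nests and uses the standard form from Train (2003); the function name, the variable names and the example utilities are hypothetical and are not taken from the estimation code used in this dissertation.

```python
import numpy as np

def nested_logit_probs(V, nests, lam):
    """Two-level nested logit choice probabilities (illustrative sketch).

    V     : deterministic utilities, one per beach
    nests : integer nest (lake) label for each beach
    lam   : common nesting parameter, assumed to satisfy 0 < lam <= 1
    """
    V = np.asarray(V, dtype=float)
    nests = np.asarray(nests)
    labels = np.unique(nests)                 # sorted nest labels
    idx = np.searchsorted(labels, nests)      # position of each beach's nest
    scaled = np.exp(V / lam)                  # exp(V / lambda) for each beach
    nest_sums = np.array([scaled[nests == k].sum() for k in labels])
    # Probability of each nest: its sum raised to lambda, normalized over nests
    p_nest = nest_sums**lam / (nest_sums**lam).sum()
    # Probability of each beach conditional on its nest
    p_cond = scaled / nest_sums[idx]
    return p_cond * p_nest[idx]

# Hypothetical example: four beaches on two lakes
probs = nested_logit_probs([-1.0, -1.5, -0.5, -2.0], [0, 0, 1, 1], lam=0.6)
print(probs, probs.sum())
```

Setting `lam=1.0` in this sketch collapses the result to the conditional logit probabilities, which is the sense in which a higher nesting parameter brings the nested logit model closer to the conditional logit model.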
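Following English (2008), with a participation hurdle, the decision tree is illustrated in Figure 5.

Figure 5: Decision Tree with Participation/Nonparticipation

At the top level, nonusers do not participate. People who overcome this hurdle decide whether to take a day trip in each choice occasion. For potential users, the status quo utility exceeds the utility of visiting a public Great Lakes beach in every choice occasion, so they take no day trips. Otherwise, they become users and take at least one day trip over the season. The nests are defined by the different Great Lakes, since each lake has its own characteristics. Let \( \lambda \) denote the nesting parameter at the lake level and \( \theta \) the nesting parameter at the trip/no trip level. For users and potential users, in choice occasion t, the probability that person n chooses beach j conditional on going to lake k is:

\[ P_{nt}(j \mid k) = \frac{\exp\left(V_{njkt}/(\lambda\theta)\right)}{\sum_{i=1}^{J_k} \exp\left(V_{nikt}/(\lambda\theta)\right)} \]

The probability of going to lake k conditional on taking a day trip is:

\[ P_{nt}(k \mid \text{trip}) = \frac{\left(\sum_{i=1}^{J_k} \exp\left(V_{nikt}/(\lambda\theta)\right)\right)^{\lambda}}{\sum_{l=1}^{K}\left(\sum_{i=1}^{J_l} \exp\left(V_{nilt}/(\lambda\theta)\right)\right)^{\lambda}} \]

Denote the indirect utility of not taking a day trip in the current occasion as \( V_{n0t} \). The probability of taking a day trip in this occasion is:

\[ P_{nt}(\text{trip}) = \frac{\left[\sum_{l=1}^{K}\left(\sum_{i=1}^{J_l} \exp\left(V_{nilt}/(\lambda\theta)\right)\right)^{\lambda}\right]^{\theta}}{\exp(V_{n0t}) + \left[\sum_{l=1}^{K}\left(\sum_{i=1}^{J_l} \exp\left(V_{nilt}/(\lambda\theta)\right)\right)^{\lambda}\right]^{\theta}} \]

Then, the unconditional probability of person n visiting beach j on lake k is:

\[ P_{nt}(j,k) = P_{nt}(\text{trip})\, P_{nt}(k \mid \text{trip})\, P_{nt}(j \mid k) \]

The probability that person n does not take a day trip in choice occasion t is:

\[ P_{nt}(\text{no trip}) = \frac{\exp(V_{n0t})}{\exp(V_{n0t}) + \left[\sum_{l=1}^{K}\left(\sum_{i=1}^{J_l} \exp\left(V_{nilt}/(\lambda\theta)\right)\right)^{\lambda}\right]^{\theta}} \]

For users and potential users, the so-called inclusive value, which is the maximum utility person n can attain in choice occasion t, is:

\[ IV_{nt} = \ln\left( \exp(V_{n0t}) + \left[\sum_{l=1}^{K}\left(\sum_{i=1}^{J_l} \exp\left(V_{nilt}/(\lambda\theta)\right)\right)^{\lambda}\right]^{\theta} \right) \]

As shown in Figure 5, the participation hurdle is imposed for the overall season. In deriving the participation hurdle, note that, unlike activities requiring licenses, the cost of entry is zero for beach recreation, although parking fees or access fees may apply at some public beaches. Following English (2008), people who participate will have positive consumer surplus, which means that the seasonal utility of participating is greater than the status quo utility of not participating.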
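The per-occasion structure described above (beach within lake, lake within trip, and a no-trip alternative) can be sketched as follows. This is a minimal illustration under an assumed parameterization in which each nesting parameter is the coefficient on the inclusive value of the level below it; the function name, the symbols `lam` and `theta`, and the example numbers are hypothetical and do not reproduce the Matlab estimation code used in this chapter.

```python
import numpy as np

def occasion_probabilities(V, lakes, v_no_trip, lam, theta):
    """Choice probabilities and inclusive value for one choice occasion.

    V         : deterministic utilities of the beaches
    lakes     : integer lake label for each beach
    v_no_trip : utility of not taking a day trip in this occasion
    lam       : lake-level nesting parameter (assumed in (0, 1])
    theta     : trip/no-trip-level nesting parameter (assumed in (0, 1])
    """
    V = np.asarray(V, dtype=float)
    lakes = np.asarray(lakes)
    labels = np.unique(lakes)
    idx = np.searchsorted(labels, lakes)

    e = np.exp(V / (lam * theta))
    lake_sums = np.array([e[lakes == k].sum() for k in labels])

    p_beach_given_lake = e / lake_sums[idx]
    lake_terms = lake_sums**lam
    p_lake_given_trip = lake_terms / lake_terms.sum()

    trip_term = lake_terms.sum()**theta
    denom = np.exp(v_no_trip) + trip_term
    p_trip = trip_term / denom

    # Unconditional probability of each beach, probability of no trip,
    # and the occasion's inclusive value (log of the denominator)
    p_beach = p_trip * p_lake_given_trip[idx] * p_beach_given_lake
    return p_beach, 1.0 - p_trip, np.log(denom)

# Hypothetical example: three beaches on two lakes
p_beach, p_none, iv = occasion_probabilities(
    V=[-2.0, -2.5, -1.8], lakes=[0, 0, 1], v_no_trip=3.0, lam=0.6, theta=0.5
)
print(p_beach, p_none, iv)
```

In this sketch the beach probabilities and the no-trip probability sum to one in every occasion, and the inclusive value is never below the no-trip utility, which is what allows it to be read as the maximum attainable utility in the occasion.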
The sum of every choice occasion’s inclusive value gives person n’s seasonal maximum utility:

\[ IV_{n} = \sum_{t=1}^{T} IV_{nt} \]

where T is the number of choice occasions in the season. Denote the indirect utility of not participating as \( V_{n}^{NP} \). The behavior of participating (P) and not participating (NP) can then be described by a logit model:

\[ P_{n}(P) = \frac{\exp(\rho\, IV_{n})}{\exp(V_{n}^{NP}) + \exp(\rho\, IV_{n})}, \qquad P_{n}(NP) = \frac{\exp(V_{n}^{NP})}{\exp(V_{n}^{NP}) + \exp(\rho\, IV_{n})} \]

where ρ is the parameter on the seasonal inclusive value. Hence, the log-likelihood function is:

\[ \ln L = \sum_{s=1}^{S} w_s \ln P_{s}(NP) + \sum_{n=1}^{N} w_n \ln P_{n}(P) + \sum_{n=1}^{N}\sum_{t=1}^{T}\sum_{k}\sum_{j} w_n\, y_{njkt} \ln P_{nt}(j,k) + \sum_{n=1}^{N}\sum_{t=1}^{T} w_n \Big(1 - \sum_{k}\sum_{j} y_{njkt}\Big) \ln P_{nt}(\text{no trip}) \]

where S is the number of nonusers, N is the number of users and potential users, T is the number of choice occasions, and w is the personal weight. \( y_{njkt} \) is 1 for the beach visited in occasion t and 0 for all other beaches. The total number of day trips taken by person n can be computed as \( Y_n = \sum_{t}\sum_{k}\sum_{j} y_{njkt} \).

2.2 Predicted Trips

With the estimated parameters, we can predict individual probabilities of taking day trips, \( \hat{P}_{nt}(\text{trip}) \), and of visiting certain beaches, \( \hat{P}_{nt}(j,k) \). Then for person n, the predicted total number of day trips is:

\[ \hat{Y}_{n} = \sum_{t=1}^{T} \hat{P}_{nt}(\text{trip}) \]

The predicted total number of trips taken to beach j on lake k is:

\[ \hat{Y}_{n}(j,k) = \sum_{t=1}^{T} \hat{P}_{nt}(j,k) \]

If beach j is closed or there is a marginal increase in the length of beach j, the changes in total trips or in the trips taken to beach j can be calculated, with superscripts 0 and 1 denoting the values before and after the change:

\[ \Delta\hat{Y}_{n} = \hat{Y}_{n}^{1} - \hat{Y}_{n}^{0}, \qquad \Delta\hat{Y}_{n}(j,k) = \hat{Y}_{n}^{1}(j,k) - \hat{Y}_{n}^{0}(j,k) \]

2.3 Welfare Measures

A change at one or more beaches causes welfare changes for users and potential users; it has no value to nonusers. Based on Haab and McConnell (2002) and Champ et al (2003), for person n in choice occasion t, the welfare change is computed as the change in the maximum utility this person can attain in this choice occasion, i.e. the inclusive value, before and after a scenario happens, divided by the marginal utility of income \( \hat{\mu} \) (the negative of the estimated travel cost parameter):

\[ \hat{W}_{nt} = \frac{\hat{IV}_{nt}\big|_{1} - \hat{IV}_{nt}\big|_{0}}{\hat{\mu}} \]

The seasonal welfare change will be:

\[ \hat{W}_{n} = \sum_{t=1}^{T} \hat{W}_{nt} \]

Taking the weighted average of \( \hat{W}_{n} \) across all users and potential users gives the seasonal value per person:

\[ \bar{W} = \frac{\sum_{n} w_n \hat{W}_{n}}{\sum_{n} w_n} \]

To make seasonal welfare estimates comparable to those from single-site demand models, they can be normalized by two kinds of factors: changes in total trips or changes in trips taken to the changed site, both of which were presented in the previous section. We apply the normalization by dividing the weighted sum of seasonal values by the weighted sum of trip changes, so that the results are not distorted by near-zero probabilities of visiting certain beaches at the individual level:

\[ \frac{\sum_{n} w_n \hat{W}_{n}}{\sum_{n} w_n \Delta\hat{Y}_{n}}, \qquad \frac{\sum_{n} w_n \hat{W}_{n}}{\sum_{n} w_n \Delta\hat{Y}_{n}(j,k)} \]

All the estimates above are per-person measures. How to generalize them to the population depends on the specific model.

3 Survey and Data

3.1 Surveys

3.1.1 Screener Mail Survey

To recruit people who might participate in beach recreation and to collect data on nonusers, a screener mail survey was sent to Michigan residents in 2011. A stratified sample was drawn from Michigan’s driver license list, which has demographic characteristics similar to those in the census data [22]. The two strata are coastal and non-coastal counties, with 60% of the sample drawn from coastal counties and 40% from non-coastal counties [23]. Within the two strata, we sampled randomly in proportion to each county’s population to further ensure geographic representativeness of the sample. To manage survey costs, people who lived in the Upper Peninsula were excluded, as the majority of the population lives in the Lower Peninsula. The original sample size was 32,230, and the number went down to 29,613 after removal of deceased people and those with bad mailing addresses.

The short four-page mail survey had three parts. The first part asked people about their participation in various everyday activities, recreation activities and indoor activities. Only one question was about Great Lakes beaches, in order to reduce the potential self-selection bias that could occur if people knew the survey was aimed at identifying Great Lakes beach-goers.
The second part was about participation obstacles, such as time or money constraints. The third part contained demographic questions on race, education, employment status, household income, etc.

[22] See Appendix B.

[23] The 60%/40% split was decided through sensitivity analyses to balance recruiting as many people who participated in beach recreation as possible within the project budget against losing representativeness of the general population.

From June 2011 to November 2011, three waves of survey packages were mailed out and two waves of automated phone calls were sent to household landlines as reminders. In total, 11,028 people returned their questionnaires for a 37.24% response rate, and 9,591 respondents were kept for data analysis according to the criteria of living in the Lower Peninsula and being the person to whom the mail survey was addressed, among which 5,556 said they had visited a Great Lakes beach since June 1, 2010 [24].

3.1.2 Follow-Up Web Survey

In total, 5,476 users and potential users from the screener mail survey were invited to the follow-up Great Lakes beach web survey [25]. In-person and online pretesting was implemented to test the survey instruments (see Weicksel (2012)). An additional 85 people who participated in beach recreation (their responses were received after the mail survey was closed for data collection) were chosen for a pilot survey, the purpose of which was to test the functionality and data storage of the web survey.

[24] Please refer to Weicksel (2012) for complete mail survey details.

[25] The 80 people not invited to the web survey had multiple answers to the question “Where do you live”. They might own properties in the Upper Peninsula of Michigan or in other states. We decided to include them for data analysis after the web invitation went out. All these discrepancies are taken care of through weights.
There were two sections in the follow-up web survey: the beach trip section [26] and the choice experiment section analyzed by Weicksel (2012). In the beach trip section, following the survey in Parsons et al (2009), trips were categorized into three types: trips lasting a day or less (day trip), overnight trips of less than four nights (short overnight trip), and overnight trips of four nights or more (long overnight trip). People were asked to report the number of trips of each type taken between Memorial Day weekend, 2011 and September 30, 2011 (the primary beach-going season). Detailed questions, such as date, activities and the number of adults and children, were asked for up to two randomly selected trips. If a respondent had not gone to any public Great Lakes beaches in Michigan in the past two years, the beach trip section was automatically skipped.

Four waves of contacts were sent to potential web respondents. The first-wave mail package included an invitation letter with the invitee’s unique survey website address and a $1 cash incentive. Postcard reminders with the unique survey web addresses, differing in size, were used in the second and third waves. In the last wave, a letter invitation was sent with a completion incentive strategy. The survey started in April 2012 and closed right after Memorial Day weekend, 2012. In total, 3,197 people logged on to the survey and answered our initial trip questions, giving a response rate of 58.38% [27]. The overall response rate of the two-stage survey was 21.7%.

[26] In the survey, “Great Lakes beaches” were defined with a labeled graphic along with the following bulleted list: “For this survey, Great Lakes beaches in Michigan include beaches on the shorelines of • Lake Michigan, • Lake Huron, • Lake Erie, • Lake Superior • All connecting waters (Lake St. Clair, St. Clair River, Detroit River, etc.)”.

[27] Please refer to Weicksel (2012) for complete web survey details.
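The response rates reported above can be checked directly from the counts given in the text. The short sketch below does this; treating the overall two-stage rate as the product of the mail and web response rates is an assumption made here, although it does reproduce the reported figure. The variable names are illustrative only.

```python
# Counts reported in the text
mail_sample = 29_613      # mailable sample after removing deceased people and bad addresses
mail_returned = 11_028    # returned mail questionnaires
web_invited = 5_476       # users and potential users invited to the web survey
web_responded = 3_197     # people who logged on and answered the initial trip questions

mail_rate = mail_returned / mail_sample
web_rate = web_responded / web_invited
# Assumption: the overall rate is the product of the two stage rates
overall_rate = mail_rate * web_rate

print(f"mail: {mail_rate:.2%}, web: {web_rate:.2%}, overall: {overall_rate:.1%}")
```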
3.2 Data

Out of 9,591 mail survey respondents, 3,838 said they did not visit any Great Lakes beaches, so they are defined as nonusers of beach recreation. Of the 3,197 people who responded to the web survey, 2,544 were the persons to whom the web survey was addressed and are kept for data analysis. Seven of them skipped the beach trip section, which leaves 2,537 effective respondents as users and potential users.

This chapter follows most recreation demand studies in that only day trips are used in the model. Trips are removed where beaches are on inland lakes or Lake Superior [28], or outside Michigan, and where no trips are reported for the beaches. If total trip numbers in a month exceed the upper limit [29], excess trips are dropped. After these steps, we have 1,538 users who took at least one day trip in the summer of 2011, and 999 potential users with no day trips.

[28] Web respondents all live in the Lower Peninsula, and it is impossible for most of them to go to Lake Superior and come back in one day. Some people may have a second home in the Upper Peninsula, so they report day trips to beaches on Lake Superior. Their trips are not included in the analysis, as we consider only trips originating from the permanent residence.

[29] For day trips, the upper limit is 34 in June (including Memorial Day weekend), 31 in July and August, and 30 in September.

Table 13: Demographic Characteristics of Users, Potential Users and Nonusers

The first three data columns are for effective web survey respondents.

| Characteristic | All* | Users | Potential Users* | Nonusers* [30] |
|---|---|---|---|---|
| Age (Mean) | 44.4 | 43.9 | 45.0 | 49.5 |
| Income (Mean, $1000) | 81.9 | 79.0 | 85.7 | 61.0 |
| Education Years (Mean) | 14.8 | 14.8 | 14.8 | 13.8 |
| Male (%) | 47.8 | 50.0 | 44.8 | 49.7 |
| White (%) | 90.9 | 90.7 | 91.1 | 80.1 |
| Employed Full-Time (%) | 52.2 | 54.3 | 49.4 | 40.1 |
| Retired (%) | 19.2 | 17.5 | 21.5 | 29.9 |
| Children under 17 (%) | 35.0 | 34.2 | 36.0 | 29.2 |

*Note: Nonusers were significantly different at the 1% level from the group of Users and Potential Users for each characteristic except “Male”.
Nonusers were significantly different at the 1% level from Potential Users for each characteristic. Potential Users are significantly different at the 5% level from Users for “Income”, “Employed Full-Time” and “Retired”.

[30] These are weighted by corresponding weights.

We use demographic data from the web survey for users and potential users, as it is the most recent. It can be seen from Table 13 that nonusers have very different characteristics from the group of users and potential users. People are more likely to participate if they are young, have higher income, are more educated, are white, are employed full-time, are not retired and have children under 17. Between users and potential users, we would expect employment status to affect the behavior of taking or not taking a day trip in one choice occasion. Furthermore, nonusers are significantly different from potential users for each characteristic, suggesting that pooling these two categories as in von Haefen et al (2005) may lose some accuracy. It is worth noting that nonusers are identified based on the screener mail survey in this study. Although the chances are likely small, nonusers who responded to the mail survey before September 30, 2011 might have taken trips to public Great Lakes beaches later in the survey season.

Figure 6: Public Great Lakes Beaches for Day Trips [31]. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation.

According to the official beach list from the Michigan Department of Environmental Quality (DEQ), there are 588 public Great Lakes beaches in Michigan, of which 454 are on Lake Erie, Lake St. Clair, Lake Huron and Lake Michigan. Removing 3 beaches with no length information, we have 451 beaches as candidates in people’s choice sets (Figure 6). Choice sets can differ among individuals based on the maximum driving distance in one day.
Following the literature, we set the cut point at 500 miles for a round trip [32], which means beaches more than 250 miles away from one’s permanent residence are not available for day trip visitation. The resulting choice set is quite large compared to previous studies on beach visitation, which often have fewer than 100 alternatives. For instance, Murray, Sohngen and Pendleton (2001) conducted their survey on 15 Lake Erie beaches, and Parsons et al (2009) had a maximum of 65 sites in the choice set.

[31] Figures 6, 7, 10, 11 and 12 are Google Earth images. File conversion is through the website: http://www.earthpoint.us/ExcelToKml.aspx

[32] About 1% of people who took day trips visited beaches more than 250 miles away.

Figure 7: GLOS Points on Great Lakes in Michigan

Individual beach length [33] and the previous year’s closure information were provided by the Michigan Department of Environmental Quality. The number of closure days is the sum of all closure periods in 2010, the year prior to our trip data.

[33] It is defined as the length of shoreline reach.

Data on water surface temperature in the survey season was obtained from the National Oceanic and Atmospheric Administration (NOAA) Great Lakes Environmental Research Laboratory (GLERL) using the Great Lakes Observing System (GLOS) Point Query tool [34]. Along the coastline, 56 grid points are selected on Lake Huron, 79 on Lake Michigan and 2 on Lake Erie (there are two beaches on Lake Erie in the DEQ list), as shown in Figure 7. Daily temperatures were retrieved at these points and averaged into monthly temperatures, because we know the month of the trips but not the exact days. Monthly data was used directly for Lake St. Clair, as its daily data was not available. Individual beaches were matched to the nearest location with temperature data.
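The two data steps described above, restricting each person's choice set to beaches within 250 miles one way and matching each beach to its nearest temperature grid point, can be sketched as follows. This is only an illustration: the great-circle (haversine) distance is an assumption for the nearest-point matching, the function names are hypothetical, and the coordinates and distances in the example are made up rather than taken from the survey data.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def nearest_grid_point(beach, grid_points):
    """Index of the grid point closest to a beach given as (lat, lon)."""
    return min(range(len(grid_points)),
               key=lambda i: haversine_miles(*beach, *grid_points[i]))

def choice_set(one_way_miles, cutoff=250.0):
    """Indices of beaches within the one-way driving distance cutoff."""
    return [j for j, d in enumerate(one_way_miles) if d <= cutoff]

# Made-up example: one beach, three grid points, four driving distances
beach = (43.0, -86.3)
grid = [(45.0, -85.0), (43.1, -86.4), (42.0, -86.6)]
print(nearest_grid_point(beach, grid))
print(choice_set([120.0, 251.5, 250.0, 30.2]))
```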
[34] http://glos.us/data-tools/point-query-tool-glcfs

3.3 Model Specification

In the repeated nested logit model with a participation hurdle, the specification of the indirect utility person n obtains from visiting beach j on lake k in choice occasion t is:

\[ V_{njkt} = \beta_{TC}\, TC_{nj} + \beta_{L} \ln(Length_{j}) + \beta_{W}\, Temperature_{jt} + \beta_{C}\, ClosureDays_{j} + \sum_{r=1}^{6} \delta_{r}\, Region_{rj} \]

\( V_{n0t} \) and \( V_{n}^{NP} \) are described by the demographic variables in Table 13. The computation of travel cost is:

\[ TC_{nj} = \$0.476 \times Distance_{nj} + \frac{1}{3} \times \frac{Income_{n}}{2000} \times Time_{nj} \]

$0.476 per mile is the total driving cost minus maintenance and insurance costs for an average-size car in 2011, as reported by the American Automobile Association (AAA) [35]. Time cost is the opportunity cost of time. A person employed full-time works approximately 2,000 hours per year, from which the hourly wage can be derived. As discussed in Chapter 9 of Champ et al (2003), for people working a fixed time schedule, one third of the hourly wage is normally treated as the time cost. Travel distance and travel time are calculated in PC Miler, the logistics software, and are measured in miles and hours respectively [36].

[35] This is one way to compute travel cost. Another way would be the operating cost (gas, maintenance and tires) plus depreciation caused by driving, which gives $0.2422 per mile. Results using this travel cost are available upon request.

[36] The travel cost in this study is for each adult, not each household. It does not account for the number of people in one vehicle.

The definition of regions is from the Center for Geographic Information in the State of Michigan, which defines six regions in the Lower Peninsula plus one for the Upper Peninsula. Beaches are assigned to regions based on the counties they belong to, which are available in the official beach list from the Michigan Department of Environmental Quality (DEQ). Since a few Lake Michigan beaches are in the Upper Peninsula, we include six regional dummies for the Lower Peninsula in the estimation.

In the survey data, instead of reporting beach names, people might only report the nearest town or city to the beach.
That is to say, we do not know the exact beach, only the area, and there could be multiple beaches in that area. Given that all beaches are mutually exclusive, the probability that person n visits area a can be expressed as:

\[ P_{nt}(a) = \sum_{j \in a} P_{nt}(j,k) \]

It can be inferred from the official beach list how many beaches are in a given area and which they are. Also, for some trips, we are not able to locate the beaches or the areas, and have to count these trips at the level of taking or not taking a day trip. That is to say, \( P_{nt}(\text{trip}) \) is used to describe the trip information. Data from these two groups accounts for about 35.3% and 8.8% of the total day trips respectively.

Since nearly half of the trip data is non-regular, the estimation is programmed in Matlab so that the log-likelihood function can be adjusted to incorporate all available information, although the estimation burden greatly increases. Depending on the speed of the computer, it takes 2 to 4 days to estimate the proposed model in Figure 5 with starting values from sequential estimation. To account for the clustering of errors across each person's repeated trips, bootstrapping is applied through the High Performance Computing Center at Michigan State University, where it is possible to execute many single-process jobs at a time. Given the time constraint, we set the number of bootstrap runs to 100, which still requires about four weeks to obtain all the results.

We also estimate two traditional repeated nested logit models without the participation hurdle for comparison. For convenience, we call them Model 1 and Model 2, and Model 3 is the proposed model. Model 1 uses only the web survey data and excludes nonusers, an approach normally applied with list sampling and in some studies with a screener survey, such as Lew and Larson (2008). Model 2 and Model 3 include users, potential users and nonusers, and individual weights are adjusted to maintain the relative ratio of participation to nonparticipation.
So the data for both models is representative of the general population. Like the models in von Haefen et al (2005), Model 2 does not differentiate potential users from nonusers, because both took no day trips in the survey season. Model 3 follows the procedure in English (2008) and has a structure similar to Model 1 except for the added participation hurdle. Nonusers do not enter the nests below the hurdle (Figure 5).

The computation of welfare measures at the individual level in Model 1 and Model 2 follows the equations in Section 2.3, since Model 2 pools nonusers with potential users. In Model 1, to calculate welfare measures at the population level, we need to take into account the fact that the individual estimates are for users and potential users. The participation rate inferred from the mail survey was 58.01% [37]. The total number of adults living in the Lower Peninsula of Michigan is 7,289,085 according to the 2010 census, which implies 4,228,398 users and potential users. Multiplying Model 1 individual estimates by the number of users and potential users gives welfare changes for the population. In Model 2, multiplying individual estimates by 7,289,085 produces welfare measures for the population.

[37] See Appendix D.

For Model 3, where there are three groups of people, changes in beaches cause welfare losses to users and potential users, but not to nonusers (because of the nature of the use values being estimated). Although the data shows which group one belonged to during the survey, researchers generally have no information on group membership. Also, people switch between groups all the time. Therefore, we predict each person's probability of participating and of not participating in the status quo, and apply these probabilities to the conditional estimates to derive unconditional welfare measures.
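The population scaling described above amounts to simple arithmetic, sketched below. The population and participation rate are the figures given in the text; the function name and the per-person seasonal value used in the example are hypothetical and serve only to show how the multiplier differs between Model 1 and the models that represent the whole adult population.

```python
ADULTS_LOWER_PENINSULA = 7_289_085   # adults in the Lower Peninsula, 2010 census
PARTICIPATION_RATE = 0.5801          # participation rate inferred from the mail survey

# Implied number of users and potential users
users_and_potential_users = ADULTS_LOWER_PENINSULA * PARTICIPATION_RATE

def state_value_millions(per_person_value, model):
    """Scale a per-person seasonal value (dollars) to the state level (million dollars).

    Model 1 estimates apply to users and potential users only; Model 2 and
    Model 3 estimates apply to a random adult in the population.
    """
    population = users_and_potential_users if model == 1 else ADULTS_LOWER_PENINSULA
    return per_person_value * population / 1e6

print(round(users_and_potential_users))
# Hypothetical per-person seasonal loss of 5 cents
print(state_value_millions(-0.05, model=1), state_value_millions(-0.05, model=2))
```

Because Model 1 values are multiplied by a smaller population, a given per-person estimate translates into a smaller state-level value under Model 1 than under Model 2 or Model 3.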
For person h in Model 3, the welfare change in choice occasion t and the total estimated trips are:

\[ \hat{W}_{ht} = \hat{P}_{h}(P) \times \frac{\hat{IV}_{ht}\big|_{1} - \hat{IV}_{ht}\big|_{0}}{\hat{\mu}} \]

\[ \hat{Y}_{h} = \hat{P}_{h}(P) \sum_{t=1}^{T} \hat{P}_{ht}(\text{trip}) \]

\[ \hat{Y}_{h}(j,k) = \hat{P}_{h}(P) \sum_{t=1}^{T} \hat{P}_{ht}(j,k) \]

where \( \hat{P}_{h}(P) \) is the predicted probability of participating in the status quo and \( \hat{\mu} \) is the estimated marginal utility of income. These generate individual welfare measures for a random person in the population, and the calculation of population welfare measures is the same as in Model 2.

4 Estimation Results

Table 14 shows full information estimation results for the two traditional repeated nested logit models, Model 1 and Model 2, and the proposed model with a participation hurdle, Model 3. All models display the same pattern at the beach level and have similar estimates, since information at this level mainly comes from users. The estimated parameter on travel cost has a negative sign and is statistically significant at the 1% level, which is consistent with demand theory: the higher the price, the lower the demand. Following the literature, the logarithm of beach length is used. Length matters a lot when beaches are short, and its importance decreases for longer beaches. All else equal, warmer beaches are preferred to colder beaches. Total closure days in the previous year have a negative effect on beach visitation, suggesting that previous beach closures have a lasting stigma effect on future visitation. The estimated parameters on the regional dummies indicate that, all else equal, beaches on Lake Michigan are more popular than those on Lake St. Clair, Lake Erie and Lake Huron. The two nesting parameters at the lake level and the trip/no trip level are statistically significant at the 1% level and within the unit interval, which is consistent with utility-maximizing behavior. Thus, nesting works better than no nesting.

At the trip/no trip level, the way demographic variables affect the behavior of taking or not taking a day trip in one choice occasion is different in Model 2 compared to Model 1 and Model 3.
The estimated parameters and their significance are quite different, and some even have opposite signs, because nonusers are identified at this level together with potential users in Model 2. Model 1 and Model 3 show that within the population of beach-goers, people who are male, non-white and not employed full-time take more day trips, in line with the indication from Table 13 that employment status may influence the behavior of taking or not taking a day trip in one choice occasion. Model 2, in contrast, suggests that in the general population, people who are more educated take more day trips.

Comparing Model 1 with the part of Model 3 that is conditional upon participating in beach recreation shows that they produce almost identical results. Recall that Model 3 is essentially Model 1 plus the participation hurdle; nonusers do not enter the nests below the participation hurdle. In Model 3, the nesting parameter at the top level (the hurdle level) is statistically significant at the 1% level and between 0 and 1, so adding the participation hurdle to the model works better than no hurdle. The variable for being employed full-time is dropped from the hurdle because otherwise the model would not converge, which might be caused by its higher correlations with other demographic variables for nonusers. In the general population, the hurdle model suggests that people who are young, white and more educated are more likely to participate in beach recreation. These three variables are all significantly different between nonusers and the group of users and potential users in Table 13.

Based on the estimation results, preferences for travel cost and beach characteristics are not affected much by the model structure or by whether the data is from the population or a sub-population. The preferences are revealed when people actually take trips. The distinction among the three models is which behaviors are being modeled.
Model 1 and Model 2 both incorporate the behavior of taking or not taking a day trip in one choice occasion, the former within the group of beach-goers and the latter within the general population. Model 3 uses the participation hurdle to separate the behavior of participating or not participating in one season from the behavior of taking or not taking a day trip in one choice occasion. Although it is not a conceptual hurdle derived from an objective utility function with constraints, it can explain how people behave to some extent, and it makes use of more information than Model 2.

Table 14: Full Information Maximum Likelihood (FIML) Estimation Results

| Model Level | Variable | Model 1 Estimate | Model 1 t Statistic | Model 2 Estimate | Model 2 t Statistic | Model 3 Estimate | Model 3 t Statistic |
|---|---|---|---|---|---|---|---|
| Beach Level | Travel Cost | -0.0280*** | -20.0 | -0.0312*** | -21.1 | -0.0281*** | -17.3 |
| Beach Level | Log(Length) | 0.126*** | 4.85 | 0.139*** | 5.20 | 0.126*** | 4.93 |
| Beach Level | Temperature | 0.0589*** | 5.96 | 0.0601*** | 5.95 | 0.0581*** | 5.69 |
| Beach Level | Closure Days of 2010 | -0.0189*** | -4.47 | -0.0207*** | -4.54 | -0.0189*** | -4.03 |
| Beach Level | LP Northeast | -0.0642 | -0.173 | -0.197 | -0.534 | -0.0621 | -0.196 |
| Beach Level | LP Mid-East | -1.29*** | -3.32 | -1.56*** | -4.00 | -1.30*** | -3.57 |
| Beach Level | LP Southeast | -1.38*** | -3.23 | -1.68*** | -4.03 | -1.39*** | -3.30 |
| Beach Level | LP Northwest | 1.16*** | 4.70 | 1.15*** | 3.76 | 1.17*** | 4.25 |
| Beach Level | LP Mid-West | 0.901*** | 3.49 | 0.992*** | 3.30 | 0.903*** | 3.65 |
| Beach Level | LP Southwest | 0.321 | 1.20 | 0.406 | 1.22 | 0.325 | 1.23 |
| | Lake Level Nesting Parameter | 0.644*** | 11.7 | 0.705*** | 12.6 | 0.645*** | 12.5 |
| | Trip/No Trip Level Nesting Parameter | 0.547*** | 9.00 | 0.596*** | 9.89 | 0.544*** | 7.45 |
| No Trip | Male | -0.152* | -1.65 | -0.118 | -1.31 | -0.151 | -1.56 |
| No Trip | Age | -0.0027 | -0.768 | 0.00391 | 1.13 | -0.0026 | -0.757 |
| No Trip | White | 0.378* | 1.91 | 0.014 | 0.0682 | 0.383 | 1.61 |
| No Trip | Education Years | -0.0106 | -0.483 | -0.0918*** | -6.30 | -0.0098 | -0.490 |
| No Trip | Full-Time Employed | 0.212** | 2.31 | 0.106 | 1.04 | 0.215** | 2.38 |
| No Trip | Retired | 0.187 | 1.12 | 0.18 | 1.12 | 0.186 | 1.12 |
| No Trip | Children under 17 | 0.133 | 1.54 | 0.097 | 1.09 | 0.136 | 1.52 |
| No Trip | Constant | 5.30*** | 9.29 | 7.23*** | 12.5 | 5.23*** | 9.59 |
| Participation Hurdle | Nesting Parameter | | | | | 0.00511*** | 21.4 |
| Not Participate | Male | | | | | 0.0579 | 0.591 |
| Not Participate | Age | | | | | 0.0148*** | 4.98 |
| Not Participate | White | | | | | -0.767*** | -4.02 |
| Not Participate | Education Years | | | | | -0.176*** | -9.16 |
| Not Participate | Retired | | | | | 0.124 | 0.998 |
| Not Participate | Children under 17 | | | | | 0.0704 | 0.738 |
| Not Participate | Constant | | | | | 5.59*** | 17.3 |

Note: *10% significance level; **5% significance level; ***1% significance level. The participation hurdle applies to Model 3 only.

Table 15: Welfare Estimates of Changing a Beach in 2011 Dollars at Individual Level

| Scenario | Region | Season, Model 1 | Season, Model 2 | Season, Model 3 | Season/Total Trip Change, Model 1 | Season/Total Trip Change, Model 2 | Season/Total Trip Change, Model 3 | Season/Site Trip Change, Model 1 | Season/Site Trip Change, Model 2 | Season/Site Trip Change, Model 3 |
|---|---|---|---|---|---|---|---|---|---|---|
| Closure of One Beach in the Region [38] | Huron North | -0.0408 | -0.0254 | -0.0232 | 37.5 | 33.3 | 38.2 | 12.2 | 13.2 | 12.4 |
| Closure of One Beach in the Region | Huron South | -0.113 | -0.0685 | -0.0713 | 36.7 | 31.9 | 37.2 | 12.7 | 13.3 | 12.8 |
| Closure of One Beach in the Region | St. Clair | -0.989 | -0.645 | -0.694 | 36.4 | 32.5 | 36.6 | 13.3 | 14.2 | 13.3 |
| Closure of One Beach in the Region | Erie | -1.81 | -1.08 | -1.27 | 36.4 | 32.4 | 36.5 | 14.8 | 15.5 | 14.7 |
| Closure of One Beach in the Region | Michigan North | -0.0600 | -0.0368 | -0.0278 | 38.1 | 33.5 | 39.0 | 11.9 | 13.3 | 11.7 |
| Closure of One Beach in the Region | Michigan Central | -0.700 | -0.432 | -0.324 | 38.5 | 34.0 | 38.3 | 12.7 | 13.6 | 12.7 |
| Closure of One Beach in the Region | Michigan South | -0.370 | -0.228 | -0.172 | 38.3 | 33.7 | 38.0 | 12.8 | 13.6 | 12.7 |
| Marginal Increase in Length of One Beach in the Region | Huron North | 0.0262 | 0.0162 | 0.0152 | 39.8 | 33.6 | 42.5 | 13.0 | 13.4 | 14.0 |
| Marginal Increase in Length of One Beach in the Region | Huron South | 0.0419 | 0.0254 | 0.0262 | 38.0 | 31.8 | 36.8 | 13.3 | 13.4 | 12.8 |
| Marginal Increase in Length of One Beach in the Region | St. Clair | 0.469 | 0.310 | 0.326 | 36.7 | 32.3 | 36.4 | 14.4 | 15.0 | 14.2 |
| Marginal Increase in Length of One Beach in the Region | Erie | 0.449 | 0.280 | 0.317 | 36.5 | 32.5 | 36.3 | 17.1 | 17.4 | 16.8 |
| Marginal Increase in Length of One Beach in the Region | Michigan North | 0.0232 | 0.0144 | 0.0110 | 31.2 | 34.7 | 36.8 | 9.82 | 13.5 | 11.4 |
| Marginal Increase in Length of One Beach in the Region | Michigan Central | 0.186 | 0.116 | 0.0877 | 38.3 | 33.7 | 38.4 | 12.9 | 13.7 | 12.9 |
| Marginal Increase in Length of One Beach in the Region | Michigan South | 0.134 | 0.084 | 0.0634 | 38.0 | 33.5 | 38.1 | 12.8 | 13.7 | 12.9 |

[38] As described in the text, we construct 451 scenarios, in each of which one of the 451 beaches is closed, which gives us the value of each beach. A region has multiple beaches, so we use the average value of these beaches to represent “One Beach in the Region”. The same is done for the marginal increase in beach length.
Table 16: Welfare Estimates of Changing a Beach in 2011 Dollars (Million) at State Level [39]

                     Season
Region               Model 1   Model 2   Model 3
Closure of One Beach in the Region
  Huron North        -0.172    -0.185    -0.169
  Huron South        -0.477    -0.499    -0.520
  St. Clair          -4.18     -4.70     -5.06
  Erie               -7.65     -7.86     -9.26
  Michigan North     -0.254    -0.268    -0.203
  Michigan Central   -2.96     -3.15     -2.36
  Michigan South     -1.56     -1.66     -1.25
Marginal Increase in Length of One Beach in the Region
  Huron North        0.111     0.118     0.111
  Huron South        0.177     0.185     0.191
  St. Clair          1.98      2.26      2.38
  Erie               1.90      2.04      2.31
  Michigan North     0.098     0.105     0.0802
  Michigan Central   0.787     0.848     0.640
  Michigan South     0.569     0.613     0.462

[39] As described in Section 2.3, for the seasonal value, in Model 1, the average individual values were multiplied by the population of adults living in the Lower Peninsula of Michigan adjusted by the participation rate 58.01%. In Model 2 and Model 3, the population values are the average individual values multiplied by the population of adults living in the Lower Peninsula of Michigan.

Table 17: Estimated Trips and Welfare Changes of Closing All Beaches on a Great Lake in 2011 Dollars

Individual Level
             Number of Trips                Season                         Season/Total Trip Change       Season/Lake Trip Change
Lake         Model 1   Model 2   Model 3    Model 1   Model 2   Model 3    Model 1   Model 2   Model 3    Model 1   Model 2   Model 3
Erie         0.236     0.136     0.167      -5.16     -2.84     -3.59      36.4      32.4      36.4       21.9      21.0      21.5
St. Clair    0.422     0.260     0.298      -9.43     -5.63     -6.59      36.4      32.4      36.4       22.3      21.6      22.1
Huron        0.820     0.472     0.496      -20.6     -11.2     -12.0      36.9      32.7      36.8       25.0      23.6      24.2
Michigan     3.46      2.00      1.62       -118.1    -62.0     -53.8      37.4      33.2      37.3       34.1      31.0      33.2

State Level (Million)
             Number of Trips                Season
Lake         Model 1   Model 2   Model 3    Model 1   Model 2   Model 3
Erie         1.00      0.99      1.22       -21.8     -20.7     -26.2
St. Clair    1.79      1.90      2.17       -39.9     -41.0     -48.0
Huron        3.47      3.44      3.61       -86.9     -81.4     -87.5
Michigan     14.6      14.6      11.8       -499.2    -451.6    -391.8

To compare valuation results, three scenarios are constructed: closing one beach in different regions, marginally increasing the length of one beach in different regions, and closing all beaches on one Great Lake. As described in Section 2.3, there are two measures of welfare for each scenario, per season (columns titled “Season” in Tables 15, 16 and 17) and per trip (columns titled “Season/Total Trip Change”, “Season/Site Trip Change” and “Season/Lake Trip Change” in Tables 15, 16 and 17). The per trip measures come from normalizing per season measures by the change in the expected number of trips to the affected site(s), or the change in the number of trips to any sites, so that results of multiple-site demand models are comparable to those of single-site demand models or models with different choice sets.

Take the per season measure as an example. When a beach is closed or there is a marginal increase in beach length, we consider the change as happening separately at each of the 451 beaches. In the case of beach j, we compute the welfare change for each person in the sample, which can be denoted as $\widehat{W}_{nj}$ following the previous notation, and take the weighted average across people as the average per person welfare estimate of beach j, $\widehat{W}_{j}$. Then with the average per person welfare estimates for all 451 beaches, we calculate the mean values within every region to represent a beach in one region.

$$\widehat{W}_{r} = \frac{1}{R}\sum_{j=1}^{R}\widehat{W}_{j}$$

where R is the total number of individual beaches in that region.
When all beaches on one Great Lake are closed, the computation is the same except that the last step is not applied. Aggregating per person estimates to the population gives welfare estimates at the population level, which includes adults living in the Lower Peninsula of Michigan in our study.

Regarding per season measures, when a beach is closed, seasonal welfare losses are larger for Lake Erie and Lake St. Clair because they have far fewer beaches than Lake Huron and Lake Michigan. If one beach is closed, there are not many substitutes, and utility decreases substantially. When the length is increased by 1 mile on one beach, seasonal welfare changes are also larger for Lake Erie and Lake St. Clair. Beaches on these two lakes are all shorter than 0.5 mile, while beaches on the other two lakes tend to be much longer. Because length enters in logarithms, a marginal increase in length leads to a larger utility increase for short beaches than for long beaches. Hence, the welfare gains for changes in length at single sites are much smaller for Lake Huron and Lake Michigan.

When one entire lake is closed, the seasonal welfare loss is largest for Lake Michigan, followed by Lake Huron. Lake St. Clair and Lake Erie have much smaller values. Lake Michigan has the largest number of beaches. The maximum utility one could attain would greatly decrease if all beaches on Lake Michigan were closed.

There is much less variation in per trip measures across regions and lakes because they are normalized by trip changes: closures at more valuable beaches or lakes also lose more trips. Hence, these normalized measures tend to remove the difference in demand for different sites, and are comparable to those from single-site demand models that assume only one site is available.
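The aggregation and normalization steps described above (a weighted average across people for each beach, a mean across beaches within a region, and division of the seasonal value by the change in trips) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the region labels "A" and "B", and all numbers are hypothetical and are not taken from the survey data or the estimated models.

```python
def weighted_mean(values, weights):
    """Weighted average across people (sample weights are assumed given)."""
    total_weight = sum(weights)
    return sum(w * v for v, w in zip(values, weights)) / total_weight


def regional_means(per_beach_values, regions):
    """Unweighted mean of per-beach values within each region."""
    grouped = {}
    for value, region in zip(per_beach_values, regions):
        grouped.setdefault(region, []).append(value)
    return {region: sum(vals) / len(vals) for region, vals in grouped.items()}


def per_trip_value(seasonal_welfare_change, trip_change):
    """Normalize a seasonal welfare change by the change in expected trips."""
    return seasonal_welfare_change / trip_change


# Hypothetical inputs: two people, three beaches in two made-up regions.
person_losses = [-1.0, -3.0]    # seasonal welfare change per person, one beach
person_weights = [1.0, 3.0]     # hypothetical sample weights
beach_value = weighted_mean(person_losses, person_weights)

beach_values = [-1.0, -3.0, -2.0]
beach_regions = ["A", "A", "B"]
by_region = regional_means(beach_values, beach_regions)

# A seasonal loss of 2.0 with 0.05 fewer trips implies 40 per lost trip.
loss_per_trip = per_trip_value(-2.0, -0.05)
```

Because the numerator and the denominator both grow with the popularity of a site, the ratio varies far less across regions than the seasonal values do, which is the pattern discussed for the per trip columns.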
Comparing across the three models, at the individual level, seasonal welfare estimates in Models 2 and 3 are about 55% to 60% of those in Model 1, which can be explained by the fact that only users and potential users are included in Model 1 and the participation rate in the sample is 58.01%. At the state level, when all the results are generalized to the population, seasonal welfare changes in Model 1 are a little smaller than in Model 2 if the change is for one beach, and bigger if the change is for one lake. A possible explanation is that users and potential users have slightly less elastic demand than nonusers for small changes, and more elastic demand for big changes. Models 1 and 2 predict almost the same number of trips to each lake.

Model 3 has somewhat different patterns: higher values for beaches on Lake Erie, Lake St. Clair and Lake Huron, and lower values for beaches on Lake Michigan. The same is true for estimated trips. Compared to Models 1 and 2, Model 3 has a different allocation across lakes, with fewer trips to Lake Michigan and more trips to the other three lakes. The total number of predicted trips is also smaller. English (2008) also found that the hurdle model tended to smooth the variation in trip prediction for different areas.

For population estimates, we would expect models to produce similar results if the population mean is preserved by the model. However, Model 3 loses the mean-fitting property that Models 1 and 2 possess for the total trips and for trips by region. One reason is the use of individual-level participation rates. With the participation hurdle, the predicted participation rate on average is 58.06%, almost identical to that of the sample, but there may be a lot of variation across people. When inserting the participation rate at the individual level, differences in each person’s welfare changes and estimated trips could be enlarged rather than being averaged out.
For instance, even though the means of two random variables are the same in two models, the product of the means in one model is not necessarily equal to the mean of the two variables’ products in the other model. Consider Model 1 and Model 3, which are essentially the same model except for the participation hurdle. In Model 1, web survey respondents receive a 100% participation rate. Mail survey respondents are not included, so their participation rates are 0. In Model 3, the estimated participation rate is positive for each person among users, potential users and nonusers. The noise associated with individual estimates of the participation rates may not go away in the aggregation process.

5 Discussion and Conclusions

In this chapter, a repeated nested logit model is estimated with data from a two-stage survey of the general population, providing policy makers with monetary values of public beaches on Lake Erie, Lake St. Clair, Lake Huron and Lake Michigan. We find that 58% of adults living in the Lower Peninsula of Michigan participate in Great Lakes beach recreation during the summer season. In the general population, people who are young, white and more educated are more likely to participate. Once participating, people who are male, non-white and not employed full-time tend to take more day trips. The value of an individual public beach is about $32-$39 per trip, depending on the region. If the length of one beach is increased by one mile, the welfare gain is about $31-$43 per trip. About 20.9 million day trips in total are taken to public Great Lakes beaches (excluding Lake Superior) each summer by Michigan adults from the Lower Peninsula, with about 14.6 million for Lake Michigan. The results show that access to beaches for day trips on Lake Michigan is worth over $400 million each year to Michigan adults living in the Lower Peninsula of Michigan.
These values are relevant to decisions on beach issues such as quality maintenance and beach facility construction, and to policy decisions about the value and environmental improvement of Great Lakes beaches.

This chapter also clarifies whether including nonusers and differentiating them from potential users will make a difference. In previous studies, if only one survey was implemented, the two groups were pooled and nonusers were treated as potential users who took no trip during the season; if there was a screener survey, the purpose was to recruit a sample for follow-up surveys and the data was rarely used. We follow what was done in English (2008) with the improvement that we also collected individual-level data for nonusers. The estimation results of the three models show that pooling nonusers with potential users produces different parameter estimates and welfare estimates compared to using information only from users and potential users. When the behavior of participation/nonparticipation is explicitly modeled, it hardly influences estimated parameters for the beaches because nonusers provide no trip information; however, it does tell us what factors could play a role in determining whether to participate or not. We can predict the participation rate for any individual. However, the unconditional results for total trips and welfare measures for the hurdle model are somewhat different because the presence of the hurdle leads to different spatial allocations of trip-taking behavior when results are aggregated to the population level.

The loss of prediction power might also be attributed to the lack of theoretical support for the participation hurdle. As stated in English (2008), the hurdle could inadequately capture people’s economic response to factors other than their own demographics, like site characteristics and possible investments for participating in beach recreation (e.g. buying a boat).
Future work may focus on deriving a participation hurdle with an objective utility function and relevant constraints. In this chapter, the seasonal inclusive values are used to represent the utility of participation. English (2008) also incorporated cost of licenses as another factor in the hurdle. If data is available on beach access fees, it could be combined with people’s leisure activities in the mail survey to derive the equations for participating and not participating from more comprehensive utility maximizing behaviors. It is worth further investigating the implications of losing the mean-fitting property of the typical repeated logit models when a hurdle is incorporated.

In addition, we have only a few quality variables in this chapter. Although regional dummies can explain site characteristics to some extent, choices among beaches will be more accurately modeled with more data at the beach level, such as beach width, facilities, whether a beach is located in a state park, etc.

Chapter 3 Modeling Long Overnight Trips by Chaining Recreation Sites

1 Motivation

In recreation studies, valuation often applies to trips where recreation is the single objective and only one site is visited, so day trip data is the most widely used as it normally meets the two requirements (Caulkins et al (1986), Lew and Larson (2005), Moeltner and Shonkwiler (2005), Scarpa and Thiene (2005), Smith (2005), von Haefen et al (2005), Kim et al (2007), Timmins and Murdock (2007), Parsons et al (2009), etc.).
Some studies, most of which are for fishing or hunting trips, do not explicitly differentiate overnight trips from day trips, or give the same treatment to the two types of trips, where the single-objective and single-site assumptions are still imposed (Morey et al (1993), Englin and Shonkwiler (1995), Haab and Hicks (1997), Provencher and Bishop (1997), Shrestha et al (2002), Schuhmann and Schwabe (2004), Morey et al (2006), Cutter et al (2007), Hynes et al (2007), Haab et al (2008), von Haefen and Phaneuf (2008), etc.).

Table 18: Examples of Literature Not Differentiating Overnight Trips from Day Trips

Papers                          Activities                             Models                  Comments
Morey et al (1993)              Fishing                                Random Utility Models   Recode all trips to Maine rivers as day trips, and all trips to Canadian rivers as four-day trips
Englin and Shonkwiler (1995)    Boating, Swimming and Fishing          Count Data Models       -
Haab and Hicks (1997)           Visiting Beaches, Fishing and Boating  Random Utility Models   -
Provencher and Bishop (1997)    Fishing                                Dynamic Programming     -
Shrestha et al (2002)           Fishing                                Count Data Models       -
Schuhmann and Schwabe (2004)    Fishing                                Random Utility Models   -
Morey et al (2006)              Fishing                                Latent Class Model      -
Cutter et al (2007)             Visiting National Parks                Random Utility Models   -
Hynes et al (2007)              Kayaking                               Random Utility Models   -
Haab et al (2008)               Fishing                                Random Utility Models   -
von Haefen and Phaneuf (2008)   Hunting                                Random Utility Models   -

Table 19: Studies Dealing with Overnight/Multiple-Objective/Multiple-Site Trips

Papers                               Activities               Models                         Welfare Measures                               Comments
Kealy and Bishop (1986)              Fishing                  Demand Theory of Single Site   19.5 per Day in 1978 Dollars                   Explicitly model the number of recreation days
Mendelsohn et al (1992)              Visiting National Parks  System of Demand Equations     16.8 per Day in 1982 Dollars                   Combine multiple sites as one composite
Hoehn et al (1996)                   Fishing                  Random Utility Models          66.7 per Multiple-Day Trip in 1994 Dollars     Put day and overnight trips in two separate nests
McKean, Walsh and Johnson (1996)     Fishing                  Count Data Models              69.2 per Trip in 1986 Dollars                  Include price and time variables for secondary sites
Tay, McCarthy and Fletcher (1996)    Fishing                  Random Utility Models          N/A                                            Use portfolios of destination, duration and frequency
Parsons and Wilson (1997)            Fishing                  Count Data Models              58.8-76.9 per Day Trip in 1989 Dollars         Define one dummy variable for incidental consumption
Shaw and Ozog (1999)                 Fishing                  Random Utility Models          268 in 1988 Dollars on Catch Rate Improvement  Test two structures with a level of trip length
Loomis, Yorizane and Larson (2000)   Whale Watching           Count Data Models              75.0 per Day in 1993 Dollars                   Distinguish incidental trips from joint consumption
Lupi et al (2003)                    Fishing                  Random Utility Models          125.0 per Multiple-Day Trip in 1994 Dollars    Allow different preference parameters for day and overnight trips
Yeh, Haab and Sohngen (2006)         Visiting Beaches         Random Utility Models          1.45 in 1998 Dollars on Reducing One Advisory  Make an adjustment to travel cost for multiple-objective trips

Although most recreation trips are day trips, overnight trips make up a nontrivial portion of recreation trips, and demand for recreation activities will be more accurately modeled if these trips are accounted for. Previous studies have proposed several approaches to deal with overnight trips. Kealy and Bishop (1986) derived the demand equation from utility theory and used the total number of recreation days as the dependent variable. Explanatory variables included demographic characteristics, travel cost, daily on-site costs, daily overnight expenditures, etc. Multiple sites were not involved though. Hoehn et al (1996) proposed a repeated nested logit model with a trip length level for fishing trips where day trips and overnight trips were in two separate nests. Trip duration was taken into account as well as locations and target species. In Tay, McCarthy and Fletcher (1996), a multinomial logit model was applied to annual fishing trips.
The alternatives were not only individual sites, but also included trip duration and frequency information. A subset of the universal set was used in estimation, and sampling correction was applied. Shaw and Ozog (1999) specified two nesting structures in a repeated nested multinomial logit model. One put the trip length level above the site level, and the other had the opposite order. The first model had independence parameters within the unit interval. Lupi et al (2003) implemented a repeated nested logit model with a trip length level for single and multiple day trips. They allowed different parameters for day and overnight trips, and the estimated results showed that the marginal utility of income was lower for overnight trips.

However, these studies still assume only one site is visited on overnight trips. To address the issue of multiple-site or multiple-objective trips, Mendelsohn et al (1992) combined all sites people visited as composites, which were added to the system of demand equations as additional alternatives. People could substitute between these composites and individual sites. McKean, Walsh and Johnson (1996) included price and time variables for secondary sites when estimating the demand function of the primary site. Since the secondary sites were close to the primary site in their study and shared similar characteristics, these variables were automatically dropped from estimation due to multicollinearity. Parsons and Wilson (1997) proposed a theory to incorporate incidental and joint consumption in count data models using a dummy variable as a proxy. It could be interacted with site quality and demographic variables. Both multiple-objective and multiple-site trips would be allowed in this approach. They found that incidental consumption was a complementary good for recreation trips. Loomis, Yorizane and Larson (2000) distinguished incidental trips from joint consumption using two sets of dummy variables if both were incurred on a trip.
They asked a screening question in the survey to identify whether a trip was single-purpose or involved incidental or joint consumption. Yeh, Haab and Sohngen (2006) applied a nested logit model to day and overnight trips, and adjusted travel cost based on the proportion of time spent on the recreation purpose for multiple-objective trips.

Nonetheless, these methods either process the data so that multiple-site trips fit into the framework of single-site trips, or model the existence of multiple-site trips using dummy variables. As yet, there are no applications in which the decisions of how many sites to visit and where to go have both been incorporated into a site choice model. To fill this gap, in this chapter, we extend the traditional model, where only the main destination is visited on overnight trips, to a three-level nested logit model which explicitly incorporates people’s decisions on the number of sites and the choice of sites to visit on an overnight trip. The data is from overnight trips where the main purpose is recreation and people may visit any combination of 49 distinct sites. We want to see whether the proposed model does a good job of explaining people’s behavior and produces different welfare estimates compared to models based on the main destination assumption.

2 Models

The traditional way to model overnight trips is to assume people only visit their main destination. With this assumption, we have a simple conditional logit model as in Figure 8.

Figure 8: Decision Tree of Main-Destination Model

Following Train (2003), on the overnight trip, the utility person n obtains from visiting site i as the main destination is:

$$U_{ni} = V_{ni} + \varepsilon_{ni}$$

where the indirect utility $V_{ni}$ may include travel cost, site characteristics, and their interactions with demographic variables. $\varepsilon_{ni}$ measures unobserved factors. Person n will go to site i if and only if:

$$U_{ni} > U_{nm}, \quad \forall\, m \neq i$$

When $\varepsilon_{ni}$ follows an i.i.d.
Type I extreme value distribution and the choice set contains K alternatives, the probability of visiting site i is:

$$P_n(i) = \frac{\exp(V_{ni})}{\sum_{m=1}^{K}\exp(V_{nm})}$$

All the sites are independent, and the relative probability of visiting site i over site m is not affected by other sites. The assumption of independence from irrelevant alternatives (IIA) holds.

Figure 9: Decision Tree of Model Allowing Multiple Sites per Trip

To build multiple sites into the model, we propose the structure in Figure 9. A person will simultaneously decide whether to visit one or two sites and where to go. Within the nest of visiting two sites, the first level represents the primary site, on which one spends the most time; starting from there, one chooses the secondary site from the remaining alternatives. With K sites, the number of alternatives is K in the nest of visiting one site, and K×(K-1) in the nest of visiting two sites. The total is K×K, which greatly enlarges the choice set compared to the traditional model.

As described in Chapters 1 and 2, if person n decides to visit one site, the conditional probability of choosing site i is:

$$P_n(i \mid 1) = \frac{\exp(V^{1}_{ni}/\sigma_1)}{\sum_{m=1}^{K}\exp(V^{1}_{nm}/\sigma_1)}$$

where $\sigma_1$ is the scale parameter of the one-site nest. If person n decides to visit two sites, the probability of choosing j as the secondary site conditional on k being the primary site is:

$$P_n(j \mid k, 2) = \frac{\exp(V^{2}_{nkj}/\sigma_3)}{\sum_{l \neq k}\exp(V^{2}_{nkl}/\sigma_3)}$$

where the sum over $l \neq k$ means k is excluded from the candidates for the secondary site. The conditional probability that person n chooses k as the primary site is:

$$P_n(k \mid 2) = \frac{\exp\left(\frac{\sigma_3}{\sigma_2}\ln\sum_{j \neq k}\exp(V^{2}_{nkj}/\sigma_3)\right)}{\sum_{l=1}^{K}\exp\left(\frac{\sigma_3}{\sigma_2}\ln\sum_{j \neq l}\exp(V^{2}_{nlj}/\sigma_3)\right)}$$

Then for person n, the inclusive values of visiting one site and two sites are:

$$IV_{n1} = \ln\sum_{i=1}^{K}\exp(V^{1}_{ni}/\sigma_1)$$

$$IV_{n2} = \ln\sum_{k=1}^{K}\exp\left(\frac{\sigma_3}{\sigma_2}\ln\sum_{j \neq k}\exp(V^{2}_{nkj}/\sigma_3)\right)$$

which are the maximum utilities person n can attain if visiting one site and two sites respectively.
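The conditional probabilities and inclusive values of the two nests can be sketched numerically as follows. This is a minimal sketch that assumes a standard three-level nested logit parameterization; the scale parameter names `sigma1`, `sigma2` and `sigma3`, the function names, and the toy utilities (all zero, with K = 3) are illustrative assumptions rather than the estimated model, and no overflow protection is included.

```python
import numpy as np


def one_site_nest(v_one, sigma1):
    """Return P(i | one site) and the one-site inclusive value."""
    scaled = np.asarray(v_one, dtype=float) / sigma1
    inclusive_value = np.log(np.sum(np.exp(scaled)))
    return np.exp(scaled - inclusive_value), inclusive_value


def two_site_nest(v_two, sigma2, sigma3):
    """v_two[k, j]: utility of primary site k with secondary site j (k != j).

    Returns P(j | k, two sites), P(k | two sites) and the two-site
    inclusive value. Diagonal entries (k == j) are excluded.
    """
    v_two = np.asarray(v_two, dtype=float)
    n_sites = v_two.shape[0]
    allowed = ~np.eye(n_sites, dtype=bool)
    weights = np.where(allowed, np.exp(v_two / sigma3), 0.0)
    row_sums = weights.sum(axis=1)
    p_secondary = weights / row_sums[:, None]
    primary_terms = np.exp((sigma3 / sigma2) * np.log(row_sums))
    p_primary = primary_terms / primary_terms.sum()
    return p_secondary, p_primary, np.log(primary_terms.sum())


# Toy example with K = 3 sites and identical (zero) utilities.
K = 3
p_one, iv_one = one_site_nest(np.zeros(K), sigma1=0.5)
p_sec, p_prim, iv_two = two_site_nest(np.zeros((K, K)), sigma2=0.8, sigma3=0.4)

# K single-site alternatives plus K*(K-1) ordered pairs gives K*K in total.
n_alternatives = K + K * (K - 1)
```

With identical utilities every site is equally likely within each nest, which gives a quick sanity check on the code before real utilities are plugged in.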
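To investigate whether demographic variables have any effect on the number of sites selected, we put them into the indirect utility of visiting one site:

$$W_{n1} = \gamma' Z_n + \sigma_1 IV_{n1}$$

where $Z_n$ is the vector of demographic variables. Then the maximum utility person n can attain from taking an overnight trip is:

$$IV_n = \ln\left[\exp(\gamma' Z_n + \sigma_1 IV_{n1}) + \exp(\sigma_2 IV_{n2})\right]$$

The probabilities of visiting one site and two sites are:

$$P_n(1) = \frac{\exp(\gamma' Z_n + \sigma_1 IV_{n1})}{\exp(\gamma' Z_n + \sigma_1 IV_{n1}) + \exp(\sigma_2 IV_{n2})}, \qquad P_n(2) = \frac{\exp(\sigma_2 IV_{n2})}{\exp(\gamma' Z_n + \sigma_1 IV_{n1}) + \exp(\sigma_2 IV_{n2})}$$

Hence, the unconditional probabilities of choosing only site i or the pair of sites k and j are:

$$P_n(i) = P_n(i \mid 1)\,P_n(1)$$

$$P_n(k, j) = P_n(j \mid k, 2)\,P_n(k \mid 2)\,P_n(2)$$

The log-likelihood function will be:

$$LL = \sum_{n=1}^{N_1}\sum_{i=1}^{K} y_{ni}\ln P_n(i) + \sum_{n=1}^{N_2}\sum_{k=1}^{K}\sum_{j \neq k} y_{nkj}\ln P_n(k, j)$$

where $N_1$ people visit one site and $N_2$ people visit two sites; y is the binary indicator for the chosen alternative. The indirect utilities $V^{1}_{ni}$ and $V^{2}_{nkj}$ are composed of the price variable, i.e. travel cost, and quality variables. The quality parameters for the single site, $\beta^{1}$, the primary site, $\beta^{p}$, and the secondary site, $\beta^{s}$, could be different, so that we can test their relationships, for instance, whether the sum of $\beta^{p}$ and $\beta^{s}$ is equal to $\beta^{1}$. Unlike previous studies where day trips are estimated together with overnight trips, in this case, the marginal utility of income is the same no matter how many sites one visits.

Welfare estimates in the conditional logit and nested logit models are per trip measures with respect to the choice set since the “don’t go” option is not available. If one site is closed, or there is a marginal increase in the length of one site, the estimated welfare change for person n is:

$$\widehat{W}_n = \frac{\widehat{IV}_n|_{1} - \widehat{IV}_n|_{0}}{-\hat{\beta}_{tc}}$$

where the subscripts 1 and 0 denote the inclusive value after and before the change, and $\hat{\beta}_{tc}$ is the estimated travel cost parameter. The weighted average gives the per person value:

$$\widehat{W} = \frac{\sum_{n=1}^{N}\omega_n \widehat{W}_n}{\sum_{n=1}^{N}\omega_n}$$

where N is the sample size for the model, and $\omega_n$ is person n’s weight.

To facilitate comparison of the model results to those of single-site models or models with different choice sets, welfare measures can be normalized by the increase or decrease in the probability of visiting the changed site. Denote the changed site as m. The normalization is straightforward in the conditional logit model:

$$\widehat{W}^{trip} = \frac{\widehat{W}}{\Delta\widehat{P}_m}, \qquad \Delta\widehat{P}_m = \frac{\sum_{n=1}^{N}\omega_n\,\Delta\widehat{P}_{nm}}{\sum_{n=1}^{N}\omega_n}$$

In the nested logit model, however, site m appears at multiple nodes. If it is closed, the number of alternatives reduces to K-1 in the nest of visiting one site, and (K-1)×(K-2) in the nest of visiting two sites.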
If its length increases, characteristics of more than one alternative will be affected. In other words, $\widehat{P}_{nm}$ is the sum of person n’s estimated probabilities of visiting site m alone and of visiting alternatives that include site m.

$$\widehat{P}_{nm} = \widehat{P}_n(m) + \sum_{j \neq m}\widehat{P}_n(m, j) + \sum_{k \neq m}\widehat{P}_n(k, m)$$

In the second term, m is the primary site, and in the third term, m is the secondary site.

Following Parsons and Wilson (1997), a pooled truncated Poisson model is also estimated. We refer to it as a pooled model because it is an ad hoc single site demand formulation that ignores the complexities of multiple substitute sites and models a generic trip demand using data on the site a person visited. Because people visit different sites and take different numbers of trips, the effects of quality can be generically entered and identified. With main destinations, we have:

$$\ln E(x_n) = \beta_0 + \beta_{tc}\,TC_n + \beta_q' Q_n$$

where $TC_n$ is travel cost and $Q_n$ contains the site quality variables. It is assumed that each site has the same demand function. This is a pooled model, so generic site quality variables can be included. The dependent variable x is the number of overnight trips. For the multiple-site version of this model, denote the dummy variable for visiting two sites as D, and the equation becomes:

$$\ln E(x_n) = \beta_0 + \beta_{tc}\,TC_n + \beta_q' Q_n + (1 - D_n)\,\gamma' Z_n$$

The last interaction term uses 1-D instead of D in order to be consistent with the nested logit model above, so that it captures how people visiting one site differ from those visiting two sites. The access value per trip is $-1/\hat{\beta}_{tc}$ in the pooled count model.

3 Data

The data comes from a two-stage survey we conducted in 2011 and 2012. A screener mail survey went out to Michigan residents to recruit participants in beach recreation. The sample was drawn from Michigan’s driver license list, and the surveys asked about people’s leisure activities and participation obstacles. To reduce potential self-selection bias, the screening question was but one of many questions on the screener survey.
People who said they had visited a beach on the Great Lakes since June 2010 were invited to the follow-up web survey, which asked about trips taken to public Great Lakes beaches in Michigan in the summer of 2011. Following the approach in Parsons et al (2009), the web survey categorized trips into three types: day trip (lasting a day or less), short overnight trip (less than four nights) and long overnight trip (four nights or more). In the web survey section on long overnight trips, besides trip frequency information, detailed questions were presented for one randomly selected trip. People were asked to report the beaches on which they spent the most, second most and third most time, as well as the number of days on each beach. With information on how many sites people visited and where they went, we are able to apply the proposed model with multiple sites to value long overnight trips.

There are 588 public Great Lakes beaches in Michigan according to the Michigan Department of Environmental Quality (DEQ), so we would have a 588×588 choice set if individual beaches were used, which is extremely computationally burdensome. Based on the literature on site aggregation (Lupi and Feather (1998), Haener et al (2004), etc.), we aggregate the 588 public beaches into 49 aggregated sites, where the key factors to consider are beach popularity, geographic distribution and heterogeneity of travel cost (Figure 10 and Figure 11). A beach is more likely to stand on its own if many people go there. Beaches with no visits are dropped. Since the travel cost parameter is the denominator of all welfare estimates, to minimize the distance heterogeneity in all aggregated sites, we keep the average distance between two individual beaches under 18 miles within one site. Even with aggregation, a choice set of 49 sites is relatively large compared to previous literature.
For instance, Shaw and Ozog (1999) aggregated 13 rivers into 8 groups, and Kaoru et al (1995) had 29 aggregates from 80 sites.

In the web survey, 447 people took long overnight trips in the summer of 2011. Before aggregation, 337 visited one beach, 81 visited two beaches and 29 visited three beaches. After aggregation, 355 visited one site, 71 visited two sites and 21 visited three sites. Hence, although we use 49 aggregated sites to represent 588 individual beaches, little information on trips with multiple sites is lost with aggregation.

Following the aggregation literature, characteristics of these sites are averages of individual beach characteristics, and the number of elemental beaches within an aggregate is included in the estimation (Ben-Akiva and Lerman (1985), Parsons and Needelman (1992)). Individual beach length [40] was provided by the Michigan Department of Environmental Quality. Data on water surface temperature in the survey season was obtained from the National Oceanic and Atmospheric Administration (NOAA) Great Lakes Environmental Research Laboratory (GLERL) using the Great Lakes Observing System (GLOS) Point Query tool [41]. Daily temperatures were retrieved and averaged into monthly temperatures, because we know the month of the trips but not the exact days. Monthly data was directly used for Lake St. Clair as its daily data was not available. Individual beaches were matched to the nearest location with temperature data. All individual beaches’ characteristics are averaged to get the quality data for aggregated sites.

[40] It is defined as the length of shoreline reach.
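The two averaging steps described above, daily to monthly temperatures and individual beaches to aggregated sites, can be sketched as follows. This is only an illustration under stated assumptions: the record layout (ISO date strings, the dictionary keys `site`, `length` and `temperature`), the function names, and the sample values are hypothetical and do not reflect the actual GLOS or DEQ data formats.

```python
from collections import defaultdict
from statistics import mean


def monthly_means(daily_temps):
    """Average daily temperatures by month.

    daily_temps: iterable of (date, temperature) with date as 'YYYY-MM-DD'.
    """
    by_month = defaultdict(list)
    for date, temp in daily_temps:
        by_month[date[:7]].append(temp)   # 'YYYY-MM' identifies the month
    return {month: mean(temps) for month, temps in by_month.items()}


def aggregate_sites(beaches):
    """Average beach characteristics within each aggregated site.

    beaches: list of dicts with hypothetical keys 'site', 'length' and
    'temperature'. The count of elemental beaches is kept as well.
    """
    groups = defaultdict(list)
    for beach in beaches:
        groups[beach["site"]].append(beach)
    return {
        site: {
            "n_beaches": len(members),
            "mean_length": mean(b["length"] for b in members),
            "mean_temperature": mean(b["temperature"] for b in members),
        }
        for site, members in groups.items()
    }


# Hypothetical data for illustration only.
monthly = monthly_means([
    ("2011-07-01", 20.0),
    ("2011-07-02", 22.0),
    ("2011-08-01", 24.0),
])
sites = aggregate_sites([
    {"site": "S1", "length": 0.5, "temperature": 20.0},
    {"site": "S1", "length": 1.5, "temperature": 22.0},
    {"site": "S2", "length": 2.0, "temperature": 18.0},
])
```

Keeping the number of elemental beaches alongside the averages matters because that count enters the estimation as a separate variable.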
Figure 10: Public Great Lakes Beaches Visited On Long Overnight Trips

Figure 11: Aggregated Beach Areas in the Long Overnight Trip Model

Figure 12: GLOS Points on Great Lakes in Michigan

[41] http://glos.us/data-tools/point-query-tool-glcfs

In data analysis, for the 21 people who visited three sites, the third site was truncated, and they were pooled into people visiting two sites, because the group is too small to identify, and the model may become intractable. The descriptive statistics of participants taking long overnight trips are shown in Table 20. It can be seen that people visiting two sites are not very different from people visiting one site.

Table 20: Demographic Characteristics of Participants with Long Overnight Trips [42]

                            Participants   Visiting One Site*   Visiting Two Sites*
Age (Mean)                  45.5           45.7                 44.9
Income (Mean, $1000)        95.7           95.1                 98.3
Education Years (Mean)      15.2           15.2                 15.5
Male (%)                    44.7           45.3                 42.4
White (%)                   96.8           96.2                 99.1
Employed Full-Time (%)      54.9           54.9                 55.1
Retire (%)                  18.1           18.6                 16.3
Children under 17 (%)       39.6           38.4                 44.1

*Note: People visiting two sites were not significantly different from people visiting one site except for the race variable “White”, where the difference was statistically significant at 5% level.

[42] These are weighted by corresponding weights.

To compute each person’s travel cost, we have:

$$\text{Travel Cost} = 0.476 \times \text{Distance} + \left(\frac{\text{Income}/2000}{3}\right) \times \text{Time}$$

$0.476 per mile is the total driving cost minus maintenance and insurance costs for an average size car in 2011, reported by the American Automobile Association (AAA) [43]. Time cost is the opportunity cost. A person employed full-time works approximately 2,000 hours per year, and the hourly wage can be derived. As discussed in Chapter 9 of Champ et al (2003), for people working with a fixed time schedule, normally one third of the hourly wage is treated as the time cost. Travel distance and travel time are calculated in PC Miler, a logistics software package, and are measured in miles and hours respectively [44].
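The travel cost calculation described above (a per-mile driving cost plus time valued at one third of an hourly wage derived from 2,000 working hours per year) can be written as a short function. This is a sketch under the stated assumptions; the function and parameter names are illustrative, and the example inputs (100 miles, 2 hours, $60,000 income) are hypothetical.

```python
def travel_cost(distance_miles, time_hours, annual_income,
                cost_per_mile=0.476, hours_per_year=2000.0,
                wage_share=1.0 / 3.0):
    """Driving cost plus time cost for one person.

    Time is valued at `wage_share` of the hourly wage, where the hourly
    wage is annual income divided by `hours_per_year`.
    """
    hourly_wage = annual_income / hours_per_year
    driving_cost = cost_per_mile * distance_miles
    time_cost = wage_share * hourly_wage * time_hours
    return driving_cost + time_cost


# Hypothetical trip: 100 miles and 2 hours for a person earning $60,000.
# Driving cost is 47.6 and time cost is (30 / 3) * 2 = 20.
example_cost = travel_cost(100.0, 2.0, 60000.0)
```

Exposing the per-mile rate as a parameter makes it easy to rerun the calculation with an alternative driving cost.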
For alternatives in the nest of visiting two sites, the round-trip travel distance and travel time are counted from the permanent residence to the primary site, from the primary site to the secondary site, and from the secondary site back to the permanent residence. Demographic variables included in the model are those listed in Table 20 as well as three dummies indicating whether one's income falls within the 0-25th, 25th-50th, or 50th-75th percentile. The income dummies are included to test whether income plays a role in the decision to visit multiple sites on long overnight trips.

We estimate four models for comparison: the traditional model with main destination, the proposed multiple-site model with and without demographics, and the pooled truncated Poisson model. The maximum likelihood estimation of the first three models is programmed in Matlab, and the standard errors are computed using the inverse of the Hessian. It takes about 8-10 hours to estimate the multiple-site model with demographics. The pooled truncated Poisson model is estimated in Stata.

[43] This is one way to compute travel cost. Another way would be the operating cost (gas, maintenance and tires) plus depreciation caused by driving, which gives $0.2422 per mile. Results using this travel cost are available upon request.

[44] The travel cost in this study is for each adult, not household. It does not count the number of people in one vehicle.

4 Estimation Results

Table 21 shows that the estimated parameters for the travel cost variable differ across the three models. If we take into account the scale effect, for visiting one site in the two nested logit models, β/σ is -0.00413 and -0.00404 respectively, both larger in magnitude than the estimate of -0.00327 in the main-destination model. The length variable has positive estimates in all the models, but it is statistically significant only for visiting one site.
The sign for water temperature is negative, which is counterintuitive and the opposite of what we expected. In fact, water temperature is highly correlated with region. After analyzing the data on long overnight trips, we find that more people go to Lake Superior and the northern parts of Lake Michigan and Lake Huron, where the water is cold. Beaches in these areas may have distinct unmeasured characteristics compared to beaches in the south, and people who take long overnight trips might care more about such unmeasured beach quality than about water temperature. Thus, the regional effects are confounded with the temperature variable and influence the signs of the estimates [45]. The scale parameters in the two nested logit models are all statistically significant and within the unit interval, which is consistent with utility-maximizing behavior and indicates that nesting with multiple sites is better than no nesting. However, we do not find significant differences between people who visit one site and people who visit two sites.

[45] It is shown in Appendix E that with regional dummies to control for the unmeasured regional beach characteristics in the main destination model, the estimated parameter of water temperature turns positive. However, neither multiple-site model converges with these regional dummy variables, as explained in Appendix E.
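The scale-adjusted travel cost coefficients quoted earlier in this section can be checked from the Table 21 estimates. The short Python sketch below assumes, as the reported numbers suggest, that β is the travel cost estimate and σ is the one/two sites level parameter of each multiple-site model; the dictionary keys are our own labels.

```python
# (travel cost estimate, one/two sites level parameter) from Table 21.
estimates = {
    "multiple_site_without_demographics": (-0.00172, 0.416),
    "multiple_site_with_demographics": (-0.00226, 0.560),
}

# Scale-adjusted travel cost coefficient, beta / sigma, for each model.
scaled = {name: beta / sigma for name, (beta, sigma) in estimates.items()}

for name, value in scaled.items():
    print(f"{name}: {value:.5f}")
```

Rounded to five decimals, both ratios match the values of β/σ reported in the text.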
Table 21: Full Information Maximum Likelihood (FIML) Estimation Results

                                  Main-Destination Model     Multiple-Site Model        Multiple-Site Model
                                                             w/o Demographics           w/ Demographics
Variables                         Estimates    t statistics  Estimates    t statistics  Estimates    t statistics
Travel Cost                       -0.00327***  -6.47         -0.00172***  -2.96         -0.00226***  -3.51
One: Length                       0.283*       1.90          0.140***     2.76          0.187***     3.04
One: Temperature                  -0.0602      0.658         -0.0241**    -2.15         -0.0305**    -2.4
One: # of Beaches                 0.0287**     2.25          0.0111*      1.89          0.0153*      1.94
Two: Primary Length                                          0.0748       0.479         0.101        0.646
Two: Primary Temperature                                     -0.102***    -4.44         -0.112***    -4.29
Two: Primary # of Beaches                                    0.0341       1.63          0.031        1.45
Two: Secondary Length                                        0.0412       1.41          0.0545       1.46
Two: Secondary Temperature                                   -0.0242**    -2.41         -0.0326***   -2.67
Two: Secondary # of Beaches                                  0.00651      1.49          0.00848      1.54
Two: Primary Level Parameter                                 0.161***     2.68          0.213***     3.08
One/Two Sites Level Parameter                                0.416***     3.22          0.560***     3.69
One: Male                                                                               0.0321       0.125
One: Age                                                                                0.00553      0.544
One: White                                                                              -1.2         -1.39
One: Education Years                                                                    -0.0406      -0.85
One: Full-Time Employed                                                                 0.122        0.403
One: Retired                                                                            0.137        0.293
One: Children under 17                                                                  -0.0619      -0.231
One: 0-25% Income                                                                       0.484        1.34
One: 25%-50% Income                                                                     -0.311       -0.851
One: 50%-75% Income                                                                     0.122        0.305

Note: *10% significance level; **5% significance level; *** 1% significance level

Table 22: Estimated Welfare Changes per Person in 2011 Dollars

                                       Main-Destination Model      Multiple-Site Model         Multiple-Site Model
                                                                   w/o Demographics            w/ Demographics
                                       Per Trip   Per Trip/        Per Trip   Per Trip/        Per Trip   Per Trip/
                                                  Trip Change                 Trip Change                 Trip Change
Closing One Site                       -6.31      308.7            -5.17      211.0            -5.38      218.4
Marginal Length Increase on One Site   2.03       313.0            1.68       217.7            1.74       225.0

As discussed in the previous section, for the length variable, under the null hypothesis that the one-site parameter is equal to the sum of the primary-site parameter and the secondary-site parameter, the t statistics are 1.16 and 1.39 in the two multiple-site models, so we cannot reject the null hypothesis at the 10%
significance level.

Estimated welfare changes are shown in Table 22, including per person per trip measures and normalized per trip measures comparable to those of single-site demand models or models with different choice sets. We consider the change as happening at each of the 49 sites, and compute the welfare change for each site. The weighted average across people is the welfare estimate for each site. The numbers in the table are mean values over the 49 sites.

The two multiple-site models have similar measures, and the inclusion of demographics makes the numbers slightly larger. The estimates of the main-destination model are about 20% higher for per trip measures, and 40% higher for normalized measures. The reciprocal of the scaled travel cost parameter estimate is -305.8 in the main-destination model, -242.1 in the multiple-site model without demographics, and -247.5 in the multiple-site model with demographics. This explains part of the difference among the three models, since the marginal utility of income is the denominator of the welfare estimates. Another factor causing the discrepancy is that the choice of multiple sites is available in the two multiple-site models. If one site is closed, the maximum utility one can attain does not decrease as much, since combinations of other sites may still give similar utilities. The same holds for a marginal length increase. Therefore, ignoring the possibility that people visit multiple sites on overnight trips yields larger welfare estimates relative to the models allowing multiple sites per trip.

Additionally, in the survey, people who took long overnight trips spent on average 4.20 days on one beach. Dividing the normalized access values in Table 22 by 4.20 produces per beach day values for one site of $73.5 in the main-destination model, and $50.2 and $52.0 in the two multiple-site models. Considering that we are valuing long overnight trips, these numbers are comparable to other beach recreation studies in Table 19.
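The arithmetic behind the figures in the preceding paragraphs can be reproduced with a short Python sketch. The inputs are taken from the text and Table 22; the variable names are our own, and the reciprocals are computed from the rounded scaled coefficients reported in the text.

```python
MEAN_BEACH_DAYS = 4.20  # average days spent on one beach on long overnight trips

# Normalized access values for closing one site (Table 22, 2011 dollars).
normalized_access_values = {
    "main_destination": 308.7,
    "multiple_site_without_demographics": 211.0,
    "multiple_site_with_demographics": 218.4,
}

# Scaled travel cost coefficients as reported in the text.
scaled_travel_cost = {
    "main_destination": -0.00327,
    "multiple_site_without_demographics": -0.00413,
    "multiple_site_with_demographics": -0.00404,
}

# Per beach day values: normalized access value divided by mean beach days.
per_beach_day = {
    model: round(value / MEAN_BEACH_DAYS, 1)
    for model, value in normalized_access_values.items()
}

# Reciprocals of the scaled travel cost coefficients.
reciprocals = {
    model: round(1.0 / coefficient, 1)
    for model, coefficient in scaled_travel_cost.items()
}

print(per_beach_day)
print(reciprocals)
```

All three reciprocals are negative because the travel cost coefficients are negative; their magnitudes are what drive the differences in welfare estimates across models.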
As another point of comparison, we also estimate a pooled truncated Poisson model following Parsons and Wilson (1997). The pooled model is truncated because the data exclude people who did not take long overnight trips, but there is no need to adjust for endogenous sampling as in Shaw (1988) because we surveyed the general population. The results in Table 23 show that people with children under the age of 17 take fewer long overnight trips. People who are employed full-time or retired might be more likely to visit two sites. The dummy variable indicating a second site is not statistically significant. The access values are more than twice those in the random utility models. Thus, the pooled count data model, which assumes an ad hoc single demand equation, does not appear to be well suited to modeling long overnight trips with multiple sites.

Table 23: Estimation Results of Truncated Poisson Models

                                  Main Destination            Multiple Sites
Variables                         Estimates    t Statistics   Estimates    t Statistics
Travel Cost                       -0.00136**   -2.26          -0.0013**    -2.276
Primary Length                    0.0511       0.316          -0.00351     -0.0248
Primary Temperature               0.00773      0.583          0.00977      0.692
Primary # of Beaches              -0.012       -0.417         -0.0223      -0.774
Secondary Dummy: D                                            1.9          0.399
D × Secondary Length                                          0.221        0.714
D × Secondary Temperature                                     -0.0248      -0.484
D × Secondary # of Beaches                                    -0.00478     -0.124
Male                              0.224        1.19           -0.268       -0.635
Age                               -0.00398     -0.537         -0.00856     -0.474
White                             0.656        1.11           -0.552       -0.535
Education Years                   -0.0425      -0.951         -0.0322      -0.337
Full-Time Employed                -0.165       -0.596         1.36*        1.75
Retired                           0.246        0.689          1.75*        1.897
Children under 17                 -0.693***    -3.06          -0.978**     -2.24
0-25% Income                      -0.16        -0.405         -0.157       -0.274
25%-50% Income                    -0.358       -0.933         -0.988       -1.4
50%-75% Income                    -0.0303      -0.0724        -0.556       -0.846
(1-D) × Male                                                  0.598        1.27
(1-D) × Age                                                   0.0041       0.21
(1-D) × White                                                 1.4          1.15
(1-D) × Education Years                                       -0.0136      -0.125
(1-D) × Full-Time Employed                                    -1.75**      -2.13
(1-D) × Retired                                               -1.68*       -1.75
(1-D) × Children under 17                                     0.338        0.684
(1-D) × 0-25% Income                                          0.0536       0.0801
(1-D) × 25%-50% Income                                        0.769        0.953
(1-D) × 50%-75% Income                                        0.588        0.738
Constant                          0.676        0.392          0.566        0.296
Access Value per Trip in
2011 Dollars                      737.4                       767.9

Note: *10% significance level; **5% significance level; *** 1% significance level

5 Discussion and Conclusions

In this chapter, we build a model structure for long overnight trips in which people simultaneously decide how many sites to visit and where to go. The options of visiting one or two sites are significantly different. If two sites are visited, unobserved characteristics are shared among secondary sites within one primary site. We find that the value per beach day per person is about $50-$52 for one site in 2011 dollars. The traditional approach, which assumes that only the main destination is visited on overnight trips, tends to produce larger welfare estimates than the models in which all possible combinations of sites are included.

Since we have trip frequency data, we originally sought to apply a repeated nested logit model that added a level for taking or not taking a long overnight trip. It took about 1-2 days to estimate this four-level repeated nested logit model in Matlab. However, after many attempts with different sets of explanatory variables and different nesting structures (such as separating or integrating the primary and secondary sites, using regional dummies, and assigning different scale parameters to the nests), either the repeated nested logit model did not converge, even with starting values from sequential estimation, or the estimated parameter on the inclusive value for the trip was negative. Recall that of the 447 people taking long overnight trips, 92 visited two sites, about 21% of the data. But in the nest of visiting two sites, there are 49 × 48 = 2,352 alternatives, and only 48, 4% of them have visitation information. Therefore, it is probable that our relatively small sample of people visiting multiple sites on long overnight trips leads to the convergence problem.
One direction of future work would be to find more data on beach characteristics and regional amenities, since regional amenities may be more important than individual beach quality when sites are aggregated. Factors besides length and water temperature could also have a significant influence on people's choices of where to go, such as facilities, the convenience of lodging and whether a beach is located in a state park; including them may avoid some of the regional correlations that appear especially problematic for the estimated temperature parameter. Other detailed information about the trip may also be included, such as activities and the number of adults and children. Another direction might be to add short overnight trips to the model to take full account of all the information on overnight trips. In addition, more complicated models like the mixed logit model could be applied, which is flexible in the substitution patterns across people, alternatives and even choice occasions. Nonetheless, these all seem to greatly increase the estimation burden, and more efficient programming may be required.

APPENDICES

Appendix A Results of Sensitivity Analyses for the Monte Carlo Simulations in Chapter 1

Sensitivity analyses are conducted to investigate whether changing underlying factors has significant effects on the results of the Monte Carlo simulations in Chapter 1. We apply the simulations to three situations: (1) a new set of true parameters, (2) seven sites in the choice set, and (3) the same number of people in each group. Based on the baseline simulation results, in the sensitivity analyses the nested logit model has sites 1 and 2 in one nest, and there are two classes in the latent class model.

A.1 Different True Parameters

A.1.1 True Model-Latent Class Model

Simulation results are shown in the following tables.

Table A-1: Performance of Latent Class Model When It Is the True Model [46]
True Mean Var. MSE Min. Median Max.
̂ -0.12 -0.43 1.71 1.81 -8.89 -0.12 -0.033 ̂ 0.15 -0.33 75.54 75.70 -106.1 0.37 56.32 ̂ -0.07 -0.17 0.30 0.30 -9.11 -0.079 0.43 ̂ 2.15 5.82 458.8 471.7 -120.5 1.81 217.2 ̂ ̂ ̂ 0.7 0.615 0.062 0.070 0.083 0.615 0.993 -1.25 -2.33 31.05 32.20 -18.6 -2.73 16.45 ̂ ̂ -30.71 -57.12 186278 186785 -11330 -24.41 206.5 -0.105 -0.177 0.074 0.080 -2.39 -0.112 -0.083 0.75 0.84 5.31 5.31 -20.98 0.84 14.41 -10.09 -10.62 21.24 21.50 -122.4 -10.36 -4.66 ̂ ̂ ̂ ̂ 46 Results are from 978 iterations. 115 Table A-2: Performance of Conditional Logit and Nested Logit Models When Latent Class Model Is the True Model True Conditional Logit Nested Logit ̂ ̂ ̂ ̂ ̂ ̂ ̂ ̂ Mean Var. MSE Min. Median Max. -0.105 -0.092 2.3e-05 2.0e-04 -0.11 -0.091 -0.078 0.75 0.89 0.018 0.037 0.40 0.87 1.36 -10.09 -9.67 2.00 2.17 -14.52 -9.62 -4.03 -0.105 -0.096 4.0e-05 1.3e-04 -0.123 -0.095 -0.077 0.75 0.92 0.022 0.051 0.43 0.91 1.42 -10.09 -9.64 2.02 2.22 -14.63 -9.59 -4.12 Table A-3: Estimated Values of Marginal Quality Change of Latent Class Model When It Is the True Model Class 1 Class 2 Average True -0.41 -0.40 -0.44 -6.62 -15.00 -9.10 -2.27 -4.78 -3.03 Mean -0.61 -0.95 -0.77 -4.56 -42.56 -10.00 -2.36 -5.01 -3.26 Var. 3.72 3.38 3.39 11.35 168717 474.2 0.30 18.61 0.28 MSE 3.75 3.68 3.50 15.56 169304 474.5 0.30 18.64 0.33 116 Min. -4.69 -7.84 -6.07 -37.65 -10670 -623.1 -3.70 -110.8 -8.86 Median -0.91 -0.91 -0.91 -4.90 -10.89 -7.82 -2.41 -4.70 -3.25 Max. 7.44 3.66 5.35 28.62 183.2 44.85 -0.015 0.011 -1.46 Table A-4: Estimated Site Values of Latent Class Model When It Is the True Model 47 Class 1 48 Class 2 49 Average True 9.08 8.33 9.41 5.30 17.16 8.32 7.94 10.98 9.08 Mean 8.87 8.83 8.77 5.41 27.46 8.85 7.91 11.07 8.94 Var. 0.86 1.62 0.12 7.24 33034 15.21 0.075 3.96 0.039 MSE 0.90 1.86 0.53 7.24 33094 15.47 0.076 3.97 0.060 Min. 6.21 5.03 8.13 -27.31 -86.44 -46.73 7.05 8.35 8.36 Median 8.75 8.78 8.80 6.07 14.61 9.20 7.92 10.95 8.95 Max. 
14.32 14.88 14.67 19.86 4621 91.99 8.87 56.73 9.85 Table A-5: Welfare Estimates of Conditional Logit and Nested Logit Models When Latent Class Model Is the True Model Site Conditional Logit Nested Logit 47 48 49 1 2 3 1 2 3 Site Loss True Estimate 7.94 7.91 10.98 10.64 9.08 9.14 7.94 8.03 10.98 10.74 9.08 8.95 Quality Change True Estimate -2.27 -2.86 -4.78 -3.61 -3.03 -3.20 -2.27 -2.87 -4.78 -3.60 -3.03 -3.16 After we exclude iterations with infinite site values, 884 iterations are used to compute the averages. After we exclude iterations with infinite site values, 913 iterations are used to compute the averages. After we exclude iterations with infinite site values, 819 iterations are used to compute the averages. 117 A.1.2 True Model-Conditional Logit Model Simulation results are shown in the following tables. 118 Table A-6: Performance of Conditional Logit, Nested Logit and Latent Class Models When Conditional Logit Model Is the True Model True Conditional Logit Nested Logit ̂ ̂ ̂ ̂ ̂ ̂ ̂ ̂ Latent 50 Class 50 ̂ ̂ ̂ ̂ Mean Var. MSE Min. Median Max. -0.07 -0.07 1.4e-05 1.4e-05 -0.087 -0.070 -0.060 2.15 2.16 0.014 0.014 1.83 2.16 2.59 -30.71 -30.79 1.51 1.51 -34.47 -30.74 -26.31 -0.07 -0.07 2.2e-05 2.2e-05 -0.089 -0.07 -0.058 2.15 2.16 0.023 0.023 1.75 2.16 2.69 -30.71 -30.79 1.52 1.52 -34.46 -30.75 -26.31 -0.07 -0.098 0.014 0.015 -1.53 -0.073 -0.015 2.15 2.76 6.40 6.77 1.12 2.22 35.24 -30.71 -31.28 17.28 17.59 -97.39 -30.92 3.20 Results are from 988 iterations. 
119 Table A-7: Welfare Estimates of Conditional Logit, Nested Logit and Latent Models When Conditional Logit Model Is the True Model Site Conditional Logit Nested Logit Latent Class 51 1 2 3 1 2 3 1 2 3 True 27.00 2.57 6.96 27.00 2.57 6.96 27.00 2.57 6.96 Site Loss Estimate 27.06 2.57 6.96 27.06 2.57 6.96 51 27.58 2.59 6.91 Quality Change True Estimate -19.41 -19.47 -3.67 -3.67 -7.63 -7.65 -19.41 -19.47 -3.67 -3.67 -7.63 -7.65 -19.41 -20.15 -3.67 -3.59 -7.63 -7.55 After we exclude iterations with infinite site values, 876 iterations are used to compute the averages. 120 A.1.3 True Model-Nested Logit Model Simulation results are shown in the following tables. 121 Table A-8: Performance of Conditional Logit, Nested Logit and Latent Class Models When Nested Logit Model Is the True Model True Conditional Logit Nested Logit ̂ ̂ ̂ ̂ ̂ ̂ ̂ ̂ Latent 52 Class 52 ̂ ̂ ̂ ̂ Mean Var. MSE Min. Median Max. -0.07 -0.093 2.6e-05 5.3e-04 -0.11 -0.092 -0.079 2.15 2.53 0.040 0.18 2.02 2.52 3.33 -30.71 -27.33 2.71 14.18 -32.63 -27.35 -22.16 -0.07 -0.071 2.8e-05 2.9e-05 -0.095 0.070 -0.058 2.15 2.17 0.026 0.026 1.72 2.16 2.85 -30.71 -30.76 3.34 3.34 -37.19 -30.77 -25.49 -0.07 -0.11 5.9e-03 3.52 -1.86 -0.10 -0.079 2.15 3.22 7.8e-03 4.67 -14.16 2.83 26.22 -30.71 -27.94 3.47 11.14 -33.61 -27.96 -22.37 Results are from 993 iterations. 122 Table A-9: Welfare Estimates of Conditional Logit, Nested Logit and Latent Models When Nested Logit Model Is the True Model Site Conditional Logit Nested Logit Latent Class 53 1 2 3 1 2 3 1 2 3 True 8.56 16.79 4.32 8.56 16.79 4.32 8.56 16.79 4.32 Site Loss Estimate 9.38 16.70 3.74 8.56 16.81 4.30 53 9.04 16.93 3.87 Quality Change True Estimate -9.68 -9.01 -15.48 -13.55 -5.55 -4.77 -9.68 -9.69 -15.48 -15.53 -5.55 -5.53 -9.68 -9.06 -15.48 -14.92 -5.55 -3.96 After we exclude iterations with infinite site values, 965 iterations are used to compute the averages. 
123 A.2 Seven Sites A.2.1 True Model-Latent Class Model Simulation results are shown in the following tables. 124 54 Table A-10: Performance of Latent Class Model When It Is the True Model True Mean Var. MSE Min. Median Max. ̂ -0.06 -0.12 0.46 0.46 -11.06 -0.062 6.12 ̂ 0.49 2.22 292.8 295.5 -120.6 0.52 289 ̂ -0.10 -0.23 0.82 0.84 -9.75 -0.073 0.27 ̂ 0.21 0.15 14.84 14.82 -58.55 0.33 30.95 ̂ ̂ ̂ 0.7 0.42 0.086 0.16 0.0030 0.5 0.994 -8.17 -28.54 1.7e05 1.8e05 -8638 -7.93 3692 ̂ ̂ -2.10 -3.28 15.99 17.37 -13.48 -4.28 12.74 -0.072 -0.097 0.015 0.015 -1.55 -0.073 -0.038 0.406 0.44 0.74 0.74 -9.91 0.409 11.65 -6.35 -6.68 56.7 56.7 -197.1 -6.28 47.5 ̂ ̂ ̂ ̂ 54 Results are from 999 iterations. 125 Table A-11: Performance of Conditional Logit and Nested Logit Models When Latent Class Model Is the True Model True Conditional Logit Nested Logit ̂ ̂ ̂ ̂ ̂ ̂ ̂ ̂ Mean Var. MSE Min. Median Max. -0.072 -0.068 6.7e-06 2.3e-05 -0.077 -0.068 -0.061 0.406 0.408 5.8e-03 5.8e-03 0.125 0.407 0.67 -6.35 -6.01 1.23 1.34 -9.60 -6.02 -1.72 -0.072 -0.068 2.3e-05 3.7e-05 -0.088 -0.068 -0.055 0.406 0.410 6.6e-03 6.6e-03 0.149 0.409 0.68 -6.35 -6.01 1.23 1.35 -9.61 -6.03 -1.76 126 Table A-12: Estimated Values of Marginal Quality Change of Latent Class Model When It Is the True Model Class 1 Class 2 Average True -0.93 -1.02 -1.45 -0.86 -1.17 -1.53 -1.22 -0.30 -0.28 -0.32 -0.24 -0.30 -0.37 -0.31 -0.74 -0.80 -1.11 -0.67 -0.90 -1.18 -0.95 Mean -4.78 -0.90 -12.88 -1.92 -1.31 -4.98 -1.79 -0.36 -0.40 -0.60 -0.32 -0.46 -0.64 -0.50 -0.66 -0.73 -1.51 -0.58 -0.86 -1.38 -0.96 Var. 18686 34.3 95285 2290.1 38.03 8065 102.6 0.31 0.30 0.38 0.23 0.34 0.47 0.33 0.41 0.037 40.63 0.10 0.047 2.25 0.085 127 MSE 18681 34.3 95320 2289.0 38.01 8068 102.8 0.31 0.32 0.46 0.24 0.36 0.54 0.36 0.42 0.042 40.75 0.11 0.048 2.28 0.085 Min. 
-4256 -87.19 -8115 -1415 -81.52 -1644 -142.9 -1.23 -1.44 -2.85 -1.12 -1.78 -3.04 -2.12 -18.28 -2.43 -181.5 -6.46 -3.87 -26.12 -6.34 Median -0.86 -0.95 -1.39 -0.75 -1.09 -1.50 -1.16 -0.55 -0.57 -0.69 -0.48 -0.62 -0.76 -0.63 -0.68 -0.74 -1.15 -0.62 -0.86 -1.24 -0.95 Max. 404.9 59.48 2172 277.3 112.9 1635 218.6 2.78 2.30 0.94 2.32 1.95 1.14 1.37 3.76 0.45 32.17 2.40 0.076 22.53 0.73 Table A-13: Estimated site values of latent class model when it is the true model 55 Class 1 56 Class 2 57 Average 55 56 57 True 2.40 2.69 4.08 2.19 3.11 4.46 3.21 2.26 2.15 2.74 1.58 2.23 3.12 2.37 2.36 2.53 3.68 2.00 2.85 4.06 2.96 Mean 2.34 2.70 5.39 2.26 3.12 5.46 3.51 2.47 2.54 3.11 2.08 2.72 3.53 2.69 2.38 2.55 3.71 2.05 2.84 4.11 2.96 Var. 90.52 21.44 682.4 31.5 29.5 430.2 55.56 0.41 0.35 0.79 0.41 0.40 0.75 0.49 0.028 0.021 0.28 0.024 0.025 0.21 0.041 MSE 90.43 21.42 683.4 31.5 29.5 430.7 55.59 0.45 0.50 0.93 0.66 0.65 0.92 0.59 0.029 0.021 0.28 0.027 0.025 0.21 0.041 Min. -224.1 -43.02 -394.9 -75.54 -51.65 -351.6 -103.3 -1.66 -1.79 -2.35 -1.41 -1.90 -1.96 -2.02 1.14 1.87 -2.25 0.92 2.10 -0.42 1.98 Median 2.32 2.56 3.99 2.03 2.91 4.39 3.10 2.38 2.50 3.35 2.02 2.76 3.76 2.82 2.37 2.54 3.67 2.02 2.84 4.08 2.95 After we exclude iterations with infinite site values, 953 iterations are used to compute the averages. After we exclude iterations with infinite site values, 963 iterations are used to compute the averages. After we exclude iterations with infinite site values, 917 iterations are used to compute the averages. 128 Max. 
58.33 68.64 241.7 59.51 83.18 227.4 101.0 10.76 10.69 10.95 10.62 11.04 11.25 10.88 3.79 4.27 12.69 3.50 4.97 12.53 5.74 Table A-14: Welfare Estimates of Conditional Logit and Nested Logit Models When Latent Class Model Is the True Model Site Conditional Logit Nested Logit 1 2 3 4 5 6 7 1 2 3 4 5 6 7 True 2.36 2.53 3.68 2.00 2.85 4.06 2.96 2.36 2.53 3.68 2.00 2.85 4.06 2.96 Site Loss Estimate 2.37 2.55 3.61 2.04 2.86 4.02 2.96 2.37 2.55 3.61 2.04 2.86 4.02 2.96 129 Quality Change True Estimate -0.74 -0.72 -0.80 -0.77 -1.11 -1.02 -0.67 -0.64 -0.90 -0.86 -1.18 -1.11 -0.95 -0.90 -0.74 -0.72 -0.80 -0.77 -1.11 -1.02 -0.67 -0.64 -0.90 -0.86 -1.18 -1.11 -0.95 -0.90 A.2.2 True Model-Conditional Logit Model Simulation results are shown in the following tables. 130 Table A-15: Performance of Conditional Logit, Nested Logit and Latent Class Models When Conditional Logit Model Is the True Model True Conditional Logit Nested Logit Latent Class ̂ ̂ ̂ ̂ ̂ ̂ ̂ ̂ ̂ ̂ ̂ ̂ Mean Var. MSE Min. Median Max. -0.06 -0.060 5.4e-06 5.4e-06 -0.069 -0.06 -0.054 0.49 0.49 5.1e-03 5.1e-03 0.28 0.49 0.71 -8.17 -8.18 1.44 1.44 -11.62 -8.18 -4.71 -0.06 -0.060 1.7e-05 1.7e-05 -0.077 -0.060 -0.047 0.49 0.49 6.5e-03 6.5e-03 0.28 0.49 0.79 -8.17 -8.18 1.46 1.46 -11.84 -8.18 -4.76 -0.06 -0.084 0.017 0.017 -1.69 -0.061 -0.024 0.49 0.60 0.95 0.96 -5.44 0.51 11.85 -8.17 -8.32 14.86 14.86 -71.36 -8.26 59.98 131 Table A-16: Welfare Estimates of Conditional Logit, Nested Logit and Latent Models When Conditional Logit Model Is the True Model Site Conditional Logit Nested Logit Latent Class 58 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 True 3.15 3.56 3.32 1.86 2.34 4.54 3.44 3.15 3.56 3.32 1.86 2.34 4.54 3.44 3.15 3.56 3.32 1.86 2.34 4.54 3.44 Site Loss Estimate 3.16 3.55 3.33 1.87 2.34 4.55 3.43 3.16 3.55 3.33 1.87 2.34 4.55 3.43 58 3.15 3.55 3.32 1.89 2.34 4.62 3.42 Quality Change True Estimate -1.18 -1.18 -1.29 -1.29 -1.20 -1.21 -0.76 -0.75 -0.91 -0.90 -1.59 -1.60 -1.24 -1.24 -1.18 -1.18 -1.29 -1.29 -1.20 
-1.21 -0.76 -0.75 -0.91 -0.90 -1.59 -1.60 -1.24 -1.24 -1.18 -1.19 -1.29 -1.32 -1.20 -1.22 -0.76 -0.69 -0.91 -0.87 -1.59 -1.77 -1.24 -1.27 After we exclude iterations with infinite site values, 904 iterations are used to compute the averages. 132 A.2.3 True Model-Nested Logit Model Simulation results are shown in the following tables. 133 Table A-17: Performance of Conditional Logit, Nested Logit and Latent Class Models When Nested Logit Model Is the True Model True Conditional Logit Nested Logit ̂ ̂ ̂ ̂ ̂ ̂ ̂ ̂ Latent 59 Class 59 ̂ ̂ ̂ ̂ Mean Var. MSE Min. Median Max. -0.06 -0.090 1.1e-05 9.1e-04 -0.10 -0.090 -0.079 0.49 0.64 4.2e-03 0.026 0.43 0.64 0.84 -8.17 -7.10 0.47 1.61 -8.97 -7.13 -4.74 -0.06 -0.060 1.7e-05 1.7e-05 -0.073 -0.060 -0.057 0.49 0.49 2.4e-03 2.4e-03 0.33 0.49 0.53 -8.17 -8.17 0.52 0.52 -10.28 -8.19 -5.40 -0.06 -0.095 1.2e-03 2.5e-03 -0.95 -0.091 -0.079 0.49 0.72 0.48 0.53 -1.33 0.65 16.07 -8.17 -7.02 0.55 1.88 -9.50 -7.04 -4.32 Results are from 999 iterations. 134 Table A-18: Welfare Estimates of Conditional Logit, Nested Logit and Latent Models When Nested Logit Model Is the True Model Site Conditional Logit Nested Logit Latent Class 60 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 True 3.67 1.84 2.61 3.59 2.34 1.66 2.23 3.67 1.84 2.61 3.59 2.34 1.66 2.23 3.67 1.84 2.61 3.59 2.34 1.66 2.23 Site Loss Estimate 3.69 2.14 2.80 3.66 2.12 1.56 1.98 3.66 1.84 2.61 3.59 2.34 1.66 2.23 60 3.69 2.15 2.76 3.67 2.11 1.58 1.97 Quality Change True Estimate -1.60 -1.38 -0.88 -0.85 -1.17 -1.07 -1.55 -1.35 -1.12 -0.91 -0.82 -0.70 -1.02 -0.84 -1.60 -1.60 -0.88 -0.88 -1.17 -1.17 -1.55 -1.56 -1.12 -1.12 -0.82 -0.82 -1.02 -1.01 -1.60 -1.47 -0.88 -0.77 -1.17 -1.07 -1.55 -1.45 -1.12 -0.85 -0.82 -0.62 -1.02 -0.79 After we exclude iterations with infinite site values, 986 iterations are used to compute the averages. 135 A.3 Equal Probability of Membership in Latent Class Model Simulation results are shown in the following tables. 
136 Table A-19: Performance of Latent Class Model When It Is the True Model 61 True Mean Var. MSE Min. Median Max. ̂ -0.06 -0.27 0.909 0.953 -8.14 -0.072 3.21 ̂ 0.49 2.98 789.1 794.5 -186.5 0.58 179.1 ̂ -0.10 -0.22 0.946 0.961 -8.76 -0.078 2.98 ̂ 0.21 0.025 8.47 8.50 -46.16 0.22 24.13 ̂ ̂ ̂ 0.50 0.33 0.062 0.091 0.0038 0.32 0.995 -8.17 -116.6 9.2e06 9.2e06 -68980 -7.26 22980 ̂ ̂ -2.10 -2.19 23.96 23.95 -22.23 -2.63 25.5 -0.08 -0.14 0.045 0.048 -2.02 -0.082 -0.039 0.35 0.47 5.52 5.52 -22.26 0.375 17.16 -5.13 -5.68 1213.6 1212.7 -772.4 -5.00 512.3 ̂ ̂ ̂ ̂ 61 Results are from 992 iterations. 137 Table A-20: Performance of Conditional Logit and Nested Logit Models When Latent Class Model Is the True Model True Conditional Logit Nested Logit ̂ ̂ ̂ ̂ ̂ ̂ ̂ ̂ Mean Var. MSE Min. Median Max. -0.08 -0.075 1.3e-05 3.9e-05 -0.086 -0.075 -0.064 0.35 0.35 0.031 0.031 -0.18 0.35 0.83 -5.13 -4.62 5.45 5.71 -11.11 -4.61 2.35 -0.08 -0.076 2.6e-05 4.3e-05 -0.092 -0.076 -0.061 0.35 0.33 0.035 0.036 -0.30 0.33 0.85 -5.13 -4.41 6.49 7.01 -13.33 -4.39 3.51 Table A-21: Estimated Values of Marginal Quality Change of Latent Class Model When It Is the True Model Class 1 Class 2 Average True -2.96 -2.83 -2.38 -0.68 -0.69 -0.73 -1.82 -1.76 -1.56 Mean -101.1 -5.05 -10.5 -0.80 -0.77 -0.62 -2.78 -1.70 -1.19 Var. 5.7e06 59411 2.6e06 2.66 2.61 2.79 777.3 52.4 149.8 138 MSE 5.7e06 59356 2.6e06 2.67 2.61 2.80 777.5 52.4 149.8 Min. -65810 -3165 -45320 -8.63 -8.12 -5.48 -737.4 -36.08 -284.8 Median -2.52 -2.48 -2.08 -0.89 -0.88 -0.86 -1.87 -1.78 -1.22 Max. 9373 6066 22390 6.77 7.16 11.57 309 199.7 239.6 Table A-22: Estimated Site Values of Latent Class Model When It Is the True Model 62 Class 1 63 Class 2 64 Average True 11.48 10.83 8.91 8.63 9.06 9.67 10.06 9.95 9.29 Mean 13.83 7.00 1.18 9.63 9.59 9.67 9.88 9.68 9.28 Var. 32094 5240.9 633723 1.06 0.88 1.38 14.85 7.38 39.60 MSE 32062 5249.6 633058 2.07 1.16 1.38 14.81 7.42 39.42 Min. 
-1864 -1535 -20840 6.60 7.12 6.27 -61.03 -55.06 -121.4 Median 10.22 9.97 8.93 9.71 9.64 9.50 10.0 9.82 9.35 Max. 3966 326.7 10120 25.22 25.11 26.4 53.7 17.57 117.9 Table A-23: Welfare Estimates of Conditional Logit and Nested Logit Models When Latent Class Model Is the True Model Site Conditional Logit Nested Logit 62 63 64 1 2 3 1 2 3 Site Loss True Estimate 10.06 9.94 9.95 9.81 9.29 9.32 10.06 9.95 9.95 9.84 9.29 9.29 Quality Change True Estimate -1.82 -1.60 -1.76 -1.56 -1.56 -1.45 -1.82 -1.53 -1.76 -1.50 -1.56 -1.39 After we exclude iterations with infinite site values, 874 iterations are used to compute the averages. After we exclude iterations with infinite site values, 953 iterations are used to compute the averages. After we exclude iterations with infinite site values, 835 iterations are used to compute the averages. 139 Appendix B Comparison between Driver License List and Census Data 140 The mail survey sample was drawn from Michigan’s driver license list (from the Michigan Office of the Secretary of State). Its demographic statistics are compared to 2010 census data for age and gender. The cut points in the age are from the census. 
Table B-1: Age and Gender Distribution of Census and Driver License List in Michigan for People Age 16 or Older

Michigan     Census     Driver     Census Male   Driver Male   Census Female   Driver Female
Age 16+      100.00%    100.00%    48.50%        49.78%        51.50%          50.22%
Age 18+      96.26%     97.76%     46.57%        48.63%        49.69%          49.13%
Age 21+      90.46%     92.78%     43.62%        46.08%        46.84%          46.70%
Age 62+      21.54%     22.48%     9.51%         10.20%        12.04%          12.27%
Age 65+      17.38%     18.20%     7.50%         8.12%         9.89%           10.08%

Table B-2: Age and Gender Distribution of Census and Driver License List for People Age 16 or Older, for the Upper Peninsula and Lower Peninsula

                   Census     Driver     Census Male   Driver Male   Census Female   Driver Female
Upper Peninsula
Age 16+            3.29%      3.09%      1.71%         1.56%         1.59%           1.53%
Age 18+            3.20%      3.02%      1.66%         1.52%         1.54%           1.50%
Age 21+            3.00%      2.89%      1.55%         1.46%         1.45%           1.43%
Age 62+            0.86%      0.89%      0.40%         0.42%         0.46%           0.46%
Age 65+            0.71%      0.73%      0.32%         0.34%         0.38%           0.39%
Lower Peninsula
Age 16+            96.71%     96.91%     46.69%        48.22%        49.92%          48.69%
Age 18+            93.07%     94.74%     44.92%        47.11%        48.15%          47.63%
Age 21+            87.46%     89.89%     42.07%        44.62%        45.39%          45.27%
Age 62+            20.68%     21.59%     9.10%         9.78%         11.58%          11.81%
Age 65+            16.68%     17.47%     7.17%         7.78%         9.50%           9.69%

As shown in the tables, the joint distribution of age and gender in the driver license list is very close to that of the census data. Therefore, the driver license list reasonably represents the general population of adults in the Lower Peninsula.

Appendix C Data Weights

The survey weights are constructed in stages, starting with the mail survey sample and ending with weights for the web survey respondents. This section describes each stage of the weights.

C.1 Mail Survey Sample Weights

The mail survey has a weighted random sample, with the purpose of recruiting as many participants in beach recreation as possible. Thus, the data need to be weighted back for the analysis. Originally, 60% of the sample was drawn from coastal counties and 40% from noncoastal counties in the Lower Peninsula.
After removing people who were deceased or had moved, this may no longer be the case, so the weights are calculated by county and applied to the effective sample of 29,613, where the base is the driver license list [65].

[65] Weights are computed as ratios of the percentages in the driver license list to the percentages in the sample, so that they are normalized and do not distort the original sample size.

Table C-1: Mail Survey Sample Weights for Counties in the Lower Peninsula

County Code   County Name       Sample Weight
1             Alcona            0.67
3             Allegan           0.73
4             Alpena            0.69
5             Antrim            0.69
6             Arenac            0.68
8             Barry             1.12
9             Bay               0.65
10            Benzie            0.68
11            Berrien           0.85
12            Branch            1.24
13            Calhoun           1.43
14            Cass              1.32
15            Charlevoix        0.67
16            Cheboygan         0.69
18            Clare             1.30
19            Clinton           1.18
20            Crawford          1.16
23            Eaton             1.21
24            Emmet             0.72
25            Genesee           1.39
26            Gladwin           1.11
28            Grand Traverse    0.67
29            Gratiot           1.10
30            Hillsdale         1.30
32            Huron             0.69
33            Ingham            1.34
34            Ionia             1.03
35            Iosco             0.68
37            Isabella          0.96
38            Jackson           1.24
39            Kalamazoo         1.35
40            Kalkaska          1.40
41            Kent              1.40
43            Lake              1.21
44            Lapeer            1.21
45            Leelanau          0.71
46            Lenawee           1.22
47            Livingston        1.16
50            Macomb            0.71
51            Manistee          0.64
53            Mason             0.76
54            Mecosta           1.08
56            Midland           1.18
57            Missaukee         1.06
58            Monroe            0.73
59            Montcalm          1.19
60            Montmorency       1.26
61            Muskegon          0.72
62            Newaygo           1.31
63            Oakland           1.40
64            Oceana            0.81
65            Ogemaw            1.22
67            Osceola           1.41
68            Oscoda            1.15
69            Otsego            1.29
70            Ottawa            0.67
71            Presque Isle      0.68
72            Roscommon         1.27
73            Saginaw           1.36
74            St. Clair         0.71
75            St. Joseph        1.49
76            Sanilac           0.66
78            Shiawassee        1.16
79            Tuscola           0.65
80            Van Buren         0.83
81            Washtenaw         1.42
82            Wayne             0.90
83            Wexford           1.29

C.2 Mail Survey Respondent Weights

A probit response/nonresponse model is run over the effective sample of 29,613 using the mail survey sample weights (Table C-1), with independent variables from the driver's license data (age, gender and counties). Variables that are not statistically significant at the 90% confidence level are not shown.
Table C-2: Results of a Probit Response/Nonresponse Model for the Mail Survey Using Sample Weights

                          Probit without County Dummies   Probit with County Dummies
Variables                 Estimates     t Statistics      Estimates     t Statistics
Age                       0.0139***     31.9              0.0138***     31.3
Gender                    0.138***      8.79              0.143***      9.05
Constant                  -0.917***     -38.4             -0.782***     -5.28
Macomb County (Coastal)                                   -0.256*       1.73
Wayne County (Coastal)                                    -0.412***     -2.80

Note: *10% significance level; **5% significance level; ***1% significance level

The results above suggest that respondents and nonrespondents to the mail survey differ demographically. To correct for possible response/nonresponse bias together with the sampling scheme, additional weights for the 9,591 eligible mail survey respondents are computed according to the joint distribution of age, gender and county, where the base is still the driver license list. There are eight age ranges (16-24, 25-34, 35-44, 45-54, 55-64, 65-74, 75-84 and 85+) and four county categories (Macomb, Wayne, other coastal counties and noncoastal counties). For the age 85+ category, there are only two county categories, coastal and noncoastal; otherwise some cells would contain fewer than 30 people, which could make the resulting weights unreliable.
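The cell weights described above appear to follow the same ratio rule as the sample weights (footnote 65): the driver license list share of a cell divided by the respondent share of that cell. The following is a minimal Python sketch under that assumption, using the male, age 16-24 shares from Tables C-3 and C-4. The function name `cell_weights` is hypothetical, and because the inputs are rounded percentages, the ratios can differ from the published weights in the second decimal.

```python
def cell_weights(population_share, respondent_share):
    """Return population share / respondent share for every cell."""
    weights = {}
    for cell, pop in population_share.items():
        resp = respondent_share[cell]
        if resp <= 0:
            raise ValueError(f"empty respondent cell: {cell}")
        weights[cell] = pop / resp
    return weights


# Shares in percent for males aged 16-24 (Tables C-3 and C-4).
license_list = {"Macomb": 0.62, "Wayne": 1.43, "Other coastal": 1.31, "Noncoastal": 3.98}
respondents = {"Macomb": 0.34, "Wayne": 0.46, "Other coastal": 0.82, "Noncoastal": 1.42}

weights = cell_weights(license_list, respondents)
for cell, w in weights.items():
    print(f"{cell}: {w:.2f}")
```

The printed ratios can be compared with the corresponding entries of Table C-5 (1.79, 3.11, 1.59 and 2.81); small gaps are expected from rounding and from the normalization of the published weights.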
Table C-3: Joint Age, Gender and County Distribution of Driver License List*

Gender   County          16-24   25-34   35-44   45-54   55-64   65-74   75-84   85+
Male     Macomb          0.62%   0.71%   0.77%   0.84%   0.65%   0.35%   0.21%   -
Male     Wayne           1.43%   1.71%   1.81%   1.82%   1.42%   0.71%   0.42%   -
Male     Other Coastal   1.31%   1.45%   1.45%   1.73%   1.50%   0.91%   0.50%   -
Male     Coastal         -       -       -       -       -       -       -       0.73%
Male     Noncoastal      3.98%   4.82%   4.61%   4.99%   4.11%   2.26%   1.21%   0.72%
Female   Macomb          0.59%   0.69%   0.77%   0.85%   0.68%   0.41%   0.29%   -
Female   Wayne           1.36%   1.54%   1.64%   1.72%   1.46%   0.82%   0.59%   -
Female   Other Coastal   1.20%   1.31%   1.37%   1.69%   1.51%   0.96%   0.62%   -
Female   Coastal         -       -       -       -       -       -       -       1.12%
Female   Noncoastal      3.76%   4.40%   4.41%   5.00%   4.28%   2.47%   1.57%   1.15%

Table C-4: Joint Age, Gender and County Distribution of 9,591 Eligible Mail Survey Respondents*

Gender   County          16-24   25-34   35-44   45-54   55-64   65-74   75-84   85+
Male     Macomb          0.34%   0.69%   0.62%   0.97%   1.06%   0.66%   0.33%   -
Male     Wayne           0.46%   0.74%   0.82%   1.25%   1.45%   1.06%   0.47%   -
Male     Other Coastal   0.82%   1.26%   1.55%   2.77%   2.93%   2.29%   1.15%   -
Male     Coastal         -       -       -       -       -       -       -       0.50%
Male     Noncoastal      1.42%   2.37%   2.43%   4.08%   4.29%   2.76%   1.23%   0.38%
Female   Macomb          0.47%   0.78%   0.90%   1.70%   1.38%   0.65%   0.50%   -
Female   Wayne           0.71%   1.14%   1.19%   2.02%   1.91%   1.27%   0.77%   -
Female   Other Coastal   0.94%   1.65%   2.07%   3.29%   3.54%   2.25%   1.24%   -
Female   Coastal         -       -       -       -       -       -       -       0.90%
Female   Noncoastal      1.74%   3.24%   3.45%   5.89%   5.60%   3.43%   1.65%   0.57%

*The distributions use the mail survey sample weights (Table C-1).

Table C-5: Mail Survey Respondent Weights [66]

Gender   County          16-24   25-34   35-44   45-54   55-64   65-74   75-84   85+
Male     Macomb          1.79    1.04    1.25    0.86    0.62    0.54    0.62    -
Male     Wayne           3.11    2.31    2.20    1.46    0.98    0.67    0.90    -
Male     Other Coastal   1.59    1.15    0.93    0.62    0.51    0.40    0.44    -
Male     Coastal         -       -       -       -       -       -       -       1.47
Male     Noncoastal      2.81    2.04    1.90    1.22    0.96    0.82    0.99    1.91
Female   Macomb          1.26    0.88    0.85    0.50    0.50    0.64    0.58    -
Female   Wayne           1.92    1.36    1.38    0.85    0.77    0.64    0.76    -
Female   Other Coastal   1.28    0.79    0.66    0.51    0.43    0.42    0.50    -
Female   Coastal         -       -       -       -       -       -       -       1.25
Female   Noncoastal      2.16    1.36    1.28    0.85    0.76    0.72    0.96    2.01

[66] The weights are normalized to the number of eligible mail survey respondents.

C.3 Web Survey Respondent Weights

Similarly, before the weights for the web survey data are calculated, a probit response/nonresponse model is run over the web sample of 5,476. The dependent variable is response/nonresponse to the web survey, and the independent variables are gender, age, race, education and employment, which were reported in the mail survey. The analysis is performed using the mail survey respondent weights (Table C-5). Variables that are not statistically significant at the 90% confidence level are not shown.

Table C-6: Results of a Probit Response/Nonresponse Model for the Web Survey Using Mail Survey Respondent Weights

Variables                         Estimates     t Statistics
Age                               0.00476***    2.72
White                             0.381***      2.84
Asian                             0.564**       2.34
Some Schooling                    4.54***       15.7
High School or Equivalent         4.73***       19.9
Associate's or Technical Degree   4.90***       20.5
College Degree                    5.19***       21.8
Advanced Degree                   5.16***       21.7
College or Equivalent             0.323***      7.07
Graduate Degree                   0.450***      7.33
Constant                          -5.52***      -12.5
Benzie County (Coastal)           1.03**        2.49
Hillsdale County (Noncoastal)     0.842*        1.95
Isabella County (Noncoastal)      0.704*        1.76
Leelanau County (Coastal)         0.797*        1.90
Roscommon (Noncoastal)            0.647*        1.85

Note: *10% significance level; **5% significance level; ***1% significance level

If many factors are taken into account to correct the response/nonresponse bias, the number of people in each elementary cell becomes small and the weights become large, which could inflate variances. Therefore, to reduce the number of factors, we run the regression reported in Table C-7. All variables that are not statistically significant in the previous regression (Table C-6) are dropped. Age dummy variables replace the continuous age variable for the purpose of weighting. There are only 56 Asians among the respondents, so the corresponding variable is not included. For education, the effects of having a college degree and an advanced degree are very similar, so a new dummy variable is created indicating whether a person has a college degree or not. All county dummies are collapsed into a single dummy that equals one if a person lives in one of the five statistically significant counties in Table C-6.

Table C-7: Results of a Probit Response/Nonresponse Model for the Web Survey Using Mail Survey Respondent Weights With Fewer Variables

Variable               Estimates     t Statistics
Age 16-24              0.401***      3.89
Age 25-34              0.338***      3.57
Age 35-44              0.438***      4.70
Age 45-54              0.561***      6.38
Age 55-64              0.751***      8.47
Age 65-74              0.587***      6.17
White                  0.289***      3.96
College degree         0.413***      9.81
Significant counties   0.446***      3.98
Constant               -0.712***     -6.65

Note: *10% significance level; **5% significance level; ***1% significance level

Hence, four factors (age, county, race and education) have significant effects on the weights for the 2,544 eligible web survey respondents. Since the number of people can be quite small in some categories, raking is used to compute the weights, rather than a comparison of joint distributions. The computation is implemented with a SAS raking macro [67], and the mail survey respondent weights are applied to both the web survey sample and the eligible respondents. Only people with no missing data in race and education enter the computation [68].
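Raking (iterative proportional fitting) adjusts the weights one factor at a time until the weighted margins of the respondents match the target margins. The following is a minimal Python sketch of the general technique, not the SAS macro used in this study. The function name `rake`, the four example records and the target shares are all hypothetical, and the sketch assumes every target category has at least one respondent.

```python
def rake(records, base_weights, targets, max_iter=100, tol=1e-10):
    """Iterative proportional fitting.

    records: list of dicts mapping factor name -> category.
    base_weights: starting weight for each record.
    targets: {factor: {category: target share}}, shares sum to 1 per factor.
    """
    weights = list(base_weights)
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for factor, shares in targets.items():
            total = sum(weights)
            # Current weighted total in each category of this factor.
            current = {}
            for rec, w in zip(records, weights):
                current[rec[factor]] = current.get(rec[factor], 0.0) + w
            # Scale each category so its share matches the target share.
            ratios = {cat: shares[cat] * total / current[cat] for cat in current}
            weights = [w * ratios[rec[factor]] for rec, w in zip(records, weights)]
            largest_adjustment = max(
                largest_adjustment, max(abs(r - 1.0) for r in ratios.values())
            )
        if largest_adjustment < tol:
            break
    return weights


# Hypothetical respondents described by two of the four factors.
records = [
    {"college": 0, "white": 0},
    {"college": 0, "white": 1},
    {"college": 1, "white": 0},
    {"college": 1, "white": 1},
]
base = [1.0, 1.0, 1.0, 1.0]
targets = {"college": {0: 0.6, 1: 0.4}, "white": {0: 0.2, 1: 0.8}}

raked = rake(records, base, targets)
print([round(w, 2) for w in raked])
```

Because only the margins are matched, raking does not require every cell of the full joint distribution to be well populated, which is the reason given above for preferring it here.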
Table C-8: Raking Weights for Web Survey Respondents with No Missing Data [69] (Non-Normalized [70])

Age Category   Significant County   College Degree   White   Web Weights
Age 16-24      0                    0                0       1.60
Age 16-24      0                    0                1       1.25
Age 16-24      0                    1                0       1.15
Age 16-24      0                    1                1       0.89
Age 16-24      1                    0                1       0.90
Age 16-24      1                    1                1       0.65
Age 25-34      0                    0                0       1.60
Age 25-34      0                    0                1       1.25
Age 25-34      0                    1                0       1.15
Age 25-34      0                    1                1       0.89
Age 25-34      1                    0                0       1.16
Age 25-34      1                    0                1       0.91
Age 25-34      1                    1                1       0.65
Age 35-44      0                    0                0       1.58
Age 35-44      0                    0                1       1.23
Age 35-44      0                    1                0       1.13
Age 35-44      0                    1                1       0.88
Age 35-44      1                    0                1       0.89
Age 35-44      1                    1                0       0.82
Age 35-44      1                    1                1       0.64
Age 45-54      0                    0                0       1.44
Age 45-54      0                    0                1       1.12
Age 45-54      0                    1                0       1.03
Age 45-54      0                    1                1       0.80
Age 45-54      1                    0                1       0.81
Age 45-54      1                    1                1       0.58
Age 55-64      0                    0                0       1.33
Age 55-64      0                    0                1       1.03
Age 55-64      0                    1                0       0.95
Age 55-64      0                    1                1       0.74
Age 55-64      1                    0                1       0.75
Age 55-64      1                    1                1       0.54
Age 65-74      0                    0                0       1.45
Age 65-74      0                    0                1       1.13
Age 65-74      0                    1                0       1.04
Age 65-74      0                    1                1       0.81
Age 65-74      1                    0                0       1.05
Age 65-74      1                    0                1       0.82
Age 65-74      1                    1                1       0.59
Age 75+        0                    0                0       3.26
Age 75+        0                    0                1       2.53
Age 75+        0                    1                0       2.33
Age 75+        0                    1                1       1.81
Age 75+        1                    0                1       1.84
Age 75+        1                    1                1       1.32

[67] The macro was developed by David Izrael, Abt Associates, June 1999.

[68] Missing data could be treated as a separate category; however, the percentage of missing data is too low for the raking weights to converge.

[69] The outcomes of the macro are individual-specific when the input data have weights, in our case the mail survey respondent weights. If the outcomes are divided by the input weights, the results are very similar among people in the same age, county, race and education category. Differences come from rounding errors. Therefore, we take averages of those results in the finest category and treat them as the raking weights for web survey respondents.

[70] The original outcomes are normalized to the total number of people with no missing data. When we divide them by the input weights, the normalization no longer holds.

For people with missing race data, we match them to the cells in Table C-8 that share their age, county and education, and use the average of those cells' web weights, weighted by the number of people in each cell. For example, consider a person in the Age 16-24 category with 0 for county and 0 for college. The matching cells in Table C-8 contain 12 non-White people and 106 White people, with weights of 1.60 and 1.25 respectively. The weight of this person is then calculated as:

\[ \frac{12 \times 1.60 + 106 \times 1.25}{12 + 106} \approx 1.29 \]

The same procedure is applied to people with missing data in education and in both race and education.
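The matching procedure for respondents with missing data can be sketched in a few lines of Python. This is an illustration only: it assumes that "weighted web weights" means an average of the matching Table C-8 weights, weighted by the number of people in each matching cell, and the function name `imputed_weight` is hypothetical.

```python
def imputed_weight(matching_cells):
    """Count-weighted average of raking weights over the matching cells.

    matching_cells: list of (number_of_people, raking_weight) pairs.
    """
    total_people = sum(n for n, _ in matching_cells)
    if total_people == 0:
        raise ValueError("no matching respondents")
    return sum(n * w for n, w in matching_cells) / total_people


# Age 16-24, county = 0, college = 0, race missing:
# 12 non-White respondents (weight 1.60) and 106 White respondents (weight 1.25).
weight = imputed_weight([(12, 1.60), (106, 1.25)])
print(round(weight, 2))
```

With the rounded weights shown in Table C-8, this gives about 1.29; a value computed from unrounded weights may differ slightly. When both race and education are missing, the same function would simply be given more matching cells.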
Table C-9: Raking Weights for Web Survey Respondents with Missing Data

Age Category   Significant County   Web Weights
Age 16-24      0                    1.51
Age 16-24      0                    1.15
Age 16-24      0                    1.28
Age 16-24      0                    0.91
Age 25-34      0                    1.04
Age 25-34      0                    1.29
Age 25-34      0                    0.93
Age 35-44      0                    1.02
Age 35-44      0                    1.26
Age 35-44      0                    0.90
Age 45-54      0                    1.21
Age 45-54      0                    0.99
Age 45-54      0                    1.14
Age 45-54      0                    0.82
Age 45-54      0                    1.00
Age 55-64      0                    0.91
Age 55-64      1                    0.65
Age 55-64      0                    1.05
Age 55-64      0                    0.75
Age 55-64      0                    0.92
Age 65-74      0                    0.96
Age 65-74      1                    0.79
Age 65-74      0                    1.14
Age 65-74      0                    0.81
Age 65-74      0                    0.97
Age 75+        0                    2.19
Age 75+        0                    1.83

(The College Degree and White columns are blank where the value is missing. Their entries could not be aligned to rows and are listed here in the order they appear: 0 1 0 1 1 0 1 1 0 1 0 1 0 1 1 1 0 1 1 1 0 1 1 1.)

Once all 2,544 eligible web respondents have their web survey weights, we multiply these by the corresponding mail survey weights and normalize the products to the size of 2,544, which gives the final weights for eligible web survey respondents.

Table C-10: Distribution of Normalized Final Weights for Web Respondents

Final Weight   Count   Percent
0.2 to 0.3     5       0.20%
0.3 to 0.4     158     6.21%
0.4 to 0.5     256     10.06%
0.5 to 0.6     425     16.71%
0.6 to 0.7     179     7.04%
0.7 to 0.8     302     11.87%
0.8 to 0.9     109     4.28%
0.9 to 1       267     10.50%
1 to 1.5       389     15.29%
1.5 to 2       259     10.18%
2 to 3         156     6.13%
3 to 4         35      1.38%
4 to 5         3       0.12%
5 to 6         1       0.04%

The wide range of individual weights may distort the analysis and inflate variances. Therefore, we use three censoring rules to trim the weights. The first is ad hoc, keeping the weights between 0.3 and 3; the second range is 0.4 to 2.3, where 163 people are censored on each side; the third range is 0.37 to 2.45, where approximately 5% of people are censored. Trimmed weights are then normalized to the size of 2,544. The three new sets of weights, as well as the original weights, are applied to eligible web respondents to compare their joint distribution on age, county, education and race with that of the web sample weighted by the mail survey respondent weights [71].
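The construction of the final weights and the censoring rules can be illustrated with a short Python sketch. The five pairs of weights below are hypothetical and the helper names `normalize` and `trim` are not from the original analysis; only the steps (multiply, normalize to the number of respondents, censor to a range, normalize again) follow the description above.

```python
def normalize(weights):
    """Rescale weights so that they sum to the number of respondents."""
    scale = len(weights) / sum(weights)
    return [w * scale for w in weights]


def trim(weights, lower, upper):
    """Censor weights to [lower, upper], then renormalize."""
    censored = [min(max(w, lower), upper) for w in weights]
    return normalize(censored)


# Hypothetical web raking weights and mail survey respondent weights.
web = [1.60, 0.89, 0.65, 3.26, 1.25]
mail = [3.11, 1.15, 0.43, 2.01, 0.79]

# Final weight = web weight x mail weight, normalized to the sample size.
final = normalize([a * b for a, b in zip(web, mail)])

# First censoring rule from the text: keep weights between 0.3 and 3.
trimmed = trim(final, 0.3, 3.0)

print([round(w, 2) for w in final])
print([round(w, 2) for w in trimmed])
```

Note that renormalizing after censoring rescales every weight, so a trimmed weight can end up slightly outside the censoring range.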
Although some discrepancies exist because of missing data, especially for older people, the differences are very small, so all four types of weights can be used in data analysis to correct for possible sampling and nonresponse bias. The analyses in chapters 2 and 3 use the non-censored weights.

[71] There are 87 possible combinations of values in age, county, education and race for the web sample, and 72 for web respondents, because people in some categories did not respond. The missing categories together account for about 0.6% of the sample, which is negligible.

Appendix D
Great Lakes Beach Recreation Participation

D.1 Participation in Various Activities

The summary of the mail survey data on leisure activities is presented below. The items are presented in the same order in which they appeared in the mail survey. The Great Lakes beach question, in the lower third of the table, is marked with an asterisk.

Table D-1: Participation in Leisure Activities

Activity                                  Participation Rate   Participation Rate (Mail Survey Respondent Weights)
Eat Dinner at a Restaurant                97.18%               97.26%
Go for a Walk or a Hike                   87.50%               88.24%
Attend or Participate in Outdoor Sports   65.59%               68.17%
Swim at a Pool, Lake or River             64.27%               68.70%
Go to a Movie in a Theater                66.95%               70.80%
Attend a Music Concert                    48.50%               49.69%
Attend a Cultural or Arts Festival/Fair   59.91%               59.35%
Visit County, City, or Township Park      73.72%               74.54%
Visit State Park or State Campground      52.09%               53.76%
Visit State Forest or State Game Area     25.04%               25.41%
Visit National Park or National Forest    20.44%               19.89%
Camping                                   30.75%               33.56%
Hunting                                   15.84%               16.77%
Fishing                                   32.30%               34.31%
Boating                                   45.96%               47.93%
Picnicking at Public Parks                45.79%               46.30%
Visiting a Beach                          64.20%               65.34%
Driving an All-Terrain Vehicle (ATV)      14.20%               15.81%
Snowmobiling                              6.82%                7.71%
Skiing or Snowboarding                    11.01%               12.73%
Visiting a Beach on the Great Lakes*      59.14%               58.01%
Fishing on the Great Lakes                14.39%               14.22%
Boating on the Great Lakes                21.86%               21.08%
Read Books                                77.48%               75.29%
Indoor/Outdoor Exercise                   82.79%               83.39%
Watch Television                          96.91%               96.58%
Use the Internet                          83.56%               85.82%
Play Video Games                          21.24%               26.31%
Play a Musical Instrument                 10.56%               11.92%
Volunteer                                 37.22%               35.53%

The three Great Lakes activities have slightly lower participation rates when the weights are applied, as expected, since coastal counties were oversampled.

D.2 Participation in Great Lakes Beach Recreation

To investigate which factors influence participation in beach recreation, a probit model is estimated with the mail survey respondent weights. The dependent variable is a binary variable indicating whether a person visited a Great Lakes beach, and the independent variables include demographics and county dummies. Variables that are not statistically significant at the 90% confidence level are not shown below. The results indicate that younger people and couples with children aged 6 to 17 are more likely to visit Great Lakes beaches, whereas African Americans, unemployed people and couples with children aged 5 and under are less likely to. The reported education and income categories all have negative coefficients and they cover the lower levels, so people with higher education and income are more likely to visit Great Lakes beaches. Also, as expected, people living in coastal counties are more likely to visit Great Lakes beaches than people from noncoastal counties. The only exception is Wayne County, a highly urbanized county.
Table D-2: Factors Influencing Participation in Great Lakes Beach Visitation

Variable                                          Estimates      t Statistics
Age                                               -0.00849***    -5.44
Black/African American                            -0.597***      -4.56
Some Schooling                                    -0.430***      -3.28
High School or Equivalent                         -0.328***      -5.76
Income: Less than $25,000                         -0.432***      -6.34
Income: $25,000 to $49,999                        -0.320***      -5.47
Income: $50,000 to $99,999                        -0.0859*       -1.65
Unemployment                                      -0.227*        -1.66
Household: Couple with Children Age 5 and Under   -0.162*        -1.66
Household: Couple with Children Age 6 to 17       0.138*         1.89
Constant                                          1.21***        3.76
Arenac County (Coastal)                           1.10**         2.42
Barry County (Noncoastal)                         -0.597*        -1.79
Benzie County (Coastal)                           0.833**        2.26
Berrien County (Coastal)                          0.674**        2.42
Cheboygan County (Coastal)                        0.832*         1.95
Emmet County (Coastal)                            0.824**        2.45
Grand Traverse County (Coastal)                   0.543*         1.91
Iosco County (Coastal)                            0.617*         1.72
Jackson County (Noncoastal)                       -0.485*        -1.67
Lenawee County (Noncoastal)                       -0.752**       -2.38
Manistee County (Coastal)                         1.13***        3.39
Muskegon County (Coastal)                         0.75***        2.68
Oakland County (Noncoastal)                       -0.503*        -1.93
Oceana County (Coastal)                           1.11***        2.91
Ottawa County (Coastal)                           0.654**        2.4
Saginaw County (Noncoastal)                       -0.493*        -1.77
Washtenaw County (Noncoastal)                     -0.52*         -1.93
Wayne County (Coastal)                            -0.543**       -2.1

Note: *10% significance level; **5% significance level; ***1% significance level

Appendix E
Model Sensitivity in Chapter 3

When the regional dummy variables are added to the traditional model with main destination, the estimated parameters on travel cost and the number of beaches in the aggregated site do not change much, so these two estimates are robust to the dummies. The estimated length parameter decreases by about 43.8%, and the estimated temperature parameter turns positive, a 122.6% increase. Both estimates are sensitive to the regional dummies, which suggests that beach quality is correlated with regional geographic characteristics.
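The two percentage changes quoted above can be reproduced from the Length and Temperature estimates in Table E-1. The following Python sketch is only a check of that arithmetic; the helper name `percent_change` is hypothetical, and the change is expressed relative to the absolute value of the estimate without regional dummies.

```python
def percent_change(before, after):
    """Change from `before` to `after`, as a percentage of |before|."""
    return 100.0 * (after - before) / abs(before)


# Estimates without and with regional dummies (Table E-1).
length_change = percent_change(0.283, 0.159)
temperature_change = percent_change(-0.0602, 0.0136)

print(f"Length: {length_change:.1f}%")
print(f"Temperature: {temperature_change:.1f}%")
```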
Table E-1: Parameter Estimates of Main Destination Model with and without Regional Dummies

                   No Regional Dummies           Regional Dummies
Variables          Estimates      t Statistics   Estimates      t Statistics
Travel Cost        -0.00327***    -6.47          -0.00381***    -6.67
Length             0.283*         1.90           0.159**        1.96
Temperature        -0.0602        0.658          0.0136         0.679
# of Beaches       0.0287**       2.25           0.0272**       2.32
LP Northeast                                     -0.853***      -2.83
LP Mid-East                                      -1.67***       -3.66
LP Southeast                                     -2.28***       -4.23
LP Northwest                                     -0.55*         -1.84
LP Mid-West                                      -0.566         -1.54
LP Southwest                                     -1.45***       -3.25
UP Lake Michigan                                 -0.941**       -2.21

Note: *10% significance level; **5% significance level; ***1% significance level

However, when regional dummy variables are added to the multiple-site model, they appear in three different places: in the nest of visiting one site, and in both the primary and secondary sites in the nest of visiting two sites. Every model attempted with this formulation failed to converge. Thus, we have dropped the regional dummies from the model in chapter 3, and their effects are manifested in part through the estimates for the length and temperature variables.

REFERENCES

Ben-Akiva, M. E. and S. R. Lerman (1985). Discrete choice analysis: theory and application to predict travel demand, The MIT Press.
Boxall, P. C. and W. L. Adamowicz (2002). "Understanding heterogeneous preferences in random utility models: a latent class approach." Environmental & Resource Economics 23(4): 421-446.
Burton, M. and D. Rigby (2009). "Hurdle and latent class approaches to serial non-participation in choice models." Environmental and Resource Economics 42(2): 211-226.
Caulkins, P. P., R. C. Bishop, et al. (1986). "The travel cost model for lake recreation: a comparison of two methods for incorporating site quality and substitution effects." American Journal of Agricultural Economics 68(2): 291-297.
Champ, P. A., K. J. Boyle, and T. C. Brown, eds. (2003). A primer on nonmarket valuation. Vol. 3. Springer.
Cutter, W. B., L. Pendleton, et al. (2007). "Activities in models of recreational demand." Land Economics 83(3): 370-381.
Deacon, R. T. and C. D. Kolstad (2000). "Valuing beach recreation lost in environmental accidents." Journal of Water Resources Planning & Management 126(6): 374.
Englin, J. and J. S. Shonkwiler (1995). "Estimating social welfare using count data models: an application to long-run recreation demand under conditions of endogenous stratification and truncation." The Review of Economics and Statistics: 104-112.
Englin, J. and J. S. Shonkwiler (1995). "Modeling recreation demand in the presence of unobservable travel costs: toward a travel price model." Journal of Environmental Economics and Management 29(3): 368-377.
English, E. (2008). "Recreation nonparticipation as choice behavior rather than statistical outcome." American Journal of Agricultural Economics 90(1): 186-196.
Greene, W. H. and D. A. Hensher (2003). "A latent class model for discrete choice analysis: contrasts with mixed logit." Transportation Research Part B-Methodological 37(8): 681-698.
Haab, T. C., M. Hamilton, et al. (2008). "Small boat fishing in Hawaii: a random utility model of ramp and ocean destinations." Marine Resource Economics 23(2): 137.
Haab, T. C. and R. L. Hicks (1997). "Accounting for choice set endogeneity in random utility models of recreation demand." Journal of Environmental Economics and Management 34(2): 127-147.
Haab, T. C. and K. E. McConnell (2002). Valuing environmental and natural resources: the econometrics of non-market valuation, Edward Elgar Publishing.
Haener, M. K., P. C. Boxall, et al. (2004). "Aggregation bias in recreation site choice models: resolving the resolution problem." Land Economics 80(4): 561-574.
Herriges, J. A. and C. L. Kling (1997). "The performance of nested logit models when welfare estimation is the goal." American Journal of Agricultural Economics 79(3): 792-802.
Hilger, J. and M. Hanemann (2006). "Heterogeneous preferences for water quality: a finite mixture model of beach recreation in Southern California."
Hoehn, J. P., T. Tomasi, F. Lupi, and H. Z. Chen (1996). An economic model for valuing recreational angling resources in Michigan. Michigan State University, Report to the Michigan Department of Environmental Quality.
Hynes, S., N. Hanley, et al. (2007). "Up the proverbial creek without a paddle: Accounting for variable participant skill levels in recreational demand modelling." Environmental and Resource Economics 36(4): 413-426.
Hynes, S., N. Hanley, et al. (2008). "Effects on welfare measures of alternative means of accounting for preference heterogeneity in recreational demand models." American Journal of Agricultural Economics 90(4): 1011-1027.
Izrael, D., D. C. Hoaglin, et al. (2000). A SAS macro for balancing a weighted sample. Proceedings of the Twenty-Fifth Annual SAS Users Group International Conference, Citeseer.
Kaoru, Y., V. K. Smith, et al. (1995). "Using random utility models to estimate the recreational value of estuarine resources." American Journal of Agricultural Economics 77(1): 141-151.
Kealy, M. J. and R. C. Bishop (1986). "Theoretical and empirical specifications issues in travel cost demand studies." American Journal of Agricultural Economics 68(3): 660-667.
Kim, H. N., W. D. Shaw, et al. (2007). "The distributional impacts of recreational fees: A discrete choice model with incomplete data." Land Economics 83(4): 561-574.
Kosenius, A. K. (2010). "Heterogeneous preferences for water quality attributes: The Case of eutrophication in the Gulf of Finland, the Baltic Sea." Ecological Economics 69(3): 528-538.
Leeworthy, V. R. and United States. National Ocean Service (2005). Projected participation in marine recreation: 2005 & 2010, US Dept. of Commerce, National Oceanic and Atmospheric Administration, National Ocean Service, Special Projects.
Lew, D. K. and D. M. Larson (2005). "Accounting for stochastic shadow values of time in discrete-choice recreation demand models." Journal of Environmental Economics and Management 50(2): 341-361.
Lew, D. K. and D. M. Larson (2008). "Valuing a beach day with a repeated nested logit model of participation, site choice, and stochastic time value." Marine Resource Economics 23(3): 233.
Loomis, J. B., S. Yorizane, et al. (2000). "Testing significance of multi-destination and multipurpose trip effects in a travel cost method demand model for whale watching trips." Agricultural and Resource Economics Review 29(2).
Lupi, F. and P. M. Feather (1998). "Using partial site aggregation to reduce bias in random utility travel cost models." Water Resources Research 34(12): 3595-3603.
Lupi, F., J. P. Hoehn, and G. C. Christie (2003). "Using an economic model of recreational fishing to evaluate the benefits of Sea Lamprey (Petromyzon marinus) Control on the St. Marys River." Journal of Great Lakes Research 29: 742-754.
McKean, J. R., R. G. Walsh, et al. (1996). "Closely related good prices in the travel cost model." American Journal of Agricultural Economics 78(3): 640-646.
Mendelsohn, R., J. Hof, et al. (1992). "Measuring recreation values with multiple destination trips." American Journal of Agricultural Economics 74(4): 926-933.
Moeltner, K. and J. S. Shonkwiler (2005). "Correcting for on-site sampling in random utility models." American Journal of Agricultural Economics 87(2): 327-339.
Morey, E. R., R. D. Rowe, et al. (1993). "A repeated nested-logit model of Atlantic salmon fishing." American Journal of Agricultural Economics 75(3): 578-592.
Morey, E. R., J. Thacher, et al. (2006). "Using angler characteristics and attitudinal data to identify environmental preference classes: a latent-class model." Environmental & Resource Economics 34(1): 91-115.
Murray, C., B. Sohngen, et al. (2001). "Valuing water quality advisories and beach amenities in the Great Lakes." Water Resources Research 37(10): 2583-2590.
NOAA GLERL (2013). Unpublished data, Great Lakes Coastal Forecasting System. NOAA Great Lakes Environmental Research Laboratory, Ann Arbor, MI, www.glerl.noaa.gov.
Owen, A. L. and J. R. Videras (2007). "Culture and public goods: the case of religion and the voluntary provision of environmental quality." Journal of Environmental Economics and Management 54(2): 162-180.
Parsons, G. R. and M. S. Needelman (1992). "Site aggregation in a random utility model of recreation." Land Economics: 418-433.
Parsons, G. R., A. K. Kang, et al. (2009). "Valuing beach closures on the Padre Island National Seashore." Marine Resource Economics 24(3).
Parsons, G. R. and A. J. Wilson (1997). "Incidental and joint consumption in recreation demand." Agricultural and Resource Economics Review 26: 1-6.
Patunru, A. A., J. B. Braden, et al. (2007). "Who cares about environmental stigmas and does it matter? a latent segmentation analysis of stated preferences for real estate." American Journal of Agricultural Economics 89(3): 712-726.
Provencher, B. and R. C. Bishop (1997). "An estimable dynamic model of recreation behavior with an application to Great Lakes angling." Journal of Environmental Economics and Management 33(2): 107-127.
Provencher, B. and R. C. Bishop (2004). "Does accounting for preference heterogeneity improve the forecasting of a random utility model? A case study." Journal of Environmental Economics and Management 48(1): 793-810.
Scarpa, R. and M. Thiene (2005). "Destination choice models for rock climbing in the Northeastern Alps: a latent-class approach based on intensity of preferences." Land Economics 81(3): 426-444.
Scarpa, R., M. Thiene, et al. (2007). "Latent class count models of total visitation demand: days out hiking in the Eastern Alps." Environmental & Resource Economics 38(4): 447-460.
Schuhmann, P. W. and K. A. Schwabe (2004). "An analysis of congestion measures and heterogeneous angler preferences in a random utility model of recreational fishing." Environmental and Resource Economics 27(4): 429-450.
Shaw, D. (1988). "On-site samples' regression: Problems of non-negative integers, truncation, and endogenous stratification." Journal of Econometrics 37(2): 211-223.
Shaw, W. D. and M. T. Ozog (1999). "Modeling overnight recreation trip choice: application of a repeated nested multinomial logit model." Environmental and Resource Economics 13(4): 397-414.
Shonkwiler, J. S. and W. D. Shaw (1996). "Hurdle count-data models in recreation demand analysis." Journal of Agricultural and Resource Economics: 210-219.
Shrestha, R. K., A. F. Seidl, et al. (2002). "Value of recreational fishing in the Brazilian Pantanal: a travel cost analysis using count data models." Ecological Economics 42(1): 289-299.
Smith, M. D. (2005). "State dependence and heterogeneity in fishing location choice." Journal of Environmental Economics and Management 50(2): 319-340.
Song, F., F. Lupi, and M. Kaplowitz (2010). Valuing Great Lakes beaches. Paper prepared for presentation at the Agricultural and Applied Economics Association Joint Annual Meeting.
Staum, P. (2007). "Fuzzy Matching using the COMPGED Function." Proceedings of the 2007 Northeastern SAS.
Tay, R., P. S. McCarthy, and J. J. Fletcher (1996). "A portfolio choice model of the demand for recreational trips." Transportation Research Part B: Methodological 30(5): 325-337.
Timmins, C. and J. Murdock (2007). "A revealed preference approach to the measurement of congestion in travel cost models." Journal of Environmental Economics and Management 53(2): 230-249.
Train, K. E. (2003). Discrete Choice Methods with Simulation, Cambridge University Press.
Train, K. E. (2008). "EM algorithms for nonparametric estimation of mixing distributions." Journal of Choice Modelling 1(1): 40.
Von Haefen, R. H., D. M. Massey, et al. (2005). "Serial nonparticipation in repeated discrete choice models." American Journal of Agricultural Economics: 1061-1076.
Von Haefen, R. H. and D. J. Phaneuf (2008). "Identifying demand parameters in the presence of unobservables: a combined revealed and stated preference approach." Journal of Environmental Economics and Management 56(1): 19-32.
Weicksel, S. A. Measuring preferences for changes in water quality at Great Lakes beaches using a choice experiment, Master's thesis, Michigan State University.
Yeh, C. Y., T. C. Haab, et al. (2006). "Modeling multiple-objective recreation trips with choices over trip duration and alternative sites." Environmental and Resource Economics 34(2): 189-209.