THREE ESSAYS IN ENVIRONMENTAL ECONOMICS

By

Andrew Earle

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Economics – Doctor of Philosophy
Environmental Science and Policy – Dual Major

2023

ABSTRACT

This dissertation studies the welfare generated by outdoor recreation. Chapters 1 and 2 study a high-profile form of recreation: U.S. national park visitation. Chapter 3 uses innovative data to study a classic topic: the value of water quality. All three chapters apply a two-stage estimation procedure that exploits panel variation in visitation and resource quality within a random utility maximization (RUM) travel cost model, the field’s “workhorse” model.

In Chapter 1, I conduct the most comprehensive analysis of demand for the U.S. National Park System to date. I create a versatile and unified framework to analyze demand for 140 national parks throughout the contiguous United States. Combining nationally representative surveys, park-level visitor counts, and a statistical atlas of park attributes, I estimate a RUM model of visitation from 2005 through 2019. The model produces estimates of park awesomeness and explains awesomeness using detailed park attribute data. Iconic parks like Glacier, Yellowstone, and Grand Canyon all rank among the most awesome parks. Visitors prefer parks with charismatic wildlife, like bison and bald eagles, wide-ranging elevation, and coastline.

My second chapter applies the data infrastructure and model from Chapter 1 to analyze how climate change will impact the welfare generated by national park visitation. I estimate visitor preferences for long-run average temperatures and short-run temperature deviations, and I use these preferences to simulate visitor welfare under future climate conditions. Visitors prefer temperatures between 70°F and 85°F, and on average, they dislike cold more than they dislike extreme heat.
Assuming limited changes to park resources, I find climate change will likely increase the welfare generated by national park visitation. The overall gains are driven by large benefits in cooler seasons that outweigh the losses from extreme heat in the summer. Chapter 3, co-authored with Hyunjung Kim, blends the modeling and estimation techniques from Chapters 1 and 2 with a high-frequency, administrative park visitation dataset. We quantify the losses from water quality-induced beach closures at Lake St. Clair Metropark in southeast Michigan. Our park visitation data include the residential ZIP code and exact minute of park entry for the universe of visits to the Huron-Clinton Metropark system. Our preferred model estimates a daily panel of park fixed effects and regresses the fixed effects on a beach closure indicator in a second stage. We estimate that the 2022 beach closures caused welfare losses of around $70,000. ACKNOWLEDGEMENTS Without the support of my mentors, colleagues, friends, and family, I would never have completed this dissertation or even dreamed of going to graduate school. My advisor, Soren Anderson, generously devoted his time, energy, and expertise to helping me improve my research. His vision and encouragement contributed massively to this final product. I am grateful for the opportunity to absorb some of his knowledge. Frank Lupi deserves similar praise. His advising, as well as the opportunity to work as his co-author, has made me a much better and more confident economist. I am thankful for many other MSU faculty I have learned from, including Kyoo il Kim, Oren Ziv, Mike Conlin, Justin Kirkpatrick, Todd Elder, Jeff Wooldridge, Stacy Dickert-Conlin, and Joe Herriges. I was fortunate to learn with and from my fellow graduate students, and I will always cherish our lunch table conversations, cookouts, and intramural games. 
I am also thankful for the opportunity to spend the summer of 2021 at the Property and Environment Research Center (PERC) in Bozeman, Montana. I thank the staff for their support of the graduate fellowship program, the faculty for treating me as a colleague, and the other graduate fellows for their feedback on my research and their friendship.

My interest in environmental economics originated in Wyoming during an undergraduate summer field course. I am grateful to the instructors, Mandi Lyons and Steve Latta, and my classmates for sharing their passion for the environment. After I returned to the University of Pittsburgh, Randy Walsh, Jeremy Weber, Andrea LaNauze, and Katherine Wolfe provided valuable mentorship as I explored my new interest and considered graduate school.

Finally, I am grateful for my friends and family. My community at University Lutheran Church has provided valuable perspective and support throughout my time in East Lansing. I am thankful for my parents and brothers, who have been a steady source of encouragement and energy for my entire life. My partner, Emma, has supported me unwaveringly and filled my life with joy in good times and bad. No one understands the trials and triumphs I have faced over the past five years better than her. I am also grateful for my grandfather, Carville, who nurtured my curiosity when I was little, and for my family, which continues to share memories of him with me.

TABLE OF CONTENTS

CHAPTER 1 VISITING AMERICA’S BEST IDEA: DEMAND FOR THE U.S. NATIONAL PARK SYSTEM
CHAPTER 2 THE WELFARE IMPACT OF CLIMATE CHANGE ON U.S. NATIONAL PARK SYSTEM VISITATION
CHAPTER 3 VALUING WATER QUALITY WITH HIGH-FREQUENCY DATA: EVIDENCE FROM MICHIGAN BEACH CLOSURES (WITH HYUNJUNG KIM)
BIBLIOGRAPHY

CHAPTER 1
VISITING AMERICA’S BEST IDEA: DEMAND FOR THE U.S. NATIONAL PARK SYSTEM

1.1 Introduction

In 1916, the National Park Service was created to conserve America’s most treasured natural resources. More than 100 years later, the National Park Service manages 424 parks that attract roughly 300 million visits each year. The national parks have also become part of America’s cultural identity. The novelist Wallace Stegner even called them “the best idea we ever had.”

Their unique resources, popularity, and cultural importance make the parks economically significant. The National Park Service estimates that visitors spend $20.5 billion in parks and their surrounding communities each year. Recent work by Szabó and Ujhelyi (2021) finds that national parks increase economic development, incomes, and employment, with spillover effects extending beyond the recreation and tourism sector. These economic contributions are one reason why national parks enjoy some degree of bipartisan political support.

Despite the importance of the national parks and the visitors they attract, there are large gaps in our knowledge of why people visit. The National Park Service’s internal research is often park-specific and based on infrequent surveys. Efforts to understand visitation at a system-wide level are rare, and they typically say little about specific park resources.

This paper analyzes demand for the U.S. National Park System with the goal of understanding preferences for the national parks and their attributes. I create a random utility maximization (RUM) model of visitation for 140 national parks, nearly all those protected for their natural resources, in which individuals repeatedly choose which park to visit and whether to drive or fly.
In the model, an individual’s visitation decisions depend on the travel costs of accessing each park and the mean utility provided by each park’s attributes. To allow park mean utilities to vary seasonally, the model includes a full set of park-by-month fixed effects, which I call “park effects.” These parameters represent the mean utility of visiting a park after controlling for the travel costs needed to get there, and they capture all of a park’s observable and unobservable attributes. In plain terms, they measure national park awesomeness.

I combine three types of data to estimate the model and understand preferences for the parks and their attributes. I obtain individual-level data on national park visitation from nationally representative telephone surveys administered by the National Park Service in 2008-2009 and 2018. I complement these survey data with monthly park visitor counts from the National Park Service’s Visitor Use Statistics. Finally, I consolidate a rich collection of data describing park attributes to build a statistical atlas of the national parks, allowing me to estimate preferences for attributes such as elevation, infrastructure, and the presence of charismatic wildlife.

I introduce a two-step estimation and calibration procedure to combine these data. The first step combines the survey and visitor count data in a maximum likelihood procedure to estimate the model during the survey periods, 2008-2009 and 2018. Using these estimates and the visitor count data, I calibrate a monthly panel of park effects from January 2005 through December 2019. The calibration uses annual American Community Survey microdata to account for demographic changes and calculate time-varying travel costs over the fifteen-year period. In the second step, I regress the park effects on a collection of park attributes.
For attributes that vary over time, such as weather, the park effects’ panel structure allows me to use fixed effects to control for potential omitted attributes.

I find that “bucket list” parks such as Yellowstone, Glacier, and Grand Canyon consistently rank in the top ten of my national parks awesomeness index. Observable park attributes explain 56% of the variation in the index. Visitors tend to prefer parks with redwood forests, bison, bald eagles, wide-ranging elevation, and shoreline. Many of these attributes vary little across time, which poses a challenge for causal inference. Yet, my estimated park effects reveal underexplored seasonal variation, making a causal interpretation more plausible for attributes that vary across time. For parks with harsh winters, willingness to pay peaks in the summer months, while parks with more moderate climates provide more stable mean utility throughout the year.

Most of the previous literature on recreation demand for U.S. national parks has focused on single parks or parks in a particular region (Walls, 2022). The limited number of nation-wide studies often focus on the local economic impacts of visitation (Szabó and Ujhelyi, 2021; Cullinane Thomas and Koontz, 2020). There has been little nation-wide research that explores preferences for park attributes using detailed visitation and park attribute data.1 By analyzing demand for 140 parks across the United States over fifteen years, building a structural model of individual visitation, and combining survey, visitor count, and extensive park attribute data, this paper constitutes the most comprehensive analysis of demand for the U.S. National Park System to date.

My estimation procedure makes two methodological contributions to the broader recreation demand literature. Recreation demand studies often employ random utility maximization (RUM) travel cost models to estimate the value of recreation sites or environmental attributes.
Most applications of RUM models analyze recreation demand for a single season, likely because RUM models are estimated with survey data, which are costly to collect. Yet, many interesting and important natural resource changes occur outside survey periods. First, my estimation and calibration procedure demonstrates how to combine site-level visitor counts with a structural RUM model to bridge gaps between individual surveys.

Second, my estimation and calibration approach allows for panel data econometric techniques to be applied within a RUM model. My two-stage procedure is similar to Murdock (2006), except that I estimate a panel of park effects in the first stage to preserve variation across and within parks for the second-stage regression. Murdock estimates only a cross-section of park fixed effects in the first stage, leaving second-stage estimates susceptible to omitted variables bias. Lupi et al. (2020) recommend Murdock’s approach as a best practice for RUM travel cost models and also emphasize the need for more rigorous identification in recreation demand studies. My estimation procedure provides a method to improve identification in RUM models through the use of panel data, and it builds on the recreation demand literature’s current best practices.

1 Both Henrickson and Johnson (2013) and Neher et al. (2013) model visitation to parks across the country as a function of park attributes, and Neher et al. use individual visitation data. However, both papers use a relatively small set of park attributes in their analysis.

The paper proceeds as follows. Section 1.2 describes the nationally representative telephone surveys, monthly park visitor counts, and the national park statistical atlas. Section 1.3 outlines how I calculate flying and driving travel costs. Section 1.4 presents the model of national park visitation. Section 1.5 details the two-step estimation procedure. Section 1.6 describes the results, and Section 1.7 concludes.
1.2 Park Visitation and Attribute Data

The main data sources for this project describe individual-level visitation, park-level visitation, and physical and institutional attributes of the national parks. This section describes each of these data sources.

The individual-level visitation data come from the National Park Service’s Comprehensive Survey of the American Public. The survey conducts telephone interviews with the primary goal of gauging sentiment towards the National Park Service, its management practices, and visitor experiences. The survey lasts approximately fifteen minutes and includes several questions regarding respondents’ visitation history. These questions include the location of each respondent’s most recent national park visit and the number of times they have visited the National Park System in the two years prior to the interview. For a random subset of the sample, I also observe whether respondents drove or flew on their most recent visit.

Several characteristics of the Comprehensive Survey of the American Public make it a uniquely useful data source for studying national park visitation. First, it is nationally representative. Phone numbers are selected using a regionally stratified random sampling design, and individual respondents are randomly selected within each household. The data include weights to account for the regional stratification and match sample demographic statistics to the Census, so weighted sample demographics closely match the general population (Table 1.1). I use these weights throughout my analysis. The sampling design includes both visitors and non-visitors, which allows me to model the extensive margin, that is, the choice of whether or not to visit a national park.
Table 1.1: Telephone Survey Descriptive Statistics

                              Unweighted   Weighted   Census
Age
  18-29                             11.8       21.3     23.6
  30-39                             13.5       16.3     16.9
  40-49                             16.7       16.7     18.4
  50-59                             24.1       20.8     17.5
  60-69                             18.5       14.3     12.0
  70+                               15.1       10.4     11.3
Income
  Less than $10,000                  4.5        6.0     12.6
  $10,000 to $25,000                 9.5       11.0     15.0
  $25,000 to $50,000                20.3       23.2     23.5
  $50,000 to $75,000                20.8       22.2     18.9
  $75,000 to $100,000               17.3       15.9     13.5
  $100,000 to $150,000              15.4       13.1     10.7
  Greater than $150,000             12.0        8.3      5.4
Education
  Some high school                   3.5        5.5
  High school degree                36.8       46.9
  College degree                    35.8       32.0
  Graduate degree                   22.8       14.6
Has child                           29.7       35.3     38.4
White, non-Hispanic                 75.0       67.9     63.7
Region
  Alaska                            14.1        0.2      0.2
  DC only                           11.6        0.2      0.1
  Intermountain                     14.9       14.9     15.1
  Midwest                           14.6       22.9     22.6
  Northeast                         15.1       22.9     23.2
  Pacific                           14.8       16.8     17.3
  Southeast                         14.7       21.8     21.2
Visited in past 2 years             67.9       61.7
Avg number of visits                 9.2        4.7
Flew (Subsample N = 1537)           13.5       12.6
Sample size                         6762       6762

Note: The table shows the share of respondents in various demographic groups for the pooled 2008-2009 and 2018 Comprehensive Survey of the American Public survey data compared to statistics from 2010 Census data. Weights are included in the survey and match survey statistics to Census averages. Thus, the weighted variable means align closely with Census means. The unweighted sample tends to be older, richer, and more likely to be white, non-Hispanic than the general population.

Another useful feature is that the survey was conducted twice: once in 2008 and 2009 and again in 2018. The two iterations are similar, with identical questions on visitation history. The seasonal timing of interviews varies slightly between the two iterations. The 2008 and 2009 interviews were split evenly between seasons to account for seasonal variation in visitation. The 2018 survey, citing a lack of seasonality in the 2008 and 2009 data, conducted interviews from June through November.

The Comprehensive Survey of the American Public also has limitations.
First, I observe respondents’ home locations imprecisely. In the 2008 iteration, the data include each respondent’s telephone area code and their state of residence. When the area code is within the state of residence, I take the largest city in the area code as the home city when calculating travel costs. When I only observe the state of residence, or the area code and state of residence do not match, I randomly sample a home city according to the state’s population distribution. Second, the survey does not include any information on visit dates, only that the visits occurred within two years of the interview. This limits my ability to capture seasonal variation in certain parameters, including the travel cost coefficient. I discuss the implications for estimation in Section ??. Finally, many less visited national parks never appear as a most recent visit for any respondent. This poses challenges for an estimation based on survey data alone. These shortcomings suggest more visitation data is needed to monitor visitation more thoroughly, motivating the use of park-level visitor counts. I use park-level visitor count data from the National Park Service’s Visitor Use Statistics database. The counts have a broad temporal and geographic scope, dating back to 1905 for the oldest parks and covering 383 national parks in recent years. I use counts from January 2005 through December 2019, because this period overlaps closely with the individual-level survey data and the American Community Survey microdata. Counting procedures vary by park and typically involve National Park Service rangers at entry booths and/or strategically placed vehicle counters. Parks use person-per-vehicle multipliers to convert vehicle counts to person counts. Busy peak seasons, available technology, and often remote locations make it difficult to obtain exact counts in some cases. 
Nonetheless, the Visitor Use Statistics are used administratively and in many academic studies (Henrickson and Johnson, 2013; Fisichelli et al., 2015; Keiser et al., 2018).

I adjust the raw visitor count data to make them suitable for recreation demand modeling. This process accounts for re-entry, group size, international visitation, and the primary purpose of trips. Each of these variables is measured in on-site surveys conducted by the National Park Service. I use 105 on-site surveys conducted at 69 different parks between 1995 and 2019. For parks that have not conducted an on-site survey, I impute missing information based on observable park attributes. After imputation, I have a park-month panel of re-entry rates, average group sizes, proportions of international visitors, and proportions of primary purpose trips. I use the panel to convert raw visitor counts to the number of primary purpose visits.

Table 1.2: Park Attribute Data Sources

Source                    Variables
USGS National Map         Elevation range, mean elevation, trail miles, number of lakes > 40 acres, area of lakes > 40 acres
NPS Administrative Data   Designation (Park, Lakeshore, Seashore, etc.), acreage, coastal, miles of shoreline, species presence
2004 NLCD                 Share of land by landcover type, mode landcover type, landcover diversity
Census                    Road miles, population density of overlapping counties
NCEI                      Monthly average high temperature, days with precipitation > 0.1”, monthly ten-year average temperature and precipitation days

Note: The table shows data sources for park attributes and variables generated from them. NPS Administrative Data include the NPSpecies database, Annual Acreage Reports, and a 2011 Resource Report on Shoreline length. NCEI data come from weather station-based Global Summary of the Month reports. NLCD: National Land Cover Database; NCEI: National Centers for Environmental Information.
To understand visitor preferences for park attributes, I consolidate several datasets describing the national parks themselves. Table 1.2 shows the full list of data sources and the variables I generate from them.

1.3 How much does it cost to visit the national parks?

This section describes the procedure for computing travel costs. I calculate travel costs at a quarterly frequency for every individual in the nationally representative telephone surveys, as well as every individual in the American Community Survey microdata between 2005 and 2019. I use these microdata to calibrate the model outside the survey period. Travel costs include the time and money required for individuals to access each of the national parks. These calculations largely follow English et al., who also compute driving and flying travel costs at a national scale.

To compute driving travel costs, I calculate the driving mileage and time from each respondent’s home location to each national park using PC*Miler. I multiply mileage by the marginal cost of driving, which I calculate with per-mile maintenance costs from annual AAA reports and regional gas prices from the Energy Information Administration. For every twelve hours of driving, I add the average U.S. hotel rate. I also make the standard assumption that the cost of travel time is one-third of a respondent’s hourly wage rate.

Flying travel costs include travel time, plus the cost of driving to the origin airport, airfare, and the cost of driving from the destination airport to the park, which may include rental car prices. Quarterly average airfare data come from the U.S. Department of Transportation’s Consumer Airfare Report (2015), which includes the average airfare for flights between city markets, rather than individual airports. I use the 2012 average rental car price from English et al., adjusting for inflation to approximate rental car prices in other years.
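The driving travel cost calculation described in this section can be illustrated with a minimal sketch. The function name, argument names, and example values are hypothetical. The sketch assumes that costs are computed for the round trip and that one hotel night is added for each full twelve hours of round-trip driving; the text does not specify either detail.

```python
import math

def driving_travel_cost(miles_one_way, hours_one_way, cost_per_mile,
                        hotel_rate, hourly_wage):
    """Round-trip driving travel cost in dollars (illustrative only)."""
    miles = 2 * miles_one_way
    hours = 2 * hours_one_way
    # Out-of-pocket vehicle cost: mileage times the marginal cost of driving
    # (fuel plus per-mile maintenance).
    vehicle_cost = miles * cost_per_mile
    # Assumption: one hotel night for every full twelve hours of driving.
    lodging_cost = math.floor(hours / 12) * hotel_rate
    # Travel time valued at one-third of the hourly wage rate.
    time_cost = hours * hourly_wage / 3
    return vehicle_cost + lodging_cost + time_cost

# Hypothetical traveler: 500 miles and 8 hours each way, $0.20 per mile,
# a $100 hotel rate, and a $30 hourly wage.
example_cost = driving_travel_cost(500, 8, 0.20, 100, 30)
```

For the hypothetical traveler, the three components are $200 of vehicle costs, $100 of lodging, and $160 of time costs.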
For each individual-park pair, I compute travel costs for all routes originating at one of the four city-markets closest to the respondent’s home and ending at one of the four city-markets closest to the park. I select the cheapest of these routes as the individual’s travel cost of flying to the park.

Figure 1.1 shows the flying and driving travel costs for a subset of the telephone survey sample. Driving travel costs increase approximately linearly with driving distance, with different slopes for each income bin. On average, flying is more expensive than driving for trips under 1,600 miles but is cheaper at longer distances, matching calculations from English et al.

Figure 1.1: Travel costs increase with distance
Note: The figure plots round-trip travel costs against one-way driving distance for a three percent subset of the 2008 survey sample. Brown circles show driving travel costs, and blue x’s show flying travel costs. Lines show average travel costs conditional on distance for both driving (brown, solid) and flying (blue, dashed). On average, flying travel costs increase more gradually with distance.

1.4 A Model of National Park Visitation

In this section, I outline a model describing the choices of which national park to visit and how to travel. By jointly modeling the choice of park and the choice of travel mode, I build on both the recreation demand literature, which typically focuses solely on location choice, and the transportation literature, which has a rich history modeling travel mode choices (McFadden, 1974).2 The model also shares similarities with work by Chintagunta et al. (2005), which allows for time-varying mean utilities in a model of demand for margarine.

Suppose that each month individuals choose whether to visit a national park, which national park to visit, and whether to drive or fly to the park. Denote the set of national parks $\mathcal{J} = \{1, 2, \ldots, J\}$ and the set of travel modes $\mathcal{M} = \{D, F\}$, where $D$ and $F$ indicate driving and flying, respectively. Let $j = 0$ denote the outside option, each individual’s best way of spending the month that does not involve visiting a national park. Because visits to the National Park System’s historic sites are included in the data but differ from visits to nature-centered national parks, I group visits to historic sites into a second outside option, $j = J + 1$. Given this choice set, let the utility of individual $i$ from choosing alternative $j$ and travel mode $m$ in month $t$ be

$$
U_{ijmt} =
\begin{cases}
\delta_{0t} + \epsilon_{i0t}, & j = 0 \\
\delta_{jt} + \beta_{TC}\, TC_{ijDt} + \epsilon_{ijDt}, & j \in \{1, \ldots, J\},\ m = D \\
\delta_{jt} + \beta_{F} + \beta_{TC}\, TC_{ijFt} + \epsilon_{ijFt}, & j \in \{1, \ldots, J\},\ m = F \\
\delta_{J+1,t} + \epsilon_{i,J+1,t}, & j = J + 1
\end{cases}
\tag{1.1}
$$

$$
\equiv
\begin{cases}
V_{0t} + \epsilon_{i0t}, & j = 0 \\
V_{ijmt} + \epsilon_{ijmt}, & j \in \{1, \ldots, J\},\ m \in \{D, F\} \\
V_{i,J+1,t} + \epsilon_{i,J+1,t}, & j = J + 1
\end{cases}
\tag{1.2}
$$

In equation 1.1, coefficient $\beta_{TC}$ represents the marginal disutility of travel costs, and $\beta_{F}$ represents the fixed cost of flying relative to driving. For $j \in \{1, \ldots, J\}$, I call the park-by-month fixed effect, $\delta_{jt}$, the park effect. It captures the mean utility provided by park $j$ in month $t$ after controlling for travel costs. Ranking the park effects produces a national park awesomeness index. I decompose the park effects further:

$$
\delta_{jt} = X_{jt}\alpha + \nu_{jt}, \tag{1.3}
$$

where $X_{jt}$ contains observable park attributes, $\alpha$ is a coefficient vector, and $\nu_{jt}$ is unobservable.

Assume the error term, $\epsilon_{ijmt}$, follows a generalized extreme value distribution that groups all visit alternatives into a single nest and leaves the no-visit alternative in a nest of its own. The resulting nested logit choice probabilities are

$$
P_{ijmt} =
\begin{cases}
\dfrac{\exp(V_{0t})}{\exp(V_{0t}) + \left(\sum_{k=1}^{J+1}\sum_{n \in \mathcal{M}} \exp\left(\frac{V_{iknt}}{\lambda}\right)\right)^{\lambda}}, & \text{if } j = 0 \\[3ex]
\dfrac{\exp\left(\frac{V_{ijmt}}{\lambda}\right)}{\sum_{k=1}^{J+1}\sum_{n \in \mathcal{M}} \exp\left(\frac{V_{iknt}}{\lambda}\right)} \cdot \dfrac{\left(\sum_{k=1}^{J+1}\sum_{n \in \mathcal{M}} \exp\left(\frac{V_{iknt}}{\lambda}\right)\right)^{\lambda}}{\exp(V_{0t}) + \left(\sum_{k=1}^{J+1}\sum_{n \in \mathcal{M}} \exp\left(\frac{V_{iknt}}{\lambda}\right)\right)^{\lambda}}, & \text{if } j \in \{1, \ldots, J\} \\[3ex]
\dfrac{\exp\left(\frac{V_{i,J+1,t}}{\lambda}\right)}{\sum_{k=1}^{J+1}\sum_{n \in \mathcal{M}} \exp\left(\frac{V_{iknt}}{\lambda}\right)} \cdot \dfrac{\left(\sum_{k=1}^{J+1}\sum_{n \in \mathcal{M}} \exp\left(\frac{V_{iknt}}{\lambda}\right)\right)^{\lambda}}{\exp(V_{0t}) + \left(\sum_{k=1}^{J+1}\sum_{n \in \mathcal{M}} \exp\left(\frac{V_{iknt}}{\lambda}\right)\right)^{\lambda}}, & \text{if } j = J + 1
\end{cases}
\tag{1.4}
$$

For visit alternatives, the choice probabilities include two terms. The first indicates the probability of visiting a specific park using a specific travel mode, conditional on choosing a visit alternative. The second term indicates the probability of choosing any visit alternative.
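The two-term structure of the visit probabilities in equation 1.4 can be checked numerically with a short sketch. The function below is illustrative rather than the dissertation's code: the function and argument names are hypothetical, it handles a single individual in a single month, and it treats every visit alternative (each park-mode pair plus the historic-site option) as one entry of a flat utility vector.

```python
import numpy as np

def nested_logit_probs(v_visit, v_outside, lam):
    """Nested logit probabilities with one visit nest and a no-visit option.

    v_visit   : systematic utilities of all visit alternatives
    v_outside : systematic utility of the no-visit alternative
    lam       : dissimilarity coefficient, assumed to lie in (0, 1]
    Returns (probability of no visit, array of visit probabilities).
    """
    v_visit = np.asarray(v_visit, dtype=float)
    scaled = np.exp(v_visit / lam)
    inclusive_sum = scaled.sum()
    nest_term = inclusive_sum ** lam
    denom = np.exp(v_outside) + nest_term
    # Single-term probability of the no-visit alternative.
    p_outside = np.exp(v_outside) / denom
    # First term: alternative choice conditional on visiting.
    # Second term: probability of choosing any visit alternative.
    p_visit = (scaled / inclusive_sum) * (nest_term / denom)
    return p_outside, p_visit

# Hypothetical utilities for three visit alternatives and the outside option.
p_none, p_parks = nested_logit_probs([0.5, -0.2, 0.1], 1.0, 0.6)
```

Setting `lam` to one in this sketch reproduces ordinary conditional logit probabilities over all four alternatives, which is a convenient sanity check.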
If an individual chooses not to visit, then they do not select a specific park and travel mode. Thus, the no-visit choice probability has only one term. The literature often refers to the parameter, $\lambda$, as the dissimilarity coefficient. For consistency with random utility maximization, $\lambda$ is bounded between zero and one. Higher values of $\lambda$ indicate more dissimilar alternatives in the visit nest, and $\lambda$ equal to one simplifies the probabilities to match the conditional logit model.

1.5 A two-step approach to estimate demand

This section describes the procedure to estimate demand for national park visitation. My procedure builds on Murdock (2006), who introduces a two-step approach for estimating recreation demand models. My estimation procedure is also similar to the maximum likelihood estimator proposed by Berry et al. (2004), which combines micro and macro-level data to estimate a demand system.

1.5.1 Step 1a: Maximum likelihood estimation

To begin, I use the survey data and visitor count data to estimate the parameters in equation 1.1. These include the marginal disutility of travel costs, the relative preference for flying versus driving, and the observable heterogeneity parameters. Because the survey data do not include the date of respondents’ visits, I effectively observe a cross-section of visitation choices for each survey period. Thus, I drop the $t$ subscript from the model and estimate a constant park effect for each survey period in Step 1a. After this initial estimation, I use the monthly visitor counts to recover a panel of park effects in Step 1b (below).

I specify a three-part likelihood function that incorporates all the visitation choices observed in the survey data: each individual’s most recently visited park, number of visits, and travel mode (when observed). Using the choice probabilities from equation 1.4, the likelihood of observing individual $i$’s visitation choices is

$$
L_i(\beta, \delta) = \underbrace{\left(\prod_{j=0}^{J} \prod_{m \in \mathcal{M}} P_{ijm}^{\,y_{ijm}}\right)}_{(1)} \; \underbrace{\left(1 - P_{i0}\right)^{v_i}}_{(2)} \; \underbrace{\left(P_{i0}\right)^{24 - 1 - v_i}}_{(3)} \tag{1.5}
$$

The first term represents the likelihood of individual $i$’s most recent visit, where $y_{ijm}$ indicates the alternative chosen on that visit. For this visit, I observe the park visited, and for a subset of respondents, I also observe the travel mode. The second term represents the likelihood of all other visits in the two years prior to the interview, where $v_i$ indicates the number of visits in the past two years excluding the most recent visit. The third term represents the likelihood from all non-visits in the two years prior to the interview. If an individual never visits a national park in the two years prior to their interview, then they choose the no-visit alternative for each of the 24 months prior to their interview.

When maximizing the log likelihood function, I constrain the visitation shares predicted by the model to match the visitation shares observed in the visitor count data. I impose this constraint by applying the contraction mapping introduced by Berry:

$$
\delta^{n+1} = \delta^{n} + \ln(s) - \ln\left(\hat{s}(\delta^{n}, \beta)\right)
$$

As the optimization routine iterates over values of $\beta$, the contraction mapping solves for the unique vector of park effects, $\delta$, that matches the observed and predicted visitation shares.

Incorporating the contraction mapping has several practical benefits. First, it allows me to simultaneously incorporate visitation information from the surveys and the visitor counts. Second, the contraction mapping solves for the park effects, so the optimization routine must search only over the remaining non-linear parameters, $\beta$, reducing the estimation time. These features of the contraction mapping also allow me to pin down the park effects for parks that are never chosen in the survey data.

By estimating a full cross-section of park effects, I control for bias from unobserved park attributes when estimating the remaining non-linear parameters.
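To make the share-matching step concrete, the following sketch applies the contraction mapping in a deliberately simplified setting. It is an illustration under stated assumptions, not the estimation code: it uses conditional logit probabilities (the special case in which the dissimilarity coefficient equals one), a single travel mode, unweighted individuals, and an outside option with utility normalized to zero. All function and variable names are hypothetical.

```python
import numpy as np

def predicted_shares(delta, beta_tc, travel_cost):
    """Average conditional logit visit probabilities across individuals.

    delta       : (J,) park effects
    beta_tc     : travel cost coefficient (negative for a disutility)
    travel_cost : (N, J) individual-by-park travel costs
    The outside option's utility is normalized to zero (assumption).
    """
    v = delta[None, :] + beta_tc * travel_cost
    exp_v = np.exp(v)
    probs = exp_v / (1.0 + exp_v.sum(axis=1, keepdims=True))
    return probs.mean(axis=0)

def solve_park_effects(observed_shares, beta_tc, travel_cost,
                       tol=1e-12, max_iter=10_000):
    """Iterate delta <- delta + ln(s) - ln(s_hat(delta, beta)) to a fixed point."""
    observed_shares = np.asarray(observed_shares, dtype=float)
    delta = np.zeros_like(observed_shares)
    for _ in range(max_iter):
        step = (np.log(observed_shares)
                - np.log(predicted_shares(delta, beta_tc, travel_cost)))
        delta = delta + step
        if tol > np.max(np.abs(step)):
            break
    return delta

# Simulated check: build shares from known park effects, then recover them.
rng = np.random.default_rng(0)
costs = rng.uniform(0.1, 2.0, size=(200, 4))
true_delta = np.array([0.5, -0.3, 0.1, -1.0])
shares = predicted_shares(true_delta, -1.0, costs)
recovered_delta = solve_park_effects(shares, -1.0, costs)
```

In this simplified setting the loop recovers the park effects that generated the simulated shares. The sketch needs only the observed shares, a travel cost coefficient, and a sample of individuals; it uses no individual-level choice data.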
In industrial organization settings, firms set prices and likely charge a higher price for products with desirable unobservable attributes. The correlation between price and unobserved attributes biases naive estimates, and it has led to the widespread use of instrumental variables when estimating demand systems. In the recreation demand setting, travel costs are not set by firms directly, but unobserved park attributes, such as remoteness, may still be correlated with travel costs. Including a full set of park effects controls for all observed and unobserved park attributes when estimating the travel cost coefficient (Murdock, 2006). This is possible because, unlike prices in the industrial organization context, travel costs vary at the individual-level and can be separately identified from park fixed effects. Geographic sorting remains an identification concern (Parsons et al., 2021). Individuals who value national parks may choose their residential location to reduce their travel costs. If individuals with low travel costs value national parks more highly than those far away and would visit more often, even conditional on travel costs, then the marginal disutility of travel cost will be overstated. This bias would overstate the value of money relative to park attributes and subsequently bias willingness to pay estimates towards zero. Few travel cost papers address potential bias from sorting, and because of the limited use of travel cost methods at a national scale, the magnitude of the potential bias is unclear. 1.5.2 Step 1b: Calibrating a monthly panel of park effects With estimates of the non-linear parameters, V, in hand, I now recover a monthly panel of park effects by applying the contraction mapping month-by-month, from January 2005 through December 2019. Calibration outside the survey period raises several concerns. Population demographics may change meaningfully over the fifteen-year sample period. 
To account for this possibility, I calibrate the model using annual American Community Survey (ACS) microdata samples from 2005 to 2019 (Ruggles et al., 2021). The microdata contain key demographic variables, such as income, family structure, and age, and they are nationally representative. Both these features are critical for using survey-based estimates from step 1a to predict visitation shares for the ACS sample.

The calibration procedure also requires assumptions on the stability of the non-linear parameters across time. In this paper, I assume the non-linear parameters are constant across the entire fifteen-year calibration period. While this is not necessary, the assumption has empirical justification. In a preliminary robustness check, I allow the non-linear parameters to vary between the 2008 and 2018 survey periods and obtain similar estimates. In a similar model of recreational marine fishing, Dundas and von Haefen (2020) allow travel cost coefficients to vary annually and obtain fairly stable estimates from 2004 through 2009.

Given these assumptions, I calculate individual choice probabilities for each individual in the ACS microdata samples. These choice probabilities imply predicted visitation shares for each park in each month. Recall that the visitor count data also have a monthly panel structure. Beginning with January 2005, I apply the contraction mapping to obtain the unique vector of park effects that matches the predicted and observed visitation shares. Iteratively applying the contraction mapping month-by-month produces a full panel of park effects through December 2019.

The key insight is that applying the contraction mapping to solve for park effects does not require individual-level choice data. Instead, one only needs an estimate of the non-linear parameters, a reasonable microdata sample, and observed visitation shares.
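The calibration step can be sketched in Python as a loop over months. This is an illustrative simplification, not the procedure's actual code: it assumes a plain logit, a fixed hypothetical travel cost coefficient, and a two-record stand-in for a weighted microdata sample that is reused in every month (the procedure described above uses annual ACS samples).

```python
import math

def weighted_shares(delta, beta_tc, people):
    """Population-weighted simple-logit visitation shares.

    people is a list of (weight, travel_costs) pairs standing in for
    microdata records with sampling weights.
    """
    total_weight = sum(w for w, _ in people)
    shares = [0.0] * len(delta)
    for w, costs in people:
        expu = [math.exp(d + beta_tc * c) for d, c in zip(delta, costs)]
        denom = 1.0 + sum(expu)  # 1.0 = exp(0) for the "no visit" option
        for j, e in enumerate(expu):
            shares[j] += (w / total_weight) * e / denom
    return shares

def calibrate_month(observed, beta_tc, people, tol=1e-12, max_iter=10_000):
    """Solve for the park effects that match one month's observed shares."""
    delta = [0.0] * len(observed)
    for _ in range(max_iter):
        predicted = weighted_shares(delta, beta_tc, people)
        step = [math.log(s) - math.log(p) for s, p in zip(observed, predicted)]
        delta = [d + h for d, h in zip(delta, step)]
        if max(abs(h) for h in step) < tol:
            break
    return delta

beta_tc = -0.25  # hypothetical estimate, held fixed across months
people = [(1200.0, [1.0, 6.0]), (800.0, [5.0, 2.0])]  # hypothetical records
observed_by_month = {"2005-01": [0.010, 0.004], "2005-07": [0.030, 0.012]}

# No individual choice data is used here: only beta_tc, the weighted
# sample, and the observed shares for each month.
panel = {month: calibrate_month(shares, beta_tc, people)
         for month, shares in observed_by_month.items()}
```

Each entry of `panel` is the vector of park effects for one month, so stacking the entries gives the park-by-month panel used in step 2.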
1.5.3 Step 2: Estimating preferences for park attributes

In step 2, I estimate equation 1.3, which explains park effects as a function of park attributes. Because step 1b produces a panel of park effects, panel data econometric techniques, such as difference-in-differences and event studies, can be implemented to rigorously identify preferences for park attributes. I explore a variety of park attributes, so I do not use one of these classic causal inference techniques. Nonetheless, my approach offers a promising method for blending modern causal inference tools with recreation demand modeling.

Applying panel data econometric techniques within the structural model has several benefits. In a reduced-form regression, attribute changes at one park may cause visitors to substitute a visit with another park, biasing estimates. The structural model of park choice controls for the quality of substitute parks when estimating park effects. Therefore, spillovers do not bias estimates. The structural model also provides a framework for calculating welfare impacts.

I recover preferences for a broad range of park attributes using a correlated random effects model with a Mundlak device. Some attributes, such as elevation and wildlife presence, do not vary meaningfully across my fifteen-year period of interest. Other attributes, such as temperature, do vary dramatically across parks and across time. While including a flexible set of fixed effects (e.g., park or park-by-season) has the attractive property of controlling for unobserved attributes that are constant across time, the fixed effects would subsume preferences for time-invariant attributes. The correlated random effects framework solves this issue. For attributes that vary over time, it recovers identical estimates to a fixed effects approach, and it preserves cross-sectional variation to recover preferences for time-invariant attributes (Mundlak, 1978; Wooldridge, 2019).
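The equivalence with fixed effects can be seen in a stripped-down case. The sketch below assumes a single time-varying regressor and uses generic notation ($y_{it}$, $x_{it}$, $z_i$, with unit $i$ playing the role of a park-by-season cell) that is illustrative rather than the notation of this chapter. Because the within-unit deviation $x_{it} - \bar{x}_i$ is orthogonal to every unit-level variable, the Frisch-Waugh-Lovell theorem implies that the pooled OLS coefficient on $x_{it}$ equals the within estimator whenever $\bar{x}_i$ is included as a regressor.

```latex
\begin{align*}
y_{it} &= a + z_i c + x_{it} b + \bar{x}_i d + u_{it},
  \qquad \bar{x}_i = \tfrac{1}{T_i} \sum_{t} x_{it}, \\
\sum_{t} (x_{it} - \bar{x}_i)\, w_i &= 0
  \quad \text{for any unit-level } w_i \in \{1,\ z_i,\ \bar{x}_i\}, \\
\hat{b}_{CRE} &= \frac{\sum_{i} \sum_{t} (x_{it} - \bar{x}_i)\, y_{it}}
                      {\sum_{i} \sum_{t} (x_{it} - \bar{x}_i)^2}
  = \hat{b}_{FE}.
\end{align*}
```

The coefficient $c$ on the time-invariant attribute $z_i$ remains estimable because no unit dummies are included, which is the property the text relies on.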
Specifically, I estimate

$$\delta_{jt} = X_j \alpha_0 + X_{jt} \alpha_1 + \bar{X}_{js} \alpha_2 + \nu_{jt}, \tag{1.6}$$

where $X_j$ includes time-invariant attributes, $X_{jt}$ includes time-varying attributes, and $\bar{X}_{js}$ is the mean of time-varying attributes at park $j$ in season $s$.

1.6 Preferences for the U.S. National Parks

Table 1.3 reports estimates of travel cost, travel mode, and heterogeneity coefficients for several specifications of equation 1.1. For all specifications and all income groups, the travel cost coefficient is negative and significantly different from zero. Individuals with lower incomes are more sensitive to travel costs, consistent with a diminishing marginal utility of income.

Table 1.3: Estimates 2008 and 2018 Survey Periods

                                   (1)        (2)        (3)
Travel cost ($100)               -0.251     -0.250     -0.251
                                 (0.016)    (0.015)    (0.016)
Fly                              -0.945     -1.132     -1.131
                                 (0.087)    (0.109)    (0.109)
TC x income < $25,000            -0.277     -0.328     -0.326
                                 (0.022)    (0.027)    (0.027)
TC x income > $100,000            0.127      0.120      0.119
                                 (0.009)    (0.009)    (0.009)
Flying x income < $25,000                    0.914      0.917
                                            (0.178)    (0.177)
Flying x income > $100,000                   0.351      0.241
                                            (0.131)    (0.138)
Flying x parent                                         0.568
                                                       (0.117)
Dissimilarity coefficient         0.657      0.662      0.654
                                 (0.042)    (0.041)    (0.042)

Note: The table shows estimates of the travel mode, travel cost, and heterogeneity coefficients in equation 1.1 with standard errors in parentheses. Income interaction coefficients are relative to the middle income group.

Preferences for travel mode differ meaningfully by income group, but on average, all prefer driving to flying. The low income group is willing to pay $37 extra per-household, per-trip to drive rather than fly, while the middle and high income groups are willing to pay a premium of $453 and $674 to drive. One explanation for the driving premium is flexibility, as driving allows groups to adjust their schedule and add side trips. Higher income groups may be willing and able to pay for this flexibility.
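The driving premium is a ratio of coefficients, which the short Python sketch below works through. It assumes the premium is the group-specific flying coefficient divided by the group-specific travel cost coefficient, scaled by 100 because travel cost is measured in $100, and it assumes the column (3) estimates from Table 1.3; the function name is hypothetical. Under these assumptions the low and high income values round to the $37 and $674 reported in the text, while the middle income value comes out near $451 rather than $453, so the reported middle income figure may rest on a different specification.

```python
def driving_premium(fly, tc, fly_shift=0.0, tc_shift=0.0):
    """Dollar premium for driving over flying, per household trip.

    fly and tc are the baseline (middle income) coefficients; the shift
    arguments add the income interaction terms. Travel cost is measured
    in $100, so the utility ratio is multiplied by 100.
    """
    return 100.0 * (fly + fly_shift) / (tc + tc_shift)

# Column (3) of Table 1.3 (an assumption about which column applies).
FLY, TC = -1.131, -0.251

low = driving_premium(FLY, TC, fly_shift=0.917, tc_shift=-0.326)
middle = driving_premium(FLY, TC)
high = driving_premium(FLY, TC, fly_shift=0.241, tc_shift=0.119)
```

The same ratio, with an attribute coefficient in the numerator, converts any utility difference in the model into dollars.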
The driving premium also reflects factors I do not include in travel costs, such as airport parking fees, the risk of flight cancellations, or the fuel efficiency, reliability, and comfort of respondents’ vehicles. The dissimilarity coefficient is between zero and one, implying the nested logit model is consistent with utility-maximizing behavior (McFadden, 1979). I use estimates from the model in column 1 when calibrating the panel of park effects, but I plan to use results from a model with more detailed heterogeneity in the future.

Figure 1.2 shows estimated monthly park effects for two parks: Glacier NP and Great Smoky Mountains NP. The park effects should be interpreted relative to the “no visit” alternative, which is normalized to provide a zero mean utility each month.³ The consistently negative park effects indicate that potential visitors prefer the “no visit” alternative to visiting a specific park, even when that park has zero travel costs. In the context of the model, individuals will only choose to visit a park if it has a large, positive error term draw. In interpreting this result, it is helpful to recall that the “no visit” alternative encompasses all ways to spend a month that do not involve visiting a national park. The estimated park effects also represent mean utilities for both visitors and non-visitors. They are also sensitive to the specified number of choice occasions and the market size. Assuming fewer choice occasions or a smaller market size raises park visitation shares and increases park effects relative to the “no visit” alternative.

Figure 1.2: Travel costs increase with distance
Note: This figure plots park effect estimates for Great Smoky Mountains NP (solid-brown) and Glacier NP (dashed-blue) in 2018.

Both exhibit seasonal variation that has been largely overlooked in RUM recreation demand models. Glacier’s park effects exhibit dramatic seasonal variation, peaking in the summer and collapsing in the winter.
Converting the seasonal differences to dollar terms, potential visitors are willing to pay $1,032 more to visit Glacier in July rather than January. Great Smoky Mountains displays a flatter peak period and a less extreme winter decline. Similar patterns at other parks suggest that climate and weather drive seasonal variation in park effects.

³ Note that one can easily change the interpretation of the park effects by taking the residual from a regression of the park effects on month-of-sample fixed effects. After this revision, park effects can be interpreted relative to other parks, rather than an outside option that varies across time.

Table 1.4 shows how park attributes impact park effects. Unshaded variables do not vary meaningfully between 2005 and 2019, either due to data availability or geophysical processes. They are identified with only cross-sectional variation, while variables in the shaded rows leverage within-park variation.

Conditional on other observable attributes, visitors are willing to pay more to visit parks with redwood forests, bison, bald eagles, coastline, more roads, more trails, and large elevation ranges. Population density in surrounding counties is also positively correlated with the park effects. This reflects amenities near parks, such as restaurants, hotels, and other attractions, but the estimate is likely biased upward, because desirable, unobserved park attributes attract visitors and generate local economic impacts. The land cover coefficient estimates suggest that visitors appreciate barren land, a category that includes rock and sand, more than other land cover types, such as forest, wetland, and grassland. Willingness to pay is lower for parks with grizzly bears and those with more diverse land cover, measured using a standardized Herfindahl-Hirschman Index. Estimates described in this paragraph should be interpreted with caution, because they are identified with cross-sectional variation.
Nonetheless, they provide the most extensive revealed-preference evidence to date regarding what attracts visitors to U.S. national parks.

I use within-season variation at each park to identify coefficients on time-varying attributes. More rainy days in a month, both historically and contemporaneously, decrease willingness to pay, while park acreage changes have minimal impact. The national park designation coefficient reflects the impact of switching a park’s designation to “national park” from one of the various other designations. Common wisdom suggests redesignating units with the official national park designation will increase their visibility and attract more visitors. This has even been proposed as a method to reduce crowding at other parks, by making substitutes more appealing. My estimate of the redesignation effect suggests that an official national park designation has little impact on the willingness to pay for a visit. Importantly, my estimate is identified from only three redesignations (Pinnacles, Gateway Arch, and Indiana Dunes) that occurred between 2005 and 2019. When analyzing a broader set of redesignations, Szabó and Ujhelyi (2021) find that an official national park designation does increase visitation. Although our studies vary methodologically, the discrepancy in our estimates is likely from my more limited sample. This suggests that the impact of redesignations may vary substantially by park. In short, redesignations do not seem to be an all-powerful shortcut for attracting visitors.

Table 1.4: Preferences for Park Attributes

Attribute                                     Coefficient   Std. error   WTP ($)
Redwoods present                                 0.872*       (0.508)       347
Bison present                                    0.296        (0.268)       118
Bald eagles present                              0.152        (0.147)        60
Coastal                                          0.087        (0.262)        35
Elevation range (1000 ft)                        0.052*       (0.031)        21
Land cover share: barren land                    0.020**      (0.007)         8
Trail miles (10 miles)                           0.017**      (0.005)         7
Nearby population density (100 per sq mile)      0.008**      (0.002)         3
Road miles (10 miles)                            0.002        (0.004)         1
Land cover share: shrub/scrub                    0.003        (0.003)         1
Lake acreage (100 acres)                         0.000        (0.000)         0
Acreage (10k)                                    0.001        (0.002)         0
Trail miles x elevation range                    0.000        (0.001)         0
Precipitation days                              -0.005**      (0.001)        -2
Land cover share: grassland                     -0.009*       (0.004)        -3
Average precipitation days                      -0.013**      (0.006)        -5
Land cover share: emergent wetland              -0.014        (0.010)        -6
Land cover share: mixed forest                  -0.016**      (0.006)        -7
National Park designation                       -0.027*       (0.014)       -11
Coastal x elevation range                       -0.070        (0.093)       -28
Land cover diversity (standardized)             -0.311**      (0.110)      -124
Grizzly bears present                           -0.806**      (0.388)      -321

R-squared: 0.557
* Significant at 90% level, ** Significant at 95% level. Estimates for shaded variables are equivalent to estimates from a model including park-by-season fixed effects. Unshaded variables use only between-park variation. Flexible temperature controls are also included. Willingness to pay (WTP) is calculated by dividing each attribute coefficient by the travel cost coefficient for the middle income group and multiplying by 100.

Even with this broad array of park attributes and temperature controls, roughly 44% of the variation in park effects remains unexplained. Given the unique resources the parks protect, this is not surprising. It is difficult to estimate the value of iconic park attributes, such as Arches’ arches or Yellowstone’s Old Faithful geyser, which are often idiosyncratic and, famously, remain largely unchanged over time.

By capturing mean utilities after controlling for travel costs, monthly park effects provide a national park awesomeness index. Table 1.5 shows the implied ranking for 2018 based on national parks’ maximum park effect throughout the year. I convert the park effects to a 100-point scale. The maximum park effect between 2005 and 2019 scores 100 and the minimum scores 0.
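A minimal Python sketch of the 100-point conversion follows. It assumes the conversion is a linear min-max rescaling over the whole panel and that each park is then ranked by its highest monthly score; the park names and values are hypothetical.

```python
def awesomeness_scores(park_effects):
    """Linearly rescale park effects so the panel minimum is 0 and the maximum is 100.

    park_effects maps a park name to its list of monthly park effects.
    """
    values = [v for series in park_effects.values() for v in series]
    lowest, highest = min(values), max(values)
    return {
        park: [100.0 * (v - lowest) / (highest - lowest) for v in series]
        for park, series in park_effects.items()
    }

# Hypothetical monthly park effects for two parks.
effects = {"Park A": [-9.0, -6.5, -4.0], "Park B": [-8.0, -7.0, -6.0]}
scores = awesomeness_scores(effects)

# Rank parks by their best month, mirroring the use of the maximum park effect.
ranking = sorted(scores, key=lambda park: max(scores[park]), reverse=True)
```

Because the rescaling is linear, it changes the units of the index but not the ordering of parks.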
This ranking method offers an attractive alternative to rankings from the popular media that are typically based on travel bloggers’ personal experiences or raw visitation counts. Unlike experience-based rankings, my ranking is systematic and incorporates the visitation history of the entire U.S. population. Unlike rankings based on raw visitor counts, my ranking controls for the travel costs of reaching a park to isolate the appeal of the park itself.

Table 1.5: Most Awesome National Parks

Rank   Park              Rating
1      Golden Gate RA     97.4
2      Glacier            93.9
3      Yellowstone        92.9
4      Grand Canyon       92.5
5      Grand Teton        91.9
6      Mount Rainier      91.4
7      Acadia             91.0
8      Rocky Mountain     90.9
9      Olympic            90.6
10     Zion               90.5

Note: The National Park awesomeness index combines visitation and travel cost data to rank parks by the mean utility they provide visitors. The ranking reflects each park’s maximum park effect throughout 2018.

The top ten ranking includes many of the most famous national parks, such as Glacier, Yellowstone, and Grand Canyon. One surprising result is that Golden Gate National Recreation Area tops the list. Golden Gate provides views of the famous Golden Gate Bridge, beaches, hiking trails, and popular attractions like Alcatraz Island, but for several reasons, its ranking is likely inflated. Although the model controls for the travel costs of accessing each park, it does not control for complementary destinations near a park. Visitors to Golden Gate likely visit other Bay Area attractions on the same trip, while Glacier NP, for example, has fewer convenient complementary attractions. Furthermore, local residents may visit Golden Gate several times per month, or even several times per week. While my assumption that visitors take at most one trip per month-long choice occasion may be appropriate for most people and most parks, it is likely too coarse for local residents.
Golden Gate’s proximity to the Bay Area means there are many local residents who may visit frequently and bias its park effect upward.

1.7 Conclusion

This paper conducts the most comprehensive analysis to date of demand for U.S. national parks. The results describe preferences for the national parks and their attributes. On average, potential visitors are willing to pay $376 more to drive instead of fly to a park, even with identical driving and flying travel costs. Visitors are willing to pay more to visit parks with iconic wildlife, wide-ranging elevation, and coastline, and preferences vary dramatically across seasons, particularly at parks with harsh winters.

I produce a national parks awesomeness index that provides a systematic alternative to existing rankings and controls for the travel costs of accessing each park. It produces largely intuitive results, ranking many of the most iconic parks in the top ten. Observable park attributes explain 56% of the variation in the index, meaning idiosyncratic, unobservable, or difficult-to-quantify attributes play an important role in driving visitation.

My model, data infrastructure, and estimation procedure are valuable tools for studying the national parks and recreation demand more broadly. The estimation procedure provides a method of controlling for changing travel costs and demand system spillovers. It filters visitor count data through a structural model. This preserves the panel structure of the visitor count data, which is useful for identification, while the model provides the structure for welfare analysis and counterfactual simulations. It also provides a technique for bridging gaps in individual-level survey data. These advances make the framework relevant for policy and management decisions throughout the National Park System, such as crowding, the impacts of climate change, and potential infrastructure investments.
This is particularly important given recent legislative actions, which provide new resources for the continued conservation of the country’s most treasured resources.

CHAPTER 2

THE WELFARE IMPACT OF CLIMATE CHANGE ON U.S. NATIONAL PARK SYSTEM VISITATION

2.1 Introduction

For over a century, a core mission has guided the National Park Service: “preserve unimpaired the natural and cultural resources and intrinsic values of the National Park System for the enjoyment, education, and inspiration of this and future generations” (Org, 1916). Steadfast dedication to this mission is one reason the national parks have amassed over 15 billion visits, become iconic global landmarks, and been dubbed “America’s Best Idea.” Yet, climate change poses a fundamental challenge for the national parks. Warming temperatures, sea level rise, drought, and an increased frequency of extreme weather and wildfire make preserving the national parks unimpaired increasingly difficult.

This paper evaluates how climate change will impact the welfare generated by national park visitation. Adapting the model from Chapter 1, I estimate preferences for long-run average temperatures and short-run temperature deviations within a random utility maximization (RUM) travel cost model. My empirical strategy identifies preferences for long-run average temperatures using within-season variation in park average temperatures, and I allow preferences for short-run temperature deviations to vary across average temperature bins. Using these estimated preferences and projected climate and weather conditions, I then simulate national park visitor welfare under climate change.

Abstracting from closures and changes to park resources, I find that climate change will likely increase the surplus generated by national park visitation. When simulating future welfare under a moderate climate projection, average annual total welfare from 2040 to 2049 is $600 million greater than from 2010 to 2019.
Beneath this overall increase, the change in welfare varies substantially by season. Welfare decreases in the summer months due to the warming of already hot temperatures, but these losses are offset by large welfare gains from warming cooler months.

Strong preferences against cold temperatures drive the welfare results. Willingness to pay (WTP) is maximized at long-run average temperatures between 70°F and 85°F, and cold temperatures reduce WTP much more than extreme heat. Relative to the ideal temperature range, visiting a park when the long-run average temperature is 30°F reduces average household per-trip WTP by $503, while visiting when the temperature is 95°F reduces WTP by just $107.

Preferences for short-run deviations also suggest gains from warming temperatures. However, WTP for favorable short-run temperature deviations is roughly five times smaller than for equivalent changes in long-run average temperatures. Positive temperature deviations increase WTP at temperatures below 80°F. I do not find a significant negative impact of warmer-than-average months at hotter temperatures. However, my estimates have large standard errors in this range, so I cannot rule out negative impacts.

I contribute to a growing literature studying the nonmarket impacts of climate change. Previous research in this space has explored how climate change will impact crime, mortality, and other aspects of human health (Hsiang et al., 2017; Carleton et al., 2022; Deschenes et al., 2009). Many of these papers exploit short-run temperature deviations as plausibly exogenous temperature variation. While increased variability of short-run temperature deviations is one aspect of climate change, this literature often abstracts from changes in long-run average temperatures. Motivated by Bento et al. (2020), I exploit within-season variation to estimate the impacts of both long-run average temperatures and short-run deviations.
My finding that visitors have strong preferences for long-run average temperatures suggests that the existing literature’s focus on short-run deviations may omit an important aspect of climate change.

Several other papers study how climate change will impact recreation demand. Almost all of these papers focus on specific recreational activities or geographic regions. Given their different contexts, they produce mixed results on the overall impact of climate change on recreation. Chan and Wichman’s study of cycling predicts welfare gains, while Parthum and Christensen (2022) and Dundas and von Haefen (2020) predict welfare losses for skiing and marine fishing, respectively. In a more broadly focused study, Chan and Wichman (2022) use short-run temperature deviations to study eight outdoor activities using time-use diaries from across the United States. Their results suggest climate change will produce aggregate welfare gains of at least $5 billion for these activities.

The lack of consensus surrounding welfare impacts of climate change on recreation makes this study valuable. My setting includes recreation sites from a broad geographic range, and my outcome of interest, park visitation, subsumes activities, like angling, hiking, and cycling, that have previously been studied in isolation. By studying a range of sites and a more general activity, my findings provide important evidence regarding the overall welfare impact of climate change on outdoor recreation.

Dundas and von Haefen (2020) is the only other paper to quantify the welfare impacts of climate change on outdoor recreation using a random utility maximization framework. I extend their methodological contribution by allowing climate and weather to influence both the participation and site choice decisions. In Dundas and von Haefen’s model, temperature only influences the participation choice. This contribution is critical in my setting, where climate and weather vary dramatically across parks in the choice set.
Allowing temperatures to impact the site choice decision is likely important in many other recreation demand settings as well. For example, a site with swimming opportunities may provide more enjoyment at high temperatures than a site without water-based recreation.

Fisichelli et al. (2015) also study the impact of climate change on U.S. National Park System visitation.¹ They regress monthly visitation on temperature at 340 national parks and find the highest visitation at temperatures between 63°F and 77°F. They predict an 8 to 23% increase in system-wide visitation by mid-century, driven by increased visitation in off-peak seasons. My work builds on their analysis by examining welfare impacts, accounting for inter-park substitution using a discrete-choice framework, and separately identifying the impact of average temperatures and temperature deviations.

The remainder of this paper is organized as follows. Section 2.2 introduces the model. Section 2.3 discusses the visitation, climate, and weather data. Section 2.4 provides the details regarding the welfare simulation. Section 2.5 discusses estimated preferences for temperature. Section 2.6 presents simulated welfare impacts, and Section 2.7 concludes.

¹ Several papers, such as Henrickson and Johnson (2013), include more limited discussion of temperature and national park visitation.

2.2 Model

In Chapter 1 of this dissertation, I introduce a model of individuals’ national park visitation decision. The model in this section differs only in how I decompose the park-month fixed effects. The Chapter 1 model focuses on explaining the park-month fixed effects using a suite of observable park attributes. Here, I include a flexible set of fixed effects that subsume most park attributes, and I focus on how temperatures explain variation in the park-month fixed effects.

Suppose that each month individuals choose whether to visit a national park, which park to visit, and whether to drive or fly on their visit.
Denote the set of national parks $\mathcal{J} = \{1, 2, \dots, J\}$ and the set of travel modes $\mathcal{M} = \{D, F\}$, where $D$ and $F$ indicate driving and flying. Let $j = 0$ denote the outside option, which is each individual’s preferred way of spending a month that does not involve visiting a national park. I group visits to the National Park System’s historic units as alternative $j = J + 1$. Define the utility individual $i$ receives from visiting national park $j$ using travel mode $m$ during month $t$ as

$$U_{ijmt} = \begin{cases} \delta_{0t} + \varepsilon_{i0t} & j = 0 \\ \delta_{jt} + \beta_{TC}\, TC_{ijDt} + \varepsilon_{ijDt} & j \in \{1, \dots, J\},\ m = D \\ \delta_{jt} + \beta_{F} + \beta_{TC}\, TC_{ijFt} + \varepsilon_{ijFt} & j \in \{1, \dots, J\},\ m = F \\ \delta_{J+1,t} + \varepsilon_{i,J+1,t} & j = J + 1. \end{cases} \tag{2.1}$$

Coefficient $\beta_{TC}$ represents the marginal disutility of travel costs, and coefficient $\beta_{F}$ represents the fixed cost of flying relative to driving. For $j \in \{1, \dots, J\}$, the park-month fixed effect, $\delta_{jt}$, captures the mean utility provided by a park after controlling for travel costs. In plain terms, the park-month fixed effects capture the awesomeness of each national park in each month.

Chapter 1 describes the relevant details for estimation of equation 2.1. I estimate the parameters via maximum likelihood, using a contraction mapping to incorporate individual and park-level visitation data. I then calibrate the model to produce a monthly panel of park fixed effects from January 2005 through December 2019. I decompose the park-month fixed effects as

$$\delta_{jt} = \sum_b \alpha_b^{AVG}\, \mathbf{1}(\overline{temp}_{jt} \in b) + \sum_b \alpha_b^{DEV}\, temp^{DEV}_{jt}\, \mathbf{1}(\overline{temp}_{jt} \in b) + \alpha^{X} X_{jt} + \gamma_{js(t)} + \phi_t + \nu_{jt} \tag{2.2}$$

The primary variables of interest in equation 2.2 are $\overline{temp}$ and $temp^{DEV}$. The variable $\overline{temp}$ represents the average temperature at a park over the past ten years in a given calendar month (e.g., the average temperature at Yellowstone in May over the previous ten years if $j$ = “Yellowstone” and $t$ corresponds to the month of May). The variable $temp^{DEV}$ represents the deviation from $\overline{temp}$
that occurs at a park in a given month (i.e., how much warmer or colder than average is the park). My specification allows for a flexible relationship between temperature and the park-month fixed effects by estimating a separate $\overline{temp}$ coefficient for 5°F bins (denoted with the $b$ subscript). It also allows preferences for deviations to vary by average temperature. This allows individuals to prefer warmer-than-average temperatures when temperatures are typically cold and cooler-than-average temperatures when temperatures are typically hot.

The set of control variables, $X_{jt}$, includes the number of days with precipitation. Just like for temperature, I define ten-year moving average and deviation variables. For parsimony, I do not specify a non-linear relationship between precipitation and park-month fixed effects.

Equation 2.2 also includes month-of-sample fixed effects ($\phi_t$). These parameters capture system-wide shocks to national park mean utilities, and they influence the interpretation of park-month effects across time. The estimation procedure, outlined in Chapter 1, normalizes the mean utility from the outside option to zero in each month. This implies that all park-month fixed effects should be interpreted relative to their month’s outside option. If the quality of the outside option changes over time, it complicates cross-month comparisons of park-month effects. The month-of-sample fixed effect absorbs variation in the quality of the outside option, allowing for a more natural cross-month comparison of park-month effects.

The park-season fixed effects in equation 2.2 ($\gamma_{js(t)}$) play a critical role in identifying preferences for temperature. These parameters control for all observed and unobserved park characteristics constant throughout a season, such as park programs or tours, which tend to be more active in peak months. Including these fixed effects leaves within-season variation at each park to identify preferences for temperature.
For example, estimation will attribute variation in Yellowstone’s park-month fixed effects between March, April, and May to variation in average temperatures and temperature deviations, after controlling for system-wide shocks and precipitation.

Park attributes that vary within a season, are correlated with temperatures, and influence visitation still pose threats to identification. Consider events like fall foliage viewing, which attracts visitors and occurs for a limited portion of the fall season. Seasonal road closures due to heavy snow, which are common at high-elevation parks, are also correlated with temperature: roads typically close in the fall and reopen in the spring. These are just two possible examples that complicate a causal interpretation of the temperature coefficients, especially the average temperature coefficient.

Despite these concerns, I argue that this specification isolates relevant variation for understanding climate impacts. For example, the National Park Service has already documented evidence of springtime conditions (e.g., trees gaining their leaves) occurring earlier in the season. Thus, in the coming decades, national parks in March may experience average temperatures similar to those they currently experience in April. So while within-season variation may not isolate the impact of temperature alone on visitation, to some extent, it captures both the impact of temperature on visitor comfort and the impact of temperature on park management and ecology.

2.3 Visitation, Weather, and Climate Projection Data

I observe park visitation using individual-level survey data and park-level visitor counts. The survey data come from the National Park Service’s Comprehensive Survey of the American Public, a nationally representative telephone survey administered in 2008 and 2018.
The survey contains several variables describing each respondent’s national park visitation history: the park they visited most recently, the number of times they visited national parks in the two years prior to the interview, and whether the respondent drove or flew on their most recent visit. Respondents also report their state of residence, allowing me to compute the travel costs of reaching any of the national parks. One strength of the Comprehensive Survey of the American Public is that it includes both visitors and non-visitors. This feature is rare for a national survey of recreation demand. Unfortunately for my analysis, the survey does not include the date or timing of respondents’ visits, limiting my ability to identify preferences for climate and weather from the survey data alone.

I obtain park-level visitor counts from the National Park Service’s Visitor Use Statistics. The visitor counts are published for 383 of the over 400 national parks at the monthly level. I focus on months between 2005 and 2019, which overlap the telephone survey periods. Chapter 1 describes the survey and visitor count data in more detail.

To understand how climate and weather impact visitation, I collect park temperature and precipitation variables from the Global Historical Climatology Network’s Global Summary of the Month datasets (Lawrimore et al., 2016). These data document temperature and precipitation observations collected by weather monitoring stations. I extract two monthly variables for each station: mean daily high temperature and the number of days with more than 0.1 inches of precipitation.

Parks often have several weather stations in their vicinity. For each park, I select the nearest station with less than 25% of months missing data as the representative station. If a park has multiple stations within its boundaries that meet the completeness criteria, I select the station with the most complete data as the representative station.
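The selection rule can be written as a short Python sketch. The `Station` fields and the example stations are hypothetical, and the tie-handling is one reading of the rule: completeness decides only when more than one eligible station lies inside the park, and distance decides otherwise.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Station:
    name: str
    miles_from_park: float
    share_months_missing: float  # between 0 and 1
    inside_park: bool

def representative_station(stations: List[Station],
                           max_missing: float = 0.25) -> Optional[Station]:
    """Pick a park's representative weather station.

    Keeps stations with less than max_missing of months missing. If several
    eligible stations sit inside the park, returns the most complete one;
    otherwise returns the nearest eligible station.
    """
    eligible = [s for s in stations if s.share_months_missing < max_missing]
    if not eligible:
        return None
    inside = [s for s in eligible if s.inside_park]
    if len(inside) > 1:
        return min(inside, key=lambda s: s.share_months_missing)
    return min(eligible, key=lambda s: s.miles_from_park)

# Hypothetical candidates for one park.
candidates = [
    Station("A", 0.0, 0.10, True),
    Station("B", 0.0, 0.02, True),
    Station("C", 3.0, 0.00, False),
    Station("D", 1.0, 0.40, False),  # fails the completeness threshold
]
chosen = representative_station(candidates)
chosen_outside = representative_station(candidates[2:] + [Station("E", 7.0, 0.10, False)])
```

Returning `None` when no station qualifies is a choice made for this sketch; the text does not describe that case.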
On average, representative stations are 5.2 miles from the park they represent. When representative stations are missing data, which occurs for 10% of the station-months, I predict missing temperature and precipitation variables using observations from nearby stations.

To characterize long-run temperature and precipitation, I calculate the ten-year average of these two variables for each month of the year at each park. For example, I calculate the average daily high temperature in Yellowstone National Park over the ten previous Aprils. With contemporaneous monthly variables and averages in hand, I calculate the deviation from monthly averages at each park in each month.

The average temperature, average number of precipitation days, deviation from average temperature, and deviation from average precipitation are the weather variables in my model. Roughly, average temperature and precipitation reflect the weather a visitor could expect to observe at a park in a certain month. This expected weather is relevant for people planning their trip more than a few weeks in advance. The deviation variables capture short-run weather events, like heatwaves and cold snaps, that are not easily foreseen weeks before a visit.

I calculate the same climate and weather variables for future conditions using downscaled CMIP5 Climate Projections (Bureau of Reclamation, 2013). There are dozens of CMIP5 climate projections available. For now, I use the Community Earth System Model Contributors’ projection for representative concentration pathway (RCP) 4.5, which assumes society makes moderate emissions reductions. Even within RCPs, climate projections differ, so I intend to incorporate several climate projections in future work.

Unlike the weather station data, climate projections are gridded products that provide predictions every 1/8 degree of latitude and longitude (around eight miles in the contiguous United States). I select one grid point to represent each park.
For grid points within 0.5 degrees of the park, I compare existing weather observations at the grid points to observations at the park’s representative weather station. I select the grid point with the most similar weather as the park’s representative grid point.

By selecting one representative station and grid point, I abstract from intra-park variation in weather, which is substantial in some cases. An alternative method would be to average station observations or use a gridded product and average points within each park. I prefer using a representative station for two reasons. First, weather stations are often located near visitor centers or gateway communities. Both are heavily trafficked by park visitors, meaning the weather observed by stations is often relevant to visitor decision-making. Second, parks with wide-ranging weather conditions often have rugged terrain and expansive backcountry that are sparsely visited. A technique that averages grid points or stations is more likely to be influenced by these backcountry locations, which experience substantially different weather than more highly visited areas.

2.4 Calculating Welfare Impacts

I simulate the welfare impacts of changes in temperature and precipitation in two steps. First, I predict a monthly panel of park effects under climate projection forecasts. Then, I calculate the welfare change between current park effects and climate change park effects.

I begin by predicting a monthly panel of park effects under future climate projections. I denote the predicted park effect as

$$\hat{\delta}_{jt} = \sum_{b} \hat{\alpha}_{b}^{AVG}\, 1(temp_{jt} \in b) + \sum_{b} \hat{\alpha}_{b}^{DEV}\, temp_{jt}^{DEV}\, 1(temp_{jt} \in b) + \hat{\alpha} X_{jt} + \bar{\gamma}_{js(t)} + \bar{\phi}_{t} \tag{2.3}$$

The prediction depends on temperature and precipitation under climate change, estimated preferences for temperature and precipitation, and the park-season and month-of-sample fixed effects.
Temperature and precipitation variables come from future climate projections, and I use parameter estimates for temperature and precipitation coefficients directly from my estimation. While the model estimation produces estimates of park-season fixed effects and month-of-sample fixed effects from 2005 to 2019, these parameters capture residual variation [...]

After predicting a monthly panel of park effects, I calculate the compensating variation (CV) of national park visitation under the climate and weather conditions in month $t$ relative to 2010 conditions.

$$CV_{i}(\hat{\delta}_{t}) = \frac{1}{\beta_{TC}} \left( EU_{i}(\hat{\delta}_{t}) - EU_{i}(\hat{\delta}_{2010}) \right), \tag{2.4}$$

where $EU$ represents the expected utility of a choice occasion and is given by

[...]

$$\delta_{jt} = \beta_{CL}\, EverClosed_{jt}\, PostClosure_{jt} + \beta_{X} X_{jt} + \phi_{j} + \xi_{t} + \nu_{jt} \tag{3.2}$$

Figure 3.1: Weekly visitation at Lake St. Clair Metropark
Note: The figure plots 2022 weekly visitation by all visitors (annual and day pass) at Lake St. Clair Metropark. Shaded weeks experienced a beach closure for at least one day.

The variable $TC$ represents the travel costs of reaching a park. The variable $EverClosed$ equals one if the park’s beach is closed at any time during the 2022 season; that is, it equals one if $j$ = Lake St. Clair Metropark and zero otherwise. The variable $PostClosure$ equals one for any date after July 21, the date of the first beach closure at Lake St. Clair, and zero otherwise. The coefficient of interest, $\beta_{CL}$, captures the average effect of the beach closures on Lake St. Clair Metropark’s alternative specific constant for all days after the initial closure. The variable $X_{jt}$ includes any time-varying controls.

We assume the error term, $\varepsilon_{ijt}$, follows a Type I Extreme Value distribution that is independent and identically distributed across individuals, parks, and choice occasions. This produces the conditional logit choice probabilities:

$$P_{ijt} = \frac{\exp(V_{ijt})}{\sum_{k=0}^{K} \exp(V_{ikt})}, \tag{3.3}$$

where $V_{ijt} = \delta_{jt} + \beta_{TC} TC_{ij}$ is the deterministic portion of utility in equation 3.1.
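To make the choice probabilities in equation 3.3 concrete, the following minimal Python sketch evaluates them for one individual on one choice occasion. It is an illustration rather than the estimation code: the function name and the numerical values are hypothetical, and normalizing the utility of the outside option ($k = 0$) to zero is an assumption of the sketch.

```python
import math
from typing import List


def choice_probabilities(delta: List[float], travel_cost: List[float],
                         beta_tc: float) -> List[float]:
    """Conditional logit probabilities for one individual on one choice occasion.

    Index 0 of the returned list is the outside option (k = 0), whose utility
    is normalized to zero here; that normalization is an assumption of this sketch.
    """
    v = [0.0] + [d + beta_tc * tc for d, tc in zip(delta, travel_cost)]
    v_max = max(v)                                # subtract the max for numerical stability
    exp_v = [math.exp(x - v_max) for x in v]
    total = sum(exp_v)
    return [e / total for e in exp_v]


# Hypothetical values: two parks with equal mean utility, the second twice as costly to reach.
probs = choice_probabilities(delta=[1.0, 1.0], travel_cost=[10.0, 20.0], beta_tc=-0.15)
```

Because the two parks share the same mean utility, the ratio of their probabilities depends only on the travel cost difference: here it equals exp(0.15 × 10) = exp(1.5).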
The model provides structure for evaluating the welfare impacts of beach closures. We define the compensating variation (CV) of any change in park attributes, including a beach closure, as

$$CV_{it} = \frac{1}{\beta_{TC}} \left\{ \ln\left( \sum_{k=0}^{K} \exp(V^{1}_{ikt}) \right) - \ln\left( \sum_{k=0}^{K} \exp(V^{0}_{ikt}) \right) \right\}. \tag{3.4}$$

When valuing the beach closures at Lake St. Clair, $V^{1}$ represents the utility generated by observed park conditions in the summer of 2022. We define $V^{0}$ as the utility that would have been generated in the absence of any beach closures. More specifically,

$$V^{0}_{ijt} = \delta_{jt} + \beta_{TC} TC_{ij} - \beta_{CL}\, EverClosed_{jt}\, PostClosure_{jt}. \tag{3.5}$$

For all parks except Lake St. Clair Metropark, $V^{1}_{ijt} = V^{0}_{ijt}$.

3.4 Estimation

Our estimation procedure applies techniques from Murdock (2006) and Chapters 1 and 2 of this dissertation. We estimate the model in two stages, and the panel of park-date fixed effects provides variation both across and within parks for the second-stage regression.

First, we estimate the parameters in equation 3.1 using maximum likelihood. Rather than estimate the park-date fixed effects directly, we apply the Berry (1994) contraction mapping. The contraction mapping solves for the park-date fixed effects that match the daily park visitation shares predicted by the model to the daily park visitation shares observed in the data. Estimation leveraging the contraction mapping produces the same estimates as a direct estimation of the park-date fixed effects. The benefit of the contraction mapping is that the optimization routine does not need to search over as many parameters. Because our model contains 756 park-date fixed effects (six parks times 126 dates), we suspect the contraction mapping substantially reduces the computational burden. At the end of the first stage, we obtain estimates of $\beta_{TC}$ and the panel of park-date fixed effects $\delta$.

In the second stage, we estimate the parameters in equation 3.2.
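As an illustration of the first-stage contraction mapping, the sketch below iterates the Berry (1994) update, adding the log of the observed share and subtracting the log of the predicted share, for a single date while holding the travel cost coefficient fixed. It is a simplified sketch under stated assumptions, not our estimation code: the function names and simulated values are hypothetical, the outside option is assigned zero utility, and in the full procedure this inner loop would be repeated for each date and nested inside the maximum likelihood search over the travel cost coefficient.

```python
import math
import random
from typing import List


def predicted_shares(delta: List[float], travel_costs: List[List[float]],
                     beta_tc: float) -> List[float]:
    """Average logit choice probabilities over individuals for one date.

    The outside option has utility zero (an assumption of this sketch), so the
    returned park shares sum to less than one.
    """
    shares = [0.0] * len(delta)
    for tc_i in travel_costs:                      # one row of travel costs per individual
        exp_v = [math.exp(d + beta_tc * tc) for d, tc in zip(delta, tc_i)]
        denom = 1.0 + sum(exp_v)                   # 1.0 = exp(0) for the outside option
        for j, e in enumerate(exp_v):
            shares[j] += e / denom / len(travel_costs)
    return shares


def contraction_mapping(observed: List[float], travel_costs: List[List[float]],
                        beta_tc: float, tol: float = 1e-12,
                        max_iter: int = 10_000) -> List[float]:
    """Iterate delta <- delta + ln(observed share) - ln(predicted share)."""
    delta = [0.0] * len(observed)
    for _ in range(max_iter):
        pred = predicted_shares(delta, travel_costs, beta_tc)
        step = [math.log(s) - math.log(p) for s, p in zip(observed, pred)]
        delta = [d + st for d, st in zip(delta, step)]
        if max(abs(st) for st in step) < tol:
            break
    return delta


# Simulated check with hypothetical values: recover known fixed effects from their shares.
random.seed(0)
costs = [[random.uniform(5.0, 30.0) for _ in range(3)] for _ in range(200)]
true_delta = [0.5, -0.2, 1.0]
shares = predicted_shares(true_delta, costs, beta_tc=-0.15)
recovered = contraction_mapping(shares, costs, beta_tc=-0.15)
```

The simulated check feeds the mapping shares generated from known fixed effects and confirms that the iteration returns those fixed effects, which is the sense in which the mapping matches predicted and observed shares.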
Because we observe data for only one treated park and five controls, we plan to explore the use of synthetic controls to estimate the treatment effect of beach closure. For now, though, we include park and date fixed effects in our second-stage regression. These control for constant differences in park amenities and system-wide amenity shocks. Thus, any threat to identification must come from unobserved factors correlated with beach closure that vary with time and impact parks differentially.

One benefit of this two-stage procedure is that it accounts for demand spillovers by explicitly modeling inter-park substitution. To yield consistent treatment effect estimates, difference-in-differences and event study approaches require a stable unit treatment value assumption (SUTVA). That is, untreated units’ outcomes must be unaffected by the treatment of other units. In a linear regression with visitation as the outcome and beach closure as the treatment, inter-park substitution resulting from the closure would violate SUTVA. Our structural model allows us to use park-date fixed effects as the second-stage outcome. Because the park-date fixed effects are structural parameters representing the mean utility provided by a park, SUTVA is likely to hold, as the beach closure will not influence the mean utility provided by untreated parks.

3.5 Results

Table 3.2 presents coefficient estimates from several model specifications. Results in column (1) come from a model with a cross-section of park fixed effects rather than a daily panel. In this model, the beach closure coefficient can be separately identified from the park fixed effects and the estimation occurs in one stage. Results in columns (2) and (3) come from models that include a full set of park-by-date fixed effects in the first stage, and estimation follows the discussion in the previous section.

Estimates of the travel cost coefficient are nearly identical across all three specifications. This is not surprising.
Consider that park fixed effects control for unobservable park attributes that are correlated with travel cost, such as remoteness. In doing so, they reduce the possibility of omitted variable bias when estimating the travel cost coefficient. In a model with a cross-section of park fixed effects, changes in unobserved attributes correlated with travel costs still pose a threat to identification, so theoretically, park-by-date fixed effects could improve identification of the travel cost coefficient. In practice, though, unobservable park attributes correlated with travel cost vary little within the five-month period of our analysis. This means the decision of whether to include a cross-section or panel of park fixed effects has little impact when identifying the travel cost coefficient.

Table 3.2: Coefficient Estimates

Variable                        (1)        (2)        (3)
Travel cost ($10)             -1.5445    -1.5461    -1.5461
                              (0.0069)   (0.0041)   (0.0041)
Beach closure                 -0.2371    -0.2548    -0.2503
                              (0.0017)   (0.0306)   (0.0307)
First-stage fixed effects
  Park                           Y
  Park-date                                 Y          Y
Second-stage fixed effects
  Date                                      Y          Y
  Park-day of week                                     Y

The beach closures decrease the mean utility provided by Lake St. Clair Metropark. Dividing by the travel cost coefficient indicates that the individual mean willingness to pay to avoid the park after the closures ranges from $1.54 for the cross-section of park fixed effects model to $1.65 and $1.62 for the two park-date fixed effects models. These estimates are substantially smaller than those in the existing literature and should be compared with caution for two reasons. First, the metroparks provide many amenities aside from the beach, so our willingness to pay estimate averages over many individuals who have little interest in swimming in Lake St. Clair. Second, we assign all dates after the initial closure as treated, which includes many days with no beach closures. Boudreaux et al. (2023) estimate that beachgoers are willing to pay roughly $266 to avoid a beach with a bacterial warning, but these estimates focus exclusively on beachgoers and elicit preferences for the exact day when a beach is closed.

Figure 3.2 shows how the beach closures impact the park-date fixed effect estimates. The light gray lines track the park-date fixed effects for two control parks: Kensington Metropark and Lake Erie Metropark. Anecdotally, these parks provide similar amenities to Lake St. Clair Metropark (whose park-date fixed effects are shown in black).

Figure 3.2: Park-Date Fixed Effect Estimates for Three Parks
Note: The figure shows park-date fixed effect estimates for three parks: Kensington Metropark (top-gray), Lake St. Clair Metropark (middle-black), and Lake Erie Metropark (bottom-gray). Shaded areas indicate dates when Lake St. Clair Metropark experienced a beach closure or contamination advisory. A power outage affected data collection at Lake St. Clair on August 4 and 5, and we drop data for August 4 and 5 throughout the entire analysis.

Through the first beach closure period, Lake St. Clair Metropark’s park-date fixed effects are roughly the average of Kensington and Lake Erie’s. By early August, though, Lake St. Clair Metropark’s park-date fixed effects have fallen, and they are roughly equivalent to Lake Erie Metropark’s. They show some sign of rebounding, but they never return to their pre-closure level relative to these two control parks. This is consistent with the possibility that visitors gain awareness of beach closures as they occur more frequently. The figure also suggests that closures may reduce visitation even after beaches reopen.

We use our estimated model to calculate the welfare loss caused by the 2022 beach closures. As described in section 3.3, we calculate the welfare loss relative to a baseline scenario with no beach closures while all other conditions remain at 2022 levels.
Table 3.3 presents the welfare loss estimates for the three models described above.

Table 3.3: Welfare Loss Estimates

                        (1)        (2)        (3)
Total welfare loss    $67,776    $73,425    $71,964

Using the same three models as Table 3.2, we estimate the total welfare loss of the closures was between $68,000 and $73,000. This translates to an average daily welfare loss of around $1,200 ($71,000 divided by 59 post-closure days). All three models generate similar welfare loss estimates, which is not very surprising given their similar parameter estimates (Table 3.2). In this setting, including park-date fixed effects rather than a cross-section of park fixed effects does not affect parameter estimates or welfare loss estimates.

Although the magnitude of these welfare loss estimates is modest, we believe our results are important for several reasons. First, our results capture the welfare loss for a subset of visitors (annual passholders), and the total welfare loss would be weakly larger if we also consider visitors who do not own an annual pass. Annual passholders make up roughly 70% of visits to the metroparks system, and if annual passholders incur 70% of the welfare loss, including all visitors would raise the total welfare loss estimate to about $100,000.

Second, while we analyze beach closures at a single site, beach closures are not uncommon, especially in the Great Lakes region. Any high-quality estimates of the welfare impacts of beach closures are useful for benefit-transfer analyses in other contexts. Our high-frequency, administrative data and identification strategy make our estimates a reliable data point for valuing other beach closures. Furthermore, most existing studies of beach closures rely on stated preference data to value hypothetical beach closures, making our results a valuable point of comparison for the literature.
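The arithmetic behind the willingness to pay and welfare figures in this section can be checked with a short Python sketch. The coefficients are those reported in Table 3.2, and the dollar figures are the rounded values quoted in the text. The `compensating_variation` function is a hypothetical illustration of the log-sum difference in equation 3.4 for one individual and date, not our estimation code. Because travel cost enters in $10 units, its output is also in $10 units.

```python
import math
from typing import List

# Coefficients reported in Table 3.2; travel cost is measured in $10 units.
travel_cost_coef = [-1.5445, -1.5461, -1.5461]
closure_coef = [-0.2371, -0.2548, -0.2503]

# Mean willingness to pay in dollars: ratio of coefficients times $10.
wtp = [10.0 * c / b for c, b in zip(closure_coef, travel_cost_coef)]
print([round(w, 2) for w in wtp])          # [1.54, 1.65, 1.62]

# Average daily loss and the scaling from annual passholders to all visitors.
daily_loss = 71_000 / 59                   # roughly 1,200 dollars per day
all_visitor_loss = 70_000 / 0.70           # roughly 100,000 dollars


def compensating_variation(v_observed: List[float], v_no_closure: List[float],
                           beta_tc: float) -> float:
    """Log-sum difference divided by the travel cost coefficient (equation 3.4).

    Each list holds the deterministic utilities of all alternatives, including
    the outside option, for one individual on one date.
    """
    logsum_1 = math.log(sum(math.exp(v) for v in v_observed))
    logsum_0 = math.log(sum(math.exp(v) for v in v_no_closure))
    return (logsum_1 - logsum_0) / beta_tc


# Hypothetical utilities: the closure lowers the utility of the second alternative.
example_cv = compensating_variation([0.0, -1.25], [0.0, -1.0], beta_tc=-1.5461)
```

With a negative travel cost coefficient, a closure that lowers utility produces a positive value, which matches the convention of reporting the closures as a welfare loss.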
3.6 Conclusion

This paper introduces a new dataset for the study of recreation demand, which tracks the exact minute of park entry for the universe of park system visitors. We explore the potential benefits of such high-frequency, administrative data for estimating recreation demand models, and we leverage the methodological advances from Chapters 1 and 2 of this dissertation to estimate the welfare losses of beach closures. Our estimation exploits daily visitation and water quality variation, which is unique in the recreation demand literature. Our findings show that the 2022 beach closures decreased the mean utility provided by Lake St. Clair Metropark. Our preliminary results value the total welfare loss at around $70,000 for the system’s annual passholders.

Our study has several limitations. We observe visitation to only six parks, while visitors likely substitute to many other recreation sites throughout the region. Given the lack of comparably detailed park visitation data, gauging the impact of our limited choice set on our estimates is difficult. There may be some way to combine our data with other sources, like surveys or cell phone data, to fill gaps in our choice set. The data’s lack of demographics is another limitation, and we plan to incorporate ZIP code demographic information in future models.

Our current analysis focuses exclusively on annual passholders for simplicity, but this ignores visitors who purchase daily entry passes. Unlike with annual passholders, we cannot track individuals who purchase daily entry across visits. This complicates the modeling and estimation procedure. We are unsure whether to drop these visitors or invest in creating a more flexible approach moving forward.

Despite these limitations, our paper illustrates the potential for innovative datasets to improve recreation demand research.
As similar datasets become available, they will provide more opportunities for detailed models and rigorous empirical strategies in the recreation demand field.

BIBLIOGRAPHY

(1916). U.S. Code Title 16 - Organic Act.
Albouy, D., Graf, W., Kellogg, R., and Wolff, H. (2016). Climate amenities, climate change, and American quality of life. Journal of the Association of Environmental and Resource Economists, 3.
Bento, A., Miller, N. S., Mookerjee, M., and Severini, E. R. (2020). A Unifying Approach to Measuring Climate Change Impacts and Adaptation. NBER Working Paper 27247.
Berry, S. (1994). Estimating Discrete-Choice Models of Product Differentiation. The RAND Journal of Economics, 25:242–262.
Berry, S., Levinsohn, J., and Pakes, A. (2004). Differentiated Products Demand Systems from a Combination of Micro and Macro Data: The New Car Market. Journal of Political Economy, 112.
Boudreaux, G., Lupi, F., Sohngen, B., and Xu, A. (2023). Measuring beachgoer preferences for avoiding harmful algal blooms and bacterial warnings. Ecological Economics, 204.
Bureau of Reclamation (2013). Downscaled CMIP3 and CMIP5 climate and hydrology projections. U.S. Department of the Interior.
Carleton, T., Jina, A., Delgado, M., Greenstone, M., Houser, T., Hsiang, S., Hultgren, A., Kopp, R. E., McCusker, K. E., Nath, I., Rising, J., Rode, A., Seo, H. K., Viaene, A., Yuan, J., and Zhang, A. T. (2022). Valuing the global mortality consequences of climate change accounting for adaptation costs and benefits. The Quarterly Journal of Economics, 137:2037–2105.
Chan, N. and Wichman, C. J. (2020). Climate Change and Recreation: Evidence from North American Cycling. Environmental and Resource Economics, 76:119–151.
Chan, N. W. and Wichman, C. J. (2022). Valuing nonmarket impacts of climate change on recreation: From reduced form to welfare. Environmental & Resource Economics, 81:179–213.
Chintagunta, P., Dubé, J.-P., and Goh, K. Y. (2005). Beyond the endogeneity bias: The effect of unmeasured brand characteristics on household-level brand choice models. Management Science, 51:832–849.
Cullinane Thomas, C. and Koontz, L. (2020). 2019 National Park Visitor Spending Effects: Economics Contributions to Local Communities, States, and the Nation. National Park Service.
Deschenes, O., Greenstone, M., and Guryan, J. (2009). Climate change and birth weight. American Economic Review, 99:211–217.
Dundas, S. J. and von Haefen, R. H. (2020). The Effects of Weather on Recreational Fishing Demand and Adaptation: Implications for a Changing Climate. Journal of the Association of Environmental and Resource Economists, 7(2):209–242.
English, E., von Haefen, R. H., Herriges, J., Leggett, C., Lupi, F., McConnell, K., Welsh, M., Domanski, A., and Meade, N. (2018). Estimating the value of lost recreation days from the Deepwater Horizon oil spill. Journal of Environmental Economics and Management, 91:26–45.
Fisichelli, N. A., Schuurman, G. W., Monahan, W. B., and Ziesler, P. S. (2015). Protected area tourism in a changing climate: Will visitation at US national parks warm up or overheat? PLOS One, 10(6).
Gellman, J., Walls, M., and Wibbenmeyer, M. (2022). Non-market damages of wildfire smoke: evidence from administrative recreation data. Working Paper.
Hausman, J. A., Leonard, G. K., and McFadden, D. (1995). A utility-consistent, combined discrete choice and count data model: assessing recreational use losses due to natural resource damage. The Journal of Public Economics, 56:1–30.
Henrickson, K. E. and Johnson, E. H. (2013). The Demand for Spatially Complementary National Parks. Land Economics, 89:330–345.
Hsiang, S., Kopp, R., Jina, A., Rising, J., Delgado, M., Mohan, S., Rasmussen, D. J., Muir-Wood, R., Wilson, P., Oppenheimer, M., Larsen, K., and Houser, T. (2017). Estimating economic damage from climate change in the United States. Science, 356:1362–1369.
Keiser, D., Lade, G., and Rudik, I. (2018). Air pollution and visitation at U.S. national parks. Science Advances, 4(7).
Keiser, D. A. (2019). The missing benefits of clean water and the role of mismeasured pollution. Journal of the Association of Environmental and Resource Economists, 6(4):669–707.
Knittel, C. R., Li, J., and Wan, X. (2023). I love that dirty water? Value of water quality in recreation sites. Working Paper.
Lawrimore, J. H., Applequist, R., Korzeniewski, B., and Menne, M. J. (2016). Global summary of the month (GSOM), version 1.0.3.
Lupi, F., Phaneuf, D., and von Haefen, R. (2020). Best Practices for Implementing Recreation Demand Models. Review of Environmental Economics and Policy, 14:302–323.
McFadden, D. (1974). The measurement of urban travel demand. The Journal of Public Economics, 3:303–328.
McFadden, D. (1979). Quantitative methods for analysing travel behaviour of individuals: Some recent developments. In Hensher, D. A. and Stopher, P. R., editors, Behavioural Travel Modelling, chapter 13, pages 279–318. London.
Mundlak, Y. (1978). On the pooling of time series and cross section data. Econometrica, 46:69–85.
Murdock, J. (2006). Handling unobserved site characteristics in random utility models of recreation demand. Journal of Environmental Economics and Management, 51:1–25.
Neher, C., Duffield, J., and Patterson, D. (2013). Valuation of National Park System Visitation: The Efficient Use of Count Data Models, Meta-Analysis, and Secondary Visitor Survey Data. Environmental Management, 52:683–698.
Newbold, S. C., Lindley, S., Albeke, S., Viers, J., Parsons, G., and Johnston, R. (2022). Valuing satellite data for harmful algal bloom early warning systems. RFF Working Papers.
Office of Aviation Analysis (2015). Consumer airfare report.
Parsons, G., Leggett, C., Herriges, J., Boyle, K., Bockstael, N., and Chen, Z. (2021). A Site-Portfolio Model for Multiple-Destination Recreation Trips: Valuing Trips to National Parks in the Southwestern United States. Journal of the Association of Environmental and Resource Economists, 8:1–25.
Parthum, B. and Christensen, P. (2022). A market for snow: Modeling winter recreation patterns under current and future climate. Journal of Environmental Economics and Management, 113.
Ruggles, S., Flood, S., Foster, S., Goeken, R., Pacas, J., Schouweiler, M., and Sobek, M. (2021). IPUMS USA: Version 11.0 [dataset].
Szabó, A. and Ujhelyi, G. (2021). Conservation and Development: Economic Impacts of the US National Park System. Working Paper.
Walls, M. (2022). Economics of the US national park system: Values, funding, and resource management challenges. Annual Review of Resource Economics, 14:579–96.
Wooldridge, J. M. (2019). Correlated random effects models with unbalanced panels. Journal of Econometrics, 211:137–150.