QUANTIFYING NUTRIENT TRANSPORT PATHWAYS USING SPATIALLY EXPLICIT MODELING AND REMOTE SENSING By Luwen Wan A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Environmental Geosciences – Doctor of Philosophy 2023 ABSTRACT Excess nitrogen and phosphorus from numerous sources delivered through complex transport pathways have accelerated eutrophication and increased the incidence of harmful algal blooms, particularly in the coastal and Great Lakes states in the United States. To address these issues, many watershed models have been developed to simulate nutrient inputs and their consequent deliveries to aid in the establishment of water quality management plans. However, these models typically lack spatially explicit descriptions of nutrient sources and attenuation factors, and they often do not take the joint of surface and subsurface transport pathways into consideration. In addition, spatially explicit tile drainage areas are still lacking, which limits the accuracy of hydrology and water quality modeling. This dissertation presents a spatially explicit model to simulate nutrient loading, sources, and pathways annually (Chapter 2) and during two distinct hydrologic seasons (chapter 3), as well as uses machine learning algorithms that translate remotely sensed Earth observations and environmental datasets into tile drainage maps (Chapter 4). In Chapter 2, I enhance and apply the Spatially Explicit Nutrient Source Estimate and Flux (SENSEflux) model, to simulate nitrogen and phosphorus loads from the US Great Lakes Basin (USGLB) on an annual basis. The results show that agricultural sources are the dominant sources for both total nitrogen (58%) and phosphorus (46%) deliveries to the US Great Lakes. In addition, this study reveals that the surface pathways (sum of overland flow and tile field drainage) dominate nutrient delivery, transporting 66% of total nitrogen and 76% of total phosphorus loads to the US Great Lakes coastline. Building on the annual SENSEflux simulation in Chapter 2, Chapter 3 focuses on seasonal variations in nutrient fluxes, sources, and pathways. Two distinct hydrologic seasons were examined, the first being snowmelt where high flow conditions are present in early spring and the second being baseflow where streamflow originates from groundwater discharge in mid-to-late summer. Results indicate that total nitrogen loading during snowmelt periods is four times greater than annual average deliveries. The contribution of agricultural sources (chemical agricultural fertilizer, manure, and N fixation) is substantially higher (15% for TN and 5% for TP) during melt than baseflow, while point sources, septic tanks, and atmospheric deposition become more prominent contributors to nutrient delivery during baseflow. Thus, seasonal variation of nutrient transport should be considered when establishing nutrient criteria and reduction targets. In Chapter 4, the focus is shifted to tile drainage systems by developing a random forest classifier to map tile drainage in the US Midwest in 2017, as spatially explicit tile drainage information are still lacking. Thirty-one satellite-derived and environmental variables sampled at 60,938 tile and non-tile ground truth points collected from a variety of sources were used to train and validate the classifier. The classifier achieved a good performance (Overall accuracy: 96%; F1 score: 0.9). Then, the classifier was used to do classifications for other regions that lack ground truth data within the study region. The classified tile drainage area correlated reasonably well with the county-level area reported by the USDA National Agricultural Statistics Service (r2 = 0.68). The overall importance revealed that the maximum nighttime land surface temperature in the summer ranked highest, followed by climate- and soil-related variables. This dissertation has produced many spatially explicit data products, including nitrogen and phosphorus loadings, sources, pathways, and hotspots. These are available not only on an annual basis but also for two hydrological seasons. Besides, a significant nutrient pathway, agricultural tile drainage, has been mapped and extended to the study region from the USGLB to the 14 states across the US Midwest. These products and insights could help watershed managers and decision- makers implement nutrient reductions at the right time and place more effectively. This work is dictated to my dear parents Mr. Zhongyu Wan and Mrs. Shuyun Deng, my wonderful sister Mrs. Shilin Wan and my beloved Yingjie, for their unconditional support and love. iv ACKNOWLEDGMENTS This dissertation represents five years and three months of hard work and personal growth. The journey was full of challenges and happiness. I would not reach this point without the tremendous support and guidance from my advisors, committee members, collaborators, friends, and family. I will start by thanking my advisors, Dr. Anthony Kendall and Dr. David Hyndman. Thank you for your unwavering support and guidance throughout this degree. I thank Anthony for continuing to support, challenge and motivate me. My research would have stalled many times if not for his brilliant and professional thoughts and ready availability for problem-solving. I would also like to thank Anthony for his technical support in developing and completing this dissertation, his practical and flexible communication strategies, and for providing thoughtful and honest suggestions for many aspects of my graduate study and life. I am grateful for his support and trust, which enabled me to work remotely during the final year. I am a better scientist and person because of his mentorship and friendship. I thank Dave for his deep knowledge and comprehensive perspective, which has guided and challenged me throughout this dissertation and improved the quality of my research. I would also like to thank his care for students and prioritization of our needs, such as providing comments and suggestions for manuscripts and submitting reference letters, which has provided me with lots of motivation and positivity. I often recall his famous saying, ‘You bet’, which encouraged me to ask questions and seek suggestions, and I am grateful for his generosity and kindness. I have learned so much from my noble people, Dave, and Anthony, and I will carry all the lessons forward in my career and life. Second, I sincerely thank my committee members for their support and advice throughout my Ph.D. training. Thanks, Dr. Jay Zarnetske, for deepening my understanding of hydrogeology and watershed hydrology, stimulating more research interests in this discipline, and providing many v professional training opportunities. Thank you for encouraging and challenging me to think deeply and broadly, as well as giving career suggestions. Thank you, Dr. David Roy, for being direct and critical. His constructive comments from a remote sensing perspective challenged me to do better science, and his feedback is valuable. I would also like to thank him for allowing me to audit his remote sensing class and providing life suggestions. I also want to thank my funding sources and collaborators, this work would not have been possible without them, the National Oceanic and Atmospheric Administration, National Aeronautics and Space Administration, China Scholarship Council, Kellogg Biological Station Long-Term Ecological Research (KBS LTER) Program, MSU College of Natural Science and MSU Department of Earth and Environmental Sciences. MSU Hydrogeology Lab is where I call my foreign family, a community that deserves all the beautiful words in the world and gives me the best memories. I will cherish the friendships I gained here throughout my life. I thank my officemates Alex Kuhl, Autumn Parish, Quercus Hamlin, Bailey Hannah, Brent Heerspink, and Chanse Ford for the supportive and generous atmosphere as well as my other friends and labmates: Alexis Lanier, Ally Brady, Behnaz Mirzendehdel, Ben McCarthy, Erin Haacker, Jake Stid, Jeremy Rapp, Jill Deines, Leanne Hancock, Lisi Pei, Luisana Rodriguez, Mohammed Rahman, Quercus Hamlin, Ryan Vannier, Sam Smidt, Samin Abolmaali, Sherry Martin and Xiao Liu. Their continuous positivity, encouragement, and support were essential to my journey. And I would like to provide some special thanks to Sherry Martin, for her mentorship, for being a professional, supportive, and dedicated co-author. Alex Kuhl, for her generous advice and help, especially when I was new to MSU. Behnaz Mirzendehdel, for her positivity and kindness, I miss you! Brent, thanks for your persisting care and encouragement, and friendly review of my writing materials. Quercus and Jeremy, thanks for the precious friendship! vi Thanks for all the professional discussions and for efficiently reviewing my manuscripts and writings. I am grateful for all the snapshots you sent in the past year, which made my life in California feel less lonely. Special thanks go to the EES department staff, including Pam Robinson, Ami McMurphy, Elizabeth McElroy, Dallas Coryell, Brittany Walter, and Judi Smelser. I thank my wonderful friends from the EES department, Lin Liu and Jiaxin Zhang. Lin, thank you for sharing helpful tips and resources, from fellowship application to career development. I know you are always there when I need suggestions. Jiaxin, thank you for your creativity and multi- talent, which has made my life much more fun, and thanks for listening and understanding my frustrations and providing your suggestions. I also want to thank my dear friends from the Center for Systems Integration and Sustainability (CSIS) at MSU, Yuqian Zhang, Ming Lei, Veronica Frans, and Sophia Chau. Thanks for getting together for pleasant traveling, delicious food, biking and hiking, and many late nights. I can’t wait to see you again soon! I am grateful for the friends and colleagues I have met through our reading group and the Hydro90 community. Thank you all for making me keep learning and making my life much happier during the pandemic. Finally, I thank my parents, Zhongyu Wan and Shuyun Deng, and my sister Shilin Wan. Dad and Mom, thanks for being open-minded and supporting my decisions unconditionally. Dear sister, thanks for encouraging and helping me to participate in the overseas exchange program and pursue my Ph.D. in the United States. Thanks for breaking the traditional/conservative mindset and seeking economic independence, which greatly motivates me. Lastly thank you, goes to my best friend and cheerleader - Yingjie. Thanks for being there with me for over ten years, cheering me on, and helping me become a better person. vii TABLE OF CONTENTS CHAPTER 1: INTRODUCTION ................................................................................................... 1 CHAPTER 2: THE IMPORTANT ROLE OF OVERLAND FLOW AND TILE FIELD PATHWAYS IN NUTRIENT TRANSPORT ................................................................................ 9 CHAPTER 3: SEASONAL VARIATION OF NITROGEN AND PHOSPHORUS SOURCES, PATHWAYS AND LOADS: A SPATIALLY EXPLICIT MODEL SENSEFLUX ................... 38 CHAPTER 4: SEETILEDRAIN: SPATIALLY EXPLICIT ESTIMATE OF TILE DRAINAGE USING REMOTE SENSING AND MACHINE LEARNING .................................................... 72 CHAPTER 5: SUMMARY......................................................................................................... 111 REFERENCES ........................................................................................................................... 116 APPENDIX A: SUPPORTING INFORMATION FOR CHAPTER 2 ...................................... 137 APPENDIX B: SUPPORTING INFORMATION FOR CHAPTER 3 ...................................... 157 APPENDIX C: SUPPORTING INFORMATION FOR CHAPTER 4 ...................................... 168 viii CHAPTER 1: INTRODUCTION 1.1: Significance Nitrogen and phosphorus pollution is one of the most widespread issues around the world, posing persistent threats not only to natural ecosystems but also to human health (Basu et al., 2023; Bowles et al., 2018; Goyette et al., 2016; Houlton et al., 2019). Elevated nutrient levels degrade river water quality and trigger harmful algal blooms (HABs) and they can also contaminate drinking water and cause illnesses in both animals and humans (Liu et al., 2022; Pennino et al., 2017). Exposure to HABs such as eating fish and outdoor recreational activities can cause many negative health effects, such as liver and kidney damage, and respiratory issues (Schullehner et al., 2018). High levels of nitrates in drinking water can lead to infant methemoglobinemia (also called blue baby syndrome) (Hamlin et al., 2022; Ward et al., 2018). Excessive nutrient pollution has also been a problem that is increasingly challenging to solve due to the increasing population, changing climate, various nutrient sources, and complex transport pathways (Basu et al., 2023; Ma et al., 2023; Sinha et al., 2017). Nutrients can originate from a variety of sources, including atmospheric deposition, wastewater treatment plants in cities, excessive use of chemical fertilizers and animal wastes in agricultural and forested areas, septic tanks from households in rural areas, and nutrient runoff from urban areas such as golf courses (Byrnes et al., 2020; Hamlin et al., 2020a; Luscz et al., 2015; Zou et al., 2022). These nutrient sources are commonly nonpoint source pollution and challenging to identify since they typically have no single source or discharge point but are transported indirectly into surface waters. In addition, nutrients often transport through multiple surface and subsurface pathways, such as overland runoff, tile drainage, and groundwater, which makes nutrient management even more difficult. As a result, nutrient pollution and related HABs have been a growing problem in most of 1 the coastal and Great Lakes states in the United States (Kalcic et al., 2019; Kimberly J Van Meter et al., 2023; D. M. Robertson & Saad, 2011a). The Great Lakes are the earth’s largest freshwater lake system. It supplies 21% of global and 80% of US fresh surface water (L. Zhang et al., 2019). The Great Lakes Basin is threatened by nutrient pollution due to agricultural and urban land use, and land management such as the density of tile drainage and wetland (Basu et al., 2023). The United States and Canada have signed the Great Lakes Water Quality Agreement and are obligated to protect and restore the Great Lakes. But some lakes, especially Lake Erie, continue to have serious nutrient issues with excessive nitrogen and phosphorus entering the nearshore region (L. Oldfield et al., 2020; Pennuto et al., 2014; Watson et al., 2016). The massive algal bloom in Lake Erie in August 2014 provoked a tap water ban in Toledo, Ohio, where nearly half a million people were without safe drinking water. There are many other nutrient pollution-related issues that have been seen in the past decades. For instance, the ‘dead zone’ area in the Gulf of Mexico was measured as the largest in 2017 since dead zone mapping began there in 1985 due to nutrient pollution that is primarily from agriculture and urban runoff in the Mississippi River basin (Li et al., 2023). In addition, tile drainage across the US Midwest has been found directly related to nitrate loss in the Mississippi River basin (David et al., 2010; Ma et al., 2023). Tile drainage was one of the significant advances across the US Midwest since the 19th century, promoting highly productive agriculture for areas where it was too wet to cultivate (King, Williams, Macrae, et al., 2015). With agricultural intensification and increasing extreme climate events, tile drainage is growing rapidly and becoming more prevalent to sustain agricultural production. Tile drainage area increased by 6.5 million acres from 2012 to 2017 in the US Midwest, and 70% of counties experienced expansion during the five years (NASS, 2012, 2017). 2 Research has found that 49% of soluble phosphorus and 48% of total phosphorus losses occurred via tile drainage discharge in the St. Joseph River watershed, located in the southwest portion of Michigan and the northern portion of Indiana, US (D. Smith et al., 2015). To quantify the impacts of tile drainage installation on water and nutrient flow and transport, maximize the benefits of tile drainage investments, such as maximizing crop yield and minimizing environmental impact, spatially explicit and fine resolution tile drainage information are needed. 1.2: Research Gap Watershed models have been developed to understand nutrient loadings and sources across scales. These models include the Global Nutrient Export from WaterSheds (NEWS) model, the widely used Soil and Water Assessment Tool (SWAT) model, and the Spatially Referenced Regression on Watershed Attributes (SPARROW) model developed by USGS. Global NEWS was developed by an international, interdisciplinary scientific working group and includes basin-scale sub-models that can predict nutrient sources and estimate natural and anthropogenic sources of nutrients (Seitzinger et al., 2005). NEWS does not simulate detailed hydrological processes within a watershed but rather estimates the nutrient loading based on the statistical relationships between climate, land use, and river water quality. SWAT is a semi-distributed model that can simulate the hydrological cycle, nutrient cycling, and sediment transport at the watershed scale (Arnold et al., 1998). The basic spatial unit used in SWAT is the Hydrologic Response Unit (HRU), which is determined by land use, soil type, and topography. In other words, these characteristics are to be uniform and do not vary spatially. SPARROW model uses multiple regression techniques and is used to estimate contaminant sources and transport in surface water (Preston et al., 2011; R. A. Smith, Schwarz, et al., 1997). Overall, these models have been applied to different watersheds around the world, but they lack fine-resolution and spatially- explicit information for nutrient 3 sources, pathways, and loadings. Existing tile drainage data consists primarily of the area reported by farmers via the USDA National Agricultural Statistics Service’s survey at the county level and likely tile-drained area based on geospatial analysis, under the assumption that agricultural areas with lower slopes and poorly drained soils are likely to be tile-drained (Valayamkunnath et al., 2020). In addition, few studies use aerial images to identify tile drainage lines at the field scale (Naz & Bowling, 2008) or remote sensing images to identify subsurface drainage at the watershed scale (Cho et al., 2019). In recent years, the availability of remote sensing with large spatial coverage and high temporal resolution has facilitated the mapping of agricultural practices, and the Google Earth Engine (GEE) provides the infrastructure for processing and analyzing large volumes of input datasets. Therefore, remote sensing and GEE can provide a cost-effective and expedient method for mapping large- scale agricultural tile drainage. The spatially explicit tile drainage data would enhance the accuracy of the crop, hydrology, and water quality models and inform decisions regarding food security and environmental protection. 1.3: Objective, Questions and Hypotheses Taken together, this dissertation includes three research chapters (Figure 1.1). In Chapter 2, I modified the previously published version of the SENSEflux model to quantify total nitrogen and phosphorus loadings, sources, and pathways and identify the nutrient delivery hotspots on an annual basis within the US Great Lakes Basin and focused on addressing the three questions below. Research questions are indicated with the letter “Q” followed by a number, with corresponding numbered hypotheses indicated with the letter “H”. Q2.1: How much nitrogen and phosphorus are delivered annually from drainage basins in the United States to the Great Lakes, and how do nitrogen and phosphorus loadings vary among lake 4 basins? Q2.2: What are the annual contributions of nutrient sources and pathways to nutrient transport and delivery to the Great Lakes? Q2.3: On an annual basis, where are the nutrient delivery hotspots located? I project that: (H2.1) a greater quantity of nitrogen and phosphorus loadings are delivered from the Great Lakes Basin to the Great Lakes, and these loadings vary across the basin, with the majority of the nutrient loadings coming from agricultural and urban areas. (H2.2) the contributions of different nutrient sources and pathways varied, with agricultural sources accounting for most of the nitrogen and phosphorus pollution transported via surface pathways. (H2.3) annual hotpots are in these high nutrient delivery locations and maybe in these high nutrient input areas. Building on the annual SENSEflux model in Chapter 2, I further improved the SENSEflux model in Chapter 3 to simulate the seasonal loadings, sources, and pathways, with the major update being the seasonal nutrient mobility that was inferred from seasonal streamflow. More specifically, two hydrologic seasons were considered, snowmelt in early spring with high flows and baseflow, in mid- to late-summer where streamflow originates from groundwater discharge. This seasonal information further helps the decision makers to target nutrient reduction at the right time to help the reduction more efficiently. This chapter mainly addresses the three questions below. Q3.1: How much nitrogen and phosphorus are transported from US drainage basins to the Great Lakes during snowmelt and baseflow seasons? Does the pattern differ from the annual transport (i.e., delivery from each lake basin)? Q3.2: Which sources and pathways are dominant during the hydrologic seasons? Q3.3: When would it be the most effective time to reduce nutrient pollution in the Great Lakes, and where should seasonal reduction efforts be focused? 5 Here are hypotheses for Chapter 3. (H3.1) more nitrogen and phosphorus are delivered to the Great Lakes during the snow melt season, whereas less are delivered during baseflow. The pattern is dependent on the regional variability of surface runoff and baseflow and is likely to differ from the annual patterns. (H3.2) agricultural sources are most likely the primary source during snowmelt, whereas urban sources become dominant during baseflow season. Surface pathways dominate nutrient transport and delivery during the snow melt season, but subsurface pathways are dominant during baseflow. (H3.3) due to increasing nutrient loadings, the snow melt season would be the best time to implement nutrient reduction plans. During baseflow, however, nutrient reduction should be prioritized to locations with higher nutrient concentrations and groundwater connections. Inspired by the previous two chapters’ findings that tile drainage is a vital nutrient transport pathway on an annual basis, and it delivered more nutrients during snowmelt. Also, due to the lack of spatially explicit and high-resolution data on tile drainage at a large scale, I shifted my focus to tile drainage systems in Chapter 4. Here I integrated remotely-sensed and environmental datasets to map agricultural tile drainage across the Midwest in 2017 using random forest machine learning via the Google Earth Engine cloud computing platform and identified the crucial variables that improve the tile drainage classification accuracy. Q4.1: Using remote sensing data, can a machine learning model identify tile drainage area at the pixel level and achieve good accuracy? Where are the tile drainage regions in the Midwestern United States? Q4.2: Which input variables are important to improve tile drainage classification accuracy across the US Midwest? I proposed the following hypothesis: (H4.1) with satellite imagery and large environmental 6 datasets available, as well as the processing resources provided by Google Earth Engine, a machine learning classifier can map spatially explicit tile drainage information with reasonable accuracy. Tile drainage areas are found in agricultural regions with inadequate drainage. (H4.2) climate- related factors (i.e., precipitation) and soil-related variables such as soil drainage class and slopes are important for tile drainage identification. Figure 1.0 Overview of this dissertation, including three research chapters. The schematic diagram in is based on https://www.cityofgriffin.com/departments/storm- water/watershed-management, and the icons in lower left corner are downloaded from Freepik. left corner the upper This dissertation has produced many spatially explicit data products for nutrient loadings, sources, pathways, and hotspots on an annual basis and during hydrologic seasons (snowmelt and baseflow). Also, a significant nutrient pathway, agricultural tile drainage, has been mapped and extended to the study region from the US Great Lakes Basin to the 14 states across the US 7 Midwest. These maps can be linked to community-facing tools such as the tipping point planner (tippingpointplanner.org) that can help nutrient managers and government agencies pinpoint areas with the most extensive nutrient deliveries and understand the contributions of primary sources and pathways at a finer resolution. It can also help watershed managers and decision-makers target the areas for reduction at the right time and develop water quality management plans precisely. The spatially explicit tile drainage locations and extent can provide a baseline for mapping historical tile drainage area change and predicting the potential areas for future expansion as more extreme rainfall events have been predicted under climate change scenarios. The findings and results from this dissertation are beneficial to broader interdisciplinary research efforts regarding food security, agricultural water management, and environmental sustainability. Finally, the SENSEflux annual and seasonal models and the random forest machine learning algorithms for tile drainage detection can be readily applied to other regions. 8 CHAPTER 2: THE IMPORTANT ROLE OF OVERLAND FLOW AND TILE FIELD PATHWAYS IN NUTRIENT TRANSPORT 2.1: Abstract Nitrogen and phosphorus pollution are of great concern to aquatic life and human wellbeing. While the vast majority of these nutrients are applied to the landscape, little is known about the complex interplay between nutrient applications, transport attenuation processes, and coastal loads. Here, I enhance and apply the Spatially Explicit Nutrient Source Estimate and Flux model (SENSEflux) to simulate the total annual nitrogen and phosphorus loads from the US Great Lakes Basin to the coastline, estimate the relative contributions of different sources and pathways, and identify nutrient delivery hotspots. In addition to in-stream uptake, this model explicitly describes nutrient attenuation through four distinct pathways that are seldom described jointly in other models: runoff from tile-drained agricultural fields, overland runoff, groundwater flow, and septic plumes within groundwater. This model provides fully distributed estimates of nitrogen and phosphorus loading, sources, and pathways at high resolution (120 m). Our analysis shows that Lake Michigan and Erie have highest the nutrient loading from the US portion of the Great Lakes Basin among the Lakes, and the Lake Erie basin has the highest nutrient flux per unit area, in agreement with other studies. Agricultural sources, including chemical fertilizer, manure, and nitrogen fixation, are the dominant sources for both total nitrogen (58%) and total phosphorus (46%) deliveries to the US Great Lakes. In addition, this study reveals that the surface pathways (i.e., overland runoff and tile field drainage) dominate nutrient delivery, transporting 66% of total nitrogen and 76% of total phosphorus loads to the US Great Lakes coastline. Importantly, this study provides the first basin- wide estimates of both groundwater and septic-plume deliveries of nutrients to the lakes. I find that the groundwater pathway for nitrogen has higher delivery than overland flow. This work 9 provides valuable information for environmental managers to target efforts to reduce nutrient loads to the Great Lakes, which could be transferred to other regions worldwide that are facing similar nutrient management challenges. Figure 2.0 Abstract graphic. Nitrogen budget for the US Great Lakes basin in ~2010. Total nitrogen and phosphorus sources are shown in colored rectangles. Orange arrows show the flow of nutrient processes, including harvest and root zone loss, groundwater storage, septic removal, basin attenuation and stream losses. 2.2: Introduction Nitrogen and phosphorus loading have been linked to degraded surface water quality and the eutrophication of many coastal ecosystems worldwide. Research has focused on several ecosystems, including the Gulf of Mexico and Chesapeake Bay in the United States (R. B. Alexander et al., 2000; Lefcheck et al., 2018), the Laurentian Great Lakes Basin in North America (Chapra et al., 2016; D. M. Robertson et al., 2019), as well as Taihu Lake and the Yangtze Basin in China (Xuanjing Chen et al., 2022; L. Guo, 2007; Powers et al., 2016; J. Wang et al., 2020; W. Zhang et al., 2019, 2020). Actions have been taken to restore and protect water quality. For example, the US and Canada signed the Great Lakes Water Quality Agreement (GLWQA) in 1972 10 and updated it in 2012 with reduced phosphorus load targets (D. M. Robertson et al., 2019). Although point source loads have been curtailed under the US Clean Water Act (CWA), nutrient pollution is still one of the most widespread, costly, and challenging environmental problems in the United States (Frei et al., 2021; Sharma, 2020). This is partly due to the technical difficulties inherent in predicting the complex transport of pollutants from millions of “nonpoint” sources through heterogeneous hydrologic systems to receiving water bodies via diverse surface and groundwater pathways. Developing effective management and mitigation strategies requires an improved understanding of the relative contributions of different nutrient sources and their varied transport pathways. Nutrient loading to coastal waters is derived from both point sources (primarily wastewater treatment plants) and nonpoint sources (including agricultural and non-agricultural fertilizer, manure, nitrogen fixation, and atmospheric deposition) (Boyer et al., 2002; Hamlin et al., 2020a; Luscz et al., 2015; Powers et al., 2016; Swaney et al., 2018). Agricultural practices are major contributors to nutrient contamination because of the widespread use of fertilizers and livestock manure (Chaplin-Kramer et al., 2016; Motew et al., 2018). Thus, managing agricultural sources typically has been the focus of efforts to reduce nitrogen and phosphorus losses to the environment (Bowles et al., 2018; Ockenden et al., 2017; W. Zhang et al., 2019, 2020). In contrast to the large body of research on agricultural nutrients, pollution from intensively managed urban landscapes is of great concern but is understudied (Filipović et al., 2015; Lapointe et al., 2017; Rakhimbekova et al., 2021). For example, golf courses have been cited as significant sources of nutrient loading to water bodies (Bock & Easton, 2020), and human wastewater sources may contribute up to 40% of agricultural loads to coastlines globally (Tuholske et al., 2021). Septic sources are direct and concentrated inputs to the groundwater system and are substantial sources of groundwater nitrate 11 in the United States (Ouyang & Zhang, 2012). Septic tanks have been considered as the primary cause of nutrient leaching to groundwater because of inappropriate site conditions, poor design, inadequate maintenance, and infrequent inspections (Pollack et al., 2011). Unfortunately, septic systems are commonly only evaluated at the time of permitting or during major building additions; less attention and resources are typically directed to septic system upkeep and maintenance, which are regulated in different ways across the US (L. Oldfield et al., 2020). It is also important to know the relative importance of different pathways through which nutrients travel to receiving water bodies, as transport and uptake processes differ along those pathways. Most studies have focused on pathways like surface runoff, while there has been little literature that comprehensively quantifies the relative contribution of other nutrient transport pathways (Gill & Mockler, 2016). Specifically, tile drainage (Ikenberry et al., 2014; King, Williams, & Fausey, 2015; Michaud et al., 2019) and groundwater (Brookfield et al., 2021; Holman et al., 2008; Robinson, 2015) have been found as important transport pathways and are major contributors to nutrient loads. For example, groundwater discharge likely accounts for ~50% of phosphorus loaded to Lake Arendsee, Germany, thus accelerating the eutrophication of the lake and detrimental ecological impacts (Meinikmann et al., 2015). (D. Smith et al., 2015) estimated that 49% of soluble P and 48% of total P losses from fields occurred via tile discharge in research fields across the Lake Erie basin. Therefore, it is important to quantify the relative contributions of nutrients from these major pathways. It is not feasible to measure the relative contributions of different nutrient sources and pathways at the regional watershed scale, thus modeling approaches have been used to simulate their fate and transport. Three main categories of models are regression-based empirical, process-based flow and transport, and hybrid empirical and process-based models (R. B. Alexander et al., 2002). 12 Regression-based models are less complicated and easier to implement, although most ignore spatially explicit sources and lack mechanistic components, interactions between sources, and some nutrient loss processes (Howarth et al., 1996). Examples of process-based models include Nutrient Export from Watersheds (NEWS) (Mayorga et al., 2010a; Seitzinger et al., 2005) and the widely used Soil and Water Assessment Tool (SWAT) (Gassman et al., 2014). NEWS simulates hydrological processes on land and subsequently nutrient transport through surface and subsurface waters (Arnold et al., 1998). SWAT simulates transport using hydrologic response units (HRUs), which lump all similar land uses, soils, and slopes within a subbasin. Hybrid empirical and process- based models such as the SPAtially Referenced Regression On Watershed attributes (SPARROW), a GIS-based watershed model developed by the United States Geological Survey (USGS), uses a hybrid approach to estimate nutrient sources, transport, and loadings around the world (Dai et al., 2021; D. M. Robertson et al., 2019; D. M. Robertson & Saad, 2011b; R. A. Smith, Schwarz, et al., 1997; Wellen et al., 2012). While SWAT, NEWS, and SPARROW consider nutrient sources and retention on landscape and stream networks to predict nutrient and source contribution (McCrackin et al., 2013), they do not characterize spatially explicit sources nor extensively model different nutrient attenuation pathways. To address these limitations, I enhanced the spatially-explicit hybrid SENSEflux (Spatially Explicit Nutrient Source Estimate and Flux) model to estimate the fate and transport of nitrogen and phosphorus that originate from point and nonpoint sources. SENSEflux models nutrient transport across the landscape, stream network, and connected inland lakes to the Great Lakes coastline via spatially explicit pathways, including overland flow, tile drains, groundwater, and septic plumes. An earlier version of SENSEflux was developed for the Lower Peninsula of Michigan (Luscz et al., 2017); here additional modeling capabilities for nutrient reduction and 13 retention processes have been incorporated. In particular, SENSEflux offers an improved simulation of nutrient delivery to streams via groundwater pathways (Martin et al., 2021), estimates the long-term storage of phosphorus in the subsurface, and improves the parameterization of both in-stream and lake nutrient losses. Here, I estimate total phosphorus (TP) and total nitrogen (TN) loads to the US Great Lakes Basin (USGLB), and compare the relative contributions of nutrient sources, with an emphasis on transport pathways. The results can be particularly useful for stakeholders to identify hotspot areas (e.g., high nutrient flux and yields) and major sources and pathways that contribute to nutrient inputs to the Great Lakes. This knowledge can help prioritize locations and strategies for nutrient reduction and provide valuable inputs to other hydrological and ecological studies. The SENSEflux model could be applied to other regions around the world that have nutrient management issues. 2.3: Method 2.3.1: SENSEflux Model Description SENSEflux uses a GIS and mass balance approach to simulate the nutrient fate and transport from point and nonpoint sources across the landscape through rivers to lakes and wetlands. A schematic diagram and a detailed conceptual framework of SENSEflux are shown in Figure 2.1. Broadly, SENSEflux includes four components: 1) nutrient applications, 2) in-situ losses, 3) basin attenuation via surface and subsurface pathways, and 4) stream and lake attenuation. Loss and attenuation terms are generally spatially-explicit, the product of both static landscape factors and an independent calibrated parameter for each process and nutrient. Prior to transport, three in-situ loss terms that remove nutrients: crop harvest/in-situ loss, septic system nutrient removal efficiency, and unsaturated zone nutrient storage. I split basin transport into four distinct pathways, which are not commonly represented in most hybrid and statistical nutrient transport models: 14 overland flow, tile drainage, bulk groundwater flow, and septic plumes. Transport along each pathway includes an attenuation factor, proportional to either distance or time, calibrated and validated at sampling locations. Following basin attenuation, nutrients are then subject to stream and lake attenuation before ultimately reaching the desired end point (e.g., a sampling location or the Great Lakes coastline). SENSEflux supports six (P) and seven (N) spatially explicit nutrient applications. There are three agricultural terms: manure, agricultural chemical fertilizers, and nitrogen fixation. Urban land use terms include chemical non-agricultural fertilizers, point sources, and septic tanks. Atmospheric deposition of both N and P occurs in all landscapes. Importantly, I do not use Net Anthropogenic Nitrogen Inputs (NANI) (Hong et al., 2013) or Net Anthropogenic Phosphorus Inputs (NAPI) (Han et al., 2011), though these could be computed from the SENSEflux outputs. For this study, nutrient inputs are described by the 2010 SENSEmap product (described in section 2.4.2). There are three in-situ loss terms applied before nutrients are transported: Septic removal, Harvest, and Storage. The Harvest (ExH in Figure 2.1) loss term includes all in-place root zone losses of nutrients (i.e., sorption, denitrification, P mineralization, etc.) and is assumed to occur on in cells with either manure or chemical agricultural fertilizers applied (see “harvested areas” in Figure A2.0.1). The Storage loss term (Fstor) includes both in-place storage and loss of nutrients below the root zone, for phosphorus. Note that Fstor is only applied to subsurface mobile nutrients (see below). Both Harvest and Storage only occur for surface-applied sources (excluding septic and point sources). Finally, septic sources are subject to Septic loss (SepEff). The spatial distributions and equations for these loss terms are detailed in Appendix A (Text A2.1 and Text A2.2). Before transport, during the calculation of in-situ losses, surface-applied nutrients remaining after harvest are partitioned between surface and groundwater pathways with a spatially variable 15 partition parameter (F). This produces surface- and subsurface-mobile nutrient pools within each cell. The partition parameter (F) is assumed to vary directly with groundwater recharge. Nutrients applied to septic systems are subject to septic loss, then form the septic mobile pool. Nutrients from the mobile pools are then subject to basin transport and attenuation, consisting of movement to streams from each point on the landscape, with spatially-variable and source-specific attenuation occurring along the path. Surface mobile nutrients may flow to streams via either overland flow (BS) or agricultural tile drains (BST), in areas where tile drainage exists. While little overland flow occurs in most tile-drained areas (King et al., 2014), this assumption may lead to a somewhat elevated estimate of overall tile drainage flux versus overland flow. The subsurface mobile nutrients are then transported and attenuated through groundwater flow (BG). Transport and attenuation of septic-mobile (remaining septic removal) nutrients within septic plumes (BSE) is another important groundwater pathway. I separated nutrient transport along this pathway because attenuation within septic plumes occurs differently than in general groundwater flow due to the distinct chemical characteristics of septic tank effluent. 16 a. Schematic diagram b. Conceptual framework Figure 2.1 Schematic diagram (a) and conceptual framework (b) of the SENSEflux model. In the schematic diagram, seven nutrient sources are indicated in blue text. Nutrient transport across the landscape via surface and groundwater pathways are indicated with black dashed lines and yellow triangles. Basin attenuation terms: BS: Overland flow, BST: tile field, BG: groundwater flow, BSE: septic plume. River attenuation term: R: in-stream and lake attenuation. (Based on the https://www.cityofgriffin.com/departments/storm-water/watershed-management). conceptual diagram, brown boxes represent nutrient sources, and green arrows pointing left are septic onsite removal, crop harvest, and loss in the deep unsaturated zone. Dashed yellow arrows are distinct nutrient transport pathways. In 17 Nutrients remaining after basin transport are then subject to attenuation via stream and lake processes (R). Point sources are applied directly to streams and lakes at this step. Stream processes (which here include flow through connected wetland systems) include separate terms for biological nutrient uptake, denitrification (N), and sorption/mineralization (P). Lakes are represented with a linear uptake term, proportional to the length of the nutrient flow paths that intersects lacustrine- classified wetlands (i.e., lakes). Detailed descriptions for the derivation of these terms are given in Appendix A (Text A2.3). This study builds on and renames (here, SENSEflux) the model first presented by (Luscz et al., 2017), incorporating new loss terms and improving multiple parameterizations. These changes were largely necessitated by the greater spatial extent of this model (see section 2.4.1), and thus the greater range of landscape and climate characteristics present. First, I added a subsurface in- situ storage term (Fstor) for phosphorus. I also tested this approach for nitrogen, but as parameterized it decreased model fit and led to unrealistic results for groundwater N transport. Second, I replaced the simplistic R term in (Luscz et al., 2017), which also relied on a basin-yield parameter that dominated the overall stream uptake, potentially skewing results. The new R term includes spatially-variable characteristics related to denitrification or sorption, biological uptake, and lake losses. Finally, some small adjustments were made to the overall model equation. These changes are described in Appendix A (Text A2.1, Text A2.2, and Text A2.3). 2.3.2: Model Parameter Estimation SENSEflux is calibrated separately for N and P, using observed fluxes or concentrations (here, concentrations). This study uses annually-averaged fluxes (section 2.4.3) and thus represents average annual losses, attenuation, and nutrient delivery. There are 10 (N) and 11 (P) scalar parameters in the SENSEflux model; these include four loss and partition terms (SepEff, ExH, F, 18 and Fstor) as linear multipliers, with the remaining attenuation parameters (Bs, Bst, Bse, Bg, Rdn, Rbio, and Lacus) as multipliers within an exponent (see equations in Text A2.1). For N, Fstor is set to 0. Except for SepEff, all parameters are then optimized via automated parameter estimation. SepEff is the efficiency multiplier on septic loads and is set as 0.3 for TN (0.35 for TP) based on existing studies (Luscz et al., 2017; Tong et al., 2020; Valiela et al., 1997). While SENSEflux could be used to independently calibrate multipliers for both SepEff and Bse (septic basin attenuation), here I did not have sufficient data density to independently calibrate both. Here I used MATLAB’s constrained non-linear local-minimum optimization routine fmincon. The objective function for this optimization was the mean absolute difference (error) between the base- 10 log (termed MAEL) of the observed and simulated nutrient concentrations. Several different objective functions were tested, including root-mean-square residuals, and root-mean-square log- 10 residuals. Ultimately MAEL was selected because it more equally weighted both low and high- concentration locations, a necessity given the wide range of nutrients across the study region (section 2.4.1 and Figure A2.0.2 and Figure A2.0.5). I also tested using MAEL with loads, as opposed to concentrations, but ultimately selected concentrations because they provided higher sensitivity to attenuation parameters. 2.4: Study Area and Model Inputs 2.4.1: Study Area: US Great Lakes Basin (USGLB) The Laurentian Great Lakes encompass Lakes Superior, Michigan, Huron, Erie, and Ontario, which make up Earth’s largest liquid freshwater system, with ~21% of the world’s and ~80% of the United States’ surface freshwater supply (L. Zhang et al., 2019). The USGLB includes portions of eight US states (Illinois, Indiana, Michigan, Minnesota, New York, Ohio, Pennsylvania, and Wisconsin). Average precipitation across USGLB varies between 500 and 1600 mm yearly (Figure 19 A2.0.2), and annual average temperatures from 3 to 10 ℃ in the 2010s (OSU, 2014). Forty million residents of the United States and Canada depend on this lake system for clean drinking water (NOAA, 2019). Great Lakes’ ecosystems are being threatened by climate change, invasive species, and degraded water quality because of pollutants from residential, agricultural, and industrial activities (Kovalenko et al., 2018; Paerl et al., 2016; Sterner et al., 2017). Urbanization, agricultural intensification, and failing septic systems are causing contamination across the GLB. Several major cities in the southern basin (e.g., Chicago, Detroit, and Cleveland) have significant impermeable surface areas that route precipitation directly to aquatic systems via overland flow. Most agricultural areas are in the southern portion of the basin (Figure A2.0.2) and produce substantial nutrient loads to the lakes (D. M. Robertson et al., 2019). Because of humid continental climates and broad areas with very permeable soils and high aquifer recharge, 43% of the USGLB coast is vulnerable to groundwater-borne nutrients (Knights et al., 2017; Kottek et al., 2006). Harmful Algal Blooms (HABs) in the GLB have been a critical issue for millions who live in the region, resulting in negative effects on industries (e.g., fishery, tourism, aquaculture), ecology (e.g., fish kills), and public health (e.g., drinking water contamination, toxicity to pets and livestock) (Carmichael & Boyer, 2016; Gobler, 2020; Michalak et al., 2013). 2.4.2: Nutrient Source Inputs: SENSEmap To drive SENSEflux, I used the SENSEmap product that describes each of 7 distinct sources (total nitrogen, TN) or 6 sources (total phosphorus, TP) at 30 m resolution, ca. 2010. These sources include point sources, along with the non-point sources of chemical agricultural fertilizers, chemical non-agricultural fertilizers, manure, septic tanks, atmospheric deposition, and N fixation. SENSEmap estimates sources using GIS and statistical methods constrained by broadly available 20 data from remote sensing, government databases, and literature (Hamlin et al., 2020a, 2020b). For this study, all seven sources were aggregated to 120 m resolution, summing from the 30 m SENSEmap values (Figures A2.0.3 and A2.0.4). 2.4.3: Calibration Data: In-Stream Nutrient Load TN and TP loads at sampling sites (TN: 116, TP: 119) within the USGLB, which were used by (D. M. Robertson & Saad, 2011b) to calibrate the SPARROW model, were extracted to calibrate and validate the SENSEflux model (Figure A2.0.5). Here, averages from 2008 – 2012 were used for calibration and validation. Load data was split randomly into two sets: 70% for model calibration and 30% for validation. (D. M. Robertson & Saad, 2011b) estimated these loads using the Fluxmaster program, which may overestimate nitrogen loads and underestimate TP due to uncertainties such as the lack of continuous measurements of concentration and streamflow (Richards et al., 2012; D. M. Robertson et al., 2019; Stenback et al., 2011). Nevertheless, they are the most complete dataset of annual loads available for the region. 2.4.4: Basin Characteristics: Groundwater Recharge, Overland Flow Length, Harvested Areas, and Tile Drained Areas As discussed in section 2.3, spatially variable factors affect the fate and transport of TN and TP during both landscape (basin) and in-stream transport. Groundwater recharge (Figure A2.0.1), or the amount of water percolating from the surface to the water table, is used to characterize the subsurface partition (F) and the fraction of groundwater-pathway nutrients stored in soil and deep unsaturated zone (Fstor). Overland flow length was calculated using ArcGIS Hydrology Toolbox based on the Digital Elevation Model (DEM) from the National Elevation Dataset (NED) and is used as part of the reduction factor for basin pathways. Harvested areas for TN and TP are determined by where manure or chemical agricultural fertilizers are applied. A novel tile drainage 21 layer was calculated to evaluate whether nutrients were likely transported via overland flow or tile fields. See details for groundwater recharge and tile drainage area calculation in Text A2.4. 2.4.5: In-Stream/Lake Characteristics: Catchments, In-Stream Travel Time, Streambed Exchange Rate, and Lake Travel Distance The hydrologic networks move water through catchments and along rivers, with their associated drainage basin providing a critical component to hydrologic analysis and modeling (Brakebill & Terziotti, 2011). The (~30 m) resolution 1 arc-second DEM from the USGS National Elevation Dataset (NED) was used to calculate flow direction and flow accumulation to generate stream networks (Gsech et al., 2002). Watersheds were delineated using these sampling sites as pour points in the ArcGIS 10.6 Hydrology toolbox. TN and TP watersheds are shown in Figure A2.0.5 with corresponding loads for each watershed. Like the overland flow length calculation in the previous section, in-stream travel time was calculated using the ArcGIS Hydrology Toolbox with the NED DEM and the flowlength function. For this instance, a cost raster was supplied, calculated as the time/unit distance in each cell (i.e., 1/velocity). For overland flow portions of the flow path, the cost function was set to 0, while in- stream velocities were computed from USGS gauge site data as in (Mooney et al., 2020). In-stream velocities are used to calculate the biological uptake portion of stream attenuation (equation (2.5) in Text A2.1). The N denitrification/P sorption portion of the stream attenuation functions was assumed to be driven by the rate of exchange between streamflow and the stream bed and the hyporheic zone beneath. I calculated this rate of exchange as the ratio of streambed flux and in-stream flow, as described in Text A2.3. Ultimately, this exchange rate (Figure A2.0.6, e) is the product and ratio of multiple factors, including hydraulic conductivity (Figure A2.0.6, d), slope (Figure A2.0.6, a), 22 basin yield during baseflow (Figure A2.0.6 b), hydraulic radius (Figure A2.0.6, c) and velocity (Figure A2.0.6, d). The exchange rate is then used to calculate denitrification/sorption in (equation (2.6) in Text A2.1). Travel distance in lakes was computed by providing a 0/1 cost raster to the flowlength function in ArcGIS, with values of 1 indicating lakes and 0 otherwise. This layer is used to compute lake retention (equation (2.7) in Text A2.1). 2.5: Results and Discussion SENSEflux TN and TP annual models performed well (calibration/validation R2 values 0.93/0.86 and 0.79/0.76, respectively, Figure 2.2), providing estimates of TN and TP loads to the USGLB using optimized parameters (Table A2.1). These parameters indicate higher rates of attenuation in overland vs. groundwater pathways, and that transport through tile drainage produces the lowest attenuation rates of all pathways (see extended discussion in Text A2.5). The TN model had a slightly better fit versus TP, with a difference of 14% for calibration and 10% validation datasets, respectively. The best-fit line has slopes > 0.75 (TN and TP calibration/validation slopes are 0.90/0.76 and 0.77/0.78, respectively), indicating a slight bias toward high predictions at low loads, and low predictions at higher loads. Overall, the model predicted loads were close to observed values: the mean absolute error of log daily loads (MAEL), for TN calibration and validation are 0.14 and 0.17 log10 (kg/day) respectively, 0.22 and 0.28 for TP. Analyses of residuals indicate no significant bias for TN, but a slight (statistically non- significant) underestimate of TP deliveries for the validation dataset (Figure A2.0.7). The residuals for TN and TP are not significantly different from zero with P values ranging from 0.4 to 0.65 based on a one-sample t-test (Figure A2.0.7). The spatial residuals (Figure A2.0.8) are reasonable with 83% and 77% of watersheds having residuals (log10) between -0.2 and 0.2 and are 23 significantly clustered spatially with the Moran’s Index of 0.43 and 0.41 for TN and TP respectively. Figure 2.2 Plot of log10 simulated and observed daily loads for model calibration and validation datasets. The dashed black line is the 1:1 line. Solid blue and red lines indicate regression fits. n refers to the number of observation points in each of the calibration and validation datasets. 2.5.1: Spatially Varied Nutrient Delivery and Loads to Lakes Simulated export of TN and TP varied substantially across the US Great Lakes Basin, with the majority of area (~60%) ranging between 128-912 kg/yr/km2 for nitrogen and 4-23 kg/yr/km2 for phosphorus, encompassing the ~20th to 80th percentile of nutrient deliveries (Figure 2.3a & 2.3b). Over the entire USGLB, mean TN and TP loads are 599 and 21.7 kg/yr/km2, respectively. Broadly, spatial patterns are similar for both TN and TP (Pearson correlation coefficient, r = 0.73). For instance, both TN and TP are high in the southern Lake Michigan, Saginaw Bay, Western Lake Erie, and Lake Ontario basins (Figures 2.3a and 2.3b). We also calculated the N:P ratio (ratio of total delivered TN to TP load) across USGLB and each 24 lake basin (Figure 2.3e & 2.3f), which can help us understand the lake trophic status. Overall, the USGLB is N enriched, with > 95% (5th percentile of the ratio = 17.85, median of 80.7) of cells delivering nutrients above the classical Redfield ratio, defined as an N:P molar ratio of ~16:1. Individual Lake basin averages of N:P delivery ratio is between ~58-67 (Figure 2.3f). For comparison, basin averages the N:P ratios of SENSEmap nutrient inputs to the landscape from (Hamlin et al., 2020a) vary between ~17-21, except for Lake Superior at 51.6. It is notable that in our current study, the delivery N:P ratio for Lake Superior is 62.4, indicating that the watersheds draining to this northernmost lake are relatively efficient at routing P to the coastline. In general, watershed deliveries to the lakes indicate that P should be the limiting nutrient, as is largely the case for non-marine waters in this region. Lake Michigan receives the highest TN loads followed by Lake Erie, with 62.9 and 61.5 kt/yr nitrogen delivered from US lands to the water, respectively (Figure 2.3c). Lake Erie has the highest TP loads followed by Lake Michigan, with 2.4 and 2.3 kt/yr phosphorus delivery (Figure 2.3d). Lake Huron, Ontario, and Superior have much lower deliveries, all at or below 24 kt/yr nitrogen and 0.8 kt/yr phosphorus. This is consistent with the larger US drainage basins of Lakes Michigan and Erie among the Great Lakes. I also calculated nutrient yields, defined as nutrient fluxes divided by drainage basin area (Figures 2.3c & 2.3d). Not surprisingly, Lake Erie has the highest yields for both TN and TP, with nitrogen yields of 1148 kg/yr/km2 and phosphorus yields of 43.5 kg/yr/km2. Lakes Huron and Michigan have similar nitrogen yields (567 and 544 kg/yr/km2, respectively) and phosphorus yields (18.6 and 20.1 kg/yr/km2, respectively). Lake Superior has the lowest nitrogen (218.3 kg/yr/km2) and phosphorus yields (7.7 kg/yr/km2). To evaluate the nutrient loading from SENSEflux, I compared simulated TN and TP loads with the GLB SPARROW model(D. M. Robertson & Saad, 2011b), which is the most comparable 25 model with the closest timeframe and is widely used as a watershed nutrient load predictor. Overall, the total modeled delivery of TN and TP from USGLB to the lakes using SENSEflux are 0.37 and 0.45 times lower than simulated loads from the SPARROW model for TN and TP, respectively (Figure A2.0.9). More details about the reasons for these differences can be seen in Text A2.5. Figure 2.3 Predicted nutrient delivery as yield (kg/km2/yr) to Great Lakes shoreline and summarized nutrient loads by lake basin (black outlines) within the USGLB. TN (a) and TP (b) yields are direct outputs from SENSEflux that are resampled to 720 m resolution for display purposes and classified in quantiles, within which each color represents ~20% of the USGLB area. Bars in (c – TN) and (d – TP) represent the total basin-accumulated fluxes and yields (area normalized) for each nutrient. Map of N:P yield molar ratio at 720m resolution across USGLB, rounded to the nearest 1 (e); bar chart of mean N:P load molar ratio by lake basin within USGLB (f). 26 Figure 2.3 (cont’d) The average TP load from Lake Erie watersheds calculated by (Maccoux et al., 2016) was equal to 5.7 kt/yr for 2008-2012 (excluding direct Point and Municipal sources). Thus, the estimate presented here is roughly 42% of that from Maccoux et al. This difference is dominated by an underestimate of loads to Lake Erie from just one tributary, the Maumee River, which suggests some important pathway or source mechanism may be missing or underestimated within SENSEflux. Future work will continue to refine nutrient input mechanisms and specifically examine varying seasonal loads—given the established importance of winter and spring deliveries in that basin (Stow et al., 2015). 2.5.2: Nutrient Delivery Efficiency and Hotspots SENSEflux provides a fully spatially-explicit estimate of nutrient deliveries, allowing a novel view of the landscape: nutrient delivery efficiency, defined as the ratio of deliveries to inputs (Figure 2.4a & 2.4c). Median cell-by-cell TN delivery efficiency was 14.6%, while TP was just 3.7%. In 27 general, northern portions of the GLB have higher delivery efficiency, as do urban areas. Basin- averaged delivery efficiencies range between 11 - 16.5% for N and 2.8 – 4.9% for P in all lakes except for Superior (Figure 2.4b & 2.4d). There, the basin is remarkably efficient at delivering nutrients, sending almost 28% of input N and 23% of input P to the coastline. Cumulatively over the entire region, TN delivery efficiency was 14.6% (coincidentally the same as the median cell- by-cell delivery value), while TP was 4.3% (see Figure 2.0). Figure 2.4 The ratio between nutrient delivery and apply (defined as efficiency here) for TN (a) and TP (b) across the USGLB and summaries by lake basin (b & d). Nutrient hotspots are determined as high yield (x-axis) and efficiency (y-axis) areas in bivariate choropleth maps for TN (c) and TP (d). Deliveries are the result of both inputs and the cumulative storage and attenuation processes along transport pathways. Figures A2.0.10 and A2.0.11 in Appendix A are maps of the loss and attenuation for N and P, respectively. Harvest is the dominant loss process across most of the domain, providing the broad N-S gradient seen in delivery efficiencies (Figures A2.0.10a, 28 A2.0.11a). Stream and lake attenuation variability (Figures A2.0.10c, A2.0.11d) occurs at moderate scales, driven in large part by the travel time from the coastline. Basin transport attenuation (A2.0.10b, A2.0.11c) varies at the shortest scales, responding to distance from streams, tile vs. overland transport, and the presence of septic systems. Combining nutrient deliveries (Figures 2.3a & 2.3b) with delivery efficiencies (Figures 2.4a & 2.4c) produces a novel view of landscape nutrient transport function: delivery hotspots, quantified by terciles and presented in a bivariate colormap (Figures 2.4e and 2.4f). Areas of the landscape with both high loading and delivery efficiency (highest 33%) are the most intense sources of loads to the coastline (shown in blue on the hotspot maps). These are predominantly urban areas, particularly for TP. Next, areas with high delivery (highest 33%) but low efficiency (lowest 33%, teal on the hotspot maps), are agricultural areas generally more distant from the coastline. Areas with low delivery (lowest 33%) but high efficiency (highest 33%) are highlighted in magenta and concentrated in the northern areas of the region. Although they do not generate substantial total deliveries, their high delivery efficiency (ratio of deliveries to inputs) makes them of conservation interest. In general, areas with high delivery efficiency would be ideal targets for conservation efforts, because reducing these inputs has a correspondingly high influence on marginal deliveries. 2.5.3: Leading Sources of N And P Fluxes to The Great Lakes Agricultural sources, including manure, chemical agricultural fertilizer, and nitrogen fixation, dominate nitrogen fluxes, totaling ~58% of all fluxes from the USGLB (Figure 2.5a). Agriculture was the largest nitrogen source of each lake, except for Lake Superior where atmospheric deposition dominated (91%). Phosphorus fluxes were also dominated by agricultural sources (manure and chemical agricultural fertilizer) in USGLB (~46%, Figure 2.5b). On the lake basin scale, phosphorus fluxes in Lakes 29 Michigan, and Ontario are driven by agricultural sources because of large manure inputs, while Lakes Erie and Huron had higher inputs from chemical agricultural fertilizer. Specifically, Lakes Michigan and Ontario had 46% and 60% phosphorus loads from agricultural sources, with manure accounting for 27% and 40% of inputs, and 19% and 20% from chemical agricultural fertilizer, respectively. For Lakes Erie and Huron, agricultural sources appear to contribute 49% and 46%, where agricultural chemical fertilizer was the dominant source (35% and 30%). Urban sources, including chemical non-agricultural fertilizer, septic tanks, and point sources, account for only 12.5% of N contributions but 34% of total P in the USGLB (Figures 2.5a and 2.5b). The contributions of these three sources for TN delivery are similar, with 4.1% of the point source, 3.8% of septic tanks, and 4.6% of chemical non-agricultural fertilizer. However, the point source accounts for 16.8% of TP delivery. The relatively lower point source contribution for TN demonstrates the effectiveness of the CWA and National Pollutant Discharge Elimination System (NPDES) permitting system. The other major urban source for TP transport is chemical non- agricultural fertilizer (14.7%), which is largely applied to golf courses and lawns. This is supported by (Baris et al., 2010) who found that 86% of surface water samples had phosphorus concentrations above the criteria set by the US Environmental Protection Agency, after comparing nitrate and TP concentrations from golf courses across the United States. It’s critical to ensure the effective use of fertilizers in urban systems, and use best management practices to reduce nutrient transport (Bock & Easton, 2020), especially for TP. 30 Figure 2.5 Estimated percent of nutrients delivered to lakes by source. Donut figures represent the USGLB while rotated stacked bar plots show each lake basin within the USGLB. The black donut circle was divided into agricultural sources (brownish), urban sources (greenish), and atmospheric deposition sources (mint tulip) with corresponding percentages. Atmospheric deposition is an important contributor of N and P, consisting of 29% of TN and 20% of TP sources in the USGLB (donuts in Figure 2.5a and 2.5b). This is largely because of the dominant role (91% of TN, 81% of TP sources) that atmospheric deposition plays in Lake Superior, which has severe stoichiometric imbalance with high N and low P concentrations (Sterner, 2011). Unfortunately, atmospheric deposition can be difficult to manage, and research has found that current policies and technologies may not be sufficient to reduce deposition under critical loads (Clark et al., 2018; McCrackin et al., 2017). In summary, although management actions have focused on agricultural sources for decades, they still dominate TN and TP transport and delivery across most of the USGLB. While approximately 71% and 88% of TN and TP sources applied to the landscape are from agriculture (Hamlin et al., 2020a), only 58% and 46% of TN and TP are delivered to USGLB. This means that even though a higher percentage of agricultural phosphorus source is applied to landscapes, less percentage of 31 phosphorus is delivered to aquatic ecosystems in these lower delivery efficiency areas. These disproportional differences between sources and fates show the importance of nutrient transport and the natural differences between TN and TP attenuation. 2.5.4: Nutrient Delivery Pathways: The Dominant and The Underappreciated We summed nutrient deliveries by transport pathways across the USGLB, shown ranked from highest to lowest proportion (surface (tile fields + overland) > subsurface (groundwater + septic plume) > point) for TN and TP within the USGLB (Figure 2.6). Our results agree with prior work that indicates surface pathways dominate transport (Baker et al., 1975; Ryden et al., 1974; Sims et al., 1998), contributing the largest proportion of TN and TP (66% and 76%) delivery to the lakes. Others have shown that overland flow was the primary export pathway for both P and N, but tile drainage cannot be overlooked (Kokulan et al., 2019; D. Smith et al., 2015) and contributes almost ½ of some TP loads (D. Smith et al., 2015). I found that overland flow was the primary pathway (40%) for TP delivery to the US Great Lakes. TN was dominated by tile fields (46%) and 36% of TP was transported by tile fields, showing tile drainage delivers large quantities of nutrients, especially nitrogen to the Lakes and is thus critical to manage. Subsurface pathways (groundwater flow and septic plumes) transport an important proportion of nutrients to the lakes, with about 30% of TN and 6.8 % of TP (Figure 2.6). The groundwater flow pathway dominates the subsurface transport of nitrogen (26% of total transport), likely due to nitrogen’s high mobility. I found that septic plumes contributed 3.8% of total nitrogen (2% of TP) to lake loading. Other studies have indicated that septic systems are important nutrient sources (L. E. Oldfield et al., 2020; Reay, 2004), yet they are rarely accounted for and commonly overlooked. The pathway proportions from the point source are 4.1% for TN and 16.8% for TP, demonstrating more efforts are needed to reduce phosphorus loads to the Great Lakes. Note our maps do not 32 include direct discharges of point sources to the Great Lakes coastline. Figure 2.6 Estimated percentages of total nutrients delivered to lakes by pathways (surface: overland and tile fields; subsurface: groundwater and septic plume). 2.5.5: Heterogeneous Pathways Surface pathways dominated nutrient contributions within each Lake basin (Figure 2.7a & 2.7b). In the Lake Superior basin, overland flow dominates nutrient transport, with 61% of TN and 86% of TP. In the Lake Michigan basin, tile drainage transported 45% of TN (23% by overland flow) and 33% of TP (46% by overland flow). Tile fields delivered more nutrients than overland flow in Lakes Erie and Huron basins. This supports earlier work that found tile drainage to be the primary pathway for nutrient delivery to streams in the western Lake Erie basin(D. Smith et al., 2015; Williams et al., 2018). In the Lake Ontario basin, tile fields transported 42% of nitrogen (23% for overland) and 37% of phosphorus (40% for overland). Subsurface pathway (septic plume and groundwater) contributions varied across the five lake basins (Figure 2.7a & 2.7b). The bulk groundwater flow pathway (excluding septic plumes) contributed substantially across the lake basins, ranging from 24% in the Lake Erie basin to 34% 33 in the Lake Superior basin for N. Conversely, about 4.8% (4.1% - 5.5%) of phosphorus was transported via the groundwater pathway. The proportion of nitrogen load from the septic pathway varies from 2% in the Lake Superior basin (1% for TP) to 4.4% in the Lake Huron basin (2.8% for TP). Controlling much of the landscape nutrient delivery to the USGLB, Michigan is one of the few states in the US without a statewide septic code governing septic system installation, maintenance, and repair (J. Alexander, 2013) although discussion and development of such a code is ongoing. To further investigate the heterogeneity of nutrient delivery, I mapped the amount of TN transported through our four basin transport pathways to streams (overland flow, Figure 2.7c; tile fields, 2.7d; groundwater, 2.7e; septic plume, 2.7f). Maps for TP transport pathways are shown in Figure A2.0.12 because of space limitations and overall similar spatial patterns in TN and TP pathways. For instance, overland transport pathways are high (> 238 kg/km2/yr TN, > 11.97 kg/km2/yr TP,) in the southern Lake Michigan basin, eastern Lake Ontario basin, and some urban areas (i.e., Detroit, MI; Cleveland, OH; Buffalo, NY; Rochester, NY). The tile field pathway is a major contributor (> 1053 kg/km2/yr TN, > 27.71 kg/km2/yr TP) in agricultural areas (e.g., Southern Lake Michigan basin, Saginaw Bay, western Lake Erie, and Lake Ontario basins), likely due to high chemical fertilizer and manure inputs, along with higher tile drainage density. Lakes Erie, Ontario, and southwestern and the thumb area of Lake Michigan showed higher TN and TP delivery through groundwater flow, possibly due to higher groundwater withdrawal rates in these areas and elevated nitrate concentrations (Hamlin et al., 2022). The septic plume yields were highest around large cities, where dense suburban populations are not yet connected to sewer systems. These results also show substantial variability across the USGLB basins, with different dominant 34 pathways in different Lake basins (Figure 2.7). Specifically, as the installation of tile drainage expands or intensifies, fluxes from tile drainage will likely become more important. Figure 2.7 The estimated total yield of TN delivered to lakes by four key pathways. (a) and (b) are for TN and TP pathways by lake basin, (c) – (f) for TN overland, tile fields, groundwater, and septic plume respectively. Maps are resampled from 120 m SENSEflux outputs to 720 m resolution here for display purposes and classified in quantiles, with each color representing 20% of the dataset; the white area in c & d within the basin boundary represents areas with no data as I assumed overland and tile fields are alternative pathways. Corresponding maps for TP are included in Figure A2.0.12. 35 2.6: Implications, Limitations, and Future Work This study uses a spatially explicit nutrient transport model to help understand the fate and transport of total nitrogen and phosphorus from multiple sources along different pathways. Modeling distinct transport pathways provides a novel alternative to many models that do not include important pathways, particularly groundwater and septic plumes. Our analysis shows that overland flow and tile fields are major pathways of nutrient transport, but subsurface transport plays an important role. Specifically, tile drainage is highest in Lake Erie, transporting 44% more TN and 15% more TP than the overland pathway, suggesting the increasing installation of tile drainage may have significant effects in other regions. For subsurface pathways, groundwater and septic plumes provide 30% of TN delivery and 6.8% of TP. This will become even more important when we consider legacy nutrients that often have long groundwater travel times (Martin et al., 2011, 2017, 2021). Thus, these subsurface pathways should not be ignored in water quality management and policy. Agricultural nutrient sources (manure, chemical fertilizer, and nitrogen fixation) have played a dominant role in the Great Lakes Basin’s history and will be a critical part of its future. I also found that atmospheric deposition is a significant source of nitrogen, and septic tanks contribute significant nitrogen and phosphorus loads to the environment. Groundwater also plays a substantial role in transporting nutrients from the landscape to streams, and eventually to the Great Lakes coastline. These findings can be used along with the SENSEflux and SENSEmap products (Hamlin et al., 2020a) for the US‐GLB to provide managers with spatially explicit loading, efficiency, source, and pathway estimates. For example, the Tipping Point Planner Program links watershed data to local decision-making processes(Weinstein et al., 2021) (tippingpointplanner.org), thereby, the addition of SENSEflux can help managers focus actions on specific sources and pathways. The 36 information presented here can provide important inputs to this community-facing tool. Future research could improve nutrient flux and pathway estimates for the Great Lakes Basin, which would help inform more holistic decisions to achieve nutrient reduction strategies. A more accurate tile drainage map would improve estimates of the contributions of this pathway to the waterways in the basin. In addition, the role of septic plumes in phosphorus delivery and lakes that don’t have a connection with streams should be further explored to seek ways that protect water quality by reducing N and P loads. Also, this modeling assumes that all landscape input nutrients have had sufficient time to reach the streams where concentrations are observed and that nutrient inputs aren’t changing meaningfully over decadal time scales. Future efforts could include time- varied surface loads, along with estimates of legacy timescales and travel times. 2.7: Acknowledgments This work was funded by NASA Grant NNX11AC72G and NOAA Grant NA12OAR432007, and grant GS100013. Luwen Wan was partially supported by the China Scholarship Council (CSC). Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NASA, NOAA, or CSC. 37 CHAPTER 3: SEASONAL VARIATION OF NITROGEN AND PHOSPHORUS SOURCES, PATHWAYS AND LOADS: A SPATIALLY EXPLICIT MODEL SENSEFLUX 3.1: Abstract Excess nitrogen and phosphorus can cause algal blooms, which limit recreational activities, increase costs to treat drinking water, and can be toxic to human and aquatic life. In temperate climates, there is a consensus that nutrient delivery to streams and rivers increases during snowmelt periods. However, the quantification of total nitrogen and phosphorus loading, sources, and pathways across hydrologic seasons is not well-known. Here I updated the Spatially Explicit Nutrient Source Estimate and Flux model (SENSEflux) with a seasonally dynamic approach to simulate nutrient loadings, sources, and pathways to the US Great Lakes. Specifically, I examined two distinct hydrologic conditions: snowmelt, early spring conditions with characteristically high flows, and baseflow, mid- to late-summer where the bulk of streamflow originates from groundwater discharge. Results show that total nitrogen loading during snowmelt periods is four times greater than annual average deliveries. The contribution of agricultural sources (chemical agricultural fertilizer, manure, and N fixation) is substantially higher (14% for TN and 5% for TP) during melt than baseflow, while point sources, septic tanks, and atmospheric deposition become more prominent contributors to nutrient delivery during baseflow. In addition, most nutrients are transported via overland flow and tile fields regardless of the period. Groundwater and septic plume pathways also play an essential role in transporting nutrients and become more significant during baseflow. Nutrient delivery hotspot analysis shows that targeting nutrient reductions during snowmelt is more effective than focusing on baseflow, especially for TN. The optimal location for nutrient reductions during baseflow would be those areas with high nutrient deliveries spatially. Thus, seasonal variation of nutrient transport should be considered when establishing nutrient 38 criteria and reduction targets. These results and high-resolution (120 m) SENSEflux output products provide decision-makers with nutrient reduction insights. In addition, the SENSEflux model can be readily applied to other regions with nutrient management issues. 3.2: Introduction Nitrogen and phosphorus enrichment are two of the most detrimental and costly stressors in surface waters, degrading water quality and causing eutrophication in the world’s stream, coastal, and lake systems (Abell et al., 2010; R. B. Alexander & Smith, 2006; Azzari et al., 2019; Huggins et al., 2023; K J Van Meter & Basu, 2017; Pearce & Yates, 2020; K. Yan et al., 2019). Due to irregular occurrences (i.e., heavy precipitation, melting snow) and seasonal agricultural activity such as fertilization (Rixon et al., 2020), nutrient delivery can be continuous but is frequently intermittent. For instance, 42–92% of the annual total phosphorus (TP) load and 41–81% of the total nitrogen (TN) load were delivered during snowmelt in the Red River sub-watershed (Corriveau et al., 2013). The seasonal nutrient delivery then alters its concentration in rivers, streams, and downstream waterbodies. Another example for the Upper Mississippi River Basin is that TN concentration was found to be lowest in September, remain steady through the winter and increase further in the spring season due to fertilizer application (Wine et al., 2020). In addition, excessive nitrogen and phosphorus pollution in aquatic systems frequently causes seasonal summertime algal blooms and hypoxia events (Bechard, 2020; Carmichael & Boyer, 2016; Gobler, 2020; Mchau et al., 2019; Pretty et al., 2003). Thus, simulating the seasonal dynamics of nutrient export and understanding the seasonal contributions of sources and pathways are crucial for managing nitrogen and phosphorus delivery and implementing nutrient reduction strategies (LaBeau et al., 2015; Mayorga et al., 2010b; Tong et al., 2019). Climate and environmental changes have been identified that affected nutrient delivery and 39 exacerbated nutrient pollution, such as heavy precipitation and increased streamflow, rising water and air temperatures, and agricultural activities (Hallegraeff et al., 2021; Heil & Muni-Morgan, 2021). Precipitation and subsequent runoff mobilize and transport nutrients (dissolved and particulate) from upland sources to downstream water bodies. For instance, in an agricultural Lake Mendota watershed, Wisconsin, most phosphorus was transported in the spring during the high flow period (Huisman et al., 2017). However, phosphorus pollution was also found to being the most serious in low-water season in the Qiuxi River, China, with high risk of phosphorus release from sediment (Xuemei Chen et al., 2021). Air and water temperature also affect nitrogen and phosphorus differently, such as changing the biological activity rate, solubility, transformation, and biogeochemical cycling processes. The reaction rates, mineralization, and denitrification have been shown to increase with higher temperatures (Alam & Dutta, 2013; Hartman et al., 2014; Paerl et al., 2011), while excessive temperature can cause nutrient loss and ecological impacts. Furthermore, the variation and magnitude of nitrate loads were higher at the beginning of the growing season due to fertilizer applications in the northern Red River basin (Nasab et al., 2018). These cases have demonstrated that the importance and complexity of nutrient transport has been recognized by many researchers. However, because the coverage of nutrient monitoring and availability of natural and anthropogenic nutrient sources, particularly at large scales, are frequently limited, the quantification of nutrient loadings, sources, pathways, and hotspots is inevitably involved with watershed modeling. Nutrient models have been developed to quantify loads and the contribution of sources and been used for nonpoint source pollution management decisions (Wellen et al., 2015). As one of the most widely used process-based watershed models, the Soil and Water Assessment Tool (SWAT) simulate nutrient loadings based on the Hydrologic Response Units (HRU), which are aggregated 40 land areas using unique combinations of land use and land cover, soil type, and topography (Arnold et al., 1998; Bosch, 2008). SWAT predicts nutrient delivery and transport based on lumped parameters within each sub-watershed. The SPARROW model was developed by the United States Geological Survey (USGS) that estimates the quantity of contaminant delivered downstream based on monitoring data, location, and landscape characteristics (Preston et al., 2011; R. A. Smith, Schwar, et al., 1997). It has been applied to US watersheds, such as the Mississippi River basin and Chesapeake Bay watershed (M. P. Miller et al., 2020; D. M. Robertson & Saad, 2013), as well as some binational watersheds (Benoy et al., 2016; D. M. Robertson et al., 2019). Recently, a seasonally dynamic application of SPARROW was developed and showed that seasonally varying streamflow and temperature were significant drivers of nitrogen and phosphorus uptake in northeastern US river corridors (Schmadel et al., 2021). Another model example is the first global, spatially explicit model, named the Global Nutrient Export from WaterSheds (NEWS). NEWS was developed to estimate nutrient export at river mouths based on basin-scale nutrient budgets and seasonal inputs (Mayorga et al., 2010b; Seitzinger et al., 2005). For example, calendar seasons (i.e., spring, summer, autumn, and winter) were used to simulate dissolved inorganic nitrogen in the world’s largest rivers (McCrackin et al., 2014). Thus, simulating nutrient fluxes and their transport and delivery can benefit from modeling that relates climatic and hydrological drivers with landscape and in-stream processes to identify hotspots and moments and inform watershed management. However, the implementation of current watershed models still requires improvements due to the absence of spatial heterogeneities within basins and limited information on hydrological seasonal variations, thus existing models are generally applied to annual loadings and do not consider seasonal variabilities of nutrient loadings, sources, and pathways. To address these limitations, Luscz et al. (2017) developed a spatially explicit nutrient transport 41 model that simulate the seasonal differences in nutrient sources, pathways, and processes in the Lower Peninsula of Michigan. This seasonal pathway and process model was inspired by SPARROW but differs in different ways. For instance, this model has spatially explicit description of nutrient sources and basin features, thus describing spatially explicit nutrient attenuation along nutrient transport pathways. Also, it models temporally-variable transport and predicts seasonal variabilities. Here, I substantially improved the previously published nutrient transport modeling work (Luscz et al., 2017) and SENSEflux annual model (see Chapter 2) to predict seasonal nitrogen and phosphorus loading variations from the United States Great Lake Basin (USGLB) to the Great Lakes. This new SENSEflux seasonal model includes major updates on seasonal mobility of applied nutrients across the landscape and seasonal nutrient retention in streams and rivers. These SENSEflux seasonal models are then able to quantify the contributions of seven nutrient sources and four distinct basin pathways to in-stream nutrient concentrations. More specifically, two distinct hydrologic conditions were chosen, early spring snow melt with very high flows that flush nutrients and the late summer baseflow primarily representing groundwater transport pathways. This research seeks to clarify the critical times and locations for nutrient delivery, enabling watershed managers to implement cost-effective nutrient reduction strategies. 3.3: Materials and Methods 3.3.1: Study Area and Definition of Hydrological Seasons The Great Lakes are one of the world’s largest surface freshwater systems, consisting of Lakes Superior, Michigan, Huron, Erie, and Ontario (Figure 3.1). It stores approximately 84% of North America's surface freshwater and 21% of the world’s surface freshwater. The Great Lakes Basin covers significant portions of the United States and Canada. It provides drinking water and 42 annually attracts tens of millions of visitors (Cotner et al., 2017). However, harmful algal blooms (HABs) have been found throughout the Great Lakes Basin. These have adverse ecological and economic effects and threaten human health, including the contamination of drinking water (Michalak et al., 2013). In the summer of 2014, one of the most notable HABs led to a three-day tap water prohibition in Toledo, Ohio. Due to data availability, this study focuses on the United States side of the Great Lakes Basin (USGLB), which includes portions of eight states: Illinois, Indiana, Michigan, Minnesota, New York, Ohio, Pennsylvania, and Wisconsin. The Great Lakes are expansive, spatially diverse, and dynamic. Their physical characteristics, such as climate, soil, and land use, vary across the basin (Sterner, 2020). For instance, the USGLB is temperate and sub-humid, including Koeppen-Geiger zones Dfa and Dfb, which represent hot- summer and warm-summer humid continental climates, respectively (Rubel & Kottek, 2010). The climate has changed considerably over the past several decades. From 1910 to 2015, the annual mean temperature increased by 1.6 ℃, exceeding the average change of 1.2 ℃ for the remainder of the Continental United States (CONUS). Over the same time, precipitation increased by nearly 10%, with more coming from extreme events (Wuebbles et al., 2019). Additionally, snow is essential to the regional energy budget, hydrological cycle, and human activities (Ford et al., 2020, 2021). For instance, snow results in adverse effects such as snowmelt-induced flooding and nutrient transport (Suriano et al., 2019), although less ice and fewer freezing days are predicted for the Great Lakes. Overall, it is widely acknowledged that climate change is a significant driver of change in the Great Lakes. Mixtures of clays, silts, sands, gravels, and boulders were deposited as glacial drift or as glacial river and lake sediments in the southern USGLB. In contrast, the northern USGLB has a typically thin layer of acidic soil atop bedrock. The Great Lakes region is renowned for its abundant and 43 diverse agricultural production. Its fertile lands provide ideal conditions for growing maize, soybeans, winter wheat, and 15 percent of the nation’s dairy products. Most agricultural planting occurs in the southern portion of the basin, whereas the northern portion remains forested. As a result of increased yields and arable area from tile drainage installation, the southern basin continues to use and enhance its drainage systems. However, they pose nutrient pollution issues for the Great Lakes, particularly in wet years (S. A. Miller & Lyon, 2021b). Throughout this manuscript, we use the phrase “hydrologic season” to denote the times during which a set of hydrologic conditions (i.e., high-flow “snowmelt” and low-flow “baseflow”) occur. Within the USGLB, high streamflow “snowmelt” conditions occur most frequently during a calendar period extending from March 1 to April 15. Similarly, low-flow “baseflow” conditions occur most often from July 15 to September 15. Within these calendar periods, hydrologic conditions were determined based on streamflow. This distinction allows for stream nutrient samples collected during the hydrologic season to be filtered to include only those that meet the definition for snowmelt or baseflow conditions. 3.3.2: Seasonal Nutrient Concentrations, Loads, and Watershed Delineation Comprehensive TN and TP concentrations were downloaded from the STOrage and RETrieval Data Warehouse (STORET) via the Water Quality Portal (WQP, www.waterqualitydata.us). WQP was developed by the US Environmental Protection Agency, USGS, and the National Water Quality Monitoring Council (Blodgett et al., 2016). It is the largest standardized water quality database and provides hundreds of millions of data dating back over a century for sites in groundwater, inland, and coastal waters (Read et al., 2017). The database from multiple sources is invaluable for understanding the historical and current water quality of Nation’s rivers, while some records lacked or contained ambiguous metadata; parameter names; sample fractions (e.g., filtered, 44 unfiltered, dissolved, total, and others) and units of measurement (Sprague et al., 2017). In this study, I focus on total phosphorus (TP) and total nitrogen (TN). TN, unlike TP, has numerous related characteristic names, including inorganic and organic forms. Ammonia, ammonium, nitrate, and nitrite are the inorganic forms of N. Total Kjeldahl nitrogen (TKN), the sum of organic nitrogen, ammonia, and ammonium, is the standard laboratory method for detecting organic nitrogen. While definitionally true, available data from both hydrologic seasons show an essentially equivalent relationship between TN and the sum of total nitrate (NO3 -), nitrite (NO2 -), and TKN (R2 = 1, slope ~ 1, Figure A3.0.1). Thus, I imputed TN values for sites without measurements using the sum of NOx - and TKN. For TP, I used sites that have total phosphorus measurements. To filter nutrient samples for hydrologic conditions, I first calculated the streamflow percentiles for each site using estimated streamflow (Mooney et al., 2020) from 2008 to 2012 during our study period (circa 2010). Then, during the snowmelt season, I only included nutrient measurements when streamflow exceeded the 70th percentile (for baseflow, less than the 30th percentile). Following the below steps, the TN and TP loads on effective seasons (baseflow and melt) for each site were calculated; see also Figure A3.0.2. First, I match the estimated streamflow to the site- specific concentration. Secondly, loads are determined by multiplying the concentration by streamflow. Thirdly, each site's mean Load was calculated by averaging the load from the second step. Fourth, the median flow at each site from 2008 to 2012 is provided. Finally, the concentration at each site was calculated by dividing the mean load by the median flow. For this study, concentrations are used to calibrate and validate the models. At the same time, mean loads are employed as observed loads for each site and compared to the simulated loads (median flow times simulated concentration). Annual loads are downloaded from the Great Lakes Basin’ application 45 of the SPARROW model (D. M. Robertson et al., 2019). After filtering by hydrologic conditions, I compiled concentrations for 370 TN (463 TP) sites within the GLB during snow melt season, and 571 TN (1000 TP) sites during baseflow, see site concentrations and distributions in Figure A3.0.3. Watersheds were delineated using the ArcGIS Hydrology toolbox for all the sites with concentration (Figure A3.0.3) and streamflow data. DEM from USGS NED at 30 m resolution was utilized to calculate flow direction and flow accumulation to generate stream networks, which were then utilized to generate watershed polygons and raster (Gsech et al., 2002). The distribution of watershed size is shown in Figure A3.0.4. 3.3.3: SENSEflux Model and Modifications for Seasonal Predictions 3.3.3.1: SENSEflux Seasonal Model Overview Dynamic modeling of nutrient transport and fate has the potential to simulate seasonal nutrient loadings to streams and lakes. I updated SENSEflux from its annual model (see Chapter 2) and the seasonal model that was previously published version of the SENSEflux model (Luscz et al., 2017) to simulate seasonal nitrogen and phosphorus loadings, sources, pathways, and hotspots. Major updates include seasonal mobility that are derived to compare nutrient concentrations and fluxes across landscape processes with yearly conditions, seasonal in-stream, and lake retention, and SENSEflux seasonal models are calibrated and validated with nutrient concentrations and fluxes during hydrological seasons See more details regarding the major updates in section 3.3.4. With these model updates, SENSEflux seasonal models simulates nutrient loading in snowmelt and baseflow with three seasonally variable parameters for surface basin reduction (bs), biological uptake in rivers (Bio) and lake attenuation (Tsettl). Because I do not have information about when nutrients are applied (also, when those become mobile), I use annual average nutrient inputs, with seasonal mobility inferred from hydrologic flux variability, to simulate nutrient loading. 46 SENSEflux employs a mass balance approach to simulate nutrient transport, see equation 3.1. Specifically, SENSEflux has four major components: (1) nutrient sources; (2) in-situ losses; (3) seasonally variable basin attenuation; and (4) seasonally-varied river and lake attenuation. Sources are spatially explicit and include five surface-applied sources (agricultural chemical fertilizer, manure, nitrogen fixation, non-agricultural chemical fertilizer, and atmospheric deposition), noted as other sources in equation 3.1, septic tanks (Ssep), and points sources (Spoint) (Hamlin et al., 2020a). 𝑐𝑒𝑙𝑙𝑠 𝐿𝑘 = ∑ 𝑅𝑗𝑇𝑠𝑒𝑡𝑡𝑙𝑗 × {𝑆𝑝𝑜𝑖𝑛𝑡 + {𝐺𝑠𝑚𝑆𝑠𝑒𝑝(1 − 𝑆𝑒𝑝𝐸𝑓𝑓)𝐵𝑠𝑒𝑗 𝑗 𝑜𝑡ℎ𝑒𝑟 𝑠𝑜𝑢𝑟𝑐𝑒𝑠 + ∑ 𝑆𝑖𝑗 𝑖 × 𝐸𝑥𝐻𝑖𝑗 × [𝑆𝑠𝑚(1 − 𝐹𝑗)𝐵𝑠𝑗 + 𝐺𝑠𝑚𝐹𝑗(1 − 𝐹𝑠𝑡𝑜𝑟𝑖𝑗)𝐵𝑔𝑗] 𝑅𝑗 = 𝑒−𝛼∗𝐷𝑁𝑆𝑃𝑗 ∗ 𝑒−𝛼1∗𝐵𝑖𝑜𝑗 (3.1) (3.2) Harvesting (ExH), storage (Fstor), and septic removal (SepEff) are assumed the active processes during basin transport and represented as in-situ losses in SENSEflux modeling. Nutrient losses in the root zone, such as uptake by plants, denitrification, sorption, and P mineralization, are lumped as the Harvest term. Moreover, it has been assumed that these losses only occur in areas where manure or agricultural chemical fertilizer is applied. Specifically, surface-applied nutrients are harvested (ExH) first, and the remaining nutrients are partitioned between surface and groundwater pathways, and further formalized nutrient mobile pools. This partition is determined by a spatially variable parameter (F) that varies in direct proportion to the groundwater recharge fraction. Nutrient storage, such as in-place storage and nutrient losses below the root zone, and I assume the storage only applied to phosphorus because of the relatively high mobility of nitrogen in subsurface systems. A fixed percentage of nutrient removal is applied to septic sources. 47 After partitioning between surface and groundwater pathways, storage of P below the root zone, and addition of septic sources, seasonal nutrient mobility factors are introduced. Specifically, event-based surface mobilization (Ssm) and event-based groundwater mobilization (Gsm). Here, I assume that nutrient mobility is related to hydrologic fluxes, that as fluxes increase so does nutrient mobility. We derive Ssm and Gsm from USGS stream gauges using the groundwater-discharge (baseflow) and surface runoff components of streamflow determined by hydrograph separation. For each gauge, the ratio between seasonal surface runoff and groundwater discharge with annual averages of the same are computed (e.g., snowmelt groundwater discharge/ annual groundwater discharge). Then, spatially continuous seasonal mobility layers are created for the USGLB with a Random Forest model trained at the USGS gauges to just three variables, watershed size, hydrologic landscape region (HLR), and unconsolidated glacial drift thickness (more details in section 3.3.4.2). Within the model, the seasonally mobile surface and groundwater nutrients are computed as the product of their respective mobilization factors (i.e., Ssm, Gsm) and available nutrients within the surface and subsurface pool for each cell. The result is the surface- and subsurface-mobile nutrient pools. Nutrients in these mobile pools are then transported along and attenuated within transport pathways, namely, basin transport and attenuation. Surface-mobile nutrients are attenuated via one of the two basin pathways, overland flow (BS) or tile drains (BST). Here, I calibrated the model with a seasonally varied overland flow pathway parameter (bs) with a spatially explicit and constant overland flow length (section 3.3.3). The subsurface-mobile nutrients are transported through the groundwater flow pathway (BG). A separate groundwater pathway is septic plumes (BSE) that transport the mobile nutrients from septic systems. The septic plume pathway is separated from general groundwater because of the distinct chemical characteristics of septic tank 48 sources. The remaining nutrients after basin transport and attenuation then enter stream where point sources are then added and subjected to further attenuation through stream (including flow through connected wetland systems) and lake processes, such as N denitrification and P sorption/mineralization (DNSP), and biological update (Bio) (see equation 3.2). We assume that N denitrification or P sorption/mineralization can be described as a streambed exchange rate within each cell along the travel path. The streambed exchange rate is calculated from the hydraulic conductivity of the streambed sediments, the average slope of the stream channel, groundwater recharge, hydraulic radius, and in-stream water velocity (see Chapter 2). Within these variables, ksat, slope, and groundwater recharge are constant throughout the year but vary spatially, see details in Chapter 2. Hydraulic radius (channel area/wetted perimeter of the stream channel) and velocity vary across the season, see calculation in section 3.3.4.2. More detailed descriptions for deriving these terms are seen Text A2.3: Spatial Distribution and Derivation for Instream and Lake Losses. The other in-stream attenuation process, biological uptake, is considered to be a function of seasonally-variable instream travel time and is calibrated with seasonal parameters (rbio). Lake attenuation was modified from (Chapter 2) where I considered it a function of the intersected distance when nutrients are transported through the river and connected lakes. Here, I used the seasonally-varied settling time with a scalar parameter (tsettl) instead, see details in section 3.3.4.2. The seasonal model output is a map, at 120m resolution, of the total nutrient yield per cell per day, in this case, delivered to the GL coastline. These can then be used to compare nutrient fluxes (and in-stream concentrations) across these varied seasons but do not fully describe nutrient load variability throughout the year. In other words, this is a seasonal model of hydrologic conditions, 49 not a true daily time-varying nutrient flux estimate. However, it can still provide more precise information regarding when and where nutrient reduction should be targeted. 3.3.3.2: Model Calibration and Optimization There are eleven parameters in total in the updated SENSEflux seasonal models. See more details about these parameters, function, and their types (i.e., exponential, or linear) in table A3.1. Generally, we assume that seven parameters (f, ExH, SepEff, fstor, bst, bg, bse, dnsp) do not exhibit seasonal variations, and the remaining three vary across seasons: bs, rbio, and tsettl. I assumed that subsurface, groundwater dependent, and harvest parameters would stay the same throughout the year. In preliminary calibrations, we initially allowed bst to vary seasonally, but it showed little sensitivity so was set to be uniform across seasons. Thus, it’s assumed that the rates of nutrient loss in tile drain field pathway would be relatively similar across season (depending on seasonal nutrient mobility). Similarly, for dnsp during the river and lake attenuation, it did not show much sensitivity during the preliminary model runs, so it’s assumed to be stable throughout the year while the biological uptake (rbio) and the settling time (tsettl) parameters varied seasonally. All parameters, both seasonally variable and annually fixed, are optimized except for septic removal efficiency (SepEff), which is set to 0.3 for N and 0.35 for P as the same with SENSEflux annual models in Chapter 2. SENSEflux models are calibrated with seasonal observed concentrations or fluxes. Here, concentrations were selected because it improved models’ sensitivity to the attenuation terms, and modeled fluxes are dominated by the estimated or measured flow term when loads were used. After calibration, loads were used to assess model performance and present model results so that the results are comparable to those from other nutrient models in the literature. Seasonal TN and TP concentrations are compiled from the STORET database, see data processing in section 3.3.4.1. 50 SENSEflux models for TN and TP were calibrated separately. For each season of TN or TP, we used a random sampling approach to select 70% of the sites (Figure A3.0.3) for calibration and the rest (30%) for validation. For model optimization, the objective function (f(c)) was to minimize the mean absolute difference between log 10 of the simulated concentration (Csim) and log 10 of observed (Cobs) concentrations, see equation 3.3. Specifically, three (annual, melt and baseflow) models for each nutrient were run at the same time. The parameter values for each model were fitted using the MATLAB function fmincon, which uses the 'interior-point' algorithm (Byrd et al., 1999, 2000). The sum of the objective function from each season (annual, melt and baseflow) were used to access TN and TP model performance (equation 3.4). 𝑓(𝑐) = 𝑚𝑒𝑎𝑛(𝑎𝑏𝑠(𝑙𝑜𝑔(𝐶𝑠𝑖𝑚) − 𝑙𝑜𝑔(𝐶𝑜𝑏𝑠))) (3.3) 𝑓(𝑐)𝑐𝑜𝑚𝑏𝑖𝑛𝑒𝑑 = 𝑓(𝑐)𝑎𝑛𝑛𝑢𝑎𝑙 + 𝑓(𝑐)𝑚𝑒𝑙𝑡 + 𝑓(𝑐)𝑏𝑎𝑠𝑒𝑓𝑙𝑜𝑤 (3.4) Model residual (error) was calculated using equation 3.5 below. Thus, a residual of 1 indicates an order magnitude overprediction by the SENSEflux model. Positive errors indicate overprediction (ratio of model-simulated to observed is > 1) by the model, whereas negative errors represent underpredictions (ratio of model-simulated to observed is < 1). 𝐸𝑟𝑟𝑜𝑟 = 𝑙𝑜𝑔(𝐿𝑠𝑖𝑚) − 𝑙𝑜𝑔 (𝐿𝑜𝑏𝑠) (3.5) 3.3.4: Seasonal and Static Model Inputs Based on SENSEflux model modifications from annual to seasonal introduced in previous section, additional seasonal inputs are needed for calibration and optimization. These include (1) the major update – surface and groundwater seasonal mobility; (2) seasonal hydraulic radius and velocity that are required to derive streambed exchange rate; (3) in-stream travel time; and (4) settling time in lakes. Groundwater recharge, slope and hydrologic conductivity that are used to calculate streambed exchange rate, overland flow length, and tile drainage remain the same as the annual 51 models in Chapter 2. 3.3.4.1: Seasonal Mobility Derivation Seasonal mobility inferred from hydrologic flux variability is the primary update of SENSEflux seasonal models from annual models for comparing nutrient concentrations and fluxes with yearly conditions. I first employed the streamflow Hydrograph Separation Program (HYSEP) developed by USGS to split the streamflow into baseflow (groundwater discharge) and surface runoff. During this process, the sliding window method (Sloto & Crouse, 1996) was used for hydrography analysis to separate baseflow for each gauge. I separately calculated average baseflow during the Melt, Baseflow (here, denoting the late summer/early fall event period), and Annual periods. Surface runoff and baseflow for each period were then used to calculate surface water mobility (Ssm) and groundwater mobility (Gsm). Surface seasonal mobility (Ssm) are then calculated as the ratio of median surface runoff during the season (i.e., Melt, Baseflow) and Annual median surface runoff. Similarly, groundwater seasonal mobility (Gsm) are computed as the ratio of median baseflow during the season (i.e., Melt, Baseflow) and Annual median baseflow. See an example seasonal mobility calculation for USGS station (ID:04112500) in Figure A3.0.6. To calculate seasonal mobility, the following three steps are used for Gsm and Ssm separately. First, I applied HYSEP to separate streamflow from each USGS streamflow gauge in the Great Lakes Basin, including a 10km buffer, into baseflow and surface runoff components. Here, I modified the default HYSEP exponent parameter from 0.2 to 0.3, based on exploratory plotting to better estimate baseflow for this region. Next, I computed the median baseflow at each gauge and surface runoff for our Melt, Baseflow, and Annual periods. Gsm at each stream gauge for the Melt period is defined as the ratio of median baseflow during Melt to median Annual baseflow. Likewise, for Gsm during the Baseflow period (note the use of capitalization to indicate the Baseflow period, 52 separate from generic baseflow). Ssm was similarly calculated as the Melt/Baseflow median surface runoff ratio to median Annual. Finally, outliers for Gsm and Ssm were removed based on their distributions. For Melt, Gsm is limited to 1-5 (Ssm: 1- 20). For Baseflow, Gsm and Ssm are between 0 and 1. I then developed a random forest regression model to predict Gsm and Ssm at the HUC12 scale for all such watersheds across the US Great Lakes Basin. I chose the HUC12 scale to represent a fine spatial scale within which watershed characteristics are somewhat uniform. I first trained the model using individual stream gauge stations and then applied this to the HUC12s in the Basin. This model includes three input variables: drainage basin size, hydrologic landscape region (HLR) (Winter, 2001), and drift thickness from the USGS (Soller et al., 2012), see Figure A3.0.5a&b. To train this random forest model, I used the USGS-reported basin size for the stream gauge locations and extracted the HLR value and drift thickness. I then used HUC12 watershed size, majority HLR, and mean drift thickness for prediction at HUC12 scale. The point-scale model with three variables provided reasonable seasonal mobility estimation, with r2 = 0.89 during Melt (r2 = 0.85 during Baseflow) for Gsm (r2 = 0.86 and 0.87 for Ssm). Finally, the random forest regression model was used to predict Gsm and Ssm across the basin at Hydrologic Unit Level 12 (HUC12). Figure A3.0.7 depicts the maps for seasonal mobility. 3.3.4.2: Seasonal In-Stream and Lake Attenuation: Hydraulic Radius, Velocity, In-stream Travel Time and Settling Time for Lakes Stream attenuation includes water column loss, sediment interface, and hyporheic zone losses. Water column loss is mainly due to biological uptake, and it is a function of in-stream travel time across the seasons, see instream travel time during melt and baseflow in Figure A3.0.5c&d. Like overland flow length, seasonal in-stream travel time was calculated using NED DEM and the 53 flowlength function within the ArcGIS Hydrology Toolbox. In this case, a weight raster (1/seasonal velocity) was specified. The data of seasonal velocities were computed from the USGS gauge site and the method in (Mooney et al., 2020). The sediment interface and hyporheic zone losses are determined by a streambed exchange rate derived from the streambed sediments' hydraulic conductivity (K), the slope of the stream channel (S), and hydraulic radius and in-stream average velocity. See details for this streambed exchange rate derivation in Chapter 2. Here for seasonal simulation, I updated the streambed exchange rate with seasonal, varied hydraulic radius and velocity, while K, S, and Recharge maintained the same throughout the year. More specifically, recharge was assumed by the basin yield of streams at their 30th percentile, K and S are calculated from NED DEM and the gSSURGO soil dataset. Seasonal hydraulic radius was computed for each USGS stream gauge along the stream channel. First, flows and geometries for USGS gauge stations (i.e., width, basin area) are downloaded. Assuming the stream channels are rectangle, I calculate the river depth (D) as area (A) dividing by width (W). Hydraulic radius (R) is calculated as the ratio between channel area (A) and the wetted perimeter of the stream channel (P) where P = 2D+W. Seasonal SENSEflux models represent lake attenuation with a simple uptake term proportional to the setting time of nutrient flow paths crossing lacustrine-classified wetlands. The surface area of the lake was derived from the National Wetland Inventory (NWI) lacustrine-class wetlands. Then, seasonal hydraulic loadings were calculated by dividing seasonal streamflow by surface area. As settling time, the inverse of hydraulic loading is then calculated. 3.3.4.3: Static Landscape Inputs Static inputs include (1) nutrient source inputs; (2) groundwater recharge; (3) hydraulic conductivity (ksat) of the streambed sediments and the average slope of the stream channel (slope); 54 and (4) overland flow length and tile drainage. These inputs are the same as those used in the annual SENSEflux models (Chapter 2), and thus the same across both hydrologic season models. Seven spatially explicit nutrient sources from SENSEmap (Hamlin et al., 2020a) serve as nutrient inputs for the SENSEflux model. These sources include three agricultural terms (chemical agricultural fertilizers, manure, and N fixation), three urban terms (chemical non-agricultural fertilizers, septic tanks, and point sources), and atmospheric deposition. See maps in Chapter 2. Groundwater recharge was estimated using a series of linear models derived from the Landscape Hydrology Model (LHM), which combines various GIS layers to predict stream discharge, groundwater recharge, and evapotranspiration (Hyndman et al., 2007). In SENSEflux, groundwater recharge determines the fraction of nutrient transport between surface and groundwater pathways. It also supports streambed exchange rate calculation for nutrient river attenuation, along with two additional factors ksat and slope. The overland flow length is calculated using the flowlength function and DEM. The tile drainage layer was derived as an alternate overland flow pathway, assuming cropland areas with moderately low soil permeability and low average slopes are likely tile-drained. In addition to two independent parameters (bs (seasonally-varied) and bst), the overland flow length and tile drainage layer are static landscape factors that support basin nutrient attenuation. 3.4: Results 3.4.1: Model Accuracy Six SENSEflux TN and TP models with optimized parameters (Table A3.1) provided estimates of TN and TP fluxes to the Great Lakes from their US drainage basins. These models performed well (Figure 3.1), and TN models performed better than their TP counterparts, indicated by higher R square values except for TN calibration dataset, with a difference of -3%, 12%, and 9% for 55 calibration (20%, 14%, and 13% for validation) in annual, melt, and baseflow. Six models corresponded reasonably well to the 1:1 line (black dashed line in Figure 3.1), with TN and TP models slightly overestimating low loads and underestimating high loads, respectively. Also, the slopes of the line of best fit are near 1 (0.7 to 0.96 for TN and 0.73 to 0.86 for TP), indicating a reasonable bias. In addition, the MAEL values for annual and baseflow are comparable for the TN and TP models, but the TN snowmelt model has lower MAEL values (~0.2) than the TP snowmelt model, making it the best-performing model of the six. The median of model residuals (common logarithm of simulated load minus observed load) for calibration and validation watersheds are shown in Figure A3.0.8. Residual analyses reveal no significant (P > 0.05) bias for the TN snowmelt model and TP annual model based on a one-sample t-test; however, the mean residuals of the TN annual model, the TP melt model, and the baseflow models for TN and TP are significantly different from zero (P < 0.05). The residuals (Figure A3.0.9) are significantly (P < 0.01) clustered with the Moran’s Index of 0.42, 0.30, and 0.43 for TN annual, snowmelt, and baseflow models, respectively (0.41, 0.37, and 0.50 for TP models). 56 Figure 3.1 Plot of log10 simulated and observed daily loads for model calibration and validation datasets. Solid red and red lines indicate linear regression fits, the dashed black lines are the 1:1 line, and n refers to the number of observation points for model calibration and validation. 3.4.2: Seasonally Variable and Spatially Explicit Nutrients Delivery to Lakes Simulated TN and TP flux within the USGLB varied substantially across the seasons. For TN 57 (Figure 3.2a), simulated loading across the lake basin and in USGLB are highest in the snowmelt season, followed by annual condition and baseflow. There is 1063 t/day of nitrogen delivered from US lands to the lakes during snowmelt, and this is about 2.8 times greater than the annual delivery (378 t/day) and 16 times greater than the TN delivery during baseflow (66 t/day). This is largely contributed by the Lake Erie basin where 4.5 times greater TN was delivered during snowmelt than annual conditions. For other lake basins, TN delivery is the highest during snowmelt, followed by annual and baseflow. TP has a unique seasonal pattern compared to TN (Figure 3.2). Simulated export of TP across the lake basin and in the USGLB is the greatest under annual conditions (28 t/day), followed by snowmelt (21 t/day) and baseflow (7 t/day). This is somewhat unexpected because of greater seasonal mobility during snowmelt, though the difference between snowmelt and annual conditions is not large. The unique pattern for TP is likely due to the dominance of particulate phosphorus relative to soluble reactive phosphorus (SRP) in aquatic environments such as rivers and lakes. And particulate phosphorus is typically associated with sediment, organic matter, and other particles suspended in the water column. Unlike TN, nitrate is typically the most abundant form in aquatic systems, and its high solubility in water makes it readily transportable in the environment. In addition, the SENSEflux TP annual model overpredicts and the TP snowmelt model underpredicts significantly (Figure A3.0.9). Due to its drainage falling entirely within the USGLB, Lake Michigan receives the most TN (156 t/day) and TP loads (12 t/day) on an annual basis compared to other lake basins. Lake Michigan has the highest phosphorus delivery on an annual basis and across two hydrological seasons (baseflow and snowmelt). Lake Erie receives the highest TN delivery during snowmelt (453 t/day) and baseflow (25 t/day), followed by Lake Michigan (334 t/day during snowmelt and 24 t/day 58 during baseflow). This is most likely because the western basin of Lake Erie has a relatively flat topography with minimal relief. This region has generally shallow groundwater, making it susceptible to nutrient contamination during baseflow. Figure 3.2 Summarized TN and TP flux (a&b) within USGLB and flux (c&d) by lake basin within USGLB. 59 In addition, I calculated nutrient yields across lake basins, which are nutrient fluxes normalized by drainage basin area (Figure 3.3 a&b). Lake Erie has the highest TN yield, followed by Lakes Michigan, Ontario, Huron, and Superior, with 1.9 kg/km2/day on an annual basis, and 8.5 and 0.5 kg/km2/day for snowmelt and baseflow. Lake Superior has the lowest yield of total nitrogen among the five lake basins regardless of its annual condition or hydrologic seasons. The annual TN yield for Lakes Michigan, Ontario, and Huron is between 1.18 and 1.39 kg/km2/day, 2.89 to 3.32 kg/km2/day during snowmelt, and 0.14 to 0.26 kg/km2/day during baseflow. Lake Erie, like TN, has the greatest TP yield. Lakes Ontario, Michigan, and Huron follow with ~0.1 kg/km2/day on an annual basin, ~0.07 kg/km2/day for snowmelt and ~0.02 kg/km2/day for baseflow. This is consistent with Lake Erie’s presence of HABs and its overall low water quality (Michalak et al.,2013). Regardless of its annual conditions or hydrologic seasons, Lake Superior has the lowest TP yield among the five lake basins. Figure 3.3 Predicted (a) TN and (b) TP yields (kg/day/km2) to lakes in annual, melt, and baseflow. (c)&(e)&(g) are spatially TN yields for annual, melt and baseflow with a shared legend ((d)&(f)&(h) for TP). These maps are direct outputs from SENSEflux, then resampled to 720 m resolution for display purposes and classified in quantiles, within which each color represents ~20% of the USGLB area. 60 Figure 3.3 (cont’d) We also found that simulated export of TN and TP varied substantially across seasons within the USGLB, with the highest TN export in snowmelt and the lowest TN and TP delivery during baseflow (Figure 3.3, c-h). During snowmelt, most of Lake Ontario and Erie basin and the southern part of Lake Michigan and Huron basin exported higher than 641 kg/yr/km2 for nitrogen and 28.92 kg/yr/km2 for phosphorus. While in baseflow, most of these areas exported lower than 267 kg/yr/km2 for nitrogen and 9.93 kg/yr/km2 for phosphorus. Overall, snowmelt increased TN flux to the Great Lakes while fewer TN and TP were delivered during baseflow compared with annual conditions. The highest nutrient flux from the Lake 61 Michigan basin is because of its largest drainage basin size, while Lake Erie receives the highest nutrient yield (flux per area). 3.4.3: Nutrient Sources and Pathways Across Hydrologic Seasons The source contribution was highly variable across seasons for TN and TP with USGLB (Figure 3.4). Agricultural sources (chemical agricultural fertilizer, manure, and N fixation) were the dominant TN source annually (42%), and 66% and 52% for snowmelt and baseflow. Moreover, chemical agricultural fertilizer, manure, and N fixation contributed 8.2%, 2.8%, and 3% higher during melt than baseflow. Agricultural sources (chemical agricultural fertilizer and manure) accounted for 62% annually and 59% and 54% during melt and baseflow for phosphorus. The contribution of agricultural chemical fertilizer and manure for TP was substantially higher (3.6% and 1.5%) during melt than baseflow. Atmospheric deposition and urban sources (point source, chemical non-agricultural fertilizer, and septic tanks) were also crucial to TN and TP delivery. Within this, atmospheric deposition contributed more significantly to TN, while urban sources dominated TP. Remarkably, atmospheric deposition contributed the same with agricultural sources for TN delivery and accounted for 41% annually, but much less than agricultural sources during snowmelt (25%) and baseflow (14%). Urban sources accounted for 17%, 9%, and 34% of annual, melt, and baseflow, respectively. However, these have been found differently for TP delivery. Urban sources (point source, chemical non-agricultural fertilizer, and septic tanks) play a more critical role than atmospheric deposition for phosphorus, with 27%, 30%, and 38% for annual, melt, and baseflow, respectively, comparing 11%, 11%, and 8% of TP sources coming from the atmospheric deposition. Notably, point sources played a significant role in TN and TP delivery, particularly during baseflow with 23% of TN and 20% of TP coming from point sources, which is also shown 62 in the pathways (Figure 3.5) Results also show that septic tanks are a more significant source of TP than TN, contributing 6% (annually), 6% (snowmelt) and 3% (baseflow) higher TP than TN. Figure 3.4 Estimated percent of nutrients delivered to the USGLB by annual, melt, and baseflow sources. 63 Figure 3.5 Estimated percent of total nutrients in USGLB delivered to lakes by pathways. Points are drawn for annual, melt, and baseflow with different colors, with the difference highlighted with segments. Note: groundwater pathway in this diagram indicates bulk groundwater transport. Surface pathway (overland flow and tile fields) was the major nutrient transport pathway, totally contributing to 88% (66%), 94% (65%), and 63% (52%) of TN (TP) delivery during annual, melt and baseflow (Figure 3.5). Within this, TN was dominated by the tile drainage pathway while more 64 TP was transported through overland flow regardless of the seasons, except there are 37% more TN was transported by overland on an annual basis. In addition, subsurface pathways, including bulk groundwater and septic plumes, played a significant role, especially for TP. For instance, septic plumes transported 9 to 12% of TP, and bulk groundwater transported about 12 to 16% of TP across the seasons. In summary, the relative contributions of each source and pathway to the total TN and TP delivered to the Great Lakes were highly variable seasonally within USGLB, with a higher proportion of nutrients dominated by agricultural sources delivered through surface pathways (tile drainage (TN) and overland runoff (TP)) and more nutrients coming from urban sources delivered through subsurface pathways (groundwater and septic plumes) during baseflow. 3.4.4: Seasonal Delivery Hotspots The spatially explicit estimates of annual and seasonal nutrient deliveries provide a novel landscape perspective: seasonal delivery hotpots. The hotpot incorporates seasonal delivery (x- axis in Figure 3.6) and the delivery ratio, which is defined as the ratio of snowmelt and baseflow deliveries to annual deliveries (y-axis). The seasonal delivery and delivery ratio are quantified using terciles and presented in bivariate colormaps for melt and baseflow (Figure 3.6). Targeting nutrient reductions during snowmelt is more effective than focusing on baseflow, especially for TN. During the snowmelt season, locations with both high delivery (highest 33%) and delivery ratio (highest 33%) are the most intense delivery area to the coastline (Figure 3.6, a&b). These are predominantly agricultural regions, particularly for TN, and include the western Lake Erie basin, coastal areas in western Lake Michigan, and thumb areas in Michigan. Targeting nutrient reductions in these regions during the snowmelt season will reduce the summertime nutrient 65 concentration in these basins and the Great Lakes most effectively. Next, areas with low delivery (lowest 33%) but high delivery ratio (highest 33%, colored magenta on the hotspot maps) should be targeted for reductions during the snowmelt season. These areas are primarily located in the northern basin. Figure 3.6 Seasonal nutrient delivery hotspots are determined as high delivery (x-axis) and delivery ratio on hydrologic seasons compared with average annual condition (y-axis) in bivariate choropleth maps for TN (a&c) and TP (b&d) during snowmelt and baseflow. BF represents the baseflow season. During baseflow, the delivery ratios for TN and TP are less than one, indicating that nutrient reduction efforts during baseflow would have been less effective than during annual average conditions. Thus, the optimal location for nutrient reductions during baseflow would be those with high nutrient delivery (highest 33%, colored as green). 3.5: Discussion SENSEflux annual model performances are comparable to the SPARROW models for the US 66 Great Lakes Basin presented by (D. M. Robertson et al., 2019) (Figure A3.0.10). Lake Michigan had the highest nutrient flux because of the largest drainage basin size, consistent with (D. M. Robertson et al., 2019). The ranking of the highest yield across lake basins varied across seasons. On an annual basis, the Lake Superior basin and northern Lake Huron basin had much lower TN and TP delivery, likely because of low agricultural activity and sparse population. In addition, TN and TP yields are relatively high in the southern Lake Michigan basin, Saginaw Bay, and Lake Ontario basins, likely due to high chemical fertilizer and manure inputs and high groundwater recharge. Lake Erie had the highest TN and TP yields, and this could be due to many reasons. There is a high density of agriculture with high tile drainage intensity in the Lake Erie basin, agricultural sources teeming with nitrogen and phosphorus, which spurs algae and bacteria growth and directly stimulates the hypoxia and dead zone in Lake Erie. Besides, Lake Erie is the southernmost and warmest of the Great Lakes (Bockheim, 2020). Earlier snowmelt in the late winter and early spring increased nutrient flushing rate, with high nutrient delivery during snowmelt while relatively low TN and TP exports during baseflow (Figure 3.3, e-h). Thus, nutrient loadings are considered the primary driver of HABs in Lake Erie, associated with the eutrophication of Lake Erie (Michalak et al., 2013; Scavia et al., 2014; Watson et al., 2016). This information provides valuable timing information for targeting the total flux/yields to the Great Lakes from the lake basin level. 3.5.1: Seasonal Critical Sources and Pathways Point sources contributed a higher proportion of TN and TP delivery to the Great Lakes (Figure 3.5), particularly during baseflow, this is likely due to the pollutants being concentrated more during a low flow condition. Thus, it’s important to monitor and regulate point source discharges to ensure that they are meeting established standards. Non-point source pollution is still a 67 significant concern as it stems from many different sources and locations. Agricultural sources (chemical agricultural fertilizer, manure, and N fixation) were the leading cause of nutrient pollution across the USGLB, which is consistent with other studies (Richards et al., 2012; D. M. Robertson et al., 2019). However, agricultural sources were not always the leading cause of water pollution across the Great Lake basins. For instance, atmospheric deposition was the primary TN and TP source (Figure A3.0.11) in Lake Superior. Therefore, it is crucial to regulate atmospheric deposition, especially for watersheds forested with little agricultural or industrial activities (Foley & Betterton, 2019). Many nutrient sources and loading models do not specifically identify atmospheric deposition of TP, instead attributing to the forest and urban area sources and other terms in agricultural areas like in the USGS SPARROW model (Richards et al., 2012; D. M. Robertson et al., 2019; R. A. Smith, Schwarz, et al., 1997), thus may lead to underestimation of TP loads in such models. Besides, agricultural sources do not always remain stationary on the landscape where they are applied. Snowmelt increased TN delivery to the Great Lakes (Figure 3.4). Due to heavy snowfalls in the winter and early spring, the rapid release of water from melting snow washing manure and commercial fertilizer flow down the gradient through overland flow and tile fields into the streams and rivers (Figures 3.4 and 3.5). These are consistent with research that has found nitrogen and phosphorus exports are strongly associated with seasonal hydrology, such as snowmelt runoff being a critical period of nutrient export in a cold climate (Clement & Steinman, 2017; Corriveau et al., 2013; Rattan et al., 2017, 2019). Thus, it is critical to manage the timing and amount of fertilizer applications (T. Guo et al., 2021). In addition, agricultural tile drainage has been recognized as one of the dominant TN delivery 68 pathways (Figure 3.5). Nutrients can quickly move to tile drains through preferential flow paths, cracks, and fissures in the soil, root channels, and other macropores and escape from the field. Considering much of the most productive farmland and extensive use of tile drainage in the USGLB, and climate change with warmer temperatures and more extreme events, it is significant to determine the timing of nutrient application and implement cost-effective drainage water management. Lastly, septic tanks accounted for 12% of TP delivery during baseflow (Figure 3.4). Research has found that a significant proportion of nutrient delivery comes from septic tank systems under low flow conditions and thus causes eutrophication risk (P. J. A. Withers et al., 2012). Furthermore, current regulations and management are insufficient to effectively guarantee the septic systems' function (P. J. Withers et al., 2014). For instance, water pollution from septic tanks is a severe but under-appreciated problem across Michigan, where 10% of on-site wastewater treatment systems have failed and are polluting the environment. Thus, it is critical to regulate on-site septic systems and establish a statewide code governing how they are designed, installed, and maintained in states like Michigan, thus protecting public health, safety, and welfare. 3.5.2: Implications, Limitations, and Future Work There is great interest in reducing nutrient loading to the surface and groundwater. Here I present a spatially explicit, regional analysis of seasonal nutrient export that I calibrated with comprehensive TN and TP concentration measurements. SENSEflux uses spatially explicit nutrient sources and seasonal explicit environmental factors and provides insights into seasonal aspects of TN and TP loading, sources, and pathways with USGLB, thus improving our understanding of the timing and extent of hypoxic areas and algal blooms related to coastal nutrient loadings, consequently, allows us to optimize nutrient reduction strategies. This work has 69 demonstrated that SENSEflux has the potential to perform well in other regions which are facing water pollution challenges. Moreover, it could be applied for other periods, such as the summer rainy season, to understand seasonal nutrient transport more accurately. The spatially explicit loading, source, and pathway estimates for USGLB, along with findings, can be easily communicated with stakeholders through a web-based support system such as the Tipping Point Planner program (TPP, tippingpointplanner.org) as they can access the results and explore details of SENSEflux outputs without the assistance of technical experts or special programming training (Weinstein et al., 2021). TPP links data to local decision-making processes. Thus, adding SENSEflux outputs will help managers prioritize their actions in a time- and location-based manner and move toward the desired direction. However, I acknowledge that there are some limitations in our study. Firstly, groundwater has been proven to be a vital nutrient transport pathway, although it is often ignored in most existing models. SENSEflux uses an overland flow length to describe groundwater attenuation and uses a surface watershed boundary to determine nutrient inputs to the groundwater pathway. Including factors such as groundwater travel time would improve load, source, and pathway predictions, especially for ground watersheds with significantly different boundary conditions from surface watersheds. Besides, as I don’t have seasonal nutrient source inputs such as allocating fertilizer and manure to seasons based on phenology and fertilizer practice, SENSEflux simulated seasonal nutrient loading to the lakes using the annual average. For future research, establishing robust monitoring networks such as sampling throughout the non- growing season, particularly the winter and spring thaw periods, is critical to truly understanding the timing and quantity of nutrient loss. Besides, nutrient transport models could be refined by considering climate, cropping characteristics, and regional water quality targets. Climate change 70 has been playing an essential role in water and nutrient transport. As climate change brings warming temperatures and more frequent and intense storm events, there is a need to consider climate change as a factor when planning future nutrient pollution programs to track changes over a targeted time frame. In addition, human activities have been altering hydrological processes and transporting nutrients as well, thus explicitly considering and evaluating the effectiveness of constructed wetlands, stream buffers, agricultural practices (i.e., cover crop, conservation tillage), nutrient management, which could further help reduce nutrient losses (Lintern et al., 2020). These studies need to be ongoing to help practical management efforts at improving watershed health and sustainability. 3.6: Acknowledgments This work was supported by NASA grants 80NSSC17K0262 and NNX11AC72G, NOAA Grant NA12OAR4320071, and GS100013. Luwen Wan was partially supported by the China Scholarship Council (CSC) scholarship. The statements, findings, conclusions, and recommendations are those of the authors and do not necessarily reflect the views of NASA, NOAA, or CSC. 71 CHAPTER 4: SEETILEDRAIN: SPATIALLY EXPLICIT ESTIMATE OF TILE DRAINAGE USING REMOTE SENSING AND MACHINE LEARNING 4.1: Abstract Tile drainage has become increasingly prevalent in the US Midwest due to agricultural intensification and extreme climate events. This region would not be one of the most productive agricultural areas without its drainage systems. However, the existing tile drainage data are limited to survey-based statistics at the county level and likely tile-drained areas that are mapped with geospatial analysis. The lack of spatially explicit and fine-resolution information about tile drainage extent limits the accuracy of hydrologic modeling and the implementation of nutrient reduction strategies. To fill this gap, I developed a machine-learning model that maps agricultural tile drainage across the US Midwest at a 30-m resolution. The model used 20 satellite-derived and environmental variables and was trained with 43,165 tile and non-tile ground truth points based on the Google Earth Engine cloud-computing platform. The results show that our model achieved good accuracy, with 95.5% of points classified correctly and an F1 score of 0.904. The classified tile drainage area at the county scale has a reasonable agreement (r2 = 0.69) with the reported area from the USDA-NASS. I also used variable importance metrics and accumulated effects to interpret the machine learning model. Land Surface Temperature (LST) variables, and climate- and soil-related variables are the most important. The top-ranking variable is the median of nighttime LST during the summer based on the overall importance, followed by median soil moisture percent in the summer. The results may provide useful information for tile monitoring and may improve the accuracy of hydrologic and nutrient simulations to achieve cost-effective agricultural water and nutrient management. Furthermore, the machine learning algorithms developed in this study could be applied to other remote sensing mapping applications. 72 4.2: Introduction The use of drainage systems to remove excess water and promote crop productivity is widespread in poorly drained and humid regions worldwide where experience high levels of precipitation and have relatively high-water tables, such as Europe and North America. Denmark, for example, has artificially drained around 50% of its croplands, with most relying on tile drains (Møller et al., 2018). Similarly, 14% of agricultural land in Germany is tile-drained (Hirt & Volk, 2011). Canada and the United States have 14% and 27%, respectively, of their farmland artificially drained, using either surface or subsurface drainage (ICID, 2018; Kokulan, 2019). The investment in tile drainage is largely driven by the growing occurrence of heavy rainfall events and the resulting benefits of tile drainage installation to crop yields as it provides a well-drained environment for the crops (King, Williams, Macrae, et al., 2015). Tile drainage offers several major benefits for agricultural land systems. For instance, it helps to maintain soil structure by preventing compaction caused by waterlogged conditions, develop healthier and deeper root systems, provide moisture conditions for optimal crop growth, extend growing season, and utimately help farmers achieve more consistent and productive harvests. Although tile drainage offers numerous benefits, it also serves as a significant pathway for water and nutrient transport. Previous research primarily focused on the hydrological effects of tile drainage, including evapotranspiration and hydrological flux metrics such as discharge, streamflow index, and stormflow. While some studies have reported minor effects on hydrology (Khand et al., 2017; S. A. Miller & Lyon, 2021a), such as a slight decrease in annual cumulative evapotranspiration (Yang et al., 2016), others identified significant alterations to hydrological systems resulting from agricultural tile drainage (Thomas et al., 2016). Tile drainage is an important contributor to watershed discharge, accounting for up to 47% of discharge in an Ohio 73 headwater watershed with seasonal variation (King et al., 2014). It also accounts for a substantial portion (46-54%) of annual discharge between March and June in the Boone River basin, Iowa (Schilling et al., 2019). Agricultural tile drainage systems also serve as an essential nutrient pathway, potentially leading to eutrophication concerns in receiving water bodies. The prevalence of extensive agricultural tile drainage in the Midwestern US has significant implications for surface water nutrient pollution, including the formation of "dead zones". Extensive research has been conducted on nitrogen, phosphorus, and their species, such as nitrate, and it has been established that tile drainage amplifies nutrient losses from agricultural lands (King, Williams, Macrae, et al., 2015; Ma et al., 2023; M. P. Miller et al., 2017; S. A. Miller et al., 2022; Ren et al., 2022). For instance, Chapter 2&3 of this dissertation found that tile drainage contributes 46% of total nitrogen and 36% of total phosphorus annually to the US Great Lakes Basin, with even greater nutrient transport during snowmelt. Other studies have highlighted the crucial role of tile drainage in water and nutrient transport in sub-watersheds of the Great Lakes. For instance, (D. Smith et al., 2015) demonstrated that tile drainage discharge was responsible for 49% of soluble phosphorus and 48% of total phosphorus losses in the St. Joseph River watershed located in northeast Indiana, northwest Ohio and south-central Michigan, US and accounted for approximately 30% of nitrogen (primarily nitrate-nitrogen) (Ren et al., 2022). The water quality impacts of tile drainage have raised concerns about drinking water safety in Toledo, Ohio, and the large anaerobic zone in the Gulf of Mexico due to nutrient loading from the Mississippi River (Rabalais & Turner, 2019; D. Smith et al., 2015). Overall, tile drainage's hydrological and water quality impacts are subject to spatial heterogeneity, primarily due to the unique characteristics of different watersheds, changing climate patterns, and human activities. However, a significant factor contributing to this issue is the need for spatially 74 explicit tile drainage maps, especially at large scales. The lack of tile drainage information and spatial variability makes it challenging to study the impacts of tile drainage on crop growth and production, hydrological cycle, and nutrient leaching, especially at large scales. There are multiple approaches to estimate tile drainage extent, which have typically reported the percentage of tile drainage area at the county level or identified probable tile drainage locations. The USDA Agricultural Census, for instance, estimates the tile drainage area (identified as 'drained by tile') by surveying farmers in the CONUS counties every five years (NASS, 2012, 2017). The data are gathered at the county level from questionnaires sent to farmers and the quality of collected information depends on a high level of participation and respondents thus might not be consistently accurate (Wan et al., 2017). Geospatial analysis has also been used to detect likely tile-drained areas, with assumptions that agricultural areas with flat and poorly drained soils are likely to have tile drainage installed. (Sugg, 2007) indicates that a combination of soil class and hydrological properties of soil and soil drainage class would produce the most accurate estimates for different areas, while GIS analysis based on soil and land cover information can help estimate tile drainage extent in highly drained areas. Several recent studies have used soil drainage information and topographic slope thresholds to estimate tile drainage areas (Jame et al., 2022; Valayamkunnath et al., 2020). However, geospatial analysis can only identify likely tile-drained areas and is dependent on estimates from the USDA Census of Agriculture (Ag Census). These estimates result in large uncertainties in estimates of the tile-drained extent without spatially explicit information as Ag Census data is collected by county rather than by farm (NASS, 2017). There are alternative methods to identify tile drainage include thermal and aerial image processing techniques (Naz & Bowling, 2008; Prinds et al., 2019; Tilahun & Seyoum, 2020; Woo et al., 2019). For example, researchers mapped individual tile drains and estimated drainage spacing using high- 75 resolution aerial imagery within the Hoagland watershed in west-central Indiana (Naz et al., 2009; Naz & Bowling, 2008). However, the estimates of tile drainage may not be accurate in the presence of crop residue and other spatial features with similar spectral response as tile drainage. Another study (Gökkaya et al., 2017) used an image differencing approach to delineate tile drainage area for the Shatto Ditch watershed in Indiana by comparing shortwave infrared reflectance (SWIR) before and after a ~2.5cm rainfall event, as SWIR is strongly related to soil moisture, and soil with tile drainage dries relatively faster, thus has higher SWIR. However, such aerial imagery approaches can be expensive, and the image differencing approach is limited by weather conditions (i.e., rainfall intensity and cloud cover). Recently, Cho et al. (2019) integrated variables from satellite imagery and existing data sources, such as land surface temperature, climate, soil moisture, and properties, to identify subsurface drainage in the Red River basin and the Bois de Sioux Watershed in Minnesota for multiple years (Cho et al., 2019). Currently, there is a lack of spatially explicit and well-validated information on tile drainage extent in the US Midwest, which includes ~93% of tile drainage in the United States (NASS, 2017). In this study, I used the Google Earth Engine computing platform to map tile drainage based on 2017 data by combining a range of satellite-derived, climate- and soil-related variables with thorough ground truth points. This resulting spatially explicit tile drainage dataset can be employed in hydrological, water quality, and crop modeling, thus supplying environmental managers with vital information to enhance agricultural water and nutrient management practices. Furthermore, the machine learning algorithms used in our study can help monitor changes in tile drainage areas over time, setting a baseline for predicting future tile drainage installations in response to climate change and human activities. 76 4.3: Data and Methods 4.3.1: Study Region Our study region includes parts of 14 states, with 12 states in the US Midwest: Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Missouri, Nebraska, North Dakota, Ohio, South Dakota, and Wisconsin (Bureau, 2010), and two Great Lakes states, Pennsylvania, and New York. This region is characterized by a humid continental climate with warm to hot summers, as noted by (Peel et al., 2007). Average annual rainfall in the region generally decreases from east to west, with annual average precipitation and evapotranspiration of 860 and 634 mm, respectively (Abatzoglou et al., 2018). The Midwest region boasts deep, fertile soils with high concentrations of organic matter, creating excellent conditions to grow corn and soybeans. This has earned the region its reputation as the "agricultural heartland" of the US (Figure 4.1a) and has led to it becoming one of the most cultivated agricultural areas worldwide (FAO, 2017). As depicted in Figure 4.1b, approximately 70% of the soil in the Midwest is excessively or well-drained, while the remaining 30% fall under other soil drainage classes such as very poorly drained, poorly drained, and somewhat poorly drained. The eastern Midwest primarily consists of lowlands, which gradually increase in elevation towards the west. The median slope across the region is 3.5%, and Figure 4.1c indicates that the mean slope of approximately 60% of the area is less than 3.9%. According to the world agricultural production report, the United States is responsible for over 30% of the world's production of soybeans and corn (USDA, 2023); much of this is produced in the Midwest region (34% of the agricultural area). The Midwest provides some of the most prosperous farmland globally, allowing the cultivation of crops including corn, soybeans, sorghum, alfalfa hay, cotton, wheat, and more. As a result, agriculture plays an important role in 77 the local economies of the US Midwest. Figure 4.1 Study region (737 counties defined in (d)) maps showing (a) cropland and non-cropland areas based on the National Land Cover Database and the Cropland Dataset Layer; (b) the soil drainage class; (c) the mean slope; and (d) the tile drainage acres from the 2017 Ag Census. In (d), areas with no tile drains or withheld data due to privacy concerns as these counties contain few farms are represented as white. The US Midwest would not be one of the most productive agricultural areas in the world without its tile drainage systems (Fausey et al., 1995). It was reported that 224,190 km2 of land was tile drained across the continental United States (CONUS), in which 92.9% (208,358 km2) of tile drainage is installed in the 14 midwestern states and this accounts for 21.6% of agricultural land (NASS, 2017). However, 183 counties among these states (1177 counties) either lack reported data due to the absence of tile drains or withhold information to safeguard individual farm privacy (NA counties, Figure4.2d). To ensure a continuous boundary while excluding most NA counties, our study region is limited to 737 counties delineated by the heavy blue line in Figure 4.1d (this subset 78 is hereafter referred to as the 'US Midwest'), which accounts for 91.7% (204,842 km2) of the tile- drained area in CONUS (Figure 4.1d). 4.3.2: Tile and Non-tile Ground Truth Point for Classification 4.3.2.1: Tile Drainage Ground Truth Points The workflow for this study is shown in Figure 4.2. I generated a regional database of tiled and likely non-tiled ground truth points from various sources across the US Midwest region. These points were collected to compare variable differences between the two-point groups, train a random forest machine learning model and perform an accuracy assessment at the pixel level for the classification. Observed tile drainage points (Figure 4.3) were obtained from point, polyline, and polygon features. To compile point tile drainage features, a proportion of tile drainage points were identified through visual interpretation using the aerial imagery base map from Google Earth. Initially, a shared cropland area was identified from the Cropland Data Layer (CDL) in 2017 and the National Land Cover Dataset (NLCD) in 2016 since NCLD has ~3% more cultivated land than cropland in the CDL (USDA-NASS, 2017; USGS, 2016). The shared cropland area then overlaps with an existing, most likely tile-drained layer called US-AgTile (Valayamkunnath et al., 2020), resulting in the identification of potential tile-drained areas. Subsequently, likely tile-drained points were randomly sampled from these potential tile-drained areas and visually interpreted based on tile drain patterns, spaces, and canal ditches around the fields. 79 Figure 4.2 Workflow diagram illustrating the random forest classification method to create agricultural tile drainage maps across the US Midwest. Another set of tile drainage points was obtained from an existing study that used a geospatial model to determine the likely tile-drained areas of the CONUS. This model integrated a cropland mask, topographic slope, and soil drainage information with county-level tile drainage census data (Valayamkunnath et al., 2020). These points were manually identified from the ERSI multi- resolution aerial imagery base map between October 2, 2012, and April 23, 2017, using the ESRI ArcMap identification tool. 80 Figure 4.3 Map of ground truth tile drainage points sourced from literature (orange), tile drainage permits (green), and visual interpretation from Google Earth aerial imagery (blue). The training and testing tile drainage points are 500 * 500-meter apart, indicated as different point shapes. The study area's counties are highlighted in gray, with the ones reported by USDA-NASS to have tile drainage indicated in light gray and areas marked as "NA" indicate no tile drains or withheld data due to privacy. I also obtained tile drainage points from polygon and polyline features. To ensure their accuracy, I randomly sampled points from these features and then excluded any that were too close to the edge of polygons and polylines. The polygons are obtained from permits for agricultural subsurface drainage tile locations provided by USGS in North Dakota (Finocchiaro, 2016) and South Dakota (Finocchiaro, 2014) and are assumed to represent ground truth tile drainage measurements. The polylines consist of sources including tile lines and drainage structures in three sites in Michigan, US (Blissfield, Clayton, and Palmyra) that were digitized using GIS tools, tile line permits from the Bois de Sioux Watershed District, tile lines from the Story County Farm in Iowa, and a photo of a tile-drained field in the southeast research farm in Iowa. To avoid biased classification due to clustered tile points from available data and the spatial 81 correlation among adjacent points, a distance limit was set through 500*500-meter fishnet grids. (L. Yan & Roy, 2016) reported that the average and median field sizes of US farms are 400 and 527 meters, respectively. Here, a threshold of 500 meters was selected to ensure training and testing point separation thus handling co-variation while still maintaining an adequate number of points for model training and validation. 4.3.2.2: Non-Tile Ground Truth Points To identify non-tile points, I utilize the potential tile-drained layer described in the previous section. I created a 30-meter buffer zone and obtained the complimentary area within agricultural lands, which I define as a likely non-tile layer. I then extracted points at least 500 meters apart from this layer and defined them as likely non-tile points in the classification. While these points may not accurately reflect actual ground conditions, given the limited availability of non-tile data and the associated labor costs for manual identification, utilizing these likely non-tile points is a practical approach for classification. I employed cross-validation techniques, using different proportions of data, to train and validate the model. To validate the model, I randomly choose 30% of non-tile points. Based on the overall area ratio (roughly 1:4) of tile and non-tile in US Midwest region of Ag Census (NASS, 2017), the number of validation tile drainage points was determined. The remaining tile and non-tile points were then used to train the model. 4.3.3: Effective Variable Selection from Satellite and Environmental Datasets To perform random forest classification and measure variable importance, the selection of input variables are important. I utilized three steps to select the final variables and determine the optimal number of variables for classification. First, I gathered variables from previous studies on tile drainage detection (Cho et al., 2019) and other agricultural practices, such as irrigation mapping 82 (Deines et al., 2017, 2019; Xie & Lark, 2021), as both tile drainage and irrigation can affect environmental conditions and promote crop growth, which can be reflected in vegetation and water-related indices. Then, I identified the appropriate periods for these variables by analyzing the differences between tile and non-tile points based on two satellite-derived indices: the normalized difference vegetation index (NDVI) and the normalized water index (NDWI). These two variables was chosen due to the relatively high spatial and temporal resolution of Landsat imagery, as well as NDVI and NDWI has been demonstrated to be helpful in identifying tile drains (Cho et al., 2019; C. Zhang et al., 2014). Periods for other variables then match with these identified time ranges from NDVI and NDWI as needed. Finally, a function was executed to successively remove highly correlated and lesser important variables to handle covariation and provide robust performance for variable importance measurements. Within this, a Pearson correlation coefficient (r) was calculated between each pair of all variables. If the correlation exceeded a threshold (r > 0.8), we retained only the variable with a higher variable importance from the initial classification. Note, there is no standard ‘strength of correlation’ categories, they are domain specific and vary across different studies (Schober et al., 2018). Here, we define 0.8 as the threshold for strong correlation. The satellite and environmental datasets used in this study include (1) Atmospherically corrected surface reflectance derived from the data produced by Landsat 7 ETM+ and Landsat 8 Operational Land Imager (OLI)/Thermal Infrared Sensr (TIRS) sensors (USGS, 2017); (2) MODIS Land Surface Temperature; (3) GridMET, which provides daily climate data, such as precipitation, and potential evapotranspiration ; (4) TerraClimate, which provides monthly climate and climatic water balance for global terrestrial surfaces; (5) SMAP, NASA-USDA Enhanced SMAP Global 83 Soil Moisture Data; (6) POLARIS, Probabilistic Remapping of SSURGO soil properties; (7) USGS 3DEP 10m National Map; (8) gSSURGO Database, which contains gridded soil survey geographic data; (9) National Hydrography Dataset (NHD) from the US Geological Survey; and (10) Hydrologic landscape regions (HLR) of the United States; and (11) Cropland Data Layer (CDL). Using these datasets, I derived variables that capture the distinguishing characteristics between tile and non-tile points, guided by our knowledge and prior research (Cho et al., 2019). For example, we anticipate that fields with tile drainage will exhibit higher normalized difference vegetation index (NDVI) values during the growing season, as tile drains facilitate crop growth and NDVI correlates strongly with green biomass (Prinds et al., 2019). We utilized Landsat 7 and 8 images from the USGS Level 2, Collection 2, and Tier 1 data sources (USGS, 2017) to derive variables such as the normalized difference vegetation index (NDVI) and normalized water index (NDWI). NDVI is the most used vegetation index for observing greenery and is calculated with the equation (4.1), where NIR is near-infrared (which vegetation strongly reflects) light and Red is the visible red (which vegetation absorbs) reflectance. NDWI is developed by (Gao, 1996) to enhance the water-related features of the landscapes. This index uses the near-infrared (NIR) and the Short-Wave infrared (SWIR) bands, see equation (4.2). These satellite imageries contain surface reflectance (SR) that is atmospherically corrected and produced by the Landsat 7 ETM+ sensor and Landsat 8 OLI/TIRS sensor using the Landsat Ecosystem Disturbance Adaptive Processing System (LEDAPS) algorithm and the Land Surface Reflectance Code (LaSRC) respectively (Schmidt et al., 2013; Vermote et al., 2016). 𝑁𝐷𝑊𝐼 = (𝑁𝐼𝑅 − 𝑅𝑒𝑑)/(𝑁𝐼𝑅 + 𝑅𝑒𝑑) 𝑁𝐷𝑊𝐼 = (𝑁𝐼𝑅 − 𝑆𝑊𝐼𝑅1)/(𝑁𝐼𝑅 + 𝑆𝑊𝐼𝑅1) 𝑇𝑟_𝑆𝑊𝐼𝑅1 = (1 − 𝑆𝑊𝐼𝑅1) ∗ (1 − 𝑆𝑊𝐼𝑅1)/(2 ∗ 𝑆𝑊𝐼𝑅1) 𝑇𝑟_𝑆𝑊𝐼𝑅2 = (1 − 𝑆𝑊𝐼𝑅2) ∗ (1 − 𝑆𝑊𝐼𝑅2)/(2 ∗ 𝑆𝑊𝐼𝑅2) (4.1) (4.2) (4.3) (4.4) 84 Although land surface temperature (LST) is available from Landsat images, I opted to use LST from the MODIS dataset due to Landsat collection 2 surface temperature data gaps because of missing Advanced Spaceborne Therma Emission and Reflection Radiometer Global Emissivity Dataset (ASTER GED) (USGS, 2023). MODIS provides an 8-day average LST with a 1000 m resolution. From this product, I obtained daytime and nighttime land surface temperature and computed the difference between them to derive the diurnal LST difference (Wan et al., 2017). The gridded meteorological datasets GridMET and TerraClimate from the University of Idaho are used to obtain climate-related variables, which have a spatial resolution of approximately 4 km (Abatzoglou, 2013; Abatzoglou et al., 2018). Daily precipitation and reflectance evapotranspiration (ETo) are sourced from the GridMET dataset (Abatzoglou, 2013). Aridity is also calculated as the ratio between cumulative precipitation and ETo and is used in the classification. Actual evapotranspiration (AET) variables in the spring and summer seasons are sourced from the monthly TerraClimate dataset (Abatzoglou et al., 2018) and derived using a one- dimensional soil water balance model. GridMET was generated by blending the high-resolution spatial data from Parameter-elevation Regressions on Independent Slopes Model (PRISM) with the high temporal resolution data from National Land Data Assimilation System (NLDAS) to produce spatially- and temporally continuous variable (Abatzoglou, 2013). TerraClimate utilizes climatically-aided interpolation, which combines high-spatial resolution climatological normal from the World Climate dataset with time-varying data from CRU Ts4.0 and the Japanese 55-year Reanalysis (JRA55) at coarser spatial resolutions (Abatzoglou et al., 2018). The NASA-USDA Enhanced SMAP data at 10-km spatial resolution is used to gather surface and subsurface (~1 m) soil moisture (SSM, SUSM) and the percent soil moisture (SMP). This dataset integrates satellite-derived Soil Moisture Active Passive (SMAP) Level 3 soil moisture 85 observations with the modified two-layer Palmer model based on the 1-D Ensemble Kalman Filter (EnKF) data assimilation approach. SMP is estimated as the available water for the plant divided by the total water-holding capacity of the soil profile, which helps determine if the soil has sufficient water for crop growth. Clay percentage and saturated hydraulic conductivity estimates were derived from the Probabilistic Remapping of Soil Survey Geographic Database (SSURGO) soil property database called POLARIS (Chaney et al., 2019). I also estimated plant available water using the van Genuchten equation and parameters, such as saturated soil water content and residual volumetric water, also sourced from POLARIS. In addition, I incorporated topographic slope and soil drainage class information since flat, poorly drained fields are ideal for tile drainage installation (Jame et al., 2022; Valayamkunnath et al., 2020). I computed the mean slope (in degrees) using elevation data from the USGS 3D Elevation Program 10-Meter Resolution Digital Elevation Model(USGS, 2019). Soil drainage class information was obtained from the Gridded SSURGO (USDA, 2013), a product similar to standard SSURGO but in file geodatabase format. It includes seven categories (Figure 4.2b), (0) excessively drained; (1) somewhat excessively drained; (2) well drained; (3) moderately well drained; (4) somewhat poorly drained, (5) poorly drained, and (6) very poorly drained. Canals and ditches from the National Hydrography Dataset (NHD) were also included. These waterways are man-made and serve purposes from water transportation to irrigation, drainage, and navigation. To identify the proximity of each cell in the study region to these canals and ditches, I computed the Euclidean distance. This information is particularly relevant as these waterways are frequently utilized for draining water from tile-drained fields. Hydrologic Landscape Regions (HLRs) are delineated by USGS based on land surface form, geology, and climate among 43,931 86 small watersheds across CONUS, using GIS and statistical analysis. Ultimately, 20 landscape hydrologic region classes were generated based on the similarities in landscape and climate characteristics, and hydrologic processes in different regions are assumed to be affected by the landscape and climate factors. To identify agricultural tile drainage, the cultivated crops category from the National Land Cover Database (NLCD) was selected to create a mask for classification. Cultivated lands refer to areas utilized for annual crop production, such as corn, soybeans, vegetables, tobacco, and cotton. Except for processing soil drainage class and distance to canals or ditches locally and uploading them to the Google Earth Engine (GEE), all these data sources above can be accessed through the GEE data archive and the publicly shared awesome-gee-community-datasets, which are sourced by the community and are available as Earth Engine assets (Roy et al., 2023). The native resolution of these data sources varies from 10 to 10,000 meters; I aggregated or disaggregated them to 30 meters resolution to derive the variables and for the final classification with the built-in reduce resolution and interpolation functions in GEE. 4.3.4: Random Forest Classification and Accuracy Assessment We chose Random Forest (RF) because RF is capable of handling non-monotonic relationships, accommodating non-linear relationship between variables effectively and most importantly, reducing the likelihood of overfitting which is one of the biggest problems in machine learning by creating random subsets of features and building multiple trees using those subsets (Breiman, 2001). RF classification has been widely used for classification tasks such as irrigation mapping, flood risk assessment, and water quality predictions (Belgiu & Drăguţ, 2016a; Cho et al., 2019; Deines et al., 2017, 2019; Z. Wang et al., 2015; Xie & Lark, 2021). Random forest classifiers are tuned by varying the number of decision trees (ntree) and the number 87 of variables randomly selected and tested for the best split when growing trees (mtry). A review study about random forest in remote sensing found that most research set ntree to 500 (Belgiu & Drăguţ, 2016b) because the error rate became stable before this number of trees is achieved (Lawrence et al., 2006) and since the default value for ntree in the randomForest R package is 500 (Liaw & Wiener, 2002a). The mtry can be varied between one and the maximum number of input variables, with the default of the square root of variable numbers. The classifier with this pair of ntree and mtry then makes predictions for other areas that don’t have ground truth data based on what was learned from the training data via the Google Earth Engine cloud computing platform. Accuracy assessment was conducted at both point-based and county level. For point-based assessment, I calculated seven commonly used metrics based on the tile and non-tile ground truth points, including accuracy, sensitivity, specificity, positive and negative predictive values, accuracy, Kappa and F1 score (Congalton & Green, 2019), using the confusionMatrix function from the caret package in R (Kuhn, 2008). These metrics range from 0 to 1, and the higher value means the classification model is more reliable. Accuracy is a statistical measure of how well a binary classification can correctly identify or exclude a condition, calculated as the proportion of correct predictions, both true positives (tile) and true negatives (non-tile), among the total number of points examined. Sensitivity (Recall) and specificity describe the accuracy of a classification of the presence or absence of a condition. Sensitivity measures how well a test can identify true positives, while specificity is a measure of how well a classification can identify true negatives. The positive predictive value (Precision) is the proportion of true positive classifications out of all positive classifications made by the model, so it measures the probability that a positive classification made by the model is correct. Similarly, the negative predictive value is the proportion of true negative classifications out of all negative classifications made by the model 88 and measures the probability that a negative classification made by the model is correct. The F1 score is a common metric that many classification models use to assess the quality of their models, calculated as the equation (4.5) (Sasaki, 2007). 𝐹1 𝑠𝑐𝑜𝑟𝑒 = 2 ∗ (𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 ∗ 𝑅𝑒𝑐𝑎𝑙𝑙)/(𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 + 𝑅𝑒𝑐𝑎𝑙𝑙) (4.5) For county-level accuracy assessment, statistical data in 2017 from the United States Department of Agriculture (USDA) National Agricultural Statistics Service (NASS) are used. More specifically, I calculate classified tile drainage areas for each county and then fit a linear regression model to compare them with reported ones. 4.3.5: The ‘Black Box’ of the Machine Learning Model We used a collection of tools to evaluate and communicate the results of the tile drainage classification model by examining variable importance, such as Mean Decrease Accuracy, Mean Decrease Gini and Shapley values, and accumulated local effects (ALE). The combination of these measures provide a comprehensive understanding of how each variable contributes to the model’s performance. Variable importance helps to quantify what variables are driving model performance and more specifically, how much each variable improves the model’s accuracy. I ran the importance function from the randomForest package in R (Liaw & Wiener, 2002b). I extract variable values for training and validation points and develop a proxy random forest classification with the same training and testing data and identical parameter settings (ntree and mtry) using the randomForest packages in R, which means the variable importance values likely mirror their importance in GEE. The importance function produces two importance measures: MeanDecreaseGini and MeanDecreaseAccuracy. MeanDecreaseGini is an impurity-based importance, representing the total decrease in node impurities from splitting on the variable, averaged over all trees. MeanDecreaseAccuracy is permutation-based importance, measuring accuracy reduction on out- 89 of-bag samples when the variable values are randomly permuted and are more reliable than the MeanDecreaseGini (Strobl et al., 2008). Both measures have been applied to different studies (Deines et al., 2017; Matasci et al., 2018; Schroeder et al., 2017). I also used the Shapley values to check how much every predictor contributed to a given prediction. Shapley is the average contribution of a variable value to the prediction, not the difference in prediction when I remove the feature from the model. I computed the overall importance by assigning a score of 31 to the variable with the highest ranking within each of the three importance measures, and a score of one to the least important variable. The total score from the three measures was then used to calculate the overall importance. The highest possible score is 93, which would correspond to a variable that ranked at the top for all three measured importance. However, we are still not able to know whether the high or low value of the variables yields higher probabilities of the targeted class (i.e., tile) via variable importance metrics. I then checked the ALE with the iml R package (Molnar, 2018). ALE looks at the general relationship between the predictors and the targeted class (i.e., tile drainage). 4.3.6: Comparison with Existing Products To ensure the reliability of our tile drainage product for the US Midwest in 2017, I compared tile drainage areas for the US Midwest to the three other likely tile drainage maps and reported areas from Ag Census for the study region. We also performed a random forest classification using the same training and validation points. However, I only used slope, and soil drainage class, which are commonly used in other products in the comparison. I set the number of trees (ntree) to be the same as our classification (default: 500), and the number of mtry is set as two since there are only two variables. I then compared the out-of-bag (OOB) error with our classification which had 20 variables for the training and 90 validation datasets. The accuracy and F1 score metrics are compared as well. 4.4: Results 4.4.1: Selected Variables and Their Difference between Tile and non-Tile Initially, I gathered 57 variables from nine satellite and environmental datasets described in section 2.3. I then identified three time periods - spring: 4/1 - 5/31, summer: 7/1 - 8/31, and growing season: 5/1 - 9/31 based on the mean of maximum (max among available Landsat images) NDVI and NDWI for all the ground truth points across the study region (Figure A4.0.2). The objective was to maintain a consistent pattern for tile and non-Tile points during the same period. For example, tile points exhibited relatively lower NDVI but higher NDWI in April and May compared to non-Tile points, thus I define the spring season as April 1st to May 31st. Subsequently, I successively remove highly correlated (r > 0.8) and lesser important variables, which results in 26 variables were removed and the remaining 31 variables are used for classification. The ultimately used 31 variables are listed in Table 1, with their short names, full names, units, number of scenes or images, time periods, equations, data sources, and resolutions. The remaining 26 variables were initially chosen but not used in the final classification and are detailed in Text A4.1. Figure A4.0.3 shows the correlation coefficient for the remaining variables, with r < 0.8. Several other variables were investigated in this research, including combinations of VV single co-polarization with vertical transmit/vertical receive and VH dual-band cross-polarization with vertical transmit/horizontal receive, extracted from the C-band Synthetic Aperture Radar Ground Range Detected of the Sentinel-1 mission (Sentinel-1 SAR GRD) from GEE, as well as the neighborhood index and combinations of selected variables. However, these variables were not incorporated into our final classification model as they did not significantly improve prediction accuracy. 91 Table 4.1 Summary information for 31 satellite-derived and environmental variables. Variable full name Max NDVI in growing season Unit Scene Time period Equation Data source Resoluti on (m) NA 1758 5/1-9/30 (NIR-Red)/(NIR+Red) 611 4/1-5/31 745 7/1-8/31 611 4/1-5/31 7/1-8/31 745 (NIR- SWIR1)/(NIR+SWIR1) (1-SWIR1) * (1- SWIR1)/(2*SWIR1) Surface reflectance from USGS Landsat 7 Level 2, Collection 2, Tier 1 and USGS Landsat 8 Level 2, Collection 2, Tier 1 (Landsat) 30 Abbreviation NDVI_grow_max NDWI_spr_max NDWI_summ_max Tr_swir1_spr_max Tr_swir1_summ_max Tr_swir2_grow_max Max NDWI in spring NA Max NDWI in summer NA NA NA Max Tr_swir1 in spring Max Tr_swir1 in summer Max Tr_swir2 in growing season NA 1758 5/1-9/30 (1-SWIR2) * (1- SWIR2)/(2*SWIR2) TerraClimate: Monthly Climate and Climatic Water Balance for Global Terrestrial Surfaces, University of Idaho GridMET:Univ ersity of Idaho Gridded Surface Meteorological Dataset 4638.3 4638.3 MOD11A2.061 Terra Land Surface Temperature and Emissivity 8-Day Global 1km (MODIS) 1000 AET_grow AET in growing season mm 5 5/1-9/30 NA Precip_grow Aridity_spr Aridity_preGrow_3yr DayLST_median_gro w DayLST_median_spr DayLST_range_spr DayLST_median_su mm Precipitation in growing season Aridity in spring Aridity in pre- growing season from the prior 3 years Median daytime land surface temperature (LST) in growing season Median daytime LST in spring Range (max - min) of Daytime LST in spring Median of Daytime LST in summer mm 153 5/1-9/30 NA 61 4/1-5/31 Precipitation/PET NA 93 5/1-5/31 Precipitation/PET 20 NA Kelvin 7 7 8 5/1-9/30 4/1-5/31 NA 4/1-5/31 Range = max - min 7/1-8/31 NA 92 Table 4.1 (cont’d) DayLST_range_grow DayLST_range_sum m DiffLST_median_gro w DiffLST_median_spr NightLST_max_sum m NightLST_max_spr SMP_median_summ SUSM_max_spr SUSM_range_spr Range of daytime LST in growing season Range of daytime LST in summer Median difference between daytime and nighttime LST in growing season Median difference between daytime and nighttime LST in spring Maximum Nighttime LST in summer Maximum Nighttime LST in spring Median of percent soil moisture in spring Maximum subsurface soil moisture in spring Range of subsurface soil moisture in spring 20 5/1-9/30 Range = max - min 8 7/1-8/31 Range = max - min 20 NA 5/1-9/30 7 NA 4/1-5/31 8 7/1-8/31 NA 20 5/1-9/30 NA % 21 7/1-8/31 NA mm 20 NA 4/1-5/31 20 Range = max - min mm 4/1-5/31 Clay_mean_5cm Percent of clay (0-5 cm) % Ksat_mean_5cm Saturated hydraulic conductivity (0-5 cm) log10(cm/h r) Paw_mean_5cm Plant available water (0-5 cm) mm 1 1 1 Slope_mean Mean slope degree 1 Soil_drain_class Soil drainage class NA 1 NA NA NA NA NA NA 93 NASA-USDA Enhanced SMAP Global Soil Moisture Data (SMAP) 10000 Probabilistic Remapping of SSURGO (POLARIS ) Derived from POLARIS with the van Genutchten equation USGS 3DEP 10m National Map Seamless (NED) Gridded Soil Survey Geographic Database(gSSU RGO) 30 10 30 Table 4.1 (cont’d) Canal_ditch_dist HLR Cropland Distance to the nearest canal ditch Hydrologic Landscape Regions A crop-specific land cover data layer meter 1 NA 1 NA NA NA NA NA 1 2017 NA National Hydrography Dataset (NHD) Hydrologic landscape regions of the United States (HLR) USDA NASS Cropland Data Layers (CDL) 500 30 30 Note: Landsat bands are used to calculate variables, where NIR is near infrared band, Red is red band surface reflectance, SWIR1 is the Short-wave Infrared 1 band and SWIR2 is the Short-wave Infrared 2 band. We then compared variable differences between the two groups: tile and non-tile ground truth (Figure 4.4). Our analysis, based on the Wilcoxon test in R, revealed that there is a significant difference (p < 0.0001) between the two groups for the 31 variables used in the final classification. For brevity, we only show the top 20 of the 31 variables, based on mean decrease accuracy. Four climate-related variables were chosen among the selected twenty variables, including actual evapotranspiration (AET) in growing season (AET_grow), aridity in spring (Aridity_spr), and precipitation in growing season (Precip_grow) and aridity in pre-growing season from the prior three years (Aridity_preGrow_3yr). The median AET across all tile points is higher than non-tile points across seasons, with the largest difference seen in the growing season (167 mm). More accumulative precipitation in growing season for the tile points compared with the non-tile points; the difference is about 58 mm. This is consistent with the fact that tile drainage installation occurs in areas with more precipitation for optimal crop growth. For aridity, the median aridity index in the spring season for the tile group is 0.93, and 0.88 for the non-tile group. However, during the pre-growing (May) season in prior three years, aridity for tile (0.75) points is lower than non-tile (0.79) points. 94 Seven variables derived from the MODIS product were selected. During the summer, the maximum nighttime LST is 0.4 degrees Kelvin higher for all tile points, with most points concentrated around a relatively low (292 Kelvin) value; this is probably attributed to the cooling effect of the green vegetation. The nighttime LST in spring for two groups have similar distributions, and coincidentally, the same median value (284 Kelvin). The median value of the maximum daytime LST in the growing season and in spring are close for the two groups. A notable difference (3.6 Kelvin, median) in the range (maximum-minimum) of daytime LST during the growing season for the tile group, as well as in the spring (3.2 Kelvin, median), while in summer, the difference is about 1 Kelvin. 95 Figure 4.4 Significant variable differences between tile and non-tile points (p <= 0.0001, Wilcoxon test) for all variables. The strips are colored by different data sources. The bottom and top of each box represent the first and third quartiles, respectively, and the line inside each box represents the median. Data beyond the end of the whiskers are "outlying" points and are not shown here. Median soil moisture percent (SMP) in the summer, the maximum and range of subsurface soil moisture in spring are the variables that the model picked from the SMAP dataset. I found that SMP in the summer is 21% higher for the tile points and this is likely because tile drainage maintain the right level of water table during the summer season for crop growth. Subsurface soil moisture 96 is higher for tile points in the spring, with 25 mm difference. The range of subsurface soil moisture between the two groups is little (0.5 mm) but significant. For soil drainage class, lower values indicate higher drain capability, with zero representing excessively drained soil, and six indicating very poorly drained soil. Our results indicate that most (56%) tile points are in poorly drained (22%, value = 5), somewhat poorly drained (34%, value = 4) regions. In contrast, non-tiled points are mainly (52%) in well-drained (value = 2) areas. I also observed that the number of points decreases with increasing mean slope for both tile and non-tile points. The maximum slope of non-tiled points is close to ten, while for tile points, the maximum slope is 6. The median slopes for non-tiled and tile are 1.37 and 0.62, respectively. This is likely because tile drainage systems are typically installed in areas with relatively low slopes. The pattern for the hydrologic landscape regions is similar, with non-tile points spread into more regions, while tile points are concentrated in regions (5) arid plains with permeable soils and bedrock, (3) subhumid plains with impermeable soils and permeable bedrock and (7) humid plains with permeable soils and impermeable bedrock, respectively. Both tile and non-tile groups exhibit a similar pattern in terms of distance to canals or ditches, with more points located in areas with lower distances. However, I found that tiled points are generally closer to canals and ditches, with a median distance of 4882 meters less compared to non-tiled points. I also found that the patterns for soil property variables, such as saturated hydrologic conductivity (Ksat) and clay percentage, were similar between the two groups. Tile points had a slightly but significantly higher (4.5%) clay percentage and a lower Ksat value, indicating that areas where tile drainage was installed would have a lower capacity to transmit water without tile drainage systems. 97 4.4.2: Classified Map from Random Forest Classification and Accuracy Assessment In the random forest classification, I used 28,723 tile points and 32,215 non-tile points, along with selected 31 variables, listed in Table 1, to train and validate the classification model. The training and testing tile drainage points are 500 meters apart shown in Figure 4.3, likely non-tile points are distributed as 500 meters away between training and testing dataset, but not shown in the map due to the large number of points and limited spaces. Specifically, I used 48,982 points (26,442 tile points and 22,540 non-tile points) to train the model, with the remaining 11,956 points (2281 tile points and 9675 non-tile points) held out for validation. I use the default settings, 500 trees and 5 variables per split for classification. The classified map is shown in Figure 4.5. Classified tile drainage areas are concentrated in the corn belt region, including eastern Dakotas, southern Minnesota, north central Iowa, northeastern Illinois and Indiana, northwestern Ohio, and Michigan's thumb area. In general, the machine learning model can capture the reported tile fields in the western Lake Erie basin and tile permits in SD. The point-based assessment indicates that the classification model achieved good overall accuracy, with a score of 0.96, which indicates that 96% of points, including tile and non-tile, are classified correctly (Figure 4.6). The Kappa and F1 scores are 0.876 and 0.901. Additionally, the other four metrics were higher than 0.85, including sensitivity (0.95), specificity (0.96), positive predictive value (Precision) (0.85), and negative predictive value (0.99), demonstrating the good quality of the classification model. We aggregated the tile drainage areas for each county from the classified map and compared them with the area from the USDA NASS, as illustrated in Figure 4.7a. The random forest classification model provided reasonable agreement with the reported area from Ag Census, with an R2 value of 0.68. However, the model tended to overestimate tile drainage area, with a slope of 1.1, especially 98 in counties with larger tile drainage reported by NASS. It underestimates the counties with smaller reported areas such as these points with reported areas lower than 500 km2. Figure 4.5 Classified tile drainage (blue) map with the background of agricultural lands (yellow). Two zoom-in windows show the reported tile field (black circle) in the western Lake Erie basin and tile drainage permits (black rectangle) in South Dakota (red rectangle). At the state level (Figure A4.0.5), eight states (Illinois, Indiana, Iowa, Michigan, Minnesota, North Dakota, Ohio, and Pennsylvania) had R2 values greater than 0.65, and two of these states (Michigan and Pennsylvania) had underestimated tile drainage areas. The estimates for the remaining states (Kansas, Missouri, Nebraska, New York, South Dakota, and Wisconsin) are less accurate, likely because they are less heavily tile-drained or have fewer ground truth points. 99 Figure 4.6 Point-based accuracies of tile drainage classification for the test dataset with seven metrics. We also created a residual map at the county level, residuals are calculated as the percentage of area difference divided by the area reported by NASS (Figure 4.7b). The map revealed that overestimation occurred in counties with heavily tile-drained areas, primarily in Eastern Dakota, Southern Minnesota, the Des Moines Lobe in Iowa, Northeastern Illinois, Middle-northern Indiana, and Northwestern Ohio. This is likely due to our tile drainage points being concentrated in these areas and I assumed tile drainage permits in South Dakota, North Dakota, and the Bois de Sioux Watershed District are ground truth tile points. Additionally, I found that 25% of counties were not predicted to have any tile drainage installation, despite having reported tile drains by NASS (NA-Classified in Figure 4.7b). These counties have 100 a relatively low reported area, with a median reported area of ~26 km2 or ~5% of reported tile drainage in agricultural lands. This indicates that our classification model may require better ground truth information for regions with a low percentage of tile drains. It is important to note that the area reported by farmers through surveys may be somewhat inaccurate, however, it is the only source I have for this comparison across the region. Figure 4.7 Accuracy and spatial residuals observed at the county level. (a) Comparison between the classified tile drainage area (km2) and the reported area from the USDA NASS in 2017. Each point on the graph represents one county, with the blue solid line representing the linear regression line and the black dashed line indicating the 1:1 line. (b) Spatial residual at the county level, which was computed as the percentage of area difference divided by the NASS area. NA-NASS stands for no reported area from the Ag Census in 2017, NA-Classified represents the model that did not capture tile drainage in these counties. 101 Figure 4.7 (cont’d) 4.4.3: Overall Variable Importance and Accumulated Local Effects The importance of 31 input variables for tile drainage classification across the US Midwest was assessed using the Mean Decrease Gini (Gini), Mean Decrease Accuracy (MDA), and Shapley values. Higher values of Gini and MDA suggest that a variable has a more significant impact on classifying tile and non-tile areas. AET in the growing season (AET_grow) ranked the highest in the MDA, and the maximum nighttime LST in the summer (NightLST_max_summ) ranked top in the Gini index (Figure A4.0.4), thus they were in the top right corner of the multi-way importance plot (Figure 4.8). Four additional variables (aridity in spring, soil moisture percent in summer, the range of daytime LST in growing season and soil drainage class) ranked high in both Gini and MDA, meaning that removing these variables will largely reduce impurity and model accuracy. Shapley value (Figure 4.9a) can determine whether variables positively or negatively affected the classification accuracy of our model for a given prediction (tile, in this case). It is important to note 102 that Shapley values are the average contribution of an input variable to the target prediction, thus it might be biased due to the uneven distribution of ground truth points. Our findings revealed that four variables, soil moisture percent in the summer, median daytime LST in summer and growing season and actual evapotranspiration in growing season, negatively impacted tile drainage identification. Specifically, soil moisture percent in the summer made the largest negative contribution to identifying tile drains, this is somewhat unexpected as higher soil moisture percent were found in tile drainage points compared to non-tile points (Figure 4.4). Conversely, the remaining 26 variables had positive contributions toward identifying tile drains, and cropland type did not show any impacts on tile drainage identification. Of these positive variables, soil drainage class had the largest positive contribution, followed by aridity in spring, the range of daytime LST in the growing season (which was the second variable based on the Gini index). Align with expectations, tile drains are installed in relatively poorly drained soils (higher class number) thus resulting in positive impacts. 103 Figure 4.8 Two measures of variable importance from the random forest classification. The measure on the x-axis is Mean Decrease Accuracy and Mean Decrease Gini in the y-axis. The P value shows the significance level of importance. The top 11 variables according to these two measures are labeled as circles. Variables are identified by their short names, which can be related to full names and other information using Table 4.1. Finally, I calculated the overall importance (Figure 4.9b), based on these three-importance metrics, Gini index, MDA, and Shapley value. It revealed that variables derived from MODIS product, climate- and soil-related variables are the most important. The top-ranking variable is the maximum of nighttime LST during summer. This is somewhat consistent with (Cho et al., 2019) where the mean LST in the spring strongly contributed to the random forest classification for tile 104 drainage in the Bois de Sioux Watershed District in Minnesota and in the Red River basin (overlies portions of Dakotas and Minnesota). Figure 4.9 (a) Shapley value and (b) overall importance (summed score from MDA, Gini, and Shapley). The +/- Shapley value represents the average positive/negative impacts of variables on the classification. Different colors in (b) represent different data sources. 105 Figure 4.10 Accumulated local effects for top 20 variables, with the locally estimated scatterplot smoothing (loess) method. The strips are colored by different data sources. Figure 4.10 displays the accumulated local effects to evaluate whether high or low values of the variables correspond to higher probabilities of tile drains. Our analysis demonstrates that none of the effects exhibit monotonic behavior. The magnitude of the effects is plotted on the y-axis. The soil drainage class ranked top among the positive variables based on the Shapley value and maximum nighttime LST in the summer ranked top based on Gini index stand out with relatively larger ALE values. Notably, these two variables are about one magnitude higher than the 106 remaining variables. The general patterns can be categorized into several groups: (1) higher values indicate lower probabilities of tile drains, such as the actual evapotranspiration in the growing season; (2) higher values indicate higher probabilities of tile drains but stabilizes at a certain point, such as aridity in spring (threshold: 0.75); (3) higher values yield lower probabilities of tile drains, such as soil drainage class (higher values mean poorly drained soils); and (4) up and down trend variables. 4.5: Discussion and Conclusion 4.5.1: Comparison with Existing Products Table 4.2 listed the tile drainage area in the US Midwest (defined in Figure 4.2d) from different products. Our product, SEETileDrain, estimated that 185,069 km2 were tile drained, which is ~9.6% lower than the estimates from survey-based statistics from USDA-NASS, while the other three (AgTile-US, TD-MostPD, TD-AllPD) have estimated higher likely tile drainage than reported. AgTile-US utilizes information on soil drainage and topographic slope threshold within cropland areas to estimate tile drainage and constrain the geospatial model with statistical tile drainage area at the county level, based on data from the Census of Agriculture in 2017 (Valayamkunnath et al., 2020). The estimate is ~2% lower than the reported areas from USDA- NASS. Another study by (Jame et al., 2022) developed two TransformingDrainage (TD) extent products for the US Midwest, which are based on soil drainage class. These classes were selected because they are directly related to crop production and are therefore considered more suitable for estimating likely drained land than soil properties alone. The first product, TD-MostPD, includes areas with very poorly and poorly drained soils that are likely tile-drained. The second product, TD-AllPD, includes somewhat poorly drained soils in addition to the two categories used in TD- MostPD. TD-MostPD and TD-AllPD estimated 2.8 and 5 times more tile drainage extent than the 107 statistics from USDA-NASS for this region, respectively. Table 4.2. Estimated tile drainage area among existing products. Product Estimated Name Area (km2) Resolution Notes SeeTileDrain 185,069 30 m learning USDA-NASS 204,842 county-level County-level survey statistics Derived from satellite and environmental datasets with machine AgTile-US 201,206 30 m Geospatial analysis based on slope and soil drainage class TD-MostPD 576,493 30 m Areas have very poorly and poorly drained soils TD-AllPD 1,025,288 30 m soils Areas have very poorly, poorly, and somewhat poorly drained Because these maps (AgTile-US, TD-MostPD, TD-AllPD) rely on soil drainage class and slope information, I used a random forest classification model based on these inputs and all ground truth points. The model used the same ntree (500) and two variables at each split (mtry = 2), with an out-of-bag (OOB) estimate of the error rate at 36%. This indicates that only 64% of ground truth points were classified correctly. The model also exhibited relatively low values for sensitivity (0.66), and specificity (0.61), with a positive predictive value (0.29), and negative predictive value (0.88), demonstrating the inferior quality of the classification model which used two variables, compared to our final classification model that used 31 variables. The F1 score for this model was 0.4, compared to the F1 score of 0.9 obtained with 31 variables, indicating that the variables added significantly improved the binary classifications. 4.5.2: Conclusion, Limitation and Future Work Spatially-explicit agricultural tile drainage across the US Midwest, at 30-m resolution, was mapped using a random forest machine learning classification model. This model was 108 implemented on the Google Earth Engine cloud computing platform, incorporating 31 variables from eleven datasets and comprehensive ground truth points. The resulting classified map demonstrated good accuracy in point-based assessment and reasonable agreement with the reported area from USDA-NASS. Land Surface Temperature, soil moisture percent, actual evapotranspiration, and soil drainage class were identified as strong predictors for tile drainage identification. The machine learning model developed here could be readily applied to other regions, both for historical and future years. The resulting data product may be useful for hydrological, water quality, and crop modeling research. This information may help watershed managers and stakeholders achieve cost-effective agricultural water and nutrient management and maintain optimal crop production. The accuracy of the algorithm depends on the coverage of ground truth information, which was not evenly distributed and may be imbalanced for tile and non-tile points. Besides, assumptions about tile drainage permits and potential issues during the visual interpretation can introduce model bias. In addition, this study successively removed relatively high correlated but less important features based on pre-defined threshold, while correlations between variables might still exist. Even though variable co-variation is not often considered in random forest classification, additional preprocessing could be used to reduce the effects of correlated variables on the performance of the classification model and improve the robustness of variable importance. For future work, deep learning methods and tile drainage line extraction Artificial intelligence (AI) models could be used to identify tile drainage areas. These approaches have the potential to provide more accurate and efficient classification results. Moreover, the use of high-resolution images, such as the Harmonization of the Landsat and Sentinel-2 data (HLS) (Claverie et al., 2018), images provided by the Planet Lab or Google's aerial imagery, would enable the identification of tile 109 drainage at a finer resolution. This approach could provide more detailed information on the distribution and patterns of tile drainage across the landscape. The incorporation of such techniques and data sources could enhance our understanding of the effects of tile drainage on agricultural landscapes and facilitate development of more effective management strategies. 4.6: Acknowledgments This work was funded by NASA Grant NNX11AC72G, NOAA Grant NA12OAR432007, GS100013, and KBS LTER Graduate Fellowship. Luwen Wan was partially supported by the China Scholarship Council (CSC). Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NASA, NOAA, or CSC. I am grateful to Eva Arvizu and Jordyn Porter who visually interpreted tile drainage points via the Google Earth aerial images. I thank Eunsang Cho (NASA, Goddard Space Flight Center) who shared his code and guidance on the machine learning algorithm. 110 CHAPTER 5: SUMMARY 5.1: Conclusion This dissertation simulated nutrient fluxes to the Great Lakes from the US side of their drainage basins, quantified the contributions of seven nutrient sources and four distinct transport pathways and identified nutrient delivery hotspots. Modeling transport pathways in Chapter 2 offers a novel alternative to many models that do not include essential pathways, such as groundwater and septic plumes. Nutrient simulation on seasonal variations can help stakeholders implement nutrient reductions at the most effective time. Chapter 2 found that the annual average nitrogen and phosphorus loads to the Great Lakes from its US drainage basin are 599 and 21.7 kg/yr/km2, respectively. The spatial patterns are similar, with high nutrient delivery areas in southern Lake Michigan, Saginaw Bay, and the Western Lake Erie basin. SENSEflux modeling results demonstrate that agricultural sources are the main sources of nutrient pollution, and surface pathways (overland runoff and tile drainage) are dominant in the US Great Lakes Basin on an annual basis. Additionally, these results highlight those other sources (i.e., septic tanks, atmospheric deposition, sources from developed urban areas) and pathways (groundwater and septic plumes) cannot be overlooked. Annually, areas with high loading and delivery efficiency are the most intense sources of nutrient loads to the Great Lakes coastline. SENSEflux seasonal models in Chapter 3 found that there is 1063 t/day of nitrogen delivered from US lands to the lakes during snowmelt, and this is about 2.8 times greater than the annual delivery (378 t/day) and 16 times greater than the TN delivery during baseflow (66 t/day). And there are more phosphorus delivered annually and during snow melt season, compared to baseflow. Most nutrients are transported via overland flow and tile fields regardless of the period. Groundwater and septic plume pathways play an essential role in transporting nutrients and become more 111 significant during baseflow. Besides, agricultural sources contribute substantially higher (14% for TN and 5% for TP) during melt than baseflow, while point sources, septic tanks, and atmospheric deposition become more prominent contributors to nutrient delivery during baseflow. In addition, targeting nutrient reductions during snowmelt is more effective than focusing on baseflow, especially for TN, based on seasonal delivery hotspot analysis. Thus, seasonal variations of nutrient transport should be considered when implementing water quality management plans to achieve nutrient reduction targets. Through community-facing tools, the SENSEflux model outputs and insights from Chapter 2&3 can be linked to the local decision-making process, assisting watershed managers in focusing actions on specific sources and pathways at the right time and the right place. This dissertation also used a machine learning model to map agricultural tile drainage across the US Midwest in Chapter 4. This is because tile drainage is an important nutrient transport pathway but the spatially explicit tile drainage data at finer resolution was lacking at large scales. The random forest classification model was implemented on the Google Earth Engine, with comprehensive tile and non-tile ground truth points and twenty satellite-derived and environmental variables. The classifier model achieved good accuracy (95.5%), with an F1 score of 0.904. Besides, aggregated areas from the classified map for each county correlated well with the area reported by USDA Agricultural Census. The maximum nighttime land surface temperature during summer ranked highest based on the assessment of variable importance, followed by climate- and soil-related variables. In summary, SENSEflux model products about nutrient loadings, sources, pathways, and hotspots, as well as insights from these products can help decisions makers target the areas for nutrient reduction at the right time and develop watershed management strategies more effectively. 112 SENSEflux model could be readily applied to other watersheds with nutrient pollution and management issues, with a resolution limited only by spatially explicit nutrient inputs. The classified tile drainage map can improve the accuracy of crop, hydrological, and water quality modeling work across the US Midwest. The identified tile drainage locations can provide a baseline for mapping tile drainage area change over time predicting for the future. Similarly, the random forest machine learning approach for tile drainage classification could be readily applied to other regions where the key variables are available. 5.2: Limitations There are some limitations in this dissertation, even though it provides spatially explicit data products and results necessary to understand the interactions between human and natural systems. First, SENSEflux considers tile drainage to be an alternative nutrient pathway to overland runoff, which somewhat overestimates tile drainage’s contribution. Second, this dissertation assumed that groundwater watersheds have the same boundary as surface watersheds and that nutrient attenuation via the groundwater pathway was determined by the overland flow length. Third, there are some inherent uncertainties in the annual nutrient source due to the various sources, the coarse resolution of some inputs, and so on, but SENSEmap is the most suitable and useful nutrient source for this dissertation. SENSEflux used the annual average nutrient inputs with seasonal mobility inferred from streamflow variability, to simulate seasonal nutrient loading, as I do not have information about when nutrients are applied. Fourth, the distribution and accuracy of ground truth points are critical for accurate classification. Some of the tile drainage points used in this dissertation are clustered, tile drainage permits from three states were assumed to be real tile drainage points, and likely non-tile points were used for classification, because tile and non-tile ground truth points are not easily accessible, and labor and cost-intensive. 113 5.3: Future Work Future research could improve nutrient delivery simulation by distinguishing between overland runoff and flow from tile drainage, as well as nutrient travel time through groundwater watersheds and estimates of legacy timescales, particularly in watersheds with significantly different boundary conditions than surface watersheds. Furthermore, time-varying nutrient source inputs such as allocating fertilizer and manure to seasons based on phenology and fertilizer practice, as well as a robust monitoring network for collecting nutrient concentration across seasons could improve the seasonal nutrient delivery simulation and understanding the timing and quantity of nutrient losses. Furthermore, considering nutrient uptake and transformation from wetland systems before flowing into the Great Lakes is necessary because wetlands are effective natural systems for nitrogen, phosphorus, and carbon, as well as reducing the effects of nutrient pollution on aquatic ecosystems. In addition, the SENSEflux model can simulate nutrient species such as dissolved inorganic nitrogen (nitrate and nitrite), and soluble reactive phosphorus, which have different chemical properties, bioavailability, and ecological impacts in the environment than total nitrogen and phosphorus. Furthermore, the SENSEflux model can be extended to the Great Lakes Basin, which would aid in making more holistic decisions for nutrient management and conservation in the United States and Canada. Future research could make use of high-resolution remote sensing images and refine tile drainage detection methods. Using images from the Planet Lab, Google's aerial imagery, or the Harmonization of the Landsat and Sentinel-2 data (HLS) would enhance classification accuracy and enable the identification of tile drainage at a higher resolution. In addition, deep learning techniques and tile drainage line extraction Artificial intelligence (AI) models have the potential to identify tile drainage areas. These high-resolution images and AI techniques can be used to 114 identify historical tile drainage and predict future tile drainage, the resulting tile drainage area will improve our understanding of the effects of tile drainage on agricultural landscapes. Lastly, future interdisciplinary research could focus on the relationship between food production, nutrient pollution, and climate change, as well as human activities. The influence of climate change on food production, water balance, nutrient transport, and the modifications of human behaviors to manage complex landscapes has been growing. Future agricultural and hydrological management research needs to account for the warming temperatures and the increased frequency and intensity of extreme events brought on by climate change. In addition, agricultural practices such as tile drainage, cover crop, conservation tillage, and nutrient management practices such as the edge of field practices, woodchip bioreactors, and constructed wetlands could help reduce nutrient losses. Future management efforts aimed at enhancing human well-being and fostering a sustainable planet will require the integration of these processes. 115 REFERENCES Abatzoglou, J. T. (2013). Development of gridded surface meteorological data for ecological applications and modelling. International Journal of Climatology, 33(1), 121–131. https://doi.org/10.1002/joc.3413 Abatzoglou, J. T., Dobrowski, S. Z., Parks, S. A., & Hegewisch, K. C. (2018). TerraClimate, a high-resolution global dataset of monthly climate and climatic water balance from 1958– 2015. Scientific Data, 5(1), 170191. https://doi.org/10.1038/sdata.2017.191 Abell, J. M., Özkundakci, D., & Hamilton, D. P. (2010). Nitrogen and Phosphorus Limitation of Phytoplankton Growth in New Zealand Lakes: Implications for Eutrophication Control. Ecosystems, 13(7), 966–977. https://doi.org/10.1007/s10021-010-9367-9 Alam, M. J., & Dutta, D. (2013). Predicting climate change impact on nutrient pollution in waterways: a case study in the upper catchment of the Latrobe River, Australia. Ecohydrology, 6(1), 73–82. https://doi.org/10.1002/eco.282 Alexander, J. (2013, May 12). Michigan has nation’s weakest regulations on septic systems. Retrieved from https://www.mlive.com/environment/2013/05/michigan_has_nations_weakest_r.html Alexander, R. B., & Smith, R. A. (2006). Trends in the nutrient enrichment of U.S. rivers during the late 20th century and their relation to changes in probable stream trophic conditions. Limnology and Oceanography, 51(1part2), 639–654. https://doi.org/10.4319/lo.2006.51.1_part_2.0639 Alexander, R. B., Smith, R. A., & Schwarz, G. E. (2000). Effect of stream channel size on the delivery of nitrogen to the Gulf of Mexico. Nature, 403(6771), 758–761. https://doi.org/10.1038/35001562 Alexander, R. B., Johnes, P. J., Boyer, E. W., & Smith, R. A. (2002). A comparison of models for estimating the riverine export of nitrogen from large watersheds. Biogeochemistry, 57– 58(1), 295–339. https://doi.org/10.1023/a:1015752801818 Arnold, J. G., Srinivasan, R., Muttiah, R. S., & Williams, J. R. (1998). Large Area Hydrologic Modeling and Assessment Part I: Model Development. Journal of the American Water Resources Association, 34(1), 73–89. https://doi.org/10.1111/j.1752-1688.1998.tb05961.x Azzari, G., Grassini, P., Edreira, J. I. R., Conley, S., Mourtzinis, S., & Lobell, D. B. (2019). Satellite mapping of tillage practices in the North Central US region from 2005 to 2016. Remote Sensing of Environment, 221, 417–429. https://doi.org/10.1016/j.rse.2018.11.010 Baker, J. L., Campbell, K. L., Johnson, H. P., & Hanway, J. J. (1975). Nitrate, Phosphorus, and Sulfate in Subsurface Drainage Water. Journal of Environmental Quality, 4(3), 406–412. https://doi.org/10.2134/jeq1975.00472425000400030027x 116 Baris, R. D., Cohen, S. Z., Barnes, N. L., Lam, J., & Ma, Q. (2010). Quantitative analysis of over 20 years of golf course monitoring studies. Environmental Toxicology and Chemistry, 29(6), 1224–1236. https://doi.org/10.1002/etc.185 Basu, N. B., Dony, J., Meter, K. J., Johnston, S. J., & Layton, A. T. (2023). A Random Forest in the Great Lakes: Stream Nutrient Concentrations Across the Transboundary Great Lakes Basin. Earth’s Future, 11(4). https://doi.org/10.1029/2021ef002571 Bechard, A. (2020). The economic impacts of harmful algal blooms on tourism: an examination of Southwest Florida using a spline regression approach. Natural Hazards, 104(1), 593–609. https://doi.org/10.1007/s11069-020-04182-7 Belgiu, M., & Drăguţ, L. (2016a). Random forest in remote sensing: A review of applications and future directions. ISPRS Journal of Photogrammetry and Remote Sensing, 114, 24–31. https://doi.org/10.1016/j.isprsjprs.2016.01.011 Belgiu, M., & Drăguţ, L. (2016b). Random forest in remote sensing: A review of applications and future directions. ISPRS Journal of Photogrammetry and Remote Sensing, 114, 24–31. https://doi.org/10.1016/j.isprsjprs.2016.01.011 Benoy, G. A., Jenkinson, R. W., Robertson, D. M., & Saad, D. A. (2016). Nutrient delivery to Lake Winnipeg from the Red—Assiniboine River Basin – A binational application of the SPARROW model. Canadian Water Resources Journal / Revue Canadienne Des Ressources Hydriques, 41(3), 1–19. https://doi.org/10.1080/07011784.2016.1178601 Blodgett, D., Read, E., Lucido, J., Slawecki, T., & Young, D. (2016). An Analysis of Water Data Systems to Inform the Open Water Data Initiative. JAWRA Journal of the American Water Resources Association, 52(4), 845–858. https://doi.org/10.1111/1752-1688.12417 Bock, E. M., & Easton, Z. M. (2020). Export of nitrogen and phosphorus from golf courses: A review. Journal of Environmental Management, 255, 109817. https://doi.org/10.1016/j.jenvman.2019.109817 Bockheim, J. G. (2020). Soils of the Laurentian Great Lakes, USA and Canada, 121–127. https://doi.org/10.1007/978-3-030-52425-8_9 Bosch, N. S. (2008). The influence of impoundments on riverine nutrient transport: An evaluation using the Soil and Water Assessment Tool. Journal of Hydrology, 355(1–4), 131–147. https://doi.org/10.1016/j.jhydrol.2008.03.012 Bowles, T. M., Atallah, S. S., Campbell, E. E., Gaudin, A. C. M., Wieder, W. R., & Grandy, A. S. (2018). Addressing agricultural nitrogen losses in a changing climate. Nature Sustainability, 1(8), 399–408. https://doi.org/10.1038/s41893-018-0106-0 Boyer, E. W., Goodale, C. L., Jaworski, N. A., & Howarth, R. W. (2002). Anthropogenic nitrogen sources and relationships to riverine nitrogen export in the northeastern U.S.A. Biogeochemistry, 57–58(1), 137–169. https://doi.org/10.1023/a:1015709302073 117 Brakebill, J. W., & Terziotti, S. E. (2011). A Digital Hydrologic Network Supporting NAWQA MRB SPARROW Modeling--MRB_E2RF1WS. https://doi.org/10.3133/70046785 Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/a:1010933404324 Brookfield, A. E., Hansen, A. T., Sullivan, P. L., Czuba, J. A., Kirk, M. F., Li, L., et al. (2021). Predicting algal blooms: Are we overlooking groundwater? Science of The Total Environment, 769, 144442. https://doi.org/10.1016/j.scitotenv.2020.144442 Bureau, U. S. C. (2010). Census Regions and Divisions of the United States. Retrieved from https://www2.census.gov/geo/pdfs/maps-data/maps/reference/us_regdiv.pdf Byrd, R. H., Hribar, M. E., & Nocedal, J. (1999). An Interior Point Algorithm for Large-Scale Nonlinear Programming. SIAM Journal on Optimization, 9(4), 877–900. https://doi.org/10.1137/s1052623497325107 Byrd, R. H., Gilbert, J. C., & Nocedal, J. (2000). A trust region method based on interior point techniques for nonlinear programming. Mathematical Programming, 89(1), 149–185. https://doi.org/10.1007/pl00011391 Byrnes, D. K., Meter, K. J. V., & Basu, N. B. (2020). Long‐Term Shifts in U.S. Nitrogen Sources and Sinks Revealed by the New TREND‐Nitrogen Data Set (1930–2017). Global Biogeochemical Cycles, 34(9). https://doi.org/10.1029/2020gb006626 Carmichael, W. W., & Boyer, G. L. (2016). Health impacts from cyanobacteria harmful algae blooms: Implications for the North American Great Lakes. Harmful Algae, 54, 194–212. https://doi.org/10.1016/j.hal.2016.02.002 Chaney, N. W., Minasny, B., Herman, J. D., Nauman, T. W., Brungard, C. W., Morgan, C. L. S., et al. (2019). POLARIS Soil Properties: 30‐m Probabilistic Maps of Soil Properties Over the Contiguous United States. Water Resources Research, 55(4), 2916–2938. https://doi.org/10.1029/2018wr022797 Chaplin-Kramer, R., Hamel, P., Sharp, R., Kowal, V., Wolny, S., Sim, S., & Mueller, C. (2016). Landscape configuration is the primary driver of impacts on water quality associated with agricultural expansion. Environmental Research Letters, 11(7), 074012. https://doi.org/10.1088/1748-9326/11/7/074012 Chapra, S. C., Dolan, D. M., & Dove, A. (2016). Mass-balance modeling framework for simulating and managing long-term water quality for the lower Great Lakes. Journal of Great Lakes Research, 42(6), 1166–1173. https://doi.org/10.1016/j.jglr.2016.04.008 Chen, Xuanjing, Wang, M., Kroeze, C., Chen, X., Ma, L., Chen, X., et al. (2022). Nitrogen in the Yangtze River Basin: Pollution Reduction through Coupling Crop and Livestock Production. Environmental Science & Technology. https://doi.org/10.1021/acs.est.1c08808 118 Chen, Xuemei, Zhang, W., Yin, Y., Tang, J., Li, G., & Yan, Y. (2021). Seasonal variation characteristics and release potential of phosphorus in sediments: a case study of the Qiuxi River, a typical diffuse source pollution river in southwestern China. Journal of Soils and Sediments, 21(1), 575–591. https://doi.org/10.1007/s11368-020-02805-x Cho, E., Jacobs, J. M., Jia, X., & Kraatz, S. (2019). Identifying Subsurface Drainage using Satellite Big Data and Machine Learning via Google Earth Engine. Water Resources Research, 55(10), 8028–8045. https://doi.org/10.1029/2019wr024892 Clark, C. M., Phelan, J., Doraiswamy, P., Buckley, J., Cajka, J. C., Dennis, R. L., et al. (2018). Atmospheric deposition and exceedances of critical loads from 1800−2025 for the conterminous United States. Ecological Applications, 28(4), 978–1002. https://doi.org/10.1002/eap.1703 Claverie, M., Ju, J., Masek, J. G., Dungan, J. L., Vermote, E. F., Roger, J.-C., et al. (2018). The Harmonized Landsat and Sentinel-2 surface reflectance data set. Remote Sensing of Environment, 219, 145–161. https://doi.org/10.1016/j.rse.2018.09.002 Clement, D. R., & Steinman, A. D. (2017). Phosphorus loading and ecological impacts from agricultural tile drains in a west Michigan watershed. Journal of Great Lakes Research, 43(1), 50–58. https://doi.org/10.1016/j.jglr.2016.10.016 Congalton, R. G., & Green, K. (2019). Assessing the accuracy of remotely sensed data: principles and practices, Third Edition (Third Edition). CRC Press. Corriveau, J., Chambers, P. A., & Culp, J. M. (2013). Seasonal Variation in Nutrient Export Along Streams in the Northern Great Plains. Water, Air, & Soil Pollution, 224(7), 1594. https://doi.org/10.1007/s11270-013-1594-1 Cotner, J. B., Weinke, A. D., & Biddanda, B. A. (2017). Great Lakes: Science can keep them great. Journal of Great Lakes Research, 43(5), 916–919. https://doi.org/10.1016/j.jglr.2017.07.002 Dai, Y., Lang, Y., Wang, T., Han, X., Wang, L., & Zhong, J. (2021). Modelling the sources and transport of ammonium nitrogen with the SPARROW model: A case study in a karst basin. Journal of Hydrology, 592, 125763. https://doi.org/10.1016/j.jhydrol.2020.125763 David, M. B., Drinkwater, L. E., & McIsaac, G. F. (2010). Sources of Nitrate Yields in the Mississippi River Basin. Journal of Environmental Quality, 39(5), 1657–1667. https://doi.org/10.2134/jeq2010.0115 Deines, J. M., Kendall, A. D., & Hyndman, D. W. (2017). Annual Irrigation Dynamics in the U.S. Northern High Plains Derived from Landsat Satellite Data. Geophysical Research Letters, 44(18), 9350–9360. https://doi.org/10.1002/2017gl074071 Deines, J. M., Kendall, A. D., Crowley, M. A., Rapp, J., Cardille, J. A., & Hyndman, D. W. (2019). Mapping three decades of annual irrigation across the US High Plains Aquifer using 119 Landsat and Google Earth Engine. Remote Sensing of Environment, 233, 111400. https://doi.org/10.1016/j.rse.2019.111400 FAO. (2017). FAO Statistical Databases. Retrieved from http://faostat.fao.org Fausey, N. R., Brown, L. C., Belcher, H. W., & Kanwar, R. S. (1995). Drainage and Water Quality in Great Lakes and Cornbelt States. Journal of Irrigation and Drainage Engineering, 121(4), 283–288. https://doi.org/10.1061/(asce)0733-9437(1995)121:4(283) Filipović, V., Toor, G. S., Ondrašek, G., & Kodešová, R. (2015). Modeling water flow and nitrate–nitrogen transport on golf course under turfgrass. Journal of Soils and Sediments, 15(8), 1847–1859. https://doi.org/10.1007/s11368-014-0980-7 Finocchiaro, R. G. (2014). Agricultural Subsurface Drainage Tile Locations by Permits in South Dakota. Retrieved from http://dx.doi.org/10.5066/F7KS6PNW Finocchiaro, R. G. (2016). Agricultural Subsurface Drainage Tile Locations by Permits in North Dakota. Retrieved from http://dx.doi.org/10.5066/F7QF8QZW Foley, T. A., & Betterton, E. A. (2019). Nitrogen dry deposition to Lake Superior and Lake Michigan. Journal of Great Lakes Research, 45(2), 224–239. https://doi.org/10.1016/j.jglr.2018.12.003 Ford, C. M., Kendall, A. D., & Hyndman, D. W. (2020). Effects of shifting snowmelt regimes on the hydrology of non-alpine temperate landscapes. Journal of Hydrology, 590, 125517. https://doi.org/10.1016/j.jhydrol.2020.125517 Ford, C. M., Kendall, A. D., & Hyndman, D. W. (2021). Snowpacks decrease and streamflows shift across the eastern US as winters warm. Science of The Total Environment, 793, 148483. https://doi.org/10.1016/j.scitotenv.2021.148483 Frei, R. J., Lawson, G. M., Norris, A. J., Cano, G., Vargas, M. C., Kujanpää, E., et al. (2021). Limited progress in nutrient pollution in the U.S. caused by spatially persistent nutrient sources. PLoS ONE, 16(11), e0258952. https://doi.org/10.1371/journal.pone.0258952 Gao, B. (1996). NDWI—A normalized difference water index for remote sensing of vegetation liquid water from space. Remote Sensing of Environment, 58(3), 257–266. https://doi.org/10.1016/s0034-4257(96)00067-3 Gassman, P. W., Sadeghi, A. M., & Srinivasan, R. (2014). Applications of the SWAT Model Special Section: Overview and Insights. Journal of Environmental Quality, 43(1), 1–8. https://doi.org/10.2134/jeq2013.11.0466 Gill, L. W., & Mockler, E. M. (2016). Modeling the pathways and attenuation of nutrients from domestic wastewater treatment systems at a catchment scale. Environmental Modelling & Software, 84, 363–377. https://doi.org/10.1016/j.envsoft.2016.07.006 120 Gobler, C. J. (2020). Climate Change and Harmful Algal Blooms: Insights and perspective. Harmful Algae, 91, 101731. https://doi.org/10.1016/j.hal.2019.101731 Gökkaya, K., Budhathoki, M., Christopher, S. F., Hanrahan, B. R., & Tank, J. L. (2017). Subsurface tile drained area detection using GIS and remote sensing in an agricultural watershed. Ecological Engineering, 108, 370–379. https://doi.org/10.1016/j.ecoleng.2017.06.048 Goyette, J., Bennett, E. M., Howarth, R. W., & Maranger, R. (2016). Changes in anthropogenic nitrogen and phosphorus inputs to the St. Lawrence sub‐basin over 110 years and impacts on riverine export. Global Biogeochemical Cycles, 30(7), 1000–1014. https://doi.org/10.1002/2016gb005384 Gsech, D., Oimoen, M., Greenlee, S., Nelson, C., Steuck, M., & Tyler, D. (2002). The National Elevation Dataset: photogrammetric engineering and remote sensing, 68, 5–11. Guo, L. (2007). Doing Battle With the Green Monster of Taihu Lake. Science, 317(5842), 1166– 1166. https://doi.org/10.1126/science.317.5842.1166 Guo, T., Johnson, L. T., LaBarge, G. A., Penn, C. J., Stumpf, R. P., Baker, D. B., & Shao, G. (2021). Less Agricultural Phosphorus Applied in 2019 Led to Less Dissolved Phosphorus Transported to Lake Erie. Environmental Science & Technology, 55(1), 283–291. https://doi.org/10.1021/acs.est.0c03495 Hallegraeff, G., Enevoldsen, H., & Zingone, A. (2021). Global harmful algal bloom status reporting. Harmful Algae, 102, 101992. https://doi.org/10.1016/j.hal.2021.101992 Hamlin, Q. F., Kendall, A. D., Martin, S. L., Whitenack, H. D., Roush, J. A., Hannah, B. A., & Hyndman, D. W. (2020a). Quantifying Landscape Nutrient Inputs With Spatially Explicit Nutrient Source Estimate Maps. Journal of Geophysical Research: Biogeosciences, 125(2). https://doi.org/10.1029/2019jg005134 Hamlin, Q. F., Kendall, A. D., Martin, S., Whitenack, H., Roush, J., Hannah, B., & Hyndman, D. W. (2020b, February 11). SENSEmap-USGLB: Nitrogen and Phosphorus Inputs. Retrieved from https://www.hydroshare.org/resource/1a116e5460e24177999c7bd6f8292421/ Hamlin, Q. F., Martin, S. L., Kendall, A. D., & Hyndman, D. W. (2022). Examining Relationships Between Groundwater Nitrate Concentrations in Drinking Water and Landscape Characteristics to Understand Health Risks. GeoHealth, 6(5), e2021GH000524. https://doi.org/10.1029/2021gh000524 Han, H., Bosch, N., & Allan, J. D. (2011). Spatial and temporal variation in phosphorus budgets for 24 watersheds in the Lake Erie and Lake Michigan basins. Biogeochemistry, 102(1–3), 45–58. https://doi.org/10.1007/s10533-010-9420-y Hartman, M. D., Baron, J. S., Ewing, H. A., & Weathers, K. C. (2014). Combined global change effects on ecosystem processes in nine U.S. topographically complex areas. Biogeochemistry, 119(1–3), 85–108. https://doi.org/10.1007/s10533-014-9950-9 121 Heil, C. A., & Muni-Morgan, A. L. (2021). Florida’s Harmful Algal Bloom (HAB) Problem: Escalating Risks to Human, Environmental and Economic Health With Climate Change. Frontiers in Ecology and Evolution, 9, 646080. https://doi.org/10.3389/fevo.2021.646080 Hirt, U., & Volk, M. (2011). Quantifying the proportion of tile-drained land in large river basins. Physics and Chemistry of the Earth, Parts A/B/C, 36(12), 591–598. https://doi.org/10.1016/j.pce.2011.05.004 Holman, I. P., Whelan, M. J., Howden, N. J. K., Bellamy, P. H., Willby, N. J., Rivas‐Casado, M., & McConvey, P. (2008). Phosphorus in groundwater—an overlooked contributor to eutrophication? Hydrological Processes, 22(26), 5121–5127. https://doi.org/10.1002/hyp.7198 Hong, B., Swaney, D. P., & Howarth, R. W. (2013). Estimating Net Anthropogenic Nitrogen Inputs to U.S. Watersheds: Comparison of Methodologies. Environmental Science & Technology, 47(10), 5199–5207. https://doi.org/10.1021/es303437c Houlton, B. Z., Almaraz, M., Aneja, V., Austin, A. T., Bai, E., Cassman, K. G., et al. (2019). A World of Cobenefits: Solving the Global Nitrogen Challenge. Earth’s Future, 7(8), 865– 872. https://doi.org/10.1029/2019ef001222 Howarth, R. W., Billen, G., Swaney, D., Townsend, A., Jaworski, N., Lajtha, K., et al. (1996). Regional nitrogen budgets and riverine N & P fluxes for the drainages to the North Atlantic Ocean: Natural and human influences. Biogeochemistry, 35(1), 75–139. https://doi.org/10.1007/bf02179825 Huggins, X., Gleeson, T., Serrano, D., Zipper, S., Jehn, F., Rohde, M. M., et al. (2023). Overlooked risks and opportunities in groundwatersheds of the world’s protected areas. Nature Sustainability, 1–10. https://doi.org/10.1038/s41893-023-01086-9 Huisman, N. L. H., Huisman, A. J., & Karthikeyan, K. G. (2017). Seasonal and animal farm size influences on in-stream phosphorus transport in an agricultural watershed. Nutrient Cycling in Agroecosystems, 109(1), 29–42. https://doi.org/10.1007/s10705-017-9866-6 Hyndman, D. W., Kendall, A. D., & Welty, N. R. H. (2007). Evaluating Temporal and Spatial Variations in Recharge and Streamflow Using the Integrated Landscape Hydrology Model (ILHM). ICID. (2018, January 1). World Drained Area. Retrieved from https://www.icid.org/world- drained-area.pdf Ikenberry, C. D., Soupir, M. L., Schilling, K. E., Jones, C. S., & Seeman, A. (2014). Nitrate‐ Nitrogen Export: Magnitude and Patterns from Drainage Districts to Downstream River Basins. Journal of Environmental Quality, 43(6), 2024–2033. https://doi.org/10.2134/jeq2014.05.0242 122 Jame, S. A., Frankenberger, J., Reinhart, B. D., & Bowling, L. (2022). Mapping Agricultural Drainage Extent in the U.S. Corn Belt: The Value of Multiple Methods. Applied Engineering in Agriculture, 38(6), 917–930. https://doi.org/10.13031/aea.15226 Kalcic, M. M., Muenich, R. L., Basile, S., Steiner, A. L., Kirchhoff, C., & Scavia, D. (2019). Climate Change and Nutrient Loading in the Western Lake Erie Basin: Warming Can Counteract a Wetter Future. Environmental Science & Technology, 53(13), 7543–7550. https://doi.org/10.1021/acs.est.9b01274 Khand, K., Kjaersgaard, J., Hay, C., & Jia, X. (2017). Estimating Impacts of Agricultural Subsurface Drainage on Evapotranspiration Using the Landsat Imagery-Based METRIC Model. Hydrology, 4(4), 49. https://doi.org/10.3390/hydrology4040049 King, K. W., Fausey, N. R., & Williams, M. R. (2014). Effect of subsurface drainage on streamflow in an agricultural headwater watershed. Journal of Hydrology, 519, 438–445. https://doi.org/10.1016/j.jhydrol.2014.07.035 King, K. W., Williams, M. R., & Fausey, N. R. (2015). Contributions of Systematic Tile Drainage to Watershed‐Scale Phosphorus Transport. Journal of Environmental Quality, 44(2), 486–494. https://doi.org/10.2134/jeq2014.04.0149 King, K. W., Williams, M. R., Macrae, M. L., Fausey, N. R., Frankenberger, J., Smith, D. R., et al. (2015). Phosphorus Transport in Agricultural Subsurface Drainage: A Review. Journal of Environmental Quality, 44(2), 467–485. https://doi.org/10.2134/jeq2014.04.0163 Knights, D., Parks, K. C., Sawyer, A. H., David, C. H., Browning, T. N., Danner, K. M., & Wallace, C. D. (2017). Direct groundwater discharge and vulnerability to hidden nutrient loads along the Great Lakes coast of the United States. Journal of Hydrology, 554, 331–341. https://doi.org/10.1016/j.jhydrol.2017.09.001 Kokulan, V. (2019). Environmental and Economic Consequences of Tile Drainage Systems in Canada (Journal of Environmental Quality) (pp. 469–477). Retrieved from https://capi- icpa.ca/wp-content/uploads/2019/06/2019-06-14-CAPI-Vivekananthan-Kokulan-Paper- WEB-1.pdf Kokulan, V., Macrae, M. L., Lobb, D. A., & Ali, G. A. (2019). Contribution of Overland and Tile Flow to Runoff and Nutrient Losses from Vertisols in Manitoba, Canada. Journal of Environmental Quality, 48(4), 959–965. https://doi.org/10.2134/jeq2019.03.0103 Kottek, M., Grieser, J., Beck, C., Rudolf, B., & Rubel, F. (2006). World Map of the Köppen- Geiger climate classification updated. Meteorologische Zeitschrift, 15(3), 259–263. https://doi.org/10.1127/0941-2948/2006/0130 Kovalenko, K. E., Reavie, E. D., Barbiero, R. P., Burlakova, L. E., Karatayev, A. Y., Rudstam, L. G., & Watkins, J. M. (2018). Patterns of long-term dynamics of aquatic communities and water quality parameters in the Great Lakes: Are they synchronized? Journal of Great Lakes Research, 44(4), 660–669. https://doi.org/10.1016/j.jglr.2018.05.018 123 Kuhn, M. (2008). Building Predictive Models in R Using the caret Package. Journal of Statistical Software, 28 (5), 1–26. Retrieved from https://www.jstatsoft.org/article/view/v028i05 LaBeau, M., Mayer, A., Griffis, V., Watkins, D., Robertson, D., & Gyawali, R. (2015). The importance of considering shifts in seasonal changes in discharges when predicting future phosphorus loads in streams. Biogeochemistry, 126(1–2), 153–172. https://doi.org/10.1007/s10533-015-0149-5 Lapointe, B. E., Herren, L. W., & Paule, A. L. (2017). Septic systems contribute to nutrient pollution and harmful algal blooms in the St. Lucie Estuary, Southeast Florida, USA. Harmful Algae, 70, 1–22. https://doi.org/10.1016/j.hal.2017.09.005 Lawrence, R. L., Wood, S. D., & Sheley, R. L. (2006). Mapping invasive plants using hyperspectral imagery and Breiman Cutler classifications (randomForest). Remote Sensing of Environment, 100(3), 356–362. https://doi.org/10.1016/j.rse.2005.10.014 Lefcheck, J. S., Orth, R. J., Dennison, W. C., Wilcox, D. J., Murphy, R. R., Keisman, J., et al. (2018). Long-term nutrient reductions lead to the unprecedented recovery of a temperate coastal region. Proceedings of the National Academy of Sciences, 115(14), 3658–3662. https://doi.org/10.1073/pnas.1715798115 Li, Y., Robinson, S. V. J., Nguyen, L. H., & Liu, J. (2023). Satellite prediction of coastal hypoxia in the northern Gulf of Mexico. Remote Sensing of Environment, 284, 113346. https://doi.org/10.1016/j.rse.2022.113346 Liaw, A., & Wiener, M. (2002a). Classification and Regression by randomForest. R News, 2(3), 18–22. Liaw, A., & Wiener, M. (2002b). Classification and Regression by randomForest, (2(3)), 18–22. Lintern, A., McPhillips, L., Winfrey, B., Duncan, J., & Grady, C. (2020). Best Management Practices for Diffuse Nutrient Pollution: Wicked Problems Across Urban and Agricultural Watersheds. Environmental Science & Technology, 54(15), 9159–9174. https://doi.org/10.1021/acs.est.9b07511 Liu, M., Graham, N., Wang, W., Zhao, R., Lu, Y., Elimelech, M., & Yu, W. (2022). Spatial assessment of tap-water safety in China. Nature Sustainability, 1–10. https://doi.org/10.1038/s41893-022-00898-5 Luscz, E. C., Kendall, A. D., & Hyndman, D. W. (2015). High resolution spatially explicit nutrient source models for the Lower Peninsula of Michigan. Journal of Great Lakes Research, 41(2), 618–629. https://doi.org/10.1016/j.jglr.2015.02.004 Luscz, E. C., Kendall, A. D., & Hyndman, D. W. (2017). A spatially explicit statistical model to quantify nutrient sources, pathways, and delivery at the regional scale. Biogeochemistry, 133(1), 37–57. https://doi.org/10.1007/s10533-017-0305-1 124 Ma, Z., Guan, K., Peng, B., Sivapalan, M., Li, L., Pan, M., et al. (2023). Agricultural nitrate export patterns shaped by crop rotation and tile drainage. Water Research, 229, 119468. https://doi.org/10.1016/j.watres.2022.119468 Maccoux, M. J., Dove, A., Backus, S. M., & Dolan, D. M. (2016). Total and soluble reactive phosphorus loadings to Lake Erie A detailed accounting by year, basin, country, and tributary. Journal of Great Lakes Research, 42(6), 1151–1165. https://doi.org/10.1016/j.jglr.2016.08.005 Martin, S. L., Hayes, D. B., Rutledge, D. T., & Hyndman, D. W. (2011). The land‐use legacy effect: Adding temporal context to lake chemistry. Limnology and Oceanography, 56(6), 2362–2370. https://doi.org/10.4319/lo.2011.56.6.2362 Martin, S. L., Hayes, D. B., Kendall, A. D., & Hyndman, D. W. (2017). The land-use legacy effect: Towards a mechanistic understanding of time-lagged water quality responses to land use/cover. Science of The Total Environment, 579, 1794–1803. https://doi.org/10.1016/j.scitotenv.2016.11.158 Martin, S. L., Hamlin, Q. F., Kendall, A. D., Wan, L., & Hyndman, D. W. (2021). The land use legacy effect: looking back to see a path forward to improve management. Environmental Research Letters, 16(3), 035005. https://doi.org/10.1088/1748-9326/abe14c Matasci, G., Hermosilla, T., Wulder, M. A., White, J. C., Coops, N. C., Hobart, G. W., & Zald, H. S. J. (2018). Large-area mapping of Canadian boreal forest cover, height, biomass and other structural attributes using Landsat composites and lidar plots. Remote Sensing of Environment, 209, 90–106. https://doi.org/10.1016/j.rse.2017.12.020 Mayorga, E., Seitzinger, S. P., Harrison, J. A., Dumont, E., Beusen, A. H. W., Bouwman, A. F., et al. (2010a). Global Nutrient Export from WaterSheds 2 (NEWS 2): Model development and implementation. Environmental Modelling & Software, 25(7), 837–853. https://doi.org/10.1016/j.envsoft.2010.01.007 Mayorga, E., Seitzinger, S. P., Harrison, J. A., Dumont, E., Beusen, A. H. W., Bouwman, A. F., et al. (2010b). Global Nutrient Export from WaterSheds 2 (NEWS 2): Model development and implementation. Environmental Modelling & Software, 25(7), 837–853. https://doi.org/10.1016/j.envsoft.2010.01.007 McCrackin, M. L., Harrison, J. A., & Compton, J. E. (2013). A comparison of NEWS and SPARROW models to understand sources of nitrogen delivered to US coastal areas. Biogeochemistry, 114(1–3), 281–297. https://doi.org/10.1007/s10533-012-9809-x McCrackin, M. L., Harrison, J. A., & Compton, J. E. (2014). Factors influencing export of dissolved inorganic nitrogen by major rivers: A new, seasonal, spatially explicit, global model. Global Biogeochemical Cycles, 28(3), 269–285. https://doi.org/10.1002/2013gb004723 McCrackin, M. L., Cooter, E. J., Dennis, R. L., Harrison, J. A., & Compton, J. E. (2017). Alternative futures of dissolved inorganic nitrogen export from the Mississippi River Basin: 125 influence of crop management, atmospheric deposition, and population growth. Biogeochemistry, 133(3), 263–277. https://doi.org/10.1007/s10533-017-0331-z Mchau, G. J., Makule, E., Machunda, R., Gong, Y. Y., & Kimanya, M. (2019). Harmful algal bloom and associated health risks among users of Lake Victoria freshwater: Ukerewe Island, Tanzania. Journal of Water and Health, 17(5), 826–836. https://doi.org/10.2166/wh.2019.083 Meinikmann, K., Hupfer, M., & Lewandowski, J. (2015). Phosphorus in groundwater discharge – A potential source for lake eutrophication. Journal of Hydrology, 524, 214–226. https://doi.org/10.1016/j.jhydrol.2015.02.031 Meter, K J Van, & Basu, N. B. (2017). Time lags in watershed-scale nutrient transport: an exploration of dominant controls. Environmental Research Letters, 12(8), 084017. https://doi.org/10.1088/1748-9326/aa7bf4 Meter, Kimberly J Van, Schultz, V. O., & Chang, S. Y. (2023). Data-driven approaches demonstrate legacy N accumulation in upper Mississippi river basin groundwater. Environmental Research Letters. https://doi.org/10.1088/1748-9326/acea34 Michalak, A. M., Anderson, E. J., Beletsky, D., Boland, S., Bosch, N. S., Bridgeman, T. B., et al. (2013). Record-setting algal bloom in Lake Erie caused by agricultural and meteorological trends consistent with expected future conditions. Proceedings of the National Academy of Sciences, 110(16), 6448–6452. https://doi.org/10.1073/pnas.1216006110 Michaud, A. R., Poirier, S., & Whalen, J. K. (2019). Tile Drainage as a Hydrologic Pathway for Phosphorus Export from an Agricultural Subwatershed. Journal of Environmental Quality, 48(1), 64–72. https://doi.org/10.2134/jeq2018.03.0104 Miller, M. P., Tesoriero, A. J., Hood, K., Terziotti, S., & Wolock, D. M. (2017). Estimating Discharge and Nonpoint Source Nitrate Loading to Streams From Three End‐Member Pathways Using High‐Frequency Water Quality Data. Water Resources Research, 53(12), 10201–10216. https://doi.org/10.1002/2017wr021654 Miller, M. P., Capel, P. D., García, A. M., & Ator, S. W. (2020). Response of Nitrogen Loading to the Chesapeake Bay to Source Reduction and Land Use Change Scenarios: A SPARROW‐Informed Analysis. JAWRA Journal of the American Water Resources Association, 56(1), 100–112. https://doi.org/10.1111/1752-1688.12807 Miller, S. A., & Lyon, S. W. (2021a). Tile drainage causes flashy streamflow response in Ohio watersheds. Hydrological Processes, 35(8). https://doi.org/10.1002/hyp.14326 Miller, S. A., & Lyon, S. W. (2021b). Tile Drainage Increases Total Runoff and Phosphorus Export During Wet Years in the Western Lake Erie Basin. Frontiers in Water, 3, 757106. https://doi.org/10.3389/frwa.2021.757106 126 Miller, S. A., Lyon, S. W., & Moore, R. H. (2022). Impacts of a nutrient trading plan on stream water quality in Sugar Creek, Ohio. JAWRA Journal of the American Water Resources Association. https://doi.org/10.1111/1752-1688.13067 Møller, A. B., Beucher, A., Iversen, B. V., & Greve, M. H. (2018). Predicting artificially drained areas by means of a selective model ensemble. Geoderma, 320, 30–42. https://doi.org/10.1016/j.geoderma.2018.01.018 Molnar, C. (2018). iml: An R package for Interpretable Machine Learning. Journal of Open Source Software, 3(26), 786. https://doi.org/10.21105/joss.00786 Mooney, R. J., Stanley, E. H., Rosenthal, W. C., Esselman, P. C., Kendall, A. D., & McIntyre, P. B. (2020). Outsized nutrient contributions from small tributaries to a Great Lake. Proceedings of the National Academy of Sciences, 117(45), 28175–28182. https://doi.org/10.1073/pnas.2001376117 Motew, M., Booth, E. G., Carpenter, S. R., Chen, X., & Kucharik, C. J. (2018). The synergistic effect of manure supply and extreme precipitation on surface water quality. Environmental Research Letters, 13(4), 044016. https://doi.org/10.1088/1748-9326/aaade6 Nasab, M. T., Grimm, K., Bazrkar, M. H., Zeng, L., Shabani, A., Zhang, X., & Chu, X. (2018). SWAT Modeling of Non-Point Source Pollution in Depression-Dominated Basins under Varying Hydroclimatic Conditions. International Journal of Environmental Research and Public Health, 15(11), 2492. https://doi.org/10.3390/ijerph15112492 NASS. (2012). Agricultural Census data for county-level tile drainage area, 2012. Retrieved February 15, 2019, from https://www.nass.usda.gov/ NASS. (2017). Agricultural Census data for county-level tile drainage area, 2017. Retrieved from https://www.nass.usda.gov/ Naz, B. S., & Bowling, L. C. (2008). Automated identification of tile lines from remotely sensed data. Transactions of the ASABE. Naz, B. S., Ale, S., & Bowling, L. C. (2009). Detecting subsurface drainage systems and estimating drain spacing in intensively managed agricultural landscapes. Agricultural Water Management, 96(4), 627–637. https://doi.org/10.1016/j.agwat.2008.10.002 NOAA. (2019, February 1). Great Lakes Ecoregion. Retrieved September 30, 2021, from https://www.noaa.gov/education/resource-collections/freshwater/great-lakes-ecoregion Ockenden, M. C., Hollaway, M. J., Beven, K. J., Collins, A. L., Evans, R., Falloon, P. D., et al. (2017). Major agricultural changes required to mitigate phosphorus losses under climate change. Nature Communications, 8(1), 161. https://doi.org/10.1038/s41467-017-00232-0 Oldfield, L., Rakhimbekova, S., Roy, J. W., & Robinson, C. E. (2020). Estimation of phosphorus loads from septic systems to tributaries in the Canadian Lake Erie Basin. Journal of Great Lakes Research, 46(6), 1559–1569. https://doi.org/10.1016/j.jglr.2020.08.021 127 Oldfield, L. E., Roy, J. W., & Robinson, C. E. (2020). Investigating the use of the artificial sweetener acesulfame to evaluate septic system inputs and their nutrient loads to streams at the watershed scale. Journal of Hydrology, 587, 124918. https://doi.org/10.1016/j.jhydrol.2020.124918 OSU. (2014, February 4). PRISM Climate Group. Oregon State University. Retrieved from https://prism.oregonstate.edu Ouyang, Y., & Zhang, J.-E. (2012). Quantification of Shallow Groundwater Nutrient Dynamics in Septic Areas. Water, Air, & Soil Pollution, 223(6), 3181–3193. https://doi.org/10.1007/s11270-012-1100-1 Paerl, H. W., Hall, N. S., & Calandrino, E. S. (2011). Controlling harmful cyanobacterial blooms in a world experiencing anthropogenic and climatic-induced change. Science of The Total Environment, 409(10), 1739–1745. https://doi.org/10.1016/j.scitotenv.2011.02.001 Paerl, H. W., Gardner, W. S., Havens, K. E., Joyner, A. R., McCarthy, M. J., Newell, S. E., et al. (2016). Mitigating cyanobacterial harmful algal blooms in aquatic ecosystems impacted by climate change and anthropogenic nutrients. Harmful Algae, 54, 213–222. https://doi.org/10.1016/j.hal.2015.09.009 Pearce, N. J. T., & Yates, A. G. (2020). Spatial and temporal patterns in macronutrient concentrations and stoichiometry of tributaries draining the lower Great Lakes-St. Lawrence basin. Journal of Great Lakes Research, 46(4), 989–1000. https://doi.org/10.1016/j.jglr.2020.05.002 Peel, M. C., Finlayson, B. L., & McMahon, T. A. (2007). Updated world map of the Köppen- Geiger climate classification. Hydrology and Earth System Sciences, 11(5), 1633–1644. https://doi.org/10.5194/hess-11-1633-2007 Pennino, M. J., Compton, J. E., & Leibowitz, S. G. (2017). Trends in Drinking Water Nitrate Violations Across the United States. Environmental Science & Technology, 51(22), 13450– 13460. https://doi.org/10.1021/acs.est.7b04269 Pennuto, C. M., Dayton, L., Kane, D. D., & Bridgeman, T. B. (2014). Lake Erie nutrients: From watersheds to open water. Journal of Great Lakes Research, 40(3), 469–472. https://doi.org/10.1016/j.jglr.2014.07.002 Pollack, L., Comuzzi, J., Brooks, I. B., Trépanier, P., Speck, S., & Knott, L. D. (2011). Fifteenth Biennial Report Executive Summary & Recommendations. Powers, S. M., Bruulsema, T. W., Burt, T. P., Chan, N. I., Elser, J. J., Haygarth, P. M., et al. (2016). Long-term accumulation and transport of anthropogenic phosphorus in three river basins. Nature Geoscience, 9(5), 353–356. https://doi.org/10.1038/ngeo2693 Preston, S. D., Alexander, R. B., & Wolock, D. M. (2011). Sparrow Modeling to Understand Water-Quality Conditions in Major Regions of the United States: A Featured Collection 128 Introduction1. Journal of the American Water Resources Association, 47(5), 887–890. https://doi.org/10.1111/j.1752-1688.2011.00585.x Pretty, J. N., Mason, C. F., Nedwell, D. B., Hine, R. E., Leaf, S., & Dils, R. (2003). Environmental Costs of Freshwater Eutrophication in England and Wales. Environmental Science & Technology, 37(2), 201–208. https://doi.org/10.1021/es020793k Prinds, C., Petersen, R. J., Greve, M. H., & Iversen, B. V. (2019). Locating tile drainage outlets and surface flow in riparian lowlands using thermal infrared and RGB-NIR remote sensing. Geografisk Tidsskrift-Danish Journal of Geography, 119(1), 1–12. https://doi.org/10.1080/00167223.2019.1573408 Rabalais, N. N., & Turner, R. E. (2019). Gulf of Mexico Hypoxia: Past, Present, and Future. Limnology and Oceanography Bulletin, 28(4), 117–124. https://doi.org/10.1002/lob.10351 Rakhimbekova, S., O’Carroll, D. M., Oldfield, L. E., Ptacek, C. J., & Robinson, C. E. (2021). Spatiotemporal controls on septic system derived nutrients in a nearshore aquifer and their discharge to a large lake. Science of The Total Environment, 752, 141262. https://doi.org/10.1016/j.scitotenv.2020.141262 Rattan, K. J., Corriveau, J. C., Brua, R. B., Culp, J. M., Yates, A. G., & Chambers, P. A. (2017). Quantifying seasonal variation in total phosphorus and nitrogen from prairie streams in the Red River Basin, Manitoba Canada. Science of The Total Environment, 575, 649–659. https://doi.org/10.1016/j.scitotenv.2016.09.073 Rattan, K. J., Blukacz-Richards, E. A., Yates, A. G., Culp, J. M., & Chambers, P. A. (2019). Hydrological variability affects particulate nitrogen and phosphorus in streams of the Northern Great Plains. Journal of Hydrology: Regional Studies, 21, 110–125. https://doi.org/10.1016/j.ejrh.2018.12.008 Read, E. K., Carr, L., Cicco, L. D., Dugan, H. A., Hanson, P. C., Hart, J. A., et al. (2017). Water quality data for national‐scale aquatic research: The Water Quality Portal. Water Resources Research, 53(2), 1735–1745. https://doi.org/10.1002/2016wr019993 Reay, W. G. (2004). Septic Tank Impacts on Ground Water Quality and Nearshore Sediment Nutrient Flux. Groundwater, 42(7), 1079–1089. https://doi.org/10.1111/j.1745- 6584.2004.tb02645.x Ren, D., Engel, B., Mercado, J. A. V., Guo, T., Liu, Y., & Huang, G. (2022). Modeling and assessing water and nutrient balances in a tile-drained agricultural watershed in the U.S. Corn Belt. Water Research, 210, 117976. https://doi.org/10.1016/j.watres.2021.117976 Richards, R. P., Alameddine, I., Allan, J. D., Baker, D. B., Bosch, N. S., Confesor, R., et al. (2012). “Nutrient Inputs to the Laurentian Great Lakes by Source and Watershed Estimated Using SPARROW Watershed Models” by Dale M. Robertson and David A. Saad 2: Discussion - Nutrient Inputs to the Laurentian Great Lakes by Source and Watershed Estimated Using SPARROW Watershed Models. JAWRA Journal of the American Water Resources Association, 49(3), 715–724. https://doi.org/10.1111/jawr.12006 129 Rixon, S., Levison, J., Binns, A., & Persaud, E. (2020). Spatiotemporal variations of nitrogen and phosphorus in a clay plain hydrological system in the Great Lakes Basin. Science of The Total Environment, 714, 136328. https://doi.org/10.1016/j.scitotenv.2019.136328 Robertson, D. M., & Saad, D. A. (2011a). Nutrient Inputs to the Laurentian Great Lakes by Source and Watershed Estimated Using SPARROW Watershed Models. Journal of the American Water Resources Association, 47(5), 1011–1033. https://doi.org/10.1111/j.1752- 1688.2011.00574.x Robertson, D. M., & Saad, D. A. (2011b). Nutrient Inputs to the Laurentian Great Lakes by Source and Watershed Estimated Using SPARROW Watershed Models1: Nutrient Inputs to the Laurentian Great Lakes by Source and Watershed Estimated Using SPARROW Watershed Models. JAWRA Journal of the American Water Resources Association, 47(5), 1011–1033. https://doi.org/10.1111/j.1752-1688.2011.00574.x Robertson, D. M., & Saad, D. A. (2013). SPARROW Models Used to Understand Nutrient Sources in the Mississippi/Atchafalaya River Basin. Journal of Environmental Quality, 42(5), 1422–1440. https://doi.org/10.2134/jeq2013.02.0066 Robertson, D. M., Saad, D. A., Benoy, G. A., Vouk, I., Schwarz, G. E., & Laitta, M. T. (2019). Phosphorus and Nitrogen Transport in the Binational Great Lakes Basin Estimated Using SPARROW Watershed Models. JAWRA Journal of the American Water Resources Association, 55(6), 1401–1424. https://doi.org/10.1111/1752-1688.12792 Robertson, W. D., Stempvoort, D. R. V., & Schiff, S. L. (2019). Review of phosphorus attenuation in groundwater plumes from 24 septic systems. Science of The Total Environment, 692, 640–652. https://doi.org/10.1016/j.scitotenv.2019.07.198 Robinson, C. (2015). Review on groundwater as a source of nutrients to the Great Lakes and their tributaries. Journal of Great Lakes Research, 41(4), 941–950. https://doi.org/10.1016/j.jglr.2015.08.001 Roy, S., Swetnam, T., Trochim, E., Schwehr, K., & Pasquarella, V. (2023). samapriya/awesome- gee-community-datasets: Community Catalog (1.0.4). Zenodo. Retrieved from https://doi.org/10.5281/zenodo.7614526 Rubel, F., & Kottek, M. (2010). Observed and projected climate shifts 1901-2100 depicted by world maps of the Köppen-Geiger climate classification. Meteorologische Zeitschrift, 19(2), 135–141. https://doi.org/10.1127/0941-2948/2010/0430 Ryden, J. C., Syers, J. K., & Harris, R. F. (1974). Phosphorus in Runoff and Streams. Advances in Agronomy, 25, 1–45. https://doi.org/10.1016/s0065-2113(08)60777-4 Sasaki, Y. (2007). The truth of the F-measure. Teach Tutor Mater, 1(5), 1–5. Scavia, D., Allan, J. D., Arend, K. K., Bartell, S., Beletsky, D., Bosch, N. S., et al. (2014). Assessing and addressing the re-eutrophication of Lake Erie: Central basin hypoxia. Journal of Great Lakes Research, 40(2), 226–246. https://doi.org/10.1016/j.jglr.2014.02.004 130 Schilling, K. E., Gassman, P. W., Arenas-Amado, A., Jones, C. S., & Arnold, J. (2019). Quantifying the contribution of tile drainage to basin-scale water yield using analytical and numerical models. Science of The Total Environment, 657, 297–309. https://doi.org/10.1016/j.scitotenv.2018.11.340 Schmadel, N. M., Harvey, J. W., & Schwarz, G. E. (2021). Seasonally dynamic nutrient modeling quantifies storage lags and time-varying reactivity across large river basins. Environmental Research Letters, 16(9), 095004. https://doi.org/10.1088/1748-9326/ac1af4 Schmidt, G., Jenkerson, C., Masek, J., Vermote, E., & Gao, F. (2013). Landsat ecosystem disturbance adaptive processing system (LEDAPS) algorithm description: U.S. Geological Survey Open-File Report 2013–1057, 17p. Schober, P., Boer, C., & Schwarte, L. A. (2018). Correlation Coefficients. Anesthesia & Analgesia, 126(5), 1763–1768. https://doi.org/10.1213/ane.0000000000002864 Schroeder, T. A., Schleeweis, K. G., Moisen, G. G., Toney, C., Cohen, W. B., Freeman, E. A., et al. (2017). Testing a Landsat-based approach for mapping disturbance causality in U.S. forests. Remote Sensing of Environment, 195, 230–243. https://doi.org/10.1016/j.rse.2017.03.033 Schullehner, J., Hansen, B., Thygesen, M., Pedersen, C. B., & Sigsgaard, T. (2018). Nitrate in drinking water and colorectal cancer risk: A nationwide population‐based cohort study. International Journal of Cancer, 143(1), 73–79. https://doi.org/10.1002/ijc.31306 Seitzinger, S. P., Harrison, J. A., Dumont, E., Beusen, A. H. W., & Bouwman, A. F. (2005). Sources and delivery of carbon, nitrogen, and phosphorus to the coastal zone: An overview of Global Nutrient Export from Watersheds (NEWS) models and their application. Global Biogeochemical Cycles, 19(4), n/a-n/a. https://doi.org/10.1029/2005gb002606 Sharma, A. (2020). The Wicked Problem of Diffuse Nutrient Pollution from Agriculture. Journal of Environmental Law, 32(3), 471–502. https://doi.org/10.1093/jel/eqaa017 Sims, J. T., Simard, R. R., & Joern, B. C. (1998). Phosphorus Loss in Agricultural Drainage: Historical Perspective and Current Research. Journal of Environmental Quality, 27(2), 277– 293. https://doi.org/10.2134/jeq1998.00472425002700020006x Sinha, E., Michalak, A. M., & Balaji, V. (2017). Eutrophication will increase during the 21st century as a result of precipitation changes. Science, 357(6349), 405–408. https://doi.org/10.1126/science.aan2409 Sloto, R. A., & Crouse, M. Y. (1996). HYSEP: A Computer Program for Streamflow Hydrograph Separation and Analysis. Retrieved from https://pubs.usgs.gov/wri/1996/4040/wri19964040.pdf Smith, D., King, K., Johnson, L., Francesconi, W., Richards, P., Baker, D., & Sharpley, A. N. (2015). Surface Runoff and Tile Drainage Transport of Phosphorus in the Midwestern 131 United States. Journal of Environmental Quality, 44(2), 495–502. https://doi.org/10.2134/jeq2014.04.0176 Smith, R. A., Schwarz, G. E., & Alexander, R. B. (1997). Regional interpretation of water‐ quality monitoring data. Water Resources Research, 33(12), 2781–2798. https://doi.org/10.1029/97wr02171 Smith, R. A., Schwar, G. E., & Alexander, R. B. (1997). Regional interpretation of water-quality monitoring data. Soller, D. R., Packard, P. H., & Garrity, C. P. (2012). Database for USGS Map I-1970 — Map showing the thickness and character of Quaternary sediments in the glaciated United States east of the Rocky Mountains: U.S. Geological Survey Data Series 656. Retrieved from https://pubs.usgs.gov/ds/656/ Sprague, L. A., Oelsner, G. P., & Argue, D. M. (2017). Challenges with secondary use of multi- source water-quality data in the United States. Water Research, 110, 252–261. https://doi.org/10.1016/j.watres.2016.12.024 Stenback, G. A., Crumpton, W. G., Schilling, K. E., & Helmers, M. J. (2011). Rating curve estimation of nutrient loads in Iowa rivers. Journal of Hydrology, 396(1–2), 158–169. https://doi.org/10.1016/j.jhydrol.2010.11.006 Sterner, R. W. (2011). C:N:P stoichiometry in Lake Superior: freshwater sea as end member. Inland Waters, 1(1), 29–46. https://doi.org/10.5268/iw-1.1.365 Sterner, R. W. (2020). The Laurentian Great Lakes: A Biogeochemical Test Bed. Annual Review of Earth and Planetary Sciences, 49(1), 1–29. https://doi.org/10.1146/annurev-earth- 071420-051746 Sterner, R. W., Ostrom, P., Ostrom, N. E., Klump, J. V., Steinman, A. D., Dreelin, E. A., et al. (2017). Grand challenges for research in the Laurentian Great Lakes: Grand challenges for Great Lakes research. Limnology and Oceanography, 62(6), 2510–2523. https://doi.org/10.1002/lno.10585 Stow, C. A., Cha, Y., Johnson, L. T., Confesor, R., & Richards, R. P. (2015). Long-Term and Seasonal Trend Decomposition of Maumee River Nutrient Inputs to Western Lake Erie. Environmental Science & Technology, 49(6), 3392–3400. https://doi.org/10.1021/es5062648 Strobl, C., Boulesteix, A.-L., Kneib, T., Augustin, T., & Zeileis, A. (2008). Conditional variable importance for random forests. BMC Bioinformatics, 9(1), 307. https://doi.org/10.1186/1471-2105-9-307 Sugg, Z. (2007). Assessing US farm drainage: Can GIS lead to better estimates of subsurface drainage extent? World Resources Institute. 132 Suriano, Z. J., Robinson, D. A., & Leathers, D. J. (2019). Changing snow depth in the Great Lakes basin (USA): Implications and trends. Anthropocene, 26, 100208. https://doi.org/10.1016/j.ancene.2019.100208 Swaney, D. P., Howarth, R. W., & Hong, B. (2018). Nitrogen use efficiency and crop production: Patterns of regional variation in the United States, 1987–2012. Science of The Total Environment, 635, 498–511. https://doi.org/10.1016/j.scitotenv.2018.04.027 Thomas, N. W., Arenas, A. A., Schilling, K. E., & Weber, L. J. (2016). Numerical investigation of the spatial scale and time dependency of tile drainage contribution to stream flow. Journal of Hydrology, 538, 651–666. https://doi.org/10.1016/j.jhydrol.2016.04.055 Tilahun, T., & Seyoum, W. M. (2020). High-Resolution Mapping of Tile Drainage in Agricultural Fields Using Unmanned Aerial System (UAS)-Based Radiometric Thermal and Optical Sensors. Hydrology, 8(1), 2. https://doi.org/10.3390/hydrology8010002 Tong, Y., Xu, X., Zhang, S., Shi, L., Zhang, X., Wang, M., et al. (2019). Establishment of season-specific nutrient thresholds and analyses of the effects of nutrient management in eutrophic lakes through statistical machine learning. Journal of Hydrology, 578, 124079. https://doi.org/10.1016/j.jhydrol.2019.124079 Tong, Y., Wang, M., Peñuelas, J., Liu, X., Paerl, H. W., Elser, J. J., et al. (2020). Improvement in municipal wastewater treatment alters lake nitrogen to phosphorus ratios in populated regions. Proceedings of the National Academy of Sciences, 117(21), 11566–11572. https://doi.org/10.1073/pnas.1920759117 Tuholske, C., Halpern, B. S., Blasco, G., Villasenor, J. C., Frazier, M., & Caylor, K. (2021). Mapping global inputs and impacts from of human sewage in coastal ecosystems. PLoS ONE, 16(11), e0258898. https://doi.org/10.1371/journal.pone.0258898 USDA. (2013). Soil Survey Geographic (SSURGO) database - Drainage Class. Retrieved from https://sdmdataaccess.sc.egov.usda.gov USDA. (2023). World Agricultural Production (WAP27-34) (Foreign Agricultural Services Circular Series) (pp. 27–34). Retrieved from https://apps.fas.usda.gov/psdonline/circulars/production USDA-NASS. (2017). USDA National Agricultural Statistics Service Cropland Data Layer. Retrieved from https://nassgeodata.gmu.edu/CropScape/ USGS. (2011). NLCD 2011 Land Cover Coterminous United States. Retrieved from http://mrlc.gov/data USGS. (2016). NLCD 2016 Land Cover Coterminous United States. Retrieved from https://www.mrlc.gov/data USGS. (2017). Landsat Collection 2 Level-2 Science Products. Retrieved from https://www.usgs.gov/landsat-missions/landsat-collection-2-level-2-science-products 133 USGS. (2019). 3D Elevation Program 10-Meter Resolution Digital Elevation Model. USGS. (2023). Landsat Collection 2 Surface Temperature data gaps due to missing ASTER GED. Retrieved August 2, 2023, from https://www.usgs.gov/landsat-missions/landsat- collection-2-surface-temperature-data-gaps-due-missing-aster-ged Valayamkunnath, P., Barlage, M., Chen, F., Gochis, D. J., & Franz, K. J. (2020). Mapping of 30- meter resolution tile-drained croplands using a geospatial modeling approach. Scientific Data, 7(1), 257. https://doi.org/10.1038/s41597-020-00596-x Valiela, I., Collins, G., Kremer, J., Lajtha, K., Geist, M., Seely, B., et al. (1997). Nitrogen Loading From Coastal Watersheds To Receiving Estuaries: New Method And Application. Ecological Applications, 7(2), 358–380. https://doi.org/10.1890/1051- 0761(1997)007[0358:nlfcwt]2.0.co;2 Vermote, E., Justice, C., Claverie, M., & Franch, B. (2016). Preliminary analysis of the performance of the Landsat 8/OLI land surface reflectance product. Remote Sensing of Environment, 185(Iss 2), 46–56. https://doi.org/10.1016/j.rse.2016.04.008 Wan, Z., Hook, S., & Hulley, G. (2017). MODIS/Terra Land Surface Temperature/Emissivity 8- Day L3 Global 1km SIN Grid V061 [Dataset]. NASA EOSDIS Land Processes DAAC. Accessed 2023-02-07 from https://doi. org/10.5067/MODIS/MOD11A2. 061. Wang, J., Beusen, A. H. W., Liu, X., & Bouwman, A. F. (2020). Aquaculture Production is a Large, Spatially Concentrated Source of Nutrients in Chinese Freshwater and Coastal Seas. Environmental Science & Technology, 54(3), 1464–1474. https://doi.org/10.1021/acs.est.9b03340 Wang, Z., Lai, C., Chen, X., Yang, B., Zhao, S., & Bai, X. (2015). Flood hazard risk assessment model based on random forest. Journal of Hydrology, 527, 1130–1141. https://doi.org/10.1016/j.jhydrol.2015.06.008 Ward, M. H., Jones, R. R., Brender, J. D., Kok, T. M. de, Weyer, P. J., Nolan, B. T., et al. (2018). Drinking Water Nitrate and Human Health: An Updated Review. International Journal of Environmental Research and Public Health, 15(7), 1557. https://doi.org/10.3390/ijerph15071557 Watson, S. B., Miller, C., Arhonditsis, G., Boyer, G. L., Carmichael, W., Charlton, M. N., et al. (2016). The re-eutrophication of Lake Erie: Harmful algal blooms and hypoxia. Harmful Algae, 56, 44–66. https://doi.org/10.1016/j.hal.2016.04.010 Weinstein, C. B., Bourgeau-Chavez, L. L., Martin, S. L., Currie, W. S., Grantham, K., Hamlin, Q. F., et al. (2021). Enhancing Great Lakes coastal ecosystems research by initiating engagement between scientists and decision-makers. Journal of Great Lakes Research, 47(4), 1235–1240. https://doi.org/10.1016/j.jglr.2021.04.018 Wellen, C., Arhonditsis, G. B., Labencki, T., & Boyd, D. (2012). A Bayesian methodological framework for accommodating interannual variability of nutrient loading with the 134 SPARROW model. Water Resources Research, 48(10). https://doi.org/10.1029/2012wr011821 Wellen, C., Kamran-Disfani, A.-R., & Arhonditsis, G. B. (2015). Evaluation of the Current State of Distributed Watershed Nutrient Water Quality Modeling. Environmental Science & Technology, 49(6), 3278–3290. https://doi.org/10.1021/es5049557 Williams, M. R., Livingston, S. J., Penn, C. J., Smith, D. R., King, K. W., & Huang, C. (2018). Controls of event-based nutrient transport within nested headwater agricultural watersheds of the western Lake Erie basin. Journal of Hydrology, 559, 749–761. https://doi.org/10.1016/j.jhydrol.2018.02.079 Wine, M. L., Golden, H. E., Christensen, J. R., Lane, C. R., & Makhnin, O. (2020). Seasonal watershed-scale influences on nitrogen concentrations across the Upper Mississippi River Basin. Hydrology and Earth System Sciences Discussions, 2020, 1–31. https://doi.org/10.5194/hess-2020-423 Winter, T. C. (2001). THE CONCEPT OF HYDROLOGIC LANDSCAPES. JAWRA Journal of the American Water Resources Association, 37(2), 335–349. https://doi.org/10.1111/j.1752- 1688.2001.tb00973.x Withers, P. J., Jordan, P., May, L., Jarvie, H. P., & Deal, N. E. (2014). Do septic tank systems pose a hidden threat to water quality? Frontiers in Ecology and the Environment, 12(2), 123–130. https://doi.org/10.1890/130131 Withers, P. J. A., May, L., Jarvie, H. P., Jordan, P., Doody, D., Foy, R. H., et al. (2012). Nutrient emissions to water from septic tank systems in rural catchments: Uncertainties and implications for policy. Environmental Science & Policy, 24, 71–82. https://doi.org/10.1016/j.envsci.2012.07.023 Woo, D. K., Song, H., & Kumar, P. (2019). Mapping subsurface tile drainage systems with thermal images. Agricultural Water Management, 218, 94–101. https://doi.org/10.1016/j.agwat.2019.01.031 Wuebbles, D., Cardinale, B., Cherkauer, K., & Davidson-Arnott, R. (2019). An Assessment of the Impacts of Climate Change on the Great Lakes. Environmental Law & Policy Center, 104(3–4), 629–652. https://doi.org/10.1007/s10584-010-9872-z Xie, Y., & Lark, T. J. (2021). Mapping annual irrigation from Landsat imagery and environmental variables across the conterminous United States. Remote Sensing of Environment, 260, 112445. https://doi.org/10.1016/j.rse.2021.112445 Yan, K., Yuan, Z., Goldberg, S., Gao, W., Ostermann, A., Xu, J., et al. (2019). Phosphorus mitigation remains critical in water protection: A review and meta-analysis from one of China’s most eutrophicated lakes. Science of The Total Environment, 689, 1336–1347. https://doi.org/10.1016/j.scitotenv.2019.06.302 135 Yan, L., & Roy, D. P. (2016). Conterminous United States crop field size quantification from multi-temporal Landsat data. Remote Sensing of Environment, 172, 67–86. https://doi.org/10.1016/j.rse.2015.10.034 Yang, Y., Anderson, M., Gao, F., Hain, C., Kustas, W., Meyers, T., et al. (2016). Impact of Tile Drainage on Evapotranspiration in South Dakota, USA, Based on High Spatiotemporal Resolution Evapotranspiration Time Series From a Multisatellite Data Fusion System. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 10(6), 2550– 2564. https://doi.org/10.1109/jstars.2017.2680411 Zhang, C., Walters, D., & Kovacs, J. M. (2014). Applications of Low Altitude Remote Sensing in Agriculture upon Farmers’ Requests– A Case Study in Northeastern Ontario, Canada. PLoS ONE, 9(11), e112894. https://doi.org/10.1371/journal.pone.0112894 Zhang, L., Zhao, Y., Hein-Griggs, D., Barr, L., & Ciborowski, J. J. H. (2019). Projected extreme temperature and precipitation of the Laurentian Great Lakes Basin. Global and Planetary Change, 172, 325–335. https://doi.org/10.1016/j.gloplacha.2018.10.019 Zhang, W., Pueppke, S. G., Li, H., Geng, J., Diao, Y., & Hyndman, D. W. (2019). Modeling phosphorus sources and transport in a headwater catchment with rapid agricultural expansion. Environmental Pollution, 255(Pt 2), 113273. https://doi.org/10.1016/j.envpol.2019.113273 Zhang, W., Li, H., Hyndman, D. W., Diao, Y., Geng, J., & Pueppke, S. G. (2020). Water quality trends under rapid agricultural expansion and enhanced in-stream interception in a hilly watershed of Eastern China. Environmental Research Letters, 15(8), 084030. https://doi.org/10.1088/1748-9326/ab8981 Zou, T., Zhang, X., & Davidson, E. A. (2022). Global trends of cropland phosphorus use and sustainability challenges. Nature, 611(7934), 81–87. https://doi.org/10.1038/s41586-022- 05220-z 136 APPENDIX A: SUPPORTING INFORMATION FOR CHAPTER 2 Text A2.1: SENSEflux Model Equations The main function of SENSEflux modeling is shown in equation (2.1) below. Where Lk is the simulated load (kg/day) at an in-stream location for an individual catchment. S represents the nutrient source, Spoint is the point source and directly flows into the rivers and streams, Ssep is the septic tank source, and Sij is the application of other sources i that can be harvested to watershed cell j. SepEff is removal efficiency on septic loads, it’s fixed as 0.3 for total nitrogen and 0.35 for total phosphorus(Luscz et al., 2017; W. D. Robertson et al., 2019). ExHij is an extraction factor that describes the in-place removal of nutrients before transport (such as harvest). For all other cells and sources, ExHij is equal to 1. Fj is a subsurface partition parameter, describing the fraction of nutrients that are transported via a subsurface pathway. It is a function of normalized groundwater recharge fraction (recharge fraction in a cell j divides the maximum recharge fraction across the SENSEflux model domain), where rechFj is the groundwater recharge fraction in cell j, and max(rechF) is the maximum value of groundwater recharge fraction across the SENSEflux model domain (equation (2.2)). The recharge fraction was defined as the average annual precipitation becoming recharged and is limited by 0.55 (Hyndman et al., 2007). 𝑐𝑒𝑙𝑙𝑠 𝑜𝑡ℎ𝑒𝑟 𝑠𝑜𝑢𝑟𝑐𝑒𝑠 𝐿𝑘 = ∑ 𝑅𝑗𝐿𝑎𝑐𝑢𝑠𝑗 × {𝑆𝑝𝑜𝑖𝑛𝑡 + 𝑆𝑠𝑒𝑝(1 − 𝑆𝑒𝑝𝐸𝑓𝑓)𝐵𝑠𝑒𝑗 + ∑ 𝑆𝑖𝑗 × 𝐸𝑥𝐻𝑖𝑗 𝑗 𝑖 × [(1 − 𝐹𝑗)𝐵𝑠𝑗 + 𝐹𝑗(1 − 𝐹𝑠𝑡𝑜𝑟𝑖𝑗)𝐵𝑔𝑗]} 𝐹𝑗 = 𝑓 × ( 𝑟𝑒𝑐ℎ𝐹𝑗 𝑚𝑎𝑥 (𝑟𝑒𝑐ℎ𝐹) ) (2.1) (2.2) The major updates between SENSEflux and its former version (Luscz et al., 2017) are an updated river retention function (equation (2.3) - (2.6)), lake retention and subsurface phosphorus storage with two new model parameters Lacusj and Flossj, defined in equation (2.7) and (2.8) respectively. 137 Rj describes river reduction of the remaining nutrients after landscape attenuation, and it split as denitrification for TN (sorption for TP) plus biological uptake & burial (equation (2.3)). For N denitrification or P sorption (equation (2.4)), it is an exponential function of DNSPj that was calculated based on the flow length tool and used streambed interaction as an input weight raster. The composite raster calculation is shown in equation (2.6), where 𝑘̂, 𝑠̂, and 𝑏̂ are derived from hydraulic conductivity, slope, and basin yield, respectively. R and V represent hydraulic radius and velocity, see details in Text A2.3. Biological uptake & burial is an exponential function of Tj that represents in-stream travel time from cell j to the downstream observation point (equation (2.5)). 𝑅𝑗 = 𝑒−𝛼∗𝐷𝑁𝑆𝑃𝑗 ∗ 𝑒−𝛼1∗𝐵𝑖𝑜𝑗 𝐷𝑁𝑆𝑃𝑗 = 𝑓𝑙𝑜𝑤𝐿𝑒𝑛(𝑠𝑡𝑟𝑒𝑎𝑚𝑏𝑒𝑑𝐼𝑛𝑡𝑒𝑟𝑎𝑐𝑡𝑖𝑜𝑛𝑗) 𝐵𝑖𝑜𝑗 = 𝑓𝑙𝑜𝑤𝐿𝑒𝑛(𝑖𝑛𝑆𝑡𝑟𝑒𝑎𝑚𝑇𝑟𝑎𝑣𝑒𝑙𝑇𝑖𝑚𝑒𝑗) 𝑗 ∗ 𝑠̂𝑗 ∗ (1 − 𝑏̂ 𝑘̂ 𝑅𝑗 ∗ 𝑉𝑗 𝑠𝑡𝑟𝑒𝑎𝑚𝑏𝑒𝑑𝐼𝑛𝑡𝑒𝑟𝑎𝑐𝑡𝑖𝑜𝑛𝑗 = 𝑗) (2.3) (2.4) (2.5) (2.6) We also consider nutrient attenuation via lakes or reservoirs (Lacusj) as they travel down the stream network as a function of travel distance in lakes (equation (2.7)), and only the loss for a lake or reservoir which has a connection with streams is considered due to the inherent river routing scheme in the SENSEflux model. Flossj is the fraction of groundwater pathway nutrients stored/lost in the soil and the deeper unsaturated zone where the floss is a calibrated constant (equation (2.8)). floss was assumed to be zero due to the high mobility of nitrogen. By adding this term, I can better estimate the amount of phosphorus in every grid cell delivered to the streams. 𝐿𝑎𝑐𝑢𝑠𝑗 = 𝑒−𝛼2∗𝐷𝐿𝑗 𝐹𝑠𝑡𝑜𝑟𝑗 = 𝑓𝑠𝑡𝑜𝑟 × (1 − 𝑟𝑒𝑐ℎ𝐹𝑗 𝑚𝑎𝑥 (𝑟𝑒𝑐ℎ𝐹) ) (2.7) (2.8) All basin attenuations are exponential functions of flow length (D) from each cell to the nearest 138 downgradient stream cell. Bsj, Bgj, and Bsej are basin reduction parameters, representing the overland flow, general groundwater, and septic plume pathways respectively (equations (2.9), (2.11), (2.12). Tile field pathway Bstj is also considered in the model as an alternative overland pathway, representing nutrient attenuation along with tile fields if tile exists in a cell equation (2.10). Equations describing each of these terms are given in (Luscz et al., 2017) and (Martin et al., 2021). 𝐵𝑠𝑗 = 𝑒−𝑏𝑠∗𝐷𝑗 𝐵𝑠𝑡𝑗 = 𝑒−𝑏𝑠𝑡∗𝐷𝑗 𝐵𝑔𝑗 = 𝑒−𝑏𝑔∗𝐷𝑗 𝐵𝑠𝑒𝑗 = 𝑒−𝑏𝑠𝑒∗𝐷𝑗 (2.9) (2.10) (2.11) (2.12) Text A2.2: Spatial Distribution of Loss Terms and Basin Storage Three in-situ loss terms are applied before nutrients are transported in SENSEflux: Septic removal (SepEff), Harvest (ExH), and Storage (Fstor). Septic removal efficiency is applied on septic load and fixed as 0.3 for N and 0.35 for P (Luscz et al., 2017; W. D. Robertson et al., 2019). Harvest includes all in-place root zone nutrient loss and is assumed to occur in cells with manure or chemical agricultural fertilizers applied. The Storage loss term includes both in-place storage and loss of nutrients below the root zone, which I assume to occur for phosphorus (Eq (2.8)). Fstor was assumed to be zero for nitrogen due to the high mobility of N. By adding this term, I can better estimate the amount of phosphorus in every grid cell delivered to the streams. Text A2.3: Spatial Distribution and Derivation for Instream and Lake Losses The attenuation of nutrients during stream/wetland transport is assumed to be broken into two components: 1) water column and sediment interface losses, and 2) hyporheic zone losses. Water column losses consist of biological uptake, followed by subsequent denitrification and particulate transport. For phosphorus, sediment burial is another active process. Hyporheic zone losses may 139 be biological uptake (N or P), denitrification (N), sorption (P), or mineralization (P). I label the corresponding terms “water column (WC)” and “hyporheic zone (HZ)” attenuation. Water column attenuation is assumed to be a function of travel time in the stream/wetland, given as 𝐿𝑜𝑠𝑠𝑤𝑐 = 𝛼𝑤𝑐𝑇𝑠, where 𝛼𝑤𝑐 is a calibrated constant across the domain, and 𝑇𝑠 is the travel time in-stream. Travel time in-stream in each model cell is given by 𝑇𝑠 = 1 𝑣 where 𝑣 is the velocity of water in that stream cell. While this is the loss in a single cell, integrating across all cells along the flowpath results delivery of nutrients 𝑁𝑜𝑢𝑡 = 𝑁𝑖𝑛𝑅𝑤𝑐, where 𝑅𝑊𝐶 = 𝑒−𝛼𝑤𝑐𝑇𝑇𝑠. Here, 𝑇𝑇𝑠 is the total travel time in streams, computed in ArcGIS as described in the main text (section 2.4.5). To compute losses due to streambed and hyporheic zone interactions beneath it, I define losses in this zone to be given by the residence time multiplied by a static constant, 𝐿𝑜𝑠𝑠ℎ𝑧 = 𝛼ℎ𝑧𝜏ℎ𝑧 where 𝜏ℎ𝑧 is the residence time in the hyporheic zone, defined as 𝜏ℎ𝑧 = 𝐷 𝑣ℎ𝑧⁄ where 𝐷 is the hyporheic zone depth and 𝑣ℎ𝑧 is the velocity of water flowing into/out of the HZ. The streambed interaction factor (D) is given by 𝐷~𝑓(𝐾, 𝑆, 1 𝑅𝑒𝑐ℎ ⁄ ), where K is the hydraulic conductivity of the streambed sediments, S is the slope of the stream channel, and Rech is groundwater recharge. Thus, I have assumed that higher streambed sediments increase the exchange of surface and groundwater (increasing hyporheic zone depth) and that a higher slope leads to greater streambed morphometric variability, and thus greater incidence of flow paths entering the sediments and exiting shortly thereafter, promoting hyporheic exchange. I further assume that greater groundwater recharge upstream leads to a stronger influx of water from below, which would reduce the depth of exchange through the groundwater pushing back against streamflow. Here, recharge is assumed to be represented by the basin yield of streams at their 30th percentile, thus 𝑅~𝐵𝑌30. I chose to normalize each of the terms by their maximum and minimum values across the model domain, and I log-transformed K. The final equation for hyporheic zone 140 depth D is: 𝐷 = 𝛼𝐷 ∙ [ 𝑙𝑜𝑔( 𝑙𝑜𝑔( ) 𝐾 𝐾𝑚𝑖𝑛 𝐾𝑚𝑎𝑥 𝐾𝑚𝑖𝑛 ] ∙ [ ) 𝑆 𝑆90 ] ∙ [1 − 𝐵𝑌30 𝐵𝑌30,𝑚𝑎𝑥 ] = 𝛼𝐷𝐾̃ ∙ 𝑆̃ ∙ 𝐵𝑌̃ (2.13) Where each of the terms with a tilde (~) represents 0-1 normalized values, corresponding to the similarly colored sections of the left-hand side of the equation. The units of D are [L] (here, m). Note for 𝑆̃ I limited the maximum value to 1. If I assume that the total flux of nutrients into the HZ (𝑛𝑖𝑛) is given by 𝑛𝑖𝑛 = 𝑄ℎ𝑧𝐶𝑖𝑛, where 𝑄 is the flux of water in/out of the HZ [L3/T] and 𝐶𝑖𝑛 is the input concentration to the HZ [M/L3], I can represent 𝑄ℎ𝑧 = 𝑃 ∙ 1 ∙ 𝑣ℎ𝑧, where 𝑃 is the perimeter of the stream channel [L], 1 is the unit length of the channel [L], and 𝑣ℎ𝑧 is the velocity of water exchange in the HZ [L/T]. Therefore, 𝑛𝑖𝑛 = 𝑃 ∙ 𝑣ℎ𝑧 ∙ 𝐶𝑖𝑛 = 𝑃∙𝑣ℎ𝑧∙𝑁𝑖𝑛 𝑄 , where 𝑁𝑖𝑛 is the input nutrient flux [M/T] in the stream channel above the HZ, and 𝑄 is the streamflow in the channel [L3/T]. We further assume that nutrient uptake is linearly related to the residence time in the HZ, 𝜏ℎ𝑧 which can be expressed as 𝜏ℎ𝑧 = 𝐷/𝑣ℎ𝑧. Thus, the flux of nutrients out 𝑛𝑜𝑢𝑡 = 𝑛𝑖𝑛 ∙ 𝜏ℎ𝑧 ∙ 𝛼ℎ𝑧, where 𝛼ℎ𝑧 is a parameter to be calibrated in the model. Substituting the definitions of 𝑛𝑖𝑛 and 𝜏ℎ𝑧, I get 𝑛𝑜𝑢𝑡 = 𝑃∙𝑣ℎ𝑧∙𝑁𝑖𝑛 𝑄 ∙ 𝐷 𝑣ℎ𝑧 ∙ 𝛼ℎ𝑧. Canceling terms, I get 𝑛𝑜𝑢𝑡 = 𝑃 𝑄 ∙ 𝑁𝑖𝑛 ∙ 𝐷 ∙ 𝛼ℎ𝑧. I can further use the relationship between 𝑃 (the wetted perimeter of the stream channel) and 𝑄 = 𝐴 ∙ 𝑣 [L3/T], determined by the hydraulic radius 𝑅 = 𝐴/𝑃 [L], where 𝐴 is the channel area, and 𝑣 is the in- stream average water velocity. Thus 𝑃 𝑄 = 𝐴 𝑅∙𝐴∙𝑣 = 1 𝑅∙𝑣 . Finally, if I substitute this and equation S3.1 into our equation for 𝑛𝑜𝑢𝑡 (noting that the linear constants combine 𝛼ℎ𝑧 = 𝛼ℎ𝑧 ∙ 𝛼𝐷) I get equation (2.14). 𝑛𝑜𝑢𝑡 = 𝑁𝑖𝑛 𝑲̃ ∙𝑺̃∙𝑩𝒀̃ 𝑹∙𝒗 𝛼ℎ𝑧 (2.14) The bold terms in equation (2.14) are independent of nutrient concentrations but vary in space. 141 Therefore, I combine these terms into a single model input layer I term the streambed exchange rate, 𝑆𝐸 = 𝐾̃∙𝑆̃∙𝐵𝑌̃ 𝑅∙𝑣 [T/L], shown in Figure A2.0.6 e. Other intermediate inputs to calculate 𝑆𝐸 are shown in Figures 2.2 and S6. While the values of 𝐾 and 𝑆 could be calculated from input static layers (e.g., Figure A2.0.1), 𝐵𝑌, 𝑅, and 𝑣 needed to be computed for each point along the stream channel. Text A2.4: Groundwater Recharge and Tile-Drained Area Calculation Groundwater recharge was estimated based on a series of linear models derived from the Landscape Hydrology Model (LHM), a coupled process-based hydrological model (Hyndman et al., 2007). LHM was originally developed for the Muskegon River Watershed, located in the central part of Michigan’s lower peninsula, and contains diverse land use representative of the broader region. This hydrological model combined several GIS layers including land use, soils, and station observation data to predict stream discharge, groundwater recharge, and evapotranspiration from 1990 to 2004. The linear regression models fit for each land use type to the percentage of precipitation that becomes recharges as a function of soil hydraulic conductivity. The annual precipitation from 2008 to 2012 was downloaded from Parameter-elevation Regressions on Independent Slopes Model (PRISM) database (OSU, 2014) and average annual precipitation from 2008 to 2012 was used for linear models. The soil hydraulic conductivity is derived from the soil texture of the soil survey geographic database (SSURGO) (USDA, 2013) and land cover data from the national land cover database 2011(USGS, 2011). The recharge estimates for USGLB are shown in Figure A2.0.1. To derive an estimated tile drainage map, I used GIS-based mapping based on the premise that crops grown on land with low slopes and poorly drained soil likely have tile drains. First, cells classified in the 2011 NLCD (USGS, 2011) as “Cultivated Crops” were extracted. Then, I fit a 142 model to land drained by tile county-level data from land-use practice in 2012 for the 109 counties in the USGLB (Figure A2.0.1), published by the United States Department of Agriculture, National Agricultural Statistical Service (NASS, 2012). 55 counties were randomly selected as training datasets and the remaining 54 counties as a validation dataset. This model selects areas that are cultivated land-use types, moderately low soil permeability (Ksat < 14.4 mm/hr), and low average slopes (< 1.2%) as tile drainage (r2 =0.83). The rest of the 54 counties were used to calibrate the model (r2=0.85). Estimates of tile-drained areas are shown in Figure A2.0.1. The aggregation method (maximum) used in pre-processing data may overestimate the area of tiled fields in the model. Text A2.5: Model Parameter Extended Discussion Four of the model parameters (f, ExH, SepEff, and fstor) are linear coefficients on loss terms in the model, while the remainder (basin and river losses) are coefficients in the exponent of an attenuation term—thus their values are not directly comparable. The subsurface partition parameter f is multiplied by the normalized recharge (Eq (2.2)) and represents the proportion of mobile surface-applied nutrients (after Harvest) that are sent through the groundwater pathway. Thus, areas with the highest recharge in the basin have 49% of mobile surface-applied nutrients sent to groundwater for N while 78% is routed through groundwater for P (table A2.1). See Figure A2.0.13 for a map of the final groundwater partition fraction. It is somewhat surprising that phosphorus has a higher fraction than nitrogen, given the relative ease with which NO3 in particular leaches from soils, however, this may be an artifact of the simplistic relationship between recharge rates and groundwater pathway hydrologic fractionation imposed here. It may also be due to a non-linear relationship between soil texture (underlying recharge) and P mobility. P moves much more readily through sandy soils than finer textured ones. That 143 relationship is also not captured properly here. Our model predicts that 80% of surface-applied N is harvested, lost to the atmosphere (N only), or stored in the root zone in agricultural settings, while 95% of P is. Figures S10a and S11a show total losses due to harvest, atmosphere, and storage for N and P, respectively. The higher rate of P harvest may be influenced by the tendency of P to sorb to unsaturated zone soils, rather than a more careful accounting of N and P needs when fertilizers are applied. Phosphorus then experiences an additional deep unsaturated zone storage, where up to 55% of groundwater-mobile P (in the lowest recharge areas, Eq (2.8)) is stored (Figure A2.0.11b). Areas with higher recharge then experience less storage proportionately. Nutrient attenuation during basin transport through surface runoff, tile drain fields, general groundwater, and septic plumes are determined by the overland travel distance along with the corresponding parameters (bs, bst, bg, and bse, see Eq (2.9-2.12)). The less calibrated parameter value means a higher delivery rate through the pathway, but the amount of nutrients delivered through these pathways are not directly comparable due to the different amounts traveled prior to basin attenuation. But I can compare bs and bst here as they are alternative pathways, tile drainage delivery rate is about 20 times higher than the overland runoff pathway for both TN (20.58) and TP (19.51). General groundwater pathway and septic plume pathway have the delivery rate between overland runoff and tile drains. Text A2.6: Load Comparison with Sparrow First of all, there are some important differences to note: SPARROW does not use spatially explicit nutrient sources nor attenuation processes and is run at a coarser resolution. Using the same observation dataset, the SENSEflux TN model slightly underestimated high loads while the SPARROW model slightly overpredicted them (Figure A2.0.9). Both SENSEflux and SPARROW models slightly underpredicted higher TP loads and overpredicted lower loads. These differences are not surprising because the two models 144 have several notable differences, including nutrient attenuation processes, methods to model nutrient sources, spatial resolution, and timeframes. Specifically, SENSEflux includes four distinct pathways (tile fields, overland, septic plumes, and groundwater, see Figure 2.1) while the SPARROW model uses data on land-to-water delivery factors, such as soil permeability, drainage density, precipitation, air temperature, the fraction of the stream catchment with tile drains to describe attenuation processes broadly across the basin(D. M. Robertson & Saad, 2011b). Moreover, SENSEflux uses spatially explicit nutrient source inputs from SENSEmap while SPARROW uses land use/cover and county-level estimates of nutrient masses to statistically compute sources more generally (Hamlin et al., 2020a). Finally, SPARROW models were developed for TN and TP with a 2002 base year (D. M. Robertson & Saad, 2011b), while nutrient sources and watershed factor data used in SENSEflux are based on ca. 2010 data. Figure A2.0.1 Model key inputs. (a) annual groundwater recharge for USGLB; (b) overland flowlength; (c) harvested areas where either manure or chemical agricultural fertilizers applied for TN and TP; (d) estimated tile drainage (indicated in yellow) estimated from land use, soil permeability, and average slope. 145 Figure A2.0.2 Study region and data source showing the GLB in the inset map, along with a. the Land Use/Land Cover across the basin from the National Land Cover Dataset (USGS, 2011); b. non-point TN source from SENSEmap (Hamlin et al., 2020a, b); c. Average annual precipitation from 2008 to 2012 from PRISM (PRISM Climate Group 2011); d. the saturated conductivity of the top soil layer from SSURGO (Soil Survey Staff, 2022). 146 Figure A2.0.3 SENSEmap nitrogen source for USGLB resampled to 720 m resolution for display, derived from Hamlin et al (2020). The units are kg/ha/yr except for point sources (kg/yr). The color breaks are based on quantile classification and round-off methods. 147 Figure A2.0.4 SENSEmap phosphorus source for USGLB resampled to 720 m resolution for display. The units are kg/ha/yr except for point sources (kg/yr). The color breaks are based on quantile classification and round-off methods. 148 Figure A2.0.5 Spatial domain showing nitrogen and phosphorus sampling locations that are used for delineating watersheds and loads used for model calibration and validation. (a) TN sampling locations (N = 116), (b) TP sampling locations (N = 119), (c) TN watersheds with loading, and (d) TP watersheds with loading. Maps are classified in quantiles, with each color representing 25% of the study domain. 149 Figure A2.0.6 Inputs are used to derive the river retention factor in SENSEflux. (a) average slope, (b) basin yield during baseflow; (c) hydraulic radius; (d) average velocity; (e) streambed exchange rate; (f) N denitrification or P sorption factor. 150 Figure A2.0.7 Model residual (log10 model – log10 observed) distribution and density are shown as violin plots. The bottom and top of each box represent the first and third quartiles, respectively, and the line inside each box represents the median. Zero residual is indicated as a black dashed line. The top and bottom bars (whiskers) represent the maximum and minimum residuals, respectively. Data beyond the end of the whiskers are "outlying" points and are plotted individually. None of the means were significantly different from zero, as measured by the one- sample t-test, with P values > 0.05. 151 Figure A2.0.8 Log model residuals (kg/day) by watersheds for both calibration and validation datasets. The color breaks are based on quantile classification and are rounded to the nearest 0.1. Figure A2.0.9 Comparison to the simulated loads (log10 of kg/day) in the SPARROW models. Blue dots and lines are for SENSEflux simulation, and red is for SPARROW. The dashed black line is a 1:1 line, where simulated loads are equal to observed loads. 152 Figure A2.0.10 TN model loss and attenuation outputs. (a) crop extraction of nitrogen; (b) total nitrogen loss during basin transport; (c) nitrogen uptake in streams and connected lakes. Maps are resampled from 120m SENSEflux outputs to 720m resolution here for display purposes and classified in quantiles with each color representing 20% of the dataset. 153 Figure A2.0.11 TP model loss and attenuation outputs. (a) crop extraction of phosphorus; (b) in- place phosphorus storage and loss of phosphorus below the root zone; (c) TP loss during basin transport; (d) phosphorus uptake in streams and connected lakes. Maps are resampled from 120m SENSEflux outputs to 720m resolution here for display purposes and classified in quantiles with each color representing 20% of the dataset. 154 Figure A2.0.12 Estimated total yield of TP delivered to lakes by four key pathways (kg/km2/yr). Maps are resampled from 120 m SENSEflux outputs to 720 m resolution here for display purposes and classified in quantiles, with each color representing 20% of the dataset; the white area in a&b within the basin boundary represents areas with no data as I assumed overland and tile fields are alternative pathways. Figure A2.0.13 Spatial distribution of SENSEflux surface and subsurface partition parameter (f) for TN (a) and TP (b). 155 Calibrated parameter value TP 0.78 0.95 0.35 0.55 1.27E-04 6.51E-06 5.90E-03 1.91E-02 1.41E-07 2.93E-04 1.94E-05 Table A2.1 Summary of optimized model parameters. Function type Parameter General function f ExH SepEff Subsurface partition Harvest extraction Efficiency multiplier on septic loads Linear TN 0.49 0.80 0.30 fstor bs bst bg bse dnsp bio lacus Fraction of groundwater- pathway nutrients stored in the deep unsaturated zone 0 Basin attenuation (surface) Tile attenuation (surface) Basin attenuation (subsurface) Basin attenuation (septic) N denitrification or P sorption in River Biological Uptake and Burial in River Lake reduction 1.41E-03 6.85E-05 3.76E-04 7.12E-04 1.97E-06 1.05E-04 1.36E-05 Exponential 156 APPENDIX B: SUPPORTING INFORMATION FOR CHAPTER 3 Figure A3.0.1 Comparison of TN versus the sum of TKN and TNN for melt (a) and baseflow (b) with 336 and 482 records from 2008 to 2012. (TN: total nitrogen; TKN: total Kjeldahl nitrogen; TNN: total nitrate and nitrite). Figure A3.0.2 Workflow to get concentration, flow, load, and watershed area. 157 Figure A3.0.3 Site concentrations used for SENSEflux model calibration and validation. 158 Figure A3.0.4 Watershed area (km2) by hydrologic seasons for TN and TP. 159 Figure A3.0.5 Model key inputs include (a) hydrologic landscape regions (HLR), (b) drift thickness and in-stream travel time during snowmelt, (c) and baseflow (d). For HLR, each value represents different regions. See region descriptions below. Hydrologic Landscape Region descriptions: 1. Subhumid plains with permeable soils and bedrock; 2. Humid plains with permeable soils and bedrock; 3. Subhumid plains with impermeable soils and permeable bedrock; 4. Humid plains with permeable soils and bedrock; 5. Arid plains with permeable soils and bedrock; 6. Subhumid plains with impermeable soils and bedrock; 7. Humid plains with permeable soils and impermeable bedrock; 8. Semiarid plains with impermeable soils and bedrock; 9. Humid plateaus with impermeable soils and permeable bedrock; 11. Humid plateaus with impermeable soils and bedrock; 12. Semiarid plateaus with permeable soils and impermeable bedrock; 13. Semiarid plateaus with impermeable soils and bedrock; 16. Humid mountains with permeable soils and impermeable bedrock; 17. Semiarid mountains with impermeable soils and bedrock; 18. Semiarid mountains with permeable soils and impermeable bedrock. 160 Figure A3.0.6 Surface and groundwater seasonal mobility for an example USGS station (04112500), Red Cedar River at East Lansing, MI. Streamflow data was downloaded for the 2010 water year, plotted as the black line. The gray shadow area represents baseflow that was separated using the hydrograph separation program, thus the white area between streamflow and baseflow is surface runoff. Two hydrological seasons are shown in rectangle area with orange (Melt) and baseflow (light blue). Surface and groundwater seasonal mobility are calculated for this station and is shown in the gray card on top left corner. 161 Figure A3.0.7 Maps for surface (S) and groundwater (G) seasonal mobility during snow melt (Melt) and baseflow (BF) seasons. 162 Figure A3.0.8 Model residual (log10 model – log10 observed) distribution and density are shown as violin plots, for annual, baseflow, and melt (different colors). The bottom and top of each box represent the first and third quartiles, respectively, and the line inside each box represents the median. Zero residual is indicated as a black dashed line. The top and bottom bars (whiskers) represent the maximum and minimum residuals, respectively. Data beyond the end of the whiskers are "outlying" points and are plotted individually. One-sample t-test is used to measure whether the mean residuals are significantly different from zero, indicated as ns (p > 0.05) and * (p < 0.05). 163 Figure A3.0.9 Log model residuals (kg/day) by watersheds. Text A3.1: Load Comparison with SPARROW model First, there are some critical differences: SPARROW does not use spatially explicit nutrient sources or attenuation processes and is run at a coarser resolution. Using the same observation dataset, the SENSEflux TN model slightly underestimated high loads, while the SPARROW model slightly overpredicted them (Figure A3.0.10). Both SENSEflux and SPARROW models slightly underpredicted higher TP loads and overpredicted lower loads. These differences are 164 unsurprising because the two models have notable differences, including nutrient attenuation processes, methods to model nutrient sources, spatial resolution, and timeframes. Specifically, SENSEflux includes four pathways (tile fields, overland, septic plumes, and groundwater). At the same time, the SPARROW model uses data on land-to-water delivery factors, such as soil permeability, drainage density, precipitation, air temperature, and the fraction of the stream catchment with tile drains to describe attenuation processes broadly across the basin(D. M. Robertson et al., 2019). Moreover, SENSEflux uses spatially explicit nutrient source inputs from SENSEmap, while SPARROW uses land use/cover and county-level estimates of nutrient masses to statistically compute sources more generally (Hamlin et al., 2020a). Finally, SPARROW models were developed for TN and TP with a 2002 base year (D. M. Robertson et al., 2019), while nutrient sources and watershed factor data used in SENSEflux are based on ca. 2010 data. Figure A3.0.10 Comparison to the simulated loads in the SPARROW models. Blue dots and lines are for SENSEflux simulation, and red is for SPARROW. The solid black line is a 1:1 line, where simulated loads are equal to observed loads. 165 Figure A3.0.11 Nutrient sources by lake basin. 166 Table A3.1 Summary of optimized model parameters (three parameters change across seasons, and the rest eight parameters are season shared) from optimization. Parameters Functio n type General function Parameters different across season Season shared parameters bs rbio tsettl f ExH SepE ff fstor bst bg bse dnsp Exponen tial Linear Exponen tial Basin reduction (surface) biological uptake in rivers Lake attenuation Subsurface partition Harvest extraction Efficiency multiplier on septic loads Fraction of groundwater-pathway nutrients stored in the deep unsat zone for P Tile reduction Basin reduction (subsurface) Basin reduction (septic) N denitrification and P sorption/mineralizatio n in rivers Model Season TN TP Baseflo w 1.76E- 02 3.03E- 05 2.68E- 06 Annual Melt 1.88E- 04 2.38E- 05 2.11E- 07 3.58E- 03 2.50E- 05 1.13E- 06 0.35 0.70 0.30 0.00 1.66E-03 1.09E-02 6.56E-04 Baseflo w 9.22E- 04 2.57E- 05 1.52E- 05 Annual Melt 9.71E- 05 2.40E- 05 6.08E- 07 6.59E- 06 2.30E- 05 9.19E- 07 0.77 0.68 0.35 0.54 2.76E-03 3.70E-03 8.33E-04 6.79E-06 8.48E-06 167 APPENDIX C: SUPPORTING INFORMATION FOR CHAPTER 4 Figure A4.0.1 The generation flow of ground truth points to Further exclusion of the points (i.e., 120-m apart for tile points; the equivalent number of tile and non-tile points) are not shown in this workflow. Text A4.1: Variable Selection 26 variables were initially chosen but not used in the final classification due to relatively higher correlation and lesser importance. These include (1) six variables that are derived from Landsat, specifically the maximum value of the normalized difference vegetation index (NDVI) in summer and spring, the normalized water index (NDWI) in growing season, maximum Tr_swir1 in growing season and max Tr_swir2 in spring and summer. (2) six climate-related variables, precipitation in spring and summer, aridity in summer and the growing season from GridMET, and the actual evapotranspiration (AET) in spring and summer; (3) three soil property variables at 0-1m, including the percentage of clay, hydraulic conductivity, and plant available water, while 168 these variables at 0-5 cm were selected for final classification; (4) nine soil moisture related variables, including the maximum surface soil moisture (SSM) and soil moisture percent (SMP) in spring, median SSM, SMP and subsurface soil moisture (SUSM) in spring, the range of SSM and SMP in spring, as well as median SSM and SUSM in the summer; and (4) two land surface temperature related variables, including maximum nighttime LST in growing season and the median difference of dayLST and nightLST in summer. Figure A4.0.2 The mean of monthly maximum NDVI and NDWI value across all tile and non-tile points. 169 Figure A4.0.3 The Person correlation coefficient (r < 0.8) between each two of the remaining 31 variables with short names (see full names in Table 4.1). 170 Figure A4.0.4 Variable importance in random forest classification. Variables are identified by their short names, which can be related to full names and other information using Table 1. Left: The mean decrease in node impurity from splits on each variable as measured by the Gini index. Right: The mean decrease in accuracy. For both, a larger number means higher importance. 171 Figure A4.0.5 Classified tile drainage area (km2) versus the statistical area from the USDA NASS, 2017. Each point stands for one county, the blue solid line is the linear regression line, and the black dashed line is the 1:1 line. 172