QUANTIFYING NUTRIENT TRANSPORT PATHWAYS USING SPATIALLY EXPLICIT 
MODELING AND REMOTE SENSING 

By 

Luwen Wan 

A DISSERTATION 

Submitted to 
Michigan State University 
in partial fulfillment of the requirements   
for the degree of 

Environmental Geosciences – Doctor of Philosophy 

2023 

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
ABSTRACT 

Excess  nitrogen  and  phosphorus  from  numerous  sources  delivered  through  complex  transport 

pathways have accelerated eutrophication and increased the incidence of harmful algal blooms, 

particularly  in  the coastal and Great  Lakes states in  the United States. To address  these issues, 

many  watershed  models  have  been  developed  to  simulate  nutrient  inputs  and  their  consequent 

deliveries to aid in the establishment of water quality management plans. However, these models 

typically lack spatially explicit descriptions of nutrient sources and attenuation factors, and they 

often  do  not  take  the  joint  of  surface  and  subsurface  transport  pathways  into  consideration.  In 

addition,  spatially  explicit  tile  drainage  areas  are  still  lacking,  which  limits  the  accuracy  of 

hydrology  and  water  quality  modeling.  This  dissertation  presents  a  spatially  explicit  model  to 

simulate  nutrient  loading,  sources,  and  pathways  annually  (Chapter  2)  and  during  two  distinct 

hydrologic seasons (chapter 3), as well as uses machine learning algorithms that translate remotely 

sensed Earth observations and environmental datasets into tile drainage maps (Chapter 4).  

In  Chapter  2,  I  enhance  and  apply  the  Spatially  Explicit  Nutrient  Source  Estimate  and  Flux 

(SENSEflux) model, to simulate nitrogen and phosphorus loads from the US Great Lakes Basin 

(USGLB) on an annual basis. The results show that agricultural sources are the dominant sources 

for both total nitrogen (58%) and phosphorus (46%) deliveries to the US Great Lakes. In addition, 

this study reveals that the surface pathways (sum of overland flow and tile field drainage) dominate 

nutrient delivery, transporting 66% of total nitrogen and 76% of total phosphorus loads to the US 

Great Lakes coastline.  

Building  on  the  annual  SENSEflux  simulation  in  Chapter  2,  Chapter  3  focuses  on  seasonal 

variations  in  nutrient  fluxes,  sources,  and  pathways.  Two  distinct  hydrologic  seasons  were 

examined, the first being snowmelt where high flow conditions are present in early spring and the 

second being  baseflow  where streamflow originates from  groundwater discharge in  mid-to-late 

summer. Results indicate that total nitrogen loading during snowmelt periods is four times greater 

than  annual  average  deliveries.  The  contribution  of  agricultural  sources  (chemical  agricultural 

fertilizer, manure, and N fixation) is substantially higher (15% for TN and 5% for TP) during melt 

than  baseflow,  while  point  sources,  septic  tanks,  and  atmospheric  deposition  become  more 

prominent contributors to nutrient delivery during baseflow. Thus, seasonal variation of nutrient 

transport should be considered when establishing nutrient criteria and reduction targets. 

In Chapter 4, the focus is shifted to tile drainage systems by developing a random forest classifier 

to map tile drainage in the US Midwest in 2017, as spatially explicit tile drainage information are 

still lacking. Thirty-one satellite-derived and environmental variables sampled at 60,938 tile and 

non-tile ground truth points collected from a variety of sources were used to train and validate the 

classifier.  The  classifier  achieved  a  good  performance  (Overall  accuracy:  96%;  F1  score:  0.9). 

Then, the classifier was  used to  do  classifications for other  regions that lack ground truth data 

within  the  study  region.  The  classified  tile  drainage  area  correlated  reasonably  well  with  the 

county-level area reported by the USDA National Agricultural Statistics Service (r2 = 0.68). The 

overall importance revealed that the maximum nighttime land surface temperature in the summer 

ranked highest, followed by climate- and soil-related variables.  

This  dissertation  has  produced  many  spatially  explicit  data  products,  including  nitrogen  and 

phosphorus loadings, sources, pathways, and hotspots. These are available not only on an annual 

basis but also for two hydrological seasons. Besides, a significant nutrient pathway, agricultural 

tile drainage, has been mapped and extended to the study region from the USGLB to the 14 states 

across the US Midwest. These products and insights could help watershed managers and decision-

makers implement nutrient reductions at the right time and place more effectively.

This work is dictated to my dear parents Mr. Zhongyu Wan and Mrs. Shuyun Deng, my 
wonderful sister Mrs. Shilin Wan and my beloved Yingjie, for their unconditional support and 
love. 

iv 

 
 
ACKNOWLEDGMENTS 

This dissertation represents five years and three months of hard work and personal growth. The 

journey was full of challenges and happiness. I would not reach this point without the tremendous 

support and guidance from my advisors, committee members, collaborators, friends, and family.  

I will start by thanking my advisors, Dr. Anthony Kendall and Dr. David Hyndman. Thank you 

for your unwavering support and guidance throughout this degree. I thank Anthony for continuing 

to support, challenge and motivate me. My research would have stalled many times if not for his 

brilliant and professional thoughts and ready availability for problem-solving. I would also like to 

thank  Anthony  for  his  technical  support  in  developing  and  completing  this  dissertation,  his 

practical  and  flexible  communication  strategies,  and  for  providing  thoughtful  and  honest 

suggestions for many aspects of my graduate study and life. I am grateful for his support and trust, 

which enabled me to work remotely during the final year. I am a better scientist and person because 

of  his  mentorship  and  friendship.  I  thank  Dave  for  his  deep  knowledge  and  comprehensive 

perspective, which has guided and challenged me throughout this dissertation and improved the 

quality of my research. I would also like to thank his care for students and prioritization of our 

needs,  such  as  providing  comments  and  suggestions  for  manuscripts  and  submitting  reference 

letters, which has provided me with lots of motivation and positivity. I often recall his famous 

saying, ‘You bet’, which encouraged me to ask questions and seek suggestions, and I am grateful 

for his generosity and kindness. I have learned so much from my noble people, Dave, and Anthony, 

and I will carry all the lessons forward in my career and life.  

Second,  I  sincerely  thank  my  committee  members  for  their  support  and  advice  throughout  my 

Ph.D. training. Thanks, Dr. Jay Zarnetske, for deepening my understanding of hydrogeology and 

watershed hydrology, stimulating more research interests in this discipline, and providing many 

v 

 
professional training opportunities. Thank you for encouraging and challenging me to think deeply 

and broadly, as well as giving career suggestions. Thank you, Dr. David Roy, for being direct and 

critical. His constructive comments from a remote sensing perspective challenged me to do better 

science, and his feedback is valuable. I would also like to thank him for allowing me to audit his 

remote sensing class and providing life suggestions.  

I also want to thank my funding sources and collaborators, this work would not have been possible 

without them, the National Oceanic and Atmospheric Administration, National Aeronautics and 

Space  Administration,  China  Scholarship  Council,  Kellogg  Biological  Station  Long-Term 

Ecological  Research  (KBS  LTER)  Program,  MSU  College  of  Natural  Science  and  MSU 

Department of Earth and Environmental Sciences.  

MSU  Hydrogeology  Lab  is  where  I  call  my  foreign  family,  a  community  that  deserves  all  the 

beautiful words in the world and gives me the best memories. I will cherish the friendships I gained 

here  throughout  my  life.  I  thank  my  officemates  Alex  Kuhl,  Autumn  Parish,  Quercus  Hamlin, 

Bailey Hannah, Brent Heerspink, and Chanse Ford for the supportive and generous atmosphere as 

well  as  my  other  friends  and  labmates:  Alexis  Lanier,  Ally  Brady,  Behnaz  Mirzendehdel,  Ben 

McCarthy, Erin Haacker, Jake Stid, Jeremy Rapp, Jill Deines, Leanne Hancock, Lisi Pei, Luisana 

Rodriguez, Mohammed Rahman, Quercus Hamlin, Ryan Vannier, Sam Smidt, Samin Abolmaali, 

Sherry  Martin  and  Xiao  Liu.  Their  continuous  positivity,  encouragement,  and  support  were 

essential to my journey. And I would like to provide some special thanks to Sherry Martin, for her 

mentorship,  for  being  a  professional,  supportive,  and  dedicated  co-author.  Alex  Kuhl,  for  her 

generous  advice  and  help,  especially  when  I  was  new  to  MSU.  Behnaz  Mirzendehdel,  for  her 

positivity and kindness, I miss you! Brent, thanks for your persisting care and encouragement, and 

friendly review of my writing materials. Quercus and Jeremy, thanks for the precious friendship! 

vi 

 
Thanks  for  all  the  professional  discussions  and  for  efficiently  reviewing  my  manuscripts  and 

writings.  I  am  grateful  for  all  the  snapshots  you  sent  in  the  past  year,  which  made  my  life  in 

California feel less lonely. Special thanks go to the EES department staff, including Pam Robinson, 

Ami McMurphy, Elizabeth McElroy, Dallas Coryell, Brittany Walter, and Judi Smelser.  

I thank my wonderful friends from the EES department, Lin Liu and Jiaxin Zhang. Lin, thank you 

for sharing helpful tips and resources, from fellowship application to career development. I know 

you are  always there when I need suggestions.  Jiaxin,  thank you for your creativity and multi-

talent, which has made my life much more fun, and thanks for listening and understanding my 

frustrations and providing your suggestions. I also want to thank my dear friends from the Center 

for Systems Integration and Sustainability (CSIS) at MSU, Yuqian Zhang, Ming Lei, Veronica 

Frans, and Sophia Chau. Thanks for getting together for pleasant traveling, delicious food, biking 

and hiking, and many late nights. I can’t wait to see you again soon! I am grateful for the friends 

and colleagues I have met through our reading group and the Hydro90 community. Thank you all 

for making me keep learning and making my life much happier during the pandemic.  

Finally, I thank my parents, Zhongyu Wan and Shuyun Deng, and my sister Shilin Wan. Dad and 

Mom, thanks for being  open-minded and supporting my decisions unconditionally. Dear sister, 

thanks for encouraging and helping me to participate in the overseas exchange program and pursue 

my  Ph.D.  in  the  United  States.  Thanks  for  breaking  the  traditional/conservative  mindset  and 

seeking economic independence, which greatly motivates me. Lastly thank you, goes to my best 

friend and cheerleader - Yingjie. Thanks for being there with me for over ten years, cheering me 

on, and helping me become a better person. 

vii 

 
TABLE OF CONTENTS 

CHAPTER 1: INTRODUCTION ................................................................................................... 1 

CHAPTER 2: THE IMPORTANT ROLE OF OVERLAND FLOW AND TILE FIELD 
PATHWAYS IN NUTRIENT TRANSPORT ................................................................................ 9 

CHAPTER 3: SEASONAL VARIATION OF NITROGEN AND PHOSPHORUS SOURCES, 
PATHWAYS AND LOADS: A SPATIALLY EXPLICIT MODEL SENSEFLUX ................... 38 

CHAPTER 4: SEETILEDRAIN: SPATIALLY EXPLICIT ESTIMATE OF TILE DRAINAGE 
USING REMOTE SENSING AND MACHINE LEARNING .................................................... 72 

CHAPTER 5: SUMMARY......................................................................................................... 111 

REFERENCES ........................................................................................................................... 116 

APPENDIX A: SUPPORTING INFORMATION FOR CHAPTER 2 ...................................... 137 

APPENDIX B: SUPPORTING INFORMATION FOR CHAPTER 3 ...................................... 157 

APPENDIX C: SUPPORTING INFORMATION FOR CHAPTER 4 ...................................... 168 

viii 

 
CHAPTER 1: INTRODUCTION 

1.1: Significance  

Nitrogen and phosphorus pollution is one of the most widespread issues around the world, posing 

persistent  threats  not  only  to  natural  ecosystems  but  also  to  human  health  (Basu  et  al.,  2023; 

Bowles et al., 2018; Goyette et al., 2016; Houlton et al., 2019). Elevated nutrient levels degrade 

river  water  quality  and  trigger  harmful  algal  blooms  (HABs)  and  they  can  also  contaminate 

drinking water and cause illnesses in both animals and humans (Liu et al., 2022; Pennino et al., 

2017). Exposure to HABs such as eating fish and outdoor recreational activities can cause many 

negative health effects, such as liver and kidney damage, and respiratory issues (Schullehner et al., 

2018). High levels of nitrates in drinking water can lead to infant methemoglobinemia (also called 

blue baby syndrome) (Hamlin et al., 2022; Ward et al., 2018). 

Excessive nutrient pollution has also been a problem that is increasingly challenging to solve due 

to the increasing population, changing climate, various  nutrient sources,  and complex transport 

pathways (Basu et al., 2023; Ma et al., 2023; Sinha et al., 2017). Nutrients can originate from a 

variety  of  sources,  including  atmospheric  deposition,  wastewater  treatment  plants  in  cities, 

excessive use of chemical fertilizers and animal wastes in agricultural and forested areas, septic 

tanks from households in rural areas, and nutrient runoff from urban areas such as golf courses 

(Byrnes et al., 2020; Hamlin et al., 2020a; Luscz et al., 2015; Zou et al., 2022). These nutrient 

sources are commonly nonpoint source pollution and challenging to identify since they typically 

have  no  single  source  or  discharge  point  but  are  transported  indirectly  into  surface  waters.  In 

addition,  nutrients  often  transport  through  multiple  surface  and  subsurface  pathways,  such  as 

overland runoff, tile drainage,  and groundwater,  which makes nutrient  management  even more 

difficult. As a result, nutrient pollution and related HABs have been a growing problem in most of 

1 

 
the coastal and Great Lakes states in the United States (Kalcic et al., 2019; Kimberly J Van Meter 

et al., 2023; D. M. Robertson & Saad, 2011a). 

The Great Lakes are the earth’s largest freshwater lake system. It supplies 21% of global and 80% 

of US fresh surface water (L. Zhang et al., 2019). The Great Lakes Basin is threatened by nutrient 

pollution due to agricultural and urban land use, and land management such as the density of tile 

drainage and wetland (Basu et al., 2023). The United States and Canada have signed the Great 

Lakes Water Quality Agreement and are obligated to protect and restore the Great Lakes. But some 

lakes, especially Lake Erie, continue to have serious nutrient issues with excessive nitrogen and 

phosphorus entering the nearshore region (L. Oldfield et al., 2020; Pennuto et al., 2014; Watson et 

al.,  2016). The massive  algal bloom in  Lake Erie in  August  2014 provoked a tap water ban in 

Toledo, Ohio, where nearly half a million people were without safe drinking water. There are many 

other nutrient pollution-related issues that have been seen in the past decades. For instance, the 

‘dead  zone’  area  in  the  Gulf  of  Mexico  was  measured  as  the  largest  in  2017  since  dead  zone 

mapping began there in 1985 due to nutrient pollution that is primarily from agriculture and urban 

runoff  in  the  Mississippi  River  basin  (Li  et  al.,  2023).  In  addition,  tile  drainage  across  the  US 

Midwest has been found directly related to nitrate loss in the Mississippi River basin (David et al., 

2010; Ma et al., 2023).  

Tile drainage was one of the significant advances across the US Midwest since the 19th century, 

promoting  highly  productive  agriculture  for  areas  where  it  was  too  wet  to  cultivate  (King, 

Williams, Macrae, et al., 2015). With agricultural intensification and increasing extreme climate 

events,  tile  drainage  is  growing  rapidly  and  becoming  more  prevalent  to  sustain  agricultural 

production.  Tile  drainage  area  increased  by  6.5  million  acres  from  2012  to  2017  in  the  US 

Midwest, and 70% of counties experienced expansion during the five years (NASS, 2012, 2017). 

2 

 
Research has found that 49% of soluble phosphorus and 48% of total phosphorus losses occurred 

via tile drainage discharge in the St. Joseph River watershed, located in the southwest portion of 

Michigan and the northern portion of Indiana, US (D. Smith et al., 2015). To quantify the impacts 

of tile drainage installation on water and nutrient flow and transport, maximize the benefits of tile 

drainage  investments,  such  as  maximizing  crop  yield  and  minimizing  environmental  impact, 

spatially explicit and fine resolution tile drainage information are needed.  

1.2: Research Gap 

Watershed models have been developed to understand nutrient loadings and sources across scales. 

These models include the Global Nutrient Export from WaterSheds (NEWS) model, the widely 

used Soil and Water Assessment Tool (SWAT) model, and the Spatially Referenced Regression 

on Watershed Attributes (SPARROW) model developed by USGS. Global NEWS was developed 

by an international, interdisciplinary scientific working group and includes basin-scale sub-models 

that  can  predict  nutrient  sources  and  estimate  natural  and  anthropogenic  sources  of  nutrients 

(Seitzinger  et  al.,  2005).  NEWS  does  not  simulate  detailed  hydrological  processes  within  a 

watershed but rather estimates the nutrient loading based on the statistical relationships between 

climate, land use, and river water quality. SWAT is a semi-distributed model that can simulate the 

hydrological cycle, nutrient cycling, and sediment transport at the watershed scale (Arnold et al., 

1998). The basic spatial unit used in SWAT is the Hydrologic Response Unit (HRU), which is 

determined by land use, soil type, and topography. In other words, these characteristics are to be 

uniform and do not vary spatially. SPARROW model uses multiple regression techniques and is 

used to estimate contaminant sources and transport in surface water (Preston et al., 2011; R. A. 

Smith,  Schwarz, et  al.,  1997).  Overall, these models  have been applied to  different  watersheds 

around  the  world,  but  they  lack  fine-resolution  and  spatially-  explicit  information  for  nutrient 

3 

 
sources, pathways, and loadings.  

Existing  tile  drainage  data  consists  primarily  of  the  area  reported  by  farmers  via  the  USDA 

National Agricultural Statistics Service’s survey at the county level and likely tile-drained area 

based on geospatial analysis, under the assumption that agricultural areas with lower slopes and 

poorly drained soils are likely to be tile-drained (Valayamkunnath et al., 2020). In addition, few 

studies use aerial images to identify tile drainage lines at the field scale (Naz & Bowling, 2008) or 

remote sensing images to identify subsurface drainage at the watershed scale (Cho et al., 2019). In 

recent  years,  the  availability  of  remote  sensing  with  large  spatial  coverage  and  high  temporal 

resolution has facilitated the mapping of agricultural practices, and the Google Earth Engine (GEE) 

provides the infrastructure for processing and analyzing large volumes of input datasets. Therefore, 

remote sensing and GEE can provide a cost-effective and expedient method for mapping large-

scale agricultural tile drainage. The spatially explicit tile drainage data would enhance the accuracy 

of the crop, hydrology, and water quality models and inform decisions regarding food security and 

environmental protection.  

1.3: Objective, Questions and Hypotheses 

Taken  together,  this  dissertation  includes  three  research  chapters  (Figure  1.1).  In  Chapter  2,  I 

modified the previously published version of the SENSEflux model to quantify total nitrogen and 

phosphorus  loadings,  sources,  and  pathways  and  identify  the  nutrient  delivery  hotspots  on  an 

annual basis within the US Great Lakes Basin and focused on addressing the three questions below. 

Research questions are indicated with the letter “Q” followed by a number, with corresponding 

numbered hypotheses indicated with the letter “H”. 

Q2.1:  How  much  nitrogen  and  phosphorus  are  delivered  annually  from  drainage  basins  in  the 

United States to the Great Lakes, and how do nitrogen and phosphorus loadings vary among lake 

4 

 
basins?  

Q2.2: What are the annual contributions of nutrient sources and pathways to nutrient transport and 

delivery to the Great Lakes?  

Q2.3: On an annual basis, where are the nutrient delivery hotspots located?  

I project that: (H2.1) a greater quantity of nitrogen and phosphorus loadings are delivered from the 

Great Lakes Basin to the Great Lakes, and these loadings vary across the basin, with the majority 

of  the  nutrient  loadings  coming  from  agricultural  and  urban  areas.  (H2.2)  the  contributions  of 

different nutrient sources and pathways varied, with agricultural sources accounting for most of 

the nitrogen and phosphorus pollution transported via surface pathways. (H2.3) annual hotpots are 

in these high nutrient delivery locations and maybe in these high nutrient input areas.  

Building on the annual SENSEflux model in Chapter 2, I further improved the SENSEflux model 

in Chapter 3 to simulate the seasonal loadings, sources, and pathways, with the major update being 

the seasonal nutrient mobility that was inferred from seasonal streamflow. More specifically, two 

hydrologic seasons were considered, snowmelt in early spring with high flows and baseflow, in 

mid-  to  late-summer  where  streamflow  originates  from  groundwater  discharge.  This  seasonal 

information further helps the decision makers to target nutrient reduction at the right time to help 

the reduction more efficiently. This chapter mainly addresses the three questions below.  

Q3.1: How much nitrogen and phosphorus are transported from US drainage basins to the Great 

Lakes during snowmelt and baseflow seasons? Does the pattern differ from the annual transport 

(i.e., delivery from each lake basin)?  

Q3.2: Which sources and pathways are dominant during the hydrologic seasons?  

Q3.3: When would it be the most effective time to reduce nutrient pollution in the Great Lakes, 

and where should seasonal reduction efforts be focused?  

5 

 
Here are hypotheses for Chapter 3. (H3.1) more nitrogen and phosphorus are delivered to the Great 

Lakes during the snow melt season, whereas less are delivered during baseflow.  The pattern is 

dependent on the regional variability of surface runoff and baseflow and is likely to differ from 

the  annual  patterns.  (H3.2)  agricultural  sources  are  most  likely  the  primary  source  during 

snowmelt,  whereas urban  sources become dominant  during baseflow season. Surface  pathways 

dominate nutrient transport and delivery during the snow melt season, but subsurface pathways 

are dominant during baseflow. (H3.3) due to increasing nutrient loadings, the snow melt season 

would be the best time to implement nutrient reduction plans. During baseflow, however, nutrient 

reduction should be prioritized to locations with higher nutrient concentrations and groundwater 

connections.  

Inspired  by  the  previous  two  chapters’  findings  that  tile  drainage  is  a  vital  nutrient  transport 

pathway on an annual basis, and it delivered more nutrients during snowmelt. Also, due to the lack 

of spatially explicit and high-resolution data on tile drainage at a large scale, I shifted my focus to 

tile drainage systems in Chapter 4. Here I integrated remotely-sensed and environmental datasets 

to map agricultural tile drainage across the Midwest in 2017 using random forest machine learning 

via the Google Earth Engine cloud  computing platform and identified the crucial variables that 

improve the tile drainage classification accuracy.  

Q4.1: Using remote sensing data, can a machine learning model identify tile drainage area at the 

pixel  level  and  achieve  good  accuracy?  Where  are  the  tile  drainage  regions  in  the  Midwestern 

United States? 

Q4.2: Which input variables are important to improve tile drainage classification accuracy across 

the US Midwest?  

I  proposed  the  following  hypothesis:  (H4.1)  with  satellite  imagery  and  large  environmental 

6 

 
datasets available, as well as the processing resources provided by Google Earth Engine, a machine 

learning classifier can map spatially explicit tile drainage information with reasonable accuracy. 

Tile  drainage  areas  are  found  in  agricultural  regions  with  inadequate  drainage.  (H4.2)  climate-

related factors (i.e., precipitation) and soil-related variables such as soil drainage class and slopes 

are important for tile drainage identification.  

Figure 1.0 Overview of this dissertation, including three research chapters. The schematic diagram 
in 
is  based  on  https://www.cityofgriffin.com/departments/storm-
water/watershed-management, and the icons in lower left corner are downloaded from Freepik.  

left  corner 

the  upper 

This dissertation has produced many spatially explicit data products for nutrient loadings, sources, 

pathways,  and  hotspots  on  an  annual  basis  and  during  hydrologic  seasons  (snowmelt  and 

baseflow). Also, a significant nutrient pathway, agricultural tile drainage, has been mapped and 

extended  to  the  study  region  from  the  US  Great  Lakes  Basin  to  the  14  states  across  the  US 

7 

 
 
Midwest. These maps can be linked to community-facing tools such as the tipping point planner 

(tippingpointplanner.org) that can help nutrient managers and government agencies pinpoint areas 

with the most extensive nutrient deliveries and understand the contributions of primary sources 

and pathways at a finer resolution. It can also help watershed managers and decision-makers target 

the areas for reduction at the right time and develop water quality management plans precisely. 

The  spatially  explicit  tile  drainage  locations  and  extent  can  provide  a  baseline  for  mapping 

historical tile drainage area change and predicting the potential areas for future expansion as more 

extreme  rainfall  events  have  been  predicted  under  climate  change  scenarios.  The  findings  and 

results from this dissertation are beneficial to broader interdisciplinary research efforts regarding 

food  security,  agricultural  water  management,  and  environmental  sustainability.  Finally,  the 

SENSEflux annual and seasonal models and the random forest machine learning algorithms for 

tile drainage detection can be readily applied to other regions.  

8 

 
 
 
 
CHAPTER 2: THE IMPORTANT ROLE OF OVERLAND FLOW AND TILE FIELD 

PATHWAYS IN NUTRIENT TRANSPORT 

2.1: Abstract 

Nitrogen and phosphorus pollution are of great concern to aquatic life and human wellbeing. While 

the vast majority of these nutrients are applied to the landscape, little is known about the complex 

interplay between nutrient applications, transport attenuation processes, and coastal loads. Here, I 

enhance and apply the Spatially Explicit Nutrient Source Estimate and Flux model (SENSEflux) 

to simulate the total annual nitrogen and phosphorus loads from the US Great Lakes Basin to the 

coastline,  estimate  the  relative  contributions  of  different  sources  and  pathways,  and  identify 

nutrient delivery hotspots. In addition to in-stream uptake, this model explicitly describes nutrient 

attenuation through four distinct pathways that are seldom described jointly in other models: runoff 

from tile-drained agricultural fields, overland runoff, groundwater flow, and septic plumes within 

groundwater. This model provides fully distributed estimates of nitrogen and phosphorus loading, 

sources, and pathways at high resolution (120 m). Our analysis shows that Lake Michigan and Erie 

have highest the nutrient loading from the US portion of the Great Lakes Basin among the Lakes, 

and the Lake Erie basin has the highest nutrient flux per unit area, in agreement with other studies. 

Agricultural sources, including chemical fertilizer, manure, and nitrogen fixation, are the dominant 

sources for both total nitrogen (58%) and total phosphorus (46%) deliveries to the US Great Lakes. 

In  addition,  this  study  reveals  that  the  surface  pathways  (i.e.,  overland  runoff  and  tile  field 

drainage)  dominate  nutrient  delivery,  transporting  66%  of  total  nitrogen  and  76%  of  total 

phosphorus loads to the US Great Lakes coastline. Importantly, this study provides the first basin-

wide estimates of both groundwater and septic-plume deliveries of nutrients to the lakes. I find 

that  the  groundwater  pathway  for  nitrogen  has  higher  delivery  than  overland  flow.  This  work 

9 

 
 
provides valuable information for environmental managers to target efforts to reduce nutrient loads 

to the Great Lakes, which could be transferred to other regions worldwide that are facing similar 

nutrient management challenges. 

Figure 2.0 Abstract graphic. Nitrogen budget for the US Great Lakes basin in ~2010. Total nitrogen 
and phosphorus sources are shown in colored rectangles. Orange arrows show the flow of nutrient 
processes,  including  harvest  and  root  zone  loss,  groundwater  storage,  septic  removal,  basin 
attenuation and stream losses.  

2.2: Introduction 

Nitrogen  and  phosphorus  loading  have  been  linked  to  degraded  surface  water  quality  and  the 

eutrophication  of  many  coastal  ecosystems  worldwide.  Research  has  focused  on  several 

ecosystems,  including  the  Gulf  of  Mexico  and  Chesapeake  Bay  in  the  United  States  (R.  B. 

Alexander et al., 2000; Lefcheck et al., 2018), the Laurentian Great Lakes Basin in North America 

(Chapra et al., 2016; D. M. Robertson et al., 2019), as well as Taihu Lake and the Yangtze Basin 

in China (Xuanjing Chen et al., 2022; L. Guo, 2007; Powers et al., 2016; J. Wang et al., 2020; W. 

Zhang  et  al.,  2019,  2020).  Actions  have  been  taken  to  restore  and  protect  water  quality.  For 

example, the US and Canada signed the Great Lakes Water Quality Agreement (GLWQA) in 1972 

10 

 
 
 
and  updated  it  in  2012  with  reduced  phosphorus  load  targets  (D.  M.  Robertson  et  al.,  2019). 

Although point source loads have been curtailed under the US Clean Water Act (CWA), nutrient 

pollution is still one of the most widespread, costly, and challenging environmental problems in 

the United States (Frei et al., 2021; Sharma, 2020). This is partly due to the technical difficulties 

inherent  in  predicting  the  complex  transport  of  pollutants  from  millions  of  “nonpoint”  sources 

through  heterogeneous  hydrologic  systems  to  receiving  water  bodies  via  diverse  surface  and 

groundwater pathways.  Developing effective management and mitigation strategies requires an 

improved understanding of the relative contributions of different nutrient sources and their varied 

transport pathways. 

Nutrient  loading  to  coastal  waters  is  derived  from  both  point  sources  (primarily  wastewater 

treatment  plants)  and  nonpoint  sources  (including  agricultural  and  non-agricultural  fertilizer, 

manure, nitrogen fixation, and atmospheric deposition) (Boyer et al., 2002; Hamlin et al., 2020a; 

Luscz  et  al.,  2015;  Powers  et  al.,  2016;  Swaney  et  al.,  2018).  Agricultural  practices  are  major 

contributors to nutrient contamination because of the widespread use of fertilizers and livestock 

manure (Chaplin-Kramer et al., 2016; Motew et al., 2018). Thus, managing agricultural sources 

typically has been the focus of efforts to reduce nitrogen and phosphorus losses to the environment 

(Bowles et al., 2018; Ockenden et al., 2017; W. Zhang et al., 2019, 2020). In contrast to the large 

body of research on agricultural nutrients, pollution from intensively managed urban landscapes 

is of great concern but is understudied (Filipović et al., 2015; Lapointe et al., 2017; Rakhimbekova 

et al., 2021). For example, golf courses have been cited as significant sources of nutrient loading 

to water bodies (Bock & Easton, 2020), and human wastewater sources may contribute up to 40% 

of agricultural loads to coastlines globally (Tuholske et al., 2021). Septic sources are direct and 

concentrated inputs to the groundwater system and are substantial sources of groundwater nitrate 

11 

 
 
in the United States (Ouyang & Zhang, 2012). Septic tanks have been considered as the primary 

cause of nutrient leaching to groundwater because of inappropriate site conditions, poor design, 

inadequate maintenance, and infrequent inspections (Pollack et al., 2011). Unfortunately, septic 

systems are commonly only evaluated at the time of permitting or during major building additions; 

less attention and resources are typically directed to septic system upkeep and maintenance, which 

are regulated in different ways across the US (L. Oldfield et al., 2020). 

It is also important to know the relative importance of different pathways through which nutrients 

travel to receiving water bodies, as transport and uptake processes differ along those pathways. 

Most studies have focused on pathways like surface runoff, while there has been little literature 

that comprehensively quantifies the relative contribution of other nutrient transport pathways (Gill 

& Mockler, 2016). Specifically, tile drainage (Ikenberry et al., 2014; King, Williams, & Fausey, 

2015;  Michaud  et  al.,  2019)  and  groundwater  (Brookfield  et  al.,  2021;  Holman  et  al.,  2008; 

Robinson, 2015) have been found as important transport pathways and are major contributors to 

nutrient  loads.  For  example,  groundwater  discharge  likely  accounts  for  ~50%  of  phosphorus 

loaded  to  Lake  Arendsee,  Germany,  thus  accelerating  the  eutrophication  of  the  lake  and 

detrimental ecological impacts (Meinikmann et al., 2015). (D. Smith et al., 2015) estimated that 

49% of soluble P and 48% of total P losses from fields occurred via tile discharge in research fields 

across  the  Lake  Erie  basin.  Therefore,  it  is  important  to  quantify  the  relative  contributions  of 

nutrients from these major pathways. 

It is not feasible to measure the relative contributions of different nutrient sources and pathways 

at the regional watershed scale, thus modeling approaches have been used to simulate their fate 

and transport. Three main categories of models are regression-based empirical, process-based flow 

and  transport,  and  hybrid  empirical  and  process-based  models  (R.  B.  Alexander  et  al.,  2002). 

12 

 
 
Regression-based  models  are  less  complicated  and  easier  to  implement,  although  most  ignore 

spatially  explicit  sources  and  lack  mechanistic  components,  interactions  between  sources,  and 

some nutrient loss processes (Howarth et al., 1996). Examples of process-based models include 

Nutrient Export from Watersheds (NEWS) (Mayorga et al., 2010a; Seitzinger et al., 2005) and the 

widely used Soil and Water Assessment Tool (SWAT) (Gassman et al., 2014). NEWS simulates 

hydrological processes on land and subsequently nutrient transport through surface and subsurface 

waters (Arnold et al., 1998). SWAT simulates transport using hydrologic response units (HRUs), 

which lump all similar land uses, soils, and slopes within a subbasin. Hybrid empirical and process-

based models such as the SPAtially Referenced Regression On Watershed attributes (SPARROW), 

a GIS-based watershed model developed by the United States Geological Survey (USGS), uses a 

hybrid approach to estimate nutrient sources, transport, and loadings around the world (Dai et al., 

2021; D. M. Robertson et al., 2019; D. M. Robertson & Saad, 2011b; R. A. Smith, Schwarz, et al., 

1997; Wellen et al., 2012). While SWAT, NEWS, and SPARROW consider nutrient sources and 

retention on landscape and stream networks to predict nutrient and source contribution (McCrackin 

et al., 2013), they do not characterize spatially explicit sources  nor extensively model different 

nutrient attenuation pathways. 

To  address  these  limitations,  I  enhanced  the  spatially-explicit  hybrid  SENSEflux  (Spatially 

Explicit Nutrient Source Estimate and Flux) model to estimate the fate and transport of nitrogen 

and  phosphorus  that  originate  from  point  and  nonpoint  sources.  SENSEflux  models  nutrient 

transport  across  the  landscape,  stream  network,  and  connected  inland  lakes  to  the  Great  Lakes 

coastline via spatially explicit pathways, including overland flow, tile drains, groundwater, and 

septic  plumes.  An  earlier  version  of  SENSEflux  was  developed  for  the  Lower  Peninsula  of 

Michigan  (Luscz  et  al.,  2017);  here  additional  modeling  capabilities  for  nutrient  reduction  and 

13 

 
 
retention  processes  have  been  incorporated.  In  particular,  SENSEflux  offers  an  improved 

simulation  of  nutrient  delivery  to  streams  via  groundwater  pathways  (Martin  et  al.,  2021), 

estimates 

the 

long-term  storage  of  phosphorus 

in 

the  subsurface,  and 

improves 

the 

parameterization of both in-stream and lake nutrient losses. Here, I estimate total phosphorus (TP) 

and total nitrogen (TN) loads to the US Great Lakes Basin (USGLB), and compare the relative 

contributions  of  nutrient  sources,  with  an  emphasis  on  transport  pathways.  The  results  can  be 

particularly useful for stakeholders to identify hotspot areas (e.g., high nutrient flux and yields) 

and  major  sources  and  pathways  that  contribute  to  nutrient  inputs  to  the  Great  Lakes.  This 

knowledge can help prioritize locations and strategies for nutrient reduction and provide valuable 

inputs to other hydrological and ecological studies. The SENSEflux model could be applied to 

other regions around the world that have nutrient management issues.  

2.3: Method 

2.3.1: SENSEflux Model Description   

SENSEflux uses a GIS and mass balance approach to simulate the nutrient fate and transport from 

point and nonpoint sources across the landscape through rivers to lakes and wetlands. A schematic 

diagram and a detailed conceptual framework of SENSEflux are shown in Figure 2.1. Broadly, 

SENSEflux  includes  four  components:  1)  nutrient  applications,  2)  in-situ  losses,  3)  basin 

attenuation  via  surface  and  subsurface  pathways,  and  4)  stream  and  lake  attenuation.  Loss  and 

attenuation terms are generally spatially-explicit, the product of both static landscape factors and 

an independent calibrated parameter for each process and nutrient. Prior to transport, three in-situ 

loss  terms  that  remove  nutrients:  crop  harvest/in-situ  loss,  septic  system  nutrient  removal 

efficiency, and unsaturated zone nutrient storage. I split basin transport into four distinct pathways, 

which  are  not  commonly  represented  in  most  hybrid  and  statistical  nutrient  transport  models: 

14 

 
 
overland  flow,  tile  drainage,  bulk  groundwater  flow,  and  septic  plumes.  Transport  along  each 

pathway  includes  an  attenuation  factor,  proportional  to  either  distance  or  time,  calibrated  and 

validated at sampling locations. Following basin attenuation, nutrients are then subject to stream 

and lake attenuation before ultimately reaching the desired end point (e.g., a sampling location or 

the Great Lakes coastline).  

SENSEflux supports six (P) and seven (N) spatially explicit nutrient applications. There are three 

agricultural terms: manure, agricultural chemical fertilizers, and nitrogen fixation. Urban land use 

terms include chemical non-agricultural fertilizers, point sources, and septic tanks. Atmospheric 

deposition of both N and P occurs in all landscapes. Importantly, I do not use Net Anthropogenic 

Nitrogen Inputs (NANI) (Hong et al., 2013) or Net Anthropogenic Phosphorus Inputs (NAPI) (Han 

et al., 2011), though these could be computed from the SENSEflux outputs. For this study, nutrient 

inputs are described by the 2010 SENSEmap product (described in section 2.4.2). 

There are three in-situ loss terms applied before nutrients are transported: Septic removal, Harvest, 

and Storage. The Harvest (ExH in Figure 2.1) loss term includes all in-place root zone losses of 

nutrients (i.e., sorption, denitrification, P mineralization, etc.) and is assumed to occur on in cells 

with  either  manure  or  chemical  agricultural  fertilizers  applied  (see  “harvested  areas”  in  Figure 

A2.0.1).  The Storage loss term (Fstor) includes both in-place storage and loss of nutrients below 

the root zone, for phosphorus. Note that Fstor is only applied to subsurface mobile nutrients (see 

below). Both Harvest and Storage only occur for surface-applied sources (excluding septic and 

point sources). Finally, septic sources are subject to Septic loss (SepEff). The spatial distributions 

and equations for these loss terms are detailed in Appendix A (Text A2.1 and Text A2.2). 

Before transport, during the calculation of in-situ losses, surface-applied nutrients remaining after 

harvest  are  partitioned  between  surface  and  groundwater  pathways  with  a  spatially  variable 

15 

 
 
partition parameter (F). This produces surface- and subsurface-mobile nutrient pools within each 

cell. The partition parameter (F) is assumed to vary directly with groundwater recharge. Nutrients 

applied to septic systems are subject to septic loss, then form the septic mobile pool. 

Nutrients from the mobile pools are then subject to basin transport and attenuation, consisting of 

movement to streams from each point on the landscape, with spatially-variable and source-specific 

attenuation  occurring  along  the  path.  Surface  mobile  nutrients  may  flow  to  streams  via  either 

overland flow (BS) or agricultural tile drains (BST), in areas where tile drainage exists. While little 

overland flow occurs in most tile-drained areas (King et al., 2014), this assumption may lead to a 

somewhat  elevated  estimate  of  overall  tile  drainage  flux  versus  overland  flow.  The  subsurface 

mobile nutrients are then transported and attenuated through groundwater flow (BG). Transport 

and attenuation of septic-mobile (remaining septic removal) nutrients within septic plumes (BSE) 

is  another  important  groundwater  pathway.  I  separated  nutrient  transport  along  this  pathway 

because attenuation within septic plumes occurs differently than in general groundwater flow due 

to the distinct chemical characteristics of septic tank effluent.  

16 

 
 
 
a. Schematic diagram 

b. Conceptual framework 

Figure 2.1 Schematic diagram (a) and conceptual framework (b) of the SENSEflux model. In the 
schematic diagram, seven nutrient sources are indicated in blue text. Nutrient transport across the 
landscape via surface and groundwater pathways are indicated with black dashed lines and yellow 
triangles.  Basin  attenuation  terms:  BS:  Overland  flow,  BST:  tile  field,  BG:  groundwater  flow, 
BSE:  septic  plume.  River  attenuation  term:  R:  in-stream  and  lake  attenuation.  (Based  on 
the 
https://www.cityofgriffin.com/departments/storm-water/watershed-management). 
conceptual diagram, brown boxes represent nutrient sources, and green arrows pointing left are 
septic onsite removal, crop harvest, and loss in the deep unsaturated zone. Dashed yellow arrows 
are distinct nutrient transport pathways. 

In 

17 

 
 
 
 
 
 
Nutrients  remaining  after  basin  transport  are  then  subject  to  attenuation  via  stream  and  lake 

processes (R). Point sources are applied directly to streams and lakes at this step. Stream processes 

(which here include flow through connected wetland systems) include separate terms for biological 

nutrient uptake, denitrification (N), and sorption/mineralization (P). Lakes are represented with a 

linear uptake term, proportional to the length of the nutrient flow paths that intersects lacustrine-

classified wetlands (i.e., lakes). Detailed descriptions for the derivation of these terms are given in 

Appendix A (Text A2.3).  

This study builds on and renames (here, SENSEflux) the model first presented by (Luscz et al., 

2017),  incorporating  new  loss  terms  and  improving  multiple  parameterizations.  These  changes 

were largely necessitated by the greater spatial extent of this model (see section 2.4.1), and thus 

the greater range of landscape and climate characteristics present. First, I added a subsurface in-

situ  storage  term  (Fstor)  for  phosphorus.  I  also  tested  this  approach  for  nitrogen,  but  as 

parameterized it decreased model fit and led to unrealistic results for groundwater N transport. 

Second, I replaced the simplistic R term in (Luscz et al., 2017), which also relied on a basin-yield 

parameter that dominated the overall stream uptake, potentially skewing results. The new R term 

includes spatially-variable characteristics related to denitrification or sorption, biological uptake, 

and lake losses. Finally, some small adjustments were made to the overall model equation. These 

changes are described in Appendix A (Text A2.1, Text A2.2, and Text A2.3). 

2.3.2: Model Parameter Estimation 

SENSEflux is calibrated separately for N and P, using observed fluxes or concentrations (here, 

concentrations).  This  study  uses  annually-averaged  fluxes  (section  2.4.3)  and  thus  represents 

average  annual  losses,  attenuation,  and  nutrient  delivery.  There  are  10  (N)  and  11  (P)  scalar 

parameters in the SENSEflux model; these include four loss and partition terms (SepEff, ExH, F, 

18 

 
 
and Fstor) as linear multipliers, with the remaining attenuation parameters (Bs, Bst, Bse, Bg, Rdn, 

Rbio, and Lacus) as multipliers within an exponent (see equations in Text A2.1). For N, Fstor is 

set to 0. Except for SepEff, all parameters are then optimized via automated parameter estimation. 

SepEff is the efficiency multiplier on septic loads and is set as 0.3 for TN (0.35 for TP) based on 

existing  studies (Luscz et  al.,  2017;  Tong et  al.,  2020;  Valiela et  al.,  1997).  While SENSEflux 

could  be  used  to  independently  calibrate  multipliers  for  both  SepEff  and  Bse  (septic  basin 

attenuation), here I did not have sufficient data density to independently calibrate both.  

Here I used MATLAB’s constrained non-linear local-minimum optimization routine fmincon. The 

objective function for this optimization was the mean absolute difference (error) between the base-

10 log (termed MAEL) of the observed and simulated nutrient concentrations. Several different 

objective functions were tested, including root-mean-square residuals, and root-mean-square log-

10 residuals. Ultimately MAEL was selected because it more equally weighted both low and high-

concentration  locations,  a  necessity  given  the  wide  range  of  nutrients  across  the  study  region 

(section  2.4.1  and  Figure  A2.0.2  and  Figure  A2.0.5).  I  also  tested  using  MAEL  with  loads,  as 

opposed to concentrations, but ultimately selected concentrations because  they provided higher 

sensitivity to attenuation parameters. 

2.4: Study Area and Model Inputs 

2.4.1: Study Area: US Great Lakes Basin (USGLB) 

The  Laurentian  Great  Lakes  encompass  Lakes  Superior,  Michigan,  Huron,  Erie,  and  Ontario, 

which make up Earth’s largest liquid freshwater system, with ~21% of the world’s and ~80% of 

the United States’ surface freshwater supply (L. Zhang et al., 2019). The USGLB includes portions 

of eight US states (Illinois, Indiana, Michigan, Minnesota, New York, Ohio, Pennsylvania, and 

Wisconsin). Average precipitation across USGLB varies between 500 and 1600 mm yearly (Figure 

19 

 
 
A2.0.2), and annual average temperatures from 3 to 10 ℃ in the 2010s (OSU, 2014). Forty million 

residents  of the United States and Canada depend on this lake system for clean drinking water 

(NOAA, 2019).  

Great Lakes’ ecosystems are being threatened by climate change, invasive species, and degraded 

water  quality  because  of  pollutants  from  residential,  agricultural,  and  industrial  activities 

(Kovalenko  et  al.,  2018;  Paerl  et  al.,  2016;  Sterner  et  al.,  2017).  Urbanization,  agricultural 

intensification,  and  failing  septic  systems  are  causing  contamination  across  the  GLB.  Several 

major  cities  in  the  southern  basin  (e.g.,  Chicago,  Detroit,  and  Cleveland)  have  significant 

impermeable surface areas that route precipitation directly to aquatic systems via overland flow. 

Most  agricultural  areas  are  in  the  southern  portion  of  the  basin  (Figure  A2.0.2)  and  produce 

substantial nutrient loads to the lakes (D. M. Robertson et al., 2019). Because of humid continental 

climates and broad areas with very permeable soils and high aquifer recharge, 43% of the USGLB 

coast  is  vulnerable  to  groundwater-borne  nutrients  (Knights  et  al.,  2017;  Kottek  et  al.,  2006). 

Harmful Algal Blooms (HABs) in the GLB have been a critical issue for millions who live in the 

region,  resulting  in  negative  effects  on  industries  (e.g.,  fishery,  tourism,  aquaculture),  ecology 

(e.g.,  fish  kills),  and  public  health  (e.g.,  drinking  water  contamination,  toxicity  to  pets  and 

livestock) (Carmichael & Boyer, 2016; Gobler, 2020; Michalak et al., 2013). 

2.4.2: Nutrient Source Inputs: SENSEmap  

To drive SENSEflux, I used the SENSEmap product that describes each of 7 distinct sources (total 

nitrogen,  TN)  or  6  sources  (total  phosphorus,  TP)  at  30  m  resolution,  ca.  2010.  These  sources 

include  point  sources,  along  with  the  non-point  sources  of  chemical  agricultural  fertilizers, 

chemical non-agricultural fertilizers, manure, septic tanks, atmospheric deposition, and N fixation. 

SENSEmap estimates sources using GIS and statistical methods constrained by broadly available 

20 

 
 
data from remote sensing, government databases, and literature (Hamlin et al., 2020a, 2020b). For 

this  study,  all  seven  sources  were  aggregated  to  120  m  resolution,  summing  from  the  30  m 

SENSEmap values (Figures A2.0.3 and A2.0.4).  

2.4.3: Calibration Data: In-Stream Nutrient Load 

TN and TP loads at sampling sites (TN: 116, TP: 119) within the USGLB, which were used by (D. 

M. Robertson & Saad, 2011b) to calibrate the SPARROW model, were extracted to calibrate and 

validate the SENSEflux model (Figure A2.0.5). Here, averages from 2008 – 2012 were used for 

calibration and validation. Load data was split randomly into two sets: 70% for model calibration 

and  30%  for  validation.  (D.  M.  Robertson  &  Saad,  2011b)  estimated  these  loads  using  the 

Fluxmaster  program,  which  may  overestimate  nitrogen  loads  and  underestimate  TP  due  to 

uncertainties  such  as  the  lack  of  continuous  measurements  of  concentration  and  streamflow 

(Richards et al., 2012; D. M. Robertson et al., 2019; Stenback et al., 2011). Nevertheless, they are 

the most complete dataset of annual loads available for the region. 

2.4.4: Basin Characteristics: Groundwater Recharge, Overland Flow Length, Harvested 

Areas, and Tile Drained Areas 

As discussed in section 2.3, spatially variable factors affect the fate and transport of TN and TP 

during both landscape (basin) and in-stream transport. Groundwater recharge (Figure A2.0.1), or 

the amount of water percolating  from the surface to the water table, is used to characterize the 

subsurface partition (F) and the fraction of groundwater-pathway nutrients stored in soil and deep 

unsaturated zone (Fstor). Overland flow length was calculated using ArcGIS Hydrology Toolbox 

based on the Digital Elevation Model (DEM) from the National Elevation Dataset (NED) and is 

used  as  part  of  the  reduction  factor  for  basin  pathways.  Harvested  areas  for  TN  and  TP  are 

determined by where manure or chemical agricultural fertilizers are applied. A novel tile drainage 

21 

 
 
layer was calculated to evaluate whether nutrients were likely transported via overland flow or tile 

fields. See details for groundwater recharge and tile drainage area calculation in Text A2.4. 

2.4.5: In-Stream/Lake Characteristics: Catchments, In-Stream Travel Time, Streambed 

Exchange Rate, and Lake Travel Distance  

The hydrologic networks move water through catchments and along rivers, with their associated 

drainage basin providing a critical component to hydrologic analysis and modeling (Brakebill & 

Terziotti, 2011). The (~30 m) resolution 1 arc-second DEM from the USGS National Elevation 

Dataset  (NED)  was  used  to  calculate  flow  direction  and  flow  accumulation  to  generate  stream 

networks  (Gsech  et  al.,  2002).  Watersheds  were  delineated  using  these  sampling  sites  as  pour 

points in the ArcGIS 10.6 Hydrology toolbox. TN and TP watersheds are shown in Figure A2.0.5 

with corresponding loads for each watershed.  

Like  the  overland  flow  length  calculation  in  the  previous  section,  in-stream  travel  time  was 

calculated using the ArcGIS Hydrology Toolbox with the NED DEM and the flowlength function. 

For this instance, a cost raster was supplied, calculated as the time/unit distance in each cell (i.e., 

1/velocity). For overland flow portions of the flow path, the cost function was set to 0, while in-

stream velocities were computed from USGS gauge site data as in (Mooney et al., 2020). In-stream 

velocities are used to calculate the biological uptake portion of stream attenuation (equation (2.5) 

in Text A2.1).  

The N denitrification/P sorption portion of the stream attenuation functions was assumed to be 

driven by the rate of exchange between streamflow and the stream bed and  the hyporheic zone 

beneath. I calculated this rate of exchange as the ratio of streambed flux and in-stream flow, as 

described in Text A2.3. Ultimately, this exchange rate (Figure A2.0.6, e) is the product and ratio 

of multiple factors, including hydraulic conductivity (Figure A2.0.6, d), slope (Figure A2.0.6, a), 

22 

 
 
basin yield during baseflow (Figure A2.0.6 b), hydraulic radius (Figure A2.0.6, c) and velocity 

(Figure A2.0.6, d). The exchange rate is then used to calculate denitrification/sorption in (equation 

(2.6) in Text A2.1).  

Travel distance in lakes was computed by providing a 0/1 cost raster to the flowlength function in 

ArcGIS, with  values of  1 indicating lakes and 0 otherwise. This  layer is used to compute lake 

retention (equation (2.7) in Text A2.1). 

2.5: Results and Discussion  

SENSEflux TN and TP annual models performed well (calibration/validation R2 values 0.93/0.86 

and 0.79/0.76, respectively, Figure 2.2), providing estimates of TN and TP loads to the USGLB 

using optimized parameters (Table A2.1). These parameters indicate higher rates of attenuation in 

overland vs. groundwater pathways, and that transport through tile drainage produces the lowest 

attenuation rates of all pathways (see extended discussion in Text A2.5).  

The TN model had a slightly better fit versus TP, with a difference of 14% for calibration and 10% 

validation  datasets,  respectively.  The  best-fit 

line  has  slopes  >  0.75  (TN  and  TP 

calibration/validation  slopes  are  0.90/0.76  and  0.77/0.78,  respectively),  indicating  a  slight  bias 

toward  high  predictions  at  low  loads,  and  low  predictions  at  higher  loads.  Overall,  the  model 

predicted loads were close to observed values: the mean absolute error of log daily loads (MAEL), 

for TN calibration and validation are 0.14 and 0.17 log10 (kg/day) respectively, 0.22 and 0.28 for 

TP.  Analyses  of  residuals  indicate  no  significant  bias  for  TN,  but  a  slight  (statistically  non-

significant) underestimate of TP deliveries for the validation dataset (Figure A2.0.7). The residuals 

for TN and TP are not significantly different from zero with P values ranging from 0.4 to 0.65 

based on a one-sample t-test (Figure A2.0.7). The spatial residuals (Figure A2.0.8) are reasonable 

with  83%  and  77%  of  watersheds  having  residuals  (log10)  between  -0.2  and  0.2  and  are 

23 

 
 
significantly  clustered  spatially  with  the  Moran’s  Index  of  0.43  and  0.41  for  TN  and  TP 

respectively.  

Figure 2.2 Plot of log10 simulated and observed daily loads for model calibration and validation 
datasets. The dashed black line is the 1:1 line. Solid blue and red lines indicate regression fits. n 
refers to the number of observation points in each of the calibration and validation datasets. 

2.5.1: Spatially Varied Nutrient Delivery and Loads to Lakes 

Simulated export of TN and TP varied substantially across the US Great Lakes Basin, with the 

majority of area (~60%) ranging between 128-912 kg/yr/km2 for nitrogen and 4-23 kg/yr/km2 for 

phosphorus, encompassing the ~20th to 80th percentile of nutrient deliveries (Figure 2.3a & 2.3b). 

Over the entire USGLB, mean TN and TP loads are 599 and 21.7 kg/yr/km2, respectively. Broadly, 

spatial  patterns  are similar for both  TN and  TP (Pearson correlation coefficient,  r  = 0.73). For 

instance, both TN and TP are high in the southern Lake Michigan, Saginaw Bay, Western Lake 

Erie, and Lake Ontario basins (Figures 2.3a and 2.3b).  

We also calculated the N:P ratio (ratio of total delivered TN to TP load) across USGLB and each 

24 

 
 
 
lake basin (Figure 2.3e & 2.3f), which can help us understand the lake trophic status. Overall, the 

USGLB is N enriched, with > 95% (5th percentile of the ratio = 17.85, median of 80.7) of cells 

delivering  nutrients  above  the  classical  Redfield  ratio,  defined  as  an  N:P  molar  ratio  of  ~16:1. 

Individual  Lake  basin  averages  of  N:P  delivery  ratio  is  between  ~58-67  (Figure  2.3f).  For 

comparison, basin averages the N:P ratios of SENSEmap nutrient inputs to the landscape from 

(Hamlin et al., 2020a)  vary between ~17-21, except for Lake Superior at 51.6. It is notable that in 

our current study, the delivery N:P ratio for Lake Superior is 62.4, indicating that the watersheds 

draining to this northernmost lake are relatively efficient at routing P to the coastline. In general, 

watershed deliveries to the lakes indicate that P should be the limiting nutrient, as is largely the 

case for non-marine waters in this region.  

Lake Michigan  receives  the highest TN loads followed by Lake  Erie, with 62.9 and 61.5 kt/yr 

nitrogen delivered from US lands to the water, respectively (Figure 2.3c). Lake Erie has the highest 

TP loads followed by Lake Michigan, with 2.4 and 2.3 kt/yr phosphorus delivery (Figure 2.3d). 

Lake Huron, Ontario, and Superior have much lower deliveries, all at or below 24 kt/yr nitrogen 

and 0.8 kt/yr phosphorus. This is consistent with the larger US drainage basins of Lakes Michigan 

and Erie among the Great Lakes. I also calculated nutrient yields, defined as nutrient fluxes divided 

by drainage basin area (Figures 2.3c & 2.3d). Not surprisingly, Lake Erie has the highest yields 

for  both  TN  and  TP,  with  nitrogen  yields  of  1148  kg/yr/km2  and  phosphorus  yields  of  43.5 

kg/yr/km2.  Lakes  Huron  and  Michigan  have  similar  nitrogen  yields  (567  and  544  kg/yr/km2, 

respectively) and phosphorus yields (18.6 and 20.1 kg/yr/km2, respectively). Lake Superior has 

the lowest nitrogen (218.3 kg/yr/km2) and phosphorus yields (7.7 kg/yr/km2).  

To evaluate the nutrient loading from SENSEflux, I compared simulated TN and TP loads with 

the  GLB  SPARROW  model(D.  M.  Robertson  &  Saad,  2011b),  which  is  the  most  comparable 

25 

 
 
model  with  the  closest  timeframe  and  is  widely  used  as  a  watershed  nutrient  load  predictor. 

Overall, the total modeled delivery of TN and TP from USGLB to the lakes using SENSEflux are 

0.37  and  0.45  times  lower  than  simulated  loads  from  the  SPARROW  model  for  TN  and  TP, 

respectively (Figure A2.0.9). More details about the reasons for these differences can be seen in 

Text A2.5.  

Figure  2.3  Predicted  nutrient  delivery  as  yield  (kg/km2/yr)  to  Great  Lakes  shoreline  and 
summarized nutrient loads by lake basin (black outlines) within the USGLB. TN (a) and TP (b) 
yields  are  direct  outputs  from  SENSEflux  that  are  resampled  to  720  m  resolution  for  display 
purposes and classified in quantiles, within which each color represents ~20% of the USGLB area. 
Bars  in  (c  –  TN)  and  (d  –  TP)  represent  the  total  basin-accumulated  fluxes  and  yields  (area 
normalized) for each nutrient. Map of N:P yield molar ratio at 720m resolution across USGLB, 
rounded to the nearest 1 (e); bar chart of mean N:P load molar ratio by lake basin within USGLB 
(f).  

26 

 
 
 
 
 
 
 
Figure 2.3 (cont’d) 

The average TP load from Lake Erie watersheds calculated by (Maccoux et al., 2016) was equal 

to  5.7  kt/yr  for  2008-2012  (excluding  direct  Point  and  Municipal  sources).  Thus,  the  estimate 

presented here is roughly 42% of that from Maccoux et  al. This difference is dominated by an 

underestimate of loads to Lake Erie from just one tributary, the Maumee River, which suggests 

some  important  pathway  or  source  mechanism  may  be  missing  or  underestimated  within 

SENSEflux.  Future  work  will  continue  to  refine  nutrient  input  mechanisms  and  specifically 

examine varying seasonal loads—given the established importance of winter and spring deliveries 

in that basin (Stow et al., 2015). 

2.5.2: Nutrient Delivery Efficiency and Hotspots 

SENSEflux provides a fully spatially-explicit estimate of nutrient deliveries, allowing a novel view 

of the landscape: nutrient delivery efficiency, defined as the ratio of deliveries to inputs (Figure 

2.4a & 2.4c). Median cell-by-cell TN delivery efficiency was 14.6%, while TP was just 3.7%. In 

27 

 
 
general, northern portions of the GLB have higher delivery efficiency, as do urban areas. Basin-

averaged delivery efficiencies range between 11 - 16.5% for N and 2.8 – 4.9% for P in all lakes 

except for Superior (Figure 2.4b & 2.4d). There, the basin is remarkably efficient at delivering 

nutrients, sending almost 28% of input N and 23% of input P to the coastline. Cumulatively over 

the entire region, TN delivery efficiency was 14.6% (coincidentally the same as the median cell-

by-cell delivery value), while TP was 4.3% (see Figure 2.0).  

Figure 2.4 The ratio between nutrient delivery and apply (defined as efficiency here) for TN (a) 
and  TP  (b)  across  the  USGLB  and  summaries  by  lake  basin  (b  &  d).  Nutrient  hotspots  are 
determined as high yield (x-axis) and efficiency (y-axis) areas in bivariate choropleth maps for TN 
(c) and TP (d).  

Deliveries are the result of both inputs and the cumulative storage and attenuation processes along 

transport  pathways.  Figures  A2.0.10  and  A2.0.11  in  Appendix  A  are  maps  of  the  loss  and 

attenuation  for  N  and  P,  respectively.  Harvest  is  the  dominant  loss  process  across  most  of  the 

domain,  providing  the  broad  N-S  gradient  seen  in  delivery  efficiencies  (Figures  A2.0.10a, 

28 

 
 
 
A2.0.11a).  Stream  and  lake  attenuation  variability  (Figures  A2.0.10c,  A2.0.11d)  occurs  at 

moderate  scales,  driven  in  large  part  by  the  travel  time  from  the  coastline.  Basin  transport 

attenuation (A2.0.10b, A2.0.11c) varies at the shortest scales, responding to distance from streams, 

tile vs. overland transport, and the presence of septic systems. 

Combining  nutrient  deliveries (Figures 2.3a &  2.3b) with  delivery efficiencies (Figures  2.4a & 

2.4c) produces a novel view of landscape nutrient transport function: delivery hotspots, quantified 

by terciles and presented in a bivariate colormap (Figures 2.4e and 2.4f). Areas of the landscape 

with both high loading and delivery efficiency (highest 33%) are the most intense sources of loads 

to  the  coastline  (shown  in  blue  on  the  hotspot  maps).  These  are  predominantly  urban  areas, 

particularly for TP. Next, areas with high delivery (highest 33%) but low efficiency (lowest 33%, 

teal on the hotspot maps), are agricultural areas generally more distant from the coastline. Areas 

with low delivery (lowest 33%) but high efficiency (highest 33%) are highlighted in magenta and 

concentrated in the northern areas of the region. Although they do not generate substantial total 

deliveries, their high delivery efficiency (ratio of deliveries to inputs) makes them of conservation 

interest.  In general,  areas with  high  delivery efficiency  would  be ideal  targets  for  conservation 

efforts, because reducing these inputs has a correspondingly high influence on marginal deliveries.  

2.5.3: Leading Sources of N And P Fluxes to The Great Lakes  

Agricultural  sources,  including  manure,  chemical  agricultural  fertilizer,  and  nitrogen  fixation, 

dominate nitrogen fluxes, totaling ~58% of all fluxes from the USGLB (Figure 2.5a). Agriculture 

was  the  largest  nitrogen  source  of  each  lake,  except  for  Lake  Superior  where  atmospheric 

deposition dominated (91%).  

Phosphorus fluxes were also dominated by agricultural sources (manure and chemical agricultural 

fertilizer) in USGLB (~46%, Figure 2.5b). On the lake basin scale, phosphorus fluxes in Lakes 

29 

 
 
Michigan, and Ontario are driven by agricultural sources because of large manure inputs, while 

Lakes Erie and Huron had higher inputs from chemical agricultural fertilizer. Specifically, Lakes 

Michigan and Ontario had 46% and 60% phosphorus loads from agricultural sources, with manure 

accounting for 27% and 40% of inputs, and 19% and 20% from chemical agricultural fertilizer, 

respectively. For Lakes Erie and Huron, agricultural sources appear to contribute 49% and 46%, 

where agricultural chemical fertilizer was the dominant source (35% and 30%).  

Urban  sources,  including  chemical  non-agricultural  fertilizer,  septic  tanks,  and  point  sources, 

account for only 12.5% of N contributions but 34% of total P in the USGLB (Figures 2.5a and 

2.5b). The contributions of these three sources for TN delivery are similar, with 4.1% of the point 

source, 3.8% of septic tanks, and 4.6% of chemical non-agricultural fertilizer. However, the point 

source accounts for 16.8% of TP delivery. The relatively lower point source contribution for TN 

demonstrates the effectiveness of the CWA and National Pollutant Discharge Elimination System 

(NPDES)  permitting  system.  The  other  major  urban  source  for  TP  transport  is  chemical  non-

agricultural fertilizer (14.7%), which is largely applied to golf courses and lawns. This is supported 

by (Baris et al., 2010) who found that 86% of surface water samples had phosphorus concentrations 

above the criteria set by the US Environmental Protection Agency, after comparing nitrate and TP 

concentrations from golf courses across the United States. It’s critical to ensure the effective use 

of  fertilizers  in  urban  systems,  and  use  best  management  practices  to  reduce  nutrient  transport 

(Bock & Easton, 2020), especially for TP. 

30 

 
 
Figure 2.5 Estimated percent of nutrients delivered to lakes by source. Donut figures represent the 
USGLB while rotated stacked bar plots show each lake basin within the USGLB. The black donut 
circle was divided into agricultural sources (brownish), urban sources (greenish), and atmospheric 
deposition sources (mint tulip) with corresponding percentages.  

Atmospheric deposition is an important contributor of N and P, consisting of 29% of TN and 20% 

of  TP  sources  in  the  USGLB  (donuts  in  Figure  2.5a  and  2.5b).  This  is  largely  because  of  the 

dominant  role  (91%  of  TN,  81%  of  TP  sources)  that  atmospheric  deposition  plays  in  Lake 

Superior,  which  has  severe  stoichiometric  imbalance  with  high  N  and  low  P  concentrations 

(Sterner, 2011). Unfortunately, atmospheric deposition can be difficult to manage, and research 

has found that current policies and technologies may not be sufficient to reduce deposition under 

critical loads (Clark et al., 2018; McCrackin et al., 2017).  

In summary, although management actions have focused on agricultural sources for decades, they 

still dominate TN and TP transport and delivery across most of the USGLB. While approximately 

71% and 88% of TN and TP sources applied to the landscape are from agriculture (Hamlin et al., 

2020a), only 58% and 46% of TN and TP are delivered to USGLB. This means that even though 

a higher percentage of agricultural phosphorus source is applied to landscapes, less percentage of 

31 

 
 
 
phosphorus  is  delivered  to  aquatic  ecosystems  in  these  lower  delivery  efficiency  areas.  These 

disproportional differences between sources and fates show the importance of nutrient transport 

and the natural differences between TN and TP attenuation.  

2.5.4: Nutrient Delivery Pathways: The Dominant and The Underappreciated 

We  summed  nutrient  deliveries  by  transport  pathways  across  the  USGLB,  shown  ranked  from 

highest to lowest proportion (surface (tile fields + overland) > subsurface (groundwater + septic 

plume) > point) for TN and TP within the USGLB (Figure 2.6). Our results agree with prior work 

that indicates surface pathways dominate transport (Baker et al., 1975; Ryden et al., 1974; Sims et 

al., 1998), contributing the largest proportion of TN and TP (66% and 76%) delivery to the lakes. 

Others have shown that overland flow was the primary export pathway for both P and N, but tile 

drainage cannot be overlooked (Kokulan et al., 2019; D. Smith et al., 2015) and contributes almost 

½ of some TP loads (D. Smith et al., 2015). I found that overland flow was the primary pathway 

(40%) for TP delivery to the US Great Lakes. TN was dominated by tile fields (46%) and 36% of 

TP  was  transported  by  tile  fields,  showing  tile  drainage  delivers  large  quantities  of  nutrients, 

especially nitrogen to the Lakes and is thus critical to manage. 

Subsurface pathways (groundwater flow and septic plumes) transport an important proportion of 

nutrients to the lakes, with about 30% of TN and 6.8 % of TP (Figure 2.6). The groundwater flow 

pathway  dominates  the  subsurface  transport  of  nitrogen  (26%  of  total  transport),  likely  due  to 

nitrogen’s high mobility. I found that septic plumes contributed 3.8% of total nitrogen (2% of TP) 

to lake loading. Other studies have indicated that septic systems are important nutrient sources (L. 

E. Oldfield et al., 2020; Reay, 2004), yet they are rarely accounted for and commonly overlooked. 

The pathway proportions from the point source are 4.1% for TN and 16.8% for TP, demonstrating 

more efforts are needed  to  reduce phosphorus loads to  the Great  Lakes.  Note our maps do not 

32 

 
 
include direct discharges of point sources to the Great Lakes coastline. 

Figure  2.6  Estimated  percentages  of  total  nutrients  delivered  to  lakes  by  pathways  (surface: 
overland and tile fields; subsurface: groundwater and septic plume).  

2.5.5: Heterogeneous Pathways 

Surface pathways dominated nutrient contributions within each Lake basin (Figure 2.7a & 2.7b). 

In the Lake Superior basin, overland flow dominates nutrient transport, with 61% of TN and 86% 

of TP. In the Lake Michigan basin, tile drainage transported 45% of TN (23% by overland flow) 

and 33% of TP (46% by overland flow). Tile fields delivered more nutrients than overland flow in 

Lakes Erie and Huron basins. This supports earlier work that found tile drainage to be the primary 

pathway  for  nutrient  delivery  to  streams  in  the  western  Lake  Erie  basin(D.  Smith  et  al.,  2015; 

Williams et al., 2018). In the Lake Ontario basin, tile fields transported 42% of nitrogen (23% for 

overland) and 37% of phosphorus (40% for overland).  

Subsurface  pathway  (septic  plume  and  groundwater)  contributions  varied  across  the  five  lake 

basins  (Figure  2.7a  &  2.7b).  The  bulk  groundwater  flow  pathway  (excluding  septic  plumes) 

contributed substantially across the lake basins, ranging from 24% in the Lake Erie basin to 34% 

33 

 
 
 
in  the  Lake  Superior  basin  for  N.  Conversely,  about  4.8%  (4.1%  -  5.5%)  of  phosphorus  was 

transported via the groundwater pathway. The proportion of nitrogen load from the septic pathway 

varies from 2% in the Lake Superior basin (1% for TP) to 4.4% in the Lake Huron basin (2.8% for 

TP). Controlling much of the landscape nutrient delivery to the USGLB, Michigan is one of the 

few  states  in  the  US  without  a  statewide  septic  code  governing  septic  system  installation, 

maintenance, and repair (J. Alexander, 2013) although discussion and development of such a code 

is ongoing. 

To  further  investigate  the  heterogeneity  of  nutrient  delivery,  I  mapped  the  amount  of  TN 

transported through our four basin transport pathways to streams (overland flow, Figure 2.7c; tile 

fields, 2.7d; groundwater, 2.7e; septic plume, 2.7f). Maps for TP transport pathways are shown in 

Figure  A2.0.12  because  of  space  limitations  and  overall  similar  spatial  patterns  in  TN  and  TP 

pathways.  For  instance,  overland  transport  pathways  are  high  (>  238  kg/km2/yr  TN,  >  11.97 

kg/km2/yr TP,) in the southern Lake Michigan basin, eastern Lake Ontario basin, and some urban 

areas (i.e., Detroit, MI; Cleveland, OH; Buffalo, NY; Rochester, NY). The tile field pathway is a 

major  contributor  (>  1053  kg/km2/yr  TN,  >  27.71  kg/km2/yr  TP)  in  agricultural  areas  (e.g., 

Southern Lake Michigan basin, Saginaw Bay, western Lake Erie, and Lake Ontario basins), likely 

due to high chemical fertilizer and manure inputs, along with higher tile drainage density. Lakes 

Erie, Ontario, and southwestern and the thumb area of Lake Michigan showed higher TN and TP 

delivery through groundwater flow, possibly due to higher groundwater withdrawal rates in these 

areas  and  elevated  nitrate  concentrations  (Hamlin  et  al.,  2022).  The  septic  plume  yields  were 

highest  around  large  cities,  where  dense  suburban  populations  are  not  yet  connected  to  sewer 

systems. 

These results also show substantial variability across the USGLB basins, with different dominant 

34 

 
 
pathways  in  different  Lake  basins  (Figure  2.7).  Specifically,  as  the  installation  of  tile  drainage 

expands or intensifies, fluxes from tile drainage will likely become more important. 

Figure 2.7 The estimated total yield of TN delivered to lakes by four key pathways. (a) and (b) are 
for TN and TP pathways by lake basin, (c) – (f) for TN overland, tile fields, groundwater, and 
septic plume respectively. Maps are resampled from 120 m SENSEflux outputs to 720 m resolution 
here  for  display  purposes  and  classified  in  quantiles,  with  each  color  representing  20%  of  the 
dataset;  the  white  area  in  c  &  d  within  the  basin  boundary  represents  areas  with  no  data  as  I 
assumed overland and tile fields are alternative pathways. Corresponding maps for TP are included 
in Figure A2.0.12. 

35 

 
 
  
 
2.6: Implications, Limitations, and Future Work 

This  study  uses  a  spatially  explicit  nutrient  transport  model  to  help  understand  the  fate  and 

transport  of  total  nitrogen  and  phosphorus  from  multiple  sources  along  different  pathways. 

Modeling  distinct  transport  pathways  provides  a  novel  alternative  to  many  models  that  do  not 

include important pathways, particularly groundwater and septic plumes. Our analysis shows that 

overland flow and tile fields  are major pathways of nutrient  transport, but  subsurface transport 

plays an important role. Specifically, tile drainage is highest in Lake Erie, transporting 44% more 

TN and 15% more TP than the overland pathway, suggesting the increasing  installation of tile 

drainage may have significant effects in other regions. For subsurface pathways, groundwater and 

septic plumes provide 30% of TN delivery and 6.8% of TP. This will become even more important 

when we consider legacy nutrients that often have long groundwater travel times (Martin et al., 

2011,  2017,  2021).  Thus,  these  subsurface  pathways  should  not  be  ignored  in  water  quality 

management and policy.  

Agricultural  nutrient  sources  (manure,  chemical fertilizer,  and nitrogen fixation) have played  a 

dominant role in the Great Lakes Basin’s history and will be a critical part of its future. I also found 

that  atmospheric  deposition  is  a  significant  source  of  nitrogen,  and  septic  tanks  contribute 

significant nitrogen and phosphorus loads to the environment. Groundwater also plays a substantial 

role in  transporting  nutrients  from  the landscape  to  streams,  and  eventually  to  the Great  Lakes 

coastline. These findings can be used along with the SENSEflux and SENSEmap products (Hamlin 

et  al.,  2020a)  for  the  US‐GLB  to  provide  managers  with  spatially  explicit  loading,  efficiency, 

source, and pathway estimates. For example, the Tipping Point Planner Program links watershed 

data to local decision-making processes(Weinstein et al., 2021) (tippingpointplanner.org), thereby, 

the addition of SENSEflux can help managers focus actions on specific sources and pathways. The 

36 

 
 
information presented here can provide important inputs to this community-facing tool.  

Future  research  could  improve  nutrient  flux  and  pathway  estimates  for  the  Great  Lakes  Basin, 

which would help inform more holistic decisions to achieve nutrient reduction strategies. A more 

accurate tile drainage map would improve estimates of the contributions of this pathway to the 

waterways in the basin. In addition, the role of septic plumes in phosphorus delivery and lakes that 

don’t have a connection with streams should be further explored to seek ways that protect water 

quality by reducing N and P loads. Also, this modeling assumes that all landscape input nutrients 

have had sufficient time to reach the streams where concentrations are observed and that nutrient 

inputs aren’t changing meaningfully over decadal time scales. Future efforts could include time-

varied surface loads, along with estimates of legacy timescales and travel times.  

2.7: Acknowledgments 

This work was funded by NASA Grant NNX11AC72G and NOAA Grant NA12OAR432007, and 

grant GS100013. Luwen Wan was partially supported by the China Scholarship Council (CSC). 

Any opinions, findings, conclusions, or recommendations expressed in this material are those of 

the authors and do not necessarily reflect the views of NASA, NOAA, or CSC.  

37 

 
 
CHAPTER 3: SEASONAL VARIATION OF NITROGEN AND PHOSPHORUS SOURCES, 

PATHWAYS AND LOADS: A SPATIALLY EXPLICIT MODEL SENSEFLUX 

3.1: Abstract 

Excess  nitrogen  and  phosphorus  can  cause  algal  blooms,  which  limit  recreational  activities, 

increase costs to treat drinking water, and can be toxic to  human and aquatic life. In temperate 

climates, there is a consensus that nutrient delivery to streams and rivers increases during snowmelt 

periods.  However,  the  quantification  of  total  nitrogen  and  phosphorus  loading,  sources,  and 

pathways  across  hydrologic  seasons  is  not  well-known.  Here  I  updated  the  Spatially  Explicit 

Nutrient Source Estimate and Flux model (SENSEflux) with a seasonally dynamic approach to 

simulate nutrient loadings, sources, and pathways to the US Great Lakes. Specifically, I examined 

two distinct hydrologic conditions: snowmelt, early spring conditions with characteristically high 

flows,  and  baseflow,  mid-  to  late-summer  where  the  bulk  of  streamflow  originates  from 

groundwater discharge. Results show that total nitrogen loading during snowmelt periods is four 

times greater than annual average deliveries. The contribution of agricultural sources (chemical 

agricultural fertilizer, manure, and N fixation) is substantially higher (14% for TN and 5% for TP) 

during melt than baseflow, while point sources, septic tanks, and atmospheric deposition become 

more prominent contributors to nutrient delivery during baseflow. In addition, most nutrients are 

transported  via  overland  flow  and  tile  fields  regardless  of  the  period.  Groundwater  and  septic 

plume pathways also play an essential role in transporting nutrients and become more significant 

during baseflow. Nutrient delivery hotspot analysis shows that targeting nutrient reductions during 

snowmelt is more effective than focusing on baseflow, especially for TN. The optimal location for 

nutrient reductions during baseflow would be those areas with high nutrient deliveries spatially. 

Thus,  seasonal  variation  of  nutrient  transport  should  be  considered  when  establishing  nutrient 

38 

 
 
criteria  and  reduction  targets.  These  results  and  high-resolution  (120  m)  SENSEflux  output 

products  provide  decision-makers  with  nutrient  reduction  insights.  In  addition,  the  SENSEflux 

model can be readily applied to other regions with nutrient management issues.  

3.2: Introduction 

Nitrogen and phosphorus enrichment are two of the most detrimental and costly stressors in surface 

waters, degrading water quality and causing eutrophication in the world’s stream, coastal, and lake 

systems (Abell et al., 2010; R. B. Alexander & Smith, 2006; Azzari et al., 2019; Huggins et al., 

2023; K J Van Meter & Basu, 2017; Pearce & Yates, 2020; K. Yan et al., 2019). Due to irregular 

occurrences  (i.e.,  heavy  precipitation,  melting  snow)  and  seasonal  agricultural  activity  such  as 

fertilization (Rixon et al., 2020), nutrient delivery can be continuous but is frequently intermittent. 

For instance, 42–92% of the annual total phosphorus (TP) load and 41–81% of the total nitrogen 

(TN) load were delivered during snowmelt in the Red River sub-watershed (Corriveau et al., 2013). 

The  seasonal  nutrient  delivery  then  alters  its  concentration  in  rivers,  streams,  and  downstream 

waterbodies. Another example for the Upper Mississippi River Basin is that TN concentration was 

found  to  be  lowest  in  September,  remain  steady  through  the  winter  and  increase  further  in  the 

spring season due to fertilizer application (Wine et al., 2020). In addition, excessive nitrogen and 

phosphorus pollution in aquatic systems frequently causes seasonal summertime algal blooms and 

hypoxia events (Bechard, 2020; Carmichael & Boyer, 2016; Gobler, 2020; Mchau et al., 2019; 

Pretty et al., 2003). Thus, simulating the seasonal dynamics of nutrient export and understanding 

the  seasonal  contributions  of  sources  and  pathways  are  crucial  for  managing  nitrogen  and 

phosphorus delivery and implementing nutrient reduction strategies (LaBeau et al., 2015; Mayorga 

et al., 2010b; Tong et al., 2019). 

Climate  and  environmental  changes  have  been  identified  that  affected  nutrient  delivery  and 

39 

 
 
exacerbated nutrient pollution, such as heavy precipitation and increased streamflow, rising water 

and air temperatures, and agricultural activities (Hallegraeff et al., 2021; Heil & Muni-Morgan, 

2021).  Precipitation  and  subsequent  runoff  mobilize  and  transport  nutrients  (dissolved  and 

particulate) from upland sources to downstream water bodies. For instance, in an agricultural Lake 

Mendota watershed, Wisconsin, most phosphorus was transported in the spring during the high 

flow period (Huisman et al., 2017). However, phosphorus pollution was also found to being the 

most serious in low-water season in the Qiuxi River, China, with high risk of phosphorus release 

from  sediment  (Xuemei  Chen et  al.,  2021). Air and water temperature also affect  nitrogen and 

phosphorus differently, such as changing  the biological  activity rate, solubility, transformation, 

and biogeochemical cycling processes. The reaction rates, mineralization, and denitrification have 

been shown to increase with higher temperatures (Alam & Dutta, 2013; Hartman et al., 2014; Paerl 

et  al.,  2011),  while  excessive  temperature  can  cause  nutrient  loss  and  ecological  impacts. 

Furthermore,  the  variation  and  magnitude  of  nitrate  loads  were  higher  at  the  beginning  of  the 

growing season due to fertilizer applications in the northern Red River basin (Nasab et al., 2018). 

These cases have demonstrated that the importance and complexity of nutrient transport has been 

recognized  by  many  researchers.  However,  because  the  coverage  of  nutrient  monitoring  and 

availability  of  natural  and  anthropogenic  nutrient  sources,  particularly  at  large  scales,  are 

frequently  limited,  the  quantification  of  nutrient  loadings,  sources,  pathways,  and  hotspots  is 

inevitably involved with watershed modeling.  

Nutrient models have been developed to quantify loads and the contribution of sources and been 

used for nonpoint source pollution management decisions (Wellen et al., 2015). As one of the most 

widely  used  process-based  watershed  models,  the  Soil  and  Water  Assessment  Tool  (SWAT) 

simulate nutrient loadings based on the Hydrologic Response Units (HRU), which are aggregated 

40 

 
 
land areas using unique combinations of land use and land cover, soil type, and topography (Arnold 

et  al.,  1998;  Bosch,  2008).  SWAT  predicts  nutrient  delivery  and  transport  based  on  lumped 

parameters within each sub-watershed. The SPARROW model was developed by the United States 

Geological Survey (USGS) that estimates the quantity of contaminant delivered downstream based 

on  monitoring  data,  location,  and  landscape  characteristics  (Preston  et  al.,  2011;  R.  A.  Smith, 

Schwar, et al., 1997). It has been applied to US watersheds, such as the Mississippi River basin 

and Chesapeake Bay watershed (M. P. Miller et al., 2020; D. M. Robertson & Saad, 2013), as well 

as  some  binational  watersheds  (Benoy  et  al.,  2016;  D.  M.  Robertson  et  al.,  2019).  Recently,  a 

seasonally dynamic application of SPARROW was developed and showed that seasonally varying 

streamflow  and  temperature  were  significant  drivers  of  nitrogen  and  phosphorus  uptake  in 

northeastern US river corridors (Schmadel et al., 2021). Another model example is the first global, 

spatially explicit model, named the Global  Nutrient  Export  from  WaterSheds  (NEWS). NEWS 

was developed to estimate nutrient export at river mouths based on basin-scale nutrient budgets 

and seasonal inputs (Mayorga et al., 2010b; Seitzinger et al., 2005). For example, calendar seasons 

(i.e., spring, summer, autumn, and winter) were used to simulate dissolved inorganic nitrogen in 

the  world’s  largest  rivers  (McCrackin  et  al.,  2014).  Thus,  simulating  nutrient  fluxes  and  their 

transport and delivery  can benefit  from  modeling that relates climatic  and hydrological  drivers 

with landscape and in-stream processes to identify hotspots and moments and inform watershed 

management.  However,  the  implementation  of  current  watershed  models  still  requires 

improvements due to the absence of spatial heterogeneities within basins and limited information 

on hydrological seasonal variations, thus existing models are generally applied to annual loadings 

and do not consider seasonal variabilities of nutrient loadings, sources, and pathways.  

To address these limitations, Luscz et al. (2017) developed a spatially explicit nutrient transport 

41 

 
 
model that simulate the seasonal differences in nutrient sources, pathways, and processes in the 

Lower  Peninsula  of  Michigan.  This  seasonal  pathway  and  process  model  was  inspired  by 

SPARROW but differs in different ways. For instance, this model has spatially explicit description 

of nutrient sources and basin features, thus describing spatially explicit nutrient attenuation along 

nutrient transport pathways. Also, it models temporally-variable transport and predicts seasonal 

variabilities.  

Here, I substantially improved the previously published nutrient transport modeling work (Luscz 

et  al.,  2017)  and  SENSEflux  annual  model  (see  Chapter  2)  to  predict  seasonal  nitrogen  and 

phosphorus loading variations from  the United States Great  Lake Basin  (USGLB)  to  the Great 

Lakes.  This  new  SENSEflux  seasonal  model  includes  major  updates  on  seasonal  mobility  of 

applied nutrients across the landscape and seasonal nutrient retention in streams and rivers. These 

SENSEflux seasonal models are then able to quantify the contributions of seven nutrient sources 

and  four  distinct  basin  pathways  to  in-stream  nutrient  concentrations.  More  specifically,  two 

distinct hydrologic conditions were chosen, early spring snow melt with very high flows that flush 

nutrients and the late summer baseflow primarily representing groundwater transport pathways. 

This  research  seeks  to  clarify  the  critical  times  and  locations  for  nutrient  delivery,  enabling 

watershed managers to implement cost-effective nutrient reduction strategies.  

3.3: Materials and Methods 

3.3.1: Study Area and Definition of Hydrological Seasons  

The Great Lakes are one of the world’s largest surface freshwater systems, consisting of Lakes 

Superior, Michigan, Huron, Erie, and Ontario (Figure 3.1). It stores approximately 84% of North 

America's surface freshwater and 21% of the world’s surface freshwater. The Great Lakes Basin 

covers  significant  portions  of  the  United  States  and  Canada.  It  provides  drinking  water  and 

42 

 
 
annually attracts tens of millions of visitors (Cotner et al., 2017). However, harmful algal blooms 

(HABs) have been found throughout the Great Lakes Basin. These have adverse ecological and 

economic  effects  and  threaten  human  health,  including  the  contamination  of  drinking  water 

(Michalak et al., 2013). In the summer of 2014, one of the most notable HABs led to a three-day 

tap water prohibition in Toledo, Ohio. Due to data availability, this study focuses on the United 

States side of the Great Lakes Basin (USGLB), which includes portions of eight states: Illinois, 

Indiana, Michigan, Minnesota, New York, Ohio, Pennsylvania, and Wisconsin.  

The Great Lakes are expansive, spatially diverse, and dynamic. Their physical characteristics, such 

as climate, soil, and land use, vary across the basin (Sterner, 2020). For instance, the USGLB is 

temperate and sub-humid, including Koeppen-Geiger  zones Dfa and Dfb, which  represent  hot-

summer and warm-summer humid continental climates, respectively (Rubel & Kottek, 2010). The 

climate has changed considerably over the past several decades. From 1910 to 2015, the annual 

mean temperature increased by 1.6 ℃, exceeding the average change of 1.2 ℃ for the remainder 

of the Continental United States (CONUS). Over the same time, precipitation increased by nearly 

10%,  with  more  coming  from  extreme  events  (Wuebbles  et  al.,  2019).  Additionally,  snow  is 

essential to the regional energy budget, hydrological cycle, and human activities (Ford et al., 2020, 

2021).  For  instance,  snow  results  in  adverse  effects  such  as  snowmelt-induced  flooding  and 

nutrient transport (Suriano et al., 2019), although less ice and fewer freezing days are predicted for 

the Great Lakes. Overall, it is widely acknowledged that climate change is a significant driver of 

change in the Great Lakes.  

Mixtures of clays, silts, sands, gravels, and boulders were deposited as glacial drift or as glacial 

river and lake sediments in the southern USGLB. In contrast, the northern USGLB has a typically 

thin layer of acidic soil atop bedrock. The Great Lakes region is renowned for its abundant and 

43 

 
 
diverse  agricultural  production.  Its  fertile  lands  provide  ideal  conditions  for  growing  maize, 

soybeans, winter wheat, and 15 percent of the nation’s dairy products. Most agricultural planting 

occurs in the southern portion of the basin, whereas the northern portion remains forested. As a 

result  of  increased  yields  and  arable  area  from  tile  drainage  installation,  the  southern  basin 

continues to use and enhance its drainage systems. However, they pose nutrient pollution issues 

for the Great Lakes, particularly in wet years (S. A. Miller & Lyon, 2021b).  

Throughout this manuscript, we use the phrase “hydrologic season” to  denote the times during 

which a set of hydrologic conditions (i.e., high-flow “snowmelt” and low-flow “baseflow”) occur. 

Within  the  USGLB,  high  streamflow  “snowmelt”  conditions  occur  most  frequently  during  a 

calendar period extending from March 1 to April 15. Similarly, low-flow “baseflow” conditions 

occur  most  often  from  July  15  to  September  15.  Within  these  calendar  periods,  hydrologic 

conditions  were  determined  based  on  streamflow.  This  distinction  allows  for  stream  nutrient 

samples collected during the hydrologic season to be filtered to include only those that meet the 

definition for snowmelt or baseflow conditions.  

3.3.2: Seasonal Nutrient Concentrations, Loads, and Watershed Delineation 

Comprehensive TN and TP concentrations were downloaded from  the STOrage and RETrieval 

Data Warehouse (STORET) via the Water Quality Portal (WQP, www.waterqualitydata.us). WQP 

was  developed  by  the  US  Environmental  Protection  Agency,  USGS,  and  the  National  Water 

Quality  Monitoring  Council  (Blodgett  et  al.,  2016).  It  is  the  largest  standardized  water  quality 

database  and  provides  hundreds  of  millions  of  data  dating  back  over  a  century  for  sites  in 

groundwater, inland, and coastal waters (Read et al., 2017). The database from multiple sources is 

invaluable for understanding the historical and current water quality of Nation’s rivers, while some 

records lacked or contained ambiguous metadata; parameter names; sample fractions (e.g., filtered, 

44 

 
 
unfiltered, dissolved, total, and others) and units of measurement (Sprague et al., 2017).  

In  this  study,  I  focus  on  total  phosphorus  (TP)  and  total  nitrogen  (TN).  TN,  unlike  TP,  has 

numerous  related  characteristic  names,  including  inorganic  and  organic  forms.  Ammonia, 

ammonium, nitrate, and nitrite are the inorganic forms of N. Total Kjeldahl nitrogen (TKN), the 

sum of organic nitrogen, ammonia, and ammonium, is the standard laboratory method for detecting 

organic nitrogen. While definitionally true, available data from both hydrologic seasons show an 

essentially equivalent relationship between TN and the sum of total nitrate (NO3

-), nitrite (NO2

-), 

and  TKN  (R2  =  1,  slope  ~  1,  Figure  A3.0.1).  Thus,  I  imputed  TN  values  for  sites  without 

measurements using the sum of NOx

- and TKN. For TP, I used sites that have total phosphorus 

measurements.  

To filter nutrient samples for hydrologic conditions, I first calculated the streamflow percentiles 

for each site using estimated streamflow (Mooney et al., 2020) from 2008 to 2012 during our study 

period  (circa  2010).  Then,  during  the  snowmelt season,  I  only  included  nutrient  measurements 

when streamflow exceeded the 70th percentile (for baseflow, less than the 30th percentile). 

Following the below steps, the TN and TP loads on effective seasons (baseflow and melt) for each 

site were calculated; see also Figure A3.0.2. First, I match the estimated streamflow to the site-

specific  concentration.  Secondly,  loads  are  determined  by  multiplying  the  concentration  by 

streamflow. Thirdly, each site's mean Load was calculated by averaging the load from the second 

step. Fourth, the median flow at each site from 2008 to 2012 is provided. Finally, the concentration 

at  each  site  was  calculated  by  dividing  the  mean  load  by  the  median  flow.  For  this  study, 

concentrations are used to calibrate and validate the models. At the same time, mean loads are 

employed as observed loads for each site and compared to the simulated loads (median flow times 

simulated concentration). Annual loads are downloaded from the Great Lakes Basin’ application 

45 

 
 
of the SPARROW model (D. M. Robertson et al., 2019). 

After  filtering  by  hydrologic  conditions,  I  compiled  concentrations  for  370  TN  (463  TP)  sites 

within the GLB during snow melt season, and 571 TN (1000 TP) sites during baseflow, see site 

concentrations and distributions in Figure A3.0.3. Watersheds were delineated using the ArcGIS 

Hydrology toolbox for all the sites with concentration (Figure A3.0.3) and streamflow data. DEM 

from USGS NED at 30 m resolution was utilized to calculate flow direction and flow accumulation 

to generate stream networks, which were then utilized to generate watershed polygons and raster 

(Gsech et al., 2002). The distribution of watershed size is shown in Figure A3.0.4.  

3.3.3: SENSEflux Model and Modifications for Seasonal Predictions 

3.3.3.1: SENSEflux Seasonal Model Overview  

Dynamic modeling of nutrient transport and fate has the potential to simulate seasonal nutrient 

loadings to streams and lakes. I updated SENSEflux from its annual model (see Chapter 2) and the 

seasonal model that was previously published version of the SENSEflux model (Luscz et al., 2017) 

to simulate seasonal nitrogen and phosphorus loadings, sources, pathways, and hotspots. Major 

updates include seasonal mobility that are derived to compare nutrient concentrations and fluxes 

across  landscape  processes  with  yearly  conditions,  seasonal  in-stream,  and  lake  retention,  and 

SENSEflux seasonal models are calibrated and validated with nutrient concentrations and fluxes 

during hydrological seasons See more details regarding the major updates in section 3.3.4. With 

these  model  updates,  SENSEflux  seasonal  models  simulates  nutrient  loading  in  snowmelt  and 

baseflow  with  three  seasonally  variable  parameters  for  surface  basin  reduction  (bs),  biological 

uptake in rivers (Bio) and lake attenuation (Tsettl). Because I do not have information about when 

nutrients are applied (also, when those become mobile), I use annual average nutrient inputs, with 

seasonal mobility inferred from hydrologic flux variability, to simulate nutrient loading.  

46 

 
 
SENSEflux  employs  a  mass  balance  approach  to  simulate  nutrient  transport,  see  equation  3.1. 

Specifically, SENSEflux has four major components: (1) nutrient sources; (2) in-situ losses; (3) 

seasonally variable basin attenuation; and (4) seasonally-varied river and lake attenuation. Sources 

are  spatially  explicit  and  include  five  surface-applied  sources  (agricultural  chemical  fertilizer, 

manure, nitrogen fixation, non-agricultural chemical fertilizer, and atmospheric deposition), noted 

as  other  sources  in  equation  3.1,  septic  tanks  (Ssep),  and  points  sources  (Spoint)  (Hamlin  et  al., 

2020a).  

𝑐𝑒𝑙𝑙𝑠

𝐿𝑘 =   ∑ 𝑅𝑗𝑇𝑠𝑒𝑡𝑡𝑙𝑗 × {𝑆𝑝𝑜𝑖𝑛𝑡 + {𝐺𝑠𝑚𝑆𝑠𝑒𝑝(1 − 𝑆𝑒𝑝𝐸𝑓𝑓)𝐵𝑠𝑒𝑗

𝑗

𝑜𝑡ℎ𝑒𝑟 
𝑠𝑜𝑢𝑟𝑐𝑒𝑠
+ ∑ 𝑆𝑖𝑗
𝑖

× 𝐸𝑥𝐻𝑖𝑗

× [𝑆𝑠𝑚(1 − 𝐹𝑗)𝐵𝑠𝑗 + 𝐺𝑠𝑚𝐹𝑗(1 − 𝐹𝑠𝑡𝑜𝑟𝑖𝑗)𝐵𝑔𝑗] 

𝑅𝑗 = 𝑒−𝛼∗𝐷𝑁𝑆𝑃𝑗 ∗   𝑒−𝛼1∗𝐵𝑖𝑜𝑗 

(3.1) 

(3.2) 

Harvesting (ExH), storage (Fstor), and septic removal (SepEff) are assumed the active processes 

during basin transport and represented as in-situ losses in SENSEflux modeling. Nutrient losses in 

the root zone, such as uptake by plants, denitrification, sorption, and P mineralization, are lumped 

as the Harvest term. Moreover, it has been assumed that these losses only occur in areas where 

manure  or  agricultural  chemical  fertilizer  is  applied.  Specifically,  surface-applied  nutrients  are 

harvested (ExH) first, and the remaining nutrients are partitioned between surface and groundwater 

pathways, and further formalized nutrient mobile pools. This partition is determined by a spatially 

variable  parameter  (F)  that  varies  in  direct  proportion  to  the  groundwater  recharge  fraction. 

Nutrient storage, such as in-place storage and nutrient losses below the root zone, and I assume 

the  storage  only  applied  to  phosphorus  because  of  the  relatively  high  mobility  of  nitrogen  in 

subsurface systems. A fixed percentage of nutrient removal is applied to septic sources.  

47 

 
 
After partitioning between surface and groundwater pathways, storage of P below the root zone, 

and  addition  of  septic  sources,  seasonal  nutrient  mobility  factors  are  introduced.  Specifically, 

event-based surface mobilization (Ssm) and event-based groundwater mobilization (Gsm). Here, I 

assume that nutrient mobility is related to hydrologic fluxes, that as fluxes increase so does nutrient 

mobility.  We  derive  Ssm  and  Gsm  from  USGS  stream  gauges  using  the  groundwater-discharge 

(baseflow) and surface runoff components of streamflow determined by hydrograph separation. 

For each gauge, the ratio between seasonal surface runoff and groundwater discharge with annual 

averages of the same are computed (e.g., snowmelt groundwater discharge/ annual groundwater 

discharge). Then, spatially continuous seasonal mobility layers are created for the USGLB with a 

Random  Forest  model  trained  at  the  USGS  gauges  to  just  three  variables,  watershed  size, 

hydrologic landscape region (HLR), and unconsolidated glacial  drift  thickness (more details  in 

section 3.3.4.2). Within the model, the seasonally mobile surface and groundwater nutrients are 

computed  as  the  product  of  their  respective  mobilization  factors  (i.e.,  Ssm,  Gsm)  and  available 

nutrients  within  the  surface  and  subsurface  pool  for  each  cell.  The  result  is  the  surface-  and 

subsurface-mobile nutrient pools. 

Nutrients  in  these  mobile  pools  are  then  transported  along  and  attenuated  within  transport 

pathways, namely, basin transport and attenuation. Surface-mobile nutrients are attenuated via one 

of the two basin pathways, overland flow (BS) or tile drains (BST). Here, I calibrated the model 

with  a  seasonally  varied  overland  flow  pathway  parameter  (bs)  with  a  spatially  explicit  and 

constant  overland  flow  length  (section  3.3.3).  The  subsurface-mobile  nutrients  are  transported 

through the groundwater flow pathway (BG). A separate groundwater pathway is septic plumes 

(BSE)  that  transport  the  mobile  nutrients  from  septic  systems.  The  septic  plume  pathway  is 

separated from general groundwater because of the distinct chemical characteristics of septic tank 

48 

 
 
sources.  

The remaining nutrients after basin transport and attenuation then enter stream where point sources 

are  then  added  and  subjected  to  further  attenuation  through  stream  (including  flow  through 

connected  wetland  systems)  and 

lake  processes,  such  as  N  denitrification  and  P 

sorption/mineralization (DNSP), and biological update (Bio) (see equation 3.2). We assume that 

N denitrification or P sorption/mineralization can be described as a streambed exchange rate within 

each  cell  along  the  travel  path.  The  streambed  exchange  rate  is  calculated  from  the  hydraulic 

conductivity of the streambed sediments, the average slope of  the stream channel, groundwater 

recharge, hydraulic radius, and in-stream water velocity (see Chapter 2). Within these variables, 

ksat,  slope,  and  groundwater  recharge  are  constant  throughout  the  year  but  vary  spatially,  see 

details in Chapter 2. Hydraulic radius (channel area/wetted perimeter of the stream channel) and 

velocity vary across the season, see calculation in section 3.3.4.2. More detailed descriptions for 

deriving these terms are seen Text A2.3: Spatial Distribution and Derivation for Instream and Lake 

Losses.  

The  other  in-stream  attenuation  process,  biological  uptake,  is  considered  to  be  a  function  of 

seasonally-variable instream travel time and is calibrated with seasonal parameters (rbio). Lake 

attenuation  was  modified  from  (Chapter  2)  where  I  considered  it  a  function  of  the  intersected 

distance when nutrients  are transported through the river and connected lakes. Here, I used the 

seasonally-varied settling time with a scalar parameter (tsettl) instead, see details in section 3.3.4.2.  

The seasonal model output is a map, at 120m resolution, of the total nutrient yield per cell per day, 

in this case, delivered to the GL coastline. These can then be used to compare nutrient fluxes (and 

in-stream  concentrations)  across  these  varied  seasons  but  do  not  fully  describe  nutrient  load 

variability throughout the year. In other words, this is a seasonal model of hydrologic conditions, 

49 

 
 
not  a true daily time-varying  nutrient  flux  estimate.  However, it  can  still  provide more precise 

information regarding when and where nutrient reduction should be targeted.  

3.3.3.2: Model Calibration and Optimization   

There are eleven parameters in total in the updated SENSEflux seasonal models. See more details 

about  these  parameters,  function,  and  their  types  (i.e.,  exponential,  or  linear)  in  table  A3.1. 

Generally, we assume that seven parameters (f, ExH, SepEff, fstor, bst, bg, bse, dnsp) do not exhibit 

seasonal variations, and the remaining three vary across seasons: bs, rbio, and tsettl. I assumed 

that subsurface, groundwater dependent, and harvest parameters would stay the same throughout 

the year. In preliminary calibrations, we initially allowed bst to vary seasonally, but it showed little 

sensitivity so was set to be uniform across seasons. Thus, it’s assumed that the rates of nutrient 

loss in tile drain field pathway would be relatively similar across season (depending on seasonal 

nutrient mobility). Similarly, for dnsp during the river and lake attenuation, it did not show much 

sensitivity  during the  preliminary  model runs, so it’s assumed to  be  stable throughout the year 

while the biological uptake (rbio) and the settling time (tsettl) parameters varied seasonally. All 

parameters, both seasonally variable and annually fixed, are optimized except for septic removal 

efficiency (SepEff), which is set to 0.3 for N and 0.35 for P as the same with SENSEflux annual 

models in Chapter 2.  

SENSEflux  models  are  calibrated  with  seasonal  observed  concentrations  or  fluxes.  Here, 

concentrations were selected because it improved models’ sensitivity to the attenuation terms, and 

modeled  fluxes  are  dominated  by  the  estimated  or  measured  flow  term  when  loads  were  used. 

After calibration, loads were used to assess model performance and present model results so that 

the results are comparable to those from other nutrient models in the literature. Seasonal TN and 

TP concentrations are compiled from the STORET database, see data processing in section 3.3.4.1. 

50 

 
 
SENSEflux models for TN and TP were calibrated separately. For each season of TN or TP, we 

used a random sampling approach to select 70% of the sites (Figure A3.0.3) for calibration and the 

rest (30%) for validation.  

For model optimization, the objective function (f(c)) was to minimize the mean absolute difference 

between log 10 of the simulated concentration (Csim) and log 10 of observed (Cobs) concentrations, 

see equation 3.3. Specifically, three (annual, melt and baseflow) models for each nutrient were run 

at the same time. The parameter values for each model were fitted using the MATLAB function 

fmincon,  which  uses  the  'interior-point'  algorithm  (Byrd  et  al.,  1999,  2000).  The  sum  of  the 

objective function from each season (annual, melt and baseflow) were used to access TN and TP 

model performance (equation 3.4). 

𝑓(𝑐) = 𝑚𝑒𝑎𝑛(𝑎𝑏𝑠(𝑙𝑜𝑔(𝐶𝑠𝑖𝑚) − 𝑙𝑜𝑔(𝐶𝑜𝑏𝑠)))       

 (3.3) 

𝑓(𝑐)𝑐𝑜𝑚𝑏𝑖𝑛𝑒𝑑 = 𝑓(𝑐)𝑎𝑛𝑛𝑢𝑎𝑙 + 𝑓(𝑐)𝑚𝑒𝑙𝑡 + 𝑓(𝑐)𝑏𝑎𝑠𝑒𝑓𝑙𝑜𝑤 

(3.4) 

Model residual (error) was calculated using equation 3.5 below. Thus, a residual of 1 indicates an 

order magnitude overprediction by the SENSEflux model. Positive errors indicate overprediction 

(ratio  of  model-simulated  to  observed  is  >  1)  by  the  model,  whereas  negative  errors  represent 

underpredictions (ratio of model-simulated to observed is < 1).  

𝐸𝑟𝑟𝑜𝑟 = 𝑙𝑜𝑔(𝐿𝑠𝑖𝑚) −  𝑙𝑜𝑔 (𝐿𝑜𝑏𝑠)     

(3.5) 

3.3.4: Seasonal and Static Model Inputs 

Based on SENSEflux model modifications from annual to seasonal introduced in previous section, 

additional seasonal inputs are needed for calibration and optimization. These include (1) the major 

update – surface and groundwater seasonal mobility; (2) seasonal hydraulic radius and velocity 

that are required to derive streambed exchange rate; (3) in-stream travel time; and (4) settling time 

in  lakes.  Groundwater  recharge,  slope  and  hydrologic  conductivity  that  are  used  to  calculate 

streambed exchange rate, overland flow length, and tile drainage remain the same as the annual 

51 

 
 
 
 
 
models in Chapter 2.  

3.3.4.1: Seasonal Mobility Derivation  

Seasonal mobility inferred from hydrologic flux variability is the primary update of SENSEflux 

seasonal models from annual models for comparing nutrient concentrations and fluxes with yearly 

conditions. I first employed the streamflow Hydrograph Separation Program (HYSEP) developed 

by USGS to split the streamflow into baseflow (groundwater discharge) and surface runoff. During 

this  process,  the  sliding  window  method  (Sloto  &  Crouse,  1996)  was  used  for  hydrography 

analysis to separate baseflow for each gauge. I separately calculated average baseflow during the 

Melt,  Baseflow  (here,  denoting  the  late  summer/early  fall  event  period),  and  Annual  periods. 

Surface runoff and baseflow for each period were then used to calculate surface water mobility 

(Ssm) and groundwater mobility (Gsm). Surface seasonal mobility (Ssm) are then calculated as the 

ratio of median surface runoff during the season (i.e., Melt, Baseflow) and Annual median surface 

runoff.  Similarly,  groundwater  seasonal  mobility  (Gsm)  are  computed  as  the  ratio  of  median 

baseflow during the season (i.e., Melt, Baseflow) and Annual median baseflow. See an example 

seasonal mobility calculation for USGS station (ID:04112500) in Figure A3.0.6. 

To calculate seasonal mobility, the following three steps are used for Gsm and Ssm separately. First, 

I applied HYSEP to separate streamflow from each USGS streamflow gauge in the Great Lakes 

Basin, including a 10km buffer, into baseflow and surface runoff components. Here, I modified 

the default HYSEP exponent parameter from 0.2 to 0.3, based on exploratory plotting to better 

estimate baseflow for this region. Next, I computed the median baseflow at each gauge and surface 

runoff for our Melt, Baseflow, and Annual periods. Gsm at each stream gauge for the Melt period 

is defined as the ratio of median baseflow during Melt to median Annual baseflow. Likewise, for 

Gsm during the Baseflow period (note the use of capitalization to indicate the Baseflow period, 

52 

 
 
separate from generic baseflow). Ssm was similarly calculated as the Melt/Baseflow median surface 

runoff  ratio  to  median  Annual.  Finally,  outliers  for  Gsm  and  Ssm  were  removed  based  on  their 

distributions. For Melt, Gsm is limited to 1-5 (Ssm: 1- 20). For Baseflow, Gsm and Ssm are between 0 

and 1.  

I then developed a random forest regression model to predict Gsm and Ssm at the HUC12 scale for 

all such watersheds across the US Great Lakes Basin. I chose the HUC12 scale to represent a fine 

spatial  scale  within  which  watershed  characteristics  are  somewhat  uniform.  I  first  trained  the 

model using individual stream gauge stations and then applied this to the HUC12s in the Basin. 

This model includes three input variables: drainage basin size, hydrologic landscape region (HLR) 

(Winter, 2001), and drift thickness from the USGS (Soller et al., 2012), see Figure A3.0.5a&b. To 

train this random forest model, I used the USGS-reported basin size for the stream gauge locations 

and  extracted  the  HLR  value  and  drift  thickness.  I  then  used  HUC12  watershed  size,  majority 

HLR, and mean drift thickness for prediction at HUC12 scale. The point-scale model with three 

variables provided reasonable seasonal mobility estimation, with r2 = 0.89 during Melt (r2 = 0.85 

during Baseflow) for Gsm (r2 = 0.86 and 0.87 for Ssm). Finally, the random forest regression model 

was used to predict Gsm and Ssm across the basin at Hydrologic Unit Level 12 (HUC12). Figure 

A3.0.7 depicts the maps for seasonal mobility.  

3.3.4.2: Seasonal In-Stream and Lake Attenuation: Hydraulic Radius, Velocity, In-stream Travel 

Time and Settling Time for Lakes  

Stream  attenuation  includes  water  column  loss,  sediment  interface,  and  hyporheic  zone  losses. 

Water column loss is mainly due to biological uptake, and it is a function of in-stream travel time 

across the seasons, see instream travel time during melt and baseflow in Figure A3.0.5c&d. Like 

overland  flow  length,  seasonal  in-stream  travel  time  was  calculated  using  NED  DEM  and  the 

53 

 
 
flowlength function within the ArcGIS Hydrology Toolbox. In this case, a weight raster (1/seasonal 

velocity) was specified. The data of seasonal velocities were computed from the USGS gauge site 

and the method in (Mooney et al., 2020).  

The sediment interface and hyporheic zone losses are determined by a streambed exchange rate 

derived from the streambed sediments' hydraulic conductivity (K), the slope of the stream channel 

(S), and hydraulic radius and in-stream average velocity. See details for this streambed exchange 

rate derivation in Chapter 2. Here for seasonal simulation, I updated the streambed exchange rate 

with seasonal, varied hydraulic radius and velocity, while K, S, and Recharge maintained the same 

throughout the year. More specifically, recharge was assumed by the basin yield of streams at their 

30th percentile, K and S are calculated from NED DEM and the gSSURGO soil dataset.  

Seasonal hydraulic radius was computed for each USGS stream gauge along the stream channel. 

First,  flows  and  geometries  for  USGS  gauge  stations  (i.e.,  width,  basin  area)  are  downloaded. 

Assuming the stream channels are rectangle, I calculate the river depth (D) as area (A) dividing by 

width (W). Hydraulic radius (R) is calculated as the ratio between channel area (A) and the wetted 

perimeter of the stream channel (P) where P = 2D+W.  

Seasonal SENSEflux models represent lake attenuation with a simple uptake term proportional to 

the setting time of nutrient flow paths crossing lacustrine-classified wetlands. The surface area of 

the lake was derived from the National Wetland Inventory (NWI) lacustrine-class wetlands. Then, 

seasonal hydraulic loadings were calculated by dividing seasonal streamflow by surface area. As 

settling time, the inverse of hydraulic loading is then calculated. 

3.3.4.3: Static Landscape Inputs 

Static  inputs  include  (1)  nutrient  source  inputs;  (2)  groundwater  recharge;  (3)  hydraulic 

conductivity (ksat) of the streambed sediments and the average slope of the stream channel (slope); 

54 

 
 
and  (4)  overland  flow  length  and  tile  drainage.  These  inputs  are  the  same  as  those  used  in  the 

annual SENSEflux models (Chapter 2), and thus the same across both hydrologic season models. 

Seven spatially explicit nutrient sources from SENSEmap (Hamlin et al., 2020a) serve as nutrient 

inputs  for  the  SENSEflux  model.  These  sources  include  three  agricultural  terms  (chemical 

agricultural  fertilizers,  manure,  and  N  fixation),  three  urban  terms  (chemical  non-agricultural 

fertilizers, septic tanks, and point sources), and atmospheric deposition. See maps in Chapter 2.  

Groundwater recharge was estimated using a series of linear models derived from the Landscape 

Hydrology  Model  (LHM),  which  combines  various  GIS  layers  to  predict  stream  discharge, 

groundwater  recharge,  and  evapotranspiration  (Hyndman  et  al.,  2007).  In  SENSEflux, 

groundwater  recharge  determines  the  fraction  of  nutrient  transport  between  surface  and 

groundwater  pathways.  It  also  supports  streambed  exchange  rate  calculation  for  nutrient  river 

attenuation, along with two additional factors ksat and slope.  

The overland flow length is calculated using the flowlength function and DEM. The tile drainage 

layer was derived as an alternate overland flow pathway, assuming cropland areas with moderately 

low soil permeability and low average slopes are likely tile-drained. In addition to two independent 

parameters (bs (seasonally-varied) and bst), the overland flow length and tile drainage layer are 

static landscape factors that support basin nutrient attenuation.  

3.4: Results 

3.4.1: Model Accuracy  

Six SENSEflux TN and TP models with optimized parameters (Table A3.1) provided estimates of 

TN and TP fluxes to the Great Lakes from their US drainage basins. These models performed well 

(Figure 3.1), and TN models performed better than their TP counterparts, indicated by higher R 

square  values  except  for  TN  calibration  dataset,  with  a  difference  of  -3%,  12%,  and  9%  for 

55 

 
 
calibration  (20%,  14%,  and  13%  for  validation)  in  annual,  melt,  and  baseflow.  Six  models 

corresponded reasonably well to the 1:1 line (black dashed line in Figure 3.1), with TN and TP 

models slightly overestimating low loads and underestimating high loads, respectively. Also, the 

slopes of the line of best fit are near 1 (0.7 to 0.96 for TN and 0.73 to 0.86 for TP), indicating a 

reasonable bias. In addition, the MAEL values for annual and baseflow are comparable for the TN 

and TP models, but the TN snowmelt model has lower MAEL values (~0.2) than the TP snowmelt 

model, making it the best-performing model of the six.  

The median of model residuals (common logarithm of simulated load minus observed load) for 

calibration  and  validation  watersheds  are  shown  in  Figure  A3.0.8.  Residual  analyses  reveal  no 

significant (P > 0.05) bias for the TN snowmelt model and TP annual model based on a one-sample 

t-test; however, the mean residuals of the TN annual model, the TP melt model, and the baseflow 

models  for  TN  and  TP  are  significantly  different  from  zero  (P  <  0.05).  The  residuals  (Figure 

A3.0.9) are significantly (P < 0.01) clustered with the Moran’s Index of 0.42, 0.30, and 0.43 for 

TN annual, snowmelt, and baseflow models, respectively (0.41, 0.37, and 0.50 for TP models).  

56 

 
 
 
Figure 3.1 Plot of log10 simulated and observed daily loads for model calibration and validation 
datasets. Solid red and red lines indicate linear regression fits, the dashed black lines are the 1:1 
line, and n refers to the number of observation points for model calibration and validation.  

3.4.2: Seasonally Variable and Spatially Explicit Nutrients Delivery to Lakes 

Simulated  TN  and  TP  flux  within  the  USGLB  varied  substantially  across  the  seasons.  For  TN 

57 

 
 
 
(Figure 3.2a), simulated loading across the lake basin and in USGLB are highest in the snowmelt 

season, followed by annual condition and baseflow. There is 1063 t/day of nitrogen delivered from 

US lands to the lakes during snowmelt, and this is about 2.8 times greater than the annual delivery 

(378 t/day) and 16 times greater than the TN delivery during baseflow (66 t/day). This is largely 

contributed by the Lake Erie basin where 4.5 times greater TN was delivered during snowmelt 

than annual conditions. For other lake basins, TN delivery is the highest during snowmelt, followed 

by annual and baseflow.  

TP has a unique seasonal pattern compared to TN (Figure 3.2). Simulated export of TP across the 

lake  basin  and  in  the  USGLB  is  the  greatest  under  annual  conditions  (28  t/day),  followed  by 

snowmelt  (21  t/day)  and  baseflow  (7  t/day).  This  is  somewhat  unexpected  because  of  greater 

seasonal  mobility  during  snowmelt,  though  the  difference  between  snowmelt  and  annual 

conditions is not large. The unique pattern for TP is likely due to  the dominance of particulate 

phosphorus relative to soluble reactive phosphorus (SRP) in aquatic environments such as rivers 

and lakes. And particulate phosphorus is typically associated with sediment, organic matter, and 

other particles suspended in the water column. Unlike TN, nitrate is typically the most abundant 

form  in  aquatic  systems,  and  its  high  solubility  in  water  makes  it  readily  transportable  in  the 

environment.  In  addition,  the  SENSEflux  TP  annual  model  overpredicts  and  the  TP  snowmelt 

model underpredicts significantly (Figure A3.0.9).  

Due to its drainage falling entirely within the USGLB, Lake Michigan receives the most TN (156 

t/day) and TP loads (12 t/day) on an annual basis compared to other lake basins. Lake Michigan 

has  the  highest  phosphorus  delivery  on  an  annual  basis  and  across  two  hydrological  seasons 

(baseflow and snowmelt). Lake Erie receives the highest TN delivery during snowmelt (453 t/day) 

and baseflow  (25 t/day), followed by Lake Michigan (334  t/day during snowmelt and  24 t/day 

58 

 
 
during baseflow). This is most likely because the western basin of Lake Erie has a relatively flat 

topography  with  minimal  relief.  This  region  has  generally  shallow  groundwater,  making  it 

susceptible to nutrient contamination during baseflow.  

Figure 3.2 Summarized TN and TP flux (a&b) within USGLB and flux (c&d) by lake basin within 
USGLB. 

59 

 
 
 
 
In addition, I calculated nutrient yields across lake basins, which are nutrient fluxes normalized by 

drainage  basin  area  (Figure  3.3  a&b).  Lake  Erie  has  the  highest  TN  yield,  followed  by  Lakes 

Michigan, Ontario, Huron, and Superior, with 1.9 kg/km2/day on an annual basis, and 8.5 and 0.5 

kg/km2/day for snowmelt and baseflow. Lake Superior has the lowest yield of total nitrogen among 

the five lake basins regardless of its annual condition or hydrologic seasons. The annual TN yield 

for  Lakes  Michigan,  Ontario,  and  Huron  is  between  1.18  and  1.39  kg/km2/day,  2.89  to  3.32 

kg/km2/day during snowmelt, and 0.14 to 0.26 kg/km2/day during baseflow. Lake Erie, like TN, 

has the greatest TP yield. Lakes Ontario, Michigan, and Huron follow with ~0.1 kg/km2/day on an 

annual  basin,  ~0.07  kg/km2/day  for  snowmelt  and  ~0.02  kg/km2/day  for  baseflow.  This  is 

consistent  with  Lake  Erie’s  presence  of  HABs  and  its  overall  low  water  quality  (Michalak  et 

al.,2013). Regardless of its annual conditions or hydrologic seasons, Lake Superior has the lowest 

TP yield among the five lake basins. 

Figure 3.3 Predicted (a) TN and (b) TP yields (kg/day/km2) to lakes in annual, melt, and baseflow. 
(c)&(e)&(g)  are  spatially  TN  yields  for  annual,  melt  and  baseflow  with  a  shared  legend 
((d)&(f)&(h) for TP). These maps are direct outputs from SENSEflux, then resampled to 720 m 
resolution  for  display  purposes  and  classified  in  quantiles,  within  which  each  color  represents 
~20% of the USGLB area. 

60 

 
 
 
 
Figure 3.3 (cont’d) 

We also found that simulated export of TN and TP varied substantially across seasons within the 

USGLB,  with  the  highest  TN  export  in  snowmelt  and  the  lowest  TN  and  TP  delivery  during 

baseflow (Figure 3.3, c-h). During snowmelt, most of Lake Ontario and Erie basin and the southern 

part of Lake Michigan and Huron basin exported higher than 641 kg/yr/km2 for nitrogen and 28.92 

kg/yr/km2  for  phosphorus.  While  in  baseflow,  most  of  these  areas  exported  lower  than  267 

kg/yr/km2 for nitrogen and 9.93 kg/yr/km2 for phosphorus.  

Overall, snowmelt increased TN flux to the Great Lakes while fewer TN and TP were delivered 

during  baseflow  compared  with  annual  conditions.  The  highest  nutrient  flux  from  the  Lake 

61 

 
 
Michigan basin is because of its largest drainage basin size, while Lake Erie receives the highest 

nutrient yield (flux per area).  

3.4.3: Nutrient Sources and Pathways Across Hydrologic Seasons 

The source contribution was highly variable across seasons for TN and TP with USGLB (Figure 

3.4).  Agricultural  sources  (chemical  agricultural  fertilizer,  manure,  and  N  fixation)  were  the 

dominant TN source annually (42%), and 66% and 52% for snowmelt and baseflow. Moreover, 

chemical agricultural fertilizer, manure, and N fixation contributed 8.2%, 2.8%, and 3% higher 

during  melt  than  baseflow.  Agricultural  sources  (chemical  agricultural  fertilizer  and  manure) 

accounted for 62% annually and 59%  and 54%  during melt  and baseflow for phosphorus. The 

contribution of agricultural chemical fertilizer and manure for TP was substantially higher (3.6% 

and 1.5%) during melt than baseflow.  

Atmospheric deposition and urban sources (point source, chemical non-agricultural fertilizer, and 

septic  tanks)  were  also  crucial  to  TN  and  TP  delivery.  Within  this,  atmospheric  deposition 

contributed  more  significantly  to  TN,  while  urban  sources  dominated  TP.  Remarkably, 

atmospheric  deposition  contributed  the  same  with  agricultural  sources  for  TN  delivery  and 

accounted for 41% annually, but much less than agricultural sources during snowmelt (25%) and 

baseflow (14%). Urban sources accounted for 17%, 9%, and 34% of annual, melt, and baseflow, 

respectively. However, these have been found differently for TP delivery. Urban sources (point 

source,  chemical  non-agricultural  fertilizer,  and  septic  tanks)  play  a  more  critical  role  than 

atmospheric deposition for phosphorus, with 27%, 30%, and 38% for annual, melt, and baseflow, 

respectively,  comparing  11%,  11%,  and  8%  of  TP  sources  coming  from  the  atmospheric 

deposition. Notably, point sources played a significant role in TN and TP delivery, particularly 

during baseflow with 23% of TN and 20% of TP coming from point sources, which is also shown 

62 

 
 
in the pathways (Figure 3.5) Results also show that septic tanks are a more significant source of 

TP than TN, contributing 6% (annually), 6% (snowmelt) and 3% (baseflow) higher TP than TN.  

Figure 3.4 Estimated percent of nutrients delivered to the USGLB by annual, melt, and baseflow 
sources. 

63 

 
 
 
 
Figure 3.5 Estimated percent of total nutrients in USGLB delivered to lakes by pathways. Points 
are drawn for annual, melt, and baseflow with different colors, with the difference highlighted with 
segments. Note: groundwater pathway in this diagram indicates bulk groundwater transport. 

Surface pathway (overland flow and tile fields) was the major nutrient transport pathway, totally 

contributing to 88% (66%), 94% (65%), and 63% (52%) of TN (TP) delivery during annual, melt 

and baseflow (Figure 3.5). Within this, TN was dominated by the tile drainage pathway while more 

64 

 
 
 
TP was transported through overland flow regardless of the seasons, except there are 37% more 

TN was transported by overland on an annual basis. In addition, subsurface pathways, including 

bulk  groundwater  and  septic  plumes,  played  a  significant  role,  especially  for  TP.  For  instance, 

septic plumes transported 9 to 12% of TP, and bulk groundwater transported about 12 to 16% of 

TP across the seasons.  

In summary, the relative contributions of each source and pathway to the total TN and TP delivered 

to the Great Lakes were highly variable seasonally within USGLB, with a higher proportion of 

nutrients dominated by agricultural sources delivered through surface pathways (tile drainage (TN) 

and  overland  runoff  (TP))  and  more  nutrients  coming  from  urban  sources  delivered  through 

subsurface pathways (groundwater and septic plumes) during baseflow.  

3.4.4: Seasonal Delivery Hotspots 

The  spatially  explicit  estimates  of  annual  and  seasonal  nutrient  deliveries  provide  a  novel 

landscape perspective:  seasonal  delivery hotpots. The hotpot incorporates seasonal  delivery (x-

axis in Figure 3.6) and the delivery ratio, which is defined as the ratio of snowmelt and baseflow 

deliveries  to  annual  deliveries  (y-axis).  The  seasonal  delivery  and  delivery  ratio  are  quantified 

using terciles and presented in bivariate colormaps for melt and baseflow (Figure 3.6). Targeting 

nutrient reductions during snowmelt is more effective than focusing on baseflow, especially for 

TN.  

During the snowmelt season, locations with both high delivery (highest 33%) and delivery ratio 

(highest  33%)  are  the  most  intense  delivery  area  to  the  coastline  (Figure  3.6,  a&b).  These  are 

predominantly agricultural regions, particularly for TN, and include the western Lake Erie basin, 

coastal  areas  in  western  Lake  Michigan,  and  thumb  areas  in  Michigan.  Targeting  nutrient 

reductions  in  these  regions  during  the  snowmelt  season  will  reduce  the  summertime  nutrient 

65 

 
 
concentration in these basins and the Great Lakes most effectively. Next, areas with low delivery 

(lowest 33%) but high delivery ratio (highest 33%, colored magenta on the hotspot maps) should 

be targeted for reductions during the snowmelt season. These areas are primarily located in the 

northern basin. 

Figure  3.6  Seasonal  nutrient  delivery  hotspots  are  determined  as  high  delivery  (x-axis)  and 
delivery ratio on hydrologic seasons compared with average annual condition (y-axis) in bivariate 
choropleth maps for TN (a&c) and TP (b&d) during snowmelt and baseflow. BF represents the 
baseflow season.  

During  baseflow,  the  delivery  ratios  for  TN  and  TP  are  less  than  one,  indicating  that  nutrient 

reduction  efforts  during  baseflow  would  have  been  less  effective  than  during  annual  average 

conditions. Thus, the optimal location for nutrient reductions during baseflow would be those with 

high nutrient delivery (highest 33%, colored as green).  

3.5: Discussion 

SENSEflux  annual  model  performances  are  comparable  to  the  SPARROW  models  for  the  US 

66 

 
 
 
Great Lakes Basin presented by (D. M. Robertson et al., 2019) (Figure A3.0.10). Lake Michigan 

had the highest  nutrient  flux because of the largest  drainage basin  size, consistent with (D. M. 

Robertson et al., 2019). The ranking of the highest yield across lake basins varied across seasons. 

On an annual basis, the Lake Superior basin and northern Lake Huron basin had much lower TN 

and TP delivery, likely because of low agricultural activity and sparse population. In addition, TN 

and TP yields are relatively high in the southern Lake Michigan basin, Saginaw Bay, and Lake 

Ontario  basins,  likely  due  to  high  chemical  fertilizer  and  manure  inputs  and  high  groundwater 

recharge.  

Lake Erie had the highest TN and TP yields, and this could be due to many reasons. There is a 

high density of agriculture with high tile drainage intensity in the Lake Erie basin,  agricultural 

sources teeming with nitrogen and phosphorus, which spurs algae and bacteria growth and directly 

stimulates the hypoxia and dead zone in Lake Erie. Besides, Lake Erie is the southernmost and 

warmest of the Great Lakes (Bockheim, 2020). Earlier snowmelt in the late winter and early spring 

increased nutrient flushing rate, with high nutrient delivery during snowmelt while relatively low 

TN and TP exports during baseflow (Figure 3.3, e-h). Thus, nutrient loadings are considered the 

primary driver of HABs in Lake Erie, associated with the eutrophication of Lake Erie (Michalak 

et al., 2013; Scavia et al., 2014; Watson et al., 2016). This information provides valuable timing 

information for targeting the total flux/yields to the Great Lakes from the lake basin level.  

3.5.1: Seasonal Critical Sources and Pathways 

Point sources contributed a higher proportion of TN and TP delivery to the Great Lakes (Figure 

3.5),  particularly  during  baseflow,  this  is  likely  due  to  the  pollutants  being  concentrated  more 

during a low flow condition. Thus, it’s important to monitor and regulate point source discharges 

to  ensure  that  they  are  meeting  established  standards.  Non-point  source  pollution  is  still  a 

67 

 
 
significant  concern as it stems  from  many different  sources and locations. Agricultural  sources 

(chemical  agricultural  fertilizer,  manure,  and  N  fixation)  were  the  leading  cause  of  nutrient 

pollution across the USGLB, which is consistent with other studies (Richards et al., 2012; D. M. 

Robertson et al., 2019).  

However,  agricultural  sources  were  not  always  the  leading  cause  of  water  pollution  across  the 

Great  Lake  basins.  For  instance,  atmospheric  deposition  was  the  primary  TN  and  TP  source 

(Figure  A3.0.11)  in  Lake  Superior.  Therefore,  it  is  crucial  to  regulate  atmospheric  deposition, 

especially for watersheds forested with little agricultural or industrial activities (Foley & Betterton, 

2019).  Many  nutrient  sources  and  loading  models  do  not  specifically  identify  atmospheric 

deposition  of  TP,  instead  attributing  to  the  forest  and  urban  area  sources  and  other  terms  in 

agricultural areas like in the USGS SPARROW model (Richards et al., 2012; D. M. Robertson et 

al., 2019; R. A. Smith, Schwarz, et al., 1997), thus may lead to underestimation of TP loads in 

such models. 

Besides,  agricultural  sources  do  not  always  remain  stationary  on  the  landscape  where  they  are 

applied. Snowmelt increased TN delivery to the Great Lakes (Figure 3.4). Due to heavy snowfalls 

in the winter and early spring, the rapid release of water from melting snow washing manure and 

commercial fertilizer flow down the gradient through overland flow and tile fields into the streams 

and rivers (Figures 3.4 and 3.5). These are consistent with research that has found nitrogen and 

phosphorus  exports  are  strongly  associated  with  seasonal  hydrology,  such  as  snowmelt  runoff 

being a critical period of nutrient export in a cold climate (Clement & Steinman, 2017; Corriveau 

et al., 2013; Rattan et  al., 2017, 2019). Thus, it is critical to manage the timing and amount of 

fertilizer applications (T. Guo et al., 2021). 

In  addition,  agricultural  tile  drainage  has  been  recognized  as  one  of  the  dominant  TN  delivery 

68 

 
 
pathways (Figure 3.5). Nutrients can quickly move to tile drains through preferential flow paths, 

cracks, and fissures in the soil, root channels, and other macropores and escape from the field. 

Considering  much  of  the  most  productive  farmland  and  extensive  use  of  tile  drainage  in  the 

USGLB, and climate change with warmer temperatures and more extreme events, it is significant 

to  determine  the  timing  of  nutrient  application  and  implement  cost-effective  drainage  water 

management. 

Lastly, septic tanks accounted for 12% of TP delivery during baseflow (Figure 3.4). Research has 

found that a significant proportion of nutrient delivery comes from septic tank systems under low 

flow conditions and thus causes eutrophication risk (P. J. A. Withers et al., 2012). Furthermore, 

current regulations and management are insufficient to effectively guarantee the septic systems' 

function (P. J. Withers et al., 2014). For instance, water pollution from septic tanks is a severe but 

under-appreciated problem across Michigan, where 10% of on-site wastewater treatment systems 

have failed and are polluting the environment. Thus, it is critical to regulate on-site septic systems 

and establish a statewide code governing how they are designed, installed, and maintained in states 

like Michigan, thus protecting public health, safety, and welfare. 

3.5.2: Implications, Limitations, and Future Work 

There is great interest in reducing nutrient loading to the surface and groundwater. Here I present 

a  spatially  explicit,  regional  analysis  of  seasonal  nutrient  export  that  I  calibrated  with 

comprehensive  TN  and  TP  concentration  measurements.  SENSEflux  uses  spatially  explicit 

nutrient sources and seasonal explicit environmental factors and provides insights into seasonal 

aspects  of  TN  and  TP  loading,  sources,  and  pathways  with  USGLB,  thus  improving  our 

understanding of the timing and extent of hypoxic areas and algal blooms related to coastal nutrient 

loadings,  consequently,  allows  us  to  optimize  nutrient  reduction  strategies.  This  work  has 

69 

 
 
demonstrated that SENSEflux has the potential to perform well in other regions which are facing 

water pollution challenges. Moreover, it could be applied for other periods, such as the summer 

rainy  season,  to  understand  seasonal  nutrient  transport  more  accurately.  The  spatially  explicit 

loading,  source,  and  pathway  estimates  for  USGLB,  along  with  findings,  can  be  easily 

communicated with stakeholders through a web-based support system such as the Tipping Point 

Planner program (TPP, tippingpointplanner.org) as they can access the results and explore details 

of SENSEflux outputs without the assistance of technical experts or special programming training 

(Weinstein  et  al.,  2021).  TPP  links  data  to  local  decision-making  processes.  Thus,  adding 

SENSEflux  outputs  will  help  managers  prioritize  their  actions  in  a  time-  and  location-based 

manner and move toward the desired direction.  

However,  I  acknowledge that there are some limitations in  our study. Firstly,  groundwater has 

been proven to be a vital nutrient transport pathway, although it is often ignored in most existing 

models. SENSEflux uses an overland flow length to describe groundwater attenuation and uses a 

surface watershed boundary to determine nutrient inputs to the groundwater pathway. Including 

factors  such  as  groundwater  travel  time  would  improve  load,  source,  and  pathway  predictions, 

especially  for  ground  watersheds  with  significantly  different  boundary  conditions  from  surface 

watersheds. Besides, as I don’t have seasonal nutrient source inputs such as allocating fertilizer 

and manure to seasons based on phenology and fertilizer practice, SENSEflux simulated seasonal 

nutrient loading to the lakes using the annual average.  

For future research, establishing robust monitoring networks such as sampling throughout the non-

growing season, particularly the winter and spring thaw periods, is critical to truly understanding 

the timing and quantity of nutrient loss. Besides, nutrient transport models could be refined by 

considering climate, cropping characteristics, and regional water quality targets. Climate change 

70 

 
 
has  been  playing  an  essential  role  in  water  and  nutrient  transport.  As  climate  change  brings 

warming  temperatures  and more frequent  and intense storm  events,  there is  a need to  consider 

climate change as a factor when planning future nutrient pollution programs to track changes over 

a targeted time frame. In addition, human activities have been altering hydrological processes and 

transporting  nutrients  as  well,  thus  explicitly  considering  and  evaluating  the  effectiveness  of 

constructed wetlands, stream buffers, agricultural practices (i.e., cover crop, conservation tillage), 

nutrient management, which could further help reduce nutrient losses (Lintern et al., 2020). These 

studies need to be ongoing to help practical management efforts at improving watershed health 

and sustainability.  

3.6: Acknowledgments 

This work was supported by NASA grants 80NSSC17K0262 and NNX11AC72G, NOAA Grant 

NA12OAR4320071,  and  GS100013.  Luwen  Wan  was  partially  supported  by  the  China 

Scholarship  Council 

(CSC)  scholarship.  The  statements, 

findings,  conclusions,  and 

recommendations  are  those  of  the  authors  and  do  not  necessarily  reflect  the  views  of  NASA, 

NOAA, or CSC. 

71 

 
 
 
CHAPTER 4: SEETILEDRAIN: SPATIALLY EXPLICIT ESTIMATE OF TILE DRAINAGE 

USING REMOTE SENSING AND MACHINE LEARNING 

4.1: Abstract 

Tile  drainage  has  become  increasingly  prevalent  in  the  US  Midwest  due  to  agricultural 

intensification and extreme climate events. This region would not be one of the most productive 

agricultural areas without its drainage systems. However, the existing tile drainage data are limited 

to  survey-based statistics at  the county level  and likely  tile-drained areas  that  are  mapped with 

geospatial  analysis.  The  lack  of  spatially  explicit  and  fine-resolution  information  about  tile 

drainage extent limits  the accuracy of hydrologic modeling  and the implementation of nutrient 

reduction strategies. To fill this gap, I developed a machine-learning model that maps agricultural 

tile drainage across the US Midwest at a 30-m resolution. The model used 20 satellite-derived and 

environmental variables and was trained with 43,165 tile and non-tile ground truth points based 

on the Google Earth Engine cloud-computing platform. The results show that our model achieved 

good accuracy, with 95.5% of points classified correctly and an F1 score of 0.904. The classified 

tile drainage area at the county scale has a reasonable agreement (r2 = 0.69) with the reported area 

from  the  USDA-NASS.  I  also  used  variable  importance  metrics  and  accumulated  effects  to 

interpret the machine learning model. Land Surface Temperature  (LST) variables, and  climate- 

and  soil-related  variables  are  the  most  important.  The  top-ranking  variable  is  the  median  of 

nighttime  LST  during  the  summer  based  on  the  overall  importance,  followed  by  median  soil 

moisture percent in the summer. The results may provide useful information for tile monitoring 

and may improve the accuracy of hydrologic and nutrient  simulations to achieve cost-effective 

agricultural  water  and  nutrient  management.  Furthermore,  the  machine  learning  algorithms 

developed in this study could be applied to other remote sensing mapping applications. 

72 

 
 
4.2: Introduction 

The use of drainage systems to remove excess water and promote crop productivity is widespread 

in poorly drained and humid regions worldwide where experience high levels of precipitation and 

have relatively high-water tables, such as Europe and North America. Denmark, for example, has 

artificially drained around 50% of its croplands, with most relying on tile drains (Møller et al., 

2018). Similarly, 14% of agricultural land in Germany is tile-drained (Hirt & Volk, 2011). Canada 

and the United States have 14% and 27%, respectively, of their farmland artificially drained, using 

either surface or subsurface drainage (ICID, 2018; Kokulan, 2019). The investment in tile drainage 

is largely driven by the growing occurrence of heavy rainfall events and the resulting benefits of 

tile drainage installation  to  crop yields as  it provides a  well-drained  environment for the crops 

(King, Williams, Macrae, et al., 2015).  

Tile drainage offers several major benefits for agricultural land systems. For instance, it helps to 

maintain  soil  structure  by  preventing  compaction  caused  by  waterlogged  conditions,  develop 

healthier and deeper root systems, provide moisture conditions for optimal crop growth, extend 

growing  season,  and  utimately  help  farmers  achieve  more  consistent  and  productive  harvests. 

Although tile drainage offers numerous benefits, it also serves as a significant pathway for water 

and  nutrient  transport.  Previous  research  primarily  focused  on  the  hydrological  effects  of  tile 

drainage,  including  evapotranspiration  and  hydrological  flux  metrics  such  as  discharge, 

streamflow index, and stormflow. While some studies have reported minor effects on hydrology 

(Khand et al., 2017; S. A. Miller & Lyon, 2021a), such as a slight decrease in annual cumulative 

evapotranspiration  (Yang  et  al.,  2016),  others  identified  significant  alterations  to  hydrological 

systems  resulting  from  agricultural  tile  drainage  (Thomas  et  al.,  2016).  Tile  drainage  is  an 

important contributor to watershed discharge, accounting for up to 47% of discharge in an Ohio 

73 

 
 
headwater watershed with seasonal variation (King et al., 2014). It also accounts for a substantial 

portion (46-54%) of annual discharge between March and June in the Boone River basin, Iowa 

(Schilling et al., 2019).  

Agricultural tile drainage systems also serve as an essential nutrient pathway, potentially leading 

to eutrophication concerns in receiving water bodies. The prevalence of extensive agricultural tile 

drainage in the Midwestern US has significant implications for surface water nutrient pollution, 

including  the  formation  of  "dead  zones".  Extensive  research  has  been  conducted  on  nitrogen, 

phosphorus,  and  their  species,  such  as  nitrate,  and  it  has  been  established  that  tile  drainage 

amplifies nutrient losses from agricultural lands (King, Williams, Macrae, et al., 2015; Ma et al., 

2023; M. P. Miller et al., 2017; S. A. Miller et al., 2022; Ren et al., 2022). For instance, Chapter 

2&3 of this dissertation found that tile drainage contributes 46% of total nitrogen and 36% of total 

phosphorus  annually  to  the  US  Great  Lakes  Basin,  with  even  greater  nutrient  transport  during 

snowmelt. Other studies have highlighted the crucial role of tile drainage in water and nutrient 

transport in sub-watersheds of the Great Lakes. For instance, (D. Smith et al., 2015) demonstrated 

that  tile  drainage  discharge  was  responsible  for  49%  of  soluble  phosphorus  and  48%  of  total 

phosphorus losses in the St. Joseph River watershed located in northeast Indiana, northwest Ohio 

and  south-central  Michigan,  US  and  accounted  for  approximately  30%  of  nitrogen  (primarily 

nitrate-nitrogen) (Ren et al., 2022). The water quality impacts of tile drainage have raised concerns 

about drinking water safety in Toledo, Ohio, and the large anaerobic zone in the Gulf of Mexico 

due to nutrient loading from the Mississippi River (Rabalais & Turner, 2019; D. Smith et al., 2015). 

Overall, tile drainage's hydrological and water quality impacts are subject to spatial heterogeneity, 

primarily due to the unique characteristics of different watersheds, changing climate patterns, and 

human activities. However, a significant factor contributing to this issue is the need for spatially 

74 

 
 
explicit tile drainage maps, especially at large scales. The lack of tile drainage information and 

spatial variability makes it challenging to study the impacts of tile drainage on crop growth and 

production, hydrological cycle, and nutrient leaching, especially at large scales.  

There are multiple approaches to estimate tile drainage extent, which have typically reported the 

percentage of tile drainage area at the county level or identified probable tile drainage locations. 

The USDA Agricultural Census, for instance, estimates the tile drainage area (identified as 'drained 

by tile') by surveying farmers in the CONUS counties every five years (NASS, 2012, 2017). The 

data are gathered at the county level from questionnaires sent to farmers and the quality of collected 

information depends on a high level of participation and respondents thus might not be consistently 

accurate (Wan et al., 2017). Geospatial analysis has also been used to detect likely tile-drained 

areas, with assumptions that agricultural areas with flat and poorly drained soils are likely to have 

tile drainage installed. (Sugg, 2007) indicates that a combination of soil class and hydrological 

properties of soil and soil drainage class would produce the most accurate estimates for different 

areas, while GIS analysis based on soil and land cover information can help estimate tile drainage 

extent  in  highly  drained  areas.  Several  recent  studies  have  used  soil  drainage  information  and 

topographic slope thresholds to estimate tile drainage areas (Jame et al., 2022; Valayamkunnath et 

al., 2020). However, geospatial analysis can only identify likely tile-drained areas and is dependent 

on estimates from the USDA Census of Agriculture (Ag Census). These estimates result in large 

uncertainties in estimates of the tile-drained extent without spatially explicit information  as Ag 

Census data is collected by county rather than by farm (NASS, 2017). 

There are alternative methods to identify tile drainage include thermal and aerial image processing 

techniques (Naz & Bowling, 2008; Prinds et al., 2019; Tilahun & Seyoum, 2020; Woo et al., 2019). 

For example, researchers mapped individual tile drains and estimated drainage spacing using high-

75 

 
 
resolution aerial imagery within the Hoagland watershed in west-central Indiana (Naz et al., 2009; 

Naz & Bowling, 2008). However, the estimates of tile drainage may not be accurate in the presence 

of crop residue and other spatial features with similar spectral response as tile drainage. Another 

study (Gökkaya et al., 2017) used an image differencing approach to delineate tile drainage area 

for the Shatto Ditch watershed in Indiana by comparing shortwave infrared reflectance (SWIR) 

before and after a ~2.5cm rainfall event, as SWIR is strongly related to soil moisture, and soil with 

tile  drainage  dries  relatively  faster,  thus  has  higher  SWIR.  However,  such  aerial  imagery 

approaches  can  be  expensive,  and  the  image  differencing  approach  is  limited  by  weather 

conditions (i.e., rainfall intensity and cloud cover). Recently, Cho et al. (2019) integrated variables 

from satellite imagery and existing data sources, such as land surface temperature, climate, soil 

moisture, and properties, to identify subsurface drainage in the Red River basin and the Bois de 

Sioux Watershed in Minnesota for multiple years (Cho et al., 2019). 

Currently, there is a lack of spatially explicit and well-validated information on tile drainage extent 

in the US Midwest, which includes ~93% of tile drainage in the United States (NASS, 2017). In 

this study, I used the Google Earth Engine computing platform to map tile drainage based on 2017 

data by combining a range of satellite-derived, climate- and soil-related variables with thorough 

ground  truth  points.  This  resulting  spatially  explicit  tile  drainage  dataset  can  be  employed  in 

hydrological, water quality, and crop modeling, thus supplying environmental managers with vital 

information  to  enhance agricultural  water and nutrient  management practices. Furthermore, the 

machine learning algorithms used in our study can help monitor changes in tile drainage areas over 

time,  setting  a  baseline  for  predicting  future  tile  drainage  installations  in  response  to  climate 

change and human activities. 

76 

 
 
4.3: Data and Methods 

4.3.1: Study Region  

Our study region includes parts of 14 states, with 12 states in the US Midwest: Illinois, Indiana, 

Iowa, Kansas, Michigan, Minnesota, Missouri, Nebraska, North Dakota, Ohio, South Dakota, and 

Wisconsin (Bureau, 2010), and two Great Lakes states, Pennsylvania, and New York. This region 

is characterized by a humid continental climate with warm to hot summers, as noted by (Peel et 

al., 2007). Average annual rainfall in the region generally decreases from east to west, with annual 

average precipitation and evapotranspiration of 860 and 634 mm, respectively (Abatzoglou et al., 

2018).  

The Midwest region boasts deep, fertile soils with high concentrations of organic matter, creating 

excellent conditions to grow corn and soybeans. This has earned the region its reputation as the 

"agricultural  heartland"  of  the  US  (Figure  4.1a)  and  has  led  to  it  becoming  one  of  the  most 

cultivated agricultural areas worldwide (FAO, 2017). As depicted in Figure 4.1b, approximately 

70% of the soil in the Midwest is excessively or well-drained, while the remaining 30% fall under 

other  soil  drainage  classes  such  as  very  poorly  drained,  poorly  drained,  and  somewhat  poorly 

drained. The eastern Midwest primarily consists of lowlands, which gradually increase in elevation 

towards the west. The median slope across the region is 3.5%, and Figure 4.1c indicates that the 

mean slope of approximately 60% of the area is less than 3.9%. 

According to the world agricultural production report, the United States is responsible for over 

30% of the world's production of soybeans and corn (USDA, 2023); much of this is produced in 

the  Midwest  region  (34%  of  the  agricultural  area).  The  Midwest  provides  some  of  the  most 

prosperous  farmland  globally,  allowing  the  cultivation  of  crops  including  corn,  soybeans, 

sorghum, alfalfa hay, cotton, wheat, and more. As a result, agriculture plays an important role in 

77 

 
 
the local economies of the US Midwest. 

Figure 4.1 Study region (737 counties defined in (d)) maps showing (a) cropland and non-cropland 
areas based on the National Land Cover Database and the Cropland Dataset Layer; (b) the soil 
drainage class; (c) the mean slope; and (d) the tile drainage acres from the 2017 Ag Census. In (d), 
areas with no tile drains or withheld data due to privacy concerns as these counties contain few 
farms are represented as white. 

The US Midwest would not be one of the most productive agricultural areas in the world without 

its tile drainage systems (Fausey et al., 1995). It was reported that 224,190 km2 of land was tile 

drained  across  the  continental  United  States  (CONUS),  in  which  92.9%  (208,358  km2)  of  tile 

drainage is installed in the 14 midwestern states and this accounts for 21.6% of agricultural land 

(NASS, 2017). However, 183 counties among these states (1177 counties) either lack reported data 

due to the absence of tile drains or withhold information to safeguard individual farm privacy (NA 

counties, Figure4.2d). To ensure a continuous boundary while excluding most NA counties, our 

study region is limited to 737 counties delineated by the heavy blue line in Figure 4.1d (this subset 

78 

 
 
 
is hereafter referred to as the 'US Midwest'), which accounts for 91.7% (204,842 km2) of the tile-

drained area in CONUS (Figure 4.1d). 

4.3.2: Tile and Non-tile Ground Truth Point for Classification   

4.3.2.1: Tile Drainage Ground Truth Points 

The workflow for this study is shown in Figure 4.2. I generated a regional database of tiled and 

likely non-tiled ground truth points from various sources across the  US Midwest region. These 

points  were  collected  to  compare  variable  differences  between  the  two-point  groups,  train  a 

random forest machine learning model and perform an accuracy assessment at the pixel level for 

the classification. 

Observed  tile  drainage  points  (Figure  4.3)  were  obtained  from  point,  polyline,  and  polygon 

features. To compile point tile drainage features, a proportion of tile drainage points were identified 

through visual  interpretation using the aerial  imagery base map from Google Earth.  Initially,  a 

shared cropland area was identified from the Cropland Data Layer (CDL) in 2017 and the National 

Land Cover Dataset (NLCD) in 2016 since NCLD has ~3% more cultivated land than cropland in 

the  CDL  (USDA-NASS,  2017;  USGS,  2016).  The  shared  cropland  area  then  overlaps  with  an 

existing, most likely tile-drained layer called US-AgTile (Valayamkunnath et al., 2020), resulting 

in the identification of potential tile-drained areas. Subsequently, likely tile-drained points were 

randomly sampled from  these potential tile-drained areas and visually interpreted based on tile 

drain patterns, spaces, and canal ditches around the fields. 

79 

 
 
Figure  4.2  Workflow  diagram  illustrating  the  random  forest  classification  method  to  create 
agricultural tile drainage maps across the US Midwest. 

Another set of tile drainage points was obtained from an existing study that used a geospatial model 

to determine the likely tile-drained areas of the CONUS. This model integrated a cropland mask, 

topographic  slope,  and  soil  drainage  information  with  county-level  tile  drainage  census  data 

(Valayamkunnath  et  al.,  2020).  These  points  were  manually  identified  from  the  ERSI  multi-

resolution aerial imagery base map between October 2, 2012, and April 23, 2017, using the ESRI 

ArcMap identification tool. 

80 

 
 
 
Figure 4.3 Map of ground truth tile drainage points sourced from literature (orange), tile drainage 
permits (green), and visual interpretation from Google Earth aerial imagery (blue). The training 
and testing tile drainage points are 500 * 500-meter apart, indicated as different point shapes. The 
study area's counties are highlighted in gray, with the ones reported by USDA-NASS to have tile 
drainage indicated in light gray and areas marked as "NA" indicate no tile drains or withheld data 
due to privacy. 

I also obtained tile drainage points from polygon and polyline features. To ensure their accuracy, 

I randomly sampled points from these features and then excluded any that were too close to the 

edge  of  polygons  and  polylines.  The  polygons  are  obtained  from  permits  for  agricultural 

subsurface drainage tile locations provided by USGS  in North  Dakota (Finocchiaro, 2016) and 

South  Dakota  (Finocchiaro,  2014)  and  are  assumed  to  represent  ground  truth  tile  drainage 

measurements. The polylines consist of sources including tile lines and drainage structures in three 

sites in Michigan, US (Blissfield, Clayton, and Palmyra) that were digitized using GIS tools, tile 

line permits from the Bois de Sioux Watershed District, tile lines from the Story County Farm in 

Iowa, and a photo of a tile-drained field in the southeast research farm in Iowa. 

To  avoid  biased  classification  due  to  clustered  tile  points  from  available  data  and  the  spatial 

81 

 
 
 
correlation among adjacent points, a distance limit was set through 500*500-meter fishnet grids. 

(L. Yan & Roy, 2016) reported that the average and median field sizes of US farms are 400 and 

527  meters,  respectively.  Here,  a  threshold  of  500  meters  was  selected  to  ensure  training  and 

testing point separation thus handling co-variation while still maintaining an adequate number of 

points for model training and validation.  

4.3.2.2: Non-Tile Ground Truth Points  

To  identify  non-tile  points,  I  utilize  the  potential  tile-drained  layer  described  in  the  previous 

section. I created a 30-meter buffer zone and obtained the complimentary area within agricultural 

lands, which I define as a likely non-tile layer. I then extracted points at least 500 meters apart 

from this layer and defined them as likely non-tile points in the classification. While these points 

may not accurately reflect actual ground conditions, given the limited availability of non-tile data 

and the associated labor costs for manual identification, utilizing these likely non-tile points is a 

practical approach for classification. 

I employed cross-validation techniques, using different proportions of data, to train and validate 

the model. To validate the model, I randomly choose 30% of non-tile points. Based on the overall 

area ratio (roughly 1:4) of tile and non-tile in US Midwest region of Ag Census (NASS, 2017), the 

number of validation tile drainage points was determined. The remaining tile and non-tile points 

were then used to train the model. 

4.3.3: Effective Variable Selection from Satellite and Environmental Datasets  

To perform random forest classification and measure variable importance, the selection of input 

variables are important. I utilized three steps to select the final variables and determine the optimal 

number  of  variables  for  classification.  First,  I  gathered  variables  from  previous  studies  on  tile 

drainage detection (Cho et al., 2019) and other agricultural practices, such as irrigation mapping 

82 

 
 
(Deines  et  al.,  2017,  2019;  Xie  &  Lark,  2021),  as  both  tile  drainage  and  irrigation  can  affect 

environmental  conditions  and  promote  crop  growth,  which  can  be  reflected  in  vegetation  and 

water-related indices.  

Then, I identified the appropriate periods for these variables by analyzing the differences between 

tile and non-tile points based on two satellite-derived indices: the normalized difference vegetation 

index (NDVI) and the normalized water index (NDWI). These two variables was chosen due to 

the relatively high spatial and temporal resolution of Landsat imagery, as well as NDVI and NDWI 

has been demonstrated to be helpful in identifying tile drains (Cho et al., 2019; C. Zhang et al., 

2014). Periods for other variables then match with these identified time ranges from NDVI and 

NDWI as needed.  

 Finally, a function was executed to successively remove highly correlated and lesser important 

variables  to  handle  covariation  and  provide  robust  performance  for  variable  importance 

measurements. Within this, a Pearson correlation coefficient (r) was calculated between each pair 

of all variables. If the correlation exceeded a threshold (r > 0.8), we retained only the variable with 

a higher variable importance from the initial classification. Note, there is no standard ‘strength of 

correlation’ categories, they are domain specific and vary across different studies (Schober et al., 

2018). Here, we define 0.8 as the threshold for strong correlation.  

The satellite and environmental datasets used in this study include (1) Atmospherically corrected 

surface reflectance derived from the data produced by Landsat 7 ETM+ and  Landsat 8 Operational 

Land  Imager  (OLI)/Thermal  Infrared  Sensr  (TIRS)  sensors  (USGS,  2017);  (2)  MODIS  Land 

Surface Temperature; (3) GridMET, which provides daily climate data, such as precipitation, and 

potential  evapotranspiration  ;  (4)  TerraClimate,  which  provides  monthly  climate  and  climatic 

water balance for global terrestrial surfaces; (5) SMAP, NASA-USDA Enhanced SMAP Global 

83 

 
 
Soil  Moisture  Data;  (6)  POLARIS,  Probabilistic  Remapping  of  SSURGO  soil  properties;  (7) 

USGS 3DEP 10m  National  Map;  (8) gSSURGO Database,  which  contains gridded soil survey 

geographic data; (9) National Hydrography Dataset (NHD) from the US Geological Survey; and 

(10)  Hydrologic  landscape  regions  (HLR)  of  the  United  States;  and  (11)  Cropland  Data  Layer 

(CDL).  Using  these  datasets,  I  derived  variables  that  capture  the  distinguishing  characteristics 

between tile and non-tile points, guided by our knowledge and prior research (Cho et al., 2019). 

For example, we anticipate that fields with tile drainage will exhibit higher normalized difference 

vegetation index (NDVI) values during the growing season, as tile drains facilitate crop growth 

and NDVI correlates strongly with green biomass (Prinds et al., 2019).  

We utilized Landsat 7 and 8 images from the USGS Level 2, Collection 2, and Tier 1 data sources 

(USGS, 2017) to derive variables such as the normalized difference vegetation index (NDVI) and 

normalized water index (NDWI). NDVI is the most used vegetation index for observing greenery 

and is calculated with the equation (4.1), where NIR is near-infrared (which vegetation strongly 

reflects)  light  and  Red  is  the  visible  red  (which  vegetation  absorbs)  reflectance.  NDWI  is 

developed by (Gao, 1996) to enhance the water-related features of the landscapes. This index uses 

the  near-infrared  (NIR)  and  the  Short-Wave  infrared  (SWIR)  bands,  see  equation  (4.2).  These 

satellite imageries contain surface reflectance (SR) that is atmospherically corrected and produced 

by  the  Landsat  7  ETM+  sensor  and  Landsat  8  OLI/TIRS  sensor  using  the  Landsat  Ecosystem 

Disturbance Adaptive Processing System (LEDAPS) algorithm and the Land Surface Reflectance 

Code (LaSRC) respectively (Schmidt et al., 2013; Vermote et al., 2016).  

𝑁𝐷𝑊𝐼 = (𝑁𝐼𝑅 − 𝑅𝑒𝑑)/(𝑁𝐼𝑅 + 𝑅𝑒𝑑) 

𝑁𝐷𝑊𝐼 = (𝑁𝐼𝑅 − 𝑆𝑊𝐼𝑅1)/(𝑁𝐼𝑅 + 𝑆𝑊𝐼𝑅1) 

𝑇𝑟_𝑆𝑊𝐼𝑅1 = (1 − 𝑆𝑊𝐼𝑅1) ∗ (1 − 𝑆𝑊𝐼𝑅1)/(2 ∗ 𝑆𝑊𝐼𝑅1)  

𝑇𝑟_𝑆𝑊𝐼𝑅2 = (1 − 𝑆𝑊𝐼𝑅2) ∗ (1 − 𝑆𝑊𝐼𝑅2)/(2 ∗ 𝑆𝑊𝐼𝑅2)  

(4.1) 

(4.2) 

(4.3) 

(4.4) 

84 

 
 
 
 
 
 
 
Although land surface temperature (LST) is available from Landsat images, I opted to use LST 

from  the MODIS dataset  due to  Landsat  collection 2 surface temperature  data gaps because of 

missing Advanced Spaceborne Therma Emission and  Reflection Radiometer Global Emissivity 

Dataset (ASTER GED) (USGS, 2023). MODIS provides an 8-day average LST with a 1000 m 

resolution.  From  this  product,  I  obtained  daytime  and  nighttime  land  surface  temperature  and 

computed the difference between them to derive the diurnal LST difference (Wan et al., 2017).  

The gridded meteorological datasets GridMET and TerraClimate from the University of Idaho are 

used to obtain climate-related variables, which have a spatial resolution of approximately 4 km 

(Abatzoglou,  2013;  Abatzoglou  et  al.,  2018).  Daily  precipitation  and 

reflectance 

evapotranspiration (ETo) are sourced from the GridMET dataset (Abatzoglou, 2013). Aridity is 

also  calculated  as  the  ratio  between  cumulative  precipitation  and  ETo  and  is  used  in  the 

classification. Actual evapotranspiration (AET) variables in the spring and summer seasons are 

sourced from the monthly TerraClimate dataset (Abatzoglou et al., 2018) and derived using a one-

dimensional soil water balance model. GridMET was generated by blending the high-resolution 

spatial data from Parameter-elevation Regressions on Independent Slopes Model (PRISM) with 

the  high  temporal  resolution  data  from  National  Land  Data  Assimilation  System  (NLDAS)  to 

produce spatially- and temporally continuous variable (Abatzoglou, 2013). TerraClimate utilizes 

climatically-aided  interpolation,  which  combines  high-spatial  resolution  climatological  normal 

from the World Climate dataset with time-varying data from CRU Ts4.0 and the Japanese 55-year 

Reanalysis (JRA55) at coarser spatial resolutions (Abatzoglou et al., 2018).  

The NASA-USDA Enhanced SMAP data at 10-km spatial resolution is used to gather surface and 

subsurface (~1 m) soil moisture (SSM, SUSM) and the percent soil moisture (SMP). This dataset 

integrates  satellite-derived  Soil  Moisture  Active  Passive  (SMAP)  Level  3  soil  moisture 

85 

 
 
observations with the modified two-layer Palmer model based on the 1-D Ensemble Kalman Filter 

(EnKF) data assimilation approach. SMP is estimated as the available water for the plant divided 

by  the  total  water-holding  capacity  of  the  soil  profile,  which  helps  determine  if  the  soil  has 

sufficient water for crop growth. 

Clay percentage and saturated hydraulic conductivity estimates were derived from the Probabilistic 

Remapping  of  Soil  Survey  Geographic  Database  (SSURGO)  soil  property  database  called 

POLARIS (Chaney et al., 2019). I also estimated plant available water using the van Genuchten 

equation and parameters, such as saturated soil water content and residual volumetric water, also 

sourced from POLARIS.  

In addition, I incorporated topographic slope and soil drainage class information since flat, poorly 

drained fields  are ideal  for tile drainage installation (Jame et  al.,  2022;  Valayamkunnath et  al., 

2020). I computed the mean slope (in degrees) using elevation data from the USGS 3D Elevation 

Program  10-Meter  Resolution  Digital  Elevation  Model(USGS,  2019).  Soil  drainage  class 

information was obtained from the Gridded SSURGO (USDA, 2013), a product similar to standard 

SSURGO but in file geodatabase format. It includes seven categories (Figure 4.2b), (0) excessively 

drained;  (1)  somewhat  excessively  drained;  (2)  well  drained;  (3)  moderately  well  drained;  (4) 

somewhat poorly drained, (5) poorly drained, and (6) very poorly drained.  

Canals  and  ditches  from  the  National  Hydrography  Dataset  (NHD)  were  also  included.  These 

waterways are man-made and serve purposes from water transportation to irrigation, drainage, and 

navigation. To identify the proximity of each cell in the study region to these canals and ditches, I 

computed the Euclidean distance. This information is particularly relevant as these waterways are 

frequently  utilized  for  draining  water  from  tile-drained  fields.  Hydrologic  Landscape  Regions 

(HLRs) are delineated by USGS based on land surface form, geology, and climate among 43,931 

86 

 
 
small  watersheds  across  CONUS,  using  GIS  and  statistical  analysis.  Ultimately,  20  landscape 

hydrologic  region  classes  were  generated  based  on  the  similarities  in  landscape  and  climate 

characteristics, and hydrologic processes in different regions are assumed to be affected by the 

landscape and climate factors. To identify agricultural tile drainage, the cultivated crops category 

from the National Land Cover Database (NLCD) was selected to create a mask for classification. 

Cultivated  lands  refer  to  areas  utilized  for  annual  crop  production,  such  as  corn,  soybeans, 

vegetables, tobacco, and cotton. 

Except for processing soil drainage class and distance to canals or ditches locally and uploading 

them to the Google Earth Engine (GEE), all these data sources above can be accessed through the 

GEE data archive and the publicly shared awesome-gee-community-datasets, which are sourced 

by the community and are available as Earth Engine assets (Roy et al., 2023). The native resolution 

of these data sources varies from 10 to 10,000 meters; I aggregated or disaggregated them to 30 

meters resolution to derive the variables and for the final classification with the built-in reduce 

resolution and interpolation functions in GEE.  

4.3.4: Random Forest Classification and Accuracy Assessment  

We chose Random Forest (RF) because RF is capable of handling non-monotonic relationships, 

accommodating  non-linear  relationship  between  variables  effectively  and  most  importantly, 

reducing the likelihood of overfitting which is one of the biggest problems in machine learning by 

creating  random  subsets  of  features  and  building  multiple  trees  using  those  subsets  (Breiman, 

2001). RF classification has been widely used for classification tasks such as irrigation mapping, 

flood risk assessment, and water quality predictions (Belgiu & Drăguţ, 2016a; Cho et al., 2019; 

Deines et al., 2017, 2019; Z. Wang et al., 2015; Xie & Lark, 2021).  

Random forest classifiers are tuned by varying the number of decision trees (ntree) and the number 

87 

 
 
of variables randomly selected and tested for the best split when growing trees (mtry). A review 

study about random forest in remote sensing found that most research set ntree to 500 (Belgiu & 

Drăguţ,  2016b)  because  the  error  rate  became  stable  before  this  number  of  trees  is  achieved 

(Lawrence et al., 2006) and since the default value for ntree in the randomForest R package is 500 

(Liaw & Wiener, 2002a). The mtry can be varied between one and the maximum number of input 

variables, with the default of the square root of variable numbers. The classifier with this pair of 

ntree and mtry then makes predictions for other areas that don’t have ground truth data based on 

what was learned from the training data via the Google Earth Engine cloud computing platform.  

Accuracy  assessment  was  conducted  at  both  point-based  and  county  level.  For  point-based 

assessment, I calculated seven commonly used metrics based on the tile and non-tile ground truth 

points,  including  accuracy,  sensitivity,  specificity,  positive  and  negative  predictive  values, 

accuracy, Kappa and F1 score (Congalton & Green, 2019), using the  confusionMatrix function 

from the caret package in R (Kuhn, 2008). These metrics range from 0 to 1, and the higher value 

means the classification model is more reliable. Accuracy is a statistical measure of how well a 

binary classification can correctly identify or exclude a condition, calculated as the proportion of 

correct predictions, both true positives (tile) and true negatives (non-tile), among the total number 

of points examined. Sensitivity (Recall) and specificity describe the accuracy of a classification of 

the  presence  or  absence  of  a  condition.  Sensitivity  measures  how  well  a  test  can  identify  true 

positives, while specificity is a measure of how well a classification can identify true negatives. 

The positive predictive value (Precision) is the proportion of true positive classifications out of all 

positive  classifications  made  by  the  model,  so  it  measures  the  probability  that  a  positive 

classification  made  by  the  model  is  correct.  Similarly,  the  negative  predictive  value  is  the 

proportion of true negative classifications out of all negative classifications made by the model 

88 

 
 
and measures the probability that a negative classification made by the model is correct. The F1 

score is a common metric that many classification models use to assess the quality of their models, 

calculated as the equation (4.5) (Sasaki, 2007).  

𝐹1 𝑠𝑐𝑜𝑟𝑒 = 2 ∗ (𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 ∗ 𝑅𝑒𝑐𝑎𝑙𝑙)/(𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 + 𝑅𝑒𝑐𝑎𝑙𝑙) 

(4.5) 

For county-level accuracy assessment, statistical data in 2017 from the United States Department 

of  Agriculture  (USDA)  National  Agricultural  Statistics  Service  (NASS)  are  used.  More 

specifically, I calculate classified tile drainage areas for each county and then fit a linear regression 

model to compare them with reported ones.  

4.3.5: The ‘Black Box’ of the Machine Learning Model  

We  used  a  collection  of  tools  to  evaluate  and  communicate  the  results  of  the  tile  drainage 

classification model by examining variable importance, such as Mean Decrease Accuracy, Mean 

Decrease Gini and Shapley values, and accumulated local effects (ALE). The combination of these 

measures provide a comprehensive understanding of how each variable contributes to the model’s 

performance.  

Variable importance helps to quantify what variables are driving model  performance and more 

specifically, how much each variable improves the model’s accuracy. I ran the importance function 

from the randomForest package in R (Liaw & Wiener, 2002b). I extract variable values for training 

and validation points and develop a proxy random forest classification with the same training and 

testing data and identical parameter settings (ntree and mtry) using the randomForest packages in 

R,  which  means  the  variable  importance  values  likely  mirror  their  importance  in  GEE.  The 

importance 

function  produces 

two 

importance  measures:  MeanDecreaseGini 

and 

MeanDecreaseAccuracy.  MeanDecreaseGini  is  an  impurity-based  importance,  representing  the 

total  decrease  in  node  impurities  from  splitting  on  the  variable,  averaged  over  all  trees. 

MeanDecreaseAccuracy is permutation-based importance, measuring accuracy reduction on out-

89 

 
 
 
of-bag samples when the variable values are randomly permuted and are more reliable than the 

MeanDecreaseGini  (Strobl  et  al.,  2008).  Both  measures  have  been  applied  to  different  studies 

(Deines et al., 2017; Matasci et al., 2018; Schroeder et al., 2017).  

I also used the Shapley values to check how much every predictor contributed to a given prediction. 

Shapley  is  the  average  contribution  of  a  variable  value  to  the  prediction,  not  the  difference  in 

prediction  when  I  remove  the  feature  from  the  model.  I  computed  the  overall  importance  by 

assigning a score of 31 to the variable with the highest ranking within each of the three importance 

measures, and a score of one to the least important variable. The total score from the three measures 

was then used to calculate the overall importance. The highest possible score is 93, which would 

correspond to a variable that ranked at the top for all three measured importance. 

However, we are still not able to know whether the high or low value of the variables yields higher 

probabilities of the targeted class (i.e., tile) via variable importance metrics.  I then checked the 

ALE with the iml R package (Molnar, 2018). ALE looks at the general relationship between the 

predictors and the targeted class (i.e., tile drainage).  

4.3.6: Comparison with Existing Products 

To ensure the reliability of our tile drainage product for the US Midwest in 2017, I compared tile 

drainage areas for the US Midwest to the three other likely tile drainage maps and reported areas 

from Ag Census for the study region.  

We also performed a random forest classification using the same training and validation points. 

However, I only used slope, and soil drainage class, which are commonly used in other products 

in the comparison. I set the number of trees (ntree) to be the same as our classification (default: 

500), and the number of mtry is set as two since there are only two variables. I then compared the 

out-of-bag  (OOB)  error  with  our  classification  which  had  20  variables  for  the  training  and 

90 

 
 
validation datasets. The accuracy and F1 score metrics are compared as well.  

4.4: Results 

4.4.1: Selected Variables and Their Difference between Tile and non-Tile  

Initially, I gathered 57 variables from nine satellite and environmental datasets described in section 

2.3.  I  then  identified  three  time  periods  -  spring:  4/1  -  5/31,  summer:  7/1  -  8/31,  and  growing 

season: 5/1 - 9/31 based on the mean of maximum (max among available Landsat images) NDVI 

and NDWI for all the ground truth points across the study region (Figure A4.0.2). The objective 

was  to  maintain  a  consistent  pattern  for  tile  and  non-Tile  points  during  the  same  period.  For 

example, tile points exhibited relatively lower NDVI but higher NDWI in April and May compared 

to  non-Tile  points,  thus  I  define  the  spring  season  as  April  1st  to  May  31st.  Subsequently,  I 

successively remove highly correlated (r > 0.8) and lesser important variables, which results in 26 

variables were removed and the remaining 31 variables are used for classification. The ultimately 

used 31 variables are listed in Table 1, with their short names, full names, units, number of scenes 

or images, time periods, equations, data sources, and resolutions. The remaining 26 variables were 

initially chosen but not used in the final classification and are detailed in Text A4.1.  Figure A4.0.3 

shows the correlation coefficient for the remaining variables, with r < 0.8.   

Several other variables were investigated in this research, including combinations of VV single 

co-polarization with vertical transmit/vertical receive and VH dual-band cross-polarization with 

vertical transmit/horizontal receive, extracted from the C-band Synthetic Aperture Radar Ground 

Range  Detected  of  the  Sentinel-1  mission  (Sentinel-1  SAR  GRD)  from  GEE,  as  well  as  the 

neighborhood index and combinations of selected variables. However, these variables were not 

incorporated into our final classification model as they did not significantly improve prediction 

accuracy. 

91 

 
 
Table 4.1 Summary information for 31 satellite-derived and environmental variables. 

Variable full 
name 

Max NDVI in 
growing 
season  

Unit 

Scene 

Time 
period 

Equation 

Data source 

Resoluti
on (m) 

NA 

1758 

5/1-9/30 

(NIR-Red)/(NIR+Red) 

611 

4/1-5/31 

745 

7/1-8/31 

611 

4/1-5/31 

7/1-8/31 

745 

(NIR-
SWIR1)/(NIR+SWIR1) 

(1-SWIR1) * (1-
SWIR1)/(2*SWIR1) 

Surface 
reflectance 
from USGS 
Landsat 7 Level 
2, Collection 2, 
Tier 1 and 
USGS Landsat 
8 Level 2, 
Collection 2, 
Tier 1 (Landsat) 

30 

Abbreviation 

NDVI_grow_max 

NDWI_spr_max 

NDWI_summ_max 

Tr_swir1_spr_max 

Tr_swir1_summ_max 

Tr_swir2_grow_max 

Max NDWI in 
spring  

NA 

Max NDWI in 
summer 

NA 

NA 

NA 

Max Tr_swir1 
in spring  

Max Tr_swir1 
in summer 

Max Tr_swir2 
in growing 
season  

NA 

1758 

5/1-9/30 

(1-SWIR2) * (1-
SWIR2)/(2*SWIR2) 

TerraClimate: 
Monthly 
Climate and 
Climatic Water 
Balance for 
Global 
Terrestrial 
Surfaces, 
University of 
Idaho 

GridMET:Univ
ersity of Idaho 
Gridded 
Surface 
Meteorological 
Dataset  

4638.3 

4638.3 

MOD11A2.061 
Terra Land 
Surface 
Temperature 
and Emissivity 
8-Day Global 
1km (MODIS) 

1000 

AET_grow 

AET in 
growing 
season  

mm 

5 

5/1-9/30 

NA 

Precip_grow 

Aridity_spr 

Aridity_preGrow_3yr 

DayLST_median_gro
w 

DayLST_median_spr 

DayLST_range_spr 

DayLST_median_su
mm 

Precipitation in 
growing 
season 

Aridity in 
spring  

Aridity in pre-
growing 
season from 
the prior 3 
years  
Median 
daytime land 
surface 
temperature 
(LST) in 
growing 
season  
Median 
daytime LST 
in spring 
Range (max - 
min) of 
Daytime LST 
in spring  
Median of 
Daytime LST 
in summer 

mm 

153 

5/1-9/30 

NA 

61 

4/1-5/31 

Precipitation/PET 

NA  

93 

5/1-5/31 

Precipitation/PET 

20 

NA 

Kelvin 

7 

7 

8 

5/1-9/30 

4/1-5/31 

NA 

4/1-5/31 

Range = max - min 

7/1-8/31 

NA 

92 

 
 
 
Table 4.1 (cont’d)  

DayLST_range_grow 

DayLST_range_sum
m 

DiffLST_median_gro
w 

DiffLST_median_spr 

NightLST_max_sum
m 

NightLST_max_spr 

SMP_median_summ 

SUSM_max_spr 

SUSM_range_spr 

Range of 
daytime LST 
in growing 
season  
Range of 
daytime LST 
in summer 
Median 
difference 
between 
daytime and 
nighttime LST 
in growing 
season 
Median 
difference 
between 
daytime and 
nighttime LST 
in spring 
Maximum 
Nighttime LST 
in summer 
Maximum 
Nighttime LST 
in spring  
Median of 
percent soil 
moisture in 
spring  
Maximum 
subsurface soil 
moisture in 
spring  
Range of 
subsurface soil 
moisture in 
spring  

20 

5/1-9/30 

Range = max - min 

8 

7/1-8/31 

Range = max - min 

20 

NA 

5/1-9/30 

7 

NA 

4/1-5/31 

8 

7/1-8/31 

NA 

20 

5/1-9/30 

NA 

% 

21 

7/1-8/31 

NA 

mm 

20 

NA 

4/1-5/31 

20 

Range = max - min 

mm 

4/1-5/31 

Clay_mean_5cm 

Percent of clay 
(0-5 cm) 

% 

Ksat_mean_5cm 

Saturated 
hydraulic 
conductivity 
(0-5 cm) 

log10(cm/h
r) 

Paw_mean_5cm 

Plant available 
water (0-5 cm) 

mm 

1 

1 

1 

Slope_mean 

Mean slope  

degree 

1 

Soil_drain_class 

Soil drainage 
class 

NA  

1 

NA 

NA 

NA 

NA 

NA 

NA 

93 

NASA-USDA 
Enhanced 
SMAP Global 
Soil Moisture 
Data (SMAP) 

10000 

Probabilistic 
Remapping of 
SSURGO 
(POLARIS ) 

Derived from 
POLARIS with 
the van 
Genutchten 
equation 
USGS 3DEP 
10m National 
Map Seamless 
(NED) 
Gridded Soil 
Survey 
Geographic 
Database(gSSU
RGO) 

30 

10 

30 

 
 
 
 
 
Table 4.1 (cont’d)  

Canal_ditch_dist 

HLR 

Cropland 

Distance to the 
nearest canal 
ditch  

Hydrologic 
Landscape 
Regions  

A crop-specific 
land cover data 
layer 

meter 

1 

NA  

1 

NA 

NA 

NA 

NA 

NA  

1 

2017 

NA 

National 
Hydrography 
Dataset (NHD) 
Hydrologic 
landscape 
regions of the 
United States 
(HLR) 
USDA NASS 
Cropland Data 
Layers (CDL) 

500 

30 

30 

Note: Landsat bands are used to calculate variables, where NIR is near infrared band, Red is red 

band surface reflectance, SWIR1 is the Short-wave Infrared 1 band and SWIR2 is the Short-wave 

Infrared 2 band.  

We  then  compared  variable  differences  between  the  two  groups:  tile  and  non-tile  ground  truth 

(Figure 4.4). Our analysis, based on the Wilcoxon test in R, revealed that there is a significant 

difference (p < 0.0001) between the two groups for the 31 variables used in the final classification. 

For brevity, we only show the top 20 of the 31 variables, based on mean decrease accuracy.  

Four climate-related variables were chosen among the selected twenty variables, including actual 

evapotranspiration  (AET)  in  growing  season  (AET_grow),  aridity  in  spring  (Aridity_spr),  and 

precipitation in growing season (Precip_grow) and aridity in pre-growing season from the prior 

three years (Aridity_preGrow_3yr). The median AET across all tile points is higher than non-tile 

points  across  seasons,  with  the  largest  difference  seen  in  the  growing  season  (167  mm).  More 

accumulative precipitation in growing season for the tile points compared with the non-tile points; 

the difference is about 58 mm. This is consistent with the fact that tile drainage installation occurs 

in areas with more precipitation for optimal crop growth. For aridity, the median aridity index in 

the spring season for the tile group is 0.93, and 0.88 for the non-tile group. However, during the 

pre-growing (May) season in prior three years, aridity for tile (0.75) points is lower than non-tile 

(0.79) points.  

94 

 
 
Seven  variables  derived  from  the  MODIS  product  were  selected.  During  the  summer,  the 

maximum  nighttime  LST  is  0.4  degrees  Kelvin  higher  for  all  tile  points,  with  most  points 

concentrated around a relatively low (292 Kelvin) value; this is probably attributed to the cooling 

effect  of  the  green  vegetation.  The  nighttime  LST  in  spring  for  two  groups  have  similar 

distributions, and coincidentally, the same median value (284 Kelvin). The median value of the 

maximum daytime LST in the growing season and in spring are close for the two groups. A notable 

difference  (3.6  Kelvin,  median)  in  the  range  (maximum-minimum)  of  daytime  LST  during  the 

growing season for the tile group, as well as in the spring (3.2 Kelvin, median), while in summer, 

the difference is about 1 Kelvin.  

95 

 
 
Figure 4.4 Significant variable differences between tile and non-tile points (p <= 0.0001, Wilcoxon 
test) for all variables. The strips are colored by different data sources. The bottom and top of each 
box represent the first and third quartiles, respectively, and the line inside each box represents the 
median. Data beyond the end of the whiskers are "outlying" points and are not shown here. 

Median soil moisture percent (SMP) in the summer, the maximum and range of subsurface soil 

moisture in spring are the variables that the model picked from the SMAP dataset. I found that 

SMP in the summer is 21% higher for the tile points and this is likely because tile drainage maintain 

the right level of water table during the summer season for crop growth. Subsurface soil moisture 

96 

 
 
 
is higher for tile points in the spring, with 25 mm difference. The range of subsurface soil moisture 

between the two groups is little (0.5 mm) but significant.  

For  soil  drainage  class,  lower  values  indicate  higher  drain  capability,  with  zero  representing 

excessively drained soil, and six indicating very poorly drained soil. Our results indicate that most 

(56%) tile points are in poorly drained (22%, value = 5), somewhat poorly drained (34%, value = 

4) regions. In contrast, non-tiled points are mainly (52%) in well-drained (value = 2) areas. I also 

observed that the number of points decreases with increasing mean slope for both tile and non-tile 

points. The maximum slope of non-tiled points is close to ten, while for tile points, the maximum 

slope is 6. The median slopes for non-tiled and tile are 1.37 and 0.62, respectively. This is likely 

because tile drainage systems are typically installed in areas with relatively low slopes. The pattern 

for the hydrologic landscape regions is similar, with non-tile points spread into more regions, while 

tile  points  are  concentrated  in  regions  (5)  arid  plains  with  permeable  soils  and  bedrock,  (3) 

subhumid  plains  with  impermeable  soils  and  permeable  bedrock  and  (7)  humid  plains  with 

permeable soils and impermeable bedrock, respectively.  

Both tile and non-tile groups exhibit a similar pattern in terms of distance to canals or ditches, with 

more points located in areas with lower distances. However, I found that tiled points are generally 

closer to canals and ditches, with a median distance of  4882 meters less compared to non-tiled 

points.  I  also  found  that  the  patterns  for  soil  property  variables,  such  as  saturated  hydrologic 

conductivity (Ksat) and clay percentage, were similar between the two groups. Tile points had a 

slightly but significantly higher  (4.5%) clay percentage and a lower Ksat value, indicating that 

areas where tile drainage was installed would have a lower capacity to transmit water without tile 

drainage systems.  

97 

 
 
4.4.2: Classified Map from Random Forest Classification and Accuracy Assessment  

In the random forest classification, I used 28,723 tile points and 32,215 non-tile points, along with 

selected 31 variables, listed in Table 1, to train and validate the classification model. The training 

and testing tile drainage points are 500 meters apart shown in Figure 4.3, likely non-tile points are 

distributed as 500 meters away between training and testing dataset, but not shown in the map due 

to the large number of points and limited spaces. Specifically, I used 48,982 points (26,442 tile 

points and 22,540 non-tile points) to train the model, with the remaining 11,956 points (2281 tile 

points and 9675 non-tile points) held out for validation. I use the default settings, 500 trees and 5 

variables  per  split  for  classification.  The  classified  map  is  shown  in  Figure  4.5.  Classified  tile 

drainage  areas  are  concentrated  in  the  corn  belt  region,  including  eastern  Dakotas,  southern 

Minnesota,  north  central  Iowa,  northeastern  Illinois  and  Indiana,  northwestern  Ohio,  and 

Michigan's thumb area. In general, the machine learning model can capture the reported tile fields 

in the western Lake Erie basin and tile permits in SD. 

The point-based assessment indicates that the classification model achieved good overall accuracy, 

with a score of 0.96, which indicates that 96% of points, including tile and non-tile, are classified 

correctly (Figure 4.6). The Kappa and F1 scores are 0.876 and 0.901. Additionally, the other four 

metrics were higher than 0.85, including sensitivity (0.95), specificity (0.96), positive predictive 

value (Precision) (0.85), and negative predictive value (0.99), demonstrating the good quality of 

the classification model.  

We aggregated the tile drainage areas for each county from the classified map and compared them 

with the area from the USDA NASS, as illustrated in Figure 4.7a. The random forest classification 

model provided reasonable agreement with the reported area from Ag Census, with an R2 value of 

0.68. However, the model tended to overestimate tile drainage area, with a slope of 1.1, especially 

98 

 
 
in counties with larger tile drainage reported by NASS. It underestimates the counties with smaller 

reported areas such as these points with reported areas lower than 500 km2.  

Figure 4.5 Classified tile drainage (blue) map with the background of agricultural lands (yellow). 
Two zoom-in windows show the reported tile field (black circle) in the western Lake Erie basin 
and tile drainage permits (black rectangle) in South Dakota (red rectangle).  

At the state level (Figure A4.0.5), eight states (Illinois, Indiana, Iowa, Michigan, Minnesota, North 

Dakota,  Ohio,  and  Pennsylvania)  had  R2  values  greater  than  0.65,  and  two  of  these  states 

(Michigan  and  Pennsylvania)  had  underestimated  tile  drainage  areas.  The  estimates  for  the 

remaining states (Kansas, Missouri, Nebraska, New York, South Dakota, and Wisconsin) are less 

accurate, likely because they are less heavily tile-drained or have fewer ground truth points. 

99 

 
 
  
Figure  4.6  Point-based  accuracies  of  tile  drainage  classification  for  the  test  dataset  with  seven 
metrics. 

We also created a residual map at the county level, residuals are calculated as the percentage of 

area  difference  divided  by  the  area  reported  by  NASS  (Figure  4.7b).  The  map  revealed  that 

overestimation occurred in counties with heavily tile-drained areas, primarily in Eastern Dakota, 

Southern  Minnesota,  the  Des  Moines  Lobe  in  Iowa,  Northeastern  Illinois,  Middle-northern 

Indiana, and Northwestern Ohio. This is likely due to our tile drainage points being concentrated 

in these areas and I assumed tile drainage permits in South Dakota, North Dakota, and the Bois de 

Sioux Watershed District are ground truth tile points.  

Additionally, I found that 25% of counties were not predicted to have any tile drainage installation, 

despite having reported tile drains by NASS (NA-Classified in Figure 4.7b). These counties have 

100 

 
 
 
a relatively low reported area, with a median reported  area of ~26 km2 or ~5% of reported tile 

drainage  in  agricultural  lands.  This  indicates  that  our  classification  model  may  require  better 

ground truth information for regions with a low percentage of tile drains. It is important to note 

that the area reported by farmers through surveys may be somewhat inaccurate, however, it is the 

only source I have for this comparison across the region. 

Figure 4.7 Accuracy and spatial residuals observed at the county level. (a) Comparison between 
the classified tile drainage area (km2) and the reported area from the USDA NASS in 2017. Each 
point on the graph represents one county, with the blue solid line representing the linear regression 
line and the black dashed line indicating the 1:1 line. (b) Spatial residual at the county level, which 
was computed as the percentage of area difference divided by the NASS area. NA-NASS stands 
for no reported area from the Ag Census in 2017, NA-Classified represents the model that did not 
capture tile drainage in these counties.  

101 

 
 
 
 
 
 
 
 
Figure 4.7 (cont’d) 

4.4.3: Overall Variable Importance and Accumulated Local Effects 

The importance of 31 input variables for tile drainage classification across the US Midwest was 

assessed using the Mean Decrease Gini  (Gini), Mean Decrease Accuracy (MDA), and Shapley 

values. Higher values of Gini and MDA suggest that a variable has a more significant impact on 

classifying tile and non-tile areas. AET in the growing season (AET_grow) ranked the highest in 

the MDA, and the maximum nighttime LST in the summer (NightLST_max_summ) ranked top in 

the Gini index (Figure A4.0.4), thus they were in the top right corner of the multi-way importance 

plot (Figure 4.8). Four additional variables (aridity in spring, soil moisture percent in summer, the 

range of daytime LST in growing season and soil drainage  class) ranked high in both Gini and 

MDA, meaning that removing these variables will largely reduce impurity and model accuracy.  

Shapley value (Figure 4.9a) can determine whether variables positively or negatively affected the 

classification accuracy of our model for a given prediction (tile, in this case). It is important to note 

102 

 
 
 
that Shapley values are the average contribution of an input variable to the target prediction, thus 

it might be biased due to the uneven distribution of ground truth points. Our findings revealed that 

four variables, soil moisture percent in the summer, median daytime LST in summer and growing 

season  and  actual  evapotranspiration  in  growing  season,  negatively  impacted  tile  drainage 

identification.  Specifically,  soil  moisture  percent  in  the  summer  made  the  largest  negative 

contribution to identifying tile drains, this is somewhat unexpected as higher soil moisture percent 

were found in tile drainage points compared to non-tile points (Figure 4.4).  

Conversely, the remaining 26 variables had positive contributions toward identifying tile drains, 

and  cropland  type  did  not  show  any  impacts  on  tile  drainage  identification.  Of  these  positive 

variables, soil drainage class had the largest positive contribution, followed by aridity in spring, 

the range of daytime LST in the growing season (which was the second variable based on the Gini 

index). Align with expectations, tile drains are installed in relatively poorly drained soils (higher 

class number) thus resulting in positive impacts.  

103 

 
 
Figure  4.8  Two  measures  of  variable  importance  from  the  random  forest  classification.  The 
measure on the x-axis is Mean Decrease Accuracy and Mean Decrease Gini in the y-axis. The P 
value  shows  the  significance  level  of  importance.  The  top  11  variables  according  to  these  two 
measures are labeled as circles. Variables are identified by their short names, which can be related 
to full names and other information using Table 4.1. 

Finally, I calculated the overall importance (Figure 4.9b), based on these three-importance metrics, 

Gini  index, MDA, and Shapley value.  It  revealed that variables derived from  MODIS product, 

climate-  and  soil-related  variables  are  the  most  important.  The  top-ranking  variable  is  the 

maximum of nighttime LST during summer. This is somewhat consistent with (Cho et al., 2019) 

where the mean LST in the spring strongly contributed to the random forest classification for tile 

104 

 
 
 
drainage in the Bois de Sioux Watershed District in Minnesota and in the Red River basin (overlies 

portions of Dakotas and Minnesota).  

Figure 4.9 (a) Shapley value and  (b) overall importance  (summed score from MDA, Gini,  and 
Shapley). The +/- Shapley value represents the average positive/negative impacts of variables on 
the classification. Different colors in (b) represent different data sources.  

105 

 
 
 
 
Figure 4.10 Accumulated local effects for top 20 variables, with the locally estimated scatterplot 
smoothing (loess) method. The strips are colored by different data sources.  

Figure 4.10 displays the accumulated local effects to evaluate whether high or low values of the 

variables correspond to higher probabilities of tile drains. Our analysis demonstrates that none of 

the effects exhibit monotonic behavior. The magnitude of the effects is plotted on the y-axis. The 

soil  drainage  class  ranked  top  among  the  positive  variables  based  on  the  Shapley  value  and 

maximum nighttime LST in the summer ranked top based on Gini index stand out with relatively 

larger  ALE  values.  Notably,  these  two  variables  are  about  one  magnitude  higher  than  the 

106 

 
 
 
 
remaining variables. 

The  general  patterns  can  be  categorized  into  several  groups:  (1)  higher  values  indicate  lower 

probabilities of tile drains, such as the actual evapotranspiration in the growing season; (2) higher 

values indicate higher probabilities of tile drains but stabilizes at a certain point, such as aridity in 

spring  (threshold:  0.75);  (3)  higher  values  yield  lower  probabilities  of  tile  drains,  such  as  soil 

drainage class (higher values mean poorly drained soils); and (4) up and down trend variables.  

4.5: Discussion and Conclusion 

4.5.1: Comparison with Existing Products 

Table 4.2 listed the tile drainage area in the US Midwest (defined in Figure 4.2d) from different 

products.  Our  product,  SEETileDrain,  estimated  that  185,069  km2  were  tile  drained,  which  is 

~9.6% lower than the estimates from survey-based statistics from USDA-NASS, while the other 

three  (AgTile-US,  TD-MostPD,  TD-AllPD)  have  estimated  higher  likely  tile  drainage  than 

reported. AgTile-US utilizes information on soil drainage and topographic slope threshold within 

cropland  areas  to  estimate  tile  drainage  and  constrain  the  geospatial  model  with  statistical  tile 

drainage  area  at  the  county  level,  based  on  data  from  the  Census  of  Agriculture  in  2017 

(Valayamkunnath et al., 2020). The estimate is ~2% lower than the reported areas from USDA-

NASS. Another study by (Jame et al., 2022) developed two TransformingDrainage (TD) extent 

products for the US Midwest, which are based on soil drainage class. These classes were selected 

because they are directly related to crop production and are therefore considered more suitable for 

estimating likely drained land than soil properties alone. The first product, TD-MostPD, includes 

areas with very poorly and poorly drained soils that are likely tile-drained. The second product, 

TD-AllPD, includes somewhat poorly drained soils in addition to the two categories used in TD-

MostPD. TD-MostPD and TD-AllPD estimated 2.8 and 5 times more tile drainage extent than the 

107 

 
 
statistics from USDA-NASS for this region, respectively.  

Table 4.2. Estimated tile drainage area among existing products. 

Product 

Estimated 

Name  

Area (km2) 

Resolution  

Notes 

SeeTileDrain 

185,069 

30 m 

learning  

USDA-NASS 

204,842  

county-level  

County-level survey statistics  

Derived  from  satellite  and  environmental  datasets  with  machine 

AgTile-US 

201,206 

30 m  

Geospatial analysis based on slope and soil drainage class  

TD-MostPD 

576,493 

30 m  

Areas have very poorly and poorly drained soils 

TD-AllPD 

1,025,288 

30 m  

soils 

Areas  have  very  poorly,  poorly,  and  somewhat  poorly  drained 

Because these maps (AgTile-US, TD-MostPD, TD-AllPD) rely on soil drainage class and slope 

information, I used a random forest classification model based on these inputs and all ground truth 

points. The model used the same ntree (500) and two variables at each split (mtry = 2), with an 

out-of-bag (OOB) estimate of the error rate at 36%. This indicates that only 64% of ground truth 

points  were  classified  correctly.  The  model  also  exhibited  relatively  low  values  for  sensitivity 

(0.66), and specificity (0.61), with a positive predictive value (0.29), and negative predictive value 

(0.88), demonstrating the inferior quality  of the classification model which used two variables, 

compared to our final classification model that used 31 variables. The F1 score for this model was 

0.4, compared to the F1 score of 0.9 obtained with 31 variables, indicating that the variables added 

significantly improved the binary classifications. 

4.5.2: Conclusion, Limitation and Future Work  

Spatially-explicit  agricultural  tile  drainage  across  the  US  Midwest,  at  30-m  resolution,  was 

mapped  using  a  random  forest  machine  learning  classification  model.  This  model  was 

108 

 
 
 
implemented on the Google Earth Engine cloud computing platform, incorporating 31 variables 

from  eleven  datasets  and  comprehensive  ground  truth  points.  The  resulting  classified  map 

demonstrated  good  accuracy  in  point-based  assessment  and  reasonable  agreement  with  the 

reported  area  from  USDA-NASS.  Land  Surface  Temperature,  soil  moisture  percent,  actual 

evapotranspiration, and soil drainage class were identified as strong predictors for tile drainage 

identification.  The  machine  learning  model  developed  here  could  be  readily  applied  to  other 

regions,  both  for  historical  and  future  years.  The  resulting  data  product  may  be  useful  for 

hydrological,  water quality,  and crop modeling  research. This  information  may  help  watershed 

managers and stakeholders achieve cost-effective agricultural water and nutrient management and 

maintain optimal crop production.  

The accuracy of the algorithm depends on the coverage of ground truth information, which was 

not evenly distributed and may be imbalanced for tile and non-tile points. Besides, assumptions 

about tile drainage permits and potential issues during the visual interpretation can introduce model 

bias.  In  addition,  this  study  successively  removed  relatively  high  correlated  but  less  important 

features  based  on  pre-defined  threshold,  while  correlations  between  variables  might  still  exist. 

Even  though  variable  co-variation  is  not  often  considered  in  random  forest  classification, 

additional  preprocessing  could  be  used  to  reduce  the  effects  of  correlated  variables  on  the 

performance of the classification model and improve the robustness of variable importance.  

For future work, deep learning methods and tile drainage line extraction Artificial intelligence (AI) 

models could be used to identify tile drainage areas. These approaches have the potential to provide 

more  accurate  and  efficient  classification  results.  Moreover,  the  use  of  high-resolution  images, 

such as the Harmonization of the Landsat and Sentinel-2 data (HLS) (Claverie et al., 2018), images 

provided  by  the  Planet  Lab  or  Google's  aerial  imagery,  would  enable  the  identification  of  tile 

109 

 
 
drainage  at  a  finer  resolution.  This  approach  could  provide  more  detailed  information  on  the 

distribution  and  patterns  of  tile  drainage  across  the  landscape.  The  incorporation  of  such 

techniques and data sources could  enhance our understanding  of the effects  of tile drainage on 

agricultural landscapes and facilitate development of more effective management strategies. 

4.6: Acknowledgments 

This  work  was  funded  by  NASA  Grant  NNX11AC72G,  NOAA  Grant  NA12OAR432007, 

GS100013,  and  KBS  LTER  Graduate  Fellowship.  Luwen  Wan  was  partially  supported  by  the 

China  Scholarship  Council  (CSC).  Any  opinions,  findings,  conclusions,  or  recommendations 

expressed  in  this  material  are  those  of  the  authors  and  do  not  necessarily  reflect  the  views  of 

NASA, NOAA, or CSC. I am grateful to Eva Arvizu and Jordyn Porter who visually interpreted 

tile drainage points via the Google Earth aerial images. I thank Eunsang Cho (NASA, Goddard 

Space Flight Center) who shared his code and guidance on the machine learning algorithm.  

110 

 
 
 
 
CHAPTER 5: SUMMARY 

5.1: Conclusion 

This dissertation simulated nutrient fluxes to the Great Lakes from the US side of their drainage 

basins, quantified the contributions of seven nutrient sources and four distinct transport pathways 

and identified nutrient delivery hotspots. Modeling transport pathways in Chapter 2 offers a novel 

alternative to many models that do not include essential pathways, such as groundwater and septic 

plumes.  Nutrient  simulation  on  seasonal  variations  can  help  stakeholders  implement  nutrient 

reductions at the most effective time.  

Chapter 2 found that the annual average nitrogen and phosphorus loads to the Great Lakes from 

its US drainage basin are 599 and 21.7 kg/yr/km2, respectively. The spatial patterns are similar, 

with high nutrient delivery areas in southern Lake Michigan, Saginaw Bay, and the Western Lake 

Erie basin. SENSEflux modeling results demonstrate that agricultural sources are the main sources 

of nutrient pollution, and surface pathways (overland runoff and tile drainage) are dominant in the 

US Great Lakes Basin on an annual basis. Additionally, these results highlight those other sources 

(i.e.,  septic  tanks,  atmospheric  deposition,  sources  from  developed  urban  areas)  and  pathways 

(groundwater and septic  plumes) cannot  be overlooked.  Annually,  areas  with  high  loading and 

delivery efficiency are the most intense sources of nutrient loads to the Great Lakes coastline.  

SENSEflux seasonal models in Chapter 3 found that there is 1063 t/day of nitrogen delivered from 

US lands to the lakes during snowmelt, and this is about 2.8 times greater than the annual delivery 

(378 t/day) and 16 times greater than the TN delivery during baseflow (66 t/day). And there are 

more phosphorus delivered annually and during snow melt season, compared to baseflow. Most 

nutrients are transported via overland flow and tile fields regardless of the period. Groundwater 

and  septic  plume  pathways  play  an  essential  role  in  transporting  nutrients  and  become  more 

111 

 
 
significant during baseflow. Besides, agricultural sources contribute substantially higher (14% for 

TN and 5% for TP) during melt than baseflow, while point sources, septic tanks, and atmospheric 

deposition become more prominent contributors to nutrient delivery during baseflow. In addition, 

targeting  nutrient  reductions  during  snowmelt  is  more  effective  than  focusing  on  baseflow, 

especially  for  TN,  based  on  seasonal  delivery  hotspot  analysis.  Thus,  seasonal  variations  of 

nutrient transport should be considered when implementing water quality management plans to 

achieve nutrient reduction targets.  

Through community-facing tools, the SENSEflux model outputs and insights from Chapter 2&3 

can  be  linked  to  the  local  decision-making  process,  assisting  watershed  managers  in  focusing 

actions on specific sources and pathways at the right time and the right place.  

This dissertation also used a machine learning model to map agricultural tile drainage across the 

US Midwest in Chapter 4. This is because tile drainage is an important nutrient transport pathway 

but  the  spatially  explicit  tile  drainage  data  at  finer  resolution  was  lacking  at  large  scales.  The 

random  forest  classification  model  was  implemented  on  the  Google  Earth  Engine,  with 

comprehensive tile and non-tile ground truth points and twenty satellite-derived and environmental 

variables.  The  classifier  model  achieved  good  accuracy  (95.5%),  with  an  F1  score  of  0.904. 

Besides, aggregated areas from the classified map for each county correlated well with the area 

reported by USDA Agricultural Census. The maximum nighttime land surface temperature during 

summer ranked highest based on the assessment of variable importance, followed by climate- and 

soil-related variables.  

In summary, SENSEflux model products about nutrient loadings, sources, pathways, and hotspots, 

as  well  as  insights  from  these  products  can  help  decisions  makers  target  the  areas  for  nutrient 

reduction  at  the  right  time  and  develop  watershed  management  strategies  more  effectively. 

112 

 
 
SENSEflux  model  could  be  readily  applied  to  other  watersheds  with  nutrient  pollution  and 

management  issues,  with  a  resolution  limited  only  by  spatially  explicit  nutrient  inputs.  The 

classified  tile  drainage  map  can  improve  the  accuracy  of  crop,  hydrological,  and  water  quality 

modeling  work  across  the  US  Midwest.  The  identified  tile  drainage  locations  can  provide  a 

baseline for mapping tile drainage area change over time predicting for the future. Similarly, the 

random forest machine learning approach for tile drainage classification could be readily applied 

to other regions where the key variables are available.  

5.2: Limitations 

There  are  some  limitations  in  this  dissertation,  even  though  it  provides  spatially  explicit  data 

products and results necessary to understand the interactions between human and natural systems. 

First, SENSEflux considers tile drainage to be an alternative nutrient pathway to overland runoff, 

which somewhat overestimates tile drainage’s contribution. Second, this dissertation assumed that 

groundwater  watersheds  have  the  same  boundary  as  surface  watersheds  and  that  nutrient 

attenuation via the groundwater pathway was determined by the overland flow length. Third, there 

are some inherent uncertainties in the annual nutrient source due to the various sources, the coarse 

resolution of some inputs, and so on, but SENSEmap is the most suitable and useful nutrient source 

for this dissertation. SENSEflux used the annual average nutrient inputs with seasonal mobility 

inferred  from  streamflow  variability,  to  simulate  seasonal  nutrient  loading,  as  I  do  not  have 

information about when nutrients are applied. Fourth, the distribution and accuracy of ground truth 

points  are  critical  for  accurate  classification.  Some  of  the  tile  drainage  points  used  in  this 

dissertation  are  clustered,  tile  drainage  permits  from  three  states  were  assumed  to  be  real  tile 

drainage points, and likely non-tile points were used for classification, because tile and non-tile 

ground truth points are not easily accessible, and labor and cost-intensive.  

113 

 
 
5.3: Future Work 

 Future research could improve nutrient delivery simulation by distinguishing between overland 

runoff and flow from tile drainage, as well as nutrient travel time through groundwater watersheds 

and estimates of legacy timescales, particularly in watersheds with significantly different boundary 

conditions  than  surface  watersheds.  Furthermore,  time-varying  nutrient  source  inputs  such  as 

allocating fertilizer and manure to seasons based on phenology and fertilizer practice, as well as a 

robust monitoring network for collecting nutrient concentration across seasons could improve the 

seasonal nutrient delivery simulation and understanding the timing and quantity of nutrient losses. 

Furthermore, considering nutrient uptake and transformation from wetland systems before flowing 

into  the  Great  Lakes  is  necessary  because  wetlands  are  effective  natural  systems  for  nitrogen, 

phosphorus, and carbon, as well as reducing the effects of nutrient pollution on aquatic ecosystems. 

In  addition,  the  SENSEflux  model  can  simulate  nutrient  species  such  as  dissolved  inorganic 

nitrogen  (nitrate  and  nitrite),  and  soluble  reactive  phosphorus,  which  have  different  chemical 

properties,  bioavailability,  and  ecological  impacts  in  the  environment  than  total  nitrogen  and 

phosphorus. Furthermore, the SENSEflux model can be extended to the Great Lakes Basin, which 

would  aid  in  making  more  holistic  decisions  for  nutrient  management  and  conservation  in  the 

United States and Canada.  

Future research could make use of high-resolution remote sensing images and refine tile drainage 

detection  methods.  Using  images  from  the  Planet  Lab,  Google's  aerial  imagery,  or  the 

Harmonization of the Landsat and Sentinel-2 data (HLS) would enhance classification accuracy 

and  enable  the  identification  of  tile  drainage  at  a  higher  resolution.  In  addition,  deep  learning 

techniques and tile drainage line extraction Artificial intelligence (AI) models have the potential 

to  identify  tile  drainage  areas.  These  high-resolution  images  and  AI  techniques  can  be  used  to 

114 

 
 
identify historical tile drainage and predict future tile drainage, the resulting tile drainage area will 

improve our understanding of the effects of tile drainage on agricultural landscapes.  

Lastly, future interdisciplinary research could focus on the relationship between food production, 

nutrient pollution, and climate change, as well as human activities. The influence of climate change 

on food production, water balance, nutrient transport, and the modifications of human behaviors 

to  manage  complex  landscapes  has  been  growing.  Future  agricultural  and  hydrological 

management research needs to account for the warming temperatures and the increased frequency 

and intensity of extreme events brought on by climate change. In addition, agricultural practices 

such as tile drainage, cover crop, conservation tillage, and nutrient management practices such as 

the  edge  of  field  practices,  woodchip  bioreactors,  and  constructed  wetlands  could  help  reduce 

nutrient losses. Future management efforts aimed at enhancing human well-being and fostering a 

sustainable planet will require the integration of these processes. 

115 

 
 
 
 
REFERENCES 

Abatzoglou, J. T. (2013). Development of gridded surface meteorological data for ecological 
applications and modelling. International Journal of Climatology, 33(1), 121–131. 
https://doi.org/10.1002/joc.3413 

Abatzoglou, J. T., Dobrowski, S. Z., Parks, S. A., & Hegewisch, K. C. (2018). TerraClimate, a 

high-resolution global dataset of monthly climate and climatic water balance from 1958–
2015. Scientific Data, 5(1), 170191. https://doi.org/10.1038/sdata.2017.191 

Abell, J. M., Özkundakci, D., & Hamilton, D. P. (2010). Nitrogen and Phosphorus Limitation of 
Phytoplankton Growth in New Zealand Lakes: Implications for Eutrophication Control. 
Ecosystems, 13(7), 966–977. https://doi.org/10.1007/s10021-010-9367-9 

Alam, M. J., & Dutta, D. (2013). Predicting climate change impact on nutrient pollution in 
waterways: a case study in the upper catchment of the Latrobe River, Australia. 
Ecohydrology, 6(1), 73–82. https://doi.org/10.1002/eco.282 

Alexander, J. (2013, May 12). Michigan has nation’s weakest regulations on septic systems. 

Retrieved from 
https://www.mlive.com/environment/2013/05/michigan_has_nations_weakest_r.html 

Alexander, R. B., & Smith, R. A. (2006). Trends in the nutrient enrichment of U.S. rivers during 
the late 20th century and their relation to changes in probable stream trophic conditions. 
Limnology and Oceanography, 51(1part2), 639–654. 
https://doi.org/10.4319/lo.2006.51.1_part_2.0639 

Alexander, R. B., Smith, R. A., & Schwarz, G. E. (2000). Effect of stream channel size on the 

delivery of nitrogen to the Gulf of Mexico. Nature, 403(6771), 758–761. 
https://doi.org/10.1038/35001562 

Alexander, R. B., Johnes, P. J., Boyer, E. W., & Smith, R. A. (2002). A comparison of models 
for estimating the riverine export of nitrogen from large watersheds. Biogeochemistry, 57–
58(1), 295–339. https://doi.org/10.1023/a:1015752801818 

Arnold, J. G., Srinivasan, R., Muttiah, R. S., & Williams, J. R. (1998). Large Area Hydrologic 
Modeling and Assessment Part I: Model Development. Journal of the American Water 
Resources Association, 34(1), 73–89. https://doi.org/10.1111/j.1752-1688.1998.tb05961.x 

Azzari, G., Grassini, P., Edreira, J. I. R., Conley, S., Mourtzinis, S., & Lobell, D. B. (2019). 
Satellite mapping of tillage practices in the North Central US region from 2005 to 2016. 
Remote Sensing of Environment, 221, 417–429. https://doi.org/10.1016/j.rse.2018.11.010 

Baker, J. L., Campbell, K. L., Johnson, H. P., & Hanway, J. J. (1975). Nitrate, Phosphorus, and 

Sulfate in Subsurface Drainage Water. Journal of Environmental Quality, 4(3), 406–412. 
https://doi.org/10.2134/jeq1975.00472425000400030027x 

116 

 
 
Baris, R. D., Cohen, S. Z., Barnes, N. L., Lam, J., & Ma, Q. (2010). Quantitative analysis of over 
20 years of golf course monitoring studies. Environmental Toxicology and Chemistry, 29(6), 
1224–1236. https://doi.org/10.1002/etc.185 

Basu, N. B., Dony, J., Meter, K. J., Johnston, S. J., & Layton, A. T. (2023). A Random Forest in 
the Great Lakes: Stream Nutrient Concentrations Across the Transboundary Great Lakes 
Basin. Earth’s Future, 11(4). https://doi.org/10.1029/2021ef002571 

Bechard, A. (2020). The economic impacts of harmful algal blooms on tourism: an examination 
of Southwest Florida using a spline regression approach. Natural Hazards, 104(1), 593–609. 
https://doi.org/10.1007/s11069-020-04182-7 

Belgiu, M., & Drăguţ, L. (2016a). Random forest in remote sensing: A review of applications 

and future directions. ISPRS Journal of Photogrammetry and Remote Sensing, 114, 24–31. 
https://doi.org/10.1016/j.isprsjprs.2016.01.011 

Belgiu, M., & Drăguţ, L. (2016b). Random forest in remote sensing: A review of applications 

and future directions. ISPRS Journal of Photogrammetry and Remote Sensing, 114, 24–31. 
https://doi.org/10.1016/j.isprsjprs.2016.01.011 

Benoy, G. A., Jenkinson, R. W., Robertson, D. M., & Saad, D. A. (2016). Nutrient delivery to 
Lake Winnipeg from the Red—Assiniboine River Basin – A binational application of the 
SPARROW model. Canadian Water Resources Journal / Revue Canadienne Des 
Ressources Hydriques, 41(3), 1–19. https://doi.org/10.1080/07011784.2016.1178601 

Blodgett, D., Read, E., Lucido, J., Slawecki, T., & Young, D. (2016). An Analysis of Water Data 

Systems to Inform the Open Water Data Initiative. JAWRA Journal of the American Water 
Resources Association, 52(4), 845–858. https://doi.org/10.1111/1752-1688.12417 

Bock, E. M., & Easton, Z. M. (2020). Export of nitrogen and phosphorus from golf courses: A 

review. Journal of Environmental Management, 255, 109817. 
https://doi.org/10.1016/j.jenvman.2019.109817 

Bockheim, J. G. (2020). Soils of the Laurentian Great Lakes, USA and Canada, 121–127. 

https://doi.org/10.1007/978-3-030-52425-8_9 

Bosch, N. S. (2008). The influence of impoundments on riverine nutrient transport: An 

evaluation using the Soil and Water Assessment Tool. Journal of Hydrology, 355(1–4), 
131–147. https://doi.org/10.1016/j.jhydrol.2008.03.012 

Bowles, T. M., Atallah, S. S., Campbell, E. E., Gaudin, A. C. M., Wieder, W. R., & Grandy, A. 

S. (2018). Addressing agricultural nitrogen losses in a changing climate. Nature 
Sustainability, 1(8), 399–408. https://doi.org/10.1038/s41893-018-0106-0 

Boyer, E. W., Goodale, C. L., Jaworski, N. A., & Howarth, R. W. (2002). Anthropogenic 

nitrogen sources and relationships to riverine nitrogen export in the northeastern U.S.A. 
Biogeochemistry, 57–58(1), 137–169. https://doi.org/10.1023/a:1015709302073 

117 

 
 
Brakebill, J. W., & Terziotti, S. E. (2011). A Digital Hydrologic Network Supporting NAWQA 

MRB SPARROW Modeling--MRB_E2RF1WS. https://doi.org/10.3133/70046785 

Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5–32. 

https://doi.org/10.1023/a:1010933404324 

Brookfield, A. E., Hansen, A. T., Sullivan, P. L., Czuba, J. A., Kirk, M. F., Li, L., et al. (2021). 

Predicting algal blooms: Are we overlooking groundwater? Science of The Total 
Environment, 769, 144442. https://doi.org/10.1016/j.scitotenv.2020.144442 

Bureau, U. S. C. (2010). Census Regions and Divisions of the United States. Retrieved from 

https://www2.census.gov/geo/pdfs/maps-data/maps/reference/us_regdiv.pdf 

Byrd, R. H., Hribar, M. E., & Nocedal, J. (1999). An Interior Point Algorithm for Large-Scale 

Nonlinear Programming. SIAM Journal on Optimization, 9(4), 877–900. 
https://doi.org/10.1137/s1052623497325107 

Byrd, R. H., Gilbert, J. C., & Nocedal, J. (2000). A trust region method based on interior point 
techniques for nonlinear programming. Mathematical Programming, 89(1), 149–185. 
https://doi.org/10.1007/pl00011391 

Byrnes, D. K., Meter, K. J. V., & Basu, N. B. (2020). Long‐Term Shifts in U.S. Nitrogen 

Sources and Sinks Revealed by the New TREND‐Nitrogen Data Set (1930–2017). Global 
Biogeochemical Cycles, 34(9). https://doi.org/10.1029/2020gb006626 

Carmichael, W. W., & Boyer, G. L. (2016). Health impacts from cyanobacteria harmful algae 
blooms: Implications for the North American Great Lakes. Harmful Algae, 54, 194–212. 
https://doi.org/10.1016/j.hal.2016.02.002 

Chaney, N. W., Minasny, B., Herman, J. D., Nauman, T. W., Brungard, C. W., Morgan, C. L. S., 
et al. (2019). POLARIS Soil Properties: 30‐m Probabilistic Maps of Soil Properties Over the 
Contiguous United States. Water Resources Research, 55(4), 2916–2938. 
https://doi.org/10.1029/2018wr022797 

Chaplin-Kramer, R., Hamel, P., Sharp, R., Kowal, V., Wolny, S., Sim, S., & Mueller, C. (2016). 
Landscape configuration is the primary driver of impacts on water quality associated with 
agricultural expansion. Environmental Research Letters, 11(7), 074012. 
https://doi.org/10.1088/1748-9326/11/7/074012 

Chapra, S. C., Dolan, D. M., & Dove, A. (2016). Mass-balance modeling framework for 

simulating and managing long-term water quality for the lower Great Lakes. Journal of 
Great Lakes Research, 42(6), 1166–1173. https://doi.org/10.1016/j.jglr.2016.04.008 

Chen, Xuanjing, Wang, M., Kroeze, C., Chen, X., Ma, L., Chen, X., et al. (2022). Nitrogen in the 

Yangtze River Basin: Pollution Reduction through Coupling Crop and Livestock 
Production. Environmental Science & Technology. https://doi.org/10.1021/acs.est.1c08808 

118 

 
 
Chen, Xuemei, Zhang, W., Yin, Y., Tang, J., Li, G., & Yan, Y. (2021). Seasonal variation 

characteristics and release potential of phosphorus in sediments: a case study of the Qiuxi 
River, a typical diffuse source pollution river in southwestern China. Journal of Soils and 
Sediments, 21(1), 575–591. https://doi.org/10.1007/s11368-020-02805-x 

Cho, E., Jacobs, J. M., Jia, X., & Kraatz, S. (2019). Identifying Subsurface Drainage using 

Satellite Big Data and Machine Learning via Google Earth Engine. Water Resources 
Research, 55(10), 8028–8045. https://doi.org/10.1029/2019wr024892 

Clark, C. M., Phelan, J., Doraiswamy, P., Buckley, J., Cajka, J. C., Dennis, R. L., et al. (2018). 

Atmospheric deposition and exceedances of critical loads from 1800−2025 for the 
conterminous United States. Ecological Applications, 28(4), 978–1002. 
https://doi.org/10.1002/eap.1703 

Claverie, M., Ju, J., Masek, J. G., Dungan, J. L., Vermote, E. F., Roger, J.-C., et al. (2018). The 

Harmonized Landsat and Sentinel-2 surface reflectance data set. Remote Sensing of 
Environment, 219, 145–161. https://doi.org/10.1016/j.rse.2018.09.002 

Clement, D. R., & Steinman, A. D. (2017). Phosphorus loading and ecological impacts from 
agricultural tile drains in a west Michigan watershed. Journal of Great Lakes Research, 
43(1), 50–58. https://doi.org/10.1016/j.jglr.2016.10.016 

Congalton, R. G., & Green, K. (2019). Assessing the accuracy of remotely sensed data: 

principles and practices, Third Edition (Third Edition). CRC Press. 

Corriveau, J., Chambers, P. A., & Culp, J. M. (2013). Seasonal Variation in Nutrient Export 
Along Streams in the Northern Great Plains. Water, Air, & Soil Pollution, 224(7), 1594. 
https://doi.org/10.1007/s11270-013-1594-1 

Cotner, J. B., Weinke, A. D., & Biddanda, B. A. (2017). Great Lakes: Science can keep them 

great. Journal of Great Lakes Research, 43(5), 916–919. 
https://doi.org/10.1016/j.jglr.2017.07.002 

Dai, Y., Lang, Y., Wang, T., Han, X., Wang, L., & Zhong, J. (2021). Modelling the sources and 
transport of ammonium nitrogen with the SPARROW model: A case study in a karst basin. 
Journal of Hydrology, 592, 125763. https://doi.org/10.1016/j.jhydrol.2020.125763 

David, M. B., Drinkwater, L. E., & McIsaac, G. F. (2010). Sources of Nitrate Yields in the 

Mississippi River Basin. Journal of Environmental Quality, 39(5), 1657–1667. 
https://doi.org/10.2134/jeq2010.0115 

Deines, J. M., Kendall, A. D., & Hyndman, D. W. (2017). Annual Irrigation Dynamics in the 

U.S. Northern High Plains Derived from Landsat Satellite Data. Geophysical Research 
Letters, 44(18), 9350–9360. https://doi.org/10.1002/2017gl074071 

Deines, J. M., Kendall, A. D., Crowley, M. A., Rapp, J., Cardille, J. A., & Hyndman, D. W. 

(2019). Mapping three decades of annual irrigation across the US High Plains Aquifer using 

119 

 
 
Landsat and Google Earth Engine. Remote Sensing of Environment, 233, 111400. 
https://doi.org/10.1016/j.rse.2019.111400 

FAO. (2017). FAO Statistical Databases. Retrieved from http://faostat.fao.org 

Fausey, N. R., Brown, L. C., Belcher, H. W., & Kanwar, R. S. (1995). Drainage and Water 
Quality in Great Lakes and Cornbelt States. Journal of Irrigation and Drainage 
Engineering, 121(4), 283–288. https://doi.org/10.1061/(asce)0733-9437(1995)121:4(283) 

Filipović, V., Toor, G. S., Ondrašek, G., & Kodešová, R. (2015). Modeling water flow and 

nitrate–nitrogen transport on golf course under turfgrass. Journal of Soils and Sediments, 
15(8), 1847–1859. https://doi.org/10.1007/s11368-014-0980-7 

Finocchiaro, R. G. (2014). Agricultural Subsurface Drainage Tile Locations by Permits in South 

Dakota. Retrieved from http://dx.doi.org/10.5066/F7KS6PNW 

Finocchiaro, R. G. (2016). Agricultural Subsurface Drainage Tile Locations by Permits in North 

Dakota. Retrieved from http://dx.doi.org/10.5066/F7QF8QZW 

Foley, T. A., & Betterton, E. A. (2019). Nitrogen dry deposition to Lake Superior and Lake 

Michigan. Journal of Great Lakes Research, 45(2), 224–239. 
https://doi.org/10.1016/j.jglr.2018.12.003 

Ford, C. M., Kendall, A. D., & Hyndman, D. W. (2020). Effects of shifting snowmelt regimes on 
the hydrology of non-alpine temperate landscapes. Journal of Hydrology, 590, 125517. 
https://doi.org/10.1016/j.jhydrol.2020.125517 

Ford, C. M., Kendall, A. D., & Hyndman, D. W. (2021). Snowpacks decrease and streamflows 
shift across the eastern US as winters warm. Science of The Total Environment, 793, 
148483. https://doi.org/10.1016/j.scitotenv.2021.148483 

Frei, R. J., Lawson, G. M., Norris, A. J., Cano, G., Vargas, M. C., Kujanpää, E., et al. (2021). 

Limited progress in nutrient pollution in the U.S. caused by spatially persistent nutrient 
sources. PLoS ONE, 16(11), e0258952. https://doi.org/10.1371/journal.pone.0258952 

Gao, B. (1996). NDWI—A normalized difference water index for remote sensing of vegetation 

liquid water from space. Remote Sensing of Environment, 58(3), 257–266. 
https://doi.org/10.1016/s0034-4257(96)00067-3 

Gassman, P. W., Sadeghi, A. M., & Srinivasan, R. (2014). Applications of the SWAT Model 

Special Section: Overview and Insights. Journal of Environmental Quality, 43(1), 1–8. 
https://doi.org/10.2134/jeq2013.11.0466 

Gill, L. W., & Mockler, E. M. (2016). Modeling the pathways and attenuation of nutrients from 

domestic wastewater treatment systems at a catchment scale. Environmental Modelling & 
Software, 84, 363–377. https://doi.org/10.1016/j.envsoft.2016.07.006 

120 

 
 
Gobler, C. J. (2020). Climate Change and Harmful Algal Blooms: Insights and perspective. 

Harmful Algae, 91, 101731. https://doi.org/10.1016/j.hal.2019.101731 

Gökkaya, K., Budhathoki, M., Christopher, S. F., Hanrahan, B. R., & Tank, J. L. (2017). 

Subsurface tile drained area detection using GIS and remote sensing in an agricultural 
watershed. Ecological Engineering, 108, 370–379. 
https://doi.org/10.1016/j.ecoleng.2017.06.048 

Goyette, J., Bennett, E. M., Howarth, R. W., & Maranger, R. (2016). Changes in anthropogenic 

nitrogen and phosphorus inputs to the St. Lawrence sub‐basin over 110 years and impacts on 
riverine export. Global Biogeochemical Cycles, 30(7), 1000–1014. 
https://doi.org/10.1002/2016gb005384 

Gsech, D., Oimoen, M., Greenlee, S., Nelson, C., Steuck, M., & Tyler, D. (2002). The National 

Elevation Dataset: photogrammetric engineering and remote sensing, 68, 5–11. 

Guo, L. (2007). Doing Battle With the Green Monster of Taihu Lake. Science, 317(5842), 1166–

1166. https://doi.org/10.1126/science.317.5842.1166 

Guo, T., Johnson, L. T., LaBarge, G. A., Penn, C. J., Stumpf, R. P., Baker, D. B., & Shao, G. 
(2021). Less Agricultural Phosphorus Applied in 2019 Led to Less Dissolved Phosphorus 
Transported to Lake Erie. Environmental Science & Technology, 55(1), 283–291. 
https://doi.org/10.1021/acs.est.0c03495 

Hallegraeff, G., Enevoldsen, H., & Zingone, A. (2021). Global harmful algal bloom status 
reporting. Harmful Algae, 102, 101992. https://doi.org/10.1016/j.hal.2021.101992 

Hamlin, Q. F., Kendall, A. D., Martin, S. L., Whitenack, H. D., Roush, J. A., Hannah, B. A., & 
Hyndman, D. W. (2020a). Quantifying Landscape Nutrient Inputs With Spatially Explicit 
Nutrient Source Estimate Maps. Journal of Geophysical Research: Biogeosciences, 125(2). 
https://doi.org/10.1029/2019jg005134 

Hamlin, Q. F., Kendall, A. D., Martin, S., Whitenack, H., Roush, J., Hannah, B., & Hyndman, D. 
W. (2020b, February 11). SENSEmap-USGLB: Nitrogen and Phosphorus Inputs. Retrieved 
from https://www.hydroshare.org/resource/1a116e5460e24177999c7bd6f8292421/ 

Hamlin, Q. F., Martin, S. L., Kendall, A. D., & Hyndman, D. W. (2022). Examining 

Relationships Between Groundwater Nitrate Concentrations in Drinking Water and 
Landscape Characteristics to Understand Health Risks. GeoHealth, 6(5), e2021GH000524. 
https://doi.org/10.1029/2021gh000524 

Han, H., Bosch, N., & Allan, J. D. (2011). Spatial and temporal variation in phosphorus budgets 
for 24 watersheds in the Lake Erie and Lake Michigan basins. Biogeochemistry, 102(1–3), 
45–58. https://doi.org/10.1007/s10533-010-9420-y 

Hartman, M. D., Baron, J. S., Ewing, H. A., & Weathers, K. C. (2014). Combined global change 

effects on ecosystem processes in nine U.S. topographically complex areas. 
Biogeochemistry, 119(1–3), 85–108. https://doi.org/10.1007/s10533-014-9950-9 

121 

 
 
Heil, C. A., & Muni-Morgan, A. L. (2021). Florida’s Harmful Algal Bloom (HAB) Problem: 
Escalating Risks to Human, Environmental and Economic Health With Climate Change. 
Frontiers in Ecology and Evolution, 9, 646080. https://doi.org/10.3389/fevo.2021.646080 

Hirt, U., & Volk, M. (2011). Quantifying the proportion of tile-drained land in large river basins. 

Physics and Chemistry of the Earth, Parts A/B/C, 36(12), 591–598. 
https://doi.org/10.1016/j.pce.2011.05.004 

Holman, I. P., Whelan, M. J., Howden, N. J. K., Bellamy, P. H., Willby, N. J., Rivas‐Casado, M., 

& McConvey, P. (2008). Phosphorus in groundwater—an overlooked contributor to 
eutrophication? Hydrological Processes, 22(26), 5121–5127. 
https://doi.org/10.1002/hyp.7198 

Hong, B., Swaney, D. P., & Howarth, R. W. (2013). Estimating Net Anthropogenic Nitrogen 
Inputs to U.S. Watersheds: Comparison of Methodologies. Environmental Science & 
Technology, 47(10), 5199–5207. https://doi.org/10.1021/es303437c 

Houlton, B. Z., Almaraz, M., Aneja, V., Austin, A. T., Bai, E., Cassman, K. G., et al. (2019). A 
World of Cobenefits: Solving the Global Nitrogen Challenge. Earth’s Future, 7(8), 865–
872. https://doi.org/10.1029/2019ef001222 

Howarth, R. W., Billen, G., Swaney, D., Townsend, A., Jaworski, N., Lajtha, K., et al. (1996). 

Regional nitrogen budgets and riverine N & P fluxes for the drainages to the North Atlantic 
Ocean: Natural and human influences. Biogeochemistry, 35(1), 75–139. 
https://doi.org/10.1007/bf02179825 

Huggins, X., Gleeson, T., Serrano, D., Zipper, S., Jehn, F., Rohde, M. M., et al. (2023). 

Overlooked risks and opportunities in groundwatersheds of the world’s protected areas. 
Nature Sustainability, 1–10. https://doi.org/10.1038/s41893-023-01086-9 

Huisman, N. L. H., Huisman, A. J., & Karthikeyan, K. G. (2017). Seasonal and animal farm size 
influences on in-stream phosphorus transport in an agricultural watershed. Nutrient Cycling 
in Agroecosystems, 109(1), 29–42. https://doi.org/10.1007/s10705-017-9866-6 

Hyndman, D. W., Kendall, A. D., & Welty, N. R. H. (2007). Evaluating Temporal and Spatial 
Variations in Recharge and Streamflow Using the Integrated Landscape Hydrology Model 
(ILHM). 

ICID. (2018, January 1). World Drained Area. Retrieved from https://www.icid.org/world-

drained-area.pdf 

Ikenberry, C. D., Soupir, M. L., Schilling, K. E., Jones, C. S., & Seeman, A. (2014). Nitrate‐
Nitrogen Export: Magnitude and Patterns from Drainage Districts to Downstream River 
Basins. Journal of Environmental Quality, 43(6), 2024–2033. 
https://doi.org/10.2134/jeq2014.05.0242 

122 

 
 
Jame, S. A., Frankenberger, J., Reinhart, B. D., & Bowling, L. (2022). Mapping Agricultural 

Drainage Extent in the U.S. Corn Belt: The Value of Multiple Methods. Applied 
Engineering in Agriculture, 38(6), 917–930. https://doi.org/10.13031/aea.15226 

Kalcic, M. M., Muenich, R. L., Basile, S., Steiner, A. L., Kirchhoff, C., & Scavia, D. (2019). 
Climate Change and Nutrient Loading in the Western Lake Erie Basin: Warming Can 
Counteract a Wetter Future. Environmental Science & Technology, 53(13), 7543–7550. 
https://doi.org/10.1021/acs.est.9b01274 

Khand, K., Kjaersgaard, J., Hay, C., & Jia, X. (2017). Estimating Impacts of Agricultural 

Subsurface Drainage on Evapotranspiration Using the Landsat Imagery-Based METRIC 
Model. Hydrology, 4(4), 49. https://doi.org/10.3390/hydrology4040049 

King, K. W., Fausey, N. R., & Williams, M. R. (2014). Effect of subsurface drainage on 

streamflow in an agricultural headwater watershed. Journal of Hydrology, 519, 438–445. 
https://doi.org/10.1016/j.jhydrol.2014.07.035 

King, K. W., Williams, M. R., & Fausey, N. R. (2015). Contributions of Systematic Tile 

Drainage to Watershed‐Scale Phosphorus Transport. Journal of Environmental Quality, 
44(2), 486–494. https://doi.org/10.2134/jeq2014.04.0149 

King, K. W., Williams, M. R., Macrae, M. L., Fausey, N. R., Frankenberger, J., Smith, D. R., et 

al. (2015). Phosphorus Transport in Agricultural Subsurface Drainage: A Review. Journal of 
Environmental Quality, 44(2), 467–485. https://doi.org/10.2134/jeq2014.04.0163 

Knights, D., Parks, K. C., Sawyer, A. H., David, C. H., Browning, T. N., Danner, K. M., & 

Wallace, C. D. (2017). Direct groundwater discharge and vulnerability to hidden nutrient 
loads along the Great Lakes coast of the United States. Journal of Hydrology, 554, 331–341. 
https://doi.org/10.1016/j.jhydrol.2017.09.001 

Kokulan, V. (2019). Environmental and Economic Consequences of Tile Drainage Systems in 
Canada (Journal of Environmental Quality) (pp. 469–477). Retrieved from https://capi-
icpa.ca/wp-content/uploads/2019/06/2019-06-14-CAPI-Vivekananthan-Kokulan-Paper-
WEB-1.pdf 

Kokulan, V., Macrae, M. L., Lobb, D. A., & Ali, G. A. (2019). Contribution of Overland and 
Tile Flow to Runoff and Nutrient Losses from Vertisols in Manitoba, Canada. Journal of 
Environmental Quality, 48(4), 959–965. https://doi.org/10.2134/jeq2019.03.0103 

Kottek, M., Grieser, J., Beck, C., Rudolf, B., & Rubel, F. (2006). World Map of the Köppen-
Geiger climate classification updated. Meteorologische Zeitschrift, 15(3), 259–263. 
https://doi.org/10.1127/0941-2948/2006/0130 

Kovalenko, K. E., Reavie, E. D., Barbiero, R. P., Burlakova, L. E., Karatayev, A. Y., Rudstam, 

L. G., & Watkins, J. M. (2018). Patterns of long-term dynamics of aquatic communities and 
water quality parameters in the Great Lakes: Are they synchronized? Journal of Great Lakes 
Research, 44(4), 660–669. https://doi.org/10.1016/j.jglr.2018.05.018 

123 

 
 
Kuhn, M. (2008). Building Predictive Models in R Using the caret Package. Journal 

of  Statistical Software, 28 (5), 1–26. Retrieved from 
https://www.jstatsoft.org/article/view/v028i05 

LaBeau, M., Mayer, A., Griffis, V., Watkins, D., Robertson, D., & Gyawali, R. (2015). The 

importance of considering shifts in seasonal changes in discharges when predicting future 
phosphorus loads in streams. Biogeochemistry, 126(1–2), 153–172. 
https://doi.org/10.1007/s10533-015-0149-5 

Lapointe, B. E., Herren, L. W., & Paule, A. L. (2017). Septic systems contribute to nutrient 

pollution and harmful algal blooms in the St. Lucie Estuary, Southeast Florida, USA. 
Harmful Algae, 70, 1–22. https://doi.org/10.1016/j.hal.2017.09.005 

Lawrence, R. L., Wood, S. D., & Sheley, R. L. (2006). Mapping invasive plants using 

hyperspectral imagery and Breiman Cutler classifications (randomForest). Remote Sensing 
of Environment, 100(3), 356–362. https://doi.org/10.1016/j.rse.2005.10.014 

Lefcheck, J. S., Orth, R. J., Dennison, W. C., Wilcox, D. J., Murphy, R. R., Keisman, J., et al. 
(2018). Long-term nutrient reductions lead to the unprecedented recovery of a temperate 
coastal region. Proceedings of the National Academy of Sciences, 115(14), 3658–3662. 
https://doi.org/10.1073/pnas.1715798115 

Li, Y., Robinson, S. V. J., Nguyen, L. H., & Liu, J. (2023). Satellite prediction of coastal hypoxia 

in the northern Gulf of Mexico. Remote Sensing of Environment, 284, 113346. 
https://doi.org/10.1016/j.rse.2022.113346 

Liaw, A., & Wiener, M. (2002a). Classification and Regression by randomForest. R News, 2(3), 

18–22. 

Liaw, A., & Wiener, M. (2002b). Classification and Regression by randomForest, (2(3)), 18–22. 

Lintern, A., McPhillips, L., Winfrey, B., Duncan, J., & Grady, C. (2020). Best Management 

Practices for Diffuse Nutrient Pollution: Wicked Problems Across Urban and Agricultural 
Watersheds. Environmental Science & Technology, 54(15), 9159–9174. 
https://doi.org/10.1021/acs.est.9b07511 

Liu, M., Graham, N., Wang, W., Zhao, R., Lu, Y., Elimelech, M., & Yu, W. (2022). Spatial 

assessment of tap-water safety in China. Nature Sustainability, 1–10. 
https://doi.org/10.1038/s41893-022-00898-5 

Luscz, E. C., Kendall, A. D., & Hyndman, D. W. (2015). High resolution spatially explicit 
nutrient source models for the Lower Peninsula of Michigan. Journal of Great Lakes 
Research, 41(2), 618–629. https://doi.org/10.1016/j.jglr.2015.02.004 

Luscz, E. C., Kendall, A. D., & Hyndman, D. W. (2017). A spatially explicit statistical model to 
quantify nutrient sources, pathways, and delivery at the regional scale. Biogeochemistry, 
133(1), 37–57. https://doi.org/10.1007/s10533-017-0305-1 

124 

 
 
Ma, Z., Guan, K., Peng, B., Sivapalan, M., Li, L., Pan, M., et al. (2023). Agricultural nitrate 
export patterns shaped by crop rotation and tile drainage. Water Research, 229, 119468. 
https://doi.org/10.1016/j.watres.2022.119468 

Maccoux, M. J., Dove, A., Backus, S. M., & Dolan, D. M. (2016). Total and soluble reactive 
phosphorus loadings to Lake Erie A detailed accounting by year, basin, country, and 
tributary. Journal of Great Lakes Research, 42(6), 1151–1165. 
https://doi.org/10.1016/j.jglr.2016.08.005 

Martin, S. L., Hayes, D. B., Rutledge, D. T., & Hyndman, D. W. (2011). The land‐use legacy 
effect: Adding temporal context to lake chemistry. Limnology and Oceanography, 56(6), 
2362–2370. https://doi.org/10.4319/lo.2011.56.6.2362 

Martin, S. L., Hayes, D. B., Kendall, A. D., & Hyndman, D. W. (2017). The land-use legacy 

effect: Towards a mechanistic understanding of time-lagged water quality responses to land 
use/cover. Science of The Total Environment, 579, 1794–1803. 
https://doi.org/10.1016/j.scitotenv.2016.11.158 

Martin, S. L., Hamlin, Q. F., Kendall, A. D., Wan, L., & Hyndman, D. W. (2021). The land use 
legacy effect: looking back to see a path forward to improve management. Environmental 
Research Letters, 16(3), 035005. https://doi.org/10.1088/1748-9326/abe14c 

Matasci, G., Hermosilla, T., Wulder, M. A., White, J. C., Coops, N. C., Hobart, G. W., & Zald, 
H. S. J. (2018). Large-area mapping of Canadian boreal forest cover, height, biomass and 
other structural attributes using Landsat composites and lidar plots. Remote Sensing of 
Environment, 209, 90–106. https://doi.org/10.1016/j.rse.2017.12.020 

Mayorga, E., Seitzinger, S. P., Harrison, J. A., Dumont, E., Beusen, A. H. W., Bouwman, A. F., 
et al. (2010a). Global Nutrient Export from WaterSheds 2 (NEWS 2): Model development 
and implementation. Environmental Modelling & Software, 25(7), 837–853. 
https://doi.org/10.1016/j.envsoft.2010.01.007 

Mayorga, E., Seitzinger, S. P., Harrison, J. A., Dumont, E., Beusen, A. H. W., Bouwman, A. F., 
et al. (2010b). Global Nutrient Export from WaterSheds 2 (NEWS 2): Model development 
and implementation. Environmental Modelling & Software, 25(7), 837–853. 
https://doi.org/10.1016/j.envsoft.2010.01.007 

McCrackin, M. L., Harrison, J. A., & Compton, J. E. (2013). A comparison of NEWS and 

SPARROW models to understand sources of nitrogen delivered to US coastal areas. 
Biogeochemistry, 114(1–3), 281–297. https://doi.org/10.1007/s10533-012-9809-x 

McCrackin, M. L., Harrison, J. A., & Compton, J. E. (2014). Factors influencing export of 

dissolved inorganic nitrogen by major rivers: A new, seasonal, spatially explicit, global 
model. Global Biogeochemical Cycles, 28(3), 269–285. 
https://doi.org/10.1002/2013gb004723 

McCrackin, M. L., Cooter, E. J., Dennis, R. L., Harrison, J. A., & Compton, J. E. (2017). 

Alternative futures of dissolved inorganic nitrogen export from the Mississippi River Basin: 

125 

 
 
influence of crop management, atmospheric deposition, and population growth. 
Biogeochemistry, 133(3), 263–277. https://doi.org/10.1007/s10533-017-0331-z 

Mchau, G. J., Makule, E., Machunda, R., Gong, Y. Y., & Kimanya, M. (2019). Harmful algal 

bloom and associated health risks among users of Lake Victoria freshwater: Ukerewe Island, 
Tanzania. Journal of Water and Health, 17(5), 826–836. 
https://doi.org/10.2166/wh.2019.083 

Meinikmann, K., Hupfer, M., & Lewandowski, J. (2015). Phosphorus in groundwater discharge 

– A potential source for lake eutrophication. Journal of Hydrology, 524, 214–226. 
https://doi.org/10.1016/j.jhydrol.2015.02.031 

Meter, K J Van, & Basu, N. B. (2017). Time lags in watershed-scale nutrient transport: an 
exploration of dominant controls. Environmental Research Letters, 12(8), 084017. 
https://doi.org/10.1088/1748-9326/aa7bf4 

Meter, Kimberly J Van, Schultz, V. O., & Chang, S. Y. (2023). Data-driven approaches 

demonstrate legacy N accumulation in upper Mississippi river basin groundwater. 
Environmental Research Letters. https://doi.org/10.1088/1748-9326/acea34 

Michalak, A. M., Anderson, E. J., Beletsky, D., Boland, S., Bosch, N. S., Bridgeman, T. B., et al. 
(2013). Record-setting algal bloom in Lake Erie caused by agricultural and meteorological 
trends consistent with expected future conditions. Proceedings of the National Academy of 
Sciences, 110(16), 6448–6452. https://doi.org/10.1073/pnas.1216006110 

Michaud, A. R., Poirier, S., & Whalen, J. K. (2019). Tile Drainage as a Hydrologic Pathway for 
Phosphorus Export from an Agricultural Subwatershed. Journal of Environmental Quality, 
48(1), 64–72. https://doi.org/10.2134/jeq2018.03.0104 

Miller, M. P., Tesoriero, A. J., Hood, K., Terziotti, S., & Wolock, D. M. (2017). Estimating 
Discharge and Nonpoint Source Nitrate Loading to Streams From Three End‐Member 
Pathways Using High‐Frequency Water Quality Data. Water Resources Research, 53(12), 
10201–10216. https://doi.org/10.1002/2017wr021654 

Miller, M. P., Capel, P. D., García, A. M., & Ator, S. W. (2020). Response of Nitrogen Loading 

to the Chesapeake Bay to Source Reduction and Land Use Change Scenarios: A 
SPARROW‐Informed Analysis. JAWRA Journal of the American Water Resources 
Association, 56(1), 100–112. https://doi.org/10.1111/1752-1688.12807 

Miller, S. A., & Lyon, S. W. (2021a). Tile drainage causes flashy streamflow response in Ohio 

watersheds. Hydrological Processes, 35(8). https://doi.org/10.1002/hyp.14326 

Miller, S. A., & Lyon, S. W. (2021b). Tile Drainage Increases Total Runoff and Phosphorus 

Export During Wet Years in the Western Lake Erie Basin. Frontiers in Water, 3, 757106. 
https://doi.org/10.3389/frwa.2021.757106 

126 

 
 
Miller, S. A., Lyon, S. W., & Moore, R. H. (2022). Impacts of a nutrient trading plan on stream 
water quality in Sugar Creek, Ohio. JAWRA Journal of the American Water Resources 
Association. https://doi.org/10.1111/1752-1688.13067 

Møller, A. B., Beucher, A., Iversen, B. V., & Greve, M. H. (2018). Predicting artificially drained 

areas by means of a selective model ensemble. Geoderma, 320, 30–42. 
https://doi.org/10.1016/j.geoderma.2018.01.018 

Molnar, C. (2018). iml: An R package for Interpretable Machine Learning. Journal of Open 

Source Software, 3(26), 786. https://doi.org/10.21105/joss.00786 

Mooney, R. J., Stanley, E. H., Rosenthal, W. C., Esselman, P. C., Kendall, A. D., & McIntyre, P. 

B. (2020). Outsized nutrient contributions from small tributaries to a Great Lake. 
Proceedings of the National Academy of Sciences, 117(45), 28175–28182. 
https://doi.org/10.1073/pnas.2001376117 

Motew, M., Booth, E. G., Carpenter, S. R., Chen, X., & Kucharik, C. J. (2018). The synergistic 
effect of manure supply and extreme precipitation on surface water quality. Environmental 
Research Letters, 13(4), 044016. https://doi.org/10.1088/1748-9326/aaade6 

Nasab, M. T., Grimm, K., Bazrkar, M. H., Zeng, L., Shabani, A., Zhang, X., & Chu, X. (2018). 
SWAT Modeling of Non-Point Source Pollution in Depression-Dominated Basins under 
Varying Hydroclimatic Conditions. International Journal of Environmental Research and 
Public Health, 15(11), 2492. https://doi.org/10.3390/ijerph15112492 

NASS. (2012). Agricultural Census data for county-level tile drainage area, 2012. Retrieved 

February 15, 2019, from https://www.nass.usda.gov/ 

NASS. (2017). Agricultural Census data for county-level tile drainage area, 2017. Retrieved 

from https://www.nass.usda.gov/ 

Naz, B. S., & Bowling, L. C. (2008). Automated identification of tile lines from remotely sensed 

data. Transactions of the ASABE. 

Naz, B. S., Ale, S., & Bowling, L. C. (2009). Detecting subsurface drainage systems and 

estimating drain spacing in intensively managed agricultural landscapes. Agricultural Water 
Management, 96(4), 627–637. https://doi.org/10.1016/j.agwat.2008.10.002 

NOAA. (2019, February 1). Great Lakes Ecoregion. Retrieved September 30, 2021, from 

https://www.noaa.gov/education/resource-collections/freshwater/great-lakes-ecoregion 

Ockenden, M. C., Hollaway, M. J., Beven, K. J., Collins, A. L., Evans, R., Falloon, P. D., et al. 
(2017). Major agricultural changes required to mitigate phosphorus losses under climate 
change. Nature Communications, 8(1), 161. https://doi.org/10.1038/s41467-017-00232-0 

Oldfield, L., Rakhimbekova, S., Roy, J. W., & Robinson, C. E. (2020). Estimation of phosphorus 
loads from septic systems to tributaries in the Canadian Lake Erie Basin. Journal of Great 
Lakes Research, 46(6), 1559–1569. https://doi.org/10.1016/j.jglr.2020.08.021 

127 

 
 
Oldfield, L. E., Roy, J. W., & Robinson, C. E. (2020). Investigating the use of the artificial 

sweetener acesulfame to evaluate septic system inputs and their nutrient loads to streams at 
the watershed scale. Journal of Hydrology, 587, 124918. 
https://doi.org/10.1016/j.jhydrol.2020.124918 

OSU. (2014, February 4). PRISM Climate Group. Oregon State University. Retrieved from 

https://prism.oregonstate.edu 

Ouyang, Y., & Zhang, J.-E. (2012). Quantification of Shallow Groundwater Nutrient Dynamics 

in Septic Areas. Water, Air, & Soil Pollution, 223(6), 3181–3193. 
https://doi.org/10.1007/s11270-012-1100-1 

Paerl, H. W., Hall, N. S., & Calandrino, E. S. (2011). Controlling harmful cyanobacterial blooms 
in a world experiencing anthropogenic and climatic-induced change. Science of The Total 
Environment, 409(10), 1739–1745. https://doi.org/10.1016/j.scitotenv.2011.02.001 

Paerl, H. W., Gardner, W. S., Havens, K. E., Joyner, A. R., McCarthy, M. J., Newell, S. E., et al. 

(2016). Mitigating cyanobacterial harmful algal blooms in aquatic ecosystems impacted by 
climate change and anthropogenic nutrients. Harmful Algae, 54, 213–222. 
https://doi.org/10.1016/j.hal.2015.09.009 

Pearce, N. J. T., & Yates, A. G. (2020). Spatial and temporal patterns in macronutrient 

concentrations and stoichiometry of tributaries draining the lower Great Lakes-St. Lawrence 
basin. Journal of Great Lakes Research, 46(4), 989–1000. 
https://doi.org/10.1016/j.jglr.2020.05.002 

Peel, M. C., Finlayson, B. L., & McMahon, T. A. (2007). Updated world map of the Köppen-
Geiger climate classification. Hydrology and Earth System Sciences, 11(5), 1633–1644. 
https://doi.org/10.5194/hess-11-1633-2007 

Pennino, M. J., Compton, J. E., & Leibowitz, S. G. (2017). Trends in Drinking Water Nitrate 

Violations Across the United States. Environmental Science & Technology, 51(22), 13450–
13460. https://doi.org/10.1021/acs.est.7b04269 

Pennuto, C. M., Dayton, L., Kane, D. D., & Bridgeman, T. B. (2014). Lake Erie nutrients: From 

watersheds to open water. Journal of Great Lakes Research, 40(3), 469–472. 
https://doi.org/10.1016/j.jglr.2014.07.002 

Pollack, L., Comuzzi, J., Brooks, I. B., Trépanier, P., Speck, S., & Knott, L. D. (2011). Fifteenth 

Biennial Report Executive Summary & Recommendations. 

Powers, S. M., Bruulsema, T. W., Burt, T. P., Chan, N. I., Elser, J. J., Haygarth, P. M., et al. 

(2016). Long-term accumulation and transport of anthropogenic phosphorus in three river 
basins. Nature Geoscience, 9(5), 353–356. https://doi.org/10.1038/ngeo2693 

Preston, S. D., Alexander, R. B., & Wolock, D. M. (2011). Sparrow Modeling to Understand 
Water-Quality Conditions in Major Regions of the United States: A Featured Collection 

128 

 
 
Introduction1. Journal of the American Water Resources Association, 47(5), 887–890. 
https://doi.org/10.1111/j.1752-1688.2011.00585.x 

Pretty, J. N., Mason, C. F., Nedwell, D. B., Hine, R. E., Leaf, S., & Dils, R. (2003). 

Environmental Costs of Freshwater Eutrophication in England and Wales. Environmental 
Science & Technology, 37(2), 201–208. https://doi.org/10.1021/es020793k 

Prinds, C., Petersen, R. J., Greve, M. H., & Iversen, B. V. (2019). Locating tile drainage outlets 
and surface flow in riparian lowlands using thermal infrared and RGB-NIR remote sensing. 
Geografisk Tidsskrift-Danish Journal of Geography, 119(1), 1–12. 
https://doi.org/10.1080/00167223.2019.1573408 

Rabalais, N. N., & Turner, R. E. (2019). Gulf of Mexico Hypoxia: Past, Present, and Future. 

Limnology and Oceanography Bulletin, 28(4), 117–124. https://doi.org/10.1002/lob.10351 

Rakhimbekova, S., O’Carroll, D. M., Oldfield, L. E., Ptacek, C. J., & Robinson, C. E. (2021). 
Spatiotemporal controls on septic system derived nutrients in a nearshore aquifer and their 
discharge to a large lake. Science of The Total Environment, 752, 141262. 
https://doi.org/10.1016/j.scitotenv.2020.141262 

Rattan, K. J., Corriveau, J. C., Brua, R. B., Culp, J. M., Yates, A. G., & Chambers, P. A. (2017). 
Quantifying seasonal variation in total phosphorus and nitrogen from prairie streams in the 
Red River Basin, Manitoba Canada. Science of The Total Environment, 575, 649–659. 
https://doi.org/10.1016/j.scitotenv.2016.09.073 

Rattan, K. J., Blukacz-Richards, E. A., Yates, A. G., Culp, J. M., & Chambers, P. A. (2019). 
Hydrological variability affects particulate nitrogen and phosphorus in streams of the 
Northern Great Plains. Journal of Hydrology: Regional Studies, 21, 110–125. 
https://doi.org/10.1016/j.ejrh.2018.12.008 

Read, E. K., Carr, L., Cicco, L. D., Dugan, H. A., Hanson, P. C., Hart, J. A., et al. (2017). Water 
quality data for national‐scale aquatic research: The Water Quality Portal. Water Resources 
Research, 53(2), 1735–1745. https://doi.org/10.1002/2016wr019993 

Reay, W. G. (2004). Septic Tank Impacts on Ground Water Quality and Nearshore Sediment 

Nutrient Flux. Groundwater, 42(7), 1079–1089. https://doi.org/10.1111/j.1745-
6584.2004.tb02645.x 

Ren, D., Engel, B., Mercado, J. A. V., Guo, T., Liu, Y., & Huang, G. (2022). Modeling and 
assessing water and nutrient balances in a tile-drained agricultural watershed in the U.S. 
Corn Belt. Water Research, 210, 117976. https://doi.org/10.1016/j.watres.2021.117976 

Richards, R. P., Alameddine, I., Allan, J. D., Baker, D. B., Bosch, N. S., Confesor, R., et al. 

(2012). “Nutrient Inputs to the Laurentian Great Lakes by Source and Watershed Estimated 
Using SPARROW Watershed Models” by Dale M. Robertson and David A. Saad 2: 
Discussion - Nutrient Inputs to the Laurentian Great Lakes by Source and Watershed 
Estimated Using SPARROW Watershed Models. JAWRA Journal of the American Water 
Resources Association, 49(3), 715–724. https://doi.org/10.1111/jawr.12006 

129 

 
 
Rixon, S., Levison, J., Binns, A., & Persaud, E. (2020). Spatiotemporal variations of nitrogen 

and phosphorus in a clay plain hydrological system in the Great Lakes Basin. Science of The 
Total Environment, 714, 136328. https://doi.org/10.1016/j.scitotenv.2019.136328 

Robertson, D. M., & Saad, D. A. (2011a). Nutrient Inputs to the Laurentian Great Lakes by 
Source and Watershed Estimated Using SPARROW Watershed Models. Journal of the 
American Water Resources Association, 47(5), 1011–1033. https://doi.org/10.1111/j.1752-
1688.2011.00574.x 

Robertson, D. M., & Saad, D. A. (2011b). Nutrient Inputs to the Laurentian Great Lakes by 

Source and Watershed Estimated Using SPARROW Watershed Models1: Nutrient Inputs to 
the Laurentian Great Lakes by Source and Watershed Estimated Using SPARROW 
Watershed Models. JAWRA Journal of the American Water Resources Association, 47(5), 
1011–1033. https://doi.org/10.1111/j.1752-1688.2011.00574.x 

Robertson, D. M., & Saad, D. A. (2013). SPARROW Models Used to Understand Nutrient 
Sources in the Mississippi/Atchafalaya River Basin. Journal of Environmental Quality, 
42(5), 1422–1440. https://doi.org/10.2134/jeq2013.02.0066 

Robertson, D. M., Saad, D. A., Benoy, G. A., Vouk, I., Schwarz, G. E., & Laitta, M. T. (2019). 
Phosphorus and Nitrogen Transport in the Binational Great Lakes Basin Estimated Using 
SPARROW Watershed Models. JAWRA Journal of the American Water Resources 
Association, 55(6), 1401–1424. https://doi.org/10.1111/1752-1688.12792 

Robertson, W. D., Stempvoort, D. R. V., & Schiff, S. L. (2019). Review of phosphorus 
attenuation in groundwater plumes from 24 septic systems. Science of The Total 
Environment, 692, 640–652. https://doi.org/10.1016/j.scitotenv.2019.07.198 

Robinson, C. (2015). Review on groundwater as a source of nutrients to the Great Lakes and 

their tributaries. Journal of Great Lakes Research, 41(4), 941–950. 
https://doi.org/10.1016/j.jglr.2015.08.001 

Roy, S., Swetnam, T., Trochim, E., Schwehr, K., & Pasquarella, V. (2023). samapriya/awesome-

gee-community-datasets: Community Catalog (1.0.4). Zenodo. Retrieved from 
https://doi.org/10.5281/zenodo.7614526 

Rubel, F., & Kottek, M. (2010). Observed and projected climate shifts 1901-2100 depicted by 

world maps of the Köppen-Geiger climate classification. Meteorologische Zeitschrift, 19(2), 
135–141. https://doi.org/10.1127/0941-2948/2010/0430 

Ryden, J. C., Syers, J. K., & Harris, R. F. (1974). Phosphorus in Runoff and Streams. Advances 

in Agronomy, 25, 1–45. https://doi.org/10.1016/s0065-2113(08)60777-4 

Sasaki, Y. (2007). The truth of the F-measure. Teach Tutor Mater, 1(5), 1–5. 

Scavia, D., Allan, J. D., Arend, K. K., Bartell, S., Beletsky, D., Bosch, N. S., et al. (2014). 

Assessing and addressing the re-eutrophication of Lake Erie: Central basin hypoxia. Journal 
of Great Lakes Research, 40(2), 226–246. https://doi.org/10.1016/j.jglr.2014.02.004 

130 

 
 
Schilling, K. E., Gassman, P. W., Arenas-Amado, A., Jones, C. S., & Arnold, J. (2019). 

Quantifying the contribution of tile drainage to basin-scale water yield using analytical and 
numerical models. Science of The Total Environment, 657, 297–309. 
https://doi.org/10.1016/j.scitotenv.2018.11.340 

Schmadel, N. M., Harvey, J. W., & Schwarz, G. E. (2021). Seasonally dynamic nutrient 

modeling quantifies storage lags and time-varying reactivity across large river basins. 
Environmental Research Letters, 16(9), 095004. https://doi.org/10.1088/1748-9326/ac1af4 

Schmidt, G., Jenkerson, C., Masek, J., Vermote, E., & Gao, F. (2013). Landsat ecosystem 

disturbance adaptive processing system (LEDAPS) algorithm description: U.S. Geological 
Survey Open-File Report 2013–1057, 17p. 

Schober, P., Boer, C., & Schwarte, L. A. (2018). Correlation Coefficients. Anesthesia & 
Analgesia, 126(5), 1763–1768. https://doi.org/10.1213/ane.0000000000002864 

Schroeder, T. A., Schleeweis, K. G., Moisen, G. G., Toney, C., Cohen, W. B., Freeman, E. A., et 
al. (2017). Testing a Landsat-based approach for mapping disturbance causality in U.S. 
forests. Remote Sensing of Environment, 195, 230–243. 
https://doi.org/10.1016/j.rse.2017.03.033 

Schullehner, J., Hansen, B., Thygesen, M., Pedersen, C. B., & Sigsgaard, T. (2018). Nitrate in 
drinking water and colorectal cancer risk: A nationwide population‐based cohort study. 
International Journal of Cancer, 143(1), 73–79. https://doi.org/10.1002/ijc.31306 

Seitzinger, S. P., Harrison, J. A., Dumont, E., Beusen, A. H. W., & Bouwman, A. F. (2005). 

Sources and delivery of carbon, nitrogen, and phosphorus to the coastal zone: An overview 
of Global Nutrient Export from Watersheds (NEWS) models and their application. Global 
Biogeochemical Cycles, 19(4), n/a-n/a. https://doi.org/10.1029/2005gb002606 

Sharma, A. (2020). The Wicked Problem of Diffuse Nutrient Pollution from Agriculture. Journal 

of Environmental Law, 32(3), 471–502. https://doi.org/10.1093/jel/eqaa017 

Sims, J. T., Simard, R. R., & Joern, B. C. (1998). Phosphorus Loss in Agricultural Drainage: 

Historical Perspective and Current Research. Journal of Environmental Quality, 27(2), 277–
293. https://doi.org/10.2134/jeq1998.00472425002700020006x 

Sinha, E., Michalak, A. M., & Balaji, V. (2017). Eutrophication will increase during the 21st 

century as a result of precipitation changes. Science, 357(6349), 405–408. 
https://doi.org/10.1126/science.aan2409 

Sloto, R. A., & Crouse, M. Y. (1996). HYSEP: A Computer Program for Streamflow 

Hydrograph Separation and Analysis. Retrieved from 
https://pubs.usgs.gov/wri/1996/4040/wri19964040.pdf 

Smith, D., King, K., Johnson, L., Francesconi, W., Richards, P., Baker, D., & Sharpley, A. N. 
(2015). Surface Runoff and Tile Drainage Transport of Phosphorus in the Midwestern 

131 

 
 
United States. Journal of Environmental Quality, 44(2), 495–502. 
https://doi.org/10.2134/jeq2014.04.0176 

Smith, R. A., Schwarz, G. E., & Alexander, R. B. (1997). Regional interpretation of water‐

quality monitoring data. Water Resources Research, 33(12), 2781–2798. 
https://doi.org/10.1029/97wr02171 

Smith, R. A., Schwar, G. E., & Alexander, R. B. (1997). Regional interpretation of water-quality 

monitoring data. 

Soller, D. R., Packard, P. H., & Garrity, C. P. (2012). Database for USGS Map I-1970 — Map 

showing the thickness and character of Quaternary sediments in the glaciated United States 
east of the Rocky Mountains: U.S. Geological Survey Data Series 656. Retrieved from 
https://pubs.usgs.gov/ds/656/ 

Sprague, L. A., Oelsner, G. P., & Argue, D. M. (2017). Challenges with secondary use of multi-

source water-quality data in the United States. Water Research, 110, 252–261. 
https://doi.org/10.1016/j.watres.2016.12.024 

Stenback, G. A., Crumpton, W. G., Schilling, K. E., & Helmers, M. J. (2011). Rating curve 
estimation of nutrient loads in Iowa rivers. Journal of Hydrology, 396(1–2), 158–169. 
https://doi.org/10.1016/j.jhydrol.2010.11.006 

Sterner, R. W. (2011). C:N:P stoichiometry in Lake Superior: freshwater sea as end member. 

Inland Waters, 1(1), 29–46. https://doi.org/10.5268/iw-1.1.365 

Sterner, R. W. (2020). The Laurentian Great Lakes: A Biogeochemical Test Bed. Annual Review 

of Earth and Planetary Sciences, 49(1), 1–29. https://doi.org/10.1146/annurev-earth-
071420-051746 

Sterner, R. W., Ostrom, P., Ostrom, N. E., Klump, J. V., Steinman, A. D., Dreelin, E. A., et al. 
(2017). Grand challenges for research in the Laurentian Great Lakes: Grand challenges for 
Great Lakes research. Limnology and Oceanography, 62(6), 2510–2523. 
https://doi.org/10.1002/lno.10585 

Stow, C. A., Cha, Y., Johnson, L. T., Confesor, R., & Richards, R. P. (2015). Long-Term and 
Seasonal Trend Decomposition of Maumee River Nutrient Inputs to Western Lake Erie. 
Environmental Science & Technology, 49(6), 3392–3400. https://doi.org/10.1021/es5062648 

Strobl, C., Boulesteix, A.-L., Kneib, T., Augustin, T., & Zeileis, A. (2008). Conditional variable 

importance for random forests. BMC Bioinformatics, 9(1), 307. 
https://doi.org/10.1186/1471-2105-9-307 

Sugg, Z. (2007). Assessing US farm drainage: Can GIS lead to better estimates of subsurface 

drainage extent? World Resources Institute. 

132 

 
 
Suriano, Z. J., Robinson, D. A., & Leathers, D. J. (2019). Changing snow depth in the Great 

Lakes basin (USA): Implications and trends. Anthropocene, 26, 100208. 
https://doi.org/10.1016/j.ancene.2019.100208 

Swaney, D. P., Howarth, R. W., & Hong, B. (2018). Nitrogen use efficiency and crop 

production: Patterns of regional variation in the United States, 1987–2012. Science of The 
Total Environment, 635, 498–511. https://doi.org/10.1016/j.scitotenv.2018.04.027 

Thomas, N. W., Arenas, A. A., Schilling, K. E., & Weber, L. J. (2016). Numerical investigation 
of the spatial scale and time dependency of tile drainage contribution to stream flow. 
Journal of Hydrology, 538, 651–666. https://doi.org/10.1016/j.jhydrol.2016.04.055 

Tilahun, T., & Seyoum, W. M. (2020). High-Resolution Mapping of Tile Drainage in 

Agricultural Fields Using Unmanned Aerial System (UAS)-Based Radiometric Thermal and 
Optical Sensors. Hydrology, 8(1), 2. https://doi.org/10.3390/hydrology8010002 

Tong, Y., Xu, X., Zhang, S., Shi, L., Zhang, X., Wang, M., et al. (2019). Establishment of 

season-specific nutrient thresholds and analyses of the effects of nutrient management in 
eutrophic lakes through statistical machine learning. Journal of Hydrology, 578, 124079. 
https://doi.org/10.1016/j.jhydrol.2019.124079 

Tong, Y., Wang, M., Peñuelas, J., Liu, X., Paerl, H. W., Elser, J. J., et al. (2020). Improvement in 

municipal wastewater treatment alters lake nitrogen to phosphorus ratios in populated 
regions. Proceedings of the National Academy of Sciences, 117(21), 11566–11572. 
https://doi.org/10.1073/pnas.1920759117 

Tuholske, C., Halpern, B. S., Blasco, G., Villasenor, J. C., Frazier, M., & Caylor, K. (2021). 
Mapping global inputs and impacts from of human sewage in coastal ecosystems. PLoS 
ONE, 16(11), e0258898. https://doi.org/10.1371/journal.pone.0258898 

USDA. (2013). Soil Survey Geographic (SSURGO) database - Drainage Class. Retrieved from 

https://sdmdataaccess.sc.egov.usda.gov 

USDA. (2023). World Agricultural Production (WAP27-34) (Foreign Agricultural Services 

Circular Series) (pp. 27–34). Retrieved from 
https://apps.fas.usda.gov/psdonline/circulars/production 

USDA-NASS. (2017). USDA National Agricultural Statistics Service Cropland Data Layer. 

Retrieved from https://nassgeodata.gmu.edu/CropScape/ 

USGS. (2011). NLCD 2011 Land Cover Coterminous United States. Retrieved from 

http://mrlc.gov/data 

USGS. (2016). NLCD 2016 Land Cover Coterminous United States. Retrieved from 

https://www.mrlc.gov/data 

USGS. (2017). Landsat Collection 2 Level-2 Science Products. Retrieved from 

https://www.usgs.gov/landsat-missions/landsat-collection-2-level-2-science-products 

133 

 
 
USGS. (2019). 3D Elevation Program 10-Meter Resolution Digital Elevation Model. 

USGS. (2023). Landsat Collection 2 Surface Temperature data gaps due to missing ASTER 

GED. Retrieved August 2, 2023, from https://www.usgs.gov/landsat-missions/landsat-
collection-2-surface-temperature-data-gaps-due-missing-aster-ged 

Valayamkunnath, P., Barlage, M., Chen, F., Gochis, D. J., & Franz, K. J. (2020). Mapping of 30-
meter resolution tile-drained croplands using a geospatial modeling approach. Scientific 
Data, 7(1), 257. https://doi.org/10.1038/s41597-020-00596-x 

Valiela, I., Collins, G., Kremer, J., Lajtha, K., Geist, M., Seely, B., et al. (1997). Nitrogen 

Loading From Coastal Watersheds To Receiving Estuaries: New Method And Application. 
Ecological Applications, 7(2), 358–380. https://doi.org/10.1890/1051-
0761(1997)007[0358:nlfcwt]2.0.co;2 

Vermote, E., Justice, C., Claverie, M., & Franch, B. (2016). Preliminary analysis of the 

performance of the Landsat 8/OLI land surface reflectance product. Remote Sensing of 
Environment, 185(Iss 2), 46–56. https://doi.org/10.1016/j.rse.2016.04.008 

Wan, Z., Hook, S., & Hulley, G. (2017). MODIS/Terra Land Surface Temperature/Emissivity 8-
Day L3 Global 1km SIN Grid V061 [Dataset]. NASA EOSDIS Land Processes DAAC. 
Accessed 2023-02-07 from https://doi. org/10.5067/MODIS/MOD11A2. 061. 

Wang, J., Beusen, A. H. W., Liu, X., & Bouwman, A. F. (2020). Aquaculture Production is a 

Large, Spatially Concentrated Source of Nutrients in Chinese Freshwater and Coastal Seas. 
Environmental Science & Technology, 54(3), 1464–1474. 
https://doi.org/10.1021/acs.est.9b03340 

Wang, Z., Lai, C., Chen, X., Yang, B., Zhao, S., & Bai, X. (2015). Flood hazard risk assessment 

model based on random forest. Journal of Hydrology, 527, 1130–1141. 
https://doi.org/10.1016/j.jhydrol.2015.06.008 

Ward, M. H., Jones, R. R., Brender, J. D., Kok, T. M. de, Weyer, P. J., Nolan, B. T., et al. 

(2018). Drinking Water Nitrate and Human Health: An Updated Review. International 
Journal of Environmental Research and Public Health, 15(7), 1557. 
https://doi.org/10.3390/ijerph15071557 

Watson, S. B., Miller, C., Arhonditsis, G., Boyer, G. L., Carmichael, W., Charlton, M. N., et al. 
(2016). The re-eutrophication of Lake Erie: Harmful algal blooms and hypoxia. Harmful 
Algae, 56, 44–66. https://doi.org/10.1016/j.hal.2016.04.010 

Weinstein, C. B., Bourgeau-Chavez, L. L., Martin, S. L., Currie, W. S., Grantham, K., Hamlin, 
Q. F., et al. (2021). Enhancing Great Lakes coastal ecosystems research by initiating 
engagement between scientists and decision-makers. Journal of Great Lakes Research, 
47(4), 1235–1240. https://doi.org/10.1016/j.jglr.2021.04.018 

Wellen, C., Arhonditsis, G. B., Labencki, T., & Boyd, D. (2012). A Bayesian methodological 

framework for accommodating interannual variability of nutrient loading with the 

134 

 
 
SPARROW model. Water Resources Research, 48(10). 
https://doi.org/10.1029/2012wr011821 

Wellen, C., Kamran-Disfani, A.-R., & Arhonditsis, G. B. (2015). Evaluation of the Current State 
of Distributed Watershed Nutrient Water Quality Modeling. Environmental Science & 
Technology, 49(6), 3278–3290. https://doi.org/10.1021/es5049557 

Williams, M. R., Livingston, S. J., Penn, C. J., Smith, D. R., King, K. W., & Huang, C. (2018). 
Controls of event-based nutrient transport within nested headwater agricultural watersheds 
of the western Lake Erie basin. Journal of Hydrology, 559, 749–761. 
https://doi.org/10.1016/j.jhydrol.2018.02.079 

Wine, M. L., Golden, H. E., Christensen, J. R., Lane, C. R., & Makhnin, O. (2020). Seasonal 

watershed-scale influences on nitrogen concentrations across the Upper Mississippi River 
Basin. Hydrology and Earth System Sciences Discussions, 2020, 1–31. 
https://doi.org/10.5194/hess-2020-423 

Winter, T. C. (2001). THE CONCEPT OF HYDROLOGIC LANDSCAPES. JAWRA Journal of 
the American Water Resources Association, 37(2), 335–349. https://doi.org/10.1111/j.1752-
1688.2001.tb00973.x 

Withers, P. J., Jordan, P., May, L., Jarvie, H. P., & Deal, N. E. (2014). Do septic tank systems 

pose a hidden threat to water quality? Frontiers in Ecology and the Environment, 12(2), 
123–130. https://doi.org/10.1890/130131 

Withers, P. J. A., May, L., Jarvie, H. P., Jordan, P., Doody, D., Foy, R. H., et al. (2012). Nutrient 

emissions to water from septic tank systems in rural catchments: Uncertainties and 
implications for policy. Environmental Science & Policy, 24, 71–82. 
https://doi.org/10.1016/j.envsci.2012.07.023 

Woo, D. K., Song, H., & Kumar, P. (2019). Mapping subsurface tile drainage systems with 

thermal images. Agricultural Water Management, 218, 94–101. 
https://doi.org/10.1016/j.agwat.2019.01.031 

Wuebbles, D., Cardinale, B., Cherkauer, K., & Davidson-Arnott, R. (2019). An Assessment of 
the Impacts of Climate Change on the Great Lakes. Environmental Law & Policy Center, 
104(3–4), 629–652. https://doi.org/10.1007/s10584-010-9872-z 

Xie, Y., & Lark, T. J. (2021). Mapping annual irrigation from Landsat imagery and 

environmental variables across the conterminous United States. Remote Sensing of 
Environment, 260, 112445. https://doi.org/10.1016/j.rse.2021.112445 

Yan, K., Yuan, Z., Goldberg, S., Gao, W., Ostermann, A., Xu, J., et al. (2019). Phosphorus 
mitigation remains critical in water protection: A review and meta-analysis from one of 
China’s most eutrophicated lakes. Science of The Total Environment, 689, 1336–1347. 
https://doi.org/10.1016/j.scitotenv.2019.06.302 

135 

 
 
Yan, L., & Roy, D. P. (2016). Conterminous United States crop field size quantification from 

multi-temporal Landsat data. Remote Sensing of Environment, 172, 67–86. 
https://doi.org/10.1016/j.rse.2015.10.034 

Yang, Y., Anderson, M., Gao, F., Hain, C., Kustas, W., Meyers, T., et al. (2016). Impact of Tile 
Drainage on Evapotranspiration in South Dakota, USA, Based on High Spatiotemporal 
Resolution Evapotranspiration Time Series From a Multisatellite Data Fusion System. IEEE 
Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 10(6), 2550–
2564. https://doi.org/10.1109/jstars.2017.2680411 

Zhang, C., Walters, D., & Kovacs, J. M. (2014). Applications of Low Altitude Remote Sensing 
in Agriculture upon Farmers’ Requests– A Case Study in Northeastern Ontario, Canada. 
PLoS ONE, 9(11), e112894. https://doi.org/10.1371/journal.pone.0112894 

Zhang, L., Zhao, Y., Hein-Griggs, D., Barr, L., & Ciborowski, J. J. H. (2019). Projected extreme 
temperature and precipitation of the Laurentian Great Lakes Basin. Global and Planetary 
Change, 172, 325–335. https://doi.org/10.1016/j.gloplacha.2018.10.019 

Zhang, W., Pueppke, S. G., Li, H., Geng, J., Diao, Y., & Hyndman, D. W. (2019). Modeling 
phosphorus sources and transport in a headwater catchment with rapid agricultural 
expansion. Environmental Pollution, 255(Pt 2), 113273. 
https://doi.org/10.1016/j.envpol.2019.113273 

Zhang, W., Li, H., Hyndman, D. W., Diao, Y., Geng, J., & Pueppke, S. G. (2020). Water quality 
trends under rapid agricultural expansion and enhanced in-stream interception in a hilly 
watershed of Eastern China. Environmental Research Letters, 15(8), 084030. 
https://doi.org/10.1088/1748-9326/ab8981 

Zou, T., Zhang, X., & Davidson, E. A. (2022). Global trends of cropland phosphorus use and 
sustainability challenges. Nature, 611(7934), 81–87. https://doi.org/10.1038/s41586-022-
05220-z 

136 

 
 
  
 
 
APPENDIX A: SUPPORTING INFORMATION FOR CHAPTER 2  

Text A2.1: SENSEflux Model Equations 

The main  function of SENSEflux  modeling  is  shown in  equation (2.1)  below. Where  Lk is  the 

simulated  load  (kg/day)  at  an  in-stream  location  for  an  individual  catchment.  S  represents  the 

nutrient source, Spoint is the point source and directly flows into the rivers and streams, Ssep is the 

septic tank source, and Sij is the application of other sources i that can be harvested to watershed 

cell j. SepEff is removal efficiency on septic loads, it’s fixed as 0.3 for total nitrogen and 0.35 for 

total phosphorus(Luscz et al., 2017; W. D. Robertson et al., 2019). ExHij is an extraction factor 

that describes the in-place removal of nutrients before transport (such as harvest). For all other 

cells and sources, ExHij is equal to 1. Fj is a subsurface partition parameter, describing the fraction 

of  nutrients  that  are  transported  via  a  subsurface  pathway.  It  is  a  function  of  normalized 

groundwater recharge fraction (recharge fraction in a cell j divides the maximum recharge fraction 

across the SENSEflux model domain), where rechFj is the groundwater recharge fraction in cell j, 

and max(rechF) is the maximum value of groundwater recharge fraction across the SENSEflux 

model  domain  (equation  (2.2)).  The  recharge  fraction  was  defined  as  the  average  annual 

precipitation becoming recharged and is limited by 0.55 (Hyndman et al., 2007).  

𝑐𝑒𝑙𝑙𝑠

𝑜𝑡ℎ𝑒𝑟 
𝑠𝑜𝑢𝑟𝑐𝑒𝑠

𝐿𝑘 =   ∑ 𝑅𝑗𝐿𝑎𝑐𝑢𝑠𝑗 × {𝑆𝑝𝑜𝑖𝑛𝑡 + 𝑆𝑠𝑒𝑝(1 − 𝑆𝑒𝑝𝐸𝑓𝑓)𝐵𝑠𝑒𝑗 + ∑ 𝑆𝑖𝑗

× 𝐸𝑥𝐻𝑖𝑗

𝑗

𝑖

× [(1 − 𝐹𝑗)𝐵𝑠𝑗 + 𝐹𝑗(1 − 𝐹𝑠𝑡𝑜𝑟𝑖𝑗)𝐵𝑔𝑗]} 

𝐹𝑗 = 𝑓 × (

𝑟𝑒𝑐ℎ𝐹𝑗
𝑚𝑎𝑥 (𝑟𝑒𝑐ℎ𝐹)

) 

(2.1) 

(2.2) 

The major updates between SENSEflux and its former version (Luscz et al., 2017) are an updated 

river retention function (equation (2.3) - (2.6)), lake retention and subsurface phosphorus storage 

with two new model parameters Lacusj and Flossj, defined in equation (2.7) and (2.8) respectively. 

137 

 
 
Rj describes river reduction of the remaining nutrients after landscape attenuation, and it split as 

denitrification for TN (sorption for TP) plus biological uptake & burial (equation (2.3)). For N 

denitrification  or  P  sorption  (equation  (2.4)),  it  is  an  exponential  function  of  DNSPj  that  was 

calculated based on the flow length tool and used streambed interaction as an input weight raster. 

The composite raster calculation is shown in equation (2.6), where 𝑘̂, 𝑠̂, and 𝑏̂ are derived from 

hydraulic conductivity, slope, and basin yield, respectively. R and V represent hydraulic radius 

and velocity, see details in Text A2.3. Biological uptake & burial is an exponential function of Tj 

that represents  in-stream  travel  time from  cell j  to the downstream  observation point (equation 

(2.5)). 

𝑅𝑗 = 𝑒−𝛼∗𝐷𝑁𝑆𝑃𝑗 ∗   𝑒−𝛼1∗𝐵𝑖𝑜𝑗 

𝐷𝑁𝑆𝑃𝑗 = 𝑓𝑙𝑜𝑤𝐿𝑒𝑛(𝑠𝑡𝑟𝑒𝑎𝑚𝑏𝑒𝑑𝐼𝑛𝑡𝑒𝑟𝑎𝑐𝑡𝑖𝑜𝑛𝑗) 

𝐵𝑖𝑜𝑗 =  𝑓𝑙𝑜𝑤𝐿𝑒𝑛(𝑖𝑛𝑆𝑡𝑟𝑒𝑎𝑚𝑇𝑟𝑎𝑣𝑒𝑙𝑇𝑖𝑚𝑒𝑗) 
𝑗 ∗ 𝑠̂𝑗 ∗ (1 − 𝑏̂
𝑘̂
𝑅𝑗 ∗ 𝑉𝑗

𝑠𝑡𝑟𝑒𝑎𝑚𝑏𝑒𝑑𝐼𝑛𝑡𝑒𝑟𝑎𝑐𝑡𝑖𝑜𝑛𝑗 =

𝑗)

(2.3) 

(2.4) 

(2.5) 

(2.6) 

We also consider nutrient attenuation via lakes or reservoirs (Lacusj) as they travel down the stream 

network as a function of travel distance in lakes (equation (2.7)), and only the loss for a lake or 

reservoir  which  has  a  connection  with  streams  is  considered  due  to  the  inherent  river  routing 

scheme  in  the  SENSEflux  model.  Flossj  is  the  fraction  of  groundwater  pathway  nutrients 

stored/lost  in  the  soil  and  the  deeper  unsaturated  zone  where  the  floss  is  a  calibrated  constant 

(equation (2.8)). floss was assumed to be zero due to the high mobility of nitrogen. By adding this 

term, I can better estimate the amount of phosphorus in every grid cell delivered to the streams.  

𝐿𝑎𝑐𝑢𝑠𝑗 = 𝑒−𝛼2∗𝐷𝐿𝑗 

𝐹𝑠𝑡𝑜𝑟𝑗 = 𝑓𝑠𝑡𝑜𝑟 × (1 −

𝑟𝑒𝑐ℎ𝐹𝑗
𝑚𝑎𝑥 (𝑟𝑒𝑐ℎ𝐹)

) 

(2.7) 

(2.8) 

All basin attenuations are exponential functions of flow length (D) from each cell to the nearest 

138 

 
 
 
downgradient  stream  cell.  Bsj,  Bgj,  and  Bsej  are  basin  reduction  parameters,  representing  the 

overland  flow,  general  groundwater,  and  septic  plume  pathways  respectively  (equations  (2.9), 

(2.11), (2.12). Tile field pathway Bstj is also considered in the model as an alternative overland 

pathway,  representing  nutrient  attenuation  along  with  tile  fields  if  tile  exists  in  a  cell  equation 

(2.10). Equations describing each of these terms are given in (Luscz et al., 2017) and (Martin et 

al., 2021).  

𝐵𝑠𝑗 = 𝑒−𝑏𝑠∗𝐷𝑗 
𝐵𝑠𝑡𝑗 = 𝑒−𝑏𝑠𝑡∗𝐷𝑗 
𝐵𝑔𝑗 = 𝑒−𝑏𝑔∗𝐷𝑗 
𝐵𝑠𝑒𝑗 = 𝑒−𝑏𝑠𝑒∗𝐷𝑗 

(2.9) 

(2.10) 

(2.11) 

(2.12) 

Text A2.2: Spatial Distribution of Loss Terms and Basin Storage 

Three in-situ loss terms are applied before nutrients are transported in SENSEflux: Septic removal 

(SepEff), Harvest (ExH), and Storage (Fstor). Septic removal efficiency is applied on septic load 

and fixed as 0.3 for N and 0.35 for P (Luscz et al., 2017; W. D. Robertson et al., 2019). Harvest 

includes  all  in-place  root  zone  nutrient  loss  and  is  assumed  to  occur  in  cells  with  manure  or 

chemical agricultural fertilizers applied. The Storage loss term includes both in-place storage and 

loss of nutrients below the root zone, which I assume to occur for phosphorus (Eq (2.8)). Fstor was 

assumed to be zero for nitrogen due to the high mobility of N. By adding this term, I can better 

estimate the amount of phosphorus in every grid cell delivered to the streams. 

Text A2.3: Spatial Distribution and Derivation for Instream and Lake Losses  

The  attenuation  of  nutrients  during  stream/wetland  transport  is  assumed  to  be  broken  into  two 

components: 1) water column and sediment interface losses, and 2) hyporheic zone losses. Water 

column losses consist of biological uptake, followed by subsequent denitrification and particulate 

transport. For phosphorus, sediment burial is another active process. Hyporheic zone losses may 

139 

 
 
be biological uptake (N or P), denitrification (N), sorption (P), or mineralization (P). I label the 

corresponding terms “water column (WC)” and “hyporheic zone (HZ)” attenuation.  

Water column attenuation is assumed to be a function of travel time in the stream/wetland, given 

as 𝐿𝑜𝑠𝑠𝑤𝑐 = 𝛼𝑤𝑐𝑇𝑠, where 𝛼𝑤𝑐 is a calibrated constant across the domain, and 𝑇𝑠 is the travel time 

in-stream. Travel time in-stream in each model cell is given by 𝑇𝑠 =

1

𝑣

 where 𝑣 is the velocity of 

water in that stream cell. While this is the loss in a single cell, integrating across all cells along the 

flowpath  results  delivery  of  nutrients 𝑁𝑜𝑢𝑡 = 𝑁𝑖𝑛𝑅𝑤𝑐,  where 𝑅𝑊𝐶 = 𝑒−𝛼𝑤𝑐𝑇𝑇𝑠.  Here, 𝑇𝑇𝑠 is  the 

total travel time in streams, computed in ArcGIS as described in the main text (section 2.4.5).  

To compute losses due to streambed and hyporheic zone interactions beneath it, I define losses in 

this zone to be given by the residence time multiplied by a static constant, 𝐿𝑜𝑠𝑠ℎ𝑧 = 𝛼ℎ𝑧𝜏ℎ𝑧 where 

𝜏ℎ𝑧 is the residence time in the hyporheic zone, defined as 𝜏ℎ𝑧 = 𝐷 𝑣ℎ𝑧⁄

 where 𝐷 is the hyporheic 

zone depth and 𝑣ℎ𝑧 is the velocity of water flowing into/out of the HZ.  

The streambed interaction factor (D) is given by 𝐷~𝑓(𝐾, 𝑆, 1 𝑅𝑒𝑐ℎ

⁄

), where K is the hydraulic 

conductivity  of  the  streambed  sediments,  S  is  the  slope  of  the  stream  channel,  and  Rech  is 

groundwater  recharge.  Thus,  I  have  assumed  that  higher  streambed  sediments  increase  the 

exchange of surface and groundwater (increasing hyporheic zone depth) and that a higher slope 

leads  to  greater  streambed  morphometric  variability,  and  thus  greater  incidence  of  flow  paths 

entering  the  sediments  and  exiting  shortly  thereafter,  promoting  hyporheic  exchange.  I  further 

assume that greater groundwater recharge upstream leads to a stronger influx of water from below, 

which  would  reduce  the  depth  of  exchange  through  the  groundwater  pushing  back  against 

streamflow. Here, recharge is assumed to be represented by the basin yield of streams at their 30th 

percentile, thus 𝑅~𝐵𝑌30. I chose to normalize each of the terms by their maximum and minimum 

values across the model domain, and I log-transformed K. The final equation for hyporheic zone 

140 

 
 
depth D is: 

𝐷 = 𝛼𝐷 ∙ [

𝑙𝑜𝑔(

𝑙𝑜𝑔(

)

𝐾
𝐾𝑚𝑖𝑛
𝐾𝑚𝑎𝑥
𝐾𝑚𝑖𝑛

] ∙ [
)

𝑆
𝑆90

] ∙ [1 −

𝐵𝑌30
𝐵𝑌30,𝑚𝑎𝑥

] = 𝛼𝐷𝐾̃ ∙ 𝑆̃ ∙ 𝐵𝑌̃                             (2.13) 

Where each of the terms with a tilde (~) represents 0-1 normalized values, corresponding to the 

similarly colored sections of the left-hand side of the equation. The units of D are [L] (here, m). 

Note for 𝑆̃ I limited the maximum value to 1. 

If I assume that the total flux of nutrients into the HZ (𝑛𝑖𝑛) is given by 𝑛𝑖𝑛 = 𝑄ℎ𝑧𝐶𝑖𝑛, where 𝑄 is 

the flux of water in/out of the HZ [L3/T] and 𝐶𝑖𝑛 is the input concentration to the HZ [M/L3], I can 

represent 𝑄ℎ𝑧 = 𝑃 ∙ 1 ∙ 𝑣ℎ𝑧, where 𝑃 is the perimeter of the stream channel [L], 1 is the unit length 

of the channel [L], and 𝑣ℎ𝑧 is the velocity of water exchange in the HZ [L/T]. Therefore, 𝑛𝑖𝑛 = 𝑃 ∙

𝑣ℎ𝑧 ∙ 𝐶𝑖𝑛 =

𝑃∙𝑣ℎ𝑧∙𝑁𝑖𝑛
𝑄

, where 𝑁𝑖𝑛 is the input nutrient flux [M/T] in the stream channel above the 

HZ, and 𝑄 is the streamflow in the channel [L3/T].  

We  further  assume  that  nutrient  uptake  is  linearly  related  to  the  residence  time  in  the  HZ, 𝜏ℎ𝑧 

which  can  be  expressed  as 𝜏ℎ𝑧 = 𝐷/𝑣ℎ𝑧.  Thus,  the  flux  of  nutrients  out 𝑛𝑜𝑢𝑡 = 𝑛𝑖𝑛 ∙ 𝜏ℎ𝑧 ∙ 𝛼ℎ𝑧, 

where 𝛼ℎ𝑧 is a parameter to be calibrated in the model. Substituting the definitions of 𝑛𝑖𝑛 and 𝜏ℎ𝑧, 

I get 𝑛𝑜𝑢𝑡 =

𝑃∙𝑣ℎ𝑧∙𝑁𝑖𝑛
𝑄

∙

𝐷
𝑣ℎ𝑧

∙ 𝛼ℎ𝑧. Canceling terms, I get 𝑛𝑜𝑢𝑡 =

𝑃

𝑄

∙ 𝑁𝑖𝑛 ∙ 𝐷 ∙ 𝛼ℎ𝑧. I can further use 

the  relationship  between 𝑃 (the  wetted  perimeter  of  the  stream  channel)  and 𝑄 = 𝐴 ∙ 𝑣 [L3/T], 

determined  by  the  hydraulic  radius 𝑅 = 𝐴/𝑃 [L],  where 𝐴 is  the  channel  area,  and 𝑣 is  the  in-

stream average water velocity. Thus 

𝑃

𝑄

=

𝐴

𝑅∙𝐴∙𝑣

=

1

𝑅∙𝑣

. Finally, if I substitute this and equation S3.1 

into our equation for 𝑛𝑜𝑢𝑡 (noting that the linear constants combine 𝛼ℎ𝑧 = 𝛼ℎ𝑧 ∙ 𝛼𝐷) I get equation 

(2.14). 

                     𝑛𝑜𝑢𝑡 = 𝑁𝑖𝑛

𝑲̃ ∙𝑺̃∙𝑩𝒀̃
𝑹∙𝒗

𝛼ℎ𝑧 

                      (2.14) 

The bold terms in equation (2.14) are independent of nutrient concentrations but vary in space. 

141 

 
 
 
 
Therefore, I combine these terms into a single model input layer I term the streambed exchange 

rate, 𝑆𝐸 =

𝐾̃∙𝑆̃∙𝐵𝑌̃
𝑅∙𝑣

  [T/L], shown in Figure A2.0.6 e. Other intermediate inputs to calculate 𝑆𝐸 are 

shown in Figures 2.2 and S6. While the values of 𝐾 and 𝑆 could be calculated from input static 

layers (e.g., Figure A2.0.1), 𝐵𝑌, 𝑅, and 𝑣 needed to be computed for each point along the stream 

channel.  

Text A2.4: Groundwater Recharge and Tile-Drained Area Calculation 

Groundwater  recharge  was  estimated  based  on  a  series  of  linear  models  derived  from  the 

Landscape Hydrology Model (LHM), a coupled process-based hydrological model (Hyndman et 

al.,  2007).  LHM  was  originally  developed  for  the  Muskegon  River  Watershed,  located  in  the 

central  part of Michigan’s lower peninsula,  and  contains diverse land use representative of the 

broader region. This hydrological model combined several GIS layers including land use, soils, 

and  station  observation  data 

to  predict  stream  discharge,  groundwater  recharge,  and 

evapotranspiration from 1990 to 2004. The linear regression models fit for each land use type to 

the percentage of precipitation that becomes recharges as a function of soil hydraulic conductivity. 

The  annual  precipitation  from  2008  to  2012  was  downloaded  from  Parameter-elevation 

Regressions on Independent Slopes Model (PRISM) database (OSU, 2014) and average annual 

precipitation from 2008 to 2012 was used for linear models. The soil hydraulic conductivity is 

derived from the soil texture of the soil survey geographic database (SSURGO) (USDA, 2013) 

and  land  cover  data  from  the  national  land  cover  database  2011(USGS,  2011).  The  recharge 

estimates for USGLB are shown in Figure A2.0.1.  

To derive an estimated tile drainage map, I used GIS-based mapping based on the premise that 

crops grown on land with low slopes and poorly drained soil likely have tile drains. First, cells 

classified in the 2011 NLCD (USGS, 2011) as “Cultivated Crops” were extracted. Then, I fit a 

142 

 
 
model to land drained by tile county-level data from land-use practice in 2012 for the 109 counties 

in  the  USGLB  (Figure  A2.0.1),  published  by  the  United  States  Department  of  Agriculture, 

National  Agricultural  Statistical  Service  (NASS, 2012). 55 counties were randomly selected as 

training datasets and the remaining 54 counties as a validation dataset. This model selects areas 

that are cultivated land-use types, moderately low soil permeability (Ksat < 14.4 mm/hr), and low 

average  slopes  (<  1.2%)  as  tile  drainage  (r2  =0.83).  The  rest  of  the  54  counties  were  used  to 

calibrate  the  model  (r2=0.85).  Estimates  of  tile-drained  areas  are  shown  in  Figure  A2.0.1.  The 

aggregation method (maximum) used in pre-processing data may overestimate the area of tiled 

fields in the model.  

Text A2.5: Model Parameter Extended Discussion 

Four of the model parameters (f, ExH, SepEff, and fstor) are linear coefficients on loss terms in the 

model,  while  the  remainder  (basin  and  river  losses)  are  coefficients  in  the  exponent  of  an 

attenuation term—thus their values are not directly comparable.  

The  subsurface  partition  parameter  f  is  multiplied  by  the  normalized  recharge  (Eq  (2.2))  and 

represents the proportion of mobile surface-applied nutrients (after Harvest) that are sent through 

the groundwater pathway. Thus, areas with the highest recharge in the basin have 49% of mobile 

surface-applied nutrients sent to groundwater for N while 78% is routed through groundwater for 

P  (table  A2.1).  See  Figure  A2.0.13  for  a  map  of  the  final  groundwater  partition  fraction.  It  is 

somewhat surprising that phosphorus has a higher fraction than nitrogen, given the relative ease 

with which NO3 in particular leaches from soils, however, this may be an artifact of the simplistic 

relationship between recharge rates and groundwater pathway hydrologic fractionation imposed 

here. It may also be due to a non-linear relationship between soil texture (underlying recharge) and 

P  mobility.  P  moves  much  more  readily  through  sandy  soils  than  finer  textured  ones.  That 

143 

 
 
relationship is also not captured properly here. 

Our model predicts that 80% of surface-applied N is harvested, lost to the atmosphere (N only), or 

stored in the root zone in agricultural settings, while 95% of P is. Figures S10a and S11a show 

total losses due to harvest, atmosphere, and storage for N and P, respectively. The higher rate of P 

harvest may be influenced by the tendency of P to sorb to unsaturated zone soils, rather than a 

more careful accounting of N and P needs when fertilizers are applied.  

Phosphorus then experiences an additional deep unsaturated zone storage, where up to  55% of 

groundwater-mobile P (in the lowest recharge areas, Eq (2.8)) is stored (Figure A2.0.11b). Areas 

with higher recharge then experience less storage proportionately.  

Nutrient  attenuation  during  basin  transport  through  surface  runoff,  tile  drain  fields,  general 

groundwater,  and  septic  plumes  are  determined  by  the  overland  travel  distance  along  with  the 

corresponding parameters (bs, bst, bg, and bse, see Eq (2.9-2.12)). The less calibrated parameter 

value  means  a  higher  delivery  rate  through  the  pathway,  but  the  amount of  nutrients  delivered 

through these pathways are not directly comparable due to the different amounts traveled prior to 

basin attenuation. But I can compare bs and bst here as they are alternative pathways, tile drainage 

delivery rate is about 20 times higher than the overland runoff pathway for both TN (20.58) and 

TP  (19.51).  General  groundwater  pathway  and  septic  plume  pathway  have  the  delivery  rate 

between overland runoff and tile drains.  

Text A2.6: Load Comparison with Sparrow 

First of all, there are some important differences to note: SPARROW does not use spatially explicit nutrient 

sources nor attenuation processes and is run at a coarser resolution. Using the same observation dataset, the 

SENSEflux  TN  model  slightly  underestimated  high  loads  while  the  SPARROW  model  slightly 

overpredicted  them  (Figure  A2.0.9).  Both  SENSEflux  and  SPARROW  models  slightly  underpredicted 

higher TP loads and overpredicted lower loads. These differences are not surprising because the two models 

144 

 
 
have  several  notable  differences,  including  nutrient  attenuation  processes,  methods  to  model  nutrient 

sources, spatial resolution, and timeframes. Specifically, SENSEflux includes four distinct pathways (tile 

fields, overland, septic plumes, and groundwater, see Figure 2.1) while the SPARROW model uses data on 

land-to-water delivery factors, such as soil permeability, drainage density, precipitation, air temperature, 

the fraction of the stream catchment with tile drains to describe attenuation processes broadly across the 

basin(D. M. Robertson & Saad, 2011b). Moreover, SENSEflux uses spatially explicit nutrient source 

inputs  from  SENSEmap  while  SPARROW  uses  land  use/cover  and  county-level  estimates  of  nutrient 

masses  to  statistically  compute  sources  more  generally  (Hamlin  et  al.,  2020a).  Finally,  SPARROW 

models were developed for TN and TP with a 2002 base year (D. M. Robertson & Saad, 2011b), while 

nutrient sources and watershed factor data used in SENSEflux are based on ca. 2010 data.  

Figure  A2.0.1  Model  key  inputs.  (a)  annual  groundwater  recharge  for  USGLB;  (b)  overland 
flowlength; (c) harvested areas where either manure or chemical agricultural fertilizers applied for 
TN  and  TP;  (d)  estimated  tile  drainage  (indicated  in  yellow)  estimated  from  land  use,  soil 
permeability, and average slope. 

145 

 
 
 
 
Figure A2.0.2 Study region and data source showing the GLB in the inset map, along with a. the 
Land Use/Land Cover across the basin from the National Land Cover Dataset (USGS, 2011); b. 
non-point TN source from SENSEmap (Hamlin et al., 2020a, b); c. Average annual precipitation 
from 2008 to 2012 from PRISM (PRISM Climate Group 2011); d. the saturated conductivity of 
the top soil layer from SSURGO (Soil Survey Staff, 2022). 

146 

 
 
 
 
Figure A2.0.3 SENSEmap nitrogen source for USGLB resampled to 720 m resolution for display, 
derived from  Hamlin  et  al  (2020). The units are  kg/ha/yr except  for point sources (kg/yr). The 
color breaks are based on quantile classification and round-off methods. 

147 

 
 
 
 
Figure  A2.0.4  SENSEmap  phosphorus  source  for  USGLB  resampled  to  720  m  resolution  for 
display. The units  are kg/ha/yr except  for point sources (kg/yr). The color breaks  are based on 
quantile classification and round-off methods. 

148 

 
 
 
 
Figure A2.0.5 Spatial domain showing nitrogen and phosphorus sampling locations that are used 
for delineating watersheds and loads used for model calibration and validation. (a) TN sampling 
locations (N = 116), (b) TP sampling locations (N = 119), (c) TN watersheds with loading, and (d) 
TP watersheds with loading. Maps are classified in quantiles, with each color representing 25% of 
the study domain.  

149 

 
 
 
 
Figure A2.0.6 Inputs are used to derive the river retention factor in SENSEflux. (a) average slope, 
(b) basin yield during baseflow; (c) hydraulic radius; (d) average velocity; (e) streambed exchange 
rate; (f) N denitrification or P sorption factor.  

150 

 
 
 
 
 
Figure A2.0.7 Model residual (log10 model – log10 observed) distribution and density are shown 
as violin plots. The bottom and top of each box represent the first and third quartiles, respectively, 
and the line inside each box represents the median. Zero residual is indicated as a black dashed 
line.  The  top  and  bottom  bars  (whiskers)  represent  the  maximum  and  minimum  residuals, 
respectively.  Data  beyond  the  end  of  the  whiskers  are  "outlying"  points  and  are  plotted 
individually. None of the means were significantly different from zero, as measured by the one-
sample t-test, with P values > 0.05.  

151 

 
 
 
 
Figure  A2.0.8  Log  model  residuals  (kg/day)  by  watersheds  for  both  calibration  and  validation 
datasets. The color breaks are based on quantile classification and are rounded to the nearest 0.1. 

Figure A2.0.9 Comparison to the simulated loads (log10 of kg/day) in the SPARROW models. 
Blue dots and lines are for SENSEflux simulation, and red is for SPARROW. The dashed black 
line is a 1:1 line, where simulated loads are equal to observed loads.  

152 

 
 
 
 
 
Figure A2.0.10 TN model loss and attenuation outputs. (a) crop extraction of nitrogen; (b) total 
nitrogen loss during basin transport; (c) nitrogen uptake in streams and connected lakes. Maps are 
resampled  from  120m  SENSEflux  outputs  to  720m  resolution  here  for  display  purposes  and 
classified in quantiles with each color representing 20% of the dataset.  

153 

 
 
 
Figure A2.0.11 TP model loss and attenuation outputs. (a) crop extraction of phosphorus; (b) in-
place phosphorus storage and loss of phosphorus below the root zone; (c) TP loss during basin 
transport; (d) phosphorus uptake in streams and connected lakes. Maps are resampled from 120m 
SENSEflux outputs to 720m resolution here for display purposes and classified in quantiles with 
each color representing 20% of the dataset.  

154 

 
 
 
 
Figure A2.0.12 Estimated total yield of TP delivered to lakes by four key pathways (kg/km2/yr). 
Maps are resampled from 120 m SENSEflux outputs to 720 m resolution here for display purposes 
and classified in quantiles, with each color representing 20% of the dataset; the white area in a&b 
within the basin boundary represents areas with no data as I assumed overland and tile fields are 
alternative pathways. 

Figure A2.0.13 Spatial distribution of SENSEflux surface and subsurface partition parameter (f) 
for TN (a) and TP (b).  

155 

 
 
 
  
 
 
 
Calibrated parameter value  

TP  
0.78 
0.95 

0.35 

0.55 

1.27E-04 
6.51E-06 

5.90E-03 

1.91E-02 

1.41E-07 

2.93E-04 

1.94E-05 

Table A2.1 Summary of optimized model parameters.  

Function 
type  

Parameter  General function 

f 
ExH 

SepEff 

Subsurface partition 
Harvest extraction 
Efficiency multiplier on 
septic loads 

Linear  

TN 
0.49 
0.80 

0.30 

fstor 

bs 
bst 

bg 

bse 

dnsp 

bio 

lacus 

Fraction of groundwater-
pathway nutrients stored in 
the deep unsaturated zone 

0 

Basin attenuation (surface) 
Tile attenuation (surface) 
Basin attenuation 
(subsurface) 
Basin attenuation (septic) 
N denitrification or P 
sorption in River 
Biological Uptake and 
Burial in River 
Lake reduction 

1.41E-03 
6.85E-05 

3.76E-04 

7.12E-04 

1.97E-06 

1.05E-04 

1.36E-05 

Exponential  

156 

 
 
 
 
APPENDIX B: SUPPORTING INFORMATION FOR CHAPTER 3  

Figure A3.0.1 Comparison of TN versus the sum of TKN and TNN for melt (a) and baseflow (b) 
with 336 and 482 records from 2008 to 2012. (TN: total nitrogen; TKN: total Kjeldahl nitrogen; 
TNN: total nitrate and nitrite). 

Figure A3.0.2 Workflow to get concentration, flow, load, and watershed area. 

157 

 
 
 
 
Figure A3.0.3 Site concentrations used for SENSEflux model calibration and validation. 

158 

 
 
  
 
Figure A3.0.4 Watershed area (km2) by hydrologic seasons for TN and TP. 

159 

 
 
 
 
Figure  A3.0.5  Model  key  inputs  include  (a)  hydrologic  landscape  regions  (HLR),  (b)  drift 
thickness and in-stream travel time during snowmelt, (c) and baseflow (d). For HLR, each value 
represents different regions. See region descriptions below.  

Hydrologic Landscape Region descriptions: 1. Subhumid plains with permeable soils and bedrock; 

2. Humid plains with permeable soils and bedrock; 3. Subhumid plains with impermeable soils 

and permeable bedrock; 4. Humid plains with permeable soils and bedrock; 5. Arid plains with 

permeable soils and bedrock; 6. Subhumid plains with impermeable soils and bedrock; 7. Humid 

plains with permeable soils and impermeable bedrock; 8. Semiarid plains with impermeable soils 

and  bedrock;  9.  Humid  plateaus  with  impermeable  soils  and  permeable  bedrock;  11.  Humid 

plateaus  with  impermeable  soils  and  bedrock;  12.  Semiarid  plateaus  with  permeable  soils  and 

impermeable  bedrock;  13.  Semiarid  plateaus  with  impermeable  soils  and  bedrock;  16.  Humid 

mountains  with  permeable  soils  and  impermeable  bedrock;  17.  Semiarid  mountains  with 

impermeable soils and bedrock; 18. Semiarid mountains with permeable soils and impermeable 

bedrock. 

160 

 
 
 
Figure  A3.0.6  Surface  and  groundwater  seasonal  mobility  for  an  example  USGS  station 
(04112500), Red Cedar River at East Lansing, MI. Streamflow data was downloaded for the 2010 
water year, plotted as the black line. The gray shadow area represents baseflow that was separated 
using the hydrograph separation program, thus the white area between streamflow and baseflow is 
surface  runoff.  Two  hydrological  seasons  are  shown  in  rectangle  area  with  orange  (Melt)  and 
baseflow (light blue). Surface and groundwater seasonal mobility are calculated for this station 
and is shown in the gray card on top left corner.  

161 

 
 
 
 
Figure  A3.0.7  Maps  for  surface  (S)  and  groundwater  (G)  seasonal  mobility  during  snow  melt 
(Melt) and baseflow (BF) seasons.  

162 

 
 
 
 
Figure A3.0.8 Model residual (log10 model – log10 observed) distribution and density are shown 
as violin plots, for annual, baseflow, and melt (different colors). The bottom and top of each box 
represent  the  first  and  third  quartiles,  respectively,  and  the  line  inside  each  box  represents  the 
median.  Zero  residual  is  indicated  as  a  black  dashed  line.  The  top  and  bottom  bars  (whiskers) 
represent the maximum and minimum residuals, respectively. Data beyond the end of the whiskers 
are "outlying" points and are plotted individually. One-sample t-test is used to measure whether 
the mean residuals are significantly different from zero, indicated as ns (p > 0.05) and * (p < 0.05).  

163 

 
 
 
 
 
Figure A3.0.9 Log model residuals (kg/day) by watersheds. 

Text A3.1: Load Comparison with SPARROW model 

First,  there  are  some  critical  differences:  SPARROW  does  not  use  spatially  explicit  nutrient 

sources or attenuation processes and is  run at a coarser resolution. Using the same observation 

dataset,  the  SENSEflux  TN  model  slightly  underestimated  high  loads,  while  the  SPARROW 

model slightly overpredicted them (Figure  A3.0.10). Both  SENSEflux  and SPARROW models 

slightly  underpredicted  higher  TP  loads  and  overpredicted  lower  loads.  These  differences  are 

164 

 
 
 
unsurprising  because  the  two  models  have  notable  differences,  including  nutrient  attenuation 

processes,  methods  to  model  nutrient  sources,  spatial  resolution,  and  timeframes.  Specifically, 

SENSEflux includes four pathways (tile fields, overland, septic plumes, and groundwater). At the 

same  time,  the  SPARROW  model  uses  data  on  land-to-water  delivery  factors,  such  as  soil 

permeability,  drainage  density,  precipitation,  air  temperature,  and  the  fraction  of  the  stream 

catchment  with  tile  drains  to  describe  attenuation  processes  broadly  across  the  basin(D.  M. 

Robertson et al., 2019). Moreover, SENSEflux uses spatially explicit nutrient source inputs from 

SENSEmap, while SPARROW uses land use/cover and county-level estimates of nutrient masses 

to statistically compute sources more generally (Hamlin et al., 2020a). Finally, SPARROW models 

were developed for TN and TP with a 2002 base year (D. M. Robertson et al., 2019), while nutrient 

sources and watershed factor data used in SENSEflux are based on ca. 2010 data. 

Figure A3.0.10 Comparison to the simulated loads in the SPARROW models. Blue dots and lines 
are for SENSEflux simulation, and red is for SPARROW. The solid black line is a 1:1 line, where 
simulated loads are equal to observed loads.  

165 

 
 
 
 
Figure A3.0.11 Nutrient sources by lake basin. 

166 

 
 
 
 
Table A3.1 Summary of optimized model parameters (three parameters change across seasons, 
and the rest eight parameters are season shared) from optimization.  

Parameters 

Functio
n type  

General function 

Parameters 
different 
across 
season 

Season 
shared 
parameters  

bs  

rbio 

tsettl 

f 

ExH 
SepE
ff 

fstor 

bst 

bg 

bse 

dnsp 

Exponen
tial  

Linear  

Exponen
tial  

Basin reduction 
(surface) 
biological uptake in 
rivers 

Lake attenuation 

Subsurface partition 

Harvest extraction 
Efficiency multiplier 
on septic loads 
Fraction of 
groundwater-pathway 
nutrients stored in the 
deep unsat zone for P 
Tile reduction 
Basin reduction 
(subsurface) 
Basin reduction 
(septic) 
N denitrification and 
P 
sorption/mineralizatio
n in rivers 

Model Season  

TN 

TP 

Baseflo
w  
1.76E-
02 
3.03E-
05 
2.68E-
06 

Annual   Melt  

1.88E-
04 
2.38E-
05 
2.11E-
07 

3.58E-
03 
2.50E-
05 
1.13E-
06 
0.35 

0.70 

0.30 

0.00 

1.66E-03 

1.09E-02 

6.56E-04 

Baseflo
w  
9.22E-
04 
2.57E-
05 
1.52E-
05 

Annual   Melt  

9.71E-
05 
2.40E-
05 
6.08E-
07 

6.59E-
06 
2.30E-
05 
9.19E-
07 
0.77 

0.68 

0.35 

0.54 

2.76E-03 

3.70E-03 

8.33E-04 

6.79E-06 

8.48E-06 

167 

 
 
 
 
 
 
APPENDIX C: SUPPORTING INFORMATION FOR CHAPTER 4 

Figure A4.0.1 The generation flow of ground truth points to Further exclusion of the points (i.e., 
120-m apart for tile points; the equivalent number of tile and non-tile points) are not shown in this 
workflow.  

Text A4.1: Variable Selection  

26 variables were initially chosen but not used in the final classification due to relatively higher 

correlation and lesser importance. These include (1) six variables that are derived from Landsat, 

specifically the maximum value of the normalized difference vegetation index (NDVI) in summer 

and  spring,  the  normalized  water  index  (NDWI)  in  growing  season,  maximum  Tr_swir1  in 

growing  season  and  max  Tr_swir2  in  spring  and  summer.  (2)  six  climate-related  variables, 

precipitation in spring and summer, aridity in summer and the growing season from GridMET, 

and the actual evapotranspiration (AET) in spring and summer; (3) three soil property variables at 

0-1m, including the percentage of clay, hydraulic conductivity, and plant available water, while 

168 

 
 
 
these  variables  at  0-5  cm  were  selected  for  final  classification;  (4)  nine  soil  moisture  related 

variables, including the maximum surface soil moisture (SSM) and soil moisture percent (SMP) 

in spring, median SSM, SMP and subsurface soil moisture (SUSM) in spring, the range of SSM 

and SMP in spring, as well as median SSM and SUSM in the summer; and (4) two land surface 

temperature  related  variables,  including  maximum  nighttime  LST  in  growing  season  and  the 

median difference of dayLST and nightLST in summer.  

Figure A4.0.2 The mean of monthly maximum NDVI and NDWI value across all tile and non-tile 
points. 

169 

 
 
 
 
Figure A4.0.3 The Person correlation coefficient (r < 0.8) between each two of the remaining 31 
variables with short names (see full names in Table 4.1).  

170 

 
 
 
 
Figure A4.0.4 Variable importance in random forest classification. Variables are identified by their 
short names, which can be related to full names and other information using Table 1. Left: The 
mean decrease in node impurity from splits on each variable as measured by the Gini index. Right: 
The mean decrease in accuracy. For both, a larger number means higher importance.  

171 

 
 
 
 
Figure A4.0.5 Classified tile drainage area (km2) versus the statistical area from the USDA NASS, 
2017. Each point stands for one county, the blue solid line is the linear regression line, and the 
black dashed line is the 1:1 line.  

172