Q UANTIFYING THE NUTRIENT LANDSCAPE IN THE GREAT LAKES RE GION : MAPPING AND ANALYZING NUTRIENT SOURCES AND GROUNDWATER NITRATE By Quercus Florence Hamlin A THESIS Submitted to Michigan State University in partial fulfillment of the require ment s for the degree of Environmental Geosciences Master of Science 202 0 ABSTRACT MAPPING AND ANALYZING NUTRIENT SOURCES AND GROUNDWATER NITRATE By Quercus Florence Hamlin Since the mid - 19 th century , the rise of industrial agriculture and g rowing popula tion has signif icantly altered nutrient cy c ling . These changes are from multiple sources , such as chemical fertilizers, livestock w aste, and human waste . Excess nutrients have led to a suit e of water quality problems that damage human and animal healt h, ecol ogy , and economics. In this thesis, I begi n to qu antify the Nutri e n t Landscape , a term I use to refer to the set of processes and properties that drive cycling of n itrogen and phosphorus thr oughout our modern environment . To understand the I first develop al go rithms utilizing broadly available data to estimate nutrient inputs from seven dis t inc t sources across the U .S. portion of the Laurentian Great Lakes Basin at 30 m eter r esoluti on. Chapter I s mapping effort, referred to as the Spatially Expli cit Nutrient Sourc e Estimate map (SENSEmap), provide s new information for management and modeling , as well as a classification system to categorize wa t ers heds based on the ir nut rie nt source composition. Second, I exa mine the groundwater component of the Nutrie nt Landscape by exploring a dataset of over 300,000 nitrate samples from drinking water wells using Classification and Regression Tree (CART) anal y sis to determine drivers of el evated concentration . This analysis reveal ed high nitrate concentrations result from a combination of hazardous land use and vulnerable geology . The data products and finding s in this thesis provide a quantitative framework for inform ing management s trateg ies and driving the next generation of nutrient modeling . iii ACKNOWLEDGEMENTS T h e work involved in my Master s thesis spans a four year period with support fr om people all along the way. This thesis represents more than a body of research, but a tran sforma tive pa th from which I found p urpose and friendship . I want to ac knowledge the tremendo us support from my frie nds, family , an d lab, to which t here is significant overlap. First and f ore most, the kind ness, positivity, and support from my home in the H ydrola b got m e to where I am. Starting as an undergraduate in the lab in 2015, I quickly gained the responsibility of what would make up my first chapter and inspire me to pursue graduate degrees in water quality. My work on SENSEmap started with the suppo rt of my su pe rvisors and mentors, Drs. Anthony Kendall and Sher ry Martin. Early in my lab career, t hey trusted me with the responsibility and challenge of an intense coding and GIS project . Anthony and Sherry s mentorship shaped the scientist I am today f r om an early p lace . Their support while I was an undergraduate continued into my graduate career starting at Syracuse University and later welcoming m y return to MSU. Additionally , their friendshi p made it all a humble, war m experience . I thank my committee , Dr s. David Hyndman, Anthony Ken dall, and Lisa Tiemman for their guidance and insight. I th ank my primary advisor Dr. Da vid Hyndman for his optimism of ten a double - edged sword for his student s , but overall a positive trait. Particularly at the end of my Maste r s degree during the COVID - 19 Pandemic, he was flexible and supportive while working to dispel my negativity about my wor k. I am ex cited t o continue working with Dave throughout my PhD. I t seems so long ago, but as an undergraduate I was intimi date d by him , unsu re what to expect a s we began the paper process. iv The friends I made in the Hydrolab brought joy to graduate school and my li fe these are friendship s I will cherish throughout my life . I thank my officemates Brent Heerspink, Bailey Hannah, Luwen Wan , Jak e Roush, and C hanse Ford for the great atmospher e in the Boat Room as well as my other friend s an d labmate s : Alex Kuhl , Ally Brady, Behnaz Mirzendehdel , Ben Mc Carthy, Erin Haack er, Jake Stid, Jeremy Rapp , Jill Deines, Joe Lee - Cullin (honorary Hydrol abmate ) , Kyle Cole, Leanne Hancock, and Mohamm e d Rahman . Some spe cial shoutouts: At the end of my Master s, during mandat ory quarantin e , I thank Brent and Alex for their daily Z o om companionship that ensured I completed my thesis on time and stayed sane . T his cr isis brought us closer and without our f riendship I am not sure I would have made it through . I thank Jeremy f or his friendship since I started in the lab with his humor , intellectual banter, and empathy as a friend and colleague. To Luwen and Ally, thank you for your positivity and light as dear frien ds. Thank you to my student coauthors on th e SENSEmap manuscrip t, Henry Whitenack, Jake Roush, and Bailey Hannah t hose revisions were a marathon . Thank you t o Kyle and Yvonn e, the undergraduates I am m entori ng, for giving me the joy of teaching and letting me be a part of your gr owth. Thank you as well to the D e par tment of Eart h and Environm ental Sciences staff who mak e the logisti cs of academia run : Ami McMurphy, Pam Robinson, Britt an y Walter, Elizabe th McE lroy, and Judi Smelser. Outside of the university , I thank my friends , particularly Bri ar Fraleigh, Kenzie E dd y, and Julia Cousino for getting together for all the brunch dates, concerts, and late nights . There would be no success without the wonder ful va cations taken t o see Nico Caban a yan, Bev Yockelson, Tom Quinn, and Cynthia Santos my l ong - distance , wonderful friends who have always let me crash on couc hes and helped revive me . You al l let me ramble about my science v and the frustrations inherent to gr aduate school. I can t wait to see you aga in when trav el beco mes p ossible again . Finally , I thank my parents , Fern and Chris Hamlin , for your support throughout my journey. T o my dad thanks for the bike rides, women s basketball fanaticism , and hel ping m e mo ve so many times . You also got me into science with my fi rst research job at Notre D ame with Dr. Jason McLachlan, Jill Deines, and Jody Peters . This start pushed me forward . The last thank you , a critical one, goes to my therapist for the last tw o year s at the DBT Institute of Michigan. The skills I learned and support received were life changing, allowing me to grow into a life I want to have. Thank you all. It has been a wild ride and I am excited for wh at comes next. vi TABLE OF CONTENTS LIST OF TABLES ................................ ................................ ................................ .................. viii LIST OF FIGURES ................................ ................................ ................................ .................. ix KEY TO ABBREVIATIONS ................................ ................................ ................................ .... xi CHAPTER 1: QUANTIFYING LANDSCAPE NUTRIENT INPUTS WITH SPATIALLY EXPLICIT NUTRIE NT SOURCE ESTIMATE MAPS ................................ ............................... 1 Abstract ................................ ................................ ................................ ................................ ... 1 1. Introduction ................................ ................................ ................................ ......................... 2 2. Materials and Methods ................................ ................................ ................................ ......... 6 2.1 Study Area ................................ ................................ ................................ ..................... 6 2.2 Data Sources ................................ ................................ ................................ .................. 9 2.3 Mapping Nutrient Sources ................................ ................................ ............................ 11 2.3.1 Atmospheric Deposition ................................ ................................ ........................ 12 2.3.2 Septic Systems ................................ ................................ ................................ ....... 15 2.3.3 Chemical Non - agricultural Fertilizer ................................ ................................ ...... 16 2.3.4 Point Sources ................................ ................................ ................................ ......... 18 2.3.5 Agricultural Fertilizer Sources ................................ ................................ ............... 18 2.3.5.1 Manure ................................ ................................ ................................ ............ 20 2.3.5.2 Chemical Agricultural Fertilizer ................................ ................................ ...... 21 2.3.6 Nitrogen Fixation ................................ ................................ ................................ ... 21 2.4 Analytical Methods ................................ ................................ ................................ ...... 23 2.4.1 k - means Clustering and Nutrient Input Landscapes ................................ ................ 24 2.4.2 Model Comparisons ................................ ................................ ............................... 24 2.4.3 Sensitivity of Ch emical Agricultural Fertilizer ................................ ....................... 25 3 Results and Discussion ................................ ................................ ................................ ....... 27 3.1 SENSEmap - US - GLB ................................ ................................ ................................ ... 27 3.2 N:P Ratios ................................ ................................ ................................ .................... 33 3.3 Nutrient Input Landscapes ................................ ................................ ............................ 35 3.4 Comparisons to Other Products ................................ ................................ .................... 42 3.5 Sensitivity to Chemi cal Agricultural Fertilizer ................................ .............................. 43 4 Conclusions ................................ ................................ ................................ ........................ 45 Acknowledgments ................................ ................................ ................................ ................. 46 APPENDIX ................................ ................................ ................................ .............................. 48 CHAPTER 2: CONNECTING LANDSCAPE CHARACTERITICS T O GROUNDWATER NITRATE CONCENTRATION ................................ ................................ ............................... 90 Abstract ................................ ................................ ................................ ................................ . 90 vii 1 Introduction ................................ ................................ ................................ ........................ 91 2 Study Area ................................ ................................ ................................ .......................... 94 3 Data ................................ ................................ ................................ ................................ .... 97 3.1 Well Chemistry ................................ ................................ ................................ ............ 97 3.2 Wellogic ................................ ................................ ................................ ...................... 98 3.3 Nitrogen Loads ................................ ................................ ................................ ............. 98 3.4 Land Use/Land Cover ................................ ................................ ................................ .. 99 3.5 Soil ................................ ................................ ................................ .............................. 99 3.6 Aquifer Recharge ................................ ................................ ................................ ......... 99 3.7 Aquifer Saturated Hydraulic Conductivity ................................ ................................ .. 100 4 Methods ................................ ................................ ................................ ............................ 101 4.1 Kriging ................................ ................................ ................................ ....................... 101 4.2 CART ................................ ................................ ................................ ........................ 103 5 Results and Discussion ................................ ................................ ................................ ..... 105 5.2 Interpreting CART ................................ ................................ ................................ ..... 109 5.2.1 Probability of Exceeding 0.4 mg/L ................................ ................................ ....... 110 5.2.2 Probability of Exceeding 2 mg/L ................................ ................................ .......... 113 5.2.3 Sensitivity ................................ ................................ ................................ ............ 115 5.3 Understanding Drivers of Groundwater Nitrate Concentration ................................ .... 116 5.3.1 Recharge ................................ ................................ ................................ .............. 116 5.3.2 Vulnerable Geology and Hazardous Land Use ................................ ..................... 117 6 Conclusion ................................ ................................ ................................ ........................ 119 Acknowledgements ................................ ................................ ................................ ............. 121 APPENDIX ................................ ................................ ................................ ............................ 122 REFERENCES ................................ ................................ ................................ ....................... 130 viii LIST OF TABLES Table 1. 1 Summary of SENSEmap - U S - GLB data so u r c e s ................................ ....................... 1 0 T able A1.1 Summary of pho sphorus atmosph e ric deposition sites ................................ ............ 52 Table A1.2 Calibrated values fo r WWTP and Septic co nditions ................................ ............... 56 Table A1.3 Nitrogen fe rtilizer rates ................................ ................................ .......................... 60 Table A 1.4 Summary statistics of Nutrien t Input Landsc apes ................................ ................... 79 Tab l e A1. 5 Pixel level quantil e values for N nonpoint source map ................................ ........... 81 Table A1.6 Pixel level quantile values for P nonpoint source maps ................................ .......... 81 Table A1.7 Summary statistics c omparing SENSEmap and SPARROW ................................ .. 82 Table A1 . 8 S u mmar y sta tistics comparing SENSEmap and NANI ................................ ........... 84 Table 2.1 S ummary of data sources ................................ ................................ .......................... 97 Table A2.1 Complete set of variables used within groundwater nitrate CART ......................... 12 8 ix LIST OF F IGURES Figure 1. 1 S ENSEmap study area ................................ ................................ ............................... 7 Figure 1.2 Schematic diagram of S ENSEmap nutrie nt sources on the landscape ....................... 11 Figure 1. 3 SENSE m ap nitrogen sources for the U.S. Great Lakes Basin ................................ ... 2 8 Figure 1.4 To tal nonpoint N and P ................................ ................................ ............................ 31 Figure 1. 5 Summarie s of sources and applicati on intensity at the lake basin scale .................... 33 Figur e 1.6 SENS Emap N:P ratios ................................ ................................ ............................. 34 Figure 1.7 Nutrient Input Landscapes ................................ ................................ ....................... 36 Figure 1.8 Distribution of N and P source percent within each Nutrient Input Landscape ......... 37 Figure 1. 9 Distri butions of L ULC within thre e intensive agricultural Nutrient Input Landscap es ................................ ................................ ................................ ................................ ................ 39 Figu re 1.10 Comparison Panel of LULC, S E NSEmap TP, N:P ratios, and NILs ....................... 41 Figure A1.1 Atmospheric deposition P site map b y region ................................ ........................ 51 Figure A1.2 Logical statements use d to classif y blocks as WWT P service area ........................ 56 Figure A1.3 Septic WWTP Delineation flowchart ................................ ................................ .... 68 Fi gure A1.4 Septic Placement f low cha r t ................................ ................................ ................... 69 Figure A1.5 Chemical Non - Agricultural Fertilizer flow chart ................................ .................... 70 Figure A1.6 Point sources flow chart ................................ ................................ ......................... 71 Figure A 1.7 . Manure flow char t ................................ ................................ ................................ 72 Figure A1.8 Chemical Agricultural Fertilizer Placement f lowchart ................................ ........... 73 Figure A1.9 N Fixation Yield C alculation flo wc h art ................................ ................................ 74 Figure A1.10 N Fixation Calculation flow chart ................................ ................................ ........ 75 x Figure A1. 11 Silhouette plot for Nutrient Input Land scape s silhouette co efficients .................. 77 Figure A1.12 Proportion of LULC within all Nutr ient Input Landscapes ................................ .. 78 Figure A1.1 3 SENSEmap p hospho rus s ources ................................ ................................ ......... 80 Fi g ure A1. 14 HUC8 level comparison s of SPARROW and SENSEmap ................................ ... 8 3 Figure A1.15 County level compariso ns be tween SENSEmap and NANI ................................ . 85 Figure A1. 16 Difference map between SENSEmap and Cao et al. (2018) N fertilizer ............... 86 Figure A1. 17 Pixel level differences b y SENSEm a p and Ca o et al. ( 2018) fe rtilizer application intensi ty ................................ ................................ ................................ ................................ ... 87 Figure 2. 1 Groundwater ni trate s tudy are a ................................ ................................ ................ 96 Figure 2.2 Well - level nitrate concentration maps ................................ ................................ ..... 10 6 Figure 2.3 Nitrate kriging results ................................ ................................ ............................. 108 Figure 2. 4 CART results for >0.4 mg/L NO 3 - N ................................ ................................ ....... 112 Figure 2. 5 CART results for >2 mg/L NO 3 - N ................................ ................................ .......... 114 Figure 2.6 Aquifer recharge compar ed to probability of exceeding 0.4 mg/L NO 3 - N ............... 117 Figure A2.1 Map of t hickness o f Quaternary glacial sediment ................................ ................. 125 Figure A2.2 Map s of s oil textures for soil depths 100 - 300 cm ................................ ................. 126 Fig ure A2.3 M ap of n itrog en loads to groundwater ................................ ................................ . 127 Fi gure A2.4 Map of Aquifer satu rated conductivity ................................ ................................ . 127 Fig ure A2.5 CART decision tree for se nsitivity analysis pre dicting mean NO 3 - N concentration ................................ ................................ ................................ ................................ ............... 129 xi KEY TO ABBREVIATIONS CART Classification and Regression Trees CDL Croplan d Data Layer EPA Environmenta l Pr otection Agency LP Mich insu l a N Nitrogen NANI Net Anthropoge nic N itrogen Index NIL Nutrient Input Landscape NLCD National Land Cover Database NO 3 - N Nitrate as Nitrogen P Phosp horus SE N SEmap Spatially Explicit Nutrient Source E stim ate map SPARROW SPAtial ly Referenced Regressions on Watershed attributes TN T otal Nitrogen TP Total Phosphorus USDA United States Department of Agriculture USGS United States Geological Survey US - GLB Un i ted States Great Lakes Basi n WWTP Wastewa ter Treatm ent Plant 1 CHAPTE R 1: QUANTIFYING LANDSCAPE NUTRIENT INPUTS WITH SPATIA LLY EXPLICIT NUTRIENT SOURCE ESTIMATE MAPS A bstract N utrient management is an essential part of watershed planning worl dwide to p rotect water resources f rom both widespread la ndscape in puts of nutrien ts (N and P) and point source emissions. To provide inf ormation to regional watershed planners and better understand nutrient sources, we develo ped the Spatially Explicit Nutr ient Sourc e Estimate Map (SENSEmap ) to quantify indiv idu al sources of N and P at their entry points in the landscape. We modeled seven s ources of N and six sources of P across the United States Great Lakes Basin at 30m resolu tion: atmospheric deposition, s eptic syst ems, chemical non - agricu ltural fertilizer, che mical agri cultural fertil izer, manure, nitrogen fixation, and point sources. By modeling these sources, we provide a more detailed view of nutrient inputs to the landsca pe beyond what would be possibl e from lan d use alone. We found 71 % and 88% of N and P, respective ly, came from a gricu ltural sources. The nature of agricultural nutrien t inputs varied significantly across the basin, as relative contributions of chemical agr icultural fertilizers, manure, and N fixa tion changed according t o diverse land use pra ctices reg ionally. We the n app lied k - means cluster analysis and identified nine Nutrient Input Landscapes (NIL) with N and P source characteristics, grouped into intensi ve agricultural, urban, and rur al landsca pes. These NILs can offe r insights into lan dsc ape variab ility that land use data alone cannot; within agricultural NILs, appli cation of chemical fertilizer and manure varied greatly, but land uses were similar. Thes e NILs can provide a framework for broadl y categorizing watershed s that may prove us efu l to both ecological and manag ement practices. 2 1. Introductio n Agricultural inte nsification, rising population, and increased meat consumption have led to a massive incr ease in landscape nutrient inpu ts worldwi de. Anthropogenic nitrog en and phosphorus i npu ts dominat e global nitrog en (N ) and phosphorus (P) cycles, and drive eutrophicat ion of both inland and coastal waters (Howarth, 2008; Smith et al., 1999; Anderson, 2002) . Beyond fundamentally altering ecosystem s, these high loads of N and P have fueled har mful algal blooms (HABs), whic h directly harm human and ecosystem health, and br ing negative economic consequences (Paerl & Otten, 2013; Carmichael, et al. 2001; Dodds e t al. 2009; Brooks et al. 2016) . The need to manage anthropogenic inputs of nutrient s h as become clear, however a lac k of detailed information regarding where and how nutrients are applied hinders both science and management. Nutrients are applied to the l andscape through a host of natu ral and an thropogenic mechanisms, few of which can be ef fectively measured, and e ven f ewer of which are actively monitored. Further comp licating the issue is that where present, federal and provincial level statues exempt non - point source inputs from repor ting requi rements. For instance, t he US Clean Water A ct tightly re gulates point s ource loads from wastewater treatment plants and other facilities but allows unreported nutrient applications for agricultural and municipal lan d management (Clean Water Act, 1972). Lac king detailed data, scie ntists and watershe d m anagers co mmonly turn to nutri ent modeling as a primary tool to understand and q uantify nutrient inputs, especially to aid in the management of Total Maximum Daily Loads (TMDL), a part of the US Clean Wat ershed Act (US EPA, 1991 ). Since nutrients ar e poorly m onitored, model ing h as become a major tool to understand non - point sou rce nutrient fluxes within watersheds. Land use is commonly used to determine 3 nutrient lo ads, as distinct sources can be estimated d o n land use may be used, o r lan d use may be an input within a more mechanistic tr ansport framework. USGS SPAtially Referenced Regressions on Watershed attributes (SPARROW ) uses a variety of watershed c haracteris tics to fit a regression model calibrated w ith measured in stream loads and transported through a stream network (Smith et al. , 1997; Alexander et al., 2004). Other methods include the Net Anthropogenic Nitrogen Ind ex (NANI) which correlates rive rine N flu x with four variables: s ynthetic fertilizer , a gricultura l N fixation, a tmosp heric deposition, and net movement of human and an imal food and has recently been applied to all counties in the United States (Swaney et a l., 2018; Howarth et al., 2012; Howarth e t al., 1996). Nutrient s ources and land use ar e also inp uts to process - based models. For example, the ELEMeNT (Exploration of Long - tErM Nutrient Trajectories) modeling approach uses land use and nitrogen surplus cal culations (atmospheric depositi on, biolog ical nitrogen fixation, chemical fertilizer , a nd manure) within a legac y and transport framework (Van Meter et al., 2017). A c ommonly used process - based model, SWAT (Soil and Water Assessment Tool) represents differ ent nutrient source input rates through p arametrized land use typ es (Arnold et al., 199 8). Howeve r, these approa ches often make broad assumptions about the loading rat e from a single land use. This has been identified as a concern in regional USGS SPARROW models (Preston et al., 2011), and become s especially problematic in agricultural la nds capes wher e a variety of crops are grown and fertilized with different amounts a nd types of fertilizer. This is an issue for quantifying the environmental effects of dif ferent fertilizing systems, as manure and chemical fertilizers ha ve different nitrou s o xide emiss ions (a potent green house gas) and nitrate leaching to groundwater (Ba sso & Ritchie, 2005; Tuomisto et al., 2012; Charles et al., 2017). Additionally, local da ta about septic systems and poi nt sources have been shown to impr ove 4 modeling when a dde d to coars er national dat a (Sc own et al., 2017). Approaches that use explicit so urces, like NANI, generally estimate loads at large watershed or county scales and are us ed to predict riverine outputs (Han & All an, 2008; Hong et al., 2 013; Goyette et al. , 2 016), rath er than informi ng sp ecific changes in local management. Here we descri be the first version of the Spatially Explicit Nutrient Source Estimate Map (SENSEmap) pr oduct, constructed using an app roach that seeks to avoid compromi ses frequently made be tween spat ial extent, res oluti on, and detail of nutrient specificity. We then de monstrate the approach for the U.S. portion of the Laurentian Great Lakes Basin (US - GLB), a region that contains both in land lakes and Great Lakes represe nting relatively pr ist ine to hig hly eutrophic c ondit ions, and where HABs are increasingly inflicting e cological, human health, and economic harm. With SENSEmap - US - GLB , we estimate landscape nu trient inputs for seven N and six P sour ces at 30 m resolution. S ources include atm osp heric depos ition, septic system s, chemical non - agricultural fertilizer, chemical agricultural fertilizer, manure, nitrogen fixation from legumes, and point sources. SENSEmap - US - GLB was developed from and expan ds upon previous research in er Peninsula wh ere six nutri ent sources were quantified and transported using a sta tistical model to estimate riverine exports (Luscz et al., 2015; 2017). Highlights of i mpro vements to the method include updating and improving all nutrient source modeling met hods, adding N Fixation, ula to all eight Great Lakes States. SENSEmap outputs include pixel - based nutrient inpu ts a ggregated to the HUC12 waters hed scale. We also identify nine dis tinct Nutrient In put Landscapes (NILs), which are sets of watersheds most similar in their quantity and composition of nutrient sources. 5 These NILs reduce a complex set of 13 spatially va ryin g nutrient sources into a str aightforwa rd framework that can be l inked to other in dic ators of wat er quality an d ecological condition. 6 2. Materials and Methods 2.1 Study Area The North American Laurentian Great Lakes Basin (GLB) consists of five basins sur rounding Lakes Superior, Mi chigan, Hu ron, Erie, and Ontario, co vering 580,000 sq uar e kilometers i n the Unite d States and Canada. HABs are found throughout the worl d and across the American Great Lakes region, most notably in the large annual blooms in west ern Lake Erie (Figure 1. 1). The HABs in Lake Erie have been par ticularly dangero us due to toxins in cyanobac teria that cause neurological and liver damage to human s and animals, resulting in chal ak, et al. 2013; Fitzsimmon s, 2014). Intensive use of chemical agricultural fert ili zer and manure are major causes of modern high nutrient loads (Withers et al., 2 014). Although Lake Erie is a prominent example, eutrophication and HABs are a problem at all scales, and problems as la rge as tho se found in Lake Erie resu lt from mismanage men t of landscape s in the co ntributing watershed. 7 Figure 1.1 SENSEmap st udy area. Land use/land cover in the U.S. Great Lakes Basin from NLCD 2011. Nonurban areas use the Anderson lev el 1 description, while urb an areas a re differentiated using An derson l evel 2 descr iption. This s tudy descr ibes the development of SENSEmap - US - GLB , which q uantifies average annual nutrient inputs for the 2008 - 2015 period over the United States Great Lakes B asin (US - GLB) portion of e ight state s: Indiana, Illinois, Michi gan, M innesota, Ne w York, Ohio, Pe nnsylvani a, and Wisconsin. This expansion of nutrient i np ut maps was possible due to readily available data in the US, including census, agricultural census, a nd land use datasets. Futu re work ma y extend these maps to the Canadi an portion o f the basin and develop t emporal estimates of landscape nutrient inputs . Climatologically, the US - GLB is temperate and sub - humid, and includes Koeppen - Geiger zones Dfa and Df b (hot - summer and warm - sum mer humid continental climates, respe ctivel y). Annual a verage (1981 - 2 010) temp eratures range from 3 - 10° C (PRISM). 8 Annual pr ecipitation is distributed relatively uniformly between 700 and 1000 mm, with local areas receiving up to 1500 mm/yr. A signific ant annual snowpack accumulates durin g the winter month s, melting inter mittently during the season in the southern part of the b asin and often persisting until early spring in the northern parts of the basin. The US - GLB exhibits a north - south gradient of l and use: i n the north, typically abov e 43 t o 45 degrees N, population d ensity is low and land use/cover (LULC) is primarily fo re st and wetlands, with some low intensity agriculture (Figure 1 .1 ). In the central and southern basin, intensive agriculture and urban cent ers dominate. Ohio, Indiana , and southern Mic higan primarily grow corn and soybean in rotations with a single annual g rowing season. Alfalfa and other hay crops are also grown as a primary crop in northern regions. Winte r wheat and other small gr ains like rye and barley are also com mon. S pecialty cro ps include potat oes, appl es, grapes, blueberries, dry beans, cherries, sq uash, sugar beets, asparagus, carrots, tomatoes, and cucumbers. Livestock agriculture is also common, primarily chickens, dairy cows, catt le, hogs, chickens, and tur keys. Wisconsin no tably has intens ive dairy operations (USDA Ag Census, 2012). The Grea t Lakes Basin contains Detroit, Michigan; Milwaukee, Wisconsin; Cleveland, Ohio; and Buffalo, New York, as well as other mid - size cities and many rural agricultural to wns. P opulation de nsity is higher in the so uthern and central GLB, as there are more majo r cities and agricultural towns. Although cities and many towns have wastewater treatment plants, septic systems are still heavily used with in suburbs, small towns, an d rura l population s. Septic system s are a d ispersed, but commonly neglected, nutrient sou rc e that loads directly to groundwater. The last glacial period and subsequent melt deposited coarse - te xtured sediments across mu ch of the basin. Exceptions to this a re are as of lake p lain clays that formed in pro - glacial 9 lakes, now covering some of the a re as adjacent to lakes Huron and Erie. As a result, soil textures vary from coarse sands and gravels to fine textured soils with g reater tha n 90% clay particles. This variab ility in soi l textures is re levant to what crops are grown, how much fertilizer is ap plied, and eventually how much of those applied nutrients are used by crops versus those that are mobi lized via runoff, erosion, or deep p ercolation into groundwater syste ms. 2.2 Data Sources SENSEm ap quanti fies nutrient inputs at their origin on the la nd scape at a 30 m resolution to match the resolution of the National Land Cover Dataset (NLCD, Homer et al., 2015). The model uses data from 2008 to 2015 to produce av erage annual nutri ent inputs estim ates for this timespan. Those data sources are listed i n Table 1. 1 and are described in further detail for each nutrient source in the following section. Core datasets for SENSEmap incl ude the US Census (population and hou sehold s), USDA Agr icultural Census (livesto ck, crops), USDA Cropland Data Layer (remote s en sing product locating major crops), and the NLCD (land use/land cover ) . 10 Table 1. 1 Summary of SENSEmap - US - GLB data sources . Data sources, t ime period, and resolution for ea ch nutrient source. Description Datas et(s) Source(s) Year(s) Resolution Used For L and Cover, Imperviousness National Land Cover Dataset (NLCD) Multi - Resolution Land Characteristics Consortium (MRLC) 2011 30 m All N Atmosp heric Deposition Total Depo sition Maps (Total N, wet+dry) National Atm ospheric Deposition Program (NADP) Schwede & L ear (2014) 2008 - 2015 4134.354 m Atm. Dep. Soil Properties SSURGO National Resources Conservation Service n/a Vector/30 m N Fixati on, Chem. Ag. Fertilizer, Manure Cro p Loca tions Cropla nd Data Layer (CDL) USDA NASS 2008 - 2015 30 m N Fixation, Chem. Ag. Fert ilizer, Manure Crop Yields Agricultural Census USDA NASS 2007, 2012 County N Fixation, Chem. Ag. Fertilizer, Manure Chemical Fer tilizer Qu antities County level chemi cal fe rtilizer USG S: Brakebill & Gronberg ( 2017), Gronberg & Spahr (2012) 2008 - 2012 Count y Chem. Ag., Chem. NonAg. Fertilizer Remotely - sensed Greenness LANDSAT NASA, Accessed via Google Earth Engine 2008 - 2015 30 m N Fi xation An imal/Farm Inventories Agric ultura l Census USD A NASS 2012 County Manure Confined Animal Feeding Operations State Dat abases Point Manure Incorporated Areas, Census Blocks TIGER Line Files US Census 2010 Polygon Septic Drinking Water Well Loc ations State Databases Poi nt Sept ic Re gulated Point Sources National Pollut ant Discharge Elimination System (NPDES) Repor ts USEPA Discharge Monitoring Report 2008 - 2015 Point Septic, Point Sources Household Information US Census US Cens us, IPUMS 2010 Tabular Septic, Chem. NonAg Fe rtilize r Gol f Course Locations Original Google Ea rth imagery 2008 - 2015 Polygon Chem. NonAg Fert ilizer 11 2.3 Mapping Nutrient Sources S even nutrient sources are included in SENSEmap as illustrated in Figure 1. 2, which shows th e diff erent landscape processe s assoc iated with each source. SE tial resolution allows these processes to be c aptured at their origin, rather than av eraged over a ata - driven and vary a t 30 meter resolution. Q uantifi cation and spatial locatio n of each input s ource is done via one of four sets of methods: 1) summarizing reported loads at known locations for point sources, 2) interpolation of observed data for atmosphe ric deposition, 3) sp atial disaggregation of county - level estimated inputs for chemical agricul tural and non - agricultural fertilizers, and 4 ) summarizing, locating, and disaggrega - based inputs, from a variety of coarser scales for CAFO s, septic syste ms, no n - CAFO manure, and N fix ation. Fi gure 1.2 Schematic diagram of SENSEma p nutrient sources on the landscape . Sources p ictured include (clockwise from upper l eft): point sources, chemical non - a gricultural fertilizer, atmospheric depos ition (wet and dry), manure (confined and unc onfined ), sep tic tank s, N fixation from legumes, a nd chemical agricultural fertilizer. 12 For both N and P, the individual SENSEmap sourc es are quantified as mass of Total N (TN as kg) or Total P (TP as kg) across all organic and inor ganic forms. Each of the individ ual so urces consists of multiple species, w hich are summed according to source - s pecific p rocedures. For all non - point sources, inputs are reported as annual area - specific inputs (kg - N/ha/yr or kg - P/ha/yr) . Point source loads are quantified as annual mass f luxes (kg - N/yr or kg - P/yr). For nutrient ac counting, we considered all significa nt source s of nutrients added directly to the l andscape. For example, all nutrients from animal excretion could become appli ed manure (alth ough some N is lost to the atm osphere prior to application, as described below), even though some of those nutrients likely ca me from harvested N and P within the G reat Lakes Basin. By contrast, properly - maintained septic systems are regular ly pumped out, thus removing a significant qu antity of the nutrients that are then commonly lan d - applied. Here we omit land applicat ion of se ptage and assume that all human - excret ed nutrients deposited into septic tanks are applied at the septic location. In the follow sub - s ections, we describe key details about the seven nutrient sources. Much mor e detailed information, including flo w charts illustrating the complete workflow for each source are included in the Appendix Figures A1. 3 - A1. 10. 2.3.1 Atmospher ic Deposition A tmosp heric loading of nitrogen and ph osphor us occurs through both wet and dry de position and can be the primary loadi ng mechan ism for either nutrient in remote area s. The main components of phosphorus emitted to the atmosphere are dust from soils, marine a eroso ls, volcanic ash, biomass burnin g, and combustion of oil and coal. Agricult ural activity and phosphate manufactu ring and mining may contribute to higher rates of localized deposition in some areas. Fossil fuel combustion and high intens ity agriculture are the main sources of 13 nitro gen to the at mosphere. Wet deposition is measured by collecting precipitation in a samp ler and a nalyzing for nitrogen and phosphorus c ompounds. The volume of collected precipitation is then multiplied by the ana lyzed concentra tions of nitrogen and phosphor us comp ounds and corrected for sampler cross - secti onal area. Dry deposition is more cha llenging to directly measure, so rates are esti mated by modeling particle deposition velocities of measured atmospheric conc entrations of c ompou nds (USEPA, 2010). Wet d epositi on rat es are generally driven by large scal e processes and can be significantly correlate d between sites separated by large dis tances (Anderson and Downing, 2006): therefore, all sites selected were withi n 700 km of the Grea t Lakes Basin. Large phos phorus partic les can move over short distances whi le finer dust may be able to travel o ver thous ands of kilometers (Tipping et al., 20 14) and a 700 km buffer was chosen for atmospheric phosphorus analysis as wel l. Atmospheric phos phorus concentrations are much l ower r elative to other macronutrients makin g accurate detection difficult (Mahow ald, 2008 ). As a result, atmospheric monitoring is limited and nutrient accounting models rarely include it within their bud gets (David & G entry , 2000). Luscz et al. (20 15) use d four sites from a single monitoring netwo rk to interpolate P deposition. To ex pand this analysis, we identified all comparabl e P deposition data through an extensive literature review (see Appendix Text 1.1 for a deta iled description of this revie w proce ss). T hrough this review, we identified 23 papers and 1 monitoring network conta ining dat a from 98 sites reporting total wet an d dry TP deposition within a 700 km radius of the GLB from 1970 to 2011 (see complete refere nce s in Table A1. 1). We assume d that P depo sition rates have remained largely st able through time. This assumption is consiste nt with a review by Tipping et al. (20 14) who found that P deposition rates at sites with ten or more years of cont inuous data 14 sho ws th at significant systematic change s are uncommon. This is supported by our lo ngest continuous set of atmospheric p hosphorus data collection which showed no signi ficant trend in annual deposition (Eimers et al., 2009). In contrast, both em ission and subs equen t deposition rates of atm ospheri c nitr ogen during the past four decades lar gely declined on national and regiona l scales (Monks et al., 2009; Kothawala et al., 2011). Due to the limited available data points and their inherent distribu tion across the land scape, available phosphor us data was i nterpolated using kriging to develop spatially distributed loading estimat es. IDW a nd kriging interpolation methods were tested to find the most appropriate model for both nutrients. Interpolation m ethods were tes ted a nd selected based upon th e param eters which best minimized the root mean sq uare error (see Text A1 .1). Phosphoru s is most ly collected through bulk precipitatio n methods (both dry and wet fallout) and as such loading estimates were model ed one time as total phosphorus. SENSEmap us es the Nation al Atmospheric Deposition Program (NA DP) Total Deposition Science Committe e (TDEP) product for total nitrogen deposition. TDEP estimates atmospheric nitrogen deposition rates using a hybrid approach combining mode led c oncentrations with measur ed conc entrat ions (Schwede and Lear, 2014). The ap proach estimates dry concentrations u sing data from the Clean Air Status and Trends Network (CASTNET), the Ammonia Monitoring Network (AMoN) and the SouthEastern Aerosol Resear ch an d Characterization (SEARC H) netw ork co mbined with modeled concentrations fr om the Community Multiscale Air Quali ty (CMAQ) model. The dry deposition value estim ates are then combined with wet deposition values from the National Trends Ne twork (NTN) to devel op total deposition value s of ni trogen including organic nitrogen. The TDEP model is useful for regional, broad - scale dep osition mapping (Bytnerowicz et al., 2 016; Sicard et al., 2016). SENSEmap uses 15 a mean of 2008 - 2015 values from grid ded TDEP maps t o cap ture average annual total nitrog en dep osition for that timeframe ( ftp.epa.g ov/castnet/tdep ). 2.3.2 Septic Syste ms Onsite wastewater treatment systems, known as septic tanks, are widely use d in rural and subur ban areas. They can be co nsidere d poin t sources on small scales, but on a r egional scale they are best represent ed as a n on - point sources due to unknown locati ons, high densities, and small loads. While septic tanks are an urban source of nutrients, t hey a re a direct source of nut rients to gro undwater and therefore transport diff ers significantly from surface - applie d urban n utrients such as non - agricultural chem ical fertilizer. Our septic source model consists of three elements: 1) crea ting an exclusi on ma sk defined by service are a bound aries of wastewater treatment plants (WWTP) within which no septic systems are p laced, 2) overlaying an inclusion mask based on land use and appropriate set back distances, and 3) calculating septic syste m numbers and l oadin g rates from US Census da ta and litera ture sources. Determining exclusion a reas required extensive data synthesi s to mode l WWTP service areas, as these are not publicly available at the scale of the GLB. Drinking water wells, population density, and d istan ce to WWTPs were used to classif y serv ice areas. Inclusion areas were selec ted based on land use, using primaril y urban, non - WWTP service areas. Within each Ce nsus block in inclusion areas, one septic system was assumed per household un it. These septi c sys tems were first placed ad jacent to wel l locations, and then randomly within the primary inclusion mask of each C ensus blo ck if there were more reported househo ld units than known wells. Septic systems were placed at densities less than 1 per 675 squa re me acr e, a no rmal h ousehold lot size in the US) and were secondarily applied to non - urban are as near roads as needed. See Text A 1.2 for more information. 16 Septic nutrient discharge was assumed to occur at the septic sy stem location a nd va ry across Census blocks accordin g to a verage household size and vacancy. We did not consider pumping for off - sit e disposal of septage solids -- an omission that would not change total loaded nutrients within a watershed but would affect t he spatial plac ement . For specific nutrient rates, w e used t Systems Manual estimate of 4.1 kg/y ear/person nitrogen and 1 kg/year/person phosphorus (USEPA, 2002). These values were then multiplied by average household si ze at the Censu s blo ck level and reduced to account for ho usehold units that are designated sea sonal or vacant. Within each Census b lock, the final per - septic load was applied to all placed septic systems. These loading rates were then rasterized and summe d at the 30 m c ell s ize. 2.3.3 Chemical Non - agricult ural F ertilizer Non - agricultural chemical f ertilizers, which are primarily appli ed to lawns and golf courses, are major sources of nutrients in urban areas. We developed a two - part approach to spatially d isaggregate cou nty - l evel estimates of total non - agri cultur al chemical fertilizer sales from 200 8 - 2012, where 2012 was the latest ava ilable year (Brakebill & Gronberg, 2017), to the 30 meter scale. Within each county, fertilizer applications were placed on golf courses, f ollow ed by distribution of the remain ing nu trients to suburban and urban lawn ar eas. Golf courses, which generally h ave large and frequent fertilizer applications, were first identified using state - specific golf course business directories. Golf courses w ere f irst batch identified based on t wo gen eral golf business directories with g olf course addresses in each GLB stat e. When available, additional state and county specific golf course business directories were used to find courses that may not have been i denti fied in the first sweep. To crea te a c omplete golf course inventory, we fur ther located golf courses not include d in business directories 17 through systematic visual inspection of 2008 - 2015 aerial imagery across the Great Lakes Basin. T he golf courses list ed in the business directories w ere th en address - matched in ArcGIS to locat e them. For each golf course, polygon s were manually delineated for each course. We analyzed the average proportion of fairway and green to total golf course are a (which includ es ca rt paths, sand traps, roughs, an d wate r features) and estimated the fertili zable area to be 85% for this region. We then randomly applied nutrients at local (Midwest) golf course fertilizer application rates (N: 117 kg/ha/yr; P: 10.7 kg /ha/yr) estimat ed by a survey from the Environmental Insti tute for Golf to golf course raster c ells (GCSAA, 2009) until the total fe rtilized area matched the estimated area within each the course. Following golf course applications, remaining nutrients fro m the county - le vel s ales data (Brakebill & Gronberg, 2017) was applied to urban lawns. As in Ru ddy et al. (2006), we assumed that ar eas with higher population density are more likely to apply lawn fertilizer. In other words, more lawns and businesses choos e to fertilize when there are more people nearby. In aggre gate, this increases fertilizer appli cation rates per Census tract area; h ere we assumed that rates increase linearly to a maximum of 4996 kg N/km 2 within a tract at a population density of 700 pers ons/km 2 . This m ethod was also used in Luscz et al. ( 2015). Remaining county fertilizer was dist ributed among tracts and placed rando mly within urban cells between 10 and 100 m from roads, assuming that fertilizer would be primarily applied to lawns near ro ads. Rates with in ea ch selected cell were given by a verage recommended residential application rates for low - maintenance established lawns given by the University of Minnesota Extension (N: 73.2 kg/ha/yr, P: 10.7 kg/ha), and sufficient cells within each tr act were select ed to match the tract - level applicati on rat e (Rosen et al., 2015). Finally, tota l cell input amounts were corrected b ased on the cell percent perviousness (NLCD 2011, Homer et al. 2015). 18 2.3.4 Point Sources Point source data was acquired th charg e Monitoring Report (DMR) Pollut ant Lo ading Tool (US EPA, 2017), which comb ines National Pollutant Discharge Eli mination System (NPDES) permit data with DMR data to describe the nature and amount of discharge from each point source. NPD ES observes and regu lates effluent information for f acilit ies that discharge to surface waters in the United States. Depending on the facility and its permit, reporting parameters, techniques, and frequency can vary, making the data difficult to use withou t harmonization . As such, we downloaded data for eac h stat e within the Great Lakes Basin for ea ch year from 2008 to 2 015. Depending on the facility, nitrogen or phosphorus is occasionally reported as something other than simply N or P (such as NH 3 , NO 3 - N, and PO 4 ). In th ese s ituations, we converted the repo rted v alue based on mass balance equations, so all data were repr esented as either nitrogen or phosphorus on a mass basis. Most sources did not report every year, so to arrive at a representative value fo r 2008 - 2015 we calcu lated a geometric mean for each point source across the eight years. 2.3.5 Agricultural Fertiliz er Sources The majority of the nutrient demand for agricultural crops are met by applications of manure and chemical agricultural fertilize rs. Crop - specif ic fe rtilizer demand was only used fo r N, w ith values scaled by intensity and ar ea normalized using re commended rates for Michigan (Warncke et al., 2004; Warncke & Dahl, 2003 see Table A1. 3 for values). P requirements are commonly determin ed based on loc al so il tests and desired crop yield rather than simply crop type, so we did not include a P crop dema nd variable within the fertilizer demand model (Warncke et al., 2004). We assumed that if a local source of manure is available, farmers wi ll choose manur e ove r chemical agricultural fertiliz ers du e to lower costs, or the need to spre ad 19 manure associated w ith larger animal feeding operations, thus leading to high rates of nutrient input (Long et al., 2018). The process for determining which c ells would have manu re applied are described in the manure section. Since farmers often apply a dditional N fertilizer to manured fields, we applied N chemical agricultural fertilizer to pixels where manure had not met the 8). The cells o utsid e of manure spreading areas then recei ved chemical agricultural fertilizer as described below. Q uantities of manure and chemical agriculture N fertilizer applications were both based on a per - crop pixel fertilizer demand model. This wa s first paramet erize d using county - level data, then applie d at the pixel scale to disaggregate county - level fertilize r totals. The model considered four county - level averaged variables: latitude, soil texture, cropland area, and crop - specific nutrient dema nd. The county level was chosen to match the resolut ion of USGS fertilizer use data (Brakebill & Gronberg, 2017). We used gradient boosted regression trees (BRT) in scikit - learn , a machine learning technique that creates many regression trees (models that separate predic tors by binary splits) and has better predi cting power at extremes than CART mod els (Elith et al., 200 8; Pedregosa et al., 2011). The BRT models produced good fits to county data (N: R 2 = 0.93; P: R 2 = 0.87), suggesting that the model could handle a wide r ange of predictor values at the cell level. We then applied the BRT using per - pi xel values for the fou r variables to predict per - pixel fertilizer demand, which was then used as a starting estimate for manure and chemical fertilizer applicati ons. Additional info rmation on the methods used to m odel f ertilizer demand is provided in Text A 1.3, including additi onal details on our BRT application. 20 2.3.5.1 Manure We used multiple data sources to estimate manure application areas with animal - specific and nutrient - s pecif ic loading rates. This procedure consi sts of four steps: 1) determine a com plete animal inventory including counts, types, and confinement status for all farms; 2) calculate how much manure is produced at each farm, quantify N volatiliz ation during st orage , and determine the total acreag e to w hich manure should be applied; 3) pla ce the farms within ea ch county; and 4) create a buffered area around each farm upon which the manure will be spread. Key details about each of these steps are discussed below , wit h additional information in Text A 1.4. Animal inventories and confinement s tatus for all farms we re estimated using a combination of USDA Ag Census data and state - level records. These records do not allow us to precisely determine the s ize of each far m, bu t rather to designate appropriat e numb ers of farms in different size interv als within each county . Manure is then produced by each animal at specific loading rates, either provided by literature or estimated via an empirical relationshi p based on aver age a nimal size (for less common anim al typ es). Some N is lost to the atmosphere during manure storage , which is estimated for each farm and animal type separately. An additional small fixed ratio of manure is lost from CAFOs due to incomple te recovery. Fa rms a re then placed within each count y, at actual known locations for large Conf ined Animal Feeding Op erations and pseudo - randomly within agricultural areas of each county for smaller farms. Finally, a buffer is iteratively created around ea ch farm to whic h man ure is applied on both row crop and gr assland land use types. Within each b uffer, manure is appli ed at relative rates given by the BRT fertilizer demand model, adjusted to match total farm nutrient production following losses. 21 2.3.5.2 C hemical Agricul tural Fertilizer Following manure app licati on, chemical agricultural fertilizer was applied to all non - manured cells at the N and P determined rates from the BRT demand model. Manured cells that had not met N demand received supplemental N f ertilizer. Brak ebill and Gronberg (2017) calculated chemic al agricultural fertilizer use by cou nty based on observed values and a fertilizer spending model for states without records up to 2012. We used the average value from 2008 - 2012 as our observed coun ty fertilizer l oad a nd scaled non - manured fertilized pixel s within each county to match observe d county values. Addit ional information on chemical agricultural fertilizer methodology can be found Text A1. 1.5. 2.3.6 Nitrogen Fixation Symbiotic nitrogen fixa tion is an impo rtant component of nitrogen budgets i n agri cultural regions. Legumes such as bea ns, hay, and clover host microbes within their root systems that fix nitrogen that becomes incorporated both within plant biomass and surr ounding soil at rates v arying by crop yield , soil, and climatic conditions (Gools by et al., 1999; Barry et al., 1993; Meisinger & Randall, 1991). In regions with significant legume cultivation, omitting N fixation from budgets would significantly underesti mate agricultural N sou rce inputs to t he la ndscape. Because of this, we inc luded four nitrogen fixing crops that are c ommonly grown in the GLB: soybeans, dry beans, alfalfa, and non - alfalfa hay. Natural N fixation was not included, as high natural N fixing biomes (savannah, gras sland) are not prese nt across significant areas of t he GLB , while forested biomes almost entire ly recycle N (Cleveland et al., 2013). Here we calculate nitrogen fixation as a function of crop type, yield, soil conditions, and local fertilization rates fol lowing methods devel oped in other studies. In the li teratu re, both area - based and yield - based c alculations have been used to estimate N fixation rates. Area - based 22 calculations use a fixed rate per unit area of cropland (Han & Allan, 2008; Boyer et al., 200 2). Yield - based esti mates can be based on either abo vegrou nd - only or total system biomass (Meis inger & Randall, 1991). Above - ground yield - based estimates rely on percentages of grain to total plant mass, protein content in grain, and the proportion of abov e - ground accumu lated N harvested (nitrogen harvest i ndex) (David & Gentry, 2000; David et al., 1997). These authors used a constant rate of nitrogen from fixation based on data from Illinois for extents as broad as the Mississippi Ri ver Basin, irrespective of yield varia bilit y (David et al. 2010). As relati vely c ool temperatures and shorter growing seasons limit yields in much of the US - GLB, we chose to account for heterogeneity in crop N fixation using a yield - dependent approach foll owing the method of Han and Allan (200 8), w hich was developed from Meisinge r and work. This provides estimates of fixed nitrogen mass per unit crop corrected for typical crop - specific moisture rates (i.e., the ratio of harvested dry biomas s to total harvest mass ) based on a ra nge o f published field studies. Meisi nger a nd Randall (1991) also provide guidel ines to estimate the percent of total plant nitrogen from fixation based on soil nitrogen availability and other nitrogen inputs such as f ertilization. We consid er N fixation f rom s oybeans, dry beans, alfalfa, and non - a lfalfa hay, which are the primary N f ixing crops grown in the US - GLB region. Details of the N fixation model, including how we statistically estimated yield using soil type an d remotely - sensed green ness, are provi ded i n Text A1. 1.6. To account for cr op rot ation, especially the common corn - soy rotation, we adjusted per - cell values based on how frequently the cell was planted in soy between 2008 and 2015 using the CDL. As a satel lite product, the CDL c ontains inaccur acies and errors in area due to mixed pixel s (cells with one class that may have multiple classes within them). To correct for area inaccuracy, we used ancillary data from the USDA Ag Census 2012 to adjust total 23 cultiv ated areas, as recommen ded by Lark et al. ( 2017). Using this yield - based me thod, our N fixation values varied based on yield, soil, N applications, and percent organic matter. As a result of crop - rotation based averaging across the 2008 - 2015 model period, final values for averag e annual N fixa tion are lower than fixation during a singl e fixing crop year. The resulting mod el considers more heterogeneous landscape features and integrates multiple remote sensing and spatially explicit datasets. 2.4 Analytical Methods After producing the 30 m raste rs fo r each source, we analyzed these raste rs individually and in their spatiall y aggregated forms to identify patterns and compare to other data. Three scales of aggregation were used: Great Lakes catchment (US - side o nly), Hydrologic Unit C ode 8 - digit (HU C8), and HUC 12 - digit (HUC12). Within those catchments, a variety of aggregate q uantities were calculated, including totals of each source, totals across N and P, and ratios of N:P inputs within aggregation features. F inally, we used k - means clustering to ident ify watersheds with similar nutr ient i nput profiles, which we call Nutrient Input Landscapes (NILs). SENSEmap results were summarized at the HUC12 watershed level, which average 79 km 2 in the US - GLB. By aggregatin g to a fine watershed s cale, we can un derst and local patterns in nutrient i nputs at a scale relevant to management. Ad ditionally, aggregating removes any potential privacy issues that may exist with 30 m outputs (i.e., farms). Furthermore, HUC12 is the spa tial framework within t he NOAA Great L akes Basin Tipping Point Planner, a m ulti - m odel decision support system to aid w atershed planners (tippingpointplanner.org). The Tipping Point Planner is being used to help Great Lakes communities understand the health of their watersheds. 24 2.4.1 k - means C luste ring and Nutrient Input Landscap es We grouped the resultant 3627 HUC12 wate rsheds based on their nutrient input composition using k - means clustering to identify similar watersheds across the US - GLB. K - means cluste ring is a commonly used clustering alg orith m that minimizes the sum of squa red er ror among a specified number of group s (Jain, 2010). We explored the effect of different ate of input by source (kg/ha/yr), tot al in put from source (kg), and percen t of t otal nutrient input by source). The g oal was to find physically meaningful patterns in the landscape, rather than fit a specific predictive model. Our final model included 9 g roups (k=9) generated f rom 13 variable s (% of total nutrient input by sourc e for all N and P) described watersheds in the most physically meaningful way and produced the highest mean silhouette score (measure of similarity of k - groups) (mean silhouette sco re = 0.39; silhouette p lot and further clus ter information found in Text A1 .2 ). W e labelled nine Nutrient Input Landsc apes post - hoc based on the composition of sources, land use, and input rates within the groups (See section 3.3). 2.4.2 Model Comparisons We compared SENSEmap nu trient inputs t o thr ee other products: 1) The USGS S PAtial ly Referenced Regressions on Watershe d attributes (SPARROW) model (Smith et al., 1997), 2) Net Anthropogenic Nitrogen Index (NANI) inputs from Swaney et al., (2018), and 3) 5 km annual fertilizer ma ps from Cao et al. ( 2018). SPARROW provides both a n utrien t inputs database, in some ways compa rable to the SENSEmap product here, along with regressions of nutrient uptake within watersheds. We compared chemical agricultural fertili zer, chemical non - agric ultural fertili zer, and total manure to SPARROW inpu ts fro SPARROW model including the Great Lakes Basin using data processed by Wieczorek and 25 Lamotte (2011). We summarized these sources from SENSEmap at the HUC8 level to be co nsistent with S PARRO W output summaries. NANI from Sw aney e t al. (2018) quantifies key nitrogen inputs and exports at county level across the continental US. We summarized SENSEmap at the county level and normalized to county area to compare to chemical agr icultural ferti lizer , total manure, and N fixation. Cao et al. (2018) produced 5 km resolution maps of nitrogen fertilizer applications from 1850 to 2015 that depict crop - specific rates, application timing, and ammonium - N and nitrate - N proportions. SENSEma p was aggregate d to 5 km cells using mean input inte nsity and compared to the average value of Cao et al. (2018) between 2008 - Section 3.4 and additional de tails in Text A1.4 . 2.4 .3 Sensitivity of C h emical Agricultural Fertilizer We per formed a sensitivity analysis of SENS Emap chemical agricultural fertilizer inputs to better understand the role of this important nutrient source in controlling broader lands cape patterns. For this analysis, count y le v el totals from Brakebill & Gronb erg (2 017) were adjusted to create fourteen new simulations addressing three different types of uncertainty: systematic bias in 1) positive or 2) negative direction, and 3) random uncertainty. Following t he methodology desc r ibed above to produce the chemic al agr icultural fertilizer values, these co unty - level values were used to adjust pixel level predictions. For each of the fourteen simulations, N and P were aggregated at the HUC12 watershed level with va riation between run s quantified via coefficient of v ariati on. Watersheds in each simulation wer e then assigned to Nutrient Input Landscapes using the k - means clusters defined from the primary run. Changes between Nutrient Input Land scape classifications fo r each of the s imul a tions and the primary run 26 were t hen ta bulated. Further description of sensi tivity analysis methods can be found in A1. 5 and discussion of the sensitivity results can be found in Section 3.5. 27 3 Results and Discuss ion 3.1 SENSEmap - US - GL B The primary S ENSEm ap - US - GLB product is composed of s even N sources and six P sources, with ea ch N source input map shown in Figure 1. 3 (P sources are visually similar in distribution, see Fig ure A1. 13). Generated at 30 m resolution, t he full variability within each map is d iff icult to visualize in printed f orm. Within Figure 1. 3, we provide an ins et for each source showing an enlarged area surrounding Toledo in western Ohio to better illustrate the high - resolution 30 m outputs. The full raster datasets, a long with the H UC12 and HUC8 summaries, are available for download from Hydroshare ( https://doi.org/10.4211/hs.1a116e5460e24177999c7bd6f8292421 ). 28 Figure 1. 3 SENSEmap nitrogen s ources for the U.S. Gre at Lakes Basin . The color bar br eak s were selected to highlight variabil ity within each nutrient source at this map scale. Refer to Table A 1. 5 for pixel - level distributions. At the top right of each subfigure is a zoom box, with loca tion identified in ( A). (A) Chemical agricultural (Ag) fer tilizer, (B) Manure, (C) N fixation, (D) Atmospheric deposition, (E) Chemical nonagricultural (NonAg) fertilizer, (F) Septic tanks, (G) Point sources, and (H) Proportion of sou rc es to total inputs o f each nutrient within t he U.S. Great Lakes Basin (US - G LB). 29 Agricultural s ources (chemical agricultural fertilizer, manure, and N fixation) dominate nutrient inputs at the scale of the US - GLB. Chemical agricultural fertilizer makes up 3 4% of N and 45% of P inputs, while manure co ntributes 27% of N and 42% of P (See Figure 1. 3H). Across the GLB, chemical agricultural fertilizer use is most intense in western Ohio and eastern ur e is also common in the southern lo wer penin sula of Michi gan, Wisconsin, an d New York, nutrien t input intensities are lower in these areas. Nitrogen fixation shows similar patterns as chemical agricultural fertilizer due to the common practice of corn - so y rotations. Nitrogen fixation makes up 11% of N inputs in the US - GLB, making it an important so urce to estimate in nutrient budgets. The landscape patterns of manure appear differently than chemical agricultural fertilizer and N fixation due to model assu mp tions that manure i s spread close t o the far ms and CAFOs where it is genera ted. This creates c ircular pockets of high intensity manure loads as is apparent in the inset manure map. Although there are farms and CAFOs in Ohio, Indiana, and Michigan, more l iv estock operations a re present in Wi sconsin a nd New York c ausing higher inte nsity manure inputs . Urban sources are generally smaller contributors to total inputs but can be major sources in localized areas. Chemical non - agricultural fertilizer and septic ta nks represent simil ar proportions o f nutrien t input in th e US - GLB (chemical non - ag. fertilizer N: 2%, P: 2%; septic N: 2%, P: 2%) but have very different landscape patterns. Cities serviced by WWTPs do not have septic system inputs, but nearby suburbs an d rural areas have mo re densely popul ated area s, leading to higher septic inp ut intensities. Alt hough septic systems do not amount to a high proportion of total nutrient input within the US - GLB, they load distinctly differently than surface inputs, supplyi ng nutrients directly into groundwate r pathway s. Furthermor e, septic systems are often present i n areas lacking other sources (except for atmospheric deposition), and they tend to 30 cluster near water bodies where their nutrient loads may contribute to lake eu trophication (Beal et al., 2005; Ho lman et a l., 2008). Ch emical non - agricul tural fertilizer sh ows an opposite landscape pattern to septic: proximal suburbs and urban cores have high intensity input due to extensive lawn and golf course fertilizer applica ti on. Similarly to se ptic systems, ch emical no n - agricultura l fertilizer can b e an intense local nutrient input, even if its relative proportion is low at the US - GLB scale. Point sources are highest in urban areas with large or multiple WWTPs, with reportin g requirements varyin g by state. As i s the cas e with septic systems, point so urces generally loa d directly to streams. Thus, fewer mechanisms for nutrient removal exist, and transport times are shorter -- magnifying the impact of this relatively small propor ti on of total inputs on in - stream loa ds and de liveries to t he Great Lakes. At mospheric depositio n follows a gradient based on population and agricultural LULC. The highest rates of atmospheric deposition are found in heavily farmed and more densely populat ed areas in the south ern basin, while lower va lues are foun d in the more spar sely populated and forested northern basin. However, in these northern areas, atmospheric deposition may be the dominant nutrient input source. For N, atmospheric deposition is a si gnificant source ac ross the US - GLB, accounti ng for 22% of total inputs, whi le just 5% of appli ed P comes from atmospheric deposition. SENSEmap - US - GLB - point sources (atmospheric deposition, septic systems, chemical non - agricultural fertilizer, chemic a l agricultural fert ilizer, manure, nitrogen f ixation from legumes) were comb ined into maps of total non - point N and P (Figure 1. 4). Figure 1. 4 is classified in quantiles to show the distribution of input across the US - GLB, with each color representing 20% of cells within the domain. N and P show simi lar trends i n the highest loadi ng quantiles (red and orange) across the intensely farmed southern US - GLB. However, N and P 31 show differences in the northern basin. While broadly similar, patterns in low loading q uantiles in Michiga ula vary f rom N to P; in N, there is a di stinct input incre ase from west to east, whereas P shows a less distinct pattern and more variability. This is driven by the greater relative importance of atmospheric deposition t o landscape N input s than for P. Al so, cells with higher quantiles of N inpu ts are concentrate peninsula than for P. Figure 1.4 Total nonpoint N and P. (A) Total nonpoint nitrogen and (B) Total nonpoint p hosphorus. The colo r breaks are cla ssified to assign ~20% of pixels to each nutrient input rat e bin. Point sources are excluded because they are applied as a mass flux, rather than an area - specific mass flux of the nonpoint sources. We analyzed nutrient so urce compositions (Figure 1. 5 A - B) a nd input i ntensity (F igure 1. 5C) at indiv idual lake basin scales to better understand the heterogeneity across the US - GLB. Such metrics can be useful to understand the relative possible nutrient sources to a lake and the highest possible l oading. Recall, t hese sourc es and inpu t rates refer to the landscape inputs and Huron, Michigan, and Ontario receive ~70 - 75% N and ~80 - 90% P input from agricu ltural sources (c hemical ag fertilizer , manure, N fixation ). Non - agricultur al human sources (point, septic, chemical non - ag fertilizer) make up less than 10% of lake basin inputs except P inputs to Lake Superior, which receives 11% loadin g from septic and an additional 10% of loadin g from poin t and chemical non - a g fertilizer comb ined. Lake Superior receives a more balanced P input 32 composition than any other lake, but has an atmospheric d eposition - dominated N load. However, as seen in Figur e 1. 5C, Lake Super are very small compa red to other more ag ricultural and po pulated lake basins. Despite similar source compositions in all lake basins (excluding Superior), Lake Erie ha s a catchment loading rate ~1.5x higher than Lakes Hu ron, Michigan, and Ontario. This hi gh input r ate and int ense agricultural co ntribution suppor ts the passage of the Clean Water Act, point source loads have drop ped sufficiently t hat they no longe r dominate nutrient i nputs. In contrast t o the success of reducing point sources, non - point source inputs remain high as they are not regulated in a similar manner. An additional concern with high input rates of non - point sources is the pr esence of land us e legacies that can r esult in decades of delay before impr ovements to surface water concentrations are observed (Meals et al., 2010, Martin et al., 2011, 2017; Verhougs traete, et al. 2015; Ray et al. 2012; Van Meter & Bas u, 2015). 33 Figure 1.5 Summaries of sources a nd applicat ion intensity at the lake basin scale . (A) Proportion of N input from each source by lake basin, (B) Proportion of P input from each source by lake basin, and (C) Input intensities by lake basin (area normalized by the land area of the U.S. lake basin. 3.2 N:P Ratios Nitrogen to phosphorus (N :P) ratios can be used to help understand nutrient inputs and their potential ecological effects. The Redfiel d Ratio is a ratio of N:P typical of aquatic life and is commonly near 1 6:1 in natural wa ters. Hig h N:P ratios within the water co lumn are associat ed with oligotrophic, non - anthropogenic impacted lakes; whereas, low N:P ratios are associated with mesotroph ic to eutrophic lakes with higher anthropogenic inputs (Downing & McCaul ey, 1992). For co mparison, we calculat ed N:P ratios of nut rient inputs at t he HUC12 watershed scale to provide a view of N:P inputs in the US - GLB watersheds (Figure 1. 6A). When viewed this way, the northern basin is highly N - enriched (and thus P - poor), wit h a 51:1 ratio of nutrient inputs to L sin (Figure 1. 6B) . This is consistent with studies that have 34 described Lake Superior as an extreme, low P environment (Sterner source is largely from atmo spheric deposition (Figure 1. 5A) an d shows a much higher N:P landscape input ratio than the o ther Great Lakes (Figure 1. 5C). Generally, and perhaps surprisingly, more southern watersheds have input ratios within range of the Redfield Ratio (pale yellow), w ith urban and low agricultural inte nsity are as being mor e N - enriched than in tensive agricultu ral areas. Few watersheds had lower ratios than 12:1, and those that did were frequently high manure wate rsheds. These values provide a first look at nutrients rel ative to one anoth er the lake basin scale; h owever, due to varied transport rates of differen t sources, these values are not representative of loads to the lake. Figure 1.6 SE NSEmap N:P Ratios. (A) Map of N:P input ratio by HUC12 watershed and (B) Bar chart of N: P input r atio by U S portion of each lake bas in. 35 3.3 Nutrient Input Landscape s SENSEmap - US - G LB at its finest scale shows 30 m cell values of nutrient inputs; however nutrient inputs are a result of wi der landscape processes that are managed at watershed sca les. By zooming ou t to the HUC12 wa tershed sc ale, nutrient input patterns bec ome more readi ly visible. Following k - means clustering (section 2.4.1), nine Nutrient Input Landscapes g accor ding to the nutrient and lan d use char acteristics of each cluster. Tho se 9 landscape s of the US - GLB can then be grouped into three broad categories: intensive agricultural, urban, and rural la ndscapes (Figure 1. 7). Names for each of the landscapes a re prov ided in the Figure 1. 7 legen d and land scape definitions visualized thr ough watershed source distributions are shown in Figure 1. 8. For summary statistics of each landscape definition, see Tabl e A1. 1.4 . 36 Figure 1.7 Nutrient Input Landscapes. Nutri ent Inp ut Landscap es determined from k - means cl usters and labeled post hoc acco rding to clu ster characteristics. Letters indicate major cities, A: Detroit, MI; B: Cleveland, OH; C: Buffalo, NY. Landsca pes, as described in the legend, include three intensiv e agric ultural gro ups (manure dominat ed, chem ic al agricultural fertilizer domin ated, and mi xed), three urban groups (urban core, suburban, and suburban edge), and three rural groups (towns, low intensi ty agriculture, and remote). Landscapes are colored to be iden tifiable by group, with intens ive agri cu ltural landscapes in warm colors , urban land scapes in greyscales, and rural landscapes in blue. The intensive agricultural group contains three separate l andscapes, clustered based on proportions of chemical a gricul tural fertil izer versus manure: Manure Do minated, Chemical Fertilizer Dom inated, and Mixed (Fig ure 1. 8 A - C). These landscapes are found primarily in the south and central US - GLB (Fig ure 1. 7). Large areas in Wisconsin and New York as well as multiple smalle r areas in M ichigan are dominat ed by m anu re, whereas northern Indiana and Ohio and m uch of the central Lower Peninsula of Michigan are dominated by chemical agricultural fertilizer. Agricultur al HUC12 watersheds at the periphery of the Manure and Che mical Fertilizer D ominated landscapes tend t o f all into the Mixed class. 37 Figu re 1.8 Dist ribution of N and P source percent within each Nutrient Input Landscape. (A) Intensive Ag: Chemical Fertiliz er Dominated, (B) Intensive Ag: Manure Dominated, (C) Inte nsive Ag: Mixed, ( D) Urban Core, (E) Suburba n, (F) Suburban Edge, (G) Rural: To wns, (H) Ru ral: Low Intensity Agriculture, (I) Remote. Legend: NF: N fixation, CA: Chemical agricultural fertilizer, MN : Manure, AD: Atmospheric deposition, NA: Chemical nonagri cultur al fertilize r, SP: Septic, PT: Point. Hat ched boxes: Phosphorus, unhatche d boxes: Ni trogen. 38 More densely populated areas of cities and suburbs classify into Urban Core, Suburban, and Suburban Edge landscapes. Urban Core watersheds have large contrib utio ns from poin t sources (Fig ure 1. 8 D), m eanin g in general that they are locat ed in urb an watersheds with major WWTPs. Suburban watersheds are characterized by high chemical non - agricultural fertil izer due to lawns and golf courses (Fig ure 1. 8E). Suburban edg e watersheds have both rural and subur ban p opulations, leading to a mixture of agric ultural sources, chemical non - agricultural fertilizer, and septic tanks (Fig ure 1.8 F). The radiating spatial p attern of an Urban Core downtown transitioning to Suburban , Su burban Edge and beyond can be see n in both Detroit in southeast Michigan an d Clevela nd in northeast Ohio, as well as around Buffalo in New York (municipalities labeled on Fig ure 1. 7 caption). R ural landscapes are most abundant in the northern tier of the US - GLB and i n less populated port ions of Ne w York state (Fig ure 1. 7). These landscap es are classed into: Towns, Low Intensity Agricultural, and Remote. Nutrient inputs in remote regions come alm ost entirely from atmospheric deposition, whereas rural to wns and rural lo w intensity agricultu re ha ve ap preciable fractions of anthropog enic sour ces including septic/non - ag fertilizer and chemical fertilizer/manure sources respectively (Figure 1. 8G - I). N utrient inputs and loading are often linked to LULC becaus e it is a proxy for management practi ces a nd po pulation dynamics, however LULC is insuff icient to characterize the source composition of nutrients. Since SENSEmap is source - specific, it allows a mor e detailed interpretation of land uses that affect nutrien t so urces than L ULC alone. In Intensi ve Ag ricul tural landscapes, distributions of nutrie nt source (Figure 1. 8A - C) vary significantly by source despite similar LULC (Figure 1. 9A - C) (See Fig ure A1. 12 for LULC for all landscapes). Similar to the intensive ag land scapes, Urba n Core and Suburban l andsc apes have similar LULC 39 signatures but receive nutrients from very different sources, point sources and chemical non - agricultural fertilizer, respectively (F ig ure 1. 8D - F, Fig ure A1. 12 D - F). Without source specificit y, u nique charac teristics of nutrient sour ce an d composition cannot be determin ed. Our N utrient Input Landscapes thus illuminate additional patterns in the landscape not visible with LULC alone. Figure 1. 9 Distributions of LULC within three intensive agr icul tural Nutrie nt Input Landscapes . Distr ibuti ons of LULC are shown as violin plots, wh ere filled area represents probability density to provide additional information on the underlying distributi on. Note that while source distributions for these three cl asse s (Figures 1 .8 A 1.8 C ) differed great ly, t heir LULC distributions are stri kingly si milar. (A) Intensive Agriculture: Chemical Fertilizer Dominated, (B) Intensive Agriculture: Manure Dominated, (C) Intensive Agriculture: Mixed. The additional complexit y o f the landsc ape becomes clear when LUL C maps are juxtaposed with different v iews off ered by SENSEmap and Nutrient Input Landscapes (Figure 1. 10). For instance, variations in total non - point P (F igure 1. 9B) within land classified simply as agriculture ha ve P inputs fro m 5 to 10 kg/ha/yr dep endi ng on location and crop practices. Bey ond just a single nutrient map, N:P ratios at the HUC12 level (Figure 1. 10C) help illustrate variable N:P input relati onships over a small area. While similarly classified, agri cul tural waters heds with low ratios a re l ikely hotspots for P loading to the we stern La ke Erie Basin, exacerbating algal bloom issues there. These watersheds can then become key focus areas for 40 man agement action to reduce P deliveries to the Lake. The four in tensive ag w atersheds in Figure 1. 1 0D are al l P - enriched relative to their l argely c hemically - fertilized neighbors. While some of this may be due to assumptions within SENSEmap (i.e. fields rece ive either manure or chemical fertilizer, not both), manure is highly P en riched due to signific ant losses in N between animal emissions a nd field inputs. Thus, these maps serve to highlight the benefits that may be achieved through policies demanding robu st design and enforcement of nutrient management plans for liv estock opera tions. Beyond simple i nput diffe rences, the pathways and rates o f nutrie nt transport and uptake differ across sources. By being space - and source - specific, SENSEmap provides the det ailed view of the landscape needed by local planners to bet ter manage thei r watersheds. 41 Figure 1.1 0 Comp arison Panel of LULC, SENSEmap T P, N:P r atios, and NILs . Four views of a portion of the US - GLB surrounding Toledo, OH (red box on locator map). (A) LULC; see Figure 1. 3 for caption and legend, (B) Total nonpoi nt P, (C) N:P r atio at HUC12 watershe d le vel, a nd (D) Nutrient Input Landscapes . 42 3.4 Comparisons to Oth er Products existing nutrient products. No other products were found to comp are at the same resolution and for all nu trient s ources; however, we selected USG S SPAR ROW (Robertson & Saad, 2011), the chemical agricultural fertilizer product. Her e, we summarize the results of these co mp arisons. Additional information can be f ound i n Text A1. 4.1, A1. 4.2, and A1. 4.3. We compared the landscape inputs within a subset of the SENSEmap sources to those from the USGS SPARROW model (Robertson & Saad, 2011; Wi eczorek & La motte, 2011). Note that th ese prod ucts are not equivalent: SENSEma p calc ulates discrete nu trient inputs in kg/ha/yr, whereas USGS SPARROW uses both LULC area and nutrient masses to statistically compute source - specific fluxes. In particular, SP ARROW design ates urban inputs by LUL C, rather than by urban sources (i.e., sep tic an d non - agricultural chemical fertilizer). Differences between mean HUC8 N and P landscape input rates were small, with SENSEmap - US - GLB showing slightly higher rates for N an d slightly l ower for P (TN: 2.1 +/ - 8. 5 kg/ha/y r, TP: - 0.1 +/ - 1.3 kg/ha/yr; po sitiv e values indicate S ENSEmap was higher). These differences could be partly due to different data sources and timeframes. For source specific comparisons and data differen ces , see Text A 1. 4.1. The Net Anthropo ge nic Nitro gen Index focuses on mass fluxes of k ey nitrogen sources to explain nitrogen dynamics and in - stream concentrations. Swaney et al. (2018) calculated N fixation, total fertilizer, agricultural fertilizer, man ure , and atmosp heric deposition at the co unty leve l across the United States. Unli ke SE NSEmap, NANI includ es nitrogen exports via food and feed, whereas SENSEmap describes only nutrient inputs. We compared 2012 N input values for 43 manure, chemical agricultu ral fertilizer, and N fixation at the c ou nty - level using rates per total county ar ea. W e found that N inte nsity of manure and chemical agricultural fertilizer were similar, with SENSEmap showing consistently higher manure and lower chemical agricultural fe rti lizer values , while N fixation had m or e variati on (mean +/ - standard deviation, posi tive values denote SENSEmap had higher values: Manure N: 3.8 +/ - 6.2 kg/ha/yr, Chemical Agricultural Fertilizer N: - 2.7 +/ - 5.6 kg/ha/yr, N Fixation: 7.1 +/ - 15.9 kg/ha/ yr) . Additional information is provided i n Text A1 . 4.2. ly de veloped 5 km resolu tion maps of nitrogen fertilizer inputs, which are directly comparable (though at a coarser resolution) to the chemical agricultural fertilizer N sour ce layer from S ENSEmap. Following aggre ga tion of S ENSEmap to 5 km resolution, we f ound that in general the fertilizer estimates had good agreement (1.5 kg/ha +/ - 9 kg/ha). At low nutrient level (< 20 kg/ha), there was no net bias between the two products ( Fig ure A1. 17), while SENSEmap had sligh tl y higher estimates in higher nutrient pix els. Differences were mo re pronounced in intensive agricultural areas; in high manure areas SENSEmap tended to have lower chemical agricultural fertilizer values, while the r eve rse was true in non - manured areas (F ig ure A. 16) . As a result of this source - spe cific bias between produ cts, differences were not spatially uniform (Figure A1. 16). For additional methodology and figures, see A1. 4.3. 3.5 Sensitivity to Chemical Agricultur al Fertilizer S ENSEmap is driven by ext er nal produ cts to quantify chemical agricul tural and chemical non - a gricultural fertilizer. Brakebill & Gronberg (2017) have recently updated work by Ruddy et al. (2006) and Gronberg & Spahrg (2012) to model county lev el N and P fert ilizer kg per year. Howe ve r, the Br akebill & Gronberg (2017) produc t is modelled based on f ertilizer expenditures 44 and there are few other fertilizer products to compare to. This reliance on a single product methodology introduces uncertainty in to nutrient budgets based on these v al ues, such as SENSEmap. Given the dominanc e of the chemical agricu ltural fertilizer source across much of the GLB, a sensitivity analysis of SENSEmap to uncertainty within the county - level Brakebill & Gronberg (2017) pr oduct is war ranted. TN and TP were la rgely una ffected at the HUC12 scale (coef ficie nt of variation (CV , or standard deviation divided by mean): TN: 2.3%, TP: 5.1%), while as expected total chemical agricultural fertilizer (kg) was more affected (CV: ch em. ag fertiliz er kg N: 11.4%, P: 17.0% ). Landscap e level patterns, as described b y Nut rient Input Landsca pes, remained largely the same. 85% of HUC12 watersheds were always assigned the same landscape, regardless of change in chemical agricultural fertili zer . Among the 15% of watersheds with a t least one simulation producing a differen t lan dscape, changes wer e primarily between Intensive Ag: Chemical Agricultural Fertilizer Dominated and Intensive Ag: Mixed. Suburban Edge, the least cohesive cluster, also saw significant numbers of changes. Ove ra ll, broad patterns in the landscape were not s ensitive to specifi c values of total chemical agricultural fertilizer at the county level. Additional information can be found in Text A1.5 . 45 4 Conclusions SENSEmap pro vid es a spatial ly - and source - explicit d escription of landscape nutrient inputs i n a f ormat that should be useful to both managers and scientists. By quantifying seven sources of nutrients, we found agricultural sources (Chemical Ag Fertilizer, Manure, N Fix ation) domin ate the US - GLB, accounti ng for 71% of N and 88% of P. We can use t he in sights from SENSEmap - US - GLB to provide managers with detailed estimates of nutrient inputs. As part of the Great Lakes Basin Tipping Point Planner, we are cultivating co nne ctions with watershed managers. To su mmarize nutr ient source budgets and find p atte rns in inputs, we used k - means clustering to describe nine Nutrient Input Landscapes. Notably, land use composition did not vary significantly between three intensive ag ric ultural land scapes, suggesting that la nd use alone does not capture the heteroge neit y and detail in nutrie nt sources. Individual nutrient sources (specifically manure and chemical agricultural fertilizer) behave differently in terms of speciation and tr ans port, making it important to recogni ze that a tota l input from agriculture can h ave significantly differen ces based on nutrient source. The concept of our Nutrient Input Landscapes can serve a number of other purposes, such as analyzing different manage men t strategies across similarly compos ed watersheds to better understand their tra nsfe rability and developin g metamodels for nutrient modeling. Quantifying uncertainty in a product with no ground truthing is difficult, but uncertainty arises from errors i n c lassificatio n in remote sensing prod uc ts (NLCD, CD L), pseudo - random placement (s epti c, non - CAFO manure far ms), literature compilation of P deposition, and statistical models used to approximate spatial heterogeneity (chemical agricultural fertilizer, N fix ation). Beca use of the assumptions a nd pseudo - rand om placement within many algor ithm s, 46 we do not claim tha t SENSEmap inputs are precise at the 30 m scale; rather, they are more accurate at aggregated watershed scales that are better representative of sp ati al uncertain ties. We also recognize th at this appr oach does not account for ever y nu trient source; future work will add other nutrient sources that are regionally - minor but locally - important, such as land applied septage, land - applied secondary treated was tewater, com mercial septic systems, ur ban animal w astes, and aquaculture. Opport unit ies for future work wi th the SENSEmap framework are broad. We plan to investigate different spatial extents and use both statistical and process - based modeling to explor e s urface and s ubsurface nutrient trans po rt. We are c urrently applying the SENSEmap fra mework to the Canadian GLB, and the methods developed here can also be expanded to any part of the continental United States. Currently, SENSEmap - US - GLB is being used as a driving vari able to determine causes o f coastal wet land invasion within the regio n. Additionally, we are pr ocessing SENSEmap - US - GLB source inputs with a statistical and geospatial nutrient transport model, similar to the methods used in Luscz et al. (20 17) . The SENSEm ap - US - GLB products can a ls o be used as in puts to a range of crop, hydro l ogy, and biogeochemical m odels. Acknowledgments This chapter was coauthored by Anthony Kendall, Sherry Martin, Henry Whitenack, Jacob Roush, Bailey Hannah, and D a vid Hyndman . W e thank Emily Luscz for developing the initial framework for the conceptual model and Jillian Deines for assistance proc ess ing remote s ensing inputs. The autho rs have no compet ing interests to declare. SENSE map - US - GLB is available fo r download on Hydroshare at 30 m, HUC12 and HUC 8 scale ( https://d oi.org/10.4211/hs.1a116e5460e2 4177999c7bd6f8292421 ). Fu nding for this work was provided by USDA - NIFA grant 2015 - 68007 - 2313, NOAA grant NA12OAR4320071, NASA 47 grants 80NSSC17K0262 and NNX11AC72G. Querc us Hamlin was p artially supported by th e NSF Graduate Res earch Fellowship Program. Any opinions, findings, and co nclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of USDA NIFA, NASA, N SF, or NOAA. 48 APPENDIX 49 T ext A 1.1 Source M ethodology The following secti ons contain additional methodology for nutrient source modeling processes. Text A1.1.1 Atmospheric Deposition of Phosphorus Compared to nitrogen, atmospheric phospho rus deposition is more difficult to me asur e and lacks the ex tent of monitoring netwo rks th at are available for nitrogen. To overcome this limitation, we conducted an extensive literature review to gather reporting data on annual atmospheric phosphorus depo sit ion rates. We f irst conducted an ex haus tive search, then limited the papers based on th e following criteria: sites could be a maximum of 700 kilometers distance from the Great Lakes basin, a year or more of data must have been collected, and the study m ust have used acce ptable collection me thod s, as discussed be low. Da ta collection met hods v aried, and studies were excluded for reasons such as basing annual load estimates on 5 - months of collection, collections based off of ships, and using volunteer colle cte d samples. Eac h site selected for our modeling had to be review ed and selected i ndivid ually, a process which often involved nutrient species and unit conversions, averaging of loading rates across multiple years and removal of outliers on a case by cas e b asis. To standa rdize the model all unit s were converted t o kg/ha /yr. Variations o f unit checked. A number of papers also r epo rted values in phosphate (PO4) and were converted to phos phorus (P) for our model . Site locations often lacked specific latitude and longitude coordinates. In these cases, site locations were estimated based on georeferencing of map images from the pape rs. The map scale at which images were geo referenced varied. A num ber of papers had multi ple collection sites but did not report individual collector deposition rates, instead opting to average site deposition values across study. These sites would have t he same values as other sites from the sam e paper. With indi vidual sites rates unava ilable , 50 to be points 5 miles away or less from each other and may have bee n f rom the same or different studies. In t otal there were 7 cluster s that were each were e xamined very carefully; means were established for each cluster, and individual sites within the clusters were closely examined to see if their values coincided with nea rby sites in th e same cluster. Cons ider ation was made whe ther to average nearby s ites t ogether to achieve "area averages", however this was not done. In total, 98 sites were input to the ArcGIS kriging toolset to develop spatially distributed loading es tim ates for both a tmospheric nitrogen and phosphorus. Other modelin g techniques (IDW ) were Analyst was used to compare and adjust models and variables until we were confiden t i n the accuracy of our model. Model opti mization was based on min imizing the root mean s quare error. Wet and dry nitrogen were modeled separately using ordinary kriging, with no transform, no trend removal, and parameters non - optimized settings. Their ou tpu ts were summed together to develop a to tal nitrogen load estimat e. One model for total phosphorus (both dry and wet deposition) was generated using ordinary kriging; log - transform; no trend removal; hole - effect model; parameters optimized settings. Opti miz ed versus non - o ptimized in this con text refers to whether the Op timize button in the Ge ostatistical Analyst was used. We found that we were able to best minimize the root mean error using non - optimized settings for nitrogen and optimized settings for ph osp horus. 51 Figure A 1 .1 A tmospheric d epos ition P site map b y regio n. 52 Table A1.1 Sum mary o f phosphorus atmospheric deposition sites. Regions Range Mean Median Number Number States Included of Sites of Papers Upper Midwest 0.07 - 0.51 0.24 0.23 32 9 a M I, MN, ND, ON, WI Lower Midw est 0. 25 - 0.56 0. 43 0.41 12 3 b IL, IA Southeast On tario 0.09 - 0.56 0.33 0.35 36 8 c ON Northeast 0.04 - 0.29 0.11 0.09 16 3 d CT, NJ, NY, PA, RI Southern 0.18 - 0.49 0.33 0.33 2 2 e NC, TN a (Bradley, 2011), (Delum yea an d P etel, 197 8), (E isenriech et al., 19 77), (Linsey et al., 1987), (Munger, 1982), (Robertson et al., 2009), (Schindler et al., 1971), (Schindler et al., 1973), (Schindler et al., 1976), (Wright, 1976) b (Anderson and Downing, 2006), (Eisenriech et al., 197 7), (Murp hy, 19 74) c (Bradley, 201 1), (Brown et al., 2011), (Foster, 1974), (Lin sey et al., 1987), (Schindler and Nighswander, 1970), (Winter et al., 2002), (Winter et al., 2007) d (Koelliker et al., 2003), (Pearson and Fisher, 1983), (Yang et al ., 1996) e (Brins on et al., 1980), (S wank and Henderson, 1976) 53 Text A1.1.2 Se ptic - Exclusion and Inclusion Masks Our exclusion mask, given by the service areas of WWTPs throughout the study area, was determined based on drinking water well da ta, US Ce nsus data , and WWTP locations as re port ed in the Clean Watershed Needs Act databa se (US Census, 2010; CWNS, 2012). Bec ause no comprehensive statewide data were available for WWTP service areas, we developed an algorithm to infer their locations fr om the se three dat asets. The automated waste wate r treatment plant (WWTP) service area deli neation model was developed as an imp rovement to the Luscz et al. (2015) septic place ment model, which required hand delineation of WWTP service boundaries . The large r spat ial extent n ecessi tated a more efficie nt , systematic, and rep roducible method to est imate boundaries. To facilitate WWT P service area classification, we compiled a point dataset of drinking water wells in all eight Great Lakes states. The algorithm uses the se along with C ensus data to comput e tw o quantities at the Census block level: th e ratio of corrected drinking water w ell counts to household unit and population density. Individual states collect and maintain well records; however, the quality an d comp let eness of digiti zed records varies. Afte r compiling and removing non - domestic clas sified wells, we compared ratios of w ells to household units in blocks unlikely to be in service areas (not in a Census designated place, greater than 5 household uni ts, gr eat er than 1 well) to come up with an appr assump tion was that a fixed proportion of w ells would be missed in each county because of incomplete digitization efforts at the county level, and by using only blocks high ly lik ely to be ou tside of WWTP service area s, w e could find this proportion of missing we lls. For example, if a rural Census b lock had 10 household units and 8 wells, we could assume an undercount of 2, or 20%. We would then divide the 8 wells by 80% to a dd 2 w ell s and rai se the well number to 10, matc hing household units in a block highly lik ely to be outside a WWTP service area . We 54 applied a fixed undercount proportion at each county. The county undercount value was chosen by taking a state - calibrated pe rcenti le of all bl ock un dercounts within a c ount y. This state threshold percentile (85 - 95% ) was set to the 90th percentile whe re possible; however, Indiana had to be lowered to the 85th because of excess misclassified agricultural wells that led to 90th p ercent ile values o ver 1. We chose a high per cent ile to select a ratio that was reflective of areas most likely to be unserved, thus adding a conservative amount of wells. At the county level, all ehold rat io to add the c orrected number of w ells . Block classification was based on correc ted well to household unit ratios, po pulation density (block group level), and block membership to a Census designated place. Census designated places are named loci of pop ula tion but may or may not be incorpor ated (US Census, 2010). Variables were calibra ted based on multiple runs, qualitati ve assessment of spatial distributions, comparison to existing service area maps, and confirmation of existing wastewater service s for sma ll, low d ensity areas. Calibrations wer e performed at the state level to account for variation in drinking water well data collection (Table A1. 1 ). The conditions (Figure A1. 2 ) followed the assumption that a block with higher density, low well to househ old unit rat io (gi ven by Table A1. 3), and inside a CDP would be within a service are a. Additional conditions to account for small, low density areas and densely populated areas w ith an inflated ratio due to low household unit number were also include d. Aft er initial c lassif ication, the 97th pe rcen tile of density in serviced blocks was cal culated and used to reclassify blocks that had higher ratios than the standard ratio threshold . After initial classification, blocks were dissolved. Dissolved, in th e GIS sen se, refer s to r emoving boundaries o f su bregions (Census blocks) with common chara cteristics (WWTP 55 service area classification) to create larger, contiguous regions. This creat ed a layer of service areas, non - service areas, and unclassified blocks. T o co rre ct for sm all po lygons likely miscla ssif ied (identifiable by being surrounded by a different class), we used buffered WWTP points from Clean Watershed Needs Survey data (CWNS 2 012). Buffer size was variable based on the population receiving treatme nt and de termined by qua litative assessment of d istance from WWTP to plausible boundaries in each state. Because WWTPs are not always centrally located, buffer sizes erred on the large side. Buffer population ranges and corresponding size were determined f or eac h s tate. To improv e the service areas, pol ygons under 2 km 2 from serv ice area to unserviced based on if they fell within a WWTP buffer. For example, if a 1 km 2 pol ygon was originally classified as unserviced but was within a WWTP buffe r, it wou ld be cha nged t o serviced. The flip ped polygons were dissolved again to create fi nal service areas. After excluding WWTP service areas, primary and secondary inclusion masks f or septic placement were defined. The primary inclusion mask represents urban and suburban areas , whereas the second ary inclusion mask proved necessary to account for septic systems in rural settings. The primary inclusion mask includes all areas with NLCD cells classified as urban, over 15 meters away from water features, and betwe en 10 and 60 meter s from roads. These setb acks were selected to represent a reasonab le set of requirements for riparian sanitary codes, distances to public infrastructure, and un der the assumption that in most cases, residences are built relatively c lose t o r oads. The secon dary inclusion mask incl uded all non - urban cells that met the ripa rian and road set back requirements. 56 Table A1. 2 Calibrated values for WWTP and Septic condit ions . S ee Fig ure A1. 2 for conditions. State Well : Household Ratio Population Density (P op/km 2 ) Low Sta n dard H igh Low Standard H igh IN 0.08 0.12 0.5 20 220 500 IL 0.1 0.1 0 .5 20 300 400 MI 0.05 0.1 0.5 18 135 400 MN 0.1 0.1 0.5 20 250 400 NY 0.1 0.1 0.5 20 220 400 OH 0.1 0.12 0.5 20 230 400 PA 0.1 0.1 0.5 12 300 500 WI 0.08 0.12 0. 5 30 300 400 F igure A1.2 Logical state ments used to classify blocks as WWTP service area. 57 T e x t A1.1.3 Fertilizer Demand Model The basis for the spatial arrangement of both manure and chemical agricultural fertilizer models is crop - and location - specif ic ferti lizer de ma nd. I n reality, this is an inc redibly complex function of environment , management, policy, economic, and individu al considerations the modeling of which is beyond the scope of this effort. Instead, we chose a more straightforward approach of fitt ing a ma ch ine l earning model of f ertiliz er nutrients as determined from fertili zer sales data at the county - level (Brakebil l & Gronberg, 2017) created using Boosted Regression Trees (BRT) to county - level crop, soil, and geographic variables. We use d t he gu iding pr in ciple s that fertilizer is appl ied based on type of crop, as well as s oil texture, regional agricultural intensity , and climate. To vary fertilizer application within a county, we fit a boosted regression tree model that used crop type, so il textu re, tota l cropl and, and latitude to the average rate of chemical agricultural f ertilizer spread over county cropland (kg/ha N and P). This model produces a relative demand of fertilizer across crop types, specific to each location across the Great Lak es Ba sin. Thi s deman d model, applied a t the p ixel - level, is then used as a means to spatially distribute crop nutrient applicati ons within manured areas and across chemically fertilized areas of each county. Boosted regression trees (BRT) is a method th at combi nes clas si ficat ion trees and boos ting. C lassification trees take any number of variables and breaks data into mutually excl usive groups that explain the most variance between groups (Brieman et al., 1984). The pli tting until t he most variance can be e xplaine d in the datasets by these logical rule s. The boosting algorithm combines many indi vidual models to used by BRT ut ilize s a cumu la tive regression that fo rms thr ough recursive trees and calculated 58 res iduals (Elith et al., 2008). Independent var iables are ranked on their relative influence, allowing for a proportional measure of variable importance. We used BRT to qua nti fy cr op - speci fi c nut rient demand at th e count y level in response to geography, soil, and crop types, with input variables descri bed in section A1 .1.3.1 below. BRT has a number of input parameter that control aspects of the algorithm. Our implementation, co nduct ed using P ython scikit - learn ve rsion 0 .19.1, used largely default parameters. We did, however specify 100 boosting stages and required a m inimum of 5 samples per leaf (each split produces two leaves) within each classification tree. The county - mo del was fit with v ariab les that are also observe d at the pixel level. Ideally, we could fit a pixel - level model to observed fertilizer rates in fiel ds; however, this was not possible. Thus, the model is fit to observed county - level variables, including Bra keb ill a nd Gronb er g (20 lizer r ates. We then applied the county - level BRT to predict fertilizer application rate in every cropland pixel in the GLB. At the pixel level, crop demand was calculated based on the average demand hectares (descr ibe d bel ow) over a n 8 - y ear crop rotation, descri bed by CDL data from 2008 - 2015. The tot al hectares of cropland within a county was applied to all pi xels within the county as an additional measure of regional agricultural intensity. Latitude and soil textur e w ere n ative to 3 0 m r esolution. Applyin g this model created variation within applicat ions within a county. Finally, we adjusted all pixel rates wi thin each county by a debiasing factor to match the observed total fertilizer at the county level. In the f utu re, w e hope t o impro ve this method wit h point observations for validation, regionall y - specific crop recommended rates, and rules for smoothing ra tes within fields. Text A1.1.3.1 Fertilizer Demand Model Variables 1. Crop - specific demand : Crop types were ca tegor ized int o four levels of demand, based o n Michigan recommended rates given in W arncke (2004). Table A1. 3 provides the input 59 fertilizer rates and their demand classes. The median fertilizer rate specified by Warncke (2004) for each demand group was use d to create a s caled coefficient appli ed as a emand class weight. Thus, 100 hectares of corn would be weighted to more fertilizer demand hectares than 100 he ctare s of a l es ser f ertilized crop lik e wheat . Ag Census 2012 Crop harvested area fo r all crops with greater than 1000 ha in the US - GLB was conve rted to demand hectares and area normalized (USDA, 2012). 2. Latitude : We chose latitude as a variable that de scrib es both re giona l practice and cli mate, a s in the US - GLB, there is a north - south gradient in both climate and agricultural intensity. Souther n parts of the US - GLB like Ohio and Indiana are intensely farmed and more suitable for high yields, thus res ult ing i n higher f ertil izer application r ates, w hereas central and northern counties pr oduce lower yields and apply fertilizer at lower rates. 3. So il Texture : Soil texture was represented via mapunit - , horizon - and component - weighted average percent sand and perc ent clay f rom S SURGO for depths 0 - 50 cm. These were processed from the gSSURGO dataset (gSSURGO 2019). 4. Total Cropland : Total area of cro pland was used as a variable to describe agriculture intensity, following the idea of agriculturally intense co untie s applyi ng fert ilizer at higher r ates. W e used the 2012 Ag Census total croplan d area at the county level. 60 Table A1.3 Nitrogen fertilizer r ates . Rates derived from Warncke et al. (2004) and Warncke & Dahl (2003). Raw rates were not applied in SENS Em ap. Fe rtilizer D emand Group Crop N Fer tilizer kg/ha/yr High Potatoes 203 Onions 180 Sod/Grass Seed 162 Cabbage 158 Corn 143 Sweet Corn, Popcorn 133 Carrots 124 Medium High Tomatoes 111 Peppers 111 Greens 111 Watermelons 111 Sugarb eets 106 Sun flower 89 Herb s 89 Squash 89 Pumpkins 89 Mint 89 Cherries 84 Peaches 84 Asparagus 79 Christmas Trees 72 Medium Low Sorghum 67 Spring and Winter Wheat 67 Millet 67 Speltz 67 Grapes 67 Cranberrie s 67 Tritical e 62 Canola 57 Appl es 57 Radishes 57 Low B lueberries 49 Dry Beans 44 Peas 44 Barley 27 Oats 27 Rye 22 No Fertilizer Switchgrass, Soybeans, Alfalfa, Other Hay, Clover, 0 61 Text A1. 1 . 4 Manure T he animal inventory was bu ilt from two s o urces: 1) the 2012 USDA Agricu ltural Census that re ports county level information for all states on the number and type of farms, and 2) separate state - level records of confined animal feeding operations (CAFOs), w hich only include addres ses and inventori e s for farms with animal s count s above a certain siz e (CAFO designations vary across both animals and states). The USDA Agricultural Census includes all farms, including those regulated as CAFOs. However, the Census doe s not disclose anima l c ounts for all f arms, especially small operati ons, out of privacy c oncerns. To estimate animal counts for undisclosed farms, the number of animals in CAFOs were subtracted from the USDA Census population to give an aver age herd si ze o f non - CAFO farms, by an imal type and c ounty. Herd sizes were used to label farms as pastu re operations or confined operations based on a threshold confinement level given by Kellogg et al (2000). The confinement threshold limit differs by an imal; for e xamp le, dairy cows are a lwa ys confined, w h ile hogs are only confi ned if the head count is gre ater than 450 (Kellogg et al., 2000). For each farm in the animal inventory, a nutrient excretion rate and average county - level application rate was th en estimate d . N utrient loads for ea ch animal type in each county were calcul ated ba sed on estimated nutr ient excretion rates per animal (NRCS, 2008). Not all animal types in the USDA Ag Census have an excretion rate described by the NRCS. For animals with no nutrien t ex cretion information, da ta on excretio n rates was used to crea te a li near model to estimat e excretion rate as a function of animal weight, following the method used by Luscz et al (2015). Manure is typically stored prior to application, and N is volatil ized and lost to the atm osp here via denit r ification during this t ime. We used per - animal empi rical ratios of N to P for applied manure (Table 6 and 7, from Lorimor et al., 2008) to quantify denitrification for each farm, based on the original N: P ratio. Co unty - average manure appl ica tion rates wer e then 62 calculated by div iding t he spreadable manure acreage from the Ag Census by total county animal excretion in terms of kg P. We then randomly placed individual non - CAFO farms reported in the USDA Ag Census int o ag ricultural areas, de fin ed using a com b ination of the 2011 NLC D and C DL. We assumed that m anure fertilizer would be used in the area surrounding farms, rather than transported to distant fields, as farmers would minimize transportation expens es. Farms w ere only placed in cells wh ere the surrou n ding area could support the re quired area to spread manure, calculated manure acres. To prevent spreading m anure from conf ined farms on pastur e a reas for uncon f ined animals, we create d exclu sion b uffers for past ure animal farms. These were created according to manure spreading (inclusion) buffers we re c reated according to est imates of the r equired spreading area. To acc ount for overlapping initial buffers and areas where the buffers intersect roads and urban areas, buffer sizes increased iteratively until 96% of farms met their required ar ea for manu re s preading. Overlappin g b uffer s were me r ged and dissolved after e ach i teration of the buffe r size, and re - clipped to include only agricultural land. Each confined farm is assumed to lose a portion of manure during transport and storage. This portion, gi ven by Kellogg (2000), v ari es by animal s p ecies and is applied as a sing le point at the locat ion of each confined farm. Manure loads were applied within buffers at the rate of P fertilizer demand given by the fertilizer demand model. Demand was summed and adju sted by the factor t o m atch observed f arm P loads. N applicat ions we re determined by the N:P ratio after N loss for each farm. Due to manure being significantly P - enriched after N loss, we applied at 63 over - ferti lize with N. If N need w as not met, addit i onal N chemical agricul tural f ertilizer was applied . Text A1.1.5 Chemical Agricultural Fertilizer Chemical agricultural fertilizers are applied following manure separately for two cases: 1) areas within a manure bu ffer that did not receiv e a dequate N fert i lizer from manure alone , and 2 ) cropland areas outs ide of a manure buffer. For the first case, applied manure was subtracted from N demand from the county - adjusted BRT prediction at each pixel and any re maining dem and was satisfied using che mical agricult u ral fertilizer. The rem aining cropland cells receiv ed N and P fertilizer at the rate predicted by the relative demand model, applied based on the 8 - year rotation for each pixel. We then adjusted all pixe ls within a cou nty by the factor ne ede d to match the observed county fertili zer fro m Brakebill & Gronber g (2017). Since the county level fertilizer is derived from purchasing data, intercounty transport (i.e. a farmer purchases fertilizer in one county and applies it in a neighboring county ) c an create unre a listic TN and TP values at the county level. To ass ure realistic values, we put upper and lower bounds on the adjustment factors used to assure aggregate inputs matched the county - level data in order to keep pixels wit hin reasonable ferti liz er rates. As a result, occasionally in puts of fertilizers did not sum to the observed county value. In particular, we specified that pixel values for N could not be adjusted by a multiplicative factor less than 0.2 or greater tha n 5, which represent the 3 r d and 95 th per c entile of county level adjustm ent factors. P had a minimum factor of 0.1 (5 th percentile) and no maximum factor because the crop specific demand model for P tended to overestimate actual fertilizer appli cations. T ext A1.1.6 Nitrogen Fixa tio n To calculate nitrogen fixation from legumes , we used the Croplan d Data Layer (CDL, USDA NASS, 2008 - 2015) to identify cells containing leguminous crops and grasses (here soy, 64 dry beans, alf alfa, and other hay) . We the n applied a fix ation rate to all le gum e and grass ce l ls based on soil inorga nic N, applied manure fertil izer, and spatially disaggregated yield. Measurements of yield are available at the county level in the in the USDA Agricultural Census and Survey. To calculate y ield at the 30 m cell re sol ution, we anal y zed the relationship be tween c ounty yield from the 2012 Ag Censuses and county mean vegetation indices, plant available water, and latitude at the included cells. Greener fields and higher plant availabl e water are ass ociated with higher yie lds and are av a ilable at high spatial resolut ions (Courault et al. , 2016). Latitude was included due to a north - south gradient in climatic change and suitability of fields for soybeans and hay. The s outhern parts of th e basin in Ohio and southern Michig an are more heavi l y planted with soybeans , while alfalfa and other ha y are more associated with northern latitudes in northern Michigan, Wisconsin, Minnesota, and New York (CDL). Using these relationships allowed for spat ially disag greg ating coarser county yi eld data to in d ividual cells within a county. We tested three veg etation indices: greenness index (GI), normalized difference vegetation index (NDVI), and enhanced vegetation index (EVI). Landsat 5, 7, and 8 images (n ative 30 m reso lution) were compile d a nd cleaned in G oogle Earth Engine over the 20 08 - 2015 time series u sed for placement. Cells were only included if the CDL crop type matched the identified crop type in each year. We ran versions of each model with all C DL cells ov er t he 8 year period and wi th only cells w ith the correct crop mo re freq uent than one year to adjust for random error in the CDL. Median and mean of annual maximum values were tested in the regression. Plant available water (PAW) was estimated u sing SSURGO tex tural classes mapped to hydrologic pr o perties using the ROSET TA data base (Schaap et al. 2 001), and calculated as 65 field capacity minus wilting point for the depths 0 - 50 cm. Cell row in our model grid was used as a proxy for latitude. We buil t models us ing both multiple linear re gression and C A RT (Classification and Regress ion Trees). CART usin g NDVI median of annual maximum, latitude, and PAW produced the best model for soybeans (R 2 = 0.75, p < 0.001). Multiple linear regression using EVI med ian of annu al m aximum and latitude was the most succ e ssful model for dry bea ns (R 2 = 0.53, p < 0.001). U nsurprisingly, there was little variation in R 2 values (< .05 difference) across models due to similarity between vegetation indices. Once yield was ca lculated at all cells, rates were d ebi ased by multip l ying the observed Censu s and S urvey average yield d ivided by average county calculated yields. This assured calculated mean county yield was equal to observed county yield. Alfalfa and non - alfalfa hay di d not produ ce s atisfactory relation shi ps: the models could not predict high yields and coefficients of d etermination were between 0.1 and 0.4. We chose to assign county yields to all cells within a county and not spatially disaggregate the values. To calc ulate N fix atio n, we first determin ed available soil inorganic N. We followe d the m ethod described in Go olsby et al. (1999) and used in Han and Allan (2008). Using SSURGO data converted to raster and resampled to our model grid as previously described, we calculated the mass of organic matt er in the upper 5 0 cm of soil (to capture the ma jority of root zone a ctivity) with the product of bulk density, volume, and percent organic matter. We followed literature (Han & Allan 2008; Goolsby et al. 1999) and used s oil N conte nt o f 3% of organic matt er mineralization at a rate of 2% per yea r in cu ltivated soil (Gentry et al., 1998; Stevenson, 1994). Using mineralized N, applied manure, and applied chemical fertilizer; we calculated percent of plant N fixed using Meis inger and R anda (Eq 1). 66 (1) Meisinger and Randall (1991) provide a table of common values of nitrogen content base d on crop and m oisture rate. To pic k a rate for non - alfalfa hay, we determin ed the types of commonly gro wn hay in the GLB through contact with the Michigan and Minnesota agricultural extensions. We averaged rates for the major types of hay (clover, birdsfo ot trefoil, orc hard grass and blueg ras s) fro m Meisinger and Randall (1991) a nd used a rate of 0.02% N fr om fixation (Warncke et al., 2004; Kaiser et al., 2011). Rates for soy, dry beans, and alfalfa were 0.055%, 0.036%, and 0.0235% respectively (Meisinger & Randall, 1991 ). We calculated N f ixa tion ( kg/ha/yr) in each cell as the pr oduct of yield (kg/ha/yr), N content, and percent N from fixation (Eq 1). Since our model is all cells in the Cropland Data Layer (CDL) that had be en any N fixing crop between 2008 and 2015. By using multiple year s of data, we allowed for the effect of a corn - soy rotation. Altho ugh we use all cells, we corrected calculated fixed nitrogen to take into account freq uency and yearl y area. To do this, we adjust ed each cell to a proportion of total yearly fixation based on the frequency of being identified as the assigned crop (e.g., 5 - year soybean cell received the for frequency, we corrected the cal cul ated f ixation based on area using the USDA Agricultural Census 201 2 values, as suggested as a best practice for CDL use in Lark et a l. (2017). By adjusting by observed area divided by CDL area, we assured that we did n ot over - account for the usual plant ed area o f each crop. The final cell valu es are abstract, in that the y do not represent an actual yearly load at the point, but when ag gregated to scales of analysis they provide representative total loads. 67 Text A1.1.7 So urce Methodolog y Flow Charts The fo l lowing f igures contain flow charts for each SENSEmap source methodol ogy. Additional description of terminology and variables for each chart can be found in the main document Methods Section 1. 2.3, and in the Text S 1 .1 ab ove. 68 Figure A1.3 Septic WWTP De li neation flowchart . 69 Figure A1.4 Sept ic Placement f lowchart 70 Figure A1.5 Chemical Non - Agricultural Fertilizer flow chart. 71 Figure A1.6 Point S ources flow chart. 72 Figure A1.7 Manure flow chart. 73 Figure A1.8 Ch emical Agricultural Fertilizer Placement f low chart. 74 Figure A1.9 N Fixation Yiel d C alcu lation fl owchart 75 Figure A1.10 N Fixation Calculation flow chart. 76 Text A1.2 K - means clustering of watersheds to produce Nutrient Input Landscapes Data c lustering, or c luster analysis, use s algorithms to find u nderlying, natural groups or patterns in a re presentation of objects based on similarities (Jain, 2010). The term representation refers to how objects within a dataset are described for example, we could cluster watersheds based on land use, population, flow, or n utrients and we would find d ifferen t groups based on their similarity. Additiona lly, the way we quantified the variables of choice could influence the clusters in that using total area of each land use compared instead of percent land use could create di fferent groups. Similarity i s a bro ad term and how similarity is determined is d ecided by the clustering algorithm used (Jain, 2010). In our analysis, we selected k - means clustering to det ermine if there were natural groupi ngs of watersheds base d on their relative composition of nut rient s ources. K - means clustering is one of the olde st, simplest, and widely used clustering algorithms, dating back in multiple fields to the 1950s and 1960s (J ain, 2010). K - m eans creates cluster s for a set of n - dimen sional poi nts into an a priori determi ned K n umber of clusters where the squared error of a point and the mean cluster centroid is minimized (Jain, 2010). We experimented with different representati ons of watershe d nutrients by using rates of nutrient inp ut sources in kg/ha, including total n itrogen and phosphorus, and using percent nutrient i nput by nutrient source. We experimented with the value of k , which determines the number of clusters, as wel l as running mu ltiple times with di fferent initialization random se eds. Silhouette coefficients are us ed in clustering analysis to describe how coh esive clusters are. Silhouette coefficients are calculated by comparing the distance of a sample to the mean distance to oth er samples in the sa me clusters compared t o the mean distance of the next neares t clust er. Fig ure A1. 11 visualizes the silhouette co efficients from our Nutrient Input Landscapes. We found nine clusters best physically represented the data. A t fewer than ni ne clusters, our poi nt 77 source - cluster (or landscape) was lost, d ue to its small size. By using our clusters, we provide a way to categorize watersheds based on the 13 variables of N and P sources. Figure A1. 11 Silhou ette plot for N utrient Input Landsc apes silhouette coeffi cients. S i lhouette coefficients repres ent how well a sample fits into its cluster and ran ge from - 1 to 1, where negative values indicate a sample should have been classified in a different cluster. E ach watershed i s represented by a l ine, sorted by cluster , thus cr e ating different thickness of each c luster representative of how many watersheds are in a cluster. For example, Urban Core is the smallest cluster and Remote and Intensive Ag: Chemical Ferti lizer are the l argest. The line ext ends in the x directio n to its s ilhouette coefficient. 78 Fi gure A1 .12 Proportion of LULC within all Nutrient Input Landscapes. The violin represents the distribution of LULUC proportions for each watershed. For example, a hash mark at ~60% for agriculture in (A) means the medi an percen t age agriculture in Intensive Ag: Ch em Fert landscapes was 60%. 79 Ta ble A1.4 Summary statistics for Nutrient Input Landscapes. All 13 variables define a single landscape . Nutrient Input L andscape N (%) Chem NonAg Fertili zer Atmospheric Deposi tion Sept i c Point Manure Chem Ag Ferti lizer N Fix Mean Std Mean Std Mean Std Mean Std Mean Std Mean Std Mean Std Intensive Ag: Mixed 1.2 2. 3 18.9 7.5 1.9 1.7 0.4 2.7 28.8 7.4 37.6 9.0 11.2 4.5 In tensive Ag: Man ure 0.8 1.6 14.9 7.7 1.2 0.9 0.2 1.0 54.0 10.1 19.4 8.4 9.5 4.7 Intensive Ag: C hem Fer tiliz er 1.1 2.0 18.8 7.5 2.0 1.7 0.5 2.0 8.7 5.3 54.9 8.5 14.0 5.6 Urban Core 16.2 13.1 31.6 21.0 2.2 2.6 41.4 28.9 1.5 3 .5 6.0 8.7 1.3 1.7 Suburban 42. 7 13.0 44.8 8.7 4.1 5.0 2.8 7.7 1.0 2.1 3.7 4.7 0.8 1.3 Suburban E dge 7.8 7.8 45.1 11.7 7.8 5. 4 1.5 5 .5 8.4 5.8 21.7 9.8 7.7 5.2 Remote 0.2 0.6 97.8 2.5 0.7 1.0 0.0 0.2 0.3 0.8 0.5 0. 9 0.4 1.0 Rural: Low Intensity Ag 1.4 3.2 50.5 12.9 2.8 2.5 0.6 3.1 25 .9 8.1 8.5 6.1 10.3 7.2 Rural: Tow ns 2.0 3.8 81.7 8.9 4. 8 4.6 0.2 1.0 4.7 4.1 3.4 3.8 3.3 3.2 Nutrie nt Input Landscape P (%) Intensive Ag: Mixed 1.0 1.9 4.3 2.3 3.8 3.4 0.6 1.8 43.2 8.9 47.1 8.4 Intensive Ag: Manure 0.5 1.0 2.8 2.0 2.2 1.7 0.3 1.5 71.4 9.2 22.8 8 .5 Intensive Ag : Chem Fertilizer 1.0 2.0 5. 0 2 . 4 4.4 3.7 1.0 3.1 14.2 8.4 7 4.5 9.3 Urban Core 18.9 14.1 8.3 4.6 5.4 6.6 52.3 20.6 2.8 5.0 12.3 13.4 Suburban 58.8 18.6 14.6 4.6 11.8 13.7 3. 3 8.2 2.2 4.3 9.4 10.1 Suburban E dge 7.7 8.1 13. 3 6.3 19.0 11.7 2.2 5.6 15.2 10.0 42.6 12. 8 Re m ote 0.5 1.9 89.0 10.9 5.4 6. 4 0.2 2 .2 1.8 4.3 3.1 5.0 Rural: Low Intensity Ag 1.5 3.4 16.2 7.6 7.8 5.7 1.1 4.2 52.6 11.1 20.7 10.1 Rural: Towns 3.5 6.1 43.7 13.3 19.9 14.9 1.9 8.2 15.2 12.7 15.7 11.4 80 Text A1. 3 Results Figures and Tables Th e following pages contain add itional results related figures and tables. Figure A1.1 3 SENSEmap p hosphorus s ources. Color breaks are not necessarily representative of individual pixel values due to visual resampli ng. Colors are meant to show phys ically meaningful p atterns in the data visible at this map scale. Pixel level distributions of sources ca n be found in Table A1. 4. 81 Table A1.5 Pixel level quantile values for N nonpoint source maps . Zeroes are excluded. N ( kg/ha/yr) Percentil e Chem Non Ag Fertilizer Septic A tmospheric Depos ition Chem Ag Ferti lizer Manure Nitrogen Fixation 0 0.59 11.4 3.6 0.01 0.74 0.01 20 28.4 93.2 6.9 40.4 250.4 4.4 40 41.6 112.1 8.5 56.1 346.6 11.9 60 51.7 125.0 10.0 69. 9 451.7 20.6 8 0 67.8 144.1 11.4 87 .5 615.7 3 0.0 100 167.6 1000.0 22.1 477.1 915.6 301.7 Table A1 .6 Pixel level quantile values for P nonpoint source maps . Zeroes are excluded. P (kg/ha/yr) Percentile Chem NonAg Fertilizer Septic Atmospheric Deposition Chem Ag Fertili zer Manure 0 0.08 2.7 0.1 1 0.10 0.12 20 4.1 22.5 0.17 4.6 46.0 40 6.0 27.0 0.21 6.1 62.3 60 7.6 30.1 0.28 7.9 83.4 80 10.3 34.7 0.33 11.4 119.4 100 28.3 200.0 0.43 27.9 198.4 82 Text A1.4 Comparisons to other Products The following s ections detail m ethodology and resu lts fro m comparisons made betwe e n SENSEmap an d other nutrient produ cts. Text A1. 4.1 SPARROW Table A1. 7 shows mean and standard deviation of absolute difference between nutrient inputs at the HUC8 level in units kg/ha/yr , where positive values indicate SE NSEmap h ad higher values than S P ARROW. SENSEma p and S PARROW have a f ew key differences in data sources and temporal resolution. SENSEmap - GLB uses the 2012 Agricultural Census for manure calculations and Brakebill & Gronbe rg (2017) data fo r chemical fertili zer (200 8 - 2012), whereas SPARRO W uses earlier data fr om Ruddy et al. (2006) for both manure and chemical fertilizer. SENSEmap also uses more contemporary state - level CAFO datasets, which may have raised the total confined manure quantitie s. Additionally, e ach mode l used different approa c hes to spatial ly disa ggregate datase ts and distribute them across watersheds, thus creating additional variance between the two datasets. Table A1. 7 Summary statistics comparing SENSEmap and SPARROW . Mean and standard deviation of d iffer ence between SENSE map and SPARROW . Positive values indicate SENSEm a p had higher values than SPARROW. SPAR ROW Source Difference (kg/ha/yr) N TN 2.1 +/ - 8.5 Total Manure 4.9 +/ - 8.6 Chem NonAg Fert. 0.1 +/ - 1.1 Chem Ag Fert - 2.8 +/ - 5.8 P TP - 0.1 +/ - 1.2 Total Manure 0.6 +/ - 1. 3 Chem NonAg Fert. - 0.1 +/ - 0 .3 Chem Ag Fert - 0.7 +/ - 1.1 83 Fi gure A1. 14 HUC8 level comparison of SPARROW and SENSEmap. kg / ha /yr. Grey line indicates 1:1. 84 Text A1. 4.2 Net Anthropogenic Nitrogen Index (NANI ) in Swaney et al. (20 18) SENSEmap was compared to the Net Anthropoge n ic Nitrogen Index (NANI) as presented i n Swaney et al. (2018). N inputs in kg/ha/yr were calculated at the county - level using total county area. 2012 inputs were compared due to SENSEmap usi ng the 2012 Agricu ltur al Census an d fertilizer data ending in 201 2 . Figure A1. 1 5 shows bivariate plots fo r chemical agricultural fertilizer, manure, and N fixation. Table A1. 8 Summary statistics comparing SENSEmap and NANI. Mean and standard deviation of difference between SENSEmap and NAN I (Swaney et al., 2018) at coun ty level. Positiv e values indicate SENSEmap was higher than NANI. N ANI Source Differenc e (kg/ha/yr) N Total Manure 3.8 +/ - 6.2 Chemical Agricultural Fertilizer - 2.7 +/ - 5.6 N Fixat ion 7.1 +/ - 15.9 85 Figure A1.15 Co unty level comparisons between SENSEmap a nd NANI. kg / ha/yr. Grey line indicates 1 :1. 86 T ext A1. 4.3 Chemical Agricultura l Fertilizer Cao et al. (2018) fertilizer produ ct. SENSEmap was r esampled to 5 km resolution with me an fertilizer intensity to ma t ch Cao et al. (2018). SENSEmap gener ally had lower fertilizer intensities in areas with higher manure content (Wisconsin, northeastern Indiana, western Michigan) and higher fertilizer intensi ties in intens ely farmed areas like Ohio and the thumb of Michig a n (Fig ure A1. 16). Figure A1. 16 Di fference map between SEN SEmap and Cao et al. (2018) N fertilizer . P ositive difference indicates SEN SEmap > Cao et al. (2018). 87 Figur e A1. 17 Pix el level differ ences by SENSEmap fe rtilizer application intens i ty . The difference between Cao et al. ( 2018) and SENSEmap is plotted on the y - axis, with positive values indicating SENSEmap had higher fertiliz er application and negative values ind icating Cao et al. (2018) was higher. The grey line marks zero and the bl a ck line is a fitted lowess curve. 88 T ex t A1. 5 Sensitivity to Chemical Agricultural Fertilizer We performed a sensitivity analysis based on the f chemical ag ricultural fe rtilizer at the county level. Our final model use s the average value from 2008 - 2012 fro to this total fertilizer, we ran 14 n ew simulation s with differ ent values of chemical agricultural fertilizer at the county level. County - level values are used within SENSEmap to adjust pixel level predictions by the ratio needed to match simulated values to observed county N and P. However, th ese values el outputs f ertilizer modeling based on expenditu res. We produced 14 new simulations of pixel level chemical agricultural fertilizer N and P. New county f ertilizer values were chosen based on th e following methods. Fir st, mean and standard deviation of N and P kg was calculated for both 2000 - 2012 and 200 8 - 2012 in each county. One standard deviation was added and subtracted to each mean per county, creating 4 runs where values were systematicall y shifted pos itively or ne gatively. The other me thod created random noise b y using a random adjustment uniformly between +/ - 1 standard deviation per county, with 5 runs using the 2000 - 2012 distribution and 5 runs usin g 2008 - 2012 distribution. This resulte d in 14 new s imulations of county - level chemical agricultural fertilizer. A fter pixels were adjusted for each si mulation, HUC12 watershed aggregations were performed. Coefficient of variation (standard deviation/mean) was calculated across all simulations p er watershe d for chemica l agricultural fertili zer N and P (kg/yr) and tot a l N and P (kg/yr) to describe the wid th of the distribution across runs. Finally, to assess changes in broader nutrient patterns, we applied o ur Nutrient Input Landscape cluster de fi nitions to each sensitiv ity simulation. K - mean s clusters produce a centro i d value for each cluster and distance is computed to find which cluster a sample belongs in. Thus, 89 we can apply clusters trained on a differen t dataset (i.e. primary SENSEmap) to p re dict which cluster any w atershed would fall in . If the product is not sen s itive to these changes in chemical ag ricultural fertilizer, Nutrient Input Landscape should always be the same. 85% of watersheds remained con sistent through all sensitivity simula ti ons. 90 CHA PTER 2 : CONNEC TING LANDSCAPE CHARACT ERITICS TO GROUNDWATER NITR A TE CONCENTRATION A bstract Nitrate in groundwater has become a growing concern due to recent studies suggesting that even low concentrations c an lead to elevated cancer risk. Data o n groundwater nitrate conc entrations is not wide ly available due to varying regulations and sampling limitations. Here, we utilize a e the 1980s to better understand driver s of groundwa ter nitrate c oncentrations. We focu s on the 2006 - 2015 period a n d evaluate the probability of exceedi ng 0.4 mg/L and 2 mg/L concentrations using interpolated nitrate concentrations. Classification and Regr ession Tree (CART) analysis was used wi th watershed - level summari es of concentration ex ceedance probabilities with a suite of potential driver variables , including land use, geologic attributes, soil characteristics, and nitrogen loading. CART divides a da taset into groups that minimize within - gr oup varianc e based on th e highest performing s plit in driver variables. C A RT explained 43.2% of variation in th e >0.4 mg/L case with only six terminal groups, which is a strong performance for a small final tree and a complex dataset. Aquifer recharge wa s identified across all an alyses as the most imp ortant initial variable to s eparate low and high concentration wa tersheds. A combination of deep soil variables (texture and saturated hydraulic conductivity) and land u se variables further separated low, med iu m, and high probability groups. Our findings s uggest that sufficient rech a rge is necessary to mobilize nitrogen , and that even forested or low intensity agricultural areas can load nitrate to aquifers if high rechar ge and vulnerable geologic features are p resent. The se findings c an improve identificat ion of high nitrate risk ar e as and determine how and where manage ment strategies will affect aquifers based on climatologic and geologic vulnerability. 91 1 Introduction Ni trogen cycling has been significantly a lt ered by ant hropogenic ac tivities with the rise of industrial agriculture a nd growing population (Keeler et al., 2016 ; Vitouse k et al., 199 7) . Nitrogen sources including chemical fertiliz e r, manure, and human waste have creat ed a suite of water quality challenges that affect the environment, human health, and economics. Nitroge n loading to groundwater can threaten h uman health w hen ingested via drinking water. As early as the 1930s, high c o ncentrations of nitrate in drinking w ater from wells in agricultural areas was shown to cause infant methemoglobinemia, also known as blue ba by syndrome, a fatal condition that ren ders hemoglob in unable to car ry oxygen throughou t the body (Ward et al. , 2005) . The se high nitrate loads were a result of increasing prevalence of chemical fertilizers an d resulted in public health regulations limiting drinking water to less than 10 mg/L NO 3 - N (W alton, 1951) . In recent years, researchers have begun to link nitrate in drinking water to higher preva lence of multiple cancers and birth def ects, such as colorectal c anc er, thyroid cancer, and adverse pregnancy outc o mes (Mana ssara m et al., 2006; W ard et al., 2005, 2018) . I n contrast to infant methemoglobinemia , these risks occur at much lower nitrate concentrations in drinking water. Ward et al. (2018) reviewed epidemiology literature and found many studies with increased ris k o f various cancers a ssociated with exposure to d rinking water nitrate at concentratio ns as low as 1.5 mg/L NO 3 - N, with multiple studies finding heightened risk around 2 mg/L NO 3 - N (Schullehner et al, 2 018; Temkin et al., 2019) . As these h ealth risks b ecome quantif ied , it is increasingl y important to understand w h at environmental conditions lead to h igh groundwater nitrate concentrations, and through both this and more extensive sampling identify commu nities at risk. Additionally, nitrate i n groundwater is a consist ent and long - term sour ce of 92 nitrogen to streams a n d lakes, making it important to under stand when constructing nutrient budgets and addressing eutrophication (Benet tin et al., 2016; Vero et a l ., 2017) . Limited data on groundwate r nitrate is available due to varying regulations combined with the effort, cost, and privacy concerns r elated to sampling wells. There can be highly hetero geneous patte rns of nitrate concent rations in the subsurface d u e to variable loading pulses, geologi c characteristics, and the laminar flow characteristics of groundwater (Canter, 1997) . Thus, extensive sampling would be required to effectively map areas. These cha llenges have led researchers to turn to a variety of index - based, st atistical, and proc ess - based modeling methods t o both understand drivers of groundwa ter nitrate concentration and predict nitrate concentration at unsampled locations. Recently, machine le arning methods like classification and regression tr ees (Bu row et al., 2010) , boosted regression trees (Motevalli et al., 2019; Nolan, Fienen, & Lorenz, 2015; Ransom et al., 2017) , and random forest (Messier et al., 2019; Wheeler et al., 2015) have been used to link lar ge datasets of nitrate samples to drive r variables to predict con cen t rations at non - sam pled locations. In addition to producing non - linear predictions, these tree - based algorithms assess the importance of different variables in describing concentrations, w hich can help decipher the underlying c onditions that lead to ele vat e d nitrate concentr ations in a region. Large d atasets of well samples are frequently used to train the models with related variables summarized within circular buffers. These studies have ranged from continental to small waters hed scales and have had va ryi n g predictive succe ss. Studies have reported t he most significant driver variables falling into categories related to nitrogen inputs (Nolan et al., 2014; Nolan & Hitt, 2006) , geologic properties (Barzegar et al., 2018; Messier et al., 2019; Mot evalli et al., 2019) , redox conditions (Ransom et al., 2017) , and well depth (Wheeler et al., 2015) . 93 Here, we analyze drivers of gr oundwater nitrate concentration with classification and regression t ree analysis (CART) using an extensive dataset of nitrate measurements from Over 300,000 samples f rom 76,724 unique wells were used to characterize nitrate concentrat ion across the region and interpolated using k riging. Probabilities of exceedance for two concentration levels were calculated from kriging and aggregated to a small surface - watershed (HUC 12) scale to analyze using CART with 29 physical and management vari ables (USGS, 20 13) . Unlike other machine learning studies published to date, we utilize the high sampling density in our dataset to interpolate and generate training data using watershed summaries instead of circular buffers surrounding sparse individual well points. Th e objectives of this work are to: 1. - 2015 using over 76,000 unique wells with more than 300,000 samples. 2. Assess drivers of nitrate concentration using landscape characteristics at small water shed level with CART. With this extensive dataset, we provide maps of nitrate concentration in groundwater and link them to the physical proce sses and land use management. By understanding the dri vers of groundwater nitrate concentrations, we can inform ma nagement strategies and quantify nitrate exposure. 94 2 Study Area d States and bordered by four of five Laurentian Great L akes (Fig ure 2.1 B). The state of Michigan includes the lar ge ly forested Upper Peninsula, however it was not included in this analysis due to limited data. Land use within the LP is highly variable, w ith urban, intensive agriculture, and large swaths of fo rests and wetlands throughout the state (Homer et al., 2015) . There is a significant north - south divide in land use, where major u rb N (Fig ure 2.1A ). Metropolitan areas such as Detroit and Grand Rapids are located on the southeastern and western central regions, home to most o (U.S. Bureau, 2020) . Much of the land area of the southern LP is under agricultural management, including corn - soy rotations, wheat, and h ay in major row - and field - cropped areas and orchards a long the coast of Lake Michigan (USDA Ag Census, 2012). Nort forested and sparsely inhabited. The Quaternary period deposited much of the glacial and post - glacial alluvi al geology making up aquifers commonly used for private drinking water wells in the LP. Coarse - textured glacial depo sits cover much of the LP, with the exception of increased clay and silt in the central and eastern LP, which was formerly a lakebed. Quater nary glacial drift aquifers are used for private drinkin g water wells for much of the LP due to sandy, high conducti vity sediments. This aquifer is deepest in the northwest aquifer thickness exceeds 100 meters (Fig ure A2. 1, Soller and Garrity 2018) . Soil satu rated hydraulic conductivity (K Sat ) ranges from 3 - 550 mm/hr with lower values in the south central and eastern forme r lake beds, and higher conductivity across the western and northern LP (Figure 2. 1 D ). These patterns correspond with soil texture (Fig ure 95 A 2. 2). Bedrock wells are used where glacial drift is limited (Figure A2. 1, <5 meters). Aquifer recharge, an important characteristic for groundwater nitrate analysis, exhibits an east - west gradient due to a combination of highly conductive soils and high pr ecipitation in the west (Fig ure 2. 1C). The western half of the LP has increased precipitation in the form of lake ef fect precipitation due to its proximity to Lake Michigan. Annual precipitation for the LP ranges from approximately 700 - 1100 mm/yr (PRISM Cl imate Group, 2012). 96 Figure 2. 1 Groundwater nitrate s tudy area . environmental variables show n. A) Locator map of Continental United States with study area highlighted in red, B) Land use/land cover map modified from NLCD 2011 (Homer et al., 2015) , C) Modeled map of yearly aquifer recharge (See Section 2 . 3.6), D) Map of soil saturated hydraulic conductivity for 100 - 300 cm depth from SSURGO with quantile breaks (NRCS, 2017) . 97 3 Data The following subsections br iefly describe the data sources used for this study. Tab le 2. 1 summarizes these sources. Table 2. 1. Summary of data sources. Detailed vari ables found in each group can be found in Table A2. 1. Data Type Data Source Author Time Spatial Resolution Nitrate Well Chemistry MI EGLE 2006 - 2015 Well Points Well Geology Wellogic MI EGLE 2006 - 2015 Well Poin ts Groundwater N SENSEmap Hamlin et al (2 020), Wan et al (In prep) 2008 - 2015 120 m Soil Properties gSSURGO USDA NRCS 30 m Aquifer K Sat Wellogic Wellogic, Farrand & Bell (1982) 120 m Recharge Modeled Hyndman et al. (2007); Wan et al. (In prep) 2010 12 0 m Land Use NLCD Homer et al. (2015) 20 11 30 m 3.1 Well Chemistry Well chemistry data was retrieved via a Freedom of Information Act request from the Michigan Department of Envi ronment, Great Lakes, and Energy (EGLE) in December 2019. This database i nc luded over 3.6 million samples of various chemicals from dri nking water wells across the state dating from 1984 to 2019. Well chemistry data was provided as two categories of wells, pp ly wells, industrial wells, and wells bel onging to businesse s or organizations. Private wells, which are those used in private residences, were sampled when drilled, generally resul ting in a single chemistry sample per well, whereas public wells are rout in ely monitored depending on their category of use. 98 3.2 Wellog ic EGLE provides a publicly available database of drinking water well information digitized lled Wellogic, which contains all wells drilled since 1996 and partial re co rds of wells drilled prior to 1996 due to variability in cou nty - by - county archival digitization efforts. Information includes date of drilling, screening intervals, and aquifer prop erties. In this study, we utilized both the spatial location of the wells , as well as information on which aquifer e ach well was screen ed within. 3.3 Nitrogen Loads The Spatially Explicit Nutrient Source Estimate map (SENSEmap) was used to quantify total nitrogen inputs to groundwater (Hamlin et al., 2020; Wan et al., In Prep) . SENSEmap quantifies point sources and six non - point sources of nitrogen inputs to the landscape in Hamlin et al. (2020) by synthesiz in g literature, remote sensing products, go vernment records (ie., US Census, US Agricultural Census), and modeling produc ts (ie. Estimates of county level fertilizer loads). Sources i nclude atmospheric deposition, chemical agricultural fertilizer, manure, ch emical non - agricultural fertilizer, septi c tanks, N fixation from legumes, and point sources. Wan et al. (In prep) impl ements a statistical transport model to quantify how much nitr ogen survives to the Great Lakes coastline after attenuation along surfac e and groundwater pathways. This model was updated from previous work by Luscz et al. (2017). In this study, we use two m odel outputs: the total nitrogen (TN) load to groundwater per year from all surface - applied sources (kg/ha/yr) and the TN load from sep ti c tanks to groundwater (kg/ha/yr). Both e stimates are representative of an average year during 2008 - 2015. See Figure A2 . 3 for the map of TN loads to groundwater. 99 3.4 Land Use/Land C over Land Use/Land Cover (LULC) variables were summarized from the 2011 N at ional Land Cover Database, shown in Figur e 2.1 (Homer et al., 2015) . B ot h individual land use classes as reported in NLCD and aggregated land use classes (ie., Total urban, total forest, tota l agriculture) were used within analysis. Land use per categor y was tabulated and normalized to the proportion of the watershed. 3.5 So il Soil variables were extracted from gSSUR GO and processed using ROSETTA (NRCS, 2017; Schaap et al., 2001) . This study used data fr om soil layer 1 (0 - 5 cm) and soil layer 4 (100 - 300 cm). Variab les included were texture by percent (Figure A2.2 ) and saturated hydrauli c conductivity (K Sat ) (Figure 2.1D ). 3.6 Aq uifer Recharge Aquifer recharge was calculated using outputs from the Landscap e Hydrology Model (LHM) and statistical relationships derived from those results. Modeling was performed over a 28 - year period for an a pp roximately 20,000 km 2 within the Muskegon HUC8 watershed in west - central LP (Hyndman, Kendall, & Welty, 2007; Kendall, 2009) . Thi s watershed has diverse geologic and land u se characteristics reasonably representative of conditions found across the re mainder of the LP. Within each of five major land use classes (urban, agricultural, grass, deciduous, coniferous), linear regressions w er e developed to quantify the fraction of a nnual precipitation that has been simulated to become aquifer recharge as a fu nction of soil hydraulic conductivity. 100 3.7 Aquifer Saturated H ydraulic Conductivity Aquifer saturated hydraulic conductivity (K Sat ) was e stimated by the Michigan State University Remote Sensing & GIS using well log descriptions of sediment texture and pump tests to determine K values and was reported in the Michigan Wellogic Database ( Michigan Waterwells for WELLOGIC dataset , 2 01 9) . To extend these well - based measureme nts to the rest of Michigan, we used the geometric mean of well - based K for ea ch Quaternary geologic polygon (Farrand & Bell, 19 82) . Polygons without wells used the average K for other po lygons of similar geologic class. See Figure A2.4 for this map of aquifer c onductivity. 101 4 Methods Utilizing the la rge well chemistry dataset collected, analyzed, and stored by a variety of organ izations and individuals resulted in the need for extensive pre - processing and quality assurance/quality control (QA/QC) protocols. F or m ore details, see section A2. 1. Briefly, we used the following major steps to prepare well chemistry data for use: 1) Ge ocode data, or identify spatial coordinates for each point, 2) Perform QA/QC on geocoded addresses, 3) Perform QA/QC on concentration mea surements, and 4) Join well chemistry d ata to well geologic data from Wellogic. At this time, only nitrate measurements were used, while future work will include processing sulfat e and nitrite for use in this analysis. The 2006 - 2015 period was used in this analysis to incorporate a large volume of data and match the time period of a key analysis dataset, SENSEmap, which re presents the average nitrogen inputs expected within 2008 - 20 15. 4.1 Kriging Kriging, a geostatistical interpolation method, was used to c reate two distinct map types: 1) a cont inuous surface of groundwater nitrate concentrations, and 2) a similar surface o f the probability for exceedance of different nitrate concen trations. Kriging fits a function to the spatial autocorrelation (here, s emiv ariogram) calculated from known points in a geographic dataset, assuming that some variation between points is due part ially to randomness and to the distance between points (Bailey & Gatrell, 1995) . Depending on the data, the we ight given to each point near a prediction cell varies by the shape of the chosen semivariogram function. Due to the large spatial extent, variable sampling density, and variation in concentration measurements, we used Empiri cal Bayesian Kriging (EBK) in A rcGI S Pro 2.5 to automate the kriging proce ss (ESRI, 2020a) . This method iteratively produces semivario grams for subsets of 102 the data to tailor th e kriging function to each neig hbor s manual control of fitting a semivariogram function to the entire dataset (ESRI, 2020b; Hussain, Pilz, & Spoeck, 2010) . Probability kriging is a method that computes the proba b pred icti on to exceed a given threshold. This pr ovided additional information beyond kriging predictions. The probability kriging option within EBK was used to test four thresholds of exceedance in this analysis: 0.4 (the most com m on detect limit, or lowest det ecta ble concentration of nitrate by the mac hine), 2, 5, and 10 mg/L NO 3 - N. 2 mg/L NO 3 - N was chosen to be representative of new health literature indicating increased risk at values from >2 mg/L NO 3 - N (Schullehner et al., 2018; Temkin et al., 201 9 ) . 5 mg/L NO 3 - N was chosen a s a mid - level and 10 mg/L NO 3 - N was chosen drinking water. Results are only be presented here for exceedance of 0.4 mg/L and 2 mg/L NO 3 - N. Although kriging produces a pr e diction at every cell within t he s tudy area, error rises with distance fr om known data points. We only included kriging results within 3 km of a sample point to eliminate areas where kriging does not have enough data, and predictions approach the dataset m ean rather than being influenc ed b y nearby points. This threshold was cho sen because as it is the length of correlation, or range, found from a preliminary simple kriging of the entire dataset. Essentially, these 3 km buffered inclusion masks allow for th e benefits of kriging to extrap olat e and spatially debias the point data, without over - extending the kriged extrapolations to areas distant from sampling locations. 103 4.2 CART Classification and Regression Trees (CART) analysis was used to explore nonlinear statistical relationships betw een groundwater nitrate concentration and p otential driver variables icius, 2000) using a single response variable and a suite of driver variables as inputs. CART starts with the en t ire dataset and breaks the res pons e variable into two groups based on thr esholds in the driver variable that minimizes within - group variance. Each resultant group is then split again (the split is referred to as a node) based on whichever driver threshold next minimizes variance. These rec does not improve the perf o rmance (here measured using th e co mplexity parameter, otherwise known as proportion reduction of error, PRE) of the CART by at least 3%, the group will not be split further. The final CART results include the optimal decision tree and terminal groups, a m e asure of response explained (P RE), split. PRE behaves similarly to the coefficient of determination (R 2 ) used in linear regression, ranging from 0 to 1 and corresponding to the percent of dataset variance explained. Competitor spl its are defined for each node and would divide the data into groups with a similar PRE, but typically create different groups than the optimal split. Surrogate splits are other driver thresholds that would split the group in a similar manner to the opt imal split. Analysis of competitor and surr ogate splits can provide extended understanding of how the driver variables are related. We summarized driver and response data at the watershed level using the USGS Watershed Bounda r y Dataset Hydrologic Unit Code sys tem; specifically at the HUC12 scale 104 wh ich includes watersheds for small streams with approximately 80 km 2 area (USGS, 2013) . Within watersheds, summarizes were compiled using medians for a l l variables except land use an d so il texture, which were reported as area l proportions summing to one. CART analysis was performed for all exceedance thresholds for both Quaternary and Bedrock aquifers, producing eight CART trees. Median probability of th r eshold exceedance in each wate rshe d was used as the response variable. Fo r sensitivity analysis, mean concentration from all wells in watersheds with over 20 wells was used as a response variable in a separate CART analysis. Driver variables included repr e sentatives from all variables in T able 2. 1. Detailed descriptions of all 29 variables included can be found in Table A2. 1. 105 5 Results and Discussion Groundwater nitrate concentrations can first be viewed as a map of well points and concentrations for each aquifer after QA/QC and geoco ding (Figure 2 .2 ). Concentration breaks wer e determined based on physically meaningful thresholds described in Se ction 4.1. Quaternary aquifer wells are most densely located in the western and north central areas of the state, as well as a strip in the eas tern portion, direc tly outside Detroit (Fig ure 2 .2 A - 3) that generally corresponds to glacial sediment features gr eater than 50 meters thick (Fig ure A1. 1). The western portion of the state consistently had areas of elevated nit rate concentration, including in the northwestern region where land cover is primarily forest (Fig ure 2.1 A). The areas of highest concentration samples are north of Grand Rapids (Figure 2 .2 A - 1) and south of Kalamazoo (Figure 2 .2 A - 2), both corresponding wit h intensive agricultural areas . 26% of Quaternary wells have detectable n itrate (>=0.4 mg/L NO 3 - N), with the distribution shown in legend of Fi gure 2 .2 A. Bedrock wells are primarily found in the eastern and south - central portions of the state, and only 6% of these wells had detectable concentrations (>=0 .4 mg/L, Figure 2 .2 B leg end). Bedrock wells in the most southern central area are associated w ith higher nitrate concentrations, likely due to their placement in an intensely agricultural region (Fig ure 2. 1A ). The remaining analysis focu ses on wells in the Quaternary glacial depo sit aquifers, due to their wide use for drinking water and closer rela tionship with surficial nitrogen loading processes (Fig ure 2 .2 A). 106 Figure 2. 2 Well - level nitrate concentration m aps . Locations and concentrati on values of wells sampled in 2006 - 2015 in the A) Quaternary Aqui fer and B) Bedrock Aquifer. Legends show the distribution within each bin normalized to 100%. Highest concentration point values are plotted on top. Blue numeral s in Figure 2 .2 A denote genera l locations of major cities: 1) Grand Rapid s, 2) Kalamazoo, 3) De troit. Kriging provides both a smooth surface of predicted concentrations (Figure 2. 3A) and probability layers for exceeding selected thresholds, here >0.4 mg/L (Figure 2.3 B) and >2 mg/L NO 3 - N (Fi gure 2. 3D). The probability of exce edance can be understood conceptually through the lens of sampling within a watershed. A new sample collected within a given w atershed would have an expected probability of exceeding 0.4 mg/L NO 3 - N shown in Figure 2. 3C. Thus, localized areas within the wat ershed will be both more and less likely to exhibit nitrate exceedance than the watershed median. Although the general patter n of higher nitrate values in the western portion of the region remains throughout all m aps in Figure 2. 3, the nuance of simply ha ving detectable nitrate to having likelihood of health concern can be seen in the variation in pattern in Figure 2. 3B and 107 2. 3D. Much of the western half of the LP has median probabili ties of exceedance of 0.4 mg/L NO 3 - N over 0.5, visible in both continuous (Figure 2. 3B) and watershed summarized (Figure 2. 3C) views. In contract, probability of exceeding 2 mg/L NO 3 - N is less than 0.25 for most of the study region (Figure 2. 3D), excluding pockets within the western LP with h igher values. These pockets are more easily identified as the medium reds in Figure 2. 3E. This distinction is also temporally relevant: although this p rovides recent nitrate concentrations, areas with detectable nitrate may exceed 2 mg/L if nitrogen contin ues to be loaded in excess levels to what is taken up by plants. 108 Figure 2.3 Nitrate k riging r esults . Quaternary aquifer (2006 - 2015) kriged map, excee dance probabilities, and watershed (HUC12) summarized median exceeda nce probabilities. Empirical B a yesian Kriging produces a predicted concen tration at each cell and probability of exceedance of selected thresholds. Both predictions and probability layers are clipped to areas presumed to have enough data to be confident, h ere chosen by the correlation l ength of 3 km buffers around each well. Pr obability exceedance maps are then summarized to the HUC12 watersheds if there were nitrate concentration predictio ns for greater than 66% of the area. A) Predicted NO 3 - N concentratio n clipped to areas of confiden c e. B) Probability of exceeding 0.4 mg/L, o r the detection limit. C) Median probability of exceeding 0.4 mg/L at HUC12 watershed level. D) 109 Figure 2.3 (cont d) Probability of exce eding 2 mg/L NO 3 - N. E) Median probability of exc eeding 2 mg/L at HUC12 watersh e d level. 5.2 Interpr eting CART The benefi ts of CART for this study are: 1) Identifying the most descriptive variables leading nitrate concentrations and 2) Grouping similarly be having watersheds. Additional analysis of compe titor and surrogate splits can f urther identify corr elatio ns between driv er variables and determine breaks in the dataset. Often, spatial trends emerge in these groups due to the inherent spatial nature of geo graphic analysis geology and climate influenc e land use, population density , and nitrogen inputs. Here, we present CAR T analysis for two cases: Quaternary aquifer exceedance of 0.4 mg/L NO 3 - N (Figure 2. 4) and Quaternary aquifer exceedance of 2 mg/L NO 3 - N (Figure 2. 5). Results are discussed by examini ng each split in the decision tr ee (Figure 2. 4A, Fig ure 2. 5A), the distri bution of terminal CART groups (Figure 2. 4B, 2. 5B), and the geographic distribution of exceedance probabilities within each terminal CAR T group (Figure 2. 4C, Figure 2. 5C). Variables referred to in this section fa ll into four groups de termin ed by physical understanding and analysis of competitor and surrogate splits: 1) recharge, 2) geologic, 3) land use/land cover (LULC), 4) nitrogen inpu ts. Recharge is isolated to its own category du e to its specific nature as a : it i s influenced by precipitation (climate), land use (ability of water to permeate soil), and geology (soil characteristics, unsaturated zone travel time) . Geologic variables include soil textures, soi l saturated hydraulic conducti vi ty, and aquifer satu rated hydraulic condu ctivity. LULC describe the land use composition of a watershed e.g., how much is forested, urban, or agricultural. Nitrogen inputs are described by total nitrogen loads to groundwat er and loads from septic syste ms , which are another variab le combining 110 dr iving factors: nitrogen loading is determined by specific management (not visible solely by land use), geologic conditions, and precipit ation driving transport. 5.2.1 Probability of Exceeding 0.4 mg/L CART for de te ctable nitrate resul ted in six groups of watersheds representing low, medium, and high probability and are described by a combination of recharge, soil, and land use (Figure 2. 4A). Low probability, where it would be unlikely to detect nitrate in groundwa te r, is described by G roup X 1 (mean probabi lity of exceedance = 0.082) and is explained by recharge < 250 mm/yr and deep soil (100 - 300 cm) with a sand texture < 65%, paired with a surrogate split of deep soil clay texture > 11 .5% (Fig ure 2. 4A). This soil t ex tural split correspo nds wi th sandy loam. low recharge areas with soil with less sand than sandy loam will isolate the least nitrate - sus ceptible areas. In contrast, h ig h probability Groups X5 (m ean = 0.51) and X6 (mean = 0.53) are found on both sides of the recharge split and are described by a combination of geologic and land use variables. For Group X5, on the lower recharge side (<250 m m/yr), deep soil is sandy (>65 %) and has a high prop ortion of land in cro p agriculture (>40%). Although Group X6 falls on the high recharge side, it similarly is described by a geologic variable (high deep soil K Sat ; >3.6e - 6 m/s) and a land use variable (l ower proportion of forest; <= 57 %). Medium risk grou ps (X2 , mean = 0.21; - favorable geology and land use Group X3 has high sand (>65%) but low crops (<40%), while Group X4 has low dee p soil KSat (<3.6e - 6 m/sec) an d Group X5 has high de ep soi l K Sat (13 mm/h r) and high forest (>57%). Note that for simply detecting nitrate, groups of high and medium risk fall on both sides of the recharge threshold but are differentiated based on a combin ation of geologic a nd land use v ariables. 111 Mapping t he gro ups (Fig ure 2. 4 B) reveals a strong east - west divide, where almost the entire western half of the LP is at medium or high probability of detectable nitrate. There is not a strong north - south divide i n high probability Groups X5 a nd X6, which can be fo und in any latitude i n the western LP. Lacking this latitudinal divide drives some of the variability within Groups X5 and X6 (Fig ure 2. 4B) land use in northwestern LP has more forest, smaller towns, an d lacks intensive c orn - soy agr ic ulture found in much of th e southwest LP. 112 Figure 2.4 CART results for >0.4 mg/L NO 3 - N . Results from CART analysis for median probability of exceeding 0.4 mg/L, or nitrate detection limit within a HUC12 watershed. A) Decisi on tree identify ing each spl it within the data. E a ch no de contains the gr oup mean probability of exceedance, the total watersheds in the group ( n ), and the proportion response explained (PRE), a metric that describes the performance of the split relative t o the entire dat aset. Nodes are colored based on t heir mean probability, with higher probabilities in darker colors. 113 Figure 2 .4 (cont d) The final CART groups are numbered from low to high concentration. B) The final CART groups are mapped by HUC12. Colors correspond and groups are c on sisten t with those in A, with higher pr o bab il ities in darker colors. C) Violin plots for each final group. Violin plots display the range and mean as a line with a probability density function fit to show stribution thicker areas c on tain m ore samples. 5.2.2 Probability of Ex cee ding 2 mg/L The basic tenets of a combination of recharge, soil, and land use remains intact for identifying the probability of exceeding human health risk thresholds, 2 mg/L NO 3 - N , in watersheds. Here (Fig ur e 2 . 5A), recharge again creates a signific a nt br eak in the data: 624 watersheds are isolated based solely on having low aquifer recharge at an almost identical threshold as detectable NO 3 - N (0.4 mg/L: <250 mm/yr; 2 mg/L: <251 mm /yr). The remaining watershe ds are sp lit first based on a relatively r a re oc currence which includes the 31 watersheds with almost no mixed forest (< 0.3%), with surrogate splits highlighting other non - agricultural or urban land cover. These outliers are th en divided into the highest and lowes t groups (Y1, mean = 0.014 and Y7 , m ean = 0.27) based on deep soil K Sat (threshold: 15 mm/hr). Moving to the left of Mixed Forest, to watersheds that have mixed forest, the groups are split by a series of land use varia bles (Crops < 34%, Grassland < 3.82%) and finally a soil variable (dee p s oil clay >= 6%). Here, the story is less clear and the groups of medium risk (Y5) and high risk (Y6) are both small and have wide distributions (Fig ure 2. 5B), with only 17 and 24 wate rsheds, respectively. Genera lly , those watersheds with higher probabil i ti es of exceedance have more agricult ural land, and if soil is introduced there is extremely low clay (<5.5%) with a surrogate split of high sand content (>90.5%). Spatially, a differen t pattern emerges much of the northwestern LP is no longer in a medi u m or high probability category and po ckets of such in the southwestern LP become more isolated. It is important to note the relative level of difficulty in explaining the 114 probability of exceeding 2 mg/L because th e o verall probabilities at the watershed l e ve l a re much lower (Figure 2. 3E) and few have high values. Figure 2. 5 CART results for >2 mg/L NO 3 - N . Results from CART analysis for median probability of exceeding 2 mg/L, or detectin g nitrate within a HUC12 wa ter shed. A) Decision tree identifying eac h spl i t within the data. Each node con tains the group mean probability of 115 F i gure 2.5 (cont d) exceedance, the total watersheds in the group ( n ), and the proportion response explained (PRE) , a metric that describes t he performance of the split relative to the en tire dataset. N od es a re colored based on their mean p robability, with higher probabilities in darker colors. The final CART groups are numbered from low to high concentration. B) The final CART groups are mappe d b y HUC12. Colors correspond and groups are c onsistent with th ose in A, with higher probabilities in darker colors. C) Violin plots for each final group. Violin plots display the range and mean as a line with a probability densi ty function fit to show the thicker areas c ontain more sam pl es. 5.2.3 Sensitivity We performed C ART analysis on a response variable based solely on point data to confirm that kriging methods produce comparable results. This C ART reasonably predicts wat ersh ed - averaged well point concentrations in Q uaternary aquife rs (Fi g ure A2. 5). This response variab le includes spatial biases in sampling that we believe were reduced by kriging. Regardless, driver variables had similar splittin g thresholds. Recharge isol ated the lowest concentration watersheds at a threshold of 242 m m/ye ar. The group of outliers identi fied by having no mixed forest in the 2 mg/L CART were also identified and further split by a similar soil K Sat value (12 mm/hr i n soil depth 0 - 20 cm). Fore st p roportion of 47% and crop proportion of 50 % appeared and s pl it l ogically (ie., higher crop propo rtion was associated with higher nitrate concentrations). Aquifer K Sat , which was not a top performing driver in the exceedance C ARTs, appeared and sensibly spl it away a low concentration group based on low conductivit y. The CART also performed similarly t o the exceedance probability CARTs (PRE=40.3% compared to 0.4 mg/L: 43.2%, 2 mg/L: 40%). The similarity of these results, along w ith the benefits of the kri ging , suggest that our primary analyses are bo th robust and pr ef erre d to this similar watershed poin t averaging for our dataset. 116 5.3 Understanding Drivers of Groundwater Nitrate Concentration 5.3.1 Recharge Aquifer recharge was c onsistently the highest per form ing initial split at a threshold of 242 - 25 1 mm/yr. This th re shol d creates a visible split in the LP (Figure 2. 6A) consistent with the general pattern of the presence of groundwater nitrate (Figure 2. 6B). There were no compara ble surrogate splits; howev er, soil texture variables were almost as effe ctive as competi to rs. The descriptive ability of recha rge in LP groundwater nitrate may be due to its multifaceted dimensions as a transport variable combining precipitation and geo logy. Nitrate would be unex pect ed in an aquifer underlying soils that are not sufficientl y perm eable; not only must nitrogen be applied, regardless of land use, but the resulting nitrate must be mobile within the unsaturated and saturated zones. However, r echarge never appeared outs ide of the initial split in any CARTs, nor wer e there any equa ll y pe rforming variable in surrogate o r competitor analysis. Recharge is a modeled variable and not generally readily available for groundwater nitrate analysis elsewh ere, suggesting the importa nce of modeling and the need for better quanti fied recharge ma ps . Af ter identifying high recharge ar eas, the nuance in groups was described by additional geologic and land use variables. 117 Figure 2.6 Aquifer recharge compared to p robability of exceeding 0.4 mg/ L NO 3 - N . A) HUC12 summarized median aquife r recharge colo re d by CART significant splits. >250 mm/yr was the consistent split for Quaternary aquifers. 241 - 250 mm/yr represents the additional watersheds included at the lower Be drock CART recharge split. B) H UC12 sum marized median probability of exce eding 0.4 mg/L, r epli cated from Figure 2. 3C. Side by side, the figures show the broad consistent pattern of high recharge and elevated detectable nitrate probabilities. 5.3.2 Vulnera ble Geology and Hazardous L and U se Follo wing recharge (a transport - relate d variables) com es the factors of geology and land use. Agricultural land and high nitrogen inputs are found across the LP, but detecting groundwater nitrate is improbable in the east ern parts of the state desp ite e xtensive row - and field - crop agriculture . In contrast, e ve n he avily forested areas in northwestern LP, such as the Manistee National Forest, fall into medium detection probability Group X4 (Figure 2. 4C). Surrogate splits of ten group land use variable s whe n they a ppear in CART and similarly group soil and geolog y. Alt hough this analysis does not show the full distributions of each variable probability groups. As it stand s, simpl y having non - vulnerable geology w ill usually isol at e th e 118 lowest probability classes. Groups with vulnerable geology combined with hazardous land use result in the highest probabilities. The breaks in variables identi fied in CART analysis allow the descript ion of what azar consists of sandy (>65%), low clay (<6%) soils at 100 - 300 cm depth with K Sat greater than ~14 mm/hr. Soil texture an d hydraulic conductivity ar e con nected v ariables, with coarser soils havi ng increased con du ctiv ity. Aquifer K Sat did not appear as top CART selection, but surrogate splits showed a level of 11 - 13 m/day was associated with soil variables. Hazardous land use includes watersheds with > 30 - 4 0% crop land use and minimal natural land use. Although f or est and agriculture both appeared as deciding land uses, they did not act in binary, meaning that a split at 40% agriculture did not correspond to a surrogate split of 60% forest due to the co mplex ity of o ther urban and wetland land use. Notably, nitroge n inpu ts to groundwater or inputs from septic did not appear in CART analysis for >0.4 mg/L or >2 mg/L. Occasionally, these variables acted as surrogates with land use variables (TN and crops; s eptic and urb an), but were not as effective at improving the m od el p erformance and so were not selected as the split. Nitrogen inputs to groundwater, like recharge, is a combination variable affected by land use and management, r echarge, and geologic condi tions . The la ck of nitrogen loading as a domin ant variable in th is a nalysis may be due to modeling parameters or the lack of distinction between nitrogen sources. TN, without chemical speciation, may not be effective to describe different behavior of nitro gen s ources i n variable subsurface environment s. 119 6 Conclusio n This a nalysis provides new views and methods to describe drivers of groundwater nitrate kriging, and CART. Sources of u ncertainty include well nitrate measureme nts, kriging met ho ds, mo deled inputs, and variables not included in the current analysis. These sources of uncertainty also provide avenues of future work and improvements to this stu dy. While there are signifi cant benefits of using a spatially and tempora lly extensive da ta set, t he breadth introduces limitations and uncertainty. Many practitioners retrieved, analyzed, and digitized samples across time and space. Water table elevation, recent precipitation condit ions , and season of sample were not included i n this analysis, m eaning spatially proximal samples may not reflect the same physical conditions. Though, it is likely that this variation is dampened by the presence of samples from a wide range of environment s. A d ditionally, we selected a ten - year period of samples to i nc rease our spatial coverage and sampling density to improve kriging. Despite the variable conditions and uncertainty in such a large dataset, we see benefit in includ ing more data and allowing krig i ng to account for uncertainty between nea rby data points. T his an alysis did not include some variables found to be significant in other studies, notably well depth and a redox condition variable. Future work will include sul fate concentration in the w ater s hed from the well chemistry database as a proxy for redox c onditi ons, as dissolved oxygen is unavailable in this dataset. Well depth, screening interval, and static water level were not included due to insufficient data with in the Wellogic database an d di f ficulty summarizing this data at the wate rshed level. Alt ho ugh we lls were separated by aquifer type, it is expected that shallower wells (with shallow screening interval) are more susceptible to elevated nitrate concentratio n. 120 This extensive dataset from state - collected samples allows for a stat istically powerf ul analy sis over a wide range of land use and geologic conditions. Future work may include increased driver variables, alternative tree - based machine learning methods, extended analysis of the d istr i butions of other driver variables in each CART group, and a n appl ication of quantifying nitrate exposure hotspots based on the relationship between elevated nitrate concentration wells, population density, public supply well service, and continued nit roge n loading areas. This increased understand ing of how vulne ra ble ge ology is affected by a variety of hazardous land uses can be combined with land use legacies and future projections to identify areas that are of concern or wo rth protecting. This stud y ut i lizes an extensive database of drinking w ater nitrate wel l measur ements drivers of groundwater nitrate concentration using CART analysis. Aquifer recharge was i solated iable that can q ui ckly e liminate many watersheds from concern of nitrate concentration. Watersheds with sufficient recharge were separated based on a combination of vulnerable geology and hazardous land use. Ou r an a lysis identified sandy, high hydraulic co nductivity soils a - forested, sparsely - populated watersheds were found to have medium likelihood of d etec t able nitrate concentrations if they were in high recharge , geolog ically vulnerable areas. These findings can be used to identify areas of concern for elevated nitrate concentration and drive modeling efforts using land use l egacies, future land use pr ojec t ions, and process - based modeling to aid i n management str at egy. 121 A cknowledgements This chapter was coauthored by Anthony Kendall, Sherry M artin , and David Hyndman. We thank Yvonne Krimmel and Bailey Hannah for assistance in processing data and providing feedback on the manuscript. Data was provided from th e Michigan Department of En viro n ment, Great Lakes, and Energy from FOIAA 5371 - 19. Funding w as pro vided by NOAA Grant NA12OAR4320071 and NASA Grant NNX11AC72G. 122 APPENDIX 123 Text A2.1 Methods: Geocoding Well chemistry data was not provided with latitude - longitude coordinates for use in spatial analysis. However , addresses for each sample we re pro vided. Because wells had been sampled and data recorded by many people over a long time span, there were significant inconsistencies in how address es were entered. Wells we re also fre quently given a street address ly ident ifiable by an owner. Before we were able to use GIS geocoding software to convert address to coordinates, we filtered samples based on addresses an d eliminated wells withou t street ad dresses. Although private wells were usually only sampled once, p ub lic we lls were often sampled multiple times with inconsistently entered addresses. OpenRefine ( https://openrefine. org/ ), a software for fi ltering and parsing text in messy datasets, was used to create unique addres s IDs fo r samples that had similar addresses so that only one spatial point would be created per well. Due to the size of the dataset and our limitations i n human resources, we wer e not able to utili ze all samples or manually fi nd addresses that were not e as ily id entifiable. Geocoding was performed in ArcGIS Pro 2.2 using the ArcGIS World Geocoding Service. This geocoding algorithm includes a score which dem onstrates how confident i t is that t he addre ss was correctly matched. We selected a minimum score of 95 /100 b ased on a qualitative analysis of sampled addresses with lower scores. Lower scores were often missing a street address (placed at a town center) o r were moved to different street num bers tha n the input address had. 124 T ext A2.2 Methods: Wellogic Join To cla ssify samples aquifer source, data from the well properties database, Wellogic, was needed. Public wells are identified by a WSSN code for their owner; however, many owners have mul tiple well s. Public well owners with a single well wer e joined to the chemi stry database based on WSSN. All other wells within the database were joined based on a spatial join operation performed in ArcGIS Pro 2.5. Wells in the chemistry dataset w ere joine d to the n earest Wellogic well within 1 km. If there w as not a well w ithin 1 km, the wells could not be joined. addresses were available in the Wellogic databases, differences in address nomenclature and format required an unrealistic amount of pr e - process ing work t o join wells based on addre ss. 125 Figure A2.1 Map of t hickness of Quaternary glacial sediment . From Soller & Garrity ( 2018). 126 Figure A2.2 Map s of s oil textures for soil depths 100 - 300 cm . Fom NRCS ( 2017) A) % Sand, B ) % Silt, C) %Clay 127 Fig ure A2.3 Map of nitrogen loads to groundwate r. TN loads to groundwater. (Hamlin et al., 202 0; Wan et a l., In Prep) Fi gure A2.4 M ap of a quifer saturated conductivity . C alculated u sing Wellogic and Quaternary geology ( Farrand & Bell , 1982) 128 Table A2.1 Complete set of variables used withi n CART . Variabl es are grouped b y variable category. Category Variable Defi niti on Unit Ni trogen TN_GW_kghayr_median Total Nitrogen Load to Groundwater, HUC12 median kg/ha/yr TN_Sep_kghayr_HUC12_m d Total Nitrogen Load from Septic Tanks, HUC12 median kg/ha/yr Aquif er a q_k sat_mday_medi an Aquifer Ksat, HUC12 Median m/day rech arge _mmyr_media n Aquifer Recharge, HUC12 median mm/yr WellProp_Drift Proportion of HUC12 Wells in Quaternary Aquifer - Soil sandtotal_r_lay1_median Soil 0 - 20 cm % Sand, HUC12 median % s andt ota l_r_lay4_medi an Soil 100 - 300 cm % Sand, HUC12 median % cl aytotal_r_l ay1_median Soil 0 - 20 cm % Clay, HUC12 median % claytotal_r_lay4_median Soil 100 - 300 cm % Clay, HUC12 media n % ksat_soil_lay1_median Soil 0 - 20 cm Ksat, HUC12 median mm/hr ksa t_s oil_lay4_medi an Soil 100 - 300 cm Ksat, HUC12 median mm/hr LU LC OpenWate r_per Open Water %, HUC12 % DevOpen_per Developed - Open %, HUC12 % DevLow_per Developed - Low Intensit y %, HUC12 % DevMed_per Developed - Medium Intensity %, HUC12 % Dev Hig h_per Develop ed - High Intensity %, HUC12 % Barren_pe r Ba rren %, HUC 12 % DecidFor_per Forest - Deciduous %, HUC12 % EvergreenFor_per Forest - Evergreen %, HUC12 % Mixed For_per Forest - Mixed %, HUC12 % ShrubScrub_per Shrub/Scrub %, HU C12 % GrasslandHe rb_per Grassland Herbaceous %, HUC12 % P astu reHay_per P asture/Hay %, HUC12 % Crops_per Cultivated Crop %, HUC12 % WoodyWetlands_per Woody Wetlands %, HUC12 % HerbWetlands_per Herbaceous Wetlands %, HUC12 % forest_per Aggr egat ed Forest %, HUC 12 % urban_per Aggregated Urban %, HUC12 % wetland_p er Aggregated Wetland %, HUC12 % 129 Fig ure A2.5 CART decision tree for sensitivity analysis predicting mean N O 3 - N concentration . Watersheds included must have >= 20 wells in Quaterna ry aquif er. Var iable names c an be referenced in Table A2.1 . 130 REFERENCES 131 REFERENC ES Alexander, R. B., Smith, R. A., & Schwarz, G. E. (2004). Estimates of diffuse phosphorus sources in surface waters of the United States using a spatial ly referenced w atershed model. Wate r Science and Technology , 49 (3), 1 10. Anderson, D. M., Gi lbert, P. M., & Burkholder, J . M. (2002). Harmful Algal Blooms and Eutrophication Nutrient Sources, Composition, and Consequences. Estuaries , 25 (4b), 704 726. http s://doi.org/10. 1007/BF02804901 Ande rson, K. A., & Downing, J. A. (2006). Dry and wet atmosphe ric deposition of nitrogen, phosphorus and silicon in an agricultural region. Water, Air, and Soil Pollution , 176 (1 4), 351 374. https://doi.org/10.1007/s11270 - 006 - 9172 - 4 Arnold, J. G., Srinivasan, R., Muttiah, R. S., & Williams, J. R. (1998). Large area H ydrologic Modeling and Assessment Part I: Model development . Journal of the American Water Resources Association , 34 (February), 73 89. https://doi.org/10.1111/j.17 52 - 1688.1998.tb 05961.x Bailey, T. C ., & Gatrell, A. C. (199 6 ). Interactive spatial data analy sis. Computers & Ge osciences , 22(8), 953 - 954 . https://doi.org/10.1016/s0098 - 3004(96)80468 - 7 Barry, D. A. J., Goorahoo, D., & Goss, M. J. (1993). Estimation of Nitr ate Concentrati ons in Groundwater U sing a Whole Farm Nitrogen Budget. Journal of Environmenta l Quality , 22 (4), 767 775. https://doi.org/10.2134/jeq1993.0047242500220004 0019x Barzegar, R., Moghaddam, A. A., Deo, R., Fijani, E., & Tziritis, E. (2018). Mappin g groundwater c ontamination risk of multiple aquifers using multi - model ensemble of machine l earning algorithms. Science of the Total Environment , 621 , 697 712. https:/ /doi.org/10.101 6/j.scitotenv.2017.11.185 Basso, B., & Ritchie, J. T. (2005). Impact of c ompost, manure and inorganic fertil izer on nitrate leaching and yield for a 6 - year maize - alfa lfa rotation in Michigan. Agriculture, Ecosystems and Environment , 108 (4), 329 341. https: //doi.org/10.1016/j.agee.2005.01.011 Beal, C. D., Gardner, E. A., & Menzi es, N. W. (2005 ). Process, performa nce, and pollution potential: A review of septic tank - soil absorption systems. Australian Journal of Soil Research , 43 (7), 781 802. h ttps://doi.org/ 10.1071/SR05018 Benettin, P., Fovet, O., van der Velde, Y., Ruiz, L., Hra chowitz, M., Ho de, A. J. (2016). Transit times - the link between hydrology and water quality at the catchment scale. Wiley Interdisciplinary Reviews: Water , 3 (5), 6 29 657. https://doi.org/10.1002/wat2.1155 132 Boyer, E. W., Goodale, C. L., J aworski, N. A., & Howarth, R. W. (2 002). Anthropogenic nitrogen sources and relationships to riverine nitrogen export in the northeastern U.S.A. Biogeochemistry , 57 (1), 137 169. https ://doi.org/10.1023/A:1015709302073 Brakebill, J., & Gronberg, J. (2017). County - Level Es timates of Nitrogen And Phosphorus from Commercial Fertilizer for the Contermi nous United States, 1987 - 2012. https://doi.org/doi.org/10.5066/F7H41PKX Bri eman, L., Fried man, J., Olshen, R., & Stone, C. (1984). Classification and Regression Tr ees . New York: Routledge. https://d oi.org/10.1201/9781315139470 Brinson, M. M., Bradshaw, H. D., Holmes, R. N., & Elkins, J. B. (1980). Litterfall, Stemflow, and Throug hfall Nutrient Fluxes in an Alluvial Swamp Forest. Ecology , 61 (4), 827 835. https://doi. org/10.2307/193 6753 Brooks, B. W., Lazorchak, J. M., Howard, M. D. A., Johnson, M. V. V., Mor ton, S. L., Perkins, blooms becoming the greatest inland water quality threat to public health and aquatic ec osystems? Envir onmental Toxicology and Chemistry , 35 (1), 6 13. https://doi.org/10.1002/etc.32 20 Brown, L. J., Taleban, V., Gharabaghi, B., & Weiss, L. (2011). Seasonal and spatial dis tribution patterns of atmospheric phosphorus deposition to Lake Simcoe, O N. Journal of G reat Lakes Research , 37 (SUPPL. 3), 15 25. https://doi.org/10.1016/j.jglr.2010. 09.008 Burow, K. R., Nolan, B. T., Rupert, M. G., & Dubrovsky, N. M. (2010). Nitrate in gr oundwater of the United States, 1991 - 2003. Environmental Science and Tech nology , 44 (13), 4988 4997. https:// doi.org/10. 1021/es100546y Bytnerowicz, A., Johnson, R. F., Zhang, L., Jenerette, G. D., Fenn, M. E., Schilling, S. L., & Gonzalez - Fernandez, I. (201 5). An empirical inferential method of estimating nitrogen deposition to Mediterranean - t ype ecosystems: The San Bernard ino mountains case study. Environmental Polluti on , 203 (2), 69 88. https://doi.org/10.1016/j.envpol.2015.03.028 Canter, L. W., & Canter, L . W. (2019). Influence of Subsurface Processes. In Nitrates in Groundwate r . https://doi. org/10.1201/97802037 45793 - 2 Cao , P., Lu, C., & Yu, Z. (2018). Historical nitro gen fertilizer use in agricultural ecosystems of the contiguous United States during 1850 - 2015: Application rate, timing, and fertilizer types. Earth System Scienc e Data , 10 (2), 969 984. https://doi .org/10.519 4/essd - 10 - 969 - 2018 Carmichael, W. W., Azevedo, S. M. F. O., An, J. S., Molica, R. J. R., Jochimsen, E. M., Lau, S., 001). Human fatalities form cyanobacteria: Chemical and biological eviden ce for cyanotox ins. Environmental H ealth Persp ectives , 109 (7), 663 668. https://doi.org/10.12 89/ehp.01109663 133 Charles, A., Rochette, P., Whalen, J. K., Angers, D. A., Chantigny, M. H., & Bertrand, N. (2017). Global nitrous oxide emission factors from agricu ltural soils af ter addition of orga nic amendme nts: A meta - analysis. Agriculture, Ecosystems a nd Environment , 236 (3), 88 98. https://doi.org/10.1016/j.agee.2016.11.021 Cleveland, C. C. , Houlton, B. Z., Smith, W. K., Marklein, A. R., Reed, S. C., Parton, W., Running, S. W. (2013). Patterns of new vers us recycled primary production in the terrestri al biosphere. Proceedings of the National Academy of Sciences of the United States of Amer ica , 110 (31), 12733 12737. https://doi.org/10.1073/pnas.1302768110 Courau lt, D., Demarez , V., Guérif, M., Le Page, M., Simonneaux, V., Ferrant, S., & Veloso, A. (2016 ). Contribution of Remote Sensing for Crop and Water Monitoring. In N. Baghdadi & M. Zribi (Eds.), Land Surface Remote Sensing in Agriculture and Forest (pp. 113 1 77). Elsevier. https://doi.org/10.1 016/B978 - 1 - 78548 - 103 - 1.50004 - 2 David, M. B., Drinkwater, L . E., & Mcisaac, G. F. (2010). Sources of Nitrate Yields in the Mississippi River Basin. J ournal of Environmental Quality , 39 (5), 1657 1667. https://doi.org/ 10.213 4/jeq2010.0115 David, M. B., & Gent ry, L. E. (2000). Anthropogenic Inputs of Nitrogen and Pho sphorus and Riverine Export for Illinois, USA. Journal of Environmental Quality , 29 (2), 49 4 508. https://doi.org/10.2134/jeq2000.00472425002900020018x David, M. B. , Gentry, L. E. , Kovacic, D. A., & Smith, K. M. (1997). Nitrogen balance in and export from a n agricultural watershed. Journal of Environmental Quality , 26 (4), 1038 1048. https://doi. org/10.2134/jeq1997.00472425002600040015x K. E. (2000). Classif ication and regressi on trees: A powerful yet simple technique for ecological d ata analysis. Ecology , 81 (11), 3178 3192. https://doi.org/10.1890/0012 - 9658(2000)081[3178: CARTAP]2.0.CO;2 Delumyea, R. G., & Petel, R. L. (1978). Wet and dry depos ition of phosph orus into Lake Huron . Water, Air, and Soil Pollution , 10 (2), 187 198. https:// doi.org/10.1007/BF00464714 Dodds, W. K., Bouska, W. W., Eitzmann, J. L., Pilger, T. J., Pi Thornbrugh, D. J. (2009). Policy Analys is: Eu trophication of U . S . Freshwaters Environmental Science & Technology , 43 (1), 12 19. https://doi.org/10.1021/es801217q Downing, J. A., & Mccauley, E. (1992). The nitrogen phosphorous relationship in lakes. Limnology and Oceanography , 37 (3 7), 93 6 945. Eimers, M. C., Watmough, S. A., Paterson, A. M., Dillon, P. J., & Yao, H. (2009). Long - term declines in phosphorus export from forested catchm ents in south - central Ontario. Canadian Journal of Fisheries and Aquatic Sciences , 66 (10), 1682 1692. https ://doi.org/10.1 139/F09 - 101 134 Eisenrei ch, S. J., Emmling, P. J., & Beeton, A. M. (1977). Atmospheric Loading of Phosphorus and Other Chemicals to Lake M ichigan. Journal of Great Lakes Research , 3 (3 4), 291 304. https://doi.org/10.1016/S0380 - 1330(77)7226 1 - 0 El ith, J., Leathw ick, J. R., & Hastie , T. (2008). A working guide to boosted regression trees. Journal of Animal Ecology , 77 (4), 802 813. https://doi.org/10.1111/j.1365 - 2656.2008.01390.x Environment and Climate Change Canada (2016). Great Lakes Precipi tation Network. https ://doi.org/10.17616/ R3J349 ESRI. (2020). ArcGIS Pro Version 2.5. ESRI. (2020). What is Empirical Bayesian kriging? Retrieved March 5, 2020, from https://pro.arcgis.com/en/pro - app/help/analysis/geostatistical - analyst/what - is - empirical - b ayesia n - krig ing - .htm Farrand, W., & Bell, D. (1982). 1982 Quaternary Geology. Dept. of Geological Sciences, Univeristy of Michigan. Geologic Survey Division, MDEQ. Division Geographic Information Services Unit, Resource Mapping and Aerial Photography, MDNR . Retr ieved from http ://www.dnr.state.mi. us/spatialdatalibrary/metadata/quaternary_geology.htm Fitzsimmons, E. G. (2014, August 3). Tap Water Ban for Toledo Residents. The New York Times . Foster, N. W. (1974). Annual Macroelement Transfer From Pinus Banksi ana La mb. Fo rest to S oil. Canadian Journa l of Forest Research , 4 (4), 470 476. https://doi.org/10.1139/x74 - 069 Friedman, J., Hastie, T., & Tibshirani, R. (2000). Additive logistic regression: a statistical view of boosting (With discussion and a rejoinder b y the author s). The A nnals of Statistics , 28 (2), 337 407. https://doi.org/10.1214/aos/1016218223 GCSAA. (2009). Golf course environment profile: volume III summary nutrient use and management on U.S. golf courses . Gentry, L. E., David, M. B., Smith, K. M., & Kov acic, D. A. (19 98). Nitrogen cyclin g and tile drainage nitrate loss in a corn/soybean watershed. Agriculture, Ecosystems and Environment , 68 (1 2), 85 97. https://doi.org/10.1016/S0167 - 8809(97)00139 - 4 Goolsby, D. A., Battaglin, W. A., Lawrence, G. B., Artz, R. S. , Aulenba ch, B. T., Hooper, R . P., - Atchafalaya River Basin: Topic 3 Report for the Integrated Assessment of Hypoxia in the Gulf of Mexico. Silver Spring, MD: NOAA Coast al Oce an Pro gram. Goy ette, J., Bennett, E . M., Howarth, R. W., & Maranger, R. (2016). Changes in anthropogenic nirogen and phosphorus inputs to the St. Lawrence sub - basin over 110 years and impacts on 135 riverine export. Global Biogeochemical Cycles , 1 15. ht tps:// doi.or g/10.1002 /2016GB005384.Receiv ed Gronberg, J. A. M., & Spahr, N. E. (2012). County - Level Estimates of Nitrogen and Phosphorus from Commercial Fertilizer for the Conterminous United States, 1987 - 2006 . Reston, VA. Retrieved from https://pubs.usgs. gov/si r/2012 /5207/pdf /sir20125207.pdf Ham lin, Q. F., Kendall, A. D., Martin, S. L., Whitenack, H. D., Roush, J. A., Hannah, B. A., & Hyndman, D. W. (2020). Quantifying Landscape Nutrient Inputs With Spatially Explicit Nutrient Source Estimate Maps. Journal of Ge ophysi cal Resea rch: Biogeosciences , 125 (2), 1 24. https://doi.org/10.1029/2019JG005134 Hamlin, Q. F., Kendall, A. D., Martin, S. L., Whitenack, H. D., Roush, J. A., Hannah, B. A., & Hyndman, D. W. (2020). SENSEmap - USGLB: Nitrogen and Phosphorus Input s. Hyd roshar e . https: //doi.org/https://do i.org/10.4211/hs.1a116e5460e24177999c7bd6f8292421 Han, H., & Allan, J. D. (2008). Estimation of nitrogen inputs to catchments: Comparison of methods and consequences for riverine export prediction. Biogeochemistry , 91 (2 3 ), 177 199. htt ps://doi.org/10.1007 /s10533 - 008 - 9279 - 3 Hargan, K. E., Paterson, A. M., & Dillon, P. J. (2011). A total phosphorus budget for the Lake of the Woods and the Rainy River catchment. Journal of Great Lakes Research . https://doi.org/10.1016/ j.jglr .2011. 09.001 Ho lman, I. P., Whelan, M. J., Howden, N. J. K., Bellamy, P. H., Willby, N. J., Rivas - Casado, M., & McConvey, P. (2008). Phosphorus in groundwater - an overlooked contributor to eutrophication? Hydrological Processes , 22 , 5121 5127. https ://doi .org/10 .1002/hy p.7198 Homer, C. G., Completion of the 2011 National Land Cover Database for the conterminous United States - Representing a decade of land cover change info rmatio n. Phot ogrammet ric Engineering and Remote Sensing , 81 (5), 345 354. https://doi.org/10.14358/PERS.81.5.345 Hong, B., Swaney, D. P., & Howarth, R. W. (2013). Estimating net anthropogenic nitrogen inputs to U.S. watersheds: Comparison of methodologies. Enviro nmental Science and Technology . htt ps://doi.org/10.1021/es303437c Howarth, R. W., Billen, G., Swaney, D., Townsend, A., Jaworski, N., - Liang , Z. (1996). Regional nitrogen budgets and riverine N & P fluxes for the drainages to the No rth At lantic Ocean: N atural and human inf luences. Biogeochemistry , 35 , 75 139. Howarth, R. W. (2008). Coastal nitrogen pollution: A revie w of sources and trends globally and regionally. Harmful Algae , 8 (1), 14 20. https://doi.org/10.1016/j.hal.2008.08.015 136 Howart h, R., Swaney, D., Billen, G., Garn Nitrogen fluxes from the landscape are cont rolled by net anthropogenic nitrogen inputs and by climate. Frontiers in Ecology and the Environment . https://doi.org/1 0.1890 /100178 Hussain , I., Pilz, J., & Sp oeck, G. (2010). Hierarchical Bayesian space - time interpolation versus spatio - temporal BME appro ach. Advances in Geosciences , 25 (March), 97 102. https://doi.org/10.5194/adgeo - 25 - 97 - 2010 Hyndman, D. W., Kendall, A. D ., & W elty, N. R. H. (2007). Evaluating t emporal and spatial variations in recharge and streamflow using the integrated landscape hydrolo gy model (ILHM). Geophysical Monograph Series , 171 , 121 141. https://doi.org/10.1029/171GM11 Jain, A. K. (2010). Data c luster ing: 50 years b eyond K - means. Patte rn Recognition Letters , 31 (8), 651 666. https://doi.org/10.1016/j.patrec.2009.09.011 Kaiser, D. E., Lamb, J. A., & Eliason, R. (2011). Fertilizer guidelines for agronomic crops in Minnesota. University of Minnesota Extens ion , 1 44. Retr ieved from www.exten sion.umn.edu/nutrient - management/%0ABU - 06240 - S Keeler, B. L., Isbell, F., Hill, J. D., Tessum, C . W., Marshall, J. D., Gourevitch, J. D., & Polasky, S. (2016). The social costs of nitrogen. Science Advances , (Octobe r). ht tps://doi.org/1 0.1126/sciadv.160021 9 Kellogg, R. L., Lander, C. H., Moffitt, D. C., & Gollehon, N. (2000). Manure nutrients relativ e to the capacity of cropland and pastureland to assimilate nutrients: spatial and temporal trends for the United State s . U .S . Department of Agriculture: Natura l Resources Conserv e Service. Kendall, A. (2009). Predicting the impacts of land use and climate on regional - scale hydrologic fluxes. Michigan State University. Koe lliker, Y., Totten, L. A., Gigliotti, C. L., Offenb erg, J . H., Reinfelde r, J. R., Zhuang, Y. , & Eisenreich, S. J. (2004). Atmospheric wet deposition of total phosphorus in New Jersey. Water, Air, and Soil Pollution , 154 (1 4), 139 150. https://doi.org/10.10 23/B:WATE.0000022952.12577.c5 Kothawala, D. N., Wat mough, S. A., Futter, M. N., Zhang, L., & Dillon, P. J. (2011). Stream Nitrate Responds Rapidly to Decreasing Nitrate Deposition. Ecosystems , 14 (2), 274 286. https://doi.org/10.1007/s10021 - 011 - 9422 - 1 Lark, T. J., Mueller, R. M., Johnson, D. M., & Gibbs, H. K. (2 017). Measuring land - use and land - c and recommendations. International Journal of Applied Earth Observation and Geo information , 62 (June), 224 235. https://doi.org/10. 1016/j .jag.2017.06.00 7 137 Linsey, C. A., Sch indler, D. W., & Stainton, M. P. (1987). Atmospheric Deposition of Nutrients and Major Ions at the Experimental Lakes Area in Northwestern Ontario, 1970 to 1982. Can. J. Fish. Aquat. Sci. , 44 , 206 214. Long, C. M., Muenic h, R. L., Kalci c, M. M., & Scavia, D. (2018). Use of manure nutrients from concentrated animal feeding operations. Journal of Great Lakes Research , 44 (2), 245 252. https://doi.org/10.1016/j.jglr.20 18.01.006 Lorimor, J., Powers, W., & Sutton, A. (2008 ). Man ure Characteris tics. Mwps - 18 , 1 24. Luscz, E. C., Kendall, A. D., & Hyndman, D. W. (2017). A spatially explicit statistical model to quantify nutrient sources, pathways, and delivery at the regiona l scale. Biogeochemistry , 133 (1), 37 57. https://doi. org/10 .1007/s10533 - 01 7 - 0305 - 1 Luscz, E. C ., Kendall, A. D., & Hyndman, D. W. (2015). High resolution spatially explicit nutrient source models for the Lower Peninsula of Michigan. Journal of Great Lakes Research , 41 , 618 629. https://doi.org/10.1016/j.jglr .2015. 02.004 Mahowald , N., Jickells, T. D ., Baker, A. R., Artaxo, P., Benitez - Nelson, C. R., Bergametti, G., concentrations an d deposition rates, and anthropogenic impacts. Global Bioge ochemical Cycle s , 22 (4), 1 19. http s://doi.org/10.1029/2008GB003240 Martin, S. L., Hayes, D. B., Kendall, A. D., & Hyndman, D. W. (2017). The land - use legacy effect: Towards a mechanistic understan ding of time - lagged water quality responses to land u se/cov er. Science of the Total Environmen t , 579 , 1794 1803. https://doi.org/10.1016/j.scitotenv.2016.11.158 Martin, S. L., Hayes, D. B., Rutledge, D. T., & Hyndman, D. W. (2011). The land - use legacy effe ct: Adding temporal context to lake chemistry. Limnol ogy an d Oceanography , 56 (6), 2362 2370. h ttps://doi.org/10.4319/lo.2011.56.6.2362 Meals, D. W., Dressing, S. A., & Davenport, T. E. (2010). Lag Time in Water Quality Response to Best Management Practices : A Review. Journal of Environment Quality , 39 (1), 85 . http s://doi.org/10. 2134/jeq2009.0108 Me isinger, J. J., & Randall, G. W. (1991). Estimating Nitrogen Budgets for Soil - Crop Systems. In R. F. Follett, D. R. Keeney, & R. M. Cruse (Eds.), Managing nitroge n for groundwater quality and farm profitability (pp. 85 12 4). Madison, WI : Soil Science Socie ty of America. https://doi.org/10.2136/1991.managingnitrogen.c5 Messier, K. P., Wheeler, D. C., Flory, A. R., Jones, R. R., Patel, D., Nolan, B. T., & Ward, M. H. (2019). Modeling groundwater nitrate exposure in pri vate w ells of North C arolina for the Agri cultural Health Study. Science of the Total Environment , 655 , 512 519. https://doi.org/10.1016/j.scitotenv.2018.11.022 138 Michalak, A. M., Anderson, E. J., Beletsky, Za gorski , M. A. (2013). Record - setting alga l bloom in Lake Erie caused by agricultural and meteorological trends consistent with expected future conditions. Proceedings of the National Academy of Sciences , 201216006 . https://doi.org/10.1073/pnas.1216006110 M ichiga n Waterwells fo r WELLOGIC dataset. (2019). Lansing, MI: Michigan State University Remote Sensing and GIS Research and Outreach Services. Retrieved November 1, 2019 from http://gis - michigan.opendata.arcgis.com/datasets/water - wells - southwest - michigan M onks, P. S., Granier, C., Fuzzi, S., Stoh (2009). Atmospheric composition change - global and regional air quality. Atmospheric Environment , 43 (33), 5 268 5350. https://doi.org/10.1016/j.atmosenv.2009.08. 021 Mo tevalli, A., Na ghibi, S. A., Hashem i, H., Berndtsson, R., Pradhan, B., & Gholami, V. (2019). Inverse method using boosted regression tree and k - nearest neighbor to quantify effects of point and non - point source nitrate pollution in groundwater. Journ al of Cleaner Product ion , 228 , 1248 1263. https://doi.org/10.1016/j.jclepro.2019.04.293 Munger, J. W. (1982). Chemistry of atmospheric precipitation in the north - central united states: Influence of sulfate, nitrate, ammonia and calcareous soil particulates . Atmo spheric Environ ment (1967) , 16 (7), 1633 1645. https://doi.org/10.1016/0004 - 6981(82 )90258 - X Murphy, T. J. (1974). Sources of phosphorus inputs from the atmosphere and their significance to oligotrophic lakes . WRC Research Report . Retrieved from http:/ /web.e xtension.illino is.edu/iwrc/pdf/92.p df Nolan, B. T., Fienen, M. N., & Lorenz, D. L. (2015). A statistical learning framework for groundwater nitrate models of the Central Valley, California, USA. Journal of Hydrology , 531 , 902 911. https://doi.org/10. 1016/j .jhydrol.2015.1 0.025 Paerl, H. W., & Otten, T. G. (2013). Harmful Cyanobacterial Blooms: Causes, Consequences, and Controls . Microbial Ecology , 65 (4), 995 1010. https://doi.org/10.1007/s00248 - 012 - 0159 - y Pearson, F. J., & Fisher, D. W. (1983). Chemica l Comp osition of Atmo spheric Precipitatio n in the Northeastern United States . Retrieved from https://pubs.usgs.gov/wsp/1535p/repo rt.pdf E. (2011). Scikit - learn: M achine Learning in Py thon. Journal of Mac hine Learning Research , 12 , 2825 2830. PRISM (2012). PRISM Climate Group, Oregon State University. http://www.prism.oregonstate.edu/ Preston, S. D., Alexander, R. B., Schwarz, G. E., & Crawford, C. G. (2011). Factor s Affe cting Stream Nu trient Loads: A Synt hesis of Regional SPARROW Model Results for the 139 Continental United States. Journal of the American Water Resources Association , 47 (5), 891 915. https://doi.org/10.1111/j.1752 - 1688.20 11.00577.x Quinlan, J. (1996). Ba gging, Boosting, and C4.5. Proceedings Th irteenth American Association for Artificial Intelligence National Conference on Artificial Intelligence , 725 730. Ransom, K. M., Nolan, B. T., A. Traum, J., Faunt, C. C., Bell, A. M Harter, T. (20 17). A hybrid m achine learning mode l to predict and visualize nitrate concentration throughout the Central Valley aquifer, California, USA. Science of the Total Environment , 601 602 , 1160 1172. https://doi.org/10.1016 /j.scitotenv.2017.05.192 Ray, D., Pijan owski, B. C., K endall, A. D., & Hyn dman, D. W. (2012). Coupling land use and groundwater models to map land use legacies: Assessment of model uncertainties relevant to land use planning. Applied Geography , 34 , 356 370 . https://doi.org/10.1016/j.apgeo g.2012 .01.002 Roberts on, D. M., & Saad, D . A. (2011). Nutrient Inputs to the Laurentian Great Lakes by Source and Watersh ed Estimted Using SPARROW Watershed Models. Journal of the American Water Resources Association , 47 (5), 1011 1033. https://doi.org/10.1 111/j. 1752 - 1688.2011. 00574.x Robertson, D . M., Rose, W. J., & Fitzpatrick, F. A. (2009). Water Quality and Hydrology of S ilver Lake, Barron County, Wisconsin, With Special Emphasis on Responses of a Terminal Lake to Changes in Phosphorus Loading and Water Level, 50. Rosen, C. J., Horgan, B. P., & Mugaas, R. J. (2015). Fertilizing lawns. Retrieved May 5, 2015, from http://www .extension.umn.edu:80/garden/yard - garden/lawns/fertilizing - lawns/ Ruddy, B. C., Lorenz, D. L., & Mueller, D. K. (2006). County - Level Es timate s of Nutrient I nputs to the Land Su rface of the Conterminous United States , 1982 2001. Reston, VA: U.S. Geologic al Survey. https://doi.org/ 10.3133/sir20065012 Ruggles, S., Flood, S., Goeken, R., Grover, J., Meyer, E., Paca, J., & Sobek, M. (2019). IPUMS USA: Version 9. 0: Households . Minne apolis, MN: IPUMS. https://doi.org/https://doi.org/10.18128/D010.V9.0 Schaap, M. G., Leij, F. J., & Van Genuchten, M. T. (2001). Rosetta: A computer program for estimating soil hydraulic parameters wit hierarchical pedotr ansfer function s. Journal of Hydrol ogy , 25 (3 4), 163 176. https://doi.org/10.1061/(ASCE)IR.1943 - 4774.0000007 Scheider, W. A., Snyder, W. R., & Clark, B. (1979). Deposition of nutrients and major ions by precipitation in south - central Ontario. Water, Air, a nd Soil Polluti on . https://doi.org/ 10.1007/BF01047121 Schindler, D. W., Armstrong, F. A. J., Holmgren, S. K., & Brunskill, G. J. (1971). Eutrophication of Lake 227, Experimental Lakes Area, Northwestern Ontario, by Addition 140 of Phosphate and Nitrate. Journa l of the Fisher ies Research Board o f Canada , 28 (11), 1 763 1782. https://doi.org/10.1139/f71 - 261 Schindler, D. W., Kling, H., Schmidt, R. V, Prokopowich, J., Frost, V. E., Reid, R. A., & Capel, M. (1973). Eutrophication of Lake 227 by Addition of Phos phate and Nitrate: th e Second, Third, and Fourth Years of En richment, 1970, 1971, and 1972. Journal of the Fisheries Research Board of Canada , 30 (10), 1415 1440. https://doi.org/10.1139/f73 - 233 Schindler, D. W., Newbury, R. W., Beaty, K. G., & Campbell, P. (1976 ). Natural Wate r and Chemical Budge ts for a Small Prec ambrian Lake Basin in Central Canada. Journal of Fisheries Research Board of Canada , 33 (11), 2526 2543. https://doi.org/10.1139/f76 - 297 Schindler, D. W., & Nighswander, J. E. (1970). Nutrient supp ly and primary produc tion in clear Lake, Eastern Ontario. Jo urnal of the Fisheries Research Board of Canada , 27 (11), 2009 Schullehner, J., Hansen, B., Thygesen, M., Pedersen, C. B., & Sigsgaard, T. (201 8). Ni trate in drinki ng water and colorec tal cancer risk: A nationwide population - based cohort study. International Journal of Cancer , 143 (1), 73 79. https://doi.org/10.1002/ijc.31306 Schwede, D. B., & Lear, G. G. (2014). A novel hybrid approach for estima ting t otal deposition in the United State s. Atmospheric Envi ronment , 92 , 207 220. https://doi.org/10.1016/j.atmosenv.2014.04.008 Scown, M. W., McManus, M. G., Carson, J. H., & Nietch, C. T. (2017). Improving Predictive Models of In - Stream Phosphorus Concen tratio n Based on Nati onally - Available Spa tial Data Coverages . Journal of the American Water Resources Association , 53 (4), 944 960. https://doi.org/10.1111/1752 - 1688.12543 Paole tti, E. (2016). Global topics and n ovel approaches in the study of air pollution, climate change and forest ecosystems. Environmental Pollution , 213 , 977 987. https://doi.org/10.1016/j.envpol.2016.01.075 Smith, R. A., Schwarz, G. E., & Alexander, R. B. (19 97). Regional i nterpretation of wat er - quality monitoring data. Water Resources Research , 33 (12), 2781 2798. Smith, V. H., Tilman, G. D., & Nekola, J. C. (1999). Eutrophication: impacts of excess nutrient inputs on freshwater, marine, an d terrestrial ecosys tems. Environme ntal Pollution , 100 , 179 196. https://doi.org/10.1016/S0269 - 7491(99)00091 - 3 Soller, D., & Garrity, C. (2018). Quaternary sediment thickness and bedrock topography of the glaciated United States east of the Rocky Mountains . U.S Geologic al Sur vey Scientific Investigations Map 3 392. https://doi.org/10.3133/sim3392 141 Sterner, R. (2011). C:N:P stoichiometry in Lake Superior: freshwater sea as end member. Inland Waters , 1 (1), 29 46. https://doi.org/10.5268/IW - 1.1.365 Stevenson, F. J. (1994). Hu mus Ch emistry: Genesi s, Composition, Reac tions . John Wiley & Sons, Inc. Swaney, D. P., Howarth, R. W., & Hong, B. (2018). Nitrogen use efficiency and crop production: Patterns of regional variation in the United States, 1987 2012. Science of the Total Envi ronmen t , 635 , 498 511 . https://doi.org/10 .1016/j.scitotenv.2018.04.027 Swank, W. T., & Henderson, G. S. (1976). Atmospheric Input of Some Cations and Anions to Forest Ecosystems in North - Carolina and Tennessee. Water Resources Research , 12 (3) , 541 546. Tem kin, A ., Evans, S., M anidis, T., Campbell , C., & Naidenko, O. V. (2019). Exposure - based assessment and economic valuation of adverse birth outcomes and cancer risk due to nitrate in United States drinking water. Environmental Research , (Decem ber 2018), 108 442. h ttps://doi.org/ 10.1016/j.envres.201 9.04.009 Atmospheric deposition of phosphorus to land and freshwater. Environmental Sciences: Processes and Impacts , 1 6 (7), 1608 1617. http s://doi.org/10.1039/ c3em00641g Tuomisto, H. L., Hodge, I. D., Riordan, P., & Macdonald, D. W. (2012). Does organic farming reduce environmental impacts? - A meta - analysis of European research. Journal of Environmental Man agement , 112 (8 34), 3 09 320. https:/ /doi.org/10.1016/j.j envman.2012.08.018 U S Clean Water Act (1972). 33 U.S.C. §§1251 1387 . US Census Bureau. (2010). US 2010 Census. US Census Bureau . (2020). U.S. Census Michigan QuickFacts. Retrieved Mar ch 6, 2020, from https://www.cen sus.go v/quickfacts/MI US Department of Ag riculture National Agricultural Statistics Service. (2012). Census of Agriculture. US Department of Agriculture National Agricultural Statistics Service. ( 2015 ). Croplan d Data Layer 2008 - 2015. Washington, D.C.: USD A - NASS . Retrieved fro m https://nassgeodat a.gmu.edu/CropScape/ U S Department of Agriculture Natura l Resources Con servation Service ( NRCS ). (2008). Agricultural Waste Characteristics. 142 US Department of Agriculture Natural Resources Con servation Service (NRCS ). (20 17). Gridded So il Survey Geographic (gSSURGO) Database: User Guide, version 2.2. US Environmental Protection Agency . (2010). Clean Air Status and Trends Network (CASTNET). Retrieved from https://www.epa.gov/castnet US Environmental Protection Agency . (2012 ). Clean Waters hed Needs Survey (CW NS) - 2012 Report and Data. Retrieved from https://www.epa.gov/cwns US Environmental Protection Agency . (2012). Clean Watershed Needs Survey (CWNS) - 2012 Report and Data. Retrieved from https://www.epa.gov/cw ns US Enviro nmental Protect ion Agency . (2017). Discharge Monitoring Report (DMR) Pollutant Loading Tools. Retrieved July 10, 2017, from https://cfpub.epa.gov/dmr/ US Environmental Protection Agency . (1991). Guidance for Water Quality - based Decisions: The TMDL Pr ocess, 440/4 - 91 0 . US Environmental Prote ction Agency . (2002). Onsite wastewater treatment s ystems manual . US Geological Survey, US Department of Agriculture, N atural Resources Conservation Service. (2013). Federal standards and procedures for the National Water shed Boundary D ataset (WBD). Techni ques and Methods 11 - A3 . USGS. https://doi.org/10.3133/tm11A34 Van Meter, K. J., Basu, N. B., & Van Cappellen, P. (2017). Two centuries of nitrogen dynamics: Legacy sources and sinks in the Mississippi and Susquehann a Rive r Basins. Globa l Biogeochemical Cyc les , 31 (1), 2 23. https://doi.org/10.1002/2016GB005498 Van Meter, K. J., & Basu , N. B. (2015). Catchment legacies and time lags: A parsimonious watershed model to predict the effects of legacy storage on nitrogen ex port. PLoS ONE , 10 (5) , 1 22. https://doi. org/10.1371/journal.pone.0125971 Verhougstraete, M. P., Martin, S. L., Kendall, A. D., Hyndman, D. W., & Rose, J. B. (2015). Linking fecal bacteria in rivers to landscape, geochemical, and hydrologic factors and sou rces a t the basin sca le. Proceedings of t he National Academy of Sciences , 112 (33), 10419 10424. https://doi.org/10.1073/ pnas.1415836112 Vero, S. E., Basu, N. B., Van Meter, K., Richards, K. G., Mellander, P. - E., Healy, M. G., & Fenton, O. (2017). Review: t he env ironmental stat us and implications of the nitrate time lag in Europe and North America. Hydrogeology Journal , 7 22 . https://doi.org/10.1007/s10040 - 017 - 1650 - 9 Ti lman, D. G. (19 97). Human alteratio n of the global nitrogen cycle: Sources and consequences. Ecological Applicatio ns . https://doi.org/10.1890/1051 - 0761(1997)0 07[0737:HAOTGN]2.0.CO;2 143 Walton, G. (1951). Survey of literature relating to infant methemoglobinem ia due to nitra te - contaminated wate r. American Journal of Public Health . https://doi.org/10.2105/ajph.41.8_pt_1.98 6 Wan, L., Kendall, A., Martin, S., Hamlin, Q., & Hyndman, D. (n.d.). Quantifying Nutrient Loads to the Great Lakes Cosatline with a Spatially Explicit Nutrient Transport Model. Ward, M. H., deKok, T. M., Levallois, P., Brender, J., Gulis, G., Nolan, B. T. , & VanDerslice, J. (2005). Workgroup report : Drinking - water nitrate and health - Recent findings and research needs. Environmental Health Perspectives , 113 (11), 1607 1614. http s://doi.org/10.1289/ehp.8043 Ward, M. H., Jones, R. R., Brender, J. D., de Kok, Br eda, S. G. (2018). Drinking water nitrate and human health: An updated review. International Journal of Environmental Research and Public Health , 15 (7), 1 31. https://doi.org/10.3390/ijerph15071557 Warncke, D. D., & Dahl, J. (2003). Nutrient Recommendation s for Vegetable Crops Grown in Michigan . Retrieved from https://msu.edu/~warnck e/Nutrient Management Info 3.1 Nutrient Recommendations for Vegetable Crops Grown in Michigan The Structure.pdf Warncke, D., & Dahl, J. (2004). Nutrient Recommendations for Fiel d Crops in Michigan (E2904). Management . Michigan State University Extension. R etrieved from https://msu.edu/~warncke/Nutrient Management Info 3.1 Nutrient Recommendations for Vegetable Crops Grown in Michigan The Structure.pdf Warncke, D., Dahl, J., & Jac obs, L. (2004). Nutrient Recommendation s for Field Crops in Michigan. Extension Bulletin , E2904 (May). Wheeler, D. C., Nolan, B. T., Flory, A. R., DellaValle, C. T., & Ward, M. H. (2015). Modeling groundwater nitrate concentrations in private wells in Iowa. Science of the Total Environment , 536 , 481 488. https://doi.org/10.1016/j.scit otenv.2015.07.080 Wieczorek, M. E., & Lamotte, A. E. (2011). Attributes for MRB_E2RF1 Catchments by Major River Basins in the Conterminous United Stages (U.S. Geological Survey Digital Data Series DS - 491). Retrieved from http://water.usgs.gov/nawqa/modelin g/rf1attributes.html Winter, J. G., Catherine Eimers, M., Dillon, P. J., Scott, L. D., Scheider, W. A., & Willox, C. C. (2007). Phosphorus Inputs to Lake Simcoe from 1990 to 200 3: Declines in Tributary Loads and Obse rvations on Lake Water Quality. Journal of Great Lakes Research , 33 (1), 46 61. https://doi.org/10.3394/0380 - 1330(2007)33 Winter, J. G., Dillon, P. J., Futter, M. N., Nicholls, K. H., Scheider, W. A., & Scott, L. D. (2 002). Total phosphorus budgets and nitr ogen loads: Lake Simcoe, Ontario (1990 t o 1998). Journal of Great Lakes Research , 28 (3), 301 314. https://doi.org/10.1016/S0380 - 1330(02)70586 - 8 144 Withers, P. J. A., Neal, C., Jarvie, H. P., & Doody, D. G. (2014). Agricu lture and eutrophication: Where do we g o from here? Sustainability (Switzerland ) , 6 (9), 5853 5875. https://doi.org/10.3390/su6095853 Wright, R. F. (1976). The Impact of Forest Fire on the Nutrient Influxes to Small Lakes in Northeastern Minnesota. Ecology , 57 (4), 649 663. https://doi.org/10.230 7/1936180 Yang, X., Miller, D. R., Xu, X ., Yang, L. H., Chen, H. M., & Nikolaidis, N. P. (1996). Spatial and temporal variations of atmospheric deposition in interior and coastal Connecticut. Atmospheric Environment , 30 (22), 3801 3810. https://doi.org/10.1 016/1352 - 2310(96)00094 - 5