THE IMPORTANCE OF THE FRESHWATER LANDSCAPE, CONNECTIVITY, AND REGIONAL PROCESSES FOR UNDERSTANDING SPATIAL PATTERNS AND DRIVERS OF LAKE, STREAM, AND WETLAND PROPERTIES AT MACROSCALES
By Katelyn Beth Shank King
A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Fisheries and Wildlife – Doctor of Philosophy
2021
ABSTRACT
Freshwater ecosystems are a good model for studying questions associated with environmental change because their nutrients and biota reflect changes in their surrounding watershed. Although many studies of fresh waters focus on understanding how the terrestrial landscape affects lake, stream, and wetland properties, it is becoming more widely recognized that we must also consider the freshwater landscape, including connections (or lack thereof) among surface waters, to understand and manage freshwater ecosystems. However, freshwater types continue to be studied individually (e.g., lakes or streams or wetlands) or at relatively local scales (e.g., an individual lake to tens of lakes within a watershed), whereas environmental changes such as land use intensification, climate change, and the spread of non-native species affect all freshwater types and often occur at broad spatial scales such as regions, continents, and the globe. Understanding patterns of freshwater properties at broad extents can be complicated by the influence of drivers operating at different spatial scales (i.e., cross-scale interactions).
An emergent specialty in ecology, macrosystems ecology, provides a framework for broad-extent, multi-scale, and cross-freshwater type studies, improving predictions and contributing to the understanding of freshwater ecosystem responses to change at the regional to continental extents relevant for management and policy. In addition, the recent rise of open science perspectives and advancements in computational tools make it possible to collate data and perform analyses at the macroscale. My dissertation uses this theoretical foundation and publicly available databases to understand macroscale patterns in nutrients and fish biodiversity across freshwater types and the processes that may underlie those patterns. In my first chapter, I compiled total phosphorus, total nitrogen, chlorophyll a, and percent macrophyte cover data from over 3,500 lakes, streams, and wetlands sampled by the Environmental Protection Agency’s National Aquatic Resource Surveys across the continental U.S. This research led to the understanding that these different freshwater types may share similar patterns and drivers of nutrients across the U.S., but different patterns and drivers for biotic properties. For my second chapter, I further investigated biotic properties across lakes and streams, specifically focusing on fish biodiversity patterns and connections between lakes and streams. I used fish data from 559 lakes and 854 streams from the midwestern/northeastern U.S. and found that discrete connectivity classes helped explain variation in fish species composition and richness across lakes and streams. My third chapter is a data paper that describes methods for creating a database (LAGOS-NETWORKS) that includes a suite of surface connectivity metrics for 86,511 lakes and 898 networks in the U.S. This is the first database to provide accessible and comprehensive lake network metrics at the national scale.
LAGOS-NETWORKS was used in my final chapter, where I used these continuous connectivity metrics and other multi-scale drivers to investigate how the effects of connectivity on fish species richness change with regional-scale land use. I found that connectivity had different effects on lake and stream fish species richness depending on regional-scale agricultural land use, showing a cross-scale interaction (CSI), and that the effect of this CSI differed by freshwater type. Collectively, my dissertation uses multi-scale, cross-scale, and integrated freshwater landscape approaches to further understand patterns and processes in aquatic ecosystems at the macroscale.
ACKNOWLEDGMENTS
I would first like to thank my advisor, Dr. Kendra Spence Cheruvelil, who has been a wonderful mentor. She allowed me the flexibility to pursue my own interests in research and in professional development. Because of her support and continued motivation, my PhD experience was a positive one. In addition, I’d like to thank other members of my lab, Dr. Patricia Soranno, Dr. Patrick Hanley, Dr. Ian McCullough, Autumn Poisson, Dr. Nick Skaff, Nicole Smith, and Dr. Joe Stachelek, who consistently reviewed my work, helped with coding, and listened to many practice presentations. I would also like to recognize my committee members Dr. Dana Infante and Dr. Gary Roloff for their guidance throughout the dissertation process. Finally, I would like to thank Joel King and the rest of my family for continually believing in me and cheering me on. “Full Stream Ahead”
PREFACE
The chapters in this dissertation were written as separate papers. Each paper was written with me as lead author and several collaborators as co-authors. Two of my chapters have been published in academic peer-reviewed journals, the third chapter has undergone one round of peer review by a journal, and the fourth chapter is anticipated for submission. The citation for each chapter is as follows:
• King, K., Cheruvelil, K.
S., and Pollard, A. 2019. Drivers and spatial structure of abiotic and biotic properties of lakes, wetlands, and streams at the national scale. Ecological Applications 29: e01957. https://doi.org/10.1002/eap.1957
• King, K.B.S., Bremigan, M.T., Infante, D., and Cheruvelil, K.S. 2021. Surface water connectivity affects lake and stream fish species richness and composition. Canadian Journal of Fisheries and Aquatic Sciences. http://dx.doi.org/10.1139/cjfas-2020-0090
• King, K., Wang, Q., Rodriguez, L.K., and Cheruvelil, K.S. In revision. Lake networks for the conterminous U.S. (LAGOS-US NETWORKS). Limnology and Oceanography Letters. Submitted for peer review December 2020.
• King, K.B.S., Wagner, T., and Cheruvelil, K.S. In prep. Regional differences in connectivity on fish species richness in lakes and streams at the macroscale. To be submitted.
Each of the aforementioned dissertation chapters follows the principles of ‘open science’. Thus, I published my data and code for reproducibility and to facilitate future research:
• King, K. 2020. Code and data for lake and stream fish species richness and composition. Zenodo. http://doi.org/10.5281/zenodo.4266961
• King, K.B.S. 2019. Lakes, wetlands, and streams at the national scale. Zenodo. http://doi.org/10.5281/zenodo.3246537
• King, K. 2018. Lake, wetland, and stream biotic and abiotic properties from the National Aquatic Resource Surveys. Knowledge Network for Biocomplexity. https://dx.doi.org/10.5063/F13J3B5D
In addition to my dissertation, I co-authored several papers as a member of the Continental Limnology project (https://lagoslakes.org/projects/continental-limnology/). As a member of this team of ~25 people in the fields of limnology, computer science, ecoinformatics, and statistics, I helped to collate a database that provides open access to water quality and landscape data for lakes across the continental U.S.
in order to increase ecological understanding of drivers and patterns of inland lake chemical and biological properties at the macroscale. The published papers I co-authored while on this project include:
• Soranno, P., Cheruvelil, K., Liu, B., Wang, Qi., Tan, PN., Zhou, J., King, K., McCullough, I., Stachelek, J., Bartley, M., Filstrup, C., Hanks, E., Lapierre, JF., Lottig, N., Schliep, E., Wagner, T., and Webster, K. 2020. Ecological prediction at macroscales using big data: Does sampling design matter? Ecological Applications. https://doi.org/10.1002/eap.2123
• Filstrup, C., King, K., and McCullough, I. 2019. Evenness effects mask richness effects on ecosystem functioning at macro-scales in lakes. Ecology Letters. https://doi.org/10.1111/ele.13407
• McCullough, I., King, K., Stachelek, J., Diez, J., Soranno, P., and Cheruvelil, K. 2019. Extending the patch-matrix model to fresh waters: A connectivity-based conservation framework for lakes. Landscape Ecology. https://doi.org/10.1007/s10980-019-00915-7
• Wagner, T., Lottig, N.R., Bartley, M.L., Hanks, E.M., Schliep, E.M., Wikle, N.B., King, K.B.S., McCullough, I., Stachelek, J., Cheruvelil, K.S., Filstrup, C.T., Lapierre, J.F., Liu, B., Soranno, P.A., Tan, PN., Wang, Q., Webster, K., and Zhou, J. 2019. Increasing accuracy of lake nutrient predictions in thousands of lakes by leveraging water clarity data. Limnology and Oceanography Letters. https://doi.org/10.1002/lol2.10134
• Soranno, P., Bacon LC, Beauchene M, … King, K.B.S.…and 76 others. 2017. LAGOS-NE: A multi-scaled geospatial and temporal database of lake ecological context and water quality for thousands of U.S. lakes. GigaScience.
https://doi.org/10.1093/gigascience/gix101
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
CHAPTER 1: DRIVERS AND SPATIAL STRUCTURE OF ABIOTIC AND BIOTIC PROPERTIES OF LAKES, WETLANDS, AND STREAMS AT THE NATIONAL SCALE
CHAPTER 2: SURFACE WATER CONNECTIVITY AFFECTS LAKE AND STREAM FISH SPECIES RICHNESS AND COMPOSITION
CHAPTER 3: LAKE NETWORKS FOR THE CONTERMINOUS U.S. (LAGOS-US NETWORKS)
3.1 Abstract
3.2 Introduction
3.3 Data Description
3.3.1 Overview of data sources
3.3.2 Overview of data tables and variables
3.3.3 Overview of data access
3.4 Methods
3.4.1 Creating lake connectivity networks
3.4.2 Linking dams to lake connectivity networks
3.4.3 Quantifying lake and network connectivity metrics
3.5 Technical Validation
3.5.1 Informational Flags
3.5.2 Validation and Quality Control/Quality Assurance
3.6 Data Use and Recommendations for Reuse
3.7 Comparison with existing datasets
3.8 Acknowledgments
CHAPTER 4: REGIONAL DIFFERENCES IN THE EFFECTS OF CONNECTIVITY ON FISH SPECIES RICHNESS IN LAKES AND STREAMS AT THE MACROSCALE
4.1 Abstract
4.2 Introduction
4.3 Methods
4.3.1 Study Extent and Data
4.3.2 Analysis
4.4 Results
4.4.1 Regional variation in fish species richness: unconditional model
4.4.2 Identifying multi-scale drivers of fish species richness across lakes and streams
4.4.3 Regional differences in average species richness in lakes and streams
4.4.4 Cross-scale interactions affect the connectivity-species richness relationship in lakes and streams
4.5 Discussion
4.5.1 Differences and similarities in drivers of fish species richness between lakes and streams
4.5.2 Differences in connectivity-fish species richness relationships across HU4 regions and freshwater types
4.5.3 Conclusions and Management Implications
4.6 Acknowledgements
APPENDIX
REFERENCES
LIST OF TABLES
Table 3.1. Description and occurrence of lake informational data flags in nets_networkmetrics_medres (number of lakes = 86,511)
Table 4.1. Local, watershed, and HU4-scale predictors used in models, as well as their minimum, median, mean, and maximum values across the five-state study extent. Iowa, Wisconsin, and Michigan = midwest and New Hampshire and Maine = northeast.
In the midwest, n=516 lakes and 458 streams and in the northeast n=70 lakes and 194 streams.
LIST OF FIGURES
Figure 3.1. Lake connectivity networks. Lakes (n=86,511) in the LAGOS-US NETWORKS module, colored according to their network membership (n=898 networks). NETWORKS includes lakes >1 ha in surface area that are connected to other lakes (i.e., no isolated lakes or lakes only connected to streams are included) in the conterminous U.S.
Figure 3.2. Dam locations. Dam points (n=49,525) in the LAGOS-US NETWORKS module overlaid on networks colored according to their network membership (n=898 networks).
Figure 3.3. The LAGOS-US NETWORKS schema. NETWORKS includes metadata in the form of a source table and a data dictionary and four data tables (nets_networkmetrics_medres, nets_binetworkdistance_medres, nets_uninetworkdistance_medres, and nets_flow_medres). The tables are connected to each other and other LAGOS-US modules via lagoslakeid, depicted with red text. The nets_networkmetrics_medres table also includes observation-level flags, depicted with blue text. The variables in black text included in the four data tables are representative examples. The census population of lakes is n=86,511; however, the flow table includes identification for all flowlines (n=2,665,206).
Figure 3.4. Map of downstream lakes and dams. Map depicting location of lakes in NETWORKS color-coded according to a) the distance to the nearest downstream lake (km), where gray circles mean the lake has no downstream lake and b) the number of downstream dams from each lake within its network.
Figure 3.5. Map of lake landscape position. Map depicting location of lakes in NETWORKS color-coded according to their landscape position measured as a) lake network number and b) lake order.
Figure 3.6. Network metrics. Network metric summaries of a) frequency distribution of the number of lakes in a network, b) frequency distribution of the number of lakes in networks with less than 100 lakes, c) boxplot of the average distance (km) between lakes in a network, d) boxplot of the average lake area (ha) in a network, and e) boxplot of the number of dams in the network. Note that for visualization purposes, the boxplots were truncated at the high end, resulting in the removal of 2, 10, and 25% of networks in panels c, d, and e, respectively.
Figure 3.7. Network creation. A bidirectional graph (a) and unidirectional graph (b). An example of a lake network (c) compared to its corresponding bidirectional graph (d) to illustrate how networks were created and how upstream or downstream distances were defined in NETWORKS. The distance between lake C and lake D includes traversing the network downstream and then upstream. The stream course distance is used as a weight in panel (d); thicker connecting lines depict greater distances. Panel (d) was made using the “igraph” package (Csardi and Nepusz 2006).
Figure 3.8. Lake network number and lake order. Example of part of a lake network with lake network number (LNN) and lake order (LO) metrics for each lake.
Figure 4.1. Study extent includes 32 HU4 regions and 2 macro-regions (midwest and northeast) in the U.S. Points represent lake (n=586) and stream (n=652) sample sites and black lines are HU4 sub-regions.
Figure 4.2. Estimated effects of local and regional scale predictors on lakes (a) and streams (b). Blue values represent important predictors of fish species richness when the 90% credible intervals do not overlap zero. Refer to Table 4.1 for a description of predictor variables and units.
Figure 4.3. The effect of proportion agriculture cover within a HU4 on the HU4-specific log-mean richness in a) lakes and b) streams. Points represent HU4-specific log-mean richness and lines from the values are 95% credible intervals. The gray shaded area is the 95% credible region for the fitted line. The 90% credible interval for the effect of agriculture in both regions and in both panels overlapped 0. The x-axis labels were back-transformed for clarity.
Figure 4.4. The effect of road density within a HU4 on the HU4-specific log-mean richness in a) lakes and b) streams. Points represent HU4-specific log-mean richness and lines from the values are 95% credible intervals. The gray shaded area is the 95% credible region for the fitted line. The 90% credible interval for the effect of agriculture in both regions and in both panels overlapped 0.
Figure 4.5. Cross-scale interaction estimates between each local connectivity-species richness relationship and HU4 agriculture (Ag) or road density (RoadDens) in midwest lakes (a), midwest streams (b), northeast lakes (c) or northeast streams (d). Blue values represent significant CSIs when the 90% credible intervals do not overlap zero. Refer to Table 4.1 for predictor descriptions and units.
Figure 4.6. Graph of the cross-scale interactions in the midwest between HU4-scale proportion agriculture cover and the slope of the relationship between distance to the nearest upstream lake and richness in lakes (a) and streams (b). The x-axis labels in panels a-b were back-transformed for clarity. Each point represents an estimated mean HU4 slope, which is displayed geographically for lakes (c) and streams (d). The lines represent 95% credible intervals for each HU4 value and the gray shaded area is the 95% credible region for the fitted line.
The 90% credible interval for the effect of HU4 agriculture on the relationship between the distance to upstream lake and species richness in streams did not overlap zero (b), whereas the 90% credible interval for lakes did overlap zero (a) (see text for details).
Figure 4.7. Correlation of watershed drivers
Figure 4.8. Correlation of connectivity metrics
Figure 4.9. Correlation matrix of HU4 drivers
Figure 4.10. Correlation matrix across HU4 and local scales
INTRODUCTION
Recent advances in the fields of macrosystems ecology (Heffernan et al. 2014) and metacommunity ecology (Leibold et al. 2004) have demonstrated the importance of studying phenomena at broad scales and identifying patterns and processes that structure biotic and abiotic properties of terrestrial and aquatic ecosystems. This is likely because environmental changes such as land use intensification, climate change, and the spread of non-native species can occur at broad spatial extents such as regions and continents. These frameworks have specific emphasis on the hierarchical structure of ecosystems, including drivers from different spatial scales (i.e., multi-scale drivers), interactions of drivers across spatial scales, and connections among and within terrestrial and aquatic ecosystems (Peters et al. 2008; Soranno et al. 2014; Heino et al. 2020). Although single-system or within-region studies provide mechanistic understanding of ecosystems, macroscale studies have provided insight into the broad-scale patterns and drivers that better reflect the scale of management (e.g., state and federal). For aquatic ecosystems, these emerging sub-disciplines have increased the number of studies focused on how the terrestrial landscape and the connections among fresh waters affect freshwater properties.
However, despite the calls for studies integrating freshwater types (Chaloner and Wotton 2011; Stendera et al. 2012; Kraemer 2020), there is still a tendency for ecologists to focus on a single, discrete freshwater type (i.e., lakes or streams or wetlands), leaving a gap in our understanding of how waterbodies may respond to human activity, especially at the macroscale. Research on abiotic and biotic properties among seemingly disparate freshwater types is needed to improve our knowledge and to form a basis for freshwater restoration and protection.
Further, fresh waters exchange materials and organisms across their boundaries. Surface water connections can be important for delivering resources, providing refuge, or supplying pathways for recolonizing communities in lakes and streams (Olden et al. 2001; Morelli et al. 2016). On the other hand, connections can distribute contaminants and invasive species (Fausch et al. 2009; Jackson and Pringle 2010). Given these contrasting roles, it remains unclear how the effects of connectivity vary across regions and whether connectivity has similar effects on stream and lake species. Finally, the interaction of drivers across scales (cross-scale interactions; CSIs) can influence how biota may respond to future global change (e.g., Scheffer et al. 2015; Wagner et al. 2016), but there are few examples of CSIs in freshwater systems, especially focused on fish biodiversity. More research on the connectivity of lake and stream ecosystems can aid in explaining variation across regions and across freshwater types to prevent future degradation of biodiversity.
My dissertation aims to fill the aforementioned knowledge gaps by using the macrosystems and metacommunity ecology frameworks to compare lake, stream, and wetland abiotic and biotic properties at regional to national extents.
Freshwater experiments and monitoring are rarely set up a priori to sample lakes, streams, and wetlands at the same time, and those that do so occur at fine spatial extents. Therefore, my dissertation research required extensive compilation of large datasets from disparate sources, which is another reason why cross-freshwater type studies are rare. In addition, I contributed to the practice of ‘open science’ by publishing these compiled datasets and code in order to support future research and facilitate reproducibility.
In my first chapter, I used both biotic and abiotic data from over 3,500 lakes, streams, and wetlands sampled by the Environmental Protection Agency’s National Aquatic Resource Surveys across the continental U.S. to investigate whether lake, wetland, and stream biotic and abiotic properties 1) respond to similar ecosystem and watershed drivers and 2) have similar spatial structure at the national scale. I found that the drivers of total phosphorus and total nitrogen did not differ across lakes, wetlands, or streams and showed similar patterns in concentration across the U.S. However, patterns in the biotic variables (chlorophyll-a and percent macrophyte cover) were dependent upon freshwater type. I also found that the top drivers of biotic and abiotic patterns had similar spatial structure as the response variables, showing that drivers operating at multiple scales act on ecosystem properties, regardless of freshwater type.
In my second chapter, I further investigated biotic properties across lakes and streams, specifically focusing on fish biodiversity patterns. I used fish data from 559 lakes and 854 streams from the midwestern/northeastern United States to examine the differences and similarities in lake and stream fish species richness and composition across surface water connectivity classes.
I found that connectivity (characterized as discrete classes ranging from isolated lakes and headwater streams to highly connected lakes and rivers) was positively related to fish species richness, that lakes and streams shared many species, and that connectivity helped explain variation in species composition across lakes and streams. My results demonstrate that lake and stream fish communities are not independent from each other and highlight the importance of including both freshwater types in science and management.
My third chapter made use of computational approaches to quantify lake networks, which are defined as a set of lakes connected by ephemeral or permanent streams, regardless of the directionality of those connections (e.g., upstream, downstream, or both). I collaborated with a computer science PhD student to use a graph theory framework, the Lake multi-scaled Geospatial and temporal database (LAGOS-US), and the National Hydrography Dataset (NHD) to create 898 lake networks for lakes >1 ha in surface area for the conterminous U.S. From these networks, I derived a suite of surface connectivity metrics for 86,511 lakes, including the number of and distance to both up- and downstream lakes, dams, and network position. This is the first database to provide accessible and comprehensive lake network metrics at the national scale. This database provided me with data for chapter four of my dissertation, and will facilitate future research that considers the influence of the entire network on lake abiotic and biotic properties.
In my final chapter, I used the continuous connectivity metrics from chapter three, beyond the categorical classes used in my second chapter, as well as other local, landscape, and regional scale drivers to compare the effects of connectivity on fish species richness to other multi-scaled drivers in lakes and streams.
I found that the stream course distance to the nearest downstream lake was important for fish species richness in lakes, whereas the distance to the nearest upstream lake was important in streams. Although connectivity had a positive effect on species richness overall in both lakes and streams, this relationship varied across regions. This variation could be partially explained by regional-scale agricultural land use, showing a cross-scale interaction. Interestingly, this CSI had opposite effects in lakes compared to streams. In addition to connectivity, I found that two local-scale drivers (waterbody area and elevation) and one regional-scale driver (precipitation) were important for species richness in both lakes and streams. My results show the importance of considering both freshwater types, multi-scale drivers, and cross-scale interactions when studying lakes and streams at the macroscale and when making management plans across varying landscapes.
My dissertation research is unique because I study abiotic and biotic properties across multiple freshwater types and their multiscale drivers at the macroscale. My results inform understanding of broad-scale freshwater responses to changes in climate and land use intensification. My work also suggests that managers include lakes, streams, and wetlands in conservation and restoration plans, protect fresh waters across a range of connectivity in order to maximize biodiversity protection within a region, and consider multiscale drivers (and interactions among them) when conducting national freshwater assessments and management and conservation efforts.
CHAPTER 1: DRIVERS AND SPATIAL STRUCTURE OF ABIOTIC AND BIOTIC PROPERTIES OF LAKES, WETLANDS, AND STREAMS AT THE NATIONAL SCALE
I examined the effects of drivers quantified at various spatial scales on lake, wetland, and stream biotic and abiotic properties as well as the spatial structure of these properties and their top drivers across the continental U.S.
My results showed that the drivers of total phosphorus (TP) and total nitrogen (TN) did not differ across lakes, wetlands, or streams and showed similar patterns in concentration across the United States. However, patterns in the biotic variables (chlorophyll-a and percent macrophyte cover) were dependent upon freshwater type. I also found that the top drivers, percent agriculture cover and percent forest cover in the watershed, had similar spatial structure as TN and TP, showing that drivers operating at multiple scales act on ecosystem properties, regardless of freshwater type. For the full text of this published work, see: King, K., Cheruvelil, K.S., and Pollard, A. 2019. Drivers and spatial structure of abiotic and biotic properties of lakes, wetlands, and streams at the national scale. Ecological Applications 29: e01957. https://doi.org/10.1002/eap.1957
CHAPTER 2: SURFACE WATER CONNECTIVITY AFFECTS LAKE AND STREAM FISH SPECIES RICHNESS AND COMPOSITION
I investigated the differences and similarities in lake and stream fish species richness and composition across connectivity classes in 559 lakes and 854 streams from the midwestern/northeastern United States. I found that discrete connectivity classes, ranging from isolated lakes and headwater streams to highly connected lake and river classes, were positively related to fish species richness, that lakes and streams shared many species, and that connectivity helped explain variation in species composition across lakes and streams. My results demonstrated that lake and stream fish communities were not independent from each other and highlighted the importance of including both freshwater types in science and management. For the full text of this published work, see: King, K.B.S., Bremigan, M.T., Infante, D., and Cheruvelil, K.S. 2021. Surface water connectivity affects lake and stream fish species richness and composition. Canadian Journal of Fisheries and Aquatic Sciences.
http://dx.doi.org/10.1139/cjfas-2020-0090

CHAPTER 3: LAKE NETWORKS FOR THE CONTERMINOUS U.S. (LAGOS-US NETWORKS)

King, K., Wang, Q., Rodriguez, L.K., and Cheruvelil, K.S. In revision. Lake networks for the conterminous U.S. (LAGOS-US NETWORKS). Limnology and Oceanography Letters. Submitted for peer review December 2020.

3.1 Abstract

Knowing the degree of surface water connectivity among lakes can help scientists better understand and predict the movement of abiotic materials and biota within networks. Quantifying broad-scale networks that include lake and stream connections is difficult computationally. Starting from the medium resolution National Hydrography Dataset’s (NHD) lakes, streams, and rivers, we applied a graph theory approach to identify lake connectivity networks, defined as sets of lakes connected by streams (upstream, downstream, or both). There are a total of 898 networks that include 86,511 lakes >1 ha in surface area. The LAGOS-US NETWORKS v1 module contains four data tables, one of which includes derived surface water connectivity metrics for lakes and networks within the conterminous United States, including dams. NETWORKS also includes a flow table as well as a bidirectional and a unidirectional distance table that provide the stream course distances between every pair of connected lakes.

3.2 Introduction

Freshwater surface connectivity is an important area of research for aquatic ecologists. Knowing the degree of connectivity among lakes (e.g. a gradient from isolated to highly connected lakes) can help scientists better understand and predict the movement of materials and biota within networks. Studies have shown that connectivity affects lake characteristics such as water chemistry (e.g., Webster et al. 2000; Sadro et al. 2012; Soranno et al. 2015) and biotic diversity (e.g., Olden et al. 2001; Beisner et al. 2006; Griffiths et al. 2015).
Research also shows that incorporating both streams and lakes into measures of connectivity gives a more accurate representation of nutrient processing and biotic movements (Jones 2010) than using only one freshwater type (lakes or streams). One way to quantify freshwater surface connectivity is to create metrics for surface water networks, or a series of connected lakes and stream reaches. These metrics can incorporate the number of and distance to surface water connections as well as the waterbody position within a network (i.e., landscape position). For example, Olden et al. (2001) investigated how a suite of connectivity metrics, such as upstream and downstream watercourse distances between lakes, watercourse distance through an intermediate lake, and stream gradient, corresponded to fish community composition. They found different connectivity metrics were important for different lakes. Popular landscape position metrics like stream Strahler order (Strahler 1957) and link magnitude (Shreve 1967) have been used to capture the spatial arrangement of a stream reach within a river network. Similarly, lake landscape position within a network has been characterized with lake network number (a lake’s position in a lake chain) and lake order (the Strahler order of the outflowing stream) (Kling et al. 2000; Riera et al. 2000; Martin and Soranno 2006). The position of a lake in the network has been shown to be correlated with both abiotic (e.g. Kling et al. 2000) and biotic (e.g. Kratz et al. 1997) properties. However, connectivity metrics that describe surface water network structure by incorporating both streams and lakes are needed to better understand the influence of connectivity (or isolation) on biotic and abiotic lake properties. The use of standard definitions and methods to quantify surface water networks has been hampered because metric selection for quantifying connectivity depends on both the research question and focus of the study (e.g.
biota vs. nutrients, streams vs. lakes) as well as the spatial scale of interest. In addition, when working at broad scales (regions to continents) and including both streams and lakes, it is difficult to balance accurate estimates of surface water connectivity and computational challenges. Graph theory approaches provide ways to overcome computational challenges because they have minimal data requirements while still providing accurate estimates of connectivity (Calabrese and Fagan 2004). However, because these metrics can be computationally difficult, studies that have applied graph theory to lakes are often restricted to a few watersheds (Bishop-Taylor et al. 2015; Saunders et al. 2016). Our research fills the need for accessible and comprehensive lake connectivity metrics at the national scale. A recent study using the NHD high resolution lakes (>0.5 ha) and medium resolution permanent rivers and streams classified river networks into four types based on surface connections across the conterminous United States (U.S.; Gardner et al. 2019). This study demonstrated how lake/reservoir abundance and size scale with stream order and provided a first step in incorporating lakes into river networks (Gardner et al. 2019). Our research complements their river-centric study by focusing on lake networks and makes the data publicly accessible for further research and applications. This data paper presents the LAGOS-US NETWORKS v1 data module, which comprises 898 networks containing 86,511 lakes >1 ha in surface area. The number of lakes in a network ranges from 2 to 32,811 lakes, the largest network being the Mississippi River basin (Figure 3.1). NETWORKS was created using a graph theory framework to create lake connectivity networks for the conterminous U.S., where lakes and streams were the nodes and connections between them were the edges (Urban and Keitt 2001; Eros et al. 2012).
We defined lake connectivity networks as a set of lakes connected by ephemeral or permanent streams, regardless of the directionality of those connections (e.g. upstream, downstream, or both) and we excluded connections through the Great Lakes, oceans, and estuaries. NETWORKS includes all lakes that are connected to other lakes (i.e. no isolated lakes or lakes only connected to streams are included), which is about 18% of all lakes >1 ha in surface area in the study extent (Smith et al. 2020). This proportion is comparable to similar studies that found 33% (Hill et al. 2018) and 15% (Gardner et al. 2019) of NHD lakes to be in-network. From these networks, we derived surface connectivity metrics, including metrics for connections among lakes (both upstream and downstream), dam metrics, and network position (spatial orientation of the lake within its network) for connected lakes. We also included a bidirectional and a unidirectional distance between every pair of connected lakes and a flow table that describes the flow path direction between two flowlines (e.g., TO and FROM) and that was used to create the networks and metrics in the other three tables. All metrics in the NETWORKS module can be linked to individual lakes in the LAGOS-US database platform via ‘lagoslakeids’ (Smith et al. 2020) or linked to the medium resolution NHDplusV2 via ‘comids’ (USGS 2019). NETWORKS metrics can be used in conjunction with other abiotic and biotic datasets to further ecological prediction, such as how nutrients or toxins move through a network, changes in invasive species distributions, or how biota might move up or downstream in response to climate change. These networks will help advance our understanding of how surface water connections and landscape position affect abiotic and biotic properties of lakes at regional to continental scales.

Figure 3.1. Lake connectivity networks.
Lakes (n=86,511) in the LAGOS-US NETWORKS module, colored according to their network membership (n=898 networks). NETWORKS includes lakes >1 ha in surface area that are connected to other lakes (i.e. no isolated lakes or lakes only connected to streams are included) in the conterminous U.S. 3.3 Data Description 3.3.1 Overview of data sources The LAGOS-US NETWORKS module was created using existing datasets from a variety of data sources. The lake connectivity networks were derived from the lake and stream flow tables of the medium resolution U.S. National Hydrography Dataset (NHDplusV2) downloaded August 5, 2019 (USGS 2019). NHDplusV2 is a national geospatial surface water dataset that integrates information from the NHD, the National Elevation Dataset (NED), and the Watershed Boundary Dataset (WBD) at a 1:100,000-scale. Lakes from the NHDplusV2 were matched to lakes from the LAGOS-US LOCUS v1 data module (Smith et al. 2020), which includes lakes and reservoirs greater than or equal to 1 ha from the high resolution National Hydrography Dataset (NHD). The NHDplusV2 was also matched to dams using a variety of data sources. 12 First, we used the National Anthropogenic Barrier Dataset (NABD) (Ostroff et al. 2013) a dataset of large, anthropogenic barriers that were originally spatially linked to the NHDPlusV1 data product to facilitate analyses based on the NHD and National Inventory of Dams (NID 2015). However, we used a modified NABD that was augmented by Cooper et al. (2017) with 170 additional dams from the USFWS Fish Passage Decision Support Tool and that included dam removals since the NABD was published as listed in the 2018 American Rivers dam removal database (Rivers 2019). This modified NABD dataset was used to establish the population of dams (n=49,525) that reside on streams or lakes and calculate dam metrics for all lakes and networks within the LAGOS-US NETWORKS module (Figure 3.2). 
The NETWORKS module includes a source table that can be linked to the data tables. See the user guide for additional details (King et al. under review). Figure 3.2. Dam locations. Dam points (n=49,525) in the LAGOS-US NETWORKS module overlaid on networks colored according to their network membership (n=898 networks). 13 3.3.2 Overview of data tables and variables The NETWORKS module contains two metadata tables and four data tables (Figure 3.3). The metadata tables are a ‘data dictionary’ that provides a definition for each variable name or column of every table in the module and includes important information such as units, and a ‘source table’ that includes a description of the data sources used to create NETWORKS. The four data tables contain the key variables and include 1) a lake connectivity metrics table (nets_networkmetrics_medres) that has lake identifier information, upstream and downstream connectivity metrics, upstream and downstream dam locations, landscape position, and network connectivity metrics 2-3) two distance tables (nets_binetworkdistance_medres, nets_uninetworkdistance_medres) that include lake identifier information as well as upstream and downstream distances between pairs of connected lakes using either a bidirectional graph (2) or unidirectional graph (3), and 4) the modified flow table (nets_flow_medres) with NHDplusV2 common identifiers for NHDFlowlines that describes the flow path direction between two flowlines (e.g., TO and FROM) and that was used to create the networks and metrics. 
[Figure 3.3 schema. NETS metadata: data source table (metadata for sources) and data dictionary (definitions of all columns). NETS data tables: lake connectivity, lake distance (Bi), lake distance (Uni), and lake and stream flow.]

Figure 3.3. The LAGOS-US NETWORKS schema. NETWORKS includes metadata in the form of a source table and a data dictionary and four data tables (nets_networkmetrics_medres, nets_binetworkdistance_medres, nets_uninetworkdistance_medres, and nets_flow_medres). The tables are connected to each other and other LAGOS-US modules via lagoslakeid, depicted with red text. The nets_networkmetrics_medres table also includes observation-level flags, depicted with blue text. The variables in black text included in the four data tables are representative examples. The census population of lakes is n=86,511; however, the flow table includes identification for all flowlines (n=2,665,206).

Figures 3.4-3.6 highlight some variables from the nets_networkmetrics_medres table. For example, we found that more lakes have a connection to a downstream lake than an upstream lake and that the nearest distance to a downstream lake can be up to 200 km, with lakes in the Mississippi network even further than that (Figure 3.4a). Similarly, there are more lakes with downstream dams in their network than upstream dams.
Many lakes have at least one dam downstream and in the Mississippi River basin some have >10 dams downstream (Figure 3.4b). The majority of U.S. lakes have a low lake network number (LNN), indicating a high amount of network branching rather than long, linear chains (Figure 3.5a). Higher values of LNN appear in 15 the upper-midwest, west, and south-central U.S. Lake order (LO) is fairly evenly distributed across the U.S., and the majority of lakes tend to be lower order (Figure 3.5b). LNN ranges from 1-50 and LO ranges from 0-9. At the network scale, although the Mississippi River network includes 32,811 lakes (Figure 3.6a), the majority of lake networks have <100 lakes within a network, and many networks consist of only 2 connected lakes (Figure 3.6b). The average distance between lakes in a network ranges from less than 1 to over 1500 km, with a median distance of approximately 7 km (Figure 3.6c). The average area of lakes within a network ranges from just over 1 to about 47,000 ha, with a median of approximately 18 ha (Figure 3.6d). The number of dams in a network ranges from 0 to about 25,000 (the Mississippi River network), with the majority of networks including 1 dam (Figure 3.6e). Figure 3.4. Map of downstream lakes and dams. Map depicting location of lakes in NETWORKS color-coded according to a) the distance to the nearest downstream lake (km), where gray circles mean the lake has no downstream lake and b) the number of downstream dams from each lake within its network. 16 Figure 3.5. Map of lake landscape position. Map depicting location of lakes in NETWORKS color-coded according to their landscape position measured as a) lake network number and b) lake order. Figure 3.6. Network metrics. 
Network metric summaries of a) frequency distribution of the number of lakes in a network b) frequency distribution of the number of lakes in networks with less than 100 lakes, c) boxplot of the average distance (km) between lakes in a network, d) boxplot of the average lake area (ha) in a network, and e) boxplot of the number of dams in the network. Note that for visualization purposes, the boxplots were truncated at the high end resulting in the removal of 2, 10, and 25% of networks in panels c, d, and e, respectively. 17 3.3.3 Overview of data access LAGOS-US NETWORKS v1 is made up of metadata and data tables that are csv files as well as a documentation guide in pdf form, all of which are available for public download via the EDI repository (King et al. under review). There is also code available on GitHub for those who would like to reproduce, extend, or adapt our networks (Wang and King 2020) and an R package that can be used to download and link NETWORKS with the other LAGOS-US core and extension modules (lagosus; Stachelek 2020). When NETWORKS data are included in analyses, users should cite them as well as this data paper that describes the motivation and context for creating the NETWORKS module. NETWORKS v1 data and documentation: King, K., Wang, Q., Rodriguez, L.K., Haite, M., Danila, L., Pang-Ning, T., Zhou, J., and Cheruvelil, K.S. under review. User Guide for LAGOS-US NETWORKS v1.0: Data module of surface water networks characterizing connections among lakes, streams, and rivers in the conterminous U.S. Environmental Data Initiative. https://portal-s.edirepository.org/nis/mapbrowse?scope=edi&identifier=681. Dataset accessed XX/XX/2020. Data paper: King, K., Wang, Q., Rodriguez, L.K., and Cheruvelil, K.S. under review. Lake connectivity networks for the conterminous U.S. (LAGOS-US NETWORKS v1). Limnology and Oceanography Letters. Submitted for peer review December 2020. LAGOS-US R package: Stachelek J. 2020. 
LAGOSUS: Interface to the Lake Multi-scaled Geospatial and Temporal Database. R package version 0.0.1. DOI forthcoming. 18 3.4 Methods This section outlines the methods used to derive lake connectivity networks as well as the derived connectivity metrics in LAGOS-US NETWORKS v1. We also explain how dam data from the NABD was linked to our networks to add potential barriers to connectivity. For further technical detail on this process, we have submitted data documentation in the form of a user guide along with the metadata and datasets on EDI (King et al. under review) and users can consult the published code (Wang and King 2020). 3.4.1 Creating lake connectivity networks Lake connectivity networks across the continental U.S. were created using the flow table from the medium resolution NHDPlusV2 database (USGS 2019). The flow table from NHDPlusV2 consisted of every flowline (streams and artificial flowlines that go through lakes) either in the ‘FROM’ column or ‘TO’ column, denoting a direction of flow from one line to the other, as well as the distance for each connection between two flow lines. Prior to creating a graph, we removed several connections. We removed coastline connections (Fcode 56600; McKay et al. 2012) so that the connectivity networks did not connect through the ocean, estuaries, or the Great Lakes, as well as IDs associated with the Great Lakes water bodies. Artificial flowlines were linked to water bodies (nhdplusv2_comid), and these water bodies were linked to lagoslakeids using the lake_link table from the LAGOS_US_LOCUS module (Smith et al. 2020). The modified version of the NHDPlusV2 flow table, including where artificial flowlines are matched to lakes from the LAGOS-US database, can be found as the nets_flow_medres data table. We applied a graph theory framework to create lake connectivity networks from the nets_flow_medres data table. Graphs are mathematical structures used to model pairwise 19 relations between objects, or nodes. 
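As a rough illustration of the graph construction described here, the sketch below builds an undirected (bidirectional) adjacency structure from FROM/TO rows, drops connections that touch coastline flowlines, and groups lakes into networks. It is a minimal sketch under stated assumptions and not the published NETWORKS code: the row layout, the ids, the `L` prefix used to mark lake flowlines, and the function names are all hypothetical, and network membership is found with a plain breadth-first search over connected components rather than with the pairwise traversal used for the distance tables.

```python
from collections import defaultdict, deque

# Hypothetical flow-table rows as (FROM, TO) flowline ids. Ids starting with
# "L" stand in for artificial flowlines already matched to a lake.
FLOW_ROWS = [
    ("L1", "s1"), ("s1", "L2"),   # lake L1 drains to lake L2
    ("L3", "s2"), ("s2", "s1"),   # lake L3 joins the same stream
    ("L4", "s3"), ("s3", "c1"),   # lake L4 drains to the coast
    ("L5", "s4"), ("s4", "c1"),   # lake L5 drains to the coast
]
COASTLINE_IDS = {"c1"}  # assumed: flowlines carrying the coastline FCode


def build_bidirectional_graph(rows, excluded_ids):
    """Adjacency sets that ignore flow direction and skip excluded flowlines."""
    graph = defaultdict(set)
    for src, dst in rows:
        if src in excluded_ids or dst in excluded_ids:
            continue
        graph[src].add(dst)
        graph[dst].add(src)
    return graph


def assign_networks(graph):
    """Map each lake to a network id; groups with fewer than 2 lakes get none."""
    net_id_of, seen, next_id = {}, set(), 1
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        lakes = [n for n in component if n.startswith("L")]
        if len(lakes) >= 2:
            for lake in lakes:
                net_id_of[lake] = next_id
            next_id += 1
    return net_id_of


networks = assign_networks(build_bidirectional_graph(FLOW_ROWS, COASTLINE_IDS))
print(networks)
```

In this toy input, lakes L4 and L5 would be joined only through the coastline flowline, so once that flowline is excluded neither lake belongs to a network, which mirrors the intent of removing coastline connections before building the graph.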
In our case, we were interested in modeling the pairs of lakes that are connected by streams. We created lake connectivity networks using bidirectional graphs, which considered both downstream and upstream connections, using both lakes and streams as nodes (Figure 3.7a). We used Dijkstra's algorithm (Cormen et al. 2001) to traverse the graph both upstream and downstream starting at a given lake. During the traversal, if a node was a stream, we continued traversing the graph until the node was a lake. We saved the distance from the given lake to this lake and stopped traversing. If there were multiple paths to connect the same two lakes, the algorithm chose and saved the path with the shortest length. This produced all the connections of the given lake to its neighbor lakes. This process was repeated for every lake until the connections and stream course distances between all lakes were known. All lakes that were connected to another lake up or downstream were considered part of one network (Figure 3.7c,d). We assigned each of these networks a unique identification number (net_id). All of the stream course distances between pairs of lakes can be found in the nets_binetworkdistance_medres. The artificial flowline distances through lakes were not included in these distances. This table includes upstream, downstream, and total distance between two lakes. The total distance may be smaller than the sum of the upstream and downstream columns because the graph did not have information on where the stream reaches intersected each other; therefore, an intersecting stream reach was only counted once for the total distance, but was included in both the downstream and upstream distance columns. 20 Figure 3.7. Network creation. A bidirectional graph (a) and unidirectional graph (b). An example of a lake network (c) compared to its corresponding bidirectional graph (d) to illustrate how networks were created and how upstream or downstream distances were defined in NETWORKS. 
The distance between lake C and lake D includes traversing the network downstream and then upstream. The stream course distance is used as a weight in panel (d); thicker connecting lines depict longer distances. Panel (d) was made using the “igraph” package (Csardi and Nepusz 2006).

3.4.2 Linking dams to lake connectivity networks

The NABD is a dataset of large, anthropogenic barriers that are spatially linked to the NHDPlusV1 data product to facilitate analyses based on the NHD and National Inventory of Dams (Ostroff et al. 2013). Cooper et al. (2017) added 170 additional dams to this database from the USFWS Fish Passage Decision Support Tool and excluded ~250 dams that were identified as having been removed since the NABD was published (Rivers 2019). The 49,525 dams were linked to the NHDPlusV2 flowlines and were incorporated into networks. Dams were assigned to a lagoslakeid if they were less than 50 m from a lake. Dams that were directly on (or in) a lake could not be considered as up- or downstream because they were on the node and therefore had no direction relative to that node. Instead, these dams were assigned as upstream or downstream from a lake using two methods:

1) Using ArcGIS, lake inlets and outlets were identified using the start and end vertices associated with the artificial flowlines and extracted as points representing inlets and outlets. For each dam point location, the nearest 3 inlets or outlets (combined) were identified using Euclidean distance in the ArcGIS GenerateNear tool. If both inlets and outlets for the same lake were very near each other or an inlet or outlet for another lake was very near, the dam position was flagged for manual review. Methods are available as Python code within the LAGOS GIS Toolbox (http://github.com/cont-limno/LAGOS_GIS_Toolbox; national_outlets_inlets.py, dams_link_lake_junctions.py). There were 11,551 dams that were assigned upstream or downstream of a lake using this method.
2) The remaining dams (n=1,079) that could not be identified by the automated process were then manually classified by visual inspection of the dam location in comparison to the NHD polygons and flowlines and manually assigned as either on the upstream or downstream side of a lake. Two data flags were created during the process of linking dams to lakes and streams/rivers. These flags were for cases when a dam fell onto an artificial flowline contained within a lake or when multiple dams fell on the same lake (Table 3.1; section on Informational Flags). 3.4.3 Quantifying lake and network connectivity metrics After creating the connectivity networks, several metrics were created at the lake scale using a unidirectional graph. Unidirectional graphs consider either downstream or upstream connections (Figure 3.7b). We used Dijkstra's algorithm (Cormen et al. 2001) to traverse the 22 graph downstream starting at a given lake. The same process was used for the unidirectional graph that was used for the bidirectional graph described in the above section “Creating lake connectivity networks”. The stream course distances between two lakes using a unidirectional graph can be found in the nets_uninetworkdistance_medres table. Because a unidirectional graph traverses the network downstream only, this table includes a downstream distance and there is a mirror image of the distance in the other direction (i.e. upstream). The metrics for the nearest lake distance were determined by comparing the distance between each lake and all of its neighboring lakes and choosing the nearest distance upstream and the nearest distance downstream from the unidirectional graph. Note that not all lakes have both an upstream and downstream lake. The number of directly connected lakes upstream was computed as the indegree of a lake, i.e. the number of lakes upstream only connected through streams flowing into the lake. 
Similarly, the number of directly connected downstream lakes was calculated using the outdegree of a lake, i.e. lakes directly connected through streams flowing out of a lake. There were instances when a lake did not have any directly connected upstream or directly connected downstream lakes because the lake was only connected through the bidirectional graph to the lake network (n=7,617). Therefore, we also included a metric for the nearest lake using bidirectional distance. These instances are easily identifiable because these lakes only have a nearest bidirectional distance and do not have a nearest downstream or nearest upstream lake distance. Two metrics that describe the position of a lake within the network and landscape were derived using a unidirectional graph: lake network number (LNN) and lake order (LO) (Riera et al. 2000; Martin and Soranno 2006) (Figure 3.8). LNN was computed by starting at the first lake in a network (i.e. no upstream lakes) and assigning that lake a “1”, then moving downstream to 23 another lake and assigning that lake a “2”, and so on throughout the network. Therefore, multiple lakes in a network could be assigned a “1” if they do not have any upstream lakes. Lakes with multiple upstream lakes were assigned the larger sequential number (Martin and Soranno 2006). LO was assigned using the Strahler stream order from the NHDplusV2 attributes. LO followed the Strahler stream order of the outflowing stream, where the higher order stream was chosen if more than one outlet was present (Riera et al. 2000; Martin and Soranno 2006). There were two exceptions to this: headwater lakes were assigned a “0” and terminal lakes received the Strahler order of the inflowing stream (Riera et al. 2000; Martin and Soranno 2006). We considered inflowing streams for LO calculations to differentiate between headwater lakes and lakes that had inflowing streams but not upstream lakes. 
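The downstream traversal and the lake network number rule described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the published NETWORKS code: the toy graph, the `L` prefix marking lakes, the placement of distances on edges, and the function names are hypothetical. The first function runs Dijkstra's algorithm with `heapq` and stops at the first lake reached along each path; the second assigns LNN by processing lakes in topological order and keeping the larger number where several upstream lakes converge, assuming loops have already been removed.

```python
import heapq

# Hypothetical downstream graph: node -> list of (next_node, stream_km).
# Nodes starting with "L" are lakes; all other nodes are stream reaches.
DOWNSTREAM = {
    "L1": [("s1", 2.0)], "s1": [("L3", 1.5)],
    "L2": [("s2", 4.0)], "s2": [("L3", 1.0)],
    "L3": [("s3", 3.0), ("s4", 1.0)],      # two routes from L3 to L4
    "s3": [("L4", 3.0)], "s4": [("L4", 0.5)],
    "L4": [],
}


def is_lake(node):
    return node.startswith("L")


def downstream_lake_distances(graph, start):
    """Shortest distance from `start` to each first lake reached downstream."""
    best, found = {start: 0.0}, {}
    heap = [(0.0, start)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best.get(node, float("inf")):
            continue                       # stale heap entry
        if node != start and is_lake(node):
            found[node] = dist             # record the lake and stop here
            continue
        for nxt, km in graph.get(node, []):
            cand = dist + km
            if cand < best.get(nxt, float("inf")):
                best[nxt] = cand
                heapq.heappush(heap, (cand, nxt))
    return found


def lake_network_numbers(lake_edges):
    """LNN: 1 for lakes with no upstream lake, else 1 + the largest upstream LNN."""
    remaining = {lake: 0 for lake in lake_edges}    # upstream lakes left to process
    for downs in lake_edges.values():
        for d in downs:
            remaining[d] += 1
    lnn = {lake: 1 for lake, n in remaining.items() if n == 0}
    ready = list(lnn)
    while ready:
        lake = ready.pop()
        for d in lake_edges[lake]:
            lnn[d] = max(lnn.get(d, 0), lnn[lake] + 1)
            remaining[d] -= 1
            if remaining[d] == 0:
                ready.append(d)
    return lnn


# Lake-to-lake edges: each lake's directly connected downstream lakes.
lake_edges = {n: set(downstream_lake_distances(DOWNSTREAM, n))
              for n in DOWNSTREAM if is_lake(n)}
outdegree = {lake: len(downs) for lake, downs in lake_edges.items()}
indegree = {lake: sum(lake in downs for downs in lake_edges.values())
            for lake in lake_edges}
lnn = lake_network_numbers(lake_edges)
print(lnn, indegree, outdegree)
```

In the toy graph, L3 reaches L4 by two routes and only the shorter one is kept, so L3 has an outdegree of 1; L3 also has two directly connected upstream lakes, giving an indegree of 2.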
There were instances when a loop between two lakes occurred (0.02% of all connections), for example lake A flowed to lake B and lake B flowed back to lake A. In these instances, we randomly removed one connection. Figure 3.8. Lake network number and lake order. Example of part of a lake network with lake network number (LNN) and lake order (LO) metrics for each lake. 24 Several dam metrics were derived that characterize barriers to connectivity. The Depth First Search (DFS; Cormen et al. 2001) algorithm was used to traverse each lake-stream network to find all of the upstream dams and downstream dams. Dijkstra’s algorithm was used to compute the distance to the nearest upstream and downstream dams (Cormen et al. 2001). Because we used a graph to create the network, the algorithm did not have the exact location of the dam on the stream reach, just the flowline it was located on. Therefore, when deriving the metrics for the nearest dam, the entire stream reach with the dam was included in the distance calculation. Thus, there were instances when two or more dams fell on the same stream flowline (8.7 % occurrence). In these instances, all dams were considered as the nearest up- or downstream dams, they were assigned the same distance from the lake, and all of the dam ids were included and separated by a comma. These instances are easily identifiable because more than one dam is listed in the lake_nets_nearestdamdown_id or lake_nets_nearestdamup_id column. Similarly, if multiple dams were on a lake (0.15% occurrence), all of the dams were considered the nearest dam, all dam ids were included, and dams located on a lake were assigned the distance of 0 km. Lakes with multiple dams on the lake were assigned a flag (Table 3.1; section on Informational Flags). At the network scale, we traversed the completed lake connectivity networks using the DFS algorithm. 
This process calculated the total number of lakes in each network, the average distance between lakes in a network, and the total number of dams in each lake network. The average area of the lakes in a network was calculated using lake area values from LAGOS-US LOCUS v1.0 polygons (Smith et al. 2020), grouping lakes by networks, and then using the Calculate Geometry tool in ArcGIS.

3.5 Technical Validation

3.5.1 Informational Flags

During construction of the module, we created a series of data flags that convey something about a data observation that may be of interest to users. All of these flags are informational and of general relevance to the data user; none are cautionary flags that indicate potential concerns about including particular data observations in an analysis (Table 3.1).

Flag: lake_nets_damonlake_flag
Value: Y, N
Description: A value of ‘Y’ indicates that there is at least one dam on this lake. This means that the dam point falls onto one of the artificial flowlines that flows through a lake and is therefore associated with the lake and not a stream reach. A value of ‘N’ indicates there is not a dam on a lake.
User relevance: This flag primarily serves to alert the user of the presence of a dam directly on a lake as opposed to on a connecting stream reach.
Number of occurrences: 12,630
Percent of data: 14.6%

Flag: lake_nets_multidam_flag
Value: Y, N
Description: A value of ‘Y’ indicates that there are multiple dams on a lake. A value of ‘N’ indicates there are not multiple dams.
User relevance: This flag identifies lakes that have multiple dams. There may be a dam at multiple inlets or outlets or a dam at both locations.
Number of occurrences: 132
Percent of data: 0.15%

Table 3.1. Description and occurrence of lake informational data flags in nets_networkmetrics_medres (number of lakes = 86,511).
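To make the two flag definitions concrete, the sketch below derives both flags from a list of dam-to-lake links. It is a minimal illustration, not the code used to build the module: the link list, the lake ids, and the function name are hypothetical, and only the two flag column names come from Table 3.1.

```python
from collections import Counter

# Hypothetical links: one (dam_id, lagoslakeid) pair per dam located on a lake.
DAMS_ON_LAKES = [("d1", 101), ("d2", 102), ("d3", 102)]
LAKE_IDS = [101, 102, 103]


def dam_flags(lake_ids, dams_on_lakes):
    """Return the two informational flags for every lake id."""
    counts = Counter(lake for _, lake in dams_on_lakes)  # dams per lake
    return {
        lake: {
            # 'Y' when at least one dam falls on the lake
            "lake_nets_damonlake_flag": "Y" if counts[lake] >= 1 else "N",
            # 'Y' when two or more dams fall on the lake
            "lake_nets_multidam_flag": "Y" if counts[lake] >= 2 else "N",
        }
        for lake in lake_ids
    }


flags = dam_flags(LAKE_IDS, DAMS_ON_LAKES)
for lake, values in flags.items():
    print(lake, values)
```

By construction, every lake flagged for multiple dams is also flagged as having a dam on the lake, so the second flag marks a subset of the first.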
26 3.5.2 Validation and Quality Control/Quality Assurance The validation and Quality Control/Quality Assurance (QAQC) process was intended to ensure that the procedures used to create the values for NETWORKS variables resulted in the intended outcomes. We used two methods for validation and QAQC. First, during the creation of the metrics, a simulation graph was created to validate the code. This simulation graph included paths that were unidirectional as well as bidirectional, multiple connections between lakes, lakes that were directly connected to other lakes without streams, and a Great Lake. Using this simulation graph, we checked that the distance between pairs of lakes was correct for downstream, upstream, and bidirectional connections. Then, we ensured that the code accurately selected the shorter distance if there were multiple connections between lakes for both the unidirectional and bidirectional connections. For lakes that did not have a stream connection between them, we ensured the code resulted in downstream and upstream distances of 0 km. Finally, we tested that the code ignored connections to the Great Lakes. Our team manually examined resulting networks and associated metrics using either ArcGIS 10.3 Desktop (ESRI 2014) or the “hydrolinks” package (Winslow et al. 2018), which downloads and traverses paths for the medium resolution NHDplusV2 data to identify potential issues with either the input data or code. All solvable issues were reconciled and the networks or metrics were regenerated and retested until no further issues were found. After metrics were quantified, we proceeded with a second phase of QA/QC. We queried the NETWORKS metrics data table (nets_networkmetrics_medres) to: 1) identify potential data or geoprocessing issues and 2) verify that data values were sensible (e.g. within expected ranges and expected completeness of data). 
These checks of individual variables assessed whether the workflow generating the data accurately reflected both the source data and the lake-specific values. For this process, the nets_networkmetrics_medres data table, in csv (comma-separated values) format, was imported by semi-automated R scripts that then summarized the data table, ensured comparability with the source GIS layer and data dictionary, summarized and mapped values for each variable, and automatically generated scores for three main evaluation criteria in a QAQC summary report provided in html format. Note that actions were iterative; the QAQC review feeds back into the data creation process, which then re-exports the data table and then re-runs the entire QAQC process. Below are the three evaluation criteria and subsequent actions used in this process:

1) Match with GIS data: This check compared the list of lagoslakeids in the data table with those in the corresponding LAGOS-US LOCUS reference shapefiles maintained in an ArcGIS geodatabase (GIS_LOCUS; Smith et al. 2020). If a “Fail” warning was generated, non-matching lagoslakeids were manually investigated to identify the source of the mismatch between the data table and the reference GIS data layer.

2) Match with metadata: Variable names in the data table were compared with the master list of variable names maintained in the metadata table data dictionary. Where there was no match, due to missing or incorrect names in either the data dictionary or the data table, a “Fail” warning was generated and the mismatches were listed in a table in the QAQC report. Where a “Fail” warning was generated, the data dictionary and data table variable names were examined and the name(s) in error were fixed as necessary.

3) Missing value: This check counted the number of observations with missing values, listed them, and produced maps of their location.
A “Warn” evaluation was created for this criterion and variables were inspected to make sure there were no gaps in the input data.

3.6 Data Use and Recommendations for Reuse

We advise users to use caution when combining the data in NETWORKS, which are based on the medium resolution NHDplusV2 flow data, with other resolutions of the NHD data or with data derived from other NHD versions. For example, when users combine NETWORKS with LAGOS-US, they should be aware that connectivity metrics will differ between the LOCUS and NETWORKS modules. A lake classified as connected in LOCUS might not be a part of the NETWORKS lakes, or a lake classified as isolated in LOCUS might be part of a network. This difference between modules is due to NETWORKS being based on the medium resolution NHDplusV2 flow data, whereas LOCUS used the NHD high resolution (Smith et al. 2020). The NHD high resolution includes smaller streams, which can connect some lakes that are not connected in the NHD medium resolution. On the other hand, the NHD high resolution separates some lakes that lie very close to a stream and considers them isolated when they are connected in the NHD medium resolution. Additionally, NETWORKS metrics were only included for lakes connected to other lakes, and therefore do not include isolated lakes or lakes that are only connected to streams. NETWORKS is based on the medium resolution NHDplusV2 flow data because of computing capacity at the conterminous U.S. scale and the availability of stream attributes in the medium resolution that are not available in the NHD high resolution data. The data in NETWORKS have not yet been used in research that has been published, but are being used in several on-going efforts that will result in publications. For example, these metrics are being used to quantify how connectivity affects both stream and lake fish communities within and across networks.
These data are also being used to determine how freshwater networks best facilitate latitudinal range shifts for species under ongoing climate change and if highly connected networks reside in protected areas. We also plan to use these data for studies of invasive species movement and species distribution modeling. NETWORKS will be a valuable data source for building broad-scale understanding of the role of connectivity and barriers to connectivity for movement of abiotic materials and biota. The module can be linked with other LAGOS-US modules as well as with the NHD using unique identifiers. Additionally, the distance and flow tables act as an “edge list” that can be used in the ‘igraph’ package (Csardi and Nepusz 2006) to calculate more graph metrics for specific networks. Finally, future users can combine these data with a variety of lake abiotic and biotic data, or incorporate weights such as size or quality of habitat patch on the nodes (lakes or streams) (Eros et al. 2012) to answer a myriad of questions related to freshwater surface connectivity.

3.7 Comparison with existing datasets

Although the majority of past studies fail to address surface water connectivity at the U.S. national scale, we provide an overview of pre-existing datasets so that readers and users understand what connectivity information was available at the time of writing, and how these previous methods align with or deviate from the networks and connectivity metrics in the NETWORKS module. Several connectivity datasets for the conterminous U.S. exist for streams. For example, the NHD (McKay et al. 2012) includes connectivity metrics such as a modified version of Strahler stream order. Cooper and Infante (2017) have created dam metrics for streams in the conterminous U.S., which represent network fragmentation.
However, these datasets and metrics do not include lakes and the networks stop at dams because they were created for biotic variables that cannot move past these barriers (e.g. fish). For lakes, there are a few broad-scale U.S. datasets that have important similarities and differences to NETWORKS. The LAGOS-US LOCUS module (Smith et al. 2020) includes several connectivity metrics, such as connectivity classes, the number of upstream lakes, upstream lake area, and stream density within a watershed. However, this dataset lacks downstream connections because it was created for abiotic variables. LakeCAT includes some metrics such as density of streams or dams within a catchment; however, it does not explicitly quantify lake or network connectivity metrics (Hill et al. 2018). Fergus et al. (2017) provide connectivity information at the HUC 12 and HUC 8 scales, including lake, stream, and wetland densities and clusters, although this is only for the northeastern/northern midwestern region of the U.S. The “hydrolinks” package (Winslow et al. 2018), which was used for NETWORKS validation, is a useful tool for quantifying and mapping connectivity; however, this tool only traverses upstream or downstream, it includes coastal lines and Great Lakes polygons, and it is best used for small extents because of computation time. Therefore, NETWORKS extends these datasets and tools by providing lake networks and connectivity metrics for the entire conterminous U.S. that include both lakes and streams and both upstream and downstream information that is useful for studying abiotic and biotic patterns and processes.
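To make the upstream, downstream, and bidirectional distances concrete, the following is a minimal sketch of shortest-path distances computed from an edge list, in the spirit of the simulation-graph checks in Section 3.5.2. It is not the code used to build NETWORKS. The node names, distances, and the `shortest_km` helper are hypothetical, and the use of Dijkstra's algorithm is an assumption made for illustration.

```python
import heapq

# Hypothetical toy edge list (from_node, to_node, stream_km); water flows from -> to.
EDGES = [
    ("lake_A", "lake_B", 4.0),
    ("lake_A", "lake_B", 2.5),      # a second, shorter connection between the same pair
    ("lake_B", "lake_C", 0.0),      # lakes that touch, with no stream between them
    ("lake_C", "great_lake", 7.0),
    ("lake_D", "great_lake", 3.0),
]
EXCLUDED = {"great_lake"}           # connections through a Great Lake are ignored


def build_adjacency(edges, excluded, direction):
    """Return {node: [(neighbor, km), ...]} for 'down', 'up', or 'both' traversal."""
    adjacency = {}
    for src, dst, km in edges:
        if src in excluded or dst in excluded:
            continue
        pairs = []
        if direction in ("down", "both"):
            pairs.append((src, dst))
        if direction in ("up", "both"):
            pairs.append((dst, src))
        for a, b in pairs:
            adjacency.setdefault(a, []).append((b, km))
    return adjacency


def shortest_km(edges, start, goal, direction="down", excluded=EXCLUDED):
    """Dijkstra shortest distance in km, or None when no path exists."""
    adjacency = build_adjacency(edges, excluded, direction)
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == goal:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, km in adjacency.get(node, []):
            candidate = dist + km
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return None


downstream_ab = shortest_km(EDGES, "lake_A", "lake_B", "down")
upstream_ba = shortest_km(EDGES, "lake_B", "lake_A", "up")
via_great_lake = shortest_km(EDGES, "lake_C", "lake_D", "both")
```

In this toy graph the shorter of the two lake_A to lake_B connections is selected, touching lakes are 0 km apart, and lake_C and lake_D are not linked because their only connection passes through the excluded Great Lake node.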
3.8 Acknowledgments

We thank Dana Infante and Arthur Cooper for their dam database and expertise, Arika Hawkins for her hourly help with manual dam classification, Maggie Haite for her help with data validation and GIS maps, Nicole Smith for GIS support, Pang-Ning Pan and Jiayu Zhou for computer science expertise, Arnab Shuvo for EDI support, and Katherine Webster for QA/QC support. We also thank the Continental Limnology Project team, especially Patricia Soranno, for contributions and discussions throughout the development of this data module.

CHAPTER 4: REGIONAL DIFFERENCES IN THE EFFECTS OF CONNECTIVITY ON FISH SPECIES RICHNESS IN LAKES AND STREAMS AT THE MACROSCALE

King, K.B.S., Wagner, T., and Cheruvelil, K.S., In Prep. Regional differences in connectivity on fish species richness in lakes and streams at the macroscale. To be submitted.

4.1 Abstract

Fish species richness and composition are likely shaped by drivers that operate at multiple spatial scales. At broad scales, environmental gradients such as climate or human pressures such as land use might change the effect of local drivers on biotic and abiotic properties within an ecosystem (i.e. cross-scale interaction). For example, surface water connectivity may have different effects on fish species richness depending on the regional climate or land use/cover. However, it is not known whether connectivity has similar effects on fish species richness across different regions, whether cross-scale interactions occur, nor how those effects vary in streams and lakes. Therefore, we examined how drivers at multiple spatial scales affect lake and stream species richness and quantified how the relationship between connectivity and fish species richness changes in response to regional anthropogenic drivers. Using fish data from 586 lakes and 652 streams from the midwestern/northeastern U.S.
and Bayesian hierarchical modeling, we found that connectivity was associated with higher species richness, but that connectivity had different effects on richness depending on regional-scale land use and freshwater type. By studying lakes and streams together and incorporating multi-scale drivers into models, our results inform scientific understanding of what drives variation in fish species diversity at broad spatial scales.

4.2 Introduction

Ecological patterns and processes affecting fish species richness and composition are shaped by different drivers operating at multiple spatial scales (i.e. multi-scale drivers), often referred to as hierarchically structured filtering (Tonn, 1990; Poff, 1997). This filter structure, as well as complex interactions of drivers operating at different spatial scales (i.e. cross-scale interactions (CSIs); Peters et al. 2007; Soranno et al. 2014), can make it challenging to predict biodiversity patterns as well as identify multi-scale stressors (Stendera et al. 2012). At broad extents, the importance of one driver over another or the effect of drivers on ecosystem biotic and abiotic properties may differ spatially due to environmental gradients such as climate or human pressures such as land use. For example, in streams across the northeastern U.S., brook trout populations showed an abrupt decline (i.e. threshold) with increasing watershed urban development (Wagner and Midway 2014). However, the threshold between trout populations and watershed urban development was higher in regions where regional mean summer stream water temperature was higher, showing a CSI between watershed-scale urban cover and regional-scale water temperature (Wagner and Midway 2014). Macrosystems ecology (Heffernan et al.
2014) provides a framework for understanding hierarchical filtering, cross-scale interactions, and regional differences in drivers of fish species richness, which is necessary to understand and manage fish in freshwater ecosystems across broad spatial extents. Surface water connectivity is a prime example of a driver of fish species richness that may vary regionally and interact with other drivers. Surface water connections as well as the position of a lake or stream in the landscape have been linked to species richness and human pressures. For instance, surface water connections between lakes and streams may be positively associated with species diversity because connectivity provides a means of dispersal and refugia during perturbations (Olden et al. 2001; Morelli et al. 2016; King et al. 2021). In addition, lakes and streams lower in the landscape (i.e. lower elevations) tend to be more vulnerable to pressures because low elevation landscapes generally have a higher human population density or accumulate nutrients and contaminants (Solheim et al. 2019). Although commonly studied separately, lakes are not independent from streams nor are streams unaffected by their connections to lakes (Jones 2010; Gardner et al. 2019). Therefore, including lake and stream biota and abiotic characteristics, as well as the connections among them, is needed for understanding the role of surface water connectivity in driving fish species richness (King et al. 2021) and the resilience of freshwater networks to anthropogenic stressors (Soranno et al. 2010; Heino et al. 2020; Kraemer 2020). Surface water connectivity may have different effects on freshwater quality and fish species richness depending on the regional climate, land use or cover, or terrain (e.g. Magnuson et al. 1998; Henriques-Silva et al. 2019).
For example, increasing connections between lakes, streams, and wetlands can reduce extinction risk of native species by providing a means of refuge from changing water temperatures or reduced habitat (McCluney et al. 2014), and contribute to a supply of nutrients that can enhance productivity (Willis and Magnuson 2000). On the other hand, large networks of lakes and streams can allow for the movement of non-native species or, in highly developed areas, could promote the spread of pollutants that negatively affect biota (Fausch et al. 2009; Jackson and Pringle 2010). Therefore, it is not known whether connectivity has similar effects on stream and lake fish species richness across different regions with varying gradients of natural and human landscape features nor how those effects may affect how biota respond to future global change. Our research on the differences and similarities in connectivity of lake and stream ecosystems can aid in explaining variation across regions and across freshwater types to prevent future degradation of biodiversity. Our study investigated drivers at multiple spatial scales for both lake and stream fish diversity with an emphasis on understanding how the role of continuous connectivity metrics between lakes and streams differs regionally. We asked, how does the role of surface water connectivity on fish communities in lakes and streams change relative to other multi-scale drivers? We developed three main hypotheses related to this question. First, we expected continuous connectivity metrics, such as the number of and the distance to lakes, to have an overall positive effect on fish species richness in both lakes and streams because connectivity provides food, refuge, and spawning sites. Second, we expected overall lower average species richness in regions with high agriculture or road density because perturbations can shift community composition toward more common and tolerant species and lower species diversity.
Third, we hypothesized that connectivity may have a negative effect on richness in some regions and a positive effect in other regions. This hypothesis was based on our expectation that land use affects the connectivity-richness relationship. In other words, we expected a cross-scale interaction would be present, such that there would be a positive connectivity-richness relationship in regions with lower levels of agriculture or road density and a negative connectivity-richness relationship in more disturbed regions. We expected the negative relationship because connectivity in disturbed regions may promote excess nutrients causing water quality degradation or non-native species movement.

4.3 Methods

4.3.1 Study Extent and Data

Fish community data collected during 1991-2009 from lakes and streams were compiled from universities and state and federal agencies from the U.S. states of Iowa, Maine, Michigan, New Hampshire, and Wisconsin (Figure 4.1) (Alger 2009; Daniel et al. 2015). More detailed information about the fish dataset can be found in King et al. (2021). Briefly, lake and stream sites were a combination of both random and non-random sampling, including lakes and reservoirs (hereafter, lakes) and wadeable streams and non-wadeable rivers (hereafter, streams). Sampling was performed with the goal of collecting all species present, not targeting specific species. Single pass backpack and boat electrofishing methods were used to collect stream fishes and fyke, trap nets, and/or electrofishing were used to collect lake fishes (King et al. 2021). We processed these data by removing individuals that were not identified to the species level or were hybrids. Due to potential differences in sampling effort across gear types and across lakes and streams, we used rarefaction curves to select only waterbodies with sufficient sampling to estimate species richness (sensu Irz et al. 2007; Niu et al. 2012).
Only stream sites or lake-gear combinations that reached an asymptote (<0.05 degrees) at 90% of the total individuals captured (Yang 2013) were kept for further analysis. The asymptote of a rarefaction curve is an estimate of species richness (i.e. number of different species) within a waterbody based on a random sample of the observed individuals (Gotelli and Chao 2013). These rarefaction estimates for species richness were used in the subsequent models (King et al. 2021). Our dataset included a total of 586 lakes and 652 streams (Figure 4.1). For every sampling site, we selected multi-scale predictor variables that have been shown to drive fish communities in lakes and streams, with a focus on the role of connectivity. We characterized connectivity in the following ways: 1) lake or stream order, which describes the position of a lake or stream in the landscape (Strahler 1957), 2-3) the number of directly connected downstream or upstream lakes and the distance to the nearest downstream or upstream lake, which represent habitat availability, refuge, and the potential for colonization (Olden et al. 2001), and 4) the proportion wetland cover in the 100 m buffer around a lake or stream reach, because connections to wetlands provide fish with refuge, food, spawning, and nesting sites (Jude and Pappas 1992). Note that wetland cover in the 100 m buffer was correlated with proportion wetland in the watershed and region, thus we included only the 100 m buffer in analyses. For lakes, lake order (based on Strahler stream order of the outflowing stream; Martin and Soranno 2006), the distance to the nearest upstream or downstream lake, and the number of directly connected upstream and downstream lakes were obtained from LAGOS-US_NETWORKS v1 (King et al. in review).
For streams, stream order was the Strahler stream order from NHDplus-V2 (USGS 2019), and the distance to the nearest upstream or downstream lake and the number of directly connected upstream and downstream lakes were calculated using the same methods as for lakes (King et al. in review). For isolated lakes we assigned an order of -1. For isolated lakes and for streams not connected to lakes, we assigned zero to the number of directly connected downstream or upstream lakes and assigned a distance 100 km greater than the maximum distance to represent the greatest isolation. The proportion of wetland in the 100 m buffer of lakes was obtained from LAGOS-NE (Soranno et al. 2015; Soranno et al. 2017) and for streams it was obtained from StreamCat (Hill et al. 2016). Other local scale factors that drive fish species richness include the lake and stream area, which are known to have a positive relationship with fish species richness (Mehner et al. 2005; King et al. 2021); elevation, which may affect the ability of a fish species to swim to a lake or stream (Kratz et al. 1997) and is related to temperature; and temperature, which has been shown to structure freshwater fish communities (Allan and Castillo 2007; Wehrly et al. 2012). Lake area and elevation were obtained from LAGOS-NE (Soranno et al. 2015; Soranno et al. 2017). Stream area was calculated from the stream reach length (NHDplus-V2; USGS 2019) and stream width from the sampling records or the Google Earth engine measuring tool on satellite imagery for streams >2nd order (Google Earth Pro 2019). Stream elevation was the mean elevation of the stream watershed from StreamCat (Hill et al. 2016). Water temperature measurements were not available for all water bodies in our study extent, so we used air temperature at the site as a proxy. Lake mean summer maximum air temperatures, calculated as a mean of the maximum air temperatures during the summer (June, July, and August) months, were obtained from Collins et al. (2018).
We used similar methods to extract mean summer maximum air temperatures for stream sites from PRISM (PRISM 2004). In addition to these local-scale factors, we included watershed and HU4 regional drivers. Stream watersheds were defined as including all upstream catchments, and metrics were obtained from StreamCat (Hill et al. 2016). Lake watersheds were defined as the area of land draining directly into the lake, including the connected upstream stream and lake drainage areas of lakes <10 ha, and metrics were obtained from LAGOS-NE (Soranno et al. 2015; Soranno et al. 2017). Watersheds are nested within USGS hydrologic units, which are delineated based on river and groundwater hydrologic traits (Seaber et al. 1987). We used the hydrologic unit 4 (HU4; n=32) as regions in our study (Figure 4.1). We obtained the area of the watershed, which is often correlated with stream density and runoff. We also included dam density within the watershed to represent fragmentation that may lower fish species diversity over time (Cooper et al. 2015). At both the watershed and HU4 scales, road density and proportion urban land use were positively correlated with each other, and proportion forest and proportion agriculture cover were negatively correlated. Therefore, we included only road density, which serves as a proxy for stocking or transport of organisms and introduction of non-native species, and proportion agriculture land cover, which can increase nutrient content and lead to changes in the food web (Allan 2004), at the HU4 scale. We used the baseflow index at the HU4 scale as an indicator of groundwater contribution to the stream network, which can moderate stream temperatures and flow. Finally, broad-scale drivers such as precipitation and temperature can indirectly affect fish diversity over time. However, HU4 temperature was highly correlated with lake or stream air temperature; therefore, we only included HU4 precipitation in models.
All HU4 metrics were obtained from LAGOS-NE (Soranno et al. 2015; Soranno et al. 2017). All variables selected to model had a correlation coefficient <0.7 (Figures 4.7, 4.8, 4.9, 4.10). See Table 4.1 for a summary of the predictor variables. We also defined two larger regions, hereinafter referred to as macro-regions, by combining Iowa, Michigan, and Wisconsin as the midwest (MW; HU4 n=24) and New Hampshire and Maine as the northeast (NE; HU4 n=8) (Figure 4.1). We expected potential differences in the effects of the multi-scale drivers in these two macro-regions because of differences in their ranges of natural features like elevation, precipitation, temperature, land cover, and hydrology, as well as differences in their ranges of human impact, such as agriculture (Abell et al. 2008). In addition, the NE has naturally lower species richness due to the difficulty of recolonizing this area after glaciation (Griffiths 2010), as well as colder temperatures and terrain less suitable for many fish species compared with the MW.

Figure 4.1. Study extent includes 32 HU4 regions and 2 macro-regions (midwest and northeast) in the U.S. Points represent lake (n=586) and stream (n=652) sample sites and black lines are HU4 sub-regions.
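The correlation screen described in Section 4.3.1 (retaining only predictors whose pairwise correlation coefficient was below 0.7) can be illustrated with a short sketch. This is an illustration under stated assumptions rather than the analysis code: the values are made up, `hu4_urban` is a hypothetical column name, and Pearson's r is assumed because the text does not state which correlation coefficient was used.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(columns, threshold=0.7):
    """List predictor pairs whose |r| meets or exceeds the threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson_r(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 2)))
    return flagged


# Made-up HU4-scale values for illustration only.
toy_predictors = {
    "hu4_roaddensity": [1.2, 1.9, 2.4, 3.1, 5.0],
    "hu4_urban": [0.02, 0.04, 0.05, 0.08, 0.12],   # hypothetical variable name
    "hu4_precip": [900, 760, 1010, 840, 880],
}

flagged = correlated_pairs(toy_predictors)
```

With these toy values only the road density and urban pair is flagged, which mirrors the reasoning in the text for keeping road density and dropping urban land use.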
predictors water body area (km2) stream/lake order # upstream lakes # downstream lakes midwest northeast abbreviation min med mean max min med mean max 0.00016 0.20 1.5 0.00005 0.009 0.86 area order #up_lakes #down_lakes 2.0 0.0 1.0 190 100 2.0 2.9 2.2 110 690 60 8.0 360 13 190 2030 0.15 0.20 0.96 -1.0 0.0 0.0 0.0 0.0 0.0 -1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 190 21 1.7 2.0 1.0 106 720 32 6.0 68 2.0 190 2030 0.24 0.25 0.77 24 15 230 0.0 0.92 0.04 0.50 1100 27 24 21000 280 740 250 0.70 0.01 3.1 1.3 0.07 0.03 0.51 0.54 1100 1200 distance to nearest lake upstream (km) dist_down_lake distance to nearest lake downstream (km) dist_up_lake wetland in 100m buffer (proportion) wetland_buf mean max summer air temperature (oC) watershed area (km2) watershed elevation (m) watershed dam density (dams/km2) HU4 road density (km/km2) HU4 agriculture (proportion) HU4 mean baseflow HU4 30year mean precipitation(mm) 22 tmax_summer 0.04 area_ws 170 elev_ws dam_dens_ws 0.0 hu4_roaddensity 1.2 0.02 hu4_agriculture 0.14 hu4_baseflow hu4_precip 700 26 26 952 29 335 320 0.01 0.0 2.0 1.9 0.34 0.42 0.55 0.55 840 840 20 30 290000 0.27 530 1.1 5.2 0.83 0.78 1010 16 0.0 0.75 0.02 0.48 1060

Table 4.1. Local, watershed, and HU4-scale predictors used in models, as well as their minimum (min), median (med), mean, and maximum (max) values across the two macro-regions. In the midwest, n=516 lakes and 458 streams and in the northeast n=70 lakes and 194 streams.

4.3.2 Analysis

There were 11 local/watershed-scale predictors, six of which were connectivity metrics, and four HU4-scale predictors used to explain variation in both lake and stream fish species richness. Prior to modeling, variables that ranged from 0-1 were arcsine-square-root transformed and all others were natural-log transformed, except for temperature, elevation, and precipitation, which were already normally distributed. All predictor variables were standardized (mean = 0, SD = 1).
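The transformations described above can be sketched as follows. This is a minimal illustration, not the authors' code: the input values are made up, the function names are hypothetical, and the sample standard deviation (n - 1 denominator) is an assumption. The text does not say how zero or negative values (for example, an order of -1) were handled before log transformation, so the sketch uses strictly positive values.

```python
import math
import statistics


def arcsine_sqrt(proportion):
    """Arcsine-square-root transform for a proportion in [0, 1]."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(proportion))


def standardize(values):
    """Center to mean 0 and scale to SD 1 (sample SD is an assumption)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Made-up values for illustration only.
agriculture_proportion = [0.05, 0.20, 0.45, 0.80]
watershed_area_km2 = [0.5, 12.0, 150.0, 2400.0]

# Proportions: arcsine-square-root, then standardize.
agriculture_z = standardize([arcsine_sqrt(p) for p in agriculture_proportion])

# Other positive predictors: natural log, then standardize.
watershed_area_z = standardize([math.log(a) for a in watershed_area_km2])
```

Standardizing after transformation puts all predictors on a common scale, so the estimated effects can be compared directly across predictors.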
We also tested whether there was significant regional-scale (HU4) variation in fish species richness. We used the intraclass correlation coefficient to quantify the among-HU4 region variation in average species richness for streams and lakes separately.

Separate Bayesian hierarchical Poisson models were used for lakes and streams with the same local- and regional-scale drivers to simultaneously identify important drivers of fish species richness and compare across freshwater types. The hierarchical model was as follows:

$$y_i \sim \mathrm{Poisson}(\lambda_i), \qquad \log(\lambda_i) = \alpha_{j(i)} + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}$$

$$\alpha_j \sim N\left(\gamma_0 + \gamma_1 z_{1j} + \cdots + \gamma_q z_{qj},\ \sigma_\alpha^2\right), \quad \text{for } j = 1, \ldots, J \qquad (4.1)$$

where $y_i$ was richness for lake or stream $i$, $\alpha_j$ was the region-specific (HU4-specific) intercept, $\beta_1 \ldots \beta_p$ were the estimated effects of local-scale predictors $x_1 \ldots x_p$, $\gamma_0$ was the fixed intercept, $\gamma_1 \ldots \gamma_q$ were the estimated effects of the region-scale predictors $z_1 \ldots z_q$, and $\sigma_\alpha^2$ was the conditional among-region variance. The effects of the local-scale and regional-scale predictors received a horseshoe prior, which allows for better separation of potentially correlated values. Predictors were deemed important if the corresponding 90% credible interval did not overlap with zero.

A Bayesian hierarchical Poisson model, similar to that described above, was used to quantify potential cross-scale interactions affecting the connectivity-richness relationship. Models were fitted for streams and lakes separately. The model was a varying intercept, varying slope model with the local-scale connectivity predictors and two HU4 anthropogenic disturbance predictors (i.e. agriculture and road density) used to model the variability in the slopes describing the connectivity-richness relationships across HU4s (i.e. CSIs). The model also included a macro-region effect, where HU4s in Maine and New Hampshire were considered the northeast (NE) and HU4s in Iowa, Michigan, and Wisconsin were considered the midwest (MW). Cross-scale interactions were deemed important if the corresponding 90% credible interval did not overlap with zero.
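The 90% credible interval rule used to flag important predictors and cross-scale interactions can be sketched as below. This is an illustration under assumptions: an equal-tailed interval (5th and 95th percentiles of the posterior draws) is assumed because the text does not state the interval type, the draws are synthetic and evenly spaced rather than real MCMC output, and the function names are hypothetical.

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def credible_interval(draws, level=0.90):
    """Equal-tailed credible interval from posterior draws (an assumption)."""
    ordered = sorted(draws)
    tail = (1.0 - level) / 2.0
    return quantile(ordered, tail), quantile(ordered, 1.0 - tail)


def is_important(draws, level=0.90):
    """True when the credible interval does not overlap zero."""
    low, high = credible_interval(draws, level)
    return low > 0.0 or high < 0.0


# Synthetic draws standing in for MCMC output (illustrative only).
draws_negative = [(i - 100) / 250 for i in range(101)]   # spans -0.40 to 0.00
draws_overlap = [(i - 25) / 250 for i in range(101)]     # spans -0.10 to 0.30

negative_interval = credible_interval(draws_negative)
overlap_interval = credible_interval(draws_overlap)
```

In this toy example the first effect would be deemed important because its interval lies entirely below zero, whereas the second would not because its interval spans zero.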
The hierarchical model was as follows:

$$\begin{pmatrix} \alpha_j \\ \beta_{pj} \end{pmatrix} \sim \mathrm{MVN}\left( \begin{pmatrix} \theta_{0k}^{\alpha} + \theta_{1k}^{\alpha} \cdot Z_{1j} + \theta_{2k}^{\alpha} \cdot Z_{2j} \\ \theta_{0k}^{\beta_p} + \theta_{1k}^{\beta_p} \cdot Z_{1j} + \theta_{2k}^{\beta_p} \cdot Z_{2j} \end{pmatrix},\ \Sigma \right), \quad \text{for } j = 1, \ldots, J \text{ and } k = 1, 2 \qquad (4.2)$$

where $y_i$ was richness for lake or stream $i$, $\alpha_j$ was the HU4-specific intercept, and $\beta_{1j} \ldots \beta_{pj}$ were the HU4-specific slopes that describe the relationship between each local predictor variable $x_1 \ldots x_p$ and richness for HU4 region $j$. $Z_{1j}$ and $Z_{2j}$ are HU4 agriculture and road density, respectively. $\theta_{0k}^{\alpha}$ is the grand mean intercept (across all lakes or streams), and $\theta_{1k}^{\alpha}$ and $\theta_{2k}^{\alpha}$ are the effects of agriculture and road density on the intercept, where $k$ indexes the MW or NE macro-region. $\theta_{0k}^{\beta_p}$ are the varying intercepts and $\theta_{1k}^{\beta_p}$ and $\theta_{2k}^{\beta_p}$ are the slopes describing the relationships between HU4-scale predictors and the slopes of the connectivity-richness relationships for each region $k$ (e.g. $\theta_{1k}^{\beta_p}$ represents the estimated CSIs with agriculture and $\theta_{2k}^{\beta_p}$ represents the estimated CSIs with road density). Diffuse normal priors were used for all $\theta^{\alpha}$ and $\theta^{\beta_p}$ parameters, and we modeled the variance-covariance matrix ($\Sigma$) using the scaled inverse-Wishart distribution (Gelman and Hill 2007).

To estimate parameters, we ran three parallel Markov chains, each with 50,000 total samples, of which we discarded the first 40,000. We assessed convergence for all parameters with the Brooks-Gelman-Rubin statistic, where values <1.1 indicated convergence for all models (Brooks and Gelman 1998). Analyses were run from within R (R Core Team 2021) using the JAGS program (Plummer 2003) from the “R2jags” package (Su and Yajima 2020).

4.4 Results

4.4.1 Regional variation in fish species richness: unconditional model

Fish species richness differed among HU4 regions in both lakes and streams. For streams, the intraclass correlation coefficient (ICC) was 29%, indicating that 29% of the total variation in fish species richness was attributed to across-HU4 region differences.
For lakes, the ICC was 15%, indicating 15% of the total variation in lake fish species richness was attributed to across-region differences. The unconditional models show that a multilevel approach will account for this regional variation in fish species richness by allowing for random intercepts among regions. Thus, the remaining analysis used hierarchical models to account for regional variation in fish species richness.

4.4.2 Identifying multi-scale drivers of fish species richness across lakes and streams

Important connectivity predictor variables varied among lakes and streams, with distance to downstream lake having a negative effect on lake species richness and distance to upstream lake having a negative effect on stream species richness (Figure 4.2). However, for both streams and lakes, waterbody area, elevation, and precipitation had consistent positive, negative, and negative effects on richness, respectively. In lakes, HU4 road density was important for explaining variation in species richness, having a positive effect. In streams, watershed area, watershed dam density, and proportion of agriculture cover in a HU4 all had a positive effect on species richness.

Figure 4.2. Estimated effects of local and regional scale predictors on lakes (a) and streams (b). Blue values represent important predictors of fish species richness when the 90% credible intervals do not overlap zero. Refer to Table 4.1 for a description of predictor variables and units.

4.4.3 Regional differences in average species richness in lakes and streams

We found differences in the mean species richness across macro-regions and across HU4s in the midwest (MW), but not the northeast (NE). In general, the NE region had lower mean species richness than the MW region for both lakes and streams. In the MW, mean species richness in lakes ranged from 6-11 and from 6-13 in streams. Mean species richness in NE lakes ranged from 6-7 and from 5-8 in NE streams.
However, we did not find a significant relationship between agriculture and the HU4 mean species richness in lakes in the MW (posterior mean = 0.02, 90% CRI = -0.20, 0.22) or NE (posterior mean = 0.15, 90% CRI = -2.92, 3.30) regions (Figure 4.3a), or for streams in either region (MW posterior mean = -0.09, 90% CRI = -0.43, 0.27; NE posterior mean = -0.97, 90% CRI = -4.40, 2.41; Figure 4.3b). Similarly, we did not find a significant relationship between road density and the HU4 mean species richness in lakes in the MW (posterior mean = 0.04, 90% CRI = -0.16, 0.23) or NE (posterior mean = 0.05, 90% CRI = -0.27, 0.38) regions (Figure 4.4a), or for streams in the MW (posterior mean = -0.07, 90% CRI = -0.40, 0.25) or NE (posterior mean = -0.04, 90% CRI = -0.49, 0.42) regions (Figure 4.4b).

Figure 4.3. The effect of proportion agriculture cover within a HU4 on the HU4 specific log-mean richness in a) lakes and b) streams. Points represent HU4 specific log-mean richness and lines from the values are 95% credible intervals. The gray shaded area is the 95% credible region for the fitted line. The 90% credible interval for the effect of agriculture in both regions and in both panels overlapped 0. The x-axis labels were back-transformed for clarity.

Figure 4.4. The effect of road density within a HU4 on the HU4 specific log-mean richness in a) lakes and b) streams. Points represent HU4 specific log-mean richness and lines from the values are 95% credible intervals. The gray shaded area is the 95% credible region for the fitted line. The 90% credible interval for the effect of road density in both regions and in both panels overlapped 0.

4.4.4 Cross-scale interactions affect the connectivity-species richness relationship in lakes and streams

There was spatial variation in the connectivity-species richness relationship for lakes and streams in the midwest (MW) and in the northeast (NE).
For lakes, slope estimates for the relationship between the distance to the nearest upstream lake and richness ranged from -0.35 to 0.42 across HU4s in the MW region and from -0.23 to 0.55 in the NE region. For streams, the slope estimates for the same relationship ranged from -0.50 to 0.40 in the MW, but were mostly positive in the NE (-0.23 to 0.34). In examining the presence of cross-scale interactions (CSIs), we found that agriculture was important for streams in the MW (Figure 4.5b). Agriculture had a negative effect on the relationship between the distance to upstream lake and richness, whereby an increase in HU4 agriculture cover corresponded with that relationship becoming more negative (posterior mean = -0.22, 90% CRI = -0.41, -0.03; Figure 4.6b,d). Although no significant CSIs were detected in MW lakes, the posterior probability that agriculture has a positive effect on the relationship between distance to the nearest upstream lake and richness is 91% (posterior mean = 0.18, 90% CRI = -0.04, 0.40; Figure 4.5a). Interestingly, this effect of agriculture in lakes is opposite in direction to the one we found for streams (Figure 4.6a,c). The effect of agriculture on the relationship between distance to the nearest upstream lake and richness differed significantly between lakes and streams in the MW (95% CRI = 0.07, 0.74). No CSIs were detected in the NE region for either lakes or streams (Figure 4.5c,d).

Figure 4.5. Cross-scale interaction estimates between each local connectivity-species richness relationship and HU4 agriculture (Ag) or road density (RoadDens) in midwest lakes (a), midwest streams (b), northeast lakes (c), or northeast streams (d). Blue values mark significant CSIs, i.e., those whose 90% credible intervals do not overlap zero. Refer to Table 4.1 for predictor descriptions and units.

Figure 4.6.
Graph of the cross-scale interactions in the midwest between HU4-scale proportion agriculture cover and the slope of the relationship between distance to the nearest upstream lake and richness in lakes (a) and streams (b). The x-axis labels in panels a-b were back-transformed for clarity. Each point represents an estimated mean HU4 slope, which is displayed geographically for lakes (c) and streams (d). The lines represent 95% credible intervals for each HU4 value and the gray shaded area is the 95% credible region for the fitted line. The 90% credible interval for the effect of HU4 agriculture on the relationship between the distance to upstream lake and species richness in streams did not overlap zero (b), whereas the 90% credible interval for lakes did overlap zero (a) (see text for details).

4.5 Discussion

Surface water connections among lakes and streams at broad spatial extents create extensive networks that provide habitat for fish in both freshwater types. Using hierarchical and spatially varying modeling, we found that continuous surface water connectivity metrics were important for understanding variation in species richness in both lakes and streams, and that differences in the effect of connectivity on species richness across HU4 regions could be partially explained by agricultural land use (i.e., a cross-scale interaction). However, the important connectivity metrics and the effect of land use on the connectivity-species richness relationship differed across the freshwater types. These findings suggest that studies of broad-scale fish species richness in both lakes and streams should include not only multi-scale drivers, but also cross-scale interactions. Multi-scale processes can influence how fish communities are structured, and because these processes are likely to vary in different ecological contexts (e.g., climate, land cover), there is a need for broad-scale understanding of this variability.
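The posterior summaries reported in Section 4.4.4 can be cross-checked against one another with a short calculation. The Python sketch below is illustrative only and is not the analysis code used in this chapter: it assumes each posterior is approximately normal, recovers a standard deviation from the reported 90% credible interval, and from that approximates the posterior probability that an effect is positive. The function name `approx_prob_positive` is hypothetical; in the actual analysis such probabilities would be computed directly from the MCMC draws.

```python
from statistics import NormalDist


def approx_prob_positive(post_mean, cri_low, cri_high, level=0.90):
    """Approximate P(effect > 0) from a posterior mean and a central credible interval.

    Assumption (not from the source): the posterior is roughly normal, so the
    interval half-width equals z * sd for the matching standard-normal quantile.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.645 for a 90% interval
    sd = (cri_high - cri_low) / (2.0 * z)        # implied posterior standard deviation
    return 1.0 - NormalDist(mu=post_mean, sigma=sd).cdf(0.0)


# Reported summaries for the agriculture cross-scale interaction in the MW
p_lakes_positive = approx_prob_positive(0.18, -0.04, 0.40)            # lakes
p_streams_negative = 1.0 - approx_prob_positive(-0.22, -0.41, -0.03)  # streams

print(f"P(effect > 0), MW lakes:   {p_lakes_positive:.2f}")
print(f"P(effect < 0), MW streams: {p_streams_negative:.2f}")
```

Under the normality assumption, the MW lake summary (posterior mean = 0.18, 90% CRI = -0.04, 0.40) yields a probability of about 0.91 that the effect is positive, in line with the 91% reported in the text.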
Surprisingly, we found that regional (HU4) agriculture and road density were not important for explaining broad-scale differences in mean species richness in lakes or streams (Figures 4.3, 4.4). This lack of relationship in the NE and absence of cross-scale interactions may be partly due to small ranges in both mean fish species richness (5-8) and agriculture cover at the HU4 scale (2-7%). However, although agricultural cover in the midwest ranged from about 2% to 83%, we still found no relationship between HU4 agriculture and HU4 mean fish species richness (range 6-13). Other broad-scale drivers, such as precipitation and temperature, may be driving differences in regional species richness. For example, midwest HU4s with low agriculture and low mean species richness were located in northern Michigan, which is similar in climate and geography to the northeast, and likewise supports cold water and fewer fish species. Cold water species have also been difficult to capture with all gear types in Michigan (Wehrly et al. 2012); therefore, mean species richness in these HU4s could be underestimated.

4.5.1 Differences and similarities in drivers of fish species richness between lakes and streams

We found that connectivity had a positive effect on fish species richness in both lakes and streams. As the distance from a lake or stream to the nearest lake increased, species richness decreased (Figure 4.2). This result supported our hypothesis and likely points to the importance of having a lake nearby to provide habitat or a means of colonization for both lake and stream fish. Interestingly, by investigating both lakes and streams, we were able to identify that the distance to the nearest downstream lake was important for fish species richness in lakes, whereas the distance to the nearest upstream lake was important in streams.
These results may arise because flowing water carries food and nutrients from upstream, thus increasing productivity in a stream and supporting more diversity (Willis and Magnuson 2000). In contrast, it may be that when lake fish experience environmental stress such as low oxygen during summer drought or winter ice cover, they move downstream for temporary refuge (Tonn and Magnuson 1982).

In addition to connectivity, we found that two local-scale drivers (waterbody area and elevation) and one regional-scale driver (precipitation) were important for species richness in both lakes and streams (Figure 4.2). Previous comparative studies have also related fish species richness and community composition to both local and broad-scale factors such as morphometry and climate (e.g., Allan et al. 1997; Jackson et al. 2001; Sharma et al. 2011). For example, waterbody area has a positive effect on species richness in both freshwater types, most likely because large water bodies inherently have more habitat and greater habitat complexity, and so support more individuals and more species than smaller systems (MacArthur and Wilson 1967; Eadie et al. 1986). Similarly, a negative relationship between elevation and species richness is unsurprising given that lakes and streams at higher elevations are generally smaller and more isolated (Kratz et al. 1997). Finally, there are a variety of reasons why high levels of precipitation can negatively affect species richness. For example, increases in runoff from precipitation can deliver sediment and nutrients to lakes and streams, causing turbidity and reducing light availability for submerged aquatic vegetation and periphyton, which, in turn, reduces food and shelter availability for fish (Allan and Castillo 2007).
Additionally, high amounts of precipitation can change the shape of stream channels, causing flashier flows and favoring highly tolerant species, which can shift assemblages from native toward non-native species that have a broad physiological tolerance of low or high flows (Allan and Castillo 2007).

We also found a few differences in which broad-scale drivers were important for understanding variation in species richness for lakes versus streams. For example, HU4 agricultural land cover explained variation in stream species richness, but not lake species richness (Figure 4.2). A study comparing lake and stream biotic responses to total phosphorus (TP) concentration found a positive correlation between elevated TP and stream fish communities, but less clear patterns in lakes (Johnson et al. 2014). This result may indicate that fish respond more strongly to nutrient concentrations (and other effects of agriculture like increased sediment) in streams than in lakes, possibly due to the stronger lateral connectivity between streams and the surrounding landscape. Other national-scale studies also found a similar positive relationship between intolerant fish species in streams and agriculture (Esselman et al. 2011). This positive relationship between fish species richness and agriculture may be driven by several factors. First, in our study extent, land use and other natural features may be confounded. Coarse geology, which promotes cold waters that support fewer species, occurs in the north, whereas finer geology, which supports warm waters and more species as well as good conditions for agriculture, occurs in the south. Second, there could be an interaction between agriculture and local-scale variables that we did not test in our models (e.g., riparian vegetation), which would reduce the stressors caused by agricultural land cover (Gergel et al. 2002).
Finally, this result may be due to the presence of non-native fish species, since previous research has shown that non-native richness increases in agricultural areas, which contributes to an overall increase in total species richness (Peoples et al. 2020). Investigating the confounding effects of broad-scale natural and anthropogenic features and including more local-scale drivers might give insight into mechanistic processes occurring across scales.

A second difference in drivers of fish species richness between lakes and streams was the effect of HU4 road density. We expected a negative effect of road density on both lake and stream richness because impervious surfaces increase runoff, which can increase turbidity and pollutants (Trombulak and Frissell 2000; Allan 2004). However, we found no effect on streams and a positive effect of road density on lake species richness (Figure 4.2). This result in lakes may signal increased human transport of fishes along roadways, whether by stocking, pet release, or unintentionally through fishing boats (Trombulak and Frissell 2000; Stohlgren et al. 2006). Our study of both lake and stream fish communities increases understanding of the similarities and differences among multi-scale drivers of fish species richness.

4.5.2 Differences in connectivity-fish species richness relationships across HU4 regions and freshwater types

Although we found that connectivity had a positive effect on species richness overall in both lakes and streams, our macroscale study was important for elucidating that this relationship varied across HU4 regions. In addition, by studying both lake and stream fishes, we found differences in the effects of the cross-scale interaction across freshwater types, which are consistent with some previous studies at local scales.
In midwestern streams, we found a cross-scale interaction between agricultural land use and the relationship between the distance to the nearest upstream lake and fish species richness (Figure 4.5b). In less disturbed HU4 regions, the relationship between distance from the nearest upstream lake and stream species richness was positive. In contrast, in HU4 regions with high agricultural land use, distance from the nearest upstream lake had a negative effect on stream species richness (Figure 4.6b). We pose the following possible explanations for these contrasting results. First, we hypothesize that in low agriculture regions, predator fish in nearby upstream lakes may move into streams and prey on stream fish, reducing species richness by shifting assemblages to lentic species and habitat generalists (Schlosser et al. 2000; Herbert and Gelwick 2003). In contrast, fish assemblages in high agriculture regions tend to be more influenced by watershed factors relative to local-scale factors (Wang et al. 2006) and experience poor water quality, so they are more likely to be dominated by tolerant species and non-native species. Therefore, the presence of a lake in such high agriculture regions may lead to increased species richness of nearby streams by trapping excess particulates and nutrients (Kling et al. 2000; Zhang et al. 2012), thereby improving water quality of downstream waters and allowing more species to survive. In addition, the type of lake upstream of a stream site may have important effects on stream fish communities. For example, a natural lake might provide habitat for stream fish, whereas a reservoir that modifies the natural flow, sediment transport, and migration of fish may not (Søndergaard and Jeppesen 2007). These possible mechanisms, which could explain this important broad-scale pattern in fish species richness and the role of lakes in shaping stream biota, warrant further research.
Agriculture had the opposite effect on the connectivity-species richness relationship in lakes (Figure 4.6a). In low agriculture HU4 regions, distance to a nearby upstream lake and lake species richness had a negative relationship, as we expected. This result can be explained by the fact that nearby lakes have been shown to provide fish with refuge and source fish populations that facilitate higher species richness (Magnuson et al. 1998). In HU4 regions with a higher proportion of agricultural land cover, we found that lower connectivity to other lakes was positively related to species richness, perhaps due to less transport of excess nutrients and sediment from streams causing water quality degradation (Jackson and Pringle 2010) or less chance of non-native species moving into the area (Fausch et al. 2009). Our broad-scale study elucidates an important relationship between connectivity and fish species richness that changes regionally, depends on the freshwater type, and can be a starting point to develop local-scale or experimental studies to further understand the mechanisms underlying these patterns.

4.5.3 Conclusions and Management Implications

We found that both lake and stream fish species richness benefit from being connected to other water bodies, although the way fish use connected water bodies differs across freshwater types. These results have many implications for science and management of freshwaters. For example, our study supports the idea that ecological drivers across scales can influence how biota may respond to future global change (e.g., Scheffer et al. 2015; Wagner et al. 2016), which can help managers plan at finer scales by understanding how the regional landscape may constrain local drivers. For stream fishes in particular, managers will want to consider the upstream lakes' effect on downstream waters.
Given the cross-scale interactions we found in our research, managers will also want to consider that the relationships between connectivity and species diversity may vary across regions, and may be particularly important in regions vulnerable to human impacts. Therefore, decisions on reconnecting waterways will need to take a multi-scale approach that uses information from a broader geographic area than the immediate watershed. All of these implications point to the importance of future integrated investigations of lakes and streams for understanding patterns and processes related to fish species richness, which will then translate into better predictions of high-risk areas for biodiversity loss and better targeting of areas for conservation to prevent future degradation of biodiversity.

4.6. Acknowledgements

We thank state and federal agencies for collecting and supporting the assemblage of these fish data, including the US Fish and Wildlife Service, USGS, Environmental Protection Agency, Michigan Department of Natural Resources Status and Trends program, Wisconsin Department of Natural Resources, Iowa DNR, Illinois Department of Natural Resources, Maine Department of Environmental Protection, and New Hampshire Fish and Game. Thanks to Mary Tate Bremigan and Dana Infante for their collaboration with data sources. We also thank Dana Infante, Patricia Soranno, and Gary Roloff for their comments and feedback on initial drafts.

APPENDIX

Figure 4.7. Correlation of watershed drivers

Figure 4.8. Correlation of connectivity metrics

Figure 4.9. Correlation matrix of HU4 drivers

Figure 4.10.
Correlation matrix across HU4 and local scales

REFERENCES

Abell, R., Thieme, M.L., Revenga, C., Bryer, M., Kottelat, M., Bogutskaya, N., Coad, B., Mandrak, N., Balderas, S.C., Bussing, W., Stiassny, M.L.J., Skelton, P., Allen, G.R., Unmack, P., Naseka, A., Ng, R., Sindorf, N., Robertson, J., Armijo, E., Higgins, J.V., Heibel, T.J., Wikramanayake, E., Olson, D., López, H.L., Reis, R.E., Lundberg, J.G., Pérez, M.H.S., and Paulo, P. 2008. Freshwater Ecoregions of the World: A New Map of Biogeographic Units for Freshwater Biodiversity Conservation. BioScience 58:403–414. https://doi.org/10.1641/B580507.
Alger, B.M. 2009. Measuring Anthropogenic Disturbances in a hydrogeomorphic-based lake classification built using fish assemblages in 360 north-temperate lakes. M. Sc. thesis, Department of Fisheries and Wildlife, Michigan State University, East Lansing, Michigan.
Allan, J. D. 2004. Landscapes and Riverscapes: The Influence of Land Use on Stream Ecosystems. Annual Review of Ecology, Evolution, and Systematics 35:257–84. https://doi.org/10.1146/annurev.ecolsys.35.120202.110122.
Allan, J. D., and Castillo, M. M. 2007. Stream Ecology 2nd Ed. Springer, Dordrecht, The Netherlands.
Allan, J. D., Erickson, D. L., and Fay, J. 1997. The influence of catchment land use on stream integrity across multiple spatial scales. Freshwater Biology 37:149–161. https://doi.org/10.1046/j.1365-2427.1997.d01-546.x.
Beisner, B. E., Peres-Neto, P. R., Lindström, E. S., Barnett, A., and Longhi, M. L. 2006. The Role of Environmental and Spatial Processes in Structuring Lake Communities from Bacteria to Fish. Ecology 87:2985–2991.
Bishop-Taylor, R., Tulbure, M.G. and Broich, M. 2015. Surface water network structure, landscape resistance to movement and flooding vital for maintaining ecological connectivity across Australia’s largest river basin. Landscape Ecology 30:2045. https://doi.org/10.1007/s10980-015-0230-4.
Brooks, S.P. and Gelman, A. 1998.
General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics 7:434–455.
Calabrese, J. M., and Fagan, W. F. 2004. A comparison-shopper’s guide to connectivity metrics. Frontiers in Ecology and the Environment 2:529–536.
Cardille, J. A., Carpenter, S. R., Coe, M. T., Foley, J. A., Hanson, P. C., Turner, M. G., and Vano, J. A. 2007. Carbon and water cycling in lake-rich landscapes: Landscape connections, lake hydrology, and biogeochemistry. Journal of Geophysical Research 112:1–18. https://doi.org/10.1029/2006JG000200.
Collins, S.M., I.M. McCullough, S.K. Oliver, and Skaff, N.K. 2018. LAGOS-NE Annual, seasonal, and monthly climate data for lakes and watersheds in a 17-state region of the U.S. ver 1. Environmental Data Initiative. https://doi.org/10.6073/pasta/4abe86a2c00dc9a628924aa149d7bf34. Accessed 12/23/2020.
Cooper, A.R., and Infante, D.M. 2017. Dam Metrics Representing Stream Fragmentation and Flow Alteration for the Conterminous United States Linked to the NHDPLUSV1: U.S. Geological Survey data release. https://doi.org/10.5066/F7FN14C5.
Cooper, A.R., Infante, D.M., Daniel, W.M., Wehrly, K.E., Wang, L., and Brenden, T.O. 2017. Assessment of dam effects for streams and fish assemblages of the conterminous USA. Science of the Total Environment. https://doi.org/10.1016/j.scitotenv.2017.02.067.
Cormen, T.H., Leiserson, C.E., Rivest, R.L., and Stein, C. 2001. Section 22.3: Depth-first search and Section 24.3: Dijkstra's algorithm. In Introduction to Algorithms (2nd Ed., pp. 540–549, 595–601). MIT Press and McGraw-Hill. ISBN 0-262-03293-7.
Covino, T. 2016. Hydrologic connectivity as a framework for understanding biogeochemical flux through watersheds and along fluvial networks. Geomorphology 277:133–144. https://doi.org/10.1016/j.geomorph.2016.09.030.
Csardi, G. and Nepusz, T. 2006. The igraph software package for complex network research. InterJournal Complex Systems 1695. https://igraph.org.
Daniel, W. M., Infante, D. M., Hughes, R. M., Tsang, Y.-P., Esselman, P. C., Wieferich, D., Herreman, K., Cooper, A.R., Wang, L., and Taylor, W. W. 2015. Characterizing coal and mineral mines as a regional source of stress to stream fish assemblages. Ecological Indicators 50:50–61. https://doi.org/10.1016/j.ecolind.2014.10.018.
Eadie, J. M., Hurly, T. A., Montgomerie, R. D., and Teather, K. L. 1986. Lakes and rivers as islands: species-area relationships in the fish faunas of Ontario. Environmental Biology of Fishes 15.
Eros, T., Olden, J. D., Schick, R. S., Schmera, D., and Fortin, M. J. 2012. Characterizing connectivity relationships in freshwaters using patch-based graphs. Landscape Ecology. https://doi.org/10.1007/s10980-011-9659-2.
ESRI. 2014. ArcGIS Desktop: Release 10.3. Redlands, CA: Environmental Systems Research Institute.
Esselman, P. C., Infante, D. M., Wang, L., Wu, D., Cooper, A. R., and Taylor, W. W. 2011. An index of Cumulative Disturbance to River Fish Habitats of the Conterminous United States from Landscape Anthropogenic Activities. Ecological Restoration, North America 29:133–151. https://doi.org/10.3368/er.29.1-2.133.
Fausch, K. D., Rieman, B. E., Dunham, J. B., Young, M. K., and Peterson, D. P. 2009. Invasion versus Isolation: Trade-offs in Managing Native Salmonids with Barriers to Upstream Movement. Conservation Biology 23:859–870. https://doi.org/10.1111/j.1523-1739.2008.01159.x.
Fergus, C., J. Lapierre, S. Oliver, N. Skaff, K. Cheruvelil, K. Webster, C. Scott, and Soranno, P. 2017. Integrated freshwater abundance and connectivity clusters at the Hydrologic Unit 8 scale for the Midwest and Northeast U.S.A. freshwater metric variables and k-means cluster assignment ver 2. Environmental Data Initiative. https://doi.org/10.6073/pasta/e69b7495674e403baa19a22ffbfb40e1.
Gardner, J.R., Pavelsky, T.M., and Doyle, M.W. 2019. The Abundance, Size, and Spacing of Lakes and Reservoirs Connected to River Networks. Geophysical Research Letters.
https://doi.org/10.1029/2018GL080841.
Gelman, A., and Hill, J. 2007. Data analysis using regression and multilevel/hierarchical models. Cambridge, UK: Cambridge University Press.
Gergel, S. E., Turner, M. G., Miller, J. R., Melack, J. M., and Stanley, E. H. 2002. Landscape indicators of human impacts to riverine systems. Aquatic Sciences 64:118–128. https://doi.org/10.1007/s00027-002-8060-2.
Google Earth Pro. NOAA. https://www.google.com/earth. Accessed 11/01/2019.
Gotelli, N. J. and Chao, A. 2013. Measuring and Estimating Species Richness, Species Diversity, and Biotic Similarity from Sampling Data. Encyclopedia of Biodiversity: Second Edition 5:195–211. https://doi.org/10.1016/B978-0-12-384719-5.00424-X.
Griffiths, D. 2010. Pattern and process in the distribution of North American freshwater fish. Biological Journal of the Linnean Society 100:46–61.
Griffiths, D. 2015. Connectivity and vagility determine spatial richness gradients and diversification of freshwater fish in North America and Europe. Biological Journal of the Linnean Society 116:773–786. https://doi.org/10.1111/bij.12638.
Heffernan, J. B., Soranno, P. A., Angilletta, M. J., Buckley, L. B., Gruner, D. S., Keitt, T. H., Kellner, J.R., Kominoski, J.S., Rocha, A.V., Xiao, J., Harms, T.K., Goring, S.J., Koenig, L.E., McDowell, W.H., Powell, H., Richardson, A.D., Stow, C.A., Vargas, R. and Weathers, K. C. 2014. Macrosystems ecology: Understanding ecological patterns and processes at continental scales. Frontiers in Ecology and the Environment 12:5–14. https://doi.org/10.1890/130017.
Heino, J., Alahuhta, J., Bini, L. M., Cai, Y., Heiskanen, A., Hellsten, S., Kortelainen, P., Kotamaki, N., Tolonen, K.T., Vihervaara, P., Vilmi, A., and Angeler, D. G. 2021. Lakes in the era of global change: moving beyond single-lake thinking in maintaining biodiversity and ecosystem services. Biological Reviews 96:89–106. https://doi.org/10.1111/brv.12647.
Henriques-Silva, R., Logez, M., Reynaud, N., Tedesco, P.
A., Brosse, S., Januchowski-Hartley, S. R., Oberdorff, T., and Argillier, C. 2019. A comprehensive examination of the network position hypothesis across multiple river metacommunities. Ecography 42. https://doi.org/10.1111/ecog.03908.
Herbert, M. E., and Gelwick, F. P. 2003. Spatial Variation of Headwater Fish Assemblages Explained by Hydrologic Variability and Upstream Effects of Impoundment. Copeia 2:273–284.
Hill, R. A., Weber, M. H., Debbout, R. M., Leibowitz, S. G., and Olsen, A. R. 2018. The lake-catchment (LakeCat) dataset: Characterizing landscape features for lake basins within the conterminous USA. Freshwater Science 37:208–221.
Hill, R.A., Weber, M.H., Leibowitz, S.G., Olsen, A.R., and Thornbrugh, D.J. 2016. The Stream-Catchment (StreamCat) Dataset: A Database of Watershed Metrics for the Conterminous United States. Journal of the American Water Resources Association 52:120–128. https://doi.org/10.1111/1752-1688.12372.
Irz, P., Michonneau, F., Oberdorff, T., Whittier, T. R., Lamouroux, N., Mouillot, D., and Argillier, C. 2007. Fish community comparisons along environmental gradients in lakes of France and north-east USA. Global Ecology and Biogeography. https://doi.org/10.1111/j.1466-8238.2006.00290.x.
Jackson, D. A., Peres-Neto, P. R., and Olden, J. D. 2001. What controls who is where in freshwater fish communities--the roles. Canadian Journal of Fisheries and Aquatic Sciences 58. https://doi.org/10.1139/cjfas-58-1-157.
Jackson, C. R., and Pringle, C. M. 2010. Ecological Benefits of Reduced Hydrologic Connectivity in Intensively Developed Landscapes. BioScience 60:37–46. https://doi.org/10.1525/bio.2010.60.1.8.
Johnson, R. K., Angeler, D. G., Moe, S. J., and Hering, D. 2014. Cross-taxon responses to elevated nutrients in European streams and lakes. Aquatic Sciences 76:51–60. https://doi.org/10.1007/s00027-013-0311-x.
Jones, N. E. 2010. Incorporating lakes within the river discontinuum: longitudinal changes in ecological characteristics in stream–lake networks.
Canadian Journal of Fisheries and Aquatic Sciences 67:1350–1362. https://doi.org/10.1139/F10-069.
Jude, D.J., and Pappas, B. 1992. Fish utilization of Great Lakes Coastal Wetlands. Journal of Great Lakes Research 18:651–672.
King, K.B.S., Bremigan, M.T., Infante, D., and Cheruvelil, K.S. 2021. Surface water connectivity affects lake and stream fish species richness and composition. Canadian Journal of Fisheries and Aquatic Sciences. http://dx.doi.org/10.1139/cjfas-2020-0090.
King, K., Wang, Q., Rodriguez, L.K., and Cheruvelil, K.S. in review. Lake connectivity networks for the conterminous U.S. (LAGOS-US NETWORKS). Limnology and Oceanography Letters. Submitted for peer review December 2020.
King, K., Wang, Q., Rodriguez, L.K., Haite, M., Danila, L., Tan, P.N., Zhou, J., and Cheruvelil, K.S. in review. User Guide for LAGOS-US NETWORKS v1.0: Data module of surface water networks characterizing connections among lakes, streams, and rivers in the conterminous U.S. Environmental Data Initiative. Submitted for peer review December 2020. https://portal-s.edirepository.org/nis/mapbrowse?scope=edi&identifier=681.
Kling, G. W., Kipphut, G. W., Miller, M. M., and O’Brien, W. J. 2000. Integration of lakes and streams in a landscape perspective: The importance of material processing on spatial patterns and temporal coherence. Freshwater Biology 43:477–497. https://doi.org/10.1046/j.1365-2427.2000.00515.x.
Kraemer, B. M. 2020. Rethinking discretization to advance limnology amid the ongoing information explosion. Water Research 178. https://doi.org/10.1016/j.watres.2020.115801.
Kratz, T., Webster, K., Bowser, C., Magnuson, J., and Benson, B. 1997. The influence of landscape position on lakes in northern Wisconsin. Freshwater Biology 37:209–217.
MacArthur, R.H., and Wilson, E.O. 1967. The Theory of Island Biogeography. Princeton (NJ): Princeton University Press.
Magnuson, J. J., Tonn, W. M., Banerjee, A., Toivonen, J., Sanches, O., and Rask, M. 1998. Isolation vs.
extinction in the assembly of fishes in small northern lakes. Ecology 79:2941–2956.
Martin, S. L., and Soranno, P. A. 2006. Lake landscape position: Relationships to hydrological connectivity and landscape features. Limnology and Oceanography 51:801–814.
McCluney, K. E., Poff, L., Palmer, M. A., Thorp, J. H., Poole, G. C., Williams, B. S., Williams, M.R., and Baron, J. S. 2014. Riverine macrosystems ecology: sensitivity, resistance, and resilience of whole river basins with human alterations. Frontiers in Ecology and the Environment 12:48–58. https://doi.org/10.1890/120367.
McKay, L., Bondelid, T., Dewald, T., Johnston, J., Moore, R., and Rea, A. 2012. NHDPlus Version 2: User Guide.
Mehner, T., Diekmann, M., Bramick, U., and Lemcke, R. 2005. Composition of fish communities in German lakes as related to lake morphology, trophic state, shore structure and human-use intensity. Freshwater Biology 50:70–85.
Morelli, T. L., Daly, C., Dobrowski, S. Z., Dulen, D. M., Ebersole, J. L., Jackson, S. T., Lundquist, J.D., Millar, C.I., Maher, S.P., Monahan, W.B., Nydick, K.R., Redmond, K.T., Sawyer, S.C., Stock, S., and Beissinger, S. R. 2016. Managing Climate Change Refugia for Climate Adaptation. PLoS ONE. https://doi.org/10.1371/journal.pone.0159909.
National Inventory of Dams (NID). 2015. Washington, DC: U.S. Army Corps of Engineers: Federal Emergency Management Agency. https://nid.sec.usace.army.mil/ords/f?p=105:1.
Niu, S. Q., Franczyk, M. P., and Knouft, J. H. 2012. Regional species richness, hydrological characteristics and the local species richness of assemblages of North American stream fishes. Freshwater Biology 57:2367–2377. https://doi.org/10.1111/fwb.12016.
Olden, J. D., Jackson, D. A., and Peres-Neto, P. R. 2001. Spatial Isolation and Fish Communities in Drainage Lakes. Oecologia 127:572–585. https://doi.org/10.1007/s004420000620.
Ostroff, A., Wieferich, D., Cooper, A., Infante, D. and USGS Aquatic GAP Program. 2013.
2012 National Anthropogenic Barrier Dataset (NABD): U.S. Geological Survey - Aquatic GAP Program: Denver, CO.
Peoples, B. K., Davis, A. J. S., Midway, S. R., Olden, J. D., and Stoczynski, L. 2020. Landscape-scale drivers of fish faunal homogenization and differentiation in the eastern United States. Hydrobiologia 4. https://doi.org/10.1007/s10750-019-04162-4.
Peters, D. P. C., Bestelmeyer, B. T., and Turner, M. G. 2007. Cross-Scale Interactions and Changing Pattern-Process Relationships: Consequences for System Dynamics. Ecosystems 10:790–796. https://doi.org/10.1007/s10021-007-9055-6.
Plummer, M. 2003. JAGS: a program for analysis of Bayesian graphical models using Gibbs sampling. In: Proceedings of the 3rd International workshop on distributed statistical computing (DSC 2003). Presented at the Hornik, K., Leisch, F. and Zeileis, A., editors, pp. 20–22.
Poff, L. N. 1997. Landscape Filters and Species Traits: Towards Mechanistic Understanding and Prediction in Stream Ecology. Journal of the North American Benthological Society 16:391–409. https://www.jstor.org/stable/1468026.
PRISM Climate Group. Oregon State University, http://prism.oregonstate.edu, created 4 Feb 2004. Accessed 01/13/2021.
R Core Team. 2021. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. Available from: https://www.R-project.org/.
Riera, J. L., Magnuson, J. J., Kratz, T. K., and Webster, K. E. 2000. A geomorphic template for the analysis of lake districts applied to the Northern Highland Lake District, Wisconsin, U.S.A. Freshwater Biology 43:301–318. https://doi.org/10.1046/j.1365-2427.2000.00567.x.
Rivers American. American Rivers Dam Removal Database. 2019. In: Database: Figshare. Available https://doi.org/10.6084/m9.figshare.5234068.v6.
Roth, N. E., Allan, J.D., and Erickson, D. L. 1996. Landscape influences on stream biotic integrity assessed at multiple spatial scales. Landscape Ecology 11:141–156.
https://doi.org/10.1007/BF02447513.
Sadro, S., Nelson, C. E., and Melack, J. M. 2012. The Influence of Landscape Position and Catchment Characteristics on Aquatic Biogeochemistry in High-Elevation Lake-Chains. Ecosystems 15:363–386. https://doi.org/10.1007/s10021-011-9515-x.
Saunders, M. I., Brown, C. J., Foley, M. M., Febria, C. M., Albright, R., Mehling, M. G., Kavanaugh, M. T., and Burfeind, D. D. 2016. Human impacts on connectivity in marine and freshwater ecosystems assessed using graph theory: a review. Marine and Freshwater Research 67:277–290. https://doi.org/10.1071/MF14358.
Scheffer, M., Barrett, S., Carpenter, S. R., Folke, C., Green, A. J., Holmgren, M., Hughes, T. P., Kosten, S., Van De Leemput, I. A., Nepstad, D. C., Van Nes, E. H., Peters, E. T. H. M., and Walker, B. 2015. Creating a safe operating space for iconic ecosystems: Manage local stressors to promote resilience to global change. Science. https://doi.org/10.1126/science.aaa3769.
Schlosser, I. J., Johnson, J. D., Ladd Knotek, W., and Lapinska, M. 2000. Climate Variability and Size-Structured Interactions Among Juvenile Fish Along A Lake-Stream Gradient. Ecology 81.
Seaber, P. R., Kapinos, F. P., and Knapp, G. L. 1987. Hydrologic unit maps. U.S. Geological Survey Water Supply Paper 2294. Section 508. Accessible from: http://www.usgs.gov/accessibility.html.
Sharma, S., Legendre, P., De Cáceres, M., and Boisclair, D. 2011. The role of environmental and spatial processes in structuring native and non-native fish communities across thousands of lakes. Ecography. https://doi.org/10.1111/j.1600-0587.2010.06811.x.
Shreve, R. L. 1967. Infinite topologically random channel networks. Journal of Geology 75:178–186.
Smith, N. J., Webster, K. E., Rodriguez, L., Cheruvelil, K. S., and Soranno, P. A. 2020. LAGOS-US LOCUS v1.0: Module of location, identifiers, and physical characteristics of lakes and their watersheds in the conterminous U.S. Environmental Data Initiative. Accessed 05/13/2019.
Solheim, A.
L., Globevnik, L., Austnes, K., Kristensen, P., Moe, S. J., Persson, J., Phillips, G., Poikane, S., van de Bund, W., and Birk, S. 2019. A new broad typology for rivers and lakes in Europe: Development and application for large-scale environmental assessments. Science of the Total Environment. https://doi.org/10.1016/j.scitotenv.2019.134043.
Søndergaard, M., and Jeppesen, E. 2007. Anthropogenic impacts on lake and stream ecosystems, and approaches to restoration. Journal of Applied Ecology 44:1089–1094. https://doi.org/10.1111/j.1365-2664.2007.01426.x.
Soranno, P.A., Bacon, L.C., Beauchene, M., Bednar, K.E., Bissell, E.G., Boudreau, C.K., Boyer, M.G., Bremigan, M.T., Carpenter, S.R., Carr, J.W., Cheruvelil, K.S., Christel, S.T., Claucherty, M., Collins, S.M., Conroy, J.D., Downing, J.A., Dukett, J., Fergus, C.E., Filstrup, C.T., Funk, C., Gonzalez, M.J., Green, L.T., Gries, C., Halfman, J.D., Hamilton, S.K., Hanson, P.C., Henry, E.N., Herron, E.M., Hockings, C., Jackson, J.R., Jacobson-Hedin, K., Janus, L.L., Jones, W.W., Jones, J.R., Keson, C.M., King, K.B.S., Kishbaugh, S.A., Lapierre, J.-F., Lathrop, B., Latimore, J.A., Lee, Y., Lottig, N.R., Lynch, J.A., Matthews, L.J., McDowell, W.H., Moore, K.E.B., Neff, B.P., Nelson, S.J., Oliver, S.K., Pace, M.L., Pierson, D.C., Poisson, A.C., Pollard, A.I., Post, D.M., Reyes, P.O., Rosenberry, D.O., Roy, K.M., Rudstam, L.G., Sarnelle, O., Schuldt, N.J., Scott, C.E., Skaff, N.K., Smith, N.J., Spinelli, N.R., Stachelek, J.J., Stanley, E.H., Stoddard, J.L., Stopyak, S.B., Stow, C.A., Tallant, J.M., Tan, P.-N., Thorpe, A.P., Vanni, M.J., Wagner, T., Watkins, G., Weathers, K.C., Webster, K.E., White, J.D., Wilmes, M.K., and Yuan, S. 2017. LAGOS-NE: A multi-scaled geospatial and temporal database of lake ecological context and water quality for thousands of U.S. lakes. GigaScience. https://doi.org/10.1093/gigascience/gix101.
Soranno, P.A., Bissell, E.G., Cheruvelil, K.S., Christel, S.T., Collins, S.M., Fergus, C.E., Filstrup, C.T., Lapierre, J.-F., Lottig, N.R., Oliver, S.K., Scott, C.E., Smith, N.J., Stopyak, S., Yuan, S., Bremigan, M.T., Downing, J.A., Gries, C., Henry, E.N., Skaff, N.K., Stanley, E.H., Stow, C.A., Tan, P.-N., Wagner, T., and Webster, K.E. 2015. Building a multi-scaled geospatial temporal ecology database from disparate data sources: Fostering open science and data reuse. GigaScience. https://doi.org/10.1186/s13742-015-0067-4.
Soranno, P. A., Cheruvelil, K. S., Bissell, E. G., Bremigan, M. T., Downing, J. A., Fergus, C. E., Filstrup, C. T., Henry, E. N., Lottig, N. R., Stanley, E. H., Stow, C. A., Tan, P.-N., Wagner, T., and Webster, K. E. 2014. Cross-scale interactions: Quantifying multi-scaled cause-effect relationships in macrosystems. Frontiers in Ecology and the Environment 12:65–73. https://doi.org/10.1890/120366.
Soranno, P. A., Cheruvelil, K. S., Wagner, T., Webster, K. E., and Bremigan, M. T. 2015. Effects of land use on lake nutrients: The importance of scale, hydrologic connectivity, and region. PLoS ONE 10:1–22. https://doi.org/10.1371/journal.pone.0135454.
Soranno, P. A., Cheruvelil, K. S., Webster, K. E., Bremigan, M. T., Wagner, T., and Stow, C. A. 2010. Using Landscape Limnology to Classify Freshwater Ecosystems for Multi-ecosystem Management and Conservation. BioScience 60:440–454. https://doi.org/10.1525/bio.2010.60.6.8.
Stendera, S., Januschke, K., Hering, D., Adrian, R., Bonada, N., Cañedo-Argüelles, M., and Pletterbauer, F. 2012. Drivers and stressors of freshwater biodiversity patterns across different ecosystems and scales: a review. Hydrobiologia 696:1–28. https://doi.org/10.1007/s10750-012-1183-0.
Stohlgren, T. J., Barnett, D., Flather, C., Fuller, P., Peterjohn, B., Kartesz, J., and Master, L. L. 2006. Species richness and patterns of invasion in plants, birds, and fishes in the United States. Biological Invasions 8:427–447.
https://doi.org/10.1007/s10530-005-6422-0.
Strahler, A. N. 1957. Quantitative analysis of watershed geomorphology. Transactions of the American Geophysical Union 38:913–920. https://doi.org/10.1029/TR038i006p00913.
Su, Y. S., and Yajima, M. 2020. Using R to Run ‘JAGS’. R package. https://cran.r-project.org/package=R2jags.
Taylor, P. D., Fahrig, L., Henein, K., and Merriam, G. 1993. Connectivity Is a Vital Element of Landscape Structure. Oikos 68:571–573.
Thorp, J. H. 2014. Metamorphosis in river ecology: From reaches to macrosystems. Freshwater Biology 59:200–210. https://doi.org/10.1111/fwb.12237.
Tonn, W. M. 1990. Climate Change and Fish Communities: A Conceptual Framework. Transactions of the American Fisheries Society 119:337–352.
Tonn, W. M., and Magnuson, J. J. 1982. Patterns in the Species Composition and Richness of Fish Assemblages in Northern Wisconsin Lakes. Ecology 63:1149–1166. https://doi.org/10.2307/1937251.
Trombulak, S. C., and Frissell, C. A. 2000. Review of Ecological Effects of Roads on Terrestrial and Aquatic Communities. Conservation Biology 14:1.
Urban, D., and Keitt, T. 2001. Landscape Connectivity: A Graph-Theoretic Perspective. Ecology 82:1205–1218. https://doi.org/10.1890/0012-9658(2001)082[1205:LCAGTP]2.0.CO;2.
U.S. Geological Survey. 2019. National Hydrography Dataset (NHDPlus Version 2.1 for Hydrologic Unit (HU) 4 - 2001 (published 10/02/2019)). https://www.epa.gov/waterdata/nhdplus-national-data. Accessed 08/05/2019.
Vannote, R. L., Minshall, G. W., Cummins, K. W., Sedell, J. R., and Cushing, C. E. 1980. The River Continuum Concept. Canadian Journal of Fisheries and Aquatic Sciences 37:130–137.
Wagner, T., Fergus, C. E., Stow, C. A., Cheruvelil, K. S., and Soranno, P. A. 2016. The statistical power to detect cross-scale interactions at macroscales. Ecosphere 7:1–12. https://doi.org/10.1002/ecs2.1417.
Wagner, T., and Midway, S. R. 2014. Modeling spatially varying landscape change points in species occurrence thresholds. Ecosphere 5:1–16.
https://doi.org/10.1890/ES14-00288.1.
Wang, Q., and King, K. 2020. Code for LAGOS-US NETWORKS v1.0 (Version v1.0.0). Zenodo. http://doi.org/10.5281/zenodo.4383172.
Wang, L., Lyons, J., Kanehl, P., and Gatti, R. 1997. Influences of watershed land use on habitat quality and biotic integrity in Wisconsin streams. Fisheries 22:6–12.
Wang, L., Seelbach, P. W., and Lyons, J. 2006. Effects of levels of human disturbance on the influence of catchment, riparian, and reach-scale factors on fish assemblages. In Landscape influences on stream habitats and biological assemblages. R. M. Hughes, L. Wang, and P. W. Seelbach, editors. Pages 199–219. American Fisheries Society, Symposium 48, Bethesda, Maryland.
Webster, K. E., Soranno, P. A., Baines, S. B., Kratz, T. K., Bowser, C. J., Dillon, P. J., Campbell, P., Fee, E. J., and Hecky, R. E. 2000. Structuring features of lake districts: Landscape controls on lake chemical responses to drought. Freshwater Biology 43:499–515. https://doi.org/10.1046/j.1365-2427.2000.00571.x.
Wehrly, K. E., Breck, J. E., Wang, L., and Szabo-Kraft, L. 2012. A Landscape-Based Classification of Fish Assemblages in Sampled and Unsampled Lakes. Transactions of the American Fisheries Society 141:414–425. https://doi.org/10.1080/00028487.2012.667046.
Willis, T. V., and Magnuson, J. J. 2000. Patterns in fish species composition across the interface between streams and lakes. Canadian Journal of Fisheries and Aquatic Sciences 57:1042–1052. https://doi.org/10.1139/f00-028.
Winslow, L. A., Hahn, T. H., Princiotta, S. D., Leach, T. H., and Rose, K. C. 2018. hydrolinks: A new tool to link macroscale to inland water bodies. Version 0.10.0. https://cran.r-project.org/web/packages/hydrolinks/index.html.
Yang, W., Ma, K., and Kreft, H. 2013. Geographical sampling bias in a large distributional database and its effects on species richness-environment models. Journal of Biogeography 40:1415–1426. https://doi.org/10.1111/jbi.12108.
Zhang, T., Soranno, P. A., Cheruvelil, K. S., Kramer, D.
B., Bremigan, M. T., and Ligmann-Zielinska, A. 2012. Evaluating the effects of upstream lakes and wetlands on lake phosphorus concentrations using a spatially-explicit model. Landscape Ecology 27:1015–1030. https://doi.org/10.1007/s10980-012-9762-z.