INVESTIGATING LANDSCAPE-STREAM WATER QUALITY RELATIONSHIPS AND STREAM WATER QUALITY PRESERVATION STRATEGIES IN THE TEXAS GULF REGION USING A HYBRID OF MACHINE LEARNING AND HYDROLOGICAL MODELING APPROACH

By Runzi Wang

A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Planning, Design and Construction - Doctor of Philosophy

2020

ABSTRACT

This research investigates how land use, urban development pattern, topography, soil, climate, and population influence stream nitrate (NO₃⁻-N), ammonium (NH₄⁺-N), orthophosphate (PO₄³⁻-P), total phosphorus (TP), and Escherichia coli (E. coli) concentrations in the Texas Gulf Region. Specifically, the study focuses on how the land-stream water relationship varies across sample sites, basins, ecoregions, and years between 1991 and 2011. It also examines the benefits of compact urban development and evaluates the management strategy of placing best management practices (BMPs) in hydrologically sensitive areas (HSAs).

The 2011 cross-sectional study in the Texas Gulf Region indicates that the connectedness of developed areas and the adjacencies between developed areas and other land covers had a more significant effect on stream water quality than the percentage of developed area. The relationships between landscape factors and stream water quality varied by season, location, and pollutant category, and were generally stronger in dry seasons and in coastal suburban watersheds. A predictive model based on a random forest machine learning algorithm demonstrated that high-density, aggregated urban development is the most effective in protecting stream water quality.
The predicted average dry-season NO₃⁻-N and TP concentrations were 0.17 mg/L and 0.09 mg/L in the high-density aggregated scenarios, compared to 1.2 mg/L and 0.28 mg/L in the current sprawled development scenario.

The longitudinal study from 1991 to 2011 confirms that controlling developed and agricultural areas improves stream water quality. With the derived annual land cover composition and longitudinal nutrient and E. coli concentration data, it was found that each additional 1 percent of developed area led to a 6.31% increase in NO₃⁻-N concentration and a 3.52% increase in PO₄³⁻-P concentration in the Texas Gulf Region. Some unobserved characteristics led to high nutrient concentrations in the Middle Colorado-Concho and Lower Trinity basins, and to high E. coli concentration in the San Jacinto basin. The relationships between land cover and stream water quality varied more at the local scale than at the basin and region scales; they did not change significantly in the 20 years between 1991 and 2011.

In the BMP siting strategy study, the effectiveness of placing BMPs in HSAs was verified using the Soil and Water Assessment Tool (SWAT). The hydrological sensitivity of subbasins had a significant, nonlinear positive association with NO₃⁻-N concentrations. Defining HSAs as the areas with the highest 2% hydrological sensitivity and designating them to be preserved as green space was the most effective in reducing NO₃⁻-N output. More generally, the results suggest that evidence-based ecological planning should incorporate performance evaluation with valid data-driven methods.

Overall, this research was one of the first empirical studies to demonstrate the water quality degradation caused by urban sprawl and the advantage of compact urban development. Machine learning and big data approaches proved to be powerful tools for scenario prediction in land use planning, forecasting the environmental impacts of different urban development patterns.
This study also established a robust regional-scale longitudinal water quality modeling approach for Texas that relies on efficient data fusion techniques and can guide multiscale land use planning and watershed management.

ACKNOWLEDGEMENTS

First and foremost, I would like to express my sincere gratitude to my major advisor, Dr. Ming-Han Li. Without Dr. Li’s excellent mentorship, it would have been impossible for me to become an independent researcher and find a good place to continue my academic career. I am also very grateful to my committee members at MSU, Dr. Jun-Hyun Kim, Dr. Mark Wilson, and Dr. Scott Loveridge, for their support in developing my thesis. In addition, I am grateful to my previous committee members at TAMU, Dr. Xiao Yu, Dr. Xinyuan Wu, and Dr. Sorin Popescu. They helped me a great deal in initiating this thesis proposal and passing the preliminary exam at TAMU.

I want to thank all the faculty at SPDC for inspiring my work. Dr. Yue Cui supported me financially to engage in the COPR research project. I also thank Dr. Galen Newman at TAMU for his recommendations in my job search. I took many courses at both MSU and TAMU, and I am grateful to all the instructors who taught me the knowledge and skills that helped me become an interdisciplinary researcher.

I appreciate all the great time I spent with my friends at both MSU and TAMU. My friends supported me with knowledge and took care of me in my life. I also thank my colleagues on the SESYNC project team. My life and career would be totally different without my friends. I sincerely hope we can be life-long partners and always help each other.

Finally, I would like to deeply thank my husband, Xuewen Zhang, who has supported me for more than ten years. He is the best engineer, friend, and life-long partner I can think of. I also want to thank my parents and my parents-in-law for their love and understanding.
Runzi Wang
7/28/2020

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
KEY TO ABBREVIATIONS
CHAPTER 1 INTRODUCTION
  BIBLIOGRAPHY
CHAPTER 2 PREDICTING STREAM WATER QUALITY UNDER DIFFERENT URBAN DEVELOPMENT PATTERN SCENARIOS WITH A MACHINE LEARNING APPROACH
  2.1 Introduction
  2.2 Data and Method
    2.2.1 Study Site
    2.2.2 Data and variables
    2.2.3 Data analysis
    2.2.4 Scenario Design
  2.3 Result
    2.3.1 Important catchment characteristics selected by LASSO regression
    2.3.2 Spatial variation of the effects of urban development pattern on stream water quality
    2.3.4 Stream water quality prediction under alternative planning scenarios
  2.4 Discussion
    2.4.1 Planning implication based on urban development pattern metrics
    2.4.2 The complexity of the impact of urban development pattern on stream water quality
    2.4.3 Interpretation of the spatiotemporal non-stationary land-water relationships
    2.4.4 The advantages and limitations of applying machine learning in scenario prediction
  2.5 Conclusion
  APPENDIX
  BIBLIOGRAPHY
CHAPTER 3 DERIVING ANNUAL LAND COVER MAPS AND MODELING THE LONGITUDINAL EFFECT OF LAND COVER CHANGE ON NUTRIENT AND BACTERIA CONCENTRATIONS
  3.1 Introduction
  3.2 Data and Method
    3.2.1 Study Site
    3.2.2 Data
    3.2.3 Methods
  3.3 Result
    3.3.1 Land cover change in the Texas Gulf Region
    3.3.2 The spatial and temporal distributions of nutrient and bacteria concentrations
    3.3.3 The longitudinal relationship between land cover and water quality
  3.4 Discussion
    3.4.1 The impact factors on stream water quality in the Texas Gulf Region
    3.4.2 The performance of the model system
    3.4.3 The limitations of the study and future research suggestions
    3.4.4 Model applications and management implications
  3.5 Conclusion
  BIBLIOGRAPHY
CHAPTER 4 EVALUATING THE EFFECTIVENESS OF WATERSHED PRESERVATION BASED ON THE HYDROLOGICALLY SENSITIVE AREA (HSA) SITING APPROACH: A DEMONSTRATION OF DATA-DRIVEN ECOLOGICAL PLANNING METHOD
  4.1 Introduction
  4.2 Data and Method
    4.2.2 Data Acquisition
    4.2.3 HSA Calculation and Mapping
    4.2.4 Statistical Analysis
    4.2.5 SWAT modelling
  4.3 Result
    4.3.1 HSA Map
    4.3.2 The Relationships between Hydrological Sensitivity and Water Quality
    4.3.3 Scenario Simulation
  4.4 Discussion
    4.4.1 Water Quality Management Implication
    4.4.2 The Interdisciplinary Ecological Planning Approach
  4.5 Conclusion
  BIBLIOGRAPHY
CHAPTER 5 CONCLUSION AND RECOMMENDATION

LIST OF TABLES

Table 2-1. Data sources
Table 2-2. Explanatory variables
Table 2-3. Scenario description
Table 2-4. LASSO linear regression results of TP concentration
Table 2-5. LASSO linear regression results of E. coli concentration
Table 2-6. LASSO linear regression results of NO₃⁻-N concentration in wet seasons
Table 2-7. Model performance comparison between LASSO linear regression and GWR
Table 2-8. Random forest prediction results
Table 2-9. Scenario prediction results of pollutant concentration
Table 2A-2-10. Description of landscape metrics
Table 3-1. Confusion matrix of the classification agreement compared with NLCD 2006
Table 3-2. Confusion matrix of the classification agreement compared with NLCD 2001
Table 3-3. Candidate models to predict log(NO₃⁻-N) concentration in wet seasons and their comparison
Table 3-4. Mixed model results to predict pollutant concentrations
Table 4-1. SWAT simulation results of NO₃⁻-N output in the period from 2008 to 2011

LIST OF FIGURES

Figure 2-1. Study Site
Figure 2-2. Data analysis flowchart
Figure 2-3. Scenario Maps
Figure 2-4. TP, E. coli, and NO₃⁻-N GWR model performance
Figure 2-5. GWR model coefficients of urban development pattern effects in the wet season (TP model on the left and E. coli model on the right)
Figure 2-6. Scatter plots of predicted values against observed values of TP concentration in the test set
Figure 2-7. Examples of watersheds with a similar percentage of developed area but different urban development pattern metrics and TP concentration
Figure 2-8. Scatter plots showing correlations between IJI, COHESION and the percentage of urban developed areas
Figure 3-1. Texas Gulf Region with a base map of NLCD 2011 and the Texas ecoregions
Figure 3-2. Method flowchart
Figure 3-3. Land cover proportions and conversions of the six ecoregions from 1991 to 2011
Figure 3-4. Change of nutrient and bacteria concentrations in the six ecoregions
Figure 3-5. The spatial distributions of nutrient and E. coli concentrations in 1991, 2001, and 2011
Figure 3-6. The scatter plots of predicted values vs observed values of Model 4 and Model 5
Figure 3-7. Bar charts of random intercepts of basin
Figure 4-1. Study Site (The Middle Brazos-Bosque basin)
Figure 4-2. Hydrological sensitivity map and the critical source areas in the McGregor subbasin
Figure 4-3. The relationship between mean hydrological sensitivity and log(NO₃⁻-N) in wet seasons
Figure 4-4. Data-driven ecological planning workflow using hydrology layer as an example
Figure 4-5. Multidisciplinary methods as extensions of the “layer-cake” model

KEY TO ABBREVIATIONS

AI: Aggregation Index
AIC: Akaike information criterion
AREA_MD: Median of Patch Area
BD: Biomass Decrease
BI: Biomass Increase
BMP: Best Management Practice
CA: Total (Class) Area
CIRCLE: Median of Related Circumscribing Circle
COHESION: Patch Cohesion Index
CONTAG: Contagion
CONTIG_MD: Median of Contiguity Index
CRP: Clean Rivers Program
CSA: Critical Source Area
CV: Change Vector
DCIA: Directly Connected Impervious Area
DEM: Digital Elevation Model
DIVISION: Landscape Division Index
dNBR: differenced Normalized Burn Ratio
dNDVI: differenced Normalized Difference Vegetation Index
E. coli: Escherichia coli
ED: Edge Density
ENN_MD: Median of Euclidean Nearest Neighbor Distance
FRAC_MD: Median of Fractal Dimension Index
GEE: Google Earth Engine
GLCM: Grey Level Co-occurrence Matrix
GWR: Geographically Weighted Regression
GYRATE_MD: Median of Radius of Gyration
HSA: Hydrologically Sensitive Area
HUC: Hydrologic Unit Code
IJI: Interspersion Juxtaposition Index
LASSO: Least Absolute Shrinkage and Selection Operator
LID: Low Impact Development
LPI: Largest Patch Index
LSI: Landscape Shape Index
LS factor: Length and Steepness factor
MESH: Effective Mesh Size
MSE: Mean Square Error
MSIDI: Modified Simpson’s Diversity Index
MSIEI: Modified Simpson’s Evenness Index
NDVI: Normalized Difference Vegetation Index
NH₄⁺-N: Ammonium
NLCD: National Land Cover Database
NLS: Non-linear Least Squares
NO₃⁻-N: Nitrate
NP: Number of Patches
NPS: Nonpoint Source Pollution
NSE: Nash–Sutcliffe efficiency
OLS: Ordinary Least Squares
PAFRAC: Perimeter-Area Fractal Dimension
PARA_MD: Median of Perimeter-Area Ratio
PCA: Principal Component Analysis
PD: Patch Density
PLADJ: Proportion of Like Adjacencies
PLAND: Percentage of Landscape
PR: Patch Richness
PRD: Patch Richness Density
PO₄³⁻-P: Orthophosphate
RCVMAX: Relative Change Vector MAXimum
RF: Random Forest
RUSLE: Revised Universal Soil Loss Equation
SHAPE_MD: Median of Shape Index
SHDI: Shannon’s Diversity Index
SHEI: Shannon’s Evenness Index
SIDI: Simpson’s Diversity Index
SIEI: Simpson’s Evenness Index
SPLIT: Splitting Index
SSURGO: Soil Survey Geographic Database
SWAT: Soil and Water Assessment Tool
SWQM: Surface Water Quality Monitoring
TCEQ: Texas Commission on Environmental Quality
TE: Total Edge
TOPMODEL: TOPography based hydrological MODEL
TP: Total Phosphorus
TRI: Terrain Ruggedness Index
TWI: Topographic Wetness Index
VSA: Variable Source Area

CHAPTER 1 INTRODUCTION

In Texas, 410 out of a total of 1214 water bodies did not meet the applicable water quality standards or were threatened for one or more designated uses, according to a 2012 Texas Commission on Environmental Quality (TCEQ) integrated report. Nonpoint source (NPS) pollution contributes to 45% of stream water quality impairment and 48% of lake water quality impairment (TCEQ, 2014). NPS pollution, which results from a variety of sources such as lawns, construction areas, farms, and highways, is difficult to control. To address the issue, Texas has Watershed Protection Plans to protect and restore stream water quality on a watershed basis across multiple jurisdictions. Therefore, technical support is urgently needed to meet the complex challenge of stream water quality management at the watershed scale, especially with respect to NPS pollution.

It is generally understood that land use practices including urbanization, agricultural intensification, and deforestation are dominant drivers of stream water quality (Yu et al., 2013; Manfrin et al., 2016; Zhang et al., 2017). However, this conclusion sometimes does not apply well to the local water environment because the relationships between stream water quality and local landscape features differ considerably among regions and basins (Ding et al., 2016). In Texas, a few studies have investigated lake and reservoir water quality, while research on stream water quality is very limited (Santhi et al., 2006; Patiño et al., 2014).
New studies are needed to provide scientific and technical support for managing stream water quality in response to changing landscapes in Texas. This kind of support will benefit the formulation and implementation of stream water quality conservation policies and practices.

Stream water quality is related to many natural and anthropogenic factors such as land use composition, landscape configuration, topography, geology, climate, hydrology, and socioeconomic factors. The interactions between these explanatory factors are also complex and depend on spatial and temporal scales. To uncover the complicated nonlinear land-water relationships accurately and explicitly, several knowledge gaps need to be addressed.

Firstly, there are fewer predictive studies than interpretive studies. If stream water quality can be predicted accurately from landscape characteristics, the predictions can inform urban planning and watershed management policy makers about stream water quality under specific planning scenarios. Additionally, even when the variation in stream water quality is well explained by landscape factors using conventional statistical models, there is no guarantee that the derived quantitative relationship can be generalized to new planning scenarios.

Secondly, most previous studies investigating land-water relationships are cross-sectional, a design that is often criticized for its relatively weak internal validity. Some research based on data from multiple years treats samples from different years as independent. Longitudinal research is thus needed to model how the land-water relationships change with long-term urban development, taking into account the dependency in stream water quality data from multiple years.
Thirdly, although some research has proposed priority sites for placing best management practices (BMPs) or low impact development (LID) practices to treat contaminants before they enter streams, few studies have verified the effectiveness of BMP and LID siting strategies. For example, it has been suggested that LID practices and BMPs be placed in hydrologically sensitive areas (HSAs), which are small portions of the watershed that are more susceptible to producing runoff (Walter et al., 2000; Martin-Mikle et al., 2015). However, empirical studies that verify the function of HSAs as critical source areas (CSAs) of pollution are still needed.

The overall goal of this study is to understand the complex relationships between landscape characteristics and stream water quality in the Texas Gulf Region with advanced analytical methods; specifically, a combination of conventional statistical models, machine learning algorithms, and hydrological models. The three objectives below are addressed in the following three chapters:

1) Chapter 2 focuses on predicting stream water quality in the Texas Gulf Region from landscape characteristics, with an emphasis on urban development pattern. It also interprets variations in stream water quality using the most important landscape features while accounting for spatial variability. The importance of urban development density and urban area configuration for stream water quality is verified.
2) Chapter 3 investigates the changing relationship between land use and stream water quality in the Texas Gulf Region from 1991 to 2011. It examines how the variations in the land-water relationships are partitioned spatially and temporally. It also generates annual land cover maps from 1991 to 2011 to match the temporal resolution of the stream water quality data.
3) Chapter 4 uses hydrological models to confirm that placing BMPs in HSAs is efficient in reducing nutrient loadings in streams.
It proposes an interdisciplinary data-driven framework to inform ecological planning and design.

The significance of this study lies in applying big data and cutting-edge technologies to frame a large-scale longitudinal study in the landscape architecture discipline. Several key questions related to stream water quality in Texas were answered, including the impact of urban development pattern, regional-scale spatial variations, temporal changes and causal inference, and targeted management practices. The study serves as a comprehensive and multidimensional theoretical and technical guide to sustainable stream water quality management in Texas.

BIBLIOGRAPHY

Ding, J., Jiang, Y., Liu, Q., Hou, Z., Liao, J., Fu, L., & Peng, Q. (2016). Influences of the land use pattern on water quality in low-order streams of the Dongjiang River basin, China: a multi-scale analysis. Science of the Total Environment, 551, 205-216.
Manfrin, A., Bombi, P., Traversetti, L., Larsen, S., & Scalici, M. (2016). A landscape-based predictive approach for running water quality assessment: a Mediterranean case study. Journal for Nature Conservation, 30, 27-31.
Martin-Mikle, C. J., de Beurs, K. M., Julian, J. P., & Mayer, P. M. (2015). Identifying priority sites for low impact development (LID) in a mixed-use watershed. Landscape and Urban Planning, 140, 29-41.
Patiño, R., Dawson, D., & VanLandeghem, M. M. (2014). Retrospective analysis of associations between water quality and toxic blooms of golden alga (Prymnesium parvum) in Texas reservoirs: Implications for understanding dispersal mechanisms and impacts of climate change. Harmful Algae, 33, 1-11.
Santhi, C., Srinivasan, R., Arnold, J. G., & Williams, J. R. (2006). A modeling approach to evaluate the impacts of water quality management plans implemented in a watershed in Texas. Environmental Modelling & Software, 21(8), 1141-1157.
Texas Commission on Environmental Quality. (2014). Managing nonpoint source pollution in Texas, 2013 annual report.
Walter, M. T., Walter, M. F., Brooks, E. S., Steenhuis, T. S., Boll, J., & Weiler, K. (2000). Hydrologically sensitive areas: variable source area hydrology implications for water quality risk assessment. Journal of Soil and Water Conservation, 55(3), 277-284.
Yu, D., Shi, P., Liu, Y., & Xun, B. (2013). Detecting land use-water quality relationships from the viewpoint of ecological restoration in an urban area. Ecological Engineering, 53, 205-216.
Zhang, L., Karthikeyan, R., Bai, Z., & Srinivasan, R. (2017). Analysis of streamflow responses to climate variability and land use change in the Loess Plateau region of China. Catena, 154, 1-11.

CHAPTER 2 PREDICTING STREAM WATER QUALITY UNDER DIFFERENT URBAN DEVELOPMENT PATTERN SCENARIOS WITH A MACHINE LEARNING APPROACH

2.1 Introduction

Human-induced land use, such as urban and industrial land use, is recognized as a dominant factor affecting stream water quality. For example, a small increase in the percentage of urban land use has been found to exert a disproportionately large influence on pollutant generation (Ai et al., 2015; Giri and Qiu, 2016; Oeding et al., 2018; Sun et al., 2011; Wijesiri et al., 2018). At a similar percentage of urban developed area, varying patterns of urban development can contribute to considerable differences in stream water quality because pollutant generation, build-up, and wash-off processes differ (Goonetilleke et al., 2005; Liu et al., 2012). Therefore, predicting stream water quality for various locations, densities, and patterns of urban development can serve as a basis for developing sound stream water quality management schemes (Fan and Shibata, 2015; Holcomb et al., 2018). However, the specific influence of urban development patterns on stream water quality, as well as the influence of spatial and temporal dynamics, remains unclear.
Urban development pattern has complex influences on stream water quality, as measured by the interactions between the area, shape, edge, and aggregation of urban areas and stream pollutant concentrations (Forman, 2014; Sun et al., 2014; Yu et al., 2013). Large extents of directly connected impervious area (DCIA) have been shown to harm downstream water bodies (Del Monaco, 2017; Jones et al., 2005; Obropta and Del Monaco, 2018; Sohn et al., 2019). However, this does not necessarily mean that urban development should be more dispersed to reduce DCIA, because dispersal can lead to ecosystem fragmentation and make management practices difficult to implement (Bu et al., 2014; Shi et al., 2017). The ambiguity regarding whether intact or fragmented urban areas cause stream water degradation can be seen in the contradictory conclusions of investigations of urban development pattern and stream water quality. Some researchers have argued that intact urban patterns with large amounts of impervious surface can contribute to water quality deterioration (Ding et al., 2016; Li et al., 2009). However, other studies found that greater interspersion of urban areas, as indicated by high Contiguity Index and Patch Cohesion Index, significantly increased the export of pollutants due to the destruction of natural areas (Lv et al., 2014; Shi et al., 2013). More research is needed to address this question, particularly research that holds the percentage of urban developed area at the same level, so that different urban development patterns are comparable in terms of their influence on stream water quality.

One of the major challenges in quantifying stream water quality as a function of urban development pattern is to understand which pattern factors are the most influential.
Some studies have found that the size and number of urban areas, as quantified by Patch Density, Largest Patch Index, and Edge Density, were more strongly related to water quality than the isolation and connectedness of urban areas (Carey et al., 2011; Lee et al., 2009). Others have found that the shape and aggregation of urban developed areas had higher explanatory power in predicting stream water quality variations (Li et al., 2015; Yu et al., 2013). These varying results regarding the correlation between urban development pattern and stream water quality have been attributed to two reasons. First, many studies reported important urban development pattern metrics at the local level using a small number of catchment samples (Li et al., 2015; Lintern et al., 2017; Sun et al., 2014). Thus, few studies have investigated the importance of urban development pattern in the context of a large heterogeneous area with a large watershed sample size. Second, there is a lack of more robust methods for improving the generalization of results regarding the importance of urban development pattern metrics. For example, stepwise regression, the most commonly used algorithm for finding variable importance in predicting stream water quality, was found to sometimes generate problematic results because it optimizes only locally at each selection step (Harrell, 2017).

Furthermore, quantifying the relationships between stream water quality and urban development pattern necessitates the development of predictive models that can be used to forecast stream water quality in alternative urban planning scenarios (Avila et al., 2018; Holcomb et al., 2018; Molina-Navarro et al., 2020; Sharifi et al., 2017).
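As a hedged illustration of the variable-selection problem raised above: the table of contents indicates that this chapter later selects catchment characteristics with LASSO regression, which fits all coefficients jointly under an L1 penalty instead of adding or dropping one variable per step. The minimal sketch below shows that mechanism with cyclic coordinate descent on synthetic data. The feature labels, the simulated coefficients, and the penalty value `lam` are assumptions made purely for illustration; they are not the dissertation's data, settings, or results.

```python
import random

def standardise(column):
    """Centre a column and scale it to unit (population) standard deviation."""
    m = sum(column) / len(column)
    sd = (sum((v - m) ** 2 for v in column) / len(column)) ** 0.5
    return [(v - m) / sd for v in column]

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_coordinate_descent(columns, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    `columns` holds standardised feature columns; `y` must be centred.
    """
    n, p = len(y), len(columns)
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, kept up to date after every update
    for _ in range(n_iter):
        for j in range(p):
            xj = columns[j]
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(xj[i] * (residual[i] + xj[i] * beta[j]) for i in range(n)) / n
            # Standardised columns have mean square 1, so no extra rescaling is needed.
            new_bj = soft_threshold(rho, lam)
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= xj[i] * delta
            beta[j] = new_bj
    return beta

# Synthetic data (assumed): only the first two features influence the response.
rng = random.Random(0)
n = 300
names = ["COHESION", "IJI", "noise_a", "noise_b", "noise_c"]  # hypothetical labels
columns = [standardise([rng.gauss(0, 1) for _ in range(n)]) for _ in names]
y = [1.0 * columns[0][i] - 0.8 * columns[1][i] + rng.gauss(0, 0.2) for i in range(n)]
y_mean = sum(y) / n
y = [v - y_mean for v in y]

beta = lasso_coordinate_descent(columns, y, lam=0.2)
selected = [name for name, b in zip(names, beta) if b != 0.0]
print(selected)
```

In this sketch the irrelevant features receive coefficients of exactly zero, while the retained coefficients are shrunk toward zero by the penalty. In practice the penalty would normally be chosen by cross-validation, not fixed by hand as it is here.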
Machine learning algorithms such as boosted regression trees, neural networks, and self-organizing maps have been applied to depict the complex, non-linear relationships between landscape characteristics and stream water quality with satisfactory model performance (Clapcott et al., 2012; Hameed et al., 2016; Kalteh and Berndtsson, 2008; Lek, 1999; Mirzaei et al., 2019). One advantage of applying machine learning to stream water quality prediction is the possibility of holding the percentage of urban developed area constant in scenario prediction to determine the partial effect that development patterns have on stream water quality. The other advantage is that once the accuracy of a machine learning model has been tested on a new dataset, its generalizability is supported and it can be applied to forecast stream water quality under future land use planning scenarios to support policy decision-making (Chermack et al., 2008; Schreiber et al., 2019). Although stream water quality prediction under different land use scenarios has been explored in predictive studies using machine learning algorithms, to the best of our knowledge, very few studies focus on the impact of urban development pattern.

The goal of this study is thus to provide a comprehensive understanding of how different urban development patterns influence stream water quality, covering the aspects of important factors, spatial variations, predictive models, and potential mechanisms. Using the Texas Gulf Region as the study site, stream water quality (represented by NO3⁻-N, TP, and E. coli concentrations) was quantified and predicted by metrics of urban development pattern, controlling for landscape spatial pattern, topography, soil, climate, and population.
Specifically, this study has three objectives: 1) to identify the most important factors of urban development pattern that influence NO3⁻-N, TP, and E. coli concentrations and suggest specific urban forms to protect stream water quality; 2) to uncover the seasonal and spatial non-stationary relationships between urban development pattern and stream water quality; and 3) to develop predictive models that can forecast stream water quality based on different scenarios of urban development densities and configurations, as well as provide implications for land use planning.

2.2 Data and Method

2.2.1 Study Site

The study site was the Texas Gulf Region, which has an area of 471,080 km² (Figure 2-1). It is one of 21 water resource regions (HUC 02) in the United States, consisting of 11 subregions (HUC 04) and 23 basins (HUC 06). The climate of this region is diverse, with a maritime climate along the coast, a continental climate in the central and northern areas, and a dry and hot climate in the west. These diverse climates lead to heterogeneous landscapes across the region. From east to west, the terrain changes from coastal swamps and piney woods to rolling plains and rugged hills. The heterogeneity of these climate and landscape factors provides ideal samples for studying their influences on stream water quality.

Moreover, the increasing population in the study site has resulted in problems associated with urban sprawl, which has put natural forest areas at risk and degraded stream water quality. Texas currently has a population of approximately 29 million, with a growth rate of 1.8% per year (World Population Review, 2019). Nonpoint source pollution closely related to urban expansion contributes to 45% of stream water quality impairment in Texas. Bacteria, nutrients, dissolved oxygen, and organics are the major causes of stream water quality degradation (Texas Commission on Environmental Quality, 2014).
I therefore selected NO3⁻-N, TP, and E. coli concentrations as the contaminants of interest in this study. Other common pollutants such as total suspended solids and heavy metals were not included because of limited data quality and availability.

Figure 2-1. Study Site

2.2.2 Data and variables

Pollutant concentration data from 1,047 sampling stations in the Texas Gulf Region were used as the predicted variables in this study. To monitor and assess stream water quality, the Texas Commission on Environmental Quality's (TCEQ) Surface Water Quality Monitoring (SWQM) Program has installed over 3,000 active monitoring stations throughout the region. Pollutant concentration data for 2011 were obtained from the SWQM program and aggregated into dry and wet seasons by taking the average values. According to the monthly average precipitation in Texas, the dry season ran from November to April and the wet season covered the rest of the year (Pratt and Chang, 2012).

Landscape metrics at both class and landscape levels, climate, soil, topography, and population were included as explanatory variables to explain variations in stream water quality. The class level metrics covered the land covers of developed area, developed open area, forest area, and planted area, which have been demonstrated to be major environmental drivers of changes in stream water quality (Clement et al., 2017; Glinska-Lewczuk et al., 2016; Teklu et al., 2016). The analysis focused on metrics of urban development pattern and used the other metrics as control variables. The definition of all land covers was in accordance with NLCD (Homer et al., 2015), and all variables in this study and their corresponding data sources are presented in Table 2-1.

Table 2-1. Data sources
Dataset | Structure | Variables | Spatial resolution
NLCD 2011: USGS National Land Cover Database | Raster | landscape metrics | 30 m
TIGER census block | Shapefile | population, population density | NA
USGS National Elevation Dataset 1/3 arc-second | Raster | elevation, slope | 0.33 arc seconds / 30 m
PRISM Monthly spatial climate dataset AN81m | Raster | precipitation, temperature | 2.5 arc minutes / 5 km
SSURGO database | Shapefile | hydrological soil groups, soil storage depth | 1:12000
TCEQ SWQM program | Table | stream pollutant concentration | NA

I incorporated a high-dimensional set of landscape metrics in the machine learning models, including 76 class level metrics and 32 landscape level metrics in the categories of area, edge, shape, and contagion/interspersion (McGarigal, 1995), as presented in Table 2-2. Correlated variables were not expected to cause multicollinearity issues in the machine learning models, and a large set of features can potentially increase prediction accuracy.

Table 2-2. Explanatory variables
Category | Subcategory | Variables
Class Level Metrics (76) (including classes of developed open area, developed area, forest area, and planted area) | Area (28) | Percentage of Landscape (PLAND), Total Area (CA), Median of Patch Area (AREA_MD), Median of Radius of Gyration (GYRATE_MD), Largest Patch Index (LPI), Number of Patches (NP), Patch Density (PD)
 | Edge (8) | Total Edge (TE), Edge Density (ED)
 | Shape (20) | Median of Perimeter-Area Ratio (PARA_MD), Median of Shape Index (SHAPE_MD), Median of Fractal Dimension Index (FRAC_MD), Median of Related Circumscribing Circle (CIRCLE_MD), Median of Contiguity Index (CONTIG_MD)
 | Contagion/Interspersion (20) | Landscape Division Index (DIVISION), Splitting Index (SPLIT), Interspersion Juxtaposition Index (IJI), Landscape Shape Index (LSI), Patch Cohesion Index (COHESION)
Landscape Level Metrics (32) | Area (6) | Total Area (CA), Largest Patch Index (LPI), Median of Patch Area (AREA_MD), Median of Radius of Gyration (GYRATE_MD), Number of Patches (NP), Patch Density (PD)
 | Edge (2) | Total Edge (TE), Edge Density (ED)
 | Shape (6) | Perimeter-Area Fractal Dimension (PAFRAC), Median of Perimeter-Area Ratio (PARA_MD), Median of Shape Index (SHAPE_MD), Median of Fractal Dimension Index (FRAC_MD), Median of Related Circumscribing Circle (CIRCLE_MD), Median of Contiguity Index (CONTIG_MD)
 | Contagion/Interspersion (10) | Landscape Division Index (DIVISION), Splitting Index (SPLIT), Effective Mesh Size (MESH), Interspersion Juxtaposition Index (IJI), Landscape Shape Index (LSI), Patch Cohesion Index (COHESION), Contagion (CONTAG), Proportion of Like Adjacencies (PLADJ), Aggregation Index (AI), Median of Euclidean Nearest Neighbor Distance (ENN_MD)
 | Diversity (8) | Patch Richness (PR), Patch Richness Density (PRD), Shannon's Diversity Index (SHDI), Simpson's Diversity Index (SIDI), Modified Simpson's Diversity Index (MSIDI), Shannon's Evenness Index (SHEI), Simpson's Evenness Index (SIEI), Modified Simpson's Evenness Index (MSIEI)
Climate (24) | Precipitation (12) | Monthly Precipitation, Seasonal Average Precipitation
 | Temperature (12) | Monthly Temperature, Seasonal Average Temperature
Topography (2) | | Elevation, Slope
Soil (6) | | Soil Storage, the Presence of Hydrologic Soil Groups A, B, C, D, C/D, B/D
Population (2) | | Population, Population Density

I added environmental and social control variables, including precipitation, temperature, slope, elevation, soil type, soil storage depth, population, and population density, to control for model bias. In terms of the climatic variables, seasonal total precipitation and mean temperature were included in the statistical models to simplify interpretation. Monthly total precipitation and mean temperature were used in the machine learning models to achieve higher prediction accuracy. In this study, soil type referred to hydrological soil groups (HSG). HSG A, B, C, and D have a high, a moderate, a slow, and a very slow infiltration rate, respectively. If a soil was placed in HSG D because of a high water table, it might be assigned to a dual hydrologic group such as A/D, B/D, or C/D. The first letter of the pair represents the soil's group if drained, and the second letter, D, represents the natural drainage condition.

2.2.3 Data analysis

As presented in Figure 2-2, I first applied LASSO regression to identify whether urban development patterns were the dominant factors in determining stream water quality among all the catchment characteristics. GWR models were then developed to understand the spatial variation of the relationships between urban development pattern and pollutant concentrations. RF regression was used to train machine learning models to predict stream water quality. After confirming that the test set accuracy of the RF regression was satisfactory, the final model was employed to predict stream water quality under four scenarios of different urban development patterns.
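To illustrate the first step, the following is a minimal, dependency-free sketch of LASSO variable selection by coordinate descent. It is not the scikit-learn implementation used in this study: it solves the penalized (Lagrangian) form of the LASSO objective with an assumed penalty `lam` rather than the constrained form with a bound t, the function names are hypothetical, and the toy data are made up for illustration.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2N)) * sum_i (y_i - b0 - x_i . b)^2 + lam * sum_j |b_j|.

    X is a list of covariate rows, y a list of outcomes (hypothetical helper).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    b0 = sum(y) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # covariance of feature j with the partial residual
            norm = 0.0  # sum of squares of feature j
            for i in range(n):
                fit_without_j = b0 + sum(
                    X[i][k] * beta[k] for k in range(p) if k != j
                )
                rho += X[i][j] * (y[i] - fit_without_j)
                norm += X[i][j] ** 2
            # Closed-form coordinate update; weak covariates are set to exactly 0.
            beta[j] = soft_threshold(rho / n, lam) / (norm / n) if norm > 0 else 0.0
        # The intercept is not penalised: mean residual of the current fit.
        b0 = sum(
            y[i] - sum(X[i][k] * beta[k] for k in range(p)) for i in range(n)
        ) / n
    return b0, beta


# Made-up toy data: y depends strongly on covariate 0 and only weakly on covariate 1.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [3.0 + 2.0 * a + 0.05 * b for a, b in X]

intercept, coefs = lasso_coordinate_descent(X, y, lam=0.1)
selected = [j for j, b in enumerate(coefs) if b != 0.0]
print(intercept, coefs, selected)
```

In this toy case the weak covariate is shrunk to exactly zero and only the strong one is retained, which is the sense in which non-zero coefficients mark the "important" variables. In practice the penalty would be tuned by cross-validation rather than fixed by hand.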
Figure 2-2. Data analysis flowchart

• LASSO Regression

LASSO regression was employed to select important factors in stream water quality while minimizing prediction error. The LASSO regression results identified key factors in urban development patterns that determine stream water quality. The results were also used to select other important catchment characteristics. LASSO regression is a machine learning method that performs both variable selection and regularization to improve the prediction accuracy and interpretability of a regression model (Tibshirani, 1996). It selects only a subset of covariates by forcing the sum of the absolute values of the regression coefficients to be less than a fixed value, which forces some coefficients to be set to zero. Variables with non-zero coefficients are then considered more important in predicting the outcomes. The objective of LASSO regression is to solve Equation 2-1, where y_i is the outcome and x_i is the covariate vector. The parameter t, which determines the amount of regularization, is tuned through cross-validation. Compared to the stepwise regression approach used widely in previous studies assessing stream water quality, LASSO regression has the advantage of reaching a global rather than a local optimum. With the hyperparameter t tuned by cross-validation, LASSO regression also supports generalization of the model to a new dataset. I implemented LASSO regression with the "scikit-learn" and "statsmodels" packages in Python 3.0.

\[ \min_{\beta_0,\,\beta} \left\{ \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \beta_0 - x_i^{T}\beta \right)^{2} \right\} \quad \text{subject to} \quad \sum_{j=1}^{p} \left| \beta_j \right| \le t \]

Equation 2-1

• Geographically Weighted Regression (GWR)

In this study, GWR was applied to investigate the spatially varying associations between stream water quality and the urban development pattern metrics selected by LASSO regression. GWR allows linear predictors to be a function of spatial coordinates (u, v), as represented in Equation 2-2.
In this equation, y is the pollutant concentration, x_j is the j-th covariate, and β_j(u, v) is the corresponding coefficient at location (u, v). GWR assumes that the contribution of each sample to the local regression model is weighted according to its proximity to the local sample point. A common choice of weighting function is the Gaussian curve, as shown in Equation 2-3, where d_ij is the distance between observation point i and the realization point j, and the bandwidth b is the parameter to be determined. An adaptive kernel bandwidth was employed in this study, selected according to AIC. GWR was implemented in the "spgwr" package in R.

\[ y = \sum_{j=1}^{p+1} \beta_j(u, v)\, x_j \]

Equation 2-2

\[ w_{ij} = \exp\left( -\frac{d_{ij}^{2}}{2 b^{2}} \right) \]

Equation 2-3

• Random forest regression

RF regression was used to train models that quantify the nonlinear relationships between explanatory variables and stream water quality. It was further applied to scenario predictions of pollutant concentrations under different urban development patterns. RF is an ensemble learning method that consists of a large number of individual decision trees. Random samples are taken with replacement and a random subset of features is used to generate each regression tree. A prediction is made by averaging the results of all regression trees (Breiman, 2001). To assess the generalization of the predictive models, 90% of the samples were used to train the models and the remaining 10% were used to test the models' performance metrics, including Mean Square Error (MSE) and R² (Wang et al., 2019). Ten-fold cross-validation with a grid search was employed to tune the hyperparameters, including the maximum depth of the tree (max_depth), the minimum number of samples required to split a node (min_samples_split), and the maximum number of features considered when looking for the best split (max_features). The number of regression trees was set to 1,000.
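The ensemble idea described above (bootstrap samples drawn with replacement, a random subset of features per tree, and averaging of the tree predictions) can be sketched without any external library. The sketch below is a deliberate simplification and not the scikit-learn model used in this study: each "tree" is a depth-one regression stump, there is no hyperparameter tuning, all function names are hypothetical, and the data are made up.

```python
import random
from statistics import mean


def fit_stump(rows, targets, features):
    """Fit a one-split regression tree using only the allowed features."""
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [t for r, t in zip(rows, targets) if r[f] <= thr]
            right = [t for r, t in zip(rows, targets) if r[f] > thr]
            ml, mr = mean(left), mean(right)
            sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, ml, mr)
    if best is None:
        # Every allowed feature is constant in this sample: predict the mean.
        m = mean(targets)
        return (None, 0.0, m, m)
    return best[1:]


def predict_stump(stump, row):
    f, thr, left_mean, right_mean = stump
    if f is None or row[f] <= thr:
        return left_mean
    return right_mean


def fit_forest(rows, targets, n_trees=200, max_features=1, seed=0):
    """Bagged stumps: bootstrap rows and a random feature subset per tree."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]    # sample with replacement
        feats = rng.sample(range(p), max_features)    # random feature subset
        forest.append(
            fit_stump([rows[i] for i in idx], [targets[i] for i in idx], feats)
        )
    return forest


def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return mean(predict_stump(s, row) for s in forest)


# Made-up data: the target rises with feature 0; feature 1 is unrelated noise.
rows = [[x, (x * 7) % 5] for x in range(10)]
targets = [0.1 * x for x in range(10)]

forest = fit_forest(rows, targets, n_trees=200, max_features=1, seed=0)
low = predict_forest(forest, [0, 2])
high = predict_forest(forest, [9, 2])
print(low, high)
```

Because each tree sees a different bootstrap sample and feature subset, the averaged prediction is smoother than any single tree. This averaging is also one reason an ensemble of this kind tends to pull extreme values toward the middle of the observed range.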
Random forest regression was also implemented with the "scikit-learn" package in Python 3.0.

2.2.4 Scenario Design

To understand the effects of urban development density and configuration on stream water quality, I created four alternative urban development scenarios in the upstream area of The Woodlands, TX and predicted their NO3⁻-N, TP, and E. coli concentrations in both dry and wet seasons (Figure 2-3). The Woodlands is well known for Ian McHarg's ecological planning approach (McHarg and Sutton, 1975). The current development condition was chosen as the baseline scenario, in which 33.6% of the area (24 km²) was developed into urban areas. Low density development is the major development type in the current condition. The boundary of the scenario site is the Bear Branch-Panther Branch sub-watershed boundary, with the HUC12 ID 120401020211.

Figure 2-3. Scenario Maps

The alternative scenarios were four extreme development scenarios in which developed areas were extremely scattered or aggregated: high-density aggregated development, high-density sprawl development, medium-density aggregated development, and medium-density sprawl development (Figure 2-3). I applied two criteria to create the four development scenarios. First, the total impervious surface area was the same as in the baseline scenario. According to the land cover description of NLCD, impervious surface accounts for 20%-49% of low-density development, 50%-79% of medium-density development, and 80%-100% of high-density development. To quantify impervious surface in urban areas of each density, I used the median value of the impervious surface percentage, which was 35%, 65%, and 90% for low-density, medium-density, and high-density development, respectively (Yang and Li, 2011). The impervious surface added up to 16.4% of the scenario site in all scenarios (Table 2-3). Second, all of the existing non-urban land cover types in the baseline scenario, including water, forest, grassland, planted, and wetland, stayed the same.
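The first criterion can be checked with simple arithmetic: multiplying each density class's share of the site by its median imperviousness gives the impervious share, and dividing a target impervious share by one class's imperviousness gives the developed share needed at that density. The sketch below uses the median values stated above and the baseline composition reported in Table 2-3; the variable and function names are illustrative only.

```python
# Median impervious fraction of each NLCD developed class (values from the text).
IMPERVIOUS_FRACTION = {"low": 0.35, "medium": 0.65, "high": 0.90}

# Baseline developed area by density, in percent of the scenario site (Table 2-3).
baseline_developed_pct = {"high": 1.4, "medium": 12.8, "low": 19.4}


def impervious_pct(developed_pct):
    """Impervious share of the site implied by a developed-area composition."""
    return sum(pct * IMPERVIOUS_FRACTION[d] for d, pct in developed_pct.items())


def developed_pct_needed(target_impervious_pct, density):
    """Developed share needed at a single density to reach the target share."""
    return target_impervious_pct / IMPERVIOUS_FRACTION[density]


baseline_impervious = impervious_pct(baseline_developed_pct)
print(round(baseline_impervious, 1))                   # 16.4
print(round(developed_pct_needed(16.4, "high"), 1))    # 18.2
print(round(developed_pct_needed(16.4, "medium"), 1))  # 25.2
```

The results agree with the developed-area shares listed for the high-density (18.2%) and medium-density (25.2%) scenarios, which is why both scenario types occupy a smaller urban footprint than the 33.6% baseline while keeping the same impervious share.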
The urban areas removed in the four alternative scenarios were changed to forest, representing undeveloped conditions. To approximate the maximum degree of aggregated or sprawl development, I manually selected in ArcGIS 10.5 the locations of high/medium density development and of the areas changed to forest. The key difference between the scenarios was the urban development pattern, as presented in Table 2-3. Compared to the two sprawled scenarios, the two aggregated scenarios were characterized by higher LPI and COHESION and by lower ED, LSI, and shape complexity. Developed areas in the two aggregated scenarios were therefore clumped into larger patches with simpler shapes and were more physically connected. These differences in urban development pattern metrics laid the foundation for quantifying stream water quality under different urban densities and configurations.

Table 2-3. Scenario description
Metrics | Baseline Scenario | High Density Aggregated Scenario | High Density Sprawled Scenario | Medium Density Aggregated Scenario | Medium Density Sprawled Scenario
Urban development density | | | | |
Impervious area (%) | 16.4 | 16.4 | 16.4 | 16.4 | 16.4
High density developed area (%) | 1.4 (1.2) | 18.2 (16.4) | 18.2 (16.4) | 0 | 0
Medium density developed area (%) | 12.8 (8.3) | 0 | 0 | 25.2 (16.4) | 25.2 (16.4)
Low density developed area (%) | 19.4 (6.8) | 0 | 0 | 0 | 0
Urban development configuration (represented by landscape metrics of developed area) | | | | |
LPI | 41.60 | 15.75 | 3.45 | 32.90 | 9.81
NP | 494 | 58 | 124 | 175 | 277
ED (meters per hectare) | 86.88 | 13.07 | 41.55 | 28.11 | 61.45
FRAC_MD | 622470 | 95850 | 304800 | 206100 | 448290
CIRCLE_MD | 0.4907 | 0.4123 | 0.4907 | 0.4123 | 0.4907
LSI | 27.34 | 6.10 | 19.33 | 10.78 | 23.54
IJI (percent) | 33.53 | 40.65 | 42.39 | 42.98 | 36.40
COHESION | 99.48 | 99.02 | 95.43 | 99.42 | 98.02

2.3 Result

2.3.1 Important catchment characteristics selected by LASSO regression

COHESION, IJI, and LPI of developed areas were found to be important in affecting TP in both dry and wet seasons (Table 2-4).
When developed areas were more interspersed with other land cover types (indicated by IJI) and became less physically connected (indicated by COHESION), TP concentration was likely to decrease. Larger patches of developed area (indicated by LPI) were positively and significantly correlated with TP concentration in dry seasons. Because urban development pattern metrics were the only landscape pattern metrics selected by the TP LASSO regression, urban development patterns appeared to outweigh other land use patterns in affecting TP concentration. In addition, areas with a very low infiltration rate (indicated by soil group D) were significantly associated with low TP concentration. The presence of soil group C/D was significantly associated with high TP concentration in the wet season. High temperature, low forest percentage, and low slope were associated with high TP concentration. The important catchment characteristics affecting TP concentration in dry and wet seasons were similar, with larger and more significant effects in the dry season.

Table 2-4. LASSO linear regression results of TP concentration
Variable | Wet season coefficient | Wet season t value | Wet season p value | Dry season coefficient | Dry season t value | Dry season p value
Constant | -1.783 | -41.932 | <0.001 | -1.946 | -51.951 | <0.001
Developed area class level metrics | | | | | |
COHESION | 0.074 | 1.453 | 0.147 | 0.077 | 1.764 | 0.078
IJI | -0.090 | -1.553 | 0.121 | -0.067 | -1.438 | 0.151
LPI | 0.229 | 1.347 | 0.178 | 0.144 | 2.942 | 0.003**
PLAND | -0.068 | -0.375 | 0.708 | n/a | n/a | n/a
Control variables | | | | | |
Soil storage | 0.184 | 2.930 | 0.003** | 0.224 | 4.283 | <0.001**
The presence of soil group D | -0.120 | -2.444 | 0.015* | -0.102 | -2.492 | 0.013*
The presence of soil group C/D | 0.100 | 2.18 | 0.028* | 0.077 | 1.958 | 0.051
Slope | -0.091 | -1.356 | 0.175 | -0.064 | -1.139 | 0.255
Population | 0.097 | 2.018 | 0.044* | 0.118 | 2.783 | 0.006**
Percentage of forest area | -0.148 | -2.268 | 0.024** | -0.142 | -2.752 | 0.006**
Mean temperature | 0.157 | 2.675 | 0.008** | 0.390 | 9.278 | <0.001**
Elevation | -0.250 | -3.392 | 0.001** | n/a | n/a | n/a
The presence of soil group B/D | 0.059 | 1.219 | 0.223 | n/a | n/a | n/a

Note of Table 2-4, Table 2-5, and Table 2-6: * indicates the significance level of 0.05; ** indicates the significance level of 0.01

The number of important variables associated with E. coli concentration was larger than that for TP concentration, which indicated a more complex mechanism, particularly in wet seasons (Table 2-5). Complex shape (indicated by SHAPE) of urban developed areas was positively and significantly related to E. coli concentration in wet seasons. Similarly, high edge density (ED) and shape complexity (SHAPE) of planted areas were found to be significantly and positively correlated with E. coli concentration in both dry and wet seasons. At the landscape level, the median of CONTIG had a significant positive correlation with E. coli concentration in dry seasons, meaning that high spatial connectedness of land cover patches was likely to increase E. coli concentration. Moreover, low soil storage capacity and low infiltration rates helped to significantly reduce E. coli concentration.
High temperature, high soil storage, and the presence of soil group C/D all contributed to high E. coli concentration.

Table 2-5. LASSO linear regression results of E. coli concentration
Variable | Wet season coefficient | Wet season t value | Wet season p value | Dry season coefficient | Dry season t value | Dry season p value
Constant | 4.123 | 53.486 | <0.001 | 4.534 | 71.632 | <0.001
Developed area class level metrics | | | | | |
PLAND | 0.328 | 0.747 | 0.455 | 0.643 | 2.580 | 0.010**
IJI | -0.121 | -0.956 | 0.339 | -0.111 | -1.337 | 0.182
LPI | 0.250 | 0.744 | 0.457 | n/a | n/a | n/a
PD | -0.093 | 0.868 | 0.386 | n/a | n/a | n/a
Median of CIRCLE | -0.157 | -1.608 | 0.108 | n/a | n/a | n/a
Median of SHAPE | 0.213 | 2.619 | 0.009** | n/a | n/a | n/a
DIVISION | n/a | n/a | n/a | -0.170 | -1.259 | 0.209
Planted area class level metrics | | | | | |
ED | -0.286 | -2.533 | 0.012* | n/a | n/a | n/a
NP | -0.067 | -0.572 | 0.567 | n/a | n/a | n/a
Median of SHAPE | n/a | n/a | n/a | 0.250 | 3.369 | 0.001**
PLAND | n/a | n/a | n/a | 0.181 | 1.911 | 0.056
Forest area class level metrics | | | | | |
PLAND | -0.146 | -1.362 | 0.174 | -0.094 | 0.570 | 0.569
Median of FRAC | 0.134 | 1.207 | 0.228 | n/a | n/a | n/a
SPLIT | -0.105 | -1.249 | 0.212 | n/a | n/a | n/a
Median of AREA | n/a | n/a | n/a | 0.106 | 1.510 | 0.131
Landscape level metrics | | | | | |
Median of CONTIG | 0.503 | 1.749 | 0.083 | 0.562 | 6.453 | <0.001**
Median of AREA | -0.103 | -0.297 | 0.766 | n/a | n/a | n/a
Median of FRAC | 0.162 | 0.579 | 0.562 | n/a | n/a | n/a
IJI | 0.078 | 0.548 | 0.584 | n/a | n/a | n/a
MESH | -0.874 | -0.835 | 0.404 | n/a | n/a | n/a
TE | n/a | n/a | n/a | -0.144 | -2.088 | 0.037
AI | n/a | n/a | n/a | -0.119 | -1.679 | 0.094
Other control variables | | | | | |
Soil storage | 0.526 | 4.567 | <0.001** | 0.455 | 5.187 | <0.001**
The presence of soil group D | -0.258 | -2.956 | 0.003** | -0.160 | -2.261 | 0.024**
The presence of soil group C/D | 0.181 | 2.220 | 0.027* | 0.105 | -2.261 | 0.024**
Mean temperature | 0.173 | 1.726 | 0.085 | 0.438 | 5.384 | <0.001**
Population density | 0.254 | 1.171 | 0.242 | 0.098 | 0.570 | 0.569
Population | 0.196 | 2.035 | 0.042* | n/a | n/a | n/a

All aspects of urban development pattern were importantly associated with NO3⁻-N concentration, including area, cohesion, adjacency, edge, and shape. Meanwhile, the mechanisms of planted and forest areas were relatively simple (Table 2-6).
In the dry season, when the proportion of developed areas increased and these areas became more connected (indicated by COHESION), NO3⁻-N concentration significantly decreased. When developed areas became more interspersed with other land cover patches (indicated by IJI), NO3⁻-N concentration significantly decreased as well. The percentage (PLAND) and connectedness (CONTIG) of planted areas were also positively and significantly correlated with NO3⁻-N concentration in dry seasons. A simple and intact shape (PARA) of forest area significantly contributed to the reduction of NO3⁻-N concentration in both dry and wet seasons. At the landscape level, a more fragmented landscape (higher PAFRAC) was significantly associated with higher NO3⁻-N concentration in the dry season. NO3⁻-N concentration was negatively associated with precipitation, and thus NO3⁻-N concentration in the wet season was lower than in the dry season.

Table 2-6. LASSO linear regression results of NO3⁻-N concentration
Variable | Wet season coefficient | Wet season t value | Wet season p value | Dry season coefficient | Dry season t value | Dry season p value
Constant | -0.867 | -9.903 | <0.001 | -0.436 | -5.730 | <0.001
Developed area and developed open area class level metrics | | | | | |
COHESION | 0.141 | 1.101 | 0.272 | 0.324 | 2.968 | 0.003**
IJI | -0.169 | -0.749 | 0.454 | -0.319 | -3.044 | 0.003**
LPI | 0.169 | 1.110 | 0.268 | n/a | n/a | n/a
ED | 0.003 | 0.024 | 0.981 | n/a | n/a | n/a
Median of AREA | n/a | n/a | n/a | 0.174 | 2.157 | 0.032*
Median of CONTIG | -0.103 | -0.489 | 0.625 | -0.124 | -1.179 | 0.239
Median of CIRCLE | -0.076 | -0.369 | 0.713 | n/a | n/a | n/a
Planted area class level metrics | | | | | |
PLAND | n/a | n/a | n/a | 0.312 | 3.409 | 0.001**
Median of CONTIG | n/a | n/a | n/a | -0.356 | 3.323 | 0.001**
Forest area class level metrics | | | | | |
PLAND | -0.242 | -2.030 | 0.043* | -0.190 | -1.849 | 0.065
Median of PARA | 0.261 | 2.441 | 0.015* | 0.439 | 4.595 | <0.001**
Landscape level metrics | | | | | |
PAFRAC | n/a | n/a | n/a | 0.218 | 2.291 | 0.023*
Other control variables | | | | | |
The presence of soil group D | -0.206 | -2.148 | 0.032* | -0.116 | -1.450 | 0.148
The presence of soil group C/D | 0.228 | 2.499 | 0.017* | 0.183 | 2.244 | 0.025*
Population | 0.113 | 0.981 | 0.327 | 0.068 | 0.704 | 0.482
Mean temperature | 0.426 | 3.792 | <0.001** | 0.706 | 7.052 | <0.001**
Soil storage | 0.160 | 1.447 | 0.032* | n/a | n/a | n/a
Mean precipitation | -0.396 | -4.098 | <0.001** | n/a | n/a | n/a

2.3.2 Spatial variation of the effects of urban development pattern on stream water quality

In this study, the TP GWR performed better in coastal areas such as the Neches Basin, the Lower Brazos Basin, and the Central Texas Coastal Basin, with R² higher than 0.4 (Figure 2-4). However, it did not perform equally well in the Houston metropolitan area and in agricultural watersheds such as the Middle Brazos Basin. The performance of the E. coli GWR was also better in coastal areas, including the Galveston Bay-San Jacinto Basin and the Neches Basin.

Figure 2-4. TP, E. coli, and NO3⁻-N GWR model performance

Compared to LASSO regression, GWR performed better in predicting TP and E. coli concentrations, as indicated by a lower AIC and a higher R² (Table 2-7). The performance of the NO3⁻-N models was similar between GWR and LASSO regression, which was attributed to the relatively smaller sample size and spatial extent.

Table 2-7. Model performance comparison between LASSO linear regression and GWR
Model | N observation | df | R² (LASSO regression) | R² (GWR) | AIC (LASSO regression) | AIC (GWR)
TP, wet season | 804 | 13 | 0.32 | 0.49 | 2596 | 2394
TP, dry season | 868 | 10 | 0.34 | 0.44 | 2645 | 2526
E. coli, wet season | 754 | 22 | 0.29 | 0.32 | 3416 | 3365
E. coli, dry season | 788 | 15 | 0.33 | 0.44 | 3158 | 3061
NO3⁻-N, wet season | 329 | 14 | 0.42 | 0.42 | 1252 | 1242
NO3⁻-N, dry season | 355 | 13 | 0.45 | 0.46 | 1276 | 1259

Among the most important metrics of urban development, COHESION, i.e., the aggregation of urban developed areas, exerted a greater positive effect on TP concentration in the southern portion of the study area, which included the Nueces-Southwestern Texas Coastal Basin, the Central Texas Coastal Basin, the Lower Colorado-San Bernard Coastal Basin, and the Lower Brazos Basin (Figure 2-5).
The effects of COHESION in the Galveston Bay-San Jacinto Basin and the Trinity Basin, in contrast, trended towards negative. When developed areas were more evenly interspersed with other land cover types (higher IJI), TP concentrations in the Central Texas Coastal Basin and the Galveston Bay-San Jacinto Basin were likely to decrease. Large patches of developed area (higher LPI) were shown to have a spatially heterogeneous effect on TP concentration. In the Central Texas Coastal Basin, the Lower Colorado-San Bernard Coastal Basin, and the Lower Brazos Basin, the effect was positive; it changed to negative in the Trinity Basin and most of the coastal areas.

Complex shape (SHAPE) and large patches (LPI) of urban developed areas had greater positive effects on E. coli concentration in coastal basins, including the Nueces-Southwestern Texas Coastal Basin, the Central Texas Coastal Basin, the Galveston Bay-San Jacinto Basin, the Sabine Basin, and the eastern parts of the Lower Colorado-San Bernard Coastal Basin and the Lower Brazos Basin. SHAPE had negative effects on E. coli concentration in some agricultural basins such as the Middle Brazos Basin and the western part of the Lower Brazos Basin. The IJI of developed areas had a greater negative effect on E. coli concentration in the northwest part of the study area, including the Middle Brazos Basin and the Lower Brazos Basin. I discuss the mechanism that is potentially driving the spatial variation in the effects of urban development pattern in section 4.3.

Figure 2-5. GWR model coefficients of urban development pattern effects in the wet season (TP model on the left and E. coli model on the right)

2.3.4 Stream water quality prediction under alternative planning scenarios

In the RF regression model, the variations and trends of all pollutant concentrations were well captured in the test set; however, the extreme values were not well predicted (Figure 2-6).
The very low concentrations tended to be overestimated and the very high concentrations tended to be underestimated. In the dry season, the R² in the test set was 0.56, 0.45, and 0.66 for the TP, E. coli, and NO3⁻-N RF models, respectively (Table 2-8). Similar to the GWR performance, RF prediction accuracy in the dry season was slightly higher than in the wet season.

Figure 2-6. Scatter plots of predicted values against observed values of TP concentration in the test set

Table 2-8. Random forest prediction results
Model | Train set correlation | Train set R² | Train set MSE | Test set correlation | Test set R² | Test set MSE
TP, wet season | 0.98 | 0.96 | 0.16 | 0.74 | 0.55 | 1.23
TP, dry season | 0.98 | 0.96 | 0.15 | 0.75 | 0.56 | 0.85
E. coli, wet season | 0.98 | 0.96 | 0.58 | 0.65 | 0.42 | 4.61
E. coli, dry season | 0.98 | 0.96 | 0.35 | 0.67 | 0.45 | 2.62
NO3⁻-N, wet season | 0.97 | 0.94 | 0.35 | 0.80 | 0.64 | 1.75
NO3⁻-N, dry season | 0.97 | 0.94 | 0.29 | 0.81 | 0.66 | 1.36

Top 10 important variables, by model:
TP, wet season: IJI of forest area, precipitation, population, temperature, COHESION of developed area, population density, PAFRAC, soil storage
TP, dry season: ED of planted area, precipitation, temperature, slope, IJI of developed area, PAFRAC, COHESION of developed area, population density, PD of planted area, population, elevation, soil storage
E. coli, wet season: Temperature, percentage of planted area, ED of planted area, precipitation, COHESION of developed area, DIVISION of developed area, soil storage, population density, LPI of developed area, percentage of developed area
E. coli, dry season: Precipitation, IJI, temperature, population density, soil storage, percentage of developed area
NO3⁻-N, wet season: Precipitation, population density, LPI of developed open area, temperature, soil storage, COHESION of developed area, population density, median of CIRCLE of forest area, median of GYRATE of forest area, LPI of forest area
NO3⁻-N, dry season: PAFRAC, elevation, temperature, soil storage, population, median of GYRATE of forest area, IJI of developed area, COHESION of developed area, LPI of forest area, precipitation, median of AREA of forest area, median of CONTIG of forest area

The importance of climatic factors was highlighted in the RF regression because using monthly average temperature and total precipitation data yielded much higher accuracy than using seasonal average temperature and total precipitation. It is therefore likely that climatic factors exhibited interaction effects with urban development patterns and other environmental variables on stream water quality. According to the variable importance of the RF regression (Table 2-8), the aggregation and interspersion of developed areas (indicated by COHESION and IJI) were important in affecting TP and NO3⁻-N concentrations, which aligned with the LASSO regression results. COHESION, LPI, and DIVISION of developed areas were important in affecting the concentration of E. coli in wet seasons. With respect to other landscape patterns, the ED of planted areas was important for both TP and E. coli concentrations. Shape complexity and aggregation of forest areas were important factors in influencing NO3⁻-N concentration. Landscape level metrics were found to be less important than class level metrics.

The prediction results for the alternative planning scenarios suggested that high density aggregated development patterns were advantageous in reducing TP and NO3⁻-N concentrations (Table 2-9). All high density and medium density scenarios had lower concentrations of all pollutants than the current development, and in most cases less than half for TP and NO3⁻-N, indicating the benefits of small footprint urban areas. Aggregated development in both high and medium density scenarios had lower TP and NO3⁻-N concentrations than sprawl development of the same density. However, aggregated development contributed to higher E. coli concentrations than sprawl development of the same density in wet seasons.
Specifically, for TP concentration, the two sprawl development scenarios resulted in hypereutrophic conditions, while the two aggregated development scenarios resulted in eutrophic conditions in lotic ecosystems (Grand River Water Management Plan, 2013). Unpolluted water generally has a NO3-N concentration of less than 1.0 mg/l, which can be achieved in the four alternative high and medium density development scenarios but cannot be achieved in the low-density current development scenario. Overall, the most recommended urban development pattern for stream water quality protection was high density aggregated development, though specific attention should be paid in areas with potential E.coli pollution to avoid very high density development. It is worth noting that the predicted values of TP and NO3-N were comparable to the measured data at TCEQ Station #16629, which was located close to the outlet of the basin, indicating the reliability of our prediction models.

Table 2-9. Scenario prediction results of pollutant concentration

                                        TP                          E.coli                      NO3-N
Scenario                                wet season    dry season    wet season    dry season    wet season    dry season
Current development                     0.28 (0.551)  0.28 (0.11)   119.72        67.23         1.98 (2.92)   1.2 (1.58)
High-density aggregated development     0.10          0.09          43.68         26.31         0.1           0.17
High-density sprawl development         0.14          0.14          30.28         41.50         0.19          0.25
Medium-density aggregated development   0.11          0.09          98.58         54.90         0.15          0.22
Medium-density sprawl development       0.18          0.13          76.59         32.02         0.25          0.42

Notes of Table 2-9:
1. Values in parentheses are measured pollutant concentrations at TCEQ Station #16629, which is close to the outlet of this basin.
2.
The unit is mg/l for TP and NO3-N and MPN/100ml for E.coli.

2.4 Discussion

2.4.1 Planning implication based on urban development pattern metrics

Given that interpreting urban development pattern metrics can be difficult in land use planning, in this section I discuss how to link these metrics with specific land cover maps using sample watersheds in the study region. Three pair-wise comparisons of land cover maps with similar developed percentages but different TP concentrations are given in Figure 2-7.

The first pair, watersheds (a) and (b) in Figure 2-7 (1), produced very different TP concentrations, which was likely associated with different IJI of developed areas. Watershed #12083 (Figure 2-7-a) was identified as more aggregated development, with a relatively integral natural core in the west. The IJI of developed area in this watershed was larger because the developed area was more equally adjacent to other land patch types. Watershed #11155 (Figure 2-7-b) was low density development with scattered developed open areas. The IJI in this watershed was small because the developed area was largely adjacent to the developed open area only. The higher TP concentration in watershed #11155 was likely caused by pollutants generated from the landscape gardens in the developed open areas and the greater extent of road surface area associated with detached houses (Goonetilleke et al., 2005).

The comparisons between the watersheds in Figure 2-7-c and Figure 2-7-d and between the watersheds in Figure 2-7-e and Figure 2-7-f showed how shape and edge complexity potentially affected pollutant concentration. Different shape and edge complexity was associated with different drainage connections and road systems that influence runoff velocity, pollutant travel distance, and time of transport (Liu et al., 2012). Watershed #20730 (Figure 2-7-c) had a lower percentage of developed area but a higher TP concentration than watershed #16655 (Figure 2-7-d).
The complex shape and sprawled development of watershed #20730 led to more interspersed land uses and more complex drainage and road systems. Higher ED of developed area in watershed #17406 (Figure 2-7-e) was associated with higher TP concentration than in watershed #11405 (Figure 2-7-f), given the similar percentage of developed area. The high ED of developed area in watershed #17406 implied a sprawled road system that degraded the structure of natural areas (Lee et al., 2009).

Figure 2-7. Examples of watersheds with a similar percentage of developed area but different urban development pattern metrics and TP concentration

Urban development pattern metrics are related to the percentage, aggregation, patch shape, and connectivity of developed areas, and thus can represent characteristics of urban sprawl such as low-density development, leapfrog development over vacant lands, and decentralization (Riitters et al., 1995; Gordon and Richardson, 1996; Ewing, 2008; Bhatta, 2010). I argue that urban sprawl had a direct relationship to stream water quality, as it affected pollutant generation, build-up, and wash off by altering the structure of urban forms and the surrounding natural areas (Goonetilleke et al., 2005; Liu et al., 2012). To sum up, the ideal urban form for stream water quality protection should avoid (1) sprawl of low-density development with large lawn areas and complex road systems; (2) complexly shaped urban areas that are likely to have complicated drainage and road systems; and (3) scattered patches of urban areas that destroy integral natural areas.

2.4.2 The complexity of the impact of urban development pattern on stream water quality

The results of this study indicate that both size and connectedness of urban developed areas (LPI, COHESION, and IJI of developed areas) were important in influencing stream water quality.
This conclusion differs from Lee and others’ argument that the dispersion and connectedness of land cover appear to be less informative in measuring the relationship between land use and water quality compared to size and number metrics (Lee et al., 2009). Regarding the aggregation of urban areas, COHESION has shown a negative correlation with runoff and pollutant concentration in some studies (Li et al., 2015), while large and aggregated urban areas, as indicated by a high contiguity index (CONTIG) or contagion index (CONTAG), have been associated with poor stream water quality in others (Lee et al., 2009; Lv et al., 2015; Shi et al., 2017). Because greater interspersion and increases in the number of urban patches may accelerate soil erosion and sediment exportation (Shi et al., 2013), I argue that, although an intact urban area with large impervious surfaces can result in the deterioration of water quality (Alberti et al., 2007; Lee et al., 2009), the same area of impervious surface can lead to worse stream water quality with greater dispersion, as supported by our scenario prediction.

It is worth noting here that, without controlling for developed area percentage, the effect of urban development pattern on water quality should always be interpreted with caution due to the collinearity between urban development pattern and urban area percentage. As indicated in Figure 2-8, IJI, COHESION, and the percentage of developed areas were correlated with each other. Therefore, the effect of urban development pattern on stream water quality derived in statistical models can sometimes be caused by the percentage of urban developed area. I also note that, in Figure 2-8, the relationships between urban development patterns and the percentage of urban developed area were not linear. Specifically, a low percentage of developed area does not necessarily mean low COHESION or high IJI.
Thus, the percentage of urban developed area cannot replace urban development pattern metrics. This means that, in land use planning policy, IJI and COHESION should be considered together with the percentage of urban developed area to evaluate the possible influence on stream water quality.

Figure 2-8. Scatter plots showing correlations between IJI, COHESION and the percentage of urban developed areas

The shape and edge complexity of developed areas were useful but not as important as aggregation/interspersion metrics in influencing stream water quality. Among some highly collinear shape metrics, I found that the medians of CIRCLE, FRAC, and SHAPE of developed areas were more efficient than other metrics. SHAPE was frequently applied to measure the effect of shape complexity on stream water quality and was found to be negatively associated with pollutant concentration at the catchment scale (Li et al., 2015; Yu et al., 2013; Lee et al., 2009; Shi et al., 2017). I found that FRAC and CIRCLE were also efficient metrics for measuring urban pattern shape. The importance of CIRCLE indicated that patch elongation was as important as patch compactness of urban area in evaluating stream water quality.

I furthermore found that class level landscape metrics were more effective than landscape level metrics in predicting stream water quality. The reason for this is that class level metrics had different influences on water quality depending on land cover type. For example, COHESION and IJI were important for developed areas in terms of their influence on stream water quality, but the COHESION and IJI of forest and planted areas were not as important. Researchers have argued that the landscape level SHDI and ED affect stream water quality at both watershed and reach scales (Shi et al., 2013; Sun et al., 2014). However, I found that PAFRAC, rather than SHDI and ED, was the most significant factor affecting all pollutant concentrations.
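To make the class level metrics discussed in this section more concrete, the following is a minimal sketch of how LPI, DIVISION, and IJI could be computed from the verbal definitions in the Appendix table of landscape metrics. It is not the FRAGSTATS implementation used in this study. The function names, the input layout (a list of patch areas and a mapping of shared edge lengths), and all numbers are assumptions made for illustration only.

```python
import math


def largest_patch_index(patch_areas, landscape_area):
    """LPI (%): area of the largest patch of a class over total landscape area."""
    return 100.0 * max(patch_areas) / landscape_area


def division_index(patch_areas, landscape_area):
    """DIVISION: 1 minus the sum of squared patch-area proportions."""
    return 1.0 - sum((area / landscape_area) ** 2 for area in patch_areas)


def interspersion_juxtaposition(edge_lengths, n_patch_types):
    """IJI (%) of one class.

    edge_lengths: hypothetical mapping from each neighbouring patch type to the
    total edge length it shares with the focal class.
    n_patch_types: number of patch types in the landscape (must be at least 3).
    """
    total = sum(edge_lengths.values())
    entropy = -sum(
        (length / total) * math.log(length / total)
        for length in edge_lengths.values()
        if length > 0
    )
    return 100.0 * entropy / math.log(n_patch_types - 1)


# Illustrative numbers only: developed area sharing edges with three other types.
even_edges = {"developed open": 10.0, "forest": 10.0, "planted": 10.0}
skewed_edges = {"developed open": 28.0, "forest": 1.0, "planted": 1.0}

iji_even = interspersion_juxtaposition(even_edges, n_patch_types=4)
iji_skewed = interspersion_juxtaposition(skewed_edges, n_patch_types=4)
lpi = largest_patch_index([50.0, 30.0, 20.0], landscape_area=1000.0)
division = division_index([50.0, 30.0, 20.0], landscape_area=1000.0)

print(round(iji_even, 1), round(iji_skewed, 1), round(lpi, 1), round(division, 4))
```

In this toy case, the developed area that borders all other types equally reaches the maximum IJI, while the developed area that borders almost only developed open area has a much lower IJI, which mirrors the contrast between watersheds #12083 and #11155 described in Section 2.4.1.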
2.4.3 Interpretation of the spatiotemporal non-stationary land-water relationships

In this study, the effects of IJI, LPI, and COHESION of developed areas on TP concentrations were more significant in the dry season than in the wet season. The absolute values of urban development pattern metrics’ coefficients were larger in E.coli and NO3-N regressions in dry seasons, indicating that urbanization had a larger effect on stream water quality in dry seasons. Precipitation had a significantly negative association with NO3-N concentration, indicating a potential dilution effect of prolonged precipitation in wet seasons (Chen et al., 2016). I also found that more urban development pattern metrics were selected by LASSO regression in wet seasons, which represented more complex relationships than in dry seasons. Under future climate change conditions, urban development pattern might have more complicated effects on stream water quality due to more precipitation in coastal areas. Future research should thus investigate the interaction effects between precipitation and the impacts of urban development pattern on stream water quality to further understand this mechanism.

Moreover, the influence that urban development pattern exerted on stream water quality had high spatial variations, which might be attributed to different pollutant sources. The LPI of developed area had a negative correlation with TP concentration in highly urbanized areas such as the Dallas metropolitan area in Texas. This finding differed from existing studies that have reported that the LPI of residential areas was a strong positive predictor of pollutant loading (Carey et al., 2011). Alternatively, I argue that, in highly urbanized areas, larger LPI of developed area corresponded to aggregated development with fewer urban patches, while smaller LPI of developed area was associated with smaller but more numerous patches of impervious areas.
Larger LPI of developed area in this case contributed to better water quality because of the smaller urban footprint of aggregated development. However, in agricultural areas, the relationship between LPI of developed area and TP concentration changed to significantly positive. In these watersheds, there were not many urban patches, and a large LPI of developed area simply implied larger urban core areas and total impervious areas, which contributed to increasing pollutant concentration. This conclusion supports previous findings that urbanization in agricultural watersheds can lead to larger increases in pollution compared to urban watersheds (Chen et al., 2016; Huang et al., 2015).

Furthermore, the IJI of developed area had a stronger negative influence on TP concentration primarily in agricultural watersheds. In these agricultural watersheds, low IJI of developed area was usually associated with low density development and high IJI was associated with medium to high density development. If developed areas were mostly adjacent to developed open areas in low density development, the watersheds typically had a low IJI of developed area and a high TP concentration. This phenomenon might be attributed to the application of phosphorus-based fertilizers on lawns in low-density residential areas (Wilson, 2015). TP concentration in highly urbanized areas, such as watersheds around Houston and Dallas, had a weak dependence on the IJI of developed area. Because of highly mixed land use in high-density urban areas, the IJI of developed area might not be a reliable indicator of specific urban forms.

Complex shape (SHAPE) of developed area was associated with high E.coli concentration in all the watersheds, with stronger influences in the San Jacinto Basin, the Neches Basin, and the Sabine Basin than in other basins. These regions shared higher total precipitation in wet seasons, which might be a reason for E.coli wash off from urban areas.
It is also possible that aggregated development led to more E.coli pollution in wet seasons, which aligned with our scenario prediction results. Compared to TP, the effect of IJI of developed area on E.coli concentration had a lower spatial variation. The negative effect of urban sprawl on TP concentration was stronger than that on E.coli concentration, indicating that the mechanisms might differ and are thus worth future investigation.

2.4.4 The advantages and limitations of applying machine learning in scenario prediction

The major advantage of the machine learning approach in this study was the successful quantification of complex, nonlinear land-water relationships. Overall, it facilitated more accurate water quality predictions under different planning scenarios. Because the generalizability of the model was supported by a large set of training samples and evaluated with the train-test split method, it can be used to predict water quality under new land use plans in the Texas Gulf Region, especially in the coastal area where the model performance was better than in the inland area. Policy makers can use this information to decide whether the resulting contaminant concentration meets regulation standards under a future land use scenario. This prediction framework can also be generalized to other watersheds and regions for the purposes of informing planning policy.

Because a machine learning model alone makes it difficult to reveal the contribution of each catchment characteristic to stream water quality, statistical models were useful for uncovering the direction and spatial variation of each urban development pattern metric’s influence on stream water quality. I therefore suggest that combining statistics and machine learning was helpful for both predicting and interpreting water quality variations.
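The train-test evaluation referred to above can be illustrated with a minimal sketch of the three accuracy measures reported in Table 2-8 (correlation, R2, and MSE). This is a plain Python illustration under stated assumptions, not the code used in this study: the function names, the 20 percent test fraction, and the toy numbers are hypothetical. R2 is computed here as the coefficient of determination; the source does not state whether Table 2-8 used this definition or the squared correlation.

```python
import math
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows reproducibly and hold out a fraction as the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def mse(observed, predicted):
    """Mean squared error between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Hypothetical sample identifiers, split once and never reused for training.
train_ids, test_ids = train_test_split(list(range(10)))

# Toy observed and predicted concentrations for a held-out test set.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.5]
test_mse = mse(observed, predicted)
test_r2 = r_squared(observed, predicted)
test_r = pearson_r(observed, predicted)
print(len(train_ids), len(test_ids), test_mse, round(test_r2, 2), round(test_r, 2))
```

Reporting the same measures on both the training and the test rows, as in Table 2-8, makes the gap between fitted and held-out accuracy visible.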
As mentioned in previous studies, a key gap in water quality studies has been a lack of consideration of cross effects between explanatory variables, such as the cross-correlation between land covers and the cross-correlation between land cover and climate in influencing water quality (Li et al., 2015; Hwang et al., 2016; Lintern et al., 2017). Machine learning can capture cross effects between variables and improve model prediction accuracy, which is an advantage over traditional statistical models. Another advantage is that RF regression handles high dimensional data well because each tree works with a subset of the data. It is therefore flexible and can accommodate more factors to improve water quality prediction accuracy, e.g., the inclusion of monthly climatic variables in this study. Under climate change scenarios, climatic variables can therefore be included in machine learning models to forecast future stream water quality under extreme climate conditions. Overall, machine learning models can be used in future research to predict water quality from any variables of interest, including those whose mechanisms are obscure and hard to represent in a physically based model. In the prediction process specifically, the approach is suitable for integrating a set of planning factors to draw management implications of interest.

The major limitation of this study was that some catchment characteristics were excluded because they were not readily available. Such variables included point source pollution, animal products, wastewater treatment plants, and so on (Chen et al., 2014; Zhou et al., 2016). Future machine learning predictions of stream water quality should take these important aspects into consideration in order to obtain less biased models. Another limitation was the selection of appropriate variables.
In this study, I conducted trials of variable selection in the RF regression using mutual info regression, which entailed dropping a specific number of variables with the lowest mutual information regarding pollutant concentration (Kraskov et al., 2011). I ultimately decided to keep the whole set of independent features in the prediction model because, after iterating over all possible numbers of input variables, the RF regression accuracy did not significantly improve. Future studies should also try other feature selection algorithms, such as recursive feature elimination.

2.5. Conclusion

Urban development patterns were found to significantly influence stream TP, NO3-N, and E.coli concentrations in the Texas Gulf Region, with the relationships among them varying according to season and location. LPI, COHESION, and IJI of developed areas were the most efficient urban development pattern metrics associated with stream water quality. Furthermore, shape complexity and edge density of urban developed areas were positively correlated with pollutant concentrations. The effect of urban development pattern on stream water quality was more stable and significant in dry seasons and more variable and complex in wet seasons. The IJI of developed area had a stronger negative influence on water quality in less urbanized watersheds. The LPI of developed area had a negative correlation with TP concentration in highly urbanized areas, but a positive correlation in agricultural areas.

The RF regression predicted that high density aggregated development was the most effective in reducing TP and NO3-N concentrations compared to medium density development and the current sprawl development. However, aggregated development contributed to E.coli pollution in wet seasons. To conclude, this study demonstrated the environmental consequences of urban sprawl and supported policy orientation towards compact city planning according to the machine learning predictive framework.
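Returning to the variable screening trial described in Section 2.4.4, the idea of dropping the variables with the lowest mutual information can be sketched as follows. This is a simplified illustration that estimates mutual information from equal-width bins; it is not the nearest-neighbour estimator of Kraskov et al. and not the code used in this study. The function names, the bin count, and the feature names are hypothetical.

```python
import math
from collections import Counter


def binned(values, n_bins):
    """Assign each value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # constant columns fall into a single bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=4):
    """Histogram estimate of the mutual information (in nats) between x and y."""
    bx, by = binned(x, n_bins), binned(y, n_bins)
    n = len(x)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    return sum(
        (count / n) * math.log((count / n) / ((px[i] / n) * (py[j] / n)))
        for (i, j), count in pxy.items()
    )


def drop_lowest(features, target, n_drop, n_bins=4):
    """Return feature names after dropping the n_drop least informative ones."""
    scores = {
        name: mutual_information(column, target, n_bins)
        for name, column in features.items()
    }
    dropped = set(sorted(scores, key=scores.get)[:n_drop])
    return [name for name in features if name not in dropped]


# Toy data: one feature tracks the target, the other is unrelated to it.
target = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
features = {
    "precipitation": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    "unrelated": [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
}
kept = drop_lowest(features, target, n_drop=1)
print(kept)
```

In a screening trial of this kind, the model would be refitted for each candidate value of n_drop and the held-out accuracy compared across runs.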
APPENDIX

Table 2A-2-10. Description of landscape metrics.

Each entry lists Variable; Range; Description, grouped by Category.

Category: Area
Percentage of Landscape (PLAND); (0,100]; the percentage of the landscape comprised of the corresponding patch type
Total Area (CA); (0, ∞); the sum of the areas of all patches of the corresponding patch type
Median of Patch Area (AREA_MD); (0, ∞); the median of all patches of the corresponding patch type
Median of Radius of Gyration (GYRATE_MD); (0, ∞); the median of mean distance between each cell in the patch and the patch centroid
Largest Patch Index (LPI); (0,100]; the area of the largest patch of the corresponding patch type divided by total landscape area

Category: Edge
Total Edge (TE); [0, ∞); the sum of the lengths of all edge segments involving the corresponding patch type
Edge Density (ED); [0, ∞); the sum of the lengths of all edge segments involving the corresponding patch type, divided by the total landscape area

Category: Shape
Median of Perimeter-Area Ratio (PARA_MD); (0, ∞); the median of the ratio of the patch perimeter to area
Median of Shape Index (SHAPE_MD); (0, ∞); the median of patch perimeter divided by the minimum perimeter possible for a maximally compact patch of the corresponding patch area
Median of Fractal Dimension Index (FRAC_MD); [1, 2]; the median of 2 times the logarithm of patch perimeter divided by the logarithm of patch area
Median of Related Circumscribing Circle (CIRCLE); [0, 1); the median of 1 minus patch area divided by the area of the smallest circumscribing circle
Median of Contiguity Index (CONTIG_MD); [0, 1]; the median of the average contiguity value for the cells in a patch minus 1 divided by the sum of the template values minus 1

Category: Subdivision
Number of Patches (NP); [1, ∞); the number of patches of the corresponding patch type
Patch Density (PD); (0, ∞); the number of patches of the corresponding patch type divided by total landscape area
Landscape Division Index (DIVISION); [0, 1); 1 minus the sum of patch area divided by total landscape area, quantity squared, summed across all patches of the corresponding patch type
Splitting Index (SPLIT); [1, Ncell^2]; the total landscape area squared divided by the sum of patch area squared, summed across all patches of the corresponding patch type

Category: Aggregation
Interspersion Juxtaposition Index (IJI); (0,100]; minus the sum of the length of each unique edge type involving the corresponding patch type divided by the total length of edge involving the same type, multiplied by the logarithm of the same quantity, summed over each unique edge type; divided by the logarithm of the number of patch types minus 1
Landscape Shape Index (LSI); [1, ∞); the total length of edge involving the corresponding class, divided by the minimum length of class edge possible for a maximally aggregated class
Patch Cohesion Index (COHESION); [0,100); 1 minus the sum of patch perimeter divided by the sum of patch perimeter times the square root of patch area for patches of the corresponding patch type, divided by 1 minus 1 over the square root of the total number of cells in the landscape

BIBLIOGRAPHY

Ai, L., Shi, Z. H., Yin, W., & Huang, X. (2015). Spatial and seasonal patterns in stream water contamination across mountainous watersheds: Linkage with landscape characteristics. Journal of Hydrology, 523, 398–408. doi:10.1016/j.jhydrol.2015.01.082
Alberti, M., Booth, D., Hill, K., Coburn, B., Avolio, C., Coe, S., & Spirandelli, D. (2007). The impact of urban patterns on aquatic ecosystems: An empirical analysis in Puget lowland sub-basins. Landscape and urban planning, 80(4), 345-361.
Avila, R., Horn, B., Moriarty, E., Hodson, R., & Moltchanova, E. (2018). Evaluating statistical model performance in water quality prediction. Journal of Environmental Management, 206, 910–919. doi:10.1016/j.jenvman.2017.11.049
Bhatta, B. (2010). Urban Growth and Sprawl. Analysis of Urban Growth and Sprawl from Remote Sensing Data, 1–16. doi:10.1007/978-3-642-05299-6_1
Breiman, L. (2001).
Random forests. Machine learning, 45(1), 5-32.
Bu, H., Meng, W., Zhang, Y., & Wan, J. (2014). Relationships between land use patterns and water quality in the Taizi River basin, China. Ecological Indicators, 41, 187–197. doi:10.1016/j.ecolind.2014.02.003
Carey, R. O., Migliaccio, K. W., Li, Y., Schaffer, B., Kiker, G. A., & Brown, M. T. (2011). Land use disturbance indicators and water quality variability in the Biscayne Bay Watershed, Florida. Ecological Indicators, 11(5), 1093-1104.
Chen, J., & Lu, J. (2014). Effects of Land Use, Topography and Socio-Economic Factors on River Water Quality in a Mountainous Watershed with Intensive Agricultural Production in East China. PLoS ONE, 9(8), e102714. doi:10.1371/journal.pone.0102714
Chen, Q., Mei, K., Dahlgren, R. A., Wang, T., Gong, J., & Zhang, M. (2016). Impacts of land use and population density on seasonal surface water quality using a modified geographically weighted regression. Science of the total environment, 572, 450-466.
Chermack, T. J., & Swanson, R. A. (2008). Scenario planning: human resource development's strategic learning tool. Advances in Developing Human Resources, 10(2), 129-146.
Clapcott, J. E., Collier, K. J., Death, R. G., Goodwin, E. O., Harding, J. S., Kelly, D., ... & Young, R. G. (2012). Quantifying relationships between land‐use gradients and structural and functional indicators of stream ecological integrity. Freshwater Biology, 57(1), 74-90. doi:10.1111/j.1365-2427.2011.02696.x
Clément, F., Ruiz, J., Rodríguez, M. A., Blais, D., & Campeau, S. (2017). Landscape diversity and forest edge density regulate stream water quality in agricultural catchments. Ecological Indicators, 72, 627–639. doi:10.1016/j.ecolind.2016.09.001
Ding, J., Jiang, Y., Liu, Q., Hou, Z., Liao, J., Fu, L., & Peng, Q. (2016). Influences of the land use pattern on water quality in low-order streams of the Dongjiang River basin, China: A multi-scale analysis. Science of The Total Environment, 551-552, 205–216.
doi:10.1016/j.scitotenv.2016.01.162
Del Monaco, N. (2017). Reducing directly connected stormwater infrastructure and the associated benefits (Doctoral dissertation, Rutgers University-Graduate School-New Brunswick).
Ewing, R. H. (n.d.). Characteristics, Causes, and Effects of Sprawl: A Literature Review. Urban Ecology, 519–535. doi:10.1007/978-0-387-73412-5_34
Fan, M., & Shibata, H. (2015). Simulation of watershed hydrology and stream water quality under land use and climate change scenarios in Teshio River watershed, northern Japan. Ecological Indicators, 50, 79–89. doi:10.1016/j.ecolind.2014.11.003
Forman, R. T. (2014). Land Mosaics: The ecology of landscapes and regions (1995) (p. 217). Island Press.
Goonetilleke, A., Thomas, E., Ginn, S., & Gilbert, D. (2005). Understanding the role of land use in urban stormwater quality management. Journal of Environmental Management, 74(1), 31–42. doi:10.1016/j.jenvman.2004.08.006
Grand River Water Management Plan. (2013). Update Water Quality Targets to Support Healthy and Resilient Aquatic Ecosystems in the Grand River Watershed.
Giri, S., & Qiu, Z. (2016). Understanding the relationship of land uses and water quality in Twenty First Century: A review. Journal of Environmental Management, 173, 41–48. doi:10.1016/j.jenvman.2016.02.029
Glińska-Lewczuk, K., Gołaś, I., Koc, J., Gotkowska-Płachta, A., Harnisz, M., & Rochwerger, A. (2016). The impact of urban areas on the water quality gradient along a lowland river. Environmental Monitoring and Assessment, 188(11). doi:10.1007/s10661-016-5638-z
Gordon, P., & Richardson, H. W. (1996). Beyond Polycentricity: The Dispersed Metropolis, Los Angeles, 1970-1990. Journal of the American Planning Association, 62(3), 289–295. doi:10.1080/01944369608975695
Hameed, M., Sharqi, S. S., Yaseen, Z. M., Afan, H. A., Hussain, A., & Elshafie, A. (2016). Application of artificial intelligence (AI) techniques in water quality index prediction: a case study in tropical region, Malaysia.
Neural Computing and Applications, 28(S1), 893–905. doi:10.1007/s00521-016-2404-7
Harrell, F. (2017). Regression modeling strategies. BIOS, 330, 2018.
Holcomb, D. A., Messier, K. P., Serre, M. L., Rowny, J. G., & Stewart, J. R. (2018). Geostatistical Prediction of Microbial Water Quality Throughout a Stream Network Using Meteorology, Land Cover, and Spatiotemporal Autocorrelation. Environmental Science & Technology, 52(14), 7775–7784. doi:10.1021/acs.est.8b01178
Homer, C., Dewitz, J., Yang, L., Jin, S., Danielson, P., Xian, G., ... & Megown, K. (2015). Completion of the 2011 National Land Cover Database for the conterminous United States–representing a decade of land cover change information. Photogrammetric Engineering & Remote Sensing, 81(5), 345-354.
Huang, Z., Han, L., Zeng, L., Xiao, W., & Tian, Y. (2015). Effects of land use patterns on stream water quality: a case study of a small-scale watershed in the Three Gorges Reservoir Area, China. Environmental Science and Pollution Research, 23(4), 3943–3955. doi:10.1007/s11356-015-5874-8
Hwang, S.-A., Hwang, S.-J., Park, S.-R., & Lee, S.-W. (2016). Examining the Relationships between Watershed Urban Land Use and Stream Water Quality Using Linear and Generalized Additive Models. Water, 8(4), 155. doi:10.3390/w8040155
Jones, J. E., Earles, T. A., Fassman, E. A., Herricks, E. E., Urbonas, B., & Clary, J. K. (2005). Urban Storm-Water Regulations—Are Impervious Area Limits a Good Idea? Journal of Environmental Engineering, 131(2), 176–179. doi:10.1061/(asce)0733-9372(2005)131:2(176)
Kalteh, A. M., Hjorth, P., & Berndtsson, R. (2008). Review of the self-organizing map (SOM) approach in water resources: Analysis, modelling and application. Environmental Modelling & Software, 23(7), 835–845. doi:10.1016/j.envsoft.2007.10.001
Kraskov, A., Stögbauer, H., & Grassberger, P. (2011). Erratum: Estimating mutual information [Phys. Rev. E 69, 066138 (2004)]. Physical Review E, 83(1), 019903.
Lee, S.-W., Hwang, S.-J., Lee, S.-B., Hwang, H.-S., & Sung, H.-C. (2009). Landscape ecological approach to the relationships of land use patterns in watersheds to water quality characteristics. Landscape and Urban Planning, 92(2), 80–89. doi:10.1016/j.landurbplan.2009.02.008
Lek, S. (1999). Predicting stream nitrogen concentration from watershed features using neural networks. Water Research, 33(16), 3469–3478. doi:10.1016/s0043-1354(99)00061-5
Li, Y., Li, Y., Qureshi, S., Kappas, M., & Hubacek, K. (2015). On the relationship between landscape ecological patterns and water quality across gradient zones of rapid urbanization in coastal China. Ecological Modelling, 318, 100–108. doi:10.1016/j.ecolmodel.2015.01.028
Lintern, A., Webb, J. A., Ryu, D., Liu, S., Bende-Michl, U., Waters, D., … Western, A. W. (2017). Key factors influencing differences in stream water quality across space. Wiley Interdisciplinary Reviews: Water, 5(1), e1260. doi:10.1002/wat2.1260
Liu, A., Goonetilleke, A., & Egodawatta, P. (2012). Inadequacy of Land Use and Impervious Area Fraction for Determining Urban Stormwater Quality. Water Resources Management, 26(8), 2259–2265. doi:10.1007/s11269-012-0014-4
Lv, H., Xu, Y., Han, L., & Zhou, F. (2014). Scale-dependence effects of landscape on seasonal water quality in Xitiaoxi catchment of Taihu Basin, China. Water Science and Technology, 71(1), 59–66. doi:10.2166/wst.2014.463
McGarigal, K. (1995). FRAGSTATS: spatial pattern analysis program for quantifying landscape structure (Vol. 351). US Department of Agriculture, Forest Service, Pacific Northwest Research Station.
McHarg, I.L., Sutton, J., 1975. Ecological plumbing for the Texas coastal plain: The Woodlands New Town Experiment. Landscape Archit. 65 (1), 80–90.
Mirzaei, M., Jafari, A., Gholamalifard, M., Azadi, H., Shooshtari, S. J., Moghaddam, S. M., … Witlox, F. (2020). Mitigating environmental risks: Modeling the interaction of water quality parameters and land use cover.
Land Use Policy, 95, 103766. doi:10.1016/j.landusepol.2018.12.014
Molina-Navarro, E., Segurado, P., Branco, P., Almeida, C., & Andersen, H. E. (2020). Predicting the ecological status of rivers and streams under different climatic and socioeconomic scenarios using Bayesian Belief Networks. Limnologica, 80, 125742. doi:10.1016/j.limno.2019.125742
Obropta, C. C., & Del Monaco, N. (2018). Reducing Directly Connected Impervious Areas with Green Stormwater Infrastructure. Journal of Sustainable Water in the Built Environment, 4(1), 05017004. doi:10.1061/jswbay.0000833
Oeding, S., Taffs, K. H., Cox, B., Reichelt-Brushett, A., & Sullivan, C. (2018). The influence of land use in a highly modified catchment: Investigating the importance of scale in riverine health assessment. Journal of Environmental Management, 206, 1007–1019. doi:10.1016/j.jenvman.2017.12.005
Pratt, B., & Chang, H. (2012). Effects of land cover, topography, and built structure on seasonal water quality at multiple spatial scales. Journal of Hazardous Materials, 209-210, 48–58. doi:10.1016/j.jhazmat.2011.12.068
Riitters, K. H., O’Neill, R. V., Hunsaker, C. T., Wickham, J. D., Yankee, D. H., Timmins, S. P., … Jackson, B. L. (1995). A factor analysis of landscape pattern and structure metrics. Landscape Ecology, 10(1), 23–39. doi:10.1007/bf00158551
Schreiber, J., Jessulat, M., & Sick, B. (2019). Generative Adversarial Networks for Operational Scenario Planning of Renewable Energy Farms: A Study on Wind and Photovoltaic. Artificial Neural Networks and Machine Learning – ICANN 2019: Image Processing, 550–564. doi:10.1007/978-3-030-30508-6_44
Sharifi, A., Yen, H., Boomer, K. M. B., Kalin, L., Li, X., & Weller, D. E. (2017). Using multiple watershed models to assess the water quality impacts of alternate land development scenarios for a small community. CATENA, 150, 87–99. doi:10.1016/j.catena.2016.11.009
Shi, Z. H., Ai, L., Li, X., Huang, X. D., Wu, G. L., & Liao, W. (2013).
Partial least-squares regression for linking land-cover patterns to soil erosion and sediment yield in watersheds. Journal of Hydrology, 498, 165–176. doi:10.1016/j.jhydrol.2013.06.031 Shi, P., Zhang, Y., Li, Z., Li, P., & Xu, G. (2017). Influence of land use and land cover patterns on seasonal water quality at multi-spatial scales. CATENA, 151, 182–190. doi:10.1016/j.catena.2016.12.017 Sohn, W., Kim, J.-H., & Li, M.-H. (2017). Low-impact development for impervious surface connectivity mitigation: assessment of directly connected impervious areas (DCIAs). Journal of Environmental Planning and Management, 60(10), 1871–1889. doi:10.1080/09640568.2016.1264929 54 Sun, R., Chen, L., Chen, W., & Ji, Y. (2011). Effect of Land-Use Patterns on Total Nitrogen Concentration in the Upstream Regions of the Haihe River Basin, China. Environmental Management, 51(1), 45–58. doi:10.1007/s00267-011-9764-7 Sun, Y., Guo, Q., Liu, J., & Wang, R. (2014). Scale Effects on Spatially Varying Relationships Between Urban Landscape Patterns and Water Quality. Environmental Management, 54(2), 272–287. doi:10.1007/s00267-014-0287-x Teklu, B. M., Hailu, A., Wiegant, D. A., Scholten, B. S., & Van den Brink, P. J. (2016). Impacts of nutrients and pesticides from small- and large-scale agriculture on the water quality of Lake Ziway, Ethiopia. Environmental Science and Pollution Research, 25(14), 13207– 13216. doi:10.1007/s11356-016-6714-1 Texas Commission on Environmental Quality. (2014). Managing nonpoint source pollution in Texas, 2013 annual report Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288. Wang, R., Zhang, X., & Li, M.-H. (2019). Predicting bioretention pollutant removal efficiency with design features: A data-driven approach. Journal of Environmental Management, 242, 403–414. doi:10.1016/j.jenvman.2019.04.064 Wijesiri, B., Deilami, K., & Goonetilleke, A. (2018). 
Evaluating the relationship between temporal changes in land use and resulting water quality. Environmental Pollution, 234, 480–486. doi:10.1016/j.envpol.2017.11.096
Wilson, C. O. (2015). Land use/land cover water quality nexus: quantifying anthropogenic influences on surface water quality. Environmental Monitoring and Assessment, 187(7). doi:10.1007/s10661-015-4666-4
World Population Review (2019). Texas Population 2019. Retrieved from http://worldpopulationreview.com/states/texas-population/
Yang, B., & Li, M.-H. (2011). Assessing planning approaches by watershed streamflow modeling: Case study of The Woodlands; Texas. Landscape and Urban Planning, 99(1), 9–22. doi:10.1016/j.landurbplan.2010.08.007
Yu, D., Shi, P., Liu, Y., & Xun, B. (2013). Detecting land use-water quality relationships from the viewpoint of ecological restoration in an urban area. Ecological Engineering, 53, 205–216. doi:10.1016/j.ecoleng.2012.12.045
Zhou, P., Huang, J., Pontius, R. G., & Hong, H. (2016). New insight into the correlations between land use and water quality in a coastal watershed of China: Does point source pollution weaken it? Science of The Total Environment, 543, 591–600. doi:10.1016/j.scitotenv.2015.11.063

CHAPTER 3 DERIVING ANNUAL LAND COVER MAPS AND MODELING THE LONGITUDINAL EFFECT OF LAND COVER CHANGE ON NUTRIENT AND BACTERIA CONCENTRATIONS

3.1 Introduction

Land cover change is an important driver of many environmental issues, including climate change, hydrological cycle alteration, nonpoint source pollution, and biodiversity decline (Kalnay and Cai, 2003; Sajikumar and Remya, 2015; Newbold et al., 2016; Zhao et al., 2016; Oeding et al., 2018). Assessing the relationship between land cover and stream water quality is recognized as an imperative step in managing nonpoint source pollution and informing land use policies in the watershed (Ai et al., 2015; Giri and Qiu, 2016; Wijesiri et al., 2018).
The significant impact of land cover change, together with climatic, geomorphological, and socioeconomic factors, on stream water quality has been highlighted in recent research (Ding et al., 2016; Manfri et al., 2016; Zhou et al., 2016; Lintern et al., 2018). Two broad issues are associated with this area of research. Firstly, because many water quality studies are local and cross-sectional with limited samples (Huang et al., 2016; Luo et al., 2017; Rodrigues et al., 2018), confidence in the estimated land cover effect on stream water quality is low, and there is little capacity to generalize findings to the regional scale. Secondly, the spatial and temporal variations in the land-water relationship are difficult to quantify with a simple and robust model structure (Sun et al., 2013; Bu et al., 2014; Kibena and Gumindoga, 2014; Walsh and Webb, 2014). To address these two issues, I propose a linear mixed model structure that employs 20 years of water quality data, from 1991 to 2011, in the Texas Gulf Region. This large dataset captures the variation in the natural and anthropogenic drivers of water quality degradation in the region.

With respect to the first issue, one difficulty in conducting a long-term study to quantify the land cover effect on water quality is the mismatch between the temporal resolution of land cover data and that of stream water quality data. Land cover data such as the National Land Cover Database (NLCD), produced at 2-3 year intervals, are temporally much coarser than the measured water quality data (Seeboonruang, 2012; du Plessis et al., 2015; Homer et al., 2015; Vrebos et al., 2017). In addition, the quality of land cover maps before 2001 is not as high as that of more recent land cover maps (Vogelmann et al., 2001). There is a great need to generate consistent land cover maps over a long period of time that match the temporal range and resolution of stream water quality data.
In addition, land cover maps should have relatively balanced accuracy among different classes, because less common land cover types can still affect environmental processes and functions significantly (Zhu et al., 2016; Heydari and Mountrakis, 2018). Mapping large, heterogeneous landscapes and detecting changes in them is challenging (Schneider et al., 2010; Rodriguez-Galiano et al., 2012; Thakkar et al., 2017). The selection of the classifier, the inclusion of auxiliary training features, and the size and distribution of the training sample are all critical to classification performance (Millard and Richardson, 2015; Zhang and Roy, 2017; Liu et al., 2018). Recent long-term land cover classification and change detection methods incorporate spectral, spatial, and temporal data, as well as knowledge of logical processes, to provide reliable outcomes (Manandhar et al., 2009; Gómez et al., 2016; Jin et al., 2017; Liu et al., 2019). One efficient approach is the multi-threshold method, which uses multiple spectral indices to identify change groups such as biomass increase and biomass decrease (Jin et al., 2013; Jin et al., 2017). In this study, a comprehensive investigation of training samples and classifiers and a multi-threshold post-classification quality control process are the two primary steps used to obtain the annual land cover data.

To address the second issue, it is important to construct a simple but credible model that quantifies the spatial and temporal variations in the long-term relationship between land cover and water quality, so that it can advise both local and regional planning (Wang et al., 2014; Chen et al., 2016; Shi et al., 2017). Significant associations between land cover and water quality have been found using Ordinary Least Squares (OLS) regression under the assumption that the relationship is constant across space (Rothwell et al., 2010; Carey et al., 2011; Chu et al., 2013; Jordan et al., 2018).
OLS regression supports general inferences about the land cover effect on water quality but neglects spatial autocorrelation among water quality samples (Tu, 2011). Geographically Weighted Regression (GWR) demonstrates substantial improvements in model performance over OLS because it gives samples closer to the location of an observation a higher weight in the local parameter estimation (Tu, 2013; Chen and Lu, 2014; Chen et al., 2016). However, if the spatial extent and the number of water quality samples become too great, there is a risk that the GWR model parameters and the underlying spatial relationships become too complicated to interpret. Linear mixed models can handle both spatial and temporal correlation structures among samples with flexible model structures (Molenberghs and Verbeke, 2000; Kuznetsova et al., 2017). These models have been used to predict many environmental parameters, such as carbon cycles, soil productivity, and forest density (Doetterl et al., 2013; Sakai et al., 2013; Zou et al., 2017). The advantage of using linear mixed models in water quality prediction is that samples can be grouped as random components to explore the unobserved characteristics of each group that are not expressed in the fixed effects. For example, water quality samples can be grouped according to the year they were taken, the site they were taken at, and the antecedent discharge, depending on which factors are of interest (Sheldon et al., 2012; Lessel and Bishop, 2013; Bonansea et al., 2015). Compared to GWR, linear mixed models are more flexible in grouping water quality samples at the scale of policy interest and can account for temporal variation at the same time.

The novelty of this study is two-fold. Firstly, it provides an efficient classification and change detection algorithm for generating annual land cover maps. The algorithm can be applied to obtain historical land cover data where a land cover map is available for only one year.
Secondly, it is one of the few water quality studies with a large regional scale and a long time range. The derived regional-scale knowledge matches the spatial scale of urban and regional planning.

This study involves four research objectives:
1) To develop a robust annual land cover classification workflow implemented on the Google Earth Engine (GEE) platform.
2) To explore the land cover change and stream water quality change trajectories from 1991 to 2011.
3) To find the most appropriate linear mixed model correlation structure to model the longitudinal relationships between land cover and nutrient and bacteria concentrations.
4) To provide land use and watershed management policy implications at both regional and basin levels in Texas.

3.2 Data and Method

3.2.1 Study Site

The Texas Gulf Region is one of the 21 water resource regions within the first-level hydrological units in the United States. It consists of 11 subregions and 23 basins with a total drainage area of 471,080 km². It covers most of Texas and discharges into the Gulf of Mexico. The climate of this region is quite diverse, with a maritime climate along the coast, a continental climate in the central and northern areas, and a dry and hot climate in the west. These diverse climates lead to a heterogeneous landscape across the region. From east to west, the terrain changes from coastal swamps and piney woods to rolling plains and rugged hills. According to landscape characteristics, Texas can be divided into 10 ecoregions, or natural regions, 9 of which are located in the Texas Gulf Region: the Piney Woods, the Gulf Prairies and Marshes, the Post Oak Savannah, the Blackland Prairies, the Cross Timbers, the South Texas Plains, the Edwards Plateau, the Rolling Plains, and the High Plains (Figure 3-1).

Texas is the second largest state in the United States, with a current population of 29 million.
It has an annual population growth rate of 1.8%, ranking third in the country (World Population Review, 2019). The growing population has driven urban sprawl, which has put natural forest areas at risk and degraded stream water quality. In Texas, 410 out of 1214 water bodies do not meet applicable water quality standards or are threatened for one or more designated uses; bacteria, dissolved oxygen, nutrients, and organics are the major concerns. Nonpoint source pollution, which is closely related to land use, contributes to approximately 45% of stream water quality impairment (Texas Commission on Environmental Quality, 2014). To monitor and assess stream water quality conditions, the Texas Commission on Environmental Quality (TCEQ) Surface Water Quality Monitoring (SWQM) Program maintains over 3000 active monitoring stations throughout the state.

Figure 3-1. Texas Gulf Region with a base map of NLCD 2011 and the Texas ecoregions

3.2.2 Data

• Data for image classification

NLCD, with an 89% overall accuracy at Level I, is the most fundamental dataset for investigating the impact of land cover change on ecosystems in the United States (Tran et al., 2010; Homer et al., 2015; Wickham et al., 2017). It provides land cover data at a 30 m resolution from 2001 to 2016 at 2-3 year intervals. NLCD 1992 is also available but is not recommended for direct comparison with the subsequent NLCD products because the legends and mapping methods changed (Vogelmann et al., 2001). In this study, NLCD 2011 was used to extract ground-truthed land cover types for image classification, and NLCD 2006 and 2001 were used as validation maps to evaluate classification performance. The USGS Landsat 5 Surface Reflectance Tier 1 product was used as the base map to extract the spectral training features. This dataset provides atmospherically corrected and orthorectified surface reflectance data.
The USGS National Elevation Dataset, resampled from its original 1/3 arc-second resolution to a spatial resolution of 30 m, was used to derive elevation, slope, and other terrain features. All the classification training data were extracted on the GEE platform.

• Data for stream water quality prediction

The stream water quality data were acquired from the Texas Clean Rivers Program (CRP). A total of 1783 water quality monitoring stations in the Texas Gulf Region were in operation between 1991 and 2011, from which all the available NO₃⁻-N, PO₄³⁻-P, NH₄⁺-N, TP, and E. coli concentration data were obtained. The 1783 contributing areas were then delineated from the 30 m Digital Elevation Model (DEM), with the water quality monitoring stations as the subbasin outlets. The delineated watershed boundaries were used to obtain all the independent variables for each subbasin. The annual land cover areal percentages were calculated from the classified land cover maps. The elevation and slope data were derived from the 30 m USGS National Elevation Dataset. The climatic data, including monthly total precipitation and average temperature, were acquired from the PRISM Monthly Spatial Climate Dataset AN81m. All the independent variables were obtained from the GEE platform.

3.2.3 Methods

Two major steps were implemented in this study, as shown in the flowchart (Figure 3-2). First, annual land cover maps from 1991 to 2011 were generated for the whole Texas Gulf Region. Local random forest classifiers were applied in each ecoregion with a combination of spectral, ancillary, seasonal, and textural training features. Then the 20-year independent classification maps were passed through the post-classification quality control algorithm to produce the final images. Second, land cover percentages in each year were calculated from the 20-year land cover maps and used, together with nutrient and bacteria concentrations as dependent variables, to build longitudinal regression models using linear mixed models.
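The second step starts from per-subbasin, per-year land cover percentages. As a minimal sketch only (it is not the GEE implementation used in this study), the areal percentage of each class can be computed from the class labels of the equal-area pixels that fall inside a subbasin. The function name `land_cover_percentages` and the toy pixel labels are hypothetical; the six class names are those retained in the classified maps.

```python
from collections import Counter

# The six land cover classes retained in the classified maps.
CLASSES = ["water", "developed", "forest", "shrubland", "herbaceous", "planted"]


def land_cover_percentages(pixel_labels):
    """Return the areal percentage of each class for one subbasin and year.

    pixel_labels: iterable of class names, one per equal-area pixel
    (for example 30 m cells) inside the subbasin boundary.
    """
    counts = Counter(pixel_labels)
    total = sum(counts[c] for c in CLASSES)
    if total == 0:
        raise ValueError("no classified pixels in subbasin")
    return {c: 100.0 * counts[c] / total for c in CLASSES}


# Toy example (hypothetical): a subbasin of 10 pixels in a single year.
toy_pixels = ["forest"] * 5 + ["developed"] * 2 + ["planted"] * 3
percentages = land_cover_percentages(toy_pixels)
```

Repeating this for every subbasin and year yields the longitudinal land cover covariates that enter the regression models.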
Figure 3-2. Method flowchart

• Annual land cover classification

In tests, local random forest classifiers fitted to each ecoregion outperformed a single random forest classifier, because the dominant land cover types differed among ecoregions. The Post Oak Savannah, the Blackland Prairies, and the Cross Timbers ecoregions share some landscape similarities, and they were merged into one region (Post Oak and Prairies) in this study. The High Plains ecoregion was excluded from the classification process because it contains no water quality monitoring stations. Therefore, six local random forest classifiers were fitted independently in the Piney Woods, the Gulf Prairies and Marshes, the Post Oak and Prairies, the South Texas Plains, the Edwards Plateau, and the Rolling Plains ecoregions. In each local random forest classifier, the number of trees was set to 10, the number of variables per split was set to the square root of the number of variables, and the minimum size of a terminal node was set to 1.

A pair of cloud-free Landsat images, one in the leaf-on season and one in the leaf-off season, was generated for every year from 1991 to 2011 to extract the training samples. Specifically, the median values of the clear and water pixels with low or medium cloud confidence in the pixel quality band of Landsat 5 were used to generate the cloud-free images. To ensure the reliability of the training samples, two control principles were implemented. 1) Only pixels with consistent land cover labels in NLCD 2001, 2006 and 2011 were included in the training sample pool. 2) A spatial filter was applied to retain only pixels whose land cover label was the same as that of the surrounding eight pixels. In each ecoregion, 160,000 training samples were selected as input to the local random forest classifier.

Three groups of training features were used in the classification process: basic spectral features, ancillary features, and texture features.
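The two control principles for the training sample pool described above can be sketched as follows. This is an illustrative sketch rather than the code used in the study: the function name `stable_homogeneous_pixels` and the toy grids are hypothetical, and two details are assumptions made here, namely that the 3 × 3 homogeneity check is run on the NLCD 2011 labels and that edge pixels without a full neighbourhood are skipped.

```python
def stable_homogeneous_pixels(nlcd_2001, nlcd_2006, nlcd_2011):
    """Return (row, col, label) for candidate training pixels.

    Each argument is a 2-D list of class labels on the same grid.
    A pixel qualifies when (1) its label is identical in all three maps and
    (2) all eight neighbours in the 2011 map carry that same label.
    Edge pixels are skipped because they lack a full 3x3 neighbourhood
    (an assumption of this sketch).
    """
    rows, cols = len(nlcd_2011), len(nlcd_2011[0])
    selected = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            label = nlcd_2011[r][c]
            # Principle 1: consistent label across the three NLCD epochs.
            if not (nlcd_2001[r][c] == nlcd_2006[r][c] == label):
                continue
            # Principle 2: homogeneous 3x3 window around the pixel.
            window = [
                nlcd_2011[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            ]
            if all(value == label for value in window):
                selected.append((r, c, label))
    return selected


# Toy 4x4 grids (hypothetical): F = forest, D = developed, S = shrubland.
grid_2011 = [
    ["F", "F", "F", "F"],
    ["F", "F", "F", "F"],
    ["F", "F", "F", "D"],
    ["F", "F", "F", "F"],
]
grid_2006 = [row[:] for row in grid_2011]
grid_2001 = [row[:] for row in grid_2011]
grid_2001[1][1] = "S"  # label changed between 2001 and 2006

samples = stable_homogeneous_pixels(grid_2001, grid_2006, grid_2011)
```

In the toy grid only the pixel at row 2, column 1 survives: its label is stable over time and its whole window is forest, whereas the other interior pixels either changed label or touch the developed pixel.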
The basic spectral features included band 1 to band 7 of Landsat 5 imagery. The topography-based ancillary features included elevation, slope, the Terrain Ruggedness Index (TRI), the Topographic Wetness Index (TWI), and the slope Length and Steepness factor (LS factor) (Moore et al., 1993; Riley et al., 1999; Panagos et al., 2015). The spectral-based ancillary features included the ratio of the near infrared band to the red band, the Normalized Difference Vegetation Index (NDVI), and the Tasseled Cap wetness, greenness, and brightness indices (Crist and Cicone, 1984). Texture features calculated from the Grey Level Co-occurrence Matrix (GLCM) were also included in the classification process to aid the detection of developed and planted areas (Rodriguez-Galiano et al., 2012). In this study, a kernel of 7 × 7 pixels was used to derive GLCM texture features from both the Landsat 5 NIR band and the NDVI image. The six most important texture features identified by Principal Component Analysis (PCA) were selected: difference entropy, cluster prominence, correlation, cluster shade, information measure of correlation, and sum average. In addition, all the spectral-based features were derived from both the leaf-on and the leaf-off images to add seasonal information. In total, 53 training features were used in the classifier training process.

The agreement between the classified map and NLCD is referred to as “accuracy” in this study. The original classification scheme comprised the eight Anderson Level I land cover classes: water, developed, barren, forest, shrubland, herbaceous, planted/cultivated, and wetlands (Anderson et al., 1976). Developed open space, barren land, and wetlands were excluded in this study, and six land cover classes remained in the classified land cover maps. Barren lands have less than 15% vegetation cover, and their effect on water quality is similar to that of developed lands.
Wetlands are composed of water and vegetation cover, and they would be classified either as water or as whatever vegetation covers them. The water, developed, forest, shrubland, herbaceous, and planted land cover classes are all critical to stream nutrient and bacteria concentrations. Therefore, both the overall accuracy and the minimum per-class accuracy are important to the water quality models (Heydari and Mountrakis, 2018). Proportionally distributed training samples yield higher overall accuracy, whereas equally distributed training samples lead to higher minimum per-class accuracy (Mellor et al., 2015; Zhu et al., 2016). In this study, a balance was sought between high overall accuracy and good accuracy within each class. Specifically, tests were conducted to find a balance between proportional and equal sampling by increasing the sample size of the minority classes.

Logical trajectory information, together with spectral characteristics, was used to correct classification errors in the 20 independent land cover maps through a comprehensive post-classification quality control approach. This approach was modified from the methods of Xian and Homer (2010) and Jin et al. (2013), with the control principles and threshold selections adjusted to local conditions. The quality control process involved two steps. 1) The unchanged mask, the Biomass Increase (BI) mask, and the Biomass Decrease (BD) mask were generated to restore some pixels’ labels to those of NLCD 2011. 2) Classification maps were updated so that the changes in developed area and forest area were logical. Developed areas, once established, should not change to other land cover types; and forest areas that changed to other land cover types would not be able to change back within just 20 years.
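The second step, the logical update of developed and forest trajectories, can be illustrated on a single pixel's sequence of yearly labels. The sketch below is only one possible reading of the two rules, not the procedure used in the study: it assumes a forward pass from the earliest to the latest year, and it assumes that a forest label reappearing after forest loss is replaced by the previous year's label. The function name `enforce_logical_trajectory` and the example sequence are hypothetical.

```python
def enforce_logical_trajectory(labels):
    """Apply the two trajectory rules to one pixel's yearly labels (oldest first).

    Rule 1: once a pixel is developed, it stays developed.
    Rule 2: once forest has been lost, it cannot return within the study period;
            a reappearing forest label takes the previous year's label
            (an assumption of this sketch).
    """
    fixed = list(labels)
    developed_seen = False
    forest_lost = False
    for t in range(len(fixed)):
        if developed_seen:
            fixed[t] = "developed"
            continue
        if fixed[t] == "developed":
            developed_seen = True
            continue
        # Detect a transition away from forest between consecutive years.
        if t > 0 and fixed[t - 1] == "forest" and fixed[t] != "forest":
            forest_lost = True
        if forest_lost and fixed[t] == "forest":
            fixed[t] = fixed[t - 1]
    return fixed


# Hypothetical five-year label sequence for one pixel.
raw = ["forest", "herbaceous", "forest", "developed", "shrubland"]
corrected = enforce_logical_trajectory(raw)
```

Here the forest label in the third year and the shrubland label in the fifth year are treated as classification errors, so the corrected sequence is forest, herbaceous, herbaceous, developed, developed.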
In the first step, four spectral indices were used in the quality control process: the Change Vector (CV), the Relative Change Vector MAXimum (RCVMAX), the differenced Normalized Burn Ratio (dNBR), and the differenced Normalized Difference Vegetation Index (dNDVI) (Equation 3-1). The four indices describe the spectral change of one image relative to another, which indicates the likelihood of land cover change. In the equations, $B_{1i}$ denotes the ith band of the early Landsat image and $B_{2i}$ denotes the ith band of the later Landsat image, so that, for example, $B_{14}$ is band 4 of the early image. CV and RCVMAX were used to generate the unchanged mask. For example, by comparing the classified image with NLCD 2011, water pixels with a Z score of CV smaller than 2 or RCVMAX smaller than 1 were labeled as unchanged. The four indices were used together to generate the BI mask and the BD mask. For example, pixels with a Z score of dNDVI larger than 0, dNBR larger than 1, and RCVMAX larger than 1 were designated as part of the BI mask. If the land cover changed from forest to grass, which is a biomass decrease, but the pixels were in the BI mask, they were corrected to the NLCD 2011 label of forest. In the quality control process, pairs of spectral indices for both the leaf-on and leaf-off seasons were generated every year and combined with the “OR” principle.

\[
\begin{aligned}
\mathrm{dNBR} &= \frac{B_{14} - B_{17}}{B_{14} + B_{17}} - \frac{B_{24} - B_{27}}{B_{24} + B_{27}} \\
\mathrm{dNDVI} &= \frac{B_{14} - B_{13}}{B_{14} + B_{13}} - \frac{B_{24} - B_{23}}{B_{24} + B_{23}} \\
\mathrm{CV} &= \sum_{i} \left( B_{1i} - B_{2i} \right)^{2} \\
\mathrm{RCVMAX} &= \sum_{i} \left( \frac{B_{1i} - B_{2i}}{\max\left( B_{1i}, B_{2i} \right)} \right)^{2}
\end{aligned}
\]
Equation 3-1

The thresholds of the three change detection masks were defined with exploratory statistics and decision tree algorithms. The spectral characteristics of pixels with unchanged labels, biomass increase, and biomass decrease were carefully reviewed by comparing NLCD 2006 and NLCD 2011. Multi-threshold methods were designed using the four indices to generate the change detection masks.
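To make Equation 3-1 concrete, the four indices can be computed for a single pixel as in the sketch below. It is an illustration under stated assumptions, not the GEE code used in the study: the function name `change_indices`, the dictionary layout keyed by Landsat 5 band number, and the toy reflectance values are hypothetical, and the sums for CV and RCVMAX run over whichever bands are supplied. Reflectances are assumed to be positive so that the ratios are defined.

```python
def change_indices(early, late):
    """Compute dNBR, dNDVI, CV and RCVMAX (Equation 3-1) for one pixel.

    early, late: dicts mapping Landsat 5 band number to surface reflectance
    for the early and the later image. Bands 3 (red), 4 (NIR) and 7 (SWIR2)
    must be present; all values are assumed to be positive.
    """
    def nbr(bands):
        return (bands[4] - bands[7]) / (bands[4] + bands[7])

    def ndvi(bands):
        return (bands[4] - bands[3]) / (bands[4] + bands[3])

    band_ids = sorted(early)
    cv = sum((early[i] - late[i]) ** 2 for i in band_ids)
    rcvmax = sum(
        ((early[i] - late[i]) / max(early[i], late[i])) ** 2 for i in band_ids
    )
    return {
        "dNBR": nbr(early) - nbr(late),
        "dNDVI": ndvi(early) - ndvi(late),
        "CV": cv,
        "RCVMAX": rcvmax,
    }


# Toy reflectances (hypothetical) for bands 3, 4 and 7 only.
early_pixel = {3: 0.1, 4: 0.4, 7: 0.1}
late_pixel = {3: 0.2, 4: 0.2, 7: 0.2}
indices = change_indices(early_pixel, late_pixel)
```

In the study these raw indices are then converted to Z scores and compared against the mask thresholds; that step is not reproduced here because the full set of thresholds is not listed in the text.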
The quality control procedure was implemented iteratively from 1991 to 2011. Finally, the accuracy assessment was conducted by calculating the confusion matrix, the overall accuracy, and the kappa coefficient for 2006 and 2001, with NLCD as the validation data. R² was also calculated as the most important performance measure; it represents the agreement between the true and the classified number of land cover pixels among all the classes in all the subbasins.

• Statistical analysis

Land cover percentages were retrieved from the classification maps on a yearly basis. The pollutant concentration data for NO₃⁻-N, PO₄³⁻-P, NH₄⁺-N, TP, and E. coli were aggregated yearly for both dry and wet seasons, as were the average temperature and total precipitation. All the pollutant concentrations were log transformed to bring them closer to normal distributions. The trends in land cover change and in nutrient and bacteria concentrations were explored.

Linear mixed models are a key method for modeling the spatial and temporal dependency among water quality samples. In a mixed model, fixed effects are assumed constant across samples while random effects vary among groups. Random effects represent groups of samples that share the same unobserved characteristics. Random intercept models were used in this study to avoid overly complicated parameters and over-fitting. For example, if basins were the only random intercepts, the underlying assumption was that, apart from the land cover, topography, and climatic fixed effects, each basin has some unobserved factors that affect stream water quality, represented by a random intercept. In this study, the potential random effects were years, basins, ecoregions, and sampling stations. The random effects were assumed to be independent of each other.
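For a single grouping factor such as basin, the random intercept structure described above can be written in scalar form. The notation below (sample index i, group index j, and the variances of the random intercept and the residual) is introduced here for illustration and is not taken from the original text; the normality assumptions are the standard ones for linear mixed models.

```latex
\[
y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + \gamma_{j} + \varepsilon_{ij},
\qquad
\gamma_{j} \sim N\left(0, \sigma_{\gamma}^{2}\right),
\quad
\varepsilon_{ij} \sim N\left(0, \sigma_{\varepsilon}^{2}\right)
\]
\[
\operatorname{Corr}\left(y_{ij}, y_{i'j}\right)
= \frac{\sigma_{\gamma}^{2}}{\sigma_{\gamma}^{2} + \sigma_{\varepsilon}^{2}}
\quad (i \neq i'),
\qquad
\operatorname{Corr}\left(y_{ij}, y_{i'j'}\right) = 0
\quad (j \neq j')
\]
```

Here y is the log-transformed concentration of sample i in group j, x holds its fixed effect covariates, and the shared intercept makes any two samples from the same group equally correlated, conditional on the fixed effects. The same ratio, computed per grouping factor, is what a statement such as "a given share of the variance was explained by the basin intercept" refers to.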
The matrix form of a linear mixed model is as follows (Equation 3-2):

\[
Y = X\beta + U\gamma + \varepsilon
\]
Equation 3-2

In the above equation, Y is the known vector of observations, that is, the pollutant concentrations of all the samples. X is the design matrix of the fixed effect covariates of the samples, which are land cover, topography, and climate. U is the design matrix of the random effect covariates, whose columns can represent years, ecoregions, basins, and sampling stations. β is the unknown fixed effect coefficient vector and γ is the unknown random effect coefficient vector, both to be estimated; ε is the vector of residual errors. In a random intercept model, the correlations among samples in the same group are assumed to be the same.

Several candidate models were compared in this study. The fixed effect covariates in all the models were the percentages of water, developed, forest, shrubland, and planted land covers, temperature, precipitation, elevation, and slope. The dependent variables were the yearly mean pollutant concentrations of NO₃⁻-N, PO₄³⁻-P, NH₄⁺-N, TP, and E. coli. Dry and wet season models were constructed separately. The first model was a fixed-effects multiple linear regression model with no random effects. The second model had only random intercepts of years. The third model had random intercepts of years and ecoregions. The fourth model had random intercepts of years, ecoregions, and basins. The fifth model included random intercepts of years, ecoregions, basins, and monitoring stations. Candidate models were compared with respect to R², the Akaike Information Criterion (AIC), and the likelihood ratio test, which detects significant differences between models. The selected model was used to draw the longitudinal relationships between land cover and pollutant concentrations.

3.3 Result

3.3.1 Land cover change in the Texas Gulf Region

• Land cover classification accuracy

There was strong agreement between the classified land cover maps and NLCD in both 2006 and 2001.
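The agreement measures used in the accuracy assessment (overall accuracy, kappa, and per-class recall and precision) follow directly from a confusion matrix. The sketch below recomputes them from the 2006 counts as read from Table 3-1, with rows and columns ordered water, developed, forest, shrubland, herbaceous, planted. The function name `agreement_metrics` is hypothetical, and because the counts are rounded to thousands of pixels, the results may differ from the reported values in the last digit.

```python
# Counts as read from Table 3-1 (2006), in thousands of pixels.
# Row and column order: water, developed, forest, shrubland, herbaceous, planted.
MATRIX_2006 = [
    [15199, 32, 6, 170, 52, 97],
    [2, 10386, 119, 84, 88, 77],
    [31, 210, 67758, 2435, 391, 206],
    [49, 577, 1295, 303126, 2631, 1219],
    [31, 159, 648, 3380, 49065, 1154],
    [39, 450, 76, 2296, 2011, 60827],
]


def agreement_metrics(matrix):
    """Return overall accuracy, Cohen's kappa, row-wise and column-wise rates.

    The row-wise rate (diagonal / row total) is what Table 3-1 labels recall;
    the column-wise rate (diagonal / column total) is what it labels precision.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    diagonal = [matrix[k][k] for k in range(n)]

    overall = sum(diagonal) / total
    # Chance agreement expected from the row and column marginals.
    expected = sum(row_sums[k] * col_sums[k] for k in range(n)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    recall = [diagonal[k] / row_sums[k] for k in range(n)]
    precision = [diagonal[k] / col_sums[k] for k in range(n)]
    return overall, kappa, recall, precision


overall, kappa, recall, precision = agreement_metrics(MATRIX_2006)
```

Kappa discounts the agreement that would be expected by chance given how dominant shrubland is in the region, which is why it is lower than the overall accuracy.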
The classified maps achieved 96.19% and 94.69% overall accuracy, and the kappa coefficients were 0.94 and 0.92, in 2006 and 2001 respectively. Table 3-1 and Table 3-2 show that the classification performed particularly well in mapping water and shrubland areas, with a precision of 99.01% and 97.31% and a recall of 97.70% and 98.13% in 2006. The precision of developed areas was 87.91% in 2006 and 82.97% in 2001, where some developed areas were misclassified as planted areas. The recall of herbaceous areas was 90.13% in 2006 and 91.78% in 2001, where some shrubland and planted areas were misclassified as herbaceous. The R² of the true land cover areas versus the classified land cover areas was 0.98 in both 2006 and 2001, calculated across all 1783 subbasins. The R² values of forest, shrubland, and herbaceous areas were particularly high, at 0.97, 0.99, and 0.97 in both 2006 and 2001.

Table 3-1. Confusion matrix of the classification agreement compared with NLCD 2006 (OA = 96.19%, kappa = 0.94, R² = 0.98).

| | water | developed | forest | shrubland | herbaceous | planted | recall | R² |
|---|---|---|---|---|---|---|---|---|
| water | 15199a | 32 | 6 | 170 | 52 | 97 | 97.70% | 0.99 |
| developed | 2 | 10386 | 119 | 84 | 88 | 77 | 96.56% | 0.94 |
| forest | 31 | 210 | 67758 | 2435 | 391 | 206 | 95.39% | 0.97 |
| shrubland | 49 | 577 | 1295 | 303126 | 2631 | 1219 | 98.13% | 0.99 |
| herbaceous | 31 | 159 | 648 | 3380 | 49065 | 1154 | 90.13% | 0.97 |
| planted | 39 | 450 | 76 | 2296 | 2011 | 60827 | 92.59% | 0.97 |
| precision | 99.01% | 87.91% | 96.93% | 97.31% | 90.46% | 95.67% | | |

a. The units of pixel numbers are 1000 pixels for all the land cover types.

Table 3-2. Confusion matrix of the classification agreement compared with NLCD 2001 (OA = 94.69%, kappa = 0.92, R² = 0.98).

| | water | developed | forest | shrubland | herbaceous | planted | recall | R² |
|---|---|---|---|---|---|---|---|---|
| water | 17103a | 66 | 27 | 194 | 102 | 396 | 95.61% | 0.94 |
| developed | 3 | 13231 | 95 | 57 | 111 | 83 | 97.43% | 0.92 |
| forest | 30 | 471 | 69841 | 3131 | 699 | 788 | 93.17% | 0.97 |
| shrubland | 94 | 626 | 1693 | 150444 | 2497 | 1473 | 95.93% | 0.99 |
| herbaceous | 51 | 538 | 1091 | 2842 | 66049 | 1395 | 91.78% | 0.97 |
| planted | 48 | 1015 | 59 | 2008 | 1425 | 95524 | 95.45% | 0.96 |
| precision | 98.70% | 82.97% | 95.93% | 94.81% | 93.18% | 95.85% | | |

a. The units of pixel numbers are 1000 pixels for all the land cover types.

• Land cover proportions and changes

The land cover areal percentages from 1991 to 2011 were smoothed and are presented in Figure 3-3, together with conversion tables for the six ecoregions. An obvious deforestation trend was found in the Piney Woods ecoregion. This region had the largest proportion of forest area, but more than 4000 km² of forest changed to shrubland or herbaceous land. The forest degradation trend was particularly rapid from 2005 to 2011. The Gulf Prairies and Marshes ecoregion has the largest percentages of urban area and planted area. Around 1000 km² of forest in this ecoregion changed to planted or developed areas from 1991 to 2011, but the deforestation trend has recently slowed. The Post Oak and Prairies ecoregion was occupied by balanced proportions of forest, herbaceous, and planted areas. Forest appeared to recover in this ecoregion after 2000. The South Texas Plains, the Rolling Plains, and the Edwards Plateau ecoregions were primarily occupied by shrubland. Water area has decreased in the South Texas Plains, with more than 1000 km² of water changing to planted or shrubland areas. In the Rolling Plains ecoregion, most of the land cover was relatively stable, with slight forest degradation.

Figure 3-3. Land cover proportions and conversions of the six ecoregions from 1991 to 2011.

3.3.2 The spatial and temporal distributions of nutrient and bacteria concentrations

There were large variations in both the spatial and temporal distributions of nutrient and bacteria concentrations. Trend plots of the yearly mean concentrations of NO₃⁻-N, PO₄³⁻-P, NH₄⁺-N, TP, and E. coli in the different ecoregions from 1991 to 2011 are presented in Figure 3-4.
The South Texas Plains, the Gulf Prairies and Marshes, the Piney Woods, and the Post Oak and Prairies ecoregions all showed increasing NO₃⁻-N pollution, while the Rolling Plains showed a decreasing trend in NO₃⁻-N concentration. The increase in NO₃⁻-N was particularly pronounced in the Gulf Prairies and Marshes ecoregion, where the average concentration rose above 5.0 mg/l in 2011. The PO₄³⁻-P concentrations in the Gulf Prairies and Marshes and the South Texas Plains were higher than in the other ecoregions; the average concentration in both regions was higher than 5 mg/l after 2005. There were also increasing trends of PO₄³⁻-P in the Rolling Plains and the Piney Woods. In the Rolling Plains and the Gulf Prairies and Marshes, there was an increasing trend of TP after 2000. The TP concentration in the Gulf Prairies and Marshes reached around 1 mg/l in 2011, compared to around 0.5 mg/l in 2000. The high E. coli concentration in the Piney Woods and the Gulf Prairies and Marshes in 2001 was brought under control and has decreased since then. In 2011, the E. coli concentration in all the ecoregions was lower than 2000 MPN/100ml. However, the E. coli concentration increased slightly after 2009 in the Post Oak and Prairies, the Rolling Plains, and the Edwards Plateau ecoregions.

Figure 3-4. Change of nutrients and bacteria concentrations in the six ecoregions

The spatial distributions of nutrient and E. coli concentrations in 1991, 2001, and 2011 are presented in Figure 3-5. After the log transformation and standardization, the positive range of pollutant concentrations was close to but larger than the negative range, indicating that there were some extremely high concentration values for all the pollutants, represented by the dark red points in Figure 3-5. The NO₃⁻-N concentration increased dramatically in the Middle Brazos, the Lower Brazos, and the San Jacinto basins after 2001.
The San Jacinto and the San Antonio basins faced the most severe NO3 --N pollution, with average concentrations of 3.36 mg/l and 4.20 mg/l, respectively. The NH4 +-N concentration remained relatively stable from 1991 to 2011, with a slight increase in the Neches and the San Jacinto basins. PO4 3--P had very few measurements in 2001, but some hotspots could still be found in the Upper and Lower Trinity basins and the San Jacinto basin. In 2011, the highest PO4 3--P concentrations appeared in the San Jacinto, the Southwestern Texas Coastal, and the San Antonio basins. TP concentrations increased significantly in the Neches and the San Jacinto basins from 2001 to 2011, with the most polluted areas along the coastline. The highest average PO4 3--P concentrations were in the San Jacinto basin and the Southwestern Texas Coastal basin, with mean concentrations of 0.74 mg/l and 0.64 mg/l, respectively. The E.coli concentration generally became lower after 1991. Areas with high E.coli concentrations were in the San Jacinto basin and the Southwestern Texas Coastal basin, with mean concentrations of 4343 MPN/100ml and 1517 MPN/100ml, respectively.

Figure 3-5. The spatial distributions of nutrient and E.coli concentrations in 1991, 2001, and 2011.

3.3.3 The longitudinal relationship between land cover and water quality

• The longitudinal model selection

A comparison among the five models for predicting NO3 --N concentration in wet seasons is presented in Table 3-3. Model 1 contained only fixed effects and did not specify correlations among samples. The R2 of this model was 0.31, and the coefficient of shrubland was significantly positive, which was not physically plausible. This suggested that models assuming independence among samples might lead to incorrect inference. After adding a random intercept of years in Model 2, R2 increased to 0.35. The variance explained by the sampled year was 4%. The random intercept of ecoregions was added in Model 3 and R2 increased to 0.4.
In this model, 16% of the variance was partitioned to the ecoregions and only 2% to years, indicating that the spatial variation of pollutant concentration was much larger than the temporal variation. Model 4, with random intercepts of years, ecoregions, and basins, had an R2 of 0.55. The coefficient of shrubland in this model changed to significantly negative, a reasonable result showing that shrubland helped mitigate NO3 --N pollution. In this model, 25% of the variance was explained by the basin intercept, 12% by the ecoregion intercept, and only 2% by year. Likelihood ratio tests were conducted to compare the five models, and every model was found to be significantly different from the previous one. The R2 values of Model 4 and Model 5 were 0.55 and 0.82, respectively. Model 4 had moderate predictive power, while Model 5 performed best in predicting NO3 --N concentration (Figure 3-6). However, there was a generalizability issue with Model 5. Only 24% of the model variance was left in the residuals, while 51% was explained by the location of monitoring stations. Therefore, Model 5 was better suited to explaining location-based stream water quality than the general land cover effect on water quality. Because Model 4 balanced R2 and generalization capacity, this model structure was used in the next step to draw inferences regarding the land cover effect on all the pollutants. Figure 3-6 also indicates that both models predicted NO3 --N concentration better when the observed values were larger than 0.01 mg/l. Although the very small values were not predicted accurately, they were of less practical importance than the normal and high concentration values.

Table 3-3. Candidate models to predict log (NO3-N) concentration in wet seasons and their comparison

Model 1: fixed effect model
Model 2: model with random intercepts of year
Model 3: model with random intercepts of year and ecoregion
Model 4: model with random intercepts of year, ecoregion and basin
Model 5: model with random intercepts of year, ecoregion, basin and monitoring station

                                  Model 1     Model 2     Model 3     Model 4     Model 5
Model coefficients and significance
%forest                           -0.89**a    -0.82**     -0.61**     -0.76**     -0.56
%developed                        2.48**      2.19**      2.53**      1.99**      2.12**
%planted                          2.02**      1.84**      1.86**      2.68**      2.65**
%shrubland                        0.94**      0.69**      0.12        -0.85**     -0.69
%water                            -2.30**     -2.34**     -2.29**     -2.27**     -1.51**
year after 1991                   0.0015      NA          NA          NA          NA
slope                             -0.026      -0.039      -0.13**     -0.013      -0.039
elevation                         0.0002**    0.0016**    0.0029**    0.0016**    0.0018**
precipitation                     0.0011      0.0023      0.0019*b    0.0015      0.0021**
temperature                       0.40**      0.60**      0.45**      0.25**      0.15**
Model performance
R2                                0.31        0.35        0.40        0.55        0.82
AIC                               15656       15595       15393       14725       12060
Variance partitions
residual variance proportion      100%        96%         82%         57%         24%
year variance proportion          NA          4%          2%          2%          2%
ecoregion variance proportion     NA          NA          16%         10%         7%
basin variance proportion         NA          NA          NA          31%         16%
station variance proportion       NA          NA          NA          NA          51%

a. ** indicates p < 0.01
b. * indicates 0.01 < p < 0.05

Figure 3-6. The scatter plots of predicted values vs observed values of Model 4 and Model 5.

• The longitudinal model inference

The relationship between land cover and NO3 --N in wet seasons was the strongest among all the pollutants, as indicated by an R2 of 0.55. The R2 values of the PO4 3--P, TP, and E.coli models in wet seasons were 0.44, 0.39, and 0.41, respectively. The relationship between land cover and NH4 +-N was relatively weak, as indicated by an R2 of 0.3 in wet seasons. The land cover effects on stream water quality were generally stronger in wet seasons than in dry seasons (Table 3-4). Forest had a significant beneficial effect, reducing all the nutrient concentrations.
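The likelihood ratio comparison of the nested models can be illustrated with the AIC values in Table 3-3. The following is only a sketch under stated assumptions, not the procedure used in this study: it assumes that the AIC values come from comparable likelihoods, that each model adds exactly one variance parameter to the previous one (the parameter `extra_params` is an assumption), and that a chi-square distribution with one degree of freedom is an acceptable reference, which is conservative when the tested variance lies on the boundary of its range. Model 1 is left out because its fixed effects differ from those of Model 2.

```python
import math

# AIC values copied from Table 3-3 (wet-season log(NO3-N) models).
AIC = {"Model 2": 15595, "Model 3": 15393, "Model 4": 14725, "Model 5": 12060}


def lr_statistic_from_aic(aic_small, aic_large, extra_params=1):
    """Likelihood ratio statistic recovered from two AIC values.

    AIC = 2k - 2 ln L, so
    2 (ln L_large - ln L_small) = (AIC_small - AIC_large) + 2 * extra_params.
    `extra_params` (assumed to be 1 here) is the number of parameters the
    larger model adds to the smaller one.
    """
    return (aic_small - aic_large) + 2 * extra_params


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


for smaller, larger in [("Model 2", "Model 3"),
                        ("Model 3", "Model 4"),
                        ("Model 4", "Model 5")]:
    stat = lr_statistic_from_aic(AIC[smaller], AIC[larger])
    print(f"{smaller} vs {larger}: LR = {stat}, p = {chi2_sf_1df(stat):.3g}")
```

Under these assumptions, each added random intercept lowers the AIC by far more than the penalty for one parameter, which is in line with the statement that every model differed significantly from the previous one.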
The impact of forest was particularly strong in wet seasons in mitigating NO3 --N, TP, and NH4 +-N pollution. Based on the model coefficients, adding 1 percent of forest area was expected to reduce the NO3 --N concentration in wet seasons by 1.14% and the TP concentration in wet seasons by 1.36%. Developed land cover was significantly positively associated with all the pollutant concentrations. The impact was strong in both dry and wet seasons. For example, a 1% addition of developed area was associated with a 5.23% increase in E.coli concentration and a 6.31% increase in NO3 --N concentration in wet seasons. The significantly positive impact of planted area on NO3 --N concentration was very strong. Adding 1% of planted area led to a 13.59% increase of NO3 --N in wet seasons. Planted area was also significantly positively associated with PO4 3--P, TP, NH4 +-N, and E.coli concentrations. Water area significantly reduced NO3 --N, PO4 3--P, TP, and E.coli concentrations. Adding 1% of water area led to an 8.6% decrease in NO3 --N concentration and a 6.7% decrease in TP concentration. Water had the most significant influence on reducing E.coli concentration, and the contribution might be attributed to some wetland areas. Shrubland area had a significantly negative association with NO3 --N, TP, and E.coli concentrations, with the strongest impact on NO3 --N. Adding 1% of shrubland area was associated with a 1.3% decrease in NO3 --N concentration. Slope generally had a negative association with pollutant concentrations. In summary, the most influential land covers in the study area were developed and planted areas, which degraded water quality, and water/wetland areas, which improved it.

Table 3-4. Mixed model results to predict pollutant concentrations

log (NO3 --N) dry wet -0.76**a 1.99** 2.68** -0.85** -2.27** -0.013 -0.39 2.01** 2.29** -0.59* -3.84** -0.041 0.0016** 0.0016** 0.0024*b 0.024** 0.49 0.0015 0.25** 0.55 wet -0.34* 1.51** 0.91** 0.02 -1.35** -0.019 0.00021 -0.00007 -0.025 0.44 season %forest %developed %planted %shrubland %water slope elevation precipitation temperature R2 log (PO4 3--P) dry log (TP) wet dry wet log (NH4 +-N) dry log (E.coli) wet dry -0.34* 1.59** 0.85** 0.18 -1.15** -0.039** 0.00051 0.00021 0.0071 0.39 -0.74** 0.42** 0.39** -0.34** -1.92** -0.069** -0.0012** -0.0018** -0.076 0.39 -0.27** 0.97** 1.03** -0.58** -2.12** -0.098** -0.00069** 0.00079 -0.079** 0.36 -0.48** -0.86** 0.57** 0.39** 0.39** -0.002 -0.095 -0.41** -0.03 -0.46** -0.0079 -0.0023 0.00027 0.00037* -0.00081 0.014 0.21 -0.00068* 0.037 0.3 -0.12 1.83** 0.49** 0.16 -8.34** -0.13** 0.0012** 0.0033** -0.069 0.41 -0.39 2.55** 0.19 -0.33 -8.54** -0.16** 0.0021** 0.0025* 0.0059 0.37

a. ** indicates p < 0.01
b. * indicates 0.01 < p < 0.05

The basin random intercepts of all the wet-season mixed models are presented in Figure 3-7. They represent the unobserved basin characteristics that adjusted the stream water quality prediction. The Middle Colorado-Concho basin and the Middle Brazos-Clear Fork basin had some characteristics leading to high NO3 --N concentration. Converting the random intercepts indicated that 6.7 mg/l and 4.6 mg/l, respectively, should be added to the fixed-effect estimates of NO3 --N concentration in these two basins. The Lower Trinity basin and the Lower Colorado basin were likely to have higher PO4 3--P concentrations. Adjustments of 2.48 mg/l and 1.89 mg/l, respectively, should be added when estimating PO4 3--P concentration in these two basins. The Middle Brazos-Clear Fork basin was likely to have a higher TP concentration, for which 2.77 mg/l should be added to the predicted results.
According to the random intercepts of ecoregions, the South Texas Plains and the Gulf Prairies and Marshes ecoregions had positive random intercepts for all the pollutants, while the Rolling Plains and Edwards Plateau ecoregions had negative random intercepts for all the pollutants. The Piney Woods ecoregion had some unobserved characteristics that led to higher NO3 --N and E.coli concentrations. The Post Oak and Prairies ecoregion also had some factors that caused higher E.coli concentration. According to the random intercepts of years, pollutant concentrations after 2006 were likely to be higher than in earlier years under a fixed land use scenario.

Figure 3-7. Bar charts of random intercepts of basin.

3.4 Discussion

3.4.1 The impact factors on stream water quality in the Texas Gulf Region

Despite the abundance of water quality data provided by TCEQ, water quality studies in Texas remained very limited (Santhi et al., 2006; Gelca et al., 2016). The land cover effects on nutrients and bacteria obtained from this study were qualitatively consistent with existing research in other regions. The quantified land cover effects were specific to the Texas Gulf Region and primarily described the regional scale. The most important land cover affecting phosphorus concentration was found to be agricultural land in some research (Nielsen et al., 2012; Varanka and Luoto, 2012; Zhang et al., 2018). Urban area was also shown to have a disproportionately large influence on nutrient generation (Ai et al., 2015; Huang et al., 2016; Wijesiri et al., 2018). In the Texas Gulf Region, planted, water, and developed areas were comparably important in predicting NO3 --N concentration. The percentage of water area was the most important land cover for predicting TP concentration, while the percentages of developed and planted areas were the next most important predictors. Water was also the most important factor in mitigating E.coli pollution.
Water area might have been highlighted in this study because parts of wetlands were classified as water under this classification scheme, and wetlands could keep nutrients and sediments from entering lakes and streams (Galbraith and Burns, 2007). The significance of shrubland has received little attention in the existing literature (Meneses et al., 2015), but it deserves attention in the Texas Gulf Region because shrubland occupied the largest proportion of land. In the study area, the beneficial effect of shrubland on NO3 --N concentration was similar to that of forest, while its effect on other nutrients and E.coli concentrations was weaker.

The results of this study suggested that stream water quality was generally better explained by landscape attributes in wet seasons than in dry seasons, which was consistent with existing literature (Sheldon et al., 2012; Lv et al., 2015; Shi et al., 2017). Slope was found to be negatively associated with all the pollutants, with significant effects on PO4 3--P, TP, and E.coli concentrations. This conclusion agreed with some literature reporting that water quality was generally better in steep sub-catchments because pollutant concentrations tend to decrease when water flows faster (Lv et al., 2015; Shi et al., 2016). However, some researchers claimed that a gentle slope could slow down water movement and provide a longer time to decompose pollutants (Pratt and Chang, 2012; Bu et al., 2014). Temperature had a significantly positive association with NO3 --N concentration. Precipitation had a significantly negative association with PO4 3--P and NO3 --N concentrations, which agreed with previous findings on the dilution and degradation effects of rainfall on phosphates (Rothwell et al., 2010; Varanka and Luoto, 2012).
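The percentage effects reported for NO3 --N in Section 3.3.3 can be related to the wet-season Model 4 coefficients in Table 3-3 with a short arithmetic check. The sketch below rests on an inference that is not stated in the text: the reported figures are numerically close to exp(|beta|) - 1, which assumes a natural-log response. Whether this quantity should be read as a percent change per one percent of land cover depends on how the land cover predictors were scaled, which this section does not specify. The helper `effect_size` and the tolerance of 0.1 are illustrative choices.

```python
import math

# (land cover, wet-season Model 4 coefficient from Table 3-3,
#  effect on NO3-N reported in Section 3.3.3, in percent)
REPORTED = [
    ("forest", -0.76, 1.14),
    ("developed", 1.99, 6.31),
    ("planted", 2.68, 13.59),
    ("shrubland", -0.85, 1.3),
    ("water", -2.27, 8.6),
]


def effect_size(beta):
    """exp(|beta|) - 1: an inferred reading of the reported calculation."""
    return math.exp(abs(beta)) - 1.0


for land_cover, beta, reported in REPORTED:
    computed = effect_size(beta)
    direction = "increase" if beta > 0 else "decrease"
    print(f"{land_cover:10s} beta = {beta:6.2f}  "
          f"computed = {computed:6.2f}  reported = {reported:6.2f}  ({direction})")
```

For all five land cover types, the computed value lies within 0.1 of the reported figure, so the reported effects appear to follow directly from the Model 4 coefficients.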
3.4.2 The performance of the model system

• The performance of the land cover classification algorithm

Before quality control, the classification accuracy in this study improved from 80% to 89% in the test set after adding all the ancillary features. The inclusion of the terrain-based ancillary features and the multi-seasonal information substantially improved the classification accuracy of vegetation classes, as was shown in other studies (Lu and Weng, 2007; Sluiter et al., 2010; Eisavi et al., 2015; Yang et al., 2017). Texture features were helpful in discriminating urban classes, but they were less important than the terrain-based ancillary features and seasonal information (Ghimire et al., 2010; Gomariz-Castillo et al., 2017). It was difficult to use a single classifier to capture all the local spectral heterogeneity in a large-scale image classification (Millard and Richardson, 2015; Zhu et al., 2016; Zhang and Roy, 2017). In this study, the local random classifiers implemented in each ecoregion improved on the overall accuracy of the single random classifier, from 79% to 89% in the test set before quality control. Ecoregion division was the most efficient approach in this study because the distribution of land cover percentages was similar throughout the same ecoregion. The quality control process significantly improved the classification performance. After quality control, the classification R2 improved from 0.94 to 0.98. The multi-threshold method of identifying land cover change groups was adopted in this study and combined with knowledge-based rules to update land cover maps every year (Griffiths et al., 2014; Kim et al., 2014; Yu et al., 2016; Jin et al., 2017). The decision tree algorithms were highly efficient in generating thresholds in the quality control process, using the label changes and spectral information learned from NLCD 2001 and 2006 (Yang et al., 2017; Wang et al., 2019).
• The performance of the linear mixed models

The advantages of the methodology in this study were a large spatial extent, a long time range, and a large sample size, which could be used to draw more general and credible conclusions. In the existing literature, most study areas ranged from 1000 km2 to 5000 km2 (Fatehi et al., 2015; Grabowski et al., 2016; Gu et al., 2016), within which the land cover and environmental characteristics might be homogeneous. The study site of this research covered 471,080 km2, with a great deal of variation in climate and landscape. In this study, the number of sampling stations for each pollutant was around 1000, much more than in the previous literature, which typically used fewer than 100 sampling stations (Amiri and Nakane, 2009; Varanka et al., 2015; Liu et al., 2017). Previous studies revealed that cross-sectional and longitudinal data analysis might generate different inferences about the land use effect on water quality (Wijesiri et al., 2018). This study addressed the reliability issues of the cross-sectional model by generating 20-year land cover data as the explanatory variables. The linear mixed models with random intercepts of years, ecoregions, and basins explained from 21% to 55% of the observed variance in the water quality data, which was comparable with other research using similar methods (Uriarte et al., 2011). The prediction accuracy was lower than that of GWR in other research because the local estimation of model coefficients was omitted (Yu et al., 2013; Sun et al., 2014; Huang et al., 2015). If random intercepts of the location of monitoring stations were added to the model, R2 could be improved to around 0.8. Using this model structure, a constant estimation of regional-scale coefficients across the study area was acquired, with the basin-scale variation partitioned to the random intercepts.
This approach was well suited to promoting a regional understanding of stream water quality in Texas.

3.4.3 The limitations of the study and future research suggestions

One limitation of this study was that the land cover classification scheme was coarse and might conceal some important information (Wan et al., 2014). Additionally, although the land cover classification demonstrated strong agreement with NLCD 2001 and 2006, with an overall accuracy higher than 94%, the accuracy would need to be even higher to detect subtle land cover changes reliably. Because it was not reasonable for a land cover type to change back within a short period of time, the land cover percentage data were smoothed with the spline fitting method to derive the final input to the linear mixed models. The smoothed percentages might also introduce some errors. Future quality control algorithms should focus on the combination of knowledge-based methods and spectral trajectories at the pixel level to design more efficient change detection algorithms.

The inference of statistical models depends highly on variable inclusion, sample selection, and model assumptions. Some explanatory variables were not readily available and were therefore not included in this study, such as landscape configuration metrics, soil, geology, and population dynamics, which might cause biased model estimations (Chen and Lu, 2014; Sheldon et al., 2012; Sangani et al., 2015; Wilson, 2015; Bostanmaneshrad et al., 2018). In addition, the cross-correlations between the explanatory variables were not investigated, such as the cross-correlation between land covers and the cross-correlation between land cover and climate (Li et al., 2015; Hwang et al., 2016). In this research, the samples were aggregated in dry and wet seasons separately every year, as seasonal variation affected the relationship between land use and water quality (Hwang et al., 2016; Ai et al., 2015; Oeding et al., 2018).
If fine-resolution climatic variables such as monthly climatic variables and antecedent dry period are taken into account, a finer aggregation scheme, such as monthly aggregation of samples or even no aggregation, should be applied to keep as much variation in the data as possible (du Plessis et al., 2015; Uwimana et al., 2017; Mello et al., 2018). With respect to spatial aggregation, the entire subbasin was adopted as the spatial aggregation unit in this study because some literature reported that the entire watershed approach explained more variation than the riparian buffer zone approach (Pratt and Chang, 2012; Bu et al., 2014). Local-scale research should still compare catchment, riparian buffer, and reach buffer approaches to investigate the scale at which each land cover type influences water quality (Zhang et al., 2012; Ding et al., 2016; Liu et al., 2017).

I selected the random intercept models to match the objective of regional water quality estimation and to avoid overly complex model parameters. However, this model structure might oversimplify the spatial and temporal variations in the relationship between land cover and water quality. I suggest that other statistical models can be used to extend the method framework of this study. For example, feature selection techniques can be applied to identify the most influential independent variables prior to the regression analysis. The most important features can be selected via PCA, Redundancy Analysis (RDA), Hierarchical Partitioning (HP), and similar methods (Zhao et al., 2015; Huang et al., 2016; Kändler et al., 2017; Bostanmaneshrad et al., 2018; Oeding et al., 2018).
Cluster analysis such as hierarchical clustering and Self-Organizing Maps (SOM) can be applied to group samples into multiple clusters according to land use and pollutant levels, and regression analysis can then be conducted among samples within the same group (Ye et al., 2009; Liu et al., 2018; Mello et al., 2018; Zhang et al., 2018). In the regression analysis, Bayesian linear regression models with random effects can be used to decompose the interactions among data into a series of conditional models and infer the distribution of model parameters (Wan et al., 2014; Wijesiri et al., 2018).

3.4.4 Model applications and management implications

The land cover maps were produced in a standard workflow on the GEE platform. The classification and change detection algorithm relied only on the Landsat imagery and an accurate land cover map from any recent year. It could be readily applied to many parts of the world to obtain historical land cover data. As in other land cover classification research, the GEE platform in this study demonstrated high efficiency in automating the classification process, from sample generation and feature derivation to classifier training and results output (Patel et al., 2015; Gorelick et al., 2017; Huang et al., 2017; Zhao and Gao, 2019). This study provided a way to understand the evolution of Texas land cover with a robust classification algorithm. More information can be extracted from the classified land cover maps, such as land cover trends in basins, counties, and cities, to inform land use policies. According to the land cover change results, there was a considerable deforestation trend, with corresponding ecological damage, in the Piney Woods ecoregion after 2000. Reforestation efforts should be made in this region to avoid further habitat loss (World Wildlife Fund, 2019).
More than one third of the Texas population lives in the Gulf Prairies and Marshes ecoregion, which has been impacted by many human-induced factors. From 1991 to 2011, developed area increased by more than 1000 km2 and forest area decreased by more than 500 km2. The quality of the remaining habitat in this region faces drastic declines with habitat fragmentation, which requires immediate restoration actions (Texas Parks and Wildlife Department, 2012).

The proposed models are helpful for adjusting multiscale land use planning. The fixed-effect land cover coefficients represent the relationship between land cover and water quality at the regional scale. Under a basin-scale land use planning scenario, water quality can be forecasted by plugging the land use percentages and the corresponding control factors into the linear mixed models and adding the random intercepts of ecoregions and basins. It can then be decided whether the resulting contaminant concentration meets the regulatory standards under the given land use scenario. The model framework is also flexible enough for local water quality estimation by fitting a mixed-effect model with random intercepts of monitoring stations. After the Land Change Monitoring, Assessment and Projection (LCMAP) data are published, annual land cover data from 1985 to 2017 can be derived to conduct similar research in other regions using the proposed linear mixed model structures (Zhu et al., 2016).

The inference of land cover effects on stream water quality can be directly applied to modify land use and watershed management policies. For example, land use planning should be adjusted by controlling low density urban development that occupies forest and shrub areas to mitigate NO3 --N, PO4 3--P, and E.coli pollution (Bateni et al., 2013; Tu, 2013).
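The scenario forecast described above, in which land use percentages and control factors are plugged into the mixed model and the random intercepts are added, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the fitted model itself. The fixed-effect coefficients are the wet-season Model 4 values from Table 3-3. The fixed intercept, the random intercept values, the scenario inputs, the predictor units (land cover entered as fractions), the natural-log back-transformation, and the 10 mg/l threshold are all hypothetical placeholders, because they are not reported in this section.

```python
import math

# Wet-season Model 4 fixed-effect coefficients for log(NO3-N), from Table 3-3.
FIXED_EFFECTS = {
    "forest": -0.76,
    "developed": 1.99,
    "planted": 2.68,
    "shrubland": -0.85,
    "water": -2.27,
    "slope": -0.013,
    "elevation": 0.0016,
    "precipitation": 0.0015,
    "temperature": 0.25,
}


def predict_log_concentration(predictors, fixed_intercept, random_intercepts):
    """Fixed intercept + sum(beta * x) + sum of the applicable random intercepts."""
    fixed_part = sum(FIXED_EFFECTS[name] * value
                     for name, value in predictors.items())
    return fixed_intercept + fixed_part + sum(random_intercepts.values())


def meets_standard(log_concentration, standard_mg_l):
    """Back-transform (natural log assumed) and compare with a concentration limit."""
    return math.exp(log_concentration) <= standard_mg_l


# Hypothetical basin-scale scenario; land cover is entered as fractions (assumption).
baseline = {"forest": 0.30, "developed": 0.20, "planted": 0.25,
            "shrubland": 0.20, "water": 0.05, "slope": 2.0,
            "elevation": 100.0, "precipitation": 80.0, "temperature": 20.0}
# Same basin with 20% of its area converted from forest to developed land.
sprawl = dict(baseline, forest=0.10, developed=0.40)

HYPOTHETICAL_FIXED_INTERCEPT = -6.0  # not reported in Table 3-3
HYPOTHETICAL_RANDOM = {"ecoregion": 0.10, "basin": 0.30, "year": 0.0}

log_c = predict_log_concentration(baseline, HYPOTHETICAL_FIXED_INTERCEPT,
                                  HYPOTHETICAL_RANDOM)
log_c_sprawl = predict_log_concentration(sprawl, HYPOTHETICAL_FIXED_INTERCEPT,
                                         HYPOTHETICAL_RANDOM)
print("baseline meets 10 mg/l limit:", meets_standard(log_c, 10.0))
print("sprawl raises predicted log concentration:", log_c_sprawl > log_c)
```

Keeping the random intercepts in a separate dictionary reflects the multiscale use of the model: the same fixed effects serve every basin, while the ecoregion and basin terms shift the prediction for a particular location.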
Precision agriculture and conservation tillage should be applied in areas with high nutrient concentrations, such as the South Texas Plains and the Gulf Prairies and Marshes ecoregions, to reduce nutrient export from croplands (Shi et al., 2017). The beneficial effect of water and wetland areas in reducing E.coli concentration should be considered to guide policy in areas with rising E.coli concentration, such as the South Texas Plains (Boutilier et al., 2009; Croft-White et al., 2017). In addition, the random components provided some baseline information on the ecoregions and basins. Research efforts should be directed toward finding the unobserved factors leading to NO3 --N and TP pollution in the Middle Colorado basin, and the factors causing NO3 --N and PO4 3--P pollution in the Lower Trinity basin.

3.5 Conclusion

I completed a regional-scale longitudinal study of stream water modelling with land cover, terrain, and climate characteristics in the Texas Gulf Region. It involved a two-step method composed of annual land cover map classification and land cover-water quality modelling. It was the first study to make use of all the available stream water quality data in a 20-year time range to derive scientific knowledge and management implications for the Texas Gulf Region. The classified land cover maps had strong agreement with NLCD 2006 and 2001, with accuracies of 97.70%, 96.56%, 95.39%, 98.13%, 90.13%, and 92.59% for water, developed, forest, shrubland, herbaceous, and planted land covers, respectively, in 2006. The overall R2 of the classified land cover areas versus the true land cover areas calculated from all the subbasins was 0.98 in both 2001 and 2006. From the land cover maps, an obvious deforestation trend was observed in the Piney Woods, the South Texas Plains, and the Gulf Prairies and Marshes ecoregions after 2000.
Linear mixed models with random intercepts of multiple spatial units can provide multiscale inference of land cover impact on water quality. Random components of years, ecoregions, and basins should be included to account for the spatial and temporal variations. The land cover change together with the terrain and climate factors explained more than 50% of the variance in NO3 --N 93 concentration and more than 30% of the variance in PO4 3--P, TP, NH4 +-N, and E.coli concentrations in the Texas Gulf Region. The most influential land cover types, which were significantly positively correlated with all the nutrient and bacteria concentrations, were developed areas and planted areas. Increasing water areas had a strong impact on the removal of NO3 --N and E.coli. The estimation of random intercepts provided important information regarding the unobserved basin and ecoregion characteristics that affect stream water quality. The Middle Colorado-Concho, the Lower Trinity and the San Jacinto basins had some unobserved characteristics leading to high nutrient and bacteria concentrations, with most of pollution hot spots found around the Houston metropolitan area. To sum up, this research could be applied to provide insights into the knowledge of land-water interactions, to evaluate new land use scenarios, and to inform scientific regional planning and watershed management policies. 94 BIBLIOGRAPHY 95 BIBLIOGRAPHY Ai, L., Shi, Z. H., Yin, W., & Huang, X. (2015). Spatial and seasonal patterns in stream water contamination across mountainous watersheds: Linkage with landscape characteristics. Journal of Hydrology, 523, 398–408. doi:10.1016/j.jhydrol.2015.01.082 Amiri, B. J., & Nakane, K. (2008). Modeling the Linkage Between River Water Quality and Landscape Metrics in the Chugoku District of Japan. Water Resources Management, 23(5), 931–956. doi:10.1007/s11269-008-9307-z Anderson, J. R., Hardy, E. E., Roach, J. T., & Witmer, R. E. (1976). 
A land use and land cover classification system for use with remote sensor data. Professional Paper. doi:10.3133/pp964 Bateni, F., Fakheran, S., & Soffianian, A. (2013). Assessment of land cover changes & water quality changes in the Zayandehroud River Basin between 1997–2008. Environmental Monitoring and Assessment, 185(12), 10511–10519. doi:10.1007/s10661-013-3348-3 Bonansea, M., Rodriguez, M. C., Pinotti, L., & Ferrero, S. (2015). Using multi-temporal Landsat imagery and linear mixed models for assessing water quality parameters in Río Tercero reservoir 28–41. doi:10.1016/j.rse.2014.10.032 (Argentina). Remote Environment, 158, Sensing of Bostanmaneshrad, F., Partani, S., Noori, R., Nachtnebel, H.-P., Berndtsson, R., & Adamowski, J. F. (2018). Relationship between water quality and macro-scale parameters (land use, erosion, geology, and population density) in the Siminehrood River Basin. Science of The Total Environment, 639, 1588–1600. doi:10.1016/j.scitotenv.2018.05.244 Boutilier, L., Jamieson, R., Gordon, R., Lake, C., & Hart, W. (2009). Adsorption, sedimentation, and inactivation of E. coli within wastewater treatment wetlands. Water Research, 43(17), 4370–4380. doi:10.1016/j.watres.2009.06.039 Bu, H., Meng, W., Zhang, Y., & Wan, J. (2014). Relationships between land use patterns and water the Taizi River basin, China. Ecological Indicators, 41, 187–197. quality doi:10.1016/j.ecolind.2014.02.003 in Carey, R. O., Migliaccio, K. W., Li, Y., Schaffer, B., Kiker, G. A., & Brown, M. T. (2011). Land use disturbance indicators and water quality variability in the Biscayne Bay Watershed, Florida. Ecological Indicators, 11(5), 1093–1104. doi:10.1016/j.ecolind.2010.12.009 Chen, J., & Lu, J. (2014). Effects of Land Use, Topography and Socio-Economic Factors on River Water Quality in a Mountainous Watershed with Intensive Agricultural Production in East China. PLoS ONE, 9(8), e102714. doi:10.1371/journal.pone.0102714 Chen, X., Zhou, W., Pickett, S., Li, W., & Han, L. 
(2016). Spatial-Temporal Variations of Water Quality and Its Relationship to Land Use and Land Cover in Beijing, China. International 96 Journal of Environmental Research and Public Health, 13(5), 449. doi:10.3390/ijerph13050449 Chu, H.-J., Liu, C.-Y., & Wang, C.-K. (2013). Identifying the Relationships between Water Quality and Land Cover Changes in the Tseng-Wen Reservoir Watershed of Taiwan. International Journal of Environmental Research and Public Health, 10(2), 478–489. doi:10.3390/ijerph10020478 Crist, E. P., & Cicone, R. C. (1984). A Physically-Based Transformation of Thematic Mapper Data---The TM Tasseled Cap. IEEE Transactions on Geoscience and Remote Sensing, GE-22(3), 256–263. doi:10.1109/tgrs.1984.350619 Croft-White, M. V., Cvetkovic, M., Rokitnicki-Wojcik, D., Midwood, J. D., & Grabas, G. P. (2017). A shoreline divided: Twelve-year water quality and land cover trends in Lake Ontario coastal wetlands. Journal of Great Lakes Research, 43(6), 1005–1015. doi:10.1016/j.jglr.2017.08.003 Ding, J., Jiang, Y., Liu, Q., Hou, Z., Liao, J., Fu, L., & Peng, Q. (2016). Influences of the land use pattern on water quality in low-order streams of the Dongjiang River basin, China: A multi-scale analysis. Science of The Total Environment, 551-552, 205–216. doi:10.1016/j.scitotenv.2016.01.162 Doetterl, S., Stevens, A., van Oost, K., Quine, T. A., & van Wesemael, B. (2013). Spatially-explicit regional-scale prediction of soil organic carbon stocks in cropland using environmental variables and mixed model 31–42. doi:10.1016/j.geoderma.2013.04.007 approaches. Geoderma, 204-205, Du Plessis, A., Harmse, T., & Ahmed, F. (2015). Predicting water quality associated with land cover change in the Grootdraai Dam catchment, South Africa. Water International, 40(4), 647–663. doi:10.1080/02508060.2015.1067752 Eisavi, V., Homayouni, S., Yazdi, A. M., & Alimohammadi, A. (2015). Land cover mapping based on random forest classification of multitemporal spectral and thermal images. 
Environmental Monitoring and Assessment, 187(5). doi:10.1007/s10661-015-4489-3
Fatehi, I., Amiri, B. J., Alizadeh, A., & Adamowski, J. (2015). Modeling the Relationship between Catchment Attributes and In-stream Water Quality. Water Resources Management, 29(14), 5055–5072. doi:10.1007/s11269-015-1103-y
Galbraith, L. M., & Burns, C. W. (2006). Linking Land-use, Water Body Type and Water Quality in Southern New Zealand. Landscape Ecology, 22(2), 231–241. doi:10.1007/s10980-006-9018-x
Gelca, R., Hayhoe, K., Scott-Fleming, I., Crow, C., Dawson, D., & Patiño, R. (2015). Climate-water quality relationships in Texas reservoirs. Hydrological Processes, 30(1), 12–29. doi:10.1002/hyp.10545
Ghimire, B., Rogan, J., & Miller, J. (2010). Contextual land-cover classification: incorporating spatial dependence in land-cover classification models using random forests and the Getis statistic. Remote Sensing Letters, 1(1), 45–54. doi:10.1080/01431160903252327
Giri, S., & Qiu, Z. (2016). Understanding the relationship of land uses and water quality in Twenty First Century: A review. Journal of Environmental Management, 173, 41–48. doi:10.1016/j.jenvman.2016.02.029
Gomariz-Castillo, F., Alonso-Sarría, F., & Cánovas-García, F. (2017). Improving Classification Accuracy of Multi-Temporal Landsat Images by Assessing the Use of Different Algorithms, Textural and Ancillary Information for a Mediterranean Semiarid Area from 2000 to 2015. Remote Sensing, 9(10), 1058. doi:10.3390/rs9101058
Gómez, C., White, J. C., & Wulder, M. A. (2016). Optical remotely sensed time series data for land cover classification: A review. ISPRS Journal of Photogrammetry and Remote Sensing, 116, 55–72. doi:10.1016/j.isprsjprs.2016.03.008
Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., & Moore, R. (2017). Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment, 202, 18–27. doi:10.1016/j.rse.2017.06.031
Grabowski, Z. J., Watson, E., & Chang, H. (2016).
Using spatially explicit indicators to investigate watershed characteristics and stream temperature relationships. Science of The Total Environment, 551-552, 376–386. doi:10.1016/j.scitotenv.2016.02.042
Griffiths, P., Kuemmerle, T., Baumann, M., Radeloff, V. C., Abrudan, I. V., Lieskovsky, J., … Hostert, P. (2014). Forest disturbances, forest recovery, and changes in forest types across the Carpathian ecoregion from 1985 to 2010 based on Landsat image composites. Remote Sensing of Environment, 151, 72–88. doi:10.1016/j.rse.2013.04.022
Gu, Q., Zhang, Y., Ma, L., Li, J., Wang, K., Zheng, K., … Sheng, L. (2016). Assessment of Reservoir Water Quality Using Multivariate Statistical Techniques: A Case Study of Qiandao Lake, China. Sustainability, 8(3), 243. doi:10.3390/su8030243
Heydari, S. S., & Mountrakis, G. (2018). Effect of classifier selection, reference sample size, reference class distribution and scene heterogeneity in per-pixel classification accuracy using 26 Landsat sites. Remote Sensing of Environment, 204, 648–658. doi:10.1016/j.rse.2017.09.035
Homer, C., Dewitz, J., Yang, L., Jin, S., Danielson, P., Xian, G., … Megown, K. (2015). Completion of the 2011 National Land Cover Database for the conterminous United States–representing a decade of land cover change information. Photogrammetric Engineering & Remote Sensing, 81(5), 345–354.
Huang, H., Chen, Y., Clinton, N., Wang, J., Wang, X., Liu, C., … Zhu, Z. (2017). Mapping major land cover dynamics in Beijing using all Landsat images in Google Earth Engine. Remote Sensing of Environment, 202, 166–176. doi:10.1016/j.rse.2017.02.021
Huang, J., Huang, Y., Pontius, R. G., & Zhang, Z. (2015). Geographically weighted regression to measure spatial variations in correlations between water pollution versus land use in a coastal watershed. Ocean & Coastal Management, 103, 14–24. doi:10.1016/j.ocecoaman.2014.10.007
Huang, Z., Han, L., Zeng, L., Xiao, W., & Tian, Y. (2015).
Effects of land use patterns on stream water quality: a case study of a small-scale watershed in the Three Gorges Reservoir Area, China. Environmental Science and Pollution Research, 23(4), 3943–3955. doi:10.1007/s11356-015-5874-8
Hwang, S.-A., Hwang, S.-J., Park, S.-R., & Lee, S.-W. (2016). Examining the Relationships between Watershed Urban Land Use and Stream Water Quality Using Linear and Generalized Additive Models. Water, 8(4), 155. doi:10.3390/w8040155
Jin, S., Yang, L., Danielson, P., Homer, C., Fry, J., & Xian, G. (2013). A comprehensive change detection method for updating the National Land Cover Database to circa 2011. Remote Sensing of Environment, 132, 159–175. doi:10.1016/j.rse.2013.01.012
Jin, S., Yang, L., Zhu, Z., & Homer, C. (2017). A land cover change detection and classification protocol for updating Alaska NLCD 2001 to 2011. Remote Sensing of Environment, 195, 44–55. doi:10.1016/j.rse.2017.04.021
Jordan, T. E., Weller, D. E., & Pelc, C. E. (2017). Effects of Local Watershed Land Use on Water Quality in Mid-Atlantic Coastal Bays and Subestuaries of the Chesapeake Bay. Estuaries and Coasts, 41(S1), 38–53. doi:10.1007/s12237-017-0303-5
Kalnay, E., & Cai, M. (2003). Impact of urbanization and land-use change on climate. Nature, 423(6939), 528–531. doi:10.1038/nature01675
Kändler, M., Blechinger, K., Seidler, C., Pavlů, V., Šanda, M., Dostál, T., … Štich, M. (2017). Impact of land use on water quality in the upper Nisa catchment in the Czech Republic and in Germany. Science of The Total Environment, 586, 1316–1325. doi:10.1016/j.scitotenv.2016.10.221
Kibena, J., Nhapi, I., & Gumindoga, W. (2014). Assessing the relationship between water quality parameters and changes in landuse patterns in the Upper Manyame River, Zimbabwe. Physics and Chemistry of the Earth, Parts A/B/C, 67-69, 153–163. doi:10.1016/j.pce.2013.09.017
Kim, D.-H., Sexton, J. O., Noojipady, P., Huang, C., Anand, A., Channan, S., … Townshend, J. R. (2014).
Global, Landsat-based forest-cover change from 1990 to 2000. Remote Sensing of Environment, 155, 178–193. doi:10.1016/j.rse.2014.08.017
Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest Package: Tests in Linear Mixed Effects Models. Journal of Statistical Software, 82(13). doi:10.18637/jss.v082.i13
Lessels, J. S., & Bishop, T. F. A. (2013). Estimating water quality using linear mixed models with stream discharge and turbidity. Journal of Hydrology, 498, 13–22. doi:10.1016/j.jhydrol.2013.06.006
Li, Y., Li, Y., Qureshi, S., Kappas, M., & Hubacek, K. (2015). On the relationship between landscape ecological patterns and water quality across gradient zones of rapid urbanization in coastal China. Ecological Modelling, 318, 100–108. doi:10.1016/j.ecolmodel.2015.01.028
Lintern, A., Webb, J. A., Ryu, D., Liu, S., Bende-Michl, U., Waters, D., … Western, A. W. (2017). Key factors influencing differences in stream water quality across space. Wiley Interdisciplinary Reviews: Water, 5(1), e1260. doi:10.1002/wat2.1260
Liu, C., Xiong, T., Gong, P., & Qi, S. (2017). Improving large-scale moso bamboo mapping based on dense Landsat time series and auxiliary data: a case study in Fujian Province, China. Remote Sensing Letters, 9(1), 1–10. doi:10.1080/2150704x.2017.1378454
Liu, C., Zhang, Q., Luo, H., Qi, S., Tao, S., Xu, H., & Yao, Y. (2019). An efficient approach to capture continuous impervious surface dynamics using spatial-temporal rules and dense Landsat time series stacks. Remote Sensing of Environment, 229, 114–132. doi:10.1016/j.rse.2019.04.025
Liu, J., Zhang, X., Wu, B., Pan, G., Xu, J., & Wu, S. (2017). Spatial scale and seasonal dependence of land use impacts on riverine water quality in the Huai River basin, China. Environmental Science and Pollution Research, 24(26), 20995–21010. doi:10.1007/s11356-017-9733-7
Liu, J., Shen, Z., & Chen, L. (2018).
Assessing how spatial variations of land use pattern affect water quality across a typical urbanized watershed in Beijing, China. Landscape and Urban Planning, 176, 51–63. doi:10.1016/j.landurbplan.2018.04.006
Lu, D., & Weng, Q. (2007). A survey of image classification methods and techniques for improving classification performance. International Journal of Remote Sensing, 28(5), 823–870. doi:10.1080/01431160600746456
Luo, K., Hu, X., He, Q., Wu, Z., Cheng, H., Hu, Z., & Mazumder, A. (2017). Using multivariate techniques to assess the effects of urbanization on surface water quality: a case study in the Liangjiang New Area, China. Environmental Monitoring and Assessment, 189(4). doi:10.1007/s10661-017-5884-8
Lv, H., Xu, Y., Han, L., & Zhou, F. (2014). Scale-dependence effects of landscape on seasonal water quality in Xitiaoxi catchment of Taihu Basin, China. Water Science and Technology, 71(1), 59–66. doi:10.2166/wst.2014.463
Manandhar, R., Odeh, I., & Ancev, T. (2009). Improving the Accuracy of Land Use and Land Cover Classification of Landsat Data Using Post-Classification Enhancement. Remote Sensing, 1(3), 330–344. doi:10.3390/rs1030330
Manfrin, A., Bombi, P., Traversetti, L., Larsen, S., & Scalici, M. (2016). A landscape-based predictive approach for running water quality assessment: A Mediterranean case study. Journal for Nature Conservation, 30, 27–31. doi:10.1016/j.jnc.2016.01.002
Mello, K. de, Valente, R. A., Randhir, T. O., dos Santos, A. C. A., & Vettorazzi, C. A. (2018). Effects of land use and land cover on water quality of low-order streams in Southeastern Brazil: Watershed versus riparian zone. CATENA, 167, 130–138. doi:10.1016/j.catena.2018.04.027
Mellor, A., Boukir, S., Haywood, A., & Jones, S. (2015). Exploring issues of training data imbalance and mislabelling on random forest performance for large area land cover classification using the ensemble margin. ISPRS Journal of Photogrammetry and Remote Sensing, 105, 155–168.
doi:10.1016/j.isprsjprs.2015.03.014
Meneses, B. M., Reis, R., Vale, M. J., & Saraiva, R. (2015). Land use and land cover changes in Zêzere watershed (Portugal) – Water quality implications. Science of The Total Environment, 527-528, 439–447. doi:10.1016/j.scitotenv.2015.04.092
Millard, K., & Richardson, M. (2015). On the Importance of Training Data Sample Selection in Random Forest Image Classification: A Case Study in Peatland Ecosystem Mapping. Remote Sensing, 7(7), 8489–8515. doi:10.3390/rs70708489
Molenberghs, G., & Verbeke, G. (2000). Linear Mixed Models for Longitudinal Data. Springer Series in Statistics. doi:10.1007/978-1-4419-0300-6
Moore, I. D., Gessler, P. E., Nielsen, G. A., & Peterson, G. A. (1993). Soil Attribute Prediction Using Terrain Analysis. Soil Science Society of America Journal, 57(2), NP. doi:10.2136/sssaj1993.572npb
Newbold, T., Hudson, L. N., Arnell, A. P., Contu, S., De Palma, A., Ferrier, S., … Purvis, A. (2016). Has land use pushed terrestrial biodiversity beyond the planetary boundary? A global assessment. Science, 353(6296), 288–291. doi:10.1126/science.aaf2201
Nielsen, A., Trolle, D., Søndergaard, M., Lauridsen, T. L., Bjerring, R., Olesen, J. E., & Jeppesen, E. (2012). Watershed land use effects on lake water quality in Denmark. Ecological Applications, 22(4), 1187–1200. doi:10.1890/11-1831.1
Oeding, S., Taffs, K. H., Cox, B., Reichelt-Brushett, A., & Sullivan, C. (2018). The influence of land use in a highly modified catchment: Investigating the importance of scale in riverine health assessment. Journal of Environmental Management, 206, 1007–1019. doi:10.1016/j.jenvman.2017.12.005
Panagos, P., Borrelli, P., & Meusburger, K. (2015). A New European Slope Length and Steepness Factor (LS-Factor) for Modeling Soil Erosion by Water. Geosciences, 5(2), 117–126. doi:10.3390/geosciences5020117
Patel, N. N., Angiuli, E., Gamba, P., Gaughan, A., Lisini, G., Stevens, F. R., … Trianni, G. (2015).
Multitemporal settlement and population mapping from Landsat using Google Earth Engine. International Journal of Applied Earth Observation and Geoinformation, 35, 199–208. doi:10.1016/j.jag.2014.09.005
Pratt, B., & Chang, H. (2012). Effects of land cover, topography, and built structure on seasonal water quality at multiple spatial scales. Journal of Hazardous Materials, 209-210, 48–58. doi:10.1016/j.jhazmat.2011.12.068
Riley, S. J., DeGloria, S. D., & Elliot, R. (1999). Index that quantifies topographic heterogeneity. Intermountain Journal of Sciences, 5(1-4), 23–27.
Rodrigues, V., Estrany, J., Ranzini, M., de Cicco, V., Martín-Benito, J. M. T., Hedo, J., & Lucas-Borja, M. E. (2018). Effects of land use and seasonality on stream water quality in a small tropical catchment: The headwater of Córrego Água Limpa, São Paulo (Brazil). Science of The Total Environment, 622-623, 1553–1561. doi:10.1016/j.scitotenv.2017.10.028
Rodriguez-Galiano, V. F., Chica-Olmo, M., Abarca-Hernandez, F., Atkinson, P. M., & Jeganathan, C. (2012). Random Forest classification of Mediterranean land cover using multi-seasonal imagery and multi-seasonal texture. Remote Sensing of Environment, 121, 93–107. doi:10.1016/j.rse.2011.12.003
Rothwell, J. J., Dise, N. B., Taylor, K. G., Allott, T. E. H., Scholefield, P., Davies, H., & Neal, C. (2010). Predicting river water quality across North West England using catchment characteristics. Journal of Hydrology, 395(3-4), 153–162. doi:10.1016/j.jhydrol.2010.10.015
Sajikumar, N., & Remya, R. S. (2015). Impact of land cover and land use change on runoff characteristics. Journal of Environmental Management, 161, 460–468. doi:10.1016/j.jenvman.2014.12.041
Sakai, Y., Ishizuka, S., & Takenaka, C. (2013). Predicting deadwood densities of Cryptomeria japonica and Chamaecyparis obtusa forests using a generalized linear mixed model with a national-scale dataset. Forest Ecology and Management, 295, 228–238. doi:10.1016/j.foreco.2013.01.030
Sangani, M.
H., Amiri, B. J., Shabani, A. A., Sakieh, Y., & Ashrafi, S. (2015). Modeling relationships between catchment attributes and river water quality in southern catchments of the Caspian Sea. Environmental Science and Pollution Research, 22(7), 4985–5002. doi:10.1007/s11356-014-3727-5
Santhi, C., Srinivasan, R., Arnold, J. G., & Williams, J. R. (2006). A modeling approach to evaluate the impacts of water quality management plans implemented in a watershed in Texas. Environmental Modelling & Software, 21(8), 1141–1157. doi:10.1016/j.envsoft.2005.05.013
Schneider, A., Friedl, M. A., & Potere, D. (2010). Mapping global urban areas using MODIS 500-m data: New methods and datasets based on “urban ecoregions.” Remote Sensing of Environment, 114(8), 1733–1746. doi:10.1016/j.rse.2010.03.003
Seeboonruang, U. (2012). A statistical assessment of the impact of land uses on surface water quality indexes. Journal of Environmental Management, 101, 134–142. doi:10.1016/j.jenvman.2011.10.019
Sheldon, F., Peterson, E. E., Boone, E. L., Sippel, S., Bunn, S. E., & Harch, B. D. (2012). Identifying the spatial scale of land use that most strongly influences overall river ecosystem health score. Ecological Applications, 22(8), 2188–2203. doi:10.1890/11-1792.1
Shi, P., Zhang, Y., Li, Z., Li, P., & Xu, G. (2017). Influence of land use and land cover patterns on seasonal water quality at multi-spatial scales. CATENA, 151, 182–190. doi:10.1016/j.catena.2016.12.017
Shi, W., Xia, J., & Zhang, X. (2016). Influences of anthropogenic activities and topography on water quality in the highly regulated Huai River basin, China. Environmental Science and Pollution Research, 23(21), 21460–21474. doi:10.1007/s11356-016-7368-8
Shi, Z. H., Ai, L., Li, X., Huang, X. D., Wu, G. L., & Liao, W. (2013). Partial least-squares regression for linking land-cover patterns to soil erosion and sediment yield in watersheds. Journal of Hydrology, 498, 165–176.
doi:10.1016/j.jhydrol.2013.06.031
Sluiter, R., & Pebesma, E. J. (2010). Comparing techniques for vegetation classification using multi- and hyperspectral images and ancillary environmental data. International Journal of Remote Sensing, 31(23), 6143–6161. doi:10.1080/01431160903401379
Sun, R., Chen, L., Chen, W., & Ji, Y. (2011). Effect of Land-Use Patterns on Total Nitrogen Concentration in the Upstream Regions of the Haihe River Basin, China. Environmental Management, 51(1), 45–58. doi:10.1007/s00267-011-9764-7
Sun, Y., Guo, Q., Liu, J., & Wang, R. (2014). Scale Effects on Spatially Varying Relationships Between Urban Landscape Patterns and Water Quality. Environmental Management, 54(2), 272–287. doi:10.1007/s00267-014-0287-x
Texas Commission on Environmental Quality. (2014). Managing nonpoint source pollution in Texas, 2013 annual report.
Texas Parks and Wildlife Department. (2012). Texas Conservation Action Plan 2012–2016: Gulf Coast Prairies and Marshes Handbook. Editor, Wendy Connally, Texas Conservation Action Plan Coordinator. Austin, Texas.
Thakkar, A. K., Desai, V. R., Patel, A., & Potdar, M. B. (2017). Post-classification corrections in improving the classification of Land Use/Land Cover of arid region using RS and GIS: The case of Arjuni watershed, Gujarat, India. The Egyptian Journal of Remote Sensing and Space Science, 20(1), 79–89. doi:10.1016/j.ejrs.2016.11.006
Tran, C. P., Bode, R. W., Smith, A. J., & Kleppel, G. S. (2010). Land-use proximity as a basis for assessing stream water quality in New York State (USA). Ecological Indicators, 10(3), 727–733. doi:10.1016/j.ecolind.2009.12.002
Tu, J. (2011). Spatially varying relationships between land use and water quality across an urbanization gradient explored by geographically weighted regression. Applied Geography, 31(1), 376–392. doi:10.1016/j.apgeog.2010.08.001
Tu, J. (2013).
Spatial Variations in the Relationships between Land Use and Water Quality across an Urbanization Gradient in the Watersheds of Northern Georgia, USA. Environmental Management, 51(1), 1–17. doi:10.1007/s00267-011-9738-9
Uriarte, M., Yackulic, C. B., Lim, Y., & Arce-Nazario, J. A. (2011). Influence of land use on water quality in a tropical landscape: a multi-scale analysis. Landscape Ecology, 26(8), 1151–1164. doi:10.1007/s10980-011-9642-y
Uwimana, A., van Dam, A., Gettel, G., Bigirimana, B., & Irvine, K. (2017). Effects of River Discharge and Land Use and Land Cover (LULC) on Water Quality Dynamics in Migina Catchment, Rwanda. Environmental Management, 60(3), 496–512. doi:10.1007/s00267-017-0891-7
Varanka, S., & Luoto, M. (2011). Environmental determinants of water quality in boreal rivers based on partitioning methods. River Research and Applications, 28(7), 1034–1046. doi:10.1002/rra.1502
Varanka, S., Hjort, J., & Luoto, M. (2014). Geomorphological factors predict water quality in boreal rivers. Earth Surface Processes and Landforms, 40(15), 1989–1999. doi:10.1002/esp.3601
Vogelmann, J. E., Howard, S. M., Yang, L., Larson, C. R., Wylie, B. K., & Van Driel, N. (2001). Completion of the 1990s National Land Cover Data Set for the conterminous United States from Landsat Thematic Mapper data and ancillary data sources. Photogrammetric Engineering and Remote Sensing, 67(6).
Vrebos, D., Beauchard, O., & Meire, P. (2017). The impact of land use and spatial mediated processes on the water quality in a river system. Science of The Total Environment, 601-602, 365–373. doi:10.1016/j.scitotenv.2017.05.217
Walsh, C. J., & Webb, J. A. (2014). Spatial weighting of land use and temporal weighting of antecedent discharge improves prediction of stream condition. Landscape Ecology, 29(7), 1171–1185. doi:10.1007/s10980-014-0050-y
Wan, R., Cai, S., Li, H., Yang, G., Li, Z., & Nie, X. (2014).
Inferring land use and land cover impact on stream water quality using a Bayesian hierarchical modeling approach in the Xitiaoxi River Watershed, China. Journal of Environmental Management, 133, 1–11. doi:10.1016/j.jenvman.2013.11.035
Wang, G., A, Y., Xu, Z., & Zhang, S. (2014). The influence of land use patterns on water quality at multiple spatial scales in a river system. Hydrological Processes, 28(20), 5259–5272. doi:10.1002/hyp.10017
Wang, R., Zhang, X., & Li, M.-H. (2019). Predicting bioretention pollutant removal efficiency with design features: A data-driven approach. Journal of Environmental Management, 242, 403–414. doi:10.1016/j.jenvman.2019.04.064
Wickham, J., Stehman, S. V., Gass, L., Dewitz, J. A., Sorenson, D. G., Granneman, B. J., … Baer, L. A. (2017). Thematic accuracy assessment of the 2011 National Land Cover Database (NLCD). Remote Sensing of Environment, 191, 328–341. doi:10.1016/j.rse.2016.12.026
Wijesiri, B., Deilami, K., & Goonetilleke, A. (2018). Evaluating the relationship between temporal changes in land use and resulting water quality. Environmental Pollution, 234, 480–486. doi:10.1016/j.envpol.2017.11.096
Wilson, C. O. (2015). Land use/land cover water quality nexus: quantifying anthropogenic influences on surface water quality. Environmental Monitoring and Assessment, 187(7). doi:10.1007/s10661-015-4666-4
World Population Review (2019). Texas Population 2019. Retrieved from http://worldpopulationreview.com/states/texas-population/
World Wildlife Fund (2019). Piney Woods Forest. Retrieved from https://www.worldwildlife.org/ecoregions/na0523
Xian, G., & Homer, C. (2010). Updating the 2001 National Land Cover Database Impervious Surface Products to 2006 using Landsat Imagery Change Detection Methods. Remote Sensing of Environment, 114(8), 1676–1686. doi:10.1016/j.rse.2010.02.018
Yang, C., Wu, G., Ding, K., Shi, T., Li, Q., & Wang, J. (2017).
Improving Land Use/Land Cover Classification by Integrating Pixel Unmixing and Decision Tree Methods. Remote Sensing, 9(12), 1222. doi:10.3390/rs9121222
Ye, L., Cai, Q., Liu, R., & Cao, M. (2008). The influence of topography and land use on water quality of Xiangxi River in Three Gorges Reservoir region. Environmental Geology, 58(5), 937–942. doi:10.1007/s00254-008-1573-9
Yu, W., Zhou, W., Qian, Y., & Yan, J. (2016). A new approach for land cover classification and change analysis: Integrating backdating and an object-based method. Remote Sensing of Environment, 177, 37–47. doi:10.1016/j.rse.2016.02.030
Zhang, F., Wang, J., & Wang, X. (2018). Recognizing the Relationship between Spatial Patterns in Water Quality and Land-Use/Cover Types: A Case Study of the Jinghe Oasis in Xinjiang, China. Water, 10(5), 646. doi:10.3390/w10050646
Zhang, H. K., & Roy, D. P. (2017). Using the 500 m MODIS land cover product to derive a consistent continental scale 30 m Landsat land cover classification. Remote Sensing of Environment, 197, 15–34. doi:10.1016/j.rse.2017.05.024
Zhang, W., Li, H., Sun, D., & Zhou, L. (2012). A Statistical Assessment of the Impact of Agricultural Land Use Intensity on Regional Surface Water Quality at Multiple Scales. International Journal of Environmental Research and Public Health, 9(11), 4170–4186. doi:10.3390/ijerph9114170
Zhao, G., Gao, H., & Cuo, L. (2016). Effects of Urbanization and Climate Change on Peak Flows over the San Antonio River Basin, Texas. Journal of Hydrometeorology, 17(9), 2371–2389. doi:10.1175/jhm-d-15-0216.1
Zhao, G., & Gao, H. (2019). Estimating reservoir evaporation losses for the United States: Fusing remote sensing and modeling approaches. Remote Sensing of Environment, 226, 109–124.
doi:10.1016/j.rse.2019.03.015
Zhao, J., Lin, L., Yang, K., Liu, Q., & Qian, G. (2015). Influences of land use on water quality in a reticular river network area: A case study in Shanghai, China. Landscape and Urban Planning, 137, 20–29. doi:10.1016/j.landurbplan.2014.12.010
Zhou, P., Huang, J., Pontius, R. G., & Hong, H. (2016). New insight into the correlations between land use and water quality in a coastal watershed of China: Does point source pollution weaken it? Science of The Total Environment, 543, 591–600. doi:10.1016/j.scitotenv.2015.11.063
Zhu, Z., Gallant, A. L., Woodcock, C. E., Pengra, B., Olofsson, P., Loveland, T. R., … Auch, R. F. (2016). Optimizing selection of training and auxiliary data for operational land cover classification for the LCMAP initiative. ISPRS Journal of Photogrammetry and Remote Sensing, 122, 206–221. doi:10.1016/j.isprsjprs.2016.11.004
Zou, G., Li, Y., Huang, T., Liu, D. L., Herridge, D., & Wu, J. (2017). A Mixed-Effects Regression Modeling Approach for Evaluating Paddy Soil Productivity. Agronomy Journal, 109(5), 2302. doi:10.2134/agronj2017.02.0089

CHAPTER 4 EVALUATING THE EFFECTIVENESS OF WATERSHED PRESERVATION BASED ON THE HYDROLOGICALLY SENSITIVE AREA (HSA) SITING APPROACH: A DEMONSTRATION OF DATA-DRIVEN ECOLOGICAL PLANNING METHOD

4.1 Introduction

Landscape planning and design are decision-making processes that integrate multiple domains of knowledge, including ecology, hydrology, geology, economics, and history (Steiner, 2011; Xiang, 2014; Wang et al., 2016). Since the 1960s and 1970s, the concept of ecological planning has gained wide recognition among planners and designers. Among the most acknowledged of these planners and designers was Ian McHarg, who carried out pioneering planning projects using ecological frameworks. According to McHarg, ecological planning and design should involve “an intrinsically suitable location” and include “processes with appropriate materials and forms” (McHarg, 2006, p. 123).
McHarg viewed nature as a value system, evaluating all the ecological, economic, and cultural factors as interdependent components that together formed a holistic social-ecological system (McHarg, 1969, p. 104; Yang and Li, 2016). This fundamental theory led to the corresponding “layer-cake” model as the core method for realizing ecological planning. In the “layer-cake” model, conservation areas are delineated as those areas found unsuitable for development by a suitability analysis of each layer (McHarg, 1969, p. 114). For example, in the early development of The Woodlands project, inventory maps including physiography, geology, soils, hydrology, vegetation, climate and resources were overlaid to determine suitability maps for proposed land uses (McHarg and Steiner, 1998; Yang et al., 2015; Yang, 2018). The “layer-cake” model has had far-reaching influence and has been widely applied in a number of ecological planning projects over the years (Espejel et al., 1999; Sustainable Sites Initiative, 2009; Calkins, 2012).

The key step in McHarg’s “layer-cake” model is the identification of critical areas that intrinsically have high ecological values and should thus be protected from development (Steiner et al., 2000a; Herrington, 2010). Many research efforts have aimed to expand the framework of ecological planning to become “broader” with additional layers or sublayers. An important question to consider in such work is whether each layer is “deep” enough to form a more efficient plan. We argue that one limitation of the “layer-cake” model is that the suitability analysis in each layer is a linear combination of multiple indicators. The ranking of ecological values was somewhat arbitrary because of the limited accuracy of environmental data and the linear overlay method (Yang et al., 2015). In fact, the non-linear behavior of ecosystems can hardly be approximated by the linear overlay approach.
As such, the introduction of nonlinear interdisciplinary models to make each layer more physically sound has yielded promising results. For example, the soil erosion layer can be generated by the linear overlay of hydrology, soil and topography maps (Dosskey et al., 2005). However, this approach was shown to be less accurate and efficient than the Revised Universal Soil Loss Equation (RUSLE) for mapping soil erosion (Schumacher et al., 2005). Soil erosion maps generated by RUSLE with logistic regression calibrations were tested and found to be more robust (Mueller et al., 2005), which aided in the creation of a more effective soil layer in ecological planning.

In this study, the hydrology layer in ecological planning was investigated and the hydrologically sensitive area (HSA) approach to mapping runoff and contaminant source areas was introduced. The hydrology layer is one of the most important components in ecological planning, as it links land use, soil, topography, and aquatic organisms to form an interactive natural process. The HSAs are delineated according to variable source area (VSA) hydrology. VSAs are the runoff-generating areas in a watershed; they are small, variable, and predictable depending on season, climate, topography and land cover factors (Frankenberger et al., 1999; Qiu, 2003). HSAs are the parts of VSAs that are more prone to generating runoff and are therefore susceptible to contaminant transport (Walter et al., 2000). The spatial patterns of HSAs and their impacts on discharge and pollutant generation have been well demonstrated (Qiu, 2009; Qiu et al., 2013). In ecological planning, HSAs are the preferential locations for placing best management practice (BMP) or low impact development (LID) facilities (Martin-Mikle et al., 2015).

Another limitation of the traditional ecological planning approach is that planning efficiency was often argued conceptually and intuitively, without further validation from real data.
One way researchers have tried to address this issue is by evaluating the performance of ecological planning with hypothetical scenario analysis (Yang and Li, 2011; Fu et al., 2016). However, despite the dramatically increased availability of data and advancements in hardware and software engineering, little work has yet been done to leverage these resources in ecological planning. To take advantage of various sources of publicly available environmental data, statistical analysis has been shown to be a more straightforward way to investigate the impact of landscape features on hydrology and water quality, compared to complex hydrological models (Giri and Qiu, 2016; Lintern et al., 2018). To illustrate how data-driven methods work in ecological planning, statistical verification and scenario evaluation were applied in this study to verify the effectiveness of the HSA approach.

This study utilized an interdisciplinary approach to calculate and validate the HSAs, as demonstrated in the Middle Brazos-Bosque basin in the Texas Gulf Region. The three objectives were: (1) to generate the HSA map of the Middle Brazos-Bosque basin, on which areas with high hydrological sensitivity are suggested as priority conservation areas; (2) to calculate the mean hydrological sensitivity of each subbasin in the Middle Brazos-Bosque basin and investigate whether the mean hydrological sensitivity was correlated with NO₃⁻-N concentrations measured at the subbasin outlet; and (3) to simulate NO₃⁻-N outputs in scenarios where some HSAs were converted from croplands to green infrastructure for best management practices. A threshold for delineating HSAs was then suggested, namely the one that led to the most efficient scenario in terms of NO₃⁻-N loading reduction.

4.2 Data and Method

4.2.1 Study Site

The Middle Brazos-Bosque basin (Figure 4-1) is one of the 378 hydrologic accounting units, with a HUC6 code of 120602.
The Brazos River is the eleventh longest river in the United States, with a total drainage area of 116,000 km². The main water quality concerns in the Brazos Watershed include high nutrient loadings, high bacteria levels, and low dissolved oxygen. The area of the Middle Brazos-Bosque basin is 19,140 km². It is a mixed-use watershed with the upper drainage area primarily occupied by forest and grassland. The lower drainage area is covered by planted and urban areas. A part of the city of Waco is located downstream of the Middle Brazos-Bosque basin. According to climate data from the Waco Regional Airport Station, over the last three decades the annual average temperature was 19.3 °C and the annual total precipitation was 88.1 cm. The mean slope of the basin is around 21°. Hydrologic soil groups C and D are the primary soil categories in the basin, which have lower infiltration rates and higher runoff potentials. Located within the basin boundary are 89 Texas Commission on Environmental Quality (TCEQ) monitoring stations and 13 USGS monitoring stations. Because the large area of the Middle Brazos-Bosque basin would complicate the hydrological simulation process, the McGregor subbasin, with an area of 22.4 km², was selected as the HSA scenario analysis site. The McGregor subbasin was delineated with USGS Station 08095300 as the subbasin outlet, which is close to the city of McGregor. The primary land covers of the McGregor subbasin are herbaceous and planted.

Figure 4-1. Study Site (The Middle Brazos-Bosque basin)

4.2.2 Data Acquisition

HSA mapping involved the data layers of topography, hydrology and soil. The USGS National Elevation Dataset with a spatial resolution of 30 m was used to derive elevation, slope and flow accumulation data. Soil data were drawn from the Soil Survey Geographic (SSURGO) soil database.
Soil conductivity and soil depth to the restrictive layer were the two parameters of interest, calculated from the “component,” “corestriction” and “chorizon” tables of the SSURGO database.

Water quality data from multiple subbasins were required to perform statistical verification of the HSA approach. The locations of water quality monitoring stations were drawn from the Texas Commission on Environmental Quality (TCEQ), and subbasin boundaries were delineated accordingly. The corresponding water quality data for 2011 were obtained from the Texas Clean Rivers Program (CRP) data tool. Specifically, NO₃⁻-N concentration data in the wet seasons were aggregated yearly and joined with other attributes of the subbasins.

Land cover, topography and weather data were prepared for the HSA scenario simulations. Land cover data were extracted from the 2011 National Land Cover Database (NLCD). NLCD has 16 classes of land cover at a spatial resolution of 30 m. The Daymet Version 3 dataset, with a spatial resolution of 1000 m, was used to aggregate daily mean temperature and daily total precipitation across the study site. Because continuously measured pollutant data were missing, discharge data were used for model calibration as a compromise when simulating contaminant outputs. All the data were prepared with Google Earth Engine (GEE) and ArcMap 10.5.

4.2.3 HSA Calculation and Mapping

The gridded hydrological sensitivity maps were generated based on the TOPography based hydrological MODEL (TOPMODEL). In TOPMODEL, the topographic index resulting from Equation 4-1 was used to model patterns of surface runoff. The larger the topographic index value, the more likely the grid cell is to be saturated during a rainfall event. It is therefore reasonable to keep grid cells with high topographic index values as conservation areas in ecological planning (Qiu, 2009).

$$\lambda = \ln\left(\frac{\alpha}{\tan\beta}\right) - \ln\left(K_s D\right) \qquad \text{(Equation 4-1)}$$

The first part on the right side of the equation is the wetness index and the second part accounts for the soil water storage capacity (Beven and Kirkby, 1979; Walter et al., 2002). In the equation, $\alpha$ represents the upslope contributing area per unit contour length in meters, which is approximated by the flow accumulation value, and $\beta$ is the surface slope angle in decimal degrees. The term $K_s D$ is the water storage component, where $K_s$ is the mean saturated hydraulic conductivity of the soil profile in meters per day and $D$ is the soil depth to the restrictive layer in centimeters. The shallower the soil profile above the restrictive layer and the lower the saturated hydraulic conductivity, the higher the likelihood of runoff generation. If there are several topsoil layers above the restrictive layer with different saturated hydraulic conductivities, a compound $K_s$ is defined via Equation 4-2.

$$K_s = \frac{d}{\sum_{i=1}^{n} \left( d_i / k_i \right)} \qquad \text{(Equation 4-2)}$$

In Equation 4-2, $d$ is the total depth of soil above the restrictive layer, $d_i$ is the depth of layer $i$, and $k_i$ is the corresponding saturated hydraulic conductivity of layer $i$. There is a small number of grid cells where $K_s$ values are missing in the SSURGO database. Most of these grid cells are water bodies, where green infrastructure is not suitable to build. Therefore, they are left as “no data” on the HSA map.

4.2.4 Statistical Analysis

Statistical analysis was carried out to determine the relationships between hydrological sensitivity and NO₃⁻-N concentration in streams. If higher hydrological sensitivity was associated with higher nutrient loadings, the effectiveness of prioritizing HSAs as conservation areas in this basin could be supported. The units of analysis were the 37 subbasins with measured NO₃⁻-N concentration data in the 2011 wet season. The wet season was defined as the time range from June to October. The dependent variable was the yearly average of NO₃⁻-N concentrations measured at each subbasin outlet.
The independent variable was the mean hydrological sensitivity of the subbasin. Pearson correlation analysis was performed to study the relationship between mean hydrological sensitivity and NO₃⁻-N concentrations. The null hypothesis of no association between them was tested at a significance level of 0.05. The scatter plots of the natural logarithm of NO₃⁻-N concentrations against mean hydrological sensitivity suggested that there might be a non-linear relationship between them. Therefore, a non-linear least squares (NLS) model with a quadratic term was fitted to predict NO₃⁻-N concentrations from mean hydrological sensitivity.

4.2.5 SWAT Modelling

The Soil and Water Assessment Tool (SWAT) was used to simulate multiple HSA scenarios. This model was selected because the hydrological response units (HRU) in SWAT integrate the components of land cover, soil and topography, which agrees conceptually with TOPMODEL. The baseline scenario was the current land cover status of the McGregor subbasin. Two alternative scenarios were developed in which HSAs were defined as the 5% and the 2% of grid cells with the highest hydrological sensitivity values. The HSAs on croplands were hypothesized to remain as forests. Discharge and NO₃⁻-N outputs were simulated monthly from 2008 to 2011, following a two-year warm-up period from 2006 to 2007. Because continuously measured NO₃⁻-N output data were not available, only discharge was calibrated, using the 2008 to 2009 period. The validation period was from 2010 to 2011. The calibrated parameters were the curve number (CN), the soil evaporation compensation factor and the soil available water capacity. SWAT model efficiency was evaluated by the Nash-Sutcliffe model efficiency coefficient (NSE) and R². The lack of NO₃⁻-N output validation was a major limitation of the scenario simulation. The simulated NO₃⁻-N output in the scenario analysis was therefore only an approximation of the performance data.
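The calculations described in Sections 4.2.3 and 4.2.5 can be illustrated with a short Python sketch. It is not the workflow used in this study, which relied on GEE, ArcMap and SWAT; the function names and sample values are hypothetical and chosen only for illustration. The sketch assumes that the caller supplies $K_s$ and $D$ in mutually consistent units and that the slope is strictly positive, since a flat cell would make the wetness term undefined.

```python
import math


def compound_ks(layer_depths, layer_ks):
    """Compound saturated hydraulic conductivity (Equation 4-2).

    K_s = d / sum(d_i / k_i), where d is the total depth above the
    restrictive layer. Inputs are per-layer depths and conductivities.
    """
    if not layer_depths or len(layer_depths) != len(layer_ks):
        raise ValueError("need one conductivity per soil layer")
    total_depth = sum(layer_depths)
    return total_depth / sum(d / k for d, k in zip(layer_depths, layer_ks))


def topographic_index(alpha, slope_deg, ks, depth):
    """Soil topographic index (Equation 4-1).

    lambda = ln(alpha / tan(beta)) - ln(K_s * D).
    Assumption: slope_deg > 0 and ks, depth share consistent units.
    """
    if slope_deg <= 0:
        raise ValueError("slope must be positive in this sketch")
    beta = math.radians(slope_deg)
    return math.log(alpha / math.tan(beta)) - math.log(ks * depth)


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - sse / spread


if __name__ == "__main__":
    # Hypothetical two-layer soil profile and a single grid cell.
    ks = compound_ks([0.3, 0.7], [1.2, 0.4])
    print("compound Ks:", ks)
    print("topographic index:", topographic_index(500.0, 3.0, ks, 1.0))
    # Hypothetical monthly discharge series.
    print("NSE:", nash_sutcliffe([4.0, 6.0, 9.0, 5.0], [4.5, 5.5, 8.0, 5.5]))
```

Because Equation 4-2 is a depth-weighted harmonic mean, the least conductive layer dominates the compound value, which is consistent with the statement that low conductivity raises the likelihood of runoff generation.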
4.3 Result

4.3.1 HSA Map

The distribution of the topographic index values in the Middle Brazos-Bosque basin was a right-skewed bell curve, with a mean value of 5.3 and a standard deviation of 2.8. The maximum topographic index value was 26.2. The values inside one standard deviation were from 3 to 5.2. The topographic index values in this basin had a similar range but a smaller mean than those reported in previous studies (Qiu, 2009; Martin-Mikle et al., 2015). The reason might be that some water bodies with high hydrological sensitivities had no hydraulic conductivity data available and were therefore excluded from the topographic index calculation.

Figure 4-2 presents the hydrological sensitivity map of the McGregor subbasin, represented by the topographic index values. It is important to note that some HSAs with high topographic index values are located in the middle of the subbasin’s fields, rather than along its streams. This indicates that protecting stream buffer areas alone is not sufficient for ecological planning. Grid cells with the highest 5% of topographic index values, which were larger than 11.9, were mapped as HSAs. The critical source areas (CSAs) for nutrient generation were mapped as the HSAs on the planted area. Most of the CSAs were located in the downstream areas of the McGregor subbasin. Such CSAs were the prioritized sites to place BMP facilities.

Figure 4-2. Hydrological sensitivity map and the critical source areas in the McGregor subbasin

4.3.2 The Relationships between Hydrological Sensitivity and Water Quality

Pearson correlation results show a statistically significant positive association between the mean hydrological sensitivity of a subbasin and the corresponding natural logarithm of NO₃⁻-N concentrations. The correlation between the mean hydrological sensitivity and the natural logarithm of NO₃⁻-N concentrations was 0.4, with a p value of 0.014.
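The significance test behind the reported correlation can be sketched as follows. This is a minimal standard-library illustration, not the software used in this study: the function names are hypothetical, the two-sided p value is obtained by numerically integrating the Student t density with Simpson's rule, and the inputs are the rounded values reported above (r = 0.4, n = 37 subbasins).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_density(t, df):
    """Probability density of Student's t distribution."""
    const = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return const * (1 + t * t / df) ** (-(df + 1) / 2)


def correlation_t_test(r, n, steps=2000):
    """Return (t statistic, two-sided p value) for H0: no correlation.

    Assumes |r| < 1 and an even number of Simpson steps.
    """
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area_0_to_t = total * h / 3  # Simpson's rule on [0, t]
    return t, 1 - 2 * area_0_to_t


if __name__ == "__main__":
    t_stat, p_value = correlation_t_test(0.4, 37)
    print("t statistic:", round(t_stat, 3))
    print("two-sided p:", round(p_value, 4))
```

With these rounded inputs the t statistic is about 2.58 on 35 degrees of freedom and the two-sided p value falls between 0.01 and 0.02, which is in line with the reported p value of 0.014 and below the 0.05 significance level.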
The scatter plot in Figure 4-3 also indicates a positive association between hydrological sensitivity and NO₃⁻-N concentrations, with a probable non-linear relationship. A quadratic curve fitted with the NLS model is also presented in Figure 4-3. The quadratic form was significant at the 0.01 level. The results indicate that subbasins with higher hydrological sensitivity tended to have higher NO₃⁻-N concentrations, and that the impact of hydrological sensitivity on NO₃⁻-N concentrations became stronger as hydrological sensitivity increased.

Figure 4-3. The relationship between mean hydrological sensitivity and log (NO₃⁻-N) in wet seasons

4.3.3 Scenario Simulation

The NO₃⁻-N output during the 2008 to 2011 period was approximated in SWAT under multiple HSA scenarios. In the calibration period, the R² and NSE of discharge were 0.93 and 0.75, respectively. In the validation period, the R² and NSE of discharge were 0.81 and 0.55, respectively.

Table 4-1 shows that scenario 2 was more efficient than scenario 1 in treating NO₃⁻-N pollution. In scenario 2, areas with the highest 2% hydrological sensitivity on the cropland were transformed into green space. Compared to the baseline scenario, 1.3% of the total cropland area was transformed into green space, which accounted for only 0.25% of the total basin area. Nitrate output nevertheless fell by 3.7%, a reduction disproportionately larger than the land use change. In scenario 1, the percentage of transformed cropland (5.6%) and the reduction of NO₃⁻-N output (5%) were about the same; thus the HSA approach was less efficient than in scenario 2. The SWAT simulation results indicated that keeping areas with the highest 2% hydrological sensitivity values as green infrastructure would be very efficient for NO₃⁻-N reduction. Increasing the percentage to 5% yielded only a modest further reduction in NO₃⁻-N loadings.

Table 4-1.
SWAT simulation results of NO₃⁻-N output in the period from 2008 to 2011

| Scenario | Criteria | NO₃⁻-N output (kg) | Decrease in NO₃⁻-N output compared to the baseline (kg) | Percentage decrease of NO₃⁻-N output compared to the baseline | Percentage of total cropland area transformed into green space | Percentage of total basin area transformed into green space |
| Baseline scenario | Land use of the current situation | 84662 | n/a | n/a | n/a | n/a |
| Scenario 1 | Grid cells among the highest 5% hydrological sensitivity with cropland land use are transformed into green space | 80471 | 4191 | 5% | 5.6% | 1% |
| Scenario 2 | Grid cells among the highest 2% hydrological sensitivity with cropland land use are transformed into green space | 81514 | 3148 | 3.7% | 1.3% | 0.25% |

4.4 Discussion

4.4.1 Water Quality Management Implication

The HSA approach can be linked to land use controls, which protect scarce natural resources and mitigate the negative impacts of urbanization. Common land use controls protect water resources in steep slope areas, stream corridor areas, open space, farmland and wetlands. A previous study indicated that these types of land use controls could protect only around 50% of HSAs, most of which were protected by wetland conservation (Qiu et al., 2014). Based on the findings of this study, some HSAs were located in the middle of upland fields and not along the stream corridors. These HSAs might not be effectively protected by existing land use control policies. Therefore, HSAs should be taken into consideration in land use control frameworks through additional protection criteria.

The HSA approach can also provide a mechanistic and spatially explicit method for prioritizing low impact development (LID) sites. This approach helps ensure that LID facilities are placed more cost-effectively. In this study site, HSAs were dispersed, with some patches in the stream source areas.
The mapping results of HSAs were consistent with the principle of LID, which is to manage runoff at the source using a decentralized approach of controls. In addition, the scenario analysis supported the efficiency of placing LID in areas with the highest 2% hydrological sensitivity. If measured water quality data were available, NO₃⁻-N outputs could be calibrated to indicate a more accurate threshold for HSA delineation, which could lead to the most effective solution for the removal of nutrients.

4.4.2 The Interdisciplinary Ecological Planning Approach

Typical ecological planning procedures involve planning goal initialization, inventory analysis, suitability analysis, and land use analysis (Yang and Li, 2016). In this study, the verification of ecological planning with statistical and scenario analyses was emphasized. As shown in Figure 4-4, the ecological planning workflow for a specific layer includes the steps of planning goal setup, theory formation, data acquisition, map analysis, statistical verification, and planning performance evaluation. It is an iterative process that starts with a specific goal and verifies whether or not the planning performance reaches this goal.

Figure 4-4. Data-driven ecological planning workflow using hydrology layer as an example

The dramatic increase in available data sources has made inventory analysis much more convenient, as it is sometimes feasible to get all the in-situ data from public data sources. For example, Google Earth Engine (GEE) provides a data archive that includes more than 40 years of scientific datasets, such as climate and weather data, land cover data, and geophysical data (Gorelick et al., 2017). In this case study, land cover data, topography data, and climate data were all obtained from GEE. The availability of big data greatly increases the generalizability of evidence-based ecological planning, as the data are also freely available in other regions.
Using the data-driven approach to strengthen the scientific core of ecological planning is another important trend. In data-driven ecological planning, the emphasis is on identifying the most important planning factors that affect the final goals. Using the hydrology layer as an example, the most important landscape factors affecting stream water quality vary among different local contexts (Ding et al., 2016). Thus, statistical verification is needed to confirm whether or not the selected indicators in a given planning strategy have a significant impact on local stream water quality. Analytical methods such as stepwise regression, linear mixed models, geographically weighted regression (GWR), and redundancy analysis (RDA) are all helpful in analyzing the relationships between landscape factors and environmental indicators (Ragosta et al., 2010; Wang et al., 2014; Pratt and Chang, 2012).

Performing scenario analysis of multiple ecological planning alternatives is important in evaluating the efficiency of different approaches, especially for multi-objective ecological planning (Yang and Li, 2011; Fu et al., 2016; Wu et al., 2016). Scenario analysis involves baseline and alternative scenario design, input data preparation, model calibration, and assessment of scenario outputs. In this study, scenario analysis was used to find an optimal threshold for defining HSAs, one that reduces more NO₃⁻-N loading while leaving a larger area under cultivation. Some fully distributed hydrologic models have the potential to simulate hydrological outcomes of ecological planning strategies with different spatial patterns. Such models include the Storm Water Management Model (SWMM), Mike SHE, the Regional Hydro-Ecological Simulation System (RHESSys) and the Distributed Hydrology–Soil–Vegetation Model (DHSVM) (Qin et al., 2013; Trinh and Chui, 2013; Tague and Band, 2004; Cuo et al., 2008).
In addition, it is helpful to measure and document planning and design performance after a project is built, as it can be used as a reference for future ecological planning (Li et al., 2013).

Ecological planning has an interdisciplinary nature. The “layer-cake” approach requires expertise and knowledge from multiple disciplines to investigate each layer, especially in a data-driven approach. There are a number of interdisciplinary models that can be used to quantify each layer in ecological planning to support multiple goals in ecological, social and economic aspects, as shown in Figure 4-5.

Figure 4-5. Multidisciplinary methods as extensions of the “layer-cake” model

To map the vegetation layer, leaf area index (LAI) maps derived from satellite imagery have been used to quantify the structure and function of forest ecosystems (Clevers et al., 2017). Areas with large LAIs represent dense forest areas and should be protected from cultivation. In the soil layer, the Water and Tillage Erosion Model (WATEM) and the Vegetative Filter Strip Model (VFSMOD) have been applied to derive soil erosion maps and designate corresponding conservation buffers (Dosskey et al., 2005; Dosskey et al., 2006). In the wildlife biology layer, a habitat suitability index (HSI) map can be used to characterize habitat quality for selected wildlife species. For example, the HSI of marine animals was developed based on factors such as sediments, water depth, water temperature, salinity, pH, and dissolved oxygen (Thomasma et al., 1991; Chen et al., 2009; Zhang et al., 2017). In the topography layer, high resolution light detection and ranging (LiDAR) data generate more accurate topographic maps and perform better in mapping topography-related indexes such as the stream power index (SPI) and the compound topographic index (CTI) (Galzki et al., 2011; Tomer et al., 2013; Gali et al., 2015; Djodjic and Villa, 2015).
In the hydrology layer, in addition to the HSA approach, the index method can be used to identify areas that are more sensitive to land use change, based on their contribution to the change of flow characteristics (Kalin and Hantush, 2009; Noori et al., 2016).

Furthermore, ecological planning is a socio-ecological practice that incorporates social systems, such as politics, governance, economy and cultures (Xiang, 2019). Research on the social impacts of planning has made some progress in quantifying the social benefits of ecological planning, such as strengthening social ties in neighborhoods, enhancing residents’ mental health, and increasing property values (Tyrväinen and Miettinen, 2000; Francis et al., 2012; Kaźmierczak, 2013). Currently, social media data are used as a source of knowledge to measure people’s attitudes toward and perceptions of the built environment in order to inform planning and design strategies (Ciuccarelli et al., 2014; Nummi, 2019). In addition, it should be noted that homeowners’ preferences, market needs, and public-private partnerships could all affect the implementation of an ecological planning project (Yang et al., 2015). After the formation of each layer in ecological planning, decision making models such as agent-based models (ABM), which simulate the actions and interactions of multiple entities, can be applied to find the optimal solution in complex socio-ecological systems (Matthews et al., 2007; Bruch and Atwell, 2015).

4.5 Conclusion

In this study, a data-driven approach to ecological planning was demonstrated by applying the HSA approach in the Middle Brazos-Bosque basin. Hydrological sensitivity was mapped and the most effective conservation areas for protecting a healthy watershed were designated. Correlation analysis and NLS regression results indicated that hydrological sensitivity was significantly positively correlated with NO₃⁻-N concentrations.
Therefore, urban development and agricultural cultivation should avoid HSAs to protect stream water quality. Multiple planning scenarios were simulated in SWAT, and it was found that areas with the highest 2% hydrological sensitivity should be kept as green infrastructure in the watershed. Given the results of this study, it is recommended that a standard data-driven approach to ecological planning involve the steps of statistical verification and planning evaluation to test whether the proposed strategy fulfills the planning goals. A variety of models from multiple disciplines are available for doing so, for both natural and social systems. Data-driven approaches can offer a technical guide to realizing McHarg’s initial attempt at exploring a scientific and logical way of incorporating ecology in planning.

BIBLIOGRAPHY

Ahern, J. (2011). From fail-safe to safe-to-fail: Sustainability and resilience in the new urban world. Landscape and urban Planning, 100(4), 341-343.
Beven, K. J., & Kirkby, M. J. (1979). A physically based, variable contributing area model of basin hydrology/Un modèle à base physique de zone d'appel variable de l'hydrologie du bassin versant. Hydrological Sciences Journal, 24(1), 43-69.
Bruch, E., & Atwell, J. (2015). Agent-based models in empirical social research. Sociological methods & research, 44(2), 186-221.
Calkins, M. (2012). The sustainable sites handbook: A complete guide to the principles, strategies, and best practices for sustainable landscapes (Vol. 39). John Wiley & Sons.
Chen, X., Li, G., Feng, B., & Tian, S. (2009). Habitat suitability index of Chub mackerel (Scomber japonicus) from July to September in the East China Sea. Journal of oceanography, 65(1), 93-102.
Ciuccarelli, P., Lupi, G., & Simeone, L. (2014). Visualizing the data city: social media as a source of knowledge for urban planning and management. Springer Science & Business Media.
Clevers, J., Kooistra, L., & Van Den Brande, M.
(2017). Using Sentinel-2 data for retrieving LAI and leaf and canopy chlorophyll content of a potato crop. Remote Sensing, 9(5), 405.
Cuo, L., Lettenmaier, D. P., Mattheussen, B. V., Storck, P., & Wiley, M. (2008). Hydrologic prediction for urban watersheds with the Distributed Hydrology–Soil–Vegetation Model. Hydrological processes, 22(21), 4205-4213.
Ding, J., Jiang, Y., Liu, Q., Hou, Z., Liao, J., Fu, L., & Peng, Q. (2016). Influences of the land use pattern on water quality in low-order streams of the Dongjiang River basin, China: a multi-scale analysis. Science of the total environment, 551, 205-216.
Delgado, J. A., Khosla, R., & Mueller, T. (2011). Recent advances in precision (target) conservation. Journal of Soil and Water Conservation, 66(6), 167A-170A.
Djodjic, F., & Villa, A. (2015). Distributed, high-resolution modelling of critical source areas for erosion and phosphorus losses. Ambio, 44(2), 241-251.
Dosskey, M. G., Eisenhauer, D. E., & Helmers, M. J. (2005). Establishing conservation buffers using precision information. Journal of Soil and Water Conservation, 60(6), 349-354.
Dosskey, M. G., Helmers, M. J., & Eisenhauer, D. E. (2006). An approach for using soil surveys to guide the placement of water quality buffers. Journal of Soil and Water Conservation, 61(6), 344-354.
Espejel, I., Fischer, D. W., Hinojosa, A., García, C., & Leyva, C. (1999). Land-use planning for the Guadalupe Valley, Baja California, Mexico. Landscape and Urban Planning, 45(4), 219-232.
Forman, R. T., & Godron, M. (1986). Landscape ecology. John Wiley & Sons. New York, 4, 22-28.
Francis, J., Wood, L. J., Knuiman, M., & Giles-Corti, B. (2012). Quality or quantity? Exploring the relationship between Public Open Space attributes and mental health in Perth, Western Australia. Social science & medicine, 74(10), 1570-1577.
Frankenberger, J. R., Brooks, E. S., Walter, M. T., Walter, M. F., & Steenhuis, T. S. (1999). A GIS-based variable source area hydrology model.
Hydrological processes, 13(6), 805-822.
Fu, X., Wang, X., Schock, C., & Stuckert, T. (2016). Ecological wisdom as benchmark in planning and design. Landscape and Urban Planning, 155, 79-90.
Gali, R. K., Soupir, M. L., Kaleita, A. L., & Daggupati, P. (2015). Identifying potential locations for grassed waterways using terrain attributes and precision conservation technologies. Transactions of the ASABE, 58(5), 1231-1239.
Galzki, J. C., Birr, A. S., & Mulla, D. J. (2011). Identifying critical agricultural areas with three-meter LiDAR elevation data for precision conservation. Journal of Soil and Water Conservation, 66(6), 423-430.
Gaprindashvili, G., & Van Westen, C. J. (2016). Generation of a national landslide hazard and risk map for the country of Georgia. Natural hazards, 80(1), 69-101.
Giri, S., & Qiu, Z. (2016). Understanding the relationship of land uses and water quality in Twenty First Century: A review. Journal of environmental management, 173, 41-48.
Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., & Moore, R. (2017). Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment, 202, 18-27.
Grove, J. M., Cadenasso, M. L., Pickett, S. T., Machlis, G. E., & Burch, W. R. (2015). The Baltimore school of urban ecology: space, scale, and time for the study of cities. Yale University Press.
Kalin, L., & Hantush, M. M. (2009). An auxiliary method to reduce potential adverse impacts of projected land developments: subwatershed prioritization. Environmental management, 43(2), 311.
Kaźmierczak, A. (2013). The contribution of local parks to neighbourhood social ties. Landscape and urban planning, 109(1), 31-44.
Kosmas, C., Ferrara, A., Briasouli, H., & Imeson, A. (1999). Methodology for mapping environmentally sensitive areas (ESAs) to desertification. The Medalus project: Mediterranean desertification and land use.
Manual on key indicators of desertification and mapping environmentally sensitive areas to desertification (Kosmas C, Kirkby M, Geeson N eds), European Union, 18882, 31-47.
Herrington, S. (2010). The nature of Ian McHarg’s science. Landscape Journal, 29(1), 1-20.
Li, M. H., Dvorak, B., Luo, Y., & Baumgarten, M. (2013). Landscape performance: Quantified benefits and lessons learned from a treatment wetland system and naturalized landscapes. Landscape Architecture Frontiers, 1(4), 56-68.
Liao, K. H., & Chan, J. K. H. (2016). What is ecological wisdom and how does it relate to ecological knowledge?. Landscape and Urban Planning, 155, 111-113.
Lintern, A., Webb, J. A., Ryu, D., Liu, S., Bende-Michl, U., Waters, D., ... & Western, A. W. (2018). Key factors influencing differences in stream water quality across space. Wiley Interdisciplinary Reviews: Water, 5(1), e1260.
Martin-Mikle, C. J., de Beurs, K. M., Julian, J. P., & Mayer, P. M. (2015). Identifying priority sites for low impact development (LID) in a mixed-use watershed. Landscape and urban planning, 140, 29-41.
Matthews, R. B., Gilbert, N. G., Roach, A., Polhill, J. G., & Gotts, N. M. (2007). Agent-based land-use models: a review of applications. Landscape Ecology, 22(10), 1447-1459.
Mazziotta, A., Triviño, M., Tikkanen, O. P., Kouki, J., Strandman, H., & Mönkkönen, M. (2015). Applying a framework for landscape planning under climate change for the conservation of biodiversity in the Finnish boreal forest. Global change biology, 21(2), 637-651.
McHarg, I. L. (1969). Design with nature. New York, NY: Doubleday/Natural History Press.
McHarg, I. L., & Steiner, F. R. (1998). To heal the Earth: Selected writings of Ian L. McHarg.
McHarg, I. L. (2006). The essential Ian McHarg: writings on design and nature. Island Press.
Meerow, S. (2015). Defining urban resilience: A review, landscape and urban planning.
Mueller, T. G., Cetin, H., Fleming, R. A., Dillon, C. R., Karathanasis, A. D., & Shearer, S. A. (2005).
Erosion probability maps: Calibrating precision agriculture data with soil surveys using logistic regression. Journal of soil and water conservation, 60(6), 462-468.
Niemelä, J. (1999). Ecology and urban planning. Biodiversity & Conservation, 8(1), 119-131.
Noori, N., Kalin, L., Sen, S., Srivastava, P., & Lebleu, C. (2016). Identifying areas sensitive to land use/land cover change for downstream flooding in a coastal Alabama watershed. Regional environmental change, 16(6), 1833-1845.
Nummi, P. (2019). Social media data analysis in urban e-planning. In Smart Cities and Smart Spaces: Concepts, Methodologies, Tools, and Applications (pp. 636-651). IGI Global.
Palazzo, D., & Steiner, F. R. (2012). Urban ecological design: a process for regenerative places (Vol. 12). Island Press.
Pratt, B., & Chang, H. (2012). Effects of land cover, topography, and built structure on seasonal water quality at multiple spatial scales. Journal of hazardous materials, 209, 48-58.
Qin, H. P., Li, Z. X., & Fu, G. (2013). The effects of low impact development on urban flooding under different rainfall characteristics. Journal of environmental management, 129, 577-585.
Qiu, Z. (2003). A VSA-based strategy for placing conservation buffers in agricultural watersheds. Environmental Management, 32(3), 299-311.
Qiu, Z. (2009). Assessing critical source areas in watersheds for conservation buffer planning and riparian restoration. Environmental management, 44(5), 968-980.
Qiu, Z., Hall, C., Drewes, D., Messinger, G., Prato, T., Hale, K., & Van Abs, D. (2013). Hydrologically sensitive areas, land use controls, and protection of healthy watersheds. Journal of Water Resources Planning and Management, 140(7), 04014011.
Ragosta, G., Evensen, C., Atwill, E. R., Walker, M., Ticktin, T., Asquith, A., & Tate, K. W. (2010). Causal connections between water quality and land use in a rural tropical island watershed. EcoHealth, 7(1), 105-113.
Salvati, L., Ferrara, C., & Corona, P. (2015).
Indirect validation of the Environmental Sensitive Area Index using soil degradation indicators: A country-scale approach. Ecological indicators, 57, 360-365.
Schumacher, J. A., Kaspar, T. C., Ritchie, J. C., Schumacher, T. E., Karlen, D. L., Venteris, E. R., ... & Fenton, T. E. (2005). Identifying spatial patterns of erosion for use in precision conservation. Journal of soil and water conservation, 60(6), 355-362.
Steiner, F., McSherry, L., & Cohen, J. (2000a). Land suitability analysis for the upper Gila River watershed. Landscape and urban planning, 50(4), 199-214.
Steiner, F., Blair, J., McSherry, L., Guhathakurta, S., Marruffo, J., & Holm, M. (2000b). A watershed at a watershed: the potential for environmentally sensitive area protection in the upper San Pedro Drainage Basin (Mexico and USA). Landscape and urban planning, 49(3-4), 129-148.
Steiner, F. (2011). Landscape ecological urbanism: Origins and trajectories. Landscape and urban planning, 100(4), 333-337.
Steiner, F. (2016). The application of ecological knowledge requires a pursuit of wisdom. Landscape and Urban Planning, 155, 108-110.
Sustainable Sites Initiative. (2009). The sustainable sites initiative: guidelines and performance benchmarks 2009.
Tague, C. L., & Band, L. E. (2004). RHESSys: Regional Hydro-Ecologic Simulation System: An object-oriented approach to spatially distributed modeling of carbon, water, and nutrient cycling. Earth interactions, 8(19), 1-42.
Thomasma, L. E., Drummer, T. D., & Peterson, R. O. (1991). Testing the habitat suitability index model for the fisher. Wildlife Society Bulletin (1973-2006), 19(3), 291-297.
Tomer, M. D., Crumpton, W. G., Bingner, R. L., Kostel, J. A., & James, D. E. (2013). Estimating nitrate load reductions from placing constructed wetlands in a HUC-12 watershed using LiDAR data. Ecological Engineering, 56, 69-78.
Trinh, D. H., & Chui, T. F. M. (2013).
Assessing the hydrologic restoration of an urbanized area via an integrated distributed hydrological model. Hydrology and Earth System Sciences, 17(12), 4789-4801.
Tyrväinen, L., & Miettinen, A. (2000). Property prices and urban forest amenities. Journal of environmental economics and management, 39(2), 205-223.
Walter, M. T., Walter, M. F., Brooks, E. S., Steenhuis, T. S., Boll, J., & Weiler, K. (2000). Hydrologically sensitive areas: variable source area hydrology implications for water quality risk assessment. Journal of Soil and Water Conservation, 55(3), 277-284.
Walter, M. T., Steenhuis, T. S., Mehta, V. K., Thongs, D., Zion, M., & Schneiderman, E. (2002). Refined conceptualization of TOPMODEL for shallow subsurface flows. Hydrological Processes, 16(10), 2041-2046.
Wang, G., Xu, Z., & Zhang, S. (2014). The influence of land use patterns on water quality at multiple spatial scales in a river system. Hydrological processes, 28(20), 5259-5272.
Wang, X., Palazzo, D., & Carper, M. (2016). Ecological wisdom as an emerging field of scholarly inquiry in urban planning and design. Landscape and Urban Planning, 155, 100-107.
Wu, J. (2006). Landscape ecology, cross-disciplinarity, and sustainability science. Landscape Ecology, 21(1), 1-4.
Wu, J., & Wu, T. (2013). Ecological resilience as a foundation for urban design and sustainability. In Resilience in Ecology and Urban Design (pp. 211-229). Springer, Dordrecht.
Yang, B., & Li, M.-H. (2011). Assessing planning approaches by watershed streamflow modeling: Case study of The Woodlands, Texas. Landscape and Urban Planning, 99(1), 9-22.
Yang, B., Li, M.-H., & Huang, C.-S. (2015). Ian McHarg’s ecological planning in The Woodlands, Texas: Lessons learned after four decades. Landscape Research, 40(7), 773-794.
Yang, B., & Li, S. (2016). Design with Nature: Ian McHarg’s ecological wisdom as actionable and practical knowledge. Landscape and Urban Planning, 155, 21-32.
Yang, B. (2018).
Landscape Performance: Ian McHarg’s ecological planning in The Woodlands, Texas. Routledge. Yu, K. (1996). Security patterns and surface model in landscape ecological planning. Landscape and urban planning, 36(1), 1-17. Xiang, W. N. (2014). Doing real and permanent good in landscape and urban planning: Ecological wisdom for urban sustainability. Landscape and Urban Planning, 121, 65-69. Xiang, W. N. (2019). Ecopracticology: the study of socio-ecological practice. Socio-Ecological Practice Research, 1-8. Xu, Z., Liu, Y., Yen, N., Mei, L., Luo, X., Wei, X., & Hu, C. (2016). Crowdsourcing based description of urban emergency events using social media big data. IEEE Transactions on Cloud Computing. Zhang, Z., Zhou, J., Song, J., Wang, Q., Liu, H., & Tang, X. (2017). Habitat suitability index model of the sea cucumber Apostichopus japonicus (Selenka): A case study of Shandong Peninsula, China. Marine pollution bulletin, 122(1-2), 65-76.

CHAPTER 5 CONCLUSION AND RECOMMENDATION

Landscape-water quality nexus studies that cover a large spatial extent and a long time period require a complex research design, large data inputs, and robust analytical methods. In this research, the relationships between landscape characteristics and stream water quality in the Texas Gulf Region from 1990 to 2011 were quantified and analyzed, and relevant management solutions were proposed. It was discovered that, given the same impervious surface area, urban spatial pattern significantly influenced stream water quality. High-density aggregated urban development led to significantly better stream water quality than the current sprawled development. Regarding the general land-water relationships, urban development patterns, soil, and climate were the most significant factors in determining all pollutant concentrations, but the relationships varied according to the season and location. 
The relationships between land cover, climate, and water quality did not change significantly from 1990 to 2011. The variations of landscape and climatic factors at the local scale accounted for more than 50% of the variations in stream water quality. At the basin scale, they accounted for about 20% of the stream water quality variations. Management practices should therefore be targeted at specific regions and basins. Generally, placing BMPs in HSAs was efficient in reducing nutrient loading.

This research was novel in that it combined cutting-edge technologies to frame a large-scale longitudinal study in the landscape architecture discipline. Machine learning was used to find the most important factors affecting stream water quality, and to predict stream water quality given different urban spatial patterns. Linear mixed models were designed to quantify complex spatially and temporally varying landscape-water quality relationships in a simple and interpretable way. Hydrological modeling was used to find the threshold for defining the HSAs that are the most efficient at reducing nutrient loadings. In addition, an annual land cover classification remote sensing algorithm was designed to obtain the annual land cover change, which was used to explain the change in stream water quality. Overall, this dissertation establishes an advanced technical workflow for studying large-scale water quality issues.

In the 2011 cross-sectional study, it was concluded that urban spatial patterns, soil, and climate were the most important factors in determining stream pollutant concentrations. The configuration of urban area was more important than the composition of urban area. Using a random forest predictive model, it was found that high-density aggregated development contributed to the lowest level of stream pollutant concentrations. This conclusion supports urban planning policies that favor compact city development. Methodologically, the machine learning model was flexible and robust. 
Thus, it could incorporate other factors of interest and be generalized to other regions.

In the longitudinal study, the focus was on how the variations in the landscape-water quality relationships were explained at different spatial and temporal scales. The annual land cover classification results indicated an obvious deforestation trend in the Piney Woods, the South Texas Plains, and the Gulf Prairies and Marshes ecoregions after 2000. This deforestation and urban expansion together led to water quality degradation in the Texas Gulf Region. For example, adding 1 percent of urban area led to a 6.31% increase in NO3⁻-N concentration and a 3.52% increase in PO4³⁻-P concentration in the Texas Gulf Region. It was also discovered that some unobserved characteristics other than land cover and climate led to the high nutrient concentrations in the Middle Colorado-Concho and the Lower Trinity basins, and the high E. coli concentration in the San Jacinto basin. Overall, the Texas Gulf Region had quite heterogeneous land-water relationships, and specific management practices should be targeted at the local level.

Finally, a basin-scale study in the Middle Brazos-Bosque basin was conducted to verify that placing BMPs in HSAs was effective in reducing nutrient concentration. The HSA approach had been proposed and mapped by other studies, but little research had verified the threshold used to delineate HSAs. After a comparison among multiple planning scenarios simulated in SWAT, it was found that areas with the highest 2% hydrological sensitivity should be preserved as green space in the watershed to control nutrient pollution.

Several policy recommendations were derived from this study. First, compact city development should be promoted in land use planning for stream water quality protection. Regulating urban sprawl would be particularly helpful in reducing E. coli concentration in the Texas coastal areas. 
Second, stream water quality conservation should receive greater attention in areas with higher soil storage and in areas with high precipitation. Precision agriculture and conservation tillage should be applied in the northern parts of the Texas Gulf Region. Reforestation efforts should be made in the Piney Woods ecoregion to avoid further habitat loss. Habitat restoration actions should be taken in the Gulf Prairies and Marshes ecoregion. Third, urban development and agricultural cultivation should avoid HSAs to protect stream water quality. Green infrastructure such as constructed wetlands and bioretention is more appropriate for siting on HSAs. Resilient redevelopment strategies such as connecting impervious surface should also be prioritized on HSAs.

Based on the findings, it is recommended that future research focus on validating causal relationships between landscape factors, climatic factors, and stream water quality. It is also worth incorporating socioeconomic factors to form a comprehensive framework for understanding how stream water quality responds to diverse human activities. Explicit planning policy implications and design solutions can be drawn from this full picture of land-water relationships. In such studies, a flexible combination of big data technologies, conventional statistical methods, and hydrological modeling holds great promise for producing more interpretable and credible results. I foresee the necessity of continuous research efforts to apply cutting-edge methods in water-oriented planning and design, which can contribute to planning for a more sustainable future.
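
As a worked illustration of the longitudinal finding summarized in this chapter (a 6.31% increase in nitrate concentration and a 3.52% increase in orthophosphate concentration per added 1 percent of urban area), the following minimal Python sketch compounds these effects over several percentage points. It assumes that the reported effect is a constant multiplicative change per percentage point, as it would be in a log-linear specification; that assumption and the function name `compounded_change` are illustrative only and are not taken from the model specification used in this dissertation.

```python
def compounded_change(pct_per_point: float, points: float) -> float:
    """Percent change in concentration after `points` percentage points of
    added urban area, ASSUMING a constant multiplicative effect per point
    (a log-linear reading of the reported coefficients; hypothetical)."""
    return ((1.0 + pct_per_point / 100.0) ** points - 1.0) * 100.0


# Per-point effects reported in the longitudinal study (percent per 1 percent
# of added urban area).
NO3_EFFECT = 6.31  # nitrate
PO4_EFFECT = 3.52  # orthophosphate

if __name__ == "__main__":
    for points in (1, 5, 10):
        no3 = compounded_change(NO3_EFFECT, points)
        po4 = compounded_change(PO4_EFFECT, points)
        print(f"+{points} points of urban area: "
              f"NO3-N {no3:.2f}%, PO4-P {po4:.2f}%")
```

Under this assumption the effect of several percentage points is larger than the simple sum of the per-point effects, so the estimates are best read as local effects around the observed land cover shares rather than extrapolated far beyond them.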