Modeling Stream Temperatures
A Challenge for Modeling: Selection of Time Period and Data Granularity
Purpose of the Study
Study Site
Data Collection
Revising the Data, Applying Data Granularity, and Testing Linearity
Comparisons of Goodness of Fit for Each Data Granularity Scenario
Multicollinearity Diagnosis and Response of Parameter Estimates to Data Granularity
Evaluating Model Performances by Using July-Restricted and June-October Data
Data Granularity Influenced Model Predictive Power and Model Weight
Data Granularity Leads to Instability of Parameter Estimates in Best Fitting Model
Data Granularity Increased Multicollinearity in Raw Data
Using July-Restricted Data Did Not Improve Model Prediction Power
How Does Model Performance and Choice Vary with Data Granularity?
What are the Possible Reasons for Model Performance and Selection Changes with Data Granularity?
How Do Models Perform with July-Restricted Data?
CONCLUSIONS AND IMPLICATIONS
APPENDICES
APPENDIX A: Tables
APPENDIX B: Figures
APPENDIX C: Model Parameter Calculation
APPENDIX D: RStudio Codes
BIBLIOGRAPHY
CHAPTER 2: THE EFFECT OF STREAM THERMAL CLASSIFICATION AND DATA POOLING ON TEMPERATURE GRADIENT MODELING
A Challenge in Stream Management: Limited Data
Recent History of Stream Classification
Data Pooling, Model Generalization and Stream Management Practices
METHODS
Study Site and Data Collection
Stream Classification and Model Performance
Obtaining and Evaluating Models
RESULTS
Pooling Data Changed Model Dynamics and Model Outcomes
Classifying Streams Reduced Overall Model Performance with July-Restricted Data
DISCUSSION
What is the Effect of Data Pooling on Model Dynamics?
Does Stream Classification Improve Model Performance?
Do Models Work Better for Warm or Cold Streams?
Does Using July-Restricted Data Change Model Performance?
CONCLUSIONS AND IMPLICATIONS
APPENDIX
BIBLIOGRAPHY

Mean adjusted correlation (R²) values of each model by data granularity across all streams with June-October data
Table 1.4. Stream temperature models (Magnusson et al. 2012; Andrews 2019)
Table 1.5. Intercept and parameter estimate values of Model 10 across different data granularity. Tobacco River June-October 2016 data was
Table 1.6. Streams and rivers with their regions (SLP: Southern Lower Peninsula; NLP: Northern Lower Peninsula; UP: Upper Peninsula), thermal classes, upstream latitudes, upstream longitude, downstream latitude, and downstream longitude (Zorn et al. 2008; Andrews 2019)
Table 1.7. Starting and ending day of year for sampling in each stream for 2016
Table 1.8. Correlation matrix of values with hourly data. Model 10 with seasonal data was used to obtain values. Correlation between variables were obtained across streams
Table 1.9. Correlation matrix of values with 2-hour data granularity. Model 10 with seasonal data was used to obtain values. Correlation between variables were obtained across streams
Table 1.10. Correlation matrix of values with 6-hour data granularity. Model 10 with seasonal data was used to obtain values. Correlation between variables were obtained across streams
Table 1.11. Correlation matrix of values with 12-hour data granularity. Model 10 with seasonal data was used to obtain values. Correlation between variables were obtained across streams
Table 1.12. Correlation matrix of values with daily data granularity. Model 10 with seasonal data was used to obtain values. Correlation between variables were obtained across streams
Table 1.13. Correlation matrix of values with weekly data granularity. Model 10 with seasonal data was used to obtain values. Correlation between variables were obtained across streams
Table 1.14. Mean adjusted correlation (R²) values from July-restricted and June-October across models. Student t-test was used to find p-values
Table 2.1. Intercepts and parameter estimates from Stream-Specific models (SSMs) applied to each stream for June-October hydrological data
Table 2.2. Parameter estimates of Class-Based and Global Based models. June-October 2016 data were used
r) values of SSM, CBM and GBM across
Table 2.6. Mean observed and predicted temperature gradient values, absolute bias values of Stream-Specific models (SSM), Class-Based models (CBM) and Global-Based model (GBM) predictions. June-October 2016 data were used
Table 2.7. Mean downstream temperatures of streams with June-October and July-restricted data for year 2016. The stream classes are based on Zorn et al. (2008), cold (C): July Mean - - transitional: based on their mean JMT values from 30-years of data (Zorn et al. 2008)

Figure 1.1. Components of heat energy budget in streams
Figure 1.2. Components of water budget in streams. Heat energy budget forms the downstream discharge
Figure 1.3. The locations of 16 streams that were selected for this study
Figure 1.4. Mean adjusted correlation (R²) values of all streams (June-October 2016) based on different data granularity scenarios and different models
Figure 1.5. The percentage of the models having the highest model weight in at least one stream for each data granularity with June-October 2016 data
Figure 1.6. Response of parameter estimates to data granularity. Model 10 was used with June-October 2016 data. Values of all streams were averaged. Q_Up: upstream discharge; Q_Down - Q_Up: difference between downstream and upstream discharge; T_Air - T_Up: difference between air temperature and upstream temperature
Figure 1.7. Response of parameter estimates to data granularity. Model 10 was used with June-October 2016 data. Values of all streams were averaged. S: day length; altitude angle; upstream heat flow (Up); baseflow heat flow (Base); overflow heat flow (Over)
Figure 1.8. Correlation (r) between values across all streams with hourly data granularity. Model 10 was used with June-October 2016 data. The color and the size of circles indicate the sign and the numerical value of correlation
Figure 1.9. Correlation (r) between values across all streams with 2-hour data granularity. Model 10 was used with June-October 2016 data
Figure 1.10. Correlation (r) between values across all streams with 6-hour data granularity. Model 10 was used with June-October 2016 data
Figure 1.11. Correlation (r) between values across all streams with 12-hour data granularity. Model 10 was used with June-October 2016 data
Figure 1.12. Correlation (r) between values across all streams with daily data granularity. Model 10 was used with June-October 2016 data
Figure 1.13. Correlation (r) between values across all streams with weekly data granularity. Model 10 was used with June-October 2016 data
Figure 1.14. The amount of correlation (r) between environmental variables in Tobacco River with hourly (a) and 2-hour (b) June-October 2016 data shown in correlogram
Figure 1.15. The amount of correlation (r) between environmental variables in Tobacco River with 6-hour (a) and 12-hour (b) data granularity. Model 10 with June-October 2016 data was used
Figure 1.16. The amount of correlation (r) between environmental variables in Tobacco River with daily (a) and weekly (b) data granularity. Model 10 with June-October 2016 data was used
Figure 1.17. Mean adjusted correlation (R²) values of each data granularity scenario based on all regression models. Whiskers represent standard errors of sample
Figure 1.18. Mean adjusted correlation (R²) values of models based on averaging all data granularity scenarios. Lines represent mean adjusted correlation values obtained by using July-restricted data (blue) and June-October data (orange)
Figure 1.19. Mean adjusted correlation (R²) values of models across all streams with July 2016 data across data granularity scenarios
Figure 1.20. Air temperature - downstream temperature (T_a - T_w) across observed temperature gradient of Tobacco River with hourly (a), 2-hour (b), 6-hour (c), 12-hour (d), daily (e), and weekly (f) data granularity between June-October 2016
Figure 1.21. Upstream discharge (Q_Up) (cubic meters per second, CMS) across observed temperature gradient of Tobacco River with hourly (a), 2-hour (b), 6-hour (c), 12-hour (d), daily (e), and weekly (f) data granularity between June-October 2016
Figure 1.22. Upstream discharge - downstream discharge (Q_Up - Q_Down) (cubic meters per second, CMS) across observed temperature gradient of Tobacco River with hourly (a), 2-hour (b), 6-hour (c), 12-hour (d), daily (e), and weekly (f) data granularity between June-October 2016
Figure 1.23. Day length (S) across observed temperature gradient of Tobacco River with hourly (a), 2-hour (b), 6-hour (c), 12-hour (d), daily (e), and weekly (f) data granularity between June-October 2016
Figure 1.24. Altitude angle across observed temperature gradient of Tobacco River with hourly (a), 2-hour (b), 6-hour (c), 12-hour (d), daily (e), and weekly (f) data granularity between June-October 2016
Figure 1.25. Upstream heat flow (Up) across observed temperature gradient of Tobacco River with hourly (a), 2-hour (b), 6-hour (c), 12-hour (d), daily (e), and weekly (f) data granularity between June-October 2016
Figure 1.26. Baseflow heat flow (Base) across observed temperature gradient of Tobacco River with hourly (a), 2-hour (b), 6-hour (c), 12-hour (d), daily (e), and weekly (f) data granularity between June-October 2016
Figure 1.27. Overflow heat flow (Over) across observed temperature gradient of Tobacco River with hourly (a), 2-hour (b), 6-hour (c), 12-hour (d), daily (e), and weekly (f) data granularity between June-October 2016
Figure 1.28. Observed and predicted temperature gradient (°C) of Tobacco River with hourly (a), 2-hour (b), 6-hour (c), 12-hour (d), daily (e), and weekly (f) data granularity between June-October 2016. Predictions were obtained from Model 10
Figure 1.29. Mean adjusted correlation (R²) values of models based on averaging all data granularity scenarios with June-
Figure 1.30. Mean adjusted correlation (R²) values of each data granularity scenario based on all regression models with June-
Figure 1.31. Parameter estimate values of some predictor variables across streams with hourly June-October 2016 data. Model 10 was used to obtain
Figure 1.32. Parameter estimate values of some predictor variables across streams with weekly June-October 2016 data. Model 10 was used to obtain
Figure 1.33. Air temperature across time with hourly (a), daily (b) and weekly (c) data granularity. Tobacco River July 2016 (July 1 - July 31) data were
Figure 1.34. Adjusted correlation (R²) values across data granularity. Model 10 was used with June-
Figure 2.1. Classification of streams and rivers at a national scale based on annual stream regimes (from Maheu et al. 2016)
Figure 2.2. The absolute value of biases averaged for each stream class. The higher the mean absolute bias, the higher the overall mean temperature gradient prediction deviates from the overall mean observed temperature gradient
Figure 2.3. Pearson correlation coefficient (r) values of SSM, CBM, GBM across mean downstream temperatures from June-October 2016
Figure 2.4. Bias (B) versus mean downstream temperature. June-October 2016 data were used
Figure 2.5. Pearson correlation coefficient (r) values of SSM, CBM, GBM across mean observed temperature gradient from June-October 2016
Figure 2.6. Bias (B) versus mean observed temperature gradient. June-October 2016 data were used
Figure 2.7. (r) of Class-Based Models with June-October 2016 data and July 2016 data
Figure 2.8. Pearson correlation coefficient (r) values of SSM, CBM, GBM across mean downstream temperatures from July 2016
Figure 2.9. Pearson correlation coefficient (r) values of SSM, CBM, GBM across mean observed temperature gradient from July 2016
Figure 2.10. Observed and predicted temperature gradient (°C) from Stream-Specific, Class-Based, and Global-Based models. Cedar Creek (cold) June-October 2016 data were used
Figure 2.11.
Observed and predicted temperature gradient (°C) from Stream-Specific, Class-Based, and Global-Based models. Tobacco River (cold-transitional) June-October 2016 data were used
Figure 2.12. Observed and predicted temperature gradient (°C) from Stream-Specific, Class-Based, and Global-Based models. Escanaba River (warm-transitional) June-October 2016 data were used
Figure 2.13. Observed and predicted temperature gradient (°C) from Stream-Specific, Class-Based, and Global-Based models. Prairie River (warm) June-October 2016 data were used
Figure 2.14. Average Pearson correlation coefficient (r) based on stream classes. June-October data were used
Figure 2.15. Mean Pearson correlation coefficient (r) values were averaged based on stream classes. July 2016 data were used

CHAPTER 1: THE IMPACT OF DATA GRANULARITY ON TEMPERATURE GRADIENT MODELING IN MICHIGAN'S STREAMS

INTRODUCTION

Freshwater ecosystems are a priority of conservation efforts since they are more prone to losing their biodiversity than terrestrial ecosystems (Sala et al. 2000). In addition to their ecological importance, freshwater resources are very important for humans as they constitute only 0.01% of the total water budget in the world (Dudgeon et al. 2006). It is known that these critical water systems and their biodiversity show regional differences in their reactions to environmental changes based on their unique environmental conditions. For example, Carpenter et al. (1992) predicted that the biodiversity in high altitude and latitude streams is more susceptible to decline when compared to biodiversity in tropical and temperate streams due to alterations in stream temperature patterns, mostly based on climate change and changes in land cover (e.g., Woltemade and Hawkins 2016).
In addition to climate change and land cover changes, an important driver of stream temperature is the amount of groundwater input (e.g., Woltemade and Hawkins 2016), a factor that is vulnerable to human alteration by groundwater withdrawal. Climate change, land cover change, and groundwater withdrawal occur across the globe, but manifest themselves in changes to water temperature at a local scale. This is Raymond Nace from U.S. Geological Survey (Nace 1967).

Stream temperature is one of the most important aspects of riverine systems as all freshwater organisms and their life cycles are affected by it. Therefore, the effect of water temperature has been well studied with a long history of investigation. For example, the effect of stream temperature on aquatic plants and their photosynthesis rates is well explained by Iversen (1971) and Sand-Jensen (1989). They showed that while light availability is the main driving factor of photosynthesis, stream temperature can change the structure of the primary producer community, especially in pools and slow-flowing streams in addition to littoral zones, because of the lack of vertical mixing. In addition to the direct effect on the growth rate of primary producers by changing the rate of photosynthesis, water temperature can also change the chemistry of water by changing the solubility of water chemicals (Wetzel 1960).

In addition to primary producers, there have been numerous studies on aquatic invertebrates, with documented changes to drift behavior (e.g., Wojtalik and Waters 1970; Jackson et al. 2007) and production (e.g., Galbraith and Vaughn 2009). Patrick et al. (2019), for example, revealed the relationship between stream invertebrate production and hydrological characteristics of streams at a global scale. They used estimates of secondary production of stream invertebrates from 164 sites distributed globally.
Secondary production is particularly important because it is considered a main determinant of dynamics in higher trophic levels. By using their metamodel, they concluded that stream temperature had the highest overall effect on annual community secondary production among other environmental covariates (e.g., latitude, elevation, forest cover, monthly discharge). Although the streams may have unique hydrological characteristics and biota, this study posed an overall picture of how stream temperature affects invertebrate biomass in streams from a global perspective.

Fish have also been a focus of many studies, and the effect of water temperature on fish distribution, productivity and survival is well understood. For example, the effect of water temperature on fish physiology is well explained by Ficke et al. (2007), who described the relationship between fish metabolic rate and water temperature. They also emphasized that the effect of water temperature occurs even at the cellular level as the stability of proteins varies with temperature. Since fish physiology responds strongly to water temperature, it can be concluded that water temperature directly affects fish reproduction and survival. In addition, fish community structure can also change with water temperature. In a recent study, Morales-Marin et al. (2019) modelled the distribution of Athabasca Rainbow Trout, Oncorhynchus mykiss, which is considered a species at risk, by using predicted future stream temperatures in the Athabasca River basin, AB, Canada. Using the rainbow trout water temperature tolerance ranges and the predicted distribution of water temperature in the basin, they concluded that the changing temperatures would constrain the Rainbow Trout to the northern parts of the basin and that this could potentially change the fish community structure by opening new niche areas for other fish species.
The effect of stream temperature and water withdrawal on fish distribution and growth in Michigan has been observed in several recent studies (Zorn et al. 2004; Wehrly et al. 2007; Nuhfer et al. 2017). For example, Nuhfer et al. (2017) observed that reductions in discharge did not have a significant effect on brook trout density, but spring-to-fall growth of fish declined significantly under discharge reductions of 75% or more. They also observed that warming rates increased with increased water withdrawal, but the change in temperature was relatively small because the reach was quite short (602 m). However, they predicted that the increase in water temperature that would be caused by a 90% flow reduction would have eliminated over 80% of habitable areas for brook trout in the whole river system.

As stream temperature is critical for riverine systems, it is important to understand the physical processes that drive and affect stream temperatures. Therefore, the following section is devoted to describing those processes and environmental variables.

Physics Behind Temperature Gradient in Streams

The change in water temperature between two points in a stream (which I will refer to hereafter as temperature gradient) is determined by several environmental factors or processes. Four of the main processes influencing temperature gradient are radiative energy exchange, conduction, evaporation, and direct changes due to input or loss of water to the stream (Figure 1.1). Radiative energy exchange occurs via incoming solar radiation (i.e., shortwave radiation), longwave radiation that is mainly emitted by the water body, and back radiation that includes reflected solar radiation by the water body (Cheng and Wiley 2016). Heat transfer via conduction occurs between the river base and the water body and between the water body and the atmosphere.
Evaporative heat loss can occur in streams but is generally thought to be a minor component in the overall heat budget (Cheng and Wiley 2016). Finally, the heat energy contained in incoming surface water and groundwater contributes to temperature gradient by directly adding water with a potentially different temperature signature than the stream itself.

Figure 1.1. Components of heat energy budget in streams.

As the thermal signature of runoff and groundwater contributions influences temperature gradient, it is important to consider the water budget within a stream. The discharge at a point in a river is based on upstream discharge and the net effects of evaporation, transpiration, incoming-outgoing surface water runoff (mostly determined by amount of precipitation) and incoming-outgoing groundwater (Figure 1.2).

Figure 1.2. Components of water budget in streams. Heat energy budget forms the downstream discharge.

A simple equation for downstream discharge can be written as follows:

$$Q_{down} = Q_{up} + (R_{in} - R_{out}) + (G_{in} - G_{out})$$

where $Q_{down}$ stands for the downstream discharge, $Q_{up}$ stands for upstream discharge, $R_{in}$ stands for incoming runoff, $R_{out}$ stands for outgoing runoff, and $G_{in}$ and $G_{out}$ stand for input and outflow of groundwater, respectively. Groundwater inputs occur as water moves from the water table through the hyporheic zone into a stream (Vogt et al. 2010) and they are vulnerable to groundwater withdrawal. If the water table is equal to or higher than the surface water level, groundwater input occurs (i.e., gaining reach) (Storey et al. 2003). However, if the water table is lower than the surface water level, the stream loses water to the aquifer, which can be viewed as reducing in-stream discharge (Ruehl et al. 2006). Precipitation is included in incoming runoff because the majority of precipitation joins the stream from the landscape instead of directly falling on the stream.
Although outgoing runoff is conceptually possible, it does not have a substantial influence on the downstream discharge. In this equation, evaporation and transpiration are not represented as these are typically minor quantities in streams (Cheng and Wiley 2016). As indicated in the above equation, the amount of groundwater contribution is especially important in smaller streams where groundwater flow plays a large role in the water budget, and consequently in the amount of temperature gradient along a river.

Modeling Stream Temperatures

There are many models for representing stream temperature dynamics. Stream temperature models can be divided into two main groups: deterministic and statistical/stochastic models. Both have different features, strengths, and weaknesses under different circumstances. Therefore, selection of the model type is important to make reliable representations of stream temperatures.

Deterministic models use mathematical expressions and equations based on physical laws (such as laws of thermodynamics, fluid mechanics, etc.) that govern the interactions between the stream and its surroundings (Benyahya et al. 2007). Since they use an energy budget approach, they generally require large amounts of detailed data for driving variables such as air temperature, solar radiation, wind, humidity, depth of water, velocity and so on (Morin and Couillard 1990; Sinokrot and Stefan 1993; St-Hilaire et al. 2000; Benyahya et al. 2007; Cheng and Wiley 2016). Deterministic models have been successfully used in a variety of situations and can be effective and appropriate to use because the heat budget equations can be modified based on different purposes such as analyzing and comparing the impacts of environmental changes (St-Hilaire et al. 2000; Benyahya et al. 2007).
Because they are typically complicated and costly to implement due to intensive data requirements, practitioners have sought to simplify deterministic models without losing their robustness. Cheng and Wiley (2016), for example, addressed some challenges of building and using physically based heat balance models, such as scarcity and unreliability of data for parameter values especially for large watersheds (Edinger et al. 1974; Crittenden 1978), using a steady-state solution that assumes that the parameters do not change temporally or spatially (Bartholow 2000a; Borman and Larson 2003; Bartholow et al. 2004), and region-specific relationships between stream temperatures and stream flows.

Statistical models are alternatives to deterministic models. One of the main differences between deterministic and statistical models is that the latter tend to be simpler and require less data, which can be advantageous in cases where data collection is costly in workforce, time, and money (Benyahya et al. 2007). Benyahya et al. (2007) classified statistical models into two groups: parametric and non-parametric models. The structure of non-parametric models depends on the data and does not use conventional mathematical functions; instead, they adopt a set of relations between parameters and the output variable (e.g., Artificial Neural Networks; Benyahya et al. 2007). Parametric models, on the other hand, adopt mathematical functions and they are very useful for explaining the variation in some environmental variables (e.g., water temperature) by using the variation in other variables (e.g., air temperature; Benyahya et al. 2007). Benyahya et al. (2007) classified linear regression models, which are the focus of my study, as parametric models.
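As a concrete illustration of a parametric model, the following minimal sketch fits a simple linear regression of water temperature on air temperature by ordinary least squares. The function name `fit_simple_regression` and the temperature values are hypothetical; they are not taken from the study data or from Andrews (2019), and the sketch is not the code used in this thesis.

```python
from statistics import mean


def fit_simple_regression(t_air, t_water):
    """Ordinary least squares fit of T_w = a0 + a1 * T_a.

    Returns the intercept a0 and the slope a1.
    """
    if len(t_air) != len(t_water) or len(t_air) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_air = mean(t_air)
    mean_water = mean(t_water)
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((x - mean_air) * (y - mean_water) for x, y in zip(t_air, t_water))
    sxx = sum((x - mean_air) ** 2 for x in t_air)
    if sxx == 0:
        raise ValueError("air temperature has no variation")
    a1 = sxy / sxx
    a0 = mean_water - a1 * mean_air
    return a0, a1


# Hypothetical, made-up temperatures (degrees C) used only for illustration.
air = [12.0, 15.0, 18.0, 21.0, 24.0]
water = [11.0, 12.5, 14.0, 15.5, 17.0]

a0, a1 = fit_simple_regression(air, water)
# Residuals play the role of the error term in the regression equation.
residuals = [y - (a0 + a1 * x) for x, y in zip(air, water)]
print(a0, a1, residuals)
```

A multiple regression adds further predictors (for example, discharge) to the right-hand side; in practice that is usually fit with a statistics package rather than by hand.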
Linear regression models have been used to simulate stream temperatures as a function of one (e.g., air temperature) or more independent variables (e.g., air temperature, vegetation cover, groundwater recharge; Benyahya et al. 2007). Simple regression models use the structure:

$$T_w(t) = a_0 + a_1 T_a(t) + \varepsilon(t)$$

where $T_w(t)$ is modelled water temperature for a given time period, $T_a(t)$ is air temperature for the same time period, $a_0$ and $a_1$ are regression coefficients, and $\varepsilon(t)$ is the error term for the given time. The model can be modified to a multiple regression equation by adding other independent variables such as amount of flow (Webb et al. 2003; Benyahya et al. 2007; Andrews 2019).

Andrews (2019) developed a suite of regression models to simulate the temperature gradient in 21 streams in Michigan. He collected hydrological and meteorological data from 15 streams in 2015 (July to early November) and 21 streams in 2016 (May through October) at 15-minute intervals. He built 11 regression models (Table 1.4) that included different independent variables, and one model that was a deterministic model based on a previous study (Magnusson et al. 2012). He compared those models based on their fit, the Akaike Information Criterion (AIC) (Akaike 1973), and root mean square errors (RMSE) (Janssen and Heuberger 1995) between observed and predicted values. In addition to model accuracy and correlation with observed data, he used partial regression analysis to determine the strength of the impacts of different parameters in the best model that was selected by AIC. Finally, he evaluated the implications of baseflow reductions by using the most highly selected model. One of the findings from his analysis was that two models received the highest weight of Avg. ) of 0.74 for the highest ranked model, Model 10 (Eqn. 1).
This model also had the highest correlation with observed data in 76% of the 21 streams, with an average correlation (r) for one-year and two-year data sets of 0.66 and 0.58, respectively.

Eqn. 1.

where T_a is air temperature (°C), T_W is water temperature, Q_up is the upstream flow (m³/sec), Q_Down is the downstream flow (m³/sec), S is the day length (hours), is the altitude angle, up is the upstream heat gradient (°C), base is the baseflow heat gradient (°C), and over is the overland flow heat gradient (°C) (Andrews 2019).

Although Andrews (2019) successfully applied these regression models, which provided a number of insights into drivers of stream temperature gradient, several questions remain considering the possible challenges that might be encountered in other hydrological modeling studies. I will address these potential challenges in the following section.

A Challenge for Modeling: Selection of Time Period and Data Granularity

Data collection and modeling serve a variety of purposes for ecological and stream conservation. Because of the variety of uses, the time period across which data are collected and the level of data aggregation in time vary widely. For example, if the long-term effects of some environmental parameter change are the main focus, researchers tend to use yearly periods or all seasons when predicting the response variable. Studies that focus on the effects of global climate change are good examples of the selection of annual periods (Sinokrot et al. 1995; Isaak et al. 2012; Anderson and Konrad 2019). On the other hand, a narrower time period is often used to predict the effects of environmental parameters that can change seasonally such as vegetative cover, soil temperature, and concentration of nitrates and phosphates (St-Hilaire et al. 2000; Álvarez-Cabria et al. 2016). Shorter time periods (e.g., monthly) may be used when the focus is on periods of ecological stress; for example, Zorn et al.
(2004) modelled the distribution of fish populations based on predicted July mean temperature under different baseflow reduction scenarios. Although the time period for data collection is generally selected based on the purpose of study, and not based on model success, the reliability of model outputs is still important for explaining the variation in response variables with predictor variables. Therefore, it is crucial to understand and interpret the response of model success to the use of different time periods (e.g., seasonal and monthly). Moreover, understanding the response of model reliability to different time periods can give researchers a clue as to how model reliability varies as the ecological relevance of the time period selection varies.

Selecting the level of time aggregation, which I will refer to as data granularity in this study, is an important decision-making step in modeling. The term data granularity has been used in the field of business (e.g., Kim et al. 2019) and energy production and distribution (e.g., Kools and Phillipson 2016), but to my knowledge it has not been used in the hydrological literature. With current technology and data collection tools, researchers can collect environmental data at very fine time intervals, such as every minute or 15 minutes, and use the data with various data granularity levels by taking averages at broader time intervals (e.g., hourly time interval). In the literature, different studies have used a variety of data granularity ranging from hourly (Caissie et al. 2001) and daily (Cheng and Wiley 2016), to weekly (Stefan and purpose of the study was shaped by the ecological relevance of the selected data granularity. For example, Zorn et al. (2004) used July averages to model fish distribution based on the close relationship observed between July mean temperatures and cold water fish populations.
However, selecting the level of data granularity may not be entirely dependent on the purpose of study or ecological relevance of data granularity. Data granularity may be selected for a variety of reasons such as the features of data collection tools (e.g., data collection devices may have a variety of sampling intervals) (Johnson et al. 2005). In the modeling literature, the reason for selecting a level of data granularity is not stated often or explained in detail in the majority of studies. This implies that data granularity may be selected arbitrarily in most cases. However, arbitrary selection of data granularity may cause biases in model evaluation and selection processes (Kirchner 2006). This may eventually affect the decision-making processes and evaluation of hydrological and ecological implications. Therefore, selection of data granularity poses a considerable challenge for researchers and managers as the conclusions may depend on arbitrary choices.

Some studies in the past examining the consequences of using different data granularity on model success have already that water and air temperatures were more correlated, and their relationship was less scattered, as the time averaging of data increased from two hours to weekly averages. Pilgrim et al. (1998) also found that the slope of the regression line increased with increasing data granularity (daily, weekly, and monthly). Webb et al. (2003) obtained similar results when they used hourly, daily, and weekly temperature mean values of different streams in Devon River System; that is, the correlation coefficient (r²) between air temperature and stream temperature increased from hourly mean temperature values to weekly mean values in all streams.

Considering the magnitude of the problem, the number of studies in the literature is still limited and the issue needs to be addressed for recent hydrological studies.
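The averaging step behind different data granularity levels can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code of this study: the helper names `aggregate` and `pearson_r` are hypothetical, and the two series are synthetic (a slow trend shared by air and water, plus a daily cycle that is weaker and lagged in the water). The synthetic data are built so that averaging removes the daily cycle, so the correlation rises here by construction.

```python
import math
from statistics import mean


def aggregate(values, window):
    """Average consecutive, non-overlapping blocks of `window` samples.

    An incomplete final block is dropped.
    """
    n_blocks = len(values) // window
    return [mean(values[i * window:(i + 1) * window]) for i in range(n_blocks)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


HOURS = 24 * 28  # four weeks of synthetic hourly values


def trend(h):
    """Slow synthetic temperature trend shared by air and water."""
    return 15.0 + 5.0 * math.sin(2 * math.pi * h / HOURS)


# Synthetic series: the water follows the trend, with a damped daily cycle
# that lags the air by 3 hours. All numbers are made up for illustration.
air = [trend(h) + 6.0 * math.sin(2 * math.pi * h / 24) for h in range(HOURS)]
water = [0.8 * trend(h) + 1.5 * math.sin(2 * math.pi * (h - 3) / 24)
         for h in range(HOURS)]

windows = {"hourly": 1, "daily": 24, "weekly": 168}
r_by_granularity = {
    name: pearson_r(aggregate(air, w), aggregate(water, w))
    for name, w in windows.items()
}
for name, r in r_by_granularity.items():
    print(name, round(r, 3))
```

With real 15-minute logger data the same two helpers would be applied with windows of 4, 8, 24, 48, 96 and 672 samples for hourly through weekly means; whether the correlation rises in the same way is an empirical question.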
For example, although his models were useful in representing the dynamics of temperature gradient in Michigan streams, Andrews (2019) used only a single data granularity (i.e., hourly). Therefore, evaluating the response of his models to different data granularity levels would lead to a better understanding of these models. Although I address the effect of data granularity on model success in this study, my focus is not to define the most appropriate or relevant time period or data granularity for a particular problem, but rather to determine the modeling implications or consequences of changing either of these factors.

Using different data granularity can alter the model dynamics (i.e., the influence of predictor variables) and affect the results of model evaluation methods. Changes in the parameter estimates of models with different data granularity can be responsible for differences in perceived system dynamics and model predictive power. For example, the best-fitting model with hourly data (Andrews 2019) may have different parameter estimates with different data granularity, and this may change conclusions based on the predictive power of models. In addition to effects on model predictive power, using different data granularity may also [...] 10 (2019) had the best balance between model fit and complexity (i.e., model weight); however, it is unknown whether using coarser data granularity (e.g., daily) would still lead to Model 10 having the highest model weight and being the best option for temperature gradient prediction. If not, which regression model would give the best balance between model fit and complexity with daily data? Although my questions relate to specific cases from Andrews (2019), they are relevant to many other ecological and hydrological modeling studies.
Therefore, finding answers is important for future studies and for the environmental implications of regression models, because it would reveal which environmental factors are specifically important at different levels of data granularity. For these reasons, it was necessary to provide more information and a better perspective on the relationship between data granularity and model reliability.

Considering the potential effects of data granularity on model reliability, preliminary findings suggested that parameter estimates were not stable across different levels of data granularity (Table 1.5). Although there are many potential causes of parameter instability in regression models, a common source is multicollinearity in the independent variables. Multicollinearity is defined as the dependency of two or more predictor variables in a regression model. Its primary effect is an increase in the standard errors of parameter estimates. The inflated standard errors affect the significance of parameter estimates, potentially biasing model selection, which can make selecting an appropriate model hard for decision makers and may cause failures in ecological and environmental implementations (Daoud 2018). The problems related to multicollinearity in regression models have been addressed in various studies (Farrar and Glauber 1967; Haitovsky 1969; Daoud 2018). Multicollinearity is pervasive in hydrological modeling because many environmental variables in topography, geology, geomorphology, and meteorology are naturally correlated (Kroll and Song 2013). Moreover, Kroll and Song (2013) concluded that the sample size (e.g., number of sampled streams) might also affect the amount of correlation between variables. Similarly, Mason and Perreault (1991) found that smaller sample sizes exaggerated the effect of multicollinearity on model success.
This is particularly important for my research because higher data granularity naturally leads to smaller sample sizes. Therefore, there was a need to address multicollinearity in the regression models that I used in my research. This would give researchers a better perspective on the influence of data granularity on model success and selection.

Purpose of the Study

In my research, I address the consequences of using different data granularity and time [...] these models, I believe that my findings will be a guide for many other modeling approaches, since modelers face common challenges. In response to these challenges, the main objectives of my study are:

1) To compare the performance of regression models across different levels of data granularity by evaluating their goodness of fit and model weights,
2) To observe the effect of data granularity on parameter estimates and to seek possible explanations for the changing model dynamics with changing data granularity,
3) To analyze multicollinearity of independent variables at different levels of data granularity to gain better insight into parameter estimate instability with changing data granularity,
4) To determine the relative performance of models developed for a broad time frame (June-October) compared to models developed for a narrow time frame (July) that represents a critical ecological period for cold water fishes, in order to observe whether model performance (i.e., model prediction reliability) varies with a data window chosen for its ecological relevance.

METHODS

Study Site

The choice of study streams was based on sites modelled in Zorn et al. (2008) and [...] to groundwater extraction points in different regions of Michigan based on the different thermal classifications that are explained in Zorn et al. (2008). I chose 16 of the 24 streams Andrews sampled based on data requirements that are explained in the following sections (Table 1.6; Figure 1.3).

Figure 1.3.
The locations of the 16 streams that were selected for this study.

Data Collection

Andrews (2019) collected hydrological and meteorological data from 15 streams in 2015, over time periods that differed among streams but generally ranged from July to early November, and from 21 streams in 2016, generally ranging from May to October. He placed stream gauges using PVC pipes that were stabilized by attaching them to a fence post fixed in the streambed. To obtain water stage data, he attached staff rulers to the gauges. To obtain water temperature data, he attached HOBO® U20 Water Level Loggers to the gauges and recorded temperature every 15 minutes, after calibrating the loggers by placing them in an ice bath (0 °C) and then letting them slowly reach room temperature. The temperatures obtained from all loggers were consistent but were adjusted to the same temperature. Air temperature data were collected using Monarch® Track-It data loggers at 15-minute intervals, and all water and air temperature data were averaged into hourly temperatures. To obtain stream discharge levels, he used both staff rulers and a SonTek Flowtracker®. He collected barometric pressure readings from the SonTek Flowtracker® to subtract them from total pressure and find water pressure.

The equation that Andrews (2019) used for the discharge calculation (Eqn. 2) was:

Eqn. 2.

where Q stands for the stream discharge (m³/sec), G stands for the reading on the gauge (inches), and a and b are parameter estimates that were obtained by using a power function while building the stage-discharge curve. He derived other constants (or parameters), c, e, f, h, i, and j, from the power function to calculate other hydrological variables (Eqn. 3, Eqn. 4, Eqn. 5):

Eqn. 3.
Eqn. 4.
Eqn. 5.

where w stands for the width of the stream (m), d stands for the depth of the stream (m), and V stands for the water velocity (m/sec).
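The stage-discharge step described above can be illustrated with a short sketch. It assumes that the power function has the form Q = a * G^b and that a and b are estimated by ordinary least squares on log-transformed values, which is a common way to build a rating curve but is not confirmed as the procedure Andrews (2019) followed. The function name `fit_power_law` and all gauge and discharge numbers are hypothetical.

```python
import math

def fit_power_law(x, y):
    """Fit y = a * x**b by least squares on log-log axes; return (a, b).

    Assumption: a power-law rating curve, fitted in log space.
    All inputs must be strictly positive.
    """
    log_x = [math.log(v) for v in x]
    log_y = [math.log(v) for v in y]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx                      # slope in log space = exponent
    a = math.exp(mean_y - b * mean_x)  # intercept in log space = log(a)
    return a, b

# Hypothetical gauge readings (inches) and discharges (m^3/sec).
# The discharges are synthetic and generated with a = 0.5, b = 1.5.
gauge = [4.0, 6.0, 9.0, 12.0, 15.0]
discharge = [0.5 * g ** 1.5 for g in gauge]

a, b = fit_power_law(gauge, discharge)
print(f"Q = {a:.3f} * G^{b:.3f}")
```

Because the synthetic discharges follow the assumed form exactly, the fit recovers the generating constants; with field measurements the residual scatter around the curve would need to be inspected before the curve is used to convert stage to discharge.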
Revising the Data, Applying Data Granularity, and Testing Linearity

Although data were available from 2015 and 2016, I chose to use only the streams and rivers that were sampled in 2016, as these had data that covered the longest and most consistent time interval (i.e., June to October; Table 1.7). Data were trimmed so that the record for each stream ran from 1 June 2016 to 31 October 2016. Before modeling, I evaluated residual plots for each stream, removing outliers when necessary and removing some data frames based on unrealistic discharge changes. I also plotted the relationship between the dependent variable and each independent variable, as well as between observed and predicted temperature gradient, to evaluate whether a linear model appeared to be appropriate [...] constant [...]. Results from the Tobacco River, which had the best goodness of fit, for June-October 2016 are presented as an example (Figure 1.20 to Figure 1.28).

I also changed the usage of some parameters, namely upstream heat flow, baseflow heat flow, overflow heat flow, and total heat flow (Table 1.4), to better reflect the dynamics of stream discharge. In his study, Andrews (2019) set all these parameters to zero when the downstream discharge was lower than the upstream discharge, because he suggested that in that case the contribution of upstream flow, baseflow, and overflow to downstream discharge and temperature gradient would be negligible. Another reason was that these parameters tend to have negative values in that case. In contrast, I used the values of these parameters directly even when they were negative, because I reasoned that the discharge loss might be a result of natural processes (i.e., downwelling) or anthropogenic processes (i.e., groundwater or surface water withdrawal), and those parameters might have had an effect on discharge and temperature gradient.
Moreover, temporal changes in those parameters might have had explanatory power for temperature gradient even when they had negative values. After these refinements and revisions, I took hourly, 2-hour, 6-hour, 12-hour, daily (24-hour), and weekly (168-hour) averages of the thermal gradient data and of the environmental data from each stream to create data of increasing granularity.

Comparisons of Goodness of Fit for Each Data Granularity Scenario

To achieve the first goal of my study, I applied 11 regression models based on Andrews (2019) (Table 1.4) to June-October 2016 data under the different data granularity scenarios. I fit each model to each stream and determined the best-fitting models by using two measures of goodness of fit for each data granularity scenario. The first measure was the adjusted correlation coefficient (R² [...] temperature modeling studies (Ahmadi-Nedushan et al. 2007; Mayer 2012; Hill et al. 2013). The adjusted correlation coefficient (R²) was used to explain the variation of one variable (e.g., predicted temperature gradient) across another variable (e.g., observed temperature gradient). Based on the nature of the equation, the value of R² cannot exceed 1 (the adjusted form can fall slightly below 0 for very poor fits), and as the value approaches 1, model predictive power becomes greater. To obtain R², I used Eqn. 6:

Eqn. 6.

where n is the number of observations, p is the number of parameters, SSE is the sum of squared residuals, and SST is the total sum of squares.

[...] (Akaike 1973) based on the principle of parsimony. AIC is defined as Eqn. 7:

Eqn. 7. and ,

where L stands for the likelihood, k stands for the number of unknown parameters, and n stands for the sample size (Seber and Wild 1989). I prioritized the models by determining their weight of evidence using the formula:

Eqn. 8.

where M is the total number of models, m is the model number, and Δ is the difference between the AIC value of that model and the AIC value of the best-fitting model.
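The goodness-of-fit measures described above can be sketched in a few lines. This is a minimal illustration that assumes the standard textbook forms: adjusted R² with p counting every fitted parameter including the intercept, the least-squares form of AIC (defined up to an additive constant), and Akaike weights computed from AIC differences. The exact forms used in this study may differ, and the SSE values, parameter counts, and model names below are hypothetical.

```python
import math

def adjusted_r2(sse, sst, n, p):
    """Adjusted R^2; p counts all fitted parameters incl. the intercept (assumption)."""
    return 1.0 - (sse / (n - p)) / (sst / (n - 1))

def aic_least_squares(sse, n, k):
    """AIC for a least-squares fit with Gaussian errors, up to an additive constant."""
    return n * math.log(sse / n) + 2 * k

def akaike_weights(aic_values):
    """Weight of evidence for each model, from differences to the lowest AIC."""
    best = min(aic_values)
    relative = [math.exp(-0.5 * (value - best)) for value in aic_values]
    total = sum(relative)
    return [value / total for value in relative]

# Hypothetical results for three candidate models fit to the same n = 200 points.
# Each entry is (SSE, number of parameters); the same count is used for p and k here.
n = 200
sst = 120.0
candidates = {"Model A": (80.0, 3), "Model B": (70.0, 5), "Model C": (69.5, 9)}

aic = {name: aic_least_squares(sse, n, k) for name, (sse, k) in candidates.items()}
weights = dict(zip(aic, akaike_weights(list(aic.values()))))

for name, (sse, k) in candidates.items():
    print(f"{name}: adj R2 = {adjusted_r2(sse, sst, n, k):.3f}, "
          f"AIC = {aic[name]:.1f}, weight = {weights[name]:.3f}")
```

In this made-up case the most complex candidate has the smallest SSE but does not receive the largest weight, because its extra parameters are penalized. The penalty term 2k is fixed while the fit term scales with n, so the same models can be ranked differently when aggregation shrinks the number of data points.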
By using model weights, I was able to order the models from the best-fitting to the poorest-fitting model while balancing model complexity (Andrews 2019).

Multicollinearity Diagnosis and Response of Parameter Estimates to Data Granularity

I adopted two approaches to evaluate the implications of multicollinearity among [...] r) (Eqn. 9) between the parameter estimates of predictor variables that were used in Model 10, the best-performing model, across streams to understand how parameter estimates covary. I obtained r values by using:

Eqn. 9.

where x and y are variables, and x̄ and ȳ represent the means of the variables. I obtained and used correlation diagrams and correlograms to visualize the change of correlation (r) between [...] package was used in RStudio Version 0.98.1103 (Appendix D: RStudio Codes). To obtain [...] Appendix D: RStudio Codes). The purpose of this approach was to observe the response of mean parameter estimate values to increasing data granularity, leading to better insight into the best-predicting model.

As multicollinearity in the input data has long been known to influence the stability of parameter estimates, I calculated the degree of multicollinearity between variables in the raw data by using an example stream, to evaluate whether the level of collinearity in the data could be driving the instability of parameter estimates. I used Tobacco River June-October 2016 data as an example, since model predictive power was highest for this stream based on my preliminary results. Correlation matrices and correlograms were obtained for each time aggregation to analyze the effect of data granularity on the level of correlation in the raw data.
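The aggregation and correlation steps described in the Methods can be sketched as follows. This is an illustrative Python version, not the RStudio code in Appendix D: it assumes that each granularity level is formed from non-overlapping block means of the hourly series and that r is the ordinary Pearson coefficient. The function names and the two synthetic series are hypothetical and are not thesis data.

```python
import math

def block_means(values, width):
    """Average consecutive, non-overlapping blocks of `width` samples.

    An incomplete block at the end of the series is dropped.
    """
    n_blocks = len(values) // width
    return [sum(values[i * width:(i + 1) * width]) / width for i in range(n_blocks)]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Synthetic hourly series for 14 days (NOT thesis data): a slow warming trend
# plus a daily cycle, with the "water" cycle lagging the "air" cycle by 6 hours.
hours = range(24 * 14)
air = [15 + 0.01 * h + 5 * math.sin(2 * math.pi * h / 24) for h in hours]
water = [14 + 0.008 * h + 2 * math.sin(2 * math.pi * (h - 6) / 24) for h in hours]

r_by_granularity = {}
for width in (1, 6, 24):  # hourly, 6-hour, and daily averages
    r = pearson_r(block_means(air, width), block_means(water, width))
    r_by_granularity[width] = r
    print(f"{width:>2}-hour means: n = {len(air) // width:3d}, r = {r:.3f}")
```

In this synthetic case the daily means are far more strongly correlated than the hourly values, because averaging removes the out-of-phase daily cycles and leaves only the shared trend. Repeating `pearson_r` over every pair of variables would give the correlation matrix for one granularity level.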
Evaluating Model Performances by Using July-Restricted and June-October Data

As another purpose of my research, I fit the linear regression models to July 2016 restricted datasets for each data granularity scenario and calculated the adjusted correlation coefficient (R²) to evaluate model predictive power across data granularity scenarios. Then, I compared the predictive power of the July-restricted models and the June-October models. The variation between model predictive powers indicated the importance of selecting a seasonal or monthly dataset for the accuracy of the best-fitting models.

RESULTS

Data Granularity Influenced Model Predictive Power and Model Weight

The relationship between data granularity and the predictive power of linear regression models, as measured by the adjusted correlation coefficient (R²), showed three major patterns. First, the overall predictive power of all models increased with data granularity (Table 1.1). Second, Model 10 had the highest mean adjusted correlation at each level of data granularity (Figure 1.4).

Table 1.1. Mean adjusted correlation (R²) values of each model by data granularity across all streams with June-October data.

             Data granularity (hour)
Model        1      2      6      12     24     168    Average
1            0.139  0.142  0.149  0.198  0.315  0.498  0.240
2            0.094  0.098  0.108  0.133  0.202  0.415  0.175
3            0.188  0.209  0.207  0.226  0.311  0.499  0.273
4            0.205  0.209  0.225  0.253  0.340  0.571  0.301
5            0.278  0.284  0.309  0.368  0.502  0.732  0.412
6            0.253  0.257  0.279  0.360  0.485  0.737  0.395
7            0.329  0.336  0.367  0.502  0.515  0.754  0.467
8            0.258  0.375  0.391  0.453  0.591  0.812  0.480
9            0.332  0.336  0.358  0.450  0.587  0.823  0.481
10           0.418  0.423  0.447  0.563  0.598  0.842  0.548
11           0.312  0.320  0.342  0.419  0.536  0.793  0.454
Average      0.255  0.272  0.289  0.357  0.453  0.680

Figure 1.4. Mean adjusted correlation (R²) values of all streams (June-October 2016) based on different data granularity scenarios and different models.
Figure 1.4 axes: Data granularity (hour) versus Mean Adjusted Correlation (R²); series: Mod 1 to Mod 11.

When averaged across all levels of data granularity, Model 10 had an average R² of 0.548 (Table 1.1; Figure 1.29). Models 8 and 9 followed closely behind Model 10 in their predictive capacity, with mean R² values of 0.480 and 0.481, respectively (Table 1.1). Models 7 and 11 were generally close in their predictive capacity, with mean R² values of 0.467 and 0.454, respectively (Table 1.1). Models 1 through 4 showed distinctly lower predictive power than the other models (Figure 1.4). These models lacked parameters representing solar insolation, such as altitude angle and day length, indicating that these parameters were of great importance in explaining patterns of temperature gradient across all levels of data granularity. The last major pattern was that the mean correlation generally increased for all models as data granularity increased from hourly to weekly time scales (Table 1.1; Figure 1.30). When averaged across all models, the mean R² value increased from 0.255 for hourly data granularity to 0.680 for weekly data granularity (Table 1.1). While these patterns were quite consistent for the mean response of adjusted correlation coefficients to data granularity, preliminary analysis suggested that the trends of model predictive power across data granularity varied among streams.

Overall, Model 10 received the highest weight of evidence in the majority of data granularity scenarios (Table 1.2; Figure 1.5). However, the same results showed that the level of data granularity changed the outcome of model selection substantially: increasing data granularity (i.e., reducing the number of data points) reduced the weights of the most complex models and broadened the support for less complex models (Table 1.2; Figure 1.5).
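The tally behind statements such as "Model 10 received the highest weight of evidence" can be sketched as below. This is a minimal illustration that assumes the summary is the share of streams in which each model ranks first by model weight; the function name `top_model_share`, the stream labels, and all weights are hypothetical and were constructed only to show the arithmetic.

```python
from collections import Counter

def top_model_share(weights_by_stream):
    """Map {stream: {model: weight}} to {model: % of streams where it ranks first}."""
    winners = Counter(max(w, key=w.get) for w in weights_by_stream.values())
    n_streams = len(weights_by_stream)
    return {model: 100.0 * count / n_streams for model, count in winners.items()}

# Hypothetical weights for 16 streams at one granularity level (NOT thesis data).
# The winner is fixed by construction: 10 streams favour Model 10, 4 favour
# Model 11, and one each favours Model 8 and Model 9.
models = ("Model 8", "Model 9", "Model 10", "Model 11")
streams = {}
for i in range(16):
    if i < 10:
        winner = "Model 10"
    elif i < 14:
        winner = "Model 11"
    elif i == 14:
        winner = "Model 8"
    else:
        winner = "Model 9"
    weights = {m: 0.1 for m in models}
    weights[winner] = 0.7  # weights sum to 1.0 within each stream
    streams[f"stream_{i + 1}"] = weights

share = top_model_share(streams)
for model in models:
    print(f"{model}: {share.get(model, 0.0):.2f}% of streams")
```

With 16 streams, each stream contributes 6.25 percentage points, which is why a summary of this kind can only take values that are multiples of 6.25.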
Table 1.2. Percentage (%) of streams where each model had the highest model weight (w) across levels of data granularity. June-October data were used in models.

Data granularity   Models
(hour)             1      2     3     4     5      6      7      8      9      10     11     Total
1                  0      0     0     0     0      0      0      6.25   6.25   62.50  25.00  100
2                  0      0     0     0     0      0      0      6.25   0      62.50  31.25  100
6                  0      0     0     0     0      0      0      12.50  0      50.00  37.50  100
12                 0      0     0     0     0      0      18.75  6.25   6.25   43.75  25.00  100
24                 0      0     0     0     6.25   6.25   0      25.00  12.50  18.75  31.25  100
168                6.25   0     0     0     6.25   0      6.25   25.00  0      31.25  25.00  100

Figure 1.5. The percentage of streams in which each model had the highest model weight for each data granularity with June-October 2016 data.

Figure 1.5 axes: models M1 to M11 versus percentage (0 to 100).

The effect of data granularity on model selection was clearly noticeable, as model weights changed across data granularity scenarios (Table 1.2). For hourly data granularity, Model 10 had the highest model weight for more than 60% of streams. Model 10 continued to receive the highest weight for the most streams across all levels of data granularity except daily data granularity, for which Model 11 had the highest percentage. The percentage of other models that were most highly chosen increased with data granularity. For data granularity scenarios of 1, 2, and 6 hours, Models 8, 9, 10, and 11 were the only models to be selected as the top models. As data granularity increased to higher levels (i.e., 12-hour, daily, and weekly), less complex models, such as Models 1, 5, 6, and 7, emerged as the most highly selected model in some streams.

Data Granularity Leads to Instability of Parameter Estimates in Best Fitting Model

Parameter estimates for Model 10 averaged across all streams showed instability with increasing data granularity. In most cases, the parameter estimates for predictor variables showed consistent trends with higher data granularity (Figures 1.6 and 1.7).
The parameter estimate associated with upstream discharge (Q_Up) showed a strong increasing trend with greater data granularity, and even showed a change in sign (Figure 1.6). In contrast, the parameter estimate for day length (S) started with a positive sign and ended with a negative sign with weekly data (Figure 1.7). Furthermore, the general picture indicated that the trends of mean parameter estimate values across data granularity influenced each other. For example, the estimates for upstream heat flow and overflow heat flow increased from hourly to 12-hour data granularity and decreased for greater granularity scenarios, whereas the estimate for baseflow heat flow decreased from 2-hour to daily data granularity but increased at weekly granularity (Figure 1.7).

Figure 1.6. Response of parameter estimates to data granularity. Model 10 was used with June-October 2016 data. Parameter estimate values of all streams were averaged. Q_Up: upstream discharge; Q_Down - Q_Up: difference between downstream and upstream discharge; T_Air - T_Up: difference between air temperature and upstream temperature.

Figure 1.7. Response of parameter estimates to data granularity. Model 10 was used with June-October 2016 data. Parameter estimate values of all streams were averaged. Variables shown: day length (S), altitude angle, upstream heat flow, baseflow heat flow, and overflow heat flow.

The potential interaction between mean parameter estimates led me to evaluate multicollinearity for the parameter estimates of Model 10 across streams, since the interaction between parameter estimates might have been explained by high correlation between them. Preliminary analysis of the estimates for upstream, baseflow, and overflow heat flow revealed the interaction between these variables (Figures 1.31 and 1.32). These figures showed that if the values for upstream heat flow and overflow heat flow were high for a stream, the value for baseflow heat flow tended to be low for that stream, or vice versa.
To evaluate the interactions between all parameter estimates, multicollinearity between parameter estimate values was tested by examining the coefficient of correlation (r). Results showed that increasing data granularity resulted in a change of overall correlation between the parameter estimates of explanatory variables (Tables 1.8 to 1.13). Correlograms clearly showed this change, as the number of darker and bigger circles varied across data granularity (Figures 1.8 to 1.13). In addition, some of the parameter estimates were highly correlated in all data granularity scenarios. Baseflow heat flow and overflow heat flow had the highest negative correlation across scenarios. Moreover, overflow heat flow and upstream heat flow had the highest positive correlation in all scenarios.

Figure 1.8. Correlation (r) between parameter estimate values across all streams with hourly data granularity. Model 10 was used with June-October 2016 data. The color and the size of circles indicate the sign and the numerical value of correlation.

Figure 1.9. Correlation (r) between parameter estimate values across all streams with 2-hour data granularity. Model 10 was used with June-October 2016 data.

Figure 1.10. Correlation (r) between parameter estimate values across all streams with 6-hour data granularity. Model 10 was used with June-October 2016 data.

Figure 1.11. Correlation (r) between parameter estimate values across all streams with 12-hour data granularity. Model 10 was used with June-October 2016 data.

Figure 1.12. Correlation (r) between parameter estimate values across all streams with daily data granularity. Model 10 was used with June-October 2016 data.

Figure 1.13. Correlation (r) between parameter estimate values across all streams with weekly data granularity. Model 10 was used with June-October 2016 data.
Data Granularity Increased Multicollinearity in Raw Data

A potential cause of parameter instability and of multicollinearity between parameter estimates might have been the intrinsic multicollinearity between environmental variables in the raw data. Multicollinearity between environmental variables was tested by using Tobacco River data (June-October 2016). Correlation (r) between environmental variables showed two major patterns. First, an increase of r between environmental variables was observed (Figures 1.14 to 1.16). Moreover, although the magnitude of correlation varied with increasing data granularity, the sign of r values did not change with data granularity. Some of the variables (e.g., Q_Up and Q_Down - Q_Up versus upstream heat flow) were consistently negatively correlated, whereas others (e.g., altitude angle versus day length) were positively correlated. Second, at hourly data granularity, several variables showed high correlation. Both Q_Up and Q_Down - Q_Up had the highest correlation with upstream heat flow in all scenarios. In addition, Q_Up and Q_Down - Q_Up were highly correlated with each other in all scenarios (Figures 1.14 to 1.16).

Figure 1.14. The amount of correlation (r) between environmental variables in Tobacco River with hourly (a) and 2-hour (b) June-October 2016 data shown in correlogram.

Figure 1.15. The amount of correlation (r) between environmental variables in Tobacco River with 6-hour (a) and 12-hour (b) data granularity. Model 10 with June-October 2016 data was used.

Figure 1.16. The amount of correlation (r) between environmental variables in Tobacco River with daily (a) and weekly (b) data granularity. Model 10 with June-October 2016 data was used.
Using July-Restricted Data Did Not Improve Model Prediction Power

Although overall R² increased with greater data granularity for the July-restricted models, the increase was less apparent than for June-October models (Table 1.3; Figure 1.17). This observation was supported by the fact that, in all data granularity scenarios, the p-values were greater than 0.05 (1-hour: p = 0.1681; 2-hour: p = 0.2869; 6-hour: p = 0.3859; 12-hour: p = 0.7024; 24-hour: p = 0.2581); that is, I could not conclude that the mean R² values of July-restricted models and June-October models within the same aggregation were significantly different (Table 1.14). In other words, using the July-restricted dataset did not cause a significant difference in the overall predictive power of models.

Table 1.3. Mean adjusted correlation (R²) values of each model by data granularity across all streams with July 2016 data.

             Data granularity (hour)
Model        1      2      6      12     24     168    Average
1            0.144  0.143  0.130  0.163  0.116  0.144  0.139
2            0.136  0.139  0.145  0.120  0.181  0.136  0.144
3            0.252  0.257  0.274  0.290  0.356  0.252  0.286
4            0.261  0.265  0.282  0.298  0.377  0.261  0.297
5            0.275  0.278  0.290  0.280  0.409  0.275  0.306
6            0.341  0.346  0.366  0.400  0.407  0.341  0.372
7            0.355  0.358  0.375  0.421  0.444  0.355  0.391
8            0.394  0.398  0.401  0.399  0.497  0.394  0.418
9            0.448  0.452  0.461  0.443  0.494  0.448  0.460
10           0.472  0.476  0.486  0.463  0.519  0.472  0.483
11           0.438  0.432  0.441  0.417  0.455  0.438  0.437
Average      0.320  0.322  0.332  0.336  0.387  0.320

Figure 1.17. Mean adjusted correlation (R²) values of each data granularity scenario based on all regression models. Whiskers represent standard errors of the sample.

In addition, mean R² values showed little relation to data granularity for July-restricted data (Figure 1.17). As observed for June-October data, Model 10 had the highest mean correlation coefficient (0.483) in all data granularity scenarios for models applied to July 2016 data (Figure 1.18).
Moreover, Model 3 and higher-numbered models were grouped together based on their predictive power when July-restricted data were used (Figure 1.19), whereas Model 5 and higher-numbered models were grouped together when June-October data were used (Figure 1.4). This result suggests that the influence of parameters (i.e., day length and altitude angle) used in models differs between June-October and July-restricted data.

Figure 1.17 axes: Data Granularity (hr); series: June-October 2016 and July 2016.

Figure 1.18. Mean adjusted correlation (R²) values of models based on averaging all data granularity scenarios. Lines represent mean adjusted correlation values obtained by using July-restricted data (blue) and June-October data (orange).

Figure 1.18 axes: Model Number versus Mean Adjusted Correlation (R²); series: July 2016 and June-Oct 2016.

Figure 1.19. Mean adjusted correlation (R²) values of models across all streams with July 2016 data across data granularity scenarios.

DISCUSSION

My findings address the gaps I identified in previous modeling studies by answering [...] "How do models perform with July-restricted data?" [...] clear picture of how model performance varied across data granularity scenarios, as well as the possible reasons for variable model performance, by observing the changes in model dynamics with data granularity. Revealing the changes in model dynamics with reference to multicollinearity has given better insight into the regression models, which can be used when implementing these models in future studies.

How Does Model Performance and Choice Vary with Data Granularity?
Selection of different data granularity, or time aggregation, scenarios had substantial effects on model performance and model outcomes. Higher data granularity increased the overall predictive power of regression models with June-October data (Table 1.1; Figure 1.30). Data granularity not only changed overall model predictive power but also changed model selection decisions by influencing the model weights. For example, depending on the ecological perspective and purpose, Model 10 can be selected and used to make more accurate temperature gradient predictions with hourly data, whereas Model 11 can be selected for the same purpose with daily data, since the model weight of Model 11 was the highest for most of the streams with daily data (Table 1.2; Figure 1.5).

[...] data granularity to be used in environmental studies? My study cannot directly answer this [...] relevance of selected data granularity. To illustrate, predicting monthly stream temperature or temperature gradient averages to evaluate and simulate the habitable streams for certain fish species (e.g., Zorn et al. 2004; Zorn et al. 2008) would be more plausible than using hourly stream temperature predictions, since many fish species can tolerate hourly variations in stream temperature. Therefore, using monthly averages would give a better perspective for simulating fish distribution. On the other hand, if the daily change of stream discharge based on daily or weekly groundwater withdrawal is the focus, using greater data granularity scenarios (e.g., daily, weekly) would be the best decision (Fleury et al. 2009).

[...] granularity also depends on the expectations of model performance. In the literature, there is no agreement on what range of the adjusted correlation coefficient (R²) should be used as an indicator of good model performance (Prairie 1996).
For example, an R² of 0.8 may be considered low model performance in one study, while a value of 0.7 may be considered high model performance in another, depending on the field of research. Moreover, the [...] selection. As a clear example, my study showed that the predictive power (i.e., R²) of Model 10 with daily data granularity was the highest, whereas with the same data granularity, Model 11 had the highest model weight for the most streams (Table 1.1; Table 1.2). Model weight is used in model selection based on the balance between model complexity and model predictive power (Akaike 1973).

[...] is the best data granularity selection based on the purposes and the [...] applying greater data granularity reduces the number of data points, it consequently reduces the number of sharp variations within environmental variables. As an example, my preliminary results based on July 2016 Tobacco River data showed that the air temperature tended to fluctuate during the day when hourly data were used. However, applying daily or higher data granularity (e.g., daily and weekly) reduces these fluctuations (Figure 1.33). As a result, overall model predictive power tended to increase from hourly to weekly data granularity.

My study also revealed the best-fitting model based on model predictive power and model weights. Model 10 had the highest overall predictive power for temperature gradient among all regression models, based on the adjusted correlation coefficient (R²) across all data granularity scenarios (Figure 1.29). Model 10 is the most complex model, with 8 independent variables. As expected, model complexity increased model predictive power. Compared to Model 8 and Model 9, Model 10 contains both altitude angle and day length (S) as driving variables, leading to higher predictive power.
Since altitude angle, which is directly related to the amount of solar radiation reaching the stream, is very important, especially during the summer season, variations in altitude angle helped Model 10 achieve better predictive power. Moreover, day length is also an important predictor since it determines the amount of time that the stream is exposed to solar radiation. Therefore, using both parameters in the same model was critical. Additional evidence for the importance of these parameters is the model grouping based on model predictive power (Figure 1.4). The models that included at least one of these parameters (e.g., Model 5 through Model 10) were grouped by their distinctly higher predictive power, whereas the models that did not include these parameters were grouped by their lower predictive power.

Model 10 also had the highest model weight in the majority of data granularity scenarios (Figure 1.5). In other words, Model 10 offered the best balance between goodness of fit and model complexity. Therefore, I concluded that Model 10 should be selected as the linear regression model for more reliable temperature gradient predictions, until another model is developed with a better reliability-complexity balance. However, the model weight of Model 10 clearly decreased with data granularity (Figure 1.5). For example, with daily data granularity, Model 11 and Model 8 were selected as the best model for a higher percentage of streams. This was not the only conclusion. The results also showed that less complex models began to appear as the best models for some streams as data granularity increased (Table 1.2). This was an important finding because it implied that higher model complexity may be a disadvantage, since less complex models can predict temperature gradient as well as complex models as data granularity increases.
This was a consequence of the decrease in the number of temperature gradient values (i.e., data points) as greater data granularity was used, which made model complexity less important and the explanatory power of the parameters in the model more important. Therefore, researchers should consider data granularity when building models, since complexity may reduce model efficiency (i.e., the balance between predictive power and model complexity). Moreover, designing less complex but more reliable models may save resources such as time, workforce, and finances during data collection.

The bigger picture that my findings posed was that arbitrary selection of data granularity may have serious consequences, such as biases in model evaluation and model selection. More importantly, my review of the literature showed that arbitrary selection of data granularity has not been a major concern for many researchers and managers, and the reasons for data granularity selection were not explained in detail in many studies that deal with riverine systems modeling. However, if the reasoning for data granularity selection is not based on ecological relevance (i.e., data granularity is selected arbitrarily), researchers can easily draw conclusions about the success of their models that may not be realistic when other data granularity scenarios are considered. The solution in these cases might be to define the ecological relevance of a particular level of data granularity first, and then evaluate the models based on their performance. By doing so, researchers may better understand the weaknesses of their models at an ecologically relevant data granularity, and design models that not only perform better but are also ecologically relevant to their purposes.

What are the Possible Reasons for Model Performance and Selection Changes with Data Granularity?
As above, I propose that one of the reasons for the overall increase in model performance with higher data granularity was the lower number of data points. However, my preliminary results showed that model predictive power may decrease with higher data granularity for some streams. For example, the adjusted R² value of Tobacco River was lower for 12-hour data granularity when compared to the same value for 12-hour and daily data granularities (Figure 1.34). Likewise, the adjusted R² value decreased from 12-hour to daily data granularity for Butterfield Creek, Carp River, and Prairie River (Figure 1.34). Therefore, the lower number of data points could not be the only reason for the change in predictive power with data granularity. Changes in model dynamics, caused by variation in parameter estimate values (i.e., parameter instability) across granularity scenarios, are likely a more plausible reason for the changes in model predictive power as well as in model weights, since parameter estimate values indicate the weight (or influence) of each predictor variable on temperature gradient predictions.

The simplest way to show the change of model dynamics was to observe the trends of mean parameter estimate values across data granularity scenarios. The instability of these values, leading in some cases to a change in sign (Figure 1.7), suggests that changes in data granularity changed the structure of the data, resulting in unstable parameter estimates. Mean parameter estimate instability was not the only critical finding. Increasing and decreasing trends of parameter estimate values across data granularity (Figures 1.6 and 1.7) were interpreted as a sign of potential interaction between parameter estimates.
Observing the values of the Up, Base, and Over parameters across streams also supported this interpretation and revealed a sign of potential multicollinearity between parameter estimate values (Figures 1.31 and 1.32). Indeed, correlograms showed a clear increase in multicollinearity between these values, meaning that the overall independence of predictor variables decreased with data granularity (Figures 1.8 to 1.13). The correlation between these values across streams showed whether the weight of a predictor variable on predictions changed with the weight of other predictor variables. Therefore, the higher the correlation, the stronger the influence of predictor variables on each other. From the modeling perspective, if the correlation between environmental variables is high, those variables cannot be considered independent of each other, which violates one of the assumptions of linear regression models, namely the independence of model variables. Some parameter estimate values in particular (e.g., Up, Base, and Over) were found to be highly correlated across streams in all data granularity scenarios. This finding revealed that some predictor variables were correlated in the majority of streams.

The first main conclusion was that data granularity increased the overall correlation between environmental variables in the raw data (Figures 1.14 to 1.16) and that this increase in multicollinearity likely contributes to the instability of parameter estimates across levels of granularity. A potential reason for the higher overall correlation was that some environmental variables, such as altitude angle, varied systematically with data granularity based on the altitude angle equation. This caused substantial variation between daytime and nighttime values. Daily and weekly data reduced this variation, increasing the correlation between altitude angle and day length (S).
The correlation between altitude angle and both baseflow heat flow (Base) and overflow heat flow (Over) increased with data granularity for the same reason. The signs of r were also helpful for understanding the relationships between variables. As expected, the level of data granularity did not change the sign of the correlations, because time aggregation should not substantially affect the increasing and decreasing trends of environmental variables.

The second main conclusion was that some of the environmental variables used in Model 10 were naturally correlated. Both discharge variables (i.e., Q_Up and Q_Down - Q_Up) were highly correlated with Up. This was an expected result, considering that the equation for Up includes the ratio of upstream discharge (Q_Up) to downstream discharge (Q_Down) (Appendix C: Model Parameter Calculation). In addition, a high r between Q_Up and Q_Down - Q_Up was also expected, since the value of Q_Down - Q_Up was highly dependent on Q_Up. Furthermore, the direction of some correlations could be explained. A negative correlation between Up and both discharge variables (i.e., Q_Up and Q_Down - Q_Up) was observed. This was a consequence of the equation for Up, which includes upstream discharge (Q_Up) as numerator (Appendix C: Model Parameter Calculation). In other words, as upstream discharge (Q_Up) increased, Up decreased. Moreover, day length and altitude angle were positively correlated. This result matched the natural processes, since both variables mostly decrease between June and October in the Northern Hemisphere. All these observations supported my conclusion that data granularity affects the multicollinearity between environmental variables in raw data, consequently affecting parameter estimates and the outcome of regression models.
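The effect of aggregation on the correlation between altitude angle and day length, as described above, can be illustrated with a small R sketch. The data are synthetic and deterministic, not the study's measurements: the column names, the 120-day window, and the simple sine-shaped altitude angle are all assumptions made only for illustration.

```r
# Synthetic example (not the study's data): 120 days of hourly records
# with a slowly shrinking day length and a diurnal altitude angle that
# is zero at night.
hours <- 0:(24 * 120 - 1)
day <- hours %/% 24
hour_of_day <- hours %% 24

hourly <- data.frame(
  day = day,
  day_length = 15 - 0.03 * day,   # hours of daylight, declining over time
  altitude = pmax(0, (70 - 0.2 * day) * sin(pi * (hour_of_day - 6) / 12))
)

# Aggregate to daily means, analogous to the aggregate() calls in Appendix D.
daily <- aggregate(cbind(day_length, altitude) ~ day, data = hourly, FUN = mean)

# Correlation between the two predictors before and after aggregation.
cor_hourly <- cor(hourly$day_length, hourly$altitude)
cor_daily <- cor(daily$day_length, daily$altitude)

print(c(hourly = cor_hourly, daily = cor_daily))
```

In this toy setting the hourly correlation is weak because the day-night cycle dominates the altitude angle, whereas the daily means of both variables follow the same seasonal trend and are therefore almost perfectly correlated. Real data would show a less extreme contrast.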
Therefore, the change in multicollinearity would be of concern for decision-makers on environmental issues, since multicollinearity affects model design and selection. For example, increased multicollinearity makes it hard to separate the individual effects of each environmental variable (Alin 2010) and, as a result, hard to resolve the influence of driving environmental variables. Therefore, data granularity selection and potential multicollinearity between environmental variables should be considered together when designing models. As an example, including altitude angle in the model may be redundant if day length is also included and daily or weekly data will be used. For the same reason, there may be utility in avoiding the inclusion of naturally correlated environmental variables such as Q_Up and Q_Down - Q_Up. Also, including correlated environmental variables may magnify the effect of a certain parameter (e.g., upstream discharge) on the response variable and may cause uncertainty in evaluating the effects of environmental variables. Another advantage of eliminating redundant environmental variables is that it may significantly reduce the effort required for collecting environmental data and for modeling applications. A downside of this approach, however, is that overall predictive power may be lost due to the removal of variables. Put another way, the cost of retaining correlated predictor variables is that the parameter estimates are unstable and, as such, difficult to interpret.

How Do Models Perform with July-Restricted Data?

Overall model predictive power across data granularity did not substantially change with July-restricted data (Figure 1.17). In addition, I found no significant difference between model predictive power for July-restricted and June-October data within the same data granularity scenario (Table 1.14).
Although the sample size (i.e., the period of data, the number of data granularity scenarios, or the number of streams) may not be enough to conclude that the effect of data granularity changed significantly, one can expect that using a longer time period (i.e., June-October) may lower predictive power (Tian et al. 2017), since there would be larger variation within the same environmental variable. For example, the variation in day length and altitude angle during June-October would be higher than in July-restricted data, which may lead to a poorer fit between observed and predicted temperature gradient. However, it should also be considered that limiting the time period may increase multicollinearity between environmental variables and change model outputs and performance (Cropper 1984). Therefore, without a multicollinearity analysis, it was hard to draw a conclusion about the exact reasons why using July-restricted data had no significant effect on model predictive power.

My results also revealed that model selection based on model predictive power can be affected by time period selection. Using July-restricted data reduced the effect of model complexity, since Model 1 and Model 2 were grouped as the least-fitting models (Figure 1.19), whereas Model 1 through Model 4 were grouped as the least-fitting models when June-October data were used (Figure 1.4). This was a clear sign of the effect of time period on the importance of environmental variables. As the data were restricted to July, the importance of the day length (S) and altitude angle parameters, which appear in Model 5 and higher-numbered models, was reduced; therefore, Model 3 and Model 4 were grouped with the best-fitting models even though they lack these parameters. These results emphasize the importance of time period selection when interpreting the output of models.
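The comparison of predictive power between July-restricted and June-October data within one data granularity scenario (Table 1.14) can be sketched in R as follows. The adjusted R² values below are hypothetical placeholders for the 11 models, not results from this study, and the choice of an unpaired equal-variance Student t-test is an assumption, since the exact variant of the test is not stated.

```r
# Hypothetical adjusted R-squared values for Model 1 to Model 11 at one
# data granularity (illustrative numbers only, not the study's results).
r2_july <- c(0.12, 0.15, 0.30, 0.31, 0.35, 0.36, 0.38, 0.40, 0.41, 0.44, 0.42)
r2_seasonal <- c(0.08, 0.10, 0.14, 0.16, 0.30, 0.31, 0.33, 0.36, 0.37, 0.41, 0.39)

# Two-sample Student t-test assuming equal variances (an assumption).
result <- t.test(r2_july, r2_seasonal, var.equal = TRUE)

print(c(mean_july = mean(r2_july), mean_seasonal = mean(r2_seasonal)))
print(result$p.value)
```

Repeating this test once per data granularity scenario would produce one p-value per row, in the layout used by Table 1.14.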
Although my results showed no significant differences in model performance, using longer time periods may increase or decrease biases between predicted and observed values (Jetten et al. 1999; Tian et al. 2017). In both cases, decision-makers need to weigh model performance against the purpose of their study. As I explained, the purpose of the study naturally overrides model performance expectations in most cases; that is, decision-makers favor ecological relevance over model performance. The perspective that my study brings to this issue is that models can be designed or re-adjusted by using different time periods (e.g., my study showed that using day length and altitude angle may not be necessary for July-restricted data). This approach will reveal the critical environmental variables that should be used in the models or point out the redundant parameters, eventually resulting in better model predictions. By understanding the effect of data granularity and different time periods on model performance, researchers can optimize and use their models without losing the ecological relevance of their data and without lowering their expectations of model reliability.

CONCLUSIONS AND IMPLICATIONS

Although this research provides a variety of insights into hydrological modeling, the following conclusions are of most importance:

1) Selection of data granularity is a significant factor in modeling applications, as it directly affects parameter estimates, model selection, and goodness-of-fit measures. Therefore, arbitrary selection of data granularity may lead to conflicting insights across studies where none exist. If the selection of data granularity is not driven by strong ecological relevance, then model performance should be one of the biggest concerns when deciding on data granularity.
Another concern should be the effect of data granularity on multicollinearity between predictor variables, since multicollinearity may influence model dynamics and performance. Because of the different responses of models and streams to data granularity, it was not possible to recommend a single data granularity. However, my study clearly showed that model performance changes with the type of data granularity, giving researchers a better perspective on the possible consequences of arbitrary data granularity selection. More research on this topic is needed in the ecological and environmental sciences, considering the lack of studies addressing the remaining unknowns. A better understanding of the implications of data granularity will help researchers design models that work best for their purposes, and this will lead to more accurate decisions on ecological implications.

2) The best-fitting model among the regression models was Model 10; however, multicollinearity analyses showed that some of the parameters in Model 10 were dependent, which violates one of the assumptions of linear regression models. Thus, I suggest that additional work could be done to improve this model. Further multicollinearity analysis, such as Variance Inflation Factors, can be done to better understand which parameters contribute most to the multicollinearity. Modifying Model 10 by discarding or adding parameters based on my findings may decrease the dependency among predictor variables, and this may lead to a better understanding of which environmental variables have a greater effect on temperature gradient. Improvements to this model will help improve predictions relevant to environmental applications, such as predicting fish distributions based on temperature gradient predictions, the effect of variations in climate, and the impact of groundwater withdrawal on stream temperature changes (Carlson et al. 2020).
3) Variation in the same environmental variables across different streams showed that the characteristics of streams influence model dynamics and reliability. Because it is hard to design models that are specific to each stream, classifying streams based on some characteristics may help to find a generalized model for each stream class. Finding generalized models may reduce the costs of data collection and improve model performance, resulting in more robust predictions that give decision-makers a better perspective in natural resource management. Because the effect of stream classification on model performance and the selection of time aggregation is not well studied, I explore this issue in the next chapter of my thesis.

4) Using July-restricted data did not substantially influence overall model performance. Different time periods can either reduce or increase the influence of environmental variables on temperature gradient predictions. However, my findings are insufficient to conclude whether restricting data improves model performance. In fact, time period selection is critically dependent on the purpose of a study and the ecological relevance of the time period. Therefore, selection of the model and time period is study-specific. However, optimizing models by using different time periods can help maximize model performance within an ecologically relevant time period.

APPENDICES

APPENDIX A: Tables

Table 1.4. Stream temperature models (Magnusson et al. 2012; Andrews 2019).

Model 1  Model 2  Model 3  Model 4  Model 5  Model 6  Model 7  Model 8  Model 9  Model 10  Model 11

Table 1.5. Intercept and parameter estimate values of Model 10 across different data granularity. Tobacco River June-October 2016 data were used.
Data Granularity (hours)   1        2        6        12       24       168
Intercept                  0.627    0.629    0.868    1.337    1.092    0.780
T_a - T_w                  0.023    0.023    0.014    0.001   -0.023   -0.030
Q_Up                      -1.166   -1.169   -1.430   -2.045   -2.630   -1.383
Q_Down - Q_Up             -0.579   -0.583   -0.522    0.242    0.366   -0.436
S                         -0.006   -0.006   -0.023   -0.051    0.000   -0.056
Altitude angle             0.019    0.019    0.022    0.023    0.001    0.000
Up                        -0.223   -0.225   -0.221   -0.055   -0.088   -0.071
Base                       0.138    0.139    0.125   -0.001    0.008   -0.016
Over                      -0.144   -0.145   -0.138   -0.031   -0.060   -0.067

Table 1.6. Streams and rivers with their regions (SLP: Southern Lower Peninsula; NLP: Northern Lower Peninsula; UP: Upper Peninsula), thermal classes, upstream latitudes, upstream longitudes, downstream latitudes, and downstream longitudes (Zorn et al. 2008; Andrews 2019).

Stream                        Region  Thermal Class  Upstream Latitude  Upstream Longitude  Downstream Latitude  Downstream Longitude
Pokagon Creek                 SLP     C              41.89517           -86.162632          41.915803            -86.175679
Pigeon River                  SLP     CT             42.932887          -86.081828          42.91636             -86.146075
Nottawa Creek                 SLP     WT             42.192564          -85.060415          42.195998            -85.104618
Middle Branch Tobacco River   SLP     WT             43.909194          -84.697312          43.929905            -84.666327
Hasler Creek                  SLP     W              43.042332          -83.423206          43.083594            -83.442947
Prairie River                 SLP     W              41.801832          -85.116614          41.832568            -85.165065
Swan Creek                    SLP     W              41.90477           -85.297885          41.921249            -85.312047
Cedar Creek                   NLP     C              44.375846          -85.972647          44.369588            -85.999598
Cedar River                   NLP     C              44.956875          -85.132748          44.968664            -85.138993
East Branch Black River       NLP     C              45.070651          -84.283728          45.089439            -84.284929
Butterfield Creek             NLP     CT             44.273249          -85.094087          44.256377            -85.03362
Morgan Creek                  UP      C              46.519698          -87.504502          46.521351            -87.494782
Spring Creek                  UP      CT             46.512909          -90.156133          46.513418            -90.177011
Carp River                    UP      CT             46.509131          -87.418924          46.510534            -87.388497
Middle Branch Escanaba River  UP      WT             46.420206          -87.797962          46.398398            -87.770883
Squaw Creek                   UP      W              46.057035          -87.18974           45.985396            -87.140559

Table 1.7. Starting and ending day of year for sampling in each stream for 2016.
Stream Name         Data Start Date  Data End Date
Black River         177              284
Butterfield Creek   144              296
Carp River          151              285
Cedar Creek         144              296
Cedar River         143              296
Escanaba River      151              284
Hasler Creek        160              315
Morgan Creek        151              285
Nottawa Creek       138              289
Pigeon River        137              307
Pokagon Creek       137              307
Prairie River       138              289
Spring Creek        151              284
Squaw Creek         152              285
Swan Creek          139              289
Tobacco River       144              287

Table 1.8. Correlation matrix of parameter estimate values with hourly data. Model 10 with seasonal data was used to obtain the values. Correlations between variables were obtained across streams.

Parameter        Q_Up         Q_Down-Q_Up  S            Altitude     T_a-T_w      Up           Base         Over
Q_Up             1.000       -0.665        0.233        0.224       -0.585       -0.326        0.493       -0.35
Q_Down-Q_Up                   1.000       -0.361       -0.194        0.686        0.339       -0.476        0.453
S                                          1.000       -0.615        0.152       -0.475        0.810       -0.600
Altitude angle                                          1.000       -0.500        0.156       -0.269        0.195
T_a-T_w                                                              1.000       -0.257        0.003       -0.130
Up                                                                                1.000       -0.759        0.957
Base                                                                                           1.000       -0.832
Over                                                                                                        1.000

Table 1.9. Correlation matrix of parameter estimate values with 2-hour data granularity. Model 10 with seasonal data was used to obtain the values. Correlations between variables were obtained across streams.

Parameter        Q_Up         Q_Down-Q_Up  S            Altitude     T_a-T_w      Up           Base         Over
Q_Up             1.000       -0.599        0.159        0.248       -0.603       -0.273        0.440       -0.295
Q_Down-Q_Up                   1.000       -0.267       -0.256        0.695        0.318       -0.444        0.420
S                                          1.000       -0.605        0.155       -0.332        0.616       -0.429
Altitude angle                                          1.000       -0.505        0.086       -0.239        0.108
T_a-T_w                                                              1.000       -0.245       -0.003       -0.119
Up                                                                                1.000       -0.758        0.959
Base                                                                                           1.000       -0.839
Over                                                                                                        1.000

Table 1.10. Correlation matrix of parameter estimate values with 6-hour data granularity. Model 10 with seasonal data was used to obtain the values. Correlations between variables were obtained across streams.

Parameter        Q_Up         Q_Down-Q_Up  S            Altitude     T_a-T_w      Up           Base         Over
Q_Up             1.000       -0.371        0.053        0.219       -0.582       -0.239        0.368       -0.200
Q_Down-Q_Up                   1.000       -0.289       -0.198        0.583        0.320       -0.403        0.382
S                                          1.000       -0.647        0.191       -0.321        0.566       -0.410
Altitude angle                                          1.000       -0.473        0.077       -0.287        0.137
T_a-T_w                                                              1.000       -0.197       -0.008       -0.106
Up                                                                                1.000       -0.783        0.959
Base                                                                                           1.000       -0.858
Over                                                                                                        1.000

Table 1.11.
Correlation matrix of parameter estimate values with 12-hour data granularity. Model 10 with seasonal data was used to obtain the values. Correlations between variables were obtained across streams.

Parameter        Q_Up         Q_Down-Q_Up  S            Altitude     T_a-T_w      Up           Base         Over
Q_Up             1.000       -0.614       -0.022        0.117       -0.739       -0.187        0.406       -0.158
Q_Down-Q_Up                   1.000       -0.016       -0.299        0.760       -0.018       -0.151        0.010
S                                          1.000       -0.749        0.150       -0.432        0.531       -0.554
Altitude angle                                          1.000       -0.169        0.293       -0.444        0.523
T_a-T_w                                                              1.000       -0.170       -0.106       -0.005
Up                                                                                1.000       -0.895        0.907
Base                                                                                           1.000       -0.903
Over                                                                                                        1.000

Table 1.12. Correlation matrix of parameter estimate values with daily data granularity. Model 10 with seasonal data was used to obtain the values. Correlations between variables were obtained across streams.

Parameter        Q_Up         Q_Down-Q_Up  S            Altitude     T_a-T_w      Up           Base         Over
Q_Up             1.000       -0.654       -0.145        0.288       -0.811       -0.084        0.392       -0.141
Q_Down-Q_Up                   1.000       -0.226       -0.096        0.764        0.057       -0.293        0.204
S                                          1.000       -0.474        0.228       -0.560        0.461       -0.479
Altitude angle                                          1.000       -0.277        0.416       -0.268        0.352
T_a-T_w                                                              1.000       -0.227       -0.104        0.003
Up                                                                                1.000       -0.909        0.896
Base                                                                                           1.000       -0.888
Over                                                                                                        1.000

Table 1.13. Correlation matrix of parameter estimate values with weekly data granularity. Model 10 with seasonal data was used to obtain the values. Correlations between variables were obtained across streams.

Parameter        Q_Up         Q_Down-Q_Up  S            Altitude     T_a-T_w      Up           Base         Over
Q_Up             1.000       -0.838        0.055       -0.036       -0.904        0.081        0.127       -0.040
Q_Down-Q_Up                   1.000       -0.041       -0.006        0.930       -0.355        0.197       -0.212
S                                          1.000       -0.985       -0.034        0.044        0.015        0.115
Altitude angle                                          1.000        0.017       -0.033       -0.023       -0.080
T_a-T_w                                                              1.000       -0.298        0.067       -0.052
Up                                                                                1.000       -0.925        0.861
Base                                                                                           1.000       -0.900
Over                                                                                                        1.000

Table 1.14. Mean adjusted correlation (R²) values from July-restricted and June-October data across models. Student t-test was used to find p-values.

                          Mean Adjusted Correlation
Data Granularity (hour)   July 2016   June-October 2016   p-value
1                         0.320       0.255               0.168
2                         0.322       0.272               0.287
6                         0.332       0.289               0.386
12                        0.336       0.357               0.702
24                        0.387       0.453               0.258

APPENDIX B: Figures

Figure 1.20.
Air temperature minus downstream temperature (T_a - T_w) versus observed temperature gradient of Tobacco River with hourly (a), 2-hour (b), 6-hour (c), 12-hour (d), daily (e), and weekly (f) data granularity between June-October 2016.
[Axes: T_a - T_w (°C) and temperature gradient (°C); panels a-f.]

Figure 1.21. Upstream discharge (Q_Up) (cubic meters per second, CMS) versus observed temperature gradient of Tobacco River with hourly (a), 2-hour (b), 6-hour (c), 12-hour (d), daily (e), and weekly (f) data granularity between June-October 2016.
[Axes: Q_Up (CMS) and temperature gradient (°C); panels a-f.]

Figure 1.22. Upstream discharge minus downstream discharge (Q_Up - Q_Down) (cubic meters per second, CMS) versus observed temperature gradient of Tobacco River with hourly (a), 2-hour (b), 6-hour (c), 12-hour (d), daily (e), and weekly (f) data granularity between June-October 2016.
[Axes: Q_Up - Q_Down (CMS) and temperature gradient (°C); panels a-f.]

Figure 1.23. Day length (S) versus observed temperature gradient of Tobacco River with hourly (a), 2-hour (b), 6-hour (c), 12-hour (d), daily (e), and weekly (f) data granularity between June-October 2016.
[Axes: day length (hours) and temperature gradient (°C); panels a-f.]

Figure 1.24. Altitude angle versus observed temperature gradient of Tobacco River with hourly (a), 2-hour (b), 6-hour (c), 12-hour (d), daily (e), and weekly (f) data granularity between June-October 2016.
[Axes: altitude angle and temperature gradient (°C); panels a-f.]

Figure 1.25. Upstream heat flow (Up) versus observed temperature gradient of Tobacco River with hourly (a), 2-hour (b), 6-hour (c), 12-hour (d), daily (e), and weekly (f) data granularity between June-October 2016.
[Axes: T_Up (°C) and temperature gradient (°C); panels a-f.]

Figure 1.26. Baseflow heat flow (Base) versus observed temperature gradient of Tobacco River with hourly (a), 2-hour (b), 6-hour (c), 12-hour (d), daily (e), and weekly (f) data granularity between June-October 2016.
[Axes: T_Base (°C) and temperature gradient (°C); panels a-f.]

Figure 1.27.
Overflow heat flow (Over) versus observed temperature gradient of Tobacco River with hourly (a), 2-hour (b), 6-hour (c), 12-hour (d), daily (e), and weekly (f) data granularity between June-October 2016.
[Axes: T_Over (°C) and temperature gradient (°C); panels a-f.]

Figure 1.28. Observed and predicted temperature gradient (°C) of Tobacco River with hourly (a), 2-hour (b), 6-hour (c), 12-hour (d), daily (e), and weekly (f) data granularity between June-October 2016. Predictions were obtained from Model 10.
[Axes: predicted and observed temperature gradient; panels a-f.]

Figure 1.29. Mean adjusted correlation (R²) values of models based on averaging all data granularity scenarios with June-October 2016 data.

Figure 1.30. Mean adjusted correlation (R²) values of each data granularity scenario based on all regression models with June-October data.
[Axis: data granularity (hour).]

Figure 1.31. Parameter estimate values of some predictor variables across streams with hourly June-October 2016 data. Model 10 was used to obtain the values for each stream.
[Streams shown: Black River, Butterfield Creek, Carp River, Cedar Creek, Cedar River, Escanaba River, Hasler Creek, Morgan Creek, Nottawa Creek, Pigeon River, Pokagon Creek, Prairie River, Spring Creek, Squaw Creek, Swan Creek, Tobacco River.]

Figure 1.32. Parameter estimate values of some predictor variables across streams with weekly June-October 2016 data. Model 10 was used to obtain the values for each stream.
[Streams shown: Black River, Butterfield Creek, Carp River, Cedar Creek, Cedar River, Escanaba River, Hasler Creek, Morgan Creek, Nottawa Creek, Pigeon River, Pokagon Creek, Prairie River, Spring Creek, Squaw Creek, Swan Creek, Tobacco River. Axis: parameter estimate; legend: T_Up, T_Base.]

Figure 1.33. Air temperature across time with hourly (a), daily (b), and weekly (c) data granularity. Tobacco River July 2016 (July 1 - July 31) data were used.
[Axes: time (hour) and air temperature (°C); panels a-c.]

Figure 1.34. Adjusted correlation (R²) values across data granularity. Model 10 was used with June-October 2016 data.
[Axis: data granularity (hr); series: Butterfield Creek, Carp River, Prairie River, Tobacco River.]

APPENDIX C: Model Parameter Calculation

Eqn. 10. T_flow (Andrews 2019), where Q_up: upstream discharge (cms); Q_down: downstream discharge (cms); Q_Base: baseflow discharge (cms); T_Up: upstream temperature (°C); T_gw: groundwater temperature; and the average air temperature of every 12-hour period.

Eqn. 11. T_up (Andrews 2019).

Eqn. 12. T_base (Andrews 2019), where T_base: baseflow temperature (°C).

Eqn. 13. T_Over (Andrews 2019).

Eqn. 14. Day length (S) (Andrews 2019).
where lat: latitude, and the declination angle of the Sun is calculated from x, the number of days since the vernal equinox (March 21).
Eqn. 15. where LST: local standard time; long: longitude.
Eqn. 15. where N: day of the year.

APPENDIX D: RStudio Codes

output_data <- aggregate(raw_data, by = list(raw_data$Day_of_Year, raw_data$X2hours), FUN = mean)   # obtaining 2-hour data granularity
output_data <- aggregate(raw_data, by = list(raw_data$Day_of_Year, raw_data$X6hours), FUN = mean)   # obtaining 6-hour data granularity
output_data <- aggregate(raw_data, by = list(raw_data$Day_of_Year, raw_data$X12hours), FUN = mean)  # obtaining 12-hour data granularity
output_data <- aggregate(raw_data, by = list(raw_data$Day_of_Year, raw_data$daily), FUN = mean)     # obtaining daily data granularity
output_data <- aggregate(raw_data, by = list(raw_data$Day_of_Year, raw_data$weekly), FUN = mean)    # obtaining weekly data granularity
Model10 <- lm(down_up_delta_tempc ~ air_tempc_minus_up_tempc + up_dischargecms + down_up_delta_discharge + day_length + altitude_angle + up_heat_load + base_heat_load + over_heat_load, data = output_data)   # fitting Model 10
summary(Model10)   # obtaining adjusted correlation (R2) and parameter estimates
aic_results <- AIC(Model1, Model2, Model3, Model4, Model5, Model6, Model7, Model8, Model9, Model10, Model11, k = 2)   # obtaining AIC results for each granularity scenario and stream
summary(aic_results)   # summarizing AIC results for each granularity scenario and stream
correlation_matrix <- cor(parameter_estimate_data)   # obtaining correlation matrix for each data granularity scenario based on parameter estimates of Model 10
round(correlation_matrix, 2)   # rounding the numbers in the correlation matrix
install.packages("writexl")   # installing the "writexl" package
library("writexl")   # loading the "writexl" package
write_xlsx(as.data.frame(correlation_matrix), "File destination/Correlation matrix table.xlsx")   # exporting the correlation matrix; write_xlsx expects a data frame
install.packages("corrplot")   # installing the "corrplot" package
colnames(correlation_matrix) <- c("Qup", "Qdown-Qup", "Ta-Tw", "", "", "")   # setting column names of correlogram
rownames(correlation_matrix) <- c("Qup", "Qdown-Qup", "Ta-Tw", "", "", "")   # setting row names of correlogram
library(corrplot)   # loading the "corrplot" package
corrplot(correlation_matrix, type = "upper", order = "alphabet", tl.col = "black", tl.srt = 45)   # obtaining correlogram for each data granularity scenario

BIBLIOGRAPHY

Ahmadi-Nedushan, B., St.-Hilaire, A., Ouarda, T. B. M. J., Bilodeau, L., Robichaud, É., Thiémonge, N., & Bobée, B. (2007). Predicting river water temperatures using stochastic models: Case study of the Moisie River (Québec, Canada). Hydrological Processes. https://doi.org/10.1002/hyp.6353
Akaike, H. (1973). Maximum likelihood identification of Gaussian autoregressive moving average models. Biometrika. https://doi.org/10.1093/biomet/60.2.255
Alin, A. (2010). Multicollinearity. Wiley Interdisciplinary Reviews: Computational Statistics. https://doi.org/10.1002/wics.84
Álvarez-Cabria, M., Barquín, J., & Peñas, F. J. (2016). Modeling the spatial and seasonal variability of water quality for entire river networks: Relationships with natural and anthropogenic factors. Science of the Total Environment. https://doi.org/10.1016/j.scitotenv.2015.12.109
Anderson, S. W., & Konrad, C. P. (2019). Downstream-Propagating Channel Responses to Decadal-Scale Climate Variability in a Glaciated River Basin. Journal of Geophysical Research: Earth Surface. https://doi.org/10.1029/2018JF004734
Andrews, R. (2019). Effects of flow reduction on thermal dynamics of streams: improving an . M.S. Thesis, Michigan State University, East Lansing, MI.
Bartholow, J. (2000a). The Stream Segment and Stream Network Temperature Models. Technical Report. U.S. Department of Interior. U.S.
Geological Survey.
Bartholow, J. M., Campbell, S. G., & Flug, M. (2004). Predicting the thermal effects of dam removal on the Klamath river. Environmental Management. https://doi.org/10.1007/s00267-004-0269-5
Benyahya, L., Caissie, D., St-Hilaire, A., Ouarda, T. B. M. J., & Bobée, B. (2007). A Review of Statistical Water Temperature Models. Canadian Water Resources Journal. https://doi.org/10.4296/cwrj3203179
Borman, M. M., & Larson, L. L. (2003). A case study of river temperature response to agricultural land use and environmental thermal patterns. Journal of Soil and Water Conservation.
Caissie, D., El-Jabi, N., & Satish, M. G. (2001). Modeling of maximum daily water temperatures in a small stream using air temperatures. Journal of Hydrology. https://doi.org/10.1016/S0022-1694(01)00427-9
Carlson, A. K., Taylor, W. W., & Infante, D. M. (2020). Modeling effects of climate change on Michigan brown trout and rainbow trout: Precipitation and groundwater as key predictors. Ecology of Freshwater Fish. https://doi.org/10.1111/eff.12525
Carpenter, S. R., Fisher, S. G., Grimm, N. B., & Kitchell, J. F. (1992). Global Change and Freshwater Ecosystems. Annual Review of Ecology and Systematics. https://doi.org/10.1146/annurev.es.23.110192.001003
Cheng, S. T., & Wiley, M. J. (2016). A Reduced Parameter Stream Temperature Model (RPSTM) for basin-wide simulations. Environmental Modeling and Software. https://doi.org/10.1016/j.envsoft.2016.04.015
Crittenden, R. N. (1978). Sensitivity analysis of a theoretical energy balance model for water temperatures in small streams. Ecological Modeling, Volume 5, Issue 3, Pages 207-224. https://doi.org/10.1016/0304-3800(78)90021-2
Cropper, J. (1984). Multicollinearity within selected western North American temperature and precipitation data sets. Tree Ring Bulletin.
Daoud, J. I. (2018). Multicollinearity and Regression Analysis. Journal of Physics: Conference Series.
https://doi.org/10.1088/1742-6596/949/1/012009
Dudgeon, D., Arthington, A. H., Gessner, M. O., Kawabata, Z. I., Knowler, D. J., Lévêque, C., conservation challenges. Biological Reviews of the Cambridge Philosophical Society. https://doi.org/10.1017/S1464793105006950
Edinger, J., Brady, D. K., & Geyer, J. C. (1974). Heat Exchange and Transport in the Environment.
Farrar, D. E., & Glauber, R. R. (1967). Multicollinearity in Regression Analysis: The Problem Revisited. The Review of Economics and Statistics. https://doi.org/10.2307/1937887
Ficke, A. D., Myrick, C. A., & Hansen, L. J. (2007). Potential impacts of global climate change on freshwater fisheries. Reviews in Fish Biology and Fisheries. https://doi.org/10.1007/s11160-007-9059-5
Fleury, P., Ladouche, B., Conroux, Y., Jourde, H., & Dörfliger, N. (2009). Modeling the hydrologic functions of a karst aquifer under active water management - The Lez spring. Journal of Hydrology. https://doi.org/10.1016/j.jhydrol.2008.11.037
Galbraith, H. S., & Vaughn, C. C. (2009). Temperature and food interact to influence gamete development in freshwater mussels. Hydrobiologia. https://doi.org/10.1007/s10750-009-9933-3
Haitovsky, Y. (1969). Multicollinearity in Regression Analysis: Comment. The Review of Economics and Statistics. https://doi.org/10.2307/1926450
Hill, R. A., Hawkins, C. P., & Carlisle, D. M. (2013). Predicting thermal reference conditions for USA streams and rivers. Freshwater Science. https://doi.org/10.1899/12-009.1
Isaak, D. J., Wollrab, S., Horan, D., & Chandler, G. (2012). Climate change effects on stream and river temperatures across the northwest U.S. from 1980-2009 and implications for salmonid fishes. Climatic Change. https://doi.org/10.1007/s10584-011-0326-z
Iversen, T. M. (1971). The ecology of a mosquito population (Aedes communis) in a temporary pool in a Danish beech wood. Arch. Hydrobiol., 69: 309-332.
Jackson, H. M., Gibbins, C. N., & Soulsby, C.
(2007). Role of discharge and temperature variation in determining invertebrate community structure in a regulated river. River Research and Applications. https://doi.org/10.1002/rra.1006
Janssen, P. H. M., & Heuberger, P. S. C. (1995). Calibration of process-oriented models. Ecological Modeling. https://doi.org/10.1016/0304-3800(95)00084-9
Jetten, V., De Roo, A., & Favis-Mortlock, D. (1999). Evaluation of field-scale and catchment-scale soil erosion models. Catena. https://doi.org/10.1016/S0341-8162(99)00037-5
Johnson, A. N., Boer, B. R., Woessner, W. W., Stanford, J. A., Poole, G. C., Thomas, S. A., & -diameter temperature logger for documenting ground water-river interactions. Ground Water Monitoring and Remediation. https://doi.org/10.1111/j.1745-6592.2005.00049.x
Kim, M., Bradlow, E., & Iyengar, R. (2019). Selecting Data Granularity Using the Power Likelihood. Available at SSRN: https://ssrn.com/abstract=3453170 or http://dx.doi.org/10.2139/ssrn.3453170
Kirchner, J. W. (2006). Getting the right answers for the right reasons: Linking measurements, analyses, and models to advance the science of hydrology. Water Resources Research. https://doi.org/10.1029/2005WR004362
Kools, L., & Phillipson, F. (2016). Data granularity and the optimal planning of distributed generation. Energy. https://doi.org/10.1016/j.energy.2016.06.089
Kroll, C. N., & Song, P. (2013). Impact of multicollinearity on small sample hydrologic regression models. Water Resources Research. https://doi.org/10.1002/wrcr.20315
Magnusson, J., Jonas, T., & Kirchner, J. W. (2012). Temperature dynamics of a proglacial stream: Identifying dominant energy balance components and inferring spatially integrated hydraulic geometry. Water Resources Research. https://doi.org/10.1029/2011WR011378
Mason, C. H., & Perreault, W. D. (1991). Collinearity, Power, and Interpretation of Multiple Regression Analysis. Journal of Marketing Research.
https://doi.org/10.2307/3172863
Mayer, T. D. (2012). Controls of summer stream temperature in the Pacific Northwest. Journal of Hydrology. https://doi.org/10.1016/j.jhydrol.2012.10.012
Morales-Marín, L. A., Rokaya, P., Sanyal, P. R., Sereda, J., & Lindenschmidt, K. E. (2019). Changes in streamflow and water temperature affect fish habitat in the Athabasca River basin in the context of climate change. Ecological Modeling. https://doi.org/10.1016/j.ecolmodel.2019.108718
Morin, G., & Couillard, D. (1990). Predicting river temperatures with a hydrological model. In Encyclopedia of fluid mechanics, surface and groundwater flow phenomena. Edited by N. P. Cheremisinoff. Gulf Publishing Company, Houston, Tex., Vol. 10, Chap. 5, pp. 171-209.
Nace, R. L. (1967). Water resources: A global problem with local roots. Environmental Science and Technology, 1, No. 7, July 1967.
Nuhfer, A. J., Zorn, T. G., & Wills, T. C. (2017). Effects of reduced summer flows on the brook trout population and temperatures of a groundwater-influenced stream. Ecology of Freshwater Fish. https://doi.org/10.1111/eff.12259
Not a Good Idea*. Social Science Quarterly. https://doi.org/10.1111/ssqu.12273
Woodward, G. (2019). Precipitation and temperature drive continental-scale patterns in stream invertebrate production. Science Advances. https://doi.org/10.1126/sciadv.aav2348
Pilgrim, J. M., Fang, X., & Stefan, H. G. (1998). Stream temperature correlations with air temperatures in Minnesota: Implications for climate warming. Journal of the American Water Resources Association. https://doi.org/10.1111/j.1752-
Transactions of the Institute of British Geographers. https://doi.org/10.2307/621706
Prairie, Y. T. (1996). Evaluating the predictive power of regression models. Canadian Journal of Fisheries and Aquatic Sciences. https://doi.org/10.1139/cjfas-53-3-490
Ruehl, C., Fisher, A. T., Hatch, C., Huertos, M. L., Stemler, G., & Shennan, C. (2006).
Differential gauging and tracer tests resolve seepage fluxes in a strongly-losing stream. Journal of Hydrology. https://doi.org/10.1016/j.jhydrol.2006.03.025
Sala, O. E., Chapin, F. S., Armesto, J. J., Berlow (2000). Global biodiversity scenarios for the year 2100. Science. https://doi.org/10.1126/science.287.5459.1770
Sand-Jensen, K. (1989). Environmental variables and their effect on photosynthesis of aquatic plant communities. Aquatic Botany. https://doi.org/10.1016/0304-3770(89)90048-X
Seber, G. A. F., & Wild, C. J. (1989). Autocorrelated Errors. In Nonlinear Regression. https://doi.org/10.1002/0471725315.ch6
Sinokrot, B. A., & Stefan, H. G. (1993). Stream temperature dynamics: Measurements and modeling. Water Resources Research. https://doi.org/10.1029/93WR00540
Sinokrot, B. A., Stefan, H. G., McCormick, J. H., & Eaton, J. G. (1995). Modeling of climate change effects on stream temperatures and fish habitats below dams and near groundwater inputs. Climatic Change. https://doi.org/10.1007/BF01091841
FROM AIR TEMPERATURE. JAWRA Journal of the American Water Resources Association. https://doi.org/10.1111/j.1752-1688.1993.tb01502.x
St-Hilaire, A., Morin, G., El-Jabi, N., & Caissie, D. (2000). Water temperature modeling in a small forested stream: Implication of forest canopy and soil temperature. Canadian Journal of Civil Engineering. https://doi.org/10.1139/l00-021
Storey, R. G., Howard, K. W. F., & Williams, D. D. (2003). Factors controlling riffle-scale hyporheic exchange flows and their seasonal changes in a gaining stream: A three-dimensional groundwater flow model. Water Resources Research. https://doi.org/10.1029/2002WR001367
(2017). Influence of the sampling period and time resolution on the PM source apportionment: Study based on the high time-resolution data and long-term daily data. Atmospheric Environment.
https://doi.org/10.1016/j.atmosenv.2017.07.003
Vogt, T., Schneider, P., Hahn-Woernle, L., & Cirpka, O. A. (2010). Estimation of seepage rates in a losing stream by means of fiber-optic high-resolution vertical temperature profiling. Journal of Hydrology. https://doi.org/10.1016/j.jhydrol.2009.10.033
Webb, B. W., Clack, P. D., & Walling, D. E. (2003). Water-air temperature relationships in a Devon river system and the role of flow. Hydrological Processes. https://doi.org/10.1002/hyp.1280
Wehrly, K. E., Wang, L., & Mitro, M. (2007). Field-Based Estimates of Thermal Tolerance Limits for Trout: Incorporating Exposure Time and Temperature Fluctuation. Transactions of the American Fisheries Society. https://doi.org/10.1577/t06-163.1
Wetzel, R. G. (1960). Marl encrustation on hydrophytes in several Michigan lakes. Oikos, 11: 223-236.
Wojtalik, T. A., & Waters, T. F. (1970). Some Effects of Heated Water on the Drift of Two Species of Stream Invertebrates. Transactions of the American Fisheries Society. https://doi.org/10.1577/1548-8659(1970)99<782:seohwo>2.0.co;2
Woltemade, C. J., & Hawkins, T. W. (2016). Stream Temperature Impacts Because Of Changes In Air Temperature, Land Cover And Stream Discharge: Navarro River Watershed, California, USA. River Research and Applications 32:2020-2031. https://doi.org/10.1002/rra.3043
Reduction on Fish Assemblages in Michigan Streams1, (October 2017). https://doi.org/10.1111/j.1752-1688.2012.00656.x
Zorn, T. G., Seelbach, P. W., & Wiley, M. J. (2004). Utility of Species-Regression Models for Prediction of Fish Assemb Peninsula. Michigan Department of Natural Resources, Fisheries Research Report 2072, Ann Arbor, Michigan.
http://www.michigandnr.com/PUBLICATIONS/PDFS/ifr/ifrlibra/Research/reports/2072rr.pdf

CHAPTER 2: THE EFFECT OF STREAM THERMAL CLASSIFICATION AND DATA POOLING ON TEMPERATURE GRADIENT MODELING

INTRODUCTION

A Challenge in Stream Management: Limited Data

Data availability is critically important for environmental studies. The availability and integrity of environmental data determine the outcomes of environmental studies and ultimately influence decisions about environmental problems. Data limitation is a global problem and can result from many factors, such as limited time, intensive labor needs and high costs (Niemczynowicz 1999; Tavares Wahren et al. 2016). Although the reasons for data limitation vary case by case, the need to make environmental predictions with limited data is common. In some cases, reducing the number of data collection sites by selecting reference sites (e.g., McManamay et al. 2018) can be a reasonable way to reduce the expense of data collection. Data collection sites are usually determined by a set of key environmental characteristics that vary between environments and are commonly used to classify them. For example, hydrological (e.g., thermal) and ecological (e.g., species diversity) characteristics of streams are used for stream classification, and they help identify reference data collection sites that represent a broader group of streams (Zorn et al. 2008; Leathwick et al. 2011; Maheu et al. 2016). Stream classification has therefore been an effective tool for reducing the costs of data collection and an important topic in environmental sciences. Moreover, key stream characteristics (e.g., discharge change) help researchers gain deeper insight into, and make better predictions of, other environmental variables (e.g., groundwater inflow or outflow) for which data collection might be challenging.
Because of its importance to data collection practices and needs, I will primarily focus on stream classification in this study and on its use for reducing the need for extensive data collection. The consequences of those applications will also be examined. A detailed analysis and interpretation of the outcomes of stream classification and its application to linear regression models will be the main theme, as no previous study has been dedicated to it. Although Zorn et al. (2004, 2008) and Andrews (2019) considered the consequences of stream classification for stream temperature and temperature gradient modeling, some related questions remain open. For example, linear regression models have not been generalized and applied based on stream classes. Before explaining possible applications of stream classes to linear regression models and their possible results, I will touch on some applications of stream classification in the United States and in Michigan.

Recent History of Stream Classification

Classification of streams has been a useful tool in many aspects of stream management (Tadaki et al. 2014), and many different approaches, based on various stream characteristics, have been adopted. The U.S. Environmental Protection Agency (EPA), for example, uses average water depth, surface area, water velocity and sediment type to classify streams and rivers in the United States (Rosgen stream classification; Rosgen 1994, 1996). Stream temperature has been considered as another classification criterion since water temperature is an important water quality criterion and can help decision-makers monitor anthropogenic effects. For example, Maheu et al. (2016) characterized the thermal regime of streams by describing the patterns in water temperature variability at a national scale.
They used annual mean stream temperatures that were obtained from daily mean stream temperatures at 79 sites. They also included annual and diel water temperature variability in their classification by using other environmental variables such as air temperature. Based on these inputs, they developed six stream thermal classes: highly variable cool, variable cold, variable cool, variable warm, stable cold and stable cool. They then mapped streams nation-wide by thermal class (Figure 2.1). In addition to such wide-scale classification, researchers have classified streams at smaller scales since local environmental variables can also be critical.

Figure 2.1. Classification of streams and rivers at a national scale based on annual stream regimes (from Maheu et al. 2016).

In addition to nation-wide efforts, stream classification approaches have been implemented at a state-wide scale (Kendy et al. 2012). Michigan is one of the states where stream classification is a well-studied topic, going back to the late 1990s. Seelbach et al. (1997) developed and used a landscape-based classification model to classify river valley segments in lower Michigan based on their ecological features, such as catchment size, water temperature, hydrology and fish assemblages. Several years later, Brenden et al. (2008) further refined the initial classification system. In addition, considering stream temperature as one of the main factors for fish habitat preference, Wehrly et al. (2003) classified streams into three classes (cold, cool and warm) by using July mean temperature (JMT) data from 171 sites in Michigan. By referencing the classification approaches in previous studies (Seelbach et al. 1997; Zorn et al. 2002; Wang et al. 2003; Wehrly et al. 2003; Baker 2006; Seelbach et al. 2006; Brenden et al. 2008), Zorn et al.
(2008) developed a model to evaluate the effect of flow reduction on stream fish assemblages in Michigan. In that study, stream thermal classes were defined based on July mean temperature (JMT): cold (C): JMT ≤ 17.5 °C (63.5 °F); cold-transitional (CT): 17.5 °C (63.5 °F) < JMT ≤ 19.5 °C (67 °F); warm-transitional (WT): 19.5 °C (67 °F) < JMT ≤ 21.0 °C (70 °F); warm (W): JMT > 21.0 °C (70 °F). These classes were applied to make predictions by using the Water Withdrawal Assessment Tool (WWAT) and are the current basis for classification under Michigan legislation.

In previous research, Andrews (2019) developed a suite of regression models to predict thermodynamics in streams. However, those regression models have not been evaluated within a stream classification framework. In this study, I adopted the best-performing linear regression model among the Andrews (2019) models and applied data pooling to determine whether these models could be generalized across thermal stream classes. My study is important because its findings can lead to new perspectives in stream classification and stream temperature modeling and can be implemented in state-wide stream management processes.

Data Pooling, Model Generalization and Stream Management Practices

[...] stream-specific temperature gradient predictions. In other words, the model dynamics changed from stream to stream since data from individual streams were used in parameter estimation. Hypothetically, pooling the data from streams within the same thermal class could result in more generalized models. If these generalized class-based models (e.g., the Cold stream class model) were applied to predict the temperature gradient for an individual stream, the predictions would reflect the overall stream characteristics of that stream class (e.g., the Cold stream class).
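Pooling by thermal class presupposes that each stream has already been assigned a class. As a minimal sketch, assuming the class boundaries are the JMT thresholds of Zorn et al. (2008) with inclusive upper bounds (JMT ≤ 17.5 °C is cold, and so on), the assignment could be written in R as follows. The function name `classify_thermal` and the example JMT values are hypothetical and are not taken from the thesis or from the WWAT.

```r
# Hypothetical helper: assign a thermal class from July mean temperature (JMT, °C).
# Thresholds follow the class definitions quoted from Zorn et al. (2008);
# the inclusive upper bounds are an assumption of this sketch.
classify_thermal <- function(jmt_c) {
  cut(jmt_c,
      breaks = c(-Inf, 17.5, 19.5, 21.0, Inf),
      labels = c("C", "CT", "WT", "W"),
      right  = TRUE)  # right-closed intervals: (-Inf, 17.5], (17.5, 19.5], ...
}

# Illustrative JMT values only (not measured data)
example_jmt <- c(16.2, 17.5, 18.9, 20.4, 22.1)
classes <- classify_thermal(example_jmt)
print(data.frame(jmt_c = example_jmt, thermal_class = classes))
```

Because `cut()` is called with `right = TRUE`, a stream with a JMT of exactly 17.5 °C falls in class C and one at exactly 21.0 °C falls in class WT.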
If class-based temperature gradient predictions are realistic, these generalized models would be useful for numerous management purposes. The most practical use would be to reduce the need for extensive data from individual streams: class-based models built from a set of representative streams could be used to make temperature gradient predictions for other streams with limited data. For example, reliable temperature gradient predictions can be made based on common behavioral characteristics of the streams within a stream class (Tadaki et al. 2014). Also, real-time predictions of response variables can be achieved without collecting individual stream data beforehand, by retrieving instantaneous data on predictor variables from various data sources (e.g., GIS and weather station data). Future predictions of response variables can also be made by applying hypothetical data for different scenarios. For example, future fish population distributions under stream temperature changes can be predicted by using hypothetical data that reflect different climate change scenarios (e.g., Lyons et al. 2010).

In this chapter [...] predictions and observed temperature gradient values, as well as the consistency of trends of observed and predicted thermal gradient across time. Naturally, potential uses of generalized regression models depend on their performance, particularly on their precision and potential bias. Therefore, I evaluated the overall performance of class-specific models, as well as the most general model. Additionally, evaluating model performance across stream classes gives valuable information on which stream classes can be most accurately modeled. Furthermore, I evaluated the performance of generalized models when July-restricted data were used to develop them, since time period selection was an important factor affecting model performance (see Chapter 1).
All these considerations shaped the main goals of my study, which are:
1) To apply data pooling (with June-October 2016 data) based on stream thermal classes (C, CT, WT, W) to obtain generalized models;
2) To investigate the changes of model dynamics across data pooling;
3) To evaluate overall model performance of stream-specific and generalized models and evaluate their success across stream classes;
4) To evaluate overall model performance of stream-specific and generalized models by applying July-restricted data.

METHODS

Study Site and Data Collection

The same study streams and data collection methods as in Chapter 1 were used for this chapter. Moreover, the same refined and revised stream datasets and regression models that were defined in Chapter 1 were used.

Stream Classification and Model Performance

Streams were classified based on July mean temperature (JMT) predictions as described in Zorn et al. (2008). I used daily data granularity because it resulted in generally high model predictive power for Model 10 (see Chapter 1) and because daily data granularity was used in the WWAT (Zorn et al. 2008). Although overall model predictive power was highest with weekly data granularity, I did not use it in this chapter, to avoid overfitting, especially with July-restricted datasets (see Chapter 1). June-October (June to October 2016) and July-restricted (July 2016) time periods were used to evaluate the effect of stream classification on model performance for each period. To evaluate model performance, model prediction reliability and model predictive power were examined. Model prediction reliability was evaluated based on bias (B) for individual streams and mean bias ($\overline{|B|}$) values for the class as a whole.
The Pearson correlation coefficient (r) between observed and predicted temperature gradient was used to evaluate the consistency between observed and predicted values.

Obtaining and Evaluating Models

[...] Stream-Specific (SSM), Class-Based (CBM) and Global-Based (GBM) models. SSMs were obtained by applying the base model to data from individual streams, as in Chapter 1. CBMs for each stream class were obtained by pooling the data of streams within the same stream class and running the base regression model on the pooled data. Hypothetically, the dynamics (i.e., the intercepts and parameter estimates) of the base model would be expected to vary between stream classes since each class had different environmental characteristics and data; therefore, the outputs from the CBMs were expected to differ. Compared to SSMs, CBMs were more generalized models since the dynamics of the base model were determined by sets of streams in the same stream class. The datasets of all streams were limited to the span from day of the year 177 to 270 to ensure all streams were equally represented. The Global-Based Model was obtained by pooling the data from all streams and applying the base regression model to the pooled data. Like SSMs and CBMs, the GBM was expected to have unique values of intercept and parameter estimates. Temperature gradient predictions were obtained using the GBM for each stream. Since the data of all streams were pooled, the GBM was the most generalized model. As the CBMs and the GBM are more broadly applicable than the [...]

After obtaining the temperature gradient predictions from each model by using June-October data, I obtained the Pearson correlation (r) between observed and predicted temperature gradient for each stream.
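The three levels of pooling described above can be sketched in R on simulated data. This is an illustration under stated assumptions, not the thesis workflow: the data frame `streams`, its column names, the stream labels S1 to S4 and the two-predictor formula are all hypothetical stand-ins for the real datasets and for Model 10. The sketch fits one model per stream (SSM), one per thermal class (CBM) and one on all data (GBM), then computes the per-stream Pearson r for each model and the per-stream bias of the GBM (mean predicted minus mean observed, as in Eqn. 16).

```r
set.seed(42)  # simulated data only; none of these numbers come from the thesis

# Hypothetical daily records: four streams in two thermal classes
n <- 60
streams <- data.frame(
  stream        = rep(c("S1", "S2", "S3", "S4"), each = n),
  thermal_class = rep(c("C", "C", "W", "W"), each = n),
  air_minus_up  = rnorm(4 * n, mean = 3, sd = 2),
  up_discharge  = runif(4 * n, min = 0.5, max = 3),
  stringsAsFactors = FALSE
)
# Stream-specific coefficients, so that pooling discards real differences
b_air <- c(S1 = 0.10, S2 = 0.15, S3 = 0.30, S4 = 0.40)[streams$stream]
b_q   <- c(S1 = -0.30, S2 = 0.20, S3 = -0.10, S4 = 0.50)[streams$stream]
streams$delta_temp <- unname(b_air * streams$air_minus_up +
                             b_q * streams$up_discharge +
                             rnorm(4 * n, sd = 0.2))

# Two-predictor stand-in for the base model (the thesis uses Model 10)
base_formula <- delta_temp ~ air_minus_up + up_discharge

fit <- function(d) lm(base_formula, data = d)
gbm <- fit(streams)                                        # all streams pooled
cbm <- lapply(split(streams, streams$thermal_class), fit)  # pooled within class
ssm <- lapply(split(streams, streams$stream), fit)         # one model per stream

# Per-stream Pearson r between observed and predicted values, plus GBM bias
evaluate_stream <- function(d) {
  pred_gbm <- predict(gbm, newdata = d)
  c(r_SSM    = cor(d$delta_temp, predict(ssm[[d$stream[1]]], newdata = d)),
    r_CBM    = cor(d$delta_temp, predict(cbm[[d$thermal_class[1]]], newdata = d)),
    r_GBM    = cor(d$delta_temp, pred_gbm),
    bias_GBM = mean(pred_gbm) - mean(d$delta_temp))
}
results <- t(sapply(split(streams, streams$stream), evaluate_stream))
print(round(results, 3))

# The pooled fit is unbiased over all data, but not stream by stream
overall_gbm_bias <- mean(fitted(gbm)) - mean(streams$delta_temp)
print(overall_gbm_bias)
```

In this setting `r_SSM` cannot fall below `r_CBM` or `r_GBM` for the same stream, because an in-sample least-squares fit maximizes the correlation between observed and fitted values over all linear combinations of the predictors. The overall GBM bias is zero up to floating-point error, while the stream-level GBM biases generally are not.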
Moreover, to find the amount of bias between observed and predicted temperature gradient for each model, I obtained the mean observed and mean predicted temperature gradient for each stream and used the equation:

Eqn. 16. $B = \overline{\Delta T}_{\mathrm{pred}} - \overline{\Delta T}_{\mathrm{obs}}$

where $B$ stands for bias, $\overline{\Delta T}_{\mathrm{pred}}$ (°C) stands for mean predicted temperature gradient, and $\overline{\Delta T}_{\mathrm{obs}}$ (°C) stands for mean observed temperature gradient. The overall bias between observed and predicted temperature gradient values would be expected to be zero, as the sum of residuals (the differences between observed and predicted values) is zero in linear regression. The bias, $B$, calculated here indicates the magnitude of deviation that occurs for subsets of data, which is not guaranteed to be zero for linear regression with subgroups. I calculated the mean absolute value of the stream-specific bias for each stream class as:

Eqn. 17. $\overline{|B|} = \frac{1}{n} \sum_{i=1}^{n} |B_i|$

where $\overline{|B|}$ is the mean absolute value of bias, $B_i$ is the bias for stream $i$, and $n$ is the number of streams in the thermal class. The absolute difference between mean predicted and mean observed temperature gradient was used to observe the magnitude of deviation between these values.

In addition to evaluating model performance across stream classes, I also explored how model performance varied with mean observed downstream temperature and mean observed temperature gradient within each stream. The effect of downstream temperature was explored because this provided a direct contrast to the stream thermal classification, which is based on predictions of the 30-year mean July mean temperature for a stream from Brenden et al. (2008). I also explored model performance [...] observed temperature gradient to determine whether generalized models performed equally across the range of temperature gradients observed.

RESULTS

Pooling Data Changed Model Dynamics and Model Outcomes

Stream-Specific models (SSMs) were obtained by applying Model 10 to the individual dataset of each stream.
Substantial variation in the intercepts and parameter estimates was observed across the stream-specific models (Table 2.1). For example, the intercept of the SSM for Black River was 0.004, whereas that for Hasler Creek was 2.638.

Table 2.1. Intercepts and parameter estimates from Stream-Specific models (SSMs) applied to each stream for June-October hydrological data.

| Stream | Intercept | Ta - Tw | Qup | Qdown - Qup | S | Altitude angle | T_up | T_base | T_over |
| Black River | 0.004 | -0.380 | -0.872 | 0.002 | 0.004 | -0.037 | -0.060 | -0.013 | -0.026 |
| Cedar River | 0.982 | -0.665 | 2.810 | -0.154 | 0.005 | -0.011 | 0.097 | -0.029 | 0.015 |
| Cedar Creek | -3.800 | -0.044 | 0.102 | -0.041 | -0.001 | 0.009 | 0.004 | -0.012 | 0.001 |
| Morgan C. | -0.258 | 7.516 | -0.225 | 0.155 | -0.012 | -0.015 | -0.231 | 0.186 | -0.027 |
| Pokagon C. | -2.326 | -1.105 | -1.405 | 0.072 | -0.014 | 0.027 | 0.043 | -0.105 | 0.048 |
| Butterfield C. | 1.690 | -0.323 | -0.343 | 0.309 | 0.053 | -0.016 | 0.020 | -0.006 | -0.003 |
| Carp River | 0.500 | 28.338 | -9.301 | -0.038 | 0.018 | -5.671 | 0.133 | 0.056 | -0.004 |
| Pigeon River | -3.953 | 1.841 | 5.079 | 0.009 | -0.020 | 0.018 | -0.038 | 0.015 | 0.013 |
| Spring Creek | 1.284 | 2.412 | 1.146 | 0.147 | -0.020 | 0.002 | 0.060 | -0.094 | 0.038 |
| Escanaba R. | -5.148 | 1.626 | -1.804 | 0.352 | -0.066 | 0.078 | -0.130 | 0.123 | -0.033 |
| Nottawa C. | -4.156 | -0.436 | -0.050 | 0.245 | -0.060 | -0.017 | 0.064 | -0.120 | 0.068 |
| Tobacco R. | 1.092 | -1.513 | -2.864 | 0.584 | -0.077 | -0.018 | -0.472 | 0.370 | -0.340 |
| Hasler C. | 2.638 | 0.121 | -0.061 | -0.037 | -0.022 | -0.041 | 0.041 | 0.004 | 0.008 |
| Prairie River | -5.764 | -18.751 | 1.618 | 0.082 | -0.038 | 0.019 | 0.244 | -0.256 | 0.117 |
| Squaw Creek | 0.246 | -1.359 | -2.618 | 0.265 | 0.008 | -0.019 | 0.140 | -0.108 | 0.130 |
| Swan Creek | -2.335 | -2.630 | 0.366 | 0.000 | 0.001 | -0.023 | -0.088 | 0.008 | -0.060 |
| Average | -1.207 | 0.916 | -0.526 | 0.122 | -0.015 | -0.357 | -0.011 | 0.001 | 0.000 |

In addition, the intercept and parameter estimate values of the CBMs and the GBM were unique to each class-specific model and the global model (Table 2.2).
To illustrate, the intercept of the cold CBM was 0.479, whereas the intercepts of the other class-based models and the GBM ranged from -2.622 to 0.606. Also, parameter estimates of the same environmental variable (e.g., Q_up) changed sign across class-specific models (Table 2.2). For example, Q_up had a positive estimate in the cold CBM (0.236), whereas its estimate was negative in the warm-transitional CBM (-0.400). These variations between parameter estimates indicated potential conflicts in interpreting how environmental factors influence model predictions, as well as differences in the amount of variance explained.

Table 2.2. Parameter estimates of Class-Based and Global-Based models. June-October 2016 data were used.

Stream Class  Intercept  T_a - T_w  Q_up  Q_down  Q_up  S_up  base  over
C  0.479  0.030  0.236  -0.122  0.005  0.002  -0.010  -0.038  0.008
C-T  -0.042  -0.004  -0.015  -0.39  0.067  -0.019  0.069  -0.036  0.056
W-T  -2.622  0.032  -0.400  -0.875  0.230  -0.023  0.002  -0.020  -0.008
W  0.606  0.027  -2.713  -2.527  0.052  0.050  0.111  -0.051  0.105
Global  -2.096  0.072  0.101  -0.166  0.178  -0.041  0.012  -0.015  -0.002

Naturally, changes in model parameter estimates with data pooling resulted in changes in model predictions. The congruence between observed and predicted temperature gradient varied among streams within a class (Figures 2.10 to 2.13). Cedar Creek, Pigeon River, Escanaba River, and Prairie River were selected as example streams from each stream class because they had the highest overall mean r values (0.6482, 0.5436, 0.5961, and 0.5729, respectively) among all streams. SSMs generally fit the observed data better than CBMs and GBMs. For example, the predicted temperature gradient from the SSM of Cedar Creek followed the observed temperature gradient over time more closely than predictions from the CBM and GBM did (Figure 2.10).
Generalized models generally showed lower overall accuracy of temperature gradient predictions (Table 2.3) compared to SSMs. The mean bias ($\bar{B}$) values of SSMs were generally the lowest for all stream classes, and mean biases of GBMs were the highest for all classes (Figure 2.2). Moreover, overall mean bias values of GBMs were higher than mean bias values of CBMs. For example, the mean bias of the GBM for the Warm stream class (0.794) was almost five times the mean bias of the CBM (0.160) for the same stream class.

Table 2.3. Bias values (B) and their average ($\bar{B}$) of Stream-Specific model (SSM), Class-Based model (CBM), and Global-Based model (GBM) predictions. June-October 2016 data were used.

Stream Class  Stream  Bias (B) SSM  Bias (B) CBM  Bias (B) GBM
C  Black River  0.000  -0.118  -0.066
C  Cedar River  -0.008  0.083  0.117
C  Cedar Creek  0.004  -0.034  0.201
C  Morgan C.  -0.002  -0.031  -0.179
C  Pokagon C.  -0.028  0.100  0.342
C-T  Butterfield C.  -0.017  0.507  -0.887
C-T  Carp River  -0.006  0.044  -0.224
C-T  Pigeon River  0.004  0.074  -0.406
C-T  Spring Creek  -0.057  0.202  0.037
W-T  Escanaba R.  0.051  -0.037  0.296
W-T  Nottawa C.  0.017  0.007  -0.168
W-T  Tobacco R.  -0.015  0.030  0.361
W  Hasler Creek  -0.021  -0.225  -1.223
W  Prairie River  0.027  -0.091  -0.489
W  Squaw Creek  -0.031  0.213  1.170
W  Swan Creek  0.070  0.110  0.293

Stream Class  Mean Bias SSM  Mean Bias CBM  Mean Bias GBM
C  0.008  0.073  0.181
C-T  0.021  0.160  0.388
W-T  0.028  0.025  0.275
W  0.037  0.160  0.794

Figure 2.2. The absolute value of biases averaged for each stream class. The higher the mean absolute bias, the more the overall mean predicted temperature gradient deviates from the overall mean observed temperature gradient.

Based on mean r values, SSMs had distinctly higher model predictive power compared to CBMs and GBMs (Table 2.4; Figure 2.14).
Moreover, CBMs had higher model predictive power compared to GBMs, supporting the conclusion that model prediction reliability decreases as generalization of models increases (i.e., from SSMs to GBMs).

Table 2.4. Mean Pearson correlation coefficient (r) values of SSM, CBM, and GBM across stream classes. June-October 2016 data were used.

Stream Class  SSM  CBM  GBM
Cold  0.691  0.212  0.333
Cold-Transitional  0.618  0.106  0.059
Warm-Transitional  0.699  0.584  0.163
Warm  0.796  0.472  0.191

The Cold stream class generally had the lowest mean biases for all models, whereas the Warm stream class generally had the highest mean biases (Table 2.3; Figure 2.2). However, streams in the Cold-Transitional stream class showed the lowest mean r values in all models (Figure 2.14). In contrast, warmer streams (i.e., Warm and Warm-Transitional classes) had higher mean r values in the majority of models.

Stream classifications used to this point were based on Zorn et al. (2008), which uses model-based predictions for each stream. As such, there is a potential mismatch between predicted stream class membership and the observed mean stream temperatures for my study streams between June and October in a single year (2016). These differences were apparent for several streams (Table 2.7), which led me to evaluate model performances as a function of mean downstream temperature. Model prediction power values showed no clear relation to mean downstream temperatures (Figure 2.3). In other words, model prediction power did not substantially change with increasing or decreasing stream temperatures. Likewise, bias (B) did not show a trend across mean downstream temperatures (Figure 2.4).

Figure 2.3. Pearson correlation coefficient (r) values of SSM, CBM, and GBM across mean downstream temperatures from June-October 2016.

Figure 2.4. Bias (B) versus mean downstream temperature. June-October 2016 data were used.
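As a worked check on the bias values reported in Table 2.3, the sketch below applies the mean absolute bias of Eqn. 17 to the Cold class values copied from that table, and applies Eqn. 16 to the Cedar River SSM means listed in Table 2.6. The function and variable names are illustrative assumptions, not the original analysis code, and the sign convention (predicted minus observed) is assumed from the order in which the terms of Eqn. 16 are defined.

```python
# Stream-level bias values (B) for the Cold class, copied from Table 2.3
# (Black River, Cedar River, Cedar Creek, Morgan C., Pokagon C.).
cold_bias = {
    "SSM": [0.000, -0.008, 0.004, -0.002, -0.028],
    "CBM": [-0.118, 0.083, -0.034, -0.031, 0.100],
    "GBM": [-0.066, 0.117, 0.201, -0.179, 0.342],
}

def bias(mean_predicted, mean_observed):
    """Eqn. 16 (assumed sign convention): mean predicted minus mean observed."""
    return mean_predicted - mean_observed

def mean_absolute_bias(biases):
    """Eqn. 17: average of |B_i| over the n streams in a thermal class."""
    return sum(abs(b) for b in biases) / len(biases)

# Class-level mean absolute bias for each model type.
cold_mean_abs = {m: round(mean_absolute_bias(v), 3) for m, v in cold_bias.items()}
print(cold_mean_abs)

# Cedar River, SSM: mean predicted 0.476 °C, mean observed 0.484 °C (Table 2.6).
cedar_river_ssm = round(bias(0.476, 0.484), 3)
print(cedar_river_ssm)
```

The class-level results reproduce the Cold row of the mean bias columns in Table 2.3 (0.008, 0.073, and 0.181). Because Eqn. 17 averages absolute values, the result does not depend on the sign convention chosen for B.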
[Figure 2.3 plot: correlation coefficient (r) versus mean downstream temperature (°C) for SSM, CBM, and GBM, with linear trend lines.]
[Figure 2.4 plot: bias versus mean downstream temperature (°C) for SSM, CBM, and GBM, with linear trend lines.]

I also evaluated the relationship of model performance to mean observed temperature gradient to determine whether streams that show more or less warming are modeled more accurately. Correlation (r) between generalized model predictions and observed temperature gradient increased with mean observed temperature gradient (Figure 2.5). The predictive power of GBMs in particular showed a considerable increase (from negative values of r to values of 0.6), indicating that generalized models predicted the trends of temperature gradient more accurately for warming stream reaches. On the other hand, the highest bias values for generalized models were observed at the high and low ends of the range of temperature gradient values observed (Figure 2.6). In other words, large temperature changes between upstream and downstream resulted in greater inaccuracies in model predictions.

Figure 2.5. Pearson correlation coefficient (r) values of SSM, CBM, and GBM across mean observed temperature gradient from June-October 2016.
[Figure 2.5 plot: correlation coefficient (r) versus mean observed temperature gradient (°C) for individual, class-based, and global models, with linear trend lines.]

Figure 2.6. Bias (B) versus mean observed temperature gradient. June-October 2016 data were used.
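Because the results above rely on both the Pearson correlation coefficient (r) and bias (B), it may help to see that the two measures capture different things. The following is a minimal sketch on hypothetical daily temperature gradients, not data from this study: one prediction tracks the observed trend exactly but is offset by 0.5 °C, and the other has the correct mean but the opposite trend. All names and values are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def bias(predicted, observed):
    """Mean predicted minus mean observed (sign convention assumed)."""
    return mean(predicted) - mean(observed)

# Hypothetical observed daily temperature gradients (°C).
observed = [0.1, 0.4, 0.3, 0.8, 0.6]

# Prediction with the right trend but a constant 0.5 °C offset.
offset_pred = [v + 0.5 for v in observed]

# Prediction with the right mean but a mirrored (opposite) trend.
center = mean(observed)
mirrored_pred = [2 * center - v for v in observed]

for label, pred in (("offset", offset_pred), ("mirrored", mirrored_pred)):
    print(label, round(pearson_r(observed, pred), 3), round(bias(pred, observed), 3))
```

The offset prediction has a perfect correlation yet a bias of 0.5 °C, whereas the mirrored prediction has essentially zero bias and a correlation of -1. A model can therefore rank well on one measure and poorly on the other, which is why both are reported.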
Classifying Streams Reduced Overall Model Performance with July-Restricted Data

With July-restricted data, the predictive power of SSMs was still higher than that of CBMs and GBMs (Table 2.5; Figure 2.15). Although the predictive power of SSMs was substantially higher for all stream classes, neither CBMs nor GBMs had distinctly higher predictive power for any particular stream class when July-restricted data were used. Thus, using July-restricted data did not increase model predictive power over data from the full summer season for either of the generalized models.

Table 2.5. Mean Pearson correlation coefficient (r) values of SSM, CBM, and GBM across stream classes. July 2016 data were used.

Stream Class  SSM  CBM  GBM
Cold  0.796  0.014  0.165
Cold-Transitional  0.831  0.252  0.330
Warm-Transitional  0.734  0.340  -0.031
Warm  0.861  0.324  0.226

[Figure 2.6 plot: bias versus mean observed temperature gradient (°C) for SSM, CBM, and GBM, with linear trend lines.]

Mean model prediction power of Class-Based models with June-October and July-restricted data were compared to understand whether using July-restricted data would make Class-Based models work better. Surprisingly, CBMs performed better when June-October data were used in most cases, except for the Cold-Transitional stream class (Figure 2.7). Moreover, results showed that there was no substantial change in model predictive power across July mean downstream temperature (Figure 2.8). Nevertheless, the model prediction power of CBMs and GBMs increased with higher mean temperature gradient values (Figure 2.9).

Figure 2.7. Mean Pearson correlation coefficient (r) of Class-Based Models with June-October 2016 data and July 2016 data.
[Figure 2.7 plot: mean correlation coefficient (r) by thermal class (Cold, Cold-Transitional, Warm-Transitional, Warm) for CBM (June-October) and CBM (July).]

Figure 2.8. Pearson correlation coefficient (r) values of SSM, CBM,
GBM across mean downstream temperatures from July 2016.
[Figure 2.8 plot: correlation coefficient (r) versus mean downstream temperature (°C) for SSM, CBM, and GBM, with linear trend lines.]

Figure 2.9. Pearson correlation coefficient (r) values of SSM, CBM, and GBM across mean observed temperature gradient from July 2016.
[Figure 2.9 plot: correlation coefficient (r) versus mean observed temperature gradient (°C) for individual, class-based, and global models, with linear trend lines.]

DISCUSSION

Although the Michigan Department of Natural Resources has applied stream classification with physical models for many years (Zorn et al. 2008), the effects of data pooling on linear regression models have not been tested. Therefore, applying data pooling to the regression models that were designed by Andrews (2019) provided insight into my four main questions: What is the effect of data pooling on model dynamics? Does stream classification improve model performance? Do models work better for warm or cold streams? Does using July-restricted data change model performance? The answers to these questions are intended to help researchers and managers select proper models for their particular needs and to determine whether adequate model predictions can be made without collecting extensive and expensive stream data (Carlson et al. 2017).

What is the Effect of Data Pooling on Model Dynamics?

The results showed that data pooling based on stream classes resulted in different parameter estimates of Model 10, indicating substantial changes in model dynamics or highly variable parameter estimates as data were broken into subsets (Table 2.1 and Table 2.2). Therefore, data pooling resulted in considerable variation in temperature gradient predictions, leading to substantial differences in model performance (i.e., biases and model predictive powers) between models.
In essence, using generalized regression models (i.e., CBMs and GBMs) reduces the explanatory power of models, especially for streams that have unique environmental conditions (Carlson et al. 2017). Therefore, I hypothesized that generalizing the regression model by using the global stream data (across Michigan) may increase the magnitude of biases in model predictions and reduce overall model predictive power. This hypothesis was tested by evaluating performances of stream-specific and generalized models, and results are discussed in the following section.

Does Stream Classification Improve Model Performance?

I used two approaches to evaluate model performances: the mean bias ($\bar{B}$) between observed and predicted temperature gradient (Eqn. 16), and the Pearson correlation coefficient (r) as an indicator of model prediction power. One of the key findings was that generalized models had higher bias values than SSMs (Figure 2.2). Thus, Class-Based and Global-Based models had lower model prediction reliability. Higher $\bar{B}$ values of CBMs and GBMs supported my hypothesis that data pooling would result in less accurate predictions, especially for the streams that had distinct environmental characteristics. It is important to note that although CBMs did not produce predictions that were as reliable as SSM predictions, they were generally more reliable than GBM predictions.

Model predictive power based on correlation values (r) was another indicator of model performance. Overall model predictive power of SSMs was distinctly higher for the majority of streams and stream classes (Table 2.4). This finding matched the bias findings, as bias values ($\bar{B}$) of SSMs were the lowest for most streams. Therefore, using SSMs would give better temperature gradient predictions and better estimation of temperature gradient trends. In general, predictions from CBMs had higher correlation with observed data than predictions from GBMs.
This may be caused by greater similarity between the environmental conditions of streams that are grouped in the same class. Therefore, the temperature gradient trends were predicted better by CBMs. However, the low sample size of streams within each stream class might be a constraint on developing reliable CBMs and, consequently, may be partly a cause of lower model predictive power for these models. Thus, if the number of streams used to obtain CBMs is increased, model predictive power may be improved.

Both mean bias ($\bar{B}$) and correlation (r) revealed limitations of implementing CBMs and GBMs, but how can these limitations be considered from the perspective of stream management? The bias results implied that although CBMs had lower model prediction reliability than SSMs, they still have the potential to be used. The mean bias of CBMs ranged from 0.025 to 0.160 (Table 2.3) across stream classes. In other words, the difference between average predicted and observed temperature gradient was less than 0.2 °C across stream classes. From an ecological perspective, such a difference may be negligible since some salmonid species (e.g., brown trout: Salmo trutta) have the ability to acclimate to a temperature of 27 to 30 °C within 24 hours (Brett 1956; Sullivan et al. 2000). In addition, daily water temperature changes up to 13.5 °C did not substantially affect the survival and growth of salmonids, unless lethal temperature levels were reached (Thomas et al. 1986).
Considering these tolerance ranges, the mean bias values of CBMs may be acceptable, depending on the focus of the study (e.g., the characteristics of fish species) as well as the availability of physical and financial resources for stream data collection. If a study focuses on temperature gradient predictions for multiple streams that are distributed within a small spatial range, adopting stream-specific models may be most appropriate, as the range of temperature gradient values may be quite narrow. On the other hand, generalized models may be more useful in studies that require modeling for streams within a very large spatial range (e.g., state-wide; Stewart et al. 2015) that have a wider range of conditions and where the impact of bias would be less. At this point, the efficiency of using generalized models must be evaluated by researchers and decision-makers based on their purpose, the range of bias that is acceptable, and the resources available for data collection.

Do Models Work Better for Warm or Cold Streams?

The Cold stream class had the lowest overall bias, indicating that model prediction reliability was relatively higher for the Cold stream class. Moreover, the Warm stream class had the highest overall bias, thereby yielding models with the lowest model prediction reliability. Interestingly, the correlation between observed and predicted temperature gradient of CBMs was highest for the warmer stream classes (Warm and Warm-Transitional). This apparent conflict highlights the difference between predictions that correlate with temporal trends in temperature gradient and predictions that are offset from the observed data, leading to bias.

A potential limitation for making conclusions based on model performance across stream thermal classes is that the observed stream temperatures for the time period I used (June-October 2016) did not always match a priori stream classes.
For example, Morgan Creek should have been included in the Warm-Transitional stream class based on its observed mean July downstream temperature (17.550; Table 2.7). Likewise, Butterfield, Spring, Hasler, Squaw, and Swan creeks and Carp, Pigeon, Escanaba, Tobacco, and Prairie rivers would fall into different stream classes based on their mean July downstream temperatures. Therefore, to cross-validate my findings on model performance and stream class and to make more reliable conclusions on model performance versus stream temperatures, I tested model performances across mean downstream temperatures as another criterion. The distribution of r values of models showed no clear relation to mean downstream temperature (Figure 2.3). In addition, bias did not show a substantial increase or decrease with increasing mean downstream temperature (Figure 2.4). Therefore, it appears that these models are equally applicable to cooler or warmer streams. This conclusion should be tempered, however, by the narrow range in mean temperature (15 °C to 18 °C) among my study streams. Given the low diversity of thermal characteristics of the streams studied, it is unknown whether model generalization approaches would work better across a broader range of thermal characteristics.

The response of bias to temperature gradient, which was another thermal criterion, varied between models (Figure 2.6). Generalized models (CBMs and GBMs) had higher biases than SSMs for streams that showed the highest and lowest mean temperature gradient values. A potential reason is that generalized models can result in biases, especially when a stream has unique hydrological characteristics, such as complex groundwater-surface water interactions.
For example, using a generalized model for a stream section with a high degree of groundwater loss (e.g., positive temperature gradient) or gain (e.g., negative temperature gradient) may result in overestimation of temperature gradient for gaining streams and underestimation for losing streams.

In addition to B, I used the Pearson correlation coefficient (r) to observe the model predictive power across mean temperature gradient. The results showed that the predictive power of generalized models substantially increased with mean observed temperature gradient; that is, generalized models worked better for warming streams (Figure 2.5). This result was important because it may be a sign of reduced performance related to the amount of groundwater input in the system. As mentioned before, cooling streams may be considered groundwater-gaining streams. As such, the predictive power for warming streams may be better because they lack complex groundwater-surface water interactions.

Evaluating model performance across temperature gradient also indicated that other environmental processes (e.g., stream shading, discharge, groundwater) that lead to heat gain or loss of the streams may be more important considerations than the observed temperature at a point in the stream (Webb and Zhang 1997; Dugdale et al. 2018). Although I did not observe a clear relationship between model performance and stream class or mean downstream temperature, it appears that generalized models perform more poorly for streams with extreme temperature gradients. Considering that extreme temperature gradients tend to be observed in streams that are highly altered by human activity (e.g., surface or groundwater withdrawal; Xin and Kinouchi 2013) or in streams that might have complex groundwater and surface water dynamics (Westhoff et al. 2007), I recommend that managers use Stream-Specific models instead of generalized models to obtain reliable temperature gradient predictions.
Cooling streams should be of particular concern since the cooling trend generally indicates a groundwater-driven stream, for which models had lower performance. Poor decisions on groundwater withdrawal based on poor model predictions could severely affect dynamics in groundwater-driven streams as well as their biota (Boulton et al. 2010; Carlson et al. 2019).

Does Using July-Restricted Data Change Model Performance?

Predictive powers of CBMs and GBMs with July-restricted data were lower than model predictive powers with June-October data (Table 2.5; Figure 2.15). Evaluation of the predictive power of CBMs with July-restricted and June-October data supported this conclusion, except that prediction power increased for the Cold-Transitional stream class (Figure 2.7). Using shorter time periods, such as July, may increase the influence of temporal and spatial variation in hydrological events (e.g., groundwater flow, precipitation, snowmelt) across streams. For example, average monthly precipitation is typically the highest in June and July in the Great Lakes basin (Norton et al. 2019). High spatial variation of rainfall during July may cause larger variations between physical characteristics of streams, consequently reducing the performance of generalized models. The response of model prediction power across July mean observed temperature gradient matched previous results; that is, as mean observed temperature gradient increased, the predictive power of generalized models increased (Figure 2.9). Thus, restricting data to the warmest part of the year, which may be ecologically the most relevant, does not appear to improve model fits, particularly for sections of streams that show longitudinal cooling and that potentially have complex groundwater-surface water dynamics.
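The comparison behind Figure 2.7 can be retraced from the class-level CBM correlations already reported in Table 2.4 (June-October) and Table 2.5 (July). The short sketch below only subtracts those published values; the variable names are illustrative assumptions.

```python
# Mean Pearson r of Class-Based models by thermal class.
cbm_r_jun_oct = {  # Table 2.4, June-October 2016
    "Cold": 0.212,
    "Cold-Transitional": 0.106,
    "Warm-Transitional": 0.584,
    "Warm": 0.472,
}
cbm_r_july = {  # Table 2.5, July 2016
    "Cold": 0.014,
    "Cold-Transitional": 0.252,
    "Warm-Transitional": 0.340,
    "Warm": 0.324,
}

# Change in r when the data are restricted to July (July minus June-October).
change = {k: round(cbm_r_july[k] - cbm_r_jun_oct[k], 3) for k in cbm_r_jun_oct}

# Classes for which the July-restricted CBM correlated better with observations.
improved = [k for k, d in change.items() if d > 0]

print(change)
print(improved)
```

Only the Cold-Transitional class shows a positive change (0.146); the other three classes lose between 0.148 and 0.244 in mean r, consistent with the pattern described for Figure 2.7.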
Using July-restricted data did not significantly change the relation of model predictive power to July mean downstream temperature (Figure 2.8), however, implying that these models work equally well across observed mean downstream temperatures.

Based on these results, my main conclusion was that using a shorter time period put generalized models at an even greater disadvantage relative to SSMs. Because the ecological relevance of the time period selected for the purpose of the study should come first (as explained in Chapter 1) and other time period options are not applicable in most cases, improving the class-based models appears to be the most effective way to reduce the costs of data sampling and make better predictions. The ways to improve class-based models for the July-restricted time period are the same as for the full June-October period: increasing the number of streams used to develop the model and using more representative streams. Certainly, the optimum number of streams varies from case to case; however, the number of cooling (i.e., groundwater-driven) streams should be chosen carefully when developing a generalized model because of the high complexity and low predictability of these streams. In addition, Model 10, which was the base for CBMs, can potentially be improved by adding new parameters or modifying the existing parameters so that the model can deal with complex groundwater-surface water dynamics, be less sensitive to extreme temperature gradient values, and deal with variations between streams within the shorter time period.

CONCLUSIONS AND IMPLICATIONS

1) Stream classification is a useful approach for grouping streams based on their characteristics for many purposes, and it is especially important for decreasing the need for extensive data collection. Data pooling is an effective practice to create class-specific and global models.
Class-specific and global regression models had unique model dynamics; therefore, they resulted in different outcomes and showed different performances. Generalized models have the potential to make accurate predictions of response variables without the need for stream-specific data, as well as to predict future effects of an environmental change (e.g., groundwater withdrawal) on ecological characteristics of streams and stream classes.

2) Predictions from the Global-Based model showed the highest degree of bias and were not highly correlated with temporal trends in temperature gradient in individual streams. Thermal class-based models performed better than the Global-Based model, but had poorer performance compared to Stream-Specific models. Even though the streams were classified in the same thermal class, some showed distinct physical characteristics; thus, class-specific models did not work well for those streams. Another reason for the lower performance of class-specific models was the low number of representative streams that were used to create these models. Using a larger number of representative streams to create these models might increase model performance.

3) My study did not reveal any relationship between model performance and stream thermal class or mean downstream temperature. The performance of generalized models increased as temperature gradient increased, however, implying better predictive capacity in streams with less groundwater contribution. Therefore, I suggest that modifying the base model and data inputs to better represent groundwater-surface water interactions would be a starting point for developing better generalized models that can explain the influence of groundwater on thermal dynamics of streams and stream classes.

4) Restricting the time period to July decreased the overall performance of generalized models.
The reason for this is unclear, but high temporal and spatial variation in environmental phenomena in July (e.g., precipitation) could have accentuated the distinct physical features of streams, resulting in lower performance of generalized models for those streams. Therefore, using Stream-Specific models may be more useful in management practices. Nevertheless, although class-specific and global models had lower performance with July-restricted data, the ecological relevance of time period selection may be more important. Thus, improving generalized models would be more effective than using a time period that has lower ecological relevance to the purpose of the study.

APPENDIX

Table 2.6. Mean observed and predicted temperature gradient values of Stream-Specific model (SSM), Class-Based model (CBM), and Global-Based model (GBM) predictions. June-October 2016 data were used.

Stream Class  Stream  Mean Observed (°C)  SSM Mean Predicted (°C)  CBM Mean Predicted (°C)  GBM Mean Predicted (°C)
C  Black River  0.282  0.282  0.400  0.348
C  Cedar River  0.484  0.476  0.401  0.366
C  Cedar Creek  0.093  0.097  0.127  -0.108
C  Morgan C.  -0.469  -0.471  -0.437  -0.290
C  Pokagon C.  0.467  0.439  0.367  0.125
C-T  Butterfield C.  -0.943  -0.960  -0.622  -0.056
C-T  Carp River  0.000  -0.007  -0.045  0.224
C-T  Pigeon River  -0.381  -0.377  -0.456  0.025
C-T  Spring Creek  -0.121  -0.178  -0.323  -0.158
W-T  Escanaba R.  -0.042  0.009  -0.005  -0.338
W-T  Nottawa C.  -0.909  -0.891  -0.915  -0.740
W-T  Tobacco R.  0.616  0.601  0.586  0.255
W  Hasler Creek  -1.678  -1.698  -1.452  -0.455
W  Prairie River  0.449  0.477  0.540  -0.040
W  Squaw Creek  1.117  1.086  0.904  -0.053
W  Swan Creek  0.379  0.449  0.269  0.086
Average  -0.041  -0.042  -0.041  -0.051

Table 2.7. Mean downstream temperatures of streams with June-October and July-restricted data for year 2016. The stream classes are based on Zorn et al. (2008), cold (C): July Mean - - transitional: T > 21.0 °C.
Streams were assigned to their classes based on their mean JMT values from 30 years of data (Zorn et al. 2008).

Stream Class  Stream  Mean Downstream Temperature June-October 2016 (°C)  Mean Downstream Temperature July 2016 (°C)
Cold  Black River  15.362  16.813
Cold  Cedar River  13.248  14.943
Cold  Cedar Creek  14.369  15.896
Cold  Morgan Creek  17.550  20.191
Cold  Pokagon Creek  17.441  19.793
Cold-Transitional  Butterfield Creek  15.231  17.712
Cold-Transitional  Carp River  17.103  19.213
Cold-Transitional  Pigeon River  16.814  18.553
Cold-Transitional  Spring Creek  17.410  19.814
Warm-Transitional  Escanaba River  17.296  19.230
Warm-Transitional  Nottawa Creek  20.306  22.388
Warm-Transitional  Tobacco River  16.878  19.221
Warm  Hasler Creek  18.051  21.307
Warm  Prairie River  17.788  19.116
Warm  Squaw Creek  17.186  20.236
Warm  Swan Creek  19.529  21.578

Figure 2.10. Observed and predicted temperature gradient (°C) from Stream-Specific, Class-Based, and Global-Based models. Cedar Creek (cold) June-October 2016 data were used.
[Figure 2.10 plot: temperature gradient (°C) versus day of year (175 to 275); series: observed, stream-specific prediction, class-based prediction, global-based prediction.]

Figure 2.11. Observed and predicted temperature gradient (°C) from Stream-Specific, Class-Based, and Global-Based models. Tobacco River (cold-transitional) June-October 2016 data were used.
[Figure 2.11 plot: temperature gradient (°C) versus day of year (175 to 275); series: observed, stream-specific prediction, class-based prediction, global-based prediction.]

Figure 2.12. Observed and predicted temperature gradient (°C) from Stream-Specific, Class-Based, and Global-Based models. Escanaba River (warm-transitional) June-October 2016 data were used.
[Figure 2.12 plot: temperature gradient (°C) versus day of year (175 to 275); series: observed, stream-specific prediction, class-based prediction, global-based prediction.]

Figure 2.13. Observed and predicted temperature gradient (°C) from Stream-Specific, Class-Based, and Global-Based models.
Prairie River (warm) June-October 2016 data were used.
[Figure 2.13 plot: temperature gradient (°C) versus day of year (175 to 275); series: observed, stream-specific prediction, class-based prediction, global-based prediction.]

Figure 2.14. Average Pearson correlation coefficient (r) based on stream classes. June-October data were used.
[Figure 2.14 plot: mean correlation coefficient (r) by thermal class for SSM, CBM, and GBM.]

Figure 2.15. Mean Pearson correlation coefficient (r) values were averaged based on stream classes. July 2016 data were used.
[Figure 2.15 plot: mean correlation coefficient (r) by thermal class for SSM, CBM, and GBM.]

BIBLIOGRAPHY

Andrews, R. (2019). Effects of flow reduction on stream dynamics of streams: improving an - 0 - 0. M.S. Thesis, Michigan State University, East Lansing, MI.

Baker, E. A. (2006). A landscape-based ecological classification system for river valley segments in Michigan's Upper Peninsula. Michigan Department of Natural Resources, Fisheries Research Report 2085, Ann Arbor.

Boulton, A. J., Datry, T., Kasahara, T., Mutz, M., & Stanford, J. A. (2010). Ecology and management of the hyporheic zone: Stream-groundwater interactions of running waters and their floodplains. Journal of the North American Benthological Society. https://doi.org/10.1899/08-017.1

Brenden, T. O., Wang, L., & Seelbach, P. W. (2008). A River Valley Segment Classification of Michigan Streams Based on Fish and Physical Attributes. Transactions of the American Fisheries Society. https://doi.org/10.1577/T07-166.1

Brett, J. R. (1956). Some Principles in the Thermal Requirements of Fishes. The Quarterly Review of Biology. https://doi.org/10.1086/401257

Carlson, A. K., Taylor, W. W., & Infante, D. M. (2019). Developing precipitation- and groundwater-corrected stream temperature models to improve brook charr management amid climate change. Hydrobiologia. https://doi.org/10.1007/s10750-019-03989-1

Carlson, A. K., Taylor, W. W., Hartikainen, K.
M., Infante, D. M., Beard, T. D., & Lynch, A. J. (2017). Comparing stream-specific to generalized temperature models to guide salmonid management in a changing climate. Reviews in Fish Biology and Fisheries. https://doi.org/10.1007/s11160-017-9467-0

Dugdale, S. J., Malcolm, I. A., Kantola, K., & Hannah, D. M. (2018). Stream temperature under contrasting riparian forest cover: Understanding thermal dynamics and heat exchange processes. Science of the Total Environment. https://doi.org/10.1016/j.scitotenv.2017.08.198

Kendy, E., Apse, C., Blann, K., & Richardson, A. (2012). A Practical Guide to Environmental Flows for Policy and Planning. Nat Conserv.

Leathwick, J. R., Snelder, T., Chadderton, W. L., Elith, J., Julian, K., & Ferrier, S. (2011). Use of generalised dissimilarity modeling to improve the biological discrimination of river and stream classifications. Freshwater Biology. https://doi.org/10.1111/j.1365-2427.2010.02414.x

Lyons, J., Stewart, J. S., & Mitro, M. (2010). Predicted effects of climate warming on the distribution of 50 stream fishes in Wisconsin, U.S.A. Journal of Fish Biology. https://doi.org/10.1111/j.1095-8649.2010.02763.x

Maheu, A., Poff, N. L., & St-Hilaire, A. (2016). A Classification of Stream Water Temperature Regimes in the Conterminous USA. River Research and Applications. https://doi.org/10.1002/rra.2906

McManamay, R. A., Smith, J. G., Jett, R. T., Mathews, T. J., & Peterson, M. J. (2018). Identifying non-reference sites to guide stream restoration and long-term monitoring. Science of the Total Environment. https://doi.org/10.1016/j.scitotenv.2017.10.107

Niemczynowicz, J. (1999). Urban hydrology and water management - present and future challenges. Urban Water. https://doi.org/10.1016/s1462-0758(99)00009-6

Norton, P. A., Driscoll, D. G., & Carter, J. M. (2019).
Climate, streamflow, and lake-level trends in the Great Lakes Basin of the United States and Canada, water years 1960-2015: Scientific Investigations Report 2019-5003, 47 p., https://doi.org/10.3133/sir20195003

Rosgen, D. L. (1994). A classification of natural rivers. Catena. https://doi.org/10.1016/0341-8162(94)90001-9

Rosgen, D. L. (1996). Applied River Morphology (Second Edition). Wildland Hydrology, Pagosa Springs, Colorado.

Seelbach, P. W., Wiley, M. J., Baker, M. E., & Wehrly, K. E. (2006). Initial classification of 48 in R. Hughes, L. Wang, and P. W. Seelbach, editors. Landscape influences on stream habitats and biological communities. American Fisheries Society, Symposium 48, Bethesda, Maryland.

Seelbach, P. W., Wiley, M., Kotanchik, J. C., & Baker, M. (1997). A Landscape-Based Ecological Classification for River Valley Segments in Lower Michigan.

Stewart, J. S., Westenbroek, S. M., Mitro, M. G., Lyons, J. D., Kammel, L. E., & Buchwald, C. A. (2015). A model for evaluating stream temperature response to climate change in Wisconsin: U.S. Geological Survey Scientific Investigations Report 2014-5186, 64 p., http://dx.doi.org/10.3133/sir20145186

Sullivan, K., Martin, D. J., Cardwell, R. D., Toll, J. E., & Duke, S. (2000). An analysis of the effects of temperature on salmonids of the Pacific Northwest with implications for selecting temperature criteria. Sustainable Ecosystems Institute, Portland, Oregon.

Tadaki, M., Brierley, G., & Cullum, C. (2014). River classification: theory, practice, politics. Wiley Interdisciplinary Reviews: Water. https://doi.org/10.1002/wat2.1026

Tavares Wahren, F., Julich, S., Nunes, J. P., Gonzalez-Pelayo, O., Hawtree, D., Feger, K. H., & Keizer, J. J. (2016). Combining digital soil mapping and hydrological modeling in a data scarce watershed in north-central Portugal. Geoderma. https://doi.org/10.1016/j.geoderma.2015.08.023

Thomas, R. E., Gharrett, J. A., Carls, M. G., Rice, S.
D., Moles, A., & Korn, S. (1986). Effects of Fluctuating Temperature on Mortality, Stress, and Energy Reserves of Juvenile Coho Salmon. Transactions of the American Fisheries Society. https://doi.org/10.1577/1548-8659(1986)115<52:eoftom>2.0.co;2

(2003). Watershed, reach, and riparian influences on stream fish assemblages in the Northern Lakes and Forest Ecoregion, U.S.A. Canadian Journal of Fisheries and Aquatic Sciences. https://doi.org/10.1139/f03-043

Webb, B. W., & Zhang, Y. (1997). Spatial and seasonal variability in the components of the river heat budget. Hydrological Processes. https://doi.org/10.1002/(sici)1099-1085(199701)11:1<79::aid-hyp404>3.0.co;2-n

Wehrly, K. E., Wiley, M. J., & Seelbach, P. W. (2003). Classifying Regional Variation in Thermal Regime Based on Stream Fish Community Patterns. Transactions of the American Fisheries Society. https://doi.org/10.1577/1548-8659(2003)132<0018:CRVITR>2.0.CO;2

Westhoff, M. C., Savenije, H. H. G., Luxemburg, W. M. J., Stelling, G. S., van de Giesen, N. high resolution temperature observations. Hydrology and Earth System Sciences Discussions. https://doi.org/10.5194/hessd-4-125-2007

Xin, Z., & Kinouchi, T. (2013). Analysis of stream temperature and heat budget in an urban river under strong anthropogenic influences. Journal of Hydrology. https://doi.org/10.1016/j.jhydrol.2013.02.048

Zorn, T. G., Seelbach, P. W., & Wiley, M. J. (2002). Distributions of Stream Fishes and their Transactions of the American Fisheries Society. https://doi.org/10.1577/1548-8659(2002)131<0070:DOSFAT>2.0.CO;2

Assess the Effects of Flow Reduction on Fish Assemblages in Michigan Streams1, (October 2017). https://doi.org/10.1111/j.1752-1688.2012.00656.x

Zorn, T. G., Seelbach, P. W., & Wiley, M. J. (2004). Utility of Species - Regression Models fo Peninsula. Michigan Department of Natural Resources, Fisheries Research Report 2072, Ann Arbor, Michigan. http://www.michigandnr.com/PUBLICATIONS/PDFS/ifr/ifrlibra/Research/reports/2072rr.pdf