IMPROVING THE PREDICTABILITY OF HYDROLOGIC INDICES IN ECOHYDROLOGICAL APPLICATIONS
By
Juan Sebastian Hernandez Suarez
A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
Biosystems Engineering – Doctor of Philosophy
2021
ABSTRACT
IMPROVING THE PREDICTABILITY OF HYDROLOGIC INDICES IN ECOHYDROLOGICAL APPLICATIONS
By Juan Sebastian Hernandez Suarez
Monitoring freshwater ecosystems allows us to better understand their overall ecohydrological condition within large and diverse watersheds. Due to the significant costs associated with biological monitoring, hydrological modeling is widely used to calculate ecologically relevant hydrologic indices (ERHIs) for stream health characterization in locations lacking data. However, the reliability and applicability of these models within ecohydrological frameworks are major concerns. In particular, hydrologic modeling’s ability to predict ERHIs is limited, especially when calibrating models by optimizing a single objective function or selecting a single optimal solution. The goal of this research was to develop model calibration strategies based on multi-objective optimization and Bayesian parameter estimation to improve the predictability of ERHIs and the overall representation of the streamflow regime. The research objectives were to (1) evaluate the predictions of ERHIs using different calibration techniques based on widely used performance metrics, (2) develop performance- and signature-based calibration strategies explicitly constraining or targeting ERHIs, and (3) quantify the modeling uncertainty of ERHIs using the results from multi-objective model calibration and Bayesian inference. The developed strategies were tested in an agriculture-dominated watershed in Michigan, US, using the Unified Non-dominated Sorting Algorithm III (U-NSGA-III) for multi-objective calibration and the Soil and Water Assessment Tool (SWAT) for hydrological modeling. Performance-based calibration used objective functions based on metrics calculated on streamflow time series, whereas signature-based calibration used ERHI values to formulate the objective functions. For uncertainty quantification purposes, a lumped error model accounting for heteroscedasticity and autocorrelation was considered, and the multiple-try Differential Evolution Adaptive Metropolis (ZS) (MT-DREAM(ZS)) algorithm was implemented for Markov Chain Monte Carlo (MCMC) sampling. In relation to the first objective, the results showed that using different sets of solutions instead of a single optimal solution introduces more flexibility in the predictability of various ERHIs. Regarding the second objective, both performance-based and signature-based model calibration strategies were successful in representing most of the selected ERHIs within a 30% relative error acceptability threshold while yielding consistent runoff predictions. The performance-based strategy was preferred since it showed a lower dispersion of near-optimal Pareto solutions when representing the selected indices and other hydrologic signatures based on water balance and Flow Duration Curve characteristics. Finally, regarding the third objective, using near-optimal Pareto parameter distributions as prior knowledge in Bayesian calibration generally reduced both the bias and variability ranges in ERHI prediction. In addition, there was no significant loss in the reliability of streamflow predictions when targeting ERHIs, while precision improved and bias was reduced.
Moreover, parametric uncertainty drastically shrank when linking multi-objective calibration and Bayesian parameter estimation. Still, the representation of low flow magnitude and timing, rate of change, and duration and frequency of extreme flows were limited. These limitations, expressed in terms of bias and interannual variability, were mainly attributed to the hydrological model’s structural inadequacies. Therefore, future research should involve revising hydrological models to better describe the ecohydrological characteristics of riverine systems. ACKNOWLEDGMENTS First, I want to thank God. Without His grace, love, inspiration, and blessings, this research would not have been possible. I would like to express my sincere appreciation to my advisor Dr. A. Pouyan Nejadhashemi for his relentless support, encouragement, and guidance during my PhD. Beyond being an excellent mentor, I consider him a friend. I would also like to thank my committee members: Dr. Kalyanmoy Deb, Dr. Timothy Harrigan, and Dr. Mohsen Zayernouri for their guidance and advice throughout my research. I am very grateful to the Colombian Ministry of Science, Technology and Innovation (Minciencias), Fulbright Colombia, and the Office of International Students and Scholars at Michigan State University (MSU) for the financial support granted for my doctoral studies. I am also grateful to the College of Agriculture and Natural Resources, the College of Engineering, the Department of Biosystems and Agricultural Engineering (BAE), and the Graduate School at MSU for providing fellowships to encourage and disseminate my research and finalize my dissertation. I would like to extend my sincere thanks to the BAE administrators and staff for their help with all the (tedious) administrative side of the program. I would also like to thank my lab mates and friends for all their help and fun moments that made my time in the US very enjoyable. Many thanks to my father, mother, and brother for all their love, patience, motivation, and support from the distance. Dennise, this journey would have been unbearable without you. Thank you very much for sharing your days with me and making me infinitely happy. Finally, I dedicate this work to the memory of my Grandma Mariela. We miss you. iv TABLE OF CONTENTS LIST OF TABLES ....................................................................................................................... viii LIST OF FIGURES ........................................................................................................................ x KEY TO ABBREVIATIONS ....................................................................................................... xii 1 INTRODUCTION ..................................................................................................................... 1 2 LITERATURE REVIEW........................................................................................................... 5 2.1 OVERVIEW ....................................................................................................................... 5 2.2 INTRODUCTION .............................................................................................................. 5 2.3 MODELING METHODS ................................................................................................... 8 2.3.1 Statistical Methods .................................................................................................... 
8 2.3.1.1 Linear statistical methods .............................................................................. 8 2.3.1.2 Ordination methods ..................................................................................... 13 2.3.2 Machine Learning ................................................................................................... 15 2.3.2.1 Decision tree-based methods ....................................................................... 15 2.3.2.1.1 Boosted regression trees ............................................................... 16 2.3.2.1.2 Random forests ............................................................................. 22 2.3.2.2 Artificial neural networks............................................................................ 24 2.3.2.3 Other methods ............................................................................................. 27 2.3.3 Soft Computing Methods ........................................................................................ 29 2.3.3.1 Fuzzy logic-based methods ......................................................................... 29 2.3.3.2 Bayesian belief networks ............................................................................ 33 2.4 KNOWLEDGE GAP ANALYSIS ................................................................................... 36 2.5 SUMMARY AND CONCLUSION ................................................................................. 41 3 INTRODUCTION TO METHODOLOGY AND RESULTS ................................................. 47 4 EVALUATION OF THE IMPACTS OF HYDROLOGIC MODEL CALIBRATION METHODS ON PREDICTABILITY OF ECOLOGICALLY-RELEVANT HYDROLOGIC INDICES .................................................................................................................................. 50 4.1 INTRODUCTION ............................................................................................................ 50 4.2 MATERIALS AND METHODS ...................................................................................... 54 4.2.1 Study area ................................................................................................................ 55 4.2.2 Data Collection ........................................................................................................ 57 4.2.3 SWAT Model description ....................................................................................... 58 4.2.4 Hydrologic indices .................................................................................................. 58 4.2.5 Objective functions ................................................................................................. 59 4.2.5.1 Nash-Sutcliffe efficiency-based objective functions................................... 60 4.2.5.2 Root-Mean-Square Error-based objective functions ................................... 61 4.2.6 Many-objective optimization algorithm .................................................................. 62 4.2.7 Model evaluation ..................................................................................................... 64 v 4.3 RESULTS AND DISCUSSION ....................................................................................... 66 4.3.1 Convergence and spread of Pareto-optimal fronts obtained with multi-objective calibration strategies ................................................................................................ 
66 4.3.2 Reduction of initial parameter ranges by multi-objective calibration strategies ..... 69 4.3.3 Flow duration curves and streamflow time series representation ........................... 72 4.3.4 Statistical analysis for predicted streamflow time-series ........................................ 74 4.3.5 The level of predictability of ecologically-relevant hydrologic indices using multi and single-objective strategies................................................................................. 75 4.3.5.1 Multi-objective calibration strategies .......................................................... 75 4.3.5.2 Single-objective calibration......................................................................... 81 4.4 CONCLUSIONS............................................................................................................... 83 5 A NOVEL MULTI-OBJECTIVE MODEL CALIBRATION METHOD FOR ECOHYDROLOGICAL APPLICATIONS............................................................................. 86 5.1 INTRODUCTION ............................................................................................................ 86 5.2 MATERIALS AND METHODS ...................................................................................... 90 5.2.1 Overview ................................................................................................................. 90 5.2.2 Study Area ............................................................................................................... 91 5.2.3 Watershed Model .................................................................................................... 92 5.2.4 Strategy 1: Constrained Performance-Based Model Calibration ............................ 94 5.2.4.1 Performance Metrics Selection ................................................................... 94 5.2.4.2 Constraint Definition ................................................................................... 98 5.2.5 Strategy 2: Unconstrained Signature-Based Model Calibration ........................... 100 5.2.6 Evolutionary Multi-Objective Optimization Algorithm........................................ 101 5.2.7 Selection of Preferred Tradeoff Solutions ............................................................. 103 5.2.8 Evaluation of Calibration Results Using Water Balance, Flow Duration Curve Characteristics, and Additional Hydrologic Indices ............................................. 104 5.3 RESULTS AND DISCUSSION ..................................................................................... 106 5.3.1 Performance of Single-objective Model Calibration Using Transformed Metrics 106 5.3.2 Selected Metrics for Constrained Performance-Based Model Calibration ........... 108 5.3.3 Overall Performance of Pareto-Optimal Solutions ............................................... 111 5.3.4 Replication of Ecologically Relevant Hydrologic Indices of Interest ................... 113 5.3.5 Performance of Preferred Tradeoff Solutions ....................................................... 116 5.3.6 Representation of Water Balance and Flow Duration Curve Characteristics ....... 117 5.3.7 Relationship between Water Balance, Flow Duration Curve Characteristics, and Ecologically Relevant Hydrologic Indices of Interest .......................................... 119 5.3.8 Replication of Variability in Ecologically Relevant Hydrologic Indices ............. 
120 5.4 CONCLUSIONS............................................................................................................. 122 6 PROBABILISTIC PREDICTIONS OF ECOLOGICALLY RELEVANT HYDROLOGIC INDICES USING A HYDROLOGICAL MODEL ............................................................... 125 6.1 INTRODUCTION .......................................................................................................... 125 6.2 MATERIALS AND METHODS .................................................................................... 129 6.2.1 Bayesian Parameter Estimation ............................................................................. 130 6.2.1.1 Likelihood function ................................................................................... 130 6.2.1.2 Prior distributions ...................................................................................... 132 6.2.1.2.1 Experiment 1 – Non-informative priors ...................................... 132 vi 6.2.1.2.2 Experiment 2 – Multi-objective model calibration ..................... 133 6.2.1.3 Sampling algorithm ................................................................................... 135 6.2.2 Generation of Predictive Distributions of ERHIs ................................................. 136 6.2.3 Performance evaluation ......................................................................................... 137 6.2.4 Case study ............................................................................................................. 138 6.2.4.1 Study area and model ................................................................................ 138 6.2.4.2 Data collection........................................................................................... 139 6.2.4.3 Calibration parameters .............................................................................. 140 6.2.4.4 Ecologically Relevant Hydrologic Indices ................................................ 141 6.2.4.5 Experiments set up .................................................................................... 142 6.3 RESULTS AND DISCUSSION ..................................................................................... 143 6.3.1 Convergence of multi-objective and Bayesian calibration experiments ............... 143 6.3.2 Comparison between posterior parameter distributions using non-informative priors and Pareto-optimal results ..................................................................................... 144 6.3.3 Performance of uncertainty quantification of daily streamflows .......................... 146 6.3.4 Performance of uncertainty quantification of ERHIs ............................................ 148 6.4 CONCLUSIONS............................................................................................................. 152 7 CONCLUSIONS .................................................................................................................... 154 8 FUTURE RESEARCH .......................................................................................................... 158 APPENDIX ................................................................................................................................. 161 REFERENCES ........................................................................................................................... 
168 vii LIST OF TABLES Table 1 Summary of advantages, disadvantages and applications for the methods described in this study......................................................................................................................... 43 Table 2 Calibrated ranges obtained with Pareto-optimal solutions. Values without brackets correspond to NSE-based strategy results while values within brackets correspond to RMSE-based strategy results .......................................................................................... 71 Table 3 Percentage of Pareto-optimal solutions without evidence of significant mean difference (𝜶=0.05) between simulated and observed time-series, considering different time series categories and clusters for both calibration strategies. Cluster with highest percentage for each flow category are in bold .................................................................................. 75 Table 4 List of ecologically-relevant hydrologic indices with all, high, medium, and low flow Pareto-optimal solutions having median relative errors outside the ±30% bound, for each multi-objective calibration strategy........................................................................ 78 Table 5 The lowest median relative error and corresponding interquartile range (IQR) and flow cluster for each multi-objective calibration strategy for the Indicators of Hydrologic Alteration (IHA). Values that exceed ±30% bound of relative error are highlighted .... 80 Table 6 Calibration parameters and ranges ................................................................................... 94 Table 7 Performance metrics and transformations considered for the selection process ............. 96 Table 8 List of 39 Ecologically Relevant Hydrologic Indices of interest used for multi-objective model calibration ............................................................................................................ 99 Table 9 Proportion of indices falling within the 30% relative error threshold under different categories of hydrologic indices. Proportions are reported for each performance metric considered in the single-objective calibration process. Performance metrics were grouped following proportions similarity. The best performing metric overall is in bold within each group. Proportions are color-coded as follows: 100% are dark green (excellent), 70-99% are light green (good), 55-69% are dark yellow (fair), 40-54% are light yellow (poor), and 0-39% are red (very poor) ..................................................... 110 Table 10 Overall performance of near-optimal Pareto and preferred tradeoffs solutions under each model calibration strategy. Values in parenthesis correspond to the validation period ............................................................................................................................ 113 Table 11 Model calibration parameters and ranges .................................................................... 141 Table 12 List of ERHIs used in this study .................................................................................. 142 Table 13 Performance of predictive distributions of ERHIs obtained under Period 2. Reliability was evaluated by identifying whether the distributions contained the ERHIs from observations, and whether the median of the distributions was within the ±30% relative error range .................................................................................................................... 
151 viii Table A1 Description of ecologically-relevant hydrologic indices with all, high, medium, and low flow Pareto-optimal solutions having median relative errors outside the ±30% bound, for each multi-objective calibration strategy. Adapted from Olden and Poff……………………………………………………………………………………162 ix LIST OF FIGURES Figure 1 A schematic diagram presenting the overall multi-objective model calibration and evaluation process. Q25 and Q75 are the flows exceeded 25% and 75% of the time, respectively, NSE is the standard Nash-Sutcliffe Efficiency, NSEsqrt is the root-squared- transformed NSE, NSErel is the relative NSE, RMSE is the Root-Mean-Square Error, and MHIT is the MATLAB Hydrological Index Tool ................................................... 55 Figure 2 Location and topography of the study area .................................................................... 57 Figure 3 Objective spaces for the SWAT model calibration: a) using different forms of Nash- Sutcliffe efficiency; b) after hydrograph partitioning using Q25 and Q75 thresholds ... 60 Figure 4 Normalized hypervolume indicator behavior over the NSGA-III search process for each calibration strategy ......................................................................................................... 68 Figure 5 Clustered Pareto-optimal solutions obtained for each multi-objective calibration strategy employing NSGA-III algorithm and k-means clustering method a) NSE-based and b) RMSE-based ................................................................................................................... 69 Figure 6 Flow duration curves and time series obtained from Pareto-optimal solutions (light gray) and clustered (high, medium, and low flow) solutions (dark gray) for NSE-based (a, b, and c), and RMSE-based (d, e, and f) multi-objective calibration strategies. Red lines correspond to observed streamflow values ............................................................ 73 Figure 7 Overview of the two multi-objective strategies for model calibration evaluated in this study ............................................................................................................................... 91 Figure 8 Location of the Honeyoey Creek - Pine Creek Watershed............................................. 92 Figure 9 Heatmaps with relative errors for 178 ecologically relevant hydrologic indices when optimizing different transformed measures. Panels a) to l) represent an individual category of hydrological indices as presented in Table 9 ............................................. 109 Figure 10 Overall performance of the two model calibration strategies: a) 10-generations moving average of normalized hypervolume indicator and number of Pareto solutions over the U-NSGA-III search process, lighter colors represent values for each generation; b) Taylor diagram for the initial population and Pareto solutions at the last generation, contour lines represent the ratio of the standard deviation of residuals and standard deviation of observations,  is the ratio of simulated and observed standard deviations, and r is the linear correlation coefficient; c) behavior of the ratio of simulated and observed means () obtained for the initial population and Pareto solutions at the last generation ..................................................................................................................... 
112 Figure 11 Boxplots representing the distribution of relative errors for each Ecologically Relevant Hydrologic Index of interest for the near-optimal Pareto solutions obtained under each model calibration strategy, horizontal dashed lines represent the 30% interval: a) magnitude of monthly water conditions; b) magnitude and duration of annual extreme water conditions; c) duration and frequency of high and low pulses, rate and frequency x of water condition changes, and timing of annual extreme water conditions; d) Magnificent seven indices. Index abbreviations are listed in Table 8 .......................... 115 Figure 12 Flow duration curves (FDCs) for the preferred tradeoff solutions identified for each calibration strategy compared against the observed FDC from 2003 to 2014: a) FDCs from near-optimal Pareto solutions for Strategy 1; b) FDCs from near-optimal Pareto solutions for Strategy 2; c) and d) represent the bias for FDC and water balance measures for Strategy 1 and Strategy 2, respectively, under calibration and validation periods .......................................................................................................................... 118 Figure 13 Boxplots representing the distribution of relative errors for variability hydrologic indices under each model calibration strategy, horizontal dashed lines represent the 30% interval: a) variability in the magnitude of monthly water conditions; b) variability in the magnitude and duration of annual extreme water conditions; c) variability in the duration and frequency of high and low pulses, rate and frequency of water condition changes, and the timing of annual extreme water conditions. Index abbreviations are presented in Table 3 ......................................................................... 122 Figure 14 Study area location and major land uses .................................................................... 139 Figure 15 Distribution of model parameters obtained from multi-objective calibration (Experiment 2, Period 1 – MOOP1) and Bayesian parameter estimation (Experiment 1, Period 1 – E1P1; Experiment 1, Period 2 – E1P2; Experiment 2, Period 2 – E2P2). Box and whisker plots represent the 50% and 95% confidence limits, respectively; points represent median parameter values. Parameter descriptions are reported in Table 11. *These parameters were calibrated using global multipliers ....................................... 145 Figure 16 Uncertainty quantification performance using multi-objective calibration and Bayesian parameter estimation. The hydrographs (left column) represent the 95% prediction bounds for streamflow; light gray is for total uncertainty, dark gray is for parametric uncertainty, red line are observations. The middle column presents the corresponding quantile-quantile plots (PQQ) using a standard uniform distribution. The right column presents the overall performance indices for reliability (R), precision (P), and Bias. a) Experiment 1, Period 1; b) Experiment 2, Period 1; c) Experiment 1, Period 2; d) Experiment 2, Period 2 ................................................................................................. 147 Figure 17 Distribution of relative errors of the selected ERHIs using multi-objective calibration (Experiment 2, Period 1 – MOOP1) and Bayesian parameter estimation (Experiment 1, Period 1 – E1P1; Experiment 1, Period 2 – E1P2; Experiment 2, Period 2 – E2P2). 
Box and whiskers represent the 50% and 95% confidence limits, respectively; points represent median relative error values; the vertical dotted line represents the zero axis, the gray area represent the nominal 30% ERHI uncertainty. Index abbreviations are reported in Table 12...................................................................................................... 150 xi KEY TO ABBREVIATIONS ABC: Approximate Bayesian Computation AIC: Akaike Information Criterion ALPHA_BF: Baseflow alpha factor (days-1) ANCOVA: Analysis of Covariance ANFIS: Adaptive Neuro-Fuzzy Inference Systems ANN: Artificial Neural Network AR (1): Autoregressive Model with Lag 1 AUC: Area Under Receiver Operating Characteristic Curve BBN: Bayesian Belief Networks BIC: Bayesian Information Criterion BIOMIX: Biological mixing efficiency BRT: Boosted Regression Trees CA: Correspondence Analysis CANMX: Maximum canopy storage (mm H2O) CAR (1): Continuous Autoregressive Model with Lag 1 CART: Classification and Regression Trees CCA: Canonical Correspondence Analysis CDF: Cumulative distribution function CH_K (2): Effective hydraulic conductivity in main channel alluvium (mm hr-1) CH_N (2): Manning's "n" value for the main channel CN2: Curve Number for moisture condition II COIN: Computational Optimization and Innovation Laboratory at Michigan State University d: Index of Agreement xii DCA: Detrended Correspondence Analysis DCCA: Detrended Canonical Correspondence Analysis DH: High flows duration indices DH1: Annual maximum daily flow (m3 s-1) DH2: Annual maximum of 3-day moving average flow (m3 s-1) DH3: Annual maxima of 7-day means of daily discharge (m3 s-1) DH4: Annual maxima of 30-day means of daily discharge (m3 s-1) DH5: Annual maxima of 90-day means of daily discharge (m3 s-1) DH6: Variability of annual maximum daily average flow DH7: Variability of annual maximum of 3-day moving average flow DH8: Variability of annual maximum of 7-day moving average flow DH9: Variability of annual maximum of 30-day moving average flow DH10: Variability of annual maximum of 90-day moving average flow DH15: High flow pulse duration with a threshold equal to the 75th percentile of the entire flow record (days) DH16: Variability in high flow pulse duration with a threshold equal to the 75th percentile of the entire flow record DH17: High flow duration with a threshold equal to the median flow (days) DH19: High flow duration with a threshold equal to 7 times the median flow (days) DH20: High flow duration with a threshold equal to the 75th percentile value for the median annual flows (days) DH21: High flow duration with a threshold equal to the 25th percentile value for the median annual flows (days) DH23: Flood duration with a threshold equal to the flow equivalent for a flood recurrence of 1.67 years (days) DL: Low flows duration indices DL1: Annual minimum daily flow (m3 s-1) xiii DL2: Annual minimum of 3-day moving average flow (m3 s-1) DL3: Annual minima of 7-day means of daily discharge (m3 s-1) DL4: Annual minima of 30-day means of daily discharge (m3 s-1) DL5: Annual minima of 90-day means of daily discharge (m3 s-1) DL6: Variability of annual minimum daily average flow DL7: Variability of annual minimum of 3-day moving average flow DL8: Variability of annual minimum of 7-day moving average flow DL9: Variability of annual minimum of 30-day moving average flow DL10: Variability of annual minimum of 90-day moving average flow DL11: Mean of 1-day minima of daily discharge DL12: Mean of 3-day minima of daily discharge DL16: 
Low flow pulse duration (days) DREAM: Differential Evolution Adaptive Metropolis EPCO: Plant uptake compensation factor ERHI: Ecologically Relevant Hydrologic Index ESCO: Soil evaporation compensation factor FDC: Flow Duration Curve FH: High flows frequency indices FH1: High flood pulse count with a threshold equal to the 25th percentile of the entire flow record (year-1) FH2: Variability in high flood pulse count with a threshold equal to the 25th percentile of the entire flow record FH4: High flood pulse count with a threshold equal to 7 times median daily flow (year-1) FH5: Flood frequency with a threshold equal to the median flow (year-1) FH8: Flood frequency with a threshold equal to the 25th percentile of the entire flow record (year-1) xiv FH9: Flood frequency with a threshold equal to the 75th percentile of the entire flow record (year-1) FHV: FDC very-high-segment volume FL: Low flows frequency indices FL1: Low flood pulse count (year-1) FL2: Variability in low flood pulse count FLV: FDC low-segment volume FMS: FDC midsegment slope FMV: FDC high-segment volume GA: Genetic Algorithm GAM: Generalized Additive Model GARP: Genetic Algorithm for Rule set Production GLM: Generalized Linear Model GLS: Generalized Least-Squares GRNN: Generalized Regression Neural Network GW_DELAY: Groundwater delay time (days) GW_REVAP: Groundwater "revap" coefficient GWQMN: Threshold depth of water in the shallow aquifer required for return flow to occur (mm H2O) HIT: Hydrologic Index Tool HRU: Hydrologic Response Unit I-IBI: Macroinvertebrate Index of Biotic Integrity IBI: Fish Index of Biotic Integrity ICI: Invertebrate Community Index IHA: Indices of Hydrologic Alteration IoA: Index of Agreement xv IQR: Interquartile Range KGE: Kling-Gupta Efficiency MA: Average flows magnitude indices MA12: Mean monthly flow for January (m3 s-1) MA13: Mean monthly flow for February (m3 s-1) MA14: Mean monthly flow for March (m3 s-1) MA15: Mean monthly flow for April (m3 s-1) MA16: Mean monthly flow for May (m3 s-1) MA17: Mean monthly flow for June (m3 s-1) MA18: Mean monthly flow for July (m3 s-1) MA19: Mean monthly flow for August (m3 s-1) MA20: Mean monthly flow for September (m3 s-1) MA21: Mean monthly flow for October (m3 s-1) MA22: Mean monthly flow for November (m3 s-1) MA23: Mean monthly flow for December (m3 s-1) MA24: Variability in January flows MA25: Variability in February flows MA26: Variability in March flows MA27: Variability in April flows MA28: Variability in May flows MA29: Variability in June flows MA30: Variability in July flows MA31: Variability in August flows MA32: Variability in September flows MA33: Variability in October flows xvi MA34: Variability in November flows MA35: Variability in December flows MA42: Variability across annual flows MA44: Variability across annual flows MA45: Skewness in annual flows MAG: Magnificent Seven indices MAG1: First L-moment MAG2: Second L-moment MAG3: Third L-moment MAG4: Fourth L-moment MAG5: Autoregressive lag-one AR(1) correlation coefficient MAG6: Amplitude of the seasonal signal MAG7: Phase of the seasonal signal MARS: Multivariate Adaptive Regression Splines MCDM: Multicriteria Decision-Making MCMC: Markov Chain Monte Carlo MH: High flows magnitude indices MH10: Mean maximum October monthly flow (m3 s-1) MH11: Mean maximum November monthly flow (m3 s-1) MH21: High flow volume (days) MH22: High flow volume (days) MH23: High flow volume (days) MH6: Mean maximum June monthly flow (m3 s-1) MH7: Mean maximum July monthly flow (m3 s-1) MHIT: MATLAB 
Hydrologic Index Tool xvii ML: Low flows magnitude indices ML7: Mean minimum July monthly flow (m3 s-1) ML8: Mean minimum August monthly flow (m3 s-1) ML9: Mean minimum September monthly flow (m3 s-1) ML14: Mean of annual minimum flows ML15: Low flow index ML16: Median of annual minimum flows ML17: Baseflow index based on the seven-day minimum flow ML18: Baseflow index based on the seven-day minimum flow ML19: Variability of baseflow index based on the lowest annual daily flow ML21: Variability across annual minimum flows ML22: Specific mean annual minimum flows (m3 s-1 km-2) MLP: Multilayer Perceptron MLR: Multiple Linear Regression MOEA: Multi-objective evolutionary algorithm MT-DREAM(ZS): Multiple-try Differential Evolution Adaptive Metropolis (ZS) NASS: National Agricultural Statistics Service NCDC: National Climatic Data Center NCEI: National Centers for Environmental Information NED: National Elevation Dataset NMDS: Nonmetric Multidimensional Scaling NOAA: National Oceanic and Atmospheric Administration NRCS: Natural Resources Conservation Service NSE: Nash-Sutcliffe Efficiency NSErel: Relative NSE xviii NSEsqrt: Root-squared-transformed NSE NSGA-II: Non-dominated Sorting Genetic Algorithm II NSGA-III: Nondominated Sorted Genetic Algorithm III OF: Objective function PBIAS: Percent Bias PCA: Principal Component Analysis PCoA: Principal Coordinates Analysis PLSR: Partial Least Squares Regression PO: Polar Ordination PQQ: Predictive quantile-quantile plot Q25: flow exceeded 25% of the time Q75: flow exceeded 75% of the time r: Correlation coefficient R2: Coefficient of Determination R4MS4E: Fourth Root Mean Quadrupled Error RA: Rate of change indices RA1: Rise rate (m3 s-1 d-1) RA2: Rise rate variability RA3: Fall rate (m3 s-1 d-1) RA4: Fall rate variability RA6: Change of flow- increasing (m3 s-1) RA7: Change of flow - decreasing (m3 s-1) RA8: Reversals (year-1) RA9: Reversals variability RCHRG_DP: Deep aquifer percolation fraction xix RDA: Redundancy Analysis REVAPMN: Threshold depth of water in the shallow aquifer for "revap" or percolation to the deep aquifer to occur (mm H2O) RF: Random Forests RIVPACS: River Invertebrate Prediction and Classification System RMSE: Root Mean Squared Error RR: Runoff Ratio SBX: Simulated Binary Crossover SCS: Soil Conservation Service SDM: Species Distribution Model SEM: Structural Equation Modeling SOL_AWC: Available water capacity of the soil layer (mm H2O mm-1 soil) SSURGO: Soil Survey Geographic Database SURLAG: Surface runoff lag coefficient SVM: Support Vector Machine SWAT: Soil and Water Assessment Tool TA: Average flows timing indices TH: High flows timing indices TH1: Julian date of annual maximum TH2: Julian date of annual maximum variability TL: Low flows timing indices TL1: Julian date of annual minimum TL2: Julian date of annual minimum variability TL4: Seasonal predictability of non-low flow U-NSGA-III: Unified Non-dominated Sorting Algorithm III US: United States xx USDA: US Department of Agriculture USEPA: US Environmental Protection Agency USGS: US Geological Survey WFG: Walking Fish Group WWAP-UN: United Nations World Water Assessment Programme WXGEN: SWAT Stochastic Weather Generator xxi 1 INTRODUCTION One of the major concerns in the twenty-first century is the increasing pressure on water resources worldwide. Nearly 80% of the global population are exposed to high levels of threat to water security (Vörösmarty et al., 2010). 
In addition, freshwater ecosystems are deeply fragmented by built infrastructure, with only 23% of rivers longer than 1000 km flowing uninterrupted to the ocean (Grill et al., 2019). Unfortunately, this crisis is not limited to water quantity but also extends to water quality. According to the United Nations World Water Assessment Programme (WWAP-UN), over 80% of global wastewater is discharged to waterbodies without any treatment (WWAP-UN, 2017). Moreover, most agricultural and urban runoff is delivered to freshwater and marine ecosystems without any water quantity or quality control (Eckart et al., 2017; Mateo-Sagasta et al., 2018). Combined with climate change, these factors have increased the occurrence of waterborne diseases. In addition, the resulting biodiversity and ecosystem losses imperil valuable ecosystem services necessary to sustain human societies (Hipsey et al., 2015; Pham et al., 2019). In the United States, the Clean Water Act was enacted in 1972 to restore and maintain the chemical, physical, and biological integrity of US waters. In river systems, chemical integrity can be associated with instream water quality. Meanwhile, physical integrity can be described in terms of water quantity, physical habitat, and stream geomorphology. Likewise, biological integrity is expressed in terms of the abundance, composition, and diversity of freshwater organisms. Since these three components support biotic systems necessary for human and environmental well-being, the paradigm comprising these concepts is known as stream or river health (Karr, 1999; Maddock, 1999). Stream health is generally measured using bioassessments, which have gained popularity for supporting water quality management, complementing chemical and microbiological criteria (US EPA, 2011). In particular, fish and benthic macroinvertebrates are commonly used as biological indicators. Fish are suitable for monitoring broad habitat conditions, stream connectivity, and long-term effects, whereas benthic macroinvertebrates are preferred when assessing local conditions and short-term effects (Herman and Nejadhashemi, 2015). Stream health monitoring is usually done sparsely in time and space (Einheuser et al., 2012). Knowing the stream health condition of every stream within a watershed is desirable for environmental management and policymaking. However, extensive biological monitoring is costly, time-consuming, and impractical for large areas. Therefore, modeling techniques have been developed to extend available information to locations lacking biological data (Woznicki et al., 2016a). Stream health models generally use landscape attributes (e.g., land use/cover, slope, soils, geology) and instream physical and chemical characteristics (e.g., temperature, streamflow, nutrients, sediments, substrate) as explanatory variables to predict instream biological responses (Einheuser et al., 2012; Sowa et al., 2016). Streamflow is considered a master variable that dictates patterns and processes occurring in rivers and streams, including water quality, physical habitat formation, and the life cycles of living organisms (Walker et al., 1995). Therefore, by studying streamflow behavior over time (i.e., the streamflow regime), it is possible to approximate the overall stream health condition (Poff et al., 1997; Richter et al., 2003). The streamflow regime is generally described using metrics or indices related to five major facets: magnitude, duration, frequency, timing, and rate of change of flows (Sofi et al., 2020).
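Each of these facets is described in the paragraphs that follow. To make the discussion concrete, the sketch below computes one illustrative index per facet from a daily streamflow record. This is a minimal, hedged example only: it assumes Python with pandas, the index definitions loosely follow the Hydrologic Index Tool naming used later in this dissertation (e.g., DH1, DL3, FH1, TH1, RA1), the data are synthetic, and the analyses reported in later chapters rely on the MATLAB Hydrologic Index Tool (MHIT) rather than this code.

```python
import numpy as np
import pandas as pd

# Synthetic daily streamflow record (m3/s); in practice this would come from a
# stream gauge or from a hydrological model such as SWAT.
rng = np.random.default_rng(42)
dates = pd.date_range("2003-01-01", "2014-12-31", freq="D")
flow = pd.Series(np.exp(rng.normal(1.0, 0.8, len(dates))), index=dates, name="Q")

# Magnitude: mean flow for each calendar month (cf. MA12-MA23).
mean_monthly = flow.groupby(flow.index.month).mean()

# Magnitude/duration of extremes: annual 1-day maximum and 7-day minimum flows
# (cf. DH1 and DL3).
annual_max_1d = flow.resample("YS").max()
annual_min_7d = flow.rolling(7).mean().resample("YS").min()

# Frequency: number of high-flow pulses per year above the 75th percentile of
# the whole record (cf. FH1-type indices).
threshold = flow.quantile(0.75)
pulses_per_year = ((flow > threshold) & (flow.shift(1) <= threshold)).resample("YS").sum()

# Timing: Julian date of the annual maximum (cf. TH1).
julian_date_max = flow.resample("YS").apply(lambda x: x.idxmax().dayofyear)

# Rate of change: mean rise rate over days with increasing flow (cf. RA1).
daily_change = flow.diff()
mean_rise_rate = daily_change[daily_change > 0].mean()

print(mean_monthly.round(2))
print(annual_max_1d.round(2).head(), annual_min_7d.round(2).head(), sep="\n")
print(pulses_per_year.head(), julian_date_max.head(), round(mean_rise_rate, 3), sep="\n")
```

In practice, such indices are computed from both observed and model-simulated daily streamflow, and the comparison between the two underpins the calibration and uncertainty analyses developed in the following chapters.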
Magnitude refers to the volume of water passing through a fixed location per unit of time. Based on this facet, streamflow can be classified as high, average, or low flow. Streamflow plays different roles depending on its magnitude (Sofi et al., 2020); for instance, low flows maintain instream water quality conditions, define longitudinal stream connectivity, enable fish and nutrients to move, and allow natural selection by purging invasive species (Poff et al., 1997). Meanwhile, high flows give shape to streams, flush away pollutants, and maintain lateral connectivity, favoring floodplains, wetlands, and riparian vegetation (Poff et al., 1997). Duration is the length of time associated with a flow event, read horizontally on a hydrograph. This facet influences the persistence of aquatic and riparian species and controls fish growth potential and development under flooding events (Bunn and Arthington, 2002). Frequency refers to how often a streamflow magnitude occurs over a specific period of time, and it is generally described using Flow Duration Curves (FDC). This facet is important for controlling aquatic and riparian species’ life cycles and productivity (Bunn and Arthington, 2002). For example, frequency regulates how often fish can move upstream or to floodplains for migration or reproduction (Poff et al., 1997). Timing is the degree to which flow events are temporally autocorrelated, indicating that the system has memory. For instance, certain rivers always experience high flows in spring and low flows in summer. This facet acts as a trigger that the system needs to start a process (e.g., fish spawning). Additionally, timing helps to maintain species diversity (Bunn and Arthington, 2002). Finally, the rate of change describes how quickly streamflow rises and falls. This facet influences species persistence and coexistence and controls the establishment of nonnative species (Poff et al., 1997; Sofi et al., 2020). In summary, there are many indices describing the aforementioned streamflow regime facets (Olden and Poff, 2003), and they are known as ecologically relevant hydrologic indices (ERHIs). Calibrated hydrological models are used to predict streamflows beyond monitoring stations. Consequently, ERHIs can be estimated in ungauged locations using results obtained from hydrological modeling. When developing models for predicting ERHIs, three fundamental questions emerge in the process: (1) how much is the predictability of ERHIs affected by the choice of model calibration technique? (2) how can a hydrologic model be calibrated to improve the prediction of multiple ERHIs simultaneously? and (3) how reliable are the hydrological modeling results when predicting ERHIs? To address these questions, the goals of this research were to (1) evaluate the predictions of ERHIs using a hydrological model when it is calibrated using single- and multi-objective techniques based on widely used performance metrics, (2) develop calibration strategies for improving the predictability of ERHIs and the overall streamflow regime using a hydrologic model, and (3) quantify the modeling uncertainty of ERHIs using the results obtained from the calibration strategies developed under the previous objective. The outcome of this research is a framework for linking hydrologic model calibration and uncertainty quantification when predicting ERHIs.
This framework includes the development of novel calibration strategies aimed at improving the accuracy of ERHI predictions while maintaining a balanced representation of different streamflow regime facets. Ultimately, the overall performance of ERHI uncertainty quantification is expected to improve. This can help policymakers with decision-making in the context of water and natural resources management.
2 LITERATURE REVIEW
2.1 OVERVIEW
During the last three decades, explaining cause-effect relationships between natural and anthropogenic disturbances and measures of stream health has motivated the growing application of statistical, machine learning, and soft computing methods. The aim of this review is to provide insight into the most widely used methods for predicting biological variables based on macroinvertebrate and fish species in riverine ecosystems. Therefore, we describe several methods, including multiple linear regression, generalized linear models, generalized additive models, boosted regression trees, random forests, artificial neural networks, fuzzy logic-based methods, and Bayesian belief networks, along with recent applications of these methods. Moreover, issues regarding variable selection, model interpretability, ensemble modeling, and model evaluation and overfitting are discussed. Recent advances have suggested the need for integrated modeling systems to enhance predictive ability and improve interpretability. However, trade-offs between model complexity and accuracy demand research efforts in uncertainty quantification and propagation in model ensembles. Additionally, models should be perceived as complementary tools that require further validation with field measurements. Therefore, a consensus regarding monitoring and modeling practices for stream health applications is recommended.
2.2 INTRODUCTION
Current and future threats to freshwater ecosystems due to changes in environmental conditions and impacts of anthropogenic activities require urgent and well-informed actions (Strayer and Dudgeon, 2010; Vörösmarty et al., 2010; Waldron et al., 2017). Therefore, health assessments of these ecosystems are critical to promoting their protection and restoration (Beechie et al., 2010). During the last decades, environmental legislation has increasingly supported the introduction of biological assessments in local, regional and national monitoring programs (Hering et al., 2010; Hill et al., 2017). Typically, biotic indices are derived from biological assessments to represent the stream health condition. The stream health concept comprises the physical, chemical and biological capacity to maintain the structure and functioning of freshwater ecosystems, required for supporting living systems (Karr, 1999; Maddock, 1999). These indices are based on one or multiple metrics describing the abundance, richness, diversity or composition of biological assemblages (Herman and Nejadhashemi, 2015). Furthermore, biomass, probability of occurrence, and incidence (presence/absence) data provide information regarding the level of impairment, which is also useful for stream health evaluation (Hill et al., 2017; Smucker et al., 2013). Biological measurements in aquatic ecosystems can be obtained from several biological assemblages, and their selection usually depends on the type of study being performed. Benthic macroinvertebrates are preferred when studying localized effects of habitat and water quality alterations, due to their limited movement within a water body (Kerans and Karr, 1994).
Meanwhile, fish communities are preferred when evaluating changes in flow regime and spatial connectivity (Karr, 1981). Benefits of stream health evaluation include the possibility of exploring the environmental mechanisms driving ecosystem alterations (Herman and Nejadhashemi, 2015). Likewise, indicators of stream health can help with the identification of degraded areas and the provision of necessary inputs to design protection and restoration projects (Walters et al., 2009). Stream health models have been introduced to relate observed biological data with environmental and landscape variables with the goal of establishing reference conditions (Feio and Poquet, 2011; Hawkins et al., 2000), predicting biological variables and indicators in unsampled locations (Merriam et al., 2015; Waite et al., 2010), classifying streams by impairment condition (Brown et al., 2012; Maloney et al., 2009), and predicting biological variables and indicators given the implementation of conservation practices (Hall et al., 2017; Herman et al., 2015; Sowa et al., 2016) and changes in environmental and landscape stressors (Einheuser et al., 2013b, 2013a, 2012). These models have been enhanced by advances in landscape methods for studying freshwater ecosystems (Johnson and Host, 2010; Steel et al., 2010), species distribution models (SDMs) (Elith and Graham, 2009; Li and Wang, 2013; Van Echelpoel et al., 2015), and habitat suitability models (Ahmadi-Nedushan et al., 2006; Yi et al., 2017). Still, due to the complexity and nature of the problem, stream health models are mainly empirically based rather than mechanistic or process-based. However, hierarchical approaches incorporating climate, hydrologic, hydraulic, water quality and/or physical habitat models have been suggested to improve current models’ predictability, interpretability and accuracy (Daneshvar et al., 2017a; Einheuser et al., 2013a, 2013b, 2012; Guse et al., 2015; Herman et al., 2015; Holguin-Gonzalez et al., 2014, 2013a, 2013b; Jähnig et al., 2012; Kail et al., 2015; Kennen et al., 2008; Woznicki et al., 2016a, 2016b; Yi et al., 2017). Modeling approaches include traditional regression models (e.g. multiple linear regression, generalized linear models), ordination and classification methods (e.g. principal component analysis, redundancy analysis), clustering methods (e.g. self-organizing maps, k-nearest neighbors), structural equation modeling (SEM), and machine learning and soft computing techniques (e.g. fuzzy logic, neural networks, evolutionary computation). In this paper, we review the most widely used methods able to model both continuous and categorical stream health data based on macroinvertebrate and fish assemblages. These methods comprise traditional statistical approaches, machine learning, and soft computing methods. The specific objectives of this study are to (1) summarize the main characteristics of the selected modeling methods and their applications, and (2) identify features requiring further research for improving stream health modeling practices.
2.3 MODELING METHODS
In this section, we describe the most widely used methods for stream health modeling. In general, the approaches presented herein are data-driven since we are dealing with natural systems; however, soft computing methods are more suitable for incorporating expert elicitation. In addition, both soft computing and machine learning methods are more flexible regarding statistical assumptions than traditional statistical modeling approaches.
2.3.1 Statistical Methods
Statistical methods considered herein are mainly focused on modeling approaches based on linear regression. However, a general overview of multivariate methods for ordination is also presented. A statistical model is a specification of probability distributions reproducing observed data, establishing mathematical relationships between explanatory and response variables (Nelder and Baker, 2006). Multivariate methods can be used for either ordination or classification purposes. Ordination is the arrangement of biological data samples along one or more gradients (Austin, 1976), whereas classification is the assignment of biological data samples into groups based on a measure of similarity (or dissimilarity) (Mitteroecker and Bookstein, 2011).
2.3.1.1 Linear statistical methods
Linear statistical methods have been applied mainly to elucidate relationships between landscape, habitat, and water quality factors and biological variables. This category includes methods of different complexity levels, such as Multiple Linear Regression (MLR), Generalized Linear Models (GLMs) and Generalized Additive Models (GAMs), which have been widely implemented in ecological applications. MLR fits a linear equation using the observed data to model the relationship between a set of explanatory variables and a response variable, assuming independent and identically normally distributed errors. Meanwhile, GLMs introduce some flexibility by allowing different error distributions, selected from the exponential family of sampling models (e.g. normal, binomial, Poisson, gamma), and relate the response variable Y with the explanatory variables X using a pre-specified link function g (McCullagh and Nelder, 1989). The link function provides the relationship between the linear predictor η and the expected value (i.e. mean) of the response variable E(Y|X):

g[E(Y|X)] = η = α + Xβ    (1)

where α and β are the intercept and the vector of linear weights, respectively. Depending on the link function selected, GLMs comprise linear, logistic, and Poisson regressions, among others. For instance, logistic regression is preferred when modeling species presence/absence, while Poisson regression is more suitable when modeling count data (Li and Wang, 2013). In order to further account for nonlinearities, GAMs extend GLMs by expressing η as the sum of unspecified nonparametric linear or nonlinear smoothing functions f_i, applied over the set of p explanatory variables (Hastie et al., 2009):

g[E(Y|X)] = α + f_1(X_1) + ⋯ + f_p(X_p)    (2)

To ensure that the smoothing functions are identifiable, they are restricted to have zero mean (Maloney et al., 2012). These functions are commonly estimated using a scatterplot smoother (e.g. cubic spline) as the basic building block (Hastie et al., 2009; Zuur et al., 2009). Linear statistical models were initially introduced to build empirical associations between landscape and stream health attributes. Early efforts were mostly concerned with identifying the main stressors and landscape components (i.e. riparian buffer, watershed) affecting water quality, physical habitat, and/or freshwater biological communities (Van Sickle et al., 2004). Many of these works are reviewed by Ahmadi-Nedushan et al. (2006), Johnson and Host (2010) and Steel et al. (2010). Linear statistical models have also been continuously applied as benchmarks for comparison with other statistical methods and machine learning approaches.
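As a brief illustration of how a GLM of the form in Equation (1) is fit in practice, the sketch below uses Python’s statsmodels package (assumed purely for illustration; it is not software used in this dissertation) to fit a Poisson regression for a count response and a logistic regression for presence/absence data. The predictors, coefficients, and data are synthetic.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic site-level data: macroinvertebrate taxa richness (a count) and the
# presence/absence of a sensitive taxon, with two landscape predictors.
rng = np.random.default_rng(0)
n = 120
data = pd.DataFrame({
    "agriculture_pct": rng.uniform(0, 100, n),
    "riparian_forest_pct": rng.uniform(0, 100, n),
})
eta = 2.5 - 0.015 * data["agriculture_pct"] + 0.01 * data["riparian_forest_pct"]
data["richness"] = rng.poisson(np.exp(eta))
data["presence"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-(eta - 2.5))))

# Design matrix with an intercept (alpha) and linear weights (beta), as in Eq. (1).
X = sm.add_constant(data[["agriculture_pct", "riparian_forest_pct"]])

# Poisson GLM with a log link for the count response.
poisson_fit = sm.GLM(data["richness"], X, family=sm.families.Poisson()).fit()

# Binomial GLM with a logit link (logistic regression) for presence/absence.
logistic_fit = sm.GLM(data["presence"], X, family=sm.families.Binomial()).fit()

print(poisson_fit.params)
print(logistic_fit.params)
```

Fitting a GAM as in Equation (2) follows the same pattern, with the linear terms replaced by smoothing functions of each predictor.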
For macroinvertebrate communities, traditional regression models are still used to relate abiotic stressors with species occurrence in order to explore habitat and water quality preferences, especially in headwaters (Pond et al., 2017) and tropical regions (Damanik-Ambarita et al., 2016; Everaert et al., 2014; Jerves-Cobo et al., 2017). However, prediction of the biological condition under different spatial and temporal domains and scales has also been addressed in some studies for both macroinvertebrate and fish assemblages (Frimpong et al., 2005; Johnson and Host, 2010; Van Sickle and Burch Johnson, 2008). Representative studies employing MLR for stream health prediction include models developed by Waite et al. (2010) to predict several macroinvertebrate metrics using watershed- and riparian-scale variables. Results showed that the best models explained 41-74% of the variation, requiring only two or three explanatory variables after stepwise selection using the Akaike Information Criterion (AIC) estimator. Likewise, Merriam et al. (2015, 2013) employed linear regression and deletion tests to predict two indices based on benthic macroinvertebrate abundance as a function of surface mining, underground mine permit density, residential development, and location attributes. The results suggested that the interactions between different land uses are more important than a single land use effect. On the other hand, GLMs have been very popular for predicting species occurrence and distribution in freshwater ecosystems (Ahmadi-Nedushan et al., 2006). Van Sickle et al. (2004) implemented a linear regression and a negative binomial GLM for projecting fish and macroinvertebrate biological indices as a function of landscape and streamflow variables under different timeline scenarios, including reference conditions. Donohue et al. (2006) employed a stepwise binary logistic regression and logarithmic and quadratic regressions to obtain nationwide relationships between catchment and water quality attributes and stream ecological status based on an index representing the structure of benthic macroinvertebrate communities. Other studies have implemented GLMs in an integrated ecological modeling framework involving hydrodynamic, water quality and stream habitat suitability models to predict macroinvertebrate-based stream health at a reach scale (Holguin-Gonzalez et al., 2013b, 2013a; Kuemmerlen et al., 2014). Sui et al. (2014) developed a predictive model employing a geomorphology-based hydrological model to determine ten flow indices. Then, a GLM was implemented to relate those indices to the occurrence probabilities of 50 fish species at a watershed scale. In a recent study, Gieswein et al. (2017) used GLMs to quantify pairwise stressor interactions (strength and significance). This was performed following implementation of a decision tree-based model for identifying stressor hierarchy. A Boosted Regression Trees (BRT) model was used when analyzing the relationships between several factors (i.e. riparian land use, physical habitat quality, nutrients, natural variables) and fish, macroinvertebrate and macrophyte assemblages. With respect to GAMs, Maloney et al. (2012) compared the standard and boosted versions of this method for macroinvertebrate and fish metrics prediction, using watershed, stream, and site attributes as explanatory variables.
Results indicated that gradient boosting applied to GAMs avoids overfitting and provides interpretable relationships, which is an advantage in comparison with traditional machine learning techniques. Additionally, regular GAM has been also compared with a GAM based on principal component analysis (PCA) for fish richness and 11 diversity prediction (Zhao et al., 2014). Results showed different selected explanatory variables for each approach, generating different outcomes for the response variables. However, PCA- based GAM performed better during cross-validation tests and was found to be more suitable when predictors are highly correlated (Zhao et al., 2014). More recently, Almeida et al. (2017) evaluated the effect of sampling effort in terms of transect length on fish metrics. This was performed for a large Mediterranean watershed using a GAM with sampling area as a predictor. Results indicated that fish indices that are obtained using predictive models are more sensitive to sampling strategies than simpler biotic metrics that are model-independent, showing a decrease in their values with increasing sampling area, despite observed higher richness. Regarding spatial-scale effects, Johnson and Host (2010) listed representative studies from 2000 to 2008 involving invertebrates and fish assemblages. For each study, the authors reported the scales (e.g. habitat, reach, local, ecosystem, watershed, regional, ecoregion) at which explanatory variables explained the instream biological response. Johnson and Host (2010) showed meaningful differences in the scale’s importance among the reviewed studies, due to the different region sizes and disturbance levels for each case. A study by Frimpong et al. (2005) used linear and piecewise regression to compare the performance of stream habitat indices obtained at the watershed-scale and observations at the reach-scale for fish metrics prediction. Results indicated that watershed-scale variables provided better predictions for stream health than reach-scale variables. Additionally, predictive ability decreased with the spatial extent, which might be attributed to the increase of the attributes’ heterogeneity. In another study, Van Sickle and Burch Johnson (2008) developed a distance weighting model based on linear regression for estimating specific land use areas within watersheds that best explain fish index of biotic integrity (IBI). With this approach, it is possible to compare different scales of landscape 12 influence on stream health. Furthermore, linear models have been also used for multimetric indices formulation. Pont et al. (2009) implemented a procedure involving stepwise, multilinear, logistic, and Poisson regressions to build a predictive IBI for aquatic vertebrates (fish and aquatic amphibians). Particularly, this procedure was developed to discriminate natural and anthropogenic effects over the biotic metrics computation. Implementing the same approach, Moya et al. (2011) developed a multimetric index for macroinvertebrate assemblages. This index based on predictive models successfully discriminated between the reference and disturbed sites. 2.3.1.2 Ordination methods Ordination refers to multivariate statistical methods commonly classified into indirect (unconstrained ordination) and direct gradient analysis (constrained ordination) (De’ath, 1999; Guo et al., 2015b). Ordination methods are generally preferred when analyzing multiple species at multiple sites (Ahmadi-Nedushan et al., 2006). 
The main objective of ordination is to reduce dimensionality to identify patterns in the data while describing relationships with explanatory variables (e.g. environmental gradients). As a result, data samples are ordered in such a way that similar points are placed together (Ahmadi-Nedushan et al., 2006). Indirect gradient analysis only uses samples collected in one data matrix, extracting dominant or orthogonal axes of variation. Any additional information regarding explanatory variables is used afterwards to enhance results’ interpretation. These methods can be classified into distance-based techniques (e.g. polar ordination – PO, principal coordinates analysis – PCoA, nonmetric multidimensional scaling – NMDS) and Eigen analysis-based techniques, which can be derived from linear models (e.g. PCA) or from unimodal (nonlinear) models (e.g. correspondence analysis – CA, detrended correspondence analysis – DCA). Contrariwise, in direct gradient analysis, variables of interest are directly related to explanatory variables. Therefore, these techniques are preferred for habitat modeling (Ahmadi-Nedushan et al., 2006). 13 Direct gradient methods can also be based on linear models (e.g. redundancy analysis – RDA) or unimodal models (e.g. canonical correspondence analysis – CCA, detrended canonical correspondence analysis – DCCA). It is worth noting that linear models are preferred for short gradients, whereas unimodal models are, in general, more suitable for aquatic habitat modeling (Ahmadi-Nedushan et al., 2006). Further details regarding ordination techniques can be found elsewhere (Borcard et al., 2011; Kent, 2006; Zuur et al., 2007). Multivariate methods for biological assessments, like the River Invertebrate Prediction and Classification System (RIVPACS) and its variants (Abbasi and Abbasi, 2012; Feio and Poquet, 2011), are mainly based on ordination methods. These multivariate methods attempt to predict ratios of taxa observed vs. expected – O/E, carefully choosing reference sites while using categorical (e.g. presence/absence) rather than continuous biological data. Some recent applications of ordination methods within stream health modeling were reported by Gazendam et al. (2016), indicating that the integration with other techniques is needed for identifying relationships between environmental variables and stream health using PCA or CCA. For instance, D’Ambrosio et al. (2014), used CCA and variance partitioning to evaluate the relationships between instream habitat, spatial location, and geomorphic characteristics on fish and macroinvertebrate-based stream health indices. This study was conducted considering highly modified drainage channels as a consequence of agricultural activities. Results provided key ecological drivers for each biological community, under different stream geomorphic condition and location. Additionally, it is worth noting that ordination methods have been mainly implemented to assess the influence of explanatory variables on response variables before implementing more complex approaches for predicting stream health indicators (Lin et al., 2016). 14 2.3.2 Machine Learning Machine learning is a form of artificial intelligence employing statistical, probabilistic and optimization algorithms to identify relationships and patterns from datasets. The resulting outcomes can be used for data analysis, visualization and prediction (Mitchell, 1999). 
2.3.2.1 Decision tree-based methods
The decision tree family of models comprises hierarchical structures also known as Classification and Regression Trees (CART) (Breiman et al., 1984). These models divide the predictor space into regions with a homogeneous response and then fit a constant to each region. A decision tree grows using binary splits, resulting in a dendrogram with varying numbers of branches (De’ath, 2007). Each split is defined by threshold values accompanying the explanatory variables. Regression trees fit the mean response to observations, while classification trees, which are used for categorical data, fit the most frequent class as the constant. Usually, CART are grown to a maximum and then pruned using cross-validation approaches to prevent overfitting (Hastie et al., 2009). CART have been applied in several stream health applications and are nowadays commonly used as a benchmark approach for comparison with other methods (Ambelu et al., 2010; He et al., 2010; Holguin-Gonzalez et al., 2014, 2013a; Maloney et al., 2009; Ocampo-Duque et al., 2007; Waite et al., 2012; Wang et al., 2007). Known drawbacks of CART are the difficulty in modeling smooth functions and the tendency to produce very different results when small changes are made to the training data (Elith et al., 2008). However, ensemble methods based on computationally intensive procedures such as boosting and bootstrap aggregation (a.k.a. bagging) have shown better and more promising results in ecological applications (De’ath, 2007). Thus, two ensemble methods, Boosted Regression Trees (BRT) and Random forests (RF), are further described in the next sections.
2.3.2.1.1 Boosted regression trees
The boosted regression trees (BRT) method is an advanced form of regression that combines a large number of regression trees using the boosting technique to increase predictive performance (De’ath, 2007; Friedman, 2001; Hastie et al., 2009). Boosting is a forward sequential procedure that aims to find and merge results from multiple models (e.g. decision trees), emphasizing observations poorly represented by an existing combination of models (Brown et al., 2012). For BRT, boosting works as an optimization technique, minimizing the difference between predicted and observed values and adding at each step a new tree that best reduces this difference (Elith et al., 2008). The technique updates the residuals in each iteration, preserving the existing trees unchanged while extending the overall model. Hence, the final BRT model is a linear combination of several decision trees. Like other regression methods, it is possible to define the error distribution in BRT models in order to consider different response types (e.g. Gaussian, Poisson, binomial). BRT models are controlled by two important parameters: the learning rate (lr), which determines the contribution of each tree, and the tree complexity (tc), which controls the number of terminal nodes and interactions. Both lr and tc define the required number of trees (nt) for an optimal prediction (Elith et al., 2008). In addition, the stochasticity of BRT models is controlled by the “bag fraction”, which refers to the fraction of observations that are randomly drawn to train each new tree, with optimal values between 0.5 and 0.75 (Elith et al., 2008). In ecological applications, small values for lr (<0.001), and therefore high nt (>1000), are preferred in order to avoid overfitting, reduce the contribution of each tree, and increase predictive reliability (Elith et al., 2008), as illustrated in the sketch below.
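The following minimal sketch maps the BRT parameters just described to a gradient boosting implementation. The data are simulated, the parameter values are purely illustrative, and scikit-learn is assumed as a stand-in for the BRT software typically used in the reviewed literature (e.g. R's gbm package).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.uniform(size=(300, 6))                               # landscape/habitat predictors
y = 2 * X[:, 0] - X[:, 1] ** 2 + rng.normal(0, 0.1, 300)     # e.g. a biotic index score

brt = GradientBoostingRegressor(
    learning_rate=0.005,   # small lr, so each tree contributes little
    n_estimators=2000,     # correspondingly large nt
    max_depth=3,           # tc: allows up to three-way interactions
    subsample=0.6,         # "bag fraction" within the 0.5-0.75 range noted above
    random_state=0,
)
print("5-fold CV R^2:", cross_val_score(brt, X, y, cv=5, scoring="r2").mean())
```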
Values for tc are defined depending on the data availability and are restricted to the desired computing time. High values for tc implies a slower lr to keep a similar optimal nt (Elith et al., 2008). 16 Works conducted by Moisen et al. (2006), Elith et al. (2006) and Leathwick et al. (2006a), are among the first studies employing BRT models for ecological applications. Particularly, they showed that BRT models are more flexible and outperform regression models like GLM or GAM in variable selection, higher variance explanation, and lower prediction error. Moreover, BRT models are suitable for handling nonlinear relationships and can model smooth functions and interactions (Elith et al., 2008). Applications in stream health modeling include the determination of quantitative relationships between landscape variables and instream biological response (Chee and Elith, 2012; Gieswein et al., 2017; Golden et al., 2016; Pilière et al., 2014; Steel et al., 2017; Tonkin et al., 2014; Waite and Van Metre, 2017), prediction (Brown et al., 2012; Clapcott et al., 2017; Elias et al., 2016; Leclere et al., 2011; May et al., 2015; Waite et al., 2014, 2012), multimetric indices formulation (Clapcott et al., 2014; Esselman et al., 2013), and setting of instream water quality/ecological objectives and disturbance thresholds (Clapcott et al., 2012, 2010; Wagenhoff et al., 2016). Moreover, the most of recent studies involving BRTs implementation have dealt with regional and national scales. Nonlinear relationships using BRTs have been explored using different explanatory and response variables, and distinct objectives. For instance, Chee and Elith (2012) analyzed the patterns of occurrence for 17 native and alien riverine fish species. In that study, the explanatory variables comprised of 20 environmental predictors including physiographic, bioclimatic, edaphic and land cover attributes. The survey method was also considered as a categorical explanatory variable. Results showed that several, but not all, of the developed models are transferable to adjacent regions. Meanwhile, Tonkin et al. (2014) explored the effects of distance and barriers on the occurrence of macroinvertebrate assemblages. Hence, four different BRT models (i.e. considering different factors driving invertebrate colonization) were evaluated. In 17 another study, Golden et al. (2016) addressed the relationship between landscape variables at the watershed and riparian buffer scales with instream nutrient concentration and fish IBI under low flow conditions. Landscape variables included temporal and geographic position attributes, and indicators of runoff and point and non-point nutrient sources. Similarly, Pilière et al. (2014) explored the relationships between environmental stressors and freshwater invertebrates represented by the Invertebrate Community Index (ICI). Predictors included geography, water quality, physical-habitat quality, and toxic pressure variables. The results suggested that it is necessary to fit explanatory variables interactions to increase predictive ability and model interpretability. In a most recent work, Waite and Van Metre (2017) used BRTs to identify the most important stressors explaining the macroinvertebrate condition in streams. The final model was determined sequentially eliminating variables according to cross-validation performance. Three macroinvertebrate metrics and a multimetric index were used as response variables. 
Results indicated that watershed-scale stressors acted as surrogate variables for instream stressors. However, given the performance metrics for model validation, the authors did not recommend using the fitted BRT models for prediction in unsampled sites. On the other hand, there are several studies that implemented BRTs for understanding instream processes that affect species distribution. For instance, a study by Steel et al. (2017) explored the relationship between streamflow regime and water temperature metrics with the Shannon-Wiener diversity index, total richness, and total density per square meter of benthic macroinvertebrate assemblages. Results indicated that macroinvertebrate diversity and total richness showed the best predictive performances, with metrics related to spring snowmelt recession and variability in summer water temperature having the greatest relative influence. Meanwhile, the total density per square meter 18 of benthic macroinvertebrates showed the poorest fitness, limiting the interpretability of the modeling results (Steel et al., 2017). In general, studies using BRTs concerned with predicting the stream health condition using fish and macroinvertebrate communities are recent. Leclere et al. (2011) attempted to select an appropriate statistical method for predicting nine fish species occurrence in large river systems at a reach-scale. Compared methods were CART, GLM and BRT, where the latter showed the best performance. Specified predictors comprised qualitative (occurrence of shallow waters and shelters), semi-quantitative (range of coverage/magnitude of bottom subtract, current velocity, shade, macrophytes, complexity structures), and quantitative (value for depth, stream width, subtract diversity, cover of bed sediment) variables. The results showed that the BRT method selected a greater number of variables, giving more importance to continuous variables. Furthermore, this method provided a better ecological interpretability and consistency in the obtained response curves. Other studies have used BRT models to predict benthic macroinvertebrate metrics at a regional scale. For instance, Waite et al. (2012) used land use and land cover explanatory variables to obtain richness and O/E ratios. In addition, the study compared MLR, CART, Random forests (RF) and BRT predictive performances. Results indicated that BRT outperformed the other methods and provided additional information regarding potential interaction among explanatory variables. Meanwhile, Brown et al. (2012) used benthic macroinvertebrate index of biotic integrity (I-IBI) as the response variable. Landscape variables at the watershed and riparian-buffer scales were selected as explanatory variables. In the study, population density and agricultural and urban land use were the predictors with the highest influence on the response. However, the final BRT model was not able to capture the minimum and maximum values of the observed data. Therefore, the results suggested 19 that the outcomes from BRT models should not be used to predict index values at specific sites. Instead, the model is recommended to be used to predict impairment condition due to watershed disturbance (Brown et al., 2012). In a more recent study, Elias et al. (2016) implemented a two-level nested model to predict macroinvertebrate occurrence. The first level attempted to predict four water quality variables (dissolved oxygen, phosphates, ammonium and nitrates) with a BRT model. 
Then, these variables were used to predict reference conditions of stream health employing a different modeling approach. However, results showed that the BRT model was only successful in predicting nitrates. Other studies have addressed related fields to stream health modeling like surface water and groundwater interactions and water quality modeling (Poor and Ullman, 2010; Smucker et al., 2013). For instance, Johnson et al. (2017) successfully estimated the effects of groundwater seepage on stream temperature in unsampled sites at headwaters. Hence, landform and precipitation covariates, representing spatial and temporal variables, respectively, were selected as explanatory variables. During the modeling process, these variables were sequentially eliminated using ranking, clustering and model simplification approaches. The impacts of the scale of study on model predictability have been also addressed with BRT models in which depending on the size and location of study, the importance of natural gradients and anthropogenic stressors are different (May et al., 2015; Waite et al., 2014). However, it was suggested that using BRT models for small spatial scales provide more accurate predictions compare to large-scale studies (Waite et al., 2014). A study by Clapcott et al. (2017) attempted to estimate site-specific contemporary and reference values for a macroinvertebrate index, evaluating spatial-scale effects on the predictions. Models at national and regional scales were tested for comparison. In general, the proportion of native vegetation in upstream 20 catchments was the primary predictor, while the remaining relevant predictors varied regionally. Main environmental predictor at large scales also included flow variability, habitat category, substrate composition, summer temperature and average upstream slope. Regional models showed that low flow remaining downstream after daily water allocation, and calcium concentration of rocks in the catchments were relevant. Other methods (i.e. ANCOVA and RF) were also evaluated. Results indicated that models at different scales were equally informative. However, the authors recommended using finer-scale predictors to improve the model accuracy. These variables include substrate size, nutrient concentrations, streamflow, and temperature. Additionally, it was shown that regression tree-based methods did not overestimate biological condition scores because the methods do not extrapolate beyond the range of observations. Meanwhile, these methods are able to predict stream classes that are unrepresented in the available observations (Clapcott et al., 2017). BRT model interpretability has been found to be suitable for defining disturbance thresholds and setting instream objectives. Particularly, functional indicators for ecosystem processes are often used as response variables for the task mentioned above (Clapcott et al., 2012, 2010; Wagenhoff et al., 2016). Functional indicators include variables related to primary production, ecosystem respiration, organic matter breakdown, cellulose decomposition potential, among others (Clapcott et al., 2010). Furthermore, it has been suggested that statistical analysis based on single-stressor models have the tendency to provide spurious thresholds for management purposes (Wagenhoff et al., 2016). Meanwhile, the use of sediment-specific macroinvertebrate metrics has been encouraged for improving prediction and threshold definition (Wagenhoff et al., 2016). 
Other findings include the importance of spatial variation of the predictors for increasing predictive power (Clapcott et al., 2012). Finally, additional applications of BRT models at national and continental scales comprise the definition of multimetric indices. Examples include the formulation of fish community indicators in large regions (Esselman et al., 2013) and the development of multimetric indices based on water quality-based predictive modeling, measurements of macroinvertebrate and fish assemblages, and indicators for ecosystem processes (Clapcott et al., 2014).
2.3.2.1.2 Random forests
Similar to BRT, Random forests (RF) are collections of individual CART (Breiman, 2001). This method fits several trees using bootstrap samples of the training data while employing a small number of randomly selected predictors from the explanatory variables (Snelder and J. Booker, 2013). The bootstrapping process attempts to reduce the variance of estimated outputs (Hastie et al., 2009), which is typically high for single large regression trees. For each bootstrapped sample, the largest tree is grown but not pruned, and aggregation is performed by averaging the trees or by majority voting (Carlisle et al., 2009b; Cutler et al., 2007). The number of randomly selected predictors is usually √p or log(p), where p is the number of explanatory variables (De’ath, 2007). This method requires a large number of trees to ensure convergence (Booker et al., 2015). Additionally, because of the bootstrap sampling, RF excludes approximately 37% of the observed data when growing the regression or classification trees. This non-drawn portion is called the out-of-bag samples (Prasad et al., 2006), and error estimates are computed using these samples. Then, these error estimates are used for regression tree aggregation (Carlisle et al., 2009b). When used for classification, RF determines the most frequent class across all trees for each observation within the out-of-bag portion. Estimating the error using the out-of-bag samples is almost equivalent to performing k-fold cross-validation, in which the training is terminated once the error stabilizes (Hastie et al., 2009). Therefore, because a large number of trees provides limited generalization errors, the RF method prevents overfitting (Prasad et al., 2006).
RF applications include predicting the instream biological condition and simulating ecologically-relevant hydrologic indices in undisturbed sites and ungauged locations. Most of the studies have been developed at regional and national scales. Some RF models have attempted to evaluate the effects of human activities and relevant environmental factors on natural aquatic ecosystems (He et al., 2010), while others have addressed the prediction of instream biological condition in ungauged locations (Carlisle et al., 2009a; Hill et al., 2017). Many of these studies have focused on predicting macroinvertebrate taxa richness and composition (Álvarez-Cabria et al., 2017; Booker et al., 2015; Carlisle et al., 2009a; Chinnayakanahalli et al., 2011; Patrick and Yuan, 2017; Vander Laan et al., 2013; Waite et al., 2014). Other studies have addressed the RIVPACS-type approach for determining biological condition (Carlisle et al., 2009a; Chinnayakanahalli et al., 2011). Moreover, fish assemblage composition and richness (He et al., 2010; Patrick and Yuan, 2017) and fish biomass (Álvarez-Cabria et al., 2017) have also been modeled. Other applications include determining reference conditions (Clapcott et al., 2017).
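A minimal random forest sketch is shown below; max_features="sqrt" mirrors the √p predictor subsampling and oob_score=True uses the roughly 37% out-of-bag observations for internal error estimation. The predictors and response are simulated and purely hypothetical, with scikit-learn assumed as one possible implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)
X = rng.uniform(size=(400, 9))                               # watershed-scale predictors
y = X[:, 0] + 0.5 * X[:, 3] + rng.normal(0, 0.1, 400)        # e.g. an O/E ratio

rf = RandomForestRegressor(
    n_estimators=1000,       # many unpruned trees, each grown on a bootstrap sample
    max_features="sqrt",     # random subset of ~sqrt(p) predictors at each split
    oob_score=True,          # error estimated on the ~37% out-of-bag observations
    random_state=0,
).fit(X, y)
print("Out-of-bag R^2:", rf.oob_score_)
```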
For biological condition prediction, common explanatory variables comprise geospatial datasets including land cover, land use, topography, climate, soils, societal infrastructure, and hydrologic modification. Some of these studies have addressed the roles of the natural flow regime and of water quality and temperature in biological condition predictability (Booker et al., 2015; Chinnayakanahalli et al., 2011; Patrick and Yuan, 2017; Vander Laan et al., 2013). For instance, Patrick and Yuan (2017) used the 171 Hydrologic Index Tool (HIT) indices as predictors, which were obtained with statistical modeling. Other studies have identified variables describing natural and human activities as important predictors (Carlisle et al., 2009a; He et al., 2010), especially land use changes and flow modification. Regarding ecologically-relevant hydrologic indices, common explanatory variables are related to geospatial data describing natural watershed characteristics. However, RF models usually include a reduced number of indices (between 1 and 36), and prediction errors range from 15% to 40% (Carlisle et al., 2009b). Other applications include river classification (Dhungel et al., 2016; Snelder and J. Booker, 2013), where classification success has ranged from 34% to 75%. Additionally, there are studies that have analyzed potential effects of climate change (Dhungel et al., 2016) as well as environmental flow and flow-ecology relationships for environmental impact assessment (Buchanan et al., 2017).
2.3.2.2 Artificial neural networks
Artificial neural networks (ANNs) are nonlinear models with many parameters, flexible enough to approximate any smooth function. ANN is a learning method based on the idea of building linear combinations of the specified explanatory variables and then modeling the response variables as nonlinear functions of these linear combinations (Hastie et al., 2009). ANNs are labeled as “black box” models and are known for being useful for prediction but not very useful for producing understandable models (i.e. they provide limited insight into the relative influence of explanatory variables). However, multiple methods for understanding ANN results are available, including sensitivity analysis and randomization tests (Gevrey et al., 2003; Olden and Jackson, 2002). A typical two-stage regression or classification network model is known as a feed-forward neural network. Under this approach, linear combinations of the explanatory variables X are transformed into Z elements called “hidden units” using a nonlinear function σ, known as the “activation function”. Usually, σ is the sigmoid function, which is a smooth version of the step function; however, Gaussian radial basis functions are also commonly used (Hastie et al., 2009; Mathon et al., 2013). In an ANN, more than one layer of hidden units can be used (Lek and Guégan, 1999). Afterwards, these Z elements are linearly combined. Then, the linear combinations are transformed using an output function to provide the response variables. The constants involved in the linear combinations are known as “weights”. The aforementioned ANNs are called multilayer perceptrons (MLPs) and are very popular in ecological applications. In order to fit ANNs to observed data, a two-pass procedure known as back-propagation, which is a gradient descent algorithm, is typically employed to determine the weights (Hastie et al., 2009).
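The sketch below shows a small feed-forward network (MLP) of the kind just described, with sigmoid hidden units; the data are simulated and the hyperparameter choices are illustrative only, with scikit-learn assumed as one possible implementation.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.uniform(size=(500, 5))                               # habitat/water quality inputs
y = np.sin(2 * X[:, 0]) + X[:, 1] * X[:, 2] + rng.normal(0, 0.05, 500)

Xs = StandardScaler().fit_transform(X)                       # scaling aids convergence
mlp = MLPRegressor(
    hidden_layer_sizes=(8,),     # one hidden layer with eight units (Z elements)
    activation="logistic",       # sigmoid activation function
    solver="adam",               # gradient-based training of the weights
    max_iter=2000,
    random_state=0,
).fit(Xs, y)
print("Training R^2:", mlp.score(Xs, y))
```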
Key MLP parameters thus include the number of passes through the training data used to update the weights of the hidden units (i.e. epochs) and the number of hidden layers and units. Alternatives to MLP, such as the Generalized Regression Neural Network (GRNN), have recently been applied for stream health modeling (Mathon et al., 2013; Sutela et al., 2010). ANNs were first introduced in ecological applications during the 1990s (Lek et al., 1996; Lek and Guégan, 1999) and have been widely used since then. Goethals et al. (2007) presented a review of 26 representative studies addressing macroinvertebrate prediction, covering a period from 1998 to 2006. In those studies, the response variables were usually presence/absence, abundance, and derived indicators (e.g. richness, average score per taxon, exergy), using landscape, instream, and water quality attributes as explanatory variables (i.e. inputs). Studies using fish biological data have also been conducted for evaluating restoration projects and understanding the effects of changes in physical habitat and water quality variables (Olaya-Marín et al., 2013, 2012; Olden et al., 2008).
Other studies have reported an increased use of self-organizing maps (SOMs) for predicting macroinvertebrate and fish distributions and exploring relationships with landscape and environmental variables (Chon, 2011; Tsai et al., 2016). SOMs, also known as Kohonen networks, are unsupervised ANNs that are implemented for pattern classification, clustering, and ordering purposes (Kalteh et al., 2008), and are also known for approximating probability density functions of the input data (Chon, 2011). Recent works have addressed variable selection and uncertainty analysis to gain insight into the interpretability of the results, which is one of the main drawbacks of ANNs. For instance, Mouton et al. (2010) compared six different methods reviewed by Gevrey et al. (2003) for evaluating the individual contribution of explanatory variables in predicting macroinvertebrate abundance. The results indicated that some techniques are more sensitive and less stable than others. However, it was shown that the different methods were able to provide consistent results regarding the order of importance of the explanatory variables. Likewise, Grenouillet et al. (2011) and Guo et al. (2015a) evaluated the variability and uncertainty of an ensemble of several models, including GLM, GAM, CART, RF, and ANNs, among others, in the prediction of fish distributions in streams and lakes, respectively, at a national level. This evaluation included the comparison between the individual predictions provided by every single model and an average of ensemble models. Results indicated that ensemble modeling improves accuracy when modeling different species and revealed that uncertainty depends on the geographical extent. In addition, Gazendam et al. (2016) developed an ANN model to predict two macroinvertebrate indices (Hilsenhoff’s Biotic Index and richness) while testing combinations of physically-based input variables (geomorphic, riparian, hydrology, and watershed-level). Results showed that considering both watershed- and reach-scale (geomorphic and riparian) inputs can improve model performance, consistent with findings using ordination techniques (D’Ambrosio et al., 2014, 2009).
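As a simple illustration of the perturbation-style approaches for probing an ANN's inputs mentioned above, the sketch below perturbs each predictor in turn and uses the resulting change in predictions as a rough importance measure. The network, data, and perturbation size are all hypothetical; this is not one of the specific algorithms compared by Gevrey et al. (2003).

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(5)
X = rng.uniform(size=(400, 4))                               # four hypothetical predictors
y = 3 * X[:, 0] + X[:, 1] + rng.normal(0, 0.05, 400)
net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=3000, random_state=0).fit(X, y)

baseline = net.predict(X)
for j in range(X.shape[1]):
    X_pert = X.copy()
    X_pert[:, j] += 0.1 * X[:, j].std()      # perturb predictor j by 10% of its spread
    delta = np.abs(net.predict(X_pert) - baseline).mean()
    print(f"predictor {j}: mean |change in prediction| = {delta:.4f}")
```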
2.3.2.3 Other methods
Additional statistical and machine learning methods that have been implemented for species distribution modeling (SDM) and habitat suitability include Multivariate Adaptive Regression Splines (MARS), Support Vector Machines (SVMs), and Partial Least Squares Regression (PLSR). Other methods such as Maximum Entropy (Phillips et al., 2006) and the Genetic Algorithm for Rule Set Production (GARP) (Stockwell and Peters, 1999) are not further described herein because they are intended for presence-only data. It is worth noting that these two methods are especially suitable when working with small sample sizes and incomplete datasets (Li and Wang, 2013; Yi et al., 2017). MARS (Friedman, 1991) are multidimensional extensions of GAMs that build up from basis functions, fitting separate splines for different intervals (with extremes known as knots) of the predictor variables. The MARS algorithm finds the location and number of knots using an exhaustive search procedure in a forward/backward stepwise fashion (Prasad et al., 2006). MARS models are better suited than CART for continuous variables, can handle large numbers of explanatory variables with low-order interactions, and automatically quantify interaction effects (Li and Wang, 2013; Prasad et al., 2006). However, their interpretability is limited when analyzing species-environment relationships, their parameter identification is not straightforward, and they are highly sensitive to extrapolation (Prasad et al., 2006). The MARS method is mainly implemented for multi-species modeling (Guo et al., 2015a) and is considered an alternative to other regression methods such as GLM and GAM (Barry and Elith, 2006; Li and Wang, 2013; Yi et al., 2017).
On the other hand, SVMs (Vapnik, 2000) are machine learning algorithms that also use basis functions, known as kernels, to map data into a new nonlinear feature hyperspace, attempting to simplify data patterns. Then, the data is classified while the margins between the hyperplanes that are used to define classes are maximized. These hyperplanes are determined by a set of support vectors using quadratic programming. An advantage of SVMs is the reduced number of tuning parameters. Also, SVM models are not prone to overfitting. However, because this method does not provide a simple representation or a pictorial graph, the interpretation of modeling results is difficult (Cutler et al., 2007), the algorithm is computationally demanding, and the tuning parameters are poorly identifiable when the data are not linearly separable (Guo et al., 2015b). SVMs have been applied mainly for modeling species-habitat relationships for fish communities (Fukuda et al., 2013; Fukuda and De Baets, 2016; Muñoz-Mas et al., 2018, 2016) and for predicting the occurrence and macroinvertebrate-based stream health indices using landscape and water quality attributes (Ambelu et al., 2010; Fan et al., 2017; Hoang et al., 2010; Lin et al., 2016; Sor et al., 2017). In general, these studies have indicated similar performances between ANN and SVM when predicting continuous variables, although SVM usually performs slightly better than ANN. Meanwhile, methods such as RF have shown better results than SVM for classification purposes.
PLSR (Wold et al., 2001) is a method that projects explanatory and response variables into a new space where MLR is performed. PLSR is known for handling multicollinearity and strong correlation of predictors while allowing a high interpretability of the resulting regression coefficients (Villeneuve et al., 2015).
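The compact sketch below illustrates two of the approaches discussed in this subsection: an RBF-kernel SVM classifying simulated presence/absence data, and a PLSR model fitted with more (collinear) predictors than observations. Variable names and data are hypothetical, and scikit-learn is assumed as one possible implementation.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(11)

# --- SVM: presence/absence of a species from habitat attributes ---
X_hab = rng.uniform(size=(300, 4))
presence = (X_hab[:, 0] + 0.5 * X_hab[:, 2] + rng.normal(0, 0.2, 300) > 1.0).astype(int)
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
print("SVM training accuracy:", svm.fit(X_hab, presence).score(X_hab, presence))

# --- PLSR: many collinear stressor variables, few sampled sites ---
n_obs, n_pred = 40, 60                       # more predictors than observations
latent = rng.normal(size=(n_obs, 3))
X_stress = latent @ rng.normal(size=(3, n_pred)) + 0.1 * rng.normal(size=(n_obs, n_pred))
ibi = latent[:, 0] - 0.5 * latent[:, 1] + 0.05 * rng.normal(size=n_obs)
pls = PLSRegression(n_components=3).fit(X_stress, ibi)
coef = np.ravel(pls.coef_)                   # regression coefficients in the original space
print("five most influential predictors:", np.argsort(np.abs(coef))[::-1][:5])
```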
Moreover, PLSR is suitable when the number of explanatory variables is greater than the number of observations (Abouali et al., 2016b). The PLSR method has been used as an explicative method for quantifying relationships between several stressors and stream health indicators. This method is especially suitable for identifying the most relevant input variables before any predictive model training (Abouali et al., 2016b; Einheuser et al., 2013a; Villeneuve et al., 2015). In an attempt to facilitate the integration of expert elicitation regarding causal relationships, different approaches are being integrated within the structural equation modeling (SEM) framework for stream health modeling (Riseng et al., 2011; Surridge et al., 2014). For instance, a recent study by Villeneuve et al. (2018) shows an application of the aforementioned framework using PLSR for evaluating the direct and indirect effects of multiple stressors on macroinvertebrate indices at nested spatial scales (watershed, reach, and site). Results showed that the direct effects of instream water quality conditions decrease when indirect effects from land use and hydro-morphological alterations are considered for macroinvertebrate assemblages.
2.3.3 Soft Computing Methods
Soft computing is a collection of paradigms that attempt to represent complex systems in an environment of imprecision, uncertainty, and partial truth, resembling the learning of the human mind and biological systems. Soft computing approaches include fuzzy logic, neurocomputing, evolutionary computation, and probabilistic reasoning (Zadeh, 1994, 1998).
2.3.3.1 Fuzzy logic-based methods
Fuzzy logic-based methods are used to model nonlinear relationships employing linguistic terms instead of numeric values. These methods incorporate membership functions, fuzzy set operations, and if-then rules for mapping from a given input to an output. This process is also called “fuzzy inference” (Ocampo-Duque et al., 2006). The membership functions are curves with values between 0 and 1 that represent the degree of membership of an input variable’s value (element) to a certain fuzzy set, where 0 and 1 represent non- and full membership, respectively (Adriaenssens et al., 2006). Given that the same element may belong to several sets at the same time, expert knowledge is necessary to define the overlap between different membership functions for the same variable. Then, the obtained membership values are combined for different variables using fuzzy if-then rules incorporating fuzzy set operations. Outcomes from different fuzzy rules are then aggregated into a final fuzzy score. Fuzzy set operations include union, intersection, and additive complement. Following Adriaenssens et al. (2004a), fuzzy if-then rules consist of antecedent and consequent parts, which mainly rely on expert knowledge. The antecedent part states conditions for the explanatory variables (i.e. input), while the consequent describes the corresponding values of the response variables (i.e. output). When both parts are concerned with statements that define the value of the variables without considering any explicit function relating the explanatory and response variables, the model belongs to the Mamdani-Assilian type. On the other hand, when the consequent part establishes a linear or nonlinear relationship between the explanatory and response variables, the model belongs to the Takagi-Sugeno type. It would be necessary to implement a weighting operation using a specified decision-making method (e.g. the Analytic Hierarchy Process) to define the influence of each input variable in the final fuzzy score (Ocampo-Duque et al., 2006).
Once the fuzzy rules are implemented, the defuzzification operation is performed, if necessary, in order to obtain a numerical output in non-fuzzy contexts (Ocampo-Duque et al., 2006). Fuzzy logic models are generally trained employing optimization routines based on the Shannon-Weaver entropy (tuning the fuzzy set parameters to obtain entropy values greater than 0.85) or genetic algorithms to generate uniformly distributed fuzzy sets (Yi et al., 2017). Then, the nearest ascent hill-climbing algorithm can be used for optimizing the consequent part of each fuzzy rule (Yi et al., 2017).
Compared to ANNs, fuzzy logic-based methods provide a more transparent insight into the influence and interactions of input variables. Moreover, the analysis of the model is more intuitive and can be performed in a semi-qualitative manner, while uncertainty quantification can be easily integrated. However, setting the membership functions and fuzzy rules, which are jointly referred to as the knowledge database, is not an easy task (Adriaenssens et al., 2004a). In order to address this issue, several techniques such as Adaptive Neuro-Fuzzy Inference Systems (ANFIS) (Jang, 1993) were introduced. ANFIS is a hybrid method that combines ANNs and fuzzy logic. An adaptive network is a multilayer feed-forward neural network where the hidden units may or may not have parameters. These parameters are determined during the training process using observed data by minimizing the predictive error. The membership functions are therefore introduced in the first hidden layer of parameterized units. Successive layers, which contain non-parameterized units, represent the if-then fuzzy rules while incorporating the fuzzy set operations. Each successive layer attempts to represent the different specified fuzzy rules. The last layer before computing the overall fuzzy score comprises parameterized units. In this layer, the linear and nonlinear relationships for Takagi-Sugeno type models are introduced. These relationships are referred to as “output membership functions” (Jang, 1993). The number of parameters in the ANFIS model depends on the number of explanatory variables, the number of membership functions per variable, the membership function shape, and the output membership function type. Thus, the number of parameters should not be greater than the number of available observations for model training in order to avoid overfitting (Woznicki et al., 2016a).
Early fuzzy logic applications for predictive modeling and ecosystem management were reviewed by Adriaenssens et al. (2004a). The integration of expert knowledge, the use of qualitative data, and the capacity to incorporate uncertainty assessments have motivated the growing use of fuzzy logic-based methods in ecological applications (Adriaenssens et al., 2004a; Ocampo-Duque et al., 2006). Recent studies include the prediction of macroinvertebrate abundance (Adriaenssens et al., 2006; Mouton et al., 2009; Van Broekhoven et al., 2006), ecological status classification (Ocampo-Duque et al., 2007), the development of multimetric stream health indices (Marchini et al., 2009), and the prediction of fish species occurrence, distribution, or derived indices (Boavida et al., 2014; Muñoz-Mas et al., 2012).
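A toy Mamdani-type fuzzy inference sketch is shown below, assuming the scikit-fuzzy package. The input variables (flow condition, dissolved oxygen), the fuzzy sets, and the if-then rules are invented for illustration only and do not represent any of the published models cited above.

```python
import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

flow = ctrl.Antecedent(np.arange(0, 11, 1), "flow_condition")
oxygen = ctrl.Antecedent(np.arange(0, 11, 1), "dissolved_oxygen")
health = ctrl.Consequent(np.arange(0, 101, 1), "stream_health")

flow.automf(3)     # generates 'poor', 'average', 'good' membership functions
oxygen.automf(3)
health["low"] = fuzz.trimf(health.universe, [0, 0, 50])
health["medium"] = fuzz.trimf(health.universe, [25, 50, 75])
health["high"] = fuzz.trimf(health.universe, [50, 100, 100])

rules = [
    ctrl.Rule(flow["poor"] | oxygen["poor"], health["low"]),    # if-then rules combining
    ctrl.Rule(oxygen["average"], health["medium"]),             # fuzzy set operations
    ctrl.Rule(flow["good"] & oxygen["good"], health["high"]),
]

sim = ctrl.ControlSystemSimulation(ctrl.ControlSystem(rules))
sim.input["flow_condition"] = 7
sim.input["dissolved_oxygen"] = 8
sim.compute()
print("Defuzzified stream health score:", sim.output["stream_health"])
```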
When applied for prediction, fuzzy logic-based models typically use water quality, hydrologic, and hydro-morphological attributes and knowledge based on species preferences and tolerances. In an attempt to implement fuzzy logic in a more data-driven fashion, ANFIS has recently been implemented using process-based models to evaluate the impacts of conservation practices and environmental changes on stream health indices at a watershed scale (Einheuser et al., 2013a, 2013b, 2012). For instance, the Soil and Water Assessment Tool (SWAT) has been employed to simulate streamflow series in order to obtain ecologically-relevant hydrologic indices. Those hydrologic indices are later used to predict stream health indices using ANFIS (Herman et al., 2016, 2015). Other studies have also incorporated water quality simulations (sediment and nutrients) for predicting stream health indicators (Woznicki et al., 2016a, 2016b). Moreover, Abouali et al. (2016b) employed the same integrated framework to develop a two-phase approach coupling Partial Least Squares Regression (PLSR) and ANFIS for predicting macroinvertebrate- and fish-based indices. Results showed a significant improvement in the prediction power while omitting the need for variable selection. It is worth noting that ensemble modeling frameworks integrating process-based models are especially suitable for evaluating climate change effects on biological assemblages (Daneshvar et al., 2017a). Recently, Yi et al. (2017) presented a detailed review of fuzzy logic integration with process-based models intended for habitat modeling. Remarkable applications include micro-habitat selection/evaluation for fish and macroinvertebrate assemblages and dam operation/removal assessments (e.g. the CASiMiR model).
2.3.3.2 Bayesian belief networks
Bayesian belief networks (BBN) are directed acyclic graphs having nodes linked by probabilities (Pearl, 1986). Each node represents constants, discrete or continuous variables, or continuous functions in the model, whereas the arrowed links indicate direct correlation or causal relationships between nodes (McCann et al., 2006). There are two types of nodes: parent or independent nodes (nodes without incoming arrows) and child nodes (nodes with incoming arrows, which may also have outgoing ones). Each node has an associated probability distribution, which is unconditional for parent nodes and conditional for child nodes. The outcomes of each node are known as states (McDonald et al., 2015). Probability distributions are usually defined for each node in terms of the states (i.e. conditional probability tables, CPTs) (McCann et al., 2006). Consequently, a BBN is comprised of a qualitative component referring to the network structure and a quantitative component given by the CPTs within each node (Phan et al., 2016). The structure of the network is iteratively determined using expert knowledge and/or prior data, where metrics such as correlation coefficients are often used to define causal links. It is important to note that causal links should be carefully defined in order to prevent high levels of uncertainty in the model’s outputs (McDonald et al., 2015). Conditional independence tests are often used for learning the BBN structure. However, when multiple BBNs are tested, model selection using the Bayesian Information Criterion (BIC) or optimization routines is also employed (Aguilera et al., 2011).
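The toy sketch below illustrates the qualitative component (network structure) and quantitative component (CPTs) of a discrete BBN, assuming the pgmpy library. The structure (LandUse → WaterQuality → MacroStatus), the node names, and all probability values are hypothetical and chosen only to show how evidence is propagated through the network.

```python
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Hypothetical structure: LandUse -> WaterQuality -> MacroStatus (all binary nodes)
model = BayesianNetwork([("LandUse", "WaterQuality"), ("WaterQuality", "MacroStatus")])

cpd_land = TabularCPD("LandUse", 2, [[0.6], [0.4]])        # P(natural), P(agricultural)
cpd_wq = TabularCPD("WaterQuality", 2,
                    [[0.8, 0.3],                           # P(good WQ | land use state)
                     [0.2, 0.7]],                          # P(poor WQ | land use state)
                    evidence=["LandUse"], evidence_card=[2])
cpd_macro = TabularCPD("MacroStatus", 2,
                       [[0.9, 0.25],                       # P(healthy | WQ state)
                        [0.1, 0.75]],                      # P(impaired | WQ state)
                       evidence=["WaterQuality"], evidence_card=[2])
model.add_cpds(cpd_land, cpd_wq, cpd_macro)

# Propagate evidence: macroinvertebrate status given agricultural land use
inference = VariableElimination(model)
print(inference.query(["MacroStatus"], evidence={"LandUse": 1}))
```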
Determining relevant variables is usually addressed by knowing that the nodes within a network can be unconditionally separated, or conditionally separated/connected if prior knowledge is given in other nodes (Aguilera et al., 2011). On the other hand, determining the probability values populating CPTs is usually done using prior data 33 and the Bayes’ theorem for probabilities propagation from parent nodes. Training algorithms have been developed in accordance with the BBN description of the joint probability distribution of the nodes within the network (Pérez-Miñana, 2016). CPTs, which are marginal probability distributions, are often parameterized and calculated using approaches based on Monte Carlo simulation (Phan et al., 2016), Gibbs sampling or dynamic discretization (Nojavan A. et al., 2017; Pérez-Miñana, 2016), maximum likelihood or the Laplace correction (Aguilera et al., 2011), or the Expectation maximization algorithm for small and incomplete datasets, and the gradient learning algorithm for large incomplete datasets and continuous data (McDonald et al., 2015). In general, populating CPTs requires either expert knowledge, the use of several data- based methods, or both (Phan et al., 2016). There are recent reviews covering BBN applications in areas such as environmental modeling (Aguilera et al., 2011), ecosystem services modeling (Landuyt et al., 2013; Pérez- Miñana, 2016), ecological risk assessment (McDonald et al., 2015), and water resources management (Phan et al., 2016), which also include some early studies related to stream health modeling (Adriaenssens et al., 2004b; Marcot et al., 2001). The aforementioned reviews addressed different aspects of BBN model development/application such as data pre-processing, complexity, training, optimization, validation methods, variations and extensions (e.g. dynamic Bayesian networks for time series modeling and Hidden Markov Models for higher order relationships, see Tucker and Duplisea (2012)), integration with other modeling techniques, and software comparison. Moreover, given the extensive application of BBN in ecology and environmental science, there are guidelines addressing good modeling practices (Chen and Pollino, 2012; Marcot et al., 2006), overfitting, uncertainty quantification, and salient issues (Marcot, 2017, 2012). 34 Representative BBN applications in stream health modeling include the evaluation of cumulative environmental impacts of multiple stressors on ecosystem health using traditional and scientific knowledge (Mantyka-Pringle et al., 2017). The study incorporated biotic factors describing wildlife health, food webs, wildlife populations, fish health, macroinvertebrate metrics (density, richness and diversity), among others. Other recent studies have also addressed the estimation of the interactive effect of land use and climate change (considering its influence on water quality, physical factors and habitat characteristics) on fish population success (Turschwell et al., 2017), EPT macroinvertebrates indices integrating SEM (Li et al., 2018), and both fish and macroinvertebrate richness (Mantyka-Pringle et al., 2014). Likewise, BBN models have been developed for predicting macroinvertebrate indices using land use, physicochemical, and hydro-morphological factors (Allan et al., 2012; Forio et al., 2015; McLaughlin and Reckhow, 2017). However, McLaughlin and Reckhow (2017) could not find strong causal relationships or a high predictive power when relating water quality parameters (e.g. 
nutrients, chlorophyll, dissolved oxygen) and habitat attributes to benthic macroinvertebrates in streams. The results may indicate that the BBN model is reflecting associations rather than strict causal relationships between variables. In addition, the low predictive power might be a consequence of not considering all relevant factors affecting the response variable. However, other studies have been able to identify relevant stressors. For instance, Forio et al. (2015) indicated that flow velocity is a major variable determining stream health, which is highly sensitive to natural streamflow alterations by dams and water abstractions. In addition, Dyer et al. (2014) reached a similar conclusion when analyzing the effects of climate change and stream regulation on ecologically-relevant hydrologic indices and water quality attributes using BBN. Moreover, there are studies implementing BBN for environmental flow decision-making processes (Leigh et al., 2012; Shenton et al., 2014) and water management for fish species conservation (Peterson et al., 2013; Vilizzi et al., 2013). On the other hand, Death et al. (2015) compared BBN with logistic regression, artificial neural networks, classification trees, and random forests while predicting a macroinvertebrate-based stream health index at a national scale. Results indicated that BBN moderately outperformed the other methods; however, model preparation is more time-consuming. BBN have also been implemented for evaluating the habitat suitability of invasive macroinvertebrate species using purely data-driven and expert knowledge-based approaches (Boets et al., 2015).
2.4 KNOWLEDGE GAP ANALYSIS
Explaining cause-effect relationships between environmental and anthropogenic factors and measures of ecological integrity or health in freshwater ecosystems is not a straightforward task because of the complex, nonlinear, and uncertain nature of these systems (Niemi and McDonald, 2004). However, a growing number of applications of statistical and soft computing methods have been employed to address the aforementioned challenges. When selecting a modeling approach, aspects regarding variable selection, interpretability of modeling results, modeling ensembles for increasing predictive ability, and model evaluation and overfitting should be considered. Particularly, we have identified the need for developing guidelines regarding three main aspects of stream health modeling practice: variable selection, model evaluation, and data acquisition and uncertainty analysis for modeling ensembles. In this section, we briefly describe the aspects mentioned above and indicate the corresponding research priorities.
Strategies for variable selection include trial and error, expert knowledge, statistical analysis, heuristics, or combinations of these methods (Falcone et al., 2010; May et al., 2008). However, there is no agreement about how to proceed with variable selection when developing stream health models (Woznicki et al., 2015). Variable selection is a critical step in any empirical modeling exercise, compromising performance, efficiency, and interpretability (Li et al., 2015). Moreover, the number of selected input variables defines the number of model parameters to be calibrated and the computational effort, and is critical for overfitting prevention (Galelli et al., 2014). According to the studies reviewed in this paper, ordination and classification methods, nonparametric rank correlations, and Bayesian methods are usually implemented. For instance, Woznicki et al.
(2015) compared PCA, Spearman’s rank correlation, and Bayesian variable selection when modeling macroinvertebrate- and fish-based indices with ANFIS. Results showed that Bayesian variable selection provided the best final models (Woznicki et al., 2015). Other variable selection approaches are based on stepwise procedures using relative quality (e.g. using AIC) (Clapcott et al., 2017), backward elimination (Fox et al., 2017), clustering employing SOMs (Bowden et al., 2005), partial mutual information (Fernando et al., 2009), and interpretability tools based on sensitivity and perturbation analysis and the relative importance of the predictors (Elith et al., 2008; Gevrey et al., 2003). Nevertheless, there is a lack of studies comparing different variable selection methods with different stream health modeling approaches. Thus, further research in this area is encouraged to gain insight into the influence of different ensembles of variable selection approaches and modeling methods on predictive ability and model parsimony.
Traditional statistical methods based on linear regression (e.g. MLR, GLM) are generally transparent and their coefficients can be well interpreted (Li and Wang, 2013). However, those methods usually show the lowest predictive power. For more complex models, there have been initiatives introducing expert elicitation into the model formulation and training. In those cases, fuzzy logic and Bayesian belief network approaches have played an important role (Mantyka-Pringle et al., 2017; Mouton et al., 2009). On the other hand, when working with data-driven approaches, some alternatives have been formulated depending on the implemented modeling method. For instance, in decision tree-based methods (e.g. CART, BRT, RF), results are interpreted by estimating the relative influence of predictor variables and are visualized and examined using partial dependence plots (Hastie et al., 2009). Relative influence in single trees can be measured based on the number of times a variable is selected for splitting. The number of times is weighted by a squared-sum measure of model improvement resulting from adding each split to the individual tree. The final relative influence is obtained by averaging the corresponding results for the individual trees and then standardizing the final values (Elith et al., 2008). A similar option for obtaining relative influence measures can be applied using the out-of-bag samples, assuming that the relative decrease in prediction accuracy is related to the variable’s influence (Carlisle et al., 2009a). The out-of-bag observations are used to obtain a decrease in prediction accuracy when the explanatory variables are randomly permuted in each tree. Then, the decrease is averaged and standardized across all trees (Carlisle et al., 2009a). Partial dependence plots attempt to represent the effect of a variable when assigning average values to all other variables in the model. The plots can reveal strong interactions when using multiple variables and are especially suitable for detecting disturbance thresholds or ranges. However, these plots are limited to low-dimensional views (Hastie et al., 2009). Other methods for analyzing the contribution of input variables are based on partial derivatives, perturbation of the input variables, and successive variation of a certain input variable while the remaining ones are kept constant, among others that are specifically designed for ANNs (e.g.
neural interpretation diagram, Garson’s algorithm, randomization test, stepwise methods) (Gevrey et al., 2003; Olden and Jackson, 2002). However, to significantly improve the interpretability of modeling results, it 38 is necessary to advance towards frameworks which incorporate process-based models to describe disturbance factors (Araújo and New, 2007). Ensemble modeling frameworks have been introduced to improve predictive ability while understanding cause-effect and multiscale dynamics driving instream changes due to alterations in landscape and environmental factors. The inclusion of process-based modeling approaches into integrated stream health modeling frameworks has been developed for different scales (i.e. macro-, meso- and micro-scale). For instance, at a macroscale (e.g. watershed, ecoregion), main advancements are related to the representation of bioclimatic and hydrologic factors using climatic, hydrologic, hydraulic, and water quality models. At meso- (e.g. river segments, hydromorphologic units: pool, riffle, run,…) and micro-scales (e.g. point locations, substrate), computational fluid dynamics, hydraulic, water quality and physical habitat models predicting local velocities, depths and physic-chemical constituents are often implemented (Daneshvar et al., 2017b; Yi et al., 2017). For instance, Jähnig et al. (2012) proposed a framework following the driver-pressure-state-impact concept. Drivers include main watershed and instream stressors (e.g. climate, land use, river alteration). Pressures on freshwater ecosystems comprise hydrologic and hydraulic stress, sediment intake, substrate stability, among others, which can be represented by process-based models developed in several areas such as ecohydrology and ecohydraulics. State refers to the outputs driven by the aforementioned pressures (e.g. extreme events, sediment, velocity, depth, substrate, nutrients). Finally, to evaluate the impact of the different state variables, species distribution, aquatic biodiversity and stream health measures can be obtained (Jähnig et al., 2012). A similar comprehensive framework was recently proposed by Kail et al. (2015), introducing a module for simulating stream channel geomorphological evolution. Other authors have put more attention on states describing the flow regime, using ecologically-relevant 39 hydrologic indices (Woznicki et al., 2016a) and linkages with climate change scenarios (Daneshvar et al., 2017a; Guse et al., 2015). It is also worth noting that there is a trade-off between model complexity and uncertainty. In ensemble stream health modeling, gaining model predictive power and interpretability implies multiple information sources, an elevated number of model parameters and the persistence of epistemic errors in model formulation. Hence, higher outcome uncertainties might be expected. Therefore, increased efforts studying uncertainty propagation and shrinking in ensemble modeling must be addressed to promote more transparent decision-making processes. On the other hand, with the advent of big data sources and applications, including image processing and long-term monitoring data (Kuemmerlen et al., 2016), recent advances in deep learning are encouraged to be implemented and integrated within existing modeling frameworks (Babbar-Sebens et al., 2015; Lecun et al., 2015). 
Furthermore, it is necessary to evaluate how the current strategies for biological data acquisition are compatible with data derived from remote sensing products and traditional environmental information systems and networks, and how these strategies can help us to better understand the health of a stream. In summary, stream health models should be seen as complementary tools that require continuous validation with field measurements (Kuehne et al., 2017). Therefore, clear guidelines regarding monitoring stream health for modeling purposes should be considered in future studies.
With respect to model evaluation and overfitting, the commonly used performance measures depend on the type of response variable. When categorical variables are predicted (e.g., presence/absence, impairment condition), the percentage of correctly classified observations, the true skill statistic, sensitivity, specificity, Cohen's kappa, and the area under the receiver operating characteristic curve (AUC) are commonly calculated (Manel et al., 2001; Sor et al., 2017). In general, using several performance measures when conducting modeling exercises is recommended (Maloney et al., 2009). When predicting continuous response variables (e.g., a stream health index, biomass), commonly used performance measures are the correlation coefficient (r), the coefficient of determination (R2), the Nash-Sutcliffe efficiency, the root mean squared error (RMSE), and the deviance between observed and predicted values (Goethals et al., 2007). On the other hand, overfitting controls embedded in the algorithms' formulation and k-fold cross-validation techniques, both widely used in the studies reviewed here, can be applied to minimize model overfitting. However, to the best of our knowledge, there are no standard guidelines for evaluating stream health model performance, even though several guidelines have been developed for other aspects of environmental modeling, including Bennett et al. (2013) and Moriasi et al. (2007, 2015). Therefore, for stream health modeling, a combination of the aforementioned criteria or new criteria should be further evaluated with respect to their applicability and usefulness.
2.5 SUMMARY AND CONCLUSION
In this study, we provided an overview of different statistical, machine learning, and soft computing methods widely used in ecological applications and stream health modeling based on data describing macroinvertebrate and fish assemblages. The main advantages and disadvantages of the reviewed methods are summarized in Table 1. It is worth noting that statistical methods are simpler and more interpretable than other methods, while their prediction power is generally low. On the other hand, models based on machine learning techniques provide better accuracy in reproducing observed stream health indices and are more suitable for representing complex, nonlinear systems. Nevertheless, these methods can be difficult to interpret and hardly provide insight into model parameters' meaning and relative importance. Thus, soft computing methods, which can be integrated with machine learning techniques, are favorable because they allow the incorporation of expert elicitation and partial information, enhancing the interpretability of ecological models. However, model formulation is usually time consuming, especially for very complex models.
At the same time, soft computing models that are structured based on expert knowledge are susceptible to misrepresenting causal relationships, and consequently are likely to carry higher structural uncertainties. Therefore, frameworks supporting the integration of process-based models for driving multi-scale stressors and employing ensembles of different empirical modeling techniques are recommended. However, these types of modeling techniques are vulnerable to uncertainty propagation resulting from the modeling process, its components, and the data sources, which can affect the consistency and reliability of the modeling results. In addition, there is a growing body of literature providing best practices for data preparation, optimal model design, model interpretation, performance evaluation, and the relative importance of variables, among other aspects, for specific methods such as decision trees, ANN, fuzzy logic, and BBN. Therefore, it is crucial to develop guidelines addressing the aforementioned aspects for stream health modeling practice.

Table 1 Summary of advantages, disadvantages and applications for the methods described in this study

Multiple Linear Regression
  Advantages: Straightforward implementation and interpretation; computational effort is low.
  Disadvantages: Low predictive power; method assumptions (i.e., normality, homoscedasticity) are usually violated; parameter estimation is unstable under multicollinearity and strongly correlated variables.
  Applications (macroinvertebrates): Merriam et al., 2015, 2013; Pond et al., 2017; Waite et al., 2012, 2010.
  Applications (fish): Frimpong et al., 2005; Van Sickle and Burch Johnson, 2008.

Generalized Linear Models
  Advantages: Straightforward implementation and interpretation; flexible with the selection of error distributions; computational effort is low.
  Disadvantages: Low predictive power; model structure (distribution selection) must be defined a priori.
  Applications (macroinvertebrates): Damanik-Ambarita et al., 2016; Death et al., 2015; Donohue et al., 2006; Everaert et al., 2014; Gieswein et al., 2017; Holguin-Gonzalez et al., 2013a, 2013b; Jerves-Cobo et al., 2017; Kuemmerlen et al., 2014; Moya et al., 2011; Pont et al., 2009; Sauer et al., 2011; Van Sickle et al., 2004.
  Applications (fish): Fukuda et al., 2013; Gieswein et al., 2017; Grenouillet et al., 2011; Guo et al., 2015a; Hermoso et al., 2011; Kwon et al., 2015; Leclere et al., 2011; Patrick and Yuan, 2017; Sui et al., 2014.

Generalized Additive Models
  Advantages: Suitable for modeling nonlinear relationships; use non-parametric basis functions.
  Disadvantages: Prone to overfitting; reduced interpretability of modeling results.
  Applications (macroinvertebrates): Maloney et al., 2012; Sauer et al., 2011.
  Applications (fish): Almeida et al., 2017; Fukuda et al., 2013; Grenouillet et al., 2011; Guo et al., 2015a; Maloney et al., 2012; Zhao et al., 2014.

Ordination methods
  Advantages: Suitable when analyzing multiple species in multiple sites; straightforward interpretation; computational effort is low; suitable for variable selection and exploratory analysis.
  Disadvantages: Methods' assumptions (e.g., linearity, unimodality) are likely to be violated; interpretability is compromised when high correlations are present without clear causal relationships; some methods are sensitive to the relative scaling and noise of the explanatory variables.
  Applications (macroinvertebrates): D'Ambrosio et al., 2014; Lin et al., 2016; Pond et al., 2017.
  Applications (fish): D'Ambrosio et al., 2014, 2009; Kwon et al., 2015; Patrick and Yuan, 2017.

Classification and Regression Trees
  Advantages: Do not require assumptions about data distribution; interactions between predictors are modeled and can be easily visualized.
  Disadvantages: Smooth functions are poorly modeled; provide very different results when making small changes to the training data; large trees are poorly interpretable; not suitable for modeling continuous datasets (e.g., temporal dynamics).
  Applications (macroinvertebrates): Ambelu et al., 2010; Death et al., 2015; Holguin-Gonzalez et al., 2014, 2013a; Maloney et al., 2009; Ocampo-Duque et al., 2007; Sauer et al., 2011; Waite et al., 2012; Wang et al., 2007.
  Applications (fish): Grenouillet et al., 2011; Guo et al., 2015a; He et al., 2010; Kwon et al., 2015; Leclere et al., 2011; Wang et al., 2007.

Boosted Regression Trees
  Advantages: Suitable for modeling smooth functions and interactions between predictors; insensitive to outliers; exclude irrelevant predictor variables; do not extrapolate beyond the range of observations.
  Disadvantages: Time consuming for large numbers of trees or low learning rates; prone to overfitting; maximum and minimum values for response variables are poorly reproduced; interactions between multiple (more than three) explanatory variables are difficult to visualize and interpret; model interpretability is limited.
  Applications (macroinvertebrates): Brown et al., 2012; Clapcott et al., 2017, 2014, 2012; May et al., 2015; Pilière et al., 2014; Steel et al., 2017; Tonkin et al., 2014; Wagenhoff et al., 2016; Waite et al., 2014, 2012; Waite and Van Metre, 2017.
  Applications (fish): Chee and Elith, 2012; Clapcott et al., 2014; Esselman et al., 2013; Golden et al., 2016; Leclere et al., 2011.

Random Forests
  Advantages: Resistant to overfitting; cross-validation is not necessary because a similar approach is automatically performed during model training; good for classification purposes.
  Disadvantages: Time consuming for large numbers of trees; less accurate than BRT; interactions between multiple (more than three) explanatory variables are difficult to visualize and interpret; model interpretability is limited; cannot be extrapolated beyond the range of observations.
  Applications (macroinvertebrates): Álvarez-Cabria et al., 2017; Booker et al., 2015; Carlisle et al., 2009a; Chinnayakanahalli et al., 2011; Clapcott et al., 2017; Death et al., 2015; Fox et al., 2017; Hill et al., 2017; Patrick and Yuan, 2017; Vander Laan et al., 2013; Waite et al., 2012.
  Applications (fish): Álvarez-Cabria et al., 2017; Fukuda et al., 2013; Fukuda and De Baets, 2016; Grenouillet et al., 2011; Guo et al., 2015a; He et al., 2010; Kwon et al., 2015; Olaya-Marín et al., 2013; Patrick and Yuan, 2017; Tuulaikhuu et al., 2017; Vezza et al., 2015.

Artificial Neural Networks
  Advantages: Suitable for modeling nonlinear relationships; vast literature addressing aspects such as variable selection, sensitivity analysis, model ensembles, and optimal design; good performance when modeling continuous data.
  Disadvantages: Model interpretability is limited; relative importance of predictor variables is more difficult to determine than with other approaches.
  Applications (macroinvertebrates): Chon, 2011; Gazendam et al., 2016; Goethals et al., 2007; Mathon et al., 2013; Mouton et al., 2010; Sauer et al., 2011.
  Applications (fish): Chon, 2011; Fukuda et al., 2013; Grenouillet et al., 2011; Guo et al., 2015a; Mathon et al., 2013; Olaya-Marín et al., 2013, 2012; Olden et al., 2008; Sutela et al., 2010; Tsai et al., 2016.

Multivariate Adaptive Regression Splines
  Advantages: Suitable for modeling smooth functions; can handle a large number of explanatory variables with low-order interactions; automatically quantify interaction effects.
  Disadvantages: Model interpretability is limited; highly sensitive to extrapolation (prone to under- and overestimation); model parameters are difficult to identify.
  Applications (macroinvertebrates): Sauer et al., 2011.
  Applications (fish): Hermoso et al., 2011; Kwon et al., 2015; Leathwick et al., 2006b.

Support Vector Machines
  Advantages: Suitable for modeling nonlinear relationships; reduced number of algorithm parameters; overfitting is unlikely; good performance when modeling continuous data.
  Disadvantages: Model interpretability is limited; algorithm is computationally expensive; model parameters are difficult to identify when data is not linearly separable.
  Applications (macroinvertebrates): Ambelu et al., 2010; Fan et al., 2017; Hoang et al., 2010; Lin et al., 2016; Sor et al., 2017.
  Applications (fish): Fukuda et al., 2013; Fukuda and De Baets, 2016; Kwon et al., 2015; Muñoz-Mas et al., 2018.

Partial Least Squares Regression
  Advantages: Handles multicollinearity and strong correlation of predictors; straightforward interpretation; suitable when the number of explanatory variables is greater than the number of observations.
  Disadvantages: Interpretability is compromised when high correlations are present without clear causal relationships; sensitive to the relative scaling and noise of the predictor variables.
  Applications (macroinvertebrates): Abouali et al., 2016b; Riseng et al., 2011; Surridge et al., 2014; Villeneuve et al., 2018, 2015.
  Applications (fish): Abouali et al., 2016b; Einheuser et al., 2013a; Villeneuve et al., 2015.

Fuzzy Logic-based methods
  Advantages: Provide insight into the influence and interactions of explanatory variables; uncertainty quantification is easily integrated into the models; suitable for including expert elicitation and partial information along with data.
  Disadvantages: Computational effort rapidly increases with the number of predictors; introduction of expert elicitation into models can be time consuming.
  Applications (macroinvertebrates): Adriaenssens et al., 2006; Herman et al., 2016; Herman and Nejadhashemi, 2015; Marchini et al., 2009; Mouton et al., 2009; Ocampo-Duque et al., 2007; Van Broekhoven et al., 2006; Woznicki et al., 2016b, 2016a.
  Applications (fish): Abouali et al., 2016b; Boavida et al., 2014; Einheuser et al., 2013a, 2013b, 2012; Fukuda et al., 2013; Fukuda and De Baets, 2016; Herman et al., 2016, 2015; Muñoz-Mas et al., 2012; Woznicki et al., 2016b, 2016a; Yi et al., 2017.

Bayesian Belief Networks
  Advantages: Uncertainty quantification is easily integrated into the models; suitable for including expert elicitation and partial information with data; able to handle missing values in the input dataset; can be extended to account for feedback loops and time series modeling.
  Disadvantages: Computational effort and data demand rapidly increase with the number of variables; loss of accuracy and information because of variable discretization; model performance greatly depends on the qualitative network definition (i.e., the network formulation itself is an important source of error); time series modeling is computationally demanding.
  Applications (macroinvertebrates): Allan et al., 2012; Boets et al., 2015; Death et al., 2015; Forio et al., 2015; Li et al., 2018; Mantyka-Pringle et al., 2014; McLaughlin and Reckhow, 2017.
  Applications (fish): Mantyka-Pringle et al., 2017, 2014; Peterson et al., 2013; Turschwell et al., 2017; Vilizzi et al., 2013.

3 INTRODUCTION TO METHODOLOGY AND RESULTS
This dissertation comprises three studies developing a framework for linking multi-objective calibration and uncertainty quantification for ecohydrological models. The first study evaluates the impacts of two multi-objective calibration strategies on the replication of a comprehensive list of ecologically relevant hydrologic indices. The second study builds upon the first study by introducing an optimization constraint for improving the representation of a subset of hydrologic indices. Furthermore, different categories of hydrologic indices targeting distinct streamflow regime facets are explicitly considered during the objective functions' formulation. The third study links the advances from the previous two studies to quantify the uncertainty of ecologically relevant hydrologic indices using Bayesian parameter estimation.
The first study, titled "Evaluation of the Impacts of Hydrologic Model Calibration Methods on Predictability of Ecologically-relevant Hydrologic Indices", evaluated the performance of multi-objective model calibration in replicating 167 hydrologic indices of ecohydrological interest using the median values of near-optimal Pareto solutions. Two calibration strategies were compared. The first strategy consisted of three objective functions based on the Nash-Sutcliffe Efficiency (NSE), each one accentuating different flow conditions. The second strategy explicitly divided the streamflow time series into three segments representing low, moderate, and high flows using the 25% and 75% flow quantiles as thresholds. Then, an objective function based on the root-mean-square error (RMSE) was formulated for each portion. The Non-dominated Sorting Genetic Algorithm III (NSGA-III) was implemented to obtain near-optimal Pareto solutions under each strategy. SWAT was used to simulate daily streamflows at the outlet of the Honeyoey Creek – Pine Creek Watershed, located in east-central Michigan, US.
Then, the MATLAB Hydrologic Index Tool (MHIT) was used to compute the 47 167 hydrologic indices for each near-optimal Pareto solution. Pareto solutions were clustered into three groups using the k-means method. Generalized Least-Squares (GLS) was used to analyze the difference among the different clusters in their prediction of average streamflows. Meanwhile, the replication of hydrologic indices was evaluated using a 30% relative error range as reference. Finally, the performance of multi-objective calibration was compared against traditional single-objective model calibration using different NSE versions targeting different flow conditions. The second study, titled “A Novel Multi-Objective Model Calibration Method for Ecohydrological Applications”, developed calibration strategies for generating a balanced representation of the overall streamflow regime in terms of magnitude, frequency, duration, timing, and rate of change. The second study used the same hydrological model and study area as the first study. Two multi-objective calibration strategies were evaluated based on the findings of the first study. On one side, the first strategy selected six objective functions representing as many hydrologic indices as possible within a 30% relative error range. Moreover, an optimization constraint was devised targeting a subset of indices of ecohydrological interest to be within the error range. This subset was comprised of 32 Indices of Hydrologic Alteration (IHA) describing the central tendency of streamflow attributes and seven indices (a.k.a. Magnificent seven) representing fundamental stochastic properties of streamflow time series. On the other side, the second strategy consisted in the formulation of six objective functions, each one explicitly targeting groups of hydrologic indices representing a particular streamflow regime facet. These hydrologic indices were part of the same subset of 39 indices. The Unified Non- dominated Sorting Genetic Algorithm III (U-NSGA-III) was applied to generate near-optimal Pareto solutions under each strategy. Additionally, preferred tradeoff solutions were identified 48 using various multicriteria decision-making methods. Results for both strategies were compared in terms of performance of the near-optimal Pareto solutions and preferred tradeoff solutions, accuracy in the replication of the subset of hydrologic indices, the representation of water balance and flow duration curve characteristics, and accuracy in the replication of hydrologic indices’ variability. The final study, titled “Probabilistic Predictions of Ecologically Relevant Hydrologic Indices Using a Hydrological Model”, evaluated the effects of prior knowledge obtained from multi-objective optimization on the uncertainty analysis of simulated hydrologic indices of ecohydrological interest. For this purpose, two experiments were formulated. In the first experiment, non-informative priors were considered when calibrating model and error parameters using Bayesian parameter estimation. In the second experiment, near-optimal Pareto solutions from multi-objective calibrations were used to build a multivariate prior distribution for calibrating model and error parameters using Bayesian inference under an independent time period from the one used for multi-objective calibration. In both experiments, the same likelihood function was employed, considering heteroscedasticity and autocorrelation effects. 
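As context for the likelihood mentioned above, the sketch below illustrates one common way to write a lumped residual error model with heteroscedastic, lag-1 autocorrelated Gaussian errors. The linear standard deviation model (sigma_t = s0 + s1*q_sim_t), the parameter names, and the synthetic data are assumptions for illustration only, not the exact formulation used in the third study.

```python
# Illustrative sketch (assumed formulation, not the study's exact likelihood):
# Gaussian log-likelihood with heteroscedastic residual standard deviation,
# sigma_t = s0 + s1 * q_sim_t, and an AR(1) model on the standardized residuals.
import numpy as np

def log_likelihood(q_obs, q_sim, s0, s1, phi):
    """Lumped error model: heteroscedastic + AR(1) residuals (hypothetical form)."""
    sigma = s0 + s1 * q_sim                       # error std grows with simulated flow
    eta = (q_obs - q_sim) / sigma                 # standardized residuals
    innov = eta[1:] - phi * eta[:-1]              # AR(1) innovations
    innov_var = 1.0 - phi ** 2                    # stationary innovation variance
    ll = -0.5 * np.sum(innov ** 2 / innov_var + np.log(2 * np.pi * innov_var))
    ll += -0.5 * (eta[0] ** 2 + np.log(2 * np.pi))  # first standardized residual
    ll -= np.sum(np.log(sigma))                   # Jacobian of the standardization
    return ll

# Example call with synthetic series (purely for illustration):
rng = np.random.default_rng(1)
q_sim = np.exp(rng.normal(1.5, 0.6, size=365))
q_obs = q_sim * (1 + 0.2 * rng.normal(size=365))
print(log_likelihood(q_obs, q_sim, s0=0.1, s1=0.2, phi=0.5))
```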
The multi-objective strategy used here was the same as the one used for the second study's first strategy. The U-NSGA-III algorithm was used for multi-objective calibration, the multiple-try Differential Evolution Adaptive Metropolis(ZS) (MT-DREAM(ZS)) algorithm was implemented as the Markov Chain Monte Carlo method for sampling the posterior distributions, and the hydrological model, study area, and subset of hydrologic indices were the same as the second study. The reliability, precision, and bias in streamflow and hydrologic indices predictions were evaluated and compared for each experiment. 49 4 EVALUATION OF THE IMPACTS OF HYDROLOGIC MODEL CALIBRATION METHODS ON PREDICTABILITY OF ECOLOGICALLY-RELEVANT HYDROLOGIC INDICES 4.1 INTRODUCTION Alterations driven by human interventions and changing environmental conditions are threatening water security and freshwater biodiversity around the world (Bunn and Arthington, 2002; Carpenter et al., 2011; Dudgeon et al., 2006; Hipsey et al., 2015; Vörösmarty et al., 2010). Traditionally, stream condition evaluation has used chemical and microbiological constituents as criteria (Karr and Yoder, 2004). However, the lack of holistic approaches resulted in further degradation of aquatic ecosystems (Hering et al., 2010; Jelks et al., 2008). To overcome this issue, biological assessments were introduced to provide additional insight into the overall ecological integrity of streams (US EPA, 2011; Woznicki et al., 2016a), and therefore, can be used for environmental management and decision-making. Biological assessments measure the biota (e.g. fish, benthic macroinvertebrates, periphyton, amphibians) within a stream to obtain information regarding its biological integrity (US EPA, 2011). In this context, biological integrity is the capacity to support and maintain a “balanced, integrated, and adaptive” biological system within the expected structure and function of the natural habitat of a particular region (Karr, 1996; Karr and Dudley, 1981). Stream health integrates the physical, chemical and biological integrity of a stream, which supports living systems that are necessary for human well-being (Karr, 1999; Maddock, 1999). Stream health indices are generally classified into biotic indices, multi-metric indices, and multivariate methods (Herman and Nejadhashemi, 2015). Biotic indices use only one metric, and multi-metric indices use multiple metrics to evaluate stream health (Herman and 50 Nejadhashemi, 2015). Biotic metrics are individual characteristics comprised of species abundance and condition, species richness and composition, or trophic composition (Herman and Nejadhashemi, 2015). Multivariate methods use the reference condition approach to predict ratios of taxa observed vs. expected – O/E, and implement statistical and modeling tools that relate environmental features with observed organisms. These tools include cluster analysis, ordination techniques, discriminant analysis, Artificial Neural Networks, self-organizing maps, evolutionary algorithms, Bayesian Belief Networks, and others (Abbasi and Abbasi, 2012; Feio and Poquet, 2011). However, due to limited economic resources, it is not possible to obtain biotic metrics or O/E ratios for all streams within a watershed. Therefore, stream health evaluation based on field data is very limited. To address this difficulty, several modeling approaches have been introduced to extend the available information to ungauged locations (Woznicki et al., 2015). 
Streamflow regime has been recognized as a key determinant for sustaining biodiversity and ecological integrity. Thus, ecologically-relevant hydrologic indices are often used as predictors for stream health models besides landscape factors and water quality indicators (Poff and Zimmerman, 2010; Woznicki et al., 2016a). Prediction of ecologically-relevant hydrologic indices include the use of regional statistic approaches (Carlisle et al., 2009b; Dhungel et al., 2016; Knight et al., 2012; Patrick and Yuan, 2017; Sanborn and Bledsoe, 2006; Yang et al., 2016), and hydrological modeling (Caldwell et al., 2015; Kennen et al., 2008; Kiesel et al., 2017; Olsen et al., 2013; Vis et al., 2015; Wenger et al., 2010; You et al., 2014). The use of hydrological models is especially preferred when it is necessary to evaluate the change of stream health driven by modifications in land use, environmental conditions or management practices (Poff et al., 2010; Shrestha et al., 2016; Woznicki et al., 2016b). However, hydrologic models’ 51 ability to replicate ecologically-relevant indices is limited. Some studies have shown that the use of typical calibration approaches (i.e. single-objective based on widely used performance metrics) produces poor representations of some streamflow regime characteristics (Murphy et al., 2013; Vis et al., 2015). For instance, while average conditions are generally well-predicted, low- and high-flow indices are frequently over or under predicted (Wenger et al., 2010). Moreover, no model has been found to provide all selected ecologically-relevant hydrologic indices within ± 30% of the observed values (Caldwell et al., 2015; Vis et al., 2015). Therefore, several studies have proposed to explicitly include ecologically-relevant hydrologic indices into the objective functions for model calibration to improve the overall performance of streamflow regime simulations (Murphy et al., 2013; Shrestha et al., 2014; Vis et al., 2015). For instance, Kiesel et al. (2017) and Zhang et al. (2016) used multi-metric (i.e. aggregated) objective functions based on a reduced number of ecologically-relevant hydrological indices (12 and 16 indices, respectively). They found that it is possible to obtain better overall representations of streamflow regime compared to objective functions based only on conventional performance metrics (e.g. coefficient of efficiency, mean squared errors, correlation coefficient). However, optimal solutions were still unable to effectively represent all hydrological indices individually, especially when they are not explicitly included in the objective function formulation (Kiesel et al., 2017). In addition, optimal solutions are sensitive to the weights assigned to each ecological- relevant hydrological index in the multi-metric objective function (Zhang et al., 2016). On the other hand, different authors have suggested the use of typical performance metrics with proper transformations (Garcia et al., 2017; Oudin et al., 2006; Pushpalatha et al., 2012) or explicit hydrograph partitioning (Pfannerstill et al., 2014) for model calibration. However, these approaches have mainly shown improvements in the representation of target flow conditions 52 rather than the overall streamflow regime, or have been evaluated using traditional performance metrics instead of ecologically-relevant hydrological indices. Furthermore, the aforementioned studies have mainly implemented single-objective algorithms for model calibration. 
Therefore, these previous studies present no integrated perspectives on relationships between different performance metrics or hydrological indices involved in the model calibration process. On the other hand, multi-objective optimization algorithms are very useful for evaluating the tradeoffs between different metrics and objective functions involved in hydrologic model calibration (Price et al., 2012), and can provide sets of solutions able to represent different flow conditions (Efstratiadis and Koutsoyiannis, 2010; Reed et al., 2013). However, Pareto-optimal solutions are not usually evaluated employing ecologically-relevant hydrological indices, but instead pure hydrological signatures based on, for example, Flow Duration Curve (FDC) segments or runoff ratios (Shafii and Tolson, 2015; van Werkhoven et al., 2009). Moreover, many studies have been more concerned about selecting a best single solution than analyzing the whole set of Pareto-optimal solutions, which could provide better results for overall streamflow regime representation. For instance, Vis et al. (2015) attributed high model efficiencies when using the median of several optimum solutions as a “more robust prediction” for ecologically-relevant hydrologic indicators. Therefore, the objective of this study is to identify an objective function best suited for stream health model applications using a multi-objective optimization algorithm for model calibration. Typical performance metrics that represent different flow conditions and explicit hydrograph partitioning are considered in this study. For this purpose, the Soil and Water Assessment Tool (SWAT) watershed model and the NSGA-III multi-objective optimization algorithm are jointly implemented. Then, a set of 167 ecologically-relevant hydrologic indices is used for evaluating 53 the ability of the resulting Pareto-optimal solutions in representing the overall streamflow regime. 4.2 MATERIALS AND METHODS Two strategies based on a many-objective optimization technique for model calibration were compared to evaluate their abilities to predict ecologically-relevant hydrologic indices (Figure 1). The first multi-objective strategy calibrates the model based on three different forms of Nash-Sutcliffe Efficiency (NSE) described by Krause et al. (2005) and Pushpalatha et al. (2012) that are suitable for evaluating high, medium, and low flows. In the second strategy, observed daily flow time series were divided into three categories (high, medium, and low flows) using explicit thresholds for low and high flow (flows exceeded 75% and 25% of the time, respectively). For each category, an objective function based on the root-mean-square error (RMSE) was computed. Pareto-optimal solutions for calibration model parameters were obtained for each multi-objective strategy employing the NSGA-III algorithm (Deb and Jain, 2014; Jain and Deb, 2014). For both strategies, the Soil and Water Assessment Tool (SWAT) (Arnold et al., 2012; Neitsch et al., 2011) was used to simulate daily streamflow discharge time series for every stream segment, and the MATLAB Hydrological Index Tool (Abouali et al., 2016a) was employed to calculate 171 hydrologic indices intended to characterize streamflow regime (Olden and Poff, 2003). Hydrologic indices were computed for each Pareto-optimal point obtained from both multi-objective strategies and for the observed flow dataset. Then, model outputs and indices were evaluated with respect to the observed values using statistical analysis. 
For this purpose, Pareto-optimal points for each multi-objective strategy were clustered into three groups using the k-means method. Generalized Least-Square (GLS) estimation, considering 54 autocorrelation for the residue, was implemented for streamflow time-series. Meanwhile, the median errors between the hydrologic indices based on Pareto-optimal solutions and observed time series were evaluated with respect to the ±30% uncertainty bound for the observed values, as reported by previous studies (Caldwell et al., 2015; Kennard et al., 2010a; Vis et al., 2015). Finally, results were compared with the optimal solution obtained using a single-objective approach with an objective function based on the standard NSE. Figure 1 A schematic diagram presenting the overall multi-objective model calibration and evaluation process. Q25 and Q75 are the flows exceeded 25% and 75% of the time, respectively, NSE is the standard Nash-Sutcliffe Efficiency, NSEsqrt is the root-squared-transformed NSE, NSErel is the relative NSE, RMSE is the Root-Mean-Square Error, and MHIT is the MATLAB Hydrological Index Tool 4.2.1 Study area In order to perform environmental flow analysis, it is desirable to identify areas where urbanization is limited, streamflow is not regulated or its alteration is negligible, and observed discharge records are available for almost all the studied period (Olden and Poff, 2003). The 55 Honeyoey Creek–Pine Creek Watershed, with a drainage area of 1,010 km2, meets all the criteria, because urbanization is less than 4%, streamflow regulation is limited, and observed streamflow data for the period is complete. The study area is located in the Saginaw Bay Watershed in east-central Michigan (Figure 2), which is the largest watershed in the state and is identified as an area of concern by the US Environmental Protection Agency (USEPA, 2015). The watershed has an average slope of 1.9% ranging from 12-39% in the headwaters to 0-1.4% in the lowlands (USGS, 2018). The region has a temperate climate with distinct seasons (Andresen and Winkler, 2009). The average annual rainfall is about 840 mm for the period 1981- 2010 (NOAA-NCEI, 2020). However, the precipitation regime is bimodal, with maxima in May and September, and minima in February and July. Mean annual air temperature in the watershed is 9 °C, with a minimum monthly temperature of -9 °C in January and a maximum monthly temperature of 28 °C in July. The dominant land use is agriculture, covering about 50% of the watershed, followed by forests (24%), wetlands (16%) and pasturelands (7%) (USDA-NASS, 2012). Over 60% of the river network’s riparian vegetation has not been altered by human activities. The dominant soil textures are loamy sand, sandy loam, loam/clay loam, and sand, which cover about 30, 26, 20 and 11% of the study area, respectively (USDA-NRCS, 2020). The average flow is about 103 m3/s at the outlet of the watershed. High flows occur between March and May as a result of snow melting and high precipitation, while low flows occur between July and October, during summer and fall seasons (USGS, 2020). High flows, considered in this study as those that are exceeded at most 25% of the time (Q25), have magnitudes above 11.3 m3/s while low flows are defined as those with values below 3.9 m3/s, which is the discharge exceeded 75% of the time (Q75). 
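The exceedance-based thresholds used throughout this chapter can be computed directly from a daily discharge record; the short sketch below (synthetic data, hypothetical variable names) shows how Q25 and Q75 are obtained and how a hydrograph can be partitioned accordingly.

```python
# Minimal sketch (illustrative, not the study's code): exceedance-based flow
# thresholds and hydrograph partitioning. Q25 is the discharge exceeded 25% of
# the time (75th percentile of flows); Q75 is the discharge exceeded 75% of the time.
import numpy as np

def flow_thresholds(q):
    q25 = np.percentile(q, 75)   # high-flow threshold (exceeded 25% of the time)
    q75 = np.percentile(q, 25)   # low-flow threshold (exceeded 75% of the time)
    return q25, q75

def partition(q_obs, q_sim, q25, q75):
    """Split observed/simulated series into high, medium, and low flow classes,
    following the observed partitioning so both series keep matching points."""
    high = q_obs >= q25
    low = q_obs <= q75
    mid = ~(high | low)
    return {"high": (q_obs[high], q_sim[high]),
            "medium": (q_obs[mid], q_sim[mid]),
            "low": (q_obs[low], q_sim[low])}

# Example with synthetic daily flows (m3/s):
rng = np.random.default_rng(0)
q_obs = np.exp(rng.normal(1.6, 0.8, size=4383))   # ~12 years of daily values
q_sim = q_obs * (1 + 0.15 * rng.normal(size=q_obs.size))
q25, q75 = flow_thresholds(q_obs)
parts = partition(q_obs, q_sim, q25, q75)
print(round(q25, 1), round(q75, 1), {k: v[0].size for k, v in parts.items()})
```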
56 Figure 2 Location and topography of the study area 4.2.2 Data Collection Datasets required for the hydrologic modeling comprise topography, land use, soil properties, climate, and observed streamflow discharge. The National Elevation Dataset from the US Geological Survey (USGS) with 30 m spatial resolution was used to represent the topography of the watershed (USGS, 2018). The land use characteristics were obtained from the Cropland Data Layer developed by the National Agricultural Statistics Service of the US Department of Agriculture (USDA-NASS) with 30 m spatial resolution (USDA-NASS, 2012). The soil properties were compiled from the Natural Resources Conservation Service’s (NRCS) Soil Survey Geographic (SSURGO) Database, at a scale of 1: 250,000 (USDA-NRCS, 2020). Daily time series for precipitation and temperature from 2001 through 2014 were obtained from two weather stations that belong to the National Climatic Data Center (NOAA-NCEI, 2020). Relative 57 humidity, solar radiation and wind speed time series for the same time span were provided by the stochastic weather generator WXGEN (Neitsch et al., 2011) included in SWAT. Daily streamflow discharges between 2003 and 2014 were obtained from the Pine River Near Midland gauging station (ID 04155500) (USGS, 2020). 4.2.3 SWAT Model description The Soil and Water Assessment Tool (SWAT version 2012, rev. 614) is a semi- distributed, continuous-time, process-based hydrological model, which simulates water flow, sediment transport, and water quality processes in watersheds (Arnold et al., 1998). SWAT divides a watershed into subwatersheds that are further discretized into multiple units with homogeneous land use, slope, and soil characteristics known as hydrologic response units (HRU). The main processes in SWAT include snow accumulation and melting, evapotranspiration, infiltration, percolation losses, surface runoff, channel routing, and groundwater flows (Neitsch et al., 2011). SWAT is used in this study for daily flow simulation between 2003 to 2014 for all 749 defined stream segments in the Honeyoey Creek–Pine Creek watershed. Fifteen parameters were selected for model calibration whose description and ranges of variation are presented in Table 2. The calibration period was defined between 2003 and 2008, while the validation period spans between 2009 to 2014. Meanwhile, two years of warm-up period (2001-2002) were considered to stabilize initial conditions of soil water (Cibin et al., 2010). 4.2.4 Hydrologic indices The 171 hydrologic indices reported by Olden and Poff (2003) are evaluated for all Pareto-optimal solutions after completing each multi-objective model calibration. Then, these indices are compared with the respective indices for the observed dataset, including the calibration and validation periods. The evaluated hydrologic indices characterize the streamflow 58 regime in terms of magnitude, frequency, duration, timing and rate of change of flows (Poff et al., 1997; Richter et al., 1996) for a given daily time-series. These indices are classified into eleven groups: magnitude for low (ML), average (MA), and high (MH) flow conditions; frequency for low (FL), and high (FH) flow conditions; duration for low (DL), and high (DH) flow conditions; timing for low (TL), average (TA), and high (TH) flow conditions; and rate of change for average (RA) flow conditions. 
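As a concrete illustration of a few of these index groups, the sketch below computes simplified magnitude, frequency, and rate-of-change statistics from a synthetic daily series. The exact index definitions used in this study follow Olden and Poff (2003) as implemented in MHIT, so these approximations are for orientation only.

```python
# Illustrative sketch (not MHIT itself): simplified versions of a few index types
# from the groups listed above, computed from a synthetic daily discharge series.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
idx = pd.date_range("2003-01-01", "2014-12-31", freq="D")
q = pd.Series(np.exp(rng.normal(1.5, 0.7, size=idx.size)), index=idx)

ma_mean_daily = q.mean()                                    # MA-type: average magnitude
ml_min7 = q.rolling(7).mean().groupby(q.index.year).min()   # ML-type: annual 7-day minima
ml_baseflow_index = (ml_min7 / q.groupby(q.index.year).mean()).mean()

high_pulse_starts = (q > q.quantile(0.75)).astype(int).diff().eq(1)
fh_high_pulses = high_pulse_starts.groupby(q.index.year).sum().mean()  # FH-type: pulse count

ra_rise_rate = q.diff().clip(lower=0).replace(0, np.nan).mean()        # RA-type: mean rise rate

print(round(ma_mean_daily, 2), round(ml_baseflow_index, 3),
      round(fh_high_pulses, 1), round(ra_rise_rate, 2))
```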
The hydrologic indices are computed using the MATLAB Hydrological Index Tool (MHIT), which has shown better computational performance than other available packages when handling a large number of datasets (Abouali et al., 2016a).
4.2.5 Objective functions
In this study, we contrast the ability of two commonly used procedures to represent a wide range of the streamflow metrics related to environmental flows indicated in section 4.2.4. Each procedure refers to a specific three-dimensional objective space (Figure 3). In summary, the first strategy utilizes three different NSE-based efficiency criteria to evaluate the efficiency of high, medium (overall), or low flow conditions. In the second strategy, efficiency computation is done after flow time series are explicitly partitioned into high, medium, and low flows using statistical thresholds for flow separation. Further details are presented next.
Figure 3 Objective spaces for the SWAT model calibration: a) using different forms of Nash-Sutcliffe efficiency; b) after hydrograph partitioning using Q25 and Q75 thresholds
4.2.5.1 Nash-Sutcliffe efficiency-based objective functions
In this strategy, NSE-based objective functions are formulated to represent different parts of the observed hydrograph. Krause et al. (2005) and Pushpalatha et al. (2012) indicated that the standard NSE, Eq. (1) (Nash and Sutcliffe, 1970), is very sensitive to high flows in continuous simulations, given that the differences between simulated and observed values are squared. On the other hand, NSE calculated on root-squared-transformed flows (Eq. 2, NSEsqrt) has been found to provide a more balanced performance because the errors are more equally distributed over the high and low flow portions of the hydrograph (Oudin et al., 2006; Pushpalatha et al., 2012). Additionally, the relative NSE (Eq. 3, NSErel), described by Krause et al. (2005), suppresses the influence of peak flows on the efficiency computation, making it more sensitive to low flows. Pushpalatha et al. (2012) showed that NSE computed on the reciprocal of flow values (inverse-transformed flows) is better suited for low flow conditions, focusing on the 20% lowest flows on average. However, in this study we decided to use NSErel, given that some of the ecologically-relevant hydrologic indices computed by MHIT for low flows (e.g., low flow index, base flow indices, indices based on moving averages) are also based on overall flow values. Therefore, NSE, NSEsqrt, and NSErel were used to represent high, medium, and low flows, respectively.

NSE = 1 - \frac{\sum_{i=1}^{n} (O_i - P_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2}    (1)

NSE_{sqrt} = 1 - \frac{\sum_{i=1}^{n} \left(\sqrt{O_i} - \sqrt{P_i}\right)^2}{\sum_{i=1}^{n} \left(\sqrt{O_i} - \overline{\sqrt{O}}\right)^2}    (2)

NSE_{rel} = 1 - \frac{\sum_{i=1}^{n} \left(\frac{O_i - P_i}{O_i}\right)^2}{\sum_{i=1}^{n} \left(\frac{O_i - \bar{O}}{\bar{O}}\right)^2}    (3)

where O_i and P_i are the observed and predicted values at time step i, respectively, \bar{O} is the mean of the observed values, and n is the number of time steps. For all NSE-based criteria, the objective functions (OFs) were formulated for minimization as 1 - NSE. Values for any form of NSE range from negative infinity to one, so the corresponding OFs range from zero to infinity. A perfect fit between simulated and observed values is achieved when all NSE values equal one and the corresponding OFs equal zero.
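For reference, the three efficiency criteria in Eqs. (1)-(3) and their minimization forms can be written compactly as follows. This is an illustrative sketch with synthetic flows, not the calibration code used in this study.

```python
# Minimal sketch (illustrative): the three NSE-based criteria of Eqs. (1)-(3)
# and the minimization form OF = 1 - NSE used in the first calibration strategy.
import numpy as np

def nse(obs, sim):
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def nse_sqrt(obs, sim):
    o, s = np.sqrt(obs), np.sqrt(sim)
    return 1.0 - np.sum((o - s) ** 2) / np.sum((o - o.mean()) ** 2)

def nse_rel(obs, sim):
    num = np.sum(((obs - sim) / obs) ** 2)
    den = np.sum(((obs - obs.mean()) / obs.mean()) ** 2)
    return 1.0 - num / den

def objectives(obs, sim):
    """Three minimization objectives emphasizing high, medium, and low flows."""
    return np.array([1.0 - nse(obs, sim),
                     1.0 - nse_sqrt(obs, sim),
                     1.0 - nse_rel(obs, sim)])

# Example with synthetic daily flows for a six-year calibration period:
rng = np.random.default_rng(7)
q_obs = np.exp(rng.normal(1.5, 0.7, size=2192))
q_sim = q_obs * (1 + 0.2 * rng.normal(size=q_obs.size))
print(np.round(objectives(q_obs, q_sim), 3))
```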
4.2.5.2 Root-Mean-Square Error-based objective functions
In this strategy, RMSE-based objective functions are formulated for streamflow calibration. The time series for the entire study period (2003-2014) was divided into three categories representing high, medium, and low flows using the Q25 and Q75 thresholds obtained from the observed data. To have the same number of observed and simulated points in each category, the simulated time series are divided following the observed time series partitioning. Then, the RMSE is computed for each flow category:

RMSE_j = \sqrt{\frac{1}{n_j} \sum_{i=1}^{n_j} (O_i - P_i)^2}    (4)

where j refers to the flow category and n_j is the number of observations in each category. Each minimization OF is equal to the computed RMSE for the corresponding category. A perfect fit between observed and simulated values yields an RMSE equal to zero.
4.2.6 Many-objective optimization algorithm
Multi-objective evolutionary algorithms (MOEAs) are population-based heuristic search methods that use randomly generated points that move towards a Pareto-optimal front using evolutionary operators (Coello Coello et al., 2007). MOEAs have been widely implemented during the last two decades for water resources applications (Efstratiadis and Koutsoyiannis, 2010; Maier et al., 2014; Reed et al., 2013). For instance, the Non-dominated Sorting Genetic Algorithm II (NSGA-II) (Deb et al., 2002a) has been widely used for hydrologic model calibration (e.g., Bekele and Nicklow, 2007; Confesor and Whittaker, 2007; Lu et al., 2014; Shafii and De Smedt, 2009). The popularity of NSGA-II is mainly given by its simplicity, modularity, parameter-less property, and good performance for difficult two-objective problems (Deb and Gupta, 2006). However, without any extensions or combinations with other approaches, NSGA-II by itself has shown shortcomings for solving problems with three or more objectives (Deb and Jain, 2014; Reed et al., 2013; Sindhya et al., 2013). NSGA-III, which is based on the NSGA-II framework, is the evolutionary many-objective optimization algorithm used to implement the two multi-objective calibration strategies. NSGA-III is an elitist reference-point-based procedure that uses non-domination sorting to solve problems with four or more objectives. This procedure has also shown good performance solving cases with three objectives (Seada and Deb, 2016). The main difference between NSGA-II and NSGA-III is the niching method, which is a procedure to maintain diversity among solutions (Deb, 2001). NSGA-II uses crowding distances, while NSGA-III is reference-direction-based (Deb and Jain, 2014; Jain and Deb, 2014). A reference direction is a line that crosses both the origin and a supplied reference point in the objective space. The selection operation is not explicit in NSGA-III given that, for each reference direction, only one population individual is expected (Seada and Deb, 2016). The general outline of the algorithm is as follows:
1. The algorithm begins by generating a population of size N and a number of H reference points distributed in the objective space with M dimensions (i.e., the number of objectives is M). The number of reference points is H = \binom{M + p - 1}{p}, where p is the number of divisions, along each objective, used to distribute reference directions on the front. The parameter p is chosen suitably so as to create a population size adequate to hold a number of trade-off solutions.
2. Next, NSGA-III proceeds in a similar fashion as NSGA-II. Using recombination and mutation, the current parent population P_t is used to generate an offspring population Q_t.
The parent and offspring populations are combined into 𝑅𝑡 = 𝑃𝑡 ∪ 𝑄𝑡 (of size 2N), and then the 𝑅𝑡 members are sorted using non-domination ranking. A new intermediate set St is generated selecting the first Pareto front until the size of St is equal or greater than N. The rank of the last selected individual in St is obtained, corresponding to the last front Fl. The population members included in St but not included in Fl (expressed as St \ Fl) are directly selected for the next generation Pt+1. 3. The new population Pt+1 is completed by selecting population members from Fl based on the NSGA-III niching method. For this purpose, objective values and supplied reference points are normalized to have a commensurate range. After the normalization, the ideal point coincides with the origin of the objective space. Next, reference directions are constructed by joining the ideal point with each reference point. Then, each member of St \ Fl is associated with a reference point according to its proximity (i.e. perpendicular distance) to the corresponding reference direction. Reference points that have the least number of related 63 population members in St \ Fl are considered to be associated with a member of Fl. Each member of Fl set is therefore selected one-at-a-time making the latter association to fill the remaining slots for Pt+1. 4. The whole evolutionary process is repeated until a predefined termination criterion is reached (e.g., number of generations/function evaluations, negligible improvement of Pareto-optimal solutions, small change in performance metrics). NSGA-III’s parameters are the population size (equal to the number of reference points), stopping-criteria (in this case, the number of generations), crossover and mutation probabilities, and distribution indices for each genetic operation (i.e., Simulated Binary Crossover – SBX, and polynomial mutation). The NSGA-III implementation used in this study was programmed in Java and was provided by the Computational Optimization and Innovation (COIN) Laboratory at Michigan State University. The Java code was adapted for this study to have a connection with SWAT to perform the automatic calibration process. 4.2.7 Model evaluation In order to perform the statistical analysis for model evaluation, the Pareto-optimal points are clustered into three groups representing high, medium, and low flow conditions. Hence, the k-mean clustering method (Arthur and Vassilvitskii, 2007) is employed for each calibration strategy in order to identify separate sets of solutions that show better performances for each flow condition. The three clusters are identified for each calibration method, using the corresponding objective functions presented in section 4.2.5. Thus, Pareto-optimal solutions with the highest NSE and NSErel values are going to be collected in the high and low flow clusters, respectively. Solutions with balanced NSE and NSErel values will comprise the medium flow cluster. Then, the simulated streamflow time series from each group are compared with the 64 observed dataset, to evaluate whether the estimated mean differences between the simulations and observations are significant. Therefore, the difference of each simulation with respect to the observed dataset is used as response to fit a simple regression with intercept using GLS estimation with Autoregressive model with lag 1, or AR (1), to account for the serial correlation for the time series. The differences are considered significant when the reported p-value is less than 0.05 (i.e. 
95% confidence interval for the estimated mean of the distribution does not span zero), with positive (or negative) values for the difference indicating over/ under-estimation of the actual observation. The process was repeated comparing high, medium, and low streamflow categories using the Q25 and Q75 thresholds defined for the RMSE-based calibration strategy. However, because the extracted time series for each flow category are irregularly spaced temporally, we modeled the difference from the corresponding samples between the observed series and each simulated series. Three different methods were used: Normal distribution, Student’s t-distribution, and GLS with a Continuous Autoregressive model with lag 1, or CAR (1). Then, we determine the most appropriated test based on the smallest Akaike Information Criterion (AIC) value. On the other hand, the hydrologic indices obtained for each Pareto-optimal solution are also grouped following the same three clusters determined above (high, medium, and low flow conditions). For each cluster and hydrologic index, the difference between the simulated and observed values are computed and divided between the respective observed values to obtain the relative error. Then, it is determined whether the median relative errors for each group are within or outside the ±30% uncertainty bound. The comparison described above is also performed with no clustering of the Pareto-optimal solutions, in order to evaluate the effect of accounting for all solutions in the median values of the predicted hydrologic indices. 65 4.3 RESULTS AND DISCUSSION 4.3.1 Convergence and spread of Pareto-optimal fronts obtained with multi-objective calibration strategies The NSGA-III algorithm was implemented for both NSE- and RMSE-based strategies using a population size of 100 points, a maximum number of 500 generations, a crossover probability of 0.9, a mutation probability of 1/15 (i.e., the reciprocal of the number of calibration parameters), and distribution indices of 10 and 20 for SBX and polynomial mutation, respectively. The convergence to the Pareto-optimal front was evaluated using the hypervolume indicator, which is a measure of the volume enclosed by a Pareto front with respect to a specified reference point (Auger et al., 2009). The Pareto-optimal front was selected when a steady behavior of the hypervolume indicator was observed across the preceding generations. In this study, the reference points for each strategy were selected with NSE = NSEsqrt = NSErel = 0, and RMSEH = RMSEM = 30 m3/s and RMSEL = 10 m3/s, which approximately correspond to the extreme objective function values visited by the optimization algorithm. The hypervolume indicator was computed for the non-dominated front obtained at the end of each generation using the Walking Fish Group (WFG) algorithm (While et al., 2012, 2016), and the resulting values for each strategy were normalized to range between 0 and 1 (Figure 4). Figure 5 shows the final non- dominated fronts obtained after 349 and 484 generations using the NSE- and RMSE-based calibration strategies, respectively. In the same figure, clusters corresponding to high, medium and low flow conditions, obtained with the k-means method, are also shown. The solutions along the NSE-based Pareto-optimal front range from 0.22 to 0.76 for NSE, from 0.37 to 0.73 for NSEsqrt, and from 0.57 to 0.81 for NSErel. The Pareto-optimal front is characterized by two distinct regions with significant tradeoffs. 
One of those regions, the low flow cluster, shows NSErel above 0.77 (Figure 5a) with lower values for NSE and NSEsqrt 66 spanning from 0.22 to 0.35, and from 0.37 to 0.45, respectively. The other region, the high and medium flow clusters, shows acceptable NSE and NSEsqrt values (i.e. from 0.73 to 0.76 and from 0.58 to 0.71, respectively) while NSErel ranges from 0.57 to 0.71. These results indicate that solutions with a very good representation of low discharges provide a poor representation of peak and medium flows (Guo et al., 2014; Shafii and De Smedt, 2009). However, the results also suggest that low discharges still exhibit acceptable efficiencies for the best representations of high and medium flows. Additionally, a strong linear correlation (R2 = 0.91) between the NSE and NSEsqrt objective functions is observed, though it is weaker (R2 = 0.002) for smaller values for the NSErel objective function (i.e., low flow cluster). Hence, it is likely that NSE and NSEsqrt are providing similar information for model calibration and therefore similar results can be achieved discarding one of these two objective functions. With respect to the RMSE-based Pareto-optimal front, the performance measures range from 8.3 to 20.4 m3/s for RMSEH, from 2.1 to 4.9 m3/s for RMSEM, and from 0.81 to 3.6 m3/s for RMSEL (Figure 5b). These values indicate that, in general, the RMSE-based values range from 30% to 130% of the observed average discharges for each flow category (high, medium, and low) defined using the observed Q25 and Q75 thresholds. Moreover, the RMSE-based Pareto-optimal clusters are layered along the RMSEH direction. This behavior suggests that the high flow cluster can represent some medium and low discharges that are well represented by medium and low flow clusters. 67 Figure 4 Normalized hypervolume indicator behavior over the NSGA-III search process for each calibration strategy 68 Figure 5 Clustered Pareto-optimal solutions obtained for each multi-objective calibration strategy employing NSGA-III algorithm and k-means clustering method a) NSE-based and b) RMSE-based 4.3.2 Reduction of initial parameter ranges by multi-objective calibration strategies The Pareto-optimal calibrated SWAT parameter ranges varied according to the multi- objective calibration strategy (see Table 2). In general, results suggest that NSE-based calibration strategy was able to provide narrower calibrated ranges for model parameters than RMSE-based strategy. Moreover, for some parameters, the implementation of the k-means method to the Pareto-optimal fronts allowed the identification of different ranges depending on the role of the objective functions in each cluster (e.g., importance of NSE and RMSEH for high flow, and importance of NSErel and RMSEL for low flow). In order to compare the results for calibration parameters, we considered whether or not they showed a significant reduction in their ranges after the calibration process (i.e., narrower calibration range with respect to initial calibration range), and whether or not they showed similar final ranges in each multi-objective calibration strategy. Regarding the significant reduction in calibration ranges, we found that a group of 69 parameters describing mainly HRU and groundwater components showed very similar initial and final ranges for both calibration strategies. These parameters were BIOMIX (biological mixing), CANMX (max. 
canopy storage), EPCO (plant uptake), GW_REVAP (groundwater “revap” coefficient) and REVAPMN (groundwater threshold depth for “revap”). On the other hand, the groundwater parameter GWQMN (threshold depth for flow return) reduced in range for both multi-objective strategies. However, the ranges varied depending on the calibration strategy. Likewise, some parameters mainly related to groundwater and routing components also reduced in calibration range. On the other hand, the following ranges were very similar for both calibration strategies: ESCO (soil evaporation), GW_DELAY (groundwater delay time), ALPHA_BF (baseflow factor), RCHRG_DP (percolation factor), CH_N (2) (Manning coefficient), and CH_K (2) (alluvium hydraulic conductivity). It is worth noting that GW_DELAY, ALPHA_BF and CH_N (2) were within different ranges depending on contrasting flow conditions (i.e., high and low flow clusters). For example, GW_DELAY resulted in a range of 0 to 0.1 days for high flow conditions, and a range of 237 to 309 days for low flow conditions using the NSE-based strategy. Meanwhile, CN2 (curve number for moisture condition II) and SOL_AWC (soil water capacity) showed contrasting ranges in high and low flows for NSE- based strategy. For instance, low flow conditions favored positive multiplicative factors for CN2, increasing runoff potential, while providing negative multiplicative factors for SOL_AWC, reducing the available water capacity of soils. For high flow conditions, CN2 and SOL_AWC results were the opposite. However, RMSE-based strategy did not provide reduced calibrated ranges for these two parameters. Finally, SURLAG (surface runoff lag) showed very similar reduced ranges for all flow conditions in the NSE-based strategy, while providing a reduced range only for high flow condition in the RMSE-based strategy. 70 Table 2 Calibrated ranges obtained with Pareto-optimal solutions. 
Values without brackets correspond to NSE-based strategy results while values within brackets correspond to RMSE- based strategy results Calibrated ranges per cluster Parameter** Initial range All solutions High Flow Medium Flow Low flow BIOMIX 0-1 0-0.97 0-0.97 0-0.63 0.04-0.87 [0.02-0.99] [0.02-0.96] [0.03-0.99] [0.07-0.99] CN2* (-0.25)-0.25 (-0.25)-0.25 (-0.25) -(-0.22) (-0.25) -(-0.21) 0.246-0.249 [(-0.25)-0.25] [(-0.25)-0.23] [(-0.25)-0.25] [(-0.24)-0.25] CANMX 0-100 10-100 10-68 11-95 36-100 [1.4-91] [4.7-78] [20-91] [1.4-70] ESCO 0.01-1 0.74-1 0.85-0.96 0.74-0.91 0.91-1 [0.6-1] [0.6-1] [0.9-1] [0.9-1] EPCO 0.01-1 0.01-0.9 0.01-0.79 0.01-0.9 0.09-0.47 [0.01-0.9] [0.01-0.8] [0.01-0.9] [0.1-0.9] GW_DELAY 0-500 0-309 0-0.1 0-0 237-309 [0-499] [0.01-0.34] [0-415] [141-499] ALPHA_BF 0-1 0.05-0.29 0.23-0.29 0.19-0.28 0.05-0.11 [0.05-0.33] [0.13-0.32] [0.1-0.33] [0.05-0.24] GWQMN 0-5000 0-4861 2-154 0-614 1221-4861 [38-2018] [38-636] [479-2018] [331-649] GW_REVAP 0.02-0.2 0.02-0.2 0.02-0.2 0.03-0.19 0.11-0.2 [0.03-0.2] [0.03-0.2] [0.1-0.16] [0.12-0.17] REVAPMN 0-1000 48-959 50-953 48-959 60-378 [0.15-840] [0.15-840] [0.88-550] [16.41-450] RCHRG_DP 0-1 0.28-0.63 0.52-0.63 0.43-0.55 0.28-0.41 [0.28-0.75] [0.28-0.64] [0.3-0.45] [0.3-0.75] CH_N (2) 0.001-0.3 0.03-0.23 0.03-0.04 0.03-0.04 0.13-0.23 [0.02-0.3] [0.02-0.04] [0.02-0.08] [0.05-0.3] CH_K (2) 0-500 10-34 23-31 22-34 10-21 [12-57] [21-51] [28-57] [12-52] SOL_AWC* (-0.25)-0.25 (-0.25)-0.25 0.02-0.25 0.11-0.25 (-0.25) -(-0.13) [(-0.25)-0.23] [(-0.19)-0.23] [(-0.25)-0.16] [(-0.22)-0.23] SURLAG 1-24 1-1.4 1-1.1 1-1.2 1.1-1.4 [1-19.2] [1-1.2] [1-13.4] [1-19.2] * These parameters are treated as global multiplying factors that modify the assigned values for each HRU depending on soil type and land use ** ALPHA_BF, Baseflow alpha factor (days-1); BIOMIX, Biological mixing efficiency; CANMX, Maximum canopy storage (mm H2O); CH_K (2), Effective hydraulic conductivity in main channel alluvium (mm hr -1); CH_N (2), Manning's "n" value for the main channel; CN2, Initial SCS runoff number for moisture condition II; EPCO, Plant uptake compensation factor; ESCO, Soil evaporation compensation factor; GW_DELAY, Groundwater delay time (days); GWQMN, Threshold depth of water in the shallow aquifer required for return flow to occur (mm H 2O); GW_REVAP, Groundwater "revap" coefficient; REVAPMN, Threshold depth of water in the shallow aquifer for "revap" or percolation to the deep aquifer to occur (mm H 2O); RCHRG_DP, Deep aquifer percolation fraction; SOL_AWC, Available water capacity of the soil layer (mm H2O mm-1 soil); SURLAG, Surface runoff lag coefficient. 71 4.3.3 Flow duration curves and streamflow time series representation Figure 6 presents FDC and hydrographs for the simulation period. Visual inspection of the simulated and observed curves reveals that NSE-based strategy provides less variability than RMSE-based strategy, represented by the width of light gray bound of solutions. For instance, Q25 and Q75 for NSE-based strategy ranged from 7.4 to 11.8 m3/s and from 2.6 to 4.3 m3/s, respectively. Meanwhile, Q25 and Q75 for RMSE-based strategy ranged from 4.6 to 12.2 m3/s and from 2.6 to 5.9 m3/s, respectively. This means that the higher uncertainty level for streamflow simulations given by the RMSE-based calibration strategy is consistent with the wide CN2 and SOL_AWC calibrated ranges (Table 2). Note that some extreme discharges, especially low flow events, lie outside all the simulation bounds provided by Pareto-optimal solutions considered here. 
Also, some descending limbs and subsequent low flow pulses are poorly simulated in both the calibration and validation periods. Therefore, limitations in the representation of indices related to extreme low and high flows are expected. However, different sources of error may play a role here, including input data uncertainties and structural inadequacies (Price et al., 2012). In both calibration strategies, the high flow cluster bounds include most of the observed FDC while shrinking the dispersion of simulated FDCs. It is worth noting that the group of simulated FDCs obtained from the NSE-based calibration strategy splits into two branches at the portion representing discharges exceeded 25% of the time. For this calibration strategy, only the low flow cluster does not have any simulated FDC representing the corresponding branch for high discharges. Additionally, the medium flow cluster shows the largest variability in the NSE-based calibration strategy, while the low flow cluster does in the RMSE-based strategy.

Figure 6 Flow duration curves and time series obtained from Pareto-optimal solutions (light gray) and clustered (high, medium, and low flow) solutions (dark gray) for NSE-based (a, b, and c) and RMSE-based (d, e, and f) multi-objective calibration strategies. Red lines correspond to observed streamflow values

4.3.4 Statistical analysis for predicted streamflow time series
We performed a statistical analysis of the mean difference between observed and simulated streamflow time series for the simulation period from 2003 to 2014. Most of the results showed that GLS with a CAR(1) correlation structure is substantially better than the other methods that ignore the serial correlation of the time series, as indicated by much smaller AIC values (results not shown here). In a few rare cases, the Student's t model is better than GLS-CAR(1), meaning that modeling the heavy-tailed distribution is even more important than modeling the serial correlation. The percentages of Pareto-optimal solutions in each cluster without enough evidence of a significant difference at a 95% confidence level (Table 3) account only for the results obtained with GLS using AR(1) or CAR(1) correlation structures. The statistical analysis indicated that, in general, the NSE-based strategy provides many more Pareto-optimal solutions without significant differences than the RMSE-based strategy. Table 3 also shows that most solutions belonging to the high flow cluster of the Pareto front yield good mean representations of the overall, high, and medium streamflow time series in both multi-objective calibration strategies. However, the percentages for the high flow category do not surpass 47% because of the recurrent under-estimation of the mean of time series comprising high discharges. Surprisingly, the medium flow cluster resulted in more solutions with a good mean representation of discharges below the Q25 threshold (i.e., low flow values) in both calibration strategies. Therefore, it is possible to infer that solutions with simultaneously good performances based on both NSE and NSErel (the Pareto front region formed by the high and medium flow clusters, see Figure 5) lead to a sound representation of the overall streamflow time series and of specific flow conditions, which is consistent with the graphical results obtained for FDCs and hydrographs presented in Figure 6.
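The fitting details behind this comparison are not shown in the text. A minimal sketch of one way to run it in Python is given below, assuming statsmodels' GLSAR (a regression with discrete AR(1) errors) as a stand-in for the GLS fits with AR(1)/CAR(1) correlation described above; the function name mean_difference_test and the ensemble array in the usage comment are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

def mean_difference_test(simulated, observed, alpha=0.05):
    """Test whether the mean of (simulated - observed) differs from zero while
    allowing for AR(1)-correlated residuals (GLS with autoregressive errors).
    Returns True when there is NOT enough evidence of a significant mean
    difference at the given significance level (the solution 'passes')."""
    diff = np.asarray(simulated, dtype=float) - np.asarray(observed, dtype=float)
    X = np.ones((diff.size, 1))           # intercept-only design matrix
    model = sm.GLSAR(diff, X, rho=1)      # AR(1) error structure
    result = model.iterative_fit(maxiter=10)
    return result.pvalues[0] > alpha      # p-value for the mean difference

# Hypothetical usage: fraction of clustered Pareto-optimal solutions passing the test
# passing = np.mean([mean_difference_test(sim, obs_daily) for sim in cluster_simulations])
```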
Table 3 Percentage of Pareto-optimal solutions without evidence of a significant mean difference (α = 0.05) between simulated and observed time series, considering different time series categories and clusters for both calibration strategies. The cluster with the highest percentage for each flow category and strategy is marked with an asterisk

Time series category | Cluster | NSE-based | RMSE-based
Complete time series | High flow | 88%* | 50%*
Complete time series | Low flow | 0% | 0%
Complete time series | Medium flow | 27% | 0%
High flow extracted time series | High flow | 47%* | 28%*
High flow extracted time series | Low flow | 0% | 0%
High flow extracted time series | Medium flow | 13% | 0%
Medium flow extracted time series | High flow | 94%* | 56%*
Medium flow extracted time series | Low flow | 0% | 5%
Medium flow extracted time series | Medium flow | 22% | 11%
Low flow extracted time series | High flow | 0% | 17%
Low flow extracted time series | Low flow | 62% | 11%
Low flow extracted time series | Medium flow | 75%* | 26%*

4.3.5 The level of predictability of ecologically-relevant hydrologic indices using multi- and single-objective strategies
4.3.5.1 Multi-objective calibration strategies
For the period from 2003 to 2014, we computed the relative errors between the 171 ecologically-relevant hydrologic indices obtained from the Pareto-optimal solutions and those obtained from the observed hydrograph. Results were organized according to the eleven hydrologic index groups defined by Olden and Poff (2003), which are described in section 4.2.4. For each group and multi-objective calibration strategy, we determined the indices with median relative errors within ±30%, using different sets of Pareto-optimal solutions: the complete Pareto front (i.e., all points) and the high, medium, and low flow clusters obtained with the k-means method. Indices whose median relative errors were outside the ±30% bound for all of these collections of Pareto-optimal solutions are reported in Table 4 and described in Table A1. Hence, we considered that these indices were not well represented by the calibration strategies and the model structure employed in this study. Note that we discarded four hydrologic indices related to the frequency and duration of zero-flow days (DL18-DL20) and low flow spells (FL3), all equal to zero for this case study. The NSE-based calibration strategy was able to provide an acceptable representation of 128 indices (77%), while the RMSE-based calibration strategy did the same for 123 indices (74%) out of 167 indices. In general, the RMSE-based strategy produced more dispersion in index values than the NSE-based strategy (see Table 5). Regarding the magnitude of flow events, both multi-objective calibration strategies provided acceptable results for 76 out of 94 indices (81%). These strategies were not able to fully represent the variability of flows across months and years, or the magnitude (mean and median) of annual extreme flows. For instance, under average flow conditions, results showed poor representation of the variability of some summer and fall monthly flows (i.e., MA29-MA33, which are expressed in terms of the coefficient of variation) and the skewness of annual flows (i.e., MA45, represented as the difference between the mean and median annual flows). Additionally, the NSE-based calibration strategy generated high relative errors for the variability across annual flows, expressed in terms of the range or the 90th-10th percentile spread (i.e., MA42 and MA44, which include extreme flow values). For low flow conditions, the mean and median of annual minimum flows were not well replicated (i.e., ML14 and ML16, respectively), also affecting the results for some indices that depend on these values (e.g., low flow index, ML15; baseflow index, ML19; and variability across annual minimum flows, ML21).
For high flow conditions, both calibration strategies had limitations in representing high flow volumes (e.g., MH21) and the mean maximum monthly flows for some summer and fall months (e.g., MH6, June; MH7, July; MH10, October). Both multi-objective calibration strategies generated acceptable median values for 10 out of 13 indices (77%) describing the frequency of flow events. The indices that were not well represented include the low flow pulse count (i.e., FL1) and some flood frequency indices that use the median and 75th percentile of flows as upper thresholds (i.e., FH5 and FH9). Moreover, the NSE-based strategy yielded poor representations of a high flood pulse count index based on a very high upper threshold (i.e., FH4, which uses 7 times the median flow). Meanwhile, the RMSE-based strategy produced limited results for the high flood pulse count (i.e., FH1) and flood frequency using the 25th percentile as a threshold (i.e., FH8). For the duration of flow events, both calibration strategies resulted in acceptable values for 32 out of 41 indices (78%). The results for this group of hydrologic indices were consistent with the poor representation of some indices describing the magnitude and frequency of flow events. For instance, duration indices related to the magnitude and variability of daily and annual minima (i.e., DL1 and DL11, and DL6-DL8, respectively) yielded elevated relative errors for both strategies. Similarly, high flow indices using the median and 75th percentile of flows as thresholds (i.e., DH17 and DH15, respectively) produced high relative errors. Likewise, the NSE-based strategy presented difficulties representing high flow duration using seven times the median as an upper threshold (i.e., DH19). Moreover, the RMSE-based strategy yielded large deviations for the annual minima of 3-day means of daily discharge (i.e., DL2), the mean annual 3-day minimum of daily discharge (i.e., DL12), and indices related to flood duration (i.e., DH20 and DH23) because of poor results for pulse counts. With respect to the timing of flow events, both multi-objective calibration strategies reproduced the four main timing indices well; however, the NSE-based strategy was not able to produce median relative errors within ±30% for the seasonal predictability of non-low flow (i.e., TL4). Finally, regarding the rate of change in flow events, both multi-objective calibration strategies reproduced 6 out of 9 indices (67%) acceptably, with median relative errors outside ±30% for the fall rate (i.e., RA3) and the change of flow for increasing and decreasing discharges (i.e., RA6 and RA7, respectively). In general, the hydrologic indices presented in Table 4 are mainly influenced by extreme low and high flows and attained poor performance due to the model's limited depiction of descending limbs and low flow pulses that take place in the transition from summer to fall. The lowest annual flows in the study area are expected to occur between June (summer) and October (fall). For instance, the indices ML14, ML16, ML21, MH10, FL1, DL11, DL16, DH15, RA6, and RA7 (see Table A1 for descriptions) are all directly related to discharges occurring in this period. Moreover, these indices are key indicators for describing the flow regime of perennial streams, as indicated by Olden and Poff (2003). It is also important to mention that extreme high flows are being under-predicted, as indicated by the statistical analysis performed in section 4.3.4.
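Several of the indices discussed here (FL1, DL16, DH15) are pulse counts and durations relative to percentile thresholds. The sketch below illustrates, under the usual IHA convention of the 25th and 75th percentiles of the full record (consistent with the Q25/Q75 thresholds used above), how FL1 and DH15 can be approximated from a daily series; the helper names pulse_durations and fl1_dh15 are hypothetical, and the exact IHA definitions (e.g., water-year accounting) differ in detail.

```python
import numpy as np
import pandas as pd

def pulse_durations(flow, threshold, kind="low"):
    """Durations (in days) of consecutive pulses below (kind='low') or
    above (kind='high') the given threshold."""
    mask = flow < threshold if kind == "low" else flow > threshold
    run_id = (mask != mask.shift(fill_value=False)).cumsum()   # label consecutive runs
    runs = mask.groupby(run_id).agg(["first", "sum"])
    return runs.loc[runs["first"], "sum"]                      # lengths of the True runs

def fl1_dh15(flow):
    """Approximate FL1 (mean annual low flow pulse count, Q25 threshold) and
    DH15 (mean high flow pulse duration, Q75 threshold) from a daily pandas
    Series indexed by date."""
    q25, q75 = np.percentile(flow, 25), np.percentile(flow, 75)
    low_counts = flow.groupby(flow.index.year).apply(
        lambda x: len(pulse_durations(x, q25, "low")))
    fl1 = low_counts.mean()                                    # low flow pulses per year
    dh15 = pulse_durations(flow, q75, "high").mean()           # mean high flow pulse length
    return fl1, dh15

# Relative error against the observed record, as used throughout this chapter:
# e_rel = 100 * (index_simulated - index_observed) / index_observed
```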
Given this under-prediction, it is reasonable to assume that high flow indices with large upper threshold values produced more deviated results.

Table 4 List of ecologically-relevant hydrologic indices with all, high, medium, and low flow Pareto-optimal solutions having median relative errors outside the ±30% bound, for each multi-objective calibration strategy

Hydrologic index group | No. of indicators | Both NSE- and RMSE-based | Only NSE-based | Only RMSE-based
Magnitude of flow events: average flow conditions | 45 | MA29, MA30, MA31, MA32, MA33, MA45 | MA42, MA44 | MA34
Magnitude of flow events: low flow conditions | 22 | ML7, ML8, ML14, ML15, ML16, ML19, ML21, ML22 | | ML9, ML17
Magnitude of flow events: high flow conditions | 27 | MH6, MH7, MH10, MH21 | MH22 | MH11, MH23
Frequency of flow events: low flow conditions | 3 | FL1 | |
Frequency of flow events: high flow conditions | 11 | FH5, FH9 | FH4 | FH1, FH8
Duration of flow events: low flow conditions | 20 | DL1, DL6, DL7, DL8, DL11, DL16 | | DL2, DL12
Duration of flow events: high flow conditions | 24 | DH15, DH17, DH21 | DH19 | DH20, DH23
Timing of flow events: average flow conditions | 3 | | |
Timing of flow events: low flow conditions | 4 | | TL4 |
Timing of flow events: high flow conditions | 3 | | |
Rate of change in flow events: average flow conditions | 9 | RA3, RA6, RA7 | |

In comparison to previous studies where hydrological modeling was employed to predict ecologically relevant hydrologic indices (Caldwell et al., 2015; Kiesel et al., 2017; Murphy et al., 2013; Shrestha et al., 2014; Vis et al., 2015), the use of the median of different Pareto-optimal sets improved the representation of some indicators (e.g., Julian day of annual minimum, TL1; high flood pulse count, FH1; rise rate, RA1; reversals, RA8). However, key indices related to the frequency and duration of high and low flow pulses (e.g., FL1, DL16, and DH15, see below) were consistently poorly simulated. For instance, Table 5 presents the lowest relative error obtained for a suite of 32 indices included in the software Indicators of Hydrologic Alteration (IHA) (The Nature Conservancy, 2009) that were evaluated by Shrestha et al. (2014). For this group of indices, while calibrated solutions can properly reproduce the magnitude, duration, and timing of different flow conditions (with difficulties for DL1, the annual minimum of daily flows), the frequency and duration of low flow pulses (i.e., FL1 and DL16, respectively), the duration of high flow pulses (i.e., DH15), and the fall rate (i.e., RA3) still showed large deviations. These outcomes might be related to the limited model reproduction of some descending limbs and low flow pulses that occur at the beginning of the fall season, as observed in Figure 6. The latter is also confirmed by the high relative errors obtained for the average maximum monthly flows for June, July, and October (i.e., MH6, MH7, and MH10) and the variability of mean flows among the same months (i.e., MA29 to MA33). It is important to note that most of the indices in Table 5 were well represented by high flow clusters, which are dominated by good performances for NSE or RMSEH. However, the misrepresented low flow indices FL1 and DL16 obtained their lowest relative errors using Pareto-optimal solutions from the low flow clusters. These clusters provide a better description of low flow discharges, particularly in the seasonal transition from summer to fall.

Table 5 The lowest median relative error and corresponding interquartile range (IQR) and flow cluster for each multi-objective calibration strategy for the Indicators of Hydrologic Alteration (IHA).
Values that exceed the ±30% relative error bound are highlighted

Hydrologic index** | NSE-based: Relative error | IQR | Cluster | RMSE-based: Relative error | IQR | Cluster

Magnitude of monthly water conditions
MA12 | -3.4% | 10.5% | High | -6.9% | 11.4% | High
MA13 | 0.2% | 6.1% | Low | -7.9% | 9.8% | High
MA14 | 12.9% | 7.3% | High | 9.3% | 10.4% | High
MA15 | -7.8% | 2.5% | High | -5.9% | 6.2% | High
MA16 | 1.2% | 3.9% | High | -3.0% | 8.5% | High
MA17 | -0.5% | 7.4% | High | -4.5% | 14.9% | High
MA18 | -1.0% | 24.2% | All | -1.8% | 32.6% | All
MA19 | 7.5% | 10.2% | Medium | -1.0% | 30.6% | Medium
MA20 | 1.8% | 7.8% | High | 0.7% | 22.9% | High
MA21 | -21.9% | 6.5% | High | -23.7% | 38.6% | Low
MA22 | -17.9% | 13.7% | High | -25.0% | 20.7% | High
MA23 | -7.1% | 8.1% | High | -13.9% | 13.7% | High

Magnitude and duration of annual extreme water conditions (mean daily flow)
DL1 | 36.8% | 10.5% | Medium | 71.8% | 48.2% | High
DL2 | 10.3% | 8.5% | Medium | 38.5% | 38.7% | High
DL3 | 2.1% | 16.8% | All | 19.8% | 33.4% | High
DL4 | -8.7% | 6.8% | High | 2.2% | 30.6% | All
DL5 | -17.6% | 6.1% | High | -14.8% | 39.2% | Low
DH1 | -17.1% | 3.7% | High | -14.6% | 11.9% | High
DH2 | -8.9% | 3.9% | Medium | -6.5% | 10.3% | High
DH3 | -2.2% | 3.8% | High | 0.5% | 9.8% | High
DH4 | -0.3% | 4.2% | All | 1.5% | 10.1% | High
DH5 | -0.4% | 5.1% | All | 0.2% | 6.3% | High
ML17 | 13.1% | 4.1% | Medium | 32.4% | 25.1% | High

Timing of annual extreme water conditions
TL1 | 6.7% | 4.0% | Low | 12.2% | 12.8% | Low
TH1 | -0.4% | 10.2% | Medium | 0.4% | 32.8% | All

Frequency and duration of high and low pulses
FL1 | -65.5% | 25.7% | Low | -75.7% | 16.2% | Low
DL16 | 213.5% | 167.2% | Low | 144.9% | 181.2% | Low
FH1 | -28.4% | 3.4% | High | -44.9% | 16.1% | High
DH15 | 60.0% | 34.4% | Low | 100.0% | 53.4% | High

Rate and frequency of water condition changes
RA1 | 18.9% | 11.6% | Medium | -16.9% | 27.3% | Medium
RA3 | -50.0% | 2.8% | High | -53.0% | 13.2% | High
RA8 | 5.5% | 7.3% | Low | 0.9% | 25.2% | Low

4.3.5.2 Single-objective calibration
We obtained the individual model simulations that minimized each NSE-based objective function from the optimized Pareto front produced by the NSGA-III algorithm, together with their corresponding results for the ecologically-relevant hydrologic indices. The maximum attained values for NSE, NSEsqrt, and NSErel were 0.76, 0.73, and 0.81, respectively. The optimal NSE and NSEsqrt models were each able to simulate 119 out of 167 indices (71%) within ±30% relative error, while the optimal NSErel model did the same for 78 indices (47%). Compared to the NSE-based multi-objective calibration strategy, some of the indices reported in Table 4 were represented within the 30% acceptability threshold by one or another of the single-objective calibrated models. As expected, the optimal NSE model provided acceptable results for high flow related indices: the mean maximum monthly flows for June and July (MH6 and MH7, respectively), the high flow volume using three times the median annual flow as a threshold (MH22), the high flow duration with seven times the median flow as the upper threshold (DH19), and the seasonal predictability of non-low flows (TL4). However, key indices such as MA19 (mean monthly flow for August), DL2 (annual minimum of 3-day average flow), TL1 (Julian date of annual minimum), and RA8 (reversals), included in Table 5, fell outside the acceptability range defined in this study. On the other hand, the optimal NSEsqrt model provided acceptable results for the mean maximum October flow (MH10), which is a key index for perennial streams (related to low flows during the fall season). Similarly, NSEsqrt produced acceptable outcomes for the high flood pulse count with seven times the median daily flow as the upper threshold (FH4), in addition to MH22 (high flow volume), DH19 (high flow duration), and TL4 (seasonal predictability), also given by the optimal NSE model.
However, the optimal NSEsqrt model provided poor representations of MA19 (August mean flow), DL2 (3-day annual minimum), RA1 (rise rate), and RA8 (reversals). Finally, the optimal NSErel model, which is insensitive to peak flows and biased towards low flows, surprisingly improved the representation of the high flow pulse duration (DH15), a key indicator for perennial streams. This occurs because NSErel significantly reduces the influence of absolute differences during high flow events (Krause et al., 2005). Therefore, the NSErel objective function has the property of favoring simulations that better describe the overall shape of the hydrograph, which can be seen graphically in Figure 6 for the NSE-based low flow cluster simulations. Key indices that could not be represented by the optimal NSErel model within ±30% relative error include MA14-MA17 and MA21-MA22 (mean monthly flows for March-June and October-November, respectively), ML17 (seven-day minimum flow divided by the mean annual daily flow, averaged across all years), DL5 (seasonal magnitude of minimum annual flow), and DH2-DH5 (magnitude of maximum annual flow from 3-day duration to seasonal). As expected, these indices are mainly related to high flow events.

4.4 CONCLUSIONS
This study evaluated the predictability of 167 ecologically-relevant hydrologic indices using different approaches for model calibration. We compared the performance of two multi-objective and three single-objective formulations employing the NSGA-III multi-objective optimization algorithm and the SWAT model structure. In general, the two multi-objective formulations performed better than the single-objective formulations in calculating the hydrologic indices within an acceptability range of ±30% relative error. However, no single approach outperformed the others across the full set of hydrologic indicators. In this sense, all the evaluated formulations can be used to represent different targeted ecologically-relevant hydrologic indices. An advantage of a multi-objective calibration approach over a single-objective alternative is the direct provision of a non-subjective range of variation for the quantity of interest after the optimization process, given by the diversity of the set of Pareto-optimal solutions. Among the multi-objective formulations tested herein, the NSE-based strategy provided the highest number of well-predicted indices and the smallest dispersion (i.e., uncertainty) over the different sets (all points and the low, medium, and high flow clusters) of Pareto-optimal simulations. The results indicated that the solutions providing the best representations of high and medium flows also showed acceptable efficiencies for low flows. Consequently, the Pareto front region comprised of the high and medium flow clusters contained the highest percentages of Pareto-optimal solutions with no evidence of a significant mean difference between simulated and observed time series. Likewise, this Pareto-optimal region provided the highest number of hydrologic indicators with the lowest median relative error and dispersion measure (i.e., interquartile range). Furthermore, this method provided groups of solutions able to simultaneously describe different streamflow regime components for distinct flow conditions, which has proven to be a very difficult task for a single optimal solution found by current single-objective calibration strategies (including multi-metric formulations) and available model structures.
The multi-objective strategies were able to explain up to 77% of the ecologically-relevant hydrologic indices. Important indices related to the frequency and duration of high and low flow pulses were consistently poorly simulated. Limited model depiction of descending limbs and low flow pulses that take place in the transition from summer to fall seasons resulted in weak predictability of low flow indices. This issue has been clearly identified in previous studies and is subject of current hydrology research (Garcia et al., 2017; Murphy et al., 2013; Pfannerstill et al., 2014; Shrestha et al., 2014). In this study, we proposed the use of NSErel performance measure in a multi-objective framework in order to improve the representation of low and extreme low flow events while maintaining a good overall representation of other flow conditions. However, we showed that an NSErel objective function improves low flows while highly sacrificing the representation of other flow conditions, as opposed to the standard NSE for high flows which improves high flows, maintaining acceptable representation of other flow conditions. Therefore, during the calibration process we observed that NSErel affects the overall central tendency values (e.g. median and mean flows) for different temporal scales (e.g. monthly, annual) negatively impacting the representation of low flow indicators (e.g. baseflow index). We also demonstrated that the use of different set of solutions, instead of a single optimal solution, introduces more flexibility in the predictability of different hydrologic indices of 84 ecological interest. Moreover, we were able to identify a reduced group of poorly represented indices that are closely related (Table 4). This systematic identification would facilitate the formulation of additional objective functions intended to improve model performance or to detect model inadequacies that can be addressed to reduce structural uncertainties in future research efforts. 85 5 A NOVEL MULTI-OBJECTIVE MODEL CALIBRATION METHOD FOR ECOHYDROLOGICAL APPLICATIONS 5.1 INTRODUCTION The streamflow regime is widely acknowledged as a key determinant of the ecological integrity of riverine ecosystems (Poff et al., 1997; Sofi et al., 2020). Both climate and human- driven alterations to natural streamflow fluctuations affect the structure and functioning of these ecosystems, threatening biodiversity and restricting the provision of ecosystem services (Palmer and Ruhi, 2019; Vörösmarty et al., 2010). Therefore, understanding and evaluating the impacts of climate change and human interventions on the streamflow regime is critical to inform and prioritize environmental management alternatives (Hassanzadeh et al., 2017; Mittal et al., 2016). A broadly accepted approach to characterizing streamflow regimes is to compute flow statistics from streamflow hydrographs. These statistics, also known as hydrologic signature metrics, streamflow characteristics (SFCs), or ecologically relevant hydrologic indices (ERHIs), generally represent five fundamental facets: magnitude, frequency, duration, timing, and rate of change of flows (Poff and Zimmerman, 2010). Currently, there are over 200 flow statistics relevant to stream ecology (Archfield et al., 2014; Olden and Poff, 2003; Vogel et al., 2007). 
These indices are usually employed in ecohydrological applications such as stream classification (Kennard et al., 2010b; Mcmanamay et al., 2014), prediction of stream health or distribution of riverine species (Hernandez-Suarez and Nejadhashemi, 2018; Kakouei et al., 2017), and environmental flow determination (Mathews and Richter, 2007; Poff et al., 2010). Since these applications generally cover large spatial scales, statistical and hydrological models have been increasingly used, especially to predict regional changes in ERHIs due to climate and anthropogenic factors (Caldwell et al., 2015; Mittal et al., 2016; Yang et al., 2016). 86 Hydrological models are usually preferred over regional statistical approaches because they can explicitly consider modifications in land use, environmental conditions, and management practices (Hall et al., 2017; Shrestha et al., 2016). Moreover, some environmental flow frameworks recommend using hydrological models for predicting streamflow in poorly gauged or ungauged locations (Peters et al., 2012; Poff et al., 2010). However, there is a growing number of studies revealing important limitations of hydrological models in representing ERHIs, especially when these models are calibrated based on traditional performance metrics such as the Nash-Sutcliffe efficiency (NSE) (Murphy et al., 2013; Shrestha et al., 2014; Vigiak et al., 2018; Vis et al., 2015). These limitations include over or underprediction of low- and high-flow indices (Wenger et al., 2010), high errors/uncertainties when predicting ERHIs related to timing, duration, frequency, and/or rate of change of flows (Murphy et al., 2013; Shrestha et al., 2014; Vigiak et al., 2018), and different sets of equally well-performing model parameters (in terms of traditional metrics) yielding very different performances in terms of ERHIs (Vis et al., 2015). Current model calibration approaches for addressing limitations in ERHIs’ representation can be classified into two major categories. In the first category (hereafter referred to as performance-based), objective functions are formulated based on traditional performance metrics with different streamflow transformations (e.g., square root, logarithm, inverse) to stress or balance the importance of different flow conditions. On the other hand, calibration approaches in the second category (hereafter referred to as signature-based) explicitly incorporate SFCs of interest into the objective functions (Hallouin et al., 2020; Kiesel et al., 2020, 2017; Pool et al., 2017; Vis et al., 2015; Zhang et al., 2016). In ecohydrological applications, the choice of SFCs of interest has been mainly based on riverine species preferences (Hallouin et al., 2020; Kiesel et al., 2020, 2017; Pool et al., 2017), whereas hydrological applications usually target Flow 87 Duration Curve (FDC) features, runoff ratios, and basic discharge statistics (Chilkoti et al., 2018; Euser et al., 2013; Fernandez-Palomino et al., 2020; Pfannerstill et al., 2017, 2014; Sahraei et al., 2020; Shafii and Tolson, 2015; Yilmaz et al., 2008). Some applications using performance-based approaches target specific flow conditions (Garcia et al., 2017; Mizukami et al., 2019), whereas others use one or multiple objective functions to attain an acceptable overall representation of the streamflow regime (Hallouin et al., 2020). 
When combining multiple objective functions, studies either use aggregated single-objective functions (Vis et al., 2015) or pure multi-objective approaches (Chilkoti et al., 2018; Hernandez-Suarez et al., 2018; Sahraei et al., 2020). In general, signature-based approaches provide better predictions of pre-selected SFCs compared to performance-based approaches (Hallouin et al., 2020). However, those SFCs that are not included in the original objective function formulation are not necessarily well-represented or better-performing than traditional approaches using streamflow transformations (Hallouin et al., 2020). During the last decade, researchers have obtained a better understanding of the implications of model calibration into EHRIs replication. For instance, several studies have demonstrated that the objective function choice or formulation influences the prediction of flow statistics (Kiesel et al., 2020; Pool et al., 2017; Shafii and Tolson, 2015; Vis et al., 2015). Also, these studies showed that optimality in terms of traditional performance metrics does not necessarily result in optimal solutions for ecohydrological purposes (Hallouin et al., 2020; Kiesel et al., 2020). In ecohydrological applications, regardless of the optimization scheme for model calibration, it is uncommon to find solutions yielding acceptable results for all ERHIs of interest. Also, finding an individual simulation with acceptable results for both low- and high-flow conditions is unusual. Therefore, simulation ensembles such as median or averages of optimal 88 results, or their clusters, are recommended (Hernandez-Suarez et al., 2018; Vis et al., 2015). It is worth noting that most of the calibration approaches used in previous ecohydrological studies have run on single-objective mode (i.e., multi-metric, aggregated functions). Hence, those results depend on the weight assigned to each ERHI or performance metric considered within the objective function (Zhang et al., 2016), and tradeoffs among different indices, performance metrics, or regime facets are not fully explored. The goal of this study was to develop calibration strategies providing a balanced streamflow regime representation among the different regime facets (i.e., magnitude, frequency, duration, timing, and rate of change). Two strategies were developed to compare both performance- and signature-based calibration approaches. The strategy using a performance- based approach was improved by incorporating a novel constraint formulation to obtain simulations with targeted ERHIs within pre-defined acceptability thresholds. For the signature- based strategy, tradeoffs between different streamflow regime facets were explicitly considered. These calibration strategies were implemented in an agriculture-dominated watershed in Michigan, US, using the recently developed evolutionary multi-objective optimization algorithm called Unified Non-dominated Sorting Genetic Algorithm III (U-NSGA-III) and the Soil and Water Assessment Tool (SWAT). To the best of our knowledge, previous multi-objective calibration approaches for ecohydrological applications have not explicitly considered optimization routines constraining the performance of ERHIs of interest. Likewise, this is the first time that a multi-objective calibration approach is applied to targeted ERHIs, pursuing a balanced representation of the overall streamflow regime while explicitly considering different regime facets. 
89 5.2 MATERIALS AND METHODS 5.2.1 Overview Two different strategies for multi-objective calibration were evaluated to improve the representation of the overall streamflow regime in a watershed model. Strategy 1 employed a constrained performance-based approach, whereas Strategy 2 used a constraint-free signature- based approach (Figure 7). Strategy 1 consisted of three major steps. In the first step, the goal was to identify a reduced set of performance metrics that jointly represented a wide list of ERHI. Then, in the second step, a tailored constraint was formulated to generate individual simulations with an acceptable replication of a reduced set of ERHIs of interest. This formulation was based on pre-defined acceptability criteria for ERHI replication. Moreover, the selection of ERHI of interest was performed by targeting a balanced representation of different flow regime facets. In the third step, the outputs of the previous steps were used as inputs to formulate a multi-objective optimization problem for model calibration. Meanwhile, Strategy 2 consisted of two major steps. In the first step, a reduced set of ERHI was defined to provide a balanced representation of different regime facets. Then, several objective functions representing different regime facets were formulated. These objective functions were considered as inputs of the problem formulation in step 2. This formulation was intended to explore tradeoffs in the simulation of different regime facets. For each strategy, near-optimal Pareto solutions were obtained using an evolutionary multi-objective optimization algorithm. Finally, preferred tradeoff solutions were identified and compared using multicriteria decision-making (MCDM) methods. 90 Figure 7 Overview of the two multi-objective strategies for model calibration evaluated in this study 5.2.2 Study Area The proposed strategies were evaluated in the Honeyoey Creek-Pine Creek Watershed (Hydrologic Unit Code 0408020203), located in east-central Michigan, US (Figure 8). This watershed has a drainage area of 1010 km2 and is situated within the Saginaw River Watershed, which drains into Lake Huron. The Saginaw River Watershed is identified as an area of concern by the US Environmental Protection Agency (USEPA) due to water pollution, wildlife habitat degradation, loss of recreational values, among others (USEPA, 2015). According to data from the National Agricultural Statistics Service (NASS) of the US Department of Agriculture (USDA), agriculture is the dominant land use (~50% of the area), 91 followed by forests (~24%), wetlands (~16%), pasturelands (~7%), and urban development (~3%) (USDA-NASS, 2012). Figure 8 Location of the Honeyoey Creek - Pine Creek Watershed 5.2.3 Watershed Model The Soil and Water Assessment Tool (SWAT 2012, Rev. 622) was used to simulate the streamflow regime in the study area. SWAT is a semi-distributed, process-based, continuous- time watershed model that can operate on a daily or sub-daily time step. SWAT is mainly used to evaluate the impact of land use and management practices on water, sediments, nutrients, pesticides, and bacteria yields at the watershed scale (Arnold et al., 2012). When using SWAT, a watershed is divided into subwatersheds, which are further discretized into Hydrologic Response Units (HRUs). HRUs are geographical units with homogeneous land use, soil, and topographical characteristics. 
SWAT inputs controlling the water balance include daily or sub-daily precipitation, maximum and minimum air temperatures, solar radiation, wind speed, and relative 92 humidity. SWAT simulates the watershed hydrology in two phases: land (loading) and water network (routing). Simulated hydrological processes include snow accumulation and melting, canopy storage, plant growth, evapotranspiration, infiltration, surface runoff, soil water redistribution, lateral flow, groundwater flows, and channel routing (Neitsch et al., 2011). In this study, SWAT was used to obtain daily streamflow from 2003 to 2014 (calibration period) and from 1983 to 1994 (validation period) at the outlet of the Honeyoey Creek-Pine Creek Watershed (Figure 2). A warm-up period of two years was used to minimize the effect of initial conditions on the simulations. Simulated streamflow values were compared against daily observations obtained from the Pine River Near Midland US Geological Survey (USGS) gauging station (ID 04155500) (USGS, 2020). Input daily precipitation and max/min temperature data from 1981 to 2014 were collected from two weather stations provided by the National Centers for Environmental Information (NCEI) of the National Oceanic and Atmospheric Administration (NOAA) (NOAA-NCEI, 2020). The missing weather input data were estimated using SWAT’s stochastic weather generator WXGEN (Neitsch et al., 2011). The watershed was divided into 250 subwatersheds, each consisting of a unique HRU obtained from dominant land use, soil, and slope characteristics. These subwatersheds were delineated using stream network data from the National Hydrography Dataset (NHD) and pre-defined units obtained from the Michigan Institute for Fisheries Research (Einheuser et al., 2012). Elevation data with a 30-m resolution was obtained from the National Elevation Dataset provided by the USGS National Map (USGS, 2018). Land use was extracted from the 30-m resolution Cropland Data Layer (CDL), which was obtained from USDA-NASS (2012). Soil characteristics were extracted from the Soil Survey Geographic Database (SSURGO) provided by the USDA Natural Resources Conservation Service (NRCS) (USDA-NRCS, 2020). Potential evapotranspiration was calculated using the 93 Penman-Monteith equation (Monteith, 1965), whereas surface runoff was computed using the Soil Conservation Service (SCS) curve number method (USDA-SCS, 1972). Streamflow was routed through the channel network using the variable storage coefficient method (Williams, 1969). The model was calibrated by adjusting 15 parameters whose description and calibration ranges are reported in Table 6. 
Table 6 Calibration parameters and ranges

Parameter | Description | Calibration range
BIOMIXa | Biological mixing efficiency | [0, 1]
CANMXa | Maximum canopy storage (mm H2O) | [0, 100]
CN2b | Initial Soil Conservation Service (SCS) runoff curve number for moisture condition II | [-0.25, 0.25]
ESCOa | Soil evaporation compensation factor | [0, 1]
EPCOa | Plant uptake compensation factor | [0, 1]
ALPHA_BFa | Baseflow alpha factor (days-1) | [0, 1]
GW_DELAYa | Groundwater delay time (days) | [0, 500]
GWQMNa | Threshold depth of water in the shallow aquifer required for return flow to occur (mm H2O) | [0, 5000]
GW_REVAPa | Groundwater "revap" coefficient | [0.02, 0.2]
REVAPMNa | Threshold depth of water in the shallow aquifer for "revap" or percolation to the deep aquifer to occur (mm H2O) | [0, 1000]
RCHRG_DPa | Deep aquifer percolation fraction | [0, 1]
CH_N2a | Manning's n value for the main channel | [0, 0.3]
CH_K2a | Effective hydraulic conductivity in main channel alluvium (mm h-1) | [0, 500]
SOL_AWCb | Available water capacity of the soil layer (mm H2O mm-1 soil) | [-0.25, 0.25]
SURLAGa | Surface runoff lag coefficient | [1, 24]
Notes: a Values are replaced in the SWAT input files by a value drawn from the calibration range. b Values are replaced in the SWAT input files by the existing parameter value (defined during the model setup) multiplied by 1 plus a value drawn from the calibration range.

5.2.4 Strategy 1: Constrained Performance-Based Model Calibration
5.2.4.1 Performance Metrics Selection
A reduced set of performance metrics was selected for the formulation of the objective functions from a list of widely used measures (see Table 7). These measures included NSE (Nash and Sutcliffe, 1970), the original and modified versions of the Kling-Gupta Efficiency (KGE, Gupta et al., 2009; Kling et al., 2012), the index of agreement (IoA, Willmott, 1981), and the coefficient of determination (R2). The Fourth Root Mean Quadrupled Error (R4MS4E) was also considered in order to emphasize the largest residuals expected under high flow conditions. Since both NSE and the Root Mean Square Error (RMSE) vary only with the sum of squared model residuals, only the former was considered in this study. Following Gupta et al. (2009), NSE and KGE can be expressed in terms of three components representing correlation, bias, and variability. Correlation relates to timing and hydrograph shape, whereas bias and variability aim to reproduce the first and second moments of the distribution of observations, which mainly affect magnitude-related SFCs. These three components interact differently within each performance measure. For instance, bias is scaled by the standard deviation of observations in NSE. Thus, in the presence of high variability, the bias component might become less important when obtaining optimal values. In addition, the correlation and variability components interact with each other in NSE, which generally results in underestimation of the latter (Gupta et al., 2009). As an alternative, KGE provides a more balanced representation of correlation, bias, and variability, while avoiding interactions among these components (Gupta et al., 2009). By considering R2 as an additional measure, we aimed to evaluate the role of the correlation component in ERHIs replication. Meanwhile, IoA was included to consider a different way of normalizing the sum of squared errors and its effects on ERHIs replication. To accentuate different flow conditions (i.e., low, moderate, and high), relative errors and error transformations were considered.
Except for R4MS4E, all measures included their standard versions along with logarithmic, inverse, and square root transformations. Relative error versions were only used for NSE and IoA. It is worth mentioning that, in general, the standard versions favor the representation of high flows, the square root transformation highlights moderate or average flow conditions, whereas the logarithmic, inverse, and relative error versions accentuate low flows (Bennett et al., 2013; Krause et al., 2005).

Table 7 Performance metrics and transformations considered for the selection process

Nash-Sutcliffe Efficiency (NSE), range (-∞, 1]:
$NSE = 1 - \frac{\sum_{i=1}^{n}(S_i - O_i)^2}{\sum_{i=1}^{n}(O_i - \mu_o)^2}$

Kling-Gupta Efficiency (KGE), range (-∞, 1]:
Original (Gupta et al., 2009): $KGE = 1 - \sqrt{(1-r)^2 + (1-\alpha)^2 + (1-\beta)^2}$
Modified (Kling et al., 2012): $KGE' = 1 - \sqrt{(1-r)^2 + (1-\gamma)^2 + (1-\beta)^2}$
with $r = \frac{Cov_{so}}{\sigma_s \sigma_o}$, $\alpha = \frac{\sigma_s}{\sigma_o}$, $\beta = \frac{\mu_s}{\mu_o}$, and $\gamma = \frac{\sigma_s / \mu_s}{\sigma_o / \mu_o}$

Index of Agreement (IoA), range [0, 1]:
$IoA = 1 - \frac{\sum_{i=1}^{n}(S_i - O_i)^2}{\sum_{i=1}^{n}(|S_i - \mu_s| + |O_i - \mu_o|)^2}$

Coefficient of Determination (R2), range [0, 1]:
$R^2 = \left(\frac{Cov_{so}}{\sigma_s \sigma_o}\right)^2$

Fourth Root Mean Quadrupled Error (R4MS4E), range [0, ∞):
$R4MS4E = \sqrt[4]{\frac{1}{n}\sum_{i=1}^{n}(S_i - O_i)^4}$

Transformations:
Standard: $S_i = y_i$; $O_i = \hat{y}_i$
Square root: $S_i = \sqrt{y_i}$; $O_i = \sqrt{\hat{y}_i}$
Logarithmic: $S_i = \ln y_i$; $O_i = \ln \hat{y}_i$
Inverse: $S_i = y_i^{-1}$; $O_i = \hat{y}_i^{-1}$
Relative: for NSE, $S_i - O_i = \frac{y_i - \hat{y}_i}{\hat{y}_i}$ and $O_i - \mu_o = \frac{\hat{y}_i - \mu_o}{\mu_o}$; for IoA, $|S_i - \mu_s| + |O_i - \mu_o| = \frac{|y_i - \mu_s| + |\hat{y}_i - \mu_o|}{\mu_o}$

Notes: $\mu = \frac{1}{n}\sum_{i=1}^{n} X_i$; $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2}$; $Cov_{so} = \frac{1}{n}\sum_{i=1}^{n}(O_i - \mu_o)(S_i - \mu_s)$. If $\mu = \mu_o$ or $\sigma = \sigma_o$, then $X_i = O_i$; if $\mu = \mu_s$ or $\sigma = \sigma_s$, then $X_i = S_i$. If the transformation is 'Relative': if $\mu = \mu_o$, then $X_i = \hat{y}_i$; if $\mu = \mu_s$, then $X_i = y_i$. Here $\hat{y}$ denotes observed values, $y$ denotes simulated values, and $n$ is the number of observations.

Single-objective model calibration was executed for each performance metric and transformation indicated above, resulting in 23 individual optimization problems. Each minimization objective function $f$ was defined as $1 - P_m$, where $P_m$ is the transformed performance metric to be maximized; for R4MS4E, $f = P_m$, as this metric has to be minimized. As a next step, the 171 ERHIs reported by Henriksen et al. (2006) and the seven ERHIs proposed by Archfield et al. (2014) were computed for each of the 23 optimal solutions. Simulated ERHIs were compared against those obtained from streamflow observations by calculating the relative error of each index, $e_{rel,i}$, for a vector of model parameters $\theta$, as follows:

$e_{rel,i}(\theta) = \left(\frac{I_i(y(\theta)) - I_i(\hat{y})}{I_i(\hat{y})}\right) \cdot 100 \quad (1)$

where $I_i$ is the i-th hydrologic index evaluated for the simulations $y(\theta)$ and the observations $\hat{y}$. Then, the ERHIs within a pre-defined relative error threshold were identified for each optimal solution. It was expected that optimal results from different performance metrics and transformations would yield different sets of well-replicated ERHIs. Therefore, the final choice of performance metrics was made by selecting up to six transformed measures that jointly represented the maximum number of ERHIs within the pre-defined relative error threshold. In the selection procedure, the transformed measure with the highest number of ERHIs within the acceptability threshold was selected first. Then, another transformed measure was identified based on the ERHIs not yet covered and added to the list of selected measures. This step was repeated until either the maximum number of well-replicated ERHIs or the pre-defined maximum number of objective functions was reached.
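A minimal sketch of this greedy selection step is shown below, assuming a pre-computed boolean matrix `within` (one row per transformed metric, one column per ERHI) that flags which indices fell inside the ±30% threshold for each single-objective optimum; the function and variable names are hypothetical.

```python
import numpy as np

def greedy_metric_selection(within, metric_names, max_metrics=6):
    """Greedy cover: repeatedly pick the transformed metric whose optimal
    solution replicates the most ERHIs not yet covered by the selection."""
    within = np.asarray(within, dtype=bool)        # shape: (n_metrics, n_indices)
    covered = np.zeros(within.shape[1], dtype=bool)
    selected = []
    while len(selected) < max_metrics:
        gains = [(-1 if metric_names[m] in selected
                  else int((~covered & within[m]).sum()))
                 for m in range(within.shape[0])]
        best = int(np.argmax(gains))
        if gains[best] <= 0:                       # no remaining ERHI can be added
            break
        selected.append(metric_names[best])
        covered |= within[best]
    return selected, int(covered.sum())            # chosen metrics, total ERHIs covered
```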
It is worth noting that the fraction of non-dominated solutions with respect to the total population increases with the number of objective functions, slowing down the search process (Deb and Jain, 2014). Likewise, a higher population size is required to maintain a good exploration of large dimensional spaces, which increases the number of function 97 evaluations and the overall computational time. For these reasons, we decided to limit the number of objective functions to six. In this study, a real-parameter Genetic Algorithm (GA) (Goldberg, 1991) was used for single-objective optimization. Particularly, tournament selection, simulated binary crossover (SBX, Deb & Agrawal, 1994), and polynomial mutation (Deb, 2001) were designated as GA operators. The optimization algorithm ran for 250 generations with a population size of 100, resulting in a total of 25,000 model evaluations for each problem. The crossover probability and distribution index for the SBX operator were defined as 0.9 and 10, respectively. Likewise, the mutation probability and distribution index for the polynomial mutation operator were defined as 1/15 (i.e., the reciprocal of the number of calibration parameters) and 20, respectively. On the other hand, the relative error threshold for EHRI replication was defined as ±30%, following uncertainty in the estimation of hydrologic indices reported by Kennard et al. (2010a) when using 15-year time series. This threshold has also been used in previous ecohydrological studies to evaluate the performance of ERHIs predictions (Caldwell et al., 2015; Hernandez-Suarez et al., 2018; Vis et al., 2015). 5.2.4.2 Constraint Definition Traditionally, hydrologic signatures have been used in model calibration either as objective functions or post-calibration evaluation criteria (Shafii and Tolson, 2015). Here, we used a set of relevant signatures as constraints given a pre-defined acceptability threshold. This set can be identified by the modeler depending on the ecohydrological application needs. In this study, we used 32 Indicators of Hydrologic Alteration (IHA) (The Nature Conservancy, 2009), divided into five categories (Table 8), each representing specific streamflow regime facets. In addition, seven indices presented by Archfield et al. (2014), which describe fundamental stochastic properties of streamflow time series, were included in the constraint definition. The 98 aforementioned 39 indices are described in Table 8. For consistency, an acceptability threshold of ±30% relative error was used for constraining ERHIs prediction. 
Table 8 List of 39 Ecologically Relevant Hydrologic Indices of interest used for multi-objective model calibration

Category | Index* | Description | Associated variability index*
IHA Group 1: magnitude of monthly water conditions (IHA1) | MA12-MA23 | Mean monthly flows from January to December (m3 s-1) | MA24-MA35
IHA Group 2: magnitude and duration of annual extreme water conditions (IHA2) | DL1-DL5 | Annual minimum with 1-, 3-, 7-, 30-, and 90-day moving average flow (m3 s-1) | DL6-DL10
IHA Group 2 (cont.) | DH1-DH5 | Annual maximum with 1-, 3-, 7-, 30-, and 90-day moving average flow (m3 s-1) | DH6-DH10
IHA Group 2 (cont.) | ML17 | Baseflow index based on the 7-day minimum flow | ML18
IHA Group 3: timing of annual extreme water conditions (IHA3) | TL1 | Julian day of annual minimum | TL2
IHA Group 3 (cont.) | TH1 | Julian day of annual maximum | TH2
IHA Group 4: frequency and duration of high and low pulses (IHA4) | FL1 | Mean low flow pulse count per water year (year-1) | FL2
IHA Group 4 (cont.) | DL16 | Mean low flow pulse duration (days) | DL17
IHA Group 4 (cont.) | FH1 | Mean high flow pulse count per water year with a threshold equal to the 75th percentile of the entire flow record (year-1) | FH2
IHA Group 4 (cont.) | DH15 | Mean high flow pulse duration with a threshold equal to the 75th percentile of the entire flow record (days) | DH16
IHA Group 5: rate and frequency of water condition changes (IHA5) | RA1 | Rise rate (m3 s-1 d-1) | RA2
IHA Group 5 (cont.) | RA3 | Fall rate (m3 s-1 d-1) | RA4
IHA Group 5 (cont.) | RA8 | Reversals (year-1) | RA9
Magnificent seven (MAG) (Archfield et al., 2014) | MAG1-MAG4 | First four L-moments (mean, coefficient of variation, skewness, and kurtosis) |
Magnificent seven (cont.) | MAG5 | Autoregressive lag-one AR(1) correlation coefficient |
Magnificent seven (cont.) | MAG6-MAG7 | Amplitude and phase of the seasonal signal |
* Index abbreviations for Indicators of Hydrologic Alteration (IHA) as presented by Olden and Poff (2003).

The constraint, which was formulated as the sum of two components, aggregates the performance of all ERHIs of interest into a single measure. The first component is the number of indices with relative errors outside the pre-defined acceptability threshold for ERHIs replication. The second component is a weighted sum of the relative violations by each index with respect to the pre-defined acceptability threshold. The constraint can be expressed as follows:

$CV(\theta) = \sum_{i=1}^{m} k_i(\theta)\left[1 + w_i\left(\frac{1}{\tau}\frac{|I_i(y(\theta)) - I_i(\hat{y})|}{I_i(\hat{y})} - 1\right)\right] \quad (2)$

$k_i(\theta) = \begin{cases} 0, & \text{if } \frac{1}{\tau}\frac{|I_i(y(\theta)) - I_i(\hat{y})|}{I_i(\hat{y})} - 1 \le 0 \\ 1, & \text{otherwise} \end{cases}$

$w_i = \frac{1}{g \cdot h_i}$

where $CV(\theta)$ is the constraint violation for the simulations $y(\theta)$, m is the total number of indices (i.e., 39 in this study), $\tau$ is the acceptability threshold expressed as the absolute value of a fraction between 0 and 1 (0.30 is used in this study), $w_i$ is the weighting factor for the i-th index, g is the number of index categories (i.e., 6 in this study), and $h_i$ is the total number of indices in the category that contains the i-th index. The weighting factor was explicitly incorporated to provide a balanced contribution from different streamflow regime facets. A solution is considered feasible when $CV(\theta)$ attains a value of zero, but for ease of handling the constraint with an optimization algorithm, we convert it to an inequality constraint, $CV(\theta) \le 0$. By introducing the constraint formulation presented above, the optimization algorithm is forced to find streamflow simulations in which all ERHIs of interest are estimated within the acceptable range (i.e., the relative error is within ±30%). It is worth noting that the constraint definition is flexible enough to designate a different acceptability threshold $\tau_i$ for each index.
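A minimal sketch of the constraint violation in Eq. 2 is given below for a single candidate parameter set; the function name constraint_violation and its inputs are hypothetical, and the absolute value of the observed index in the denominator is an implementation choice for robustness (Eq. 2 assumes positive index values).

```python
import numpy as np

def constraint_violation(sim_indices, obs_indices, categories, tau=0.30):
    """Constraint violation CV (Eq. 2) for one candidate parameter set.

    sim_indices, obs_indices : the m ERHIs of interest (here m = 39) computed
        on the simulated and observed hydrographs.
    categories : category label of each index (one of the g = 6 groups in
        Table 8), used to build the weights w_i = 1 / (g * h_i).
    A solution is feasible when the returned value is <= 0 (here, exactly 0)."""
    sim = np.asarray(sim_indices, dtype=float)
    obs = np.asarray(obs_indices, dtype=float)
    categories = np.asarray(categories)
    rel_err = np.abs(sim - obs) / np.abs(obs)      # |I_i(y(theta)) - I_i(y_obs)| / I_i(y_obs)
    violation = rel_err / tau - 1.0                # positive only outside the +/-30% band
    k = (violation > 0).astype(float)              # indicator of violating indices
    cats, counts = np.unique(categories, return_counts=True)
    h = counts[np.searchsorted(cats, categories)]  # number of indices in each index's category
    w = 1.0 / (len(cats) * h)                      # balanced category weights
    return float(np.sum(k * (1.0 + w * violation)))
```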
Index-specific thresholds might be necessary when certain acceptability conditions need to be iteratively relaxed to find feasible solutions.

5.2.5 Strategy 2: Unconstrained Signature-Based Model Calibration
Under this strategy, an objective function was formulated for each index category presented in Table 8, as follows:

$f_j(\theta) = \sum_{i \in G_j} \frac{|I_i(y(\theta)) - I_i(\hat{y})|}{I_i(\hat{y})} \quad (3)$

where $f_j(\theta)$ is the objective function for the j-th category, and $G_j$ is the set of indices belonging to the j-th category. Each objective function represents the total error obtained under each index category. Relative errors were used to normalize the contributions from different indices. No constraints were formulated for this calibration strategy. Therefore, in contrast to Strategy 1, no pre-defined acceptability thresholds for ERHIs replication and no weighting factors were required in Strategy 2.

5.2.6 Evolutionary Multi-Objective Optimization Algorithm
In both calibration strategies, the goal was to determine the values of the vector of model parameters $\theta$ (i.e., the decision variables) that minimize the objective functions formulated for each strategy. Each decision variable $\theta_p$, p = 1, 2, ..., 15, could take a value within the ranges defined in Table 6. In Strategy 1, model simulations with $CV(\theta) \le 0$ were considered feasible (see Eq. 2); the remaining ones were infeasible. An evolutionary multi-objective optimization algorithm, U-NSGA-III (Seada and Deb, 2016), was implemented to address the optimization problems resulting from each strategy. U-NSGA-III is a population-based algorithm that employs crossover and mutation operators along with non-dominated sorting and reference directions to move towards near-optimal Pareto solutions. Reference directions are vectors evenly filling the objective space. This algorithm can be used for single-, multi- (i.e., 2 or 3 objective functions), and many-objective (i.e., more than 3 objective functions) optimization problems, and stems from the NSGA-III algorithm (Deb and Jain, 2014). It is worth mentioning that U-NSGA-III can handle both unconstrained and constrained problems. For unconstrained problems, during the non-domination sorting, any two solutions are compared using just the objective function values. A solution x1 dominates a solution x2 when 1) x1 is no worse than x2 in all objective functions, and 2) x1 is better than x2 in at least one objective function (Deb, 2001). In constrained problems, the concept of constraint-domination is used instead. A solution x1 constraint-dominates a solution x2 when 1) x1 is feasible and x2 is infeasible, 2) both x1 and x2 are infeasible but x1 has a lower constraint violation CV, or 3) both x1 and x2 are feasible and x1 dominates x2 using the traditional domination principle (Jain and Deb, 2014). In non-domination sorting, feasible solutions therefore always rank above infeasible solutions. Likewise, the selection operation used when creating the offspring population is modified for constrained problems (Jain and Deb, 2014). NSGA-III and U-NSGA-III have been implemented in previous water resources applications, such as multivariate model calibration using streamflow and evapotranspiration data (Herman et al., 2020), multi-objective calibration targeting different flow conditions (Hernandez-Suarez et al., 2018), irrigation scheduling (Kropp et al., 2019; Mwiya et al., 2020), reservoir design and operation (Chen et al., 2020; Pourshahabi et al., 2020), and optimization of land use practices (Raschke et al., 2021).
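The two comparison rules described above can be written compactly as follows; this is a plain-Python sketch of the domination and constraint-domination principles for minimization problems, not the algorithm's internal implementation.

```python
def dominates(f1, f2):
    """Standard Pareto domination for minimization: f1 dominates f2 if it is
    no worse in every objective and strictly better in at least one."""
    no_worse = all(a <= b for a, b in zip(f1, f2))
    strictly_better = any(a < b for a, b in zip(f1, f2))
    return no_worse and strictly_better

def constraint_dominates(f1, cv1, f2, cv2):
    """Constraint-domination: (1) a feasible solution beats an infeasible one;
    (2) between two infeasible solutions, the smaller constraint violation wins;
    (3) between two feasible solutions, ordinary domination applies."""
    feasible1, feasible2 = cv1 <= 0, cv2 <= 0
    if feasible1 and not feasible2:
        return True
    if not feasible1 and not feasible2:
        return cv1 < cv2
    if feasible1 and feasible2:
        return dominates(f1, f2)
    return False
```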
In this study, an interface for modifying SWAT input files and executing the model was developed in Python 3.7. This interface also included the computation of the ERHIs reported by Henriksen et al. (2006) and Archfield et al. (2014), and was coupled with the Python library pymoo (Blank and Deb, 2020) to implement the U-NSGA-III algorithm. The stopping criterion was set as a maximum of 1000 generations for the multi-objective optimization, with the number of reference directions set to 100. Well-spaced reference directions were generated using the recently developed Riesz s-Energy method (Blank et al., 2021) included in the pymoo library. The operators and parameters chosen for crossover and mutation were the same as those presented in section 5.2.4.1 for the GA, which are standard and recommended (Deb et al., 2002b). Convergence to a near-optimal solution was analyzed using the Hypervolume indicator (Auger et al., 2009), which is a measure of the collective volume of the region dominated by the Pareto-optimal solutions in the objective space.

5.2.7 Selection of Preferred Tradeoff Solutions
Since we were interested in obtaining solutions providing balanced representations of different streamflow regime facets, we compared a set of preferred solutions from different MCDM methods. In particular, two approaches were implemented: compromise programming (Zeleny, 2011) and the pseudo-weight method (Deb, 2001). The compromise programming approach identifies the closest Pareto-optimal solution to a reference point using a user-defined distance metric. Usually, the reference point is the ideal point, representing the best-expected objective function values. In this study, the ideal point was the origin of the objective space. As distance metrics, we used the $\ell_p$ norm with $p = 2$ (Euclidean distance) and $p \to \infty$ (Chebyshev distance). The latter is preferred for non-convex Pareto-optimal fronts. The metrics for a Pareto-optimal solution were computed as follows (Branke et al., 2008):

$\ell_p = \left(\sum_{m=1}^{M} |f_m - z_m|^p\right)^{1/p} \quad (4)$

$\ell_{p \to \infty} = \max_{m}\left(|f_m - z_m|\right) \quad (5)$

where M is the number of objective functions, $f_m$ is the value of the m-th objective function, and $z_m$ is the value of the m-th component of the reference point. Before applying any distance metric, the objective functions were normalized to values between 0 and 1.

Meanwhile, the pseudo-weight method generates a vector for each Pareto-optimal solution representing the relative importance (or weight) of each objective function. The sum of the different weights in each vector is forced to one. The pseudo-weight $w_i$ for the i-th component of a Pareto-optimal solution was computed as follows (Deb, 2001):

$w_i = \frac{(f_i^{\max} - f_i)/(f_i^{\max} - f_i^{\min})}{\sum_{m=1}^{M}(f_m^{\max} - f_m)/(f_m^{\max} - f_m^{\min})} \quad (6)$

where $f_i^{\max}$ and $f_i^{\min}$ are the maximum and minimum values of the i-th objective function among all Pareto-optimal solutions, respectively. The denominator in Eq. 6 guarantees that the sum of all pseudo-weight vector components for a Pareto solution is equal to one. Pseudo-weights are proportional to the difference between the maximum objective function value and the solution's value for a particular component. Thus, a higher pseudo-weight indicates that the point is closer to the minimum objective function value for that component. In other words, a higher pseudo-weight value indicates a higher preference for the corresponding objective function.
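A minimal sketch of Eqs. 4-6 over a matrix of Pareto-optimal objective values is given below, assuming the objectives are normalized to [0, 1] and the ideal point is the origin, as described above; the balanced-solution rule implemented at the end corresponds to the uniform pseudo-weight target described next. The function name select_tradeoff_solutions is hypothetical.

```python
import numpy as np

def select_tradeoff_solutions(F):
    """Pick preferred solutions from a Pareto set F (n_solutions x M objective
    values, all minimized) using compromise programming and pseudo-weights."""
    F = np.asarray(F, dtype=float)
    f_min, f_max = F.min(axis=0), F.max(axis=0)
    Fn = (F - f_min) / (f_max - f_min)              # objectives normalized to [0, 1]

    l2 = np.sqrt((Fn ** 2).sum(axis=1))             # Eq. 4 with p = 2, ideal point at origin
    linf = Fn.max(axis=1)                           # Eq. 5, Chebyshev distance

    num = (f_max - F) / (f_max - f_min)             # Eq. 6, component-wise numerator
    pseudo_w = num / num.sum(axis=1, keepdims=True) # each pseudo-weight vector sums to one

    M = F.shape[1]
    balanced = int(np.linalg.norm(pseudo_w - 1.0 / M, axis=1).argmin())
    return {"l2": int(l2.argmin()), "chebyshev": int(linf.argmin()), "pseudo_weight": balanced}
```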
In this study, we selected the most balanced Pareto-optimal solution as the one with the closest pseudo-weight vector to the M-dimension target vector [1/𝑀 ⋯ 1/𝑀]. Different target vectors can be used to explore how a Pareto solution changes when giving more relevance to a particular objective function. 5.2.8 Evaluation of Calibration Results Using Water Balance, Flow Duration Curve Characteristics, and Additional Hydrologic Indices The Flow Duration Curve (FDC) is the complement of the streamflow cumulative distribution function (Vogel and Fennessey, 1994). FDCs are signatures of runoff variability and summarize a watershed's ability to generate streamflow values of different magnitude (Yilmaz et al., 2008). FDCs have been widely used for model evaluation and calibration (Fenicia et al., 2018). Since a FDC is a frequency-domain representation of a hydrograph, information concerning to streamflow timing is lost, limiting its utility to diagnose the overall streamflow regime. However, some characteristics extracted from FDCs are useful for understanding key hydrological processes and their ecohydrological significance (McMillan, 2020a, 2020b). In this study, we computed the percent bias (PBIAS) of four indices extracted from FDCs to evaluate 104 the consistency between calibration results and SFCs that have been typically used in signature- based model calibration. The characteristics derived from FDCs were the very-high-segment volume (FHV), high-segment volume (FMV), midsegment slope (FMS), and low-segment volume (FLV) (Ley et al., 2016; Yilmaz et al., 2008). The aforementioned segments were subjectively defined by Yilmaz et al. (2008) using the 2%, 20%, and 70% flow exceedance probabilities. FMS is a signature of the vertical soil moisture redistribution and streamflow flashiness. Likewise, FHV provides additional information regarding streamflow flashiness and quantifies watershed reactions to large precipitation events. Meanwhile, FMV quantifies the watershed response to heavy rainfall. Finally, FLV, which is related to long-term baseflow, was computed using the modification reported by Casper et al. (2012) to reduce the effect of the difference in lowest simulated and observed flows on the PBIAS computation. Long-term water balance was also considered by computing the PBIAS in the overall runoff ratio (RR) (Yilmaz et al., 2008). The IHA indices that were selected in this study are computed from metrics obtained on an annual basis and represent the central tendency (i.e., mean) of annual metrics (Table 8). When setting environmental flows or evaluating streamflow regime alteration, widely used methods such as the Range of Variability Approach (Richter et al., 1997, 1996) also consider the associated interannual variability in those metrics. These methods define streamflow alteration targets as a function of central tendency and variability metrics. These targets are defined for each streamflow regime facet using meaningful indices and are further customized depending on the available ecological information of the study area (Poff et al., 2010; Richter et al., 1997). Given the relevance of streamflow variability in ecohydrological applications, especially in the definition of limits of streamflow alteration, we evaluated the impact of the two calibration 105 strategies defined in this study (which use only central tendency indices) in the replication of associated variability indices. 
These variability indices are expressed here in terms of coefficients of variation following Henriksen et al. (2006).

5.3 RESULTS AND DISCUSSION

5.3.1 Performance of Single-objective Model Calibration Using Transformed Metrics

The relative errors for ERHIs replication under each optimal solution using the transformed measures indicated in section 5.2.4.1 are presented in Figure 9. These results were obtained as part of the objective functions' selection routine under Strategy 1. Using hierarchical clustering with Euclidean distances and Ward's method, five groups of performance metrics were identified based on their similarity in replicating ERHIs. These groups are presented in Table 9 and can be visualized in Figure 9 for the different categories of hydrologic indices (performance metrics were arranged by similarity on the y-axis). These groups revealed that optimal solutions using R2 and relative-transformed metrics as objective functions behaved drastically differently compared to the other evaluated metrics. Generally, optimal simulations using the former metrics were able to represent those ERHIs that did not fall within the 30% relative error threshold when using KGE- and sum-of-square-errors-based metrics. For example, in Figure 9b indices ML9 and ML10 are better represented by R2 and relative-transformed metrics than by any other metrics. Similar examples can be observed for DL6 and DL11 in Figure 9d, for FH5 in Figure 9g, or for RA3 and RA6 in Figure 9k. Still, the overall performance in ERHIs replication was very poor for R2 and relative-transformed metrics (51% or less of ERHIs fell within the threshold according to Table 9). Other poor-performing metrics included inverse- and log-transformed NSE. These results suggest that those measures should be used as complementary criteria rather than objective functions in single-objective model calibration when targeting the overall streamflow regime representation. Different performance metrics or groups of metrics were more suitable for replicating specific index categories or streamflow regime facets (see Table 9). In terms of magnitude, standard or square-root-transformed metrics were preferred when targeting average and high flows (MA and MH, respectively), whereas low flows (ML) were best represented by optimal solutions when using R2 for model calibration. Regarding duration, KGE and KGE' provided the best performing solutions for low flows (DL), whereas standard and square-root-transformed metrics were better suited for high flows (DH). For frequency, most of the standard, square-root-, and inverse-transformed metrics were better suited for low flows (FL), whereas KGEsqrt yielded the highest proportion of well-replicated high flow (FH) indices. With respect to timing, standard, square-root-, or most of the log-transformed metrics were preferred when targeting average (TA) and high flows (TH). Meanwhile, IoArel, IoAlog, KGE'inv, and R2 were the best performing metrics when looking for optimal solutions replicating low flow timing (TL). Those indices representing the rate of change of flow and reversals showed an acceptable replication under some R2-based or relative-transformed metrics. However, in general, an acceptable representation of rate of change indices (RA) was quite difficult, and none of the performance metrics that were employed in this study resulted in an outstanding performance.
Finally, all of the Magnificent Seven indices (MAG) were well-replicated by optimal results using standard, square-root- or log-transformed metrics. It is worth noting that none of the 23 identified optimal solutions were able to represent five indices within the pre-defined acceptability threshold of 30% relative error. These indices were the mean duration of flows exceeded 25% of the time (DH21), mean low flow pulse 107 duration (DL16), mean low flow pulse count (FL1), mean number of high flow events using the flow exceeded 25% of the time as a threshold (FH9), and mean high flow volume using the median annual flow as a threshold (MH21). 5.3.2 Selected Metrics for Constrained Performance-Based Model Calibration The selection process of objective functions under Strategy 1 resulted in three different lists of six transformed measures jointly representing 168 out of 178 ERHIs within the 30% relative error range. These lists had in common the first five measures: standard and inverse KGE, and standard, inverse, and square root R2. The sixth measure was either R2log, KGE’inv, or IoArel. We decided to proceed with the list containing IoArel because, opposite to the other two lists, this one represented all rate of change indices within the 30% acceptability threshold. The optimal solution using standard KGE was able to provide the highest number of indices within the error threshold (i.e., 128 indices or 72% of all ERHIs, see Table 9). Note that the selected list of metrics includes most of the best performing measures for each group reported in Table 9 (i.e., metrics in bold). However, this list did not well-represent five indices related to flow variability and high flow magnitude: variability in annual minima of daily flows (DL6 and ML21), variability in February and August flows (MA25 and MA31, respectively), and mean peak flows using the median annual flow as a threshold (MH24). These indices were added to the five indices that were not represented by an optimal solution (see Section 5.3.1). 108 Figure 9 Heatmaps with relative errors for 178 ecologically relevant hydrologic indices when optimizing different transformed measures. Panels a) to l) represent an individual category of hydrological indices as presented in Table 9 109 Table 9 Proportion of indices falling within the 30% relative error threshold under different categories of hydrologic indices. Proportions are reported for each performance metric considered in the single-objective calibration process. Performance metrics were grouped following proportions similarity. The best performing metric overall is in bold within each group. 
Proportions are color-coded as follows: 100% are dark green (excellent), 70-99% are light green (good), 55-69% are dark yellow (fair), 40-54% are light yellow (poor), and 0-39% are red (very poor) Hydrologic indices category* Overall MA ML MH DL DH FL FH TA TL TH RA MAG Performance Group Number of indices metric Near-optimal 45 22 27 20 24 3 11 3 4 3 9 7 178 value 1 NSE 0.78 76% 41% 89% 60% 79% 67% 64% 100% 50% 100% 22% 100% 70% KGE 0.88 76% 41% 89% 75% 79% 67% 55% 100% 50% 100% 44% 100% 72% KGE' 0.86 73% 41% 85% 75% 75% 67% 55% 100% 25% 100% 22% 100% 69% IoA 0.93 73% 45% 78% 65% 79% 67% 55% 100% 50% 100% 11% 100% 67% R4MS4E (m3/s) 10.7 67% 45% 89% 60% 75% 33% 55% 100% 25% 100% 22% 100% 66% NSEsqrt 0.72 76% 36% 85% 55% 79% 67% 64% 100% 50% 100% 22% 100% 68% KGEsqrt 0.87 76% 41% 89% 55% 79% 33% 73% 100% 50% 100% 33% 100% 70% KGE'sqrt 0.85 76% 45% 78% 55% 79% 67% 36% 100% 50% 100% 11% 100% 66% 2 IoAsqrt 0.93 76% 55% 78% 65% 75% 67% 45% 100% 50% 100% 33% 100% 69% KGElog 0.83 71% 45% 63% 55% 79% 33% 36% 100% 50% 100% 33% 100% 63% KGE'log 0.84 73% 45% 63% 55% 75% 67% 45% 100% 50% 100% 33% 100% 64% IoAlog 0.92 69% 55% 63% 60% 63% 67% 36% 100% 100% 100% 44% 100% 64% 3 NSElog 0.55 71% 41% 44% 35% 71% 33% 36% 33% 75% 67% 22% 100% 54% NSEinv 0.45 64% 36% 22% 45% 54% 33% 36% 33% 75% 67% 11% 57% 46% KGEinv 0.67 71% 50% 41% 65% 67% 67% 55% 67% 75% 67% 33% 71% 60% KGE'inv 0.67 78% 45% 30% 70% 54% 67% 55% 67% 100% 67% 22% 100% 59% IoAinv 0.81 73% 45% 44% 65% 58% 67% 36% 67% 75% 67% 33% 71% 58% 4 R2 0.80 42% 73% 37% 70% 38% 0% 36% 67% 100% 100% 33% 86% 51% R2sqrt 0.78 33% 23% 19% 40% 29% 33% 45% 67% 50% 100% 44% 71% 35% R2log 0.77 16% 5% 15% 30% 33% 33% 36% 67% 75% 100% 33% 57% 26% R2inv 0.59 4% 9% 11% 50% 21% 33% 27% 33% 75% 33% 33% 14% 20% 5 NSErel 0.79 33% 45% 22% 50% 38% 33% 36% 67% 75% 67% 22% 57% 38% IoArel 0.93 42% 50% 19% 45% 38% 33% 36% 100% 100% 67% 33% 43% 41% * MA = magnitude – average flows, ML = magnitude – low flows, MH = magnitude – high flows, DL = duration – low flows, DH = duration – high flows, FL = frequency – low flows, FH = frequency – high flows, TA = timing – average flows, TL = timing – low flows, TH = timing – high flows, RA = rate of change, MAG = magnificent seven indices. 110 5.3.3 Overall Performance of Pareto-Optimal Solutions Each multi-objective calibration strategy was executed using 20 threads in parallel on a machine equipped with two Intel® Xeon® CPU E5-2640 Processor at 2.5 GHz with 64 GB RAM running Ubuntu 16.04.7 LTS. Total computation time for Strategies 1 and 2 were 32.43 and 30.86 hours, respectively. Strategy 1 successfully identified Pareto solutions satisfying the defined constraint for all 39 ERHIs of interest. The first feasible solution was found at generation 48, and convergence to a near-optimal Pareto front was achieved after 800 generations once the hypervolume indicator started to show a steady behavior (Figure 10a). Pareto front sizes at the end of each generation over the U-NSGA-III search process did not exceed 35 solutions, with 25 near-optimal Pareto solutions for the 1000th generation. Similarly, Strategy 2 converged to a near-optimal Pareto front after 800 generations. In this case, Pareto front sizes at the end of each generation mostly varied between 20 and 40 solutions, with 29 near-optimal Pareto solutions for the 1000th generation. Performance of near-optimal Pareto solutions for both strategies improved with respect to the initial random population sampled from uniform distributions of model calibration parameters (Figures 10b and c). 
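As an implementation note on the optimization setup described in section 5.2.6 and the convergence monitoring discussed above, the following sketch shows how U-NSGA-III, the Riesz s-Energy reference directions, and the hypervolume indicator might be wired together with pymoo. It assumes the pymoo >= 0.6 module layout (which differs slightly from the 2020 release cited earlier), and SWATCalibrationProblem is a hypothetical placeholder for the SWAT execution/ERHI interface rather than code from this study.

import numpy as np
from pymoo.algorithms.moo.unsga3 import UNSGA3
from pymoo.util.ref_dirs import get_reference_directions
from pymoo.optimize import minimize
from pymoo.indicators.hv import HV

# 100 well-spaced reference directions in a 6-objective space (Riesz s-Energy)
ref_dirs = get_reference_directions("energy", 6, 100, seed=1)
algorithm = UNSGA3(ref_dirs=ref_dirs, pop_size=100)

# problem = SWATCalibrationProblem(...)  # 15 decision variables, 6 objectives
# res = minimize(problem, algorithm, ("n_gen", 1000), seed=1, verbose=True)

# Hypervolume of the final non-dominated set, relative to a reference point
# slightly worse than the worst observed objective values:
# hv = HV(ref_point=res.F.max(axis=0) * 1.1)
# print(hv(res.F))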
Near-optimal solutions from Strategy 1 resulted in linear correlations r between 0.80 and 0.85, whereas Strategy 2 provided results with a broader range for r between 0.70 and 0.85. All Pareto solutions from Strategy 1 overestimated up to 1.3 times the standard deviation in observations while showing a ratio between simulated and observed means between 0.95 and 1.05. Meanwhile, Strategy 2 resulted in a more balanced and wider set of near-optimal Pareto solutions in terms of both simulated/observed standard deviation and mean ratios ( and , respectively). Under both strategies, the standard deviation of model residuals was around 60-70% of the standard deviation of observations. 111 Figure 10 Overall performance of the two model calibration strategies: a) 10-generations moving average of normalized hypervolume indicator and number of Pareto solutions over the U-NSGA-III search process, lighter colors represent values for each generation; b) Taylor diagram for the initial population and Pareto solutions at the last generation, contour lines represent the ratio of the standard deviation of residuals and standard deviation of observations,  is the ratio of simulated and observed standard deviations, and r is the linear correlation coefficient; c) behavior of the ratio of simulated and observed means () obtained for the initial population and Pareto solutions at the last generation A summary of the metrics and objectives (median, interquartile range (IQR), maximum, and minimum) that were used to obtain the near-optimal Pareto and preferred tradeoffs solutions are presented in Table 10. In both strategies, the performance metric showing the highest variability, as presented by IQR, was KGEinv, which emphasizes low flow conditions. In general, near-optimal Pareto solutions from Strategy 2 showed a higher variability of objective function values compared to Strategy 1. In terms of KGEinv, Strategy 2 provided an overall better performance, but also had solutions with very low values (minimum was -1.43). It is worth noting that none of the maximum values for the performance metrics chosen under Strategy 1 were as high as those found when executing single-objective model calibration. For example, the maximum KGE of 0.83 obtained from the near-optimal Pareto set from Strategy 1 reported in Table 10 was below the near-optimum value of 0.88 reported for KGE in Table 9. These results 112 indicate that simulations with ERHIs of interest within 30% relative error are not necessarily close to an optimum in terms of a particular performance metric. Table 10 Overall performance of near-optimal Pareto and preferred tradeoffs solutions under each model calibration strategy. Values in parenthesis correspond to the validation period Near-optimal Pareto solutions Near-optimal Pareto solutions Preferred tradeoffs Strategy 1 Strategy 2 Metric* Compromise Pseudo- Median IQR Max Min Median IQR Max Min prog. weights Compromise prog. 
(Strategy 1) (Strategy 1) (Strategy 2) Strategy 1 performance metrics 0.75 0.76 0.77 KGE 0.77 0.06 0.83 0.67 0.77 0.04 0.83 0.68 (0.81) (0.81) (0.74) 0.40 0.41 0.47 KGEinv 0.46 0.25 0.60 0.22 0.56 0.61 0.65 -1.43 (0.58) (0.57) (0.53) 0.73 0.71 0.66 R2 0.72 0.01 0.74 0.70 0.64 0.07 0.71 0.53 (0.71) (0.69) (0.58) 0.74 0.73 0.69 R2sqrt 0.72 0.02 0.74 0.70 0.66 0.05 0.73 0.62 (0.71) (0.70) (0.64) 0.40 0.41 0.41 R2inv 0.40 0.02 0.43 0.37 0.44 0.04 0.52 0.37 (0.38) (0.38) (0.35) 0.91 0.91 0.90 IoArel 0.92 0.01 0.92 0.90 0.88 0.04 0.92 0.82 (0.92) (0.92) (0.91) Strategy 2 objectives 11.4% 11.6% 11.3% f1 11.6% 1.0% 14.5% 10.5% 10.7% 2.1% 22.8% 8.8% (9.1%) (9.2%) (13.6%) 13.4% 11.7% 8.6% f2 11.8% 3.1% 18.1% 7.8% 15.5% 9.7% 40.4% 8.2% (5.5%) (6.3%) (7.3%) 26.0% 26.7% 15.8% f3 26.4% 1.3% 27.7% 20.9% 12.6% 11.6% 26.0% 3.3% (21.9%) (19.6%) (13.6%) 15.0% 10.5% 18.9% f4 11.9% 5.1% 21.2% 2.2% 13.5% 37.9% 107.9% 4.1% (29.8%) (23.5%) (40.6%) 18.6% 15.1% 10.5% f5 16.9% 1.5% 19.9% 14.6% 14.2% 11.6% 27.9% 6.8% (17.1%) (25.1%) (23.0%) 7.1% 7.2% 6.3% f6 7.9% 3.6% 12.1% 4.2% 6.9% 7.6% 15.0% 3.8% (9.4%) (8.1%) (10.0%) *f1 = objective function for IHA Group 1 (magnitude of monthly water conditions); f2 = objective function for IHA Group 2 (magnitude and duration of annual extreme water conditions); f3 = objective function for IHA Group 3 (timing of annual extreme water conditions); f4 = objective function for IHA Group 4 (frequency and duration of high and low pulses); f5 = objective function for IHA Group 5 (rate and frequency of water condition changes); f6 = objective function for Magnificent Seven indices. See Equation 3. 5.3.4 Replication of Ecologically Relevant Hydrologic Indices of Interest Figure 11 shows the distribution of relative errors for each ERHI of interest and model calibration strategy during both calibration and validation periods. During the calibration period, which was defined between 2003 and 2014, all the indices computed from the near-optimal 113 Pareto set from Strategy 1 fell within the 30% relative error range. Meanwhile, Strategy 2 provided median values for almost all ERHI of interest (MA19, i.e., August mean flow, was the exception) within the same range, with some near-optimal Pareto solutions generating index values outside this range. In general, Strategy 1 resulted in a lower variability of EHRIs values compared to Strategy 2. Additionally, median relative errors had a similar behavior among both calibration strategies. Some exceptions included DH1 to DH5 (i.e., duration of annual maxima) in Figure 11b, which exhibited opposite trends under both strategies (i.e., overestimation for Strategy 1 and underestimation for Strategy 2). Indices that showed the highest variability during the calibration period in both strategies were mostly related to low flow conditions. These indices include DL1 to DL5 (i.e., duration of annual minima) and ML17 (i.e., baseflow index) in Figure 5b, row 1; DL16 and FL1 (i.e., low flow pulse duration and frequency), and RA3 (i.e., fall rate) in Figure 5c, row 1; and MAG3 and MAG4 (i.e., skewness and kurtosis) in Figure 5d, row 1. Strategy 1 presented the most biased results for indices in the IHA3 category representing the timing of annual extremes. This is consistent with the f3 median value of 26.4% for this strategy, reported in Table 10, which is the highest median value among both strategies and objectives used in Strategy 2. 
It is worth noting that median values for objectives f2 and f4 (related to duration and frequency, respectively) were lower and better for Strategy 1 than Strategy 2. 114 Figure 11 Boxplots representing the distribution of relative errors for each Ecologically Relevant Hydrologic Index of interest for the near-optimal Pareto solutions obtained under each model calibration strategy, horizontal dashed lines represent the 30% interval: a) magnitude of monthly water conditions; b) magnitude and duration of annual extreme water conditions; c) duration and frequency of high and low pulses, rate and frequency of water condition changes, and timing of annual extreme water conditions; d) Magnificent seven indices. Index abbreviations are listed in Table 8 Near-optimal Pareto sets of model parameters obtained during model calibration were validated for a 12-year period between 1983 and 1994. Validation results for the replication of ERHIs of interest are also presented in Figure 11. From the list of 39 ERHI of interest, the 115 median relative errors for three indices fell outside the acceptability range of 30%. These indices were RA3 for both strategies, FL1 for Strategy 1, and DH15 for Strategy 2 (Figure 5c, row 2). RA3 relative error values were mostly distributed within -50% and -40%, and median relative error values for FL1 and DH15 were very close to -30% and 30% limits, respectively (excepting DH15 for Strategy 2). These results are indicative of the robustness of both calibration strategies. Variability of ERHIs of interest behaved similarly during the calibration and validation periods. However, DH15 and TH1, related to duration and timing of high flow events, drastically increased their variability during the validation period. 5.3.5 Performance of Preferred Tradeoff Solutions We obtained three different solutions (two from Strategy 1 and one from Strategy 2) targeting a balanced representation of the different streamflow regime facets. Both Euclidean and Chebyshev distances used for the compromise programming method selected the same preferred solution from the near-optimal Pareto sets under each calibration strategy. For Strategy 2, the pseudo-weight method's preferred solution was the same as the compromise programming method. The overall calibration performance for these preferred solutions is reported on the right side of Table 10. There are no major differences between the three preferred solutions in terms of performance during the calibration period. Strategy 2 provided slightly better results for KGE and KGEinv, whereas Strategy 1 solutions presented better R2 and R2sqrt values. It is worth noting that NSE values were 0.61 and 0.60 for compromise programming and pseudo-weight solutions in Strategy 1, respectively. The preferred solution under Strategy 2 attained a lower NSE of 0.56. Regarding the replication of ERHIs of interest, the Strategy 2 preferred solution provided, in average, lower absolute relative errors for five of six categories of hydrologic indices during the 116 calibration period. Meanwhile, Strategy 1 preferred solutions attained better replication results for the IHA4 category, which is related to frequency and duration of high and low pulses. During the validation period, preferred solutions from Strategy 1 improved in terms of KGE, KGEinv and IoArel (~7%, ~42%, and ~1%, respectively) while slightly worsening in terms of R2, R2sqrt and R2inv (~3%, ~4%, and ~6%, respectively). 
NSE improved to 0.65 and 0.63 (~6%) for the compromise programming and pseudo-weight method solutions, respectively. Average absolute relative errors improved for the three first ERHI categories (i.e., magnitude of monthly flows, duration and timing of extremes) and deteriorated for the remaining categories, especially for the IHA4 category. Regarding Strategy 2, the preferred solution generally worsened in terms of both performance metrics and ERHI replication, especially for the IHA4 category, which exceeded the 30% threshold on average. Likewise, the validation NSE was reduced to 0.48 (14% reduction). 5.3.6 Representation of Water Balance and Flow Duration Curve Characteristics Percent bias for long-term water balance and FDC characteristics are presented in Figure 12. In the same figure, FDCs for the preferred tradeoff solutions and near-optimal Pareto sets under each strategy are compared against the observed FDC during the calibration period. Generally, absolute biases of FDC characteristics for the validation period were lower than the calibration period. Most of the near-optimal Pareto solutions over-estimated FMV (high-segment volume) and FMS (midsegment slope), whereas FLV (low-segment volume) was mostly under- estimated. Opposite to Strategy 1, FHV (very-high-segment volume) was mostly under-estimated in Strategy 2. The maximum absolute bias for Strategy 1 was below 30%. For Strategy 2, the maximum bias was just below 100%. Meanwhile, the largest variability resulted from FMS under both Strategies, whereas the highest variability for FLV occurred under Strategy 2. The overall RR (runoff ratio) showed a lower variability compared to the FDC characteristics. 117 Moreover, this index was mostly under-estimated during both calibration and validation periods. Concerning the preferred tradeoff solutions, none of them exceed the 30% absolute threshold for any water balance or FDC characteristic. For these solutions, the most biased FDC characteristic was FMS and the least biased was FHV. It is worth noting that, during validation, the minimum observed flow was at least 0.5 m3/s lower than the minimum simulated flow for any preferred tradeoff solution. Figure 12 Flow duration curves (FDCs) for the preferred tradeoff solutions identified for each calibration strategy compared against the observed FDC from 2003 to 2014: a) FDCs from near- optimal Pareto solutions for Strategy 1; b) FDCs from near-optimal Pareto solutions for Strategy 2; c) and d) represent the bias for FDC and water balance measures for Strategy 1 and Strategy 2, respectively, under calibration and validation periods 118 5.3.7 Relationship between Water Balance, Flow Duration Curve Characteristics, and Ecologically Relevant Hydrologic Indices of Interest The results obtained in the previous section indicated that constraining or targeting ERHIs of interest during model calibration did not drastically worsen long-term water balance and FDC representation. Instead, calibration and validation results for ERHIs were relatively consistent with the behavior of the five SFCs addressed above. For instance, FLV under- estimation is related to the observed under-estimation of most of the monthly mean flows (indices MA12-23 showed in Figure 11a). Likewise, baseflow index behavior (see ML17 in Figure 11b) under both calibration and validation periods was consistent with RR. 
Lower values in the latter (i.e., lower simulated mean flow) resulted in an increase of ML17, which is computed as the ratio between the minimum 7-day flow and the overall mean flow (assuming the minimum 7-day flow does not change drastically). Another example is the under-estimation of FHV, which is related to the under-estimation of DH indices (Figure 11b), which can be caused by missing high flow events or low volume events. The same logic applies to FHV over- estimation. On the other hand, FMS interpretation posed a different and remarkable case. In this study, simulations under both strategies were prone to yield steeper midsegment slopes. An initial explanation for this behavior was that the chosen model structure and calibrated parameters favored flashiness (i.e., abrupt ascendant and descendant streamflow changes after the occurrence of rainfall events). However, this explanation contradicted the observed underestimation of the fall rate (see RA3 in Figure 11c). When explicitly considering the timing facet (neglected when constructing FDCs), we obtained a more consistent interpretation. For this purpose, it is worth noting that end-of-summer monthly flows (i.e., MA18 and MA19, linked to July and August months, respectively) were drastically over-predicted, whereas September and 119 October monthly flows were under-predicted. Also, the timing of flow minima (TL1), which usually occurs during the summer season, was generally over-estimated. The latter followed the lower simulated fall rate for the spring-to-summer transition. The lagged timing prediction in annual minima resulted in the over-estimation of summer flows. Likewise, there was an additional delay in the transition towards the fall season. This delay was one of the reasons for the observed under-prediction in monthly flows for the fall season. Adding up this behavior across all the simulated years mainly explained the FMS results. Given the consistency between ERHIs constraining/targeting and FDC/water balance characteristics, model structure inadequacies in representing intermediate and baseflows were likely the main factors contributing to the previous inaccuracies. 5.3.8 Replication of Variability in Ecologically Relevant Hydrologic Indices Figure 13 shows the distribution of relative errors for IHA variability indices under each model calibration strategy. Opposite to Strategy 2, many of the interannual variabilities of monthly flows were not captured by Strategy 1 for the calibration period using the 30% relative error threshold. However, most of these indices were well represented during the validation period under both strategies. According to Figure 13a, the variabilities of winter flows (i.e., MA24-25, MA34-35) were over-predicted, with median relative errors as high as ~110%. Variabilities of summer flows (i.e., MA29-31) were generally under-predicted, with absolute median relative errors as high as ~50%. Indices representing variabilities in magnitude and duration of annual extreme water conditions were mostly well represented under both strategies. Compared to Strategy 1, Strategy 2 resulted in more over-predicted indices under this category outside the acceptability threshold, especially those representing the duration of high flows (Figure 13b). It is worth noting that median relative errors for the variability in the duration of 120 annual 1-day minimum flows (i.e., DL6) were slightly below -30% under both strategies and for both calibration and validation periods. 
Regarding other streamflow facets (i.e., frequency, rate of change, and timing), some calibration and validation results showed a contrasting behavior (Figure 13c). For the calibration period, most of variability indices median relative errors fell within the acceptability threshold regardless of the strategy. The most problematic index was the coefficient of variation of the Julian day of annual minimum (i.e., TL2), which was over- predicted with median relative errors around ~100%. Meanwhile, most indices of variability in frequency/duration of flow pulses and variability in rate of change of flows were largely over- predicted during the validation period, with median relative errors as high as ~120%. Therefore, our results suggest that water resources managers must be particularly cautious when defining streamflow regime alteration limits based on simulated low flow timing, rate of change, and extreme events duration and frequency given the observed bias in both associated central tendency and variability indices during model validation. 121 Figure 13 Boxplots representing the distribution of relative errors for variability hydrologic indices under each model calibration strategy, horizontal dashed lines represent the 30% interval: a) variability in the magnitude of monthly water conditions; b) variability in the magnitude and duration of annual extreme water conditions; c) variability in the duration and frequency of high and low pulses, rate and frequency of water condition changes, and the timing of annual extreme water conditions. Index abbreviations are presented in Table 3 5.4 CONCLUSIONS Implementing the performance-based calibration strategy confirmed that various performance metrics and transformations are better suited for particular streamflow regime facets. Also, it was revealed that R2 and relative-transformed metrics behaved drastically 122 different comparing to KGE- and sum-of-square-errors-based metrics when replicating hydrologic indices. Moreover, results showed that a balanced representation of the streamflow regime is not directly related to the improvement of a particular performance metric. Instead, it responded to tradeoffs among different performance-based objective functions stressing different regime facets (i.e., magnitude, duration, frequency, rate of change, and timing) and flow conditions (i.e., high, moderate, and low flows). The successful implementation of the signature-based calibration strategy demonstrated that it is possible to obtain consistent hydrological responses by simultaneously targeting multiple streamflow regime facets. More importantly, this was achieved without using any performance-based objective function. However, compared to the latter, the signature-based strategy resulted in higher variability in the near-optimal Pareto solutions, many of them with simulated indices falling outside the acceptability threshold (30% relative error). Similarly, this strategy resulted in a highly variable representation of water balance and FDC characteristics compared to the performance-based strategy. Therefore, performance-based calibration is preferable. It is worth noting that the variability in the near-optimal Pareto solutions obtained under the two calibration strategies was driven mostly by the representation of low flows, as revealed by the highly variable inverse-transformed KGE values and low-flow related FDC characteristics among these solutions. The model calibration framework developed here can also be used as a diagnosis tool. 
For instance, results revealed limitations of the SWAT model structure when representing the vertical redistribution of soil moisture, fall rate, and timing of annual extremes. Likewise, the representation of low flow magnitude and timing, rate of change of flows (especially rise and fall rates), and duration and frequency of extreme flows was limited in terms of interannual 123 variability. These limitations impact the definition of limits to hydrologic alteration, which are relevant when defining environmental flows and managing social-hydrological systems. Thus, water managers and modelers must account for limitations in hydrologic indices replication when defining or selecting streamflow regime targets as part of broader ecohydrological frameworks and applications in ungauged or poorly gauged watersheds. In this study, we focused on analyzing the objective space and output variables of interest. Analyzing the near-optimal decision variables (i.e., model parameters) and intermediate variables representing other water cycle components (e.g., evapotranspiration, soil moisture, groundwater) was out of the scope of this study. Our framework detected modeling limitations when representing various streamflow regime facets, which is useful to address structural inadequacies and improving the overall modeling process. Future research should involve redesigning hydrological models and tailoring modeling practices (e.g., input data processing, model parameters selection, choosing calibration/validation time periods/lengths) to better represent ecologically-relevant characteristics of riverine ecosystems. Likewise, we recommend future studies to analyze model parameter behavior and other water cycle components when using any of the proposed calibration methods. In this regard, the proposed performance-based method is flexible enough to implement multi-variable and multi-site model calibration. 124 6 PROBABILISTIC PREDICTIONS OF ECOLOGICALLY RELEVANT HYDROLOGIC INDICES USING A HYDROLOGICAL MODEL 6.1 INTRODUCTION Hydrologic signatures (a.k.a., hydrologic indices) are quantitative features that characterize the statistical properties of hydrologic time series (McMillan, 2020b). These signatures, which are most likely obtained from streamflow data, have received increasing attention due to their significance in understanding hydrologic and ecological processes (Carlisle et al., 2017; McMillan, 2020a). In hydrological modeling, streamflow signatures are typically used for model evaluation (Euser et al., 2013; Gupta et al., 2008; Jehn et al., 2019), model calibration (Shafii and Tolson, 2015), and for informing watershed management in ungauged and poorly gauged watersheds (Guo et al., 2021). Ecologically relevant hydrologic indices (ERHIs) are a subset of hydrologic signatures that can be obtained from hydrologic simulations to predict the biological condition of freshwater ecosystems in sites lacking streamflow or biological data (Hernandez-Suarez and Nejadhashemi, 2018; Mazor et al., 2018). Likewise, simulated ERHIs can be used to evaluate hydrologic and ecological alterations due to anthropogenic interventions or changes in climate and land use (Bejarano et al., 2019; McKay et al., 2019; Sengupta et al., 2018). Using hydrologic models to simulate ERHIs introduces uncertainty. 
However, uncertainty sources are not only limited to modeling uncertainties (i.e., inputs, structure, parameters, initial/boundary conditions, and measurement errors), but also include the signature computation method (Westerberg et al., 2016; Westerberg and McMillan, 2015), the time series length, and non-stationarity effects (Kennard et al., 2010a). For this reason, simulated ERHIs can result in large prediction errors, especially when hydrologic models do not explicitly target those 125 ERHIs during model calibration (Hallouin et al., 2020; Vigiak et al., 2018). Whether a calibration method accounts for uncertainties or not, it can be broadly classified into probabilistic and deterministic methods (Tasdighi et al., 2018). On one side, probabilistic methods for model calibration consider different uncertainty sources and can be classified into informal and formal approaches (Schoups and Vrugt, 2010). The most important difference in these two approaches resides in the likelihood function formulation (Beven and Binley, 2014). Informal methods use subjective measures (i.e., Limits of Acceptability) to identify those simulations that are a good fit to the observations (i.e., behavioral solutions), and then generate predictive distributions of the model parameters using sampling algorithms (Vrugt and Beven, 2018). Generally, the characteristics of model residual errors are treated implicitly and mapped onto the resulting parameter distributions (Beven and Smith, 2015). In this context, hydrologic signatures are typically used to further constrain the identification of behavioral solutions (Blazkova and Beven, 2009). Examples using informal methods and EHRIs can be found in Kiesel et al. (2020, 2017). Meanwhile, formal methods explicitly consider a model of residual errors to formulate the likelihood function, providing uncertainty estimates for the parameters of both hydrological and error models (McInerney et al., 2017; Smith et al., 2015). Under a Bayesian framework, different sources beyond parameter uncertainty can be explicitly addressed (Moges et al., 2021). However, the most common practice, which is also conceptually problematic, uses a lumped error model to account for those sources (Ammann et al., 2019). One of the major difficulties with hydrologic signatures has been the formulation of closed-form and tractable likelihood functions (Sadegh et al., 2015). Thus, these methods have been mainly implemented when the modeling objective is to predict the streamflow time series. However, in recent years, the application of signature-based formal 126 probabilistic calibration was introduced by implementing Approximate Bayesian Computation (ABC) methods. ABC methods do not require likelihood function evaluations and, instead, they sample Bayesian posterior distributions at the expense of a higher number of model evaluations (Fenicia et al., 2018; Kavetski et al., 2018; Sadegh and Vrugt, 2014). On the other side, deterministic methods are mainly comprised of optimization methods using single or multiple objective functions to generate parameter sets that provide the best fit between observations and simulations. Single-objective methods are rather unreliable for decision-making since they do not address equifinality and identifiability issues (Beven, 2006). In addition, they only provide point estimates of model parameters and predictions and ignore the distributional properties of the model residual errors (Farmer and Vogel, 2016). 
In contrast, multi-objective methods provide ranges of solutions (i.e., Pareto-optimal solutions). However, the resulting Pareto-optimal distribution of model parameters and predictions do not necessarily correspond to probabilistic solutions suitable for uncertainty analysis (Reichert and Schuwirth, 2012; Tang et al., 2018). In ecohydrological applications, replication of ERHIs using deterministic methods have been the rule rather than the exception when performing model calibration (Hallouin et al., 2020; Hernandez-Suarez et al., 2018; Parker et al., 2019; Pool et al., 2017; Sengupta et al., 2018; Shrestha et al., 2016, 2014; Vigiak et al., 2018; Vis et al., 2015; Zhang et al., 2016). However, it is worth noting that the reported prediction errors for some of these studies are originated based on the distribution of the relative differences between observed and point predictions of ERHIs across multiple locations (Vigiak et al., 2018), multiple calibration trials (Pool et al., 2018; Vis et al., 2015), or Pareto-optimal solutions (Hernandez- Suarez et al., 2018). These distributions are valuable since they can be used in other modeling 127 processes as prior knowledge, especially when using probabilistic methods (Almeida et al., 2013). Improvements in ERHIs prediction have been mainly driven by the choice of objective functions on untransformed or transformed streamflows. These objective functions target either specific flow conditions (i.e., high, low flows), regime facets (i.e., magnitude, duration, frequency, rate of change, timing), or hydrologic indices (Hallouin et al., 2020). As a result, several calibration strategies have been devised, some of them resulting in ensembles of model solutions (Hernandez-Suarez et al., 2018; Kiesel et al., 2020; Sengupta et al., 2018). Among these strategies, those using multi-objective calibration gained popularity since they consider tradeoffs among different targets (Efstratiadis and Koutsoyiannis, 2010; Kollat et al., 2012). However, these methods do not provide formal uncertainty estimates for model parameters and outputs. These estimates are relevant when predicting streamflows and ERHIs within the spatial domain of distributed or semi-distributed models. In addition, it is important to estimate uncertainty when developing and evaluating regionalization schemes based on hydrological modeling results (Addor et al., 2018; Almeida et al., 2016; Guo et al., 2021; Mazor et al., 2018; Moges et al., 2021; Prieto et al., 2019). Here, we evaluated the effect of prior knowledge obtained from multi-objective calibration on the resulting posterior parameter distributions and ERHIs predictions when using a time-domain Bayesian calibration method. This allows linking the advances in deterministic ERHIs prediction and uncertainty quantification. To the best of our knowledge, this is the first time that an evaluation of this kind is performed for predicting ERHIs. The objectives of this study were to 1) estimate the total uncertainty in predicting a set of ERHIs when targeting the overall streamflow time series, 2) compare the posterior model parameter distributions when 128 using non-informative versus Pareto-optimal priors, and 3) identify changes in parameter estimation performance when using non-informative versus Pareto-optimal priors. 
We performed this evaluation in an agriculture-dominated watershed in Michigan, US, using the Unified Non-dominated Sorting Algorithm III (U-NSGA-III) (Seada and Deb, 2016) for multi-objective calibration, the Soil and Water Assessment Tool (SWAT) as the hydrological model, and the multiple-try Differential Evolution Adaptive Metropolis (ZS) (MT-DREAM(ZS)) algorithm (Laloy and Vrugt, 2012) for sampling the posterior distributions. For the likelihood function, we used a lumped residual error model accounting for heteroscedasticity and autocorrelation (McInerney et al., 2017).

6.2 MATERIALS AND METHODS

We outlined two experiments using daily data to compare the performance of time-domain Bayesian calibration under different prior knowledge conditions. We employed the same likelihood function regardless of the experiment. For each experiment, we defined two time periods with the same number of consecutive streamflow observations. In Experiment 1, we employed non-informative priors for inferring model and error parameters for each time period. Meanwhile, Experiment 2 was devised to evaluate the effect of prior knowledge obtained from multi-objective calibration on Bayesian parameter estimation. For this purpose, we obtained Pareto-optimal parameter distributions from one time period (Period 1) and used them as prior knowledge for calibrating the model using data for the other time period (Period 2). We compared the Pareto-optimal parameter distributions against the posterior distributions using Bayesian inference for Period 1 with non-informative priors. Likewise, we compared the predictive distributions for model parameters and ERHIs using informative and non-informative priors under Period 2. Finally, we assessed the reliability, precision, and bias of the streamflow and ERHIs predictions for each experiment.

6.2.1 Bayesian Parameter Estimation

In this study, we assumed that a streamflow observation \tilde{Y}_t at time step t is linked to a deterministic hydrological model H with model parameters \theta_H, and given forcing data \tilde{X}, as follows,

\tilde{Y}_t \leftarrow H_t(\theta_H, \tilde{X}) + \varepsilon_t(\theta_\varepsilon)    (1)

where, \varepsilon_t represents the raw residuals as an aggregated measure of predictive errors. We also assumed that the residuals follow a probability distribution with parameters \theta_\varepsilon. Using the Bayes equation, the posterior probability distribution of hydrological and residual error model parameters can be obtained by conditioning the model on the observations and the given forcing data (McInerney et al., 2017; Vrugt, 2016),

p(\theta_H, \theta_\varepsilon \mid \tilde{X}, \tilde{Y}) \propto p(\tilde{Y} \mid \theta_H, \theta_\varepsilon, \tilde{X}) \, p(\theta_H, \theta_\varepsilon)    (2)

where, p(\theta_H, \theta_\varepsilon) is the joint prior distribution of hydrological and residual error model parameters, and L(\theta_H, \theta_\varepsilon \mid \tilde{X}, \tilde{Y}) \equiv p(\tilde{Y} \mid \theta_H, \theta_\varepsilon, \tilde{X}) is the likelihood function. In the following sections, we present the model of residual errors and the corresponding likelihood function (section 6.2.1.1), the prior distributions we used (section 6.2.1.2), and the sampling procedure to approximate the posterior distributions (section 6.2.1.3).

6.2.1.1 Likelihood function

The likelihood function summarizes the distance between the model simulations and observations and is built on top of the model of residual errors (Vrugt, 2016). The error model used here corresponds to a typical formulation in hydrological sciences that describes the total effect of all sources of error (Ammann et al., 2019). We followed a transformational strategy to account for heteroscedasticity and skewness in predictive errors (McInerney et al., 2017).
For this purpose, we used the Box-Cox or power transformation with parameter \lambda (Box and Cox, 1964),

z[Y; \lambda] = \begin{cases} (Y^{\lambda} - 1)/\lambda & \text{if } \lambda \neq 0 \\ \log Y & \text{otherwise} \end{cases}    (3)

Thus, the transformed residual \eta at time step t is obtained as follows,

\eta_t = z[\tilde{Y}_t; \lambda] - z[H_t(\theta_H, \tilde{X}); \lambda]    (4)

Since errors in daily hydrological model outputs are usually highly autocorrelated, we used a first-order autoregressive (AR1) model to consider the temporal persistence of the (transformed) residual errors,

\eta_t = \phi \eta_{t-1} + W_t    (5)

where, \phi is the autoregressive parameter and W_t is the disturbance or innovation. We assumed that innovations followed a truncated Gaussian distribution, to avoid negative streamflow predictions, with parameters \mu = 0, \sigma = \sigma_W, and lower bound L_{W,t} (Fenicia et al., 2018),

W_t \sim \mathcal{TN}\left(0, \sigma_W, L_{W,t}(\theta_H, \tilde{X}, \eta_{t-1})\right)    (6)

Note that L_{W,t} is defined such that z[H_t(\theta_H, \tilde{X}); \lambda] + \eta_t(\theta_\varepsilon) \ge z[0; \lambda], which makes it time-dependent. Assuming innovations are independent, the likelihood function is formulated as follows (Fenicia et al., 2018):

L(\theta_H, \theta_\varepsilon \mid \tilde{X}, \tilde{Y}) = \prod_{t=1}^{N_t} \frac{f_N(W_t \mid 0, \sigma_W)}{1 - F_N\left(z[0; \lambda] - z[H_t(\theta_H, \tilde{X}); \lambda] - \phi\eta_{t-1} \mid 0, \sigma_W\right)} \times \frac{\partial z[\tilde{Y}_t; \lambda]}{\partial Y}    (7)

where, W_t = \eta_t - \phi\eta_{t-1}, f_N(v \mid \mu, \sigma) is the Gaussian probability density function with mean \mu and standard deviation \sigma evaluated at v, F_N(v \mid \mu, \sigma) is the corresponding cumulative distribution function (CDF), and N_t is the total number of observations. It is worth noting that for large N_t, which is the case for hydrologic time series, the likelihood is a very small number that can result in arithmetic underflow. Thus, it is common to work with the log-likelihood, \mathcal{L}(\theta_H, \theta_\varepsilon \mid \tilde{X}, \tilde{Y}), instead,

\mathcal{L}(\theta_H, \theta_\varepsilon \mid \tilde{X}, \tilde{Y}) \cong -\frac{N_t}{2}\log 2\pi - N_t \log \sigma_W - \frac{1}{2\sigma_W^2}\sum_{t=2}^{N_t}(\eta_t - \phi\eta_{t-1})^2 + \sum_{t=1}^{N_t} \log \frac{\partial z[\tilde{Y}_t; \lambda]}{\partial Y} - \sum_{t=2}^{N_t} \log\left\{1 - F_N\left(z[0; \lambda] - z[H_t(\theta_H, \tilde{X}); \lambda] - \phi\eta_{t-1} \mid 0, \sigma_W\right)\right\}    (8)

The complete set of error model parameters is \theta_\varepsilon = \{\lambda, \phi, \sigma_W\}. In this study, we fixed the values of \lambda and \phi to 0.2 and 0.8, respectively, following the recommendations of Evin et al. (2014) and McInerney et al. (2017) for reducing parameter interactions during model calibration.

6.2.1.2 Prior distributions

6.2.1.2.1 Experiment 1 – Non-informative priors

In this experiment, we used uniform distributions that defined the feasible parameter space by providing the minimum and maximum values for the hydrological and error model parameters. Upper and lower limits for the hydrological model parameters were determined from the literature and previous modeling exercises in the study area (see section 6.2.4.3), whereas \sigma_W limits were defined from an initial screening of modeling results. The initial states for the chains used by the Markov chain Monte Carlo (MCMC) sampling algorithm for Bayesian inference (see section 6.2.1.3) were drawn using Latin-Hypercube Sampling (McKay et al., 1979) subject to the aforementioned parameter limits.

6.2.1.2.2 Experiment 2 – Multi-objective model calibration

A constrained, performance-based, multi-objective calibration targeting a set of ERHIs was executed to obtain near-optimal Pareto distributions of model parameters. We implemented the recently developed evolutionary multi-objective optimization algorithm U-NSGA-III (Seada and Deb, 2016). The calibration consisted of minimizing six objective functions f(\theta_H) derived from performance metrics P_m(\theta_H). Each f(\theta_H) is computed on transformed or untransformed streamflow values to accentuate different flow conditions or regime facets,

f_j(\theta_H) = 1 - P_{m_j}(\theta_H)    (9)

where, j = 1, 2, …, 6.
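To make the error model of section 6.2.1.1 concrete before moving on, the following Python sketch evaluates an approximation of the log-likelihood in Eq. 8 for given observed and simulated daily streamflow series. It is a simplified illustration under stated assumptions (NumPy/SciPy available, strictly positive flows, and the marginal term for t = 1 omitted); the function names are illustrative and do not correspond to the implementation used in this study.

import numpy as np
from scipy.stats import norm

def boxcox(y, lam=0.2):
    # Box-Cox transformation z[y; lambda] of Eq. 3
    return np.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def log_likelihood(q_obs, q_sim, sigma_w, lam=0.2, phi=0.8):
    # Eq. 4: transformed residuals; their lag-1 differences give the AR(1) innovations (Eq. 5)
    eta = boxcox(q_obs, lam) - boxcox(q_sim, lam)
    w = eta[1:] - phi * eta[:-1]
    # Jacobian of the transformation: d z[y; lambda]/dy = y**(lambda - 1)
    log_jacobian = (lam - 1.0) * np.log(q_obs)
    # Truncation correction of Eqs. 6-8, forbidding negative back-transformed flows
    z_zero = boxcox(0.0, lam)
    truncation = norm.cdf(z_zero - boxcox(q_sim[1:], lam) - phi * eta[:-1],
                          loc=0.0, scale=sigma_w)
    return (norm.logpdf(w, loc=0.0, scale=sigma_w).sum()
            + log_jacobian.sum()
            - np.log(1.0 - truncation).sum())

# Example (synthetic, strictly positive arrays of equal length):
# ll = log_likelihood(np.asarray(obs_flow), np.asarray(sim_flow), sigma_w=0.3)

A sampler such as MT-DREAM(ZS) would then evaluate a function of this kind at every proposed parameter set, with q_sim produced by the hydrological model.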
The performance metrics used in this study for calibration were the Kling-Gupta Efficiency (Gupta et al., 2009) computed on untransformed and inverse-transformed values (KGE and KGE_inv, respectively), the relative Index of Agreement d_rel (Krause et al., 2005), and the coefficient of determination computed on untransformed, inverse-, and square-root-transformed values (R^2, R^2_inv, and R^2_sqrt, respectively). An optimization constraint, which must not be greater than 0, was defined to limit all targeted ERHIs to not exceed a predefined acceptability threshold \tau in terms of relative error e_{rel}(\theta_H),

e_{rel_i}(\theta_H) = \frac{I_i\left(H(\theta_H, \tilde{X})\right) - I_i(\tilde{Y})}{I_i(\tilde{Y})}    (10)

where, I_i is the i-th ERHI evaluated for the simulation H(\theta_H, \tilde{X}) and the observations \tilde{Y}. The constraint CV(\theta_H) was formulated to penalize high relative errors and to separate feasible from infeasible solutions. An infeasible solution results in ERHI values outside the predefined acceptability threshold,

CV(\theta_H) = \sum_{i=1}^{m} k_i(\theta_H)\left[1 + w_i\left(\frac{|e_{rel_i}(\theta_H)|}{\tau} - 1\right)\right]

k_i(\theta_H) = \begin{cases} 0 & \text{if } \dfrac{|e_{rel_i}(\theta_H)|}{\tau} - 1 \le 0 \\ 1 & \text{otherwise} \end{cases}

w_i = \frac{1}{g \times h_i}    (11)

where, m is the total number of ERHIs, w_i is the weighting factor for the i-th ERHI, g is the number of ERHI categories, and h_i is the total number of ERHIs in the category that contains the i-th ERHI. The ERHIs used in this study and their categories are presented in section 6.2.4.4. The value of \tau used in this study was 0.3 (Hernandez-Suarez et al., 2018), which means that all the targeted ERHIs must attain relative errors within 30%. Once the near-optimal Pareto solutions were obtained, we computed \sigma_W for each individual solution using equations 3–5. A multivariate kernel distribution was generated to provide a non-parametric representation of the joint distribution of the parameters \theta_H and \sigma_W using the resulting near-optimal Pareto parameter sets. For this purpose, we employed the mvksdensity function in Matlab R2019b using a Gaussian kernel. An initial vector of bandwidths was defined using Silverman's rule of thumb (Silverman, 1986),

b_k = \sigma_k \left[\frac{4}{(d + 2)\,n}\right]^{1/(d+4)}    (12)

where, d is the number of dimensions (i.e., the number of hydrological and error model parameters), n is the number of observations (i.e., Pareto-optimal solutions), k = 1, 2, …, d, and \sigma_k is the standard deviation of the k-th variate (i.e., parameter). This vector of bandwidths was further refined by maximizing the agreement between the marginal empirical CDF of each parameter and the corresponding marginal CDF obtained from the multivariate kernel density distribution using a 5-fold cross-validation and a genetic algorithm. The resulting optimized multivariate kernel distribution was then used as the prior distribution p(\theta_H, \theta_\varepsilon) under this experiment. It is worth noting that the initial states for the chains used by the MCMC sampling algorithm were directly drawn from p(\theta_H, \theta_\varepsilon).

6.2.1.3 Sampling algorithm

In this study, the MT-DREAM(ZS) algorithm (Laloy and Vrugt, 2012) was used to efficiently explore the posterior distribution of hydrological and error model parameters. MT-DREAM(ZS) is an adaptive MCMC algorithm using multiple-try sampling, snooker updating, and an archive of past states to improve the convergence of computationally intensive and high-dimensional models (Vrugt, 2016). This method belongs to the Differential Evolution Adaptive Metropolis (DREAM) multi-chain family of algorithms for Bayesian inference.
The original DREAM algorithm automatically adjusts the scale and orientation of the proposal distribution used for posterior inference. In addition, it employs subspace sampling and outlier chain correction while maintaining balance and ergodicity (Vrugt et al., 2009). DREAM(ZS) introduced the use of past samples (from an external archive) into the jump distribution. As a result, the sampling procedure requires a smaller number of chains to explore the target distribution, outliers do not need a forceful treatment, and chains can run in a distributed manner (Vrugt, 2016). To ensure convergence (given the violation of Markovian principles introduced by adaptive Metropolis samplers), the adaptation rate decreases with the number of generations (Ter Braak and Vrugt, 2008). MT-DREAM(ZS) introduced multiple-try sampling in each of the chains, 135 creating 𝑚𝑡 different proposals in each chain that can be evaluated in parallel, which is more practical than running DREAM with large chain numbers (Laloy and Vrugt, 2012). Convergence to a stationary posterior distribution was monitored using the multivariate 𝑅̂ -statistic proposed by Gelman and Rubin (1992). The multivariate 𝑅̂ -statistic, which is computed using the last 50% samples of each parallel chain, is used to evaluate whether the between- and within-covariance matrices of these chains are similar. When these matrices are very similar, the 𝑅̂ -statistic is close to unity. An 𝑅̂ -statistic below 1.2 is used in practice to declare convergence (Gelman et al., 2013). 6.2.2 Generation of Predictive Distributions of ERHIs Predictive distributions were generated by propagating the posterior probability distributions for the model and error parameters through the hydrologic model and ERHIs computation methods. For an individual set of parameters 𝜽𝑖 = (𝜽𝑖𝐻 , 𝜽𝑖𝜀 ), a prediction of a given ERHI was obtained as follows (Fenicia et al., 2018; McInerney et al., 2017): 1) Sample 𝑊𝑡 using equation 6. 2) Obtain 𝜂𝑡𝑖 using equation 5. For 𝑡 = 1, 𝜂𝑡𝑖 is directly sampled and step 1 is ignored: 𝜂1𝑖 ← 𝑓𝑁 (0, 𝜎𝜂𝑖 ) (13) 2 where, 𝜎𝜂2 = 𝜎𝑊 /(1 − 𝜙 2 ) 3) Compute the streamflow prediction at time step t for the given 𝜽𝑖 as follows: ̃ ); 𝜆𝑖 ] + 𝜂𝑡𝑖 ) 𝑌𝑡𝑖 (𝜽𝑖 ) = 𝑧 −1 (𝑧[𝐻𝑡 (𝜽𝑖𝐻 , 𝑿 (14) where, 𝑧 −1 is the inverse Box-Cox transformation. 4) Once 𝑌𝑡𝑖 is obtained for the 𝑁𝑡 time steps, the j-th ERHI is computed using equation 15: 𝐸𝑅𝐻𝐼𝑗𝑖 = 𝐼𝑗 (𝑌 𝑖 (𝜽𝒊 )) (15) 136 The parameter sets were taken from the last 20% posterior samples obtained by the MT- DREAM(ZS) algorithm. 6.2.3 Performance evaluation We computed three measures to quantify reliability, precision, and bias for evaluating the performance of the Bayesian parameter estimation under each experiment. A prediction is reliable when the observations can be considered samples of the predictive distribution (i.e., observations consistently fall within the prediction bounds). The reliability measure that was employed in this study represents the average absolute difference between the predictive quantile-quantile (PQQ) plot and a 1:1 line representing the CDF of a standard uniform distribution 𝑈(0,1) (McInerney et al., 2017). Regarding precision, we determined the average coefficient of variation of the predicted streamflows using the observations as a proxy to the average streamflow at each time step (McInerney et al., 2017). The precision measure represents the width of the prediction bounds. 
The following equation was used to compute the precision: 𝑁𝑡 1 𝑠𝑡𝑑(𝑌𝑡 ) Precision[𝑌, 𝑌̃] = ∑ 𝑁𝑡 𝑌̃𝑡 𝑡=1 (16) where, 𝑠𝑡𝑑(𝑌𝑡 ) is the standard deviation of streamflow predictions at time step 𝑡. Finally, we computed the absolute volumetric bias to evaluate the long-term water balance error of the predictions. For this purpose, we used the mean prediction value at each time step 𝑌̅𝑡 : ∑𝑁 𝑡 ̃ 𝑁𝑡 ̅ 𝑡=1 𝑌𝑡 − ∑𝑡=1 𝑌𝑡 Bias[𝑌, 𝑌̃] = | |⁡ ∑𝑁𝑡 𝑌̃𝑡 𝑡=1 (17) The reliability, precision, and bias measures used herein targeted the overall streamflow predictions; for the ERHIs predictions, we directly compared the resulting predictive 137 distributions against the ERHIs obtained from the observations. For the latter, we visually inspected whether the ERHIs from observations fell within the corresponding predictive bounds. Also, we verified whether the median relative errors felt within the nominal ±30% relative error range. This error range has been reported in previous studies as a reference value for uncertainty in ERHIs due to data length effects when working with 15-year time series (Kennard et al., 2010a). In addition, we computed the coefficient of variation of each ERHI distribution using the observed ERHI as a reference and obtained the average relative error between each predicted and observed ERHI. 6.2.4 Case study 6.2.4.1 Study area and model We executed the calibration experiments in the Honeyoey Creek-Pine Creek Watershed located in east-central Michigan, US (Figure 14). This watershed has a drainage area of 1010 km2, and its land use is predominantly agriculture, covering about 50% of the total area, followed by forests (~24%) and wetlands (~16%). Developed areas account for less than 4% of the total watershed area (Hernandez-Suarez et al., 2018). SWAT 2012, Rev. 622 was used as the deterministic hydrological model in equation 1 for predicting streamflow at the watershed outlet. SWAT is a process-based model widely used to simulate daily water quantity and quality time series at the watershed scale (Arnold et al., 2012). In SWAT, a watershed is divided into subwatersheds, which are comprised of hydrologic response units (HRUs). An HRU is a homogeneous land unit concerning land use/cover, soil type, and slope. In this study, the Honeyoey Creek-Pine Creek Watershed was divided into 250 subwatersheds, each one comprised by a single HRU representing the dominant land use, soil type, and slope conditions (Einheuser et al., 2012). SWAT was used to simulate daily streamflows from 2003 to 2014 (Period 1) and from 1983 to 1994 (Period 2). Warm-up periods of 2 years (1981-1982, and 138 2001-2002) were used to reduce the effect of initial conditions in both periods. Potential evapotranspiration was estimated using the Penman-Monteith equation (Monteith, 1965). Surface runoff was obtained using the Soil Conservation Service (SCS) curve number method (USDA-SCS, 1972), and the selected routing method was the variable storage coefficient routine developed by Williams (1969). 
Figure 14 Study area location and major land uses 6.2.4.2 Data collection Input data included the 30-m resolution National Elevation Dataset provided by the US Geological Survey (USGS, 2018), the 30-m resolution Cropland Data Layer provided by the National Agricultural Statistics Service of the US Department of Agriculture (USDA-NASS, 2012), soil properties extracted from the Soil Survey Geographic Database (SSURGO) from the USDA Natural Resources Conservation Service (USDA-NRCS, 2020), and daily precipitation and maximum and minimum air temperature time series from 1981 to 2014 collected at two land 139 stations from the National Centers for Environmental Information of the National Oceanic and Atmospheric Administration (NOAA-NCEI, 2020). Missing values and remaining input weather data such as solar radiation, wind speed, and relative humidity were estimated using the SWAT built-in WXGEN stochastic weather generator (Neitsch et al., 2011). Daily observed streamflow records were obtained at the watershed outlet for the period of study from the Pine River Near Midland USGS gauging station 04155500 (USGS, 2020). 6.2.4.3 Calibration parameters The set of 𝜽𝐻 to be estimated was comprised of 15 SWAT model parameters. Maximum and minimum limits for each parameter were defined following the model documentation (Neitsch et al., 2011) and previous studies (Herman et al., 2018; Hernandez-Suarez et al., 2018). Model parameters were adjusted during the calibration process either by replacing the original value with a new one or by perturbing the original value by a fraction. Most of the parameters were assumed to have the same value in every HRU. Only the Curve Number for moisture condition II (CN2) and the available water capacity of the soil layer (SOL_AWC) were assumed to spatially change and were adjusted by perturbing the initial default values by a global fraction. These global fractions were calibrated instead of estimating CN2 and SOL_AWC at each individual HRU. Calibration model parameters and their calibration ranges are presented in Table 11. 140 Table 11 Model calibration parameters and ranges Parameter Description Calibration range Biomix Biological mixing efficiency [0, 1] CN2* Initial Soil Conservation Service (SCS) runoff number for moisture [-0.25, 0.25] condition II Canmx Maximum canopy storage (mm H2O) [0, 100] Esco Plant uptake compensation factor [0, 1] Epco Soil evaporation compensation factor [0, 1] Alpha bf Baseflow alpha factor (days−1) [0, 1] Gw delay Groundwater delay time (days) [0, 500] Gwqmn Threshold depth of water in the shallow aquifer required for return flow [0, 5000] to occur (mm H2O) Gw revap Groundwater “revap” coefficient [0.02, 0.2] Revapmn Threshold depth of water in the shallow aquifer for “revap” or [0, 1000] percolation to the deep aquifer to occur (mm H2O) Rchrg dp Deep aquifer percolation fraction [0, 1] Ch n2 Manning’s n value for the main channel [0, 0.3] Ch k2 Effective hydraulic conductivity in main channel alluvium (mm h−1) [0, 500] Sol awc* Available water capacity of the soil layer (mm H2O mm−1 soil) [-0.25, 0.25] Surlag Surface runoff lag coefficient [1, 24] Notes: *These parameters were adjusted by perturbing the initial default spatially-varying values for each HRU by a global fraction. 6.2.4.4 Ecologically Relevant Hydrologic Indices The selection of hydrologic indices depends on the ecohydrological application objectives. 
For instance, some studies target non-redundant hydrologic metrics for streamflow classification (Eng et al., 2017), others target specific hydrologic indices relevant to the condition of specific biological communities (George et al., 2021). Our goal in this study was to calibrate the hydrologic model targeting a balanced representation of the streamflow regime, which is of interest when defining environmental flows (Poff et al., 2010). Therefore, we selected 32 Indices of Hydrologic Alteration (IHA) (The Nature Conservancy, 2009), describing the central tendency of several streamflow regime characteristics. These indices were originally classified into five categories representing distinct regime facets such as magnitude, duration, frequency, timing, and rate of change of flows (Richter et al., 1997). In addition, we considered seven indices proposed by Archfield et al. (2014) (a.k.a., Magnificent seven) describing basic 141 properties of streamflow time series such as central tendency, variability, skewness, kurtosis, autocorrelation, and seasonality. The list of 39 ERHIs is presented in Table 12. Table 12 List of ERHIs used in this study Category Index* Description Magnitude of monthly water MA12 – MA23 Mean monthly flows from January to December (m3 s-1) conditions Magnitude and duration of DL1 – DL5 Annual minimum with 1-, 3-, 7-, 30-, and 90-day annual extreme water moving average flow (m3 s-1) conditions DH1 – DH5 Annual maximum with 1-, 3-, 7-, 30-, and 90-day moving average flow (m3 s-1) ML17 Baseflow index based on the 7-day minimum flow Timing of annual extreme TL1 Julian day of annual minimum water conditions TH1 Julian day of annual maximum Frequency and duration of high FL1 Mean low flow pulse count per water year (year −1) and low pulses DL16 Mean low flow pulse duration (days) FH1 Mean high flow pulse count per water year with a threshold equal to the 75th percentile of the entire flow record (year−1) DH15 Mean high flow pulse duration with a threshold equal to the 75th percentile of the entire flow record (days) Rate and frequency of water RA1 Rise rate (m3 s−1 d−1) condition changes RA3 Fall rate (m3 s−1 d−1) RA8 Reversals (year−1) Magnificent seven MAG1 – MAG4 First four L–moments (mean, coefficient of variation, skewness, and kurtosis) MAG5 Autoregressive lag-one AR(1) correlation coefficient MAG6 – MAG7 Amplitude and phase of the seasonal signal * Index abbreviations for Indicators of Hydrologic Alteration (IHA) as presented by Olden and Poff (2003). 6.2.4.5 Experiments set up The U-NSGA-III algorithm was implemented using the pymoo library in Python 3.7 (Blank and Deb, 2020). We developed a Python interface to modify SWAT’s input text files to link pymoo and SWAT. The multi-objective optimization algorithm was executed for 1000 generations, using 100 well-spaced reference directions obtained with the Riesz s-Energy method (Blank et al., 2021). This resulted in a total number of 100,000 model evaluations. Other U- NSGA-III parameters included the crossover probability, the distribution index for the Simulated Binary Crossover operator, the mutation probability, and the distribution index for the polynomial mutation operator, defined as 0.9, 10, 1/15 (i.e., the inverse of the number of the 142 hydrological model calibration parameters), and 20, respectively. The MT-DREAM(ZS) algorithm was executed in Matlab R2019b using the MT-DREAM(ZS) package developed by Vrugt (2016). The SWAT interface in Python was linked with Matlab to compute the log-likelihood function. 
MT-DREAM(ZS) was executed using three Markov chains and five multi-try proposals for 10,000 generations (for a total of 150,000 model evaluations). Additional MT-DREAM(ZS) parameters were assigned the default values reported by Vrugt (2016). The calibration experiments were executed using up to 20 threads in parallel on a machine equipped with two Intel® Xeon® CPU E5-2640 Processors at 2.5 GHz with 64 GB RAM running Ubuntu 16.04.7 LTS. 6.3 RESULTS AND DISCUSSION 6.3.1 Convergence of multi-objective and Bayesian calibration experiments Convergence of the U-NSGA-III algorithm (Experiment 2, Period 1) was monitored using the Hypervolume Indicator (Auger et al., 2009), which started to show a steady behavior after 800 generations (i.e., 80,000 model evaluations). The first feasible solution (i.e., model simulation with all the selected ERHIs within 30% relative error) was found after 4,800 model evaluations. The total computation time for the multi-objective calibration was 32.43 hours. Regarding the Bayesian calibration experiments, the MT-DREAM(ZS) algorithm converged after 5,400 generations (i.e., 81,000 model evaluations) for the Experiment 1, Period 1; 8,400 generations (i.e., 126,000 model evaluations) for the Experiment 1, Period 2; and 6,200 generations (i.e., 93,000 model evaluations) for the Experiment 2, Period 2. The total computation times for the Bayesian calibration experiments, which were simultaneously executed in the same machine, were 118.74 hours for Experiment 1, Period 1; 120.19 hours for Experiment 1, Period 2; and 118.97 hours for Experiment 2, Period 2. It is worth noting that Bayesian calibration using MCMC sampling had 50% more model evaluations than the multi- 143 objective calibration. According to the results, Bayesian calibration convergence using prior knowledge was 26% faster than the non-informative case. This faster convergence can be partially attributed to the further constrained search space for the informative case through the prior distribution. 6.3.2 Comparison between posterior parameter distributions using non-informative priors and Pareto-optimal results The distribution of the hydrologic and error model parameters considered in this study are presented in Figure 15 for each experiment and calibration period. Under Experiment 1, some parameter distributions were similar for both calibration periods, whereas others showed a very distinct behavior. The latter group was comprised mostly of soil- and groundwater-related parameters: Epco, Gwqmn, Gw revap (which represents water movement from the shallow aquifer to the overlying unsaturated zone), Rchrg dp, and the Ch k2 (associated to transmission losses in the main channel). The standard deviation of the transformed autocorrelated residuals 𝜎𝑊 was greater than period 2, indicating a higher total uncertainty in streamflow predictions in period 2 compared to period 1. These differences were perhaps originated by the time-varying nature of model parameters driven by non-stationarity effects from input weather data and changes in land use (Xiong et al., 2019). Regarding Experiment 2, parameter distributions were more consistent across the two calibration periods compared with Experiment 1. This behavior revealed the strong influence of the prior distribution on the Bayesian calibration results for period 2. However, some parameters showed important differences, including Canmx (related to surface water interception) and Alpha bf (related to the shape of recession curves). 
These differences were originated from the contribution of new data under period 2. Excepting the global multiplier for Sol awc, the posterior parameter distributions from Bayesian calibration (i.e., period 2) were narrower 144 compared to the multi-objective calibration distributions (i.e., period 1) for Experiment 2. This reduction in parametric uncertainty resulted from the assimilation of a greater amount of information (i.e., Experiment 2, period 2 ultimately used information from both periods 1 and 2). Figure 15 Distribution of model parameters obtained from multi-objective calibration (Experiment 2, Period 1 – MOOP1) and Bayesian parameter estimation (Experiment 1, Period 1 – E1P1; Experiment 1, Period 2 – E1P2; Experiment 2, Period 2 – E2P2). Box and whisker plots represent the 50% and 95% confidence limits, respectively; points represent median parameter values. Parameter descriptions are reported in Table 11. *These parameters were calibrated using global multipliers In general, Experiment 2 distributions were drastically narrower than Experiment 1 distributions. Bayesian calibration with non-informative priors (especially Experiment 1, period 1) resulted in poorly-informative posteriors. These posteriors included distributions for Biomix, groundwater parameters such as Gwqmn, Gw revap, and Revapmn, and Surlag. Poorly- informative posteriors are related to the equifinality problem, indicating that different parameter combinations yield similar outputs (Beven, 2006). This may apply to the groundwater parameters interacting with each other and Surlag, which represents water storage. Likewise, non- informative posteriors are indicative of a low sensitivity of the modeling outputs to changes in those particular parameters, which can explain Biomix behavior. Nevertheless, Experiment 1, period 1 attained the lowest 𝜎𝑊 , which can offset a lower total uncertainty in streamflow predictions. 145 The reduction in parametric uncertainty attained by Experiment 2 was mainly driven by the constraints to ERHIs simulation accuracy, introduced by the proposed multi-objective calibration strategy. Another effect of the ERHIs constraints was the difference in the actual range of values for certain parameters. For instance, Experiment 2 results indicated that the global multiplier perturbing initial CN2 values was around -25%, whereas Experiment 1 results indicated the opposite (i.e., around +25%). Higher CN2 makes the watershed more impervious, resulting in higher runoff generation (and therefore higher streamflows). Similarly, while Experiment 2 resulted in Surlag values close to unity, Experiment 1 resulted in Surlag values mostly between 10 and 20. Surlag controls the fraction of total available water that enters the main channel on a daily basis; higher Surlag values result in higher fractions. Likewise, Experiment 2 generated Gw delay close to zero, whereas Experiment 1 yielded values greater than 200 days. Gw delay represents the time that water spends in the vadose zone before becoming shallow aquifer recharge. Gw delay ultimately affects groundwater contributions to the main channel, which also impacts low-flow dynamics. 6.3.3 Performance of uncertainty quantification of daily streamflows Uncertainty quantification performance of streamflow predictions is presented in Figure 16 for each experiment and calibration period. 
As expected, the streamflow prediction bounds resulted in a lower parametric uncertainty for Experiment 2, which was consistent with the narrower parameter distributions presented in Figure 15 and the effects of the ERHIs’ optimization constraints. In fact, parametric uncertainty for Experiment 2, Period 2 (i.e., Bayesian parameter estimation using multi-objective calibration prior) was drastically lower compared to the other cases (~72% narrower compared to Experiment 1 for the same period). Regarding total uncertainty, the hydrographs in Figure 16 (left column) reveal wider limits for 146 low flows in Experiment 1 (i.e., non-informative priors), and wider limits for high flows in Experiment 2. Figure 16 Uncertainty quantification performance using multi-objective calibration and Bayesian parameter estimation. The hydrographs (left column) represent the 95% prediction bounds for streamflow; light gray is for total uncertainty, dark gray is for parametric uncertainty, red line are observations. The middle column presents the corresponding quantile-quantile plots (PQQ) using a standard uniform distribution. The right column presents the overall performance indices for reliability (R), precision (P), and Bias. a) Experiment 1, Period 1; b) Experiment 2, Period 1; c) Experiment 1, Period 2; d) Experiment 2, Period 2 In terms of reliability, PQQ plots indicated that Experiment 1 (Figure 16 row b, middle column) slightly over-estimated predictive uncertainty under Period 1 (curve above the 1:1 diagonal at the lower-left corner and below the same line at the upper-right corner). Meanwhile, Pareto-optimal solutions (Experiment 2, Figure 16 row b, middle column) over-predicted streamflows under the same calibration period (curve was mostly below the 1:1 diagonal). The 147 latter occurred because total uncertainty was estimated using the 𝜎𝑤 obtained from the residuals of each Pareto-optimal solution. Since the multi-objective calibration strategy did not consider the additive error term included in the Bayesian calibration framework, the streamflow predictions resulted in higher values by explicitly adding the residual innovations to the Pareto- optimal simulations. Regarding Period 2, PQQ plots indicated excellent reliability for the streamflow predictions in both experiments (Figure 16 rows c and d, middle column). When statistically comparing the observed quantiles from both experiments with a 95% confidence level, we did not find evidence of a significant difference in the reliability measure under Period 2 (p = 0.070). For Period 1, the reliability was significantly different for both experiments (p = 1.7 x 10-5), with Experiment 1 presenting a better result overall (i.e., 10% vs. 13%, see Figure 16, right column). Regarding precision, no evidence of a significant difference was found between the two experiments under Period 1 (p = 0.11). Meanwhile, Experiment 2 resulted in a higher precision than Experiment 1 under Period 2 (58% vs 66%, p = 1.3 x 10-8). Regarding bias in the long-term water balance, no significant difference was found for Period 1 (p = 0.40), whereas for Period 2, Experiment 2 attained a significantly lower bias compared with Experiment 1 (4% vs. 6%, p = 0.0059). In other words, Bayesian calibration using multi- objective priors resulted in lower total uncertainty in streamflow predictions with lower bias in long-term water balance and no significant loss in reliability. 
6.3.4 Performance of uncertainty quantification of ERHIs Figure 17 shows the distribution of the relative errors for the 32 IHA and Magnificent seven indices selected in this study and reported in Table 12. This figure presents the parametric predictive distributions for the multi-objective calibration under Period 1 because 𝜎𝑤 was not directly calibrated in this case. As expected, all Pareto-optimal predictions fell within the 30% 148 relative error range due to the optimization constraints in ERHIs accuracy. However, note that only about 50% of the ERHIs computed on the observed streamflows fell within the Pareto- optimal predictive distributions. The situation was not better for the non-informative Bayesian calibration predictions under the same period. Meanwhile, the non-informative case under Period 2 contained 64% of the observed ERHIs, against 56% for Experiment 2 (see columns 3 and 4 in Table 13). Furthermore, all the predictive distributions for indices under the “frequency and duration of high and low pulses” category did not contain observed ERHIs under Period 2. The same situation was observed for DL1 (i.e., annual daily minimum), RA8 (i.e., reversals), and MAG3 (i.e., L-skewness). Distributions obtained with the Bayesian calibration using prior knowledge were particularly limited in the prediction of low-flow-related ERHIs across all the streamflow regime facets (i.e., magnitude, duration, frequency, rate of change, and timing). It is worth noting that Bayesian calibration using prior knowledge resulted in narrower predictive distributions compared with the non-informative case under Period 2 (see columns 7 and 8 in Table 13), which explains the tradeoff between precision and reliability. The aforementioned limitations in the accuracy of ERHIs predictive distributions were not surprising since we were trying to fit multiple streamflow facets simultaneously. Moreover, total uncertainty propagation through ERHIs computation affects precision and impacts the overall central tendency of the predictive distributions. Instead, our main interest was to limit the bias and the overall dispersion within the nominal 30% relative error range. When comparing the median ERHIs relative errors against the acceptability threshold, we found that both experiments under Period 2 yielded 90% of the median ERHIs within the acceptability threshold. For Experiment 2, ERHIs with the median outside the acceptability range include MA18-19 (i.e., monthly flows for July and August, summer flows), DL1, and FH1 (i.e., frequency of high flow 149 pulses). For Experiment 1. ERHIs with the median outside the acceptability range include MA12 and MA15 (i.e., monthly flows for January and April), TH1 (i.e., Julian day of annual maximum), and FH1. Figure 17 Distribution of relative errors of the selected ERHIs using multi-objective calibration (Experiment 2, Period 1 – MOOP1) and Bayesian parameter estimation (Experiment 1, Period 1 – E1P1; Experiment 1, Period 2 – E1P2; Experiment 2, Period 2 – E2P2). Box and whiskers represent the 50% and 95% confidence limits, respectively; points represent median relative error values; the vertical dotted line represents the zero axis, the gray area represent the nominal 30% ERHI uncertainty. Index abbreviations are reported in Table 12 In general, Experiment 2 exhibited a better performance in ERHIs prediction compared with Experiment 1 because the overall precision had an average increase of ~32%. 
Regarding bias, ~59% ERHIs exhibited lower (better) values for Experiment 2 compared with the non- informative case. It is worth noting that ERHIs predictions behaved differently depending on the calibration approach, as revealed by the differences in some posterior model parameter distributions. For instance, Experiment 1 increased runoff generation and transmission losses and regulated groundwater return to the main channels. Meanwhile, Experiment 2 decreased runoff 150 generation, increased water storage, and increased groundwater and lateral flow contributions. As a result, non-informative Bayesian calibration generated results with a lower bias for low flow ERHIs compared with Experiment 2. Likewise, Experiment 2 resulted in a lower bias for high flow ERHIs compared with Experiment 1. The main factors contributing to these differences were the constraints inherited through the informative prior distribution and the likelihood function (particularly, the transformational approach selected to address heteroscedasticity). Deciding for a preferred calibration option ultimately requires a better understanding and assessment of the internal modeling processes using observations for variables of other water cycle components. Also, a better characterization of other uncertainty sources is required. Table 13 Performance of predictive distributions of ERHIs obtained under Period 2. Reliability was evaluated by identifying whether the distributions contained the ERHIs from observations, and whether the median of the distributions was within the ±30% relative error range Dist. contains Median within Precision (%) Bias (%) ERHI Category ERHI observed ERHI ±30% range Exp. 1 Exp. 2 Exp. 1 Exp. 2 Exp. 1 Exp. 2 Exp. 1 Exp. 2 MA12 ✓ ✓ 14.4 6.8 49.6 -2.7 MA13 ✓ ✓ ✓ 12.8 7.2 27.9 1.8 MA14 ✓ ✓ ✓ 6.4 5.8 -17.4 -4.7 MA15 ✓ ✓ 7.7 5.7 -41.8 -8.8 MA16 ✓ ✓ ✓ 12.6 8.1 -13.3 22.7 Magnitude of monthly MA17 ✓ ✓ ✓ ✓ 13.1 8.4 -9.6 13.9 water conditions MA18 ✓ ✓ 20.2 12.1 29.9 53.5 MA19 ✓ ✓ 17.5 10.9 26.1 34.9 MA20 ✓ ✓ ✓ ✓ 12.9 9.4 5.2 -1.0 MA21 ✓ ✓ ✓ ✓ 9.3 9.0 -15.9 4.9 MA22 ✓ ✓ ✓ 9.2 7.4 -24.5 5.0 MA23 ✓ ✓ ✓ ✓ 11.7 6.3 2.2 -4.3 151 Table 13 (cont’d). DL1 ✓ 12.3 8.3 -28.0 -30.5 DL2 ✓ ✓ ✓ 12.6 8.3 -21.6 -24.0 DL3 ✓ ✓ ✓ ✓ 13.1 8.4 -12.5 -14.9 Magnitude and DL4 ✓ ✓ ✓ ✓ 15.0 8.8 17.9 15.2 duration of annual DL5 ✓ ✓ ✓ 11.9 7.0 12.2 17.3 extreme water DH1 ✓ ✓ ✓ 10.6 7.6 -27.3 -8.0 conditions (mean daily DH2 ✓ ✓ ✓ ✓ 10.6 7.3 -24.8 -6.7 flow) DH3 ✓ ✓ ✓ ✓ 9.9 6.7 -21.3 -5.9 DH4 ✓ ✓ ✓ ✓ 8.1 5.8 -14.4 -2.2 DH5 ✓ ✓ ✓ ✓ 7.1 4.5 -12.4 2.0 ML17 ✓ ✓ ✓ 13.7 7.3 -6.6 -18.1 Timing of annual TL1 ✓ ✓ ✓ 11.2 8.1 0.8 18.0 extreme water conditions TH1 ✓ ✓ 34.7 33.5 -59.2 -27.0 FL1 ✓ ✓ 13.8 7.2 29.0 27.6 Frequency and DL16 ✓ ✓ 8.8 6.9 -27.6 -27.8 duration of high and low pulses FH1 11.2 8.0 46.0 30.5 DH15 ✓ ✓ 7.2 6.7 -28.5 -20.1 Rate and frequency of RA1 ✓ ✓ ✓ ✓ 6.6 3.6 -12.6 4.3 water condition RA3 ✓ ✓ ✓ 8.1 4.2 12.1 25.1 changes RA8 ✓ ✓ 1.9 1.8 29.5 29.5 MAG1 ✓ ✓ ✓ ✓ 6.4 3.4 -6.2 4.3 MAG2 ✓ ✓ ✓ ✓ 4.4 2.3 -6.9 0.1 MAG3 ✓ ✓ 5.4 3.7 -15.2 -9.8 Magnificent seven MAG4 ✓ ✓ ✓ 7.8 5.2 -14.7 -16.7 MAG5 ✓ ✓ ✓ ✓ 2.5 2.0 1.7 -2.1 MAG6 ✓ ✓ ✓ ✓ 20.0 10.7 4.9 -15.7 MAG7 ✓ ✓ ✓ 5.9 3.4 -12.2 7.5 Note: Exp. 1 = Experiment 1; Exp. 2 = Experiment 2 6.4 CONCLUSIONS In this study, we successfully linked multi-objective calibration with Bayesian parameter estimation for ERHIs uncertainty quantification. We achieved this by using a multivariate prior distribution of model parameters based on near-optimal Pareto solutions. 
The connection allowed the transfer of predefined ERHIs accuracy constraints into the overall Bayesian inference framework. The main advantage of the developed strategy is the use of multiple sources of information contained within a single continuously measured quantity for improving streamflow regimes prediction. Other benefits – compared with Bayesian calibration using non-informative 152 priors – included: 1) faster convergence to a stationary multivariate posterior distribution of model parameters using the MT-DREAM(ZS) algorithm., 2) drastic reduction of parametric uncertainty in streamflow predictions, 3) higher precision in streamflow predictive uncertainty with lower bias in the long-term water balance and no significant loss in reliability, and 4) higher precision in ERHIs’ prediction. It is worth noting that using prior knowledge had an important effect on how the hydrological model internally simulated surface runoff, interflow, and baseflow, as reflected by differences in related parameter posteriors. For example, when using non-informative priors, the chosen likelihood function favored the representation of low flows at the expense of high flows. Meanwhile, multi-objective calibration priors resulted in the opposite outcomes, yielding better results for high flows and behaving rather poorly for low flows. Therefore, further work is needed for understanding the tradeoffs between priors and likelihood functions for improving streamflow and ERHIs prediction. Ultimately, deciding on a particular modeling path depends on a better characterization of uncertainty sources (e.g., weather data, land use change, non- stationarity effects) and the incorporation of additional observations for other water cycle components (e.g., evapotranspiration, soil moisture, groundwater levels, leaf area index). Also, persistent limitations in ERHIs prediction for low flows, rate of change, and frequency and duration of high and low pulses require the revision of modeling workflows and structures to describe the ecohydrological behavior of freshwater ecosystems better. 153 7 CONCLUSIONS This research developed different calibration strategies to improve the predictability of ERHIs using hydrological modeling. These strategies were tested in an agriculture-dominated watershed in Michigan, US. The first study evaluated the predictability of an exhaustive list of ERHIs by comparing the performance of two multi-objective and three single-objective formulations for model calibration. The second study improved the multi-objective formulations by explicitly targeting a subset of ERHIs, providing a balanced representation of the overall streamflow regime. Finally, the third study quantified the uncertainty in ERHIs prediction. For this purpose, multi-objective calibration and Bayesian parameter estimation were linked using near-optimal Pareto parameter distributions as prior knowledge. The overall process implemented watershed modeling, streamflow regime characterization, evolutionary multi- objective optimization, MCDM methods, and Bayesian inference using adaptive MCMC sampling. The following can be concluded from this research:  The multi-objective calibration strategy based on NSE calculated on different streamflow transformations was superior to the RMSE-based strategy using flow separation. Specifically, the NSE-based strategy achieved a faster convergence, higher accuracy in ERHIs simulation, lower variability in ERHIs solutions, and narrower distributions of Pareto-optimal model parameters. 
 NSE-based single-objective strategies calculated on untransformed and square-root- transformed streamflows outperformed the NSE- and RMSE-based multi-objective strategies in terms of accuracy in high-flow indices estimation. 154  No calibration approach among the two multi-objective and three single-objective strategies tested in the first study was considered as the best for all ERHIs. Instead, they can be regarded as complementary to each other. However, the multi-objective strategies were preferred over the single-objective ones because they provided ranges of solutions while accounting for tradeoffs between different flow conditions.  Outcomes from the first study help decision-makers in identifying which simulated ERHIs are more reliable when defining environmental standards and limits to human-driven hydrologic alteration. Having multiple optimal solutions provide natural resources managers with several options when defining these standards or limits under varying conditions (e.g., low, moderate, high flows).  Obtaining a balanced representation of the overall streamflow regime is not directly related to improvements in a particular performance metric computed on streamflow time series. Instead, the balance responds to the interaction between different regime facets and flow conditions.  It was possible to obtain consistent runoff simulations using multiple objective functions based on ERHIs that represent different streamflow regime facets. This was achieved without using any objective function computed on streamflow time series, at the expense of higher variability in the near-optimal Pareto solutions.  Explicitly constraining or targeting ERHIs when calibrating a hydrological model, boosts its ability in representing those indices. Particularly, the performance-based strategy constraining the ERHIs accuracy was preferred for its lower variability in near-optimal Pareto solutions.  Variabilities in the near-optimal Pareto solutions for both performance- and signature-based strategies were mainly driven by the model representation of low flows, as revealed by highly variable inverse-transformed KGE values and related Flow Duration Curve metrics. 155  The performance-based strategy tested in the second study can incorporate acceptability thresholds for ERHIs prediction defined by decision-makers beforehand. These thresholds are generally based on additional information such as preferences of riverine species and socio- economic criteria. Therefore, the overall ecohydrological modeling process can be easily connected to broader decision-support tools.  The overall bias and precision in streamflow predictions increased at the expense of a slight reduction in reliability. Still, the best precision in streamflow predictions was hardly below 60%.  In general, ERHI predictive distributions were narrower and more accurate when using multi- objective calibration prior knowledge. However, ERHIs related to low flows presented a lower bias compared with the non-informative case.  While most of the ERHIs predictive distributions fell within the nominal 30% relative error range (i.e., expected uncertainty due to data length and non-stationarity), over 46% of ERHIs computed on streamflow observations did not fell within the predictive distributions, which can be related to model structure inadequacies. 
 All the strategies developed in this research revealed limitations of the SWAT model structure when representing the vertical redistribution of soil moisture, rate of change, and timing of annual extremes. Particularly, the simulated interannual variability of low flow magnitude and timing, rise and fall rates, and duration and frequency of extreme flows was very inaccurate.  Limitations in the representation of interannual variability have important repercussions in the definition of limits to hydrologic alteration, which is a major issue in freshwater systems protection and restoration. Thus, modelers and policymakers should account for these 156 limitations when implementing broader ecohydrological applications in ungauged and poorly gauged watersheds.  Regarding the third study, simulated ERHIs with wide ranges of variability can be seen as less reliable for decision-makers. As a result, other ERHIs with narrower variability can be chosen for making decisions using modeling results, or additional efforts can be pursued to reduce the uncertainty in the ERHIs of interest. 157 8 FUTURE RESEARCH This research presented novel calibration strategies based on multi-objective optimization and Bayesian inference to improve the predictability of ERHIs when using hydrological models. By linking multi-objective calibration results with Bayesian parameter estimation, multiple sources of information contained within a single continuously measured quantity (i.e., streamflow) can contribute to the overall uncertainty quantification process. In order to enhance the reliability of the proposed strategies, the following recommendations are provided here for further studies:  Extend the calibration strategies spatially and temporally. The developed strategies were tested in a watershed with a single streamflow gauging station. Therefore, it is recommended that future studies implement these strategies considering multiple locations for calibration and validation purposes. For the former, the multi-objective calibration approach is flexible enough for incorporating multi-site calibration. Likewise, a third independent time period should be added to validate the predictive distributions that are built upon data from two other periods (one for multi-objective calibration and the other for Bayesian calibration using prior knowledge). In addition, the developed strategies should be extended to consider time-varying parameters and their effects on ERHIs predictability.  Explicitly consider other uncertainty sources. In this research, only the parametric uncertainty was addressed explicitly. Additional uncertainty sources were aggregated into a lumped error model. Therefore, extending the Bayesian inference framework is recommended to account for other sources such as input (e.g., land use, weather, soil), model structure, measurement errors, data length, non-stationarity effects, and ERHIs computation methods. In addition, other error 158 models should be tested to evaluate whether a particular formulation is better suited for ERHIs prediction. A multi-level optimization approach can be suitable for trying different autocorrelation models/coefficients and transformational approaches, preventing parameter interaction issues.  Validate the modeling results with observations for other water cycle components. Multi- variable calibration can be easily incorporated into the developed calibration strategies. 
With the advent of remotely sensed products and integrated modeling frameworks, it is recommended that future studies evaluate the predictability of ERHIs when considering additional variables representing other components of the hydrological cycle such as evapotranspiration, soil moisture, groundwater levels, and/or leaf area index.  Apply the calibration strategies with different model structures. This research used SWAT as the hydrological model. Since several limitations were identified for this model, it is recommended to implement the developed strategies with other model structures and under different spatial scales (e.g., regional, national, continental) and time resolutions (e.g., sub- daily, monthly with disaggregation techniques) to evaluate any improvements in the prediction of ERHIs under different modeling paradigms.  Implement evolutionary multi-objective optimization with stochastic objective functions. The connection between multi-objective optimization and Bayesian inference developed in this research was built upon the prior distribution. However, additional integration approaches should be examined, such as the formulation of stochastic objective functions that incorporate error models to quantify the total uncertainty.  Quantify uncertainty throughout broader ecohydrological applications. In this research, we obtained prediction distributions for selected ERHIs. ERHIs are generally used as explanatory 159 variables for predicting other quantities of interest in ecohydrological applications, such as biological indicators or environmental flow settings. Therefore, future studies should consider these distributions for uncertainty quantification of ecohydrological variables.  Integrate surrogate modeling for computationally intensive models. One of the main limitations of the integrated multi-objective calibration and Bayesian inference process is the requirement of large numbers of model executions. In general, MCMC methods are sequential approaches with limited parallelization capabilities. In order to address this issue, new strategies incorporating surrogate models should be considered.  Test uncertainty quantification results in practical decision-making scenarios. Uncertainty analysis provides a higher transparency to the modeling process and to the discussion between modelers and policy and decision makers. Futures studies should consider the evaluation of modeling uncertainty effects on the definition of environmental standards which incorporate additional criteria regarding social and economic components. Other decision-making scenarios include prioritizing biological monitoring sites, definition of rules based on limits to hydrologic alteration, environmental impact assessment, and allocation of best management practices. 160 APPENDIX 161 Table A1 Description of ecologically-relevant hydrologic indices with all, high, medium, and low flow Pareto-optimal solutions having median relative errors outside the ±30% bound, for each multi-objective calibration strategy. Adapted from Olden and Poff (2003) and Henriksen et al. 
(2006) Code Hydrologic index Units Description Calibration strategy Magnitude of flow events Average flow conditions MA29 Variability in June flows - Both NSE- and RMSE-based MA30 Variability in July flows - Both NSE- and RMSE-based MA31 Variability in August flows - Both NSE- and RMSE-based Coefficient of variation in monthly flows MA32 Variability in September flows - Both NSE- and RMSE-based MA33 Variability in October flows - Both NSE- and RMSE-based MA34 Variability in November flows - Only RMSE-based Range of monthly flows divided by median monthly MA42 Variability across annual flows - Only NSE-based flows 90th - 10th percentile of monthly flows divided by MA44 Variability across annual flows - Only NSE-based median monthly flows (Mean annual flow - median annual flow)/median MA45 Skewness in annual flows - Both NSE- and RMSE-based annual flow Low flow conditions ML7 Mean minimum July monthly flow m3 s-1 Both NSE- and RMSE-based Mean minimum August monthly ML8 m3 s-2 Mean minimum monthly flow Both NSE- and RMSE-based flow Mean minimum September monthly ML9 m3 s-3 Only RMSE-based flow Mean of the lowest annual daily flow divided by ML14 Mean of annual minimum flows - Both NSE- and RMSE-based median annual daily flow averaged across all years 162 Table A1 (cont’d). Mean of the lowest annual daily flow divided by ML15 Low flow index - Both NSE- and RMSE-based mean annual daily flow averaged across all years Median of the lowest annual daily flow divided by ML16 Median of annual minimum flows - Both NSE- and RMSE-based median annual daily flow averaged across all years Seven-day minimum flow divided by mean annual ML17 Baseflow index 1 - Only RMSE-based daily flows averaged across all years Mean of the ratio of the lowest annual daily flow to ML19 Basefow index 2 - the mean annual daily flow times 100 averaged Both NSE- and RMSE-based across all years Variability across annual minimum Coefficient of variation in annual minimum flows ML21 - Both NSE- and RMSE-based flows averaged across all years Specific mean annual minimum Mean annual minimum flows divided by catchment ML22 m3 s-1 km-2 Both NSE- and RMSE-based flows area High flow conditions MH6 Mean maximum June monthly flow m3 s-1 Both NSE- and RMSE-based MH7 Mean maximum July monthly flow m3 s-1 Both NSE- and RMSE-based Mean of the maximum monthly flows Mean maximum October monthly MH10 m3 s-1 Both NSE- and RMSE-based flow Mean maximum November monthly MH11 m3 s-1 Only RMSE-based flow 163 Table A1 (cont’d) Mean of the high flow volume (calculated as the area between the hydrograph and the upper MH21 High flow volume days Both NSE- and RMSE-based threshold defined as the median annual flow) divided by median annual daily flow across all years Mean of the high flow volume (calculated as the area between the hydrograph and the upper MH22 High flow volume days threshold defined as 3 times the median annual Only NSE-based flow) divided by median annual daily flow across all years Mean of the high flow volume (calculated as the area between the hydrograph and the upper MH23 High flow volume days threshold defined as 7 times the median annual Only RMSE-based flow) divided by median annual daily flow across all years Frequency of flow events Low flow conditions Average number of flow events below the 25th FL1 Low flood pulse count year-1 Both NSE- and RMSE-based percentile of the entire flow record High flow conditions Average number of flow events above the 75th FH1 High flood pulse count 1 year-1 Only RMSE-based percentile of the entire 
flow record 164 Table A1 (cont’d) Average number of days per year that the flow is FH4 High flood pulse count 2 year-1 Only NSE-based above 7 times median daily flow of the entire record Mean number of high flow events per year using a FH5 Flood frequency 1 year-1 threshold equal to the median flow of the entire Both NSE- and RMSE-based record Mean number of high flow events per year using a FH8 Flood frequency 2 year-1 threshold equal to the 25th percentile of the entire Only RMSE-based flow record Mean number of high flow events per year using a FH9 Flood frequency 2 year-1 threshold equal to the 75th percentile of the entire Both NSE- and RMSE-based flow record Duration of flow events Low flow conditions Magnitude of minimum annual flow of 1-day DL1 Annual minimum daily flow m3 s-1 Both NSE- and RMSE-based duration Annual minimum of 3-day moving Magnitude of minimum annual flow of 3-day DL2 m3 s-1 Only RMSE-based average flow duration Variability of annual minimum daily Coefficient of variation in magnitude of minimum DL6 - Both NSE- and RMSE-based average flow annual flow of 1-day duration Variability of annual minimum of 3- Coefficient of variation in magnitude of minimum DL7 - Both NSE- and RMSE-based day moving average flow annual flow of 3-day duration 165 Table A1 (cont’d) Variability of annual minimum of 7- Coefficient of variation in magnitude of minimum DL8 - Both NSE- and RMSE-based day moving average flow annual flow of 7-day duration Mean of 1-day minima of daily Mean annual 1-day minimum, divided by median DL11 - Both NSE- and RMSE-based discharge flow Mean of 3-day minima of daily Mean annual 3-day minimum, divided by median DL12 - Only RMSE-based discharge flow Mean duration of flow events below the 25th DL16 Low flow pulse duration days Both NSE- and RMSE-based percentile of the entire flow record High flow conditions Mean duration of flow events above the 75th DH15 High flow pulse duration days Both NSE- and RMSE-based percentile of the entire flow record Mean duration of flow events above a threshold DH17 High flow duration 1 days Both NSE- and RMSE-based equal to the median flow of the entire record Mean duration of flow events above a threshold DH19 High flow duration 1 days Only NSE-based equal to 7 times the median flow of the entire record Mean duration of flow events above a threshold DH20 High flow duration 2 days equal to the 75th percentile value for the median Only RMSE-based annual flows Mean duration of flow events above a threshold DH21 High flow duration 2 days equal to the 25th percentile value for the median Both NSE- and RMSE-based annual flows 166 Table A1 (cont’d) Mean annual number of days that flows remain above the flood threshold (equal to the flow DH23 Flood duration 2 days Only RMSE-based equivalent for a flood recurrence of 1.67 years) averaged across all years Timing of flow events Low flow conditions Maximum proportion between the number of days Seasonal predictability of non-low TL4 - that flow is above the 5-year flood threshold and Only NSE-based flow 365 or 366 (leap year) among all years. 
Rate of change in flow events Average flow conditions Mean rate of negative changes in flow from one day RA3 Fall rate m3 s-1 d-1 Both NSE- and RMSE-based to the next Median of difference between log10 of flows RA6 Change of flow m3 s-1 Both NSE- and RMSE-based between two consecutive days with increasing flow Median of difference between log10 of flows RA7 Change of flow m3 s-1 Both NSE- and RMSE-based between two consecutive days with decreasing flow 167 REFERENCES 168 REFERENCES Abbasi, T., Abbasi, S.A., 2012. Multivariate Approaches for Bioassessment of Water Quality, in: Water Quality Indices. Elsevier, pp. 337–350. https://doi.org/10.1016/B978-0-444-54304- 2.00015-4 Abouali, M., Daneshvar, F., Nejadhashemi, A.P., 2016a. MATLAB Hydrological Index Tool (MHIT): A high performance library to calculate 171 ecologically relevant hydrological indices. Ecol. Inform. 33, 17–23. https://doi.org/10.1016/j.ecoinf.2016.03.004 Abouali, M., Nejadhashemi, A.P., Daneshvar, F., Woznicki, S.A., 2016b. Two-phase approach to improve stream health modeling. Ecol. Inform. 34, 13–21. https://doi.org/10.1016/j.ecoinf.2016.04.009 Addor, N., Nearing, G., Prieto, C., Newman, A.J., Le Vine, N., Clark, M.P., 2018. A Ranking of Hydrological Signatures Based on Their Predictability in Space. Water Resour. Res. 54, 8792–8812. https://doi.org/10.1029/2018WR022606 Adriaenssens, V., De Baets, B., Goethals, P.L.M., De Pauw, N., 2004a. Fuzzy rule-based models for decision support in ecosystem management. Sci. Total Environ. 319, 1–12. https://doi.org/10.1016/S0048-9697(03)00433-9 Adriaenssens, V., Goethals, P.L.M., Charles, J., De Pauw, N., 2004b. Application of Bayesian Belief Networks for the prediction of macroinvertebrate taxa in rivers. Ann. Limnol. 40, 181–191. https://doi.org/10.1051/limn/2004016 Adriaenssens, V., Goethals, P.L.M., De Pauw, N., 2006. Fuzzy knowledge-based models for prediction of Asellus and Gammarus in watercourses in Flanders (Belgium). Ecol. Modell. 195, 3–10. https://doi.org/10.1016/j.ecolmodel.2005.11.043 Aguilera, P.A., Fernández, A., Fernández, R., Rumí, R., Salmerón, A., 2011. Bayesian networks in environmental modelling. Environ. Model. Softw. 26, 1376–1388. https://doi.org/10.1016/j.envsoft.2011.06.004 Ahmadi-Nedushan, B., St-Hilaire, A., Bérubé, M., Robichaud, É., Thiémonge, N., Bobée, B., 2006. A review of statistical methods for the evaluation of aquatic habitat suitability for instream flow assessment. River Res. Appl. 22, 503–523. https://doi.org/10.1002/rra.918 Allan, J.D., Yuan, L.L., Black, P., Stockton, T., Davies, P.E., Magierowski, R.H., Read, S.M., 2012. Investigating the relationships between environmental stressors and stream condition using Bayesian belief networks. Freshw. Biol. 57, 58–73. https://doi.org/10.1111/j.1365- 2427.2011.02683.x Almeida, D., Alcaraz-Hernández, J.D., Merciai, R., Benejam, L., García-Berthou, E., 2017. Relationship of fish indices with sampling effort and land use change in a large Mediterranean river. Sci. Total Environ. 605–606, 1055–1063. 169 https://doi.org/10.1016/j.scitotenv.2017.06.025 Almeida, S., Bulygina, N., McIntyre, N., Wagener, T., Buytaert, W., 2013. Improving parameter priors for data-scarce estimation problems. Water Resour. Res. 49, 6090–6095. https://doi.org/10.1002/wrcr.20437 Almeida, S., Le Vine, N., McIntyre, N., Wagener, T., Buytaert, W., 2016. Accounting for dependencies in regionalized signatures for predictions in ungauged catchments. Hydrol. Earth Syst. Sci. 20, 887–901. 
https://doi.org/10.1080/02626667.2013.827791 Surridge, B.W.J., Bizzi, S., Castelletti, A., 2014. A framework for coupling explanation and prediction in hydroecological modelling. Environ. Model. Softw. 61, 274–286. https://doi.org/10.1016/j.envsoft.2014.02.012 Sutela, T., Vehanen, T., Jounela, P., 2010. Response of fish assemblages to water quality in boreal rivers. Hydrobiologia 641, 1–10. https://doi.org/10.1007/s10750-009-0048-7 196 Tang, Y., Marshall, L., Sharma, A., Ajami, H., 2018. A Bayesian alternative for multi-objective ecohydrological model specification. J. Hydrol. 556, 25–38. https://doi.org/10.1016/j.jhydrol.2017.07.040 Tasdighi, A., Arabi, M., Harmel, D., 2018. A probabilistic appraisal of rainfall-runoff modeling approaches within SWAT in mixed land use watersheds. J. Hydrol. 564, 476–489. https://doi.org/10.1016/j.jhydrol.2018.07.035 Ter Braak, C.J.F., Vrugt, J.A., 2008. Differential Evolution Markov Chain with snooker updater and fewer chains. Stat. Comput. 18, 435–446. https://doi.org/10.1007/s11222-008-9104-9 The Nature Conservancy, 2009. Indicators of Hydrologic Alteration Version 7.1 User’s Manual, The Nature Conservancy. Tonkin, J.D., Stoll, S., Sundermann, A., Haase, P., 2014. Dispersal distance and the pool of taxa, but not barriers, determine the colonisation of restored river reaches by benthic invertebrates. Freshw. Biol. 59, 1843–1855. https://doi.org/10.1111/fwb.12387 Tsai, W.P., Chang, F.J., Herricks, E.E., 2016. Exploring the ecological response of fish to flow regime by soft computing techniques. Ecol. Eng. 87, 9–19. https://doi.org/10.1016/j.ecoleng.2015.11.015 Tucker, A., Duplisea, D., 2012. Bioinformatics tools in predictive ecology: applications to fisheries. Philos. Trans. R. Soc. B Biol. Sci. 367, 279–290. https://doi.org/10.1098/rstb.2011.0184 Turschwell, M.P., Stewart-Koster, B., Leigh, C., Peterson, E.E., Sheldon, F., Balcombe, S.R., 2017. Riparian restoration offsets predicted population consequences of climate warming in a threatened headwater fish. Aquat. Conserv. Mar. Freshw. Ecosyst. 1–12. https://doi.org/10.1002/aqc.2864 Tuulaikhuu, B.-A.. b, Guasch, H.., García-Berthou, E.., 2017. Examining predictors of chemical toxicity in freshwater fish using the random forest technique. Environ. Sci. Pollut. Res. 24, 10172–10181. https://doi.org/10.1007/s11356-017-8667-4 US EPA, 2011. A Primer on Using Biological Assessments to Support Water Quality Management. EPA 810-R-11-01. https://doi.org/10.1007/s13398-014-0173-7.2 USDA-NASS, 2012. CropScape - NASS CDL Program [WWW Document]. URL https://nassgeodata.gmu.edu/CropScape/ (accessed 8.5.18). USDA-NRCS, 2020. Web Soil Survey [WWW Document]. URL https://websoilsurvey.sc.egov.usda.gov/App/WebSoilSurvey.aspx (accessed 7.12.20). USDA-SCS, 1972. National engineering handbook, section 4: Hydrology. Washington, DC. USEPA, 2015. Saginaw River and Bay Area of Concern [WWW Document]. URL https://www.epa.gov/saginaw-river-bay-aoc (accessed 7.12.17). 197 USGS, 2020. National Water Information System: Web Interface [WWW Document]. URL https://waterdata.usgs.gov/nwis (accessed 7.12.20). USGS, 2018. The National Map: Elevation [WWW Document]. URL https://nationalmap.gov/elevation.html (accessed 10.5.18). Van Broekhoven, E., Adriaenssens, V., De Baets, B., Verdonschot, P.F.M., 2006. Fuzzy rule- based macroinvertebrate habitat suitability models for running waters. Ecol. Modell. 198, 71–84. 
https://doi.org/10.1016/j.ecolmodel.2006.04.006 Van Echelpoel, W., Boets, P., Landuyt, D., Gobeyn, S., Everaert, G., Bennetsen, E., Mouton, A., Goethals, P.L.M.M., Echelpoel, W. Van, Boets, P., Landuyt, D., Gobeyn, S., Everaert, G., Bennetsen, E., Mouton, A., Goethals, P.L.M.M., Van Echelpoel, W., Boets, P., Landuyt, D., Gobeyn, S., Everaert, G., Bennetsen, E., Mouton, A., Goethals, P.L.M.M., 2015. Species distribution models for sustainable ecosystem management, 1st ed, Developments in Environmental Modelling. Elsevier B.V. https://doi.org/10.1016/B978-0-444-63536- 5.00008-9 Van Sickle, J., Baker, J., Herlihy, A., Bayley, P., Gregory, S., Haggerty, P., Ashkenas, L., Li, J., 2004. Projecting the biological condition of streams under alternative scenarios of human land use. Ecol. Appl. 14, 368–380. https://doi.org/10.1890/02-5009 Van Sickle, J., Burch Johnson, C., 2008. Parametric distance weighting of landscape influence on streams. Landsc. Ecol. 23, 427–438. https://doi.org/10.1007/s10980-008-9200-4 van Werkhoven, K., Wagener, T., Reed, P., Tang, Y., 2009. Sensitivity-guided reduction of parametric dimensionality for multi-objective calibration of watershed models. Adv. Water Resour. 32, 1154–1169. https://doi.org/10.1016/j.advwatres.2009.03.002 Vander Laan, J.J., Hawkins, C.P., Olson, J.R., Hill, R.A., 2013. Linking land use, in-stream stressors, and biological condition to infer causes of regional ecological impairment in streams. Freshw. Sci. 32, 801–820. https://doi.org/10.1899/12-186.1 Vapnik, V.N., 2000. The Nature of Statistical Learning Theory, Second. ed, Farming for Health: Green-care farming across Europe and the United States of America, Graduate Texts in Contemporary Physics. Springer New York, New York, NY. https://doi.org/10.1007/978-1- 4757-3264-1 Vezza, P., Muñoz-Mas, R., Martinez-Capel, F., Mouton, A., 2015. Random forests to evaluate biotic interactions in fish distribution models. Environ. Model. Softw. 67, 173–183. https://doi.org/10.1016/j.envsoft.2015.01.005 Vigiak, O., Lutz, S., Mentzafou, A., Chiogna, G., Tuo, Y., Majone, B., Beck, H., de Roo, A., Malagó, A., Bouraoui, F., Kumar, R., Samaniego, L., Merz, R., Gamvroudis, C., Skoulikidis, N., Nikolaidis, N.P., Bellin, A., Acuňa, V., Mori, N., Ludwig, R., Pistocchi, A., 2018. Uncertainty of modelled flow regime for flow-ecological assessment in Southern Europe. Sci. Total Environ. 615, 1028–1047. https://doi.org/10.1016/j.scitotenv.2017.09.295 198 Vilizzi, L., Price, A., Beesley, L., Gawne, B., King, A.J., Koehn, J.D., Meredith, S.N., Nielsen, D.L., 2013. Model development of a Bayesian Belief Network for managing inundation events for wetland fish. Environ. Model. Softw. 41, 1–14. https://doi.org/10.1016/j.envsoft.2012.11.004 Villeneuve, B., Piffady, J., Valette, L., Souchon, Y., Usseglio-Polatera, P., 2018. Direct and indirect effects of multiple stressors on stream invertebrates across watershed, reach and site scales: A structural equation modelling better informing on hydromorphological impacts. Sci. Total Environ. 612, 660–671. https://doi.org/10.1016/j.scitotenv.2017.08.197 Villeneuve, B., Souchon, Y., Usseglio-Polatera, P., Ferréol, M., Valette, L., 2015. Can we predict biological condition of stream ecosystems? A multi-stressors approach linking three biological indices to physico-chemistry, hydromorphology and land use. Ecol. Indic. 48, 88–98. https://doi.org/10.1016/j.ecolind.2014.07.016 Vis, M., Knight, R., Pool, S., Wolfe, W., Seibert, J., 2015. 
Model calibration criteria for estimating ecological flow characteristics. Water (Switzerland) 7, 2358–2381. https://doi.org/10.3390/w7052358 Vogel, R.M., Fennessey, N.M., 1994. Flow‐Duration Curves. I: New Interpretation and Confidence Intervals. J. Water Resour. Plan. Manag. 120, 485–504. https://doi.org/10.1061/(ASCE)0733-9496(1994)120:4(485) Vogel, R.M., Sieber, J., Archfield, S.A., Smith, M.P., Apse, C.D., Huber-Lee, A., 2007. Relations among storage, yield, and instream flow. Water Resour. Res. 43, 1–12. https://doi.org/10.1029/2006WR005226 Vörösmarty, C.J., McIntyre, P.B., Gessner, M.O., Dudgeon, D., Prusevich, A., Green, P., Glidden, S., Bunn, S.E., Sullivan, C.A., Liermann, C.R., Davies, P.M., 2010. Global threats to human water security and river biodiversity. Nature 467, 555–561. https://doi.org/10.1038/nature09440 Vrugt, J.A., 2016. Markov chain Monte Carlo simulation using the DREAM software package: Theory, concepts, and MATLAB implementation. Environ. Model. Softw. 75, 273–316. https://doi.org/10.1016/j.envsoft.2015.08.013 Vrugt, J.A., Beven, K.J., 2018. Embracing equifinality with efficiency: Limits of Acceptability sampling using the DREAM(LOA) algorithm. J. Hydrol. 559, 954–971. https://doi.org/10.1016/j.jhydrol.2018.02.026 Vrugt, J.A., Ter Braak, C.J.F., Diks, C.G.H., Robinson, B.A., Hyman, J.M., Higdon, D., 2009. Accelerating Markov chain Monte Carlo simulation by differential evolution with self- adaptive randomized subspace sampling. Int. J. Nonlinear Sci. Numer. Simul. 10, 273–290. https://doi.org/10.1515/IJNSNS.2009.10.3.273 Wagenhoff, A., Liess, A., Pastor, A., Clapcott, J.E., Goodwin, E.O., Young, R.G., 2016. Thresholds in ecosystem structural and functional responses to agricultural stressors can inform limit setting in streams. Freshw. Sci. 36, 000–000. https://doi.org/10.1086/690233 199 Waite, I.R., Brown, L.R., Kennen, J.G., May, J.T., Cuffney, T.F., Orlando, J.L., Jones, K.A., 2010. Comparison of watershed disturbance predictive models for stream benthic macroinvertebrates for three distinct ecoregions in western US. Ecol. Indic. 10, 1125–1136. https://doi.org/10.1016/j.ecolind.2010.03.011 Waite, I.R., Kennen, J.G., May, J.T., Brown, L.R., Cuffney, T.F., Jones, K.A., Orlando, J.L., 2014. Stream macroinvertebrate response models for bioassessment metrics: Addressing the issue of spatial scale. PLoS One 9. https://doi.org/10.1371/journal.pone.0090944 Waite, I.R., Kennen, J.G., May, J.T., Brown, L.R., Cuffney, T.F., Jones, K.A., Orlando, J.L., 2012. Comparison of Stream Invertebrate Response Models for Bioassessment Metrics. J. Am. Water Resour. Assoc. 48, 570–583. https://doi.org/10.1111/j.1752-1688.2011.00632.x Waite, I.R., Van Metre, P.C., 2017. Multistressor predictive models of invertebrate condition in the Corn Belt, USA. Freshw. Sci. 36, 000–000. https://doi.org/10.1086/694894 Waldron, A., Miller, D.C., Redding, D., Mooers, A., Kuhn, T.S., Nibbelink, N., Roberts, J.T., Tobias, J.A., Gittleman, J.L., 2017. Reductions in global biodiversity loss predicted from conservation spending. Nature 551, 364–367. https://doi.org/10.1038/nature24295 Walker, K.F., Sheldon, F., Puckridge, J.T., 1995. A perspective on dryland river ecosystems. Regul. Rivers Res. Manag. 11, 85–104. https://doi.org/10.1002/rrr.3450110108 Walters, D.M., Roy, A.H., Leigh, D.S., 2009. Environmental indicators of macroinvertebrate and fish assemblage integrity in urbanizing watersheds. Ecol. Indic. 9, 1222–1233. 
https://doi.org/10.1016/j.ecolind.2009.02.011 Wang, L., Robertson, D.M., Garrison, P.J., 2007. Linkages between nutrients and assemblages of macroinvertebrates and fish in wadeable streams: Implication to nutrient criteria development. Environ. Manage. 39, 194–212. https://doi.org/10.1007/s00267-006-0135-8 Wenger, S.J., Luce, C.H., Hamlet, A.F., Isaak, D.J., Neville, H.M., 2010. Macroscale hydrologic modeling of ecologically relevant flow metrics. Water Resour. Res. 46, 1–10. https://doi.org/10.1029/2009WR008839 Westerberg, I.K., McMillan, H.K., 2015. Uncertainty in hydrological signatures. Hydrol. Earth Syst. Sci. 19, 3951–3968. https://doi.org/10.5194/hess-19-3951-2015 Westerberg, I.K., Wagener, T., Coxon, G., McMillan, H.K., Castellarin, A., Montanari, A., Freer, J., 2016. Uncertainty in hydrological signatures for gauged and ungauged catchments. Water Resour. Res. 52, 1847–1865. https://doi.org/10.1002/2015WR017635 While, L., Bradstreet, L., Barone, L., 2016. Walking Fish Group: Hypervolume Project [WWW Document]. URL http://www.wfg.csse.uwa.edu.au/hypervolume/ (accessed 10.10.17). While, L., Bradstreet, L., Barone, L., 2012. A fast way of calculating exact hypervolumes. IEEE Trans. Evol. Comput. 16, 86–95. https://doi.org/10.1109/TEVC.2010.2077298 200 Williams, J.R., 1969. Flood Routing With Variable Travel Time or Variable Storage Coefficients. Trans. ASAE 12, 100–103. https://doi.org/10.13031/2013.38772 Willmott, C.J., 1981. On the validation of models. Phys. Geogr. 2, 184–194. https://doi.org/10.1080/02723646.1981.10642213 Wold, S., Sjöström, M., Eriksson, L., 2001. PLS-regression: A basic tool of chemometrics. Chemom. Intell. Lab. Syst. 58, 109–130. https://doi.org/10.1016/S0169-7439(01)00155-1 Woznicki, S.A., Nejadhashemi, A.P., Abouali, M., Herman, M.R., Esfahanian, E., Hamaamin, Y.A., Zhang, Z., 2016a. Ecohydrological modeling for large-scale environmental impact assessment. Sci. Total Environ. 543, 274–286. https://doi.org/10.1016/j.scitotenv.2015.11.044 Woznicki, S.A., Nejadhashemi, A.P., Ross, D.M., Zhang, Z., Wang, L., Esfahanian, A.-H.H., 2015. Ecohydrological model parameter selection for stream health evaluation. Sci. Total Environ. 511, 341–353. https://doi.org/10.1016/j.scitotenv.2014.12.066 Woznicki, S.A., Nejadhashemi, A.P., Tang, Y., Wang, L., 2016b. Large-scale climate change vulnerability assessment of stream health. Ecol. Indic. 69, 578–594. https://doi.org/10.1016/j.ecolind.2016.04.002 WWAP-UN, 2017. The United Nations World Water Development Report 2017, Wastewater: The Untapped Resource. UNESCO, Paris. Xiong, M., Liu, P., Cheng, L., Deng, C., Gui, Z., Zhang, X., Liu, Y., 2019. Identifying time- varying hydrological model parameters to improve simulation efficiency by the ensemble Kalman filter: A joint assimilation of streamflow and actual evapotranspiration. J. Hydrol. 568, 758–768. https://doi.org/10.1016/j.jhydrol.2018.11.038 Yang, H.C., Suen, J.P., Chou, S.K., 2016. Estimating the Ungauged Natural Flow Regimes for Environmental Flow Management. Water Resour. Manag. 30, 4571–4584. https://doi.org/10.1007/s11269-016-1437-0 Yi, Y., Cheng, X., Yang, Z., Wieprecht, S., Zhang, S., Wu, Y., 2017. Evaluating the ecological influence of hydraulic projects: A review of aquatic habitat suitability models. Renew. Sustain. Energy Rev. 68, 748–762. https://doi.org/10.1016/j.rser.2016.09.138 Yilmaz, K.K., Gupta, H. V., Wagener, T., 2008. A process-based diagnostic approach to model evaluation: Application to the NWS distributed hydrologic model. Water Resour. Res. 
44, 1–18. https://doi.org/10.1029/2007WR006716 You, G.J.Y., Thum, B.H., Lin, F.H., 2014. The examination of reproducibility in hydro- ecological characteristics by daily synthetic flow models. J. Hydrol. 511, 904–919. https://doi.org/10.1016/j.jhydrol.2014.02.047 Zadeh, L. a., 1994. Soft computing and fuzzy logic. Software, IEEE 48–56. https://doi.org/10.1109/52.329401 201 Zadeh, L.A., 1998. Roles of Soft Computing and Fuzzy Logic in the Conception, Design and Deployment of Information/Intelligent Systems, in: Kaynak, O., Zadeh, L.A., Türk\csen, B., Rudas, I.J. (Eds.), Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications. Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 1–9. Zeleny, M., 2011. Multiple Criteria Decision Making (MCDM): From Paradigm Lost to Paradigm Regained? J. Multi-Criteria Decis. Anal. 18, 77–89. https://doi.org/10.1002/mcda.473 Zhang, Y., Shao, Q., Zhang, S., Zhai, X., She, D., 2016. Multi-metric calibration of hydrological model to capture overall flow regimes. J. Hydrol. 539, 525–538. https://doi.org/10.1016/j.jhydrol.2016.05.053 Zhao, J., Cao, J., Tian, S., Chen, Y., Zhang, S., Wang, Z., Zhou, X., 2014. A comparison between two GAM models in quantifying relationships of environmental variables with fish richness and diversity indices. Aquat. Ecol. 48, 297–312. https://doi.org/10.1007/s10452- 014-9484-1 Zuur, A.F., Ieno, E.N., Smith, G.M., 2007. Analysing Ecological Data, Statistics for Biology and Health. Springer New York, New York, NY, NY. https://doi.org/10.1007/978-0-387-45972- 1 Zuur, A.F., Leno, E.N., Walker, N.J., Saveliev, A.A., Smith, G.M., Ieno, E.N., Walker, N.J., Saveliev, A.A., Smith, G.M., 2009. Mixed effects models and extensions in ecology with R, Public Health, Statistics for Biology and Health. Springer New York, New York, NY. https://doi.org/10.1007/978-0-387-87458-6 202