MONITORING AND MODELING ECOHYDROLOGICAL PROCESSES IN
                VEGETATED WATERSHEDS
                                 By
                         Leo Triet Pham
                             A THESIS
                           Submitted to
                   Michigan State University
            in partial fulfillment of the requirements
                         for the degree of
                  Forestry – Master of Science
                                2022


                                        ABSTRACT
        MONITORING AND MODELING ECOHYDROLOGICAL PROCESSES IN
                                VEGETATED WATERSHEDS
                                              By
                                        Leo Triet Pham
Ecohydrology links ecological and hydrological processes and considers interactions between
water resources and ecosystems. Modeling tools are not only important for studying the
mechanisms of ecological patterns and processes but also for assessing the effects of environ-
mental change on hydrological and ecological processes, providing insights and solutions to
issues in water management. This thesis explores various data-driven approaches to monitor
and model these processes at 95 watersheds in western USA using a combination of seasonal
and annual climate, hydrometric, and remotely sensed vegetation data. In one analysis, we
show that a trend in earlier peak in spring vegetation activity may be a linked to reduced
runoff availability during drought years compared to non-drought years. We also provide
evidence that an increase drought severity is consistent with a decrease in runoff ratio in
forested catchments through regression analysis, supporting the hypothesis that the rela-
tionship among water-balance components may shift during hydrological drought events. In
another analysis, we show that the type and amount of vegetation coverage, among other
catchment characteristics, can affect the accuracy of data-driven runoff models. These re-
sults suggest that a better understanding of the ecohydrologic processes and characteristics is
vital to development of effective long-term strategies to improve the resilience of watersheds.


Copyright by
LEO TRIET PHAM
2022


                              ACKNOWLEDGEMENTS
I would like to thank my advisers, Dr. Andrew Finley and Dr. Lifeng Luo, for their academic
guidance and professional support. I’d also like to acknowledge my committee members Dr.
Asia Dowtin for her invaluable comments and criticisms of my thesis work.
                                              iv


                              TABLE OF CONTENTS
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     vii
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     viii
LIST OF ALGORITHMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .          xi
CHAPTER 1 ECOHYDROLOGICAL IMPLICATIONS OF DROUGHTS IN SIERRA
              NEVADA WATERSHEDS . . . . . . . . . . . . . . . . . . . . . . . .                1
   1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    1
   1.2 Methods and data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      4
       1.2.1 Study area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      4
       1.2.2 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      9
       1.2.3 Quantifying droughts using standardized precipitation evapotranspi-
              ration index (SPEI) . . . . . . . . . . . . . . . . . . . . . . . . . . . .    10
       1.2.4 Assessing drought impacts of catchment water balances using the
              Budyko framework . . . . . . . . . . . . . . . . . . . . . . . . . . . .       12
   1.3 Results and discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    13
       1.3.1 Characterizing annual drought using 12-month SPEI . . . . . . . . .             13
       1.3.2 Trends in vegetation response to drought at annual timescale . . . . .          14
       1.3.3 Impact of drought on precipitation-runoff relationship and catch-
              ment water balance . . . . . . . . . . . . . . . . . . . . . . . . . . . .     15
       1.3.4 Catchment balance under the Budyko framework . . . . . . . . . . .              19
       1.3.5 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     21
   1.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   21
CHAPTER 2 EVALUATION OF RANDOM FORESTS FOR SHORT-TERM DAILY
              STREAMFLOW FORECASTING IN RAINFALL AND SNOWMELT-
              DRIVEN WATERSHEDS . . . . . . . . . . . . . . . . . . . . . . . .              24
   2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  24
   2.2 Methods and data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    29
       2.2.1 Random forests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      29
       2.2.2 Variable importance in random forests . . . . . . . . . . . . . . . . .         31
       2.2.3 Benchmark models . . . . . . . . . . . . . . . . . . . . . . . . . . . .        32
       2.2.4 Performance evaluation criteria . . . . . . . . . . . . . . . . . . . . .       33
       2.2.5 Study area: Pacific Northwest Hydrologic Region . . . . . . . . . . .           35
       2.2.6 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    36
              2.2.6.1 Streamflow . . . . . . . . . . . . . . . . . . . . . . . . . . .       36
              2.2.6.2 Precipitation . . . . . . . . . . . . . . . . . . . . . . . . . .      38
       2.2.7 Snow water equivalent and temperatures . . . . . . . . . . . . . . . .          38
              2.2.7.1 Predictor selection . . . . . . . . . . . . . . . . . . . . . . .      39
   2.3 Results and discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    40
                                             v


       2.3.1 Parameter tuning . . . . . . . . . . . . . . . . . . . . . . .    . . . . . . 40
       2.3.2 Benchmark RF against MLR and naı̈ve models . . . . . .            . . . . . . 42
       2.3.3 Evaluation of RF overall performance . . . . . . . . . . . .      . . . . . . 44
       2.3.4 RF performance on extreme streamflows . . . . . . . . . .         . . . . . . 46
       2.3.5 Analysis of variable importance . . . . . . . . . . . . . . .     . . . . . . 48
       2.3.6 Effects of watershed characteristics on model performance         . . . . . . 50
       2.3.7 Limitations and future research . . . . . . . . . . . . . . .     . . . . . . 53
   2.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
APPENDIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     60
                                            vi


                                   LIST OF TABLES
Table 1.1 Summary statistics of selected hydroclimatic, topographic, and vegeta-
          tion characteristics for the 9 watersheds under study. . . . . . . . . . . . .     5
Table 1.2 Drought level classifications based on SPEI [Yao et al., 2018].       . . . . . . 11
Table 1.3 Mean runoff ratio for drought and non-drought years for water years
          1987-2018. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  23
Table 2.1 List of predictors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Table 2.2 The optimized parameter mtry using exhaustive-search strategy (mtry
          = {1, 2, 6, 7, 8} were considered but not found as the optimal value at
          any gauge). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   42
Table 2.3 Descriptive statistics of the four criteria used to evaluate the overall
          performance of RF: R2 , KGE, MAE, and RMSE. . . . . . . . . . . . . . .           45
Table 2.4 Pearson correlation coefficient between KGE scores and selected basin
          physical characteristics. Bolded value indicates the relationship is signif-
          icant at 5 percent or 1 percent level. . . . . . . . . . . . . . . . . . . . . .  53
                                             vii


                                   LIST OF FIGURES
Figure 1.1  Partitioning of water at catchment scale (Figure adapted from Vose
            et al. [2016]). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  3
Figure 1.2  Spatial map of nine watersheds in the study. . . . . . . . . . . . . . . . .       6
Figure 1.3  Annual hydrographs for nine watersheds. Values reflect monthly average
            over the 30-year period between 1989 and 2018. . . . . . . . . . . . . . .         7
Figure 1.4  Land-use classification for the 9 watersheds based on National Land
            Cover Database 2006 Classification. . . . . . . . . . . . . . . . . . . . . .      8
Figure 1.5  Long-term annual water balance represented by theoretical Budyko frame-
            work. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Figure 1.6  12-month SPEI time series for 30 years between 1989 and 2018. . . . . .           13
Figure 1.7  Annual SPEI for the period 1989-2018. Annual SPEI is calculated as
            the cumulative water balance for the 12-month period in the water year.
            Drought years with annual SPEI < -0.5 are indicated by black dots. . . .          14
Figure 1.8  Growing season EVI (May-Sep) during drought and non-drought con-
            ditions for water years between 2001 and 2018. Drought years were
            identified using the annual SPEI value. Solid lines indicate the median
            values and shaded areas represent the Q1 and Q3 of the monthly values. .          15
Figure 1.9  Fraction of monthly runoff to annual precipitation for the period 2001-
            2018 for non-drought and drought years. . . . . . . . . . . . . . . . . .         16
Figure 1.10 Fraction of monthly ET to annual precipitation in the period 2001-2018
            for non-drought and drought years. . . . . . . . . . . . . . . . . . . . . .      17
Figure 1.11 Annual runoff ratio plotted against annual SPEI for 1989-2019. Best-fit
            line was obtained using simple linear regression. . . . . . . . . . . . . . .     18
Figure 1.12 Interannual variability of catchment water balance under the Budyko
            framework for water years in period 2001-2018. Black dot is the catch-
            ment centroid and indicates 18-year average. . . . . . . . . . . . . . . . .      20
                                              viii


Figure 2.1 (a) Elevation (m) shading map showing the Pacific Northwest Hydro-
           logic Unit, 86 selected stream gauges (triangles), and their drainage area
           (cyan delineation lines), and SNOTEL stations (brown squares). Ex-
           amples of annual hydrographs of (b) rainfall-dominated, (c) transient,
           and (d) snowmelt-dominated watersheds. Figures (b-d) are based on
           2009-2018 daily flow data at three sites 12043300 (48.2◦ N, 124.4◦ W),
           12048000 (48◦ N, 123.1◦ W), and 10396000 (42.7◦ N, 118.9◦ W). . . . . .       27
Figure 2.2 Structure of a RF and relevant parameters. . . . . . . . . . . . . . . . . .  29
Figure 2.3 Gauge locations with color gradient indicating variations in (a) drainage
           area (km2 ), watershed mean elevation (m), (c) annual precipitation
           (cm), and (d) annual mean temperature (◦ C). . . . . . . . . . . . . . . .    36
Figure 2.4 Flowchart showing the input-output model using RF. . . . . . . . . . . .      41
Figure 2.5 Out-of-bag mean absolute error plotted against mtry during optimal
           parameter search at Carbon River Watershed (USGS site 12094000). . .          41
Figure 2.6 Boxplots for Pearson correlation coefficient between forecasted and ob-
           served values for three models: RF, naı̈ve, and MLR across three flow
           regimes. Two-sample Wilcoxon rank-sum significance tests are per-
           formed and p-value (in black) are included for each pair of models. . . .     43
Figure 2.7 Pairwise scatter plots of Pearson correlation coefficient between fore-
           casted and observed values among watersheds for (a) RF vs. naı̈ve
           model, (b) RF vs MLR, and (c) MLR vs. naı̈ve model. Each dot repre-
           sents a watershed (n=86). . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Figure 2.8 Streamflow daily forecast scores computed over the validation period for
           RF model in four metrics: R-squared, KGE, MAE, and RMSE. . . . . .            45
Figure 2.9 The probability of detection (POD) plotted against the false alarm rate
           (FAR) for three extreme thresholds: 90th , 95th , and 99th percentiles.
           Thin black line connects values from the same watershed. (Vertical
           axis) Number of times RF correctly forecasted events that exceeded the
           threshold divided by the total number of exceedance. (Horizontal axis)
           Number of times RF incorrectly forecasted events that exceeded the
           threshold divided by the total number of non-exceedance. It is noted
           that the scales of the horizontal and vertical axes are not 1-to-1 in the
           plotted partial receiver operating characteristic (ROC) curve. . . . . . .    47
                                            ix


Figure 2.10 Barplots show importance of predictor variables using (a-c) MDA and
            (d-f) MDI criteria. Length of the blue bars indicates the median value
            across the watersheds for each flow regime and the thin black bar rep-
            resents the full range of the values. . . . . . . . . . . . . . . . . . . . . .  48
Figure 2.11 KGE scores plotted against (a) the average percent of slope and (b) the
            average percent of sand in soil at each watershed. Best-fit lines were de-
            termined using simple linear regression. Pearson correlation coefficients
            were computed with associated significance. . . . . . . . . . . . . . . . .      51
Figure A.1 Correlation between EVI and 1-, 3-, 6-, 12-month SPEI values during
            the growing season (May-Sep) for the period 2001-2018. EVI time series
            were standardized, according to the average and the standard deviation
            of the values for each month. Relationship is considered significant at
            α = 0.05). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Figure A.2 Correlation between streamflow elasticity (ϵ) and selected catchment
            characteristics. Blank tiles indicate the relationship is not significant at
            α = 0.05. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  59
                                               x


                             LIST OF ALGORITHMS
Algorithm 2.1 Building a regression RF . . . . . . . . . . . . . . . . . . . . . . . . . 30
                                          xi


                                        CHAPTER 1
     ECOHYDROLOGICAL IMPLICATIONS OF DROUGHTS IN SIERRA
                                NEVADA WATERSHEDS
1.1      Introduction
Drought is considered as a sustained period of less water compared to normal conditions of
a region. Depending on the severity and duration, these periods of water deficit can have
important implications on human activities and natural systems. In forested catchments,
the relationships among drought, streamflow dyanmics, groundwater recharge and vegetation
response are not straightforward because forest ecosystems have a large capacity to regulate
precipitation partitioning through the evapotranspiration process (Sun et al., 2011) and are
able to withstand certain disturbances through various physiological, morphological, and
behavioral adaptations [Lytle and Poff, 2004]. As more severe and widespread droughts are
projected for the 21st century [Dai, 2013], examining these interactions and feedbacks can
provides a valuable ecohydrological context for understanding and evaluating the impacts
of drought on water resources and management practices [Rodriguez-Iturbe, 2000, Brauman
et al., 2007].
    In the Mediterranean climate regions such as the Sierra Nevada, California, a substan-
tial fraction of precipitation during the wet winter (November-April) provides the water
supply to recharge reservoirs, replenishing groundwater, and build snowpack that provides
water storage for the dry summer season (June-August). The recent multi-year droughts in
California have been characterized by both large precipitation deficits and abnormally high
temperatures during both wet and dry seasons [Swain, 2015, Luo et al., 2017]. The combined
effect of warming temperature and variable precipitation due to climate change is expected
to intensify drought and prolong periods of water stress [Diffenbaugh et al., 2015, Barnett
et al., 2005]. Most studies agree that decreases in mean annual flow, earlier snowmelt runoff,
                                               1


and higher evaporative demand are expected [Medellı́n-Azuara et al., 2008, Vicuña et al.,
2008]. Because of the heterogeneity in topographic and physiographic features, vegetation
types and structure, and their interaction with local climate, individual watersheds within
this region will likely have different sensitivity where hydrologic responses to drought can be
either mitigated or exacerbated by forest vegetation depending upon vegetation water use
[Vose et al., 2016].
     It is well established that forests grow where the water supply is sufficient to support
perennial woody vegetation, as evapotranspiration (the sum of interception, transpiration
and soil evaporation) is much greater in forest systems compared to other vegetation types
[Frank and Inouye, 1994]. As drought is directly related to the balance between soil water
supply from incoming precipitation and tree water demand, the simple water balance equa-
tion provides an intuitive framework to assess the relationships among different pools and
partitioning of precipitation under different conditions.
     At annual and longer time scales, the water balance for a catchment can be written as:
                                        P = ET + Q + ∆S                                      (1.1)
     where P is precipitation, ET is evapotranspiration, Q is surface runoff measured as
streamflow, and ∆S represents the change in subsurface storage within the watershed (Fig.
1.1). The sensitivity of streamflow (Q) to drought can be framed by examining how much
evaporative and transpiration losses occur relative to the total precipitation. During drought
conditions, reduction in P, increase in evaporative and water demand in trees can contribute
to reduction to Q [D’Amato et al., 2013, Vicuna and Dracup, 2007]. However, the effect of
these factors can vary greatly in time and space depending on the intensity, frequency, and
type of precipitation input and the timing of water deficit. It is expected that in moun-
tainous snowmelt-dominated regions less water is lost to evaporation and potential ET has
little effect on the total annual runoff as the precipitation is delivered efficiently in forms of
meltwater in large pulses [Wolock and McCabe, 1999]. Barnett et al. [2005] argued that the
increased soil moisture earlier in the season due to melt happens when potential evaporation
                                                  2


Figure 1.1 Partitioning of water at catchment scale (Figure adapted from Vose et al.
[2016]).
(dominated by net radiation) is low, thus attenuating the effects of ET changes to runoff
production in these regions. However, more recent works show that warming temperature
condition result in accelerating mountain vegetation growth and ET [Goulden and Bales,
2014], and longer growing season in lower elevations [Hunsaker et al., 2012], which in turn
lead to runoff vulnerability as a greater proportion of snowmelt is converted to ET. Bales
et al. [2018] summarized the mechanisms in which droughts can shift the P-Q relationship
in mountainous catchments which include both priority of partitioning of precipitation into
ET vs. discharge during drought, warmer than normal conditions creating higher vegetation
evaporative demand, and spatial heterogeneity across the watershed and sources of P. These
mechanisms highlight the relative importance of ET and vegetation dynamics in catchment
response to drought. Thus, integrating vegetation’s response to drought can benefit our
                                              3


understanding of the partitioning of water and drought impacts.
    Most studies that highlight the impact of droughts in California and Sierra Nevada have
been conducted at global and regional scales [Mann and Gleick, 2015, Diffenbaugh et al.,
2015]). However, there is still a dearth in little research conducted at the catchment scale.
Moreover, Saft et al. [2015] found that local catchment properties such as mean slope and
percentage of woody cover can play a role in changes in P-Q relationship induced by drought.
In this study, we investigate of the drought conditions on the on water resources and vegeta-
tion dynamics in the 9 selected Sierra Nevada catchments. We are particularly interested in
testing the hypothesis that the annual relationship between P and Q shifts during drought
conditions compared to non-drought. The study consists of an empirical analysis of seasonal
and annual hydrometric, and remotely sensed vegetation index and ET data supported by
spatial information on catchment characteristics. As the hydrological processes in mountains
are thus highly sensitive to changes in climate particularly drought [Beaulieu et al., 2016],
understanding processes that control the water balance in mountainous regions is crucial
and relevant for water resource management and can improve future hydrological model
predictions.
1.2     Methods and data
1.2.1    Study area
California’s Sierra Nevada mountain range runs north-south, separating California’s Central
Valley from the Basin and Range province to the east. Due to the rain-shadowed effect, the
eastern side receives less annual precipitation and is drier compared to western side. The
southern part of the Sierra Nevada is generally higher, with elevations greater than 4,000 m
at the crest, while the northern part is generally less than 3,000 m at peak elevations.
                                               4


    Table 1.1 Summary statistics of selected hydroclimatic, topographic, and vegetation characteristics for the 9 watersheds under
    study.
    No.    STAID          Name       Area (km2 ) Mean elevation (m)   Annual P (mm) Annual T (◦ C)   Runoff ratio   Percentage of forest   Major forest type
     1    10336645    General Creek     19.59         2185.5              1201          5.76            0.55               78.38             Mixed conifer
     2    10336660   Blackwood Creek    29.79         2216.39             1486          5.78            0.64               70.15                Red fir
5    3    10336676     Ward Creek       24.71         2218.42             1549          5.71            0.51               77.39             Mixed conifer
     4    10343500    Sagehen Creek     27.60         2159.80              976           5.2            0.33               88.28             Mixed conifer
     5    11237500    Pitman Creek      59.82          2437               1175          5.65            0.54               86.99             Mixed conifer
     6    11230500      Bear Creek     135.53         3244.58             1149          0.43            0.56                26.4            Lodgepole pine
     7    11315000      Cole Creek      54.01         2261.55             1410          7.10            0.74               55.16             Mixed conifer
     8    10308200   EF Carson River   716.42         2410.54              980          4.82            0.43               41.20            Lodgepole pine
     9    11266500     Merced River    467.98         2746.07             1198          6.08            0.58               42.02             Mixed conifer


                   Figure 1.2 Spatial map of nine watersheds in the study.
    In this study, we consider 9 small and medium-sized mountainous watersheds that repre-
sent the variability in topography, vegetation coverage, and elevation gradients of the Sierra
Nevada (Fig. 1.2). These include Bear Creek, Blackwood Creek, Cole Creek, East Folk (EF)
Carson River, General Creek, Merced River at Happy Isles, Pitman Creek, Sagehen Creek,
and Ward Creek. From north to south, Sagehen Creek, Ward Creek, Blackwood Creek, and
General Creek are tributaries of the Truckee River systems and are small catchments located
on the eastern slope of Sierra Nevada. Cole Creek is part of the North Fork Mokelumne River
that flows west and contributes to Salt Springs Reservoir [Silverman, 2010]. EF Carson River
and Merced River basins are larger catchments with drainage areas of approximately 700
km2 and 500 km2 respectively where the Carson River drains the rain-shadowed eastern
slope of and the Merced River drains the wetter, western slopes [Dettinger et al., 2004]. The
majority of the streamflow comes from the snowpack in these watersheds. Bear Creek and
Pitman Creek are in the more south-central region of the Sierra Nevada and part of the San
Joaquin River system. Their drainage areas are in the range 19.6 km2 -716.4 km2 and ele-
                                              6


Figure 1.3 Annual hydrographs for nine watersheds. Values reflect monthly average over
the 30-year period between 1989 and 2018.
vation of 2159.8-3244.56 m. These watersheds have been referenced to have least-disturbed
hydrologic condition and are free from current human influences like dams, diversions, and
major land-use changes [Falcone, 2011]. They have been included in previous hydroclimatic
studies that examined the long-term changes in discharge [Stewart et al., 2005, Peterson
et al., 2005, Yang et al., 2018, Krogh et al., 2020] and forest and ecological monitoring
[Podolak et al., 2015, Stevens et al., 2016, Loheide et al., 2009]. The 30-year climate normal
(1980–2010) from PRISM indicate mean annual precipitation in the range of 980 - 1550 mm.
Monthly precipitation, temperature, and runoff for individual watersheds are shown in Fig.
1.3. These watersheds have varying vegetation coverage and types with a combination of
woody plants, shrubs, and meadows (Fig. 1.4, Table 1.1). Dominant tree species include
mixed conifers such as lodgepole pine (Pinus contorta), Ponderosa pine (Pinus ponderosa),
Jeffrey pine (Pinus jeffreyi), Douglas fir (Pseudotsuga menziesii), red fir (Abies magnifica),
and incense cedar (Calocedrus decurrens) [North, 2012]. Chaparral and montane shrubs are
also common at lower elevation foothills. Soils are either granitic or volcanic, which have
origins from glaciers deposits.
                                               7


Figure 1.4 Land-use classification for the 9 watersheds based on National Land Cover
Database 2006 Classification.
                                              8


1.2.2    Data
Monthly precipitation and temperature observations were obtained from the AN81m PRISM
dataset [Daly et al., 2008]. This gridded dataset has a resolution of 4-km, covers the entire
continental US from January 1981 to present, and is continuously updated every 6 months.
Catchment-average time series were constructed by computing the arithmetic mean for pre-
cipitation and temperature values of all grid points that fall within watershed polygon. 31
years of record between 1988 and 2018 were considered in the study. These time series were
later used to compute Standardized Precipitation Evapotranspiration Index (SPEI) values
(described in Sect. 1.2.3).
    Streamflow data was obtained through the USGS National Water Information System
(NWIS) (https://waterdata.usgs.gov/nwis/sw) for the 9 watersheds. The observation
records are mostly complete with less than 5 days of missing data in a given year. Daily
discharge were aggregated to obtain monthly and annual values for the water years in the
period 1989-2018. These values were then normalized by catchment area.
    Two vegetation indices are currently produced from the the Moderate Resolution Imaging
Spectrometer (MODIS) sensor, Enhanced Vegetation Index (EVI) and Normalized Difference
Vegetation Index (NDVI). For our study, we chose EVI over the more commonly used NDVI
because EVI has been reported to be less sensitive to soil and atmospheric effects than
NDVI and remains sensitive to increases in vegetation density beyond where NDVI becomes
saturated [Huete et al., 2002, Waring et al., 2006]. EVI values were obtained from MODIS at
500-m spatial resolution and 16-day compositing period from the MOD13Q1 dataset [Didan,
2015]. We focus on the vegetation activity during the growing season (May-Sep) as these
are times when demand for water is high. To ensure quality of the pixels and reduce the
possible bias in the resulting EVI values, we removed cloud and snow contaminated pixels
before computing watershed average. As vegetation EVI typically are in the range of 0-1, we
removed pixel with values outside of this range to minimize the possibility of added noise in
the data. All remote sensing images were processed using Google Earth Engine. Due to the
                                                9


availability of the data, 18 years of EVI between 2001 and 2018 water years are used in our
study. We also excluded the month of April in our analysis due to large number of pixels are
still covered with snow. 16-day values were linearly interpolated into daily time series and
subsequently aggregated to obtain monthly EVI value for the months of May-Sep for each
watershed.
     Total ET was retrieved from MOD16A2 Version 6 [Mu et al., 2013] at 8-day temporal
and 500-m pixel resolutions for each watershed for water years between 2001 and 2018.
A water year begins on October 1st of the previous year and ends on September 30th.
MODIS ET was processed in a similar procedure to EVI data. The algorithm used for the
MOD16A2 data product collection is based on the logic of the Penman-Monteith equation,
which includes inputs of daily meteorological reanalysis data along with MODIS remotely
sensed data products such as vegetation property dynamics, albedo, and land cover. Cloud-
contaminated pixels were included from our analysis. 8-day ET time series at each watershed
was linearly interpolated to obtain monthly values.
1.2.3    Quantifying droughts using standardized precipitation evapotranspiration
         index (SPEI)
SPEI was first proposed by Vicente-Serrano et al. [2010] as an improved drought index of
SPI and has since been widely used in many studies to capture drought propogations and
reconstructions [Allen et al., 2011, Li et al., 2012, Labudová et al., 2017, Manzano et al.,
2019]. An advantage of SPEI over SPI is that it accounts for the effect of temperature in the
drought development and climate water balance, defined as the difference between precipita-
tion and potential evapotranspiration (PET), and therefore provides a more reliable measure
of drought severity than only considering precipitation [Beguerı́a et al., 2014]. This differ-
ence is used as the input in the computation of SPEI. Thornthwaite equation [Thornthwaite,
1948] is used for estimation of PET. Alternatively, Penman–Monteith method (PM) can be
used to estimate PET but often requires more extensive data (solar radiation, temperature,
                                               10


wind speed and relative humidity) and more parameters, and long-term records of these vari-
ables are not always available. Previous studies have demonstrated that the two methods
often yield comparable results. As the log-logistic distribution has been shown to provide
better fit than other distributions [Vicente-Serrano et al., 2010, Beguerı́a et al., 2014], we
adopted this method to obtain SPEI series in standardized z units. We used R package SPEI
[Beguerı́a et al., 2013] to compute SPEI. Watersheds are considered to experience drought
when n-month SPEI is less than −0.5 [Yang et al., 2016]. The lower the SPEI value, the
more severe the drought condition (Table 1.2).
    SPEI values can be calculated for different time scales using the cumulative water balance
over the previous n-months. To understand the timescale at which SPEI affects vegetation
activity, we calculated 1-, 3-, 6-, and 12-month SPEI for each watershed and obtained the
Pearson’s correlation between monthly EVI and each of these SPEI time series for the growing
season months (May-Sep) in the period 2001-2018. We consider a significance threshold of
α < 0.01. The annual SPEI is the 12-month SPEI value for the month of September
and represents accumulated water balance at the end of the water year [Feng et al., 2020].
Drought develops gradually and water deficit can accumulate over a long period of time,
making it difficult to pinpoint drought and quantify their duration, magnitude in time and
space with a single variable or metric [Mishra and Singh, 2011, Vicente-Serrano et al., 2010].
In order to access the impact of drought on the catchment’s annual water partitioning, we
characterized a year as a drought year if its annual SPEI was less than −0.5.
           Table 1.2 Drought level classifications based on SPEI [Yao et al., 2018].
                                SPEI values            Categories
                               SPEI > −0.5            No drought
                           −1.0 < SPEI ≤ −0.5        Mild drought
                           −1.5 < SPEI ≤ −1.0      Moderate drought
                            2.0 < SPEI ≤ −1.5       Severe drought
                                SPEI ≤ -2.0        Extreme drought
                                               11


1.2.4   Assessing drought impacts of catchment water balances using the Budyko
        framework
The Budyko framework, which relates the dependence of actual evapotranspiration on en-
ergy availability represented by the potential evaporation and water availability represented
by the precipitation, has been successfully used to understand and predict the climatic and
landscape controls on long-term water balance [Budyko, 1974]. Recent studies have shown
success in extending the framework to investigate between climate, vegetation in the hydro-
logic cycle [Donohue et al., 2007, Li et al., 2012] and study interannual variability in water
partitioning at individual catchments and [Carmona et al., 2014, Koster and Suarez, 1999,
Yang et al., 2007, Cheng et al., 2011]. Figure 1.5 shows the relationship of water pools in
the 9 catchments plotted on the hypothetical Budyko framework.
 Figure 1.5 Long-term annual water balance represented by theoretical Budyko framework.
                                               12


1.3     Results and discussion
1.3.1    Characterizing annual drought using 12-month SPEI
As shown in Fig. 1.6, the 12-month SPEI time series capture the development of the ma-
jor multi-year drought periods in California in the last 40 years including the 1988–1992,
2007–2009, and more recently 2012–2015 droughts [He et al., 2017] across the watersheds.
The severity of the 2014/15 drought, which broke many historical records [Funk et al., 2014],
is also well reflected with SPEI reaching well below -2. At the annual scale, we identified
between 9 and 12 drought years at individual watersheds (Fig. 1.3, Table 1.3).
         Figure 1.6 12-month SPEI time series for 30 years between 1989 and 2018.
                                            13


Figure 1.7 Annual SPEI for the period 1989-2018. Annual SPEI is calculated as the
cumulative water balance for the 12-month period in the water year. Drought years with
annual SPEI < -0.5 are indicated by black dots.
1.3.2    Trends in vegetation response to drought at annual timescale
Differences in geography and vegetation composition can affect vegetation’s response to
droughts (Fig. 1.8). There is a general pattern of decrease in vegetation activities during
drought years in Bear Creek, EF Carson River, and Merced River, which locate in the south-
central part of the Sierra Nevada. We also observe an apparent trend in earlier vegetation
greening and peak in EVI among 8 out of 9 watersheds (except for Bear Creek) in drought
years. As transpiration in conifer forests in the Sierra Nevada is broadly temperature-
limited in winter and water-limited in late summer [Royce and Barbour, 2001], earlier onset
of snowmelt during drought years likely provides soil moisture and warmer air temperature
conditions can potentially drive photosynthesis and support vegetation growth at the be-
ginning of the growing season. Late into the summer months (Jul-Aug), the melt water
supply diminishes and drought stress likely resulted in lesser vegetation activity indicated
by lower EVI across all 9 watersheds. This trade-off in longer growing yet lower mountain
forest productivity under drought-induced snowpack reduction may be a scenario in future
climate [Knowles et al., 2018, Goulden and Bales, 2014]. The summer droughts therefore
will likely have detrimental effects on summer ecosystems in this region.
                                              14


Figure 1.8 Growing season EVI (May-Sep) during drought and non-drought conditions for
water years between 2001 and 2018. Drought years were identified using the annual SPEI
value. Solid lines indicate the median values and shaded areas represent the Q1 and Q3 of
the monthly values.
1.3.3     Impact of drought on precipitation-runoff relationship and catchment wa-
          ter balance
We observe a shift in precipitation-ruff relationship at 9 watersheds (Table 1.3) where there
is a decrease in mean runoff ratio in drought years compared to non-drought years. The
magnitude of the shift varies among the watersheds where the runoff ratio of Bear Creek
is relatively resilient to droughts compared to the other watersheds. This is our first line
of evidence that supports our hypothesis that the P-Q relationship changes under drought
conditions. Monthly runoff (normalized by annual precipitation) shows a shift in earlier peak
flow timing in a number of watersheds (Bear Creek, Pitman Creek, and Sagehen Creek) and
highlights the impacts of droughts on summer water availability (Fig. 1.9). Noticeably, there
is significantly less runoff in the months of May and June in drought years. This reduction
                                               15


Figure 1.9 Fraction of monthly runoff to annual precipitation for the period 2001-2018 for
non-drought and drought years.
have important implications on human and ecosystem water needs as it occurs during the
beginning of the peak water demand and summer growing season.
                                            16


Figure 1.10 Fraction of monthly ET to annual precipitation in the period 2001-2018 for
non-drought and drought years.
    While elevated temperatures associated with drought conditions can result in drier, hotter
atmospheric conditions favorable to drive ET, climate model simulations suggests that the
early spring greening can result in soil moisture deficits and decrease surface runoff persisting
well into the summer months [Lian et al., 2020]. It is evident that there is an increase in
ET as a fraction of available P during drought years in the watersheds under study (Fig.
1.10). We also see that the large decrease in runoff in the month of June coincides with the
peaks in EVI observed in Fig. 1.8 at Blackwood Creek, Ward Creek, and EF Carson River.
Because runoff is influenced not only by ET, but by many other factors, it is difficult to
conclude whether the observed reduction in Q/P fraction during droughts can be entirely
attributed to earlier vegetation activity and water uptake. Previous studies have suggested
that growing season in Sierra Nevada is restricted to a brief window in late spring and early
summer when air temperatures are warm enough for photosynthesis and melt-supplied soil
moisture remains plentiful [Dettinger et al., 2004, Goulden and Bales, 2014]. It makes sense
that during droughts, there is higher priority allocation of water to replenish the soil moisture
deficit and thus shifting the fraction of local P partitioned to Q.
                                               17


Figure 1.11 Annual runoff ratio plotted against annual SPEI for 1989-2019. Best-fit line
was obtained using simple linear regression.
    We further observe this trend in Fig. 1.11 where there is a significant positive linear
relationship between annual SPEI and annual runoff ratio in all watersheds (p < 0.05). In
other words, drought severity may have a direct effect on the magnitude of the partitioning
shift. However, such relationship varies among the watershed where annual SPEI explains
87% of variations in runoff ratio in Ward Creek but only 13% in Bear Creek. This highlights
the fact that sensitivity of catchment function to drought likely differs among watersheds and
is an interplay of both climate and catchment properties [Renner et al., 2012, Veettil et al.,
2018], which, in our study, possibly includes vegetation cover and elevation among others
(Appendix A.2, Fig. A.2).Possibly, the generation of high-elevation runoff such as that in
Bear Creek, which is more resilient to increases in PET due to overall lower temperatures,
can help mitigate runoff losses [Goulden et al., 2012]. Given the orographic effect of the
Sierra Nevada, watersheds at high elevations may also be less susceptible to decreases in
precipitation. These are consistent with previously studies conducted in the Sierra Nevada
                                               18


[Avanzi et al., 2020]. Prior studies, including Saft et al. [2015], Potter et al. [2011], and Avanzi
et al. [2020], used linear regression-based approaches to identify factors associated with
drought-induced changes to the precipitation–runoff relationship in Australia and California.
A better and more robust understanding and prediction on the impacts of drought on P-
Q relationship will benefit from consideration both only meteorological characteristics and
catchment variables.
1.3.4    Catchment balance under the Budyko framework
It is expected that under natural climate variability, individual catchments can move in all
directions through Budyko “space” [Van der Velde et al., 2014]. In Fig. 1.12, in 8 watersheds
exept Bear Creek, there is a tendency of the catchment in Budyko space moving towards
the theoretical water (horizontal) and energy (vertical) limits during drought years. In these
years, the larger evaporative demand and increase in forest ET (higher AET/P) associated
with warmer temperature (higher PET/P) allows for a higher fraction of water turned into
latent heat. On the other hand, forests may have reacted to increased temperature by
increasing their ET. The combined effect likely at the expense of runoff (lower Q/P). While
Loarie et al. [2009] pointed out that the temperature increase associated with climate change
is relatively slow in mountainous biomes including subtropical coniferous forests such as
the Sierra Nevada, we suspect that the combined effect of lower precipitation and higher
temperature associated with drought conditions may drive the eco-hydrology change in this
region.
    The Budyko framework, however, considers allocation of water relative to the aridity
index, a combination of two major water balance drivers (PET and precipitation), rather
than precipitation alone.
                                                19


Figure 1.12 Interannual variability of catchment water balance under the Budyko
framework for water years in period 2001-2018. Black dot is the catchment centroid and
indicates 18-year average.
   The Budyko framework governs available water partitioning by physical behavior under
limit conditions (when the aridity index is zero, all water goes to runoff; when the aridity
index is one, all water goes to ET) and allows for the possibility that even expected and
predictable water balance changes during drought may be nonlinear and that some shifts
observed in other studies may be the result of factors that are not captured in a two-
dimensional precipitation– runoff plane [Maurer et al., 2022]. The Budyko framework can
be leveraged to model more predictable regime versus less predictable partitioning shifts
during droughts. Further research is needed to analyze a more comprehensive set of feedback
mechanisms and compare the Budyko framework to other nonlinear approaches.
                                             20


1.3.5    Limitations
We acknowledge that there are uncertainties associated with MODIS ET estimates and
sources of error have been linked to LAI and meterological data quality, sensor calibration,
and atmospheric corrections [Demarty et al., 2007, Mu et al., 2011]. While the ground
data from the eddy covariance flux towers provide the best ET estimates, these are not
spatially consistent and scaling from tower to watershed scale poses as a challenge due to
the heterogeneous landscape, particularly among larger basins such as Merced River and
EF Carson River in our study. Evaluation results of MOD16 ET over the conterminous
United States using point and gridded FLUXNET and water balance ET by Velpuri et al.
[2013] indicate that MOD16 ET products effectively reproduced basin scale ET response
(up to 25% uncertainty) compared to CONUS-wide point-based ET response (up to 50–60%
uncertainty), illustrating the reliability of MODIS ET products for basin-scale ET estimation.
1.4     Conclusions
In this study, we jointly explored the impacts of drought on the ecohydrological processes
at 9 small and medium-sized watersheds in the Sierra Nevada. We found a general trend
in earlier peak in vegetation activity during drought years compared to non-drought years
watersheds. A similar trend is observed in peak runoff timing. This is likely due to warmer
temperature. Significant, positive linear relationship between annual SPEI and runoff ratio
suggests that drought conditions may affect runoff generation processes and cause changes
in the P-Q relationship. Catchment properties such as elevation and vegetation cover can
affect streamflow elasticity or catchment resilience to disturbance.
    As climate change is expected to increase the odds of worsening drought in many parts
of the United States, our study shows that both ecological and hydrological processes in
the Sierra Nevada watersheds may be vulnerable to drought conditions. Future hydrologic
modeling research could identify threshold responses in watersheds to changes in precipita-
tion deficit and temperature associated with drought, which can provide relevant insights
                                                21


for water and natural resource management.
                                           22


                    Table 1.3 Mean runoff ratio for drought and non-drought years for water years 1987-2018.
                                   Non-drought                                Drought
       Watersheds                                                                                      Percentage difference
                       Mean runoff ratio   Number of years      Mean runoff ratio Number of years
        Bear Creek           0.58               21                    0.50              9                      -13   %
     Blackwood Creek         0.70               19                    0.51             11                      -27   %
23      Cole Creek           0.82               19                    0.60             11                      -26   %
     EF Carson River         0.48               19                    0.34             11                      -29   %
      General Creek          0.64               19                    0.41             11                      -36   %
       Merced River          0.63               18                    0.53             12                      -16   %
      Pitman Creek           0.61               20                    0.38             10                      -37   %
      Sagehen Creek          0.38               20                    0.24             10                      -36   %
       Ward Creek            0.59               20                    0.37             10                      -37   %


                                       CHAPTER 2
     EVALUATION OF RANDOM FORESTS FOR SHORT-TERM DAILY
               STREAMFLOW FORECASTING IN RAINFALL AND
                         SNOWMELT-DRIVEN WATERSHEDS
2.1     Introduction
Nearly all aspects of water resource management, risk assessment, and early-warning sys-
tems for floods rely on accurate streamflow forecast. Yet streamflow forecasting remains a
challenging task due to the dynamic nature of runoff in response to spatial and temporal
variability in rainfall and catchment characteristics. Therefore, development of skillful and
robust streamflow models is an active area of study in hydrology and related engineering
disciplines.
    While physical models remain a common and powerful tool for predicting streamflow,
machine learning (ML) models are gaining popularity due to some of their unique qualities
and potential advantages. Compared with the often labor-intensive and computationally ex-
pensive task of parameterizing in physical model [Tolson and Shoemaker, 2007, Boyle et al.,
2000], ML models are data-driven and can identify patterns in the input-output relationship
without explicit knowledge of the physical processes and onerous computational demand.
To make up for their limited ability to provide interpretation of the underlying mechanisms,
ML models often require less calibration data than physical models, have demonstrated high
accuracy in their predictive performance, are computationally efficient, and can be used in
real-time forecasting [Adamowski, 2008, Mosavi et al., 2018]. ML models are particularly
useful when accurate prediction is the central inferential goal [Dibike and Solomatine, 2001],
whereas conceptual rainfall-runoff model can provide a better understanding of hydrologic
phenomena and catchment yields and responses [Sitterson et al., 2018]. Artificial neural net-
works (ANN), neuro-fuzzy (a combination of ANNs and fuzzy logic), support vector machine
(SVM), and decision trees (DT) are reported to be among the most popular and effective
                                              24


for both short-term and long-term flood forecast [Mosavi et al., 2018]. For example, Daw-
son et al. [2006] provided flood risk estimation at ungauged sites using ANN at catchments
across the United Kingdom. Rasouli et al. [2012] predicted streamflow at lead times of 1-7
days with local observations and climate indices using three ML methods: Bayesian neural
network (BNN), SVM, and Gaussian process (GP). They found BNN outperformed multiple
linear regression (MLR) as well as the other two ML models. Their study also found models
trained using climate indices yielded improved longer lead time forecasts (e.g., 5–7 days).
Tongal and Booij [2018] forecasted daily streamflow in four rivers in the United States with
SVR, ANN, and RF coupled with a baseflow separation method (i.e., separating the two
different components of streamflow into baseflow and surface flow). Obringer and Nateghi
[2018] compared eight parametric, semi-parametric, and non-parametric ML algorithms to
forecast urban reservoir levels in Atlanta, Georgia. Their results showed random forests (RF)
yielded the most accurate forecasts.
    Despite the promising results reported in existing literature, most ML streamflow fore-
cast applications are limited to watersheds where rainfall is the major contributor. In many
settings, particularly non-arid mountainous regions in Western USA, a combination of rain-
fall and spring snowmelt can drive streamflow [Johnstone, 2011, Knowles et al., 2007]. The
amount of snow accumulation and its contribution to discharge also vary among the water-
sheds [Knowles et al., 2006]. Both watershed-scale hydrologic and statistical models have
been used to assess the current and future stream hydrology and associated flood risks
[Salathé Jr et al., 2014, Wenger et al., 2010, Tohver et al., 2014, Pagano et al., 2009]. Safeeq
et al. [2014] simulated streamflows in 217 watersheds at annual and seasonal time scales using
the Variable Infiltration Capacity (VIC) model at 1/16◦ and 1/20◦ spatial resolutions. The
study found that the model was able to capture the hydrologic behavior of the studied wa-
tersheds with a reasonable accuracy. Yet the authors recommend careful site-specific model
calibration, using not only streamflow but also snow water equivalent (SWE) data, would be
expected to improve model performance and reduce model bias. Pagano et al. [2009] applied
                                                25


Z-score regression to daily SWE from Snow Telemetry (SNOTEL) stations and year-to-date
precipitation data to predict seasonal streamflow volume in unregulated streams in Western
US. The authors reported the skill of these forecasts is comparable to the official published
outlooks. A natural question is whether ML models can produce comparable performance
in these watersheds where streamflow contributions come from a mixture of snowmelt and
rainfall, as well as where snowmelt dominates sources. Considering the prominent role of
snowpack in water management and contribution of rapid snowmelt in flood events, such
question is worth exploring. To this end, we evaluate the potential of RF in making short-
term streamflow forecast at 1-day lead time across 86 watersheds in the Pacific Northwest
Hydrologic Region (Fig. 2.1). The U.S. Geological Survey [2020] defines this region as hydro-
logic region 17 or HUC 17. HUC-17 consists of sub-basins and watersheds of the Columbia
River that span varying hydrologic regimes. The selected watersheds have long-term record
of unregulated streamflow and different streamflow contributions of rainfall and snowmelt.
Drainage basin factors such as topography, vegetation, and soil can affect the response time
and mechanisms of runoff [Dingman, 2015]. Few studies attempted to account for or re-
port these effects on models’ performance. Without such consideration, it is difficult to
determine if a data-driven model can be generalized to watersheds not included in the given
study. Therefore, our objectives are (1) to examine and compare the performance of RF in
a number of watersheds across hydrologic regimes and (2) to explore the role of catchment
characteristics in model performance that are overlooked in previous studies.
    In practice, RF can be trained to forecast streamflow at various timescales, depending
on the input variables provided. Rasouli et al. [2012] forecasted streamflow at 1-7 day
lead times using three ML models and data from combinations of climate indices and local
meteo-hydrologic observations. The authors concluded that models with local observations
as predictors were generally best at shorter lead times while models with local observations
plus climate indices were best at longer lead times of 5–7 days. Also, the skillfulness of all
three models decreased with increasing lead times. In our study, we focused on 1-day lead
                                              26


Figure 2.1 (a) Elevation (m) shading map showing the Pacific Northwest Hydrologic Unit,
86 selected stream gauges (triangles), and their drainage area (cyan delineation lines), and
SNOTEL stations (brown squares). Examples of annual hydrographs of (b)
rainfall-dominated, (c) transient, and (d) snowmelt-dominated watersheds. Figures (b-d)
are based on 2009-2018 daily flow data at three sites 12043300 (48.2◦ N, 124.4◦ W),
12048000 (48◦ N, 123.1◦ W), and 10396000 (42.7◦ N, 118.9◦ W).
                                             27


time forecasting and therefore did not include long-term climate information. At longer lead
times, changes in weather conditions would likely exert much greater control on runoff and
the performance of the model.
    We select RF to forecast streamflow for two reasons. First, RF has been referenced
to deliver high performance in short-term streamflow forecasts [Mosavi et al., 2018, Pa-
pacharalampous and Tyralis, 2018, Li et al., 2019, Shortridge et al., 2016], making it a good
candidate for our study. Second, RF allows for some level of interpretability. This is deliv-
ered through two measures of predictive contribution of variables: mean decrease in accuracy
(MDA) and mean decrease in node impuritiy (MDI). These two measures have been widely
used as means for variable selection in classification and regression studies in bioinformatics
[Chen and Ishwaran, 2012], remote sensing classification [Pal, 2005], and flood hazard risk
assessment [Wang et al., 2015]. The interpretability of a ML model, however, can be a con-
troversial subject and remains an active area of study [Ribeiro et al., 2016, Carvalho et al.,
2019]. Both model-agnostic,such as permutation-based feature importance [Breiman, 2001],
and model-specific, such as gini-based for RF [Breiman et al., 1984] and gradient-based for
ANNs [Shrikumar et al., 2017], interpretation methods can provide useful insights into how
the ML models make their predictions. While the referred interpretability does not directly
translate to interpretation of the physical processes, it can provide insight into relationships
among predictors and streamflow response.
    The remainder of the paper is arranged as follows. Section 2.2 provides a brief introduc-
tion to RF, relevant parameters and selected evaluation criteria. Section 2.23 describes the
study area, datasets, and predictor selection. Results and discussion are given in Section 2.4
along with limitations and recommendation for future research. A summary and indication
of future work are is provided in Section 2.5.
                                               28


                            Figure 2.2 Structure of a RF and relevant parameters.
                                                                        Training sample                               Parameter
                                                                                                                        ntree
                                  Bootstrap 1                    Bootstrap 2                ...         Bootstrap n
                            Xi ≥ t1      Xi < t1                                                                        mtry
                       Xi ≥ t2                Xi < t2
               Xi ≥ t3                 Xi ≥ t4
                             Xi < t3                  Xi < t4
                                                                                            ...
                    s          s            s            s    s s          s           s            s s           s s nodesize
                                 Prediction                       Prediction             Prediction      Prediction
                                                                         Average of all
                                                                           predictions
2.2      Methods and data
2.2.1    Random forests
Proposed by Breiman [2001], RF is a supervised, non-parametric algorithm within the de-
cision tree family that comprises an ensemble of decorrelated trees to yield prediction for
classification and regression tasks. Non-parametric methods such as RF do not assume any
particular family for the distribution of the data [Altman and Bland, 1999]. Since a single
decision tree can produce high variance and is prone to noise [James et al., 2013], RF ad-
dresses this limitation by generating multiple trees where each tree is built on a bootstrapped
sample of the training data (Fig. 2.2, Algorithm 2.1). Each time a binary split is made in a
tree (also known as split node), a random subset of predictors (without replacement) from
the full set of predictor variables is considered (Fig. 2.2). One predictor from these candi-
dates is used to make the split where the expected sum variances of the response variable
in the two resulting nodes is minimized (Algorithm 2.1, Step 3). The randomization process
in generating the subset of the features prevents one or more particularly strong predictor
from getting repeatedly chosen at each split, resulting in highly correlated trees [Breiman,
                                                                                    29


2001]. After all the trees are grown, the forests make prediction on a new data point by
having all trees run through the predictors. In the end, the forests cast a majority vote on
a label class for classification task or produce a value for regression task by averaging all
predictions. Breiman [2001] provided full details on RF and its merit. The randomForest
package in R developed by Liaw et al. [2002] was used for model training and validation in
our study. The step-by-step of building a regression RF follows:
Algorithm 2.1 Building a regression RF
Step 1: n bootstrap samples are drawn from training set, each has the same size as the
training sample. This is also known as ntree or number of trees in the forest.
Step 2: At each binary node split, a subset of mtry predictors, Xi , is randomly selected
from p predictor space, Ωp , that results in Xi ∈ Ωp for {i ∈ 1,..., mtry}, mtry < p.
Step 3: The single best combination of predictor Xi among X predictor variables and
threshold t is selected to split the observations, yj , into binary regions R1 = { yj |Xi < t }
and R2 = { yj |Xi ≥ t } that minimize:
                                X                        X
                                       (yj − ŷR1 )2 +          (yj − ŷR2 )2             (2.1)
                              j:yj ∈R1                 j:yj ∈R2
where ŷR1 is the mean of observations in R1 and ŷR2 is the mean of observations in R2 .
Step 4: Repeat step 2-3 until all terminal region contains less than nodesize observations.
    Due to sampling with replacement, some observations may not be selected during the
bootstrap. These are referred to as out-of-bag or OOB and used to estimate the error
of the tree on unseen data. It has been estimated that approximately 37% of samples
constitute OOB data [Huang and Boutros, 2016]. An average OOB error is calculated for
each subsequently added tree to provide an estimate of the performance gain. The OOB error
can be particularly sensitive to the number of random predictors used at each split mtry and
number of trees ntree [Huang and Boutros, 2016]. Generally, the predictive performance
improves (or OOB error decreases) as ntree increases. However, recent research has shown
that depending on the dataset, there is a limit for number of trees where additional growing
does not improve performance [Oshiro et al., 2012]. It has been advised that mtry is set
to no larger than 1/3 of total number of predictors for optimal regression prediction [Liaw
                                                    30


et al., 2002], which is also the default value in randomForest function in R and widely
adopted in literature. Nevertheless, Huang and Boutros [2016] found that this value is
dataset-dependent and could be tuned to improve the performance of RF. Bernard et al.
[2009] argued that the number of relevant predictors highly influences optimal mtry value.
In this study, we select the optimal mtry using an exhaustive search strategy, in which all
possible values of mtry are considered, using R package Caret [Kuhn et al., 2008]. While all
considered parameters might have an effect on the performance of RF, we chose to focus on
two parameters, ntree and mtry, for a number of reasons. The main reason is that these
two parameters were originally introduced by Breiman [2001] in the development of RF
algorithm. Second, ntree in a forest is a parameter that is tunable but not optimized and
should be set sufficiently high [Oshiro et al., 2012, Probst et al., 2019] for RF to achieve good
performance. It has been theoretically proven that more trees are always better [Probst et al.,
2019]. In other words, optimal ntree value can go to infinity. The reduction in error, however,
becomes negligible after a sufficiently large number of trees. Furthermore, empirical results
provided in previous works suggest that mtry is the most influential out of parameters in RF
[Bernard et al., 2009, Van Rijn and Hutter, 2018, Probst et al., 2019]. Figure 2.2 illustrates
the step-by-step operating principle of growing RF and its the relevant parameters.
2.2.2    Variable importance in random forests
In addition to assessing a model’s overall predictive ability, there is also interest in under-
standing the contribution of each predictor variable to model performance. There are two
built-in measures for assessing variable importance in RF: mean decrease in accuracy (MDA)
and mean decrease in node impurity (MDI). Both were developed by Breiman [Breiman et al.,
1984, Breiman, 2001]. After all trees are grown, OOB data during training is used to com-
pute the first measure. At each tree, the mean squared error (MSE) between predicted and
observed is calculated. Then the values of each of the p predictors are randomly permuted
with other predictor variables held constant. The difference between the previous and new
                                                 31


MSE is averaged over all trees. This is considered the predictor variable’s MDA [Liaw et al.,
2002] and values are reported in percent difference in MSE. The procedure is repeated for
each predictor variable. Given that there is a strong association between a predictor and
response variable, breaking such bond would potentially result in large error in the pre-
diction (i.e., large MDA). MDA value can be negative where a predictor has no predictive
power and adds noise to the model.      Strobl et al. [2007], however, expressed caution that
permutation-based measures such as MDA could show a bias towards correlated predictor
variables by overestimating their importance, particularly in high-dimensional data sets.
    The second method, MDI, measures the each time a predictor is selected to make a split
during training. It is based on the principle that a binary split only occurs when residual
errors (or impurity) of two descendent nodes are less than that of their parent node. The
MDI of a predictor is the sum of all gains across all trees divided by the number of trees.
Because the scale of MDI depends on values of response variable, raw MDI provides little
interpretation. Following Wang et al. [2015], we computed relative MDI for each variable,
which in our case is calculated by dividing each predictor variable’s MDI by the sum of MDI
from all predictors at each watershed. When scaled by 100, this relative MDI is a percentage
and can be interpreted as the relative contribution of each predictor to the total reduction
in node impurities. In the case where a predictor makes no contribution during the splitting,
the relative MDI would be effectively zero. For both measures, the larger the value, the more
important the predictor.
2.2.3     Benchmark models
FWe benchmark the performance of RF during the validation period against multiple linear
regression (MLR) and simple naı̈ve models using the calculated Pearson correlation coeffi-
cient (r ) between forecasted and observed values for each model. In naı̈ve model, we assume
“minimal-information” scenario and the best estimate of the streamflow from the next day
is the observed value from current day [Gupta et al., 1999]. Its r, in this case, is the 1-day
                                              32


autocorrelation coefficient in the time series and measures of the strength of persistence. We
train and verify MLR model using same data sets and predictors supplied to RF model.
2.2.4    Performance evaluation criteria
There exist different model performance criteria and each provides unique insights on the
correspondence between forecasted and observed streamflow values. While r and its square,
namely coefficient of determination (R2 ), are often used, Legates and McCabe Jr [1999]
discussed the limitation of these two measures where they were reported to be especially
oversensitive to extreme values or outliers. The authors recommended that absolute error
measures (i.e., root mean squared error or mean absolute error) and goodness-of-fit measure,
such as the Nash-Sutcliffe efficiency (NSE), could provide more reliable and conservative
assessment of the models. Kling-Gupta efficiency (KGE) is a relatively new metric that
was developed based on a decomposition of NSE [Gupta et al., 2009]. This goodness-of-fit
measure is gaining popularity as a benchmark metric for hydrologic models by addressing
several shortcomings diagnosed with NSE. For these reasons, we selected the following four
criteria to evaluate RF performance: R2 , RMSE, MAE, and KGE. These criteria cover
various aspects of model’s performance and also provide intuitive interpretation as explained
in the remainder of this section.
    R2 can be interpreted as the proportion of the variance in the observed values that can
be explained by the model. Values are in the range between 0 and 1 where 1 indicates the
model is able to explain all variation in the observed dataset.
                                                                    2
                                          PN
                                            (ŷi − ŷ)(yi − y)      
                                         i=1
                              2
                                                                    
                            R =  s N                s              
                                                                                         (2.2)
                                   P                   PN           
                                          (ŷi − ŷ)2      (yi − y)2
                                       i=1              i=1
where N is total number of the observations during the validation period, ŷi and yi are the
forecasted and observed values at day i respectively.
                                                 33


     MAE provides an average magnitude of the errors in the model’s predictions without
considering the direction (underestimation or overestimation).
                                                 PN
                                                     |ŷi − yi |
                                                 i=1
                                       M AE =                                                (2.3)
                                                       N
     RMSE is the standard deviation of the residuals between the predictions and observations.
It is more sensitive to larger error due to the squared operation. Both MAE and RMSE scores
range between 0 and ∞ where a score of 0 indicates a perfect match between predicted and
observed data. The standardization in streamflow measurements (described in Sect. 3)
allows comparison of MAE and RMSE across gauges.
                                                v
                                                uN
                                                uP
                                                u (ŷi − yi )2
                                                t i=1
                                     RM SE =                                                 (2.4)
                                                          N
     KGE metric ranges between negative infinity and 1. While there currently is not a
definitive KGE scale, Knoben et al. [2019] showed KGE values in the range between 0.41
and 1 indicate the model improves upon the mean flow benchmark, which assumes the
predicted streamflow values equal to the mean of all observations. KGE value of 1 suggests
the model can perfectly reproduce observations. KGE is calculated as follows:
                                       p
                          KGE = 1 −       (r − 1)2 + (α − 1)2 + (β − 1)2                     (2.5)
     where r is the Pearson correlation coefficient, α is a measure of relative variability in the
forecasted and observed values, and β represents the bias:
                                           σŷ               µŷ
                                      α=        and β =                                      (2.6)
                                           σy                µy
where σŷ is the standard deviation in observations, σy is the standard deviation in forecasted
values, µŷ is the forecasted mean, and µy is observation mean.
     In hydrological forecast, one might be interested in the ability of the model to capture
more extreme events rather than the overall performance. This is particularly relevant
                                                 34


in flood risk assessment and flood forecasting where floods are associated with discharge
exceeding a high percentile (typically ≥ 90th )[Cayan et al., 1999]. The definition of “extreme”
depends on the objective of the study. Here, we adopt the peak-over-threshold method. For
the validation period, we calculated the 90th , 95th , and 99th percentile streamflow values at
each watershed. These are considered thresholds. If an observed daily streamflow exceeded
this threshold, it would be considered an extreme event. We measure the ability of RF to
capture these events using two additional criteria: probability of detection (POD) and false
alarm rate (FAR). The calculation followed as in [Karran et al., 2013].
                                           P (ŷi > ω|yi > ω)
                                  P OD =                                                    (2.7)
                                                 P (yi > ω)
and
                                          P (ŷi > ω|yi < ω)
                                   FA =                                                     (2.8)
                                               P (yi < ω)
where ω is a specified threshold.
2.2.5   Study area: Pacific Northwest Hydrologic Region
In this study, we focus on watersheds in the Pacific Northwest Hydrologic Region (Fig.
2.1). This region covers an area of 836,517 km2 and encompasses all of Washington, six
other states, and British Columbia, Canada. For the purpose of maintaining consistency
in monitoring protocol and data, we only consider watersheds on the US territory. The
Columbia River and its tributaries make up the majority of the drainage area, traveling
more than 2000 km with an extensive network of more than 100 hydroelectric dams and
reservoirs have been built along these river channels. Hydropower in the Columbia River
Basin supplies approximately 70 percent of Pacific Northwest energy [Payne et al., 2004].
Flood control is also an important aspect of reservoir operation in this region.
    The north-south running Cascade Mountain Range divides the region into eastern and
western parts and strongly influence the regional climate. The windward (west) side of the
mountain receives an ample amount of winter precipitation compared to the leeward (east)
                                                35


side. When temperature falls near freezing point, precipitation comes in the form of snow and
provides water storage for dry summer months. Summers tend to be cool and comparatively
dry. East of the Cascades, summer rainfall result from rapidly built thunderstorm and
convective events that can produce flash floods [Mass, 2015]. For this region, proximity
to the ocean creates a more moderate climate with a narrower seasonal temperature range
compared to the inland areas, particularly in the winter. Spatial trends and variations in
annual mean temperature, total precipitation, drainage area, and elevation of the watersheds
are shown in Fig. 2.3.
Figure 2.3 Gauge locations with color gradient indicating variations in (a) drainage area
(km2 ), watershed mean elevation (m), (c) annual precipitation (cm), and (d) annual mean
temperature (◦ C).
2.2.6    Data
The following section describes the different sources of data used in the study.
2.2.6.1   Streamflow
Our analysis uses streamflow data available through the USGS National Water Informa-
tion System (NWIS) (https://waterdata.usgs.gov/nwis/sw). From NWIS, we selected daily
streamflow time series for gauges using the following criteria: 1) continuous operation dur-
                                              36


ing the 10-year period between 2009 and 2018, 2) have less than 10 percent of missing
data, and 3) positioned in watersheds with “natural” flow that is minimally interrupted by
anthropogenic intervention. The third criterion was met using the GAGES-II: Geospatial
Attributes of gauges for Evaluating Streamflow dataset [Falcone, 2011] classification to iden-
tify watersheds with least-disturbed hydrologic condition and represented natural flow. We
performed additional screening by computing correlation coefficient between the respective
gauge and mean basin streamflow and removed those with a correlation of less than 0.5. We
also excluded small creeks with drainage area less than 50 km2 . In total, 86 watersheds were
selected (Fig. 2.1).
    Following methodology proposed in [Wenger et al., 2010], the watersheds were further
grouped into three classes of hydrologic regimes based on the timing of center-of-annual
flow, which is defined as the date at which half of the total annual flow volume is exceeded.
The annual flow calculations follow a water-year calendar that begins October 1st and ends
September 30th . These three hydrologic regimes include: “early” streams with flow time
< 150 (27 February), “late” streams with flow time > 200 (18 April), and “intermedi-
ate” streams with flow time between 150 and 200. These hydrologic regimes correspond to
rainfall-dominated, snowmelt-dominated, and transient or transitional (mixture of rain and
snowmelt) hydrographs, respectively. While this particular classification and its variants
have been used in various studies related to water resources in this region [Mantua et al.,
2009, Elsner et al., 2010, Vano et al., 2015], we adopted this partition in our study for two
reasons. First, as Regonda et al. [2005] pointed out, the classification provides a summary of
information about type and timing of precipitation, timing of snowmelt, and the contribu-
tion of these hydro-climatic variables to streamflow. This helps us assess model performance
in consideration of sources of runoff. Second, the classification provides a basis to generalize
the results to other watersheds that are not part of the study.
    On average, records at these watersheds have less than 3 percent missing data during
the 2009–2018 period. The drainage area of the watersheds range between 51 km2 and 3355
                                               37


km2 , and the mean elevation range from 239 m and 2509 m, estimated from 30-m resolution
digital elevation model.
2.2.6.2    Precipitation
Daily precipitation observations were obtained from the AN81d PRISM dataset [Di Luzio
et al., 2008]. This gridded dataset has a resolution of 4 km, covers the entire continental US
from January 1981 to present, and is continuously updated every 6 months. Best estimate
gridded value is derived by using all the available data from numbers of station networks in-
gested by the PRISM Climate Group. A combination of climatologically aided interpolation
(CAI) and radar interpolation were used in developing PRISM dataset. In our study, water-
shed daily precipitation time series were constructed by computing the arithmetic mean for
precipitation values of all grid points that fall within the given watershed.
2.2.7    Snow water equivalent and temperatures
SWE is defined as the depth of water that would be obtained if a column of snow were
completely melted [Pan et al., 2003]. Daily SWE data were retrieved from 201 SNOTEL
stations in HUC 17. These stations are part of the network of over 800 sites located in
remote, high-elevation mountain watersheds in the western U.S. The elevation of these sta-
tions are in the range of 128 m and 3142 m. At SNOTEL sites, SWE is measured by a
snow pillow—a pressure sensitive pad that weighs the snowpack and records the reading via
a pressure transducer. As the temperature shift is the primary trigger for snowmelt, daily
maximum temperature (TMAX) and minimum temperature (TMIN) from SNOTEL sensors
were also retrieved and included as predictors for streamflow. The obtained data reflected
the last measurement recorded for the respective day at each site. We only supplied the last
measurement from SNOTEL stations because not all predictors have sub-daily values. The
dataset is mostly complete, with 99.6 %, 99.6 %, and 99.9 % of the observations available
for three variables TMAX, TMIN, and SWE respectively. Because of the sparse coverage
                                               38


of SNOTEL sites, daily average values were calculated at USGS basin level (6-digit Hydro-
logical Unit), similar to the currently reported snow observations from National Water and
Climate Center (www.wcc.nrcs.usda.gov/snow/snow map.html), and subsequently applied
to the watersheds located in that basin. There is a total of 15 basins, each contains a num-
ber of SNOTEL stations in the range between 6 and 30 (Table S2 in the Supplement). It is
noted the in situ data from these of stations cannot capture the spatial variability of snow
accumulation and computing an area-averaged snowpack value from observations remains a
challenging task [Mote et al., 2018]. The SNOTEL averages therefore represent first-order
estimates of snow coverage and temperature conditions.
2.2.7.1    Predictor selection
Future daily mean streamflow (Qt+1 ) is the response variable in our study. We attempt to
explain the variability in Qt+1 using eight relevant predictors from the three datasets (Table
2.1). Selection of predictors is based on thorough review of the literature from previous
studies and our understanding of the hydrology of this region. Specifically, precipitation (Pt )
is intuitively a driver of streamflow. SW Et provides storage information on the amount of
accumulated snow available for runoff and is influenced by changes in temperature (T M AXt
and T M INt ). Given that there is high temporal correlation in daily temperatures, TMIN and
TMAX data can provide useful signal to our streamflow forecast. Previous day streamflow
(Qt ) is particularly important due to high degree of persistence that exist in the time series.
A hydrological year consists of 73 pentads where each comprises of five consecutive days and
observation for each day is indexed with a pentad value between 1 and 73. Data preprocessing
showed moderate to strong non-linear temporal correlation between daily streamflow and the
pentad at each gauge. We also derived two variables: sum of 3-day precipitation (P 3t ) and
snowmelt (SDt ) from available data. Inclusion of 3-day precipitation was to account for
large winter storms that can last for several days, which often result in surges in streamflow.
SDt was calculated as the difference between SWE at day t and t − 1. A positive value of
                                               39


                                   Table 2.1 List of predictors.
       No.            Predictors               Index      Unit            Source
         1       Streamflow at day t             Qt      m3 s-1            USGS
         2           Precipitation               Pt       mm              PRISM
         3   Sum of 3-day precipitation         P 3t      mm      Derived from PRISM
         4      Snow water equivalent         SW Et       mm             SNOTEL
                                                           ◦
         5     Maximum temperature           T M AXt         C           SNOTEL
                                                           ◦
         6     Minimum temperature           T M INt         C           SNOTEL
         7    Snowmelt (SWt - SWt−1 )           SDt       mm     Derived from SNOTEL
         8              Pentad                P ENt          -               -
SDt indicates snow accumulation and negative value indicates melt.
    Soil moisture is also a relevant variable in streamflow modeling as it controls the partition
between infiltration and runoff of precipitation [Aubert et al., 2003]. However, soil moisture
data is often limited and incomplete, especially at daily interval and therefore not included
in this study. The data were divided into two sets: training consisting of seven years 2009–
2015 and a validation set of three years 2016–2018. We standardized training and validation
data at each gauge using min-max scaling. First, we computed the min and max values
from training data sets for each of the predictor and response variables at each watershed.
These min and max values were then used to standardize both training and validation data
sets. The training data, which were used to compute min-max values for standardization,
therefore have values between 0 and 1. A flowchart representing the input-output model
using RF is shown in Fig. 2.4.
2.3     Results and discussion
2.3.1    Parameter tuning
As we mentioned in Sect. 2, error rate in RF can be sensitive to two parameters: the
number of trees ntree and number of randomly selected predictors available for splitting at
each node mtry. We tested RF on training data sets of 30 randomly chosen watersheds and
observed that the reduction in out-of-bag MAE error is negligible after 2000 trees. We then
                                                 40


            Figure 2.4 Flowchart showing the input-output model using RF.
Figure 2.5 Out-of-bag mean absolute error plotted against mtry during optimal parameter
search at Carbon River Watershed (USGS site 12094000).
                                        0.0120
                 Cross−validation MAE
                                        0.0118
                                        0.0116
                                        0.0114
                                        0.0112
                                                 1   2   3   4          5   6   7   8
                                                                 mtry
                                                             41


Table 2.2 The optimized parameter mtry using exhaustive-search strategy (mtry = {1, 2, 6,
7, 8} were considered but not found as the optimal value at any gauge).
                           mtry Number of gauges        Median MAE
                             3             29               0.0127
                             4             44               0.0116
                             5             13               0.0079
set ntree=2000 for all 86 watersheds. mtry, on the other hand, was tuned empirically using
a combination of exhaustive search approach and cross-validation.
    The goal of tuning is to select the mtry parameter value that would optimize the perfor-
mance of the model. The candidates were evaluated based on their OOB mean absolute error
(MAE). At each watershed, eight possible candidate values of mtry (1-8) were analyzed by 3
repetitions of 10-fold cross validation from the train data set. Averaging the MAE of repeti-
tions of the cross-validation procedure can provide more reliable results as the variance of the
estimation is reduced [Seibold et al., 2018]. To illustrate, in Fig. 2.5, lowest cross-validation
MAE is obtained at mtry = 3 at Carbon River Watershed (USGS Site 12094000). The re-
sults of tuning for all gauges (Table 2.2) show that the optimal mtry values are {3, 4, 5} with
median MAE of 0.0127, 0.0116, and 0.0079 respectively. The optimal mtry at each gauge was
then used in both training and validating the model. Because the number of predictors in
our study is relatively small, computation burden of the exhaustive search was manageable.
As the number of candidate grows, a random search strategy [Probst et al., 2019], in which
values are drawn randomly from a specified space, can be more computationally efficient.
2.3.2    Benchmark RF against MLR and naı̈ve models
Figure 2.6 shows the distributions of Pearson correlation coefficient (r) between forecasted
and observed values obtained from the three models: RF, naı̈ve, and MLR. Non-parametric,
two-sample Wilcoxon rank-sum significance tests [Wilcoxon et al., 1970], which are used to
assess whether the values obtained between two separate groups are systematically differ-
                                               42


Figure 2.6 Boxplots for Pearson correlation coefficient between forecasted and observed
values for three models: RF, naı̈ve, and MLR across three flow regimes. Two-sample
Wilcoxon rank-sum significance tests are performed and p-value (in black) are included for
each pair of models.
Figure 2.7 Pairwise scatter plots of Pearson correlation coefficient between forecasted and
observed values among watersheds for (a) RF vs. naı̈ve model, (b) RF vs MLR, and (c)
MLR vs. naı̈ve model. Each dot represents a watershed (n=86).
            (a)                                                                                           (b)                                                                                                (c)
                                                     1.0                                                                                           1.0                                                                                      1.0
             Random Forest Correlation Coefficient                                                         Random Forest Correlation Coefficient
                                                                                                                                                                                                              MLR Correlation Coefficient
                                                     0.9                                                                                           0.9                                                                                      0.9
                                                     0.8                                                                                           0.8                                                                                      0.8
                                                     0.7                                                                                           0.7                                                                                      0.7
                                                            1−1 line                                                                                            1−1 line                                                                           1−1 line
                                                           0.7         0.8         0.9           1.0                                                           0.7          0.8          0.9        1.0                                           0.7         0.8         0.9           1.0
                                                           Naive Model Correlation Coefficient                                                                       MLR Correlation Coefficient                                                  Naive Model Correlation Coefficient
                                                                                                       Regime                                            Rainfall dominant      Transient     Snowmelt dominant
ent from one another, suggest that the pair-wise differences in r values between RF and
the other two models are statistically significant (p < 0.05) in two flow regimes. RF is
observed to outperform both naı̈ve and MLR models in rainfall-driven and transient wa-
tersheds. Among snowmelt-driven watersheds, the three models yield similar correlation
coefficients (p > 0.05). In Fig. 2.7a, we observe most points lie on the left of the 1-to-1
line, suggesting that RF outperforms naı̈ve model at most individual watersheds in rainfall-
driven and transient regimes. We also discern that large improvement, defined as the positive
difference in r values between RF and naı̈ve model, tends to occur with lower persistence
(lower r values from the naı̈ve model). This suggests that application of RF would be most
                                                                                                                                                                             43


benefiting at watersheds where next-day streamflow is less dependent on the condition of
the current day. Among snowmelt-driven watersheds, the data points lie on the 1-to-1 line,
indicating that the three models show marginal difference in r values. As Mittermaier [2008]
pointed out, the choice of reference can affect the perceived performance of the forecast
system. Our pair-wise comparisons highlight the fact that evaluating data-driven models
should be performed in consideration of the autocorrelation structure in the data [Hwang
et al., 2012]. Without accounting for persistence, it would be inadequate to conclude that
RF gives better performance in snowmelt-driven watersheds. Nevertheless, we observe RF
outperformed MLR in all rainfall-dominated and transitional watersheds and 19 out of 25
snowmelt-dominated watersheds. The median r values for RF in the three groups are (0.88,
0.89, 0.98) compared to (0.85, 0.87, 0.98) for MLR. This may reflect RF’s better ability to
capture non-linear relationship between streamflow and other variables.
2.3.3    Evaluation of RF overall performance
We next evaluated the overall performance of RF across three flow regimes using four criteria:
R2 , KGE, MAE, and RMSE (Table 2.3, Fig. 2.8). Here, we observe a similar trend in R2 ,
KGE, MAE, and RMSE scores compared to r-value trend in Fig. 6, where RF performs
better in snowmelt-dominated than in rainfall-dominated (higher R2 and KGE, lower MAE
and RMSE). Snowmelt-dominated watersheds have the smallest range of R2 values across the
three groups. This may suggest that there is less variability in flow behaviors at individual
gauges in this group and is consistent with the observed data where the hydrographs of
snowmelt-driven watersheds tend to be less flashy compared to rainfall-driven watersheds.
Not surprisingly, transitional group has the largest spread in R2 values as watersheds in this
group share characteristics from the other two groups.
    Because RMSE is more sensitive to larger errors compared to MAE, the difference be-
tween the two scores represents the extent in which outliers are present in error values
[Legates and McCabe Jr, 1999]. In rainfall-driven and transient groups, the shape of the
                                              44


Figure 2.8 Streamflow daily forecast scores computed over the validation period for RF
model in four metrics: R-squared, KGE, MAE, and RMSE.
Table 2.3 Descriptive statistics of the four criteria used to evaluate the overall performance
of RF: R2 , KGE, MAE, and RMSE.
        Metric       Flow regime            Min       Q1     Median      Q3      Max
                  Rainfall-dominated       0.59      0.71      0.77     0.81     0.87
          R2           Transient           0.57      0.71      0.80     0.87     0.99
                 Snowmelt-dominated        0.88      0.95      0.97     0.98     0.99
                  Rainfall-dominated       0.64      0.78      0.84     0.87     0.92
         KGE           Transient           0.62      0.77      0.86     0.91     0.99
                 Snowmelt-dominated        0.77      0.89      0.94     0.97     0.99
                  Rainfall-dominated      0.0061    0.0096 0.0131 0.0161 0.0245
         MAE           Transient          0.0070    0.0097 0.0109 0.0143 0.0189
                 Snowmelt-dominated       0.0065    0.0087 0.0092 0.0114 0.0168
                  Rainfall-dominated      0.0157    0.0241 0.0326 0.0395 0.0609
        RMSE           Transient          0.0144    0.0227 0.0275 0.0331 0.0468
                 Snowmelt-dominated       0.0160    0.0218 0.0270 0.0315 0.0436
                                               45


boxplot distributions remain fairly consistent between the two error scores, suggesting that
distribution of large errors is similar to that of mean errors in these watersheds (Fig. 2.8).
The MAE scores are heavily skewed towards 0 while RMSE scores are more evenly spread
among snowmelt-driven watersheds. In snowmelt-driven watersheds, we observe a noticeably
wider interquartile range (difference between first quartile and third quartile) in RMSE plot
compared to MAE plot. This indicates that RF can still be susceptible to underestimation
or overestimation in watersheds where the mean error is relatively low.
    In Table 2.3, KGE scores are reported in a range of 0.64–0.99 for all watersheds. The
median values for each flow regime are 0.84, 0.87, and 0.94. As observed mean flow is used
in the calculation of KGE, Knoben et al. [2019] suggested that a KGE score greater than
-0.41 indicates a hydrologic model improves upon the forecast with mean flow, independent
of the basin. Therefore, RF can be seen to give satisfactory performance at all watersheds in
our study. Our results are comparable to findings in [Tongal and Booij, 2018] where authors
compare the performance of RF, SVM, and ANN to simulate daily discharge with baseflow
separation at four rivers in California and Washington. Although authors did not classify
these basins, it can be inferred that three of the rivers were rainfall-driven and one was
snowmelt-driven. RF model in their study produced KGE scores of 0.41, 0.81, and 0.92 for
the rainfall-driven water basins (without baseflow separation). However, our KGE scores for
snowmelt-fed watersheds (with a median of 0.94) are higher compared to the reported 0.55
in their study.
2.3.4    RF performance on extreme streamflows
We also examine the model’s capacity to forecast extreme events because of their potential
high impact and associated flood risks in this region. Ability of RF to correctly detect
extreme flows exceeding 90th , 95th , and 99th percentile thresholds (defined as the POD) for
each watershed are plotted against the FAR in Fig. 2.9. A threshold point falling below the
no-skill line indicates the model yields higher FAR than POD and is considered to have no
                                               46


Figure 2.9 The probability of detection (POD) plotted against the false alarm rate (FAR)
for three extreme thresholds: 90th , 95th , and 99th percentiles. Thin black line connects
values from the same watershed. (Vertical axis) Number of times RF correctly forecasted
events that exceeded the threshold divided by the total number of exceedance. (Horizontal
axis) Number of times RF incorrectly forecasted events that exceeded the threshold divided
by the total number of non-exceedance. It is noted that the scales of the horizontal and
vertical axes are not 1-to-1 in the plotted partial receiver operating characteristic (ROC)
curve.
predictive power for that threshold. RF becomes expectedly less skilful in its forecasts with
increase in magnitude of the events. The model tends to perform better among snowmelt-
dominated watersheds (higher POD, lower FAR) compared to those in transient and rainfall-
driven groups. At the 95th threshold, RF can forecast correctly at least 50 percent of the
extreme events (POD ¿ 0.5) at most watersheds. At the 99th threshold, the difference in RF’s
ability to forecast extreme streamflow among the three flow regimes becomes less obvious.
In snowmelt-driven watersheds, 8 out of 25 have POD > 0.5, 9 have POD between 0.01 and
0.5, and 8 have a POD of 0. While few studies have examined complex diurnal hydrologic
responses in high-elevation catchments [Graham et al., 2013], our particular result suggests
large surges in streamflow sustained by spring and early summer snowmelt can be difficult
to predict, even at 1-day lead time, and is an ongoing research subject [Ralph et al., 2014,
Cho and Jacobs, 2020]. In our study, we observe high POD is accompanied by low FAR for
the same threshold. This may suggest that RF is skillful in its forecasts of extreme events.
                                               47


Figure 2.10 Barplots show importance of predictor variables using (a-c) MDA and (d-f)
MDI criteria. Length of the blue bars indicates the median value across the watersheds for
each flow regime and the thin black bar represents the full range of the values.
2.3.5     Analysis of variable importance
Variable importance is a useful feature in both understanding the underlying process of
current model and generating insights for selection of variable in future studies [Louppe
et al., 2013]. RF quantifies variable importance through two measures: MDA and MDI (Fig.
2.10). In both measures, the higher value indicates variable contributes more to the model
accuracy. Intuitively, streamflow from previous day is shown to be the most importance
variable due to persistence. This is reflected across three flow regimes and two measures.
We also observe the sum of 3-day precipitation tends to have more predictive power than
than 1-day precipitation. Maximum temperature and minimum temperature share similar
contribution where minimum temperature tends to receive slightly higher scores. Among
snowmelt-dominated watersheds (Fig. 2.10c and 2.10f), we anticipate snow indices (SDt
and SW Et ) contribute more in the prediction than precipitation and this is also reflected.
Surprisingly, pentad comes third and fourth in MDI and MDA respectively. This supports the
long-term snowpack memory of daily streamflow [Zheng et al., 2018] and can be useful in real-
                                              48


time prediction. Precipitation does not seem to have significant contribution to the model’s
accuracy among the snowmelt-dominated watersheds. Although PRISM precipitation data
includes both rainfall and snowfall, it is likely that the majority of fallen precipitation in
these high-altitude watersheds is stored as snow on the surface and does not immediately
contribute to runoff. Li et al. [2017] estimated that 37 % of the precipitation falls as snow in
western US, yet snowmelt is responsible for 70 % of the total runoff in mountainous areas.
It is still very surprising to observe such low contribution of precipitation variable to RF
model accuracy. Nevertheless, we observe general agreement between the two measures in
ranking of the variables in snowmelt-driven group.
    In transient and rainfall-dominated groups, there is noticeable disagreement between the
two criteria. Precipitation (Pt ) and 3-day precipitation (P 3t ) tend to rank lower in MDA
measure (Fig. 2.10a and 2.10b) compared to MDI (Fig. 2.10d and 2.10e). Specifically, in
rainfall-dominated group, 3-day precipitation and precipitation are placed 2nd and 3rd based
on median MDI compared to 4th and 7th in MDA. Maximum and minimum temperatures,
on the other hand, tend to be more important in MDA calculation compared to in MDI. In
Shortridge et al. [2016], RF model was used to predict streamflow at five rain-fed rivers in
Ethiopia. Similarly calculated MDA in that study suggested precipitation was less important
(7.71 %) than temperature (12.74 %). Linear model in the same study, however, considered
the coefficient for precipitation to be significant (p << 0.01) while temperature coefficient
was not (p = 0.08). In Obringer and Nateghi [2018], the authors predicted daily reservoir
levels in three reservoirs in Indiana, Texas, and Atlanta using RF and other ML techniques.
Precipitation was reported as the least important variable and ranked behind dew point
temperature and humidity. Inspecting the probability density functions of our predictors,
we suspect that for variables that are heavily skewed and zero-inflated (e.g., precipitation),
permutation-based MDA may underestimate their importance compared to those that are
more normally distributed such as maximum and minimum temperatures. In our precipi-
tation data (both training and validation), at least 30 percent of the daily observations are
                                               49


zeros across the watersheds. There is a high likelihood that the day with zero precipita-
tion ends up with the same value during the shuffling process, thus potentially affecting the
randomness created to compute MDA. While we did not perform additional simulation to
further confirm whether MDA and MDI measures are sensitive to highly-skewed and zero-
inflated variables, this can be a topic of future research. Strobl et al. [2007], however, showed
RF variable importance measures can be unreliable in situations where predictor variables
vary in their scale of measurement. It is noted that the scale of measurement does not only
refer to the numeric range but also the nature of the data (e.g., ordinal vs. continuous).
Among our 8 predictors in our study, pentad is considered an ordinal variable. Also, the
scales of measurement of precipitation and temperature variables are slightly different. Pre-
cipitation is a flux variable and comprises discrete and continuous components in that if it
does not rain the amount of rainfall is discrete whereas if it rains the amount is continuous.
Temperature is a state variable and always continuous. Temperature predictors receiving
higher MDA can also be due to identified bias where permutation-based importance mea-
sures overestimates the true contribution of correlated variables [Gregorutti et al., 2017]. In
our study, temperature variables tend to have more correlation with other predictors than
do the two precipitation variables. This is likely because temperature controls both the form
of precipitation (snowfall vs. rainfall) and the timing of snowmelt. There is also an ongoing
discussion regarding the stability of both measures, in which the two variable importance
measures can yield noticeably different rankings, in simulated datasets [Calle and Urrea,
2010, Nicodemus, 2011, Ishwaran and Lu, 2019]. Although results from MDI make more
sense in our case, we suggest RF users to exert caution when interpreting outputs from these
two measures.
2.3.6    Effects of watershed characteristics on model performance
To explore the role of catchment characteristics such as geology, topography, and land cover
on the performance of RF model, we perform Pearson correlation test between the KGE
                                                50


Figure 2.11 KGE scores plotted against (a) the average percent of slope and (b) the average
percent of sand in soil at each watershed. Best-fit lines were determined using simple linear
regression. Pearson correlation coefficients were computed with associated significance.
                                              51


scores and selected basin physical characteristics for each flow regime. These watershed
characteristics were compiled as part of GAGES-II dataset using national data sources in-
cluding US National Land Cover Database (NLCD) 2006 version, 100 m-resolution National
Elevation Dataset (NED), and Digital General Soil Map of the United States (STATSGO2)
(Table S1 in the Supplement). The results are shown in Table 2.4. There is a strong negative
correlation (p < 0.05) between KGE scores and watershed slopes among rainfall-dominated
and transient watersheds (Fig. 2.11a). As steeper hillslope often associates with faster
surface and subsurface water movement during event-flow runoff, this can result in shorter
response time. We observe a similar trend between KGE scores and percent of sand in the soil
(Fig. 2.11b) where the RF performs worse in watersheds with higher hydraulic conductivity
(i.e., higher sand content). This could be a result of rapid subsurface flow from soil profile
enabled by soil macropores in mountainous forested area [Srivastava et al., 2017], where
subsurface flow is the predominant mechanism. Without a quantification of the partition
of discharge into surface flow and subsurface flow at individual watersheds, it is difficult to
determine the relative importance of subsurface runoff mechanisms in regulating streamflow
and how that may have affected the RF performance. The findings, however, suggest RF
performance can deteriorate at watersheds with quick-response runoff when supplied with
1-day delayed observation data.
    It appears that stream density and the amount of vegetation cover may also affect the
performance of RF. Specifically, an increase in the amount of evergreen forest seems to
improve the RF model among the snowmelt-dominanted watersheds but not the other two
regimes. Aspect eastness, drainage area, and basin compactness are not determining factors
to variability in the KGE scores. We also explored the impact of land-use and land-cover,
which can be represented by the extent of impervious cover in each watershed. However,
because we only selected unregulated watersheds that experienced minimal human disruption
during the initial screening, most watersheds have very little impervious cover (less than
5%). It is noted that these selected characteristics are not meant to be exhaustive, but
                                             52


Table 2.4 Pearson correlation coefficient between KGE scores and selected basin physical
characteristics. Bolded value indicates the relationship is significant at 5 percent or 1
percent level.
      Watershed characteristics                         Hydrologic regime
                                       Rainfall dominant Transient Snowmelt dominant
                 Slope                       -0.42            -0.68               0.12
           Aspect eastness                   -0.02             0.12              -0.12
            Drainage area                     0.14             -0.12              0.11
          Basin compactness                   0.09             -0.12             -0.16
            Stream density                   -0.10              0.29             -0.27
           Percent of sand                   -0.59            -0.46              -0.14
  Percent of evergreen forested area         –0.13              0.31             0.41
rather representative of various types of factors that could help explain the variability in
model performance. Furthermore, an alternative approach to Pearson’s correlation is to
use ANOVA to test for marginal significance of each catchment variable to KGE while
accounting for their interaction. Because our objective is not to make inference on KGE
based on these variables and ANOVA analysis can be complicated to interpret, we choose to
compute correlation coefficient.
2.3.7    Limitations and future research
There are some notable limitations in our study as well as RF in general. The classification
of watersheds into three flow regimes was based on the timing of the climatological mean of
the annual flow volume, which can fluctuate from year to year. This is particularly true for
the watersheds in the transient group where streamflow is contributed by a mixture of runoff
from winter rainfall and springtime snowmelt and the inter-annual variability is tremendous
in both magnitude and timing [Lundquist et al., 2009]. Therefore, the membership of the
classified watersheds from this group can vary. In fact, Mantua et al. [2009] discussed the
future shift of transient runoff watersheds towards rainfall-dominated in Washington State.
Because we trained RF using the same input variables for all watersheds regardless of flow
regimes and calculated performance criteria separately, the classification does not alter the
                                              53


results at individual watershed.
    In the study, we used estimated precipitation from PRISM, which is an interpolation
product and combines data from various rain gauges from multiple networks. Despite pos-
sible introduced errors and uncertainty, we believe the use of spatially distributed product
better represents the areal estimation of precipitation over the watershed than a single rain
gauge measurement. In real-time forecast, this would be not be feasible due to the added
time to compile and process such data. Similarly, we provided RF model with a basin-average
SWE from SNOTEL stations as an estimate of snowpack condition. Using a more spatially
consistent SWE data such as the Snow Data Assimilation System [Pan et al., 2003] product
would potentially improve model accuracy. As our results indicate that RF can produce rea-
sonable forecasts, potential future research could explore the sensitivity of the model using
satellite derived snow products a station data and even include t + 1 precipitation forecast
as a predictor in the model.
    An inherent limitation of RF is the lack of direct uncertainty quantification in prediction.
In our case, the forecasted streamflow using RF does not yield a standard error comparable
to that provided by traditional regression model, and hence no way to provide probabilistic
confidence intervals on predictions. Methods to estimate confidence intervals have been
proposed by Wager et al. [2014], Mentch and Hooker [2016], and Coulston et al. [2016], but
they are not widely applied. For future work, computation of confidence interval in RF
prediction will be useful in addressing and understanding uncertainty.
2.4     Conclusions
Accurate streamflow forecast has extensive applications across disciplines from water re-
sources and planning to engineering design. In this study, we assessed the ability of RF
to make daily streamflow forecasts at 86 watersheds in the Pacific Northwest Hydrologic
Region. Key results are summarized below:
    • Based on the KGE scores (ranging from 0.62 to 0.99), we show that RF is capable of
                                              54


      producing skilfull forecasts across all watersheds.
    • RF performs better in snowmelt-dominated watersheds, which can be attributed to
      stronger persistence in the streamflow time series. The largest improvements in forecast
      compared to naı̈ve model are found among rainfall-dominated watersheds.
    • The two approaches for measuring predictor importance yield noticeably different re-
      sults. We recommend interpretation of the these two measures should be coupled with
      understanding of the physical processes and how these processes are connected.
    • Increase in steepness of slope and amount of sand content are found to deteriorate RF
      performance in two flow regime groups. This demonstrates catchment characteristics
      can cause variability in performance of the model and should be considered in both
      predictor selection and evaluation of the model.
Considering the current and future vulnerabilities of the Pacific Northwest to flooding caused
by extreme precipitation and significant snowmelt events [Ralph et al., 2014], skillful stream-
flow forecasts can have important implications. Due to its practical applications, RF and
RF-based algorithms continue to gain popularity in hydrological studies [Tyralis et al., 2019].
Given the promising results from our study, RF can be used as part of an ensemble of models
to achieve better generalization ability and accuracy not only in streamflow forecast but also
in other water-related applications in this region.
                                               55


APPENDIX
   56


                                        APPENDIX
                                       APPENDIX A
A.1      Vegetation response to drought at different time scales
We observe in Fig. A.1 that the vegetations in the 9 watersheds are relatively resilient
to drought at short time scales. We found no significant correlation between SPEI and
EVI was found at 1-month time scale. At 3-month, we observe a weak correlation (r = 0.26)
between EVI and SPEI at Bear Creek. At 6- and 12-month time scales, we observe significant
vegetation response to SPEI in 3 watersheds: Merced River, Bear Creek, and EF Carson
River. The observed pattern is likely due to the fact that longer SPEI time scales (6- and 12-
month) account for the water deficit accumulated from the winter season where the majority
of precipitation occurs. Furthermore, these three watersheds, which locate in the central and
southern parts of Sierra Nevada, also have higher coverage of shrubs and grasslands (Fig.
1.4) compared to the other densely forested watersheds. More specifically, [Dong et al., 2019]
reported a geographical difference between vegetation responses in Northern versus Southern
California where the sensitivity of the vegetation to drought are larger in the southern part.
This is also consistent with previous research where grassland ecosystems are more sensitive
to drought than coniferous forests [Zha et al., 2010, Vicente-Serrano et al., 2010].
                                              57


Figure A.1 Correlation between EVI and 1-, 3-, 6-, 12-month SPEI values during the
growing season (May-Sep) for the period 2001-2018. EVI time series were standardized,
according to the average and the standard deviation of the values for each month.
Relationship is considered significant at α = 0.05).
A.2     Correlation between streamflow elasticity and catchment char-
        acteristics
Climate elasticity of streamflow, a non-parametric indicator commonly used to quantify
the sensitivity of streamflow to changes in climate, is defined as the proportional change
in streamflow, Q, to the proportional change in a climatic variable such as precipitation P
[Sankarasubramanian et al., 2001] and can be expressed as:
                                                 Qt − Q̄ P̄
                                    ϵ = median(             )                            (A.1)
                                                 Pt − P̄ Q̄
    where Qt is the annual runoff, Pt is the annual precipitation, Q̄ is the long-term average
annual streamflow, P̄ is the long-term average annual precipitation. Q̄ and P̄ were calculated
using 30 years of data for 1987-2018 period.
    To explore the effects of catchment characteristics on the streamflow elasticity, we com-
puted Pearson’s correlation between ϵ and the respective catchment variable. Results are
                                              58


shown in Fig. A.2. Cells with a white background indicate the relationship is not significant
at (α = 0.05).
Figure A.2 Correlation between streamflow elasticity (ϵ) and selected catchment
characteristics. Blank tiles indicate the relationship is not significant at α = 0.05.
                                                        pe         est                           io
                                                      sl        ff                                  n        ea
                                                         o        or                         el             ar
                                                  of          to                               ev       ge
                                                 nt                     lP         lT            at
                                            ce           ce         nu        nu            n    D
                                          Pe
                                                            n          a         a      ea        ra in
                                             r         Pe          An        An         M               a
                                                          r
                                                                                                              1
                 Streamflow elasticity   −0.72        0.75      −0.29      0.46      −0.71      −0.06        0.8
                                                                                                             0.6
                             Percent of slope     −0.84         0.09       −0.58     0.69        0.5
                                                                                                             0.4
                                     Percent of forest                                                       0.2
                                                                0.16       0.55      −0.78      −0.62
                                                                                                              0
                                                             Annual P      0.32      −0.25      −0.49        −0.2
                                                                                                             −0.4
                                                                       Annual T      −0.81      −0.08
                                                                                                             −0.6
                                                                        Mean elevation                       −0.8
                                                                                                 0.3
                                                                                                             −1
                                                                    59


BIBLIOGRAPHY
      60


                                     BIBLIOGRAPHY
Jan F Adamowski. Development of a short-term river flood forecasting method for snowmelt
  driven floods based on wavelet and cross-wavelet analysis. Journal of Hydrology, 353(3-4):
  247–266, 2008.
KJ Allen, John Ogden, BM Buckley, ER Cook, and PJ Baker. The potential to reconstruct
  broadscale climate indices associated with southeast australian droughts from athrotaxis
  species, tasmania. Climate Dynamics, 37(9-10):1799–1821, 2011.
Douglas G Altman and J Martin Bland. Statistics notes variables and parameters. Bmj, 318
  (7199):1667, 1999.
David Aubert, Cecile Loumagne, and Ludovic Oudin. Sequential assimilation of soil moisture
  and streamflow data in a conceptual rainfall–runoff model. Journal of Hydrology, 280(1-4):
  145–161, 2003.
Francesco Avanzi, Joseph Rungee, Tessa Maurer, Roger Bales, Qin Ma, Steven Glaser,
  and Martha Conklin. Climate elasticity of evapotranspiration shifts the water balance of
  mediterranean climates during multi-year droughts. Hydrology and Earth System Sciences,
  24(9):4317–4337, 2020.
Roger C Bales, Michael L Goulden, Carolyn T Hunsaker, Martha H Conklin, Peter C Hart-
  sough, Anthony T O’Geen, Jan W Hopmans, and Mohammad Safeeq. Mechanisms con-
  trolling the impact of multi-year drought on mountain hydrology. Scientific Reports, 8(1):
  690, 2018.
Tim P Barnett, Jennifer C Adam, and Dennis P Lettenmaier. Potential impacts of a warming
  climate on water availability in snow-dominated regions. Nature, 438(7066):303–309, 2005.
Emilie Beaulieu, Yann Lucas, Daniel Viville, François Chabaux, Philippe Ackerer, Yves
  Goddéris, and Marie-Claire Pierret. Hydrological and vegetation response to climate
  change in a forested mountainous catchment. Modeling Earth Systems and Environment,
  2(4):1–15, 2016.
Santiago Beguerı́a, M Sergio, and Lazyload Yes.            Calculation of the standardised
  precipitation-evapotranspiration index. 2013.
Santiago Beguerı́a, Sergio M Vicente-Serrano, Fergus Reig, and Borja Latorre. Standardized
  precipitation evapotranspiration index (spei) revisited: parameter fitting, evapotranspira-
  tion models, tools, datasets and drought monitoring. International journal of climatology,
  34(10):3001–3023, 2014.
                                             61


Simon Bernard, Laurent Heutte, and Sébastien Adam. Influence of hyperparameters on
  random forest accuracy. In International Workshop on Multiple Classifier Systems, pages
  171–180. Springer, 2009.
Douglas P Boyle, Hoshin V Gupta, and Soroosh Sorooshian. Toward improved calibration
  of hydrologic models: Combining the strengths of manual and automatic methods. Water
  Resources Research, 36(12):3663–3674, 2000.
Kate A Brauman, Gretchen C Daily, T Ka’eo Duarte, and Harold A Mooney. The nature
  and value of ecosystem services: an overview highlighting hydrologic services. Annu. Rev.
  Environ. Resour., 32:67–98, 2007.
Leo Breiman. Random forests. Machine learning, 45(1):5–32, 2001.
Leo Breiman, Jerome Friedman, Charles J Stone, and Richard A Olshen. Classification and
  regression trees. CRC press, 1984.
Mikhail Ivanovich Budyko. Climate and life. Academic Press, Inc., 1974.
M Luz Calle and Vı́ctor Urrea. Letter to the editor: stability of random forest importance
  measures. Briefings in bioinformatics, 12(1):86–89, 2010.
Alejandra M Carmona, Murugesu Sivapalan, Mary A Yaeger, and Germán Poveda. Regional
  patterns of interannual variability of catchment water balances across the continental us:
  A budyko framework. Water Resources Research, 50(12):9177–9193, 2014.
Diogo V Carvalho, Eduardo M Pereira, and Jaime S Cardoso. Machine learning interpretabil-
  ity: A survey on methods and metrics. Electronics, 8(8):832, 2019.
Daniel R Cayan, Kelly T Redmond, and Laurence G Riddle. Enso and hydrologic extremes
  in the western united states. Journal of Climate, 12(9):2881–2893, 1999.
Xi Chen and Hemant Ishwaran. Random forests for genomic data analysis. Genomics, 99
  (6):323–329, 2012.
Lei Cheng, Zongxue Xu, Dingbao Wang, and Ximing Cai. Assessing interannual variability
  of evapotranspiration at the catchment scale using satellite-based evapotranspiration data
  sets. Water Resources Research, 47(9), 2011.
Eunsang Cho and Jennifer M Jacobs. Extreme value snow water equivalent and snowmelt
  for infrastructure design over the contiguous united states. Water Resources Research,
  page e2020WR028126, 2020.
John W Coulston, Christine E Blinn, Valerie A Thomas, and Randolph H Wynne. Ap-
  proximating prediction uncertainty for random forest regression models. Photogrammetric
                                              62


  Engineering & Remote Sensing, 82(3):189–197, 2016.
Aiguo Dai. Increasing drought under global warming in observations and models. Nature
  climate change, 3(1):52–58, 2013.
Christopher Daly, Michael Halbleib, Joseph I Smith, Wayne P Gibson, Matthew K Doggett,
  George H Taylor, Jan Curtis, and Phillip P Pasteris. Physiographically sensitive mapping
  of climatological temperature and precipitation across the conterminous united states.
  International Journal of Climatology: a Journal of the Royal Meteorological Society, 28
  (15):2031–2064, 2008.
Anthony W D’Amato, John B Bradford, Shawn Fraver, and Brian J Palik. Effects of thin-
  ning on drought vulnerability and climate response in north temperate forest ecosystems.
  Ecological Applications, 23(8):1735–1742, 2013.
Christian W Dawson, Robert J Abrahart, Asaad Y Shamseldin, and Robert L Wilby. Flood
  estimation at ungauged sites using artificial neural networks. Journal of hydrology, 319
  (1-4):391–409, 2006.
J Demarty, Fi Chevallier, AD Friend, Ni Viovy, Shilong Piao, and P Ciais. Assimilation of
  global modis leaf area index retrievals within a terrestrial biosphere model. Geophysical
  research letters, 34(15), 2007.
Michael D Dettinger, Daniel R Cayan, Mary K Meyer, and Anne E Jeton. Simulated hydro-
  logic responses to climate variations and change in the merced, carson, and american river
  basins, sierra nevada, california, 1900–2099. Climatic Change, 62(1-3):283–317, 2004.
Mauro Di Luzio, Gregory L Johnson, Christopher Daly, Jon K Eischeid, and Jeffrey G
  Arnold. Constructing retrospective gridded daily precipitation and temperature datasets
  for the conterminous united states. Journal of Applied Meteorology and Climatology, 47
  (2):475–497, 2008.
Yonas B Dibike and Dimitri P Solomatine. River flow forecasting using artificial neural net-
  works. Physics and Chemistry of the Earth, Part B: Hydrology, Oceans and Atmosphere,
  26(1):1–7, 2001.
Kamel Didan. Mod13q1 modis/terra vegetation indices 16-day l3 global 250m sin grid v006.
  NASA EOSDIS Land Processes DAAC, 10, 2015.
Noah S Diffenbaugh, Daniel L Swain, and Danielle Touma. Anthropogenic warming has
  increased drought risk in california. Proceedings of the National Academy of Sciences, 112
  (13):3931–3936, 2015.
S Lawrence Dingman. Physical hydrology. Waveland press, 2015.
                                              63


Chunyu Dong, Glen M MacDonald, Katherine Willis, Thomas W Gillespie, Gregory S Okin,
  and A Park Williams. Vegetation responses to 2012–2016 drought in northern and southern
  california. Geophysical Research Letters, 46(7):3810–3821, 2019.
Randall Donohue, Michael Roderick, Tim R McVicar, et al. On the importance of including
  vegetation dynamics in budyko’s hydrological model. 2007.
Marketa M Elsner, Lan Cuo, Nathalie Voisin, Jeffrey S Deems, Alan F Hamlet, Julie A
  Vano, Kristian EB Mickelson, Se-Yeun Lee, and Dennis P Lettenmaier. Implications of
  21st century climate change for the hydrology of washington state. Climatic Change, 102
  (1-2):225–260, 2010.
James A Falcone. Gages-ii: Geospatial attributes of gages for evaluating streamflow. Tech-
  nical report, US Geological Survey, 2011.
Wei Feng, Hongwei Lu, Tianci Yao, and Qing Yu. Drought characteristics and its elevation
  dependence in the qinghai–tibet plateau during the last half-century. Scientific Reports,
  10(1):1–11, 2020.
Douglas A Frank and Richard S Inouye. Temporal variation in actual evapotranspiration
  of terrestrial ecosystems: patterns and ecological implications. Journal of Biogeography,
  pages 401–411, 1994.
Christopher C Funk, Andrew Hoell, and Daithi Stone. Examining the contribution of the
  observed global warming trend to the california droughts of 2012/13 and 2013/14. Bulletin
  of the American Meteorological Society, 95(9):S11–S15, 2014.
Michael L Goulden and Roger C Bales. Mountain runoff vulnerability to increased evapo-
  transpiration with vegetation expansion. Proceedings of the National Academy of Sciences,
  111(39):14071–14075, 2014.
ML Goulden, RG Anderson, RC Bales, AE Kelly, M Meadows, and GC Winston. Evapotran-
  spiration along an elevation gradient in california’s sierra nevada. Journal of Geophysical
  Research: Biogeosciences, 117(G3), 2012.
Chris B Graham, Holly R Barnard, Kathleen L Kavanagh, and James P McNamara. Catch-
  ment scale controls the temporal connection of transpiration and diel fluctuations in
  streamflow. Hydrological Processes, 27(18):2541–2556, 2013.
Baptiste Gregorutti, Bertrand Michel, and Philippe Saint-Pierre. Correlation and variable
  importance in random forests. Statistics and Computing, 27(3):659–678, 2017.
Hoshin V Gupta, Harald Kling, Koray K Yilmaz, and Guillermo F Martinez. Decomposi-
  tion of the mean squared error and nse performance criteria: Implications for improving
  hydrological modelling. Journal of hydrology, 377(1-2):80–91, 2009.
                                             64


Hoshin Vijai Gupta, Soroosh Sorooshian, and Patrice Ogou Yapo. Status of automatic
  calibration for hydrologic models: Comparison with multilevel expert calibration. Journal
  of Hydrologic Engineering, 4(2):135–143, 1999.
Minxue He, Mitchel Russo, and Michael Anderson. Hydroclimatic characteristics of the
  2012–2015 california drought from an operational perspective. Climate, 5(1):5, 2017.
Barbara FF Huang and Paul C Boutros. The parameter sensitivity of random forests. BMC
  bioinformatics, 17(1):331, 2016.
Alfredo Huete, Kamel Didan, Tomoaki Miura, E Patricia Rodriguez, Xiang Gao, and
  Laerte G Ferreira. Overview of the radiometric and biophysical performance of the modis
  vegetation indices. Remote sensing of environment, 83(1-2):195–213, 2002.
Carolyn T Hunsaker, Thomas W Whitaker, and Roger C Bales. Snowmelt runoff and water
  yield along elevation and temperature gradients in california’s southern sierra nevada 1.
  JAWRA Journal of the American Water Resources Association, 48(4):667–678, 2012.
Seok Hwan Hwang, Dae Heon Ham, and Joong Hoon Kim. A new measure for assessing the
  efficiency of hydrological data-driven forecasting models. Hydrological sciences journal, 57
  (7):1257–1274, 2012.
Hemant Ishwaran and Min Lu. Standard errors and confidence intervals for variable impor-
  tance in random forest regression, classification, and survival. Statistics in medicine, 38
  (4):558–582, 2019.
Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An introduction to
  statistical learning, volume 103 xiv of, 2013.
James A Johnstone. A quasi-biennial signal in western us hydroclimate and its global tele-
  connections. Climate dynamics, 36(3-4):663–680, 2011.
Daniel J Karran, Efrat Morin, and Jan Adamowski. Multi-step streamflow forecasting using
  data-driven non-linear methods in contrasting climate regimes. Journal of Hydroinformat-
  ics, 16(3):671–689, 2013.
Wouter JM Knoben, Jim E Freer, and Ross A Woods. Inherent benchmark or not? compar-
  ing nash–sutcliffe and kling–gupta efficiency scores. Hydrology and Earth System Sciences,
  23(10):4323–4331, 2019.
John F Knowles, Noah P Molotch, Ernesto Trujillo, and Marcy E Litvak. Snowmelt-driven
  trade-offs between early and late season productivity negatively impact forest carbon up-
  take during drought. Geophysical Research Letters, 45(7):3087–3096, 2018.
N Knowles, M Dettinger, and D Cayan. Trends in snowfall versus rainfall for the western
                                              65


  united states, 1949-2001. prepared for california energy commission public interest energy
  research program. Trends in Snowfall Versus Rainfall for the Western United States,
  1949-2001. Prepared for California Energy Commission Public Interest Energy Research
  Program, 2007.
Noah Knowles, Michael D Dettinger, and Daniel R Cayan. Trends in snowfall versus rainfall
  in the western united states. Journal of Climate, 19(18):4545–4559, 2006.
Randal D Koster and Max J Suarez. A simple framework for examining the interannual
  variability of land surface moisture fluxes. Journal of Climate, 12(7):1911–1917, 1999.
Sebastian A Krogh, Patrick D Broxton, Patricia N Manley, and Adrian A Harpold. Using
  process based snow modeling and lidar to predict the effects of forest thinning on the
  northern sierra nevada snowpack. Frontiers in Forests and Global Change, 3:21, 2020.
Max Kuhn et al. Building predictive models in r using the caret package. Journal of statistical
  software, 28(5):1–26, 2008.
L Labudová, M Labuda, and J Takáč. Comparison of spi and spei applicability for drought
  impact assessment on crop production in the danubian lowland and the east slovakian
  lowland. Theoretical and Applied Climatology, 128(1-2):491–506, 2017.
David R Legates and Gregory J McCabe Jr. Evaluating the use of “goodness-of-fit” measures
  in hydrologic and hydroclimatic model validation. Water resources research, 35(1):233–
  241, 1999.
Dongyue Li, Melissa L Wrzesien, Michael Durand, Jennifer Adam, and Dennis P Letten-
  maier. How much runoff originates as snow in the western united states, and how will that
  change in the future? Geophysical Research Letters, 44(12):6163–6172, 2017.
Weiguang Li, Meiting Hou, Huilin Chen, and Xiaomin Chen. Study on drought trend in
  south china based on standardized precipitation evapotranspiration index. Journal of
  natural disasters, 21(4):84–90, 2012.
Xue Li, Jian Sha, and Zhong-Liang Wang. Comparison of daily streamflow forecasts using
  extreme learning machines and the random forest method. Hydrological Sciences Journal,
  64(15):1857–1866, 2019.
Xu Lian, Shilong Piao, Laurent ZX Li, Yue Li, Chris Huntingford, Philippe Ciais, Alessandro
  Cescatti, Ivan A Janssens, Josep Peñuelas, Wolfgang Buermann, et al. Summer soil drying
  exacerbated by earlier spring greening of northern vegetation. Science advances, 6(1):
  eaax0255, 2020.
Andy Liaw, Matthew Wiener, et al. Classification and regression by randomforest. R news,
  2(3):18–22, 2002.
                                              66


Scott R Loarie, Philip B Duffy, Healy Hamilton, Gregory P Asner, Christopher B Field, and
  David D Ackerly. The velocity of climate change. Nature, 462(7276):1052–1055, 2009.
Steven P Loheide, Richard S Deitchman, David J Cooper, Evan C Wolf, Christopher T
  Hammersmark, and Jessica D Lundquist. A framework for understanding the hydroecol-
  ogy of impacted wet meadows in the sierra nevada and cascade ranges, california, usa.
  Hydrogeology Journal, 17(1):229–246, 2009.
Gilles Louppe, Louis Wehenkel, Antonio Sutera, and Pierre Geurts. Understanding variable
  importances in forests of randomized trees. In Advances in neural information processing
  systems, pages 431–439, 2013.
Jessica D Lundquist, Michael D Dettinger, Iris T Stewart, and Daniel R Cayan. Variability
  and trends in spring runoff in the western united states. Climate warming in western
  North America: evidence and environmental effects. University of Utah Press, Salt Lake
  City, Utah, USA, pages 63–76, 2009.
Lifeng Luo, Deanna Apps, Samuel Arcand, Huating Xu, Ming Pan, and Martin Hoerling.
  Contribution of temperature and precipitation anomalies to the california drought during
  2012–2015. Geophysical Research Letters, 44(7):3184–3192, 2017.
David A Lytle and N LeRoy Poff. Adaptation to natural flow regimes. Trends in ecology &
  evolution, 19(2):94–100, 2004.
Michael E Mann and Peter H Gleick. Climate change and california drought in the 21st
  century. Proceedings of the National Academy of Sciences, 112(13):3858–3859, 2015.
Nathan Mantua, Ingrid Tohver, and Alan Hamlet. Impacts of climate change on key aspects
  of freshwater salmon habitat in washington state. 2009.
Antonio Manzano, Miguel A Clemente, Ana Morata, M Yolanda Luna, Santiago Beguerı́a,
  Sergio M Vicente-Serrano, and M Luisa Martı́n. Analysis of the atmospheric circulation
  pattern effects over spei drought index in spain. Atmospheric Research, 230:104630, 2019.
Clifford Mass. The weather of the Pacific Northwest. University of Washington Press, 2015.
Tessa Maurer, Francesco Avanzi, Steven D Glaser, and Roger C Bales. Drivers of drought-
  induced shifts in the water balance through a budyko approach. Hydrology and Earth
  System Sciences, 26(3):589–607, 2022.
Josué Medellı́n-Azuara, Julien J Harou, Marcelo A Olivares, Kaveh Madani, Jay R Lund,
  Richard E Howitt, Stacy K Tanaka, Marion W Jenkins, and Tingju Zhu. Adaptability and
  adaptations of california’s water supply system to dry climate warming. Climatic Change,
  87(1):75–90, 2008.
                                              67


Lucas Mentch and Giles Hooker. Quantifying uncertainty in random forests via confidence
  intervals and hypothesis tests. The Journal of Machine Learning Research, 17(1):841–881,
  2016.
Ashok K Mishra and Vijay P Singh. Drought modeling–a review. Journal of Hydrology, 403
  (1-2):157–175, 2011.
Marion P Mittermaier. The potential impact of using persistence as a reference forecast on
  perceived forecast skill. Weather and forecasting, 23(5):1022–1031, 2008.
Amir Mosavi, Pinar Ozturk, and Kwok-wing Chau. Flood prediction using machine learning
  models: Literature review. Water, 10(11):1536, 2018.
Philip W Mote, Sihan Li, Dennis P Lettenmaier, Mu Xiao, and Ruth Engel. Dramatic
  declines in snowpack in the western us. Npj Climate and Atmospheric Science, 1(1):1–6,
  2018.
Qiaozhen Mu, Maosheng Zhao, and Steven W Running. Improvements to a modis global
  terrestrial evapotranspiration algorithm. Remote sensing of environment, 115(8):1781–
  1800, 2011.
Qiaozhen Mu, Maosheng Zhao, and Steven W Running. Modis global terrestrial evapo-
  transpiration (et) product (nasa mod16a2/a3). Algorithm Theoretical Basis Document,
  Collection, 5, 2013.
Kristin K Nicodemus. Letter to the editor: On the stability and ranking of predictors from
  random forest variable importance measures. Briefings in bioinformatics, 12(4):369–373,
  2011.
Malcolm North. Managing sierra nevada forests. Gen. Tech. Rep. PSW-GTR-237. Albany,
  CA: US Department of Agriculture, Forest Service, Pacific Southwest Research Station.
  184 p, 237, 2012.
Renee Obringer and Roshanak Nateghi. Predicting urban reservoir levels using statistical
  learning techniques. Scientific reports, 8(1):5164, 2018.
Thais Mayumi Oshiro, Pedro Santoro Perez, and José Augusto Baranauskas. How many
  trees in a random forest? In International workshop on machine learning and data mining
  in pattern recognition, pages 154–168. Springer, 2012.
Thomas C Pagano, David C Garen, Tom R Perkins, and Phillip A Pasteris. Daily updating
  of operational statistical seasonal water supply forecasts for the western us 1. JAWRA
  Journal of the American Water Resources Association, 45(3):767–778, 2009.
Mahesh Pal. Random forest classifier for remote sensing classification. International Journal
                                              68


  of Remote Sensing, 26(1):217–222, 2005.
Ming Pan, Justin Sheffield, Eric F Wood, Kenneth E Mitchell, Paul R Houser, John C
  Schaake, Alan Robock, Dag Lohmann, Brian Cosgrove, Qingyun Duan, et al. Snow process
  modeling in the north american land data assimilation system (nldas): 2. evaluation of
  model simulated snow water equivalent. Journal of Geophysical Research: Atmospheres,
  108(D22), 2003.
Georgia A Papacharalampous and Hristos Tyralis. Evaluation of random forests and prophet
  for daily streamflow forecasting. Advances in Geosciences, 45:201–208, 2018.
Jeffrey T Payne, Andrew W Wood, Alan F Hamlet, Richard N Palmer, and Dennis P Let-
  tenmaier. Mitigating the effects of climate change on the water resources of the columbia
  river basin. Climatic change, 62(1-3):233–256, 2004.
David Peterson, Richard Smith, Iris Stewart, Noah Knowles, Chris Soulard, and Stephen
  Hager. Snowmelt discharge characteristics sierra nevada, california. US Geological Survey
  Scientific Investigations Report, pages 1–13, 2005.
K Podolak, D Edelson, S Kruse, B Aylward, M Zimring, and N Wobbrock. Estimating the
  water supply benefits from forest restoration in the northern sierra nevada. An unpublished
  report of the nature conservancy prepared with ecosystem economics. San Francisco, CA,
  2015.
NJ Potter, C Petheram, and L Zhang. Sensitivity of streamflow to rainfall and temperature
  in south-eastern australia during the millennium drought. In 19th International Congress
  on Modelling and Simulation, Perth, Dec, pages 3636–3642, 2011.
Philipp Probst, Marvin N Wright, and Anne-Laure Boulesteix. Hyperparameters and tuning
  strategies for random forest. Wiley Interdisciplinary Reviews: Data Mining and Knowledge
  Discovery, 9(3):e1301, 2019.
FM Ralph, M Dettinger, A White, D Reynolds, D Cayan, T Schneider, R Cifelli, K Redmond,
  M Anderson, F Gherke, et al. A vision for future observations for western us extreme
  precipitation and flooding. Journal of Contemporary Water Research & Education, 153
  (1):16–32, 2014.
Kabir Rasouli, William W Hsieh, and Alex J Cannon. Daily streamflow forecasting by
  machine learning methods with weather and climate inputs. Journal of Hydrology, 414:
  284–293, 2012.
Satish Kumar Regonda, Balaji Rajagopalan, Martyn Clark, and John Pitlick. Seasonal
  cycle shifts in hydroclimatology over the western united states. Journal of climate, 18(2):
  372–384, 2005.
                                              69


M Renner, R Seppelt, and C Bernhofer. Evaluation of water-energy balance frameworks
  to predict the sensitivity of streamflow to climate change. Hydrology and Earth System
  Sciences, 16(5):1419–1433, 2012.
Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Model-agnostic interpretability of
  machine learning. arXiv preprint arXiv:1606.05386, 2016.
Ignacio Rodriguez-Iturbe. Ecohydrology: A hydrologic perspective of climate-soil-vegetation
  dynamies. Water Resources Research, 36(1):3–9, 2000.
Edward B Royce and Michael G Barbour. Mediterranean climate effects. i. conifer water use
  across a sierra nevada ecotone. American Journal of Botany, 88(5):911–918, 2001.
Mohammad Safeeq, Guillaume S Mauger, Gordon E Grant, Ivan Arismendi, Alan F Hamlet,
  and Se-Yeun Lee. Comparing large-scale hydrological model predictions with observed
  streamflow in the pacific northwest: effects of climate and groundwater. Journal of Hy-
  drometeorology, 15(6):2501–2521, 2014.
Margarita Saft, Andrew W Western, Lu Zhang, Murray C Peel, and Nick J Potter. The
  influence of multiyear drought on the annual rainfall-runoff relationship: An a ustralian
  perspective. Water Resources Research, 51(4):2444–2463, 2015.
Eric P Salathé Jr, Alan F Hamlet, Clifford F Mass, Se-Yeun Lee, Matt Stumbaugh, and
  Richard Steed. Estimates of twenty-first-century flood risk in the pacific northwest based
  on regional climate model simulations. Journal of Hydrometeorology, 15(5):1881–1899,
  2014.
A Sankarasubramanian, Richard M Vogel, and James F Limbrunner. Climate elasticity of
  streamflow in the united states. Water Resources Research, 37(6):1771–1781, 2001.
Heidi Seibold, Christoph Bernau, Anne-Laure Boulesteix, and Riccardo De Bin. On the
  choice and influence of the number of boosting steps for high-dimensional linear cox-
  models. Computational Statistics, 33(3):1195–1215, 2018.
Julie E Shortridge, Seth D Guikema, and Benjamin F Zaitchik. Machine learning methods
  for empirical streamflow simulation: a comparison of model accuracy, interpretability, and
  uncertainty in seasonal watersheds. Hydrology and Earth System Sciences, 20(7):2611–
  2628, 2016.
Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. Learning important features
  through propagating activation differences. arXiv preprint arXiv:1704.02685, 2017.
Bernard A Silverman. An evaluation of eleven operational cloud seeding programs in the
  watersheds of the sierra nevada mountains. Atmospheric Research, 97(4):526–539, 2010.
                                             70


Jan Sitterson, Chris Knightes, Rajbir Parmar, Kurt Wolfe, Brian Avant, and Muluken
   Muche. An overview of rainfall-runoff model types. 2018.
Anurag Srivastava, Joan Q Wu, William J Elliot, Erin S Brooks, and Dennis C Flanagan.
   Modeling streamflow in a snow-dominated forest watershed using the water erosion pre-
   diction project (wepp) model. Transactions of the ASABE. 60 (4): 1171-1187., 60(4):
   1171–1187, 2017.
Jens T Stevens, Brandon M Collins, Jonathan W Long, Malcolm P North, Susan J Prichard,
   Leland W Tarnay, and Angela M White. Evaluating potential trade-offs among fuel treat-
   ment strategies in mixed-conifer forests of the sierra nevada. Ecosphere, 7(9):e01445, 2016.
Iris T Stewart, Daniel R Cayan, and Michael D Dettinger. Changes toward earlier streamflow
   timing across western north america. Journal of climate, 18(8):1136–1155, 2005.
Carolin Strobl, Anne-Laure Boulesteix, Achim Zeileis, and Torsten Hothorn. Bias in ran-
   dom forest variable importance measures: Illustrations, sources and a solution. BMC
   bioinformatics, 8(1):25, 2007.
Daniel L Swain. A tale of two california droughts: Lessons amidst record warmth and dryness
   in a region of complex physical and human geography. Geophysical Research Letters, 42
   (22):9999–10, 2015.
Charles Warren Thornthwaite. An approach toward a rational classification of climate.
   Geographical review, 38(1):55–94, 1948.
Ingrid M Tohver, Alan F Hamlet, and Se-Yeun Lee. Impacts of 21st-century climate change
   on hydrologic extremes in the pacific northwest region of north america. JAWRA Journal
   of the American Water Resources Association, 50(6):1461–1476, 2014.
Bryan A Tolson and Christine A Shoemaker. Dynamically dimensioned search algorithm for
   computationally efficient watershed model calibration. Water Resources Research, 43(1),
   2007.
Hakan Tongal and Martijn J Booij. Simulation and forecasting of streamflows using machine
   learning models coupled with base flow separation. Journal of hydrology, 564:266–282,
   2018.
Hristos Tyralis, Georgia Papacharalampous, and Andreas Langousis. A brief review of ran-
   dom forests for water scientists and practitioners and their recent history in water re-
   sources. Water, 11(5):910, 2019.
U.S. Geological Survey. U.s. geological survey, 2019, national hydrography dataset (ver. usgs
   national hydrography dataset best resolution (nhd) for hydrologic unit (hu) 4 - 2001), 2020.
                                               71


Ype Van der Velde, Nikki Vercauteren, Fernando Jaramillo, Stefan C Dekker, Georgia
  Destouni, and Steve W Lyon. Exploring hydroclimatic change disparity via the budyko
  framework. Hydrological Processes, 28(13):4110–4118, 2014.
Jan N Van Rijn and Frank Hutter. Hyperparameter importance across datasets. In Pro-
  ceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &
  Data Mining, pages 2367–2376, 2018.
Julie A Vano, Bart Nijssen, and Dennis P Lettenmaier. Seasonal hydrologic responses to
  climate change in the p acific n orthwest. Water Resources Research, 51(4):1959–1976,
  2015.
Anoop Valiya Veettil, Goutam Konapala, Ashok K Mishra, and Hong-Yi Li. Sensitivity of
  drought resilience-vulnerability-exposure to hydrologic ratios in contiguous united states.
  Journal of hydrology, 564:294–306, 2018.
Naga M Velpuri, Gabriel B Senay, Ramesh K Singh, Stefanie Bohms, and James P Verdin. A
  comprehensive evaluation of two modis evapotranspiration products over the conterminous
  united states: Using point and gridded fluxnet and water balance et. Remote Sensing of
  Environment, 139:35–49, 2013.
Sergio M Vicente-Serrano, Santiago Beguerı́a, and Juan I López-Moreno. A multiscalar
  drought index sensitive to global warming: the standardized precipitation evapotranspi-
  ration index. Journal of climate, 23(7):1696–1718, 2010.
S Vicuna and JA Dracup. The evolution of climate change impact studies on hydrology and
  water resources in california. Climatic Change, 82(3-4):327–350, 2007.
Sebastian Vicuña, Rebecca Leonardson, MW Hanemann, LL Dale, and John A Dracup.
  Climate change impacts on high elevation hydropower generation in california’s sierra
  nevada: a case study in the upper american river. Climatic Change, 87(1):123–137, 2008.
James M Vose, Chelcy Ford Miniat, Charles H Luce, Heidi Asbjornsen, Peter V Caldwell,
  John L Campbell, Gordon E Grant, Daniel J Isaak, Steven P Loheide II, and Ge Sun.
  Ecohydrological implications of drought for forests in the united states. Forest Ecology
  and Management, 380:335–345, 2016.
Stefan Wager, Trevor Hastie, and Bradley Efron. Confidence intervals for random forests:
  The jackknife and the infinitesimal jackknife. The Journal of Machine Learning Research,
  15(1):1625–1651, 2014.
Zhaoli Wang, Chengguang Lai, Xiaohong Chen, Bing Yang, Shiwei Zhao, and Xiaoyan Bai.
  Flood hazard risk assessment model based on random forest. Journal of Hydrology, 527:
  1130–1141, 2015.
                                             72


RH Waring, NC Coops, W Fan, and JM Nightingale. Modis enhanced vegetation index
  predicts tree species richness across forested ecoregions in the contiguous usa. Remote
  Sensing of Environment, 103(2):218–226, 2006.
Seth J Wenger, Charles H Luce, Alan F Hamlet, Daniel J Isaak, and Helen M Neville.
  Macroscale hydrologic modeling of ecologically relevant flow metrics. Water Resources
  Research, 46(9), 2010.
Frank Wilcoxon, SK Katti, and Roberta A Wilcox. Critical values and probability levels
  for the wilcoxon rank sum test and the wilcoxon signed rank test. Selected tables in
  mathematical statistics, 1:171–259, 1970.
David M Wolock and Gregory J McCabe. Explaining spatial variability in mean annual
  runoff in the conterminous united states. Climate Research, 11(2):149–159, 1999.
Chen Yang, You-Kuan Zhang, and Xiuyu Liang. Analysis of temporal variation and scaling
  of hydrological variables based on a numerical model of the sagehen creek watershed.
  Stochastic environmental research and risk assessment, 32(2):357–368, 2018.
Dawen Yang, Fubao Sun, Zhiyu Liu, Zhentao Cong, Guangheng Ni, and Zhidong Lei. Ana-
  lyzing spatial and temporal variability of annual water-energy balance in nonhumid regions
  of china using the budyko hypothesis. Water Resources Research, 43(4), 2007.
Meijian Yang, Denghua Yan, Yingdong Yu, and Zhiyong Yang. Spei-based spatiotemporal
  analysis of drought in haihe river basin from 1961 to 2010. Advances in Meteorology, 2016,
  2016.
Ning Yao, Yi Li, Tianjie Lei, and Lingling Peng. Drought evolution, severity and trends in
  mainland china over 1961–2013. Science of the total environment, 616:73–89, 2018.
Tianshan Zha, Alan G Barr, Garth van der Kamp, T Andy Black, J Harry McCaughey,
  and Lawrence B Flanagan. Interannual variation of evapotranspiration from forest and
  grassland ecosystems in western canada in relation to drought. Agricultural and Forest
  Meteorology, 150(11):1476–1484, 2010.
Xiaohui Zheng, Qiguang Wang, Lihua Zhou, Qing Sun, and Qi Li. Predictive contributions
  of snowmelt and rainfall to streamflow variations in the western united states. Advances
  in Meteorology, 2018, 2018.
                                              73