a...
Jﬁmﬂm? 3&3. _ , . ‘ , . ‘ ‘ . . , . _ 2W6 .
. . , ﬂﬁ . , ‘ ‘ . 24V: .

a
a? _
$3..
4.1... ..
3.2... ..
..

at 5...
Fa:

 

a»). ...
‘ 5.5 Rh...

:1

I|
..3_....«..§lu.
Stuftbll
3:13....(‘1 u

Ev!
. t
.

 

. ti:
o L‘ .ti 4: I.
:4 «’y.

.v . a. .w . . c
v. II. “I
‘30 \ nu
. |. bvv-a..u‘!.0n
. it» I

WIFE

 

 

 

 

5L
3’ if" 0 LIBRARY
Michigan State
Ui liversity

 

 

This is to certify that the
thesis entitled

QUANTIFYING AND PREDICTING ERROR IN DEM-
DERIVED FLOW-DIRECTION

presented by

GLENN ALEXANDER O'NEIL

has been accepted towards fulﬁllment
of the requirements for the

MS. degree in Geggraphy

 

 

  

A)

f M .
Major Professor’s Signﬁﬁre
4'17 , 20/ 0

Date

MSU is an Afﬁrmative Action/Equal Opportunity Employer

—-.--.--.--.—-—n-a-.--.-

OI-t-o-o-n-I-o-o-o-u-o-I-uI

PLACE IN RETURN BOX to remove this checkout from your record.
TO AVOID FINES return on or before date due.
MAY BE RECALLED with earlier due date if requested.

 

 

 

DATE DUE DATE DUE DATE DUE
FEB 2 6 2012
0830 11

 

 

 

 

 

 

 

 

 

 

 

 

 

5/08 KzlProjIAccaPrelelRCIDateDue.indd

 

QUANTIFYWG AND PREDICTIN G ERROR IN
DEM-DERIVED FLOW-DIRECTION

By

Glenn Alexander O’Neil

A THESIS

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

MASTER OF SCIENCE
Geography

2010

ABSTRACT

QUANTIFYING AND PREDICTING ERROR IN
DEM-DERIVED FLOW-DIRECTION

By
Glenn Alexander O’Neil

In this thesis, I present a new method for quantifying errors in ﬂow-direction
rasters derived from USGS 10-meter digital elevation models (DEMs). The method
utilizes ﬂow-direction rasters derived from ﬁner resolution 2.5-foot LiDAR DEMs as a
reference dataset, and performs cell-by-cell comparisons between the two datasets to
quantify error in the USGS product. I applied this method to a sample of 80 7 5-minute
USGS quadrangles, stratiﬁed by land cover and topography, across Ohio. I employed
quadrangle-level measures of agricultural concentration, total relief, and contour
topology error as independent variables in a multiple linear regression model to predict

mean ﬂow-direction error.

Copyright by
GLENN ALEXANDER O’NEIL
2010

DEDICATION

I dedicate this thesis to my wife Jamie and son Rowan. Though this work cannot
measure my appreciation for their love and support, I would be remiss not to

acknowledge my greatest sources for inspiration.

iv

ACKNOWLEDGMENTS

I would not have been able to complete this thesis without the steadfast support and
advice offered by my advisor, Ashton Shortridge. His insightful feedback helped me

resolve the theoretical and technical hurdles this research presented.

I would also not have been able to complete this thesis without the support of Jon
Bartholic of the Institute of Water Research. His ﬂexibility as a director provided me
with the time and resources needed to generate and analyze the detailed and extensive

datasets of this research.

TABLE OF CONTENTS

LIST OF TABLES ........................................................................................................... viii
LIST OF FIGURES ........................................................................................................... ix
KEY TO SYMBOLS AND ABBREVIATIONS ............................................................ xiv
CHAPTER 1
INTRODUCTION ............................................................................................................... l
1.1 Statement of Problem ................................................................................................. 1
1.2 Digital Elevation Models and Flow-direction ............................................................ 3
1.3 Flow-direction Error ................................................................................................... 4
CHAPTER 2
BACKGROUND AND HYPOTHESES ............................................................................. 6
2.1 Background ................................................................................................................ 6
2.2 Hypotheses ............................................................................................................... 10
CHAPTER 3
METHODS — THE SAMPLE AND INDEPENDENT VARIABLES .............................. 13
3.1 The Sample — Stratiﬁed by Agricultural Concentration and Relief ......................... 14
3.2 Contour Topology Error ........................................................................................... 19
CHAPTER 4
METHODS - CALCULATING THE DEPENDENT VARIABLE: FLOW-DIRECTION
ERROR .............................................................................................................................. 26
4.1 Software and Hardware ............................................................................................ 26
4.2 Data Acquisition ....................................................................................................... 28
4.3 IO-meter DEM Processing ....................................................................................... 29
4.4 LiDAR DEM Processing .......................................................................................... 31
4.4.1 LiDAR Stream Identiﬁcation ........................................................................ 34
4.4.2 Carving Through Artiﬁcial Barriers ............................................................. 45
4.4.3 Implementing Stream Identiﬁcation and Carving in LiDAR Processing ..... 48
4.4.4 Stream-burning and Sink-ﬁlling LiDAR DEMs ........................................... 50
4.5 Flow-direction Error ................................................................................................. 52
CHAPTER 5
RESULTS AND ANALYSIS ............................................................................................ 65

vi

5.1 Independent t-tests .................................................................................................... 66

5.2 Correlations .............................................................................................................. 69
5.3 Spatial Regression .................................................................................................... 71
CHAPTER 6
CONCLUSION .................................................................................................................. 80
6.1 Summary .................................................................................................................. 80
6.2 Signiﬁcance .............................................................................................................. 81
6.3 Future Research ........................................................................................................ 84
APPENDICES
Appendix A - Model Diagrams ..................................................................................... 88
Al Diagram Legend .............................................................................................. 88
A2 Pre-processing ................................................................................................. 89
A3 Post-processing ............................................................................................... 90
Appendix B — Python Code ............................................................................................ 91
B.1 raster_analysis.py ............................................................................................ 91
B.2 analysispreppy ................................................................................................ 95
B.3 stream_dlg_conversion.py ............................................................................. 112
B4 strearn_id_nhood.py ...................................................................................... 114
B5 stream_id_transect.py .................................................................................... 129
3.6 carve.py ......................................................................................................... 136
B7 ﬂow_error.py ................................................................................................. 148
3.8 contour_dlg_conversion.py ........................................................................... 162
B9 contour_topology_analysis.py ....................................................................... 164
8.10 analysis_results.py ....................................................................................... 170
REFERENCES ................................................................................................................ 172

vii

TABLE 3-1:

TABLE 4-1:

TABLE 5-1:

TABLE 5-2:

TABLE 5-3:

TABLE 5-4:

LIST OF TABLES

Breakdown of quadrangles sampled by relief and % agriculture ................ 17
Quadrangles where LiDAR stream identiﬁcation methods were tested ...... 39
Flow-direction error results for various sampling criteria combinations ....67
Independent t-tests between sampling criteria ............................................. 69
Correlations between ﬂow-direction error and other variables ................... 70

Alternative linear regression models, and measures of performance .......... 79

viii

LIST OF FIGURES

Images in this thesis are presented in color

FIGURE 3-1: Ohio 7.5 minutes quadrangles by agricultural concentration. .................. 16
FIGURE 3-2: Ohio 7.5 minute quadrangles by total relief .............................................. 17
FIGURE 3-3: Sample 7.5 minute quadrangles. ............................................................... 18

FIGURE 3-4: Sample contours from a 7.5 Minute quadrangle. Note the topological
errors - dangles at l and intersections at 2-5 ..................................................................... 20

FIGURE 3-5: Corrected contours. Though it is difﬁcult to tell, the intersecting contours

have been manually separated through digitizing ............................................................. 20
FIGURE 3-6: DEM derived from topologically incorrect contours. ............................... 21
FIGURE 3—7: DEM derived from corrected contours. .................................................... 21

FIGURE 3-8: Surface ﬂow concentrations (ﬂow-accumulation) derived from
topologically incorrect contours. ...................................................................................... 22

FIGURE 3-9: Surface ﬂow concentrations derived ﬁom corrected contours. ................ 22

FIGURE 3-10: F low-direction error calculated for the topologically incorrect DEM, with
the corrected DEM as the reference. Error was greatest around the intersection points. 23

FIGURE 3-11: Contours 50, 51, and 52 all have Node 100 as their ToNode, indicating
an intersection at that node. .............................................................................................. 24

FIGURE 4-1: Burning streams into a DEM. ................................................................... 30

FIGURE 4-2: Deriving ﬂow-direction with D8 (graphic source: ESRI ArcGIS 9.3 Help
ﬁles) .................................................................................................................................. 30

FIGURE 4-3: LiDAR tiles selected within a 7.5 minute quadrangle for pre-processing. 32

ix

FIGURE 4-4: Left - aerial photograph. Right — LiDAR DEM and DLG stream feature
in blue. DLG stream vector fails to align with LiDAR's drainage ditch due to differences
in positional accuracy. ...................................................................................................... 33

FIGURE 4-5: Cell positions evaluated by the Neighborhood Method. C = center, M =
max, 0 = opposite max, R1 = perpendicular 1, R2 = perpendicular 2. ............................ 37

FIGURE 4-6: Transect analysis cells (in black); S = starter_cell, O = south_cell, M =
min_cell, X = max_cell_south_of_min. ........................................................................... 39

FIGURE 4-7: Quadrangles where the LiDAR stream identiﬁcation methods were tested.
Numbers refer to Site # in Table 4-1. ............................................................................... 40

FIGURE 4-8: Site 1. Left — aerial; Left Center — LiDAR DEM; Right Center — streams

identiﬁed by the Neighborhood Method; Right — by the Transect Method ...................... 40
FIGURE 4-9: Site 2, same descriptions as FIGURE 4-8. ................................................ 40
FIGURE 4-10: Site 3, same descriptions as FIGURE 4-8. .............................................. 41
FIGURE 4-11: Site 4, same descriptions as FIGURE 4-8 ................... 41
FIGURE 4-12: Site 5, same descriptions as FIGURE 4-8. .............................................. 41
FIGURE 4-13: Site 6, same descriptions as FIGURE 4-8. .............................................. 41
FIGURE 4-14: Site 7, same descriptions as FIGURE 4-8. .............................................. 42
FIGURE 4-15: Site 8, same descriptions as FIGURE 4-8. .............................................. 42
FIGURE 4-16: Site 9, same descriptions as FIGURE 4-8. .............................................. 42
FIGURE 4-17: Site 10, same descriptions as FIGURE 4-8. ............................................ 42

FIGURE 4-18: Left - rough edges of Neighborhood Method stream cells. Right —
smoother edges of Transect Method. ................................................................................ 45

FIGURE 4-19: Leﬁ - the barrier in the DEM. Center — sinks in stream identiﬁed (in
black), lower elevation found (in yellow), path charted (in blue). Right — carved DEM

with elevation values incrementally lowered along path. ................................................. 46
x

FIGURE 4-20: Carving freed up ﬂow within stream features. Left - drainage ditch in
pre-carved DEM. Right - carved drainage ditch. ............................................................. 47

FIGURE 4-21: Carving resolved some barriers (created north-south connection) but also
introduced potential errors (N W—SE connection, N-S connection at right). ..................... 47

FIGURE 4-22: The Neighborhood Method with a minimum elevation difference of 3
feet performed well at identifying stream cells in the LiDAR DEM for areas of high relief
and little to no agriculture. ................................................................................................ 49

FIGURE 4-23: A combination of the Neighborhood and Transect methods performed
best in ﬂatter areas dominated by agriculture. .................................................................. 49

FIGURE 4-24: Left - the ﬁll process created ﬂat features in the LiDAR DEM. Right -
Flow accumulation derived from the ﬁlled LiDAR DEM's ﬂow-direction indicate that the
ﬂat features did not dramatically alter the ﬂow-direction. ............................................... 51

FIGURE 4-25: 3x3 neighborhood of lO-meter ﬂow-direction cells. Cell to be analyzed
for error at center. ............................................................................................................. 53

FIGURE 4-26: LiDAR ﬂow-direction cells intersecting the selected lO-meter cell. ...... 54

FIGURE 4-27: 0% of center LiDAR cells drain through the northwest edges of the
neighborhood. ................................................................................................................... 56

FIGURE 4-28: 0% of center LiDAR cells drain through the north edges of the
neighborhood. ................................................................................................................... 56

FIGURE 4-29: 89% of center LiDAR cells drain through the northeast edges of the
neighborhood. ................................................................................................................... 57

FIGURE 4-30: 0% of center LiDAR cells drain through the east edges of the
neighborhood. ................................................................................................................... 57

FIGURE 4-31: 0% of center LiDAR cells drain through the southeast edges of the
neighborhood. ................................................................................................................... 57

FIGURE 4-32: 0% of center LiDAR cells drain through the south edges of the
neighborhood. ................................................................................................................... 58

xi

FIGURE 4-33: 0% of center LiDAR cells drain through the southwest edges of the
neighborhood. ................................................................................................................... 58

FIGURE 4-34: 11% percent of center LiDAR cells drain through the west edges of the
neighborhood. ................................................................................................................... 58

FIGURE 4-35: Potential ﬂow error scores for assessment of lO-meter cell that ﬂows
southwest ........................................................................................................................... 60

FIGURE 4-36: Flow-direction error map for a single 7.5 minute quadrangle. Blue grid
lines are the result of avoiding edge issues in each LiDAR tile. ...................................... 62

FIGURE 4-37: For the selected lO-meter cell, estimated ﬂow-direction is to the south. 63

FIGURE 4-38: For the selected lO-meter cell, estimated LiDAR ﬂow-direction trends

mainly east and northeast. ................................................................................................. 63
FIGURE 4-39: The ﬂow-direction error is considered high, with a value of 2.98. ......... 63
FIGURE 4-40: Flow-direction error map with ﬂat cells discarded. ................................ 64

FIGURE 5-1: Spatial distribution of ﬂow-direction error amongst the sampled

quadrangles. ...................................................................................................................... 66
FIGURE 5-2: Model 1 - initial linear regression model of ﬂow-direction error. ............ 71
FIGURE 5-3: Histograms of regression variables from Model 1. ................................... 72
FIGURE 5-4: Model 1 diagnostics. ................................................................................. 73
FIGURE 5-5: Mapped residuals of Model 1 .................................................................... 74

FIGURE 5-6: Residuals of Model 1, greater residual error of higher ﬁtted values
indicates potential heteroscedasticity. ............................................................................... 74

FIGURE 5-7: Residuals of Model 1, corrected for spatial autocorrelation, no apparent
heteroscedasticity. ............................................................................................................. 76

FIGURE 5-8: Mapped residuals, corrected for spatial autocorrelation. .......................... 76

xii

FIGURE 5-9: Model 1 diagnostics, corrected for residual spatial autocorrelation. ........ 77

FIGURE A-l: Processing Diagrams - Diagram Legend .................................................. 88
FIGURE A-2: Processing Diagrams - Pre-processing Steps. .......................................... 89
FIGURE A-3: Processing Diagrams - Post-processing Steps .......................................... 9O

xiii

KEY TO SYMBOLS AND ABBREVIATIONS

ASCII — American Standard Code for Information Interchange
ATtILA — Analytical Tools Interface for Landscape Assessments
D8 — single direction ﬂow-direction algorithm

DEM — digital elevation model

DLG - digital line graph

EPA — Environmental Protection Agency

ESRI — Environmental Systems Research Institute

F DE -— F low-direction error

GDAL — Geospatial Data Abstraction Library

GIS — geographical information systems/science

HRHA — high relief high agricultural concentration

HRLA — high relief low agricultural concentration

LiDAR -— Light Detection and Ranging

LRHA — low relief high agricultural concentration

LRLA — low relief low agricultural concentration

NED — National Elevation Dataset

NHD — National Hydrography Dataset

NLCD — National Land Cover Dataset

OGRIP — Ohio Geographically Referenced lnforrnation Program
OSIP — Ohio State-wide Imagery Program

RMSE — root mean square error

xiv

SAR — simultaneous auto-regression
USGS — United States Geological Survey

UTM — Universal Transverse Mercator

XV

Chapter 1 Introduction

1.1 Statement of Problem

In the ﬁeld of hydrology, digital elevation models (DEMs) are frequently used to
predict surface water ﬂow direction and derive digital stream networks. The accuracy of
these predictions and networks relies on the ﬁdelity of the source DEMs. In ﬂatter areas
inaccuracies in a DEM can be more pronounced, particularly if the interpolation of
contours with whole number values (e. g. 315 versus 315.7 feet) was employed to build
the DEM, which is the standard of most DEMs produced by the United States Geological
Survey (USGS) (Carter 1988, Carter 1992, Wilson and Gallant 2000). Agricultural areas
are also prone to DEM inaccuracy. The errors in both ﬂat and agricultural areas are
typically due to issues with the source contours that yield a DEM. In ﬂat areas, sparse
contours provide too little information to the interpolating algorithm that produces the
DEM. In agricultural areas, the source contours can contain topological errors, such as
intersections, around drainage ditches that confuse the interpolation algorithm. The
results are digital surfaces that do not truly reﬂect the Earth’s surface and therefore
inaccurately predict surface water ﬂow. These errors can pose a signiﬁcant problem for
researchers and conservationists who rely on models of surface water ﬂow-direCtion,
such as soil conservationists concerned with water-borne soil erosion and nutrient runoff

from agricultural lands into streams and lakes. If they cannot reference accurate

predictions of how water ﬂows over the surface, then efforts to identify and remediate

sedimentation-prone areas will be misguided and waste valuable time and resources.

There is a rich literature documenting absolute elevation error in DEMs, but
research assessing error in DEM-derived ﬂow-direction is lacking. In this thesis I will
help ﬁll this gap by exploring the following questions:

1. How can ﬁner-resolution DEMs be employed to evaluate ﬂow-direction error
in coarser DEMs?

2. What are the primary challenges in such an evaluation?

3. What landscape characteristics best predict ﬂow-direction error?

To address these questions, I developed a method for calculating ﬂow-direction
error, and implemented it across a large sample of USGS 7.5-minute quadrangles in
Ohio. I then conducted a multivariate analysis to predict ﬂow-direction error in terms of

landscape characteristics, accounting for residual spatial autocorrelation.

This research contributes to the understanding of uncertainty in digital elevation
products, particularly how error propagates into ﬂow-direction. The methods I developed
also aid in the analysis of high-resolution DEMs, speciﬁcally in the extraction of
hydrologic features from such data. At a more practical level, these results inform
hydrological analyses in engineering, agriculture, and any other disciplines that rely on
simulations of surface water ﬂow. For example, with a better understanding of erosion
model uncertainty, soil conservationists can more effectively target efforts to reduce

sediment loading to streams and lakes.

The remainder of this introduction will preview the datasets and methods

employed in this research and layout the organization of the thesis.

1.2 Digital Elevation Models and Flow-direction

A DEM is a raster (grid) dataset of elevation values sampled at regularly spaced
intervals (resolution). They can be produced in a number of ways, but most published
products have been generated through interpolation from topographic sheet contours,
remote sensing from airplanes or satellites, or stereoscopic interpretation of aerial
photographs. For this thesis, I evaluated lO-meter resolution USGS DEMs, produced
through contour interpolation, with 2.5-foot resolution LiDAR (Light Detection and
Ranging) DEMs, produced through remote sensing. The USGS DEMs were interpolated
from contours of USGS topographic maps at a scale of 1:24,000. The LiDAR DEMs
were interpolated from points captured by an airplane-mounted sensor at an altitude of

7,300 feet above ground level.

The USGS is the leading source for DEMs in the US. The freely accessible
National Elevation Dataset (NED) includes 30-meter resolution DEMs for the contiguous
US, lO-meter DEMs for the majority of the nation, and ﬁner resolutions in select
locations. Not all of the DEMs in the NED were produced by the USGS. The NED is
described as “the best available raster elevation data for the coterrninous United States,”
(U SGS 2006). If DEMs meeting National Map Accuracy Standards and of resolutions
ﬁner than those produced by the USGS exist, they may be included in the NED. This

situation applied to the study site of this thesis. Ohio’s 2.5-foot resolution LiDAR DEMs

are accessible through the NED, even though they were generated for the state by a

private contractor.

The organization of DEMs as neighborhoods of elevation values allows for the
calculation of numerous derivative datasets, including slope, aspect, line of sight, and
surface water ﬂow-direction. This last derivative is the horizontal direction of surface
water ﬂow out of a particular area. In the case of ﬂow-direction derived from a DEM, it
is calculated for each grid cell in the original DEM, and can be determined through
multiple methods. Some approaches represent ﬂow-direction as an angle and disperse it
in multiple directions (Quinn et a1. 1991, Freeman 1991 , Tarboton 1997). Other
approaches are more restrictive and represent it as a single value assigned to one of eight
compass directions (O’Callaghan and. Mark 1984, Greenlee 1987, Jenson and Domingue
1988). Nearly all the methods are driven by calculating slope angles between a particular
cell and its surrounding neighbors. For this research, I employed a single-direction
method limited by the eight compass directions. I will discuss this selection in greater

detail in Chapter 5.

1.3 Flow-direction Error

I developed a novel method to calculate the difference in ﬂow-direction between
lO-meter USGS DBMS and 25-foot LiDAR DEMs. This method compared a ﬂow-
direction value of a USGS DEM cell to an aggregated value from LiDAR DEM cells
contained within the coarser USGS cell. I coded the comparison into a weighted error

score based on the extent of agreement (or disagreement) between the two values. I then

determined an error score for the majority of USGS DEM cells in eighty 7.5-minute

quadrangles in Ohio (roughly 1 million cells per quadrangle).

I then calculated the mean ﬂow-direction error for each quadrangle, and used this
statistic as the dependent variable in a multiple linear regression model. I sought to
explain how ﬂow-direction error varied in relation to changes in total relief, agricultural

concentration, and contour topology error.

The next chapter will discuss previous efforts to evaluate DBMS and their
derivatives, and establish this thesis as a new and unique contribution to that effort.
Chapter 3 will describe how I gathered and processed the independent variables of the
proposed multiple linear regression model. Chapter 4 will describe the pre-processing
steps of the USGS lO-meter and LiDAR DEMs, including a new approach for stream
extraction from high-resolution digital elevation data, and the calculation of ﬂow-
direction error. Chapter 5 will describe statistical measures of how ﬂow-direction error
varied across different landscapes, provide an evaluation of the proposed and alternative
regression models, and present an analysis of the results. Chapter 6 will summarize the
ﬁndings, discuss their signiﬁcance, and describe additional research needed to extend this

effort.

Chapter 2 Background and Hypotheses

2.1 Background

Due to the numerous applications that DEMs support, their ubiquity, and the fact
that they are freely accessible, DEM error has been a well studied topic within
geographic information science. Wilson and Gallant (2000) and Fisher and Tate (2006)
provide thorough overviews of existing research on this topic. The USGS classiﬁes error
in its DEMs as either blunders (major elevation discrepancies), systematic (errors that
follow a ﬁxed pattern and are predictable), and random (beyond the control of the
observer). It acknowledges that the errors cannot be completely resolved (USGS 1998).
In addition to these errors, numerous studies have shown error in DBMS and their
derivatives to be spatially autocorrelated (Goodchild et al. 1992, Fisher 1993, Heuvelink
1998, Oksanen and Sarjakoski 2005). To inform users of the uncertainty in the data, the
USGS publishes each DEM with estimates of its root—mean-square error (RMSE), a
measure of vertical accuracy. RMSE is calculated by comparing a minimum of 28
ground-truth elevations to those of the DEM. USGS standards for most DEMs hold
RMSE to no more than one-half of a contour interval (U SGS 1998). Most of the contour
datasets used to produce USGS DEMs have intervals of 5 or 10 feet, implying maximum
RMSE values of 2.5 or ﬁve feet. Fisher (1998) argued that in addition to RMSE, DEMs
should be published with measures of spatial autocorrelation to allow users to
appropriately visualize and simulate the error and its propagation into derivative

products.

Knowing the RMSE for a DEM is useful to researchers concerned with absolute
elevation accuracy, such as Carson et a1. (1997) and Shortridge (2006), to name a few.
However, absolute elevation accuracy is not necessarily informative to users studying
DEM derivative products such as slope, curvature, catchment basins, aspect, and ﬂow-
direction. Even DEMs with low RMSEs can yield signiﬁcantly variable derivatives
(Garbrecht and Starks 1995, Wise 1998, Holmes et a1. 2000, Endreny and Wood 2001 ,
Barber and Shortridge 2005). Elevations for DEMs of particularly ﬂat regions of Ohio
can have value ranges less than 50 feet over areas as large as 64 square miles. These
estimated elevations may be well within the RMSE standards; but derivative products
dependent upon an elevation’s relationship to neighboring values, such as slope, can be

greatly affected by even subtle changes in elevation.

Martinoni and Bernhard (1998) argued that the accuracy of DEM derivatives
warrant greater attention. Ziadat (2007) studied the impact of cell-size and contour
interval on DEM-derived slope, in addition to absolute elevation values. Venteris and
Slater (2005) evaluated multiple DEM derivatives in terms of their ability to predict soil
carbon. Wu et a1. (2005, 2007) evaluated the effects of DEM resolution on the estimation
of soil erosion. Riggs and Dean (2007) explored errors in DEM-derived view-sheds.
Desmet (1997) evaluated differences in slope and aspect across different interpolation
methods, through mean and standard deviation comparisons, on small, arable ﬁelds in
Belgium. Chang and Tsai (1991) studied slope and aspect differences between different
DEM resolutions in a more spatially explicit manner, on a 1 square kilometer region of

Taiwan’s eastern coast.

Other studies have looked at DEM predictions of hydrology. Endreny and Wood
(2001) studied differences between ﬂow-direction methods in terms of predicted ﬂow-
paths on three 64 hectare study sites in western Oklahoma. They compared simulated
ﬂow-path networks to control networks and recorded the percent of overlap. Walker and
Willgoose (1999) demonstrated the inaccuracies in derivatives of published DEMs
through quantitative comparisons of up-stream ﬂow-accumulation networks, slope-area
relationship, and normalized width function within a 1.4 square kilometer surveyed
reference dataset in southwest Australia. Wise (2000) similarly evaluated DEMs through
catchment basins and up-stream ﬂow accumulation, though he focused on a small
catchment in southwest England. Kienzle (2004) explored actual directional changes in
ﬂow measured in degrees on four study sites (roughly 6 square miles each) in Canada.
Wise (2007), Wilson et a1. (2007), Barber and Shortridge (2005), and Thompson et al.
(2001) studied DEM-derived surface ﬂow through comparisons to derived catchment
boundaries. Clarke and Lee (2007) evaluated DEM estimates of surface ﬂow through a

comparison to known stream features in the National Hydrography Dataset (N HD).

While these analyses have demonstrated the propagation of DEM error into
hydrological derivatives, gaps remain in fully documenting this issue. The previous
research has either focused on relatively small and, in some cases, homogenous study
sites, or employed coarse metrics. To better describe error in these derivatives and
identify landscape characteristics that potentially cause it, an analysis on a large and
diverse study area is needed. Such an effort would generate ﬁndings that could be better

generalized to other locations than the ﬁndings of previous research. Furthermore, while

evaluating simulated catchment boundaries is an accepted method of measuring error in
DEM, it does not capture ﬁeld-level variation in DEM-derived surface water ﬂow-
direction. To understand how errors in DEM hydrological derivatives differ at the ﬁnest
scales, evaluations must be made on a cell-by-cell basis. Though Clark and Lee (2007)
approach this speciﬁcity in their comparison of DEM-derived ﬂow to NHD features, their
approach would not allow for a direct comparison where stream features do not exist,
such as the middle of a farm ﬁeld.

For this thesis, I sought to quantify and predict error in ﬂow-direction rasters
derived from USGS lO-meter DEMs (RMSE 12.45 feet —— Ohio Ofﬁce of Information
Technology 2005) by comparing them to ﬂow-direction rasters derived from ﬁner, more
vertically and horizontally accurate LiDAR rasters (RMSE 0.5 feet -Woolpert Inc. 2008).
The approach of evaluating less precise spatial data with more precise data has been
utilized before (Chang and Tsai, 1991; Fisher, 1998; Kyriakidis et al., 1999; Walker and
Willgoose, 1999; Holmes et al., 2000; Kienzle, 2004) and is endorsed by the USGS
(Digital Cartographic Data Standards Task Force, 1988). Numerous studies have
explored performance differences between DEMs of different resolutions and found ﬁner
resolutions superior, depending on the application (Chang and Tsai, 1991; Wolock and
McCabe, 2000; Wu et al., 2005; Mouton, 2005; Wu et al., 2007; Deng et al., 2007; Aziz
and Steward, 2007; Riggs and Dean, 2007). My study site comprised 80 USGS 7.5-
minute quadrangles (over 5,000 square miles) of varying topography and land cover in
Ohio. By focusing at such a ﬁne scale over such a large and diverse study area, I sought

to evaluate DEM error in a novel and insightful manner.

2.2 Hypotheses

I expected areas with low relief, high concentrations of agriculture, and
represented by DEMs produced from contours that contained large numbers of
topological errors (intersections) to produce higher values of ﬂow-direction error. More

formally:

 

Equation: FDE = a + BpAPA + BTRTR + BCICI

Where:

F DE = ﬂow-direction error
PA = % agriculture

TR = total relief

CI = contour intersections

Hypotheses:

H0: BPA = 0 Ha: BPA > 0
H0: BTR = 0 Ha: BTR < 0
H0: BCI = 0 Ha: BCI > 0

 

 

 

Figure 2-1: Proposed multiple linear regression model of flow-direction error.

As established earlier, derivatives from relatively ﬂat DEMs are highly sensitive
to propagated error. Digital contour datasets are a common source for creating DEMs
(F lorinsky, 1998), including all of the Ohio lO-meter DEMs employed in this research
(Ohio Ofﬁce of Information Technology 2005). Flat areas tend to have fewer and sparser
contours as input to the interpolation algorithms that produce DEMs. The wider the

geographic space between contours, the greater the uncertainty in the characterization of

10

surface water ﬂow over that space. The interpolation process is known to introduce
errors into the resulting DEMs, sometimes signiﬁcantly (F aintich 1996, Carson and

Reutebuch, 1997, Carrara et a1. 1997, Meyer 2004).

The presence of agricultural drainage ditches can add to this error. These features
tend to concentrate a ﬂat area’s relatively few contours around them and create
topological errors (intersections, speciﬁcally) in the contours, confusing the DEM
interpolation algorithm. These errors are propagated into the DBMS and their derivative
products. Theoretically, contours should never cross; but when digital contours are
derived from scanning paper or mylar maps, such as USGS 7.5-minute digital line graph
(DLG) hypsography, artifacts in the source map or scanning resolution can cause dense
contours to merge together and create a topological violation (Greenlee, 1987).
Additionally, cartographers may force contours to merge to indicate steep features such
as cliffs, identiﬁed as carrying contours in the DLG attributes. The USGS standards for
DLGs require that intersecting contours be identiﬁed through node topology (described in
greater detail in Chapter 3) (U SGS, 1999). When these topologically errant contours are
interpolated to yield a DEM, the elevation values around an area of intersection can be
inaccurately estimated. For example, if a 700 and 710 foot contour intersect, the DEM
interpolation algorithm would have difﬁculty accurately resolving the true surface at that
location. The 700 foot contour would, in essence, be exposed to the area that
cartographically represents the 710 to 720 foot elevation range, and would subsequently
draw down surrounding values in the resulting DEM. In most cases, the resulting error in

terms of absolute elevation values in this area would be small, as the error would be

11

mitigated by the neighboring cells that factored into the interpolation. Nevertheless, some
DEM derivative products, such as surface water ﬂow-direction, are more sensitive to this

error, as discussed earlier.

Given the effect that low relief, agricultural ditches, and contour intersections can
have on error in interpolated DEMs, I expected that their impacts would be more
substantial on the particularly error-sensitive ﬂow-direction derivative. This expectation

drove the design my hypothesis.

The next chapter provides a detailed description of how the 80 quadrangles were
sampled and processed to extract the dependent variables for the proposed regression

model in Figure 2-1.

12

Chapter 3 Methods -— The Sample and Independent Variables

The general steps I took to analyze ﬂow-direction error in this research were as

follows:

1.

Select a sample of USGS 7.5-minute quadrangles, stratiﬁed by total relief and
agricultural concentration.

Gather USGS 10-meter DEMs, 2.5-foot LiDAR DEMs, DLG hypsography,
and DLG hydrography for each quadrangle.

Process the lO-meter and LiDAR DEMs by enforcing hydrography presence,
ﬁlling spurious sinks, and calculating ﬂow-direction rasters for each
quadrangle.

Calculate the extent of contour intersection for DLG hypsography in each
quadrangle.

Quantify mean ﬂow-direction error for each quadrangle through cell-by-cell
comparison of lO-meter and LiDAR ﬂow-direction rasters.

Generate a multiple linear regression model predicting ﬂow-direction error in
terms of total relief, agricultural concentration, and contour intersection.

I calculated ﬂow-direction error as a single mean value for each quadrangle in a

sample of 80 USGS 7.5-minute (1 :24,000) quadrangles. I will detail this calculation in

the next chapter, but for now I establish it as a single numeric value that will serve as the

dependent variable in a multiple linear regression model. This chapter focuses on the

sampling and pre-processing of the proposed regression model’s independent variables:

percentage agriculture, relief, and contour intersections (steps 1, 2, and 4, of the process

described above).

13

3.1 The Sample - Stratiﬁed by Agricultural Concentration and Relief

This research focused exclusively on the state of Ohio. The state was an ideal
study location because it contained the variability in land cover and topography through
which I attempted to explain ﬂow-direction error. Additionally, the Ohio Geographically
Referenced Information Program (OGRIP - http://ogrip.oit.ohio.gov/Home.aspx)
provided nearly all of the data needed to conduct the analysis for free. The OGRIP
website offered USGS lO-meter DEMs, LiDAR point elevations, interpolated LiDAR

DEMs, USGS 7.5-minute DLG hypsography (contours) and DLG hydrography (streams).

The USGS 7.5-minute quadrangle served as the study’s sample unit, as this is the
spatial scale and extent at which USGS DEMs are typically generated, including all of
the Ohio lO-meter DEMs provided through the NED. Ohio is covered by 793
quadrangles. Due to the intensive pre-processing needed to calculate ﬂow-direction error
(detailed in Chapter 4), I selected a sub-sample of 80 quadrangles. To avoid
oversampling the proposed regression model’s independent variables, I stratiﬁed the
samples by their respective total relief and percentages of row-crop agriculture. I
calculated these attributes for each quadrangle with the Analytical Tools Interface for
Landscape Assessment (ATtILA) tool from EPA (US EPA, 2007), an extension for

ESRI’S ArcView 3.3.

I provided ATtILA with the quadrangle boundaries for Ohio, a lO-meter DEM for
the entire state (described in greater detail in Chapter 4), and a state-wide land cover

14

raster. I downloaded the 2001 National Land Cover Database (NLCD) from the website
of the Multi-Resolution Land Characteristics Consortium
Orttp://www.mrlc.gov/nlcd.php). The NLCD was derived from 30-meter resolution
Landsat data and reported an overall accuracy of 84% (Homer et al., 2007). I obtained a
copy of the dataset that covered the entire Midwest region and clipped it by Ohio’s state
boundary. With these state-wide inputs of elevation and land cover, ATtILA was able to
output the elevation range and the percentage of various land cover classes in each
quadrangle, including row-crop agriculture (2001 NLCD class 82) (Figures 3-1 and 3-2).
I then utilized these quadrangle attributes as independent variables in the proposed

regression model (Figure 2-1).

Next, in order to avoid potential issues in deriving accurate hydrology from
sampled DEMs, I removed quadrangles that contained large amounts of open water (e. g.,
quadrangles on the Lake Erie border), quadrangles where more than 10% of the area was
classiﬁed as urban (because it is difﬁcult to predict surface water ﬂow in downtown
Cleveland), quadrangles that straddled the state border (because the LiDAR tiles would
not cover the entire areas of these quadrangles), and quadrangles that straddled the
boundary of the Ohio State Plane Northern and Southern coordinate systems. OGRIP
distributes the LiDAR DEMs in appropriate State Plane projections. It proved to be
programmatically difficult to analyze a quadrangle that contained LiDAR data in two
different projections. Since LiDAR rasters served as the more accurate reference dataset,
it was important to not introduce additional error into them; therefore, I avoided re-

projecting between State Plane zones. This ﬁltering process reduced the quadrangle

15

population from 793 to 538. I then divided the 538 quadrangles into the following four
bins: high percentage agriculture and high relief (HAHR), high percentage agriculture
and low relief (HALR), low percentage agriculture and high relief (LAHR), low
percentage agriculture and low relief (LALR). The breakpoints for these bins
corresponded to the median values for each class, classifying half of the quadrangles as
high agricultural concentration and half as low; they were similarly classiﬁed by relief. I
then randomly selected 20 quadrangles from each of these bins in order to yield the ﬁnal

sample of 80 quadrangles (Table 3-1 and Figure 3-3).

Percentage Row-crop
Agriculture (quartiles)
CT 0 - 7.5

Er] 7.5 - 30.0

- 30.0 - 71.5

- 71.5 - 93.0

 

 

 

Figure 3-1: Ohio 7.5-minute quadrangles by agricultural concentration.

 

 

 

  

 

 

 

Elevation Range
(feet - quartiles)

 

 

 

 

 

 

[:f} 9- 169
[j 170 - 338
- 339 - 463
- 464 - 886

Figure 3—2: Ohio 7.5-minute quadrangles by total relief.
. Relief
Quadrangle Samplmg Totals
High ( > 327feet) Low (< 327feel)
High ( > 67.8% ) 20 out of 37 20 out of 232 40 out of 269
% Agriculture

Law ( < 67.8% ) 20 out of 232 20 out of 37 40 out of 269
Totals 40 out of 269 40 out of 269 80 out of 538

 

 

 

 

Table 3-12 Breakdown of quadrangles sampled by relief and % agriculture.

17

 

Sample
Quadrangle

Cl

 

 

 

 

 

Figure 3-3: Sample 7.5-minute quadrangles.

As shown in Figure 3-3, the sampled quadrangles tended to cluster along Ohio’s
SW-NE axis. This trend was a byproduct of the sample stratiﬁcation. Northwest Ohio is
dominated by agriculture and low relief, so the 20 quadrangles that constituted this sub-
sarnple had a large geographic area from which they could have been selected. The
situation was similar for the high relief and low percentage agriculture sub-sample of
southeast Ohio. These two situations naturally clustered the remaining half of the sample

along the southwest-northeast axis.

3.2 Contour Topology Error

As established in Chapter 2, contour intersections are suspected of contributing to
ﬂow-direction error. Figures 3-4 through 3-10 illustrate the impact that these errors of
topology can have on ﬂow-direction. Figure 3-4 identiﬁes topological errors in a small
section of a USGS hypsography DLG (Alma, MI). Those errors were corrected in Figure
3-5. Figure 3-6 shows a DEM interpolated from the errant contours of Figure 3-4; note
the large depression created at location 4 of Figure 3-4 and its impact on ﬂow
concentration in Figure 3-8. Figure 3-7 shows a DEM interpolated from the corrected
contours; note the difference in ﬂow concentration in Figure 3-9. Figure 3—10 maps the
difference in ﬂow-directions between the two DEMs (see section 4.5 for details). The
blue areas indicate agreement, while the red areas indicate dramatic disagreement (180

degrees) and are found in the areas where contour intersections were identiﬁed in Figure

3-4.

19

 

 

 

 

 

 

Figure 3-4: Sample contours from a 7.5-minute quadrangle. Note the topological errors - dangles at
l and intersections at 2-5.

 

W/

U

 

 

 

 

f\

Figure 3-5: Corrected contours. Though it is difﬁcult to tell, the intersecting contours have been
manually separated through digitizing.

 

20

 

 

Figure 3-6: DEM derived from topologically incorrect contours.

 

 

Figure 3-7: DEM derived from corrected contours.

21

 

 

Figure 3-8: Surface flow concentrations (flow-accumulation) derived from topologically incorrect
contours.

 

 

Figure 3-9: Surface flow concentrations derived from corrected contours.

22

Flow-direction Ermr

a Severe (4)
a High (3)
:1 Moderate (2)
a how (I)
- None (0)
Figure 3-10: Flow-direction error calculated for the topologically incorrect DEM, with the corrected

DEM as the reference. Error was greatest around the intersection points. (See Figure 4-35 for
legend reference)

 

I developed a Python script to determine the number of contour intersections in
DLG hypsography (the source for the lO-meter DEMs) in each sample quadrangle. The
script converted hypsography DLGs to ArcInfo coverages. This conversion allowed for
the building of line and node topology for each new coverage. The topology included
FromNode and ToNode attributes for each contour feature, which constituted a logic
within the contours by which intersections could be identiﬁed. If multiple features shared
the same FromNode or ToNode, this indicated the presence of an intersection (Figure 3-
11). The Python script employed this logic, iterating through the contours within each
sample quadrangle, identifying the intersections, and recording an intersection count for

each quadrangle. A quadrangle with high relief, and therefore many contours, could

23

potentially have more intersections than a ﬂat quadrangle with comparatively fewer
contours. However, the impact of the intersections on ﬂow-directions would likely be
more dramatic in the ﬂatter quadrangle since there are fewer “correct” contours to
mitigate the error (location 4 in Figure 3-4). In the high relief quadrangle, the
intersections would likely occur in an area dense with “correct” contours that would
conﬁne any of the errors introduced into ﬂow-direction to a small area. To properly
associate contour intersections with potential ﬂow-direction error, the script calculated

the number of intersections per contour in each quadrangle.

 

\K/,// \ \

\.—-’
f-J Contom' ID: 53 ////

l FElevation: 750
y/”\ romNode = 100 i/
’ ‘ FToNode= 101 /i//
\ / V/
\ \ /
lContour ID= 50| \\\ f
\i I Elevation= 750 /
f

1 ,
‘iFromNodc= 99 // \ ode ID— __ 101

ToNode — 100
Node ID— — 100
\\.
L_
_
C 51 l ContourlD—52
l

 

 

 

 

 

ontour ID 5
Elevation: 755 } Elevation = 760
FromNode= 98 l FromNode = 97
{/4 ToNodc = 100 \ ToNode = 100
l" ,3 I/
/ /
/

 

Figure 3-11: Contours 50, 51, and 52 all have Node 100 as their ToNode, indicating an intersection at
that node.

24

In the next chapter, I will describe the steps I took to process the USGS and

LiDAR DBMS, and detail how I calculated ﬂow-direction error.

25

Chapter 4 Methods — Calculating the Dependent Variable:
Flow-direction error
The previous chapter covered how I selected the study’s 80 sample quadrangles,
and calculated the dependent variables of the proposed regression model. In this chapter
I discuss how I calculated ﬂow-direction error. I ﬁrst describe the software and hardware
used in that calculation. I then describe the data sources of the calculation’s inputs, the

pre-processing of those inputs, and the speciﬁc steps for calculating ﬂow-direction error.

The process I implemented for this thesis, outside of its design and programming,
required a substantial quantity of person-hours and computer-hours. It took one month to
complete the analysis for the selected sample (80 7.5-minute quadrangles), with six
machines processing data the entire time. The basic steps were as follows: 1, pre-
process the 10-meter DEMs; 2, pre-process the LiDAR DEMs; 3, calculate the ﬂow-

direction error in the 10-meter DEMs; 4, conduct the statistical analysis.

4.1 Software and Hardware

I performed the data processing almost entirely within ArcGIS 9.3 and the Python
programming language. ArcGIS handled all of the data pre-processing including
clipping, re-projecting, and map algebra functions. It also performed much of the
hydrologic processing of DEMs, including the ﬁlling of spurious sinks and calculating
surface water ﬂow-directions. Python is a ﬂexible and open-source scripting language

that interfaces with ArcGIS’s geo-processor. This integration allowed me to automate

26

much of the ArcGIS pre-processing functions through Python scripts. I employed Python
outside of ArcGIS to perform some of the pre-processing and all of the data post-
processing. Its list data structure is well-suited for analyzing raster datasets, such as
LIDAR DEMs, as ASCII text ﬁles. It is also easy to create text ﬁles with Python in order
to export analysis results and convert them back to rasters for further analysis in ArcGIS.
Python’s ﬂexibility enables it to easily incorporate 3rd party tools, such as the Geospatial
Data Abstraction Library (GDAL - http://www.gdal.org/), an open source project for
raster analysis. Utilizing the list data structure and GDAL raster analysis tools, I
developed custom Python scripts to identify stream locations, carve through artiﬁcial
barriers in LiDAR DBMS, and calculate ﬂow-direction error. I have included all of the

Python scripts used in this research in Appendix B.

The automated data pre-processing and post-processing was conducted on
multiple machines. Four desktop PCs of varying capabilities (from Pentium 4 with 1GB
of RAM to Dual Core with 3GB) running Windows XP handled the pre—processing. On
average, each of the 80 sample quadrangles took about 12 hours to pre-process. This
entailed re-projecting inputs and clipping the state-wide lO-meter DEM by quadrangle
boundaries. It also entailed burning stream locations into, ﬁlling, and calculating ﬂow-
direction for the 10-meter DEMs. The LiDAR pre-processing took up the signiﬁcant
majority of the processing time. The stream identiﬁcation and carving of LiDAR DEMs
took probably 95 — 99% of the total pre-processing time for each quadrangle. Three
additional machines handled the post-processing (calculation of ﬂow-direction error).
Since I scripted the post-processing step entirely in Python (outside of ArcGIS) it could

be performed on Linux machines. Two older PCs (a Pentium III and a Celeron, each

27

with 512 MB of RAM) installed with Linux (Ubuntu Jaunty 9.03) calculated ﬂow-
direction error at an average rate of four hours per quadrangle. A Lenovo Thinkpad
(Dual-core with 3GB of RAM) also aided in the post-processing, at a rate of 1 hour per

quadrangle.

4.2 Data Acquisition

I generated lO-meter DEMs for each sample quadrangle by clipping a mosaicked
state-wide DEM by the respective quadrangle boundaries. The state-wide DEM was
obtained from OGRIP, and is the same data accessible through the NED. The statewide
DEM was created by OGRIP in piecemeal by merging 20-30 1224,000 7.5-minute DLG
hypsography ESRI shapeﬁles into a larger contour GIS layer, using a ﬁnite difference
interpolation technique through ESRI ArcInfo’s TopoToRaster command to convert the
contours to a DEM, and ﬁnally mosaicking the new DEMs into a state-wide dataset

(Ohio Ofﬁce of Information Technology 2005).

I downloaded LiDAR 2.5-foot DEMs from OGRIP’s OSIP (Ohio Statewide
Imagery Program) site in zipped county-wide ﬁles (over 400GB). Each zipped ﬁle
contained multiple 5,000 x 5,000 foot DEM tiles covering the particular county’s extent.
The naming scheme for the tiles was based on a tile’s lower-left X and Y coordinate in
the Ohio State Plane coordinate system. For example, the DEM for tile 722180445 had its
lower left coordinate at 218000, 445000 (X,Y) of the Ohio State Plane North coordinate
system. I utilized this naming scheme to identify the LiDAR tiles that intersected each

sample quadrangle. These DEMs were derived from LiDAR point data (in LAS format —

28

a standard format for LiDAR storage) gathered mainly during leaf-off conditions in

spring 2006 by Woolpert Inc. for the State of Ohio (Woolpert, 2007).

I downloaded stream data, quadrangle boundaries, and hypsography from OGRIP
in the form of 1:24,000 DLGs for each sample quadrangle. I then wrote custom Python

scripts to convert these DLGs to ESRI shapeﬁles for use in the analysis.

4.3 10-meter DEM Pre—processing

Pre-processing took place one sample quadrangle at a time. To generate a ﬂow-
direction raster from a lO-meter DEM for a selected quadrangle I followed steps typically
taken in hydrological GIS analyses to ensure proper surface water drainage. Once the
state-wide lO-meter DEM had been clipped by the boundary of a sampled quadrangle, I
set the analysis mask for the subsequent lO-meter processing to the clipped DEM; this
ensured that all IO-meter outputs aligned spatially. Next, I re-projected the DLG stream
vector dataset from a UTM projection to the appropriate Ohio State Plane Projection
(either Ohio North - International Feet, or Ohio South — International Feet, depending on
the sampled quadrangle) and converted it to a binary raster (stream cell = 1, all others =
O). I then burned the resulting stream raster into the DEM by raising all non-stream cell
elevations in the DEM by 10 feet (Figure 4-1). Stream-burning can improve a DEM’s
ability to characterize the ﬂow-direction of surface water by emphasizing the locations of
streams (Hutchinson, 1989; Saunders, 1999). Next, the ArcGIS Fill command removed
potential spurious sinks in the DEM. Sinks can appear in DEM data as artifacts of the

production method and confound algorithms designed to route surface-water ﬂow over

29

the landscape (Hutchison, 1989; Tarboton et al., 1991). Finally, the ArcGIS Flow-

direction command created a ﬂow-direction raster for the 10-meter DEM. The Flow-

direction tool determines the direction of surface-water ﬂow for each cell by identifying

the cell in a 3x3 neighborhood with the steepest descent from the center cell. The result

is a numeric value for the center cell corresponding to the direction of this steepest

neighbor (Figure 4-2). This method is a single-ﬂow algorithm, it cannot partition ﬂow

among multiple directions. It is commonly referred to as D8 (8 possible directions)

(O’Callaghan and Mark, 1984; Greenlee, 1987; Jenson and Domingue, 1988).

 

Figure 4—1: Burning streams into a DEM.

 

 

 

 

streams
Elevation

' Higher

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Elevations
78 72 69 71 58 49
74 67 56 49 46 50
69 53 44 37 38 48
64 58 55 22 31 24
68 61 47 21 16 19
74 53 34 12 11 12

 

 

 

Figure 4-2: Deriving ﬂow-direction with D8 (graphic source: ESRI ArcGIS 9.3 Help ﬁles)

While D8 has been shown to perform adequately in some studies (Skidmore 1989,

Mouton 2007), it has been frequently maligned in literature as being too rigid and prone

to introducing artifacts into the data, including long runs of parallel water ﬂows (Wilson

30

et al. 2007, Raaﬂaub and Collins 2006, Venteris and Slater 2005, Schmidt and Persson
2003, Burrough et al. 2000, Jones 1998, Mitasova et al. 1995). However, for this analysis
it was the logical choice for several reasons. First, the calculation of upstream ﬂow
accumulations (i.e., the number of cells ﬂowing into a particular cell) in the LiDAR
DEMs was essential for the comparison to the lO-meter DEM, and required a single-
value ﬂow-direction raster as input. Second, and most importantly, the D8 algorithm is
the only available ﬂow-direction tool in the ArcGIS Toolbox. Since ESRI is the world-
leader in GIS market share (Daratech Inc., 2009), D8 is, by default, the most common
ﬂow-direction algorithm implemented in desktop GIS installations today, including the
one used for this research. This popularity of D8, though arguably undeserved, makes
the results of this research more applicable and relevant to the majority of current GIS

USCI'S.

4.4 LiDAR DEM Processing

The pre-processing of LiDAR DEMs was the most time-consurning, in terms of
person-hours and machine-hours, and complex piece of the research. The two primary
challenges were developing a means to burn streams into the LiDAR DBMS and carving
through artiﬁcial barriers that impede surface-water ﬂow. Only after these issues were
resolved could an adequate LiDAR ﬂow-direction raster be generated and used to

evaluate a lO-meter ﬂow-direction raster.

The ﬁrst step in pre-processing LiDAR DEMs for a sampled quadrangle was to

identify which LiDAR tiles intersected the quadrangle. Within Python, I was able to use

31

the bounding Ohio State Plane coordinates of each 7.5-minute quadrangle and the
coordinate-based nomenclature of the LiDAR tiles to determine which ones were
contained within the quadrangle’s extent. To avoid edge issues, I excluded tiles that
overlapped the quadrangle boundaries. This choice sometimes led to a selection of 40

LiDAR tiles, but the majority of the time it selected 48 tiles (Figure 4-3).

l D 7.5 Minute Quad
E Selected LiDAR Tiles
LiDAR Tiles

 

Figure 4-3: LiDAR tiles selected within a 7.5 minute quadrangle for pre-processing.

Once the LiDAR tiles had been identiﬁed, Python scripts began the process of
deriving ﬂow-direction rasters for each tile. Generally, the procedure to accomplish this
was the same as that for the 10-meter DEMs. The code burned streams into a DEM to
enforce appropriate surface-water drainage, ﬁlled spurious sinks, and calculated ﬂow-
direction with the D8 algorithm. However, LiDAR’s positional accuracy and means of
acquisition made stream burning difﬁcult. The LiDAR DEM’s map scales ranged from
1:100 to 1:1,000, far superior to the 124,000 map scales for the DLG hydrography.
Though both datasets were developed to adhere to National Map Accuracy Standards, the

difference in dataset map scales meant that the positions of the LiDAR elevation values

32

were more accurate than the positions of the DLG stream features. The LiDAR
positional accuracy varied little about the 7-foot postings of the source LAS points
(Woolpert Inc., 2007), whereas the DLG hydrography’s position could be in error by as
much as 40 feet or more (USGS, 1999), making stream burning by incorporating vector
features impractical. It would be possible, if not probable, that a burned stream feature

could run through a farm ﬁeld, and not in the drainage ditch represented in the DEM

(Figure 4-4).

 

streams
Elevation
- .. ~ Higher

   

'i i"

,_" ‘_; Lower

 

Figure 4-4: Left — aerial photograph. Right — LiDAR DEM and DLG stream feature in blue. DLG
stream vector fails to align with LiDAR's drainage ditch due to differences in positional accuracy.

Figure 4-4 also illustrates how LiDAR DEMs can contain artiﬁcial barriers within
streams. Notice how the drainage ditch is bisected in the middle of the ﬁgure. In this
instance this feature indicates the presence of a culvert running under a driveway. This
barrier is a byproduct of how LiDAR data is acquired. While the 10-meter DEMs were
interpolated from contours, the LiDAR DEMs were interpolated from numerous tightly
spaced points recorded by an airplane-mounted sensor. Each point represented a location
where the sensor’s beam struck the surface (or an object) and returned, allowing for a

determination of elevation at that point. There is no way for the points and elevations

33

under the driveway bridge to be evaluated, creating a virtual barrier within the stream.
Any effort to deﬁne surface-water ﬂow for Figure 4-4 would err and assume that ﬂow
halted at this barrier; or worse, the entire ditch might be identiﬁed as a sink and ﬁlled by
the ArcGIS ﬁlling algorithm. To resolve these two issues, I developed a method for
identifying stream cells using only the LiDAR elevation values. I also implemented a

method for carving through artiﬁcial barriers, like the bridge over the ditch.

4. 4.1 LiDAR Stream Identiﬁcation

The extraction of surface features from DEMs has been studied extensively. In
terms of using LiDAR to identify stream features, ﬁve previous efforts stand out.
MacMillan et al. (2003) described the capabilities and problems that can arise when
trying to identify stream networks from high-resolution digital elevation data in Alberta,
Canada. The authors used an approach similar to the one I implemented here; however,
their design was more complex, appeared to have failed to resolve the in-stream barriers
posed by bridges and overpasses, and focused on a single location. Leckie et al. (2005)
used spectral reﬂectance, not elevation, of water and substrate material to identify stream
features in Vancouver, British Columbia. Though they were able to achieve high
accuracy rates in certain environments, spectral reﬂectance LiDAR data is less accessible
than elevation data, making their method harder to implement elsewhere. Mason et a1.
(2006) developed an identiﬁcation method based on edge-detection, similar to my
approach. However, they focused on delta regions in Italy and Germany, whereas I

implemented stream identiﬁcation on areas of varying topography and land cover. Vogt

34

et al. (2003) attempted to identify ideal upslope accumulation thresholds for stream
identiﬁcation in different landscapes, but concluded that ﬂat areas still required manual
stream digitizing. The automated approach I developed performed best in ﬂat areas.
James et al. (2007) looked at contour crenulations as a means for locating streams and
areas of concentrated surface ﬂow. Despite their success, their method required that
contour lines be adequately crenulated (i.e., the study-area was not ﬂat), which would
pose problems for ﬂat agricultural areas dominated by ditch drainage, which
characterizes most of northwest Ohio. Contours for this area are sparse and relatively un-
crenulated. The stream identiﬁcation method that I developed is tailored for such an

environment.

A popular approach for deriving stream networks from DEMs is to identify a
threshold for upland contributing cells, and code any cell that exceeds that threshold as a
stream cell. Due to the presence of artiﬁcial barriers (discussed in 4.4.2) in LiDAR
DEMs, this approach would not have worked well for much this study’s area. Surface
water ﬂow would be halted as these barrier locations, limiting appropriate ﬂow-

accumulation tabulations.

I developed two methods for identifying stream features. One method relied on
analyzing cell-neighborhoods around a center cell, and the other focused on transects of
cells. In designing the two methods, I considered what distinguished stream locations in
terms of topography. As discussed previously, I could not use vector features for
streams; all I had were elevation values in the LiDAR DEMs. The basic assumption with
both methods was that stream cells should be opposed by higher elevations in one

direction (e. g., north to south), and similar elevations in the perpendicular direction (e. g.,

35

east to west). The higher elevations should represent the stream banks, while the similar

elevations should represent the stream itself. I developed each method as a Python script.

The ﬁrst method, called the Neighborhood Method, performed a neighborhood
analysis on each cell in the DEM. The script iterated through each cell and analyzed a
square 121-cell neighborhood (11 by 11) around the selected cell (call it center). It
identiﬁed the maximum elevation value (call the cell with that value max) within that
neighborhood and calculated the difference between that value and the elevation value of
the selected cell (call that difference max_minus_center). If max_minus_center was
greater than a user-speciﬁed minimum difference in elevation from a stream bank to the
stream (call this value min__elev_diff, I typically employed values of l and 3 feet) the
script then checked the value of the opposing cell (call it opposite_max) to see if the
difference between its value and center '3 also exceeded min_elev_diﬁT If both of these
conditions were met, then the script deemed it possible that the center cell was a stream
cell opposed by two stream banks. It then looked at opposing cells (call them
perpendicular_1 and perpendicular__2) in the direction perpendicular to the direction of
max and opposite_max to see if their elevations were below min_elev_diﬁf If they were,
then the script deemed center and its entire neighborhood as stream cells. See Figure 4-5
for a conceptual visual. Essentially, if a cell had two relatively tall points opposing it,
these may have been stream banks. If the same cell had relatively ﬂat slopes in the
direction perpendicular to the tall points, then these may have been stream cells. This
approach captured the basic topographic relationship I considered in deﬁning stream

locations. More formally:

36

IF (max — center) > min_elev_diﬁr

AND (opposite_max — center) > min_elev_diﬁ"
AND (perpendicular_1 — center) < min_elev_diﬁ’
AND (perpendicular_2 — center) < min_elev_diﬂ

THEN neighborhood = stream cell

Elevation

' ' Higher

' V . Lower

 

Figure 4-5: Cell positions evaluated by the Neighborhood Method. C = center, M = max, 0 =
opposite max, R1 = perpendicular 1, R2 = perpendicular 2.

The second method for stream identiﬁcation was called the Transect Method.
Like the Neighborhood Method, it also performed a neighborhood analysis on each cell;
but it was a much simpler technique, based on transects. The script began by moving
through the DEM in a north-south direction looking for decreases in elevation. At each
cell (call it starter_cell) the script checked starter_cell ’s southern neighbor (call it
south_cell) to see if its elevation value (call it south_elev) was less than that of

starter_cell’s (call it start_elev). If it was, the script then evaluated a lO-cell transect

37

south of starter_cell. The script then identiﬁed the minimum elevation value (call it
min_elev) and its cell location (call it min_cell) in the transect. If the difference between
start_elev and min_elev was greater than a user-speciﬁed elevation difference (same as
min_elev_diff in the Neighborhood Method, typically 1-3 feet) the script deemed the
transect as potential stream cells. The script then continued to analyze the transect by
looking for the maximum elevation value south of min_cell (call the value
max_elev_south_of_min and the cell max_cell_south_of_min). If the difference between
max_elev_south__of_min and min_elev was also greater than min_elev_diﬁ§ then the cells
from starter_cell to max_cell_south_of_min were classiﬁed as stream cells. Figure 4-6
provides a visual conceptualization of the Transect Method. Similar to the Neighborhood
Method, if a cell was opposed by two higher cells, then it was possible that these cells

were part of the stream network. More formally:

IF start__elev > south_e1ev THEN
IF (start_elev — min_elev) > min_elev_diff
AND (max_elev_south_of_min — min_elev) > min_elev_diff
THEN

start_cell TO max_cell_south_of_min = stream cell

38

Elevation

Higher

   

. Lower
Figure 4-6: Transect analysis cells (in black); S = starter_cell, O = south_cell, M = min_cell, X =
max_cel|_south_of_min.

The Transect Method repeated this approach in three other directions: east to

west (west_cell instead of south_cell), northwest to southeast, and southwest to northeast.

I evaluated the Neighborhood and Transect methods on ten LiDAR tiles of

varying relief, drainage, and land-cover (Table 4-1 and Figure 4-7).

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

I I I

Is“. # vsgfag-fargipgte I £315: 11.) I mm... L I... Coy... I
I l IWarsaw I 310 Natural lForest I
I 2 IClevelandSmiﬂii 7 {#777132 militia," urban IUrban I
L 3 IColumbus SE, NE I #7 7 56 INatural, urbaan Hrban ‘1’ , _I
L: Arlington I 30 Natural I Agriculture I
5 McClure I 15 . Ditch I Agriculture J

L 6 Knoxville I 485 I Natural IForest, residential I
L 7 I Huntsville I 57 IDitch IAgriculture. forest I
I 8 IGenoa I 24 Ditch, natural 1Agriculture
F 7 I I
I 9 ILatty, Oakwood I 30 Natural, ditch 7 IAgriculture, forest I
I 10 IStonecreek I 320 Natural TForest. agriculture 4

 

Table 4-1: Quadrangles where the LiDAR stream identiﬁcation methods were tested.

39

 

 

Evaluated
[:1 Quads

 

 

 

 

 

 

 

Figure 4-7: Quadrangles where the LiDAR stream identification methods were tested. Numbers
refer to Site # in Table 4-1.

 

 

 

 

 

 

 

Figure 4-8: Site 1. Left — aerial; Left Center -— LiDAR DEM; Right Center - streams identiﬁed by
the Neighborhood Method; Right — by the Transect Method.

 

 

Figure 4~9: Site 2, same descriptions as Figure 4-8.

40

 

 

 

 

 

 

 

 

J

/"

R
‘3

 

 

 

Figure 4-11: Site 4, same descriptions as Figure 4—8.

 

 

 

 

 

 

 

 

 

 

 

 

 

J“

. ~\\
\ ‘I,
t-

v

. 7‘
I \ ‘
,' \ 1 \

 

Figure 4-13: Site 6, same descriptions as Figure 4-8.

41

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 4-15: Site 8, same descriptions as Figure 4-8.

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 4-17: Site 10, same descriptions as Figure 4-8.

42

The evaluation of the ten sites indicated that the Neighborhood and Transect
stream identiﬁcation methods’ performance varied over different landscapes. The
Neighborhood Method performed adequately in the ﬂatter, ditch-dominated study sites
like 4, 5, 8, and 9. The Transect Method’s best performance was in sites 5, 8, and 10.
Neither method performed very well at sites 2, 3, 6, or 7. The main problem for the
methods at sites 2 and 3 was the topographic complexity of an urban landscape. Site 2
was in the heart of Cleveland, and its streets and buildings posed too complex a surface
for the Neighborhood and Transect methods to resolve. Site 3 captured the campus of
Ohio State University, which also was too complex a surface for the methods to
adequately identify stream locations. This difficulty in urban environments is the reason
that study’s sample of 80 quadrangles were ﬁltered to avoid quadrangles that had even
marginal percentages of urban land cover (> 10%). Neither method could process the
width of the main rivers, such as the Olentangy in site 3 and the Ohio in site 6. Both
methods expected the stream network not to exceed 10 cells (25 feet), which grossly
underestimated the widths of the Olentangy and Ohio rivers, 261 and 1,100 feet
respectively. The problem for the methods at site 7 was that they picked up too many
non-stream cells. These errors were due to the forested landscape, which has been shown
to pose problems in the analysis of LiDAR and other remotely sensed elevation products
(Barber and Shortridge, 2005; Shortridge, 2006). Despite the fact that the LiDAR data
was pre-processed by OGRIP to generate a bare-earth condition, artifact tree canopies
remained in many LiDAR tiles and distorted the stream-to-bank relationship or hid a

stream altogether. The adjacency of an errant elevation value of a tree canopy next to a

43

bare earth value created the illusion of a steep feature and, in the eyes of the two

methods, a potential stream bank.

In comparison to each other, the Neighborhood Method performed better in areas
where the drainage network was more sinuous (Figures 4-11 and 4-16), while the
Transect Method better handled straight agricultural drainage ditches (Figures 4-12 and
4-15). The Transect Method’s poorer performance on sinuous streams was due to its
focus on the basic cardinal (N -S-E-W) and intermediate (N W-SW-SE-NE) directions.
The directions of transects along a sinuous stream would include much more than the
eight considered by the Transect Method; therefore it missed many transects along such
streams and failed to identify them as part of the stream network. This limitation could
potentially be resolved by incorporating all possible transect directions into the method,
but this might render it computationally inefﬁcient. However, the Transect Method’s
focus on those eight basic directions made it well-suited for identifying straight
agricultural ditches. While the Neighborhood Method also did well in identifying these
locations, the Transect Method outputs were cleaner and generated more quickly. The
Neighborhood Method output tended to yield diamond-shaped stream neighborhoods,
which in some instances gave the stream network messy and irregular boundaries. The
boundaries of ditches identiﬁed by the Transect Method tended to be smooth and
consistent, since its East-West analysis of a North-South trending ditch would essentially
yield horizontal transects stacked neatly on top of each other (Figure 4-18). The
processing of the Transect Method was also faster (1 min. 29 sec.) than the Neighborhood

Method for site 5 (5 min. 42 sec.).

44

 

streams

Elevation

I I Higher

I 1.1....

 

Figure 4-18: Left - rough edges of Neighborhood Method stream cells. Right — smoother edges of
Transect Method.

4. 4.2 Carving Through Artificial Barriers

A useful method to resolve artiﬁcial barriers in DEMs is to carve through them
(Soille et al. 2003, Soille 2004a, Soille, 2004b). In this process, the elevations of the
barrier cells are essentially reclassiﬁed in order to route the ﬂow through them.
However, in order to ensure that this step is only applied within stream cells, the stream
network must ﬁrst be identiﬁed. Soille’s method looked at all sinks in a DEM; I was
only interested in removing barriers within streams. I developed a process, which I refer
to as the Carving Method, that searched within the stream cells identiﬁed by one of the
previous methods (Neighborhood or Transect), and looked for sinks (cells in which each
of the 8 adjacent neighbors have higher elevations — as would be the case at the base of
the artiﬁcial barrier in Figure 4-4). The method then searched an expanding square
neighborhood, within a user-speciﬁed maximum neighborhood size (100 cells as a
default) for an elevation value smaller than that of the sink cell in question. If it found
such a cell (presumably on the other side of the barrier), a path was charted from the sink
to the lower cell along which the elevation of each cell was incrementally lowered to

create a slope for water to ﬂow along, essentially removing the barrier (Figure 4-19).

45

 

 

di

SI;

CII

ba

Ila

UIII

311C

The method could be iterated over a user-speciﬁed number, as each carving could yield

new sinks.

 

Figure 4-19: Left - the barrier in the DEM. Center —- sinks in stream identiﬁed (in black), lower
elevation found (in yellow), path charted (in blue). Right — carved DEM with elevation values
incrementally lowered along path.

The implementation of the Carving Method yielded desired results in some
locations, but introduced errors in others. The method was implemented in all ten of the
stream identiﬁcation study locations, but was most applicable in the agricultural drainage
ditch-dominated sites (5, 7, and 8). The method required a prior identiﬁcation of the
stream network, rendering it essentially useless for the urban sites. The sites with natural
drainage did not have many, if any, in-stream barriers. The method worked as designed
for the site 5 location shown in Figure 4-19. Figure 4-20 illustrates how the method
created clear and uninhibited paths within drainage ditches by removing even small
barriers. The left side of Figure 4-21 shows how the Carving Method successfully routed
water ﬂow under a road (N -S direction); but it also shows how the method introduced
undesired carvings. It is not likely that a NW-SE ditch ﬂows under the road intersection,
and aerial photography did not indicate the presence of a culvert running N-S along the

right-edge of Figure 4-21 (in green).

46

, i
1
e1
7!

r
an!»

 

I

Figure 4-20: Carving freed up ﬂow within stream features. Left - drainage ditch in pre—carved
DEM. Right - carved drainage ditch.

 

Figure 4-21: Carving resolved some barriers (created north-south connection) but also introduced
potential errors (NW-SE connection, N-S connection at right).

The Carving Method was sensitive to even slight differences in elevation. When
it encountered an artiﬁcial barrier in the stream, its primary goal was to ﬁnd a neighbor
with an elevation lower than that sink. As its search neighborhood expanded, it may have
encountered cells with lower elevations that were not connected to the original sink, as
illustrated by the right-side of Figure 4-21. A future, more “intelligent” Carving Method
may be able to resolve this issue by considering the general direction of the stream in its
neighborhood analysis. In the case of Figure 4-21, instead of searching in a
neighborhood that incrementally expands in all directions, the expansion could occur in
only the N-S direction, removing the possibility of the NW-SE carving that resulted in

this analysis.

47

4. 4.3 Implementing Stream Identification and Carving in LiDAR Processing

For each of the 80 sample quadrangles, I had to determine which stream
identiﬁcation method (Neighborhood or Transect) should be employed and what values
should be chosen for identiﬁcation parameters (minimum elevation difference and
neighborhood size). As the results of the previous analysis indicate, the choices
regarding the stream identiﬁcation method and its parameters should be based upon the
land-cover, relief, and the shape of the drainage features of a particular quadrangle. Even
though these attributes may substantially vary across the sample of LiDAR tiles within a
quadrangle (40 or 48), since the quadrangle pre-processing scripts operated on one
quadrangle at a time the choices made for the stream identiﬁcation parameters applied to
all of a quadrangle’s tiles. For example, the choice of minimum elevation difference in
the stream identiﬁcation method applied to all tiles, even if the method would have

performed better with a different value in some of the tiles.

For the majority of quadrangles, I chose to use the Neighborhood Method with a
minimum elevation difference of 3 feet, and a neighborhood size of 10 cells (25 feet by
25 feet). These choices held up well during post-processing inspection, particularly in
quadrangles with high relief and little to no agriculture (Figure 4-22). For quadrangles
dominated by agriculture, I still utilized the Neighborhood Method with a neighborhood
size of 10 cells, but chose a more aggressive minimum elevation difference of 1 foot. In
some instances these choices worked well, but in the ﬂattest areas it still failed to
adequately identify the stream network. I re-processed these quadrangles with the

Transect Method, but results were still unsatisfactory. I then adjusted the pre-processing

48

 

 

 

 

SCI

 

q!
u.

 

Flam
Well I

scripts so that both stream identiﬁcation methods were integrated and employed on these

quadrangles, which yielded more appropriate results (Figure 4-23).

— DLG Streai

Identiﬁed Stream

LiDAR DEM
Elevation

Higher

Lower

 

Figure 4-22: The Neighborhood Method with a minimum elevation difference of 3 feet performed
well at identifying stream cells in the LiDAR DEM for areas of high relief and little to no agriculture.

— DLG Streams
Identiﬁed Stream

LiDAR DEM
Elevation

Higher

’ . Lower

 

Figure 4—23: A combination of the Neighborhood and Transect methods performed best in ﬂatter
areas dominated by agriculture.

49

The LiDAR pre-processing was not a simple batch script that ran on its own for
weeks. Many quadrangles were evaluated through the trial-and-error method described
in the preceding paragraph. With a few quadrangles, I ran the stream identiﬁcation
algorithm three times until an adequate stream network was identiﬁed. For example, the
ﬁrst run might have been with the Neighborhood Method and a minimum elevation
difference of 3 feet, a second run with a Neighborhood-Transect combination and an
elevation difference of 1 foot, and a ﬁnal run with a Neighborhood-Transect combination
and an elevation difference of 2 feet. In these and all other similar instances, I inspected
the results visually to determine which run performed best and should be utilized in the

analysis.

Once the stream locations had been identiﬁed, the Carving Method removed
artiﬁcial barriers within those streams. For all 80 of the sample quadrangles, I used a
maximum neighborhood size of 100 cells and 3 iterations of carving, since the method’s

performance did not signiﬁcantly vary across the lO-site evaluation.

4. 4. 4 Stream-burning and Sink-ﬁlling LiDAR DEMS

The last steps of the LiDAR pre-processing were to burn the identiﬁed stream
locations into the carved DEM in the same manner used with the lO-meter DEMs: ﬁll all
sinks and calculate ﬂow-direction through D8. It should be noted that the LiDAR pre-
processing steps described above introduced artifacts into the ﬁnal LiDAR DEM.
Speciﬁcally, the stream identiﬁcation and carving methods typically did not yield a

perfectly contiguous stream network. Some of the identiﬁed cells were fragmented, as

50

seen in Figures 4-22 and 4-23. After the stream burning process these cells essentially
became sinks that were subsequently ﬁlled in by the ﬁlling algorithm. The end results
were ﬂat table-like features in the ﬁnal LiDAR DEM. In terms of the accuracy of
elevation, this outcome was a clear error; however, it did not necessarily alter the general
ﬂow-direction of surface water over these cells, which was what I tried to measure in this
research. The elevation values of the neighboring cells around those ﬁlled sinks usually
remained unchanged after the ﬁll process, meaning that a concentration of surface-water
ﬂow coming from the east still continued west along the ﬂat cells towards the lower

elevations west of those cells (Figure 4-24).

Elevation Upstream Contributing Cells
- Higher _ Higher
_ Lower Lower

   

Figure 4-24: Left - the ﬁll process created ﬁat features in the LiDAR DEM. Right - Flow
accumulation derived from the ﬁlled LiDAR DEM's ﬂow-direction indicate that the ﬂat features did
not dramatically alter the ﬂow-direction.

Once the LiDAR pre-processing steps described above had been completed for all
of the LiDAR tiles for a sample quadrangle, the quadrangle was considered pre-processed
and ready for the assessment of ﬂow-direction error. See Appendix A for a model

diagram of the quadrangle and LiDAR pre-processing.

51

4.5 Flow-direction Error

I calculated ﬂow-direction error for each quadrangle by performing a cell-by-cell
comparison of the lO-meter ﬂow-direction raster to the LiDAR ﬂow-direction rasters,
assuming the LiDAR rasters as the reference condition. As with the pre-processing steps,
I programmed the ﬂow-direction error calculation as a Python script. The code iterated
through each cell in the lO-meter ﬂow-direction raster (around 1 million cells). The

general approach was to compare the ﬂow-direction value for each lO-meter cell to an

aggregated ﬂow-direction value for the neighborhood of LiDAR cells1 that intersected
the particular lO-meter cell (Figures 4-25 and 4-26). But how should the LiDAR ﬂow be

aggregated?

One method of aggregation would be to take an averaged direction from the
LiDAR neighborhood cells, or calculate a single vector representing direction. But this
would be misleading in instances where half of the cells drained in one direction and the
other half drained in the opposite direction. If half of the cells drained north and the other
half drained south, would an east or west vector of ﬂow-direction be an accurate

representation of surface-ﬂow amongst the LiDAR cells?

Another method of aggregation would be to determine which ﬂow-direction was
most frequent amongst the intersecting LiDAR cells in Figure 4-26. However, this

approach would not necessarily characterize where the majority of surface water ﬂow

 

1 The size of the intersecting LiDAR neighborhood was typically a 10x10 grid, though sometimes it was
9x9. The LiDAR cells did not perfectly align with lO-meter cells. This was due to the fact that the 10-
meter width and height of the NED cell was not evenly divisible by the LiDAR resolution of 2.5 feet.
Figures 4-26 through 4-34 illustrate the overlaying of LiDAR cells on a lO-meter cell and show them to be
in perfect alignment for conceptual purposes only.

52

traveled amongst those LiDAR cells. It would be possible, for example, for 80% of the
intersecting LiDAR cells to drain towards the northern edge of the 10-meter cell, only to
be routed due east along that northern edge. In this instance, the predominant ﬂow-
direction value for the LiDAR cells would be north, even though the majority of the ﬂow

left the lO-meter cell area along the eastern edge.

,----10m----,
I I

Flow-direction (08)
- East

- Southeast
- South
Southwest
a West

a Northwest
E North

- Northeast

Figure 4-25: 3x3 neighborhood of 10-meter ﬂow-direction cells. Cell to be analyzed for error at
center.

 

53

Flow-direction (08)
- East

a Southeast
- South

E Southwest
- West

E Northwest
E North

- Northeast

Figure 4-26: LiDAR flow-direction cells intersecting the selected 10-meter cell.

 

Another option would be to examine the edges of the selected lO-meter cell, and
determine what percentage of LiDAR cells drained through each edge. This approach
would avoid the misrepresentation of a frequency value, as described above; however, it
would weight the cardinal directions too strongly. The North, East, South, and West
edges all would have ten cells that could potentially route ﬂow through an edge; whereas
the diagonal directions would each only have a single comer cell through which ﬂow
could be routed. Furthermore, similar to the problem with looking solely at the mode of
ﬂow-direction values, if all of the LiDAR neighborhood cells ﬂowed through the
northern edge of the lO-meter cell is that a fair comparison to the lO-meter ﬂow-direction
value? It would still be possible for all of the LiDAR cells to drain through the northern
edge, and then quickly divert due east as soon as they crossed the edge to ultimately drain

out somewhere in the Northeast lO-meter cell.

54

The lO-meter ﬂow-direction value was determined by calculating Slopes between
the center point of a cell and center points of that cell’s eight immediate neighbors, and
noting the cell with maximum Slope (Equation 4-1). Distance was equal to the cell-
resolution (IO-meters) for cardinal neighbors, and cell-resolution multiplied by 1.414 for
the diagonal neighbors. Therefore, the 10-meter calculation of ﬂow-direction was a
prediction from the center of the center cell to the centers of the neighboring cells. A

comparative analysis with the LiDAR cells should attempt to cover the same geography.

For {e1, 62, , 93}
Flow-direction = i, for II/IAXKIecemer cell - eil) ‘3‘ distance x 100]

Where ei = elevation of the i-th neighbor

Equation 4—1: Formula for calculating ﬂow-directions

The method I developed utilized an expanded neighborhood of LiDAR cells to
determine the degree to which the 10-meter and LiDAR predictions for surface water
ﬂow agreed. This neighborhood extended to the center of each of the neighboring 10-
meter cells, ensuring that the 10-meter ﬂow direction value and aggregated LiDAR ﬂow-
direction value corresponded to the same geographic Space. This expansion of the
LiDAR neighborhood also gave each neighboring lO-meter cell an equal opportunity for
ﬂow to drain through its edges. AS illustrated in Figure 4-27, each lO-meter neighboring
cell had 10 LiDAR cells through which the LiDAR ﬂow could have exited. For each of
these neighboring cells, the method recorded the percentage of cells from the original 10
by 10 LiDAR neighborhood that drained through its edges (Figures 4-27 — 4-34). Since

my goal was to evaluate the ﬂow-direction value of the center lO-meter cell, I only

55

calculated the drainage percentage of LiDAR cells that intersected that center cell (Figure
4-26). The LiDAR cells in the expanded neighborhood (outside of the original LiDAR
cells) were needed to trace the ﬂow of the intersecting LiDAR cells out of the
neighborhood, but these additional cells were not counted in the determination of

aggregated LiDAR ﬂow-direction value.

Flow-direction (D8)
- East

- Southeast
- South
Southwest
a West

a Northwest
[:1 North

- Northeast

Figure 4-27: 0% of center LiDAR cells drain through the northwest edges of the neighborhood.

 

 

Figure 4-28: 0% of center LiDAR cells drain through the north edges of the neighborhood.

 

Figure 4-29: 89% of center LiDAR cells drain through the northeast edges of the neighborhood.

 

Figure 4-30: 0% of center LiDAR cells drain through the east edges of the neighborhood.

 

Figure 4-31: 0% of center LiDAR cells drain through the southeast edges of the neighborhood.

 

Figure 4-32: 0% of center LiDAR cells drain through the south edges of the neighborhood.

 

Figure 4-33: 0% of center LiDAR cells drain through the southwest edges of the neighborhood.

 

Figure 4—34: 11% percent of center LiDAR cells drain through the west edges of the neighborhood.

Next, the script used the ﬂow percentages calculated for each lO-meter neighbor
to weight an error score (Equation 4-2). Recall that in Figure 4-25 the 10—meter cell in

question reported a ﬂow-direction value of southwest. If all of the LiDAR cells that

58

intersected this lO-meter cell drained out of the southwest edges of the expanded LiDAR
neighborhood, then the script would consider the lO-meter cell and LiDAR cells to be in
perfect agreement. This would result in an error value of 0. If the result was the opposite
condition, with the LiDAR cells draining out of the northeast edges of the neighborhood,
this would be perfect disagreement and result in an error score of 4. Other scores would
range between 1 and 3 increasing from agreement to error (Figure 4-35). In the case of
the example that began with Figure 4-25, 89% of the center LiDAR cells drained through
the northeast edges of the expanded neighborhood (a severe error: 4), while the remaining
11% drained through the western edges (a small error: 1). These error values were
weighted by their respective ﬂow percentages ((4 x 0.89) + (1 x 0.11)) to yield an
aggregated ﬂow error of 3.67, an indication of very high error in the 10-meter ﬂow-
direction estimation. Note that this error value does not indicate a speciﬁc direction of
error. The number itself does not indicate whether the LiDAR ﬂow would be closer to

the North neighbor or the East neighbor in Figure 4-35, only that the error was high.

8
FDE =21». 1:-
i=1

Where F DE = ﬂow-direction error for the 10-meter cell
pi = percentage of center LiDAR cells exiting out of the i-th neighbor lO-meter cell

fi = error score (1-4) of the i-th neighbor as compared to the ﬂow-direction of the 10-meter cell

Equation 4-2: Flow-direction error weighting.

59

 

 

Flow-direction Error

ﬂ Severe (4)
make)
I: Moderate (2)
El Low(1)
- Nonew)

 

Figure 4-35: Potential 110w error scores for assessment of 10-meter cell that ﬂows southwest.

The script carried out the process described above for every lO-meter cell
contained within a quadrangle’s sampled LiDAR tiles. The process yielded maps
illustrating the spatial distribution of ﬂow-direction error for a particular quadrangle
(Figure 4-36 — 4-39). Initial tests of this method revealed a spatial trend of high error in
and along stream features (Figure 4-36). This trend was the result of the ﬁlling operation
removing sinks in sections of stream within the LiDAR cells and modifying elevations
across large sections to single elevation values, and subsequently single ﬂow-direction
values. In the ﬂatter, agricultural areas entire ﬁelds were ﬁlled and reported the same
ﬂow-direction value for all their corresponding cells. These were clear distortions of
reality and diminished the LiDAR DEM’s utility as a reference dataset at these locations,
despite its superior vertical and horizontal positional accuracy as compared to the 10-

meter DEMS. To avoid the potential error that these ﬂat cells could have introduced into

60

the analysis, I re-structured the Python script so that any lO-meter cell that touched a ﬂat
LiDAR cell directly (as an intersection of cells) or contained one in its expanded LiDAR
neighborhood was discarded from the analysis (Figure 4—40). Esrskine et al. (2007)
similarly removed ﬂat cells from an analysis of DEM derivatives. This decision resulted
in, on average, the discarding of 33% of lO-meter cells per 7.5 minute quadrangle. In
some of the quadrangles with the lowest overall relief and high concentrations of
agriculture, discard rates were as high as 83%. However, despite the high discard rates in
these quadrangles, the statistical analysis was still able to Show signiﬁcant results for

ﬂow-direction error prediction in these quadrangles (see Chapter 5).

61

Flow-direction Error
- Severe (4)

High (3)

CI Moderate (2)
m LOWU)

- None (0)

 

Figure 4-36: Flow-direction error map for a single 7.5-minute quadrangle. Blue grid lines are the
result of avoiding edge issues in each LiDAR tile.

62

Flow-direction (D8)
- East

- Southeast
- South

E Southwest
- West
Northwest
E North

- Northeast

Figure 4-37: For the selected 10-meter cell, estimated ﬂow-direction is to the south.

 

Flow-direction (D8)
- East

- Southeast
- South

ﬂ Southwest
— West
Northwest
E North

- Northeast

Figure 4-38: For the selected 10-meter cell, estimated LiDAR ﬂow-direction trends mainly east and
northeast.

 

F low-direction Error
- Severe (4)
3 High (3)
D Moderate (2)
Low (1)

- None (0)

ﬁ’lﬁl

Figure 4-39: The ﬂow-direction error is considered high, with a value of 2.98.

 

63

Flow-direction Error

- Severe (4)
B High (3)
:1 Moderate (2)
E3 Low (1)
- None (0)

 

Figure 4-40: Flow-direction error map with ﬂat cells discarded.

Chapter 5 Results and Analysis

The previous two chapters detailed this study’s methodology for identifying the
mean ﬂow-direction error and its standard deviation. Using these data recorded for each
of the 80 sample quadrangles, I conducted independent t-tests to assess whether the mean
ﬂow-direction error varied signiﬁcantly across the agricultural and relief
characterizations listed in Table 3-1. I also explored correlations between the mean ﬂow-
direction error and various quadrangle attributes, and evaluated linear regression models

adjusted for residual spatial autocorrelation.

Overall ﬂow-direction error among the 80 quadrangles was M = 1.04 (SD = 0.85).
Figure 5-1 shows the spatial distribution of ﬂow-direction error amongst the sample.
There is a clear trend of increasing ﬂow-direction error in the SE-NW direction amongst
the quadrangles, mimicking the gradients for agriculture concentration and relief
illustrated in Figures 3-1 and 3-2. These maps and the quantitative analysis I present here
indicate that relief and agricultural concentration can be used to predict ﬂow-direction

error.

65

Mean Flow-direction
Error (quartiles)
C: 0.56 - 0.84

D 0.84 - 1.06

- 1.06 - 1.16

- 1.16 - 1.81

 

 

 

 

 

Figure 5-1: Spatial distribution of ﬂow-direction error amongst the sampled quadrangles.

5.1 Independent T-tests

I expected that ﬂow-direction error would be higher in areas of high agricultural
concentration and low relief. More formally:

Hypotheses:

Ho: FDE High % Ag. < FDE Low % Ag.

Ha: FDE High % Ag. 2 FDE Low % Ag.

Ho: FDE High Relief < FDE Low Relief
Ha: FDE High Relief 2 FDE Low Relief

Where:
FDE = ﬂow-direction error

Table 5-1 shows the overall mean ﬂow-direction error for the sampled quadrangles and

for the bins generated during the sampling process (see Table 3-1). Table 5-2 Shows the

66

results of the independent t-tests calculated between these bins. AS expected, mean ﬂow-

direction error differed signiﬁcantly between quadrangles characterized as having a high

percentage of agriculture (M = 1.16, SD = 0.91) and those with a relatively low

percentage (M = 0.93, SD = 0.79), t(78) = 4.33, p = < 0.001. Similarly, quadrangles

characterized as having high total relief (M = 0.89, SD = 0.78) versus those characterized

as low in relief differed Signiﬁcantly (M = 1.19, SD = 0.93), t(78) = 6.06, p = < 0.001.

On the basis of these results, both null hypotheses listed above were rejected.

Mean Flow-direction Error

 

Relief

 

 

 

 

 

 

 

 

 

High (> 32 7 feet ) Low (< 32 7 feet ) mm"

. wrm, 50:035. M=1.28, 30:11.97, M=1. 16, SD=0.91,

”’g” ( > 673%) N=20 N=20 N=4O

, .

/" Agr'cmtm M=0.75, SD=0.70, M=1.1 1, SD=O.88, M=0.93, SD==0.79,

Low ( < 67.8%) N=20 N=20 N=40
Oman 1120.39, SD=O.78, M=l.l9, 50:0.93, M=l.04, SD=0.85,

N=40 N=40 N=80

 

Table 5—1: Flow-direction error results for the various sampling criteria combinations.

1 also tested my expectations for the speciﬁc sampling bins, which I refer to as

follows:

High relief, high % agriculture: HRHA
High relief, low % agriculture: HRLA
Low relief, high % agriculture: LRHA
Low relief, low % agriculture: LRLA

I expected that quadrangles with little relief and a high percentage of agriculture would

contain the highest ﬂow-direction error values, and vice versa. More formally:

H0: FDE LRHA < FDE HRLA
Ha: FDE LRHA Z FDE HRLA

67

 

I also expected that when either relief or agricultural concentration was held constant the
variation in the other would prove signiﬁcant when compared to ﬂow-direction error.

Ho: FDE LRHA < FDE LRLA
Ha: FDE LRHA Z FDE LRLA

Ho: FDE HRHA < FDE HRLA
Ha: FDE HRHA Z FDE HRLA

Ho: FDE LRHA < FDE HRHA
Ha: FDE LRHA Z FDE HRHA

Ho: FDE LRLA < FDE HRLA

Ha: FDE LRLA 2 F DE HRLA
LRHA quadrangles reported the highest mean ﬂow-direction error, M = 1.28, whereas the
HRLA quadrangles reported the lowest value, M = 0.75. As shown in Table 5-2 these
two means differed signiﬁcantly, t(38) = 7.16, p = < 0.001 (a higher t score than any
other pairing of mean ﬂow-direction errors). These results show that when either relief
or agricultural concentration was held constant, the change in the other corresponded
signiﬁcantly with changes in ﬂow-direction error. There were not signiﬁcant differences
between HRHA and LRLA.

It appears that relief had a stronger impact on ﬂow-direction error than the
concentration of agriculture. The difference in overall mean-error between low relief
quadrangles and high relief quadrangles was greater (1.19 — 0.89 = 0.3) than the
difference between quadrangles with high concentrations of agriculture and those with

low concentrations (1.16 - 0.93 = 0.23).

68

 

 

 

 

 

T-test results HRHA HRLA LRHA LRLA
HRHA
m... 1:82:15?
LRHA “is: 3339’ «:8: 317111116,
LRLA 1(3 3): =0 11.38, t(;8<) (5.311614, 10:): 12635,

 

 

 

 

 

 

 

Table 5-2: Independent t-test results between sampling criteria.

5.2 Correlations

F low-direction error was highly correlated with agricultural concentration, relief,
contour topology error, and the extent to which ﬁlling sinks in the LiDAR DEMs
produced ﬂat areas (Table 5-3). This last correlation was the strongest (r = 0.85). An
initial reaction to such a strong correlation may be that these ﬂat LiDAR cells distorted
the true ground condition so dramatically that they yielded high values for ﬂow-direction
error. However, I discarded any lO-meter cell whose analysis neighborhood (Figure 4-
27) contained even a single ﬂat LiDAR cell. Therefore the error means used in

calculating the correlation did not reﬂect these cells. The more likely explanation for the
I strong positive relationship is that the areas that contained a high number of ﬂat cells in a
depression-less DEM embodied the surface characteristics that confound ﬂow-direction
estimates. Flat cells in a ﬁlled DEM were more likely to occur in areas of high
agricultural concentration and low relief, r = 0.58 and r = -0.73 respectively. These areas

were dominated by agricultural ditches because the low relief provided little natural

69

drainage to route water away from ﬁelds and allow for crop production. Consequently,
due to the presence of artiﬁcial barriers along them, these ditches had a greater chance of
being identiﬁed as sinks than as natural drainage features and ﬁlled ﬂat by the ﬁlling
algorithm. A high percentage of ﬂat cells for a quadrangle indicated low relief and a high
concentration of agricultural ditches in that area. These areas essentially captured the
“worst” of both worlds when trying to predict ﬂow-directions; the low relief raised the
uncertainty in ﬂow-direction estimates, while the concentration of agricultural ditches

increased the likelihood that drainage pathways were ﬁlled ﬂat.

It is worth noting that all of the variables in Table 5-3 were signiﬁcantly
correlated. This fact raised issues of multicollinearity in the attempt to develop a linear

regression model for ﬂow-direction error.

 

 

 

 

 

 

Fl -dir t' o/ I t ti % of ﬂat 10-
Correlations ow cc 1011 o row-crop Relief n ersec ons meter cells
error agriculture per contour discarded
Flow-direction
error
0 -
/o row crop r = 0.77
agriculture
Relief r = -0.76 r = -0.68
Intersections per r 2 0.70 ,. = 0.51 ,- = -0,6()
contour
% of 10-meter
cells discarded r = 0.85 r = 0.53 r = _0_73 r = ()_70
due to ﬂat
LiDAR

 

 

 

 

 

 

 

Table 5-3: Correlations between ﬂow-direction error and other variables. N=80, all correlations
were signiﬁcantly different from zero (p < 0.001).

70

 

5.3 Spatial Regression

I attempted to identify the variables that best explained variability in ﬂow-
direction error through linear regression modeling using the R statistical package (ver.
2.7.1). Since ﬂow-direction error was a spatial phenomenon, I adjusted the models to

account for spatial autocorrelation in the model residuals.

Restating my general hypothesis, I expected that areas that were relatively ﬂat,
had high concentrations of agriculture, and whose DEMS were produced from contours
that contained large numbers of topological errors (intersections) would have produced

higher values of ﬂow-direction error. More formally:

 

Equation: FDE = a + BpAPA + BTRTR + BCICI

Where:

FDE = ﬂow-direction error
PA = % agriculture

TR = total relief

CI = contour intersections

Hypotheses:

Ho: BpA = 0 Ha: BpA > 0
H0: BTR = 0 Ha: BTR < 0
Ho: PC] = 0 Ha: BC] > 0

 

 

 

Figure 5-2: Model 1 - initial linear regression model of ﬂow-direction error.

I tested Model 1 (Figure 5-2) ﬁrst, attempting to explain ﬂow-direction error
through relief, agriculture concentration, and contour intersections. Histograms of the

model’s variables indicated that a transformation was warranted for the contour

71

intersections variable; a logarithmic transformation produced the most normal
distribution (Figure 5-3).

The results in Figure 5-4 conﬁrmed the hypotheses of Model 1, with a solid

overall model performance (R2 = 0.73). However, the Moran’s I value (I = 0.3611, p = <

0.001) for the model residuals indicated that they contained signiﬁcant spatial

autocorrelation (mapped in Figure 5-5 — note the clustering of shades), which was

potentially inﬂating the R 2 and causing the horizontal cone shape (heteroscedasticity) of

the residuals graphed in Figure 5-6.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Mean Flow Error % Row-crop Agriculture
2* a 2 o
O O .—
g‘ c 5" on
2 " ”—1 2
Lu LL.
c l— 1 4‘ﬁ c 1 ’ 1 1 1 l 1
0.5 1.0 1.5 2.0 0 20 40 60 80 100
Flow Error %
Total Relief Contour Intersections
>. O >.
N 3
‘3 I W— 22
u. u.
G l 1 O I l 41'?-
0 200 400 600 800 0.0 0.2 0.4 0.6 0.8
Feet Intersections per Contour
Log Transform of Cntr. Int.
52 8
=
3 c:
a u—
2’5 —I—i—I
C l r l I 1
-5 -4 -3 -2 -l 0
Log(lntersections per Contour)

 

Figure 5-3: Histograms of regression variables from Model 1.

72

 

Signiﬁcance of Terms:

 

 

 

Coefﬁcients Estimate Std. Err. t score p-value

Intercept 1.1994774 0.0864874 13.869 < 2e-l6

o -

/° 5‘” "0" 0.0043492 0.0008723 4.986 3.78e-06
_agnculture

 

Total relief (ft) -0.0005739 0.0001572 -3.652 0.000476

 

5%?“ per 0.0621131 0.0194624 3.191 0.002059

 

 

 

 

 

 

 

Hypotheses:
BpA: Reject Ho
BTR: Reject Ho
BC]: Reject Ho

Residuals:

 

Min. 25th pctl. Median 75th pctl. Max.
0309 -0.1 13 0.008 0.085 0.333

 

 

 

 

 

 

 

 

Residual standard error: 0.1426 on 76 degrees of freedom

Moran ’s I:

 

Observed Expectation Variance
0.361 1 -0.0280 0.0080
p-value = 6.058e-06 (sampling test)

 

 

 

 

 

 

Model Performance:

R2: 0.73

F—statistic: 71.01 on 3 and 76 DF, p-value: < 2.2e-16

Fitted values vs. Actual values correlation: 0.86

 

Aikaike Information Criterion (AIC): -78.743

 

 

Figure 5-4: Model 1 diagnostics.

73

 

 

 

 

 

 

 

 

 

 

r.“___ ,A h, ,2
I MI
I I
. " - . I
l E I D-
I B PA .
I Residuals
- 3E3 521 E --0.31——0.14
I IN I Ej-0.14--0.02
I I 2e BID D L Gem-0.03
D I i I '- -0.03-0.10
I] l -
D - E] I -0.10 0.33
E” i
g Ii i WV
KY /\
Figure 5-5: Mapped residuals of Model 1.
Residuals vs. Fitted
M. _ o 0 °
C
0
O O
‘ o
O 0 0 O
.12 c. ° :0 9 «5% o
g o W 0° ° 3 o
a: o 0 °
0
S "I o o 0 o o 3) o
0 ° 0 0° 0 o
N ° °
C? — o
0 8° 0
«'1
O. ‘ o
l 17 1 I I
0.6 0.8 l 0 1 2 1 4
FittedValues

Figure 5-6: Residuals of Model 1, greater residual error of higher ﬁtted values indicates potential

heteroscedasticity.

74

I mitigated the spatial autocorrelation in Model 1 by implementing the SAR
(simultaneous auto-regression) function of R’s SPDEP (Spatial Dependence) module
using each quadrangle’s three nearest neighbors, inversely weighted by their distance.
SAR corrected the estimates of the model coefﬁcients, but was not designed to produce a
new R2. SAR corrected the residual heteroscedasticity illustrated in Figure 5-6 (Figure 5—
7).

Aﬁer correcting for residual spatial autocorrelation, the regression model’s terms
were still signiﬁcant, conﬁrming my hypotheses (Figure 5-9). The adjusted model’s
residual values were smaller in absolute terms and more concentrated than those for the
original model; the range of residual values was smaller than in the original model (0.018
versus 0.024), as was the standard error (0.007 versus 0.143). Moran’s I for the adjusted
model’s residuals was no longer statistically signiﬁcant (p = 0.28). Additionally, the
Lambda signiﬁcance (p < 0.001), the slightly improved correlation of ﬁtted values versus
actual values (0.91 for SAR adjusted model versus 0.86 for original model), and the
lower AIC value (-98.05 versus -78.4) all indicated that the coefﬁcients of the new model
successfully accounted for the presence of spatial autocorrelation in the residuals.
Though not as clear as these numeric metrics of model improvement, the map of
residuals ﬁom the adjusted model (Figure 5-8) shows a slight improvement as several of

the darker clusters in the original residual map (Figure 5-5) were broken up.

75

SAR Residuals

SAR Residuals vs. SAR Fitted

 

 

 

 

 

o
o
_ o o
o
o
830 o
_ o o
o o °O o o
o o a: 00 o
o 0
:9 n 0 nun
o
0
° 0
o
o
o o o
4 ° ° 0 °
o o o
o
o o
_ o 0
0°
0
I l l l I
0.6 1.0 1.2 1.4 1.6

SAR Fitted Values

Figure 5-7: Residuals of Model 1, corrected for spatial autocorrelation, no apparent

heteroscedasticity.

 

PA ,
SAR Resnduals

- -0.27 - -0.09
-0.09 - -0.03
l: -0.03 - 0.03
l 0.03- 0.10

 

 

-0.10-0.26

 

f

Figure 5—8: Mapped residuals, corrected for spatial autocorrelation.

76

 

Signiﬁcance of Terms:

 

 

 

 

 

 

 

 

 

 

 

 

Coefﬁcients Estimate Std. Err. I score p-value

Intercept 1.] 1910155 0.0832270 13.446 < 26-16

0 -

/° 5‘” "01’ 000391785 0.0008740 4.482 7.384e-06
_agr1culture

Total relief(ﬁ.) -0.00051557 0.0001362 -3.783 0.0001545

Egg“ per 003246648 0.0151490 2.143 0.0321021

Hypotheses:

BpA: Reject Ho
BTR: Reject Ho
BC]: Reject Ho

Residuals:

 

Min. 25th pctl. Median 75th pctl. Max.
0274 -0.078 0.003 0.069 0.256

 

 

 

 

 

 

 

 

Residual standard error: 0.0072944

Moran ’s I:

 

Observed Expectation Variance
0.04] 1 -0.0127 0.0081
p-value = 0.2755 (sampling test)

 

 

 

 

 

 

Model Improvement:

Lambda: 0.59923 LR test value: 21.305 p-value: 3.9180e-06
Fitted values vs. Actual values correlation: 0.91

Aikaike Information Criterion (AIC): -98.048

 

 

 

Figure 5-9: Model 1 diagnostics, corrected for residual spatial autocorrelation.

By most measures, Model 1 performed very well. However, multicollinearity in

the independent variables may have inﬂated the model’s performance. As indicated in

77

Table 5-3, there was a strong, negative correlation between agricultural concentration and
relief (r = -0.68). Relief and the number of intersections per contour was also strongly
negatively correlated (r = -0.60). Agricultural concentration and contour intersections
were moderately positively correlated (r = 0.51). Therefore, I explored other regression
models to see if ﬂow-direction error could be described through other terms and

combinations.

All of the independent variables in Table 5-3 were signiﬁcantly correlated, so any
regression model employing them would have had multicollinearity issues. To consider
the power of each variable individually on ﬂow-direction error, as well as to assess some
additional variables, I developed six alternative regression models (Table 5-4). Models 2
through 4 each employed one of the terms in the original model (Model 1 — Figure 5-2):
percentage row-cop agriculture, total relief, or intersections per contour. While all of
these models explained signiﬁcant variability in ﬂow-direction error, the results
conﬁrmed that percentage row-crop agriculture and total relief explained more of the
variance than intersections per contour. Model 5 employed the percentage of a
quadrangle’s lO-meter cells discarded from the analysis due to the presence of ﬂat
LiDAR cells (ﬂat cell percentage). This model was the best single predictor of ﬂow-
direction error (R2 = 0.72). This independent variable was most closely associated with
elements that confound DEM prediction of ﬂow-direction: drainage ditches (potential
sinks) that are likely to be ﬁlled in and ﬂat terrain. Model 6 was a modiﬁcation of the
original regression model (Model 1) and replaced total relief with ﬂat cell percentage,

since total relief was the independent variable most highly correlated with ﬂat cell

78

percentage. While multicollinearity was still an issue, Model 6 performed much better
than the original model on all measures. Model 7 employed a relatively simple
combination of terms and performed nearly as well as the original model.

That all of these regression models yielded signiﬁcant results begs the question of
which model should those interested in predicting ﬂow-direction error elsewhere employ.
Solely by the numbers, Model 6 should be the predictor of choice. However, the terms of
that model are relatively expensive to acquire. Speciﬁcally, the percentage of discarded
ﬂat cells requires access to LiDAR data, which is limited around the globe, and requires a
signiﬁcant amount of pre-processing and hard drive space. The most economical choice
is Model 7, since land cover and elevation datasets are freely available for the entire US.
and most of the world. Any of the models presented here are statistically signiﬁcant
predictors of ﬂow-direction error. However, the results show that any implementation of

these models requires a correction for spatial autocorrelation.

 

 

 

 

 

 

 

 

Corr. of
Model Residual 2 . . . SAR Sig.
# Model terms Heteroscedasticity R F Statistic SAR Fitted AIC Terms
vs. Actual
114.0
0 -
2 /o ag. no 0.59 (p < 0.001) 0.90 83.0 all
. 109.6
3 total relief yes 0.58 (p «1001) 0.89 -77.9 all
. 70.9
4 log(mt. p. cnt.) no 0.47 (p (0.001) 0.88 -65.6 all
208.4
5 ﬂat cell pct. yes 0.72 (p (0.001) 0.89 -88.9 all
% ag' 142 3
6 ﬂat cell pct. no 0.84 (p < 0 001) 0.93 -124.1 all
log(int. p. cnt.) '
%ag 109.6
7 total relief yes 0.69 (p (0.001) 0.91 -95.7 all

 

 

 

 

 

 

 

 

 

Table 5-4: Alternative linear regression models, and measures of performance.

79

Chapter 6 Conclusion

6.1 Summary

For this research, I sought to quantify error in estimates of surface water ﬂow-
direction derived from USGS lO-meter DBMS and explain how that error varied across
different geographic landscapes. I developed a method by which ﬁner resolution DEMs
(LiDAR - 2.5-foot) can be utilized to evaluate estimates of ﬂow-direction derived from
coarser DEMs (U SGS - 10-meter) on a cell-by-cell basis. I implemented this approach
on 80 7.5-minute USGS quadrangles in Ohio, stratiﬁed by relief and agricultural
concentration. As I expected, statistical analyses revealed that ﬂow-direction error was
strongly, negatively correlated with total relief, and strongly, positively correlated with
agricultural concentration. The ntunber of intersections per feature in the source contours
that produced the USGS DBMS, and the percentage of sinks ﬁlled ﬂat by the ArcGIS
ﬁlling algorithm were also strongly, positively correlated with ﬂow-direction error. A
regression model employing agricultural concentration, contour intersections, and sink
ﬁlling as independent variables explained 84% of the variance in ﬂow-direction error. A
simpler model employing only agricultural concentration and total relief still explained
69% of ﬂow-direction error variance. Both of these regression models had to be adjusted
to correct for residual spatial auto-correlation, and both contained some degree of

multicollinearity.

80

The pre-processing of the LiDAR data was costly and time-consuming, requiring
nearly one terabyte of hard drive space and four microcomputers running constantly for
over a month. The most challenging and time-consuming part of pre-processing was
identifying stream locations within the LiDAR DEMs in order to “burn” a hydrological
network into the surfaces and enforce a proper drainage network. This step required the
development of custom Python scripts that read LiDAR elevation data into Python lists,
utilized neighborhood analyses to locate stream cells, and carved through artiﬁcial
barriers within the identiﬁed stream networks. Qualitative tests indicated that these
custom methods performed best in ﬂat agricultural areas, where the enforcement of

proper drainage networks was most needed.

6.2 Signiﬁcance

While the relationship between DEM error, relief, and agriculture has been
studied before, the speciﬁc attention to ﬂow-direction at ﬁeld scales and over such a large
geographic sample make this research unique. Previous efforts studied the effect of
resolution or interpolation methods on DEM derivatives such as slope, aspect, and soil
moisture. Other research has evaluated surface ﬂow from DEMs by comparing
catchment boundaries derived from upland ﬂow accumulations in relatively small study
sites. In this thesis, however, I have presented a new method for evaluating ﬂow-
direction, conducted this evaluation at the DEM cell level, and described its variation

across the Ohio landscape, which exhibits a diversity of topography and land cover.

81

This thesis makes several contributions to the ﬁeld of geographic information
science. The quantiﬁcation of ﬂow-direction error adds a new measure by which DEMs
can be evaluated. The cell-by-cell evaluation of coarser data with ﬁner data could be
applied to any number of raster datasets. The new method for evaluating surface water
ﬂow-direction could be extendedto other directional datasets, such as aspect, wind-
direction, or groundwater ﬂow-direction. Furthermore, the methods I developed and
tested to locate stream features in LiDAR data, and my implementation of a sink carving
method could be utilized in other hydrological studies of DEMs. This contribution in
particular may prove to be the most signiﬁcant as LiDAR DEMs replace contour-

interpolated DEMs as the standard in spatial analysis.

At a more practical level, the cell-level speciﬁcity of this thesis informs ﬁeld-
level analyses that would be impractical through previously described methods.
Additionally, its broad scope informs analysis across varying topographic and land cover
environments, making it applicable to a large audience of users. One group of DEM
users that will beneﬁt from this research are soil conservationists. Predictions of surface
water ﬂow-direction are critical to identifying speciﬁc ﬁeld-level locations experiencing
soil erosion. Inaccurate estimates will focus conservation efforts on the wrong locations.
A conservationist may look for an erosive gully on the eastern edge of a ﬁeld, when in
actuality it may be on the western edge. Having a quantiﬁed value of ﬂow-direction
uncertainty for a particular location will aid soil conservationists in efﬁciently marshaling
their time and resources. They will be able to gauge how much ﬁeld veriﬁcation is

needed prior to targeting soil conservation practices. For example, if staff for the Lucas

82

County Soil and Water Conservation District utilized a USGS lO-meter DEM in an
erosion model, as can be done with the Revised Universal Soil Loss Equation (Wu et al.
2005, Ouyang et al. 2005), to identify at-risk locations in a hilly area with little
agricultural presence, they could be sufﬁciently conﬁdent in assuming that surface water
ﬂow was appropriately simulated by the model. If, on the other hand, district staff
utilized the same approach to identify an at-risk location in a ﬂat agricultural area (typical
for that region of Ohio) they should exercise caution in assuming that the location was
signiﬁcantly eroding, and conduct a thorough ﬁeld evaluation prior to the installation of

remediation measures.

Soil conservationists are just one speciﬁc group that could beneﬁt from this
research. Environmental engineers deriving stream networks, stream biologists exploring
non-point source pollution loading from small local catchments, civil engineers re-
evaluating ﬂoodplain maps, or any user employing DEMs for ﬁeld-level hydrological
analyses will beneﬁt from knowing the uncertainty in simulated ﬂow-direction at a

particular location.

Martinoni and Bernhard (1998) argued that greater documentation of error in
DBMS, including assessment of derivative error, is needed to ensure their proper use and
analysis. The quantiﬁcation of ﬂow-direction error could be a new metadata standard for

DBMS, and support Martinoni’s and Bernhard’s goal.

83

6.3 Future Research

While I was able to successfully quantify and explain ﬂow-direction error in
terms of topographic and land cover characteristics, the process generated additional
topics that warrant further research. Stream locations could potentially be better
identiﬁed and processed; smoothed LiDAR DEMs may prove to be better reference
datasets; additional ﬂow-direction algorithms should be utilized in the assessment of
ﬂow-direction error; and the scale at which ﬂow-direction between lO-meter and LiDAR

DEMs changes from error to agreement should be evaluated.

The methods I developed to identify and burn stream locations in the LiDAR
DEMs has several limitations. First, they cannot identify wide cross-sections of rivers.
The analysis neighborhood is restricted to a lO-cell (25-foot by 25-foot) neighborhood
and cannot locate the necessary stream bank elevations for large rivers in such a small
window (Figures 4—10 and 4-13). Second, they require the user to specify single
thresholds for stream bank elevation and neighborhood size that in actuality should vary
depending on the relief and land cover of a particular area. In this research, I had to
process several quadrangles three times with different threshold inputs until an acceptable
output was realized. Third, the carving process to remove artiﬁcial barriers in the streams
sometimes carved in inappropriate directions. For example, instead of carving through an
east-westrunning bridge to connect a north-south drainage ditch, it sometimes carved in a
northwest-southeast direction to connect the north-south ditch to a section of an east-west
ditch on the opposite side of the road (Figure 4-21). Revised methods could

accommodate wider rivers, automate input selections by analyzing total relief and land

84

cover
discer

metho

repres
numb
an are
data. 1
081112
DEM:
direct
impart
preser
DEM
Press]
hlpor
0310.

Send;

Well (

311an

cover percentages for given locations, and carve in a more intelligent manner by
discerning the direction of the stream features that need to be connected. These revised

methods would likely improve the precision of ﬂow-direction error estimates.

LiDAR DEMs can contain signiﬁcant noise that distorts the surface
representation. These distortions can be limited by “smoothing” the DEM through any
number of ﬁltering methods. A common approach is to change each cell’s elevation to
an average of its nearest neighbors. This method reduces any large discrepancies in the
data, but can also diminish valid features on the surface. I was particularly interested in
evaluating ﬂow-direction in ﬂat areas; therefore, I was worried that smoothing LiDAR
DEMs in these areas may distort the subtle surface features critical to calculating ﬂow-
direction. As mentioned in Chapter 3, subtle changes in elevation can have a signiﬁcant
impact on ﬂow-direction if that area is ﬂat. I chose to honor the original DEM and
preserve these features, at the risk of noise affecting the model outputs. The LiDAR
DEMs utilized in this thesis were unaltered from their downloaded format. Despite the
presence of noise in some of the LiDAR DEMs, I was still able to conﬁrm my
hypotheses. However, since smoothing is a common procedure, it would be worth
evaluating how the results of this study may change with smoothed LiDAR DEMs

serving as the reference dataset.

While the D8 ﬂow-direction algorithm allowed me to conﬁrm my hypotheses, its
well documented limitations warrant an evaluation of ﬂow-direction error utilizing

alternative and more accepted methods. Multi-direction methods such as MF by Quinn et

85

al. (1991) or multi/single direction hybrids like Tarboton’s D-Inﬁnity (1997) should be

utilized in a cell-by-cell fashion as used here.

I calculated ﬂow-direction error on a cell-by-cell basis, but at what extent, if ever,
does that error change to agreement? The 10-meter cell and its corresponding LiDAR
cells may be in disagreement at the immediate neighborhood level illustrated in 4-35, but
perhaps the predicted ﬂows converge at the next larger neighborhood. It is possible that
ﬂow-direction error is generally conﬁned to a small local neighborhood of cells. It is also
possible that simulated ﬂows travel in different direction over large areas (Figures 3-8
through 3-10). This relationship should be explored to better describe the impact ﬂow-

direction error may have on hydrological analyses.

The availability of high-resolution LiDAR DEMs will continue to grow as GIS
users demand ﬁner data and as digital storage space becomes cheaper. LiDAR DEMs
will likely replace the coarser, contour-interpolated and stereoscopic DEMs produced by
the USGS as the standard digital elevation products for spatial analysis. As that
transition takes place, research should document how existing DEM-based analyses may

change with the incorporation of the ﬁner resolution data. This research is a contribution

to that effort.

86

APPENDICES

87

 

 

 

59:0
QUOI atom :25
m

III.“

59:0 K

 

 

 

1.8qu Sauna

 

 

 

 

 

25:35 $538.5 3. X—GZme—<

88

 

 

 

 

 

 

 

 

 

 

 

238m Beﬁmwaﬁm 2mg 28 «<95
M205 5:9 ”205 ~20“ =5 awe—coﬁmoaoa
7
SEC
m<D§ 0E0
356 @8582
.38”. Hoomeﬂh .228
885m 0858.365
/m<o§ n: anbm

 

 

 

   

 

   

 

.553—
:ocooboéo. m

  

Ema $85-2 53026-32...

 

 

 

  
  

 

 

 

32.838505 K 8355 _
Ema 288%
E..— coEsnéaobm E .55

« 388$ _ .

 

 

 

2.5355 3588.:— 3.. Nun—239:4

    

    

 

   
   

.833—

53026-32.”—
cocoohvéoi

           

 

 

 

 

    

5.9..
.\ «<05
28 $4 - as 55.5
”555 .980: 3:0 55;?
A mzmo «<03

 

 

  

 

33m wﬁmmooeauoan—

    

 

  

 

    

 

 

 

   
 

ego
3::_E-n. n
2.3
368$ _
V.222 .
E “83”.. 2mm
2.205-225
K
.33 .23. ﬁnance»:
.2325 e 25%; 6.5 35
as

 

 

 

89

 

 

mam—22.
Seesawsz
#203

K

 

 

  
     

   

2.8
Scioﬁwsz
53:

   
  

33>
55025.32;

«<91—
@08onng

 

   
  

coca—330
Sum
cocoobvio—m

 

 

 

 

=00
:ozoohuioE

 

2.00
Boeoﬁwsz
w::8£8£

K

 

388$ _

momma
558%-32...

 

 

~20:

 

     
 

    
    

:60 385.2
eom Stm
22826-32 “—

      
       

 

52.4.3—
=262€$5E

668$ _

.33 bo>ov
.895:

.N

 

 

 

38m wimmoueaémem

wig—us:— waﬁmeocum u< Nun—lam:

    

    
  

90

\oooqauhwu)-

QWNU—OQWQO‘MkUJN—o

NN
GM

27
28
29

30
31
32
33
34

35
36

37

APPENDIX B — Python Code

raster_analysis.py

_all_ = ['rasterZarray', 'rastertethlist', 'extract_header']

#raster_analysis.py

#Author: Glenn O'Neil

#Date: September 2009

#This Python module provides tools for converting ArcGIS rasters

#between multiple formats, including NumPy arrays for analysis with GDAL
#and Python lists. It also includes a function for extracting a neighborhood
#ﬁ'om a larger list or array of values.

def raster2array (raster):
"""Takes a raster path and converts it to a NumPy array using GDAL"""

#check to see if gdal and gdalconst modules have been imported
if 'gdal' not in globals():

from osgeo import gdal
if 'gdalconst' not in globals():

from osgeo import gdalconst

raster_ds = gdal.0pen(raster, gdalconst.GA_ReadOnly)
raster_band = raster_ds.GetRasterBand( l)

raster_anay = raster_band.ReadAsArray()

del raster_ds, raster_band

return raster_array

def array2raster (raster_array, raster_ds_template, raster_ds_out_path, raster_ds_out_format,
scratch_folder):

Takes a NumPy array and converts it to an ArcINFO raster using GDAL

raster_array = the NumPy array to be converted
raster_ds_template = the raster dataset that will serve as a template for the output raster
dataset

raster_ds_out_path = the full path of the output raster dataset
raster_ds_out_format = string of the the pixel type of the output raster dataset:

- l_BIT: A 1-bit unsigned interger. Values can be 0 or 1.

- 2_BIT: A 2-bit unsigned integer. The values supported can be from 0 to 3.

- 4_BIT: A 4-bit nsigned integer. The values supported can be from O to 15.

- 8_BlT_UNSIGNED: An unsigned 8-bit data type. The values supported can be
from 0 to 255.

- 8_BlT_SIGNED: An signed 8-bit data type. The values supported can be from -
128 to 127.

91

,3

15
l6

17

18

49
50
5]

<7

54
SS

56
i7

53
so
60
6l

63

63

7 1 71 gr
~.I.

38
39
40

41

42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59

60
61

62
63

65
66
67
68
69
70
71
72
73
74
75
76
77

- l6_BIT_UNSlGNED: An unsigned l6-bit data type. The values supported can
be from 0 to 65,535.

- l6_BIT_SIGNED: An signed l6-bit data type. The values supported can be
from -32,768 to 32,767.

- 32_BIT_UNSIGNED: An unsigned 32-bit data type. The values supported can
be from 0 to 4,294,967,295.

- 32_BIT_SIGNED: An signed 32-bit data type. The values supported can be
from -2,147,483,648 to 2,147,483,647.

- 32_BlT_FLOAT: A 32-bit data type supporting decimals.
- 64_BIT: A 64-bit data type supporting decimals.
scratch_folder = the folder where the temporary datasets will be stored

#check to see if the necessary modules have been imported
if 'gdal' not in globals():
from osgeo import gdal
if 'gdalconst' not in globals():
from osgeo import gdalconst
if 'arcgisscripting' not in globals():
import arcgisscripting
if 'gp' not in globals():
gp = arcgisscripting.create(9.3)
elif str(type(gp)) != "<type 'geoprocessing object'>":
del gp
gp = arcgisscripting.create(9.3)

#must ﬁrst convert the template dataset to a GeoTlFF b/c GDAL can't writ to ESRI Grid
format

template_copy = scratch_folder + "\\temp.tif"
gp.CopyRaster_management(raster_ds_template, template_copy, "#", "#", "#", "NONE",
"NONE", raster_ds_out_format)

#read the copied TIFF file into a GDAL dataset and write the raster_array contents into it
raster_ds = gdal.0pen(template_copy, gdalconst.GA_Update)

raster_band = raster_ds.GetRasterBand( l)

raster_band.WriteArray(raster_array)

raster_band.FlushCache()

raster_ds.FlushCache()

del raster_ds, raster_band

#copy the modiﬁed template_copy back to an ESRI Grid
gp.CopyRaster_management(template_copy, raster_ds_out_path)
gp.delete_management(template_copy)

del template_copy

92

78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104

105
106

107
108
109
110
111
112
113
114
115
116
117
118
119

def rastertext21ist (raster_ascii_path, output_datatype='int'):
"""Takes a raster as a text ﬁle and converts it to a list
raster_ascii_path = full path of an ASCII text ﬁle converted from a raster
output_datatype = the datatype of the values in the output list.
Integer is the default, user can specify 'ﬂoat' or 'string'

#read in the elevation ascii ﬁle
raster_ascii = open(raster_ascii_path, 'r')

#initiate the list that will store all of the text values

raster_list = []

raster_ascii.seek(0,0)

for eachLine in raster_ascii:
raster_list.append(eachLine.split(" "))

raster_ascii.close()

#get rid of the descriptive info (the ﬁrst six rows)
for i in range(6):
raster_list.remove(raster_list[0])

#get rid of the newline character at the end of each row
for row in raster_list:
row.remove(row[len(row) - 1])

#convert the textﬁle values (string) to whatever numeric type (integer or ﬂoating type) was
speciﬁed

if output_datatype == 'ﬂoat' or output_datatype == 'Float':
for row in raster_list:
for i in range(len(row)):
row[i] = ﬂoat(row[i])
elif output_datatype = 'string' or output_datatype == 'String':
pass
else: #the default will be integer
for row in raster_list:
for i in range(len(row)):
row[i] = int(row[i])

raster_ascii.close()
del raster_ascii

93

113 (1a:

125

116 r
117 n
128 r}

 

131 n
133 c
133 r
134
135

136 dc

120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160

return raster_list

def extract_header (raster_ascii_path):

""" Extracts the header information from a raster converted to a text ﬁle

and returns it as a list """

raster_ascii = open(raster_ascii_path, ‘r')

header__list = []

for i in range(6):
header_list.append(raster_ascii.readline())

raster_ascii.close()
del raster_ascii
return header_list

def nhood(input_list, row_index, col_index, nsize):

"""Takes a list, row index, and column index and extracts a neighborhood

of nsize around those indexes.
nhood_list = []

#get the boundary indexes

upper_leﬂ_row = row_index - nsize

if upper_leﬁ_row < 0: upper_leﬁ_row = 0

upper_leﬁ_col = col_index - nsize

if upper_left_col < 0: upper_1eﬁ_col = 0

lower_right_row = row_index + nsize

if lower_right_row >(1en(input_list)- 1): lower_right_row = len(input_list) - 1
lower_right_col = col_index + nsize

if lower_right_col > (len(input_list[0]) - 1): lower_right_col = len(input_list[0]) - 1

#loop through the input_list for the speciﬁed boundaries and record the values
row_counter = 0
for i in range(upper_left_row, lower_right_row + 1, 1):
nhood_list.append([])
for j in range(upper_leﬁ_col, lower_right_col + 1, 1):
nhood_list[row_counter].append(input_list[i][i])
row_counter += 1

return nhood_list

94

l-J

‘4.)

tank
*Cmm

5Aum.

88

 

8 z
9 #
w 3
11':
D t
13:
N :
H z
16:
17.5,
H :
W a
30;
31:;
2 g
33:
24k:
3 :
362
37:,-
3 ~

\OOOQGMRWN—

WUNNNNNNNNNNN—~——~——O~——
N—OOOOQONMAWN—OVOOOQONM-hWN—O

Mb)
Am

35

36

37
38

APPENDIX B -— Python Code
analysis _prep.py

#
# analysis _prep.py

# Created September 2009

# Author: Glenn O'Neil

#

# Description: This script generates the datasets and folders necessary for an
# analysis of NED (Naitonal Elevation Dataset) DEM ﬂow-direction
error by comparison to tiles of ﬁner resolution LiDAR DEMS.

 

The preparation includes a stream-burning process to force surface-
water ﬂow to streams. The NED DEMs are burned with USGS Hydrography
1:24K DLG vector features. LiDAR tile datasets are identiﬁed based
on the coordinates of the selected USGS 7.5 minute quadrangle.

This results in a sub-folder of around 40-48 LiDAR DEMs, each to be
prepped separately. The LiDAR DEMs' horizontal positional
accuracy are too precise to burn with 1:24K stream features.
Therefore, this script calls one of two custom scripts (speciﬁed

by the user) that identify stream locations in LiDAR datasets by
analyzing only elevation values. Once stream identiﬁcation has been
completed for the LiDAR DEMs, another custom script is called to
attempt to carve through artiﬁcial barriers in the LiDAR DEMs,

such as foot-bridges and tile culverts. This script identiﬁes

sinks and carves to the closest low-point within a user-speciﬁed
maximum neighborhood size. It continues to identify sinks and carve
for a user-speciﬁed number of iterations. Once the ﬁnal carve

is complete, the LiDAR DEM is stream-burned, sinks ﬁlled, and
flow-direction calculated with the D8 method.

Inputs: 1. quads - an ESRI shapeﬁle of 7.5 Minute USGS Quadrangles
2. quad_id_ﬁeld - unique quad ID ﬁeld of the 'quads' shapeﬁle
3. quad_id - the unique ID of the selected quad
4. state _plane_ﬁeld - ﬁeld in the 'quads' shapeﬁle that denotes
whether the quad lies in the North or Southern

%matﬁitttkzttﬁntkkatﬁtntatntﬁkkﬂttitnttatk=11:

state planes.
5. state _plane_n_sr_ﬁ1e - path to the Ohio State Plane North NAD 83 Ham projection
le

6. state _plane_s_sr_ﬁle - path to the Ohio State Plane South NAD 83 Ham projection
ﬁle
# 7. workspace - the workspace where the quad folder containing all output ﬁles and
folders
# will be created.
# 8. streams_wspace - workspace containing a shapeﬁle of stream features for each Ohio

95

39

4O
41

42
43

45

46
47

48
49

50
51

52
53

54
55

56
57

58
59

60
61

62
63

65
66
67
68
69
70
7 l
72
73
74
75

quad

#

from

#

# 10.
study.

# 11.
#

#

#

#

# 12.
difference in
#

# 13.
inc cells

#

# 14.
difference 1n
#

# 15.
which

#

# 16.
to

#

# 17.
and carve.

# 18.
#

#

# 21.
reducing the
#

# 22.
#

# Outputs:

#

#

#

#

#

#

#

#

#

#

9. ned_dem - a single DEM raster for all of Ohio, generated by contour interpolation

DLG hypsography and submitted to NED.
lidar_master__wspace - workspace containing all of the LiDAR rasters used in this

stream_id_method - the method of stream identiﬁcation for the LiDAR datsaets;
'neighborhood': see script strearn_id_nhood.py
'neighborhood transect combo': a combination of the neighborhood and
trasect methods. See scripts strearn_id_nhood.py

and stream_id_transect.py
elev_diff - parameter of the neighborhood stream ID method. The minimum

elevation from a stream bank to the stream center.
neigh_maxsize - parameter of neighborhood stream 1D method. The maximum size

to search around a given cell for cells that exceed elev_diff.
elev_diﬁ‘_trans - parameter of the transect stream ID method. The minimum

elevation from a stream bank to the stream center.
trans_length - parameter of the transect stream 1D method. The transect length over

the code will search for elevation changes that exceed elev_diff_trans
max_cavre_length - parameter of the carving script. The maximum distance in cells

search about a sink for a lower point.
iterations - parameter of the carving script. The number of times to identify sinks

strearn_id_nhood_script - the path to the neighborhood stream identiﬁcation script.

19. stream_id_transect_script - the path to the transect stream identiﬁcation script.

20. carve_script - the path to the carving script.

PWHQMPPNT‘

1idar_sample - boolean value whether to sample a lattice from the LiDAR tiles,

total number of rasters that must be processed.
del_int_data - boolean value whether to delete intermediate datasets

Folder containing output datasets

ned - lO-meter DEM clipped by quadrangle boundary

ned_b = clipped lO-meter DEM stream-bumed with stream_g

ned_bf - ﬁlled (depressionless) version of ned_b

ned_fd - ﬂow-direction of 10-meter DEM, derived from ned_bf
ned_ﬁd - raster of unique [D values for each lO-meter cell
quad_se1.shp - shapeﬁle of selected quadrangle boundary

streams. shp - shapeﬁle of quadrangle streams (from DLG hydrography)
streams _g binary raster of stream locations, from streams. shp

10. lidar- directory containing all of the quad's LiDAR data, organized by tile

96

76
77

78
79

80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117

 

# 10-1. carved_dem - LiDAR DEM with artiﬁcial barriers in streams carved through
# 10-2. ﬂat_areas - raster of locations with a slope of zero, later ignored in error
calculation

# 10-3. lidar_bum - LiDAR raster with lidar_stream locations burned in.

# 10-5. lidar_ned_ﬁd -LiDAR-resolution raster of ned_ﬁd, used in locating LiDAR
cell neighborhoods for each NED cell.

# 10-6. lidar_stream - binary raster of LiDAR stream locations.

# l 1. template_raster - TIFF copy of the ned_fd raster, used to write in ﬂow_error results.
#

#import the necessary modules

import sys, 05, time, arcgisscripting, math, random
from subprocess import call

gp = arcgisscripting.create(9.3)
gp.CheckOutExtension("Spatial")

ls = os.1inesep

#function to convert seconds to hours and minutes (borrowed from:
#http://mail.python.org/pipennai1/python-list/2003-January.l 81366.html
def sec_to_h_min(s):

temp = ﬂoat()

temp = ﬂoat(s) / (60*60*24)

d = int(temp)

temp = (temp - d) * 24

h = int(temp)

temp = (temp - h) * 60

m = int(temp)

temp = (temp - m) * 60

sec = int(temp)

return h,m,sec

#inputs

quads = sys.argv[l]

quad_id_ﬁeld = sys.argv[2]

quad_id = sys.argv[3]

state _plane_ﬁeld = sys.argv[4]

state _plane_n_sr_ﬁle = sys.argv[S]

state _plane_n_sr = gp.CreateObject("SpatialReference")
state _plane_n_sr.CreateFromF ile(state _plane_n_sr_ﬁle)
state _plane_s_sr_ﬁ1e = sys.argv[6]

state _plane_s_sr = gp.CreateObject("SpatialReference")
state _plane_s_sr.CreateFromF ile(state _plane_s_sr_ﬁle)

97

118 workspace = sys.argv[7]

119 streams_wspace = sys.argv[8] + "\\"
120 ned_dem = sys.argv[9]

121 lidar_master_wspace = sys.argv[ 10]
122 stream_id_method = sys.argv[] l]

123 elev_diff = sys.argv[12]

124 neigh_maxsize = sys.argv[l3]

125 elev_diff_trans = sys.argv[14]

126 trans_length = sys.argv[lS]

127 max_carve_length = sys.argv[l6]

128 iterations = sys.argv[l7]

129 stream_id_nhood_script = sys.argv[18]
130 stream_id_transect_script = sys.argv[l9]
131 carve_script = sys.argv[20]

132 lidar_sample = sys.argv[21]

133 del_int_data = sys.argv[22]

134

135

136 #get the path to the Python executable, for running certain parts of the script as more memory-
efﬁcient sub-processes.

137 pythonexe = os.environ.get("PYTl-lONEXE") #will be used to generate sub-processes to prevent
memory leaks

138 pythonexe += "\\python.exe"

139

140 #check to see if a new workspace must be created

141 if not os.path.exists(workspace + "\\" + quad_id):

142 os.mkdir(workspace + "\\" + quad_id)

143

144 workspace += "\\" + quad_id + "\\"

145

146 #create the log textﬁle

147 log_ﬁle_path = workspace + "\\log.txt"

148 if not os.path.exists(log_ﬁ1e_path):

149 log_ﬁle = open(log_ﬁle_path, 'w')

150 log_ﬁle.write("QUAD__1D: " + quad_id + ls + Is)

151 log_ﬁle.write("Parameters:" + Is)

152 log_ﬁle.write(" - quads: " + quads + ls)

153 log_ﬁle.write(" - quad_id_ﬁeld: " + quad_id_ﬁeld + ls)

154 log_ﬁle.write(" - quad_id: " + quad_id + Is)

155 log_ﬁle.write(" - state _plane_ﬁeld: " + state _plane_ﬁeld + ls)
156 log_ﬁle.write(" - state _plane_n_sr_ﬁle: " + state _plane_n_sr__ﬁle + ls)
157 log_ﬁle.write(" - state _plane_s_sr_ﬁle: " + state _plane_s_sr_ﬁle + Is)
158 log_ﬁle.write(" - workspace: " + workspace + ls)

159 log_ﬁle.write(" - streams_wspace: " + streams_wspace + Is)

98

160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202

log_ﬁle.write(" - ned_dem: " + ned_dem + ls)
log_ﬁle.write(" - lidar_master_wspace: " + lidar_master_wspace + ls)
log_ﬁle.write(" - stream_id_method: " + stream_id_method + ls)
log_ﬁle.write(" - elev_diff: " + elev_diff + ls)
log_ﬁle.write(" - neigh_maxsize: " + neigh_maxsize + ls)
log_ﬁle.write(" - upslope_stream_threshhold: " + upslope_stream_threshhold + ls)
log_ﬁle.write(" - max_carve_length: " + max_carve_length + ls)
log_ﬁle.write(" - iterations: " + iterations + ls)
log_ﬁle.write(" - stream_id_nhood_script: " + stream_id_nhood_script + 15)
log_ﬁle.write(" - stream_id_transect_script: " + stream_id_transect_script + ls)
log_ﬁle.write(" - stream_id_ﬂowacc_script: " + stream_id_ﬂowacc_script + 15)
log_ﬁle.write(" - carve_script: " + carve_script + ls)
log_ﬁle.write(" - lidar_sample: " + lidar_sample + ls)
log_ﬁle.write(" - delete intermediate data: " + del_int_data + Is + ls)
else:
log_ﬁle = open(log_ﬁle_path, 'w')
####################################NED analysis prep###########################

#determine whether to skip the NED section because it had already been processed
ned_ﬁd = workspace + "ned_ﬁd"

ned_fd = workspace + "ned_f "

quad_sel = workspace + "quad_sel.shp"

#Determine the state plane of the selected quad.

cursor = gp.searchcursor(quads, ' "'+ quad_id_ﬁeld +'" = \" + quad_id + '\' ')
row = cursor.next()

state _plane = row.getvalue(state_plane_ﬁeld)

if not gp.exists(ned_ﬁd):
log_ﬁle.write(time.ctime() + " Prepping the NED DEM:" + ls)

#Get the quad
gp.AddMessage(" - Selecting Quad " + quad_id)
log_ﬁle.write(time.ctime() + " - Selecting Quad " + quad_id + ls)

gp.select_analysis(quads, quad_sel, ' "'+ quad_id_ﬁeld +'" = \" + quad_id + '\' ')

#reproject the quad_boundary if necessary

quad_sel_desc = gp.describe(quad_sel)

if quad_sel_desc.SpatialReference.Name == "Unknown":
gp.AddError('The selected quad does not have a deﬁned projection.')
log_ﬁle.write(time.ctime() + " - ERROR: Selecting Quad The selected quad does not

99

have a deﬁned projection")

203 log_ﬁle.close()
204 del log_ﬁle
205 sys.exit()

206

207 quad _pcs = quad_sel_desc.SpatialReference.PCSCode
208 if state _plane = "N":

209 if quad _pcs l= state _plane_n_sr.PCSCode: #Must re-project to State Plane North

210 gp.AddMessage(" - Projecting quad boundary from " + quad_sel_desc.Name + " to " +
state _plane_n_sr.Name)

211 log_ﬁle.write(time.ctime() + " - Projecting quad boundary from " +
quad_sel_desc.Name + “ to " + state _plane_n_sr.Name + ls)

212 quad _proj = workspace + "quad_proj.shp"

213 gp.project_management(quad_sel, quad _proj, state _plane_n_sr)

2 l4 gp.delete_management(quad_sel)

215 gp.rename__management(quad_proj, quad_sel)

216 elif state _plane == "S":

217 if quad _pcs != state _plane_s_sr.PCSCode: #Must re-project to State Plane South

218 gp.AddMessage(" - Projecting quad boundary from " + quad_sel_desc.Name + " to " +
state _plane_s_sr.Name)

219 log_ﬁle.write(time.ctime() + " - Projecting quad boundary from " +
quad_sel_desc.Name + " to " + state _plane_s_sr.Name + ls)

220 quad _proj = workspace + "quad_proj.shp"

221 gp.project_management(quad_sel, quad _proj, state _plane_s_sr)

222 gp.de1ete_management(quad_seI)

223 gp.rename_management(quad_proj, quad_sel)

224

225 #Clip the NED DEM by the quad boundary
226 ned_fd = workspace + "ned_fd"
227 if not gp.exists(ned_fd):

228 gp.AddMessage(" - Clipping the NED DEM by the quad boundary")
229 log_ﬁle.write(time.ctime() + " - Clipping the NED DEM by the quad boundary" + Is)
230 ned = workspace + "ned"
231 gp.ExtractByMask_sa(ned_dem, quad_sel, ned)
232
233

234 #Get the streams

235 streams_input = streams_wspace + quad_id + "ohy.shp"
236 streams = workspace + "streams.shp"

23 7 #Check to see if the stream dataset must be re-projected
23 8 streams_desc = gp.describe(streams_input)

23 9 if streams_desc.SpatialReference.Name = "Unknown":

24 0 gp.AddError('The selected stream dataset does not have a deﬁned projection.')
24 1 log_ﬁle.write(tirne.ctime() + " - ERROR: The selected stream dataset does not have a
deﬁned projection." + ls)

100

242
243
244
245
246
247

248
249

250

251

252
253

254
255

256
257

258
259

260
261

262
263

264

265

266
267

268
269

270
271

2 72
273

274
275
276
277
278

log_ﬁle.close()
del log_ﬁle
sys.exit()

streams _pcs = streams_desc.SpatialReferencePCSCode
if state _plane == "N":

if streams _pcs != state _plane_n_sr.PCSCode: #Must re-project to State Plane North
gp.AddMessage(" - Projecting streams from " + streams_desc.SpatialReference.Name + " to
" + state _plane_n_sr.Name)
10g_ﬁle.write(time.ctime() + " - Projecting streams from " +
streams_desc.SpatialReference.Name + " to " + state _plane_n_sr.Name + ls)

#select the appropriate transformation

if '16N' in streams_desc.SpatialReference.Name:
transformation =
"Thesis_NAD l 927_UTM l 6N_to_NAD l 983_HARN__Ohio_State_Plane_North_fe"

elif '17N' in streams_desc.SpatialReference.Name:
transformation =
"Thesis_N AD 1 927_UTM 1 7N_to_NAD 1983_HARN_Ohio_State_Plane_North_fe"

gp.project_management(streams_input, streams, state _plane_n_sr, transformation)
else: #The streams projection is already in the appropriate projection

gp.AddMessage(" - Stream re-projection not needed. Copying streams over to workspace")
log_ﬁle.write(time.ctime() + " - Stream re-projection not needed. Copying streams
over to workspace." + ls) -

gp.eopy_management(streams_input, streams)
elif state _plane = "S":

if streams _pcs != state _plane_s_sr.PCSCode: #Must re-project to State Plane South
gp.AddMessage(" - Projecting streams from " + streams_desc.SpatialReference.Name + " to
" + state _plane_s_sr.Name)
log_ﬁle.write(time.ctime0 + " - Projecting streams from " +
streams_desc.SpatialReference.Name + " to " + state _plane_s_sr.Name + ls)

#select the appropriate transformation

if '16N' in streams_desc.SpatialReference.Name:
transformation =
"Thesis_NAD 1 927_UTM 1 6N_to_NAD 1 983_HARN_Ohio__State__Plane_South_fe"

elif‘17N' in streams_desc.SpatialReference.Name:
transformation =
"Thesis_NAD 1 92 7_UTM l 7N__to_NAD1 983_HARN_Ohio_State_P1ane_South_fe"

gp.project_management(streams_input, streams, state _plane_s_sr, transformation)
else: #The streams projection is already in the appropriate projection

gp.AddMessage(" - Stream re-projection not needed. Copying streams over to workspace")
log_ﬁle.write(time.ctime() + " - Stream re-projection not needed. Copying streams
over to workspace." + ls)

gp.eopy_management(streams_input, streams)

#Convert the streams to a raster for DEM burning
gp.AddMessage(" - Converting streams to raster")

101

279
280
281
282
283
284
285
286
287
288
289
290
291
292
293

294
295

296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314

315
316

317

318
319

log_ﬁle.write(time.ctime() + " - Converting streams to raster." + ls)
gp.ResetEnvironments()
gp.Extent = ned

. gp.SnapRaster = ned

gp.Mask = ned

gp.CellSize = ned

streams _g_temp = workspace + "streams_g_tmp"
gp.PolylineToRaster_conversion(streams, "Stream", streams_g_temp)

streams _g = workspace + "streams_g"

gp.Sing1eOutputMapAlgebra_sa("con(lsNull(" + streams_g_temp +"), O, 1)", streams_g)
gp.delete_management(streams_g__temp)

#Burn the streams into the NED DEM
gp.AddMessage(" - Burning streams into NED DEM")
log_ﬁle.write(time.ctime() + " - Burning streams into NED DEM." + ls)

ned_b = workspace + "ned_b"
gp.SingleOutputMapAlgebra_sa("con(" + streams_g + " == 1, " + ned + ", " + ned + " + 10)",

ned_b)

#F ill the NED DEMs

gp.AddMessage(" - Filling NED DEM")
log_ﬁle.write(time.ctime() + " - Filling NED DEM." + ls)
ned_bf = workspace + "ned_bf"

gp.ﬁll_sa(ned_b, ned_bf)

#Calculate NED Flowdirection

print "Calculating NED ﬂowdirection"

gp.AddMessage(" - Calculating NED ﬂow-direction")
log_ﬁle.write(time.ctime() + " - Calculating NED ﬂow-direction." + ls)
ned_fd = workspace + "ned_fd"

gp.ﬂowdirection_sa(ned_bf, ned_fd)

#determine if ned_fd needs to be re-projected
gp.ResetEnvironments()

ned_fd_desc = gp.describe(ned_fd)

ned_fd_pcs = ned_fd_desc.SpatialReference.PCSCode
if state _plane = "N":

if ned_fd_pcs != state _plane_n_sr.PCSCode: #Must re-project to State Plane North
gp.AddMessage(" - Projecting ned_fd boundary ﬁom " +

ned_fd_desc.SpatialReference.Name + " to " + state _plane_n_sr.Name)

log_ﬁle.write(time.ctime() + " - Projecting ned_fd boundary from " +

ned_fd_desc.SpatialReference.Name + " to " + state _plane_n_sr.Name + ls)

ned_ fd_proj= workspace + "_ned fd_proj"
gp.ProjectRaster_ management(ned__ fd, ned_ fd_proj, state _plane n_ sr, "NEAREST", "#",

"NAD__ 1983 _To__ HARN _Ohio")

102

320
321
322

323
324

325

326
327

328
329
330

331
332

333
334
335
336

337
338

339
340
341

342
343

344
345
346

347
348

349

350
351
352
353
354
355

356
357

gp.delete_management(ned_fd)
gp.rename_management(ned_fd_proj, ned_fd)
elif state _plane = "S":

if ned_fd_pcs != state _plane_s_sr.PCSCode: #Must re-project to State Plane South
gp.AddMessage(" - Projecting ned_fd boundary from " + ned_fd_desc.Name + " to " +
state _plane__s_sr.Name)
log_ﬁle.write(time.ctime() + " - Projecting ned_fd boundary from " +
ned_fd_desc.SpatialReference.Name + " to " + state _plane_s_sr.Name + ls)

ned_fd_proj = workspace + "ned_fd_proj"
gp.ProjectRaster_management(ned_fd, ned_fd _proj, state _plane_s_sr, "NEAREST", "#",
"NAD_1983_To_l-IARN_Ohio")

gp.delete_management(ned_fd)
gp.rename_management(ned_fd_proj, ned_fd)

#Copy the NED ﬂowdirection raster to a TlF F format for use as a template dataset for writing
error output raster with GDAL later

gp.AddMessage(" - Creating template raster for flow error output")
log_ﬁle.write(time.ctime() + " - Creating template raster for ﬂow error output." + ls)
ned_fd_ﬂoat = workspace + "\\ned_fd_ﬂoat"

gp.Float_sa(ned_fd, ned_fd_ﬂoat)

template_raster = workspace + "\\template_raster.tif'
gp.CopyRaster_management(ned_fd_ﬂoat, template_raster, "#", "#", "#", "NONE", "NONE",
"32__BlT_FLOAT")

gp.de1ete_management(ned_fd_ﬂoat)

#convert the ned_fd raster to a point feature dataset

gp.AddMessage(" - Converting NED ﬂow-direction to a point feature dataset")
log_ﬁle.write(time.ctime() + " - Converting NED ﬂow-direction to a point feature dataset.
+ ls)

ned_fd_pt = workspace + "ned_fd_pt.shp"
gp.RasterToPoint_conversion(ned_fd, ned_fd_pt, "VALUE")

#convert back to a raster, but with the FID values as the raster values

gp.AddMessage(" - Converting NED ﬂow-direction point feature dataset back to a raster with
the F ID Values as raster values")

log_ﬁle.write(time.ctime() + " - Converting NED ﬂow—direction point feature dataset back
to a raster with the FID Values as raster values." + ls)

gp.ResetEnvironments()

gp.Extent = ned_fd

gp.CellSize = ned_fd

gp.Mask = ned_fd

gp.SnapRaster = ned_fd

gp.PointToRaster_conversion(ned_fd_pt, "F 1D", ned_ﬁd, "MOST_FREQUENT", "NONE")

gp.delete_management(ned_fd_pt)
#re-generate the raster using Map Algebra, this prevents the addition of NoDATA cells around
the permiter

103

358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393

394
395

396
397
398
399

#of ned_ﬁd, which PointToRaster can do.
gp.ResetEnvironments()

gp.Extent = ned_fd

gp.CellSize = ned_fd

gp.Mask = ned_fd

gp.SnapRaster = ned_fd

ned_ﬁd2 = ned_ﬁd + "2"
gp.SingleOutputMapAlgebra_sa(ned_ﬁd, ned_ﬁd2)
gp.delete_management(ned_ﬁd)
gp.rename_management(ned_ﬁd2, ned_ﬁd)

#clean up

if del_int_data = "True":
gp.delete_management(streams)
gp.de1ete_management(streams_g)
gp.delete_management(ned_b)
gp.delete_management(ned_bf)

#get the width of the NED ﬂowdirection raster, for storing in the parameters text ﬁle
gp.AddMessage(" - Reading in NED raster cell size")

log_ﬁle.write(time.ctime() + " - Reading in NED raster cell size." + ls)
ned_descr = gp.describe(ned_fd)

ned_height = ned_descr.meancellheight

ned_width = ned_descr.meancellheight

################################################################################

########################LiDDAR analysis prep####################################

if not os.path.isdir(workspace + "lidar"):
os.mkdir(workspace + "lidar")

#determine the LiDAR datasets that intersect the selected quad
log_ﬁle.write(ls + "LiDAR DEM prep:" + ls)

gp.AddMessage(" - Identifying LiDAR Rasters that intersect Quad " + quad_id)
log_ﬁle.write(time.ctime() + " - Identifying LiDAR Rasters that intersect Quad " + quad_id
+ ls)

quad_sel_desc = gp.describe(quad_sel)

quad_sel_ll_x = quad_sel_desc.extent.xmin
quad_sel_ll_y = quad_sel_desc.extent.ymin

104

 

400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418

419
420

42 1
422
423
424
425

426
427

428
429

430
431

432

433

434
435
436
437
438
439

quad_sel_height = quad_sel_desc.extent.Height
quad_sel_width = quad_sel_desc.extent.Width

#determine the name of the LiDAR dataset that contains the lower_left comer of the quad
#ﬁrst get the rasters
lidar_rasters = []
lidar_master_wspace_contents = os.listdir(lidar_master_wspace)
for i in lidar_master_wspace_contents:

if os.path.isdir(1idar_master_wspace + "\\" + i):

lidar_rasters.append(i)

del lidar_master_wspace_contents

#determine the size of a LiDAR raster

lidar_descr = gp.describe(lidar__master_wspace + "\\" + lidar_rasters[O] + "\\" + lidar_rasters[0])
1idar__height= lidar_descr.extent.ymax - lidar_descr.extent.ymin

lidar_width = lidar_descr.extent.xmax - lidar_descr.extent.xmin

lidar_rasters_per_quad_height = int(math.cei1(quad_sel_height / lidar_height))
lidar_rasters_per_quad_width = int(math.ceil(quad_sel_width / lidar_width»

for i in range(len(lidar_rasters)):
#get the lidar lower left coordinates from the LiDAR name (ﬁrst is state plane system, 2-5 are x
and last 3 are y)

lidar_ll_x = int(lidar_rasters[i][ 1 :5] + "000")
lidar_ll_y = int(lidar_rasters[i][Sz] + "000")

ll_x_diff = quad_sel_ll_x - lidar_ll_x
ll_y_diff = quad_sel_ll_y - lidar_ll_y

if 11_x_diff < lidar_width and ll_x_diﬁ‘ >=0 and ll_y_diff < lidar_height and ll_y_diff >=0:
#we've found the lower left LiDAR Dataset

ll_lidar_raster = lidar_rasters[i]

#get the other deﬁning rasters

ul_lidar_raster = lidar_rasters[i][:S] + str(lidar_ll_y + ((lidar_rasters_per_quad_height - 1) *
lidar_height))[z3] '

ur_lidar_raster = lidar_rasters[i][: 1] + str(lidar_ll_x + ((lidar_rasters_per_quad_width - 1) *
lidar_width))[:4] + str(lidar_ll_y + ((lidar_rasters_per_quad_height - l) * lidar_height))[z3]

lr_1idar_raster = lidar_rasters[i][:l] + str(lidar_ll_x + ((lidar_rasters_per_quad_width - 1) *
lidar_width))[:4] + lidar_rasters[i][5:]

break
#use the comer rasters to deﬁne a list of rasters for the entire quad
lidar_raster_list = [U] * lidar_rasters_per_quad_height

for i in range(lidar_rasters_per_quad_height):

105

453
453
454
455
456

457

459
460
461
461

440
44 1
442
443

445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482

lidar_raster_list[i] = [U] * lidar_rasters_per_quad_width

x_location = int(ll_lidar__raster[1:5] + "000")
y_location = int(11_lidar_raster[5:] + "000")
preﬁx = state _plane.lower()

for i in range(lidar_rasters_per_quad_height):
for j in range(lidar_rasters_per_quad_width):
lidar_raster_list[i][j] = preﬁx + str(x_location)[:4] + str(y_location)[:3]
x_location += lidar_width
y_location += lidar_height
#reset the x_location
x_location = int(ul_lidar_raster[l :5] + "000")

#Trim the boundaries off of lidar_raster_list to avoid edge issues during the analysis.

#There will be some areas where the NED dem does not cover the entire LiDAR dem.

lidar_raster_list2 = []

for i in range(l , len(lidar_raster_list) - 1):
lidar_raster_list2.append(lidar__raster_list[i][ l :len(lidar_raster_list[i]) - 1])

del lidar_raster_list

if lidar_sample = "True":
#remove half of the rasters in the form of staggered lattice
lidar_raster_list3 = []
for i in range(len(lidar_raster_list2)):
lidar_raster_list3 .append([])
#determine whether i is odd or even
if i % 2 = 0: # it's even, start at the ﬁrst index of the row
for j in range(O, len(lidar_raster_list2[i]), 2):
lidar_raster_list3[i].append(lidar_raster_list2[i][j])
else: # its odd, start at the second index of the row
for j in range(l, len(lidar_raster_list2[i]), 2):
lidar_raster_list3[i].append(lidar_raster_list2[i][j])
else: ‘
lidar_raster_list3 = lidar_raster_listZ

#move the list into a simpler to handle single dimension
lidar_raster_list = [] 4
for i in range(len(lidar_raster_list3)):
for j in range(len(lidar_raster_list3[i])):
lidar_raster_list.append(lidar_raster_list3 [ i] U ])

106

483
484
485
486
487
488
489
490

49 1
492

493

494
495
496
497
498
499
500
501
502
503
504
505

506
507

508

509

510
511

512

513
514

515
516

517
518

519

520

del lidar_raster_listZ, lidar_raster_list3

#call code to prep the LiDAR tiles
for i in range(len(lidar_raster_list)):
if not os.path.exists(workspace + "lidar\\" + lidar_raster_list[i]):
os.mkdir(workspace + "lidar\\" + lidar_raster_list[i])

gp.AddMessage(" - Prepping LiDAR tile " + str(lidar_raster_list[i]) + " (" + str(i) + " of " +
str(len(lidar_raster_list)) + ")")

log_ﬁle.write(time.ctime() + " - Prepping LiDAR tile " + str(lidar_raster_list[i]) + " (" +
str(i) + " of " + str(len(lidar_raster__list)) + ")" + ls)

1idar_prep_time_start = time.clock()

#get the lidar raster
lidar_dem = lidar_master_wspace + "\\" + lidar_raster_list[i] + "\\" + lidar_raster_list[i]
lidar_wspace = workspace + "lidar\\" + lidar_raster_list[i]

#identify the stream network
lidar_fd = lidar_wspace + "\\lidar_fd"
if not os.path.exists(lidar_fd): #then the code did not complete for this raster

lidar_stream = lidar_wspace + "\\lidar_stream"
if not os.path.exists(lidar_stream):
stream_id_time_start = time.clock()

gp.AddMessage(" - Identifying stream cells with the " + stream_id_method + "
method")
log_ﬁ1e.write(time.ctime() + " - Identifying stream cells with the " +

stream_id_method + " method" + ls)

if stream_id_method == "neighborhood":
call([pythonexe, stream_id_nhood_script, lidar_dem, elev_diff, neigh_maxsize,
lidar_stream, lidar_wspace])

elif stream_id_method = "transect":
call([pythonexe, stream_id_transect_script, lidar_dem, neigh_maxsize, elev__diff,
trans_stream, lidar_wspace])

elif stream_id_method == "neighborhood transect combo":

#run the neighborhood method

gp.AddMessage(" - First identifying stream cells with the neighborhood
method")

log_ﬁle.write(time.ctime() + " - First identifying stream cells with the
neighborhood method" + ls)

nhood_stream = lidar_wspace + "\\nhood_stream"

107

521

522
523

524
525

526
527

528
529

530
531

532
533
534

535
536

537
538
539

540
54 l

542
543
544
545
546
547

548
549

550
SS 1
552
553
554

555
556

557

558

call([pythonexe, stream_id_nhood_script, lidar_dem, elev_diff, neigh_maxsize,

nhood_stream, lidar_wspace])

#run the transect method

gp.AddMessage("
10g_ﬁle.write(time.ctime() + ..

- Next identifying stream cells with the transect method")
- Next identifying stream cells with the

transect method" + ls)

trans_stream = lidar_wspace + "\\trans__stream"
call([pythonexe, stream_id_transect_script, lidar_dem, neigh_maxsize, elev_diff,

trans_stream, lidar_wspace])

#combine the two results

gp.AddMessage("
10g_ﬁle.write(timeetimeo + u

- Combining the neighborhood and transect outputs")
- Combining the neighborhood and transect

outputs" + ls)

gp.ResetEnvironments()
gp.Extent = lidar_dem
gp.Mask = lidar_dem

gp.SnapRaster = lidar_dem
gp.SingleOutputMapAlgebra_sa("con(" + nhood_stream + " == 1 l " + trans_stream + "

== 1, 1, 0)", lidar_stream)

else:

gp.AddError("UNRECOGNIZED STREAM IDENTIFICATION METHOD")
log_ﬁle.write(time.ctime() + " - ERROR: UNRECOGNIZED STREAM

IDENTIFICATION METHOD")

log_ﬁle.close()
del log_ﬁle
sys.exitO

stream_id_time_stop = time.clock()

h,m,s = sec_to_h_min(stream_id_time_stop - stream_id_time_start)

gp.AddMessage(" - ID took " + str(m) + " minutes " + str(s) + " seconds")
log_ﬁle.write(time.ctime() + " - 1D took " + str(m) + " minutes " + str(s) + "

seconds" + ls)

+ ls)

#use the identiﬁed stream cells to burn through artiﬁcial barriers in the LiDAR DEM
carved_dem = lidar_wspace + "\\carved_dem"
if not os.path.exists(carved_dem):

stream_carve_time_start = time.clock()

gp.AddMessage(" - Carving stream cells through artiﬁcial barriers")
log_ﬁle.write(time.ctime() + " - Carving stream cells through artiﬁcial barriers"

call([pythonexe, carve_script, lidar_dem, lidar_stream, max_carve_length, iterations,

carve_iteration_script, "true", carved_dem, lidar_wspace])

stream_carve_time_stop = time.clock()

108

559 h,m,s = sec_to_h_min(stream_carve_time_stop - stream_carve_time_start)

550 gp.AddMessage(" - Carve took " + str(m) + " minutes " + str(s) + " seconds")
561 log_ﬁle.write(time.ctime() + " - Carve took " + str(m) + " minutes " + str(s) + "
seconds" + Is)
562
563 #burn in the identiﬁed streams into the LiDAR DEM
564 gp.ResetEnvironments()
565 lidar_bum = lidar_wspace + "\\lidar_bum"
566 if not os.path.exists(lidar_burn):
567 gp.AddMessage(" - Burning streams into LiDAR DEM")
568 log_ﬁle.write(time.ctime() + " - Burning streams into LiDAR DEM" + Is)
569 gp.SingleOutputMapAlgebra_sa("con(" + lidar_stream + " == 1, " + carved_dem + ", " +
caned_dem + " + 10)", lidar_bum)
570
571 #ﬁll the carved and stream_bumed LiDAR DEM
572 1idar_ﬁll = lidar_wspace + "\\lidar_ﬁll"
573 if not os.path.exists(lidar_ﬁll):
574 gp.AddMessage(" - Filling the LiDAR DEM")
575 log_ﬁle.write(time.ctime() + " - Filling the LiDAR DEM" + Is)
576 gp.ﬁll_sa(lidar_bum, lidar_ﬁll)
577
573 #calculate ﬂow-direction on the ﬁlled LiDAR DEM
579 gp.AddMessage(" - Calculating LiDAR DEM ﬂow-direction")
580 log_ﬁle.write(time.ctime() + " - Calculating LiDAR DEM ﬂow-direction" + Is)
531 gp.ﬁowdirection_sa(lidar_ﬁll, lidar_fd)
582
583 #create a version of the LiDAR dataset with the NED FID values
584 lidar_ned_ﬁd = lidar_wspace + "\\lidar_ned_ﬁd"
535 if not os.path.exists(lidar_ned_ﬁd):
586 gp.AddMessage(" - Creating LiDAR scale raster of NED FID values")
587 log_ﬁle.write(time.ctime() + " - Creating LiDAR scale raster of NED FID values"
+ Is)
588 gp.ResetEnvironments()
589 gp.Extent = lidar_fd
59° gp.CellSize = lidar_fd
591 gp.Mask = lidar_fd
592 gp.SnapRaster = lidar_fd
:2: gp.SingleOutputMapA1gebra_sa(ned_ﬁd, lidar_ned_ﬁd)
595
596 #Getting the LiDAR cell size for the parameters text ﬁle that will inform the
597 #selection of LiDAR tiles based on the NED DEM coordinate
533 gp.AddMessage(" . - Recording LiDAR cellsize info for parameters text ﬁle")
log_ﬁle.wr1te(t1me.ctrme() + " - Recording LiDAR cells1ze mfo for parameters text
ﬁle" + Is)

109

600
601
602
603
604
605
606
607
608
609
610
61 1
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
63 1
632
633
634
63 5
636
637
638
639
640

642

lidar_descr = gp.describe(lidar_fd)

lidar_height = Iidar_descr.meance|lheight

lidar_width = lidar_descrmeancellheight

lidar_cells_per_ned_cell = «ned_height / lidar_height) + (ned_width / lidar_width)) "' 0.5
buffer_size = int(lidar_cells_per_ned_ce|l "' 0.5)

#write cell size information to a parameters textﬁle for processing
parameters_ﬁle_path = lidar_wspace + "\\ﬁow_analysis_parameters.txt"
parameters_ﬁle = open(parameters_ﬁle_path, 'w')
parameters_ﬁle.write("ned_height;" + str(ned_height) + ls)
parameters_ﬁle.write("ned__width;" + str(ned_width) + ls)
parameters_ﬁle.write("lidar_height;" + str(lidar_height) + ls)
parameters_ﬁle.write("lidar_width;" + str(lidar_width) + ls)
parameters_ﬁle.write("lidar_cells_per_ned_cell;" + str(lidar_cells_per_ned__cell) + ls)
parameters_ﬁle.write("buﬁ‘er_size;" + str(buffer_size) + ls)
parameters_ﬁle.close()

del parameters_ﬁle

#create a binary raster of ﬂat areas, so these areas may be discarded
#in the calculation of ﬂow-direction error
gp.AddMessage(" - Creating a mask of ﬂat areas")
log_ﬁle.write(time.ctime() + " - Creating a mask of ﬂat areas" + ls)
gp.ResetEnvironmentsO
gp.Extent = 1idar_ﬁll
gp.SnapRaster = lidar_ﬁll
gp.Mask = lidar_ﬁll
gp.CellSize = 1idar_ﬁll
#calculate slope
slope_temp = lidar_wspace + "\\slope_temp"
if not gp.Exists(slope_temp):
gp.Slope_sa(lidar_ﬁll, slope_temp, "PERCENT_RISE")
#convert slope = 0% to binary grid
ﬂat__areas = lidar_wspace + "\\ﬁat_areas"
if not gp.Exists(ﬂat_areas):
gp.SingleOutputMapAlgebra_sa("con(" + slope_temp + " == 0, 1, 0)", ﬂat_areas)
gp.delete_management(slope_temp)

#clean up

if del_int_data == "True":
gp.AddMessage(" - Deleting intermediate data")
log_ﬁle.write(time.ctime() + " - Deleting intermediate data" + ls)

if os.path.exists(carved_dem):

110

gp.delete_management(carved_dem)

643
644 if os.path.exists(lidar_stream):
645 gp.delete_management(lidar_stream)
646 if os.path.exists(lidar_burn):
647 gp.delete_management(lidar_bum)
643 if os.path.exists(lidar_ﬁll):
649 gp.delete_management(lidar_ﬁll)
650
65 1
652 gp.ResetEnvironments()
6S 3
654 seconds = time.clock() - 1idar_prep_time_start
65 3 h,m,s = sec_to_h_min(seconds)
65 6
6 S 7 gp.AddMessage(" - Tile " + str(lidar_raster_list[i]) + " took " + str(m) + " minutes " +
6 str(s) + " seconds")
5 8 log_ﬁle.write(time.ctime() + " - Tile " + str(lidar_raster_list[i]) + " took " + str(m) + "
minutes " + str(s) + " seconds" + Is)
6 S 9

660 log_ﬁle.close()
66 1 del log_ﬁle

111

soooqauAuJN-i

wwwwwmwwwwNNNNNNNNNN---~.—1......
cmanAwn—oxoooqmuhum—oomqauaw~_c

APPENDIX B -— Python Code

stream_dlg_conversion.py

#
# stream_dlg_conversion.py
# Created September 2009
# Author: Glenn O'Neil

 

#

# Description: Takes a directory of DLG hydrography ﬁles and converts each
# ﬁles relevant features to ESRI shapeﬁles. Crashes due to an

# ESRI bug if it tries to process more than 70 ﬁles.

#

# Inputs: 1. d1g_wspace - directory containing DLG hypsography ﬁles.

# 2. scratch_wspace — directory for temporary ﬁles.

# 3. out_wspace - directory where the output shapeﬁles are written.

#

# Outputs: 1. <quad_name>__ohy.shp — shapeﬁle of quadrangle contours.
#
import os, arcgisscripting, sys

 

gp = arcgisscripting.create(9.3)

d1g_wspace = sys.argv[ 1]
scratch_wspace = sys.argv[2] + "\\"
out_wspace = sys.argv[3] + "\\"

#get the dlg ﬁles
d1g_wspace_list = os.1istdir(dlg_wspace)
dlg_list = []
#ﬁlter the list so it only contains .dlg ﬁles
for i in range(len(dlg_wspace_list)):
if '.d1g' in (dlg_wspace_list[i]):

d1g_list.append(dlg_wspace_list[ 1])

del d1g_wspace_list

d1g_list_length = len(dlg__list)
for i in range(dlg_list_length):
gp.AddMessage("Processing " + str(i + l) + " of " + str(dlg_list__length) + ":")

#convert the dlg to a coverage

gp.AddMessage(" - converting DLG to coverage")
dlg_cov = scratch_wspace + "d1g_cov"
gp.dlgarc(dlg_wspace + "\\" + dlg_list[i], dlg_cov)

112

4o

4 1
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57

58
59

60
61
62
63
64
65
66
67
68
69

#convert the coverage to a shapeﬁle

gp.AddMessage(" - converting coverage to shapeﬁle")
dlg_shp = dlg_cov + "_arc.shp"
gp.FeatureClassToShapeﬁle(dlg_cov + "\\arc", scratch_wspace)

#join the .acode ﬁle to the shapeﬁle table on the ID ﬁeld
gp.AddMessage(" - joining acode table to shapeﬁle")
acode = dlg_cov + ".acode"

gp.joinﬁeld (dlg_shp, "ID", acode, "DLG_COV-ID")

#extract the appropriate stream features
gp.AddMessage(" - extracting stream features")
#get the appropriate ﬁelds from dlg_shp
code_ﬁelds = gp.]istﬁelds(dlg_shp, 'MINOR*')

select_query = "'MAIORI " = 50 AND ('
for j in range(len(code_ﬁelds)):
if j != (1en(code_ﬁelds) - I):
select_query += "" + code_ﬁeldsU].Name + '" IN (400,401 ,412,413,414,605,606) OR

else: #treat the last record differently
select_query += "" + code_ﬁeldsﬁ].Name + "' IN (400,401,412,413,414,605,606))'

dlg_out = out_wspace + dlg_list[i][:8] + ‘.shp'
gp.select_analysis(dlg_shp, dlg_out, select_query)

gp.AddMessage(" - deleting intermediate data")
gp.delete_management(dlg_cov)
gp.delete_management(d1g_shp)

70 del dlg_list, select_query

113

53,551,418..-“

19

APPENDIX B — Python Code

stream_id_nhood.py

#
# strearn_id_nhood.py

# Created September 2009
# Author: Glenn O'Neil

 

 

#

# Description: Takes an elevation raster and identiﬁes potential stream cells
# based on their relationship to cell neighborhood of user-speciﬁed
# size. The basic premise is that stream cells will be opposed

# by higher elevation in one direction (stream banks) and similar

# elevations in the direction perpendicular to the bank cells.

#

# Inputs: 1. elev_raster - the elevation raster.

# 2. elev_diff - the difference in elevation between a cell and a

# neighbor that, when exceeded, could potentially represent

# a stream bank.

# 3. neigh_maxsize - the maximum neighborhood size (in cells) to
# search for the stream cell relationship deﬁned above.

# 4. stream_raster - the output binary stream raster.

# 5. workspace - the directory where temporary ﬁles will be written.
#

# Outputs: 1. stream_raster - the output binary stream raster.

#

from decimal import *

ﬁ‘om math import ﬂoor

import os, sys, time, arcgisscripting
import raster_analysis as raster

IS = os.1inesep

gp = arcgisscripting.create(9.3)

arcgis_home = os.environ.get("ARCGISHOME")
gP-AddT°°lbox(angis_home + "ArcToolbox\\Toolboxes\\Conversion Tools.tbx")

#function to convert seconds to hours and minutes (borrowed from:
#httpz/lmail.python.org/pipennail/python-list/2003-January. l 81366.html
def sec_to_h_min(s):

temp = ﬂoat()
temp = ﬂoat(s) / (60*60*24)
d = int(temp)

temp = (temp - d) * 24

114

h = int(temp)
temp = (temp - h) * 60
m = int(temp)
temp = (temp - m) "‘ 60
sec = int(temp)
return h,m,sec

#get the necessary ﬁles from the user

elev_raster = sys.argv[l]

#get the elevation difference that will identify streambed cells\
e1ev_diff= sys.argv[2]

elev_diff = ﬂoat(elev_difl)

#get the max neighborhood size\

neigh_maxsize = sys.argv[3]

#get the ﬁle name and path for the output stream raster
stream_raster = sys.argv[4]

#get the scratch workspace

workspace = sys.argv[S] + "\\"

#start the timer
start_time = time.clock()

#convert the raster to a text ﬁle

elev_ascii_path = workspace + "elev_ascii.txt"
gp.AddMessage("- Converting raster to ASCII")
gp.RasterToASCII_conversion(elev_raster, elev_ascii_path)

#record the header information for writing in the output stream ﬁle
header_list = raster.extract_header(elev_ascii_path)

#read the elevation text raster into a list
elev_list = raster.rastertext21ist(elev__ascii_path, 'float')
os.remove(elev_ascii_path)

#intiate a list to represent the resulting stream ascii, set initial values to 0.
stream_list = [U] "' 1en(e1ev_list)
for i in range(len(stream_list)):

stream_list[i] = [0] "' 1en(e1ev_list[0])

#Move through elev_list and look at the surrounding neighbors for
#values that exceed the user-speciﬁed elevation difference. If one is found,
#check the opposite cell. For example, if a southeast cell is above the

115

83 #threshold, check the corresponding northwest cell. If both are above the

84 #threshold, then the center cell may be the stream. To check, check the opposite
85 #set of cells (in the example above this would be the southwest and northwest
86 #cells. If they are below the threshold, then that direction may be the stream.

8 7 #Code those cells to l in the stream list.

#counters for output
streambed_cells = 0

#a function for analyzing neighborhoods
def nhood (thelist, row_index, col_index, nsize):

#thelist is the list to operate on
#row_index is the row index number of the center cell

#col_index is the column index number of the center cell

#nsize is the size of the neighborhood buffer (nsize of 1 = 3x3 neighborhood)

ubound_index = row_index - nsize #the upper index
bbound_index = row_index + nsize #the bottom index

Ibound_index = col_index - nsize #the left index
rbound_index = col_index + nsize #the right index

global streambed__cells

#check to see if we're near the boundary of the dataset, and adjust accordingly
if ubound_index < O: ubound_index = 0

if bbound_index > (len(thelist) - 1): bbound_index = (len(thelist) - 1)

if Ibound_index < 0: Ibound_index = 0

if rbound_index > (1en(thelist[0]) - l): rbound_index = (1en(thelist[0]) - I)

#ﬁnd the cell in the neighborhood with the maximum steepness
center_value = thelist[row_index][col_index]
max_value = center_value
max_location = [row_index,col_index]
for k in range(ubound_index, bbound_index):
for l in range(lbound_index, rbound_index):
if thelist[k][l] > max_value:

max_value = thelist[k][l]

max_row_index = k

max_col_index = I

if (max_value - center_value) > elev_diff: #There was a cell in the neighborhod
#higher than the center cell and above the elevation difference threshold.

116

126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165

166
167

#Now look for another steep cell on the opposite side of the center cell
#ﬁrst, ﬁnd the distance between the max location and center cell
row_distance = row_index - max_row_index

col_distance = col_index - max_col_index

abs_row_distance = abs(row_distance) if abs(row_distance) > 0 else 1
abs_col_distance = abs(col_distance) if abs(col_distance) > 0 else 1

opposite_row_index = row_index + row_distance
opposite_col_index = col_index + col_distance

#check for the boundaries
if opposite_row_index < 0: opposite_row_index = 0
elif opposite_row_index > bbound_index: opposite_row_index = bbound_index

if opposite_col_index < 0: opposite_col_index = 0
elif opposite_col_index > rbound_index: opposite_col_index = rbound_index

if (thelist[opposite_row_index][opposite_col_index] - center_value) > elev_diff:
#Then the opposite side is also above the threshold. The center of
#the neighborhood may be a the stream.

#Get the indexes of the recipricol diagonal (e.g. the peaks were northeast and southwest of
#the center cell. Now check northwest and southwest to see if it's less than the threshold.
recip_row_index1 = row_index - col_distance

recip_col_indexl = col_index + row_distance

recip_row_index2 = row_index + col_distance

recip_col_index2 = col_index - row_distance

if recip_row_index1 < 0: recip_row_index1 = 0
elif recip_row_index1 > bbound_index: recip_row_index1 = bbound_index

if recip_row_index2 < 0: recip_row_index2 = 0
elif recip_row_index2 > bbound_index: recip_row_index2 = bbound_index

if recip_col_indexl < 0: recip_col_indexl = 0
elif recip_col_indexl > rbound_index: recip_col_indexl = rbound_index

if recip_col_index2 < 0: recip_col_index2 = 0
elif recip_col_index2 > rbound_index: recip_col_index2 = rbound_index

if (thelist[recip_row_indexl][recip_col_indexl] - center_value) < elev_diff and

(theliStIrecip_row_index2][recip_col_index2] - center_value) < elev_diff:

117

168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192

193
194

195
196
197
198
I99
200
201

202
203

204
205
206
207
208
209

streambed_cells += 1
#it might be the stream.

#We now begin to deﬁne the stream basin.

#In this instance, we will deﬁne a rectangular region, with the max value location,
#it's opposite index, and the two reciprocal indidecs as the comers.

#The area of the rectangle will be coded l, deﬁning it as part of the streambed.
#To do this we must determine the slope of the rectangles edges. This will
#enable us to deﬁne the area within the edges. Though, we must determine the
#path from one comer to the other. Here is an example.

#If the slope was 15/7, it would make a big L shaped edge.

#We need to ﬁnd the relatively straight path from each comer to the next.

#We do this by reducing the smaller number of the

#slope to 1. For example, a slope of 15/7 would reduce to a horizontal (H)
#movment of 1 for every 2.5 vertical movements (V). Of course we can't move
#25 cells. So we will construct a list of vertical movements for the path ﬁ'om

# the max cell to the ﬁrst reciprocal cell (one edge of the rectangle). In our
#example above, half of the vertical movements in that list will be a value of 3,
#and half will be 2. Because, along the path, for every horizontal movement of 1,
#50% of the time the subsequent vertical movement will be 2, and 50% of the time 3.

#We had previously determined the absolute row and col distance from the max cell
#to the center cell. The slope of the edge from themax cell to the ﬁrst reciprocal
#cell (one edge of the rectangle) is as follows:

#Calculate new row and col distances. This time from the max index to the ﬁrst

#Watch out for the edges. They could throw off these calculations
#using absolute values can help avoid the problem. Just swap the values
orig_max_row_index = max_row_index

orig_max_col_index = max_col_index

orig_recip_row_indexl = recip_row_index1

orig_recip_col_index1 = recip_col_indexl

if (abs(row_index - max_row_index) < abs(row_index - opposite_row_index» or

(abs(col_index - max_col_index) < abs(col_index - opposite_col_index»:

#you need to switch the max and opposite indexes
max_row_index = opposite_row_index
max_col_index = opposite_col_index
opposite_row_index = orig_max_row_index
opposite_col_index = orig_max_col_index

118

210

211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239

240
24 1

242
243
244
245
246
247
248
249
250
25 1

if (abs(row_index - recip_row_index1) < abs(row_index - recip_row_index2)) or

(abs(col_index - recip_col_indexl) < abs(col_index - recip_col_index2)):

#you need to switch the reciprocal indexes
recip_row_index1 = recip_row_index2
recip_col_indexl = recip_col_index2
recip_col_index2 = orig_recip_col_indexl
recip_row_index2 = orig_recip_row_indexl

row_distance2 = max_row_index - recip_row_index1
col_distance2 = max_col_index - recip_col_indexl

abs_row_distance2 = ﬂoat(abs(row_distance2))
abs_col_distance2 = ﬂoat(abs(col_distance2))

#Initiate lists that will store the h_move and v_move lists.

#The ﬁrst list will contain the h_move and v_move values for the path
#from the max index to the ﬁrst reciprocal index.

#The second list will contain the h_move and v_move values for the path
#from the max index to the second reciprocal index.

#The stream basin will be deﬁned by looping throught the ﬁrst list
#within the second list.

#The list sizes will be the same as the minimum value between abs_row_distance2
#and abs_col_distance2.

list_size = min(abs_row_distance2, abs_col_distance2)

if list_size = 0: list_size = 1

abs_difference = abs(abs_row_distance2 - abs_col_distance2)

move_quotient = 0

#Detennine the h_move and v_move
if abs_row_distance2 < abs_col_distance2 and abs_row_distance2 != 0: #we're stepping

horizontally

primary_move = "horizontal"
#recod the maximum H distance for a V distance of 1
if col_distance2 > 0: #we're stepping left
if abs_row_distance2 > 0:
move_quotient = (abs_col_distance2/abs_row_distance2) * -l
min_h_move = int(ﬂoor(move_quotient)) + 1
if abs(abs_col_distance2 % abs_row_distance2) > 0:
max_h_move = min_h_move - 1
else:
max_h_move = min_h_move

119

 

252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272

273
274

275
276
277

278
279

280

281
282

283
284
285
286
287
288
289
290
291
292

else: #it's a straight line down to the destination index
min_h_move = col_distance2
max_h_move = min_h_move
elif col_distance2 < 0: #we're stepping right
if abs_row_distance2 > 0:
move_quotient = abs_col_distance2/abs_row_distance2
min_h_move = int(ﬂoor(move_quotient))
if abs(abs_col_distance2 % abs_row_distance2) > 0:
max_h_move = min_h_move + 1
else:
max_h_move = min_h_move
else: #it's a straight line down to the destination index
min_h_move = col_distance2 "‘ -l
max_h_move = min_h_move

if row_distance2 < 0: v_move = l
elif row_distance2 > 0: v_move = -1
else: v_move = 0

if min_h_move == max_h_move:
h_move_listl = [min_h_move] * int(list_size)

else:
h_move_listl = move_list(list_size, min_h_move, max_h_move, move_quotient,
row_index, col_index)

v_move_list1 = [v_move] * int(list_size)

e1ifabs_col_distance2 < abs_row_distance2 and abs_col_distance2 != 0: #we're
stepping vertically

primary__move = "vertical"

#recod the maximum V distance for an H distance of 1 .
if row_distance2 > O: #we're stepping up (visually, not in terms of row index
numbers)

if abs_col_distance2 > 0:
move_quotient = (abs_row_distance2/abs__col_distance2) * -1
min_v_move = int(ﬂoor(move_quotient)) + 1
if abs(abs_row_distance2 % abs_col_distance2) > 0:
max_v_move = min_v_move - 1
else:
max_v_move = min_v_move
else: #it's a straight line down to the destination index
min_v_move = row_distance2
max_v_move = min_v_move

120

 

 

 

 

293

294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310

311
312

313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334

elif row_distance2 < 0: #we're stepping down (visually, not in terms of row index
numbers)

if abs_col_distance2 > O:
move_quotient = abs_row_distance2/abs_col_distance2
min_v_move = int(ﬂoor(move_quotient))
if abs(abs_row_distance2 % abs_col_distance2) > O:
max_v_move = min_v_move + 1
else:
max_v_move = min_v_move
else:
min_v_move = row_distance2 * -l
max_v_move = min_v_move

if col_distance2 < 0: h_move = l
elif col_distance2 > 0: h_move = -1
else: h_move = 0

if min_v_move == max_v_move:
v_move_listl = [min_v_move] "‘ int(list_size)

else:
v_move_list1 = move_list(list_size, min_v_move, max_v_move, move_quotient,
row_index, col_index)

h_move_listl = [h_move] * int(list_size)

else: #the're equal, it's a square
primary_move = "either"
if row_distance2 < 0: v_move = l
e1ifrow_distance2 > 0: v_move = -1
else: v_move = 0
if col_distance2 < 0: h_move = 1
elif col_distance2 > 0: h_move = -1
else: h_move = 0

v_move_listl = [v_move] * int(list_size)
h_move_list1 = [h_move] * int(list_size)

#check to see if it's a perfect square

if v_move == :
h_move_1ist1 = [h_move] * int(abs_col_distance2)
v_move_list1 = [0] "‘ int(abs_col_distance2)

if h_move == :
h_move_listl = [0] * int(abs_row_distance2)

121

 

335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352

353
354

355
356
357
358
359
360
361
362

363
364

365

366
367

368
369
370
371
372
373
374
375

numbers)

cell,

v_move_listl = [v_move] * int(abs_row_distance2)

#ﬁgure out the distances from the max index to recipricol index2.
row_distance3 = max_row_index - recip_row_index2
col_distance3 = max_col_index - recip_col_index2
h_move_list2 = []

v_move_list2 = []

if col_distance3 < 0: #we're stepping to the right
for k in range(len(v_move_list1)):
h_move_list2 .append(abs(v_move_listl [k]))
elif col_distance3 > 0: #we‘re stepping to the left
for k in range(len(v_move_list1)): .
h_move_list2 .append(abs(v_move_list1 [k])* -1 )
else:
h_move_list2 = [0] * 1en(v_move_listl)

if row_distance3 < 0: #we're stepping down (visually, not in terms of row index

for k in range(len(h_move_listl )):
v_move_list2.append(abs(h_move_listl [k]))
elif row_distance3 > O: #we're stepping up
for k in range(len(h_move_listl)):
v_move_list2.append(abs(h_move_list1 [k])* -1)
else:
v_move_list2 = [0] "‘ len(h_move_list1)

#Call a function that takes the four comers of the stream basin (the max, its opposite

#and the two recipracols) and constructs a square using those points as the comers.

#Then set all of the values in that area to l for the stream_list. . .
river_bed(max_row_index, max_col_index, v_move_lrstl , h_move_lrstl , v_move_lrstZ,

h_move_list2, primary_move, ubound_index, bbound_index, Ibound_index, rbound_index)

def move_list(list_length, min_move, max_move, quotient, row, col):

#This function constructs the move lists from one cell to another.

the_list = [min_move] * int(list_length)
max_index_list = []

122

376

377
378
379
380

381
382

383
384

385
386
387
388
389
390
391
392
393
394
395
396
397
398

399
400

401
402
403
404
405
406
407
408
409
410
41 1
412
413
414
415
416

move_difference = quotient - min_move #gives us the percentage of the time that max_move
will be used.

max_index_places = abs(list_length * move_difference)
max_index_step = list_length / max_index_places

if int(abs(move_difference) * 10) >= 5: # then the max move will dominate the list or split it
evenly, and should be ﬁrst

max_index_holder = 0

else: #it should start later in the list
max_index_holder = int(round(max_index_step)) - 1 #minus 1 to put it in proper zero-based
index mode

max_index_step_counter = 0

while max_index_holder < len(the_list):
max_index_list.append(max_index_holder)
max_index_step_counter += max_index_stcp
max_index_holder = int(round(max_index_step_counter))

for i in range(len(the_list)):
if i in max_index_list:
’the_list[i] = max_move

return the_list

def river_bed(max_row_index, max_col_index, v_move_listl, h_move_listl , v_move_list2,
h_move_list2, primary_move, ubound_index, bbound_index, Ibound_index, rbound_index):

#this function will loop through the v_move_listl and h_move_listl, while looping though a
#reciprocal version of each list, v_move_list2 and h_move_list2.

current_row_indexl, current_row_index2 = max_row_index, max_row_index
current_col_indexl, current_col_index2 = max_col_index, max_col_index

stream_list[current_row_index1][current_col_indexl ] = 1

#need to make an initial run through v_move_listl and h_move_listl
if primary_move == "vertical":
for p in range(len(v_move_list1)):
for q in range(abs(v_move_list1[p])):
if v_move_listl [p] > 0 : current_row_indexl += 1
elif v_move_listl[p] < 0: current_row_indexl -= 1

if current_row_indexl > bbound_index: current_row_indexl = bbound_index

123

 

 

 

 

 

417
418
419
420
421
422
423
424
425
426
427
428
429
430
43 1
432
433
434
435
436
437
438
439
440
441
442
443

445
446
447
448
449
450
451
452
453
454
455
456
457
458
459

elif current_row_indexl < 0: current_rowwindexl = 0
stream_list[current_row_index I ][current_col_index l ] = 1

if h_move_listl [p] > 0 : current_col_indexl += 1
elif h_move_listl [p] < 0: current_col_indexl -= 1

if current_col_indexl > rbound_index: current_col_indexl = rbound_index
elif current_col_indexl < 0: current_col_indexl = 0

stream_list[current_row__index l ][current_col_index l] = 1

elif primary_move == "horizontal":
for p in range(len(v_move_list1)):

for q in range(abs(h_move_listl [p])):
if h_move_listl[p] > 0 : current_col_indexl += 1
elifh__move_list1 [p] < 0: current_col_indexl -= 1

if current_col_indexl > rbound_index: current_col_indexl = rbound_index
e1ifcurrent_col_index1 < 0: current_col_indexl = O

stream_list[current_row_index 1][current_col_index 1] = 1

if v_move_listl[p] > 0 : current_row_indexl += 1
elif v_move_listl [p] < 0: current_row_indexl -= 1

if current_row_indexl > bbound_index: current_row_indexl = bbound_index
elif current_row_indexl < 0: current_row_indexl = 0

stream_list[current_row_index l ][current_col_index l ] = 1

for p in range(len(v_move_list1)):

if h_move_listl [0] > 0 : current_col_indexl += 1

elif h_move_listl [0] < 0: current_col_indexl -= I

if current_col_indexl > rbound_index: current_col_indexl = rbound_index
elif current_col_indexl < 0: current_col_indexl = 0

if v_move_listl [0] > O : current_row_indexl += 1

elif v_move_listl [0] < 0: current_row_indexl -= 1

if current_row_indexl > bbound_index: current_row_indexl = bbound_index
elifcurrent_row_index1 < 0: current_row_indexl = 0

124

##A

 

 

 

 

460
461
462
463
464
465
466
467
468
469
470
47 I
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
50 l
502

stream_list[current_row_index l ][current_col_index l] = 1

if h_move_list2[0] > 0: current_col_index2 += 1

elif h_move_list2[0] < 0: current_col_index2 -= 1

if current_col_index2 > rbound_index: current_col_index2 = rbound_index
elif current_col_index2 < 0: current_col_index2 = 0

if v_move_list2[0] > 0: current_row_index2 += 1

elif v_move_li512[0] < 0: current_row_index2 -= 1

if current_row_index2 > bbound_index: current_row_index2 = bbound_index
elif current_row_index2 < 0: current_row_index2 = 0

current_row_indexl, current_col_indexl = current_row_index2, current_col_index2

#now loop through both sets of lists
for m in range(len(v_move_list2)):
#need to caputre loop of v_move_listZ
if primary_move == "vertical":
for n in range(abs(h_move_list2[m])):

stream_list[current_row_index2][current_col_indexZ] = l
for p in range(len(v_move_list1)):
for q in range(abs(v_move_list1[p])):
if v_move_listl[p] > O : current_row_indexl += 1

elif v_move_listl [p] < 0: current_row_indexl -= 1

if current_row_indexl > bbound_index: current_row_indexl = bbound_index
elif current_row_indexl < 0: current_row_indexl = 0

stream_list[current_row_index I ][current_col_index I] = I

if h_move_listl[p] > O : current_col_indexl += 1
elif h_move_listl[p] < 0: current_col_indexl -= 1

if current_col_indexl > rbound_index: current_col_indexl = rbound_index

elifcurrent_col_index1 < 0: current_col_indexl = 0
stream_list[current_row_index l ][current_col_index 1 ] = 1

if h_move_list2[0] > 0: current_col_index2 += 1
elif h_move_list2[0] < O: current_col_index2 -= 1

125

 

503
504
505
506
507
508
509
510
51 1
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545

if current_col_index2 > rbound_index: current_col_index2 = rbound_index
elif current_col_index2 < O: current_col_index2 = 0

stream_list[current_row_index2][current_col_index2] = 1

current_col_indexl = current_col_index2
current_row_indexl = current_row_index2

if v_move_list2[0] > O: current_row_index2 += 1
elif v_move_list2[0] < 0: current_row_index2 -=

if current_row_index2 > bbound_index: current_row_index2 = bbound_index
elif current_row_index2 < 0: current_row_index2 = O

current_row_indexl = current_row_index2

elif primary_move == "horizontal":
for n in range(abs(v_move_list2[m])):

stream_list[current_row_index2][current_col_index2] = 1
for p in range(len(v_move_list1)):
for q in range(abs(h_move_listl [p])):
if h_move_listl [p] > 0 : current_col_indexl += 1

elif h__move__listl[p] < 0: current_col_indexl -= 1

if current_col_indexl > rbound_index: current_col_indexl = rbound_index
elif current_col_indexl < 0: current_col_indexl = 0

stream_list[current_row_index1][current_col_index1 ] = 1

if v_move_listl[p] > 0 : current_row_indexl += 1
elif v_move_listl [p] < 0: current_row_indexl -= 1

if current_row_indexl > bbound_index: current_row_indexl = bbound_index
elif current_row_indexl < O: current_row_indexl = 0

stream_list[current_row_index l ][current_col_index l ] = 1

if v_move_list2[0] > 0: current_row_index2 += 1
elif v_move_list2[0] < 0: current_row_index2 -= 1

126

 

 

 

546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588

if current_row_index2 > bbound_index: current_row_index2 = bbound_index
elif current_row_index2 < 0: current_row_index2 = 0

stream_list[current_row_index2][current_col_index2] = 1

current_row_indexl = current_row_index2
current_col_indexl = current_col_index2

if h_move_list2[0] > O: current_col_index2 += 1
elif h_move_list2[0] < 0: current_col_index2 -=

if current_col_index2 > rbound_index: current_col_index2 = rbound_index
elif current_col_index2 < 0: current_col_index2 = 0

current_col_indexl = current_col_index2
else:
stream_list[current_row_index2][current_col_index2] = l

for p in range(len(v_move_list1)):
if h_move_listl [0] > O : current_col_indexl += 1
elifh_move_list1[0] < 0: current_col_indexl -= 1
if current_col_indexl > rbound_index: current_col_indexl = rbound_index
elif current_col_indexl < 0: current_col_indexl = 0

stream_list[current_row_index1 ][current_col_index 1] = 1

if v_move_listl[O] > 0 : current_row_indexl += 1

elifv_move_list1[0] < 0: current_row_indexl -= 1

if current_row_indexl > bbound_index: current_row_indexl = bbound_index
elif current_row_indexl < O: current_row_indexl = 0

stream_list[current_row_index l ][current_col_index 1 ] = 1
if h_move_list2[0] > 0: current_col_index2 += 1
elif h_move_list2[0] < 0: current_col_index2 -= 1

if current_col_index2 > rbound_index: current_col_index2 = rbound_index
elif current_col_index2 < 0: current_col_index2 = 0

127

 

 

589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
61 1
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
63 I

if v_move_list2[0] > 0: current_row_index2 += 1

elif v_move_list2[0] < 0: current_row_index2 -= 1

if current_row_index2 > bbound_index: current_row_index2 = bbound_index
elif current_row_index2 < O: current_row_index2 = 0

current_row_indexl, current_col_indexl = current_row_index2, current_col_index2

percent_done = 0
for i in range(len(elev_list)):
for j in range(len(elev_list[i])):
nhood(elev_list, i, j, int(neigh_maxsize))
percent = int((ﬂoat(i) / (len(stream_list) - 1) * 100))
if percent > percent_done:
gp.AddMessage(' - Analyzing: '+ str(percent) + '% completed')
percent_done = percent

#write the output stream ﬁle
stream_ﬁle_text_name = stream_raster + ".txt"
stream_ﬁle = open(stream_ﬁle_text_name, 'w')
for i in range(len(header_list)):
stream_ﬁle.write(header_list[i])

percent_done = 0
for i in range(len(elev_list)):
for j in range(len(elev_list[i])):
stream_ﬁle.write('%s ' % (stream_list[i][i]))
stream_ﬁle.write(ls)
percent = int((ﬂoat(i) / (len(stream_list) - 1) * 100))
if percent > percent_done:
percent_done = percent
stream_ﬁle.close()

gp.ASCIIToRaster_conversion(stream_ﬁle_text_name, stream_raster, "INTEGER")
os.remove(stream_ﬁ1e_text_name)

gp.AddMessage(" - There were " + str(streambed_cells) + " recip_small_slopes")
stop_time = time.clock() - start_time
seconds = stop_time - start_time

h,m,s = sec_to_h_min(seconds)
gp.AddMessage(' - Took ' + str(h) + ' hours ' + str(m) + ' minutes ' + str(s) + ' seconds')

128

 

\ONQONMAWN—

wwuwwwwwwwwNNNNNNNNN-du—Ir—tu—Ia—I—u—nu—r—a—
OMQO~MAwNHO©OOQO~thN~O©OO\JONUt-h-wN—‘O

APPENDIX B — Python Code

stream_id_transect.py

#
# stream_id_transect.py

# Created September 2009

# Author: Glenn O'Neil

#

# Description: Takes an elevation raster and identiﬁes potential stream cells

 

# based on their relationship within directional transects.

The basic premise is that stream cells in the middle of a
transect will be opposed by higher elevations towards the ends
of the transect.

Inputs: 1. elev_raster_path - the elevation raster.
2. stream_width - the maximum width of the stream, and therefore
the maximum width of the transect.
3. bank_depth - the theshold elevation difference deﬁning a
stream cell from a stream bank cell.
4. stream_output_raster - the output binary stream raster.

Outputs: 1. stream_output_raster - the output binary stream raster.

3t¥==tt¥t=tt1t¢t¢t=lt=tt3t=lt¢t

 

from decimal import "‘

from math import ﬂoor

import time, os, sys, arcgisscripting
import raster_analysis as raster

1s = os.linesep

gp = arcgisscripting.create(9.3)

#function to convert seconds to hours and minutes (borrowed from:
#httpzl/mail.python.org/pipermail/python-list/2003January. 1 81366.html
def sec_to_h_min(s):

temp = ﬂoatO

temp = ﬂoat(s) / (60*60*24)

d = int(temp)

temp = (temp - d) * 24

h = int(temp)

temp = (temp - h) * 60

m = int(temp)

129

40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61

62
63

65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81

temp = (temp - m) "' 60
sec = int(temp)
return h,m,sec

#Get the necessary ﬁles from the user
elev_raster_path = sys.argv[l]

#Get the maximum width of the stream
stream_width = sys.argv[2]
stream_width = ﬂoat(stream__width)

#Get the minimum depth of the stream bed to the stream bank
bank_depth = sys.argv[3]
bank_depth = float(bank_depth)

#Output raster
stream_output_raster = sys.argv[4]

workspace = stream_output_raster[:stream_output_raster.rﬁnd('\\')] + '\\'

#Get the ﬁle name and path for the output stream ﬁle
stream_ﬁ1e_name = workspace +'strt1ant' + str(int(stream_width)) + 'e' + str(int(bank_depth))
+ '.txt'

#Start the timer
start_time = time.clock()

#Convert the raster to a text ﬁle

elev_ascii_path = workspace + "elev_ascii.txt"
gp.AddMessage("- Converting raster to ASCII")
gp.RasterToASClI_conversion(elev_raster_path, elev_ascii_path)

#record the header information for writing in the output stream ﬁle
header_list = raster.extract_header(elev_ascii_path)

#read the elevation text raster into a list
elev_list = raster.rastertext2list(elev_ascii_path, 'ﬂoat')
os.remove(elev_ascii_path)

#Intiate a list to represent the resulting stream ascii, set initial .values to 0.
stream_list = [ﬂ] "‘ 1en(e1ev_list)

for i in range(len(stream_list)):

130

 

82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124

stream_list[i] = [0] * 1en(e1ev_list[0])

 

#Move through the list along transects to identify stream cells. Start by moving
#vertically from the top cells to the bottom cells, beginning with the upper-left

#cell. Move south, and note any locations where elevation decreases, and then increases
#within the user-speciﬁed maximum stream width. You may have to set a minimum
#stream width threshold if this approach picks up too many non-stream features.

#Counters for output
streambed_cells = 0

#A function to analyze a transect of cells. The number of cells is equal to the
#user speciﬁed maximum stream width
def transect(thelist):

#thelist is the list to operate on

global bank_similarity, bank_depth

#ldentify the minimum elevation along the transect
min_elev = min(thelist)
#Note its index in the list
for m in range(len(thelist)):
if thelist[m] == min_elev:
min_index = m
break

#Check the cells aﬁer the minimum_index to see if elevation increases signiﬁcantly
max_elev = max(thelist[min_index:len(thelist)])
if (thelist[O] - min_elev) > bank_depth and (max_elev - min_elev) > bank_depth:
transect_is_stream = True
else:
transect_is_stream = False

return transect_is_stream

#initiate a transect list to send to the transect function
transect_list = [0] * int(stream_width)

transect_called = 0

131

 

125
126
127
128
129
130
131
132
133
134

I35
136

137
138
139
140
141
142
143
144
145
146
147
148
I49
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166

#Move through the list vertically, from northernmost cells to sourthemmost,
#looking for decreases in elevation
percent_done = 0
for i in range(len(elev_list[0])):
for j in range(len(elev_list)):

elev = elev_list[j][i]

#Check the next elevation, watch out for the boundary

if (j + 1) < len(elev_list):

elev_next = elev_list[j + l][i]
else:

break
if elev > elev_next: #We might have the beginning of a stream, send the next K cells to

the transect function
for k in range(int(stream_width)):
#Make sure you're not at the dataset boundary
if (j + k) < len(elev_list) and i < len(elev_list[j]):
transect_list[k] = elev_list[j + k][i]

is_stream = transect(transect_list)
transect_called += 1

if is_stream:
for k in range(int(stream_width)):
if (j + k) < len(elev_list) and i < len(elev_list[j]):
stream_list[j + k][i] = 1

percent = int((ﬂoat(i) / (len(elev_list[OD - 1) * 100))

if percent > percent_done:
gp.AddMessage(’ - Analyzing vertical transects: '+ str(percent) + '% completed')
percent_done = percent

#Move through the list horizontally, from westernmost cells to easternmost,
#looking for decreases in elevation
percent_done = 0
for i in range(len(elev_list)):
for j in range(len(elev_list[i])):
elev = elev_list[i][j]
#Check the next elevation, watch out for the boundary
if (j + 1) < 1en(e1ev_list[0]):
e1ev__next= elev_list[i][j + 1]
else:
break

132

 

167

168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
I83
184
185

186
187

188
189
190
191
I92
193
194
195

196
197

198
199
200
201

202'

203
204
205
206
207

if elev > elev_next: #We might have the beginning of a stream, send the next K cells to
the transect function

for k in range(int(stream_width)):
#Make sure you're not at the dataset boundary
if (j + k) < len(elev_list[jD and i < len(elev_list):
transect_list[k] = elev_list[i][j + k]

is_stream = transect(transect_list)
transect_called += 1

if is_stream:
for k in range(int(stream_width)):
if (j + k) < len(elev_list[jD and i < len(elev_list):
stream_list[i][j + k] = 1

percent = int((ﬂoat(i) / (len(elev_list[0]) - 1) * 100))

if percent > percent_done:
gp.AddMessage(' - Analyzing horizontal transects: ' + str(percent) + '% completed')
percent_done = percent

#Move through the list diagnollay, from northwest cells to southeast, looking for decreases in
elevation

percent_done = 0
for i in range(len(elev_list)):
for j in range(len(elev_list[i])):
elev = elev_list[i][j]
#Check the next elevation, watch out for the boundary
if (i + 1) < len(elev_list[0]) and (i + l) < len(elev_list):
elev_next = elev_list[i + l][i + 1]
else:

break
if elev > elev_next: #We might have the beginning of a stream, send the next K cells to
the transect function

for k in range(int(stream_width)):
#Make sure you're not at the dataset boundary
if (j + k) < len(elev_list[j]) and (i + k) < len(elev_list):
transect_list[k] = elev_list[i + k][j + k]

is_stream = transect(transect_list)
transect_called += 1

if is_stream:
for k in range(int(stream_width)):

133

 

208
209
210
211
212

213
214
215
216
217
218
219
220
221
222
223
224
225
226

227
228

229
230
231
232
233
234
235
236
237
238
239
240
241
242

243
244

245
246
247
248

 

if (j + k) < len(elev_list[jD and (i + k) < len(elev_list):
stream_list[i + k][i + k] = 1

percent = int((ﬂoat(i) / (len(elev_list[0]) - l) * 100))

if percent > percent__ done:
gp. AddMessage('- Analyzing diagonal (N W SE) transects: '+ str(percent) + '%
completed' )

percent_done = percent

#Move through the list diagnollay, from southwest cells to northeast,
#looking for decreases in elevation
percent_done = 0
for i in range(len(elev_list) - 1, -l, -1):
for j in range(len(elev_list[i])):

elev = elev_list[i][j]

#Check the next elevation, watch out for the boundary

if (j + l) < len(elev_list[OD and (i - 1) >= 0:

elev_next = elev_list[i - l][j + 1]
else:

break
if elev > elev_next: #We might have the beginning of a stream, send the next K cells to
the transect function

for k in range(int(stream_width)):
#Make sure you're not at the dataset boundary
if (j + k) < len(elev_list[jD and (i - k) >= 0:
transect_list[k] = elev_list[i - k][j + k]

is_stream = transect(transect_list)
transect_called += 1

if is_stream:
for k in range(int(stream_width)):
if (j + k) < len(elev_list[jD and i >= 0:
stream_list[i - k][i + k] =

percent = int(ﬂoat((len(elev_list[0]) - i + l) / ﬂoat(len(elev_1ist[0]) - 1)) * 100)

if percent > percent_done:
gp.AddMessage(' - Analyzing diagonal (SW-NE) transects: '+ str(percent) + '%
completed')

percent_done = percent

#Write the output stream ﬁle
stream_ﬁle = open(stream_ﬁle_name, 'w')

134

249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270

for i in range(len(header_list)):
stream_ﬁle.write(header__list[i])

percent_done = 0
for i in range(len(elev_list)):
for j in range(len(elev_list[i])):
stream_ﬁle.write('%s ' % (stream_list[i][j]))
stream_ﬁle.write(ls)
stream_ﬁle.close()

gp.AddMessage(' - Converting stream output to raster.')
gp.ASCIIToRaster_conversion(stream_ﬁle_name, stream_output_raster, "INTEGER")

os.remove(stream_ﬁle_name)
os.remove(elev_ascii_path)

print "%i transects were analyzed " % (transect_called)
seconds = time.clock() - start_time

h,m,s = sec_to_h_min(seconds)
print 'Took ' + str(h) + ' hours ' + str(m) + ' minutes ' + str(s) + ' seconds'

135

 

 

©00\)O\MJ>WN~

APPENDIX B - Python Code
carve.py

#
# carve.py

# Created September 2009

# Author: Glenn O'Neil

#

# Description: Takes a DEM and binary raster of stream locations, identiﬁes

 

# sinks in the DEM, and carves paths through artiﬁcial barriers
# within the stream locations of the DEM. Based on work by
# Soille.

#

# Inputs: 1. elev_raster - full path to the DEM.
. stream_raster - binary raster of stream locations, must be

31:
N

the same row and column size as the DEM.

3. max_carve_length - the maximum distance (in cells)
that a carving can cover.

4. iterations - the number of times the script should iterate
through the DEM to carve through barriers. Each loop may
yield new sinks that need to be carved.

5. delete_intermediate - boolean value indicating whether to
to delete the temporary folders for each iteration.

6. output_raster - the ﬁnal carved DEM.

7. workspace - folder where intermediate data is stored.

Outputs: 1. output_raster - the ﬁnal carved DEM.

%Itittﬁittkit3t3t3t1tltt3t3t

 

from decimal import *

from math import ﬂoor

import os, sys, copy, arcgisscripting, shutil
ls = os.linesep

import raster_analysis as raster

gp = arcgisscripting.create(9.3)

# Load required toolboxes...
arcgis_home = os.environ.get("ARCGlSHOME")
gp.AddToolbox(arcgis_home + "ArcToolbox\\Toolboxes\\Data Management Tools.tbx")

136

 

4o
41
42
43
44
45
46
47
48
49
so
51
52
53
54
55
56
57
58
59
6o
61
62
63
64
65
66
67
68
69

70
71

72
73

74
75

76

77
78

79
80

gp.AddToolbox(arcgis_home + "ArcToolbox\\Toolboxes\\Conversion Tools.tbx")

gp.CheckOutExtension("Spatial")

#get the necessary ﬁles ﬁ'om the user
elev_raster = sys.argv[l]
stream_raster = sys.argv[2]
max_carve_length = sys.argv[3]
max_carve_length = int(max_carve_length)
#number of times to iterate

iterations = sys.argv[4]

iterations = int(iterations)
delete_intermediate = sys.argv[S]
output_raster = sys.argv[6]
workspace = sys.argv[7] + "\\"

#Move through elev_list and look for cells that are sinks AND stream cells.
#If that condition is met, begin a radial search looking for an elevation lower
#than the selected cell. If/once that lower cell is found, carve a path to the
#cell by manually lowering elevation values along the path incrementally,
#yielding a slope through the potential barrier that created the sink.

def move_list(list_length, min_move, max_move, quotient):
#This function constructs the move lists from one cell to another.

the_list = [min_move] * int(list_length)

max_index_list = []
move_difference = quotient - min_move #gives us the percentage of the time that max_move
will be used.

max_index_places = abs(list_length * move_difference)
max_index_step = list_length / max_index_places

if int(abs(move_difference) * 10) >= 5: # then the max move will dominate the list or split it
evenly, and should be ﬁrst

max_index_holder = 0

else: #it should start later in the list
max_index_holder = int(round(max_index_step)) - 1 #minus 1 to put it in proper zero-
based index mode

max_index_step_counter = O

137

 

81 while max_index_holder < len(the_list):

82 max_index_list.append(max_index__holder)

83 max_index_step_counter += max_indexnstep

84 max_index_holder = int(round(max_index_step_counter))

85

86 for i in range(len(the_list)):

87 if i in max_index_list:

88 the_list[i] = max_move

89

90 return the_list

91

92

93 def carve(row_index, col_index):

94 global elev_list, elev_out_list, max_carve_length, rbound_index, bbound_index,
Ibound_index, ubound_index

95

96 sink_elev = elev_list[row_index][col_index]

97 carve_to_elev = sink_elev

98

99 #initiate the initial neighborhood size to search. You don't start with l
100 #because the adjacent neighbors were searched in the sink identiﬁcation.
101 nhood_size = 2

102 while not carve_to_elev < sink_elev and nhood_size <= max_carve_length:
103

104 #get the cell's neighbors

105 nhood_list = raster.nhood(elev_list, row_index, col_index, nhood_size)
106

107 #get the minimum elevation value from the returned nhood_list

108 #only check the outer edges since the others had already been checked
109 row_length = len(nhood__list)

110 col_length = len(nhood_list[0])

l 1 1 #check the ﬁrst row

112 min_elev_top = min(nhood_list[0])

113 min_elev_top_index = nhood_list[0].index(min_elev_top)

114

1 15 #check the bottom row

116 min_elev_bottom = min(nhood_list[row_length - 1])

117 min_elevgbottom_index = nhood_list[row__length - l].index(min_elev_bottom)
118

1 19 #check the left column

120 min_elev_leﬁ = nhood_list[0][0]

121 min_elev_left_index = 0

122 for i in range(row_length):

138

 

123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165

 

if nhood_list[i][O] < min_elev_leﬁ: k
min_elev_leﬂ = nhood_list[i][O]
min_elev_leﬁ_index = i

#check the right column
min_elev_right = nhood_list[0][col_length - l]
min_elev_right_index = 0
for i in range(row_length):
if nhood_list[i][col_length - 1] < min_elev_right:
min_elev_right = nhood_list[i][col_length - 1]
min_elev_right_index = 1

#determine which elevation and index has the lowest elevation
min_elev = min(min_elev_top, min_elev_bottom, min_elev_leﬁ, min_elev_right)

#check to see if we've found an elevation less than the sink's
if min_elev < sink_elev:
#note the indexes of the lower sink
if min_elev == min_elev_top:
lower_sink_row = row_index - (int(row_length / 2)) #easy since its the top row
lower_sink_col = (col_index - (int(col_length / 2))) + min_elev_top_index

e1ifmin_elev == min_elev_bottom:
lower_sink_row = row_index + (int(row_length / 2)) #easy since its the bottom row
lower_sink_col = (col_index - (int(col_length / 2))) + min_elev_bottom_index

elif min_elev == min_elev_left:
lower_sink_row = (row_index - (int(row_length / 2))) + min_elev_left_index
lower_sink_col = col_index - (int(row_length / 2)) #easy since its the left column

elif min_elev == min_elev_right:
lower_sink_row = (row_index - (int(row_length / 2))) + min_elev_right_index
lower_sink_col = col_index + (int(row_length / 2)) #easy since its the left column

nhood_size += 1

carve_to_elev = min_elev

#check to see if the search was successful

if nhood_size > max_carve_length:
#couldn't ﬁnd a lower elevation within the max neighborhood size
no_lower_sink__found.append(‘row: ' + str(row_index) +' col: '+ str(col_index))
return

139

 

166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208

#carve from row_index and col_index to lower_sink_row and lower_sink_col
row_dist = row_index - lower_sink_row
col_dist = col_index - lower_sink_col

abs_row_dist = ﬂoat(abs(row_dist))
abs_col_dist = ﬂoat(abs(col_dist))

#Initiate lists that will store the h_move and v_move lists.

#The ﬁrst list will contain the h_move and v_move values for the path

#from the row_index and col_index to the lower_sink_row and lower_sink_col.
#The list sizes will be the same as the minimum value between abs_row_dist
#and abs_col_dist.

list_size = min(abs_row_dist, abs_col_dist)

if list_size == 0: list_size = l

move_quotient = 0
#Determine the h_move and v_move
if abs_row_dist < abs_col_dist and abs_row_dist != 0: #we're stepping horizontally
primary_move = "horizontal"
#recod the maximum H distance for a V distance of 1
if col_dist > 0: #we're stepping left
if abs_row_dist > 0:
move_quotient = (abs_col_dist/abs_row_dist) * -l
min_h_move = int(ﬂoor(move_quotient)) + 1
if abs(abs_col_dist % abs_row_dist) > 0:
max_h_move = min_h_move - 1
else:
max_h_move = min_h_move
else: #it's a straight line down to the destination index
min_h_move = col_dist
max_h_move = min_h_move
elif col_dist < 0: #we're stepping right
if abs_row_dist > 0:
move_quotient = abs_col_dist/abs_row_dist
min_h_move = int(ﬂoor(move_quotient))
if abs(abs_col_dist % abs_row_dist) > O:
max_h_move = min_h_move + 1
else:
max_h_move = min_h_move
else: #it's a straight line down to the destination index
min_h_move = col_dist * -l
max_h_move = min_h_move

140

 

209
210
21 1
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251

if row_dist < 0: v_move = 1
elif row_dist > 0: v_move = -1
else: v_move = 0

if min_h_move == max_h_move:
h_move_list = [min_h_move] "' int(list_size)
else:

h_move_list = move_list(list_size, min_h_move, max_h_move, move_quotient)

v_move_list = [v_move] * int(list_size)

elif abs_col_dist < abs_row_dist and abs_col_dist != 0: #we're stepping vertically
primary_move = "vertical"
#recod the maximum V distance for an H distance of 1
if row_dist > 0: #we're stepping up (visually, not in terms of row index numbers)
if abs_col_dist > 0:
move_quotient = (abs_row_dist/abs_col_dist) * -1
min_v_move = int(ﬁoor(move_quotient)) + 1
if abs(abs_row_dist % abs_col_dist) > 0:
max_v_move = min_v_move - 1
else:
max_v_move = min_v_move
else: #it's a straight line down to the destination index
min_v_move = row_dist
max_v_move = min_v_move
elif row_dist < 0: #we're stepping down (visually, not in terms of row index numbers)
if abs_col_dist > 0:
move_quotient = abs_row_dist/abs_col_dist
min_v_move = int(ﬂoor(move_quotient))
if abs(abs_row_dist % abs_col_dist) > 0:
max_v_move = min_v_move + 1
else:
max_v_move = min_v_move
else:
min_v_move = row_dist "‘ -l
max_v_move = min_v_move

if col_dist < 0: h_move = l
elif col_dist > 0: h_move = -1
else: h_move = 0

if min_v_move == max_v_move:

141

252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278

279
280

281
282
283
284
285
286
287
288
289
290
291
292
293

v_move_list = [min_v_move] * int(list_size)
else:
v_move_list = move_list(list_size, min_v_move, max_v_move, move_quotient)

h_move_list = [h_move] * int(list_size)

else: #the're equal, it's a square
primary_move = "either"
if row_dist < 0: v_move = 1
elif row_dist > 0: v_move = -1
else: v_move = 0
if col_dist < 0: h_move = 1
elif col_dist > 0: h_move = -1
else: h_move = O

v_move_list = [v_move] * int(list_size)
h_move_list = [h_move] * int(list_size)

#check to see if it's a perfect square

if v_move == 0:
h_move_list = [h_move] * int(abs_col_dist)
v_move_list = [O] * int(abs_col_dist)

if h_move == :
h_move_list = [0] * int(abs_row_dist)
v_move_list = [v_move] * int(abs_row_dist)

#we now need to determine the length of the path, and how much the incremental elevation
decline will be

if primary_move == 'either':
path_length = len(h_move_list)
else:
path_length = abs(sum(h_move_1ist)) + abs(sum(v_move__list))
elev_diff = sink_elev - min_elev
elev_decline_increment = elev_diff / ﬂoat(path_length)

current_elev = sink_elev
#variables to keep track of the indexes the path moves through

current_row = row_index
current_col = col_index

142

294
295
296
297
298
299
300
301
302
303
304
305
306
307
308

309
310

311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332

333
334

335

if primary_move == 'horizontal':
#iterate through the h_move_list
for m in range(len(h_move_list)):
h_move = h_move_list[m]
for n in range(abs(h_move)):
if h_move_list[m] > 0: current_col += 1
elif h_move_list[m] < 0: current_col -= 1

current_elev -= elev_decline_increment
elev_out_list[current_row][current_col] = current_elev

if v_move_list[m] > 0: current_row += 1
elif v_move_list[m] < 0: current_row -= 1

#watch for the dataset boundaries
if current_row < 0 or current_row > bbound_index or current_col > rbound_index or
current_col < Ibound_index:

carve_to_elev = sink_elev
break

current_elev -= elev_decline_increment
elev_out_list[current_row][current_col] = current_elev

elif primary_move == 'vertical':
#iterate through the v_move_list
for m in range(len(v_move_list)):
v_move = v_move_list[m]
for n in range(abs(v_move)):
if v_move_list[m] > 0: current_row += 1
elif v_move_list[m] < 0: current_row -= l

current_elev -= elev_decline_increment
elev_out_list[current_row][current_col] = current_elev

if h_move_list[m] > 0: current_col += 1
elif h_move_list[m] < O: current_col -= 1

#watch for the dataset boundaries
if current_row < 0 or current_row > bbound_index or current_col > rbound_index or
current_col < Ibound_index:

carve_to_elev = sink_elev

143

336
337
338
339
340
341
342
343
344
345
346
347
348
349

350
351

352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377

break

current_elev -= elev_decline_increment
elev_out_list[current_row][current_col] = current_elev

else: #its a straight diagonal path
for p in range(len(v_move_list)):
if h_move_list[O] > 0 : current_col += 1
elif h_move_list[O] < 0: current_col -= 1

if v_move_list[O] > 0 : current_row += 1
elif v_move_list[O] < 0: current_row -= 1

#watch for the dataset boundaries
if current_row < 0 or current_row > bbound_index or current_col > rbound_index or

current_col < Ibound_index:

carve_to_elev = sink_elev
break

current_elev -= elev_decline_increment
elev_out_list[current_row][current_col] = current_elev

for h in range(iterations):

gp.AddMessage("Beginning Iteration " + str(h + l) + ":")
#create a workspace for the current iteration
sub_workspace = workspace + 'iteration' + str(h + 1)
os.mkdir(sub__workspace)

#Calculating ﬂow direction
gp.AddMessage(" - Calculating flow direction of elevation raster")
ﬂowdir_raster = sub_workspace + "\\fd_temp"
if h > 0:

#get the previous iteration's output

elev_raster = workspace + 'iteration' + str(h) + "\\elev_out" + str(h)
gp.FlowDirection_sa(elev_raster, ﬂowdir_raster)

#Calculate sinks

gp.AddMessage(" - Calculating sinks")
sinks_raster = sub_workspace + "\\sinks" + str(h + l)
gp.Sink_sa(ﬂowdir_raster, sinks_raster)

144

 

378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415

416
417

418
419

#convert the tasters to text ﬁles

if h = :
gp.AddMessage(" - Converting elevation raster to a text ﬁle")
elev_ascii_path = sub_workspace + "\\elev_temp.txt"
gp.RasterToASCll_conversion(elev_raster, elev_ascii_path)

else: #get the previous iteration's elevation text ﬁle
elev_ascii_path = workspace + 'iteration' + str(h) + '\\elev_out' + str(h) + '.txt'

stream_ascii_path = workspace + "stream_temp.txt"

if h = 0:
gp.AddMessage(" - Converting stream raster to a text ﬁle")
gp.RasterToASCll_conversion(stream_raster, stream_ascii_path)

gp.AddMessage(" - Converting sinks raster to a text ﬁle")
sink_ascii_path = sub_workspace + "\\sinks" + str(h + l) + ".txt"
gp.RasterToASCll_conversion(sinks__raster, sink_ascii_path)

#convert the raster text ﬁles to arrays
gp.AddMessage(" - Reading elevation text ﬁle into a list")
elev_list = raster.rastertext21ist(elev_ascii_path, 'ﬁoat')
if h == :
gp.AddMessage(" - Reading stream text ﬁle into an array")
stream_ary = raster.raster2array(stream_raster)
gp.AddMessage(" - Reading sinks text ﬁle into an array")
sink_ary = raster.raster2array(sinks_raster)

#record the header information for writing in the output stream ﬁle
raster_header = raster.extract_header(elev_ascii_path)

#note the indexes of the dataset boundaries
rbound_index = len(elev_list[OD - l
Ibound_index = 0

bbound_index = len(elev_list) - 1
ubound_index = 0

#intiate a list to represent the resulting elevation ascii, set initial values to elev_list's.
elev_out_list = copy.deepcopy(elev_list)

#initiate a list to keep track of carving attempts that could not ﬁnd a lower cell (i.e., the

maximum

#carve length was exceeded before a lower elevation was found
no_lower_sink_found = []

145

 

(T‘r. .
i
I

420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462

#counters for output

sinks_carved = 0

percent_done = 0

stream_ary_length = len(stream__ary)

gp.AddMessage(' - Analyzing')
for i in range(len(elev_list)):
for j in range(len(elev_list[i])):
if stream_ary[i,j] = 1 and sink_ary[i,j] > 0: #we may have a barrier to carve through
carve(i, j)
sinks_carved += 1
percent = int((ﬂoat(i) / (stream__ary__length - l) * 100))
if percent > percent_done:
gp.AddMessage(' - Analyzing: ' + str(percent) + '% completed‘)
percent_done = percent

del elev_list, sink_ary

#write the output stream ﬁle
elev_out_ﬁle_name = sub_workspace + "\\elev_out" + str(h + l) + ".txt"
elev_out_ﬁle = open(elev_out_ﬁle_name, 'w')
for i in range(len(raster_header)):
elev_out_ﬁle.write(raster_header[i])

percent_done = 0
gp.AddMessage(' - Writing outputs')
for i in range(len(elev_out_list)):
for j in range(len(elev_out_list[i])):
elev_out_ﬁle.write('%s ' % (elev_out_list[i][j]))
elev_out_ﬁle.write(ls)
percent = int((ﬂoat(i) / (len(elev_out_list) - l) * 100))
if percent > percent_done:
gp.AddMessage(' - Writing: '+ str(percent) + '% completed')
percent_done = percent
e1ev__out_ﬁle.close()
del elev_out_ﬁle

gp.AddMessage(' - There were ' + str(sinks_carved) + ' sinks carved.')
gp.AddMessage(' - There were ' + str(len(no_lower_sink_found)) + ' sinks left unresolved')
del no_lower_sink_found

146

 

tic. 1- " .

463 #convert the elev_out_ﬁle to a raster

464 gp.AddMessage(' - Converting the output elevation ﬁle to a raster')

465 elev_out_raster = sub_workspace + "\\elev_out" + str(h + 1)

466 gp.ASCIIToRaster_conversion(elev_out_ﬁle_name, elev_out_raster, "FLOAT")
467

468

469

470 #clean up

471 del elev_out_list

472 gp.AddMessage(' - Cleaning up')

473 gp.Delete_management(ﬂowdir_raster)

474 os.remove(sink_ascii_path)

475

476 if h == 0: os.remove(elev_ascii_path)

477

478

479 if delete_intermediate == 'True':

480 #clean up some more

481 gp.AddMessage('Deleting intermediate data')
482

483 for i in range(iterations):

484 sub_workspace = workspace + 'iteration' + str(i + l)
485 if i < (iterations - I):

486 shutil.rmtree(sub_workspace)

487 else:

488 gp.CopyRaster_management(sub_workspace + "\\elev_out" + str(i + 1), output_raster)
489 shutil.nntree(sub_workspace)

490 os.remove(stream_ascii_path)

491 #del stream_list

492 del stream_ary

493 else:

494 os.remove(stream_ascii_path)

147

\OOOQONMAUJN—

10
11
12
13
14
15
I6
17
l8
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39

APPENDIX B — Python Code

ﬂow_error.py

#
# ﬁow_error.py

# Created September 2009
# Author: Glenn O'Neil

 

 

#

# Description: Uses the lO-meter ﬂow-direction raster (ned_fd) and the LiDAR
# tile ﬂow-direction rasters (lidar_fd) in the lidar directory
# to calculate ﬂow-direction error for each lO-meter cell.

#

# Inputs: 1. quad_id - quadrangle ID (e.g. CW210).

# 2. workspace - directory for quadrangle containing all of the
# datasets (ned_fd and lidar_fd).

#

# Outputs: 1. ﬂow_error.tif - TIFF raster of ﬂow-direction raster at the
# extent and cell-size of the ned_fd (IO-meter) with a value
# of 0-4 calculated for each cell.

#

from osgeo import gdal, gdalconst
import time, os, sys, numpy, copy
import raster_analysis as raster

ls = os.1inesep

#get the quad name
quad_id = sys.argv[l]
workspace = sys.argv[2]

#function to convert seconds to hours and minutes (borrowed from:
#http://mail.python.org/pipermail/python-list/2003-January. l81366.html
def sec_to_h_min(s):

temp = ﬂoatO

temp = ﬁoat(s) / (60*60*24)

d = int(temp)

temp = (temp - d) "‘ 24

h = int(temp)

temp = (temp - h) * 60

m = int(temp)

temp = (temp - m) * 60

sec = int(temp)

return h,m,sec

148

 

40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63

65
66

67
68

69
70
7 l
72
73
74
75
76
77
78
79
80
8 1

#start the timer
script_start__time = time.clock()

#create the log textﬁle

ﬂow_results_ﬁle_path = workspace + "\\log.txt"
ﬂow_results_ﬁle = open(ﬂow_results_ﬁle_path, 'w')
ﬂow_results_ﬁle.write("QUAD_lD;" + quad_id + ls)
ﬂow_results_ﬁle.close()

del flow_results_ﬁle

#read the lidar_fd into a list, by way of a less efﬁcient array
ned_fd = workspace + "\\ned_fd"

ned_fd_ary = raster.raster2array(ned_fd)

ned_fd_list = ned_fd_ary.tolist()

del ned_fd_ary

#read the ned_ﬁd raster into an array
ned_ﬁd = workspace + "\\ned_ﬁd"
ned_ﬁd_ary = raster.raster2array(ned_ﬁd)
ned_ﬁd_list = ned_ﬁd_ary.tolist()

del ned_ﬁd_ary

#initiate an array to represent the resulting flow direction evaluation, with initial values of 0.

#make it the same size as the fd_list
fd_eval_ary = numpy.zeros([len(ned_fd_list),len(ned_fd_1ist[0])])

#initiate an array to keep track of cells that have been processed (to facilitate assignment of
NoData values later)

cells _processed_ary = numpy.zeros([len(ned_fd_list),len(ned_fd_list[0])])

#iterate through the folders of the workspace, and conduct the ﬂow analysis
lidar_wspace_contents = os.1istdir(workspace + "\\lidar")
raster_list = []
for i in lidar_wspace_contents:
if os.path.isdir(workspace + "\\lidar\\" + i) and i != 'info':
raster_list.append(i)

raster_list_length = Ien(raster_list)
for z in range(len(raster_list)):
raster_folder_start_time = time.clock()

149

maﬁa

 

 

 

 

 

82 check_timer_start = time.clock()
83 4‘
84 raster_folder = workspace + "\\lidar\\" + raster_list[z]

85 print ' - Processing raster’ + raster_folder + '(' + str(z + l) + ' of ' + str(raster_list_length) + ')'
86 #get the LiDAR dataset

87 lidar_fd = raster_folder + "\\lidar_fd"

88

89

90 #get the cell size parameters from the ﬂow_analysis_parameters.txt ﬁle and store them in a
dictionary

91 parameters_list = []

92 parameters_dict = {}
93 parameters_ﬁle_path = raster_folder + "\\flow_analysis_parameters.txt"

94 parameters_ﬁle = open(parameters_ﬁle_path, 'r')

95 for eachLine in parameters_ﬁle:

96 parameters_list.append(eachLine.split(";"))
97 parameters_ﬁle.close()

98

99 #get rid of the newline character at the end of each row, and add it to the dictionary

100 for i in range(len(parameters_list)):

101 parameters_list[i][l] = parameters_list[i][l][:len(para1neters_list[i][l]) - 2]
102 parameters_dict[parameters_list[i][0]] = parameters_list[i][1 ]
103

104 #indices for performing neighborhood analyses

105 buffer_size = int(parameters_dict['buﬁ'er_size'])

106 start_center_row = buffer_size

107 start_center_col = buffer_size

108 end_center_row = 0

109 end_center_col = 0

1 10

l 1 1 #read the lidar_fd into a list by way of an array

1 12 lidar_fd_ary = raster.raster2array(lidar_fd)

113 lidar_fd_list = lidar_fd_ary.tolist()

l 14 del lidar_fd_ary

115

116 #read the LiDAR dataset with the NED FID values into a list by way of an array
117 lidar_ned_ﬁd = raster_folder + "\\lidar_ned_ﬁd"

1 18 lidar_ned_ﬁd_ary = raster.raster2array(lidar_ned_ﬁd)

119 lidar_ned_ﬁd_list = lidar_ned_ﬁd_ary.tolist()

120 del lidar_ned_ﬁd__ary

121

122 #read the binary mask for ﬂat areas (derived from LiDAR-scale dataset 'carved_dem')
123 ﬂat_areas = raster_folder + "\\ﬁat_areas"

150

124
125
126

127
128

129
130

131

I32
133
134
135
136
137
138
139
140
141

142
143

144
145
146
147
I48
I49
150
151
152
153
154
155
156
157
158

159
160

I61
162
163

 

ﬂat__areas_ary = raster.raster2array(ﬂat_areas) \
ﬁat_areas_list = ﬂat_areas_ary.tolist() -
del ﬂat_areas_ary

#Index counters for keeping track of lidar_ned_id indices as the code loops through the

ned_ﬁd

#e.g. We will extract the lidar cells that correspond to the selected NED ID. To do that
#we must loop through the array to ﬁnd the appropriate cells, and we don't want to start the

search

#at the beginning each time. Since we're skipping the edge cells of the NED, we'll set the

starting

#values to two less than the number of lidar cells per NED cell.
1idar_cells_per_ned_cell = int(ﬂoat(parameters_dict["lidar_cells_per_ned_cell"]))
lidar_ned_ﬁd_row_start = int(lidar_cells_per_ned_cell) - 2
lidar_ned_ﬁd__row_end = lidar_ned_ﬁd_row_start + 1

lidar_ned_ﬁd_col_start = int(lidar_cells_per_ned_cell) - 2

lidar_ned_ﬁd_col_end = lidar_ned_ﬁd_row_end

script_start__time = time.clock()

#identify the NED F ID of the ﬁrst and last LiDAR cells

lidar_ned_ﬁd_start = lidar_ned_ﬁd_list[0][0]
lidar_ned_ﬁd_end = lidar_ned_ﬁd_list[len(lidar_ned_ﬁd_list) - 1][len(lidar_ned_ﬁd_list[0]) -

#In a few tiles, NoData values crept into the dataset at the comers, so we must check for those
#and grab the ﬁrst positive ned_ﬁd value, by moving diagonally from the comers
row = 0
col = 0
while 1idar_ned_ﬁd_start < 0:
lidar_ned_ﬁd_sta1t = lidar_ned_ﬁd_list[row] [col]
row += 1
col += 1

row = len(lidar_ned_ﬁd_list) - 1

col = len(lidar_ned_ﬁd_list[row]) - 1

while lidar_ned_ﬁd_end < 0:
lidar_ned_ﬁd_end = lidar_ned_ﬁd_list[row][col]
row -= 1
col -= 1

#determine the index in ned_ﬁd_list where the lidar_ned_ﬁd_start and lidar_ned_ﬁd_end

values are located

ned_ﬁd_list_start_row = 0
ned_ﬁd_list_start_col = 0
ned_ﬁd_list_end_row = O

151

I64
165
166
167
168
169
I70
171
172
173
174
175
176
I77
178
179
180
181
182
183
184
185
186
187

188
189

190
191
192

193
194

195
196

197
198

199
200

201
202

203
204

ned_ﬁd_list_end_col = 0
for i in range(len(ned_ﬁd_list)):
try:
ned_ﬁd_list_start_col = ned_ﬁd_list[i].index(lidar_ned_ﬁd_start)
ned_ﬁd_list_start_row = i
break
except ValueError:
pass

for i in range(ned_ﬁd_list_start_row, len(ned__ﬁd_list)):
try:
ned_ﬁd_list_end_col = ned_ﬁd_list[i].index(lidar_ned_ﬁd_end)
ned_ﬁd_list_end_row = i
break
except ValueError:
pass

#move through ned_ﬁd_list performing a neighborhood assessment of ﬂow-direction
percent_done = 0

#variable for determining if a ﬂat_cell was encountered, breaking the loop
ﬂat_area__four1d = False

for i in range(ned_ﬁd__list_start_row + 2, ned_ﬁd_list_end_row - 2): #+2 and -2 to avoid the
top and bottom edges

#reset the Iidar_ned_ﬁd_col_start =
lidar_ned_ﬁd_col_start = int(lidar_cells_per_ned__cel1) - 2

for j in range(ned_ﬁd_list_start_col + 2, ned_ﬁd_list_end_col - 2): #+2 and -2 to avoid the
left and right edges

ned_ﬁd_value = ned_ﬁd_list[i][j]

#extract the LiDAR cells of the selected NED cell and the neighborhood around the
selected NED cell

#ﬁrst, identify where the LiDAR cells are
end_search = False

#skip no data cells, e.g. ned_ﬁd_value < 0 for a 32-bit unsigned integer, and cells that
have already been processed

if ned_ﬁd_value < 0 or cells _processed_ary[i,j] == :
break

152

205 "

 

206 for k in range(lidar_ned_ﬁd_row_start, len(lidar_ned_ﬁd_list)): \

207 for m in range(lidar_ned_ﬁd_col_start, len(lidar__ned_ﬁd_list[k])):

208 if lidar_ned_ﬁd_list[k][m] = ned_ﬁd_value: #we've found the start of the NED
area within the LiDAR cells

209 lidar_ned_ﬁd_row_start = k

210 lidar_ned_ﬁd_col_start = m

21 1 #ﬁnd the col_end

212 for n in range(lidar_ned_ﬁd_col_start, len(lidar_ned_ﬁd_list[k])):

213 if lidar_ned_ﬁd__list[k][n] != ned_ﬁd_value: #we've found the end of the NED
area (the previous column)

214 lidar_ned_ﬁd_col_end = n - l

215 break

216 #fmd the row_end

217 for n in range(lidar_ned_ﬁd_row_start, len(lidar_ned_ﬁd_list)):

218 if lidar_ned_ﬁd_list[n][lidar_ned_ﬁd_col_start] 1= ned_ﬁd_value: #we've
found the end of the NED area (the previous row)

219 lidar_ned_ﬁd_row_end = n - l

220 end_search = True

221 break

222 break

223 if end_search:

224 break

225

226 #determine the boundaries for extracting the LiDAR ﬂow-direction values for the
deﬁned area and a buffer-area equal to half a NED cell.

227 lidar_nhood_row_start = Iidar_ned_ﬁd_row_start - buffer_size

228 lidar_nhood_row_end = lidar_ned_ﬁd_row_end + buffer_size

229 lidar_nhood_col_start = lidar_ned_ﬁd_col_start - buffer_size

230 lidar_nhood_col_end = lidar_ned_ﬁd_col_end + buffer_size

231

232 #create lists to store the values of neighborhood cells

233 lidar_nhood_fd_list = []

234 ﬂat_areas_nhood_list = []

235 for k in range(lidar_nhood_row_start, lidar_nhood_row_end + 1):

236 lidar_nhood_fd_list.append(lidar_fd_list[k][lidar_nhood_col_startlidar_nhood_col_end + 1])
237 ﬂat_areas_nhood_list.append(ﬂat_areas_list[k][lidar_nhood_col_start:lidar_nhood_col_end +
1l)

238

239 nhood_rows = len(lidar_nhood_fd_list)

240 nhood_cols = len(lidar_nhood_fd_list[0])
241

242 end_center_row = nhood_rows - buffer_size
243 end_center_col = nhood_cols - buffer_size
244

153

245

246
247
248

249
250

251
252
253
254
255
256
257
258
259

260
261

262
263
264
265

266
267

268

269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284

#create a list that will mark the selected NED area LiDAR cells (center cells) as l and the
buffer cells as 0

lidar_nhood_ary = numpy.zeros_like(lidar_nhood_fd_list)
lidar_nhood_list = lidar_nhood_ary.tolist()
del lidar_nhood_ary

#1idar_nhood_list[start_center_row:end_center_row] [start_center_col:end_center_col] =

for k in range(start_center_row, end_center_row + 1):
for n in ran ge(start_center_col, end_center_col + l):
lidar_nhood_list[k][n] = 1

#create a list that will store ﬂow-accumulation of center cells within the neighborhood
lidar_nhood_fa_ary = numpy.zeros_like(lidar_nhood_fd_list)

lidar_nhood_fa_list = lidar_nhood_fa_ary.tolist()

del lidar_nhood_fa_ary

#Calculate ﬂow_accumulation in the LiDAR neighborhood.
#Visit each center cell (i.e. lidar_nhood_list == 1), trace its ﬂow path using
lidar_nhood_fd_list until it exits the neighborhood

counter = 0
for k in range(start_center_row, end_center_row):
for m in range(start_center_col, end_center_col):
current_row = k

current_col = m

#while current_row < nhood_rows - 1 and current_row >= 0 and current_col <
nhood_cols - 1 and current_col >= 0:

while current_row < nhood_rows and current_row >= 0 and current_col <
nhood_cols and current_col >= 0:

#check for ﬂat_area

if ﬂat_areas__nhood_list[current_row][current_col] == :
ﬂat_area_found = True
break

#determine what direction ﬂow exits the current cell
lidar_fd_value = lidar_nhood_fd_list[current__row][current_col]
if lidar_fd_value == 1: #ﬂow east

current_col += 1
elif lidar_fd_value == 2: #ﬂow southeast

current_col += 1

current_row += 1
elif lidar_fd_valu == : #ﬂow south

current_row += 1
elif lidar_fd_value == : #ﬁow southwest

154

 

285
286
287
288
289
290
291
292
293
294
295
296
297

298
299

300
301
302
303
304
305
306
307
308
309
310
31 1
312

313
314

315
316
317

318
319

320
321
322
323
324
325

current_col -= l
current_row += 1

elif lidar_fd_value == 16: #ﬂow west
current_col -= l

elif lidar_fd_value == 32: #ﬂow northwest
current_col -= 1
current_row -= 1

elif lidar_fd_value == 64: #ﬂow north
current_row -= l

elif lidar_fd_value == 128: #ﬁow northeast
current_col += 1
current_row -= 1

#update the ﬂowaccumulation list
if current_row < nhood_rows and current_row >= 0 and current_col <
nhood_cols and current_col >= 0:

lidar_nhood_fa_list[current_row][current_col] += 1

if ﬂat_area__found:
break

if ﬂat_area_found:
break

if ﬂat_area_found:
fd_eva1_ary[i,j] = -9999
ﬂat_area_found = False

else:

#re-adjust end_center_row and end_center_col to zero-based indexes, they currently

just
#hold the difference in size between the neighborhood size and buﬁer size
end_center_row -= l
end_center_col -= 1
#create a dictionary that will store the proportion of edge cells in each neighborhood
quad

nhood_proportions_dict = {}

nhood_proportions_dict["NW"] = start_center_row + start_center_col
nhood_proportions_dict["N"] = end_center_col - start_center_col + 1
nhood_proportions_dict["NE"] = nhood_proportions_dict["NW"]
nhood_proportions_dict["E"] = nhood_proportions_dict["N"]
nhood_proportions_dict["SE"] = nhood_proportions_dict["NW"]

155

 

326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344

345
346

347
348
349
350

351
352

353

354
355

356

357
358

359
360
361
362
363

364
365

nhood_proportions_dict["8"] = nhood_proportions_dict["N"]
nhood_proportions_dict["SW"] = nhood_proportions_dict["NW"]
nhood_proportions_dict["W"] = nhood_proportions_dict["N"]
num_edge_cells = ﬂoat(sum(nhood_pr0portions_dict.values()))
for key in nhood_proportions_dict:

nhood_proportions_dict[key] = nhood_proportions_dict[key] / num_edge_cells

#create a dictionary that will store the total exiting ﬂow for each neighborhood quad
ﬂow_dict = {}

ﬂow_dict["NW"] = 0

ﬂow_dict["N"] = 0

ﬁow_dict["NE"] = 0

ﬂow_dict["E"] = 0

ﬂow_dict["SE"] = O

ﬂow_dict["S"] = 0

ﬂow_dict["SW"] = 0

ﬂow_dict["W"] = 0

#trace the outer edge of the boundary cells, and count the ﬂowaccumulation cells that

exit the study area

or NE

or NE

or NE

#aggregate the results by direction (E,SE,S,SW,W,NW,N,NE)

#trace the northern edge
for k in range(nhood_cols):

if k < start_center_col: #we're in the northewest quad
if lidar_nhood_fd_list[0][k] in [32,64,128]: #then the ﬂow exits the quad NW, N,

ﬂow_dict["NW"] += lidar_nhood_fa_list[0][k]

elif k >= start_center_col and k <= end_center_col: #we're in the northern quad
if lidar_nhood_fd_list[O] [k] in [32,64,128]: #then the ﬂow exits the quad NW, N,

ﬂow_dict["N"] += lidar_nhood_fa_list[0][k]

else: #we're in the northeast quad
if lidar_nhood_fd_list[0][k] in [32,64,128]: #then the ﬂow exits the quad NW, N,

ﬂow_dict["NE"] += lidar_nhood_fa_list[0][k]

#trace the southern edge
last_row_index = nhood_rows - 1
for k in range(len(lidar_nhood_fd_list[last_row_index])):

if k < start_center_col: #we're in the southwest quad
if lidar_nhood_fd_list[last_row_index] [k] in [2,4,8]: #then the ﬂow exits the quad

SE, S, or SW

156

 

366

367
368

369

370
371

372
373
374
375

376
377

378

379
380
381
382

383
384

385

386
387

388
389
390
391
392

393
394

395

396
397
398
399

400
401

402
403

ﬂow_dict["SW"] += lidar_nhood_fa_list[last_row_index][k]

elif k >= start_center_col and k <= end_center_col: #we're in the south quad
if lidar_nhood_fd_list[last_row_index][k] in [2,4,8]: #then the ﬂow exits the quad
SE, S, or SW
ﬂow_dict["S"] += lidar_nhood_fa_list[last_row_index] [k]
else: #we're in the southeast quad
if lidar_nhood_fd_list[last_row_index][k] in [2,4,8]: #then the ﬂow exits the quad
SE, S, or SW

ﬂow_dict["SE"] += lidar_nhood_fa_list[last_row_index][k]

#trace the western edge
for k in range(nhood_rows):
if k < start_center_row: #we're in the northewest quad
if lidar_nhood_fd_list[k][O] in [8,16,32]: #then the ﬂow exits the quad SW, W, or
NW
#don't double count the northwestemmost cell, it may have been counted in the
trace of the northern edge
if k == 0 and lidar_nhood_fd_list[0][0] == 32:
pass
else:
ﬂow_dict["NW"] += lidar_nhood_fa_list[k][O]

elif k >= start_center_row and k <= end_center_row: #we're in the west quad
if lidar_nhood_fd_list[k][O] in [8,16,32]: #then the ﬂow exits the quad SW, W, or

NW
ﬂow_dict["W"] += lidar_nhood_fa_list[k][O]
else: #we're in the southwest quad
if lidar_nhood_fd_list[k][O] in [8,16,32]: #then the ﬂow exits the quad SW, W, or
NW

flow_dict["SW"] += lidar_nhood_fa_list[k][O]

#trace the eastern edge
last_col_index = nhood_cols - 1
for k in range(nhood_rows):

if k < start_center_row: #we're in the northeast quad
if lidar_nhood_fd_list[k][last_col_index] in [128,1,2]: #then the flow exits the
quad NE, E, or SE
#don't double count the northeastemmost cell, it may have been counted in the
trace of the northern edge

if k = 0 and lidar_nhood_fd_list[O][last_col_index] == 128:
pass

else:
ﬂow_dict["NE"] += lidar_nhood_fa_list[k][last_col_index]

elif k >= start_center_row and k <= end_center_row: #we're in the east quad
if lidar_nhood_fd_list[k][last_col_index] in [128,1,2]: #then the ﬂow exits the
quad NE, E, or SE

ﬂow_dict["E"] += lidar_nhood_fa_list[k][last_col_index]
else: #we're in the southeast quad

157

 

404

405

406

407
408
409
410
41 1

412
413

414
415
416
417
418
419
420
421
422
423
424
425
426

427
428

429
430
431
432
433
434
435
436
437
438
439
440
441
442
443

quad NE,

if lidar_nhood_fd_list[k][last_col_index] in [128,1,2]: #then the ﬂow exits the
E, or SE
#don't double count the southeastemmost cell, it may have been counted in the

trace of the southern edge

if k = last_row_index and

lidar_nhood_fd_list[last_row_index][last_col_index] == :

pass
else:
flow_dict["SE"] += lidar_nhood_fa_list[k][last_col_index]

#create a dictionary that will store the percentage of total exiting ﬂow for each

neighborhood quad

direction

sum_ﬂow__exit_cells = ﬁoat(sum(ﬁow_dict.values()))
if sum_ﬂow_exit_cells == :

print '0 value at ' + str(i) + ' , ' + str(i) + '; ned_id = ' + str(ned_ﬁd_value)
ﬂow _pct_dict = {}
ﬂow _pct_dict["NW"] = ﬂow_dict["NW"] / sum_ﬂow_exit_cells
ﬁow_pct_dict["N"] = ﬂow_dict["N"] / sum_ﬂow__exit_cells
ﬂow _pct_dict["NE"] = ﬂow_dict["NE"] / sum_ﬂow_exit_cells
ﬂow _pct_dict["E"] = ﬂow_dict["E"] / sum_ﬁow_exit_cells
ﬂow _pct_dict["SE"] = ﬂow_dict["SE"] / sum_ﬁow_exit__cells
ﬂow _pct_dict["S"] = ﬂow_dict["S"] / sum_ﬂow_exit_cells
ﬂow_pct_dict["SW"] = ﬂow_dict["SW"] / sum_ﬂow_exit_cells
ﬂow _pct_dict["W"] = ﬁow_dict["W"] / sum_ﬂow_exit_cells

#evaluate the difference in ﬂow-direction ﬁ'om the center cells to the NED ﬂow-

ned_fd_value = ned_fd_list[i][j]

ﬁow_error_dict = {}

if ned_fd_value == 1: #east
flow_error_dict["NW"] = 3
ﬂow_error_dict["N"] = 2
ﬂow_error_dict["NE"] = I
ﬂow_error_dict["E"] = O
ﬂow_error_dict["SE"] = 1
ﬂow_error_dict["S"] = 2
ﬂow_error_dict["SW"] = 3
ﬂow_error_dict["W"] = 4

elif ned_fd_value == 2: #southeast
flow_error_dict["NW"] = 4
ﬂow_error_dict["N"] = 3
ﬂow_error_dict["NE"] = 2

158

444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486

ﬂow_error_dict["E"] = 1
ﬂow_error_dict["SE"] = 0
ﬂow_error_dict["S"] = 1
ﬂow_error_dict["SW"] = 2
ﬁow_error_dict["W"] = 3

elif ned_fd_value == 4: #south

ﬂow_error_dict["NW"] = 3
flow_error_dict["N"] = 4
ﬂow_error_dict["NE"] = 3
flow_error_dict["E"] = 2
flow_error_dict["SE"] = 1
ﬂow_error_dict["S"] = 0
ﬂow_error_dict["SW"] = 1
ﬂow_error_dict["W"] = 2

e1ifned_fd_value == 8: #southwest

ﬂow_error_dict["NW"] = 2
ﬂow_error_dict["N"] = 3
ﬁow_error_dict["NE"] = 4
ﬁow_error_dict["E"] = 3
ﬂow_error_dict["SE"] = 2
ﬂow_error_dict["S"] = l
ﬂow_error_dict["SW"] = O
ﬁow_error_dict["W"] = l

elif ned_fd_value == 16: #west

ﬂow_error_dict["NW"] = l
ﬂow_error_dict["N"] = 2
ﬂow_error_dict["NE"] = 3
ﬂow_error_dict["E"] = 4
ﬂow_error_dict["SE"] = 3
ﬂow_error_dict["S"] = 2
ﬂow_error_dict["SW"] = 1
ﬂow_error_dict["W"] = 0

elif ned_fd_value == 32: #northwest

ﬂow_error_dict["NW"] = 0
ﬂow_error_dict["N"] = 1
ﬁow_error_dict["NE"] = 2
ﬂow_error_dict["E"] = 3
ﬂow_error_dict["SE"] = 4
ﬂow_error_dict["S"] = 3
ﬂow_error_dict["SW"] = 2
ﬂow_error_dict["W"] = l

elif ned_fd_value == 64: #north

ﬂow_error_dict["NW"] = 1

159

487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519

520
521

522
523
524
525
526
527
528

ﬂow_error_dict["N"] = 0
ﬁow_error_dict["NE"] = l
ﬂow_error_dict["E"] = 2
ﬁow_error_dict["SE"] = 3
ﬂow_error_dict["S"] = 4
ﬂow_error_dict["SW"] = 3
ﬂow_error_dict["W"] = 2
elif ned_fd_value == 128: #northeast
ﬂow_error_dict["NW"] = 2
ﬂow_error_dict["N"] = l
ﬂow_error_dict["NE"] = 0
ﬂow_error_dict["E"] = 1
ﬁow_error_dict["SE"] = 2
ﬂow_error_dict["S"] = 3
ﬂow_error_dict["SW"] = 4
ﬂow.error_dict["W"] = 3

weighted_ﬂow_error = 0
for direction in ﬂow _pct_dict:
weighted_ﬂow_error += ﬂow _pct_dict[direction] * ﬂow_error_dict[direction]
#record the weighted ﬂow error in the NED sized raster list
fd_eval_ary[i,j] = weighted_ﬂow_error

del nhood_proportions_dict, ﬂow_dict, ﬂow _pct_dict, ﬂow_error_dict

cells _processed_ary[i,j] = I

#delete used lists and dictionaries
del lidar_nhood_list, lidar_nhood_fd_list, ﬂat_areas_nhood_list, lidar_nhood_fa_list

percent = int(ﬂoat(i - ned_ﬁd_list_start_row) / (ned_ﬁd_list_end_row -
ned_ﬁd_list_start_row) * 100)

if percent > percent_done:
print ' - ' + str(percent) + "% done (ned_ﬁd : " + str(ned_ﬁd_value) + ")"
percent_done = percent

del lidar_fd_list, 1idar_ned_ﬁd_list

160

 

 

529

530
531

532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548

549
550

551
552
553
554
555

 

raster_folder_seconds = time.clock() - raster_folder_start_time

h,m,s = sec_to_h_min(raster_folder_seconds) \_
print ' - raster ' + raster_folder + ' took ' + str(h) + ' hours ' + str(m) + ' minutes ' + str(s) + '
seconds'

#loop through the cells _processed_ary and set any cells with a value of 0 to NoData (-9999)
for i in range(len(cells_processed_ary)):
for j in range(len(cells_processed_ary[i])):
if cells _processed__ary[i,j] == :
fd_eval_ary[i,j] = -9999

#write the cumulative error array to a raster

#use the TIFF ﬁle as a template, since GDAL cannot write ArcINFO binary rasters
template_raster = workspace + "\\template_raster.tif'

template_raster_ds = gdal.0pen(template_raster, gdalconst.GA_Update)
template_raster_band = template_raster_ds.GetRasterBand(1)
template_raster_band.WriteArray(fd_eval_ary)
template_raster_band.FlushCache()

template_raster_ds.FlushCacheO

del template_raster_ds, template_raster_band, fd_eval_ary, ned_fd_list, ned_ﬁd_list,
lidar_wspace_contents, raster_list, cells _processed_ary

os.rename(template_raster, template_raster.replace('template_raster’, 'ﬂow_error'))
seconds = time.clock() - script_start_time

h,m,s = sec_to_h_min(seconds)
print 'Whole thing Took ' + str(h) + ' hours ' + str(m) + ' minutes ' + str(s) + ' seconds'

161

 

 

\OOOQONMAWN—

WWWWWMWWWMNNNNNNNNNN—t—n—Iu—ou—tu—n—u—u—n—
\OWVO‘MAwN—‘OQWQO‘M-5‘JJN—OOWNONM#WN—O

APPENDIX B — Python Code

 

 

contour_dlg_conversion.py

 

#
# contour_dlg_conversion.py
# Created September 2009

# Author: Glenn O'Neil

#

# Description: Takes a directory of DLG hypsography ﬁles and converts each
# ﬁles relevant features to ESRI shapeﬁles. Crashes due to an

# ESRI bug if it tries to process more than 70 ﬁles.

#

# Inputs: 1. d1g_wspace - directory containing DLG hypsography ﬁles.

# 2. scratch_wspace - directory for temporary ﬁles.

# 3. out_wspace - directory where the output shapeﬁles are written.
#

# Outputs: 1. <quad_name>_ohp.shp - shapeﬁle of quadrangle contours.

#

 

import os, arcgisscripting, sys
gp = arcgisscripting.create(9.3)

d1g_wspace = sys.argv[l]
scratch_wspace = sys.argv[2] + "\\"
out_wspace = sys.argv[3] + "\\"

#get the dlg ﬁles
d1g_wspace_list = os.listdir(dlg_wspace)
dlg_list = []
#ﬁlter the list so it only contains .dlg ﬁles
for i in range(len(dlg_wspace__list)):
if '.d1g' in (dlg_wspace_list[i]):

dlg_list.append(dlg_wspace_list[i])

del d1g_wspace_list

d1g_list_length = len(dlg_list)
for i in range(dlg_list_length):
gp.AddMessage("Processing " + str(i + l) + " of " + str(dlg_list_length) + ":")

#convert the dlg to a coverage

gp.AddMessage(" - converting DLG to coverage")
dlg_cov = scratch_wspace + "dlg_cov"

162

40 gp.dlgarc(dlg_wspace + "\\" + dlg_list[i], dlg_cov)
4] #convert the coverage to a shapeﬁle

 

42 gp.AddMessage(" - converting coverage to shapeﬁle")
43 dlg_shp = dlg_cov + "_arc.shp"

44 gp.FeatureClassToShapeﬁle(dlg_cov + "\\arc", scratch_wspace)
45 #join the .acode ﬁle to the shapeﬁle table on the ID ﬁeld
46 gp.AddMessage(" - joining acode table to shapeﬁle")
47 acode = dlg_cov + ".acode"

48 gp.joinﬁeld (dlg_shp, "ID", acode, "DLG_COV-ID")

49 #extract the appropriate stream features

50 gp.AddMessage(" - extracting features")

51 #get the appropriate ﬁelds from dlg_shp

52 code_ﬁelds = gp.|istﬁelds(dlg_shp, 'MlNOR”)

53 select_query = "'MAJORI " = 20 AND ('

54 for j in range(len(code_ﬁelds)):

55 if j != (1en(code_ﬁelds) - 1):

56 select_query += "" + code_ﬁeldsﬁ].Name + "' NOT IN
(202,205,206,207,208,209,210,299) OR '

57 else: #treat the last record differently

select_query += "" + code_ﬁeldsﬁ].Name + '" NOT IN
58 (202,205,206,207,208,209,210,299))'

59

60 dlg_out = out_wspace + dlg_list[i][:8] + '.shp'

61 gp.select_analysis(dlg_shp, dlg_out, select_query)
62

63 gp.AddMessage(" - deleting intermediate data")
64 gp.delete_management(dlg_cov)

65 gp.delete_management(dlg_shp)

66

67 del dlg_list, select_query

163

\OOONO‘LIIADJNH

APPENDIX B — Python Code

contour_toppology_analysis.py

#
# contour_topology_analysis.py
# Created September 2009

# Author: Glenn O'Neil

 

#

# Description: This script identiﬁes topological error in contour datasets.

# It analyzes node ID information to determine how many contour

# intersections exists in a dataset. It uses quad boundary and

# featre vertices to determine how many contour dangles exist.

#

# Inputs: 1. quad_ds - an ESRI shapeﬁle of 7.5 Minute USGS Quadrangles

# 2. quad_id_ﬁeld - unique ID ﬁeld of the 'quads' shapeﬁle

# 3. contour_wspace - workspace where the contour datasets are stored
# 4. workspace - a folder where temporary datasets will be written

#

# Outputs: 1. Inter - a ﬁeld in quad_ds that represents the number of

# intersections in that quad.

# 2. Dangle - a ﬁeld in quad_ds that represents contour dangle

# 3. Contours - a ﬁeld in quad_ds that represents the number of contours
# 4. Int _p_cnt — a ﬁeld in quad_ds that represents the number of

# intersections per contour.

# 5. Dan _p__cnt - a ﬁeld in quad_ds that represents the number of

# dangles per contour.

#

 

#import the necessary modules
import sys, 05, time, arcgisscripting
gp = arcgisscripting.create(9.3)

#inputs

quad_ds = sys.argv[l]
quad_id_ﬁeld = sys.argv[2]
contour_wspace = sys.argv[3]
workspace = sys.argv[4]

gp.OverWriteOutput = 1

#insert a ﬁeld into the quad dataset that will store the number of intersections
quad_ds_ﬁelds = gp.listﬁelds(quad_ds)

164

 

40
41
42
43

45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
6O
61
62
63

65
66
67
68
69

70
71

72
73
74
75
76
77
78

79
80

81

ﬁeld _present = False

for i in range(len(quad__ds_ﬁelds)):
if quad_ds_ﬁelds[i].name == "Inter":

ﬁeld _present = True

if not ﬁeld _present:
gp.addﬁeld(quad_ds, "Inter", "short", "8")
gp.addﬁeld(quad_ds, "Dangle", "short", "8")
gp.addﬁeld(quad_ds, "Contours", "short", "8")
gp.addﬁeld(quad__ds, "Int _p_cnt", "FLOAT")
gp.addﬁeld(quad_ds, "Dan_p_cnt", "FLOAT")

#detennine the number of quads to process
rows = gp.searchcursor(quad_ds)
quad_count = 0
row = rows.next()
while row:

quad_count += 1

row = rows.next()
del row, rows

#loop through the quad_ids

quad_rows = gp.updatecursor(quad_ds)
quad_row = quad_rows.next()

counter = 0

while quad_row:

quad_id = quad_row.getvalue(quad__id_ﬁeld)
quad_contour_ds = contour_wspace + "\\" + quad_id + "ohp.shp"

gp.AddMessage("- Processing quad " + quad_id + "(" + str(counter + l) + " of " +
str(quad_count) + ")")

counter += 1

if not gp.exists(quad_contour_ds):
gp.AddMessage(" - could not ﬁnd contours for quad " + quad_id)
quad_row = quad_rows.next()

else:

#check to see if intersections have already been identiﬁed
if quad_row. getvalue("1nter") > 0: #the quad has PROBABLY already been processed,
possible that there were 0 intersections anyway

quad_row = quad_rows.next()

165

 

 

 

82
83

84
85

86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123

continue

#create a table of unique node IDs in the contours

#read the ﬁiode ﬁeld and store unique results in an array, repeat for tnode ﬁelds, then create a

table
cursor = gp.searchcursor(quad_contour_ds)
node_ids = []
row = cursor.next()
while row:
fnode_id = row.getvalue("FROMNODE")
tnode_id = row.getvalue("TONODE")
if not fnode__id in node_ids:
node_ids.append(ﬁiode_id)
if not tnode_id in node_ids:
node_ids.append(tnode_id)

row = cursor.next()
node_ids.sort()

node_id_table = workspace + "\\node_id.dbf'
gp.CreateTable(workspace, "node_iddbf')
gp.AddField(node_id_table, "ID", "SHORT")
rows = gp.InsertCursor(node_id_table)
for i in range(len(node_ids)):
row = rows.NewRow()
row.ID = node_ids[i]
rows.InsertRow(row)

del row, rows

#select only non-border contours (i.e. MAJOR] > 0)

#make a feature layer ﬁrst

quad_contour_ﬂ = quad_id + "_contour_ds__fl"

gp.MakeFeatureLayer(quad_contour_ds, quad_contour_ﬂ)
gp.SelectLayerByAttribute(quad_contour_fl, "NEW_SELECTION", "\"MAJORI\" > 0")

#Calculate FNode and TNOde Frequency Tables

ﬁiode_ﬁeq = workspace + "\\" + quad_id + "_fnode__freq.dbf‘
tnode_freq = workspace + "\\" + quad_id + "_tnode_freq.dbf'
gp.Frequency(quad_contour_ﬂ, ﬁrode_freq, "FROMNODE")
gp.Frequency(quad_contour_ﬂ, tnode_freq, "TONODE")

166

 

124
125
126
127
128
129
130
131
132
133

134
135

I36

137

138
I39

140

141

142
143

144
I45
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162

#Convert FNode and TNode Freq. to Table Views
fnode_freq_tv = quad_id + "_fnode_freq_tv"
tnode_ﬁ'ethv = quad_id + "_tnode_freq_tv"
gp.maketableview(ﬁiode_freq, fnode_freq_tv)
gp.maketableview(tnode_ﬁ'eq, tnode_freq_tv)

#select intersections in the frequency tables
gp.SelectLayerByAttribute(fnode_freq_tv, "NEW_SELECTION", "\"FREQUENCY\" > 1")
gp.SelectLayerByAttribute(tnode_freq_tv, "NEW_SELECTION", "\"FREQUENCY\" > I")

#join the selected frequency tables

gp.AddIoin_management(quad_contour_ﬂ, "FROMNODE", fnode_ﬁeq_tv, "FROMNODE",
"KEEP_ALL")

gp.AddJoin_management(quad_contour_fl, quad_id + "ohp.TONODE", tnode_freq_tv,
"TONODE", "KEEP_ALL")

#select the contours that have intersections (fromnode and tnode frequencies > 1)

gp.SelectLayerByAttribute(quad_contour_ﬂ, "NEW_SELECTION", "\"" + quad_id +
"_fnode_freq.FREQUENCY\" > 1 OR \"" + quad_id + "_tnode_freq.FREQUENCY\" > 1")

#gp.SelectLayerByAttribute("quad_contour_ds_fl", "NEW_SELECTION",
"'fnode_frethv:FREQUENCY" > 1')

#Subratct mid-contour nodes

gp.SelectLayerByAttribute(quad_contour_ﬂ, "REMOVE_FROM_SELECTION", “(\"" +
quad_id + "_fnode_freq.FREQUENCY\" = 2 AND \"" + quad_id + "_tnode_freq.FREQUENCY\"
IS NULL) OR (\"" + quad_id + "_tnode_freq.FREQUENCY\" = 2 AND \"" + quad_id +
"_fnode_freq.FREQUENCY\" IS NULL)")

intersection_count = gp.GetCount_management(quad_contour_ﬂ)

quad_row.lnter = intersection_count.getOutput(0)

#the dangle process:

#use feature vertices to points with the "BOTH_ENDS"

contour_vertices = workspace + "\\" + quad_id + "_vertices.shp"
gp.FeatureVerticesToPoints(quad_contour_ds, contour_vertices, "BOTH_ENDS")

#use addXY on the new points layer
gp.AddXY__management(contour_vertices)

#create a new string ﬁeld 30 characters long for X Y concatenation
gp.AddField(contour_vertices, "XY", "TEXT")

#concatenate the X ﬁeld and the Y ﬁeld with a ";"
gp.CalculateField(contour_vertices, "XY", "[POINT_X] & \";\" & [POINT_Y]")

167

 

163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178

179
180

181
182
183
184
185
186

187
188

1 89
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204

 

#calculate a frequency table on the concatenated XY values
contour_vertices_ﬂ = quad_id + "__contour_vertices_ﬂ"
gp.MakeFeatureLayer(contour_vertices, contour_vertices_ﬁ)
xy_freq = workspace + "\\" + quad_id + "_xy_ﬁ'eq.dbf'
gp.Frequency(contour_vertices_ﬁ, xy_freq, "XY")
xy_freq_tv = quad_id + "_xy_ﬁ'eq_tv"
gp.maketableview(xy_freq, xy_freq_tv)

#join the frequency table to the contour_vertices
gp.Adeoin_management(contour_vertices_ﬂ, "XY", xy_freq_tv, "XY", "KEEP_ALL")

#select the selected quad from the quad shapeﬁle and create a 500 foot inside buffer
#This allows us to avoid selecting the dangling nodes on the quad boundary
quad_select = workspace + "\\" + quad_id + "_sel.shp"

gp.select_analysis(quad_ds, quad_select, "\"" + quad_id_ﬁeld + "\" = "' + quad_id + ""')

quad_inside_buff = workspace + "\\" + quad_id + "_inside_buff.shp"
gp.buffer_analysis(quad_select, quad_inside_buff, "-200") #probably in meters, depends on

quad_select units

quad_inside__buff_ﬂ = quad_id + "_inside_buff_ﬂ"
gp.MakeFeatureLayer(quad_inside_buff, quad_inside_buff_ﬂ)

#select the vertices that intersect the new inside buffer
gp.SelectLayerByLocation(contour_vertices_ﬂ, "intersect", quad_inside_buﬁ)

#create a sub-selection the vertices that have a frequency = 1
gp.Se1ectLayerByAttribute(contour_vertices_ﬂ, "SUBSET_SELECTION", "\"" + quad_id +

"_xy_freq.FREQUENCY\" = 1")

#get the recourd count on the vertices
dangle_count = gp.GetCount_management(contour_vertices_ﬂ)

quad_rowDangle = dangle_count.getOutput(0)

#calculate the number of contours, to determine intersection and dangle ratios

#Calculate a contour frequency table

gp.RemoveJoin(quad__contour_ﬂ, quad_id + "_ﬁiode_freq", quad_id + "_tnode_freq")
gp.SelectLayerByAttribute(quad_contour_ﬂ) #clears its current selection
contour_ﬁ'eq = workspace + "\\" + quad_id + "_contour_freq.dbf'
gp.Frequency(quad_contour_ﬂ, contour_freq, "ID")

contour_count = gp.GetCount_management(quad_contour_ﬂ)

quad_row.Contours = contour_count.getOutput(O)

168

205

 

 

206 intersections _per_contour = float(intersection_count.getOutput(0))/
float(contour_count.getOutput(0))

207 dangles_per_contour = ﬁoat(dangle_count.getOutput(0)) / ﬂoat(contour_count.getOutput(O))

208

209 quad_row.lnt_p_cnt = str(intersections_per_contour)

210 quad_row.Dan_p_cnt = str(dangles_per_contour)

211

2 1 2 quad_rows.updaterow(quad_row)

213

2 l 4 gp.delete_management(quad_contour_ﬂ)

21 5 gp.delete_management(fnode_freq)

2 l 6 gp.delete_management(tnode_freq)

2 1 7 gp.delete_management(node_id_table)

2 1 8 gp.delete_management(contour_vertices)

2 l 9 gp.delete_management(contour_vertices_ﬂ)

220 gp.delete__management(xy_freq)

22 1 gp.delete_management(quad_select)

222 gp.delete_management(quad_inside_buff)

223 gp.delete_management(quad_inside_buff_ﬂ)

224 gp.delete_management(contour_freq)

225

226 quad_row = quad_rows.Next()

227

228 del quad_row, quad_rows

169

\OOOQO‘thbbJN—

OWQ¢MAWNHOOWQaMAWN~OOW\lQMbWN—o

APPENDIX B — Python Code

analysis_results.py

#
# analysis_results.py

# Created September 2009
# Author: Glenn O'Neil

 

#

# Description: Takes a directory of ﬂow error rasters, reads it into a NumPy
# Array through GDAL and records each raster's mean, standard
# deviation and median ﬂow error value into a text ﬁle.

#

# Inputs: 1. workspace - directory

# 2. output_text_ﬁle - text ﬁle containing the mean, median,

# and standard deviation ﬂow error values for each raster.

#

# Outputs: 1. output text ﬁle -

#

#

 

import sys, os, raster_analysis, numpy
from osgeo import gdal, gdalconst

#get the inputs

workspace = sys.argv[l]
output_text_ﬁle_path = sys.argv[2]
1s = os.linesep

output_text_ﬁle = open(output_text_ﬁle_path, 'w')

#get the folders of the workspace
quads_list = []
quads_wspace_contents = os.listdir(workspace)
for i in quads_wspace_contents:
if os.path.isdir(workspace + "\\" + i):
quads_list.append(i)
del quads_wspace_contents

quads_list.sort()

170

I .

 

 

 

 

40

41 quads_list_length = len(quads_list)
42 for i in range(quads__list_length):
43 #for i in range(l):

44

45 #get the ﬂow_error raster

46 ﬂow_error_raster = workspace + "\\" + quads_list[i] + "\\" + quads_list[i] +
"_ﬂow_error.tif"

47

48 if os.path.exists(ﬂow_error_raster):

49

50 #read the raster into an array

51 ﬂow_error_ary = raster_analysis.raster2array(ﬂow_error_raster)

52

53 #append non NoData (i.e. -9999) results to a list

54 error_values_list = []

55 for j in range(len(ﬂow_error_ary)):

56 for k in range(len(ﬂow_error_ary[j])):

57 if ﬁow_error_ary[j,k] != -9999.0:

58 error_values_list.append(ﬂow_error_ary[j,k])

59

60 del ﬂow_error_ary

61

62 #convert the list back to an array for stat analysis

63 error_values_ary = numpy.array(error_values__list)

64 error_values_list.sort()

65 median = error_values_list[int(numpy.round(len(error_values_list) / 2)) - l]

66

67 #capture the mean and standard deviation of the ary

68 mean = error_values_ary.mean()

69 stdev = error_values_ary.std()

70 del error_values_ary, error_values_list

71

72 #record the stats to the text ﬁle

73 output_text_ﬁle.write(quads_list[i] + "l" + str(mean) + "l" + str(stdev) + "I" +
str(median) + Is)

74

75 print 'Processed ' + quads_list[i] + "(" + str(i + l) + " of " + str(quads_list_length) + ")"

76

77

78 output_text_ﬁle.close()
79 del output_text_ﬁle, quads_list

171

 

 

REFERENCES

172

REFERENCES

 

Aziz, S. A.; Steward, Brian L. (2007). Development of Agricultural Field
DEM Using Repeated GPS Measurements from Field Operations: Effects of
Sampling Intensity and Pattern. ASABE paper No. 071089. St. Joseph, Mich.:
ASABE.

Barber, C. P.; Shortridge, A. (2005). Lidar Elevation Data for Surface
Hydrologic Modeling: Resolution and Representation Issues. Cartography and
Geographic Information Science 32.4, pp. 401-410.

Burrough, P.; vanGans, P.F.M; MacMillan, RA. (2000). High resolution landform
classiﬁcation using fuzzy k-means. Journal of Fuzzy Sets and Systems 113, pp. 37-
52.

Carrara, A.; Bitelli, G.; Carla, R. (1997). Comparison of techniques for generating
digital terrain models from contour lines. International Journal of Geographic
Information Science I I , pp. 451-473.

Carson, W.; Reutebuch, S. (1997). A Rigorous Test of the Accuracy of USGS
Digital Elevation Models in Forested Areas of Oregon and Washington.
ACSM/ASPRS Annual Convention & Exposition Technical Papers. April 7-10,
1997.

Carter, J. R. (1988). Digital representations of topographic surfaces. Photogrammetric
Engineering and Remote Sensing 54, pp. 1577-1580.

Carter, J. R. (1992). The effect of data precision on the calculation of slope and aspect
using gridded DEMs. Cartographica 29.1, pp. 22-34.

Chang, K.; Tsai, B. (1991). The Effect of DEM Resolution on Slope and

Aspect Mapping. Cartography and Geographic Information Systems 18.], pp.
69-77.

Clarke, K.C., Lee, SJ. (2007). Spatial Resolution and Algorithm Choice as
Modiﬁers of Downslope Flow Computed from Digital Elevation Models.
Cartography and Geographic Information Science 34. 3, pp. 215-230.

Daratech Inc. (2009, August 20). Press Release — ‘GIS/Geospatial Industry Worldwide
Growth Slows to 1% in 2009.” Directions Magazine. Retrieved February 16, 2010,
from Directions Magazine:

http://www.directionsmag.com/press.releases/?duty=Show&id=3 63 1 8.

173

Deng, Y.; Wilson, J .; Bauer, B. (2007). DEM resolution dependencies of terrain
attributes across a landscape. International Journal of Geographical Information
Science 21.2, pp. 187-213.

Desmet, P. (1997). Effects of Interpolation Errors on the Analysis of DEMs. Earth
Surface Processes and Landforms 22, pp. 563-5 80.

Digital Cartographic Data Standards Task Force. (1998). A proposed standard for digital
cartographic data. The American Cartographer 15.1, pp. 129 — 136.

Endreny, T.A.; Wood, ER (2001). Representing elevation uncertainty in runoff
modeling and ﬂowpath mapping. Hydrological Processes 15, pp. 2223-2236.

Erskine, R.H.; Green, T.R.; Ramiriez, J.A.; MacDonald, L.H. (2007). Digital
Elevation Accuracy and Grid Cell Size: Effects on Estimated Terrain Attributes.
Soil Science Society of America Journal 71. 4, pp. 1371-1380.

Faintich, M. (1996). Digital Elevation Models. Cellular Business September, pp. 46-5 8.

Fisher, P. (1993). Algorithm and implementation uncertainty in viewshed analysis.
International Journal of Geographical Information Science 7. 4, pp. 331-347.

Fisher, P. (1998). Improved Modeling of Elevation Error with Geostatistics.
Geolnformatica 2.3, pp. 215-233.

Fisher, P.; Tate, N. (2006). Causes and consequences of error in digital
elevation models. Progress in Physical Geography 30. 4, pp. 467-489.

Florinsky, I. (1998). Accuracy of local topographic variables derived from digital
elevation models. International Journal of Geographical Information Science 12.],
pp. 47-61.

Freeman, T. (1991). Calculating Catchment Area With Divergent Flow Based on a
Regular Grid. Computers & Geosciences 1 7.3, pp. 413-422.

Garbrecht, J ., Starks, P. (1995). Note on the Use of USGS Level 1 7.5-Minute DEM
Coverages for Landscape Drainage Analyses. Photogrammetric Engineering & Remote
Sensing 61.5, pp. 519-522. '

Goodchild, M.; Guoqing, S.; Shiren, Y. (1992). Development and test of an error
model for categorical data. International Journal of Geographic Information
Science 6.2, pp. 87-104.

Greenlee, D. (1987). Raster and Vector Processing for Scanned Linework.
Photogrammetric Engineering and Remote Sensing 53.10, pp. 1383-1387.

174

Heuvelink, G. (1998). Error Propagation in Environmental Modelling. Bristol, PA:
Taylor and Francis, Inc.

Holmes, K.W.; Chadwick, O.A.; Kyriakidis, PC. (2000). Error in a USGS 30-meter
digital elevation model and its impact on terrain modeling. Journal of Hydrology
233, pp. 154-173.

Homer, C.; Dewitz, J .; Fry, J .; Coan, M.; Hossain, N.; Larson, C.; Herold, N.; McKerrow,
A.; Van Driel, J .; Wickham, J. (2007). Completion of the 2001 National Land
Cover Database for the Cotenninous United States. Photogrammetric Engineering
and Remote Sensing April, pp. 829-840.

Hutchinson, M. (1989). A New Procedure for Gridding Elevation and Stream Line Data
with Automatic Removal of Spurious Pits. Journal of Hydrology 106, pp. 211-232.

James, L.; Watson, D.; Hansen, W. (2007). Using LiDAR data to map gullies and
headwater streams under forest canopy. CA TENA 71.], pp. 132-144.

Jenson, S.; Domingue, J. (1988). Extracting Topographic Structure from Digital
Elevation Data for Geographic Information System Analysis. Photogrammetric
Engineering and Remote Sensing 54.11, pp. 1593-1600.

Jones, K.H. (1998). A Comparison of Algorithms used to Compute Hill Slope as a
Property of the DEM. Computers & Geosciences 24.4, pp. 315-323.

Kienzle, S. (2004). The Effect of DEM Raster Resolution on First Order, Second Order
and Compound Terrain Derivatives. Transactions in GIS 8.], pp. 83-111.

Kyriakidis, P.C.; Shortridge, A.M.; Goodchild, M.F. (1999). Geostatistics for
conﬂation and accuracy assessment of digital elevation models. International
Journal of Geographical Information Science 13.7, pp. 677-707.

Leckie, D.; Cloney, E.; Jay, C. (2005). Automated mapping of stream features with high-
resolution multispectrai imagery: An example of the capabilities. Photogrammetric
Engineering and Remote Sensing 71.2, pp. 145-155.

MacMillan, R.; Martin, T.; Earle, T.; McNabb, D. (2003). Automated analysis and
classiﬁcation of landforms using high-resolution digital elevation data: applications
and issues. Canadian Journal of Remote Sensing 29.5, pp. 592-606.

Martinoni, D., Bernhard, L. (1998). A conceptual framework for reliable digital terrain

modelling. In: Proceedings of the Eighth Symposium on Spatial Data Handling,
Vancouver, Canada, pp. 737-750.

175

 

 

Mason, D.; Scott, T.; Wang, H. (2006). Extraction of tidal channel networks from
airborne scanning laser altimetry. ISPRS Journal of Photogrammetry and Remote
Sensing 61.2, pp. 67-83.

Meyer, TH. (2004). The Discontinuous Nature of Kriging Interpolation for Digital
Terrain Modeling. Cartography and Geographic Information Science 31. 4, pp.
209-216.

Mitasova, H.; Hoﬁerka, J .; Zlocha, M.; Iverson, L. (1995). Modeling Topographic
Potential for Erosion and Deposition Using GIS. International Journal of GIS
January, pp. 1-19.

Mouton, A. (2005). Generating Stream Maps Using LiDAR Derived Digital Elevation
Models and 10-m USGS DEM. Thesis. Washington State University.

O’Callaghan, J .F., Mark, D.M. (1984). The Extraction of Drainage Networks From
Digital Elevation Data. Computer Vision, Graphics and Image Processing 28, pp.
328-344.

Ohio Ofﬁce of Information Technology, GIS Support Center. (2005). Ohio IO-meter
Digital Elevation Model (F GDC) / oh_dem10b (ISO). Retrieved July 17, 2008,

from
http://metadataexplorer.gis.state.oh.us/metadataexplorer/ﬁrll_metadata. j sp?docId=
{DO820264-6E96-4E1A-86OF-F89F42C3 9F 64}&loggedln=false.

Oksanen, J .; Tapani, S. (2005). Error propagation of DEM-based surface
derivatives. Computers and Geosciences 31, pp. 1015-1027.

Ouyang, D.; Bartholic, J.; Selegean, J. (2005). Assessing Sediment Loading from
Agricultural Croplands in the Great Lakes Basin. Journal of American Science 1.2,
pp. 14-21.

Quinn, P.; Beven, K.; Chevallier, P.; Planchon, O. (1991). The Prediction of Hillslope
Flowpaths for Distributed Hydrological Modelling Using Digital Terrain Models.
Hydrological Processes 5, pp. 59-80.

Raaﬂaub, L.D.; Collins, M.J. (2006). The effect of error in gridded digital elevation
models on the estimation of topographic parameters. Environmental Modelling
and Software 21, pp. 710-732.

Riggs, P.D.; Dean, DJ. (2007). An Investigation into the Causes of Errors and

Inconsistencies in Predicted Viewsheds. Transactions in GIS 11.2, pp. 175-
196.

176

 

 

Saunders, W. (1999). Preparations of DEMS for use in Environmental Modeling
Analysis. 1999 ESRI User Conference. San Diego, CA, July 24—30, 1999.
http:l/proceedings.esri.com/library/userconf/proc99/proceed/papers/pap802/p802.
Htrn.

Schmidt, F.; Persson, A. (2003). Comparison of DEM Data Capture and Topographic
Wetness Indices. Precision Agriculture 4, p. 179-192.

Shortridge, A. (2006). Shuttle Radar Topography Mission Elevation Data Error and Its
Relationship to Land Cover. Cartography and Geographic Information Science
33.1, pp. 65-75.

Skidmore, AK. (1989). A comparison of techniques for calculating gradient and aspect
from a gridded digital elevation model. International Journal of Geographical
Information Systems 2.4, pp. 323-334.

Soille, P.; Vogt, J .; Colombo, R. (2003). Carving and adaptive drainage enforcement of
grid digital elevation models.” Water Resources Research 39.12, pp. 10-1 to 10-3.

Soille, P. (2004a). Morphological carving. Pattern Recognition Letters 25, pp. 543-
550.

Soille, P. (2004b). Optimal removal of spurious pits in grid digital elevation models.
Water Resources Research 40.12.

Tarboton, D.G. (1991). On the Extraction of Channel Networks from Digital Elevation
Data. Hydrological Processes 5, pp. 81-100.

Tarboton, D.G. (1997). A New Method for the Determination of Flow Directions and
Upslope Areas in Grid Digital Elevation Models. Water Resources Research 33.2,
pp. 3 09-3 19.

Thompson, J.A.; Bell, J .C.; Butler, CA. (2001). Digital elevation model resolution:
effects on terrain attribute calculation and quantitative soil-landscape modeling.
Geoderma I, pp. 67-89.

US EPA. (2007, July 17). Analytical Tools Interface for Landscape Assessments
(ATtILA). Environmental Protection Agency Website. Retrieved August 2009,
from http://www.epa. gov/esd/land-sci/atti1a/index.htm.

USGS. (2006, August). National Elevation Dataset. United States Geological Survey.
Retrieved March 2009, ﬁ'om http://ned.usgs.gov.

USGS. (1998, January). Standards for Digital Elevation Models. Rocky Mountain

Mapping Center. Retrieved March, 2009, from
http://rockyweb.cr.usgs.gov/nmpstds/demstds.html.

177

 

USGS. (1999, September). Standards for Digital Line Graphs. Rocky Mountain
Mapping Center. Retrieved March 2009, from
http://rmmcweb.cr.usgs.gov/nmpstds/acrodocs/dlg-3/2d1g0999.pdf.

Venteris, E.R.; Slater, B.K. (2005). A Comparison Between Contour Elevation Data
Sources for DEM Creation and Soil Carbon Prediction, Coshocton, Ohio.
Transactions in GIS 9.2, pp. 179-198.

Vogt, J .; Colombo, R.; Bertolo, F. (2003). Deriving drainage networks and catchment
boundaries: a new methodology combining digital elevation data and environmental
characteristics. Geomorphology 53, pp. 281-298.

Walker, J .P., Willgoose, GR. (1999). On the effect of digital elevation model accuracy
on hydrology and geomorphology. Water Resources Research 53.7, pp. 2259-2268.

Wilson, J .P.; Gallant, J.C. (2000). Terrain Analysis. New York: John Wiley and Sons,
Inc.

Wilson, J .P.; Lam, C.S.; Deng, Y. (2007) Comparison of the performance of ﬂow-
routing algorithms used in GIS-based hydrologic analysis. Hydrological Processes
February, pp. 1026-1044.

Wise, S. (1998). The Effect of GIS Interpolation Errors on the Use of Digital
Elevation Models in Geomorphology, In: Landform Monitoring. Modellingand
Analysis, Edited by S. N. Lane, K. S. Richards and J. H. Chandler, John Wiley and
Sons, 300 pp.

Wise, S. (2000). Assessing the quality for hydrological applications of digital elevation
models derived from contours. Hydrological Processes 14, pp. 1909-1929.

Wise, S. (2007). Effect of differing DEM creation methods on the results from a
hydrological model. Computers & Geosciences 33, pp. 1351-1365.

Wolock, D.M.; McCabe, G.J. Differences in topographic characteristics computed from
100- and lOOO-m resolution digital elevation model data. Hydrological Processes
14, pp. 987-1002.

Woolpert Inc. (2007, October 11). OSIP DEM Tiles by County — GRID Format (FGDC)
/N1440375 (ISO). OGRIP Website. Retrieved July 17, 2008, from
http://metadataexplorer.gis.state.oh.us/metadataexplorer/full_metadata.j sp?docId=
{6FABOC57-1A57-4142-BCF7-A3 8F 4D1 1 F CA3 }&loggedln=false.

Wu, S.; Li, J .; Huang, G. (2005). An evaluation of grid size uncertainty in empirical soil
loss modeling with digital elevation models. Environmental Modelling and
Assessment 10, pp. 33-42.

178

Wu, 8.; Li, J ., Huang, G. (2007). Modeling the effects of elevation data resolution on the
performance of topography-based watershed runoff simulation. Environmental
Modelling & Soﬂware 22, pp. 1250-1260.

Ziadat, RM. (2007). Effect of Contour Intervals and Grid Cell Size on the Accuracy
of DBMS and Slope Derivatives. Transactions in GIS. 11.], pp. 67-81.

179

         

      

UNIVERSITY

030

63 61

 

 

 

 

 

 

”11111111111111 11111111“
3 12 9 3 6 Jr