a... Jfimflm? 3&3. _ , . ‘ , . ‘ ‘ . . , . _ 2W6 . . . , flfi . , ‘ ‘ . 24V: . a a? _ $3.. 4.1... .. 3.2... .. .. at 5... Fa: a»). ... ‘ 5.5 Rh... :1 I| ..3_....«..§lu. Stuftbll 3:13....(‘1 u Ev! . t . . ti: o L‘ .ti 4: I. :4 «’y. .v . a. .w . . c v. II. “I ‘30 \ nu . |. bvv-a..u‘!.0n . it» I WIFE 5L 3’ if" 0 LIBRARY Michigan State Ui liversity This is to certify that the thesis entitled QUANTIFYING AND PREDICTING ERROR IN DEM- DERIVED FLOW-DIRECTION presented by GLENN ALEXANDER O'NEIL has been accepted towards fulfillment of the requirements for the MS. degree in Geggraphy A) f M . Major Professor’s Signfifire 4'17 , 20/ 0 Date MSU is an Affirmative Action/Equal Opportunity Employer —-.--.--.--.—-—n-a-.--.- OI-t-o-o-n-I-o-o-o-u-o-I-uI PLACE IN RETURN BOX to remove this checkout from your record. TO AVOID FINES return on or before date due. MAY BE RECALLED with earlier due date if requested. DATE DUE DATE DUE DATE DUE FEB 2 6 2012 0830 11 5/08 KzlProjIAccaPrelelRCIDateDue.indd QUANTIFYWG AND PREDICTIN G ERROR IN DEM-DERIVED FLOW-DIRECTION By Glenn Alexander O’Neil A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE Geography 2010 ABSTRACT QUANTIFYING AND PREDICTING ERROR IN DEM-DERIVED FLOW-DIRECTION By Glenn Alexander O’Neil In this thesis, I present a new method for quantifying errors in flow-direction rasters derived from USGS 10-meter digital elevation models (DEMs). The method utilizes flow-direction rasters derived from finer resolution 2.5-foot LiDAR DEMs as a reference dataset, and performs cell-by-cell comparisons between the two datasets to quantify error in the USGS product. I applied this method to a sample of 80 7 5-minute USGS quadrangles, stratified by land cover and topography, across Ohio. I employed quadrangle-level measures of agricultural concentration, total relief, and contour topology error as independent variables in a multiple linear regression model to predict mean flow-direction error. Copyright by GLENN ALEXANDER O’NEIL 2010 DEDICATION I dedicate this thesis to my wife Jamie and son Rowan. Though this work cannot measure my appreciation for their love and support, I would be remiss not to acknowledge my greatest sources for inspiration. iv ACKNOWLEDGMENTS I would not have been able to complete this thesis without the steadfast support and advice offered by my advisor, Ashton Shortridge. His insightful feedback helped me resolve the theoretical and technical hurdles this research presented. I would also not have been able to complete this thesis without the support of Jon Bartholic of the Institute of Water Research. His flexibility as a director provided me with the time and resources needed to generate and analyze the detailed and extensive datasets of this research. TABLE OF CONTENTS LIST OF TABLES ........................................................................................................... viii LIST OF FIGURES ........................................................................................................... ix KEY TO SYMBOLS AND ABBREVIATIONS ............................................................ xiv CHAPTER 1 INTRODUCTION ............................................................................................................... l 1.1 Statement of Problem ................................................................................................. 1 1.2 Digital Elevation Models and Flow-direction ............................................................ 3 1.3 Flow-direction Error ................................................................................................... 4 CHAPTER 2 BACKGROUND AND HYPOTHESES ............................................................................. 6 2.1 Background ................................................................................................................ 6 2.2 Hypotheses ............................................................................................................... 10 CHAPTER 3 METHODS — THE SAMPLE AND INDEPENDENT VARIABLES .............................. 13 3.1 The Sample — Stratified by Agricultural Concentration and Relief ......................... 14 3.2 Contour Topology Error ........................................................................................... 19 CHAPTER 4 METHODS - CALCULATING THE DEPENDENT VARIABLE: FLOW-DIRECTION ERROR .............................................................................................................................. 26 4.1 Software and Hardware ............................................................................................ 26 4.2 Data Acquisition ....................................................................................................... 28 4.3 IO-meter DEM Processing ....................................................................................... 29 4.4 LiDAR DEM Processing .......................................................................................... 31 4.4.1 LiDAR Stream Identification ........................................................................ 34 4.4.2 Carving Through Artificial Barriers ............................................................. 45 4.4.3 Implementing Stream Identification and Carving in LiDAR Processing ..... 48 4.4.4 Stream-burning and Sink-filling LiDAR DEMs ........................................... 50 4.5 Flow-direction Error ................................................................................................. 52 CHAPTER 5 RESULTS AND ANALYSIS ............................................................................................ 65 vi 5.1 Independent t-tests .................................................................................................... 66 5.2 Correlations .............................................................................................................. 69 5.3 Spatial Regression .................................................................................................... 71 CHAPTER 6 CONCLUSION .................................................................................................................. 80 6.1 Summary .................................................................................................................. 80 6.2 Significance .............................................................................................................. 81 6.3 Future Research ........................................................................................................ 84 APPENDICES Appendix A - Model Diagrams ..................................................................................... 88 Al Diagram Legend .............................................................................................. 88 A2 Pre-processing ................................................................................................. 89 A3 Post-processing ............................................................................................... 90 Appendix B — Python Code ............................................................................................ 91 B.1 raster_analysis.py ............................................................................................ 91 B.2 analysispreppy ................................................................................................ 95 B.3 stream_dlg_conversion.py ............................................................................. 112 B4 strearn_id_nhood.py ...................................................................................... 114 B5 stream_id_transect.py .................................................................................... 129 3.6 carve.py ......................................................................................................... 136 B7 flow_error.py ................................................................................................. 148 3.8 contour_dlg_conversion.py ........................................................................... 162 B9 contour_topology_analysis.py ....................................................................... 164 8.10 analysis_results.py ....................................................................................... 170 REFERENCES ................................................................................................................ 172 vii TABLE 3-1: TABLE 4-1: TABLE 5-1: TABLE 5-2: TABLE 5-3: TABLE 5-4: LIST OF TABLES Breakdown of quadrangles sampled by relief and % agriculture ................ 17 Quadrangles where LiDAR stream identification methods were tested ...... 39 Flow-direction error results for various sampling criteria combinations ....67 Independent t-tests between sampling criteria ............................................. 69 Correlations between flow-direction error and other variables ................... 70 Alternative linear regression models, and measures of performance .......... 79 viii LIST OF FIGURES Images in this thesis are presented in color FIGURE 3-1: Ohio 7.5 minutes quadrangles by agricultural concentration. .................. 16 FIGURE 3-2: Ohio 7.5 minute quadrangles by total relief .............................................. 17 FIGURE 3-3: Sample 7.5 minute quadrangles. ............................................................... 18 FIGURE 3-4: Sample contours from a 7.5 Minute quadrangle. Note the topological errors - dangles at l and intersections at 2-5 ..................................................................... 20 FIGURE 3-5: Corrected contours. Though it is difficult to tell, the intersecting contours have been manually separated through digitizing ............................................................. 20 FIGURE 3-6: DEM derived from topologically incorrect contours. ............................... 21 FIGURE 3—7: DEM derived from corrected contours. .................................................... 21 FIGURE 3-8: Surface flow concentrations (flow-accumulation) derived from topologically incorrect contours. ...................................................................................... 22 FIGURE 3-9: Surface flow concentrations derived fiom corrected contours. ................ 22 FIGURE 3-10: F low-direction error calculated for the topologically incorrect DEM, with the corrected DEM as the reference. Error was greatest around the intersection points. 23 FIGURE 3-11: Contours 50, 51, and 52 all have Node 100 as their ToNode, indicating an intersection at that node. .............................................................................................. 24 FIGURE 4-1: Burning streams into a DEM. ................................................................... 30 FIGURE 4-2: Deriving flow-direction with D8 (graphic source: ESRI ArcGIS 9.3 Help files) .................................................................................................................................. 30 FIGURE 4-3: LiDAR tiles selected within a 7.5 minute quadrangle for pre-processing. 32 ix FIGURE 4-4: Left - aerial photograph. Right — LiDAR DEM and DLG stream feature in blue. DLG stream vector fails to align with LiDAR's drainage ditch due to differences in positional accuracy. ...................................................................................................... 33 FIGURE 4-5: Cell positions evaluated by the Neighborhood Method. C = center, M = max, 0 = opposite max, R1 = perpendicular 1, R2 = perpendicular 2. ............................ 37 FIGURE 4-6: Transect analysis cells (in black); S = starter_cell, O = south_cell, M = min_cell, X = max_cell_south_of_min. ........................................................................... 39 FIGURE 4-7: Quadrangles where the LiDAR stream identification methods were tested. Numbers refer to Site # in Table 4-1. ............................................................................... 40 FIGURE 4-8: Site 1. Left — aerial; Left Center — LiDAR DEM; Right Center — streams identified by the Neighborhood Method; Right — by the Transect Method ...................... 40 FIGURE 4-9: Site 2, same descriptions as FIGURE 4-8. ................................................ 40 FIGURE 4-10: Site 3, same descriptions as FIGURE 4-8. .............................................. 41 FIGURE 4-11: Site 4, same descriptions as FIGURE 4-8 ................... 41 FIGURE 4-12: Site 5, same descriptions as FIGURE 4-8. .............................................. 41 FIGURE 4-13: Site 6, same descriptions as FIGURE 4-8. .............................................. 41 FIGURE 4-14: Site 7, same descriptions as FIGURE 4-8. .............................................. 42 FIGURE 4-15: Site 8, same descriptions as FIGURE 4-8. .............................................. 42 FIGURE 4-16: Site 9, same descriptions as FIGURE 4-8. .............................................. 42 FIGURE 4-17: Site 10, same descriptions as FIGURE 4-8. ............................................ 42 FIGURE 4-18: Left - rough edges of Neighborhood Method stream cells. Right — smoother edges of Transect Method. ................................................................................ 45 FIGURE 4-19: Lefi - the barrier in the DEM. Center — sinks in stream identified (in black), lower elevation found (in yellow), path charted (in blue). Right — carved DEM with elevation values incrementally lowered along path. ................................................. 46 x FIGURE 4-20: Carving freed up flow within stream features. Left - drainage ditch in pre-carved DEM. Right - carved drainage ditch. ............................................................. 47 FIGURE 4-21: Carving resolved some barriers (created north-south connection) but also introduced potential errors (N W—SE connection, N-S connection at right). ..................... 47 FIGURE 4-22: The Neighborhood Method with a minimum elevation difference of 3 feet performed well at identifying stream cells in the LiDAR DEM for areas of high relief and little to no agriculture. ................................................................................................ 49 FIGURE 4-23: A combination of the Neighborhood and Transect methods performed best in flatter areas dominated by agriculture. .................................................................. 49 FIGURE 4-24: Left - the fill process created flat features in the LiDAR DEM. Right - Flow accumulation derived from the filled LiDAR DEM's flow-direction indicate that the flat features did not dramatically alter the flow-direction. ............................................... 51 FIGURE 4-25: 3x3 neighborhood of lO-meter flow-direction cells. Cell to be analyzed for error at center. ............................................................................................................. 53 FIGURE 4-26: LiDAR flow-direction cells intersecting the selected lO-meter cell. ...... 54 FIGURE 4-27: 0% of center LiDAR cells drain through the northwest edges of the neighborhood. ................................................................................................................... 56 FIGURE 4-28: 0% of center LiDAR cells drain through the north edges of the neighborhood. ................................................................................................................... 56 FIGURE 4-29: 89% of center LiDAR cells drain through the northeast edges of the neighborhood. ................................................................................................................... 57 FIGURE 4-30: 0% of center LiDAR cells drain through the east edges of the neighborhood. ................................................................................................................... 57 FIGURE 4-31: 0% of center LiDAR cells drain through the southeast edges of the neighborhood. ................................................................................................................... 57 FIGURE 4-32: 0% of center LiDAR cells drain through the south edges of the neighborhood. ................................................................................................................... 58 xi FIGURE 4-33: 0% of center LiDAR cells drain through the southwest edges of the neighborhood. ................................................................................................................... 58 FIGURE 4-34: 11% percent of center LiDAR cells drain through the west edges of the neighborhood. ................................................................................................................... 58 FIGURE 4-35: Potential flow error scores for assessment of lO-meter cell that flows southwest ........................................................................................................................... 60 FIGURE 4-36: Flow-direction error map for a single 7.5 minute quadrangle. Blue grid lines are the result of avoiding edge issues in each LiDAR tile. ...................................... 62 FIGURE 4-37: For the selected lO-meter cell, estimated flow-direction is to the south. 63 FIGURE 4-38: For the selected lO-meter cell, estimated LiDAR flow-direction trends mainly east and northeast. ................................................................................................. 63 FIGURE 4-39: The flow-direction error is considered high, with a value of 2.98. ......... 63 FIGURE 4-40: Flow-direction error map with flat cells discarded. ................................ 64 FIGURE 5-1: Spatial distribution of flow-direction error amongst the sampled quadrangles. ...................................................................................................................... 66 FIGURE 5-2: Model 1 - initial linear regression model of flow-direction error. ............ 71 FIGURE 5-3: Histograms of regression variables from Model 1. ................................... 72 FIGURE 5-4: Model 1 diagnostics. ................................................................................. 73 FIGURE 5-5: Mapped residuals of Model 1 .................................................................... 74 FIGURE 5-6: Residuals of Model 1, greater residual error of higher fitted values indicates potential heteroscedasticity. ............................................................................... 74 FIGURE 5-7: Residuals of Model 1, corrected for spatial autocorrelation, no apparent heteroscedasticity. ............................................................................................................. 76 FIGURE 5-8: Mapped residuals, corrected for spatial autocorrelation. .......................... 76 xii FIGURE 5-9: Model 1 diagnostics, corrected for residual spatial autocorrelation. ........ 77 FIGURE A-l: Processing Diagrams - Diagram Legend .................................................. 88 FIGURE A-2: Processing Diagrams - Pre-processing Steps. .......................................... 89 FIGURE A-3: Processing Diagrams - Post-processing Steps .......................................... 9O xiii KEY TO SYMBOLS AND ABBREVIATIONS ASCII — American Standard Code for Information Interchange ATtILA — Analytical Tools Interface for Landscape Assessments D8 — single direction flow-direction algorithm DEM — digital elevation model DLG - digital line graph EPA — Environmental Protection Agency ESRI — Environmental Systems Research Institute F DE -— F low-direction error GDAL — Geospatial Data Abstraction Library GIS — geographical information systems/science HRHA — high relief high agricultural concentration HRLA — high relief low agricultural concentration LiDAR -— Light Detection and Ranging LRHA — low relief high agricultural concentration LRLA — low relief low agricultural concentration NED — National Elevation Dataset NHD — National Hydrography Dataset NLCD — National Land Cover Dataset OGRIP — Ohio Geographically Referenced lnforrnation Program OSIP — Ohio State-wide Imagery Program RMSE — root mean square error xiv SAR — simultaneous auto-regression USGS — United States Geological Survey UTM — Universal Transverse Mercator XV Chapter 1 Introduction 1.1 Statement of Problem In the field of hydrology, digital elevation models (DEMs) are frequently used to predict surface water flow direction and derive digital stream networks. The accuracy of these predictions and networks relies on the fidelity of the source DEMs. In flatter areas inaccuracies in a DEM can be more pronounced, particularly if the interpolation of contours with whole number values (e. g. 315 versus 315.7 feet) was employed to build the DEM, which is the standard of most DEMs produced by the United States Geological Survey (USGS) (Carter 1988, Carter 1992, Wilson and Gallant 2000). Agricultural areas are also prone to DEM inaccuracy. The errors in both flat and agricultural areas are typically due to issues with the source contours that yield a DEM. In flat areas, sparse contours provide too little information to the interpolating algorithm that produces the DEM. In agricultural areas, the source contours can contain topological errors, such as intersections, around drainage ditches that confuse the interpolation algorithm. The results are digital surfaces that do not truly reflect the Earth’s surface and therefore inaccurately predict surface water flow. These errors can pose a significant problem for researchers and conservationists who rely on models of surface water flow-direCtion, such as soil conservationists concerned with water-borne soil erosion and nutrient runoff from agricultural lands into streams and lakes. If they cannot reference accurate predictions of how water flows over the surface, then efforts to identify and remediate sedimentation-prone areas will be misguided and waste valuable time and resources. There is a rich literature documenting absolute elevation error in DEMs, but research assessing error in DEM-derived flow-direction is lacking. In this thesis I will help fill this gap by exploring the following questions: 1. How can finer-resolution DEMs be employed to evaluate flow-direction error in coarser DEMs? 2. What are the primary challenges in such an evaluation? 3. What landscape characteristics best predict flow-direction error? To address these questions, I developed a method for calculating flow-direction error, and implemented it across a large sample of USGS 7.5-minute quadrangles in Ohio. I then conducted a multivariate analysis to predict flow-direction error in terms of landscape characteristics, accounting for residual spatial autocorrelation. This research contributes to the understanding of uncertainty in digital elevation products, particularly how error propagates into flow-direction. The methods I developed also aid in the analysis of high-resolution DEMs, specifically in the extraction of hydrologic features from such data. At a more practical level, these results inform hydrological analyses in engineering, agriculture, and any other disciplines that rely on simulations of surface water flow. For example, with a better understanding of erosion model uncertainty, soil conservationists can more effectively target efforts to reduce sediment loading to streams and lakes. The remainder of this introduction will preview the datasets and methods employed in this research and layout the organization of the thesis. 1.2 Digital Elevation Models and Flow-direction A DEM is a raster (grid) dataset of elevation values sampled at regularly spaced intervals (resolution). They can be produced in a number of ways, but most published products have been generated through interpolation from topographic sheet contours, remote sensing from airplanes or satellites, or stereoscopic interpretation of aerial photographs. For this thesis, I evaluated lO-meter resolution USGS DEMs, produced through contour interpolation, with 2.5-foot resolution LiDAR (Light Detection and Ranging) DEMs, produced through remote sensing. The USGS DEMs were interpolated from contours of USGS topographic maps at a scale of 1:24,000. The LiDAR DEMs were interpolated from points captured by an airplane-mounted sensor at an altitude of 7,300 feet above ground level. The USGS is the leading source for DEMs in the US. The freely accessible National Elevation Dataset (NED) includes 30-meter resolution DEMs for the contiguous US, lO-meter DEMs for the majority of the nation, and finer resolutions in select locations. Not all of the DEMs in the NED were produced by the USGS. The NED is described as “the best available raster elevation data for the coterrninous United States,” (U SGS 2006). If DEMs meeting National Map Accuracy Standards and of resolutions finer than those produced by the USGS exist, they may be included in the NED. This situation applied to the study site of this thesis. Ohio’s 2.5-foot resolution LiDAR DEMs are accessible through the NED, even though they were generated for the state by a private contractor. The organization of DEMs as neighborhoods of elevation values allows for the calculation of numerous derivative datasets, including slope, aspect, line of sight, and surface water flow-direction. This last derivative is the horizontal direction of surface water flow out of a particular area. In the case of flow-direction derived from a DEM, it is calculated for each grid cell in the original DEM, and can be determined through multiple methods. Some approaches represent flow-direction as an angle and disperse it in multiple directions (Quinn et a1. 1991, Freeman 1991 , Tarboton 1997). Other approaches are more restrictive and represent it as a single value assigned to one of eight compass directions (O’Callaghan and. Mark 1984, Greenlee 1987, Jenson and Domingue 1988). Nearly all the methods are driven by calculating slope angles between a particular cell and its surrounding neighbors. For this research, I employed a single-direction method limited by the eight compass directions. I will discuss this selection in greater detail in Chapter 5. 1.3 Flow-direction Error I developed a novel method to calculate the difference in flow-direction between lO-meter USGS DBMS and 25-foot LiDAR DEMs. This method compared a flow- direction value of a USGS DEM cell to an aggregated value from LiDAR DEM cells contained within the coarser USGS cell. I coded the comparison into a weighted error score based on the extent of agreement (or disagreement) between the two values. I then determined an error score for the majority of USGS DEM cells in eighty 7.5-minute quadrangles in Ohio (roughly 1 million cells per quadrangle). I then calculated the mean flow-direction error for each quadrangle, and used this statistic as the dependent variable in a multiple linear regression model. I sought to explain how flow-direction error varied in relation to changes in total relief, agricultural concentration, and contour topology error. The next chapter will discuss previous efforts to evaluate DBMS and their derivatives, and establish this thesis as a new and unique contribution to that effort. Chapter 3 will describe how I gathered and processed the independent variables of the proposed multiple linear regression model. Chapter 4 will describe the pre-processing steps of the USGS lO-meter and LiDAR DEMs, including a new approach for stream extraction from high-resolution digital elevation data, and the calculation of flow- direction error. Chapter 5 will describe statistical measures of how flow-direction error varied across different landscapes, provide an evaluation of the proposed and alternative regression models, and present an analysis of the results. Chapter 6 will summarize the findings, discuss their significance, and describe additional research needed to extend this effort. Chapter 2 Background and Hypotheses 2.1 Background Due to the numerous applications that DEMs support, their ubiquity, and the fact that they are freely accessible, DEM error has been a well studied topic within geographic information science. Wilson and Gallant (2000) and Fisher and Tate (2006) provide thorough overviews of existing research on this topic. The USGS classifies error in its DEMs as either blunders (major elevation discrepancies), systematic (errors that follow a fixed pattern and are predictable), and random (beyond the control of the observer). It acknowledges that the errors cannot be completely resolved (USGS 1998). In addition to these errors, numerous studies have shown error in DBMS and their derivatives to be spatially autocorrelated (Goodchild et al. 1992, Fisher 1993, Heuvelink 1998, Oksanen and Sarjakoski 2005). To inform users of the uncertainty in the data, the USGS publishes each DEM with estimates of its root—mean-square error (RMSE), a measure of vertical accuracy. RMSE is calculated by comparing a minimum of 28 ground-truth elevations to those of the DEM. USGS standards for most DEMs hold RMSE to no more than one-half of a contour interval (U SGS 1998). Most of the contour datasets used to produce USGS DEMs have intervals of 5 or 10 feet, implying maximum RMSE values of 2.5 or five feet. Fisher (1998) argued that in addition to RMSE, DEMs should be published with measures of spatial autocorrelation to allow users to appropriately visualize and simulate the error and its propagation into derivative products. Knowing the RMSE for a DEM is useful to researchers concerned with absolute elevation accuracy, such as Carson et a1. (1997) and Shortridge (2006), to name a few. However, absolute elevation accuracy is not necessarily informative to users studying DEM derivative products such as slope, curvature, catchment basins, aspect, and flow- direction. Even DEMs with low RMSEs can yield significantly variable derivatives (Garbrecht and Starks 1995, Wise 1998, Holmes et a1. 2000, Endreny and Wood 2001 , Barber and Shortridge 2005). Elevations for DEMs of particularly flat regions of Ohio can have value ranges less than 50 feet over areas as large as 64 square miles. These estimated elevations may be well within the RMSE standards; but derivative products dependent upon an elevation’s relationship to neighboring values, such as slope, can be greatly affected by even subtle changes in elevation. Martinoni and Bernhard (1998) argued that the accuracy of DEM derivatives warrant greater attention. Ziadat (2007) studied the impact of cell-size and contour interval on DEM-derived slope, in addition to absolute elevation values. Venteris and Slater (2005) evaluated multiple DEM derivatives in terms of their ability to predict soil carbon. Wu et a1. (2005, 2007) evaluated the effects of DEM resolution on the estimation of soil erosion. Riggs and Dean (2007) explored errors in DEM-derived view-sheds. Desmet (1997) evaluated differences in slope and aspect across different interpolation methods, through mean and standard deviation comparisons, on small, arable fields in Belgium. Chang and Tsai (1991) studied slope and aspect differences between different DEM resolutions in a more spatially explicit manner, on a 1 square kilometer region of Taiwan’s eastern coast. Other studies have looked at DEM predictions of hydrology. Endreny and Wood (2001) studied differences between flow-direction methods in terms of predicted flow- paths on three 64 hectare study sites in western Oklahoma. They compared simulated flow-path networks to control networks and recorded the percent of overlap. Walker and Willgoose (1999) demonstrated the inaccuracies in derivatives of published DEMs through quantitative comparisons of up-stream flow-accumulation networks, slope-area relationship, and normalized width function within a 1.4 square kilometer surveyed reference dataset in southwest Australia. Wise (2000) similarly evaluated DEMs through catchment basins and up-stream flow accumulation, though he focused on a small catchment in southwest England. Kienzle (2004) explored actual directional changes in flow measured in degrees on four study sites (roughly 6 square miles each) in Canada. Wise (2007), Wilson et a1. (2007), Barber and Shortridge (2005), and Thompson et al. (2001) studied DEM-derived surface flow through comparisons to derived catchment boundaries. Clarke and Lee (2007) evaluated DEM estimates of surface flow through a comparison to known stream features in the National Hydrography Dataset (N HD). While these analyses have demonstrated the propagation of DEM error into hydrological derivatives, gaps remain in fully documenting this issue. The previous research has either focused on relatively small and, in some cases, homogenous study sites, or employed coarse metrics. To better describe error in these derivatives and identify landscape characteristics that potentially cause it, an analysis on a large and diverse study area is needed. Such an effort would generate findings that could be better generalized to other locations than the findings of previous research. Furthermore, while evaluating simulated catchment boundaries is an accepted method of measuring error in DEM, it does not capture field-level variation in DEM-derived surface water flow- direction. To understand how errors in DEM hydrological derivatives differ at the finest scales, evaluations must be made on a cell-by-cell basis. Though Clark and Lee (2007) approach this specificity in their comparison of DEM-derived flow to NHD features, their approach would not allow for a direct comparison where stream features do not exist, such as the middle of a farm field. For this thesis, I sought to quantify and predict error in flow-direction rasters derived from USGS lO-meter DEMs (RMSE 12.45 feet —— Ohio Office of Information Technology 2005) by comparing them to flow-direction rasters derived from finer, more vertically and horizontally accurate LiDAR rasters (RMSE 0.5 feet -Woolpert Inc. 2008). The approach of evaluating less precise spatial data with more precise data has been utilized before (Chang and Tsai, 1991; Fisher, 1998; Kyriakidis et al., 1999; Walker and Willgoose, 1999; Holmes et al., 2000; Kienzle, 2004) and is endorsed by the USGS (Digital Cartographic Data Standards Task Force, 1988). Numerous studies have explored performance differences between DEMs of different resolutions and found finer resolutions superior, depending on the application (Chang and Tsai, 1991; Wolock and McCabe, 2000; Wu et al., 2005; Mouton, 2005; Wu et al., 2007; Deng et al., 2007; Aziz and Steward, 2007; Riggs and Dean, 2007). My study site comprised 80 USGS 7.5- minute quadrangles (over 5,000 square miles) of varying topography and land cover in Ohio. By focusing at such a fine scale over such a large and diverse study area, I sought to evaluate DEM error in a novel and insightful manner. 2.2 Hypotheses I expected areas with low relief, high concentrations of agriculture, and represented by DEMs produced from contours that contained large numbers of topological errors (intersections) to produce higher values of flow-direction error. More formally: Equation: FDE = a + BpAPA + BTRTR + BCICI Where: F DE = flow-direction error PA = % agriculture TR = total relief CI = contour intersections Hypotheses: H0: BPA = 0 Ha: BPA > 0 H0: BTR = 0 Ha: BTR < 0 H0: BCI = 0 Ha: BCI > 0 Figure 2-1: Proposed multiple linear regression model of flow-direction error. As established earlier, derivatives from relatively flat DEMs are highly sensitive to propagated error. Digital contour datasets are a common source for creating DEMs (F lorinsky, 1998), including all of the Ohio lO-meter DEMs employed in this research (Ohio Office of Information Technology 2005). Flat areas tend to have fewer and sparser contours as input to the interpolation algorithms that produce DEMs. The wider the geographic space between contours, the greater the uncertainty in the characterization of 10 surface water flow over that space. The interpolation process is known to introduce errors into the resulting DEMs, sometimes significantly (F aintich 1996, Carson and Reutebuch, 1997, Carrara et a1. 1997, Meyer 2004). The presence of agricultural drainage ditches can add to this error. These features tend to concentrate a flat area’s relatively few contours around them and create topological errors (intersections, specifically) in the contours, confusing the DEM interpolation algorithm. These errors are propagated into the DBMS and their derivative products. Theoretically, contours should never cross; but when digital contours are derived from scanning paper or mylar maps, such as USGS 7.5-minute digital line graph (DLG) hypsography, artifacts in the source map or scanning resolution can cause dense contours to merge together and create a topological violation (Greenlee, 1987). Additionally, cartographers may force contours to merge to indicate steep features such as cliffs, identified as carrying contours in the DLG attributes. The USGS standards for DLGs require that intersecting contours be identified through node topology (described in greater detail in Chapter 3) (U SGS, 1999). When these topologically errant contours are interpolated to yield a DEM, the elevation values around an area of intersection can be inaccurately estimated. For example, if a 700 and 710 foot contour intersect, the DEM interpolation algorithm would have difficulty accurately resolving the true surface at that location. The 700 foot contour would, in essence, be exposed to the area that cartographically represents the 710 to 720 foot elevation range, and would subsequently draw down surrounding values in the resulting DEM. In most cases, the resulting error in terms of absolute elevation values in this area would be small, as the error would be 11 mitigated by the neighboring cells that factored into the interpolation. Nevertheless, some DEM derivative products, such as surface water flow-direction, are more sensitive to this error, as discussed earlier. Given the effect that low relief, agricultural ditches, and contour intersections can have on error in interpolated DEMs, I expected that their impacts would be more substantial on the particularly error-sensitive flow-direction derivative. This expectation drove the design my hypothesis. The next chapter provides a detailed description of how the 80 quadrangles were sampled and processed to extract the dependent variables for the proposed regression model in Figure 2-1. 12 Chapter 3 Methods -— The Sample and Independent Variables The general steps I took to analyze flow-direction error in this research were as follows: 1. Select a sample of USGS 7.5-minute quadrangles, stratified by total relief and agricultural concentration. Gather USGS 10-meter DEMs, 2.5-foot LiDAR DEMs, DLG hypsography, and DLG hydrography for each quadrangle. Process the lO-meter and LiDAR DEMs by enforcing hydrography presence, filling spurious sinks, and calculating flow-direction rasters for each quadrangle. Calculate the extent of contour intersection for DLG hypsography in each quadrangle. Quantify mean flow-direction error for each quadrangle through cell-by-cell comparison of lO-meter and LiDAR flow-direction rasters. Generate a multiple linear regression model predicting flow-direction error in terms of total relief, agricultural concentration, and contour intersection. I calculated flow-direction error as a single mean value for each quadrangle in a sample of 80 USGS 7.5-minute (1 :24,000) quadrangles. I will detail this calculation in the next chapter, but for now I establish it as a single numeric value that will serve as the dependent variable in a multiple linear regression model. This chapter focuses on the sampling and pre-processing of the proposed regression model’s independent variables: percentage agriculture, relief, and contour intersections (steps 1, 2, and 4, of the process described above). 13 3.1 The Sample - Stratified by Agricultural Concentration and Relief This research focused exclusively on the state of Ohio. The state was an ideal study location because it contained the variability in land cover and topography through which I attempted to explain flow-direction error. Additionally, the Ohio Geographically Referenced Information Program (OGRIP - http://ogrip.oit.ohio.gov/Home.aspx) provided nearly all of the data needed to conduct the analysis for free. The OGRIP website offered USGS lO-meter DEMs, LiDAR point elevations, interpolated LiDAR DEMs, USGS 7.5-minute DLG hypsography (contours) and DLG hydrography (streams). The USGS 7.5-minute quadrangle served as the study’s sample unit, as this is the spatial scale and extent at which USGS DEMs are typically generated, including all of the Ohio lO-meter DEMs provided through the NED. Ohio is covered by 793 quadrangles. Due to the intensive pre-processing needed to calculate flow-direction error (detailed in Chapter 4), I selected a sub-sample of 80 quadrangles. To avoid oversampling the proposed regression model’s independent variables, I stratified the samples by their respective total relief and percentages of row-crop agriculture. I calculated these attributes for each quadrangle with the Analytical Tools Interface for Landscape Assessment (ATtILA) tool from EPA (US EPA, 2007), an extension for ESRI’S ArcView 3.3. I provided ATtILA with the quadrangle boundaries for Ohio, a lO-meter DEM for the entire state (described in greater detail in Chapter 4), and a state-wide land cover 14 raster. I downloaded the 2001 National Land Cover Database (NLCD) from the website of the Multi-Resolution Land Characteristics Consortium Orttp://www.mrlc.gov/nlcd.php). The NLCD was derived from 30-meter resolution Landsat data and reported an overall accuracy of 84% (Homer et al., 2007). I obtained a copy of the dataset that covered the entire Midwest region and clipped it by Ohio’s state boundary. With these state-wide inputs of elevation and land cover, ATtILA was able to output the elevation range and the percentage of various land cover classes in each quadrangle, including row-crop agriculture (2001 NLCD class 82) (Figures 3-1 and 3-2). I then utilized these quadrangle attributes as independent variables in the proposed regression model (Figure 2-1). Next, in order to avoid potential issues in deriving accurate hydrology from sampled DEMs, I removed quadrangles that contained large amounts of open water (e. g., quadrangles on the Lake Erie border), quadrangles where more than 10% of the area was classified as urban (because it is difficult to predict surface water flow in downtown Cleveland), quadrangles that straddled the state border (because the LiDAR tiles would not cover the entire areas of these quadrangles), and quadrangles that straddled the boundary of the Ohio State Plane Northern and Southern coordinate systems. OGRIP distributes the LiDAR DEMs in appropriate State Plane projections. It proved to be programmatically difficult to analyze a quadrangle that contained LiDAR data in two different projections. Since LiDAR rasters served as the more accurate reference dataset, it was important to not introduce additional error into them; therefore, I avoided re- projecting between State Plane zones. This filtering process reduced the quadrangle 15 population from 793 to 538. I then divided the 538 quadrangles into the following four bins: high percentage agriculture and high relief (HAHR), high percentage agriculture and low relief (HALR), low percentage agriculture and high relief (LAHR), low percentage agriculture and low relief (LALR). The breakpoints for these bins corresponded to the median values for each class, classifying half of the quadrangles as high agricultural concentration and half as low; they were similarly classified by relief. I then randomly selected 20 quadrangles from each of these bins in order to yield the final sample of 80 quadrangles (Table 3-1 and Figure 3-3). Percentage Row-crop Agriculture (quartiles) CT 0 - 7.5 Er] 7.5 - 30.0 - 30.0 - 71.5 - 71.5 - 93.0 Figure 3-1: Ohio 7.5-minute quadrangles by agricultural concentration. Elevation Range (feet - quartiles) [:f} 9- 169 [j 170 - 338 - 339 - 463 - 464 - 886 Figure 3—2: Ohio 7.5-minute quadrangles by total relief. . Relief Quadrangle Samplmg Totals High ( > 327feet) Low (< 327feel) High ( > 67.8% ) 20 out of 37 20 out of 232 40 out of 269 % Agriculture Law ( < 67.8% ) 20 out of 232 20 out of 37 40 out of 269 Totals 40 out of 269 40 out of 269 80 out of 538 Table 3-12 Breakdown of quadrangles sampled by relief and % agriculture. 17 Sample Quadrangle Cl Figure 3-3: Sample 7.5-minute quadrangles. As shown in Figure 3-3, the sampled quadrangles tended to cluster along Ohio’s SW-NE axis. This trend was a byproduct of the sample stratification. Northwest Ohio is dominated by agriculture and low relief, so the 20 quadrangles that constituted this sub- sarnple had a large geographic area from which they could have been selected. The situation was similar for the high relief and low percentage agriculture sub-sample of southeast Ohio. These two situations naturally clustered the remaining half of the sample along the southwest-northeast axis. 3.2 Contour Topology Error As established in Chapter 2, contour intersections are suspected of contributing to flow-direction error. Figures 3-4 through 3-10 illustrate the impact that these errors of topology can have on flow-direction. Figure 3-4 identifies topological errors in a small section of a USGS hypsography DLG (Alma, MI). Those errors were corrected in Figure 3-5. Figure 3-6 shows a DEM interpolated from the errant contours of Figure 3-4; note the large depression created at location 4 of Figure 3-4 and its impact on flow concentration in Figure 3-8. Figure 3-7 shows a DEM interpolated from the corrected contours; note the difference in flow concentration in Figure 3-9. Figure 3—10 maps the difference in flow-directions between the two DEMs (see section 4.5 for details). The blue areas indicate agreement, while the red areas indicate dramatic disagreement (180 degrees) and are found in the areas where contour intersections were identified in Figure 3-4. 19 Figure 3-4: Sample contours from a 7.5-minute quadrangle. Note the topological errors - dangles at l and intersections at 2-5. W/ U f\ Figure 3-5: Corrected contours. Though it is difficult to tell, the intersecting contours have been manually separated through digitizing. 20 Figure 3-6: DEM derived from topologically incorrect contours. Figure 3-7: DEM derived from corrected contours. 21 Figure 3-8: Surface flow concentrations (flow-accumulation) derived from topologically incorrect contours. Figure 3-9: Surface flow concentrations derived from corrected contours. 22 Flow-direction Ermr a Severe (4) a High (3) :1 Moderate (2) a how (I) - None (0) Figure 3-10: Flow-direction error calculated for the topologically incorrect DEM, with the corrected DEM as the reference. Error was greatest around the intersection points. (See Figure 4-35 for legend reference) I developed a Python script to determine the number of contour intersections in DLG hypsography (the source for the lO-meter DEMs) in each sample quadrangle. The script converted hypsography DLGs to ArcInfo coverages. This conversion allowed for the building of line and node topology for each new coverage. The topology included FromNode and ToNode attributes for each contour feature, which constituted a logic within the contours by which intersections could be identified. If multiple features shared the same FromNode or ToNode, this indicated the presence of an intersection (Figure 3- 11). The Python script employed this logic, iterating through the contours within each sample quadrangle, identifying the intersections, and recording an intersection count for each quadrangle. A quadrangle with high relief, and therefore many contours, could 23 potentially have more intersections than a flat quadrangle with comparatively fewer contours. However, the impact of the intersections on flow-directions would likely be more dramatic in the flatter quadrangle since there are fewer “correct” contours to mitigate the error (location 4 in Figure 3-4). In the high relief quadrangle, the intersections would likely occur in an area dense with “correct” contours that would confine any of the errors introduced into flow-direction to a small area. To properly associate contour intersections with potential flow-direction error, the script calculated the number of intersections per contour in each quadrangle. \K/,// \ \ \.—-’ f-J Contom' ID: 53 //// l FElevation: 750 y/”\ romNode = 100 i/ ’ ‘ FToNode= 101 /i// \ / V/ \ \ / lContour ID= 50| \\\ f \i I Elevation= 750 / f 1 , ‘iFromNodc= 99 // \ ode ID— __ 101 ToNode — 100 Node ID— — 100 \\. L_ _ C 51 l ContourlD—52 l ontour ID 5 Elevation: 755 } Elevation = 760 FromNode= 98 l FromNode = 97 {/4 ToNodc = 100 \ ToNode = 100 l" ,3 I/ / / / Figure 3-11: Contours 50, 51, and 52 all have Node 100 as their ToNode, indicating an intersection at that node. 24 In the next chapter, I will describe the steps I took to process the USGS and LiDAR DBMS, and detail how I calculated flow-direction error. 25 Chapter 4 Methods — Calculating the Dependent Variable: Flow-direction error The previous chapter covered how I selected the study’s 80 sample quadrangles, and calculated the dependent variables of the proposed regression model. In this chapter I discuss how I calculated flow-direction error. I first describe the software and hardware used in that calculation. I then describe the data sources of the calculation’s inputs, the pre-processing of those inputs, and the specific steps for calculating flow-direction error. The process I implemented for this thesis, outside of its design and programming, required a substantial quantity of person-hours and computer-hours. It took one month to complete the analysis for the selected sample (80 7.5-minute quadrangles), with six machines processing data the entire time. The basic steps were as follows: 1, pre- process the 10-meter DEMs; 2, pre-process the LiDAR DEMs; 3, calculate the flow- direction error in the 10-meter DEMs; 4, conduct the statistical analysis. 4.1 Software and Hardware I performed the data processing almost entirely within ArcGIS 9.3 and the Python programming language. ArcGIS handled all of the data pre-processing including clipping, re-projecting, and map algebra functions. It also performed much of the hydrologic processing of DEMs, including the filling of spurious sinks and calculating surface water flow-directions. Python is a flexible and open-source scripting language that interfaces with ArcGIS’s geo-processor. This integration allowed me to automate 26 much of the ArcGIS pre-processing functions through Python scripts. I employed Python outside of ArcGIS to perform some of the pre-processing and all of the data post- processing. Its list data structure is well-suited for analyzing raster datasets, such as LIDAR DEMs, as ASCII text files. It is also easy to create text files with Python in order to export analysis results and convert them back to rasters for further analysis in ArcGIS. Python’s flexibility enables it to easily incorporate 3rd party tools, such as the Geospatial Data Abstraction Library (GDAL - http://www.gdal.org/), an open source project for raster analysis. Utilizing the list data structure and GDAL raster analysis tools, I developed custom Python scripts to identify stream locations, carve through artificial barriers in LiDAR DBMS, and calculate flow-direction error. I have included all of the Python scripts used in this research in Appendix B. The automated data pre-processing and post-processing was conducted on multiple machines. Four desktop PCs of varying capabilities (from Pentium 4 with 1GB of RAM to Dual Core with 3GB) running Windows XP handled the pre—processing. On average, each of the 80 sample quadrangles took about 12 hours to pre-process. This entailed re-projecting inputs and clipping the state-wide lO-meter DEM by quadrangle boundaries. It also entailed burning stream locations into, filling, and calculating flow- direction for the 10-meter DEMs. The LiDAR pre-processing took up the significant majority of the processing time. The stream identification and carving of LiDAR DEMs took probably 95 — 99% of the total pre-processing time for each quadrangle. Three additional machines handled the post-processing (calculation of flow-direction error). Since I scripted the post-processing step entirely in Python (outside of ArcGIS) it could be performed on Linux machines. Two older PCs (a Pentium III and a Celeron, each 27 with 512 MB of RAM) installed with Linux (Ubuntu Jaunty 9.03) calculated flow- direction error at an average rate of four hours per quadrangle. A Lenovo Thinkpad (Dual-core with 3GB of RAM) also aided in the post-processing, at a rate of 1 hour per quadrangle. 4.2 Data Acquisition I generated lO-meter DEMs for each sample quadrangle by clipping a mosaicked state-wide DEM by the respective quadrangle boundaries. The state-wide DEM was obtained from OGRIP, and is the same data accessible through the NED. The statewide DEM was created by OGRIP in piecemeal by merging 20-30 1224,000 7.5-minute DLG hypsography ESRI shapefiles into a larger contour GIS layer, using a finite difference interpolation technique through ESRI ArcInfo’s TopoToRaster command to convert the contours to a DEM, and finally mosaicking the new DEMs into a state-wide dataset (Ohio Office of Information Technology 2005). I downloaded LiDAR 2.5-foot DEMs from OGRIP’s OSIP (Ohio Statewide Imagery Program) site in zipped county-wide files (over 400GB). Each zipped file contained multiple 5,000 x 5,000 foot DEM tiles covering the particular county’s extent. The naming scheme for the tiles was based on a tile’s lower-left X and Y coordinate in the Ohio State Plane coordinate system. For example, the DEM for tile 722180445 had its lower left coordinate at 218000, 445000 (X,Y) of the Ohio State Plane North coordinate system. I utilized this naming scheme to identify the LiDAR tiles that intersected each sample quadrangle. These DEMs were derived from LiDAR point data (in LAS format — 28 a standard format for LiDAR storage) gathered mainly during leaf-off conditions in spring 2006 by Woolpert Inc. for the State of Ohio (Woolpert, 2007). I downloaded stream data, quadrangle boundaries, and hypsography from OGRIP in the form of 1:24,000 DLGs for each sample quadrangle. I then wrote custom Python scripts to convert these DLGs to ESRI shapefiles for use in the analysis. 4.3 10-meter DEM Pre—processing Pre-processing took place one sample quadrangle at a time. To generate a flow- direction raster from a lO-meter DEM for a selected quadrangle I followed steps typically taken in hydrological GIS analyses to ensure proper surface water drainage. Once the state-wide lO-meter DEM had been clipped by the boundary of a sampled quadrangle, I set the analysis mask for the subsequent lO-meter processing to the clipped DEM; this ensured that all IO-meter outputs aligned spatially. Next, I re-projected the DLG stream vector dataset from a UTM projection to the appropriate Ohio State Plane Projection (either Ohio North - International Feet, or Ohio South — International Feet, depending on the sampled quadrangle) and converted it to a binary raster (stream cell = 1, all others = O). I then burned the resulting stream raster into the DEM by raising all non-stream cell elevations in the DEM by 10 feet (Figure 4-1). Stream-burning can improve a DEM’s ability to characterize the flow-direction of surface water by emphasizing the locations of streams (Hutchinson, 1989; Saunders, 1999). Next, the ArcGIS Fill command removed potential spurious sinks in the DEM. Sinks can appear in DEM data as artifacts of the production method and confound algorithms designed to route surface-water flow over 29 the landscape (Hutchison, 1989; Tarboton et al., 1991). Finally, the ArcGIS Flow- direction command created a flow-direction raster for the 10-meter DEM. The Flow- direction tool determines the direction of surface-water flow for each cell by identifying the cell in a 3x3 neighborhood with the steepest descent from the center cell. The result is a numeric value for the center cell corresponding to the direction of this steepest neighbor (Figure 4-2). This method is a single-flow algorithm, it cannot partition flow among multiple directions. It is commonly referred to as D8 (8 possible directions) (O’Callaghan and Mark, 1984; Greenlee, 1987; Jenson and Domingue, 1988). Figure 4—1: Burning streams into a DEM. streams Elevation ' Higher Elevations 78 72 69 71 58 49 74 67 56 49 46 50 69 53 44 37 38 48 64 58 55 22 31 24 68 61 47 21 16 19 74 53 34 12 11 12 Figure 4-2: Deriving flow-direction with D8 (graphic source: ESRI ArcGIS 9.3 Help files) While D8 has been shown to perform adequately in some studies (Skidmore 1989, Mouton 2007), it has been frequently maligned in literature as being too rigid and prone to introducing artifacts into the data, including long runs of parallel water flows (Wilson 30 et al. 2007, Raaflaub and Collins 2006, Venteris and Slater 2005, Schmidt and Persson 2003, Burrough et al. 2000, Jones 1998, Mitasova et al. 1995). However, for this analysis it was the logical choice for several reasons. First, the calculation of upstream flow accumulations (i.e., the number of cells flowing into a particular cell) in the LiDAR DEMs was essential for the comparison to the lO-meter DEM, and required a single- value flow-direction raster as input. Second, and most importantly, the D8 algorithm is the only available flow-direction tool in the ArcGIS Toolbox. Since ESRI is the world- leader in GIS market share (Daratech Inc., 2009), D8 is, by default, the most common flow-direction algorithm implemented in desktop GIS installations today, including the one used for this research. This popularity of D8, though arguably undeserved, makes the results of this research more applicable and relevant to the majority of current GIS USCI'S. 4.4 LiDAR DEM Processing The pre-processing of LiDAR DEMs was the most time-consurning, in terms of person-hours and machine-hours, and complex piece of the research. The two primary challenges were developing a means to burn streams into the LiDAR DBMS and carving through artificial barriers that impede surface-water flow. Only after these issues were resolved could an adequate LiDAR flow-direction raster be generated and used to evaluate a lO-meter flow-direction raster. The first step in pre-processing LiDAR DEMs for a sampled quadrangle was to identify which LiDAR tiles intersected the quadrangle. Within Python, I was able to use 31 the bounding Ohio State Plane coordinates of each 7.5-minute quadrangle and the coordinate-based nomenclature of the LiDAR tiles to determine which ones were contained within the quadrangle’s extent. To avoid edge issues, I excluded tiles that overlapped the quadrangle boundaries. This choice sometimes led to a selection of 40 LiDAR tiles, but the majority of the time it selected 48 tiles (Figure 4-3). l D 7.5 Minute Quad E Selected LiDAR Tiles LiDAR Tiles Figure 4-3: LiDAR tiles selected within a 7.5 minute quadrangle for pre-processing. Once the LiDAR tiles had been identified, Python scripts began the process of deriving flow-direction rasters for each tile. Generally, the procedure to accomplish this was the same as that for the 10-meter DEMs. The code burned streams into a DEM to enforce appropriate surface-water drainage, filled spurious sinks, and calculated flow- direction with the D8 algorithm. However, LiDAR’s positional accuracy and means of acquisition made stream burning difficult. The LiDAR DEM’s map scales ranged from 1:100 to 1:1,000, far superior to the 124,000 map scales for the DLG hydrography. Though both datasets were developed to adhere to National Map Accuracy Standards, the difference in dataset map scales meant that the positions of the LiDAR elevation values 32 were more accurate than the positions of the DLG stream features. The LiDAR positional accuracy varied little about the 7-foot postings of the source LAS points (Woolpert Inc., 2007), whereas the DLG hydrography’s position could be in error by as much as 40 feet or more (USGS, 1999), making stream burning by incorporating vector features impractical. It would be possible, if not probable, that a burned stream feature could run through a farm field, and not in the drainage ditch represented in the DEM (Figure 4-4). streams Elevation - .. ~ Higher 'i i" ,_" ‘_; Lower Figure 4-4: Left — aerial photograph. Right — LiDAR DEM and DLG stream feature in blue. DLG stream vector fails to align with LiDAR's drainage ditch due to differences in positional accuracy. Figure 4-4 also illustrates how LiDAR DEMs can contain artificial barriers within streams. Notice how the drainage ditch is bisected in the middle of the figure. In this instance this feature indicates the presence of a culvert running under a driveway. This barrier is a byproduct of how LiDAR data is acquired. While the 10-meter DEMs were interpolated from contours, the LiDAR DEMs were interpolated from numerous tightly spaced points recorded by an airplane-mounted sensor. Each point represented a location where the sensor’s beam struck the surface (or an object) and returned, allowing for a determination of elevation at that point. There is no way for the points and elevations 33 under the driveway bridge to be evaluated, creating a virtual barrier within the stream. Any effort to define surface-water flow for Figure 4-4 would err and assume that flow halted at this barrier; or worse, the entire ditch might be identified as a sink and filled by the ArcGIS filling algorithm. To resolve these two issues, I developed a method for identifying stream cells using only the LiDAR elevation values. I also implemented a method for carving through artificial barriers, like the bridge over the ditch. 4. 4.1 LiDAR Stream Identification The extraction of surface features from DEMs has been studied extensively. In terms of using LiDAR to identify stream features, five previous efforts stand out. MacMillan et al. (2003) described the capabilities and problems that can arise when trying to identify stream networks from high-resolution digital elevation data in Alberta, Canada. The authors used an approach similar to the one I implemented here; however, their design was more complex, appeared to have failed to resolve the in-stream barriers posed by bridges and overpasses, and focused on a single location. Leckie et al. (2005) used spectral reflectance, not elevation, of water and substrate material to identify stream features in Vancouver, British Columbia. Though they were able to achieve high accuracy rates in certain environments, spectral reflectance LiDAR data is less accessible than elevation data, making their method harder to implement elsewhere. Mason et a1. (2006) developed an identification method based on edge-detection, similar to my approach. However, they focused on delta regions in Italy and Germany, whereas I implemented stream identification on areas of varying topography and land cover. Vogt 34 et al. (2003) attempted to identify ideal upslope accumulation thresholds for stream identification in different landscapes, but concluded that flat areas still required manual stream digitizing. The automated approach I developed performed best in flat areas. James et al. (2007) looked at contour crenulations as a means for locating streams and areas of concentrated surface flow. Despite their success, their method required that contour lines be adequately crenulated (i.e., the study-area was not flat), which would pose problems for flat agricultural areas dominated by ditch drainage, which characterizes most of northwest Ohio. Contours for this area are sparse and relatively un- crenulated. The stream identification method that I developed is tailored for such an environment. A popular approach for deriving stream networks from DEMs is to identify a threshold for upland contributing cells, and code any cell that exceeds that threshold as a stream cell. Due to the presence of artificial barriers (discussed in 4.4.2) in LiDAR DEMs, this approach would not have worked well for much this study’s area. Surface water flow would be halted as these barrier locations, limiting appropriate flow- accumulation tabulations. I developed two methods for identifying stream features. One method relied on analyzing cell-neighborhoods around a center cell, and the other focused on transects of cells. In designing the two methods, I considered what distinguished stream locations in terms of topography. As discussed previously, I could not use vector features for streams; all I had were elevation values in the LiDAR DEMs. The basic assumption with both methods was that stream cells should be opposed by higher elevations in one direction (e. g., north to south), and similar elevations in the perpendicular direction (e. g., 35 east to west). The higher elevations should represent the stream banks, while the similar elevations should represent the stream itself. I developed each method as a Python script. The first method, called the Neighborhood Method, performed a neighborhood analysis on each cell in the DEM. The script iterated through each cell and analyzed a square 121-cell neighborhood (11 by 11) around the selected cell (call it center). It identified the maximum elevation value (call the cell with that value max) within that neighborhood and calculated the difference between that value and the elevation value of the selected cell (call that difference max_minus_center). If max_minus_center was greater than a user-specified minimum difference in elevation from a stream bank to the stream (call this value min__elev_diff, I typically employed values of l and 3 feet) the script then checked the value of the opposing cell (call it opposite_max) to see if the difference between its value and center '3 also exceeded min_elev_difiT If both of these conditions were met, then the script deemed it possible that the center cell was a stream cell opposed by two stream banks. It then looked at opposing cells (call them perpendicular_1 and perpendicular__2) in the direction perpendicular to the direction of max and opposite_max to see if their elevations were below min_elev_difif If they were, then the script deemed center and its entire neighborhood as stream cells. See Figure 4-5 for a conceptual visual. Essentially, if a cell had two relatively tall points opposing it, these may have been stream banks. If the same cell had relatively flat slopes in the direction perpendicular to the tall points, then these may have been stream cells. This approach captured the basic topographic relationship I considered in defining stream locations. More formally: 36 IF (max — center) > min_elev_difir AND (opposite_max — center) > min_elev_difi" AND (perpendicular_1 — center) < min_elev_difi’ AND (perpendicular_2 — center) < min_elev_difl THEN neighborhood = stream cell Elevation ' ' Higher ' V . Lower Figure 4-5: Cell positions evaluated by the Neighborhood Method. C = center, M = max, 0 = opposite max, R1 = perpendicular 1, R2 = perpendicular 2. The second method for stream identification was called the Transect Method. Like the Neighborhood Method, it also performed a neighborhood analysis on each cell; but it was a much simpler technique, based on transects. The script began by moving through the DEM in a north-south direction looking for decreases in elevation. At each cell (call it starter_cell) the script checked starter_cell ’s southern neighbor (call it south_cell) to see if its elevation value (call it south_elev) was less than that of starter_cell’s (call it start_elev). If it was, the script then evaluated a lO-cell transect 37 south of starter_cell. The script then identified the minimum elevation value (call it min_elev) and its cell location (call it min_cell) in the transect. If the difference between start_elev and min_elev was greater than a user-specified elevation difference (same as min_elev_diff in the Neighborhood Method, typically 1-3 feet) the script deemed the transect as potential stream cells. The script then continued to analyze the transect by looking for the maximum elevation value south of min_cell (call the value max_elev_south_of_min and the cell max_cell_south_of_min). If the difference between max_elev_south__of_min and min_elev was also greater than min_elev_difi§ then the cells from starter_cell to max_cell_south_of_min were classified as stream cells. Figure 4-6 provides a visual conceptualization of the Transect Method. Similar to the Neighborhood Method, if a cell was opposed by two higher cells, then it was possible that these cells were part of the stream network. More formally: IF start__elev > south_e1ev THEN IF (start_elev — min_elev) > min_elev_diff AND (max_elev_south_of_min — min_elev) > min_elev_diff THEN start_cell TO max_cell_south_of_min = stream cell 38 Elevation Higher . Lower Figure 4-6: Transect analysis cells (in black); S = starter_cell, O = south_cell, M = min_cell, X = max_cel|_south_of_min. The Transect Method repeated this approach in three other directions: east to west (west_cell instead of south_cell), northwest to southeast, and southwest to northeast. I evaluated the Neighborhood and Transect methods on ten LiDAR tiles of varying relief, drainage, and land-cover (Table 4-1 and Figure 4-7). I I I Is“. # vsgfag-fargipgte I £315: 11.) I mm... L I... Coy... I I l IWarsaw I 310 Natural lForest I I 2 IClevelandSmiflii 7 {#777132 militia," urban IUrban I L 3 IColumbus SE, NE I #7 7 56 INatural, urbaan Hrban ‘1’ , _I L: Arlington I 30 Natural I Agriculture I 5 McClure I 15 . Ditch I Agriculture J L 6 Knoxville I 485 I Natural IForest, residential I L 7 I Huntsville I 57 IDitch IAgriculture. forest I I 8 IGenoa I 24 Ditch, natural 1Agriculture F 7 I I I 9 ILatty, Oakwood I 30 Natural, ditch 7 IAgriculture, forest I I 10 IStonecreek I 320 Natural TForest. agriculture 4 Table 4-1: Quadrangles where the LiDAR stream identification methods were tested. 39 Evaluated [:1 Quads Figure 4-7: Quadrangles where the LiDAR stream identification methods were tested. Numbers refer to Site # in Table 4-1. Figure 4-8: Site 1. Left — aerial; Left Center -— LiDAR DEM; Right Center - streams identified by the Neighborhood Method; Right — by the Transect Method. Figure 4~9: Site 2, same descriptions as Figure 4-8. 40 J /" R ‘3 Figure 4-11: Site 4, same descriptions as Figure 4—8. J“ . ~\\ \ ‘I, t- v . 7‘ I \ ‘ ,' \ 1 \ Figure 4-13: Site 6, same descriptions as Figure 4-8. 41 Figure 4-15: Site 8, same descriptions as Figure 4-8. Figure 4-17: Site 10, same descriptions as Figure 4-8. 42 The evaluation of the ten sites indicated that the Neighborhood and Transect stream identification methods’ performance varied over different landscapes. The Neighborhood Method performed adequately in the flatter, ditch-dominated study sites like 4, 5, 8, and 9. The Transect Method’s best performance was in sites 5, 8, and 10. Neither method performed very well at sites 2, 3, 6, or 7. The main problem for the methods at sites 2 and 3 was the topographic complexity of an urban landscape. Site 2 was in the heart of Cleveland, and its streets and buildings posed too complex a surface for the Neighborhood and Transect methods to resolve. Site 3 captured the campus of Ohio State University, which also was too complex a surface for the methods to adequately identify stream locations. This difficulty in urban environments is the reason that study’s sample of 80 quadrangles were filtered to avoid quadrangles that had even marginal percentages of urban land cover (> 10%). Neither method could process the width of the main rivers, such as the Olentangy in site 3 and the Ohio in site 6. Both methods expected the stream network not to exceed 10 cells (25 feet), which grossly underestimated the widths of the Olentangy and Ohio rivers, 261 and 1,100 feet respectively. The problem for the methods at site 7 was that they picked up too many non-stream cells. These errors were due to the forested landscape, which has been shown to pose problems in the analysis of LiDAR and other remotely sensed elevation products (Barber and Shortridge, 2005; Shortridge, 2006). Despite the fact that the LiDAR data was pre-processed by OGRIP to generate a bare-earth condition, artifact tree canopies remained in many LiDAR tiles and distorted the stream-to-bank relationship or hid a stream altogether. The adjacency of an errant elevation value of a tree canopy next to a 43 bare earth value created the illusion of a steep feature and, in the eyes of the two methods, a potential stream bank. In comparison to each other, the Neighborhood Method performed better in areas where the drainage network was more sinuous (Figures 4-11 and 4-16), while the Transect Method better handled straight agricultural drainage ditches (Figures 4-12 and 4-15). The Transect Method’s poorer performance on sinuous streams was due to its focus on the basic cardinal (N -S-E-W) and intermediate (N W-SW-SE-NE) directions. The directions of transects along a sinuous stream would include much more than the eight considered by the Transect Method; therefore it missed many transects along such streams and failed to identify them as part of the stream network. This limitation could potentially be resolved by incorporating all possible transect directions into the method, but this might render it computationally inefficient. However, the Transect Method’s focus on those eight basic directions made it well-suited for identifying straight agricultural ditches. While the Neighborhood Method also did well in identifying these locations, the Transect Method outputs were cleaner and generated more quickly. The Neighborhood Method output tended to yield diamond-shaped stream neighborhoods, which in some instances gave the stream network messy and irregular boundaries. The boundaries of ditches identified by the Transect Method tended to be smooth and consistent, since its East-West analysis of a North-South trending ditch would essentially yield horizontal transects stacked neatly on top of each other (Figure 4-18). The processing of the Transect Method was also faster (1 min. 29 sec.) than the Neighborhood Method for site 5 (5 min. 42 sec.). 44 streams Elevation I I Higher I 1.1.... Figure 4-18: Left - rough edges of Neighborhood Method stream cells. Right — smoother edges of Transect Method. 4. 4.2 Carving Through Artificial Barriers A useful method to resolve artificial barriers in DEMs is to carve through them (Soille et al. 2003, Soille 2004a, Soille, 2004b). In this process, the elevations of the barrier cells are essentially reclassified in order to route the flow through them. However, in order to ensure that this step is only applied within stream cells, the stream network must first be identified. Soille’s method looked at all sinks in a DEM; I was only interested in removing barriers within streams. I developed a process, which I refer to as the Carving Method, that searched within the stream cells identified by one of the previous methods (Neighborhood or Transect), and looked for sinks (cells in which each of the 8 adjacent neighbors have higher elevations — as would be the case at the base of the artificial barrier in Figure 4-4). The method then searched an expanding square neighborhood, within a user-specified maximum neighborhood size (100 cells as a default) for an elevation value smaller than that of the sink cell in question. If it found such a cell (presumably on the other side of the barrier), a path was charted from the sink to the lower cell along which the elevation of each cell was incrementally lowered to create a slope for water to flow along, essentially removing the barrier (Figure 4-19). 45 di SI; CII ba Ila UIII 311C The method could be iterated over a user-specified number, as each carving could yield new sinks. Figure 4-19: Left - the barrier in the DEM. Center —- sinks in stream identified (in black), lower elevation found (in yellow), path charted (in blue). Right — carved DEM with elevation values incrementally lowered along path. The implementation of the Carving Method yielded desired results in some locations, but introduced errors in others. The method was implemented in all ten of the stream identification study locations, but was most applicable in the agricultural drainage ditch-dominated sites (5, 7, and 8). The method required a prior identification of the stream network, rendering it essentially useless for the urban sites. The sites with natural drainage did not have many, if any, in-stream barriers. The method worked as designed for the site 5 location shown in Figure 4-19. Figure 4-20 illustrates how the method created clear and uninhibited paths within drainage ditches by removing even small barriers. The left side of Figure 4-21 shows how the Carving Method successfully routed water flow under a road (N -S direction); but it also shows how the method introduced undesired carvings. It is not likely that a NW-SE ditch flows under the road intersection, and aerial photography did not indicate the presence of a culvert running N-S along the right-edge of Figure 4-21 (in green). 46 , i 1 e1 7! r an!» I Figure 4-20: Carving freed up flow within stream features. Left - drainage ditch in pre—carved DEM. Right - carved drainage ditch. Figure 4-21: Carving resolved some barriers (created north-south connection) but also introduced potential errors (NW-SE connection, N-S connection at right). The Carving Method was sensitive to even slight differences in elevation. When it encountered an artificial barrier in the stream, its primary goal was to find a neighbor with an elevation lower than that sink. As its search neighborhood expanded, it may have encountered cells with lower elevations that were not connected to the original sink, as illustrated by the right-side of Figure 4-21. A future, more “intelligent” Carving Method may be able to resolve this issue by considering the general direction of the stream in its neighborhood analysis. In the case of Figure 4-21, instead of searching in a neighborhood that incrementally expands in all directions, the expansion could occur in only the N-S direction, removing the possibility of the NW-SE carving that resulted in this analysis. 47 4. 4.3 Implementing Stream Identification and Carving in LiDAR Processing For each of the 80 sample quadrangles, I had to determine which stream identification method (Neighborhood or Transect) should be employed and what values should be chosen for identification parameters (minimum elevation difference and neighborhood size). As the results of the previous analysis indicate, the choices regarding the stream identification method and its parameters should be based upon the land-cover, relief, and the shape of the drainage features of a particular quadrangle. Even though these attributes may substantially vary across the sample of LiDAR tiles within a quadrangle (40 or 48), since the quadrangle pre-processing scripts operated on one quadrangle at a time the choices made for the stream identification parameters applied to all of a quadrangle’s tiles. For example, the choice of minimum elevation difference in the stream identification method applied to all tiles, even if the method would have performed better with a different value in some of the tiles. For the majority of quadrangles, I chose to use the Neighborhood Method with a minimum elevation difference of 3 feet, and a neighborhood size of 10 cells (25 feet by 25 feet). These choices held up well during post-processing inspection, particularly in quadrangles with high relief and little to no agriculture (Figure 4-22). For quadrangles dominated by agriculture, I still utilized the Neighborhood Method with a neighborhood size of 10 cells, but chose a more aggressive minimum elevation difference of 1 foot. In some instances these choices worked well, but in the flattest areas it still failed to adequately identify the stream network. I re-processed these quadrangles with the Transect Method, but results were still unsatisfactory. I then adjusted the pre-processing 48 SCI q! u. Flam Well I scripts so that both stream identification methods were integrated and employed on these quadrangles, which yielded more appropriate results (Figure 4-23). — DLG Streai Identified Stream LiDAR DEM Elevation Higher Lower Figure 4-22: The Neighborhood Method with a minimum elevation difference of 3 feet performed well at identifying stream cells in the LiDAR DEM for areas of high relief and little to no agriculture. — DLG Streams Identified Stream LiDAR DEM Elevation Higher ’ . Lower Figure 4—23: A combination of the Neighborhood and Transect methods performed best in flatter areas dominated by agriculture. 49 The LiDAR pre-processing was not a simple batch script that ran on its own for weeks. Many quadrangles were evaluated through the trial-and-error method described in the preceding paragraph. With a few quadrangles, I ran the stream identification algorithm three times until an adequate stream network was identified. For example, the first run might have been with the Neighborhood Method and a minimum elevation difference of 3 feet, a second run with a Neighborhood-Transect combination and an elevation difference of 1 foot, and a final run with a Neighborhood-Transect combination and an elevation difference of 2 feet. In these and all other similar instances, I inspected the results visually to determine which run performed best and should be utilized in the analysis. Once the stream locations had been identified, the Carving Method removed artificial barriers within those streams. For all 80 of the sample quadrangles, I used a maximum neighborhood size of 100 cells and 3 iterations of carving, since the method’s performance did not significantly vary across the lO-site evaluation. 4. 4. 4 Stream-burning and Sink-filling LiDAR DEMS The last steps of the LiDAR pre-processing were to burn the identified stream locations into the carved DEM in the same manner used with the lO-meter DEMs: fill all sinks and calculate flow-direction through D8. It should be noted that the LiDAR pre- processing steps described above introduced artifacts into the final LiDAR DEM. Specifically, the stream identification and carving methods typically did not yield a perfectly contiguous stream network. Some of the identified cells were fragmented, as 50 seen in Figures 4-22 and 4-23. After the stream burning process these cells essentially became sinks that were subsequently filled in by the filling algorithm. The end results were flat table-like features in the final LiDAR DEM. In terms of the accuracy of elevation, this outcome was a clear error; however, it did not necessarily alter the general flow-direction of surface water over these cells, which was what I tried to measure in this research. The elevation values of the neighboring cells around those filled sinks usually remained unchanged after the fill process, meaning that a concentration of surface-water flow coming from the east still continued west along the flat cells towards the lower elevations west of those cells (Figure 4-24). Elevation Upstream Contributing Cells - Higher _ Higher _ Lower Lower Figure 4-24: Left - the fill process created fiat features in the LiDAR DEM. Right - Flow accumulation derived from the filled LiDAR DEM's flow-direction indicate that the flat features did not dramatically alter the flow-direction. Once the LiDAR pre-processing steps described above had been completed for all of the LiDAR tiles for a sample quadrangle, the quadrangle was considered pre-processed and ready for the assessment of flow-direction error. See Appendix A for a model diagram of the quadrangle and LiDAR pre-processing. 51 4.5 Flow-direction Error I calculated flow-direction error for each quadrangle by performing a cell-by-cell comparison of the lO-meter flow-direction raster to the LiDAR flow-direction rasters, assuming the LiDAR rasters as the reference condition. As with the pre-processing steps, I programmed the flow-direction error calculation as a Python script. The code iterated through each cell in the lO-meter flow-direction raster (around 1 million cells). The general approach was to compare the flow-direction value for each lO-meter cell to an aggregated flow-direction value for the neighborhood of LiDAR cells1 that intersected the particular lO-meter cell (Figures 4-25 and 4-26). But how should the LiDAR flow be aggregated? One method of aggregation would be to take an averaged direction from the LiDAR neighborhood cells, or calculate a single vector representing direction. But this would be misleading in instances where half of the cells drained in one direction and the other half drained in the opposite direction. If half of the cells drained north and the other half drained south, would an east or west vector of flow-direction be an accurate representation of surface-flow amongst the LiDAR cells? Another method of aggregation would be to determine which flow-direction was most frequent amongst the intersecting LiDAR cells in Figure 4-26. However, this approach would not necessarily characterize where the majority of surface water flow 1 The size of the intersecting LiDAR neighborhood was typically a 10x10 grid, though sometimes it was 9x9. The LiDAR cells did not perfectly align with lO-meter cells. This was due to the fact that the 10- meter width and height of the NED cell was not evenly divisible by the LiDAR resolution of 2.5 feet. Figures 4-26 through 4-34 illustrate the overlaying of LiDAR cells on a lO-meter cell and show them to be in perfect alignment for conceptual purposes only. 52 traveled amongst those LiDAR cells. It would be possible, for example, for 80% of the intersecting LiDAR cells to drain towards the northern edge of the 10-meter cell, only to be routed due east along that northern edge. In this instance, the predominant flow- direction value for the LiDAR cells would be north, even though the majority of the flow left the lO-meter cell area along the eastern edge. ,----10m----, I I Flow-direction (08) - East - Southeast - South Southwest a West a Northwest E North - Northeast Figure 4-25: 3x3 neighborhood of 10-meter flow-direction cells. Cell to be analyzed for error at center. 53 Flow-direction (08) - East a Southeast - South E Southwest - West E Northwest E North - Northeast Figure 4-26: LiDAR flow-direction cells intersecting the selected 10-meter cell. Another option would be to examine the edges of the selected lO-meter cell, and determine what percentage of LiDAR cells drained through each edge. This approach would avoid the misrepresentation of a frequency value, as described above; however, it would weight the cardinal directions too strongly. The North, East, South, and West edges all would have ten cells that could potentially route flow through an edge; whereas the diagonal directions would each only have a single comer cell through which flow could be routed. Furthermore, similar to the problem with looking solely at the mode of flow-direction values, if all of the LiDAR neighborhood cells flowed through the northern edge of the lO-meter cell is that a fair comparison to the lO-meter flow-direction value? It would still be possible for all of the LiDAR cells to drain through the northern edge, and then quickly divert due east as soon as they crossed the edge to ultimately drain out somewhere in the Northeast lO-meter cell. 54 The lO-meter flow-direction value was determined by calculating Slopes between the center point of a cell and center points of that cell’s eight immediate neighbors, and noting the cell with maximum Slope (Equation 4-1). Distance was equal to the cell- resolution (IO-meters) for cardinal neighbors, and cell-resolution multiplied by 1.414 for the diagonal neighbors. Therefore, the 10-meter calculation of flow-direction was a prediction from the center of the center cell to the centers of the neighboring cells. A comparative analysis with the LiDAR cells should attempt to cover the same geography. For {e1, 62, , 93} Flow-direction = i, for II/IAXKIecemer cell - eil) ‘3‘ distance x 100] Where ei = elevation of the i-th neighbor Equation 4—1: Formula for calculating flow-directions The method I developed utilized an expanded neighborhood of LiDAR cells to determine the degree to which the 10-meter and LiDAR predictions for surface water flow agreed. This neighborhood extended to the center of each of the neighboring 10- meter cells, ensuring that the 10-meter flow direction value and aggregated LiDAR flow- direction value corresponded to the same geographic Space. This expansion of the LiDAR neighborhood also gave each neighboring lO-meter cell an equal opportunity for flow to drain through its edges. AS illustrated in Figure 4-27, each lO-meter neighboring cell had 10 LiDAR cells through which the LiDAR flow could have exited. For each of these neighboring cells, the method recorded the percentage of cells from the original 10 by 10 LiDAR neighborhood that drained through its edges (Figures 4-27 — 4-34). Since my goal was to evaluate the flow-direction value of the center lO-meter cell, I only 55 calculated the drainage percentage of LiDAR cells that intersected that center cell (Figure 4-26). The LiDAR cells in the expanded neighborhood (outside of the original LiDAR cells) were needed to trace the flow of the intersecting LiDAR cells out of the neighborhood, but these additional cells were not counted in the determination of aggregated LiDAR flow-direction value. Flow-direction (D8) - East - Southeast - South Southwest a West a Northwest [:1 North - Northeast Figure 4-27: 0% of center LiDAR cells drain through the northwest edges of the neighborhood. Figure 4-28: 0% of center LiDAR cells drain through the north edges of the neighborhood. Figure 4-29: 89% of center LiDAR cells drain through the northeast edges of the neighborhood. Figure 4-30: 0% of center LiDAR cells drain through the east edges of the neighborhood. Figure 4-31: 0% of center LiDAR cells drain through the southeast edges of the neighborhood. Figure 4-32: 0% of center LiDAR cells drain through the south edges of the neighborhood. Figure 4-33: 0% of center LiDAR cells drain through the southwest edges of the neighborhood. Figure 4—34: 11% percent of center LiDAR cells drain through the west edges of the neighborhood. Next, the script used the flow percentages calculated for each lO-meter neighbor to weight an error score (Equation 4-2). Recall that in Figure 4-25 the 10—meter cell in question reported a flow-direction value of southwest. If all of the LiDAR cells that 58 intersected this lO-meter cell drained out of the southwest edges of the expanded LiDAR neighborhood, then the script would consider the lO-meter cell and LiDAR cells to be in perfect agreement. This would result in an error value of 0. If the result was the opposite condition, with the LiDAR cells draining out of the northeast edges of the neighborhood, this would be perfect disagreement and result in an error score of 4. Other scores would range between 1 and 3 increasing from agreement to error (Figure 4-35). In the case of the example that began with Figure 4-25, 89% of the center LiDAR cells drained through the northeast edges of the expanded neighborhood (a severe error: 4), while the remaining 11% drained through the western edges (a small error: 1). These error values were weighted by their respective flow percentages ((4 x 0.89) + (1 x 0.11)) to yield an aggregated flow error of 3.67, an indication of very high error in the 10-meter flow- direction estimation. Note that this error value does not indicate a specific direction of error. The number itself does not indicate whether the LiDAR flow would be closer to the North neighbor or the East neighbor in Figure 4-35, only that the error was high. 8 FDE =21». 1:- i=1 Where F DE = flow-direction error for the 10-meter cell pi = percentage of center LiDAR cells exiting out of the i-th neighbor lO-meter cell fi = error score (1-4) of the i-th neighbor as compared to the flow-direction of the 10-meter cell Equation 4-2: Flow-direction error weighting. 59 Flow-direction Error fl Severe (4) make) I: Moderate (2) El Low(1) - Nonew) Figure 4-35: Potential 110w error scores for assessment of 10-meter cell that flows southwest. The script carried out the process described above for every lO-meter cell contained within a quadrangle’s sampled LiDAR tiles. The process yielded maps illustrating the spatial distribution of flow-direction error for a particular quadrangle (Figure 4-36 — 4-39). Initial tests of this method revealed a spatial trend of high error in and along stream features (Figure 4-36). This trend was the result of the filling operation removing sinks in sections of stream within the LiDAR cells and modifying elevations across large sections to single elevation values, and subsequently single flow-direction values. In the flatter, agricultural areas entire fields were filled and reported the same flow-direction value for all their corresponding cells. These were clear distortions of reality and diminished the LiDAR DEM’s utility as a reference dataset at these locations, despite its superior vertical and horizontal positional accuracy as compared to the 10- meter DEMS. To avoid the potential error that these flat cells could have introduced into 60 the analysis, I re-structured the Python script so that any lO-meter cell that touched a flat LiDAR cell directly (as an intersection of cells) or contained one in its expanded LiDAR neighborhood was discarded from the analysis (Figure 4—40). Esrskine et al. (2007) similarly removed flat cells from an analysis of DEM derivatives. This decision resulted in, on average, the discarding of 33% of lO-meter cells per 7.5 minute quadrangle. In some of the quadrangles with the lowest overall relief and high concentrations of agriculture, discard rates were as high as 83%. However, despite the high discard rates in these quadrangles, the statistical analysis was still able to Show significant results for flow-direction error prediction in these quadrangles (see Chapter 5). 61 Flow-direction Error - Severe (4) High (3) CI Moderate (2) m LOWU) - None (0) Figure 4-36: Flow-direction error map for a single 7.5-minute quadrangle. Blue grid lines are the result of avoiding edge issues in each LiDAR tile. 62 Flow-direction (D8) - East - Southeast - South E Southwest - West Northwest E North - Northeast Figure 4-37: For the selected 10-meter cell, estimated flow-direction is to the south. Flow-direction (D8) - East - Southeast - South fl Southwest — West Northwest E North - Northeast Figure 4-38: For the selected 10-meter cell, estimated LiDAR flow-direction trends mainly east and northeast. F low-direction Error - Severe (4) 3 High (3) D Moderate (2) Low (1) - None (0) fi’lfil Figure 4-39: The flow-direction error is considered high, with a value of 2.98. 63 Flow-direction Error - Severe (4) B High (3) :1 Moderate (2) E3 Low (1) - None (0) Figure 4-40: Flow-direction error map with flat cells discarded. Chapter 5 Results and Analysis The previous two chapters detailed this study’s methodology for identifying the mean flow-direction error and its standard deviation. Using these data recorded for each of the 80 sample quadrangles, I conducted independent t-tests to assess whether the mean flow-direction error varied significantly across the agricultural and relief characterizations listed in Table 3-1. I also explored correlations between the mean flow- direction error and various quadrangle attributes, and evaluated linear regression models adjusted for residual spatial autocorrelation. Overall flow-direction error among the 80 quadrangles was M = 1.04 (SD = 0.85). Figure 5-1 shows the spatial distribution of flow-direction error amongst the sample. There is a clear trend of increasing flow-direction error in the SE-NW direction amongst the quadrangles, mimicking the gradients for agriculture concentration and relief illustrated in Figures 3-1 and 3-2. These maps and the quantitative analysis I present here indicate that relief and agricultural concentration can be used to predict flow-direction error. 65 Mean Flow-direction Error (quartiles) C: 0.56 - 0.84 D 0.84 - 1.06 - 1.06 - 1.16 - 1.16 - 1.81 Figure 5-1: Spatial distribution of flow-direction error amongst the sampled quadrangles. 5.1 Independent T-tests I expected that flow-direction error would be higher in areas of high agricultural concentration and low relief. More formally: Hypotheses: Ho: FDE High % Ag. < FDE Low % Ag. Ha: FDE High % Ag. 2 FDE Low % Ag. Ho: FDE High Relief < FDE Low Relief Ha: FDE High Relief 2 FDE Low Relief Where: FDE = flow-direction error Table 5-1 shows the overall mean flow-direction error for the sampled quadrangles and for the bins generated during the sampling process (see Table 3-1). Table 5-2 Shows the 66 results of the independent t-tests calculated between these bins. AS expected, mean flow- direction error differed significantly between quadrangles characterized as having a high percentage of agriculture (M = 1.16, SD = 0.91) and those with a relatively low percentage (M = 0.93, SD = 0.79), t(78) = 4.33, p = < 0.001. Similarly, quadrangles characterized as having high total relief (M = 0.89, SD = 0.78) versus those characterized as low in relief differed Significantly (M = 1.19, SD = 0.93), t(78) = 6.06, p = < 0.001. On the basis of these results, both null hypotheses listed above were rejected. Mean Flow-direction Error Relief High (> 32 7 feet ) Low (< 32 7 feet ) mm" . wrm, 50:035. M=1.28, 30:11.97, M=1. 16, SD=0.91, ”’g” ( > 673%) N=20 N=20 N=4O , . /" Agr'cmtm M=0.75, SD=0.70, M=1.1 1, SD=O.88, M=0.93, SD==0.79, Low ( < 67.8%) N=20 N=20 N=40 Oman 1120.39, SD=O.78, M=l.l9, 50:0.93, M=l.04, SD=0.85, N=40 N=40 N=80 Table 5—1: Flow-direction error results for the various sampling criteria combinations. 1 also tested my expectations for the specific sampling bins, which I refer to as follows: High relief, high % agriculture: HRHA High relief, low % agriculture: HRLA Low relief, high % agriculture: LRHA Low relief, low % agriculture: LRLA I expected that quadrangles with little relief and a high percentage of agriculture would contain the highest flow-direction error values, and vice versa. More formally: H0: FDE LRHA < FDE HRLA Ha: FDE LRHA Z FDE HRLA 67 I also expected that when either relief or agricultural concentration was held constant the variation in the other would prove significant when compared to flow-direction error. Ho: FDE LRHA < FDE LRLA Ha: FDE LRHA Z FDE LRLA Ho: FDE HRHA < FDE HRLA Ha: FDE HRHA Z FDE HRLA Ho: FDE LRHA < FDE HRHA Ha: FDE LRHA Z FDE HRHA Ho: FDE LRLA < FDE HRLA Ha: FDE LRLA 2 F DE HRLA LRHA quadrangles reported the highest mean flow-direction error, M = 1.28, whereas the HRLA quadrangles reported the lowest value, M = 0.75. As shown in Table 5-2 these two means differed significantly, t(38) = 7.16, p = < 0.001 (a higher t score than any other pairing of mean flow-direction errors). These results show that when either relief or agricultural concentration was held constant, the change in the other corresponded significantly with changes in flow-direction error. There were not significant differences between HRHA and LRLA. It appears that relief had a stronger impact on flow-direction error than the concentration of agriculture. The difference in overall mean-error between low relief quadrangles and high relief quadrangles was greater (1.19 — 0.89 = 0.3) than the difference between quadrangles with high concentrations of agriculture and those with low concentrations (1.16 - 0.93 = 0.23). 68 T-test results HRHA HRLA LRHA LRLA HRHA m... 1:82:15? LRHA “is: 3339’ «:8: 317111116, LRLA 1(3 3): =0 11.38, t(;8<) (5.311614, 10:): 12635, Table 5-2: Independent t-test results between sampling criteria. 5.2 Correlations F low-direction error was highly correlated with agricultural concentration, relief, contour topology error, and the extent to which filling sinks in the LiDAR DEMs produced flat areas (Table 5-3). This last correlation was the strongest (r = 0.85). An initial reaction to such a strong correlation may be that these flat LiDAR cells distorted the true ground condition so dramatically that they yielded high values for flow-direction error. However, I discarded any lO-meter cell whose analysis neighborhood (Figure 4- 27) contained even a single flat LiDAR cell. Therefore the error means used in calculating the correlation did not reflect these cells. The more likely explanation for the I strong positive relationship is that the areas that contained a high number of flat cells in a depression-less DEM embodied the surface characteristics that confound flow-direction estimates. Flat cells in a filled DEM were more likely to occur in areas of high agricultural concentration and low relief, r = 0.58 and r = -0.73 respectively. These areas were dominated by agricultural ditches because the low relief provided little natural 69 drainage to route water away from fields and allow for crop production. Consequently, due to the presence of artificial barriers along them, these ditches had a greater chance of being identified as sinks than as natural drainage features and filled flat by the filling algorithm. A high percentage of flat cells for a quadrangle indicated low relief and a high concentration of agricultural ditches in that area. These areas essentially captured the “worst” of both worlds when trying to predict flow-directions; the low relief raised the uncertainty in flow-direction estimates, while the concentration of agricultural ditches increased the likelihood that drainage pathways were filled flat. It is worth noting that all of the variables in Table 5-3 were significantly correlated. This fact raised issues of multicollinearity in the attempt to develop a linear regression model for flow-direction error. Fl -dir t' o/ I t ti % of flat 10- Correlations ow cc 1011 o row-crop Relief n ersec ons meter cells error agriculture per contour discarded Flow-direction error 0 - /o row crop r = 0.77 agriculture Relief r = -0.76 r = -0.68 Intersections per r 2 0.70 ,. = 0.51 ,- = -0,6() contour % of 10-meter cells discarded r = 0.85 r = 0.53 r = _0_73 r = ()_70 due to flat LiDAR Table 5-3: Correlations between flow-direction error and other variables. N=80, all correlations were significantly different from zero (p < 0.001). 70 5.3 Spatial Regression I attempted to identify the variables that best explained variability in flow- direction error through linear regression modeling using the R statistical package (ver. 2.7.1). Since flow-direction error was a spatial phenomenon, I adjusted the models to account for spatial autocorrelation in the model residuals. Restating my general hypothesis, I expected that areas that were relatively flat, had high concentrations of agriculture, and whose DEMS were produced from contours that contained large numbers of topological errors (intersections) would have produced higher values of flow-direction error. More formally: Equation: FDE = a + BpAPA + BTRTR + BCICI Where: FDE = flow-direction error PA = % agriculture TR = total relief CI = contour intersections Hypotheses: Ho: BpA = 0 Ha: BpA > 0 H0: BTR = 0 Ha: BTR < 0 Ho: PC] = 0 Ha: BC] > 0 Figure 5-2: Model 1 - initial linear regression model of flow-direction error. I tested Model 1 (Figure 5-2) first, attempting to explain flow-direction error through relief, agriculture concentration, and contour intersections. Histograms of the model’s variables indicated that a transformation was warranted for the contour 71 intersections variable; a logarithmic transformation produced the most normal distribution (Figure 5-3). The results in Figure 5-4 confirmed the hypotheses of Model 1, with a solid overall model performance (R2 = 0.73). However, the Moran’s I value (I = 0.3611, p = < 0.001) for the model residuals indicated that they contained significant spatial autocorrelation (mapped in Figure 5-5 — note the clustering of shades), which was potentially inflating the R 2 and causing the horizontal cone shape (heteroscedasticity) of the residuals graphed in Figure 5-6. Mean Flow Error % Row-crop Agriculture 2* a 2 o O O .— g‘ c 5" on 2 " ”—1 2 Lu LL. c l— 1 4‘fi c 1 ’ 1 1 1 l 1 0.5 1.0 1.5 2.0 0 20 40 60 80 100 Flow Error % Total Relief Contour Intersections >. O >. N 3 ‘3 I W— 22 u. u. G l 1 O I l 41'?- 0 200 400 600 800 0.0 0.2 0.4 0.6 0.8 Feet Intersections per Contour Log Transform of Cntr. Int. 52 8 = 3 c: a u— 2’5 —I—i—I C l r l I 1 -5 -4 -3 -2 -l 0 Log(lntersections per Contour) Figure 5-3: Histograms of regression variables from Model 1. 72 Significance of Terms: Coefficients Estimate Std. Err. t score p-value Intercept 1.1994774 0.0864874 13.869 < 2e-l6 o - /° 5‘” "0" 0.0043492 0.0008723 4.986 3.78e-06 _agnculture Total relief (ft) -0.0005739 0.0001572 -3.652 0.000476 5%?“ per 0.0621131 0.0194624 3.191 0.002059 Hypotheses: BpA: Reject Ho BTR: Reject Ho BC]: Reject Ho Residuals: Min. 25th pctl. Median 75th pctl. Max. 0309 -0.1 13 0.008 0.085 0.333 Residual standard error: 0.1426 on 76 degrees of freedom Moran ’s I: Observed Expectation Variance 0.361 1 -0.0280 0.0080 p-value = 6.058e-06 (sampling test) Model Performance: R2: 0.73 F—statistic: 71.01 on 3 and 76 DF, p-value: < 2.2e-16 Fitted values vs. Actual values correlation: 0.86 Aikaike Information Criterion (AIC): -78.743 Figure 5-4: Model 1 diagnostics. 73 r.“___ ,A h, ,2 I MI I I . " - . I l E I D- I B PA . I Residuals - 3E3 521 E --0.31——0.14 I IN I Ej-0.14--0.02 I I 2e BID D L Gem-0.03 D I i I '- -0.03-0.10 I] l - D - E] I -0.10 0.33 E” i g Ii i WV KY /\ Figure 5-5: Mapped residuals of Model 1. Residuals vs. Fitted M. _ o 0 ° C 0 O O ‘ o O 0 0 O .12 c. ° :0 9 «5% o g o W 0° ° 3 o a: o 0 ° 0 S "I o o 0 o o 3) o 0 ° 0 0° 0 o N ° ° C? — o 0 8° 0 «'1 O. ‘ o l 17 1 I I 0.6 0.8 l 0 1 2 1 4 FittedValues Figure 5-6: Residuals of Model 1, greater residual error of higher fitted values indicates potential heteroscedasticity. 74 I mitigated the spatial autocorrelation in Model 1 by implementing the SAR (simultaneous auto-regression) function of R’s SPDEP (Spatial Dependence) module using each quadrangle’s three nearest neighbors, inversely weighted by their distance. SAR corrected the estimates of the model coefficients, but was not designed to produce a new R2. SAR corrected the residual heteroscedasticity illustrated in Figure 5-6 (Figure 5— 7). Afier correcting for residual spatial autocorrelation, the regression model’s terms were still significant, confirming my hypotheses (Figure 5-9). The adjusted model’s residual values were smaller in absolute terms and more concentrated than those for the original model; the range of residual values was smaller than in the original model (0.018 versus 0.024), as was the standard error (0.007 versus 0.143). Moran’s I for the adjusted model’s residuals was no longer statistically significant (p = 0.28). Additionally, the Lambda significance (p < 0.001), the slightly improved correlation of fitted values versus actual values (0.91 for SAR adjusted model versus 0.86 for original model), and the lower AIC value (-98.05 versus -78.4) all indicated that the coefficients of the new model successfully accounted for the presence of spatial autocorrelation in the residuals. Though not as clear as these numeric metrics of model improvement, the map of residuals fiom the adjusted model (Figure 5-8) shows a slight improvement as several of the darker clusters in the original residual map (Figure 5-5) were broken up. 75 SAR Residuals SAR Residuals vs. SAR Fitted o o _ o o o o 830 o _ o o o o °O o o o o a: 00 o o 0 :9 n 0 nun o 0 ° 0 o o o o o 4 ° ° 0 ° o o o o o o _ o 0 0° 0 I l l l I 0.6 1.0 1.2 1.4 1.6 SAR Fitted Values Figure 5-7: Residuals of Model 1, corrected for spatial autocorrelation, no apparent heteroscedasticity. PA , SAR Resnduals - -0.27 - -0.09 -0.09 - -0.03 l: -0.03 - 0.03 l 0.03- 0.10 -0.10-0.26 f Figure 5—8: Mapped residuals, corrected for spatial autocorrelation. 76 Significance of Terms: Coefficients Estimate Std. Err. I score p-value Intercept 1.] 1910155 0.0832270 13.446 < 26-16 0 - /° 5‘” "01’ 000391785 0.0008740 4.482 7.384e-06 _agr1culture Total relief(fi.) -0.00051557 0.0001362 -3.783 0.0001545 Egg“ per 003246648 0.0151490 2.143 0.0321021 Hypotheses: BpA: Reject Ho BTR: Reject Ho BC]: Reject Ho Residuals: Min. 25th pctl. Median 75th pctl. Max. 0274 -0.078 0.003 0.069 0.256 Residual standard error: 0.0072944 Moran ’s I: Observed Expectation Variance 0.04] 1 -0.0127 0.0081 p-value = 0.2755 (sampling test) Model Improvement: Lambda: 0.59923 LR test value: 21.305 p-value: 3.9180e-06 Fitted values vs. Actual values correlation: 0.91 Aikaike Information Criterion (AIC): -98.048 Figure 5-9: Model 1 diagnostics, corrected for residual spatial autocorrelation. By most measures, Model 1 performed very well. However, multicollinearity in the independent variables may have inflated the model’s performance. As indicated in 77 Table 5-3, there was a strong, negative correlation between agricultural concentration and relief (r = -0.68). Relief and the number of intersections per contour was also strongly negatively correlated (r = -0.60). Agricultural concentration and contour intersections were moderately positively correlated (r = 0.51). Therefore, I explored other regression models to see if flow-direction error could be described through other terms and combinations. All of the independent variables in Table 5-3 were significantly correlated, so any regression model employing them would have had multicollinearity issues. To consider the power of each variable individually on flow-direction error, as well as to assess some additional variables, I developed six alternative regression models (Table 5-4). Models 2 through 4 each employed one of the terms in the original model (Model 1 — Figure 5-2): percentage row-cop agriculture, total relief, or intersections per contour. While all of these models explained significant variability in flow-direction error, the results confirmed that percentage row-crop agriculture and total relief explained more of the variance than intersections per contour. Model 5 employed the percentage of a quadrangle’s lO-meter cells discarded from the analysis due to the presence of flat LiDAR cells (flat cell percentage). This model was the best single predictor of flow- direction error (R2 = 0.72). This independent variable was most closely associated with elements that confound DEM prediction of flow-direction: drainage ditches (potential sinks) that are likely to be filled in and flat terrain. Model 6 was a modification of the original regression model (Model 1) and replaced total relief with flat cell percentage, since total relief was the independent variable most highly correlated with flat cell 78 percentage. While multicollinearity was still an issue, Model 6 performed much better than the original model on all measures. Model 7 employed a relatively simple combination of terms and performed nearly as well as the original model. That all of these regression models yielded significant results begs the question of which model should those interested in predicting flow-direction error elsewhere employ. Solely by the numbers, Model 6 should be the predictor of choice. However, the terms of that model are relatively expensive to acquire. Specifically, the percentage of discarded flat cells requires access to LiDAR data, which is limited around the globe, and requires a significant amount of pre-processing and hard drive space. The most economical choice is Model 7, since land cover and elevation datasets are freely available for the entire US. and most of the world. Any of the models presented here are statistically significant predictors of flow-direction error. However, the results show that any implementation of these models requires a correction for spatial autocorrelation. Corr. of Model Residual 2 . . . SAR Sig. # Model terms Heteroscedasticity R F Statistic SAR Fitted AIC Terms vs. Actual 114.0 0 - 2 /o ag. no 0.59 (p < 0.001) 0.90 83.0 all . 109.6 3 total relief yes 0.58 (p «1001) 0.89 -77.9 all . 70.9 4 log(mt. p. cnt.) no 0.47 (p (0.001) 0.88 -65.6 all 208.4 5 flat cell pct. yes 0.72 (p (0.001) 0.89 -88.9 all % ag' 142 3 6 flat cell pct. no 0.84 (p < 0 001) 0.93 -124.1 all log(int. p. cnt.) ' %ag 109.6 7 total relief yes 0.69 (p (0.001) 0.91 -95.7 all Table 5-4: Alternative linear regression models, and measures of performance. 79 Chapter 6 Conclusion 6.1 Summary For this research, I sought to quantify error in estimates of surface water flow- direction derived from USGS lO-meter DBMS and explain how that error varied across different geographic landscapes. I developed a method by which finer resolution DEMs (LiDAR - 2.5-foot) can be utilized to evaluate estimates of flow-direction derived from coarser DEMs (U SGS - 10-meter) on a cell-by-cell basis. I implemented this approach on 80 7.5-minute USGS quadrangles in Ohio, stratified by relief and agricultural concentration. As I expected, statistical analyses revealed that flow-direction error was strongly, negatively correlated with total relief, and strongly, positively correlated with agricultural concentration. The ntunber of intersections per feature in the source contours that produced the USGS DBMS, and the percentage of sinks filled flat by the ArcGIS filling algorithm were also strongly, positively correlated with flow-direction error. A regression model employing agricultural concentration, contour intersections, and sink filling as independent variables explained 84% of the variance in flow-direction error. A simpler model employing only agricultural concentration and total relief still explained 69% of flow-direction error variance. Both of these regression models had to be adjusted to correct for residual spatial auto-correlation, and both contained some degree of multicollinearity. 80 The pre-processing of the LiDAR data was costly and time-consuming, requiring nearly one terabyte of hard drive space and four microcomputers running constantly for over a month. The most challenging and time-consuming part of pre-processing was identifying stream locations within the LiDAR DEMs in order to “burn” a hydrological network into the surfaces and enforce a proper drainage network. This step required the development of custom Python scripts that read LiDAR elevation data into Python lists, utilized neighborhood analyses to locate stream cells, and carved through artificial barriers within the identified stream networks. Qualitative tests indicated that these custom methods performed best in flat agricultural areas, where the enforcement of proper drainage networks was most needed. 6.2 Significance While the relationship between DEM error, relief, and agriculture has been studied before, the specific attention to flow-direction at field scales and over such a large geographic sample make this research unique. Previous efforts studied the effect of resolution or interpolation methods on DEM derivatives such as slope, aspect, and soil moisture. Other research has evaluated surface flow from DEMs by comparing catchment boundaries derived from upland flow accumulations in relatively small study sites. In this thesis, however, I have presented a new method for evaluating flow- direction, conducted this evaluation at the DEM cell level, and described its variation across the Ohio landscape, which exhibits a diversity of topography and land cover. 81 This thesis makes several contributions to the field of geographic information science. The quantification of flow-direction error adds a new measure by which DEMs can be evaluated. The cell-by-cell evaluation of coarser data with finer data could be applied to any number of raster datasets. The new method for evaluating surface water flow-direction could be extendedto other directional datasets, such as aspect, wind- direction, or groundwater flow-direction. Furthermore, the methods I developed and tested to locate stream features in LiDAR data, and my implementation of a sink carving method could be utilized in other hydrological studies of DEMs. This contribution in particular may prove to be the most significant as LiDAR DEMs replace contour- interpolated DEMs as the standard in spatial analysis. At a more practical level, the cell-level specificity of this thesis informs field- level analyses that would be impractical through previously described methods. Additionally, its broad scope informs analysis across varying topographic and land cover environments, making it applicable to a large audience of users. One group of DEM users that will benefit from this research are soil conservationists. Predictions of surface water flow-direction are critical to identifying specific field-level locations experiencing soil erosion. Inaccurate estimates will focus conservation efforts on the wrong locations. A conservationist may look for an erosive gully on the eastern edge of a field, when in actuality it may be on the western edge. Having a quantified value of flow-direction uncertainty for a particular location will aid soil conservationists in efficiently marshaling their time and resources. They will be able to gauge how much field verification is needed prior to targeting soil conservation practices. For example, if staff for the Lucas 82 County Soil and Water Conservation District utilized a USGS lO-meter DEM in an erosion model, as can be done with the Revised Universal Soil Loss Equation (Wu et al. 2005, Ouyang et al. 2005), to identify at-risk locations in a hilly area with little agricultural presence, they could be sufficiently confident in assuming that surface water flow was appropriately simulated by the model. If, on the other hand, district staff utilized the same approach to identify an at-risk location in a flat agricultural area (typical for that region of Ohio) they should exercise caution in assuming that the location was significantly eroding, and conduct a thorough field evaluation prior to the installation of remediation measures. Soil conservationists are just one specific group that could benefit from this research. Environmental engineers deriving stream networks, stream biologists exploring non-point source pollution loading from small local catchments, civil engineers re- evaluating floodplain maps, or any user employing DEMs for field-level hydrological analyses will benefit from knowing the uncertainty in simulated flow-direction at a particular location. Martinoni and Bernhard (1998) argued that greater documentation of error in DBMS, including assessment of derivative error, is needed to ensure their proper use and analysis. The quantification of flow-direction error could be a new metadata standard for DBMS, and support Martinoni’s and Bernhard’s goal. 83 6.3 Future Research While I was able to successfully quantify and explain flow-direction error in terms of topographic and land cover characteristics, the process generated additional topics that warrant further research. Stream locations could potentially be better identified and processed; smoothed LiDAR DEMs may prove to be better reference datasets; additional flow-direction algorithms should be utilized in the assessment of flow-direction error; and the scale at which flow-direction between lO-meter and LiDAR DEMs changes from error to agreement should be evaluated. The methods I developed to identify and burn stream locations in the LiDAR DEMs has several limitations. First, they cannot identify wide cross-sections of rivers. The analysis neighborhood is restricted to a lO-cell (25-foot by 25-foot) neighborhood and cannot locate the necessary stream bank elevations for large rivers in such a small window (Figures 4—10 and 4-13). Second, they require the user to specify single thresholds for stream bank elevation and neighborhood size that in actuality should vary depending on the relief and land cover of a particular area. In this research, I had to process several quadrangles three times with different threshold inputs until an acceptable output was realized. Third, the carving process to remove artificial barriers in the streams sometimes carved in inappropriate directions. For example, instead of carving through an east-westrunning bridge to connect a north-south drainage ditch, it sometimes carved in a northwest-southeast direction to connect the north-south ditch to a section of an east-west ditch on the opposite side of the road (Figure 4-21). Revised methods could accommodate wider rivers, automate input selections by analyzing total relief and land 84 cover discer metho repres numb an are data. 1 081112 DEM: direct impart preser DEM Press] hlpor 0310. Send; Well ( 311an cover percentages for given locations, and carve in a more intelligent manner by discerning the direction of the stream features that need to be connected. These revised methods would likely improve the precision of flow-direction error estimates. LiDAR DEMs can contain significant noise that distorts the surface representation. These distortions can be limited by “smoothing” the DEM through any number of filtering methods. A common approach is to change each cell’s elevation to an average of its nearest neighbors. This method reduces any large discrepancies in the data, but can also diminish valid features on the surface. I was particularly interested in evaluating flow-direction in flat areas; therefore, I was worried that smoothing LiDAR DEMs in these areas may distort the subtle surface features critical to calculating flow- direction. As mentioned in Chapter 3, subtle changes in elevation can have a significant impact on flow-direction if that area is flat. I chose to honor the original DEM and preserve these features, at the risk of noise affecting the model outputs. The LiDAR DEMs utilized in this thesis were unaltered from their downloaded format. Despite the presence of noise in some of the LiDAR DEMs, I was still able to confirm my hypotheses. However, since smoothing is a common procedure, it would be worth evaluating how the results of this study may change with smoothed LiDAR DEMs serving as the reference dataset. While the D8 flow-direction algorithm allowed me to confirm my hypotheses, its well documented limitations warrant an evaluation of flow-direction error utilizing alternative and more accepted methods. Multi-direction methods such as MF by Quinn et 85 al. (1991) or multi/single direction hybrids like Tarboton’s D-Infinity (1997) should be utilized in a cell-by-cell fashion as used here. I calculated flow-direction error on a cell-by-cell basis, but at what extent, if ever, does that error change to agreement? The 10-meter cell and its corresponding LiDAR cells may be in disagreement at the immediate neighborhood level illustrated in 4-35, but perhaps the predicted flows converge at the next larger neighborhood. It is possible that flow-direction error is generally confined to a small local neighborhood of cells. It is also possible that simulated flows travel in different direction over large areas (Figures 3-8 through 3-10). This relationship should be explored to better describe the impact flow- direction error may have on hydrological analyses. The availability of high-resolution LiDAR DEMs will continue to grow as GIS users demand finer data and as digital storage space becomes cheaper. LiDAR DEMs will likely replace the coarser, contour-interpolated and stereoscopic DEMs produced by the USGS as the standard digital elevation products for spatial analysis. As that transition takes place, research should document how existing DEM-based analyses may change with the incorporation of the finer resolution data. This research is a contribution to that effort. 86 APPENDICES 87 59:0 QUOI atom :25 m III.“ 59:0 K 1.8qu Sauna 25:35 $538.5 3. X—GZme—< 88 238m Befimwafim 2mg 28 «<95 M205 5:9 ”205 ~20“ =5 awe—cofimoaoa 7 SEC m 55025.32; «<91— @08onng coca—330 Sum cocoobvio—m =00 :ozoohuioE 2.00 Boeofiwsz w::8£8£ K 388$ _ momma 558%-32... ~20: :60 385.2 eom Stm 22826-32 “— 52.4.3— =262€$5E 668$ _ .33 bo>ov .895: .N 38m wimmoueaémem wig—us:— wafimeocum u< Nun—lam: 90 \oooqauhwu)- QWNU—OQWQO‘MkUJN—o NN GM 27 28 29 30 31 32 33 34 35 36 37 APPENDIX B — Python Code raster_analysis.py _all_ = ['rasterZarray', 'rastertethlist', 'extract_header'] #raster_analysis.py #Author: Glenn O'Neil #Date: September 2009 #This Python module provides tools for converting ArcGIS rasters #between multiple formats, including NumPy arrays for analysis with GDAL #and Python lists. It also includes a function for extracting a neighborhood #fi'om a larger list or array of values. def raster2array (raster): """Takes a raster path and converts it to a NumPy array using GDAL""" #check to see if gdal and gdalconst modules have been imported if 'gdal' not in globals(): from osgeo import gdal if 'gdalconst' not in globals(): from osgeo import gdalconst raster_ds = gdal.0pen(raster, gdalconst.GA_ReadOnly) raster_band = raster_ds.GetRasterBand( l) raster_anay = raster_band.ReadAsArray() del raster_ds, raster_band return raster_array def array2raster (raster_array, raster_ds_template, raster_ds_out_path, raster_ds_out_format, scratch_folder): Takes a NumPy array and converts it to an ArcINFO raster using GDAL raster_array = the NumPy array to be converted raster_ds_template = the raster dataset that will serve as a template for the output raster dataset raster_ds_out_path = the full path of the output raster dataset raster_ds_out_format = string of the the pixel type of the output raster dataset: - l_BIT: A 1-bit unsigned interger. Values can be 0 or 1. - 2_BIT: A 2-bit unsigned integer. The values supported can be from 0 to 3. - 4_BIT: A 4-bit nsigned integer. The values supported can be from O to 15. - 8_BlT_UNSIGNED: An unsigned 8-bit data type. The values supported can be from 0 to 255. - 8_BlT_SIGNED: An signed 8-bit data type. The values supported can be from - 128 to 127. 91 ,3 15 l6 17 18 49 50 5] <7 54 SS 56 i7 53 so 60 6l 63 63 7 1 71 gr ~.I. 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 65 66 67 68 69 70 71 72 73 74 75 76 77 - l6_BIT_UNSlGNED: An unsigned l6-bit data type. The values supported can be from 0 to 65,535. - l6_BIT_SIGNED: An signed l6-bit data type. The values supported can be from -32,768 to 32,767. - 32_BIT_UNSIGNED: An unsigned 32-bit data type. The values supported can be from 0 to 4,294,967,295. - 32_BIT_SIGNED: An signed 32-bit data type. The values supported can be from -2,147,483,648 to 2,147,483,647. - 32_BlT_FLOAT: A 32-bit data type supporting decimals. - 64_BIT: A 64-bit data type supporting decimals. scratch_folder = the folder where the temporary datasets will be stored #check to see if the necessary modules have been imported if 'gdal' not in globals(): from osgeo import gdal if 'gdalconst' not in globals(): from osgeo import gdalconst if 'arcgisscripting' not in globals(): import arcgisscripting if 'gp' not in globals(): gp = arcgisscripting.create(9.3) elif str(type(gp)) != "": del gp gp = arcgisscripting.create(9.3) #must first convert the template dataset to a GeoTlFF b/c GDAL can't writ to ESRI Grid format template_copy = scratch_folder + "\\temp.tif" gp.CopyRaster_management(raster_ds_template, template_copy, "#", "#", "#", "NONE", "NONE", raster_ds_out_format) #read the copied TIFF file into a GDAL dataset and write the raster_array contents into it raster_ds = gdal.0pen(template_copy, gdalconst.GA_Update) raster_band = raster_ds.GetRasterBand( l) raster_band.WriteArray(raster_array) raster_band.FlushCache() raster_ds.FlushCache() del raster_ds, raster_band #copy the modified template_copy back to an ESRI Grid gp.CopyRaster_management(template_copy, raster_ds_out_path) gp.delete_management(template_copy) del template_copy 92 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 def rastertext21ist (raster_ascii_path, output_datatype='int'): """Takes a raster as a text file and converts it to a list raster_ascii_path = full path of an ASCII text file converted from a raster output_datatype = the datatype of the values in the output list. Integer is the default, user can specify 'float' or 'string' #read in the elevation ascii file raster_ascii = open(raster_ascii_path, 'r') #initiate the list that will store all of the text values raster_list = [] raster_ascii.seek(0,0) for eachLine in raster_ascii: raster_list.append(eachLine.split(" ")) raster_ascii.close() #get rid of the descriptive info (the first six rows) for i in range(6): raster_list.remove(raster_list[0]) #get rid of the newline character at the end of each row for row in raster_list: row.remove(row[len(row) - 1]) #convert the textfile values (string) to whatever numeric type (integer or floating type) was specified if output_datatype == 'float' or output_datatype == 'Float': for row in raster_list: for i in range(len(row)): row[i] = float(row[i]) elif output_datatype = 'string' or output_datatype == 'String': pass else: #the default will be integer for row in raster_list: for i in range(len(row)): row[i] = int(row[i]) raster_ascii.close() del raster_ascii 93 113 (1a: 125 116 r 117 n 128 r} 131 n 133 c 133 r 134 135 136 dc 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 return raster_list def extract_header (raster_ascii_path): """ Extracts the header information from a raster converted to a text file and returns it as a list """ raster_ascii = open(raster_ascii_path, ‘r') header__list = [] for i in range(6): header_list.append(raster_ascii.readline()) raster_ascii.close() del raster_ascii return header_list def nhood(input_list, row_index, col_index, nsize): """Takes a list, row index, and column index and extracts a neighborhood of nsize around those indexes. nhood_list = [] #get the boundary indexes upper_lefl_row = row_index - nsize if upper_lefi_row < 0: upper_lefi_row = 0 upper_lefi_col = col_index - nsize if upper_left_col < 0: upper_1efi_col = 0 lower_right_row = row_index + nsize if lower_right_row >(1en(input_list)- 1): lower_right_row = len(input_list) - 1 lower_right_col = col_index + nsize if lower_right_col > (len(input_list[0]) - 1): lower_right_col = len(input_list[0]) - 1 #loop through the input_list for the specified boundaries and record the values row_counter = 0 for i in range(upper_left_row, lower_right_row + 1, 1): nhood_list.append([]) for j in range(upper_lefi_col, lower_right_col + 1, 1): nhood_list[row_counter].append(input_list[i][i]) row_counter += 1 return nhood_list 94 l-J ‘4.) tank *Cmm 5Aum. 88 8 z 9 # w 3 11': D t 13: N : H z 16: 17.5, H : W a 30; 31:; 2 g 33: 24k: 3 : 362 37:,- 3 ~ \OOOQGMRWN— WUNNNNNNNNNNN—~——~——O~—— N—OOOOQONMAWN—OVOOOQONM-hWN—O Mb) Am 35 36 37 38 APPENDIX B -— Python Code analysis _prep.py # # analysis _prep.py # Created September 2009 # Author: Glenn O'Neil # # Description: This script generates the datasets and folders necessary for an # analysis of NED (Naitonal Elevation Dataset) DEM flow-direction error by comparison to tiles of finer resolution LiDAR DEMS. The preparation includes a stream-burning process to force surface- water flow to streams. The NED DEMs are burned with USGS Hydrography 1:24K DLG vector features. LiDAR tile datasets are identified based on the coordinates of the selected USGS 7.5 minute quadrangle. This results in a sub-folder of around 40-48 LiDAR DEMs, each to be prepped separately. The LiDAR DEMs' horizontal positional accuracy are too precise to burn with 1:24K stream features. Therefore, this script calls one of two custom scripts (specified by the user) that identify stream locations in LiDAR datasets by analyzing only elevation values. Once stream identification has been completed for the LiDAR DEMs, another custom script is called to attempt to carve through artificial barriers in the LiDAR DEMs, such as foot-bridges and tile culverts. This script identifies sinks and carves to the closest low-point within a user-specified maximum neighborhood size. It continues to identify sinks and carve for a user-specified number of iterations. Once the final carve is complete, the LiDAR DEM is stream-burned, sinks filled, and flow-direction calculated with the D8 method. Inputs: 1. quads - an ESRI shapefile of 7.5 Minute USGS Quadrangles 2. quad_id_field - unique quad ID field of the 'quads' shapefile 3. quad_id - the unique ID of the selected quad 4. state _plane_field - field in the 'quads' shapefile that denotes whether the quad lies in the North or Southern %matfiitttkzttfintkkatfitntatntfikkflttitnttatk=11: state planes. 5. state _plane_n_sr_fi1e - path to the Ohio State Plane North NAD 83 Ham projection le 6. state _plane_s_sr_file - path to the Ohio State Plane South NAD 83 Ham projection file # 7. workspace - the workspace where the quad folder containing all output files and folders # will be created. # 8. streams_wspace - workspace containing a shapefile of stream features for each Ohio 95 39 4O 41 42 43 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 65 66 67 68 69 70 7 l 72 73 74 75 quad # from # # 10. study. # 11. # # # # # 12. difference in # # 13. inc cells # # 14. difference 1n # # 15. which # # 16. to # # 17. and carve. # 18. # # # 21. reducing the # # 22. # # Outputs: # # # # # # # # # # 9. ned_dem - a single DEM raster for all of Ohio, generated by contour interpolation DLG hypsography and submitted to NED. lidar_master__wspace - workspace containing all of the LiDAR rasters used in this stream_id_method - the method of stream identification for the LiDAR datsaets; 'neighborhood': see script strearn_id_nhood.py 'neighborhood transect combo': a combination of the neighborhood and trasect methods. See scripts strearn_id_nhood.py and stream_id_transect.py elev_diff - parameter of the neighborhood stream ID method. The minimum elevation from a stream bank to the stream center. neigh_maxsize - parameter of neighborhood stream 1D method. The maximum size to search around a given cell for cells that exceed elev_diff. elev_difi‘_trans - parameter of the transect stream ID method. The minimum elevation from a stream bank to the stream center. trans_length - parameter of the transect stream 1D method. The transect length over the code will search for elevation changes that exceed elev_diff_trans max_cavre_length - parameter of the carving script. The maximum distance in cells search about a sink for a lower point. iterations - parameter of the carving script. The number of times to identify sinks strearn_id_nhood_script - the path to the neighborhood stream identification script. 19. stream_id_transect_script - the path to the transect stream identification script. 20. carve_script - the path to the carving script. PWHQMPPNT‘ 1idar_sample - boolean value whether to sample a lattice from the LiDAR tiles, total number of rasters that must be processed. del_int_data - boolean value whether to delete intermediate datasets Folder containing output datasets ned - lO-meter DEM clipped by quadrangle boundary ned_b = clipped lO-meter DEM stream-bumed with stream_g ned_bf - filled (depressionless) version of ned_b ned_fd - flow-direction of 10-meter DEM, derived from ned_bf ned_fid - raster of unique [D values for each lO-meter cell quad_se1.shp - shapefile of selected quadrangle boundary streams. shp - shapefile of quadrangle streams (from DLG hydrography) streams _g binary raster of stream locations, from streams. shp 10. lidar- directory containing all of the quad's LiDAR data, organized by tile 96 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 # 10-1. carved_dem - LiDAR DEM with artificial barriers in streams carved through # 10-2. flat_areas - raster of locations with a slope of zero, later ignored in error calculation # 10-3. lidar_bum - LiDAR raster with lidar_stream locations burned in. # 10-5. lidar_ned_fid -LiDAR-resolution raster of ned_fid, used in locating LiDAR cell neighborhoods for each NED cell. # 10-6. lidar_stream - binary raster of LiDAR stream locations. # l 1. template_raster - TIFF copy of the ned_fd raster, used to write in flow_error results. # #import the necessary modules import sys, 05, time, arcgisscripting, math, random from subprocess import call gp = arcgisscripting.create(9.3) gp.CheckOutExtension("Spatial") ls = os.1inesep #function to convert seconds to hours and minutes (borrowed from: #http://mail.python.org/pipennai1/python-list/2003-January.l 81366.html def sec_to_h_min(s): temp = float() temp = float(s) / (60*60*24) d = int(temp) temp = (temp - d) * 24 h = int(temp) temp = (temp - h) * 60 m = int(temp) temp = (temp - m) * 60 sec = int(temp) return h,m,sec #inputs quads = sys.argv[l] quad_id_field = sys.argv[2] quad_id = sys.argv[3] state _plane_field = sys.argv[4] state _plane_n_sr_file = sys.argv[S] state _plane_n_sr = gp.CreateObject("SpatialReference") state _plane_n_sr.CreateFromF ile(state _plane_n_sr_file) state _plane_s_sr_fi1e = sys.argv[6] state _plane_s_sr = gp.CreateObject("SpatialReference") state _plane_s_sr.CreateFromF ile(state _plane_s_sr_file) 97 118 workspace = sys.argv[7] 119 streams_wspace = sys.argv[8] + "\\" 120 ned_dem = sys.argv[9] 121 lidar_master_wspace = sys.argv[ 10] 122 stream_id_method = sys.argv[] l] 123 elev_diff = sys.argv[12] 124 neigh_maxsize = sys.argv[l3] 125 elev_diff_trans = sys.argv[14] 126 trans_length = sys.argv[lS] 127 max_carve_length = sys.argv[l6] 128 iterations = sys.argv[l7] 129 stream_id_nhood_script = sys.argv[18] 130 stream_id_transect_script = sys.argv[l9] 131 carve_script = sys.argv[20] 132 lidar_sample = sys.argv[21] 133 del_int_data = sys.argv[22] 134 135 136 #get the path to the Python executable, for running certain parts of the script as more memory- efficient sub-processes. 137 pythonexe = os.environ.get("PYTl-lONEXE") #will be used to generate sub-processes to prevent memory leaks 138 pythonexe += "\\python.exe" 139 140 #check to see if a new workspace must be created 141 if not os.path.exists(workspace + "\\" + quad_id): 142 os.mkdir(workspace + "\\" + quad_id) 143 144 workspace += "\\" + quad_id + "\\" 145 146 #create the log textfile 147 log_file_path = workspace + "\\log.txt" 148 if not os.path.exists(log_fi1e_path): 149 log_file = open(log_file_path, 'w') 150 log_file.write("QUAD__1D: " + quad_id + ls + Is) 151 log_file.write("Parameters:" + Is) 152 log_file.write(" - quads: " + quads + ls) 153 log_file.write(" - quad_id_field: " + quad_id_field + ls) 154 log_file.write(" - quad_id: " + quad_id + Is) 155 log_file.write(" - state _plane_field: " + state _plane_field + ls) 156 log_file.write(" - state _plane_n_sr_file: " + state _plane_n_sr__file + ls) 157 log_file.write(" - state _plane_s_sr_file: " + state _plane_s_sr_file + Is) 158 log_file.write(" - workspace: " + workspace + ls) 159 log_file.write(" - streams_wspace: " + streams_wspace + Is) 98 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 log_file.write(" - ned_dem: " + ned_dem + ls) log_file.write(" - lidar_master_wspace: " + lidar_master_wspace + ls) log_file.write(" - stream_id_method: " + stream_id_method + ls) log_file.write(" - elev_diff: " + elev_diff + ls) log_file.write(" - neigh_maxsize: " + neigh_maxsize + ls) log_file.write(" - upslope_stream_threshhold: " + upslope_stream_threshhold + ls) log_file.write(" - max_carve_length: " + max_carve_length + ls) log_file.write(" - iterations: " + iterations + ls) log_file.write(" - stream_id_nhood_script: " + stream_id_nhood_script + 15) log_file.write(" - stream_id_transect_script: " + stream_id_transect_script + ls) log_file.write(" - stream_id_flowacc_script: " + stream_id_flowacc_script + 15) log_file.write(" - carve_script: " + carve_script + ls) log_file.write(" - lidar_sample: " + lidar_sample + ls) log_file.write(" - delete intermediate data: " + del_int_data + Is + ls) else: log_file = open(log_file_path, 'w') ####################################NED analysis prep########################### #determine whether to skip the NED section because it had already been processed ned_fid = workspace + "ned_fid" ned_fd = workspace + "ned_f " quad_sel = workspace + "quad_sel.shp" #Determine the state plane of the selected quad. cursor = gp.searchcursor(quads, ' "'+ quad_id_field +'" = \" + quad_id + '\' ') row = cursor.next() state _plane = row.getvalue(state_plane_field) if not gp.exists(ned_fid): log_file.write(time.ctime() + " Prepping the NED DEM:" + ls) #Get the quad gp.AddMessage(" - Selecting Quad " + quad_id) log_file.write(time.ctime() + " - Selecting Quad " + quad_id + ls) gp.select_analysis(quads, quad_sel, ' "'+ quad_id_field +'" = \" + quad_id + '\' ') #reproject the quad_boundary if necessary quad_sel_desc = gp.describe(quad_sel) if quad_sel_desc.SpatialReference.Name == "Unknown": gp.AddError('The selected quad does not have a defined projection.') log_file.write(time.ctime() + " - ERROR: Selecting Quad The selected quad does not 99 have a defined projection") 203 log_file.close() 204 del log_file 205 sys.exit() 206 207 quad _pcs = quad_sel_desc.SpatialReference.PCSCode 208 if state _plane = "N": 209 if quad _pcs l= state _plane_n_sr.PCSCode: #Must re-project to State Plane North 210 gp.AddMessage(" - Projecting quad boundary from " + quad_sel_desc.Name + " to " + state _plane_n_sr.Name) 211 log_file.write(time.ctime() + " - Projecting quad boundary from " + quad_sel_desc.Name + “ to " + state _plane_n_sr.Name + ls) 212 quad _proj = workspace + "quad_proj.shp" 213 gp.project_management(quad_sel, quad _proj, state _plane_n_sr) 2 l4 gp.delete_management(quad_sel) 215 gp.rename__management(quad_proj, quad_sel) 216 elif state _plane == "S": 217 if quad _pcs != state _plane_s_sr.PCSCode: #Must re-project to State Plane South 218 gp.AddMessage(" - Projecting quad boundary from " + quad_sel_desc.Name + " to " + state _plane_s_sr.Name) 219 log_file.write(time.ctime() + " - Projecting quad boundary from " + quad_sel_desc.Name + " to " + state _plane_s_sr.Name + ls) 220 quad _proj = workspace + "quad_proj.shp" 221 gp.project_management(quad_sel, quad _proj, state _plane_s_sr) 222 gp.de1ete_management(quad_seI) 223 gp.rename_management(quad_proj, quad_sel) 224 225 #Clip the NED DEM by the quad boundary 226 ned_fd = workspace + "ned_fd" 227 if not gp.exists(ned_fd): 228 gp.AddMessage(" - Clipping the NED DEM by the quad boundary") 229 log_file.write(time.ctime() + " - Clipping the NED DEM by the quad boundary" + Is) 230 ned = workspace + "ned" 231 gp.ExtractByMask_sa(ned_dem, quad_sel, ned) 232 233 234 #Get the streams 235 streams_input = streams_wspace + quad_id + "ohy.shp" 236 streams = workspace + "streams.shp" 23 7 #Check to see if the stream dataset must be re-projected 23 8 streams_desc = gp.describe(streams_input) 23 9 if streams_desc.SpatialReference.Name = "Unknown": 24 0 gp.AddError('The selected stream dataset does not have a defined projection.') 24 1 log_file.write(tirne.ctime() + " - ERROR: The selected stream dataset does not have a defined projection." + ls) 100 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 2 72 273 274 275 276 277 278 log_file.close() del log_file sys.exit() streams _pcs = streams_desc.SpatialReferencePCSCode if state _plane == "N": if streams _pcs != state _plane_n_sr.PCSCode: #Must re-project to State Plane North gp.AddMessage(" - Projecting streams from " + streams_desc.SpatialReference.Name + " to " + state _plane_n_sr.Name) 10g_file.write(time.ctime() + " - Projecting streams from " + streams_desc.SpatialReference.Name + " to " + state _plane_n_sr.Name + ls) #select the appropriate transformation if '16N' in streams_desc.SpatialReference.Name: transformation = "Thesis_NAD l 927_UTM l 6N_to_NAD l 983_HARN__Ohio_State_Plane_North_fe" elif '17N' in streams_desc.SpatialReference.Name: transformation = "Thesis_N AD 1 927_UTM 1 7N_to_NAD 1983_HARN_Ohio_State_Plane_North_fe" gp.project_management(streams_input, streams, state _plane_n_sr, transformation) else: #The streams projection is already in the appropriate projection gp.AddMessage(" - Stream re-projection not needed. Copying streams over to workspace") log_file.write(time.ctime() + " - Stream re-projection not needed. Copying streams over to workspace." + ls) - gp.eopy_management(streams_input, streams) elif state _plane = "S": if streams _pcs != state _plane_s_sr.PCSCode: #Must re-project to State Plane South gp.AddMessage(" - Projecting streams from " + streams_desc.SpatialReference.Name + " to " + state _plane_s_sr.Name) log_file.write(time.ctime0 + " - Projecting streams from " + streams_desc.SpatialReference.Name + " to " + state _plane_s_sr.Name + ls) #select the appropriate transformation if '16N' in streams_desc.SpatialReference.Name: transformation = "Thesis_NAD 1 927_UTM 1 6N_to_NAD 1 983_HARN_Ohio__State__Plane_South_fe" elif‘17N' in streams_desc.SpatialReference.Name: transformation = "Thesis_NAD 1 92 7_UTM l 7N__to_NAD1 983_HARN_Ohio_State_P1ane_South_fe" gp.project_management(streams_input, streams, state _plane_s_sr, transformation) else: #The streams projection is already in the appropriate projection gp.AddMessage(" - Stream re-projection not needed. Copying streams over to workspace") log_file.write(time.ctime() + " - Stream re-projection not needed. Copying streams over to workspace." + ls) gp.eopy_management(streams_input, streams) #Convert the streams to a raster for DEM burning gp.AddMessage(" - Converting streams to raster") 101 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 log_file.write(time.ctime() + " - Converting streams to raster." + ls) gp.ResetEnvironments() gp.Extent = ned . gp.SnapRaster = ned gp.Mask = ned gp.CellSize = ned streams _g_temp = workspace + "streams_g_tmp" gp.PolylineToRaster_conversion(streams, "Stream", streams_g_temp) streams _g = workspace + "streams_g" gp.Sing1eOutputMapAlgebra_sa("con(lsNull(" + streams_g_temp +"), O, 1)", streams_g) gp.delete_management(streams_g__temp) #Burn the streams into the NED DEM gp.AddMessage(" - Burning streams into NED DEM") log_file.write(time.ctime() + " - Burning streams into NED DEM." + ls) ned_b = workspace + "ned_b" gp.SingleOutputMapAlgebra_sa("con(" + streams_g + " == 1, " + ned + ", " + ned + " + 10)", ned_b) #F ill the NED DEMs gp.AddMessage(" - Filling NED DEM") log_file.write(time.ctime() + " - Filling NED DEM." + ls) ned_bf = workspace + "ned_bf" gp.fill_sa(ned_b, ned_bf) #Calculate NED Flowdirection print "Calculating NED flowdirection" gp.AddMessage(" - Calculating NED flow-direction") log_file.write(time.ctime() + " - Calculating NED flow-direction." + ls) ned_fd = workspace + "ned_fd" gp.flowdirection_sa(ned_bf, ned_fd) #determine if ned_fd needs to be re-projected gp.ResetEnvironments() ned_fd_desc = gp.describe(ned_fd) ned_fd_pcs = ned_fd_desc.SpatialReference.PCSCode if state _plane = "N": if ned_fd_pcs != state _plane_n_sr.PCSCode: #Must re-project to State Plane North gp.AddMessage(" - Projecting ned_fd boundary fiom " + ned_fd_desc.SpatialReference.Name + " to " + state _plane_n_sr.Name) log_file.write(time.ctime() + " - Projecting ned_fd boundary from " + ned_fd_desc.SpatialReference.Name + " to " + state _plane_n_sr.Name + ls) ned_ fd_proj= workspace + "_ned fd_proj" gp.ProjectRaster_ management(ned__ fd, ned_ fd_proj, state _plane n_ sr, "NEAREST", "#", "NAD__ 1983 _To__ HARN _Ohio") 102 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 gp.delete_management(ned_fd) gp.rename_management(ned_fd_proj, ned_fd) elif state _plane = "S": if ned_fd_pcs != state _plane_s_sr.PCSCode: #Must re-project to State Plane South gp.AddMessage(" - Projecting ned_fd boundary from " + ned_fd_desc.Name + " to " + state _plane__s_sr.Name) log_file.write(time.ctime() + " - Projecting ned_fd boundary from " + ned_fd_desc.SpatialReference.Name + " to " + state _plane_s_sr.Name + ls) ned_fd_proj = workspace + "ned_fd_proj" gp.ProjectRaster_management(ned_fd, ned_fd _proj, state _plane_s_sr, "NEAREST", "#", "NAD_1983_To_l-IARN_Ohio") gp.delete_management(ned_fd) gp.rename_management(ned_fd_proj, ned_fd) #Copy the NED flowdirection raster to a TlF F format for use as a template dataset for writing error output raster with GDAL later gp.AddMessage(" - Creating template raster for flow error output") log_file.write(time.ctime() + " - Creating template raster for flow error output." + ls) ned_fd_float = workspace + "\\ned_fd_float" gp.Float_sa(ned_fd, ned_fd_float) template_raster = workspace + "\\template_raster.tif' gp.CopyRaster_management(ned_fd_float, template_raster, "#", "#", "#", "NONE", "NONE", "32__BlT_FLOAT") gp.de1ete_management(ned_fd_float) #convert the ned_fd raster to a point feature dataset gp.AddMessage(" - Converting NED flow-direction to a point feature dataset") log_file.write(time.ctime() + " - Converting NED flow-direction to a point feature dataset. + ls) ned_fd_pt = workspace + "ned_fd_pt.shp" gp.RasterToPoint_conversion(ned_fd, ned_fd_pt, "VALUE") #convert back to a raster, but with the FID values as the raster values gp.AddMessage(" - Converting NED flow-direction point feature dataset back to a raster with the F ID Values as raster values") log_file.write(time.ctime() + " - Converting NED flow—direction point feature dataset back to a raster with the FID Values as raster values." + ls) gp.ResetEnvironments() gp.Extent = ned_fd gp.CellSize = ned_fd gp.Mask = ned_fd gp.SnapRaster = ned_fd gp.PointToRaster_conversion(ned_fd_pt, "F 1D", ned_fid, "MOST_FREQUENT", "NONE") gp.delete_management(ned_fd_pt) #re-generate the raster using Map Algebra, this prevents the addition of NoDATA cells around the permiter 103 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 #of ned_fid, which PointToRaster can do. gp.ResetEnvironments() gp.Extent = ned_fd gp.CellSize = ned_fd gp.Mask = ned_fd gp.SnapRaster = ned_fd ned_fid2 = ned_fid + "2" gp.SingleOutputMapAlgebra_sa(ned_fid, ned_fid2) gp.delete_management(ned_fid) gp.rename_management(ned_fid2, ned_fid) #clean up if del_int_data = "True": gp.delete_management(streams) gp.de1ete_management(streams_g) gp.delete_management(ned_b) gp.delete_management(ned_bf) #get the width of the NED flowdirection raster, for storing in the parameters text file gp.AddMessage(" - Reading in NED raster cell size") log_file.write(time.ctime() + " - Reading in NED raster cell size." + ls) ned_descr = gp.describe(ned_fd) ned_height = ned_descr.meancellheight ned_width = ned_descr.meancellheight ################################################################################ ########################LiDDAR analysis prep#################################### if not os.path.isdir(workspace + "lidar"): os.mkdir(workspace + "lidar") #determine the LiDAR datasets that intersect the selected quad log_file.write(ls + "LiDAR DEM prep:" + ls) gp.AddMessage(" - Identifying LiDAR Rasters that intersect Quad " + quad_id) log_file.write(time.ctime() + " - Identifying LiDAR Rasters that intersect Quad " + quad_id + ls) quad_sel_desc = gp.describe(quad_sel) quad_sel_ll_x = quad_sel_desc.extent.xmin quad_sel_ll_y = quad_sel_desc.extent.ymin 104 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 42 1 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 quad_sel_height = quad_sel_desc.extent.Height quad_sel_width = quad_sel_desc.extent.Width #determine the name of the LiDAR dataset that contains the lower_left comer of the quad #first get the rasters lidar_rasters = [] lidar_master_wspace_contents = os.listdir(lidar_master_wspace) for i in lidar_master_wspace_contents: if os.path.isdir(1idar_master_wspace + "\\" + i): lidar_rasters.append(i) del lidar_master_wspace_contents #determine the size of a LiDAR raster lidar_descr = gp.describe(lidar__master_wspace + "\\" + lidar_rasters[O] + "\\" + lidar_rasters[0]) 1idar__height= lidar_descr.extent.ymax - lidar_descr.extent.ymin lidar_width = lidar_descr.extent.xmax - lidar_descr.extent.xmin lidar_rasters_per_quad_height = int(math.cei1(quad_sel_height / lidar_height)) lidar_rasters_per_quad_width = int(math.ceil(quad_sel_width / lidar_width» for i in range(len(lidar_rasters)): #get the lidar lower left coordinates from the LiDAR name (first is state plane system, 2-5 are x and last 3 are y) lidar_ll_x = int(lidar_rasters[i][ 1 :5] + "000") lidar_ll_y = int(lidar_rasters[i][Sz] + "000") ll_x_diff = quad_sel_ll_x - lidar_ll_x ll_y_diff = quad_sel_ll_y - lidar_ll_y if 11_x_diff < lidar_width and ll_x_difi‘ >=0 and ll_y_diff < lidar_height and ll_y_diff >=0: #we've found the lower left LiDAR Dataset ll_lidar_raster = lidar_rasters[i] #get the other defining rasters ul_lidar_raster = lidar_rasters[i][:S] + str(lidar_ll_y + ((lidar_rasters_per_quad_height - 1) * lidar_height))[z3] ' ur_lidar_raster = lidar_rasters[i][: 1] + str(lidar_ll_x + ((lidar_rasters_per_quad_width - 1) * lidar_width))[:4] + str(lidar_ll_y + ((lidar_rasters_per_quad_height - l) * lidar_height))[z3] lr_1idar_raster = lidar_rasters[i][:l] + str(lidar_ll_x + ((lidar_rasters_per_quad_width - 1) * lidar_width))[:4] + lidar_rasters[i][5:] break #use the comer rasters to define a list of rasters for the entire quad lidar_raster_list = [U] * lidar_rasters_per_quad_height for i in range(lidar_rasters_per_quad_height): 105 453 453 454 455 456 457 459 460 461 461 440 44 1 442 443 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 lidar_raster_list[i] = [U] * lidar_rasters_per_quad_width x_location = int(ll_lidar__raster[1:5] + "000") y_location = int(11_lidar_raster[5:] + "000") prefix = state _plane.lower() for i in range(lidar_rasters_per_quad_height): for j in range(lidar_rasters_per_quad_width): lidar_raster_list[i][j] = prefix + str(x_location)[:4] + str(y_location)[:3] x_location += lidar_width y_location += lidar_height #reset the x_location x_location = int(ul_lidar_raster[l :5] + "000") #Trim the boundaries off of lidar_raster_list to avoid edge issues during the analysis. #There will be some areas where the NED dem does not cover the entire LiDAR dem. lidar_raster_list2 = [] for i in range(l , len(lidar_raster_list) - 1): lidar_raster_list2.append(lidar__raster_list[i][ l :len(lidar_raster_list[i]) - 1]) del lidar_raster_list if lidar_sample = "True": #remove half of the rasters in the form of staggered lattice lidar_raster_list3 = [] for i in range(len(lidar_raster_list2)): lidar_raster_list3 .append([]) #determine whether i is odd or even if i % 2 = 0: # it's even, start at the first index of the row for j in range(O, len(lidar_raster_list2[i]), 2): lidar_raster_list3[i].append(lidar_raster_list2[i][j]) else: # its odd, start at the second index of the row for j in range(l, len(lidar_raster_list2[i]), 2): lidar_raster_list3[i].append(lidar_raster_list2[i][j]) else: ‘ lidar_raster_list3 = lidar_raster_listZ #move the list into a simpler to handle single dimension lidar_raster_list = [] 4 for i in range(len(lidar_raster_list3)): for j in range(len(lidar_raster_list3[i])): lidar_raster_list.append(lidar_raster_list3 [ i] U ]) 106 483 484 485 486 487 488 489 490 49 1 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 del lidar_raster_listZ, lidar_raster_list3 #call code to prep the LiDAR tiles for i in range(len(lidar_raster_list)): if not os.path.exists(workspace + "lidar\\" + lidar_raster_list[i]): os.mkdir(workspace + "lidar\\" + lidar_raster_list[i]) gp.AddMessage(" - Prepping LiDAR tile " + str(lidar_raster_list[i]) + " (" + str(i) + " of " + str(len(lidar_raster_list)) + ")") log_file.write(time.ctime() + " - Prepping LiDAR tile " + str(lidar_raster_list[i]) + " (" + str(i) + " of " + str(len(lidar_raster__list)) + ")" + ls) 1idar_prep_time_start = time.clock() #get the lidar raster lidar_dem = lidar_master_wspace + "\\" + lidar_raster_list[i] + "\\" + lidar_raster_list[i] lidar_wspace = workspace + "lidar\\" + lidar_raster_list[i] #identify the stream network lidar_fd = lidar_wspace + "\\lidar_fd" if not os.path.exists(lidar_fd): #then the code did not complete for this raster lidar_stream = lidar_wspace + "\\lidar_stream" if not os.path.exists(lidar_stream): stream_id_time_start = time.clock() gp.AddMessage(" - Identifying stream cells with the " + stream_id_method + " method") log_fi1e.write(time.ctime() + " - Identifying stream cells with the " + stream_id_method + " method" + ls) if stream_id_method == "neighborhood": call([pythonexe, stream_id_nhood_script, lidar_dem, elev_diff, neigh_maxsize, lidar_stream, lidar_wspace]) elif stream_id_method = "transect": call([pythonexe, stream_id_transect_script, lidar_dem, neigh_maxsize, elev__diff, trans_stream, lidar_wspace]) elif stream_id_method == "neighborhood transect combo": #run the neighborhood method gp.AddMessage(" - First identifying stream cells with the neighborhood method") log_file.write(time.ctime() + " - First identifying stream cells with the neighborhood method" + ls) nhood_stream = lidar_wspace + "\\nhood_stream" 107 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 54 l 542 543 544 545 546 547 548 549 550 SS 1 552 553 554 555 556 557 558 call([pythonexe, stream_id_nhood_script, lidar_dem, elev_diff, neigh_maxsize, nhood_stream, lidar_wspace]) #run the transect method gp.AddMessage(" 10g_file.write(time.ctime() + .. - Next identifying stream cells with the transect method") - Next identifying stream cells with the transect method" + ls) trans_stream = lidar_wspace + "\\trans__stream" call([pythonexe, stream_id_transect_script, lidar_dem, neigh_maxsize, elev_diff, trans_stream, lidar_wspace]) #combine the two results gp.AddMessage(" 10g_file.write(timeetimeo + u - Combining the neighborhood and transect outputs") - Combining the neighborhood and transect outputs" + ls) gp.ResetEnvironments() gp.Extent = lidar_dem gp.Mask = lidar_dem gp.SnapRaster = lidar_dem gp.SingleOutputMapAlgebra_sa("con(" + nhood_stream + " == 1 l " + trans_stream + " == 1, 1, 0)", lidar_stream) else: gp.AddError("UNRECOGNIZED STREAM IDENTIFICATION METHOD") log_file.write(time.ctime() + " - ERROR: UNRECOGNIZED STREAM IDENTIFICATION METHOD") log_file.close() del log_file sys.exitO stream_id_time_stop = time.clock() h,m,s = sec_to_h_min(stream_id_time_stop - stream_id_time_start) gp.AddMessage(" - ID took " + str(m) + " minutes " + str(s) + " seconds") log_file.write(time.ctime() + " - 1D took " + str(m) + " minutes " + str(s) + " seconds" + ls) + ls) #use the identified stream cells to burn through artificial barriers in the LiDAR DEM carved_dem = lidar_wspace + "\\carved_dem" if not os.path.exists(carved_dem): stream_carve_time_start = time.clock() gp.AddMessage(" - Carving stream cells through artificial barriers") log_file.write(time.ctime() + " - Carving stream cells through artificial barriers" call([pythonexe, carve_script, lidar_dem, lidar_stream, max_carve_length, iterations, carve_iteration_script, "true", carved_dem, lidar_wspace]) stream_carve_time_stop = time.clock() 108 559 h,m,s = sec_to_h_min(stream_carve_time_stop - stream_carve_time_start) 550 gp.AddMessage(" - Carve took " + str(m) + " minutes " + str(s) + " seconds") 561 log_file.write(time.ctime() + " - Carve took " + str(m) + " minutes " + str(s) + " seconds" + Is) 562 563 #burn in the identified streams into the LiDAR DEM 564 gp.ResetEnvironments() 565 lidar_bum = lidar_wspace + "\\lidar_bum" 566 if not os.path.exists(lidar_burn): 567 gp.AddMessage(" - Burning streams into LiDAR DEM") 568 log_file.write(time.ctime() + " - Burning streams into LiDAR DEM" + Is) 569 gp.SingleOutputMapAlgebra_sa("con(" + lidar_stream + " == 1, " + carved_dem + ", " + caned_dem + " + 10)", lidar_bum) 570 571 #fill the carved and stream_bumed LiDAR DEM 572 1idar_fill = lidar_wspace + "\\lidar_fill" 573 if not os.path.exists(lidar_fill): 574 gp.AddMessage(" - Filling the LiDAR DEM") 575 log_file.write(time.ctime() + " - Filling the LiDAR DEM" + Is) 576 gp.fill_sa(lidar_bum, lidar_fill) 577 573 #calculate flow-direction on the filled LiDAR DEM 579 gp.AddMessage(" - Calculating LiDAR DEM flow-direction") 580 log_file.write(time.ctime() + " - Calculating LiDAR DEM flow-direction" + Is) 531 gp.fiowdirection_sa(lidar_fill, lidar_fd) 582 583 #create a version of the LiDAR dataset with the NED FID values 584 lidar_ned_fid = lidar_wspace + "\\lidar_ned_fid" 535 if not os.path.exists(lidar_ned_fid): 586 gp.AddMessage(" - Creating LiDAR scale raster of NED FID values") 587 log_file.write(time.ctime() + " - Creating LiDAR scale raster of NED FID values" + Is) 588 gp.ResetEnvironments() 589 gp.Extent = lidar_fd 59° gp.CellSize = lidar_fd 591 gp.Mask = lidar_fd 592 gp.SnapRaster = lidar_fd :2: gp.SingleOutputMapA1gebra_sa(ned_fid, lidar_ned_fid) 595 596 #Getting the LiDAR cell size for the parameters text file that will inform the 597 #selection of LiDAR tiles based on the NED DEM coordinate 533 gp.AddMessage(" . - Recording LiDAR cellsize info for parameters text file") log_file.wr1te(t1me.ctrme() + " - Recording LiDAR cells1ze mfo for parameters text file" + Is) 109 600 601 602 603 604 605 606 607 608 609 610 61 1 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 63 1 632 633 634 63 5 636 637 638 639 640 642 lidar_descr = gp.describe(lidar_fd) lidar_height = Iidar_descr.meance|lheight lidar_width = lidar_descrmeancellheight lidar_cells_per_ned_cell = «ned_height / lidar_height) + (ned_width / lidar_width)) "' 0.5 buffer_size = int(lidar_cells_per_ned_ce|l "' 0.5) #write cell size information to a parameters textfile for processing parameters_file_path = lidar_wspace + "\\fiow_analysis_parameters.txt" parameters_file = open(parameters_file_path, 'w') parameters_file.write("ned_height;" + str(ned_height) + ls) parameters_file.write("ned__width;" + str(ned_width) + ls) parameters_file.write("lidar_height;" + str(lidar_height) + ls) parameters_file.write("lidar_width;" + str(lidar_width) + ls) parameters_file.write("lidar_cells_per_ned_cell;" + str(lidar_cells_per_ned__cell) + ls) parameters_file.write("bufi‘er_size;" + str(buffer_size) + ls) parameters_file.close() del parameters_file #create a binary raster of flat areas, so these areas may be discarded #in the calculation of flow-direction error gp.AddMessage(" - Creating a mask of flat areas") log_file.write(time.ctime() + " - Creating a mask of flat areas" + ls) gp.ResetEnvironmentsO gp.Extent = 1idar_fill gp.SnapRaster = lidar_fill gp.Mask = lidar_fill gp.CellSize = 1idar_fill #calculate slope slope_temp = lidar_wspace + "\\slope_temp" if not gp.Exists(slope_temp): gp.Slope_sa(lidar_fill, slope_temp, "PERCENT_RISE") #convert slope = 0% to binary grid flat__areas = lidar_wspace + "\\fiat_areas" if not gp.Exists(flat_areas): gp.SingleOutputMapAlgebra_sa("con(" + slope_temp + " == 0, 1, 0)", flat_areas) gp.delete_management(slope_temp) #clean up if del_int_data == "True": gp.AddMessage(" - Deleting intermediate data") log_file.write(time.ctime() + " - Deleting intermediate data" + ls) if os.path.exists(carved_dem): 110 gp.delete_management(carved_dem) 643 644 if os.path.exists(lidar_stream): 645 gp.delete_management(lidar_stream) 646 if os.path.exists(lidar_burn): 647 gp.delete_management(lidar_bum) 643 if os.path.exists(lidar_fill): 649 gp.delete_management(lidar_fill) 650 65 1 652 gp.ResetEnvironments() 6S 3 654 seconds = time.clock() - 1idar_prep_time_start 65 3 h,m,s = sec_to_h_min(seconds) 65 6 6 S 7 gp.AddMessage(" - Tile " + str(lidar_raster_list[i]) + " took " + str(m) + " minutes " + 6 str(s) + " seconds") 5 8 log_file.write(time.ctime() + " - Tile " + str(lidar_raster_list[i]) + " took " + str(m) + " minutes " + str(s) + " seconds" + Is) 6 S 9 660 log_file.close() 66 1 del log_file 111 soooqauAuJN-i wwwwwmwwwwNNNNNNNNNN---~.—1...... cmanAwn—oxoooqmuhum—oomqauaw~_c APPENDIX B -— Python Code stream_dlg_conversion.py # # stream_dlg_conversion.py # Created September 2009 # Author: Glenn O'Neil # # Description: Takes a directory of DLG hydrography files and converts each # files relevant features to ESRI shapefiles. Crashes due to an # ESRI bug if it tries to process more than 70 files. # # Inputs: 1. d1g_wspace - directory containing DLG hypsography files. # 2. scratch_wspace — directory for temporary files. # 3. out_wspace - directory where the output shapefiles are written. # # Outputs: 1. __ohy.shp — shapefile of quadrangle contours. # import os, arcgisscripting, sys gp = arcgisscripting.create(9.3) d1g_wspace = sys.argv[ 1] scratch_wspace = sys.argv[2] + "\\" out_wspace = sys.argv[3] + "\\" #get the dlg files d1g_wspace_list = os.1istdir(dlg_wspace) dlg_list = [] #filter the list so it only contains .dlg files for i in range(len(dlg_wspace_list)): if '.d1g' in (dlg_wspace_list[i]): d1g_list.append(dlg_wspace_list[ 1]) del d1g_wspace_list d1g_list_length = len(dlg__list) for i in range(dlg_list_length): gp.AddMessage("Processing " + str(i + l) + " of " + str(dlg_list__length) + ":") #convert the dlg to a coverage gp.AddMessage(" - converting DLG to coverage") dlg_cov = scratch_wspace + "d1g_cov" gp.dlgarc(dlg_wspace + "\\" + dlg_list[i], dlg_cov) 112 4o 4 1 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 #convert the coverage to a shapefile gp.AddMessage(" - converting coverage to shapefile") dlg_shp = dlg_cov + "_arc.shp" gp.FeatureClassToShapefile(dlg_cov + "\\arc", scratch_wspace) #join the .acode file to the shapefile table on the ID field gp.AddMessage(" - joining acode table to shapefile") acode = dlg_cov + ".acode" gp.joinfield (dlg_shp, "ID", acode, "DLG_COV-ID") #extract the appropriate stream features gp.AddMessage(" - extracting stream features") #get the appropriate fields from dlg_shp code_fields = gp.]istfields(dlg_shp, 'MINOR*') select_query = "'MAIORI " = 50 AND (' for j in range(len(code_fields)): if j != (1en(code_fields) - I): select_query += "" + code_fieldsU].Name + '" IN (400,401 ,412,413,414,605,606) OR else: #treat the last record differently select_query += "" + code_fieldsfi].Name + "' IN (400,401,412,413,414,605,606))' dlg_out = out_wspace + dlg_list[i][:8] + ‘.shp' gp.select_analysis(dlg_shp, dlg_out, select_query) gp.AddMessage(" - deleting intermediate data") gp.delete_management(dlg_cov) gp.delete_management(d1g_shp) 70 del dlg_list, select_query 113 53,551,418..-“ 19 APPENDIX B — Python Code stream_id_nhood.py # # strearn_id_nhood.py # Created September 2009 # Author: Glenn O'Neil # # Description: Takes an elevation raster and identifies potential stream cells # based on their relationship to cell neighborhood of user-specified # size. The basic premise is that stream cells will be opposed # by higher elevation in one direction (stream banks) and similar # elevations in the direction perpendicular to the bank cells. # # Inputs: 1. elev_raster - the elevation raster. # 2. elev_diff - the difference in elevation between a cell and a # neighbor that, when exceeded, could potentially represent # a stream bank. # 3. neigh_maxsize - the maximum neighborhood size (in cells) to # search for the stream cell relationship defined above. # 4. stream_raster - the output binary stream raster. # 5. workspace - the directory where temporary files will be written. # # Outputs: 1. stream_raster - the output binary stream raster. # from decimal import * fi‘om math import floor import os, sys, time, arcgisscripting import raster_analysis as raster IS = os.1inesep gp = arcgisscripting.create(9.3) arcgis_home = os.environ.get("ARCGISHOME") gP-AddT°°lbox(angis_home + "ArcToolbox\\Toolboxes\\Conversion Tools.tbx") #function to convert seconds to hours and minutes (borrowed from: #httpz/lmail.python.org/pipennail/python-list/2003-January. l 81366.html def sec_to_h_min(s): temp = float() temp = float(s) / (60*60*24) d = int(temp) temp = (temp - d) * 24 114 h = int(temp) temp = (temp - h) * 60 m = int(temp) temp = (temp - m) "‘ 60 sec = int(temp) return h,m,sec #get the necessary files from the user elev_raster = sys.argv[l] #get the elevation difference that will identify streambed cells\ e1ev_diff= sys.argv[2] elev_diff = float(elev_difl) #get the max neighborhood size\ neigh_maxsize = sys.argv[3] #get the file name and path for the output stream raster stream_raster = sys.argv[4] #get the scratch workspace workspace = sys.argv[S] + "\\" #start the timer start_time = time.clock() #convert the raster to a text file elev_ascii_path = workspace + "elev_ascii.txt" gp.AddMessage("- Converting raster to ASCII") gp.RasterToASCII_conversion(elev_raster, elev_ascii_path) #record the header information for writing in the output stream file header_list = raster.extract_header(elev_ascii_path) #read the elevation text raster into a list elev_list = raster.rastertext21ist(elev__ascii_path, 'float') os.remove(elev_ascii_path) #intiate a list to represent the resulting stream ascii, set initial values to 0. stream_list = [U] "' 1en(e1ev_list) for i in range(len(stream_list)): stream_list[i] = [0] "' 1en(e1ev_list[0]) #Move through elev_list and look at the surrounding neighbors for #values that exceed the user-specified elevation difference. If one is found, #check the opposite cell. For example, if a southeast cell is above the 115 83 #threshold, check the corresponding northwest cell. If both are above the 84 #threshold, then the center cell may be the stream. To check, check the opposite 85 #set of cells (in the example above this would be the southwest and northwest 86 #cells. If they are below the threshold, then that direction may be the stream. 8 7 #Code those cells to l in the stream list. #counters for output streambed_cells = 0 #a function for analyzing neighborhoods def nhood (thelist, row_index, col_index, nsize): #thelist is the list to operate on #row_index is the row index number of the center cell #col_index is the column index number of the center cell #nsize is the size of the neighborhood buffer (nsize of 1 = 3x3 neighborhood) ubound_index = row_index - nsize #the upper index bbound_index = row_index + nsize #the bottom index Ibound_index = col_index - nsize #the left index rbound_index = col_index + nsize #the right index global streambed__cells #check to see if we're near the boundary of the dataset, and adjust accordingly if ubound_index < O: ubound_index = 0 if bbound_index > (len(thelist) - 1): bbound_index = (len(thelist) - 1) if Ibound_index < 0: Ibound_index = 0 if rbound_index > (1en(thelist[0]) - l): rbound_index = (1en(thelist[0]) - I) #find the cell in the neighborhood with the maximum steepness center_value = thelist[row_index][col_index] max_value = center_value max_location = [row_index,col_index] for k in range(ubound_index, bbound_index): for l in range(lbound_index, rbound_index): if thelist[k][l] > max_value: max_value = thelist[k][l] max_row_index = k max_col_index = I if (max_value - center_value) > elev_diff: #There was a cell in the neighborhod #higher than the center cell and above the elevation difference threshold. 116 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 #Now look for another steep cell on the opposite side of the center cell #first, find the distance between the max location and center cell row_distance = row_index - max_row_index col_distance = col_index - max_col_index abs_row_distance = abs(row_distance) if abs(row_distance) > 0 else 1 abs_col_distance = abs(col_distance) if abs(col_distance) > 0 else 1 opposite_row_index = row_index + row_distance opposite_col_index = col_index + col_distance #check for the boundaries if opposite_row_index < 0: opposite_row_index = 0 elif opposite_row_index > bbound_index: opposite_row_index = bbound_index if opposite_col_index < 0: opposite_col_index = 0 elif opposite_col_index > rbound_index: opposite_col_index = rbound_index if (thelist[opposite_row_index][opposite_col_index] - center_value) > elev_diff: #Then the opposite side is also above the threshold. The center of #the neighborhood may be a the stream. #Get the indexes of the recipricol diagonal (e.g. the peaks were northeast and southwest of #the center cell. Now check northwest and southwest to see if it's less than the threshold. recip_row_index1 = row_index - col_distance recip_col_indexl = col_index + row_distance recip_row_index2 = row_index + col_distance recip_col_index2 = col_index - row_distance if recip_row_index1 < 0: recip_row_index1 = 0 elif recip_row_index1 > bbound_index: recip_row_index1 = bbound_index if recip_row_index2 < 0: recip_row_index2 = 0 elif recip_row_index2 > bbound_index: recip_row_index2 = bbound_index if recip_col_indexl < 0: recip_col_indexl = 0 elif recip_col_indexl > rbound_index: recip_col_indexl = rbound_index if recip_col_index2 < 0: recip_col_index2 = 0 elif recip_col_index2 > rbound_index: recip_col_index2 = rbound_index if (thelist[recip_row_indexl][recip_col_indexl] - center_value) < elev_diff and (theliStIrecip_row_index2][recip_col_index2] - center_value) < elev_diff: 117 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 I99 200 201 202 203 204 205 206 207 208 209 streambed_cells += 1 #it might be the stream. #We now begin to define the stream basin. #In this instance, we will define a rectangular region, with the max value location, #it's opposite index, and the two reciprocal indidecs as the comers. #The area of the rectangle will be coded l, defining it as part of the streambed. #To do this we must determine the slope of the rectangles edges. This will #enable us to define the area within the edges. Though, we must determine the #path from one comer to the other. Here is an example. #If the slope was 15/7, it would make a big L shaped edge. #We need to find the relatively straight path from each comer to the next. #We do this by reducing the smaller number of the #slope to 1. For example, a slope of 15/7 would reduce to a horizontal (H) #movment of 1 for every 2.5 vertical movements (V). Of course we can't move #25 cells. So we will construct a list of vertical movements for the path fi'om # the max cell to the first reciprocal cell (one edge of the rectangle). In our #example above, half of the vertical movements in that list will be a value of 3, #and half will be 2. Because, along the path, for every horizontal movement of 1, #50% of the time the subsequent vertical movement will be 2, and 50% of the time 3. #We had previously determined the absolute row and col distance from the max cell #to the center cell. The slope of the edge from themax cell to the first reciprocal #cell (one edge of the rectangle) is as follows: #Calculate new row and col distances. This time from the max index to the first #Watch out for the edges. They could throw off these calculations #using absolute values can help avoid the problem. Just swap the values orig_max_row_index = max_row_index orig_max_col_index = max_col_index orig_recip_row_indexl = recip_row_index1 orig_recip_col_index1 = recip_col_indexl if (abs(row_index - max_row_index) < abs(row_index - opposite_row_index» or (abs(col_index - max_col_index) < abs(col_index - opposite_col_index»: #you need to switch the max and opposite indexes max_row_index = opposite_row_index max_col_index = opposite_col_index opposite_row_index = orig_max_row_index opposite_col_index = orig_max_col_index 118 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 24 1 242 243 244 245 246 247 248 249 250 25 1 if (abs(row_index - recip_row_index1) < abs(row_index - recip_row_index2)) or (abs(col_index - recip_col_indexl) < abs(col_index - recip_col_index2)): #you need to switch the reciprocal indexes recip_row_index1 = recip_row_index2 recip_col_indexl = recip_col_index2 recip_col_index2 = orig_recip_col_indexl recip_row_index2 = orig_recip_row_indexl row_distance2 = max_row_index - recip_row_index1 col_distance2 = max_col_index - recip_col_indexl abs_row_distance2 = float(abs(row_distance2)) abs_col_distance2 = float(abs(col_distance2)) #Initiate lists that will store the h_move and v_move lists. #The first list will contain the h_move and v_move values for the path #from the max index to the first reciprocal index. #The second list will contain the h_move and v_move values for the path #from the max index to the second reciprocal index. #The stream basin will be defined by looping throught the first list #within the second list. #The list sizes will be the same as the minimum value between abs_row_distance2 #and abs_col_distance2. list_size = min(abs_row_distance2, abs_col_distance2) if list_size = 0: list_size = 1 abs_difference = abs(abs_row_distance2 - abs_col_distance2) move_quotient = 0 #Detennine the h_move and v_move if abs_row_distance2 < abs_col_distance2 and abs_row_distance2 != 0: #we're stepping horizontally primary_move = "horizontal" #recod the maximum H distance for a V distance of 1 if col_distance2 > 0: #we're stepping left if abs_row_distance2 > 0: move_quotient = (abs_col_distance2/abs_row_distance2) * -l min_h_move = int(floor(move_quotient)) + 1 if abs(abs_col_distance2 % abs_row_distance2) > 0: max_h_move = min_h_move - 1 else: max_h_move = min_h_move 119 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 else: #it's a straight line down to the destination index min_h_move = col_distance2 max_h_move = min_h_move elif col_distance2 < 0: #we're stepping right if abs_row_distance2 > 0: move_quotient = abs_col_distance2/abs_row_distance2 min_h_move = int(floor(move_quotient)) if abs(abs_col_distance2 % abs_row_distance2) > 0: max_h_move = min_h_move + 1 else: max_h_move = min_h_move else: #it's a straight line down to the destination index min_h_move = col_distance2 "‘ -l max_h_move = min_h_move if row_distance2 < 0: v_move = l elif row_distance2 > 0: v_move = -1 else: v_move = 0 if min_h_move == max_h_move: h_move_listl = [min_h_move] * int(list_size) else: h_move_listl = move_list(list_size, min_h_move, max_h_move, move_quotient, row_index, col_index) v_move_list1 = [v_move] * int(list_size) e1ifabs_col_distance2 < abs_row_distance2 and abs_col_distance2 != 0: #we're stepping vertically primary__move = "vertical" #recod the maximum V distance for an H distance of 1 . if row_distance2 > O: #we're stepping up (visually, not in terms of row index numbers) if abs_col_distance2 > 0: move_quotient = (abs_row_distance2/abs__col_distance2) * -1 min_v_move = int(floor(move_quotient)) + 1 if abs(abs_row_distance2 % abs_col_distance2) > 0: max_v_move = min_v_move - 1 else: max_v_move = min_v_move else: #it's a straight line down to the destination index min_v_move = row_distance2 max_v_move = min_v_move 120 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 elif row_distance2 < 0: #we're stepping down (visually, not in terms of row index numbers) if abs_col_distance2 > O: move_quotient = abs_row_distance2/abs_col_distance2 min_v_move = int(floor(move_quotient)) if abs(abs_row_distance2 % abs_col_distance2) > O: max_v_move = min_v_move + 1 else: max_v_move = min_v_move else: min_v_move = row_distance2 * -l max_v_move = min_v_move if col_distance2 < 0: h_move = l elif col_distance2 > 0: h_move = -1 else: h_move = 0 if min_v_move == max_v_move: v_move_listl = [min_v_move] "‘ int(list_size) else: v_move_list1 = move_list(list_size, min_v_move, max_v_move, move_quotient, row_index, col_index) h_move_listl = [h_move] * int(list_size) else: #the're equal, it's a square primary_move = "either" if row_distance2 < 0: v_move = l e1ifrow_distance2 > 0: v_move = -1 else: v_move = 0 if col_distance2 < 0: h_move = 1 elif col_distance2 > 0: h_move = -1 else: h_move = 0 v_move_listl = [v_move] * int(list_size) h_move_list1 = [h_move] * int(list_size) #check to see if it's a perfect square if v_move == : h_move_1ist1 = [h_move] * int(abs_col_distance2) v_move_list1 = [0] "‘ int(abs_col_distance2) if h_move == : h_move_listl = [0] * int(abs_row_distance2) 121 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 numbers) cell, v_move_listl = [v_move] * int(abs_row_distance2) #figure out the distances from the max index to recipricol index2. row_distance3 = max_row_index - recip_row_index2 col_distance3 = max_col_index - recip_col_index2 h_move_list2 = [] v_move_list2 = [] if col_distance3 < 0: #we're stepping to the right for k in range(len(v_move_list1)): h_move_list2 .append(abs(v_move_listl [k])) elif col_distance3 > 0: #we‘re stepping to the left for k in range(len(v_move_list1)): . h_move_list2 .append(abs(v_move_list1 [k])* -1 ) else: h_move_list2 = [0] * 1en(v_move_listl) if row_distance3 < 0: #we're stepping down (visually, not in terms of row index for k in range(len(h_move_listl )): v_move_list2.append(abs(h_move_listl [k])) elif row_distance3 > O: #we're stepping up for k in range(len(h_move_listl)): v_move_list2.append(abs(h_move_list1 [k])* -1) else: v_move_list2 = [0] "‘ len(h_move_list1) #Call a function that takes the four comers of the stream basin (the max, its opposite #and the two recipracols) and constructs a square using those points as the comers. #Then set all of the values in that area to l for the stream_list. . . river_bed(max_row_index, max_col_index, v_move_lrstl , h_move_lrstl , v_move_lrstZ, h_move_list2, primary_move, ubound_index, bbound_index, Ibound_index, rbound_index) def move_list(list_length, min_move, max_move, quotient, row, col): #This function constructs the move lists from one cell to another. the_list = [min_move] * int(list_length) max_index_list = [] 122 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 41 1 412 413 414 415 416 move_difference = quotient - min_move #gives us the percentage of the time that max_move will be used. max_index_places = abs(list_length * move_difference) max_index_step = list_length / max_index_places if int(abs(move_difference) * 10) >= 5: # then the max move will dominate the list or split it evenly, and should be first max_index_holder = 0 else: #it should start later in the list max_index_holder = int(round(max_index_step)) - 1 #minus 1 to put it in proper zero-based index mode max_index_step_counter = 0 while max_index_holder < len(the_list): max_index_list.append(max_index_holder) max_index_step_counter += max_index_stcp max_index_holder = int(round(max_index_step_counter)) for i in range(len(the_list)): if i in max_index_list: ’the_list[i] = max_move return the_list def river_bed(max_row_index, max_col_index, v_move_listl, h_move_listl , v_move_list2, h_move_list2, primary_move, ubound_index, bbound_index, Ibound_index, rbound_index): #this function will loop through the v_move_listl and h_move_listl, while looping though a #reciprocal version of each list, v_move_list2 and h_move_list2. current_row_indexl, current_row_index2 = max_row_index, max_row_index current_col_indexl, current_col_index2 = max_col_index, max_col_index stream_list[current_row_index1][current_col_indexl ] = 1 #need to make an initial run through v_move_listl and h_move_listl if primary_move == "vertical": for p in range(len(v_move_list1)): for q in range(abs(v_move_list1[p])): if v_move_listl [p] > 0 : current_row_indexl += 1 elif v_move_listl[p] < 0: current_row_indexl -= 1 if current_row_indexl > bbound_index: current_row_indexl = bbound_index 123 417 418 419 420 421 422 423 424 425 426 427 428 429 430 43 1 432 433 434 435 436 437 438 439 440 441 442 443 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 elif current_row_indexl < 0: current_rowwindexl = 0 stream_list[current_row_index I ][current_col_index l ] = 1 if h_move_listl [p] > 0 : current_col_indexl += 1 elif h_move_listl [p] < 0: current_col_indexl -= 1 if current_col_indexl > rbound_index: current_col_indexl = rbound_index elif current_col_indexl < 0: current_col_indexl = 0 stream_list[current_row__index l ][current_col_index l] = 1 elif primary_move == "horizontal": for p in range(len(v_move_list1)): for q in range(abs(h_move_listl [p])): if h_move_listl[p] > 0 : current_col_indexl += 1 elifh__move_list1 [p] < 0: current_col_indexl -= 1 if current_col_indexl > rbound_index: current_col_indexl = rbound_index e1ifcurrent_col_index1 < 0: current_col_indexl = O stream_list[current_row_index 1][current_col_index 1] = 1 if v_move_listl[p] > 0 : current_row_indexl += 1 elif v_move_listl [p] < 0: current_row_indexl -= 1 if current_row_indexl > bbound_index: current_row_indexl = bbound_index elif current_row_indexl < 0: current_row_indexl = 0 stream_list[current_row_index l ][current_col_index l ] = 1 for p in range(len(v_move_list1)): if h_move_listl [0] > 0 : current_col_indexl += 1 elif h_move_listl [0] < 0: current_col_indexl -= I if current_col_indexl > rbound_index: current_col_indexl = rbound_index elif current_col_indexl < 0: current_col_indexl = 0 if v_move_listl [0] > O : current_row_indexl += 1 elif v_move_listl [0] < 0: current_row_indexl -= 1 if current_row_indexl > bbound_index: current_row_indexl = bbound_index elifcurrent_row_index1 < 0: current_row_indexl = 0 124 ##A 460 461 462 463 464 465 466 467 468 469 470 47 I 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 50 l 502 stream_list[current_row_index l ][current_col_index l] = 1 if h_move_list2[0] > 0: current_col_index2 += 1 elif h_move_list2[0] < 0: current_col_index2 -= 1 if current_col_index2 > rbound_index: current_col_index2 = rbound_index elif current_col_index2 < 0: current_col_index2 = 0 if v_move_list2[0] > 0: current_row_index2 += 1 elif v_move_li512[0] < 0: current_row_index2 -= 1 if current_row_index2 > bbound_index: current_row_index2 = bbound_index elif current_row_index2 < 0: current_row_index2 = 0 current_row_indexl, current_col_indexl = current_row_index2, current_col_index2 #now loop through both sets of lists for m in range(len(v_move_list2)): #need to caputre loop of v_move_listZ if primary_move == "vertical": for n in range(abs(h_move_list2[m])): stream_list[current_row_index2][current_col_indexZ] = l for p in range(len(v_move_list1)): for q in range(abs(v_move_list1[p])): if v_move_listl[p] > O : current_row_indexl += 1 elif v_move_listl [p] < 0: current_row_indexl -= 1 if current_row_indexl > bbound_index: current_row_indexl = bbound_index elif current_row_indexl < 0: current_row_indexl = 0 stream_list[current_row_index I ][current_col_index I] = I if h_move_listl[p] > O : current_col_indexl += 1 elif h_move_listl[p] < 0: current_col_indexl -= 1 if current_col_indexl > rbound_index: current_col_indexl = rbound_index elifcurrent_col_index1 < 0: current_col_indexl = 0 stream_list[current_row_index l ][current_col_index 1 ] = 1 if h_move_list2[0] > 0: current_col_index2 += 1 elif h_move_list2[0] < O: current_col_index2 -= 1 125 503 504 505 506 507 508 509 510 51 1 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 if current_col_index2 > rbound_index: current_col_index2 = rbound_index elif current_col_index2 < O: current_col_index2 = 0 stream_list[current_row_index2][current_col_index2] = 1 current_col_indexl = current_col_index2 current_row_indexl = current_row_index2 if v_move_list2[0] > O: current_row_index2 += 1 elif v_move_list2[0] < 0: current_row_index2 -= if current_row_index2 > bbound_index: current_row_index2 = bbound_index elif current_row_index2 < 0: current_row_index2 = O current_row_indexl = current_row_index2 elif primary_move == "horizontal": for n in range(abs(v_move_list2[m])): stream_list[current_row_index2][current_col_index2] = 1 for p in range(len(v_move_list1)): for q in range(abs(h_move_listl [p])): if h_move_listl [p] > 0 : current_col_indexl += 1 elif h__move__listl[p] < 0: current_col_indexl -= 1 if current_col_indexl > rbound_index: current_col_indexl = rbound_index elif current_col_indexl < 0: current_col_indexl = 0 stream_list[current_row_index1][current_col_index1 ] = 1 if v_move_listl[p] > 0 : current_row_indexl += 1 elif v_move_listl [p] < 0: current_row_indexl -= 1 if current_row_indexl > bbound_index: current_row_indexl = bbound_index elif current_row_indexl < O: current_row_indexl = 0 stream_list[current_row_index l ][current_col_index l ] = 1 if v_move_list2[0] > 0: current_row_index2 += 1 elif v_move_list2[0] < 0: current_row_index2 -= 1 126 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 if current_row_index2 > bbound_index: current_row_index2 = bbound_index elif current_row_index2 < 0: current_row_index2 = 0 stream_list[current_row_index2][current_col_index2] = 1 current_row_indexl = current_row_index2 current_col_indexl = current_col_index2 if h_move_list2[0] > O: current_col_index2 += 1 elif h_move_list2[0] < 0: current_col_index2 -= if current_col_index2 > rbound_index: current_col_index2 = rbound_index elif current_col_index2 < 0: current_col_index2 = 0 current_col_indexl = current_col_index2 else: stream_list[current_row_index2][current_col_index2] = l for p in range(len(v_move_list1)): if h_move_listl [0] > O : current_col_indexl += 1 elifh_move_list1[0] < 0: current_col_indexl -= 1 if current_col_indexl > rbound_index: current_col_indexl = rbound_index elif current_col_indexl < 0: current_col_indexl = 0 stream_list[current_row_index1 ][current_col_index 1] = 1 if v_move_listl[O] > 0 : current_row_indexl += 1 elifv_move_list1[0] < 0: current_row_indexl -= 1 if current_row_indexl > bbound_index: current_row_indexl = bbound_index elif current_row_indexl < O: current_row_indexl = 0 stream_list[current_row_index l ][current_col_index 1 ] = 1 if h_move_list2[0] > 0: current_col_index2 += 1 elif h_move_list2[0] < 0: current_col_index2 -= 1 if current_col_index2 > rbound_index: current_col_index2 = rbound_index elif current_col_index2 < 0: current_col_index2 = 0 127 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 61 1 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 63 I if v_move_list2[0] > 0: current_row_index2 += 1 elif v_move_list2[0] < 0: current_row_index2 -= 1 if current_row_index2 > bbound_index: current_row_index2 = bbound_index elif current_row_index2 < O: current_row_index2 = 0 current_row_indexl, current_col_indexl = current_row_index2, current_col_index2 percent_done = 0 for i in range(len(elev_list)): for j in range(len(elev_list[i])): nhood(elev_list, i, j, int(neigh_maxsize)) percent = int((float(i) / (len(stream_list) - 1) * 100)) if percent > percent_done: gp.AddMessage(' - Analyzing: '+ str(percent) + '% completed') percent_done = percent #write the output stream file stream_file_text_name = stream_raster + ".txt" stream_file = open(stream_file_text_name, 'w') for i in range(len(header_list)): stream_file.write(header_list[i]) percent_done = 0 for i in range(len(elev_list)): for j in range(len(elev_list[i])): stream_file.write('%s ' % (stream_list[i][i])) stream_file.write(ls) percent = int((float(i) / (len(stream_list) - 1) * 100)) if percent > percent_done: percent_done = percent stream_file.close() gp.ASCIIToRaster_conversion(stream_file_text_name, stream_raster, "INTEGER") os.remove(stream_fi1e_text_name) gp.AddMessage(" - There were " + str(streambed_cells) + " recip_small_slopes") stop_time = time.clock() - start_time seconds = stop_time - start_time h,m,s = sec_to_h_min(seconds) gp.AddMessage(' - Took ' + str(h) + ' hours ' + str(m) + ' minutes ' + str(s) + ' seconds') 128 \ONQONMAWN— wwuwwwwwwwwNNNNNNNNN-du—Ir—tu—Ia—I—u—nu—r—a— OMQO~MAwNHO©OOQO~thN~O©OO\JONUt-h-wN—‘O APPENDIX B — Python Code stream_id_transect.py # # stream_id_transect.py # Created September 2009 # Author: Glenn O'Neil # # Description: Takes an elevation raster and identifies potential stream cells # based on their relationship within directional transects. The basic premise is that stream cells in the middle of a transect will be opposed by higher elevations towards the ends of the transect. Inputs: 1. elev_raster_path - the elevation raster. 2. stream_width - the maximum width of the stream, and therefore the maximum width of the transect. 3. bank_depth - the theshold elevation difference defining a stream cell from a stream bank cell. 4. stream_output_raster - the output binary stream raster. Outputs: 1. stream_output_raster - the output binary stream raster. 3t¥==tt¥t=tt1t¢t¢t=lt=tt3t=lt¢t from decimal import "‘ from math import floor import time, os, sys, arcgisscripting import raster_analysis as raster 1s = os.linesep gp = arcgisscripting.create(9.3) #function to convert seconds to hours and minutes (borrowed from: #httpzl/mail.python.org/pipermail/python-list/2003January. 1 81366.html def sec_to_h_min(s): temp = floatO temp = float(s) / (60*60*24) d = int(temp) temp = (temp - d) * 24 h = int(temp) temp = (temp - h) * 60 m = int(temp) 129 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 temp = (temp - m) "' 60 sec = int(temp) return h,m,sec #Get the necessary files from the user elev_raster_path = sys.argv[l] #Get the maximum width of the stream stream_width = sys.argv[2] stream_width = float(stream__width) #Get the minimum depth of the stream bed to the stream bank bank_depth = sys.argv[3] bank_depth = float(bank_depth) #Output raster stream_output_raster = sys.argv[4] workspace = stream_output_raster[:stream_output_raster.rfind('\\')] + '\\' #Get the file name and path for the output stream file stream_fi1e_name = workspace +'strt1ant' + str(int(stream_width)) + 'e' + str(int(bank_depth)) + '.txt' #Start the timer start_time = time.clock() #Convert the raster to a text file elev_ascii_path = workspace + "elev_ascii.txt" gp.AddMessage("- Converting raster to ASCII") gp.RasterToASClI_conversion(elev_raster_path, elev_ascii_path) #record the header information for writing in the output stream file header_list = raster.extract_header(elev_ascii_path) #read the elevation text raster into a list elev_list = raster.rastertext2list(elev_ascii_path, 'float') os.remove(elev_ascii_path) #Intiate a list to represent the resulting stream ascii, set initial .values to 0. stream_list = [fl] "‘ 1en(e1ev_list) for i in range(len(stream_list)): 130 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 stream_list[i] = [0] * 1en(e1ev_list[0]) #Move through the list along transects to identify stream cells. Start by moving #vertically from the top cells to the bottom cells, beginning with the upper-left #cell. Move south, and note any locations where elevation decreases, and then increases #within the user-specified maximum stream width. You may have to set a minimum #stream width threshold if this approach picks up too many non-stream features. #Counters for output streambed_cells = 0 #A function to analyze a transect of cells. The number of cells is equal to the #user specified maximum stream width def transect(thelist): #thelist is the list to operate on global bank_similarity, bank_depth #ldentify the minimum elevation along the transect min_elev = min(thelist) #Note its index in the list for m in range(len(thelist)): if thelist[m] == min_elev: min_index = m break #Check the cells afier the minimum_index to see if elevation increases significantly max_elev = max(thelist[min_index:len(thelist)]) if (thelist[O] - min_elev) > bank_depth and (max_elev - min_elev) > bank_depth: transect_is_stream = True else: transect_is_stream = False return transect_is_stream #initiate a transect list to send to the transect function transect_list = [0] * int(stream_width) transect_called = 0 131 125 126 127 128 129 130 131 132 133 134 I35 136 137 138 139 140 141 142 143 144 145 146 147 148 I49 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 #Move through the list vertically, from northernmost cells to sourthemmost, #looking for decreases in elevation percent_done = 0 for i in range(len(elev_list[0])): for j in range(len(elev_list)): elev = elev_list[j][i] #Check the next elevation, watch out for the boundary if (j + 1) < len(elev_list): elev_next = elev_list[j + l][i] else: break if elev > elev_next: #We might have the beginning of a stream, send the next K cells to the transect function for k in range(int(stream_width)): #Make sure you're not at the dataset boundary if (j + k) < len(elev_list) and i < len(elev_list[j]): transect_list[k] = elev_list[j + k][i] is_stream = transect(transect_list) transect_called += 1 if is_stream: for k in range(int(stream_width)): if (j + k) < len(elev_list) and i < len(elev_list[j]): stream_list[j + k][i] = 1 percent = int((float(i) / (len(elev_list[OD - 1) * 100)) if percent > percent_done: gp.AddMessage(’ - Analyzing vertical transects: '+ str(percent) + '% completed') percent_done = percent #Move through the list horizontally, from westernmost cells to easternmost, #looking for decreases in elevation percent_done = 0 for i in range(len(elev_list)): for j in range(len(elev_list[i])): elev = elev_list[i][j] #Check the next elevation, watch out for the boundary if (j + 1) < 1en(e1ev_list[0]): e1ev__next= elev_list[i][j + 1] else: break 132 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 I83 184 185 186 187 188 189 190 191 I92 193 194 195 196 197 198 199 200 201 202' 203 204 205 206 207 if elev > elev_next: #We might have the beginning of a stream, send the next K cells to the transect function for k in range(int(stream_width)): #Make sure you're not at the dataset boundary if (j + k) < len(elev_list[jD and i < len(elev_list): transect_list[k] = elev_list[i][j + k] is_stream = transect(transect_list) transect_called += 1 if is_stream: for k in range(int(stream_width)): if (j + k) < len(elev_list[jD and i < len(elev_list): stream_list[i][j + k] = 1 percent = int((float(i) / (len(elev_list[0]) - 1) * 100)) if percent > percent_done: gp.AddMessage(' - Analyzing horizontal transects: ' + str(percent) + '% completed') percent_done = percent #Move through the list diagnollay, from northwest cells to southeast, looking for decreases in elevation percent_done = 0 for i in range(len(elev_list)): for j in range(len(elev_list[i])): elev = elev_list[i][j] #Check the next elevation, watch out for the boundary if (i + 1) < len(elev_list[0]) and (i + l) < len(elev_list): elev_next = elev_list[i + l][i + 1] else: break if elev > elev_next: #We might have the beginning of a stream, send the next K cells to the transect function for k in range(int(stream_width)): #Make sure you're not at the dataset boundary if (j + k) < len(elev_list[j]) and (i + k) < len(elev_list): transect_list[k] = elev_list[i + k][j + k] is_stream = transect(transect_list) transect_called += 1 if is_stream: for k in range(int(stream_width)): 133 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 if (j + k) < len(elev_list[jD and (i + k) < len(elev_list): stream_list[i + k][i + k] = 1 percent = int((float(i) / (len(elev_list[0]) - l) * 100)) if percent > percent__ done: gp. AddMessage('- Analyzing diagonal (N W SE) transects: '+ str(percent) + '% completed' ) percent_done = percent #Move through the list diagnollay, from southwest cells to northeast, #looking for decreases in elevation percent_done = 0 for i in range(len(elev_list) - 1, -l, -1): for j in range(len(elev_list[i])): elev = elev_list[i][j] #Check the next elevation, watch out for the boundary if (j + l) < len(elev_list[OD and (i - 1) >= 0: elev_next = elev_list[i - l][j + 1] else: break if elev > elev_next: #We might have the beginning of a stream, send the next K cells to the transect function for k in range(int(stream_width)): #Make sure you're not at the dataset boundary if (j + k) < len(elev_list[jD and (i - k) >= 0: transect_list[k] = elev_list[i - k][j + k] is_stream = transect(transect_list) transect_called += 1 if is_stream: for k in range(int(stream_width)): if (j + k) < len(elev_list[jD and i >= 0: stream_list[i - k][i + k] = percent = int(float((len(elev_list[0]) - i + l) / float(len(elev_1ist[0]) - 1)) * 100) if percent > percent_done: gp.AddMessage(' - Analyzing diagonal (SW-NE) transects: '+ str(percent) + '% completed') percent_done = percent #Write the output stream file stream_file = open(stream_file_name, 'w') 134 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 for i in range(len(header_list)): stream_file.write(header__list[i]) percent_done = 0 for i in range(len(elev_list)): for j in range(len(elev_list[i])): stream_file.write('%s ' % (stream_list[i][j])) stream_file.write(ls) stream_file.close() gp.AddMessage(' - Converting stream output to raster.') gp.ASCIIToRaster_conversion(stream_file_name, stream_output_raster, "INTEGER") os.remove(stream_file_name) os.remove(elev_ascii_path) print "%i transects were analyzed " % (transect_called) seconds = time.clock() - start_time h,m,s = sec_to_h_min(seconds) print 'Took ' + str(h) + ' hours ' + str(m) + ' minutes ' + str(s) + ' seconds' 135 ©00\)O\MJ>WN~ APPENDIX B - Python Code carve.py # # carve.py # Created September 2009 # Author: Glenn O'Neil # # Description: Takes a DEM and binary raster of stream locations, identifies # sinks in the DEM, and carves paths through artificial barriers # within the stream locations of the DEM. Based on work by # Soille. # # Inputs: 1. elev_raster - full path to the DEM. . stream_raster - binary raster of stream locations, must be 31: N the same row and column size as the DEM. 3. max_carve_length - the maximum distance (in cells) that a carving can cover. 4. iterations - the number of times the script should iterate through the DEM to carve through barriers. Each loop may yield new sinks that need to be carved. 5. delete_intermediate - boolean value indicating whether to to delete the temporary folders for each iteration. 6. output_raster - the final carved DEM. 7. workspace - folder where intermediate data is stored. Outputs: 1. output_raster - the final carved DEM. %Itittfiittkit3t3t3t1tltt3t3t from decimal import * from math import floor import os, sys, copy, arcgisscripting, shutil ls = os.linesep import raster_analysis as raster gp = arcgisscripting.create(9.3) # Load required toolboxes... arcgis_home = os.environ.get("ARCGlSHOME") gp.AddToolbox(arcgis_home + "ArcToolbox\\Toolboxes\\Data Management Tools.tbx") 136 4o 41 42 43 44 45 46 47 48 49 so 51 52 53 54 55 56 57 58 59 6o 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 gp.AddToolbox(arcgis_home + "ArcToolbox\\Toolboxes\\Conversion Tools.tbx") gp.CheckOutExtension("Spatial") #get the necessary files fi'om the user elev_raster = sys.argv[l] stream_raster = sys.argv[2] max_carve_length = sys.argv[3] max_carve_length = int(max_carve_length) #number of times to iterate iterations = sys.argv[4] iterations = int(iterations) delete_intermediate = sys.argv[S] output_raster = sys.argv[6] workspace = sys.argv[7] + "\\" #Move through elev_list and look for cells that are sinks AND stream cells. #If that condition is met, begin a radial search looking for an elevation lower #than the selected cell. If/once that lower cell is found, carve a path to the #cell by manually lowering elevation values along the path incrementally, #yielding a slope through the potential barrier that created the sink. def move_list(list_length, min_move, max_move, quotient): #This function constructs the move lists from one cell to another. the_list = [min_move] * int(list_length) max_index_list = [] move_difference = quotient - min_move #gives us the percentage of the time that max_move will be used. max_index_places = abs(list_length * move_difference) max_index_step = list_length / max_index_places if int(abs(move_difference) * 10) >= 5: # then the max move will dominate the list or split it evenly, and should be first max_index_holder = 0 else: #it should start later in the list max_index_holder = int(round(max_index_step)) - 1 #minus 1 to put it in proper zero- based index mode max_index_step_counter = O 137 81 while max_index_holder < len(the_list): 82 max_index_list.append(max_index__holder) 83 max_index_step_counter += max_indexnstep 84 max_index_holder = int(round(max_index_step_counter)) 85 86 for i in range(len(the_list)): 87 if i in max_index_list: 88 the_list[i] = max_move 89 90 return the_list 91 92 93 def carve(row_index, col_index): 94 global elev_list, elev_out_list, max_carve_length, rbound_index, bbound_index, Ibound_index, ubound_index 95 96 sink_elev = elev_list[row_index][col_index] 97 carve_to_elev = sink_elev 98 99 #initiate the initial neighborhood size to search. You don't start with l 100 #because the adjacent neighbors were searched in the sink identification. 101 nhood_size = 2 102 while not carve_to_elev < sink_elev and nhood_size <= max_carve_length: 103 104 #get the cell's neighbors 105 nhood_list = raster.nhood(elev_list, row_index, col_index, nhood_size) 106 107 #get the minimum elevation value from the returned nhood_list 108 #only check the outer edges since the others had already been checked 109 row_length = len(nhood__list) 110 col_length = len(nhood_list[0]) l 1 1 #check the first row 112 min_elev_top = min(nhood_list[0]) 113 min_elev_top_index = nhood_list[0].index(min_elev_top) 114 1 15 #check the bottom row 116 min_elev_bottom = min(nhood_list[row_length - 1]) 117 min_elevgbottom_index = nhood_list[row__length - l].index(min_elev_bottom) 118 1 19 #check the left column 120 min_elev_lefi = nhood_list[0][0] 121 min_elev_left_index = 0 122 for i in range(row_length): 138 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 if nhood_list[i][O] < min_elev_lefi: k min_elev_lefl = nhood_list[i][O] min_elev_lefi_index = i #check the right column min_elev_right = nhood_list[0][col_length - l] min_elev_right_index = 0 for i in range(row_length): if nhood_list[i][col_length - 1] < min_elev_right: min_elev_right = nhood_list[i][col_length - 1] min_elev_right_index = 1 #determine which elevation and index has the lowest elevation min_elev = min(min_elev_top, min_elev_bottom, min_elev_lefi, min_elev_right) #check to see if we've found an elevation less than the sink's if min_elev < sink_elev: #note the indexes of the lower sink if min_elev == min_elev_top: lower_sink_row = row_index - (int(row_length / 2)) #easy since its the top row lower_sink_col = (col_index - (int(col_length / 2))) + min_elev_top_index e1ifmin_elev == min_elev_bottom: lower_sink_row = row_index + (int(row_length / 2)) #easy since its the bottom row lower_sink_col = (col_index - (int(col_length / 2))) + min_elev_bottom_index elif min_elev == min_elev_left: lower_sink_row = (row_index - (int(row_length / 2))) + min_elev_left_index lower_sink_col = col_index - (int(row_length / 2)) #easy since its the left column elif min_elev == min_elev_right: lower_sink_row = (row_index - (int(row_length / 2))) + min_elev_right_index lower_sink_col = col_index + (int(row_length / 2)) #easy since its the left column nhood_size += 1 carve_to_elev = min_elev #check to see if the search was successful if nhood_size > max_carve_length: #couldn't find a lower elevation within the max neighborhood size no_lower_sink__found.append(‘row: ' + str(row_index) +' col: '+ str(col_index)) return 139 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 #carve from row_index and col_index to lower_sink_row and lower_sink_col row_dist = row_index - lower_sink_row col_dist = col_index - lower_sink_col abs_row_dist = float(abs(row_dist)) abs_col_dist = float(abs(col_dist)) #Initiate lists that will store the h_move and v_move lists. #The first list will contain the h_move and v_move values for the path #from the row_index and col_index to the lower_sink_row and lower_sink_col. #The list sizes will be the same as the minimum value between abs_row_dist #and abs_col_dist. list_size = min(abs_row_dist, abs_col_dist) if list_size == 0: list_size = l move_quotient = 0 #Determine the h_move and v_move if abs_row_dist < abs_col_dist and abs_row_dist != 0: #we're stepping horizontally primary_move = "horizontal" #recod the maximum H distance for a V distance of 1 if col_dist > 0: #we're stepping left if abs_row_dist > 0: move_quotient = (abs_col_dist/abs_row_dist) * -l min_h_move = int(floor(move_quotient)) + 1 if abs(abs_col_dist % abs_row_dist) > 0: max_h_move = min_h_move - 1 else: max_h_move = min_h_move else: #it's a straight line down to the destination index min_h_move = col_dist max_h_move = min_h_move elif col_dist < 0: #we're stepping right if abs_row_dist > 0: move_quotient = abs_col_dist/abs_row_dist min_h_move = int(floor(move_quotient)) if abs(abs_col_dist % abs_row_dist) > O: max_h_move = min_h_move + 1 else: max_h_move = min_h_move else: #it's a straight line down to the destination index min_h_move = col_dist * -l max_h_move = min_h_move 140 209 210 21 1 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 if row_dist < 0: v_move = 1 elif row_dist > 0: v_move = -1 else: v_move = 0 if min_h_move == max_h_move: h_move_list = [min_h_move] "' int(list_size) else: h_move_list = move_list(list_size, min_h_move, max_h_move, move_quotient) v_move_list = [v_move] * int(list_size) elif abs_col_dist < abs_row_dist and abs_col_dist != 0: #we're stepping vertically primary_move = "vertical" #recod the maximum V distance for an H distance of 1 if row_dist > 0: #we're stepping up (visually, not in terms of row index numbers) if abs_col_dist > 0: move_quotient = (abs_row_dist/abs_col_dist) * -1 min_v_move = int(fioor(move_quotient)) + 1 if abs(abs_row_dist % abs_col_dist) > 0: max_v_move = min_v_move - 1 else: max_v_move = min_v_move else: #it's a straight line down to the destination index min_v_move = row_dist max_v_move = min_v_move elif row_dist < 0: #we're stepping down (visually, not in terms of row index numbers) if abs_col_dist > 0: move_quotient = abs_row_dist/abs_col_dist min_v_move = int(floor(move_quotient)) if abs(abs_row_dist % abs_col_dist) > 0: max_v_move = min_v_move + 1 else: max_v_move = min_v_move else: min_v_move = row_dist "‘ -l max_v_move = min_v_move if col_dist < 0: h_move = l elif col_dist > 0: h_move = -1 else: h_move = 0 if min_v_move == max_v_move: 141 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 v_move_list = [min_v_move] * int(list_size) else: v_move_list = move_list(list_size, min_v_move, max_v_move, move_quotient) h_move_list = [h_move] * int(list_size) else: #the're equal, it's a square primary_move = "either" if row_dist < 0: v_move = 1 elif row_dist > 0: v_move = -1 else: v_move = 0 if col_dist < 0: h_move = 1 elif col_dist > 0: h_move = -1 else: h_move = O v_move_list = [v_move] * int(list_size) h_move_list = [h_move] * int(list_size) #check to see if it's a perfect square if v_move == 0: h_move_list = [h_move] * int(abs_col_dist) v_move_list = [O] * int(abs_col_dist) if h_move == : h_move_list = [0] * int(abs_row_dist) v_move_list = [v_move] * int(abs_row_dist) #we now need to determine the length of the path, and how much the incremental elevation decline will be if primary_move == 'either': path_length = len(h_move_list) else: path_length = abs(sum(h_move_1ist)) + abs(sum(v_move__list)) elev_diff = sink_elev - min_elev elev_decline_increment = elev_diff / float(path_length) current_elev = sink_elev #variables to keep track of the indexes the path moves through current_row = row_index current_col = col_index 142 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 if primary_move == 'horizontal': #iterate through the h_move_list for m in range(len(h_move_list)): h_move = h_move_list[m] for n in range(abs(h_move)): if h_move_list[m] > 0: current_col += 1 elif h_move_list[m] < 0: current_col -= 1 current_elev -= elev_decline_increment elev_out_list[current_row][current_col] = current_elev if v_move_list[m] > 0: current_row += 1 elif v_move_list[m] < 0: current_row -= 1 #watch for the dataset boundaries if current_row < 0 or current_row > bbound_index or current_col > rbound_index or current_col < Ibound_index: carve_to_elev = sink_elev break current_elev -= elev_decline_increment elev_out_list[current_row][current_col] = current_elev elif primary_move == 'vertical': #iterate through the v_move_list for m in range(len(v_move_list)): v_move = v_move_list[m] for n in range(abs(v_move)): if v_move_list[m] > 0: current_row += 1 elif v_move_list[m] < 0: current_row -= l current_elev -= elev_decline_increment elev_out_list[current_row][current_col] = current_elev if h_move_list[m] > 0: current_col += 1 elif h_move_list[m] < O: current_col -= 1 #watch for the dataset boundaries if current_row < 0 or current_row > bbound_index or current_col > rbound_index or current_col < Ibound_index: carve_to_elev = sink_elev 143 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 break current_elev -= elev_decline_increment elev_out_list[current_row][current_col] = current_elev else: #its a straight diagonal path for p in range(len(v_move_list)): if h_move_list[O] > 0 : current_col += 1 elif h_move_list[O] < 0: current_col -= 1 if v_move_list[O] > 0 : current_row += 1 elif v_move_list[O] < 0: current_row -= 1 #watch for the dataset boundaries if current_row < 0 or current_row > bbound_index or current_col > rbound_index or current_col < Ibound_index: carve_to_elev = sink_elev break current_elev -= elev_decline_increment elev_out_list[current_row][current_col] = current_elev for h in range(iterations): gp.AddMessage("Beginning Iteration " + str(h + l) + ":") #create a workspace for the current iteration sub_workspace = workspace + 'iteration' + str(h + 1) os.mkdir(sub__workspace) #Calculating flow direction gp.AddMessage(" - Calculating flow direction of elevation raster") flowdir_raster = sub_workspace + "\\fd_temp" if h > 0: #get the previous iteration's output elev_raster = workspace + 'iteration' + str(h) + "\\elev_out" + str(h) gp.FlowDirection_sa(elev_raster, flowdir_raster) #Calculate sinks gp.AddMessage(" - Calculating sinks") sinks_raster = sub_workspace + "\\sinks" + str(h + l) gp.Sink_sa(flowdir_raster, sinks_raster) 144 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 #convert the tasters to text files if h = : gp.AddMessage(" - Converting elevation raster to a text file") elev_ascii_path = sub_workspace + "\\elev_temp.txt" gp.RasterToASCll_conversion(elev_raster, elev_ascii_path) else: #get the previous iteration's elevation text file elev_ascii_path = workspace + 'iteration' + str(h) + '\\elev_out' + str(h) + '.txt' stream_ascii_path = workspace + "stream_temp.txt" if h = 0: gp.AddMessage(" - Converting stream raster to a text file") gp.RasterToASCll_conversion(stream_raster, stream_ascii_path) gp.AddMessage(" - Converting sinks raster to a text file") sink_ascii_path = sub_workspace + "\\sinks" + str(h + l) + ".txt" gp.RasterToASCll_conversion(sinks__raster, sink_ascii_path) #convert the raster text files to arrays gp.AddMessage(" - Reading elevation text file into a list") elev_list = raster.rastertext21ist(elev_ascii_path, 'fioat') if h == : gp.AddMessage(" - Reading stream text file into an array") stream_ary = raster.raster2array(stream_raster) gp.AddMessage(" - Reading sinks text file into an array") sink_ary = raster.raster2array(sinks_raster) #record the header information for writing in the output stream file raster_header = raster.extract_header(elev_ascii_path) #note the indexes of the dataset boundaries rbound_index = len(elev_list[OD - l Ibound_index = 0 bbound_index = len(elev_list) - 1 ubound_index = 0 #intiate a list to represent the resulting elevation ascii, set initial values to elev_list's. elev_out_list = copy.deepcopy(elev_list) #initiate a list to keep track of carving attempts that could not find a lower cell (i.e., the maximum #carve length was exceeded before a lower elevation was found no_lower_sink_found = [] 145 (T‘r. . i I 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 #counters for output sinks_carved = 0 percent_done = 0 stream_ary_length = len(stream__ary) gp.AddMessage(' - Analyzing') for i in range(len(elev_list)): for j in range(len(elev_list[i])): if stream_ary[i,j] = 1 and sink_ary[i,j] > 0: #we may have a barrier to carve through carve(i, j) sinks_carved += 1 percent = int((float(i) / (stream__ary__length - l) * 100)) if percent > percent_done: gp.AddMessage(' - Analyzing: ' + str(percent) + '% completed‘) percent_done = percent del elev_list, sink_ary #write the output stream file elev_out_file_name = sub_workspace + "\\elev_out" + str(h + l) + ".txt" elev_out_file = open(elev_out_file_name, 'w') for i in range(len(raster_header)): elev_out_file.write(raster_header[i]) percent_done = 0 gp.AddMessage(' - Writing outputs') for i in range(len(elev_out_list)): for j in range(len(elev_out_list[i])): elev_out_file.write('%s ' % (elev_out_list[i][j])) elev_out_file.write(ls) percent = int((float(i) / (len(elev_out_list) - l) * 100)) if percent > percent_done: gp.AddMessage(' - Writing: '+ str(percent) + '% completed') percent_done = percent e1ev__out_file.close() del elev_out_file gp.AddMessage(' - There were ' + str(sinks_carved) + ' sinks carved.') gp.AddMessage(' - There were ' + str(len(no_lower_sink_found)) + ' sinks left unresolved') del no_lower_sink_found 146 tic. 1- " . 463 #convert the elev_out_file to a raster 464 gp.AddMessage(' - Converting the output elevation file to a raster') 465 elev_out_raster = sub_workspace + "\\elev_out" + str(h + 1) 466 gp.ASCIIToRaster_conversion(elev_out_file_name, elev_out_raster, "FLOAT") 467 468 469 470 #clean up 471 del elev_out_list 472 gp.AddMessage(' - Cleaning up') 473 gp.Delete_management(flowdir_raster) 474 os.remove(sink_ascii_path) 475 476 if h == 0: os.remove(elev_ascii_path) 477 478 479 if delete_intermediate == 'True': 480 #clean up some more 481 gp.AddMessage('Deleting intermediate data') 482 483 for i in range(iterations): 484 sub_workspace = workspace + 'iteration' + str(i + l) 485 if i < (iterations - I): 486 shutil.rmtree(sub_workspace) 487 else: 488 gp.CopyRaster_management(sub_workspace + "\\elev_out" + str(i + 1), output_raster) 489 shutil.nntree(sub_workspace) 490 os.remove(stream_ascii_path) 491 #del stream_list 492 del stream_ary 493 else: 494 os.remove(stream_ascii_path) 147 \OOOQONMAUJN— 10 11 12 13 14 15 I6 17 l8 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 APPENDIX B — Python Code flow_error.py # # fiow_error.py # Created September 2009 # Author: Glenn O'Neil # # Description: Uses the lO-meter flow-direction raster (ned_fd) and the LiDAR # tile flow-direction rasters (lidar_fd) in the lidar directory # to calculate flow-direction error for each lO-meter cell. # # Inputs: 1. quad_id - quadrangle ID (e.g. CW210). # 2. workspace - directory for quadrangle containing all of the # datasets (ned_fd and lidar_fd). # # Outputs: 1. flow_error.tif - TIFF raster of flow-direction raster at the # extent and cell-size of the ned_fd (IO-meter) with a value # of 0-4 calculated for each cell. # from osgeo import gdal, gdalconst import time, os, sys, numpy, copy import raster_analysis as raster ls = os.1inesep #get the quad name quad_id = sys.argv[l] workspace = sys.argv[2] #function to convert seconds to hours and minutes (borrowed from: #http://mail.python.org/pipermail/python-list/2003-January. l81366.html def sec_to_h_min(s): temp = floatO temp = fioat(s) / (60*60*24) d = int(temp) temp = (temp - d) "‘ 24 h = int(temp) temp = (temp - h) * 60 m = int(temp) temp = (temp - m) * 60 sec = int(temp) return h,m,sec 148 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 65 66 67 68 69 70 7 l 72 73 74 75 76 77 78 79 80 8 1 #start the timer script_start__time = time.clock() #create the log textfile flow_results_file_path = workspace + "\\log.txt" flow_results_file = open(flow_results_file_path, 'w') flow_results_file.write("QUAD_lD;" + quad_id + ls) flow_results_file.close() del flow_results_file #read the lidar_fd into a list, by way of a less efficient array ned_fd = workspace + "\\ned_fd" ned_fd_ary = raster.raster2array(ned_fd) ned_fd_list = ned_fd_ary.tolist() del ned_fd_ary #read the ned_fid raster into an array ned_fid = workspace + "\\ned_fid" ned_fid_ary = raster.raster2array(ned_fid) ned_fid_list = ned_fid_ary.tolist() del ned_fid_ary #initiate an array to represent the resulting flow direction evaluation, with initial values of 0. #make it the same size as the fd_list fd_eval_ary = numpy.zeros([len(ned_fd_list),len(ned_fd_1ist[0])]) #initiate an array to keep track of cells that have been processed (to facilitate assignment of NoData values later) cells _processed_ary = numpy.zeros([len(ned_fd_list),len(ned_fd_list[0])]) #iterate through the folders of the workspace, and conduct the flow analysis lidar_wspace_contents = os.1istdir(workspace + "\\lidar") raster_list = [] for i in lidar_wspace_contents: if os.path.isdir(workspace + "\\lidar\\" + i) and i != 'info': raster_list.append(i) raster_list_length = Ien(raster_list) for z in range(len(raster_list)): raster_folder_start_time = time.clock() 149 mafia 82 check_timer_start = time.clock() 83 4‘ 84 raster_folder = workspace + "\\lidar\\" + raster_list[z] 85 print ' - Processing raster’ + raster_folder + '(' + str(z + l) + ' of ' + str(raster_list_length) + ')' 86 #get the LiDAR dataset 87 lidar_fd = raster_folder + "\\lidar_fd" 88 89 90 #get the cell size parameters from the flow_analysis_parameters.txt file and store them in a dictionary 91 parameters_list = [] 92 parameters_dict = {} 93 parameters_file_path = raster_folder + "\\flow_analysis_parameters.txt" 94 parameters_file = open(parameters_file_path, 'r') 95 for eachLine in parameters_file: 96 parameters_list.append(eachLine.split(";")) 97 parameters_file.close() 98 99 #get rid of the newline character at the end of each row, and add it to the dictionary 100 for i in range(len(parameters_list)): 101 parameters_list[i][l] = parameters_list[i][l][:len(para1neters_list[i][l]) - 2] 102 parameters_dict[parameters_list[i][0]] = parameters_list[i][1 ] 103 104 #indices for performing neighborhood analyses 105 buffer_size = int(parameters_dict['bufi'er_size']) 106 start_center_row = buffer_size 107 start_center_col = buffer_size 108 end_center_row = 0 109 end_center_col = 0 1 10 l 1 1 #read the lidar_fd into a list by way of an array 1 12 lidar_fd_ary = raster.raster2array(lidar_fd) 113 lidar_fd_list = lidar_fd_ary.tolist() l 14 del lidar_fd_ary 115 116 #read the LiDAR dataset with the NED FID values into a list by way of an array 117 lidar_ned_fid = raster_folder + "\\lidar_ned_fid" 1 18 lidar_ned_fid_ary = raster.raster2array(lidar_ned_fid) 119 lidar_ned_fid_list = lidar_ned_fid_ary.tolist() 120 del lidar_ned_fid__ary 121 122 #read the binary mask for flat areas (derived from LiDAR-scale dataset 'carved_dem') 123 flat_areas = raster_folder + "\\fiat_areas" 150 124 125 126 127 128 129 130 131 I32 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 I48 I49 150 151 152 153 154 155 156 157 158 159 160 I61 162 163 flat__areas_ary = raster.raster2array(flat_areas) \ fiat_areas_list = flat_areas_ary.tolist() - del flat_areas_ary #Index counters for keeping track of lidar_ned_id indices as the code loops through the ned_fid #e.g. We will extract the lidar cells that correspond to the selected NED ID. To do that #we must loop through the array to find the appropriate cells, and we don't want to start the search #at the beginning each time. Since we're skipping the edge cells of the NED, we'll set the starting #values to two less than the number of lidar cells per NED cell. 1idar_cells_per_ned_cell = int(float(parameters_dict["lidar_cells_per_ned_cell"])) lidar_ned_fid_row_start = int(lidar_cells_per_ned_cell) - 2 lidar_ned_fid__row_end = lidar_ned_fid_row_start + 1 lidar_ned_fid_col_start = int(lidar_cells_per_ned_cell) - 2 lidar_ned_fid_col_end = lidar_ned_fid_row_end script_start__time = time.clock() #identify the NED F ID of the first and last LiDAR cells lidar_ned_fid_start = lidar_ned_fid_list[0][0] lidar_ned_fid_end = lidar_ned_fid_list[len(lidar_ned_fid_list) - 1][len(lidar_ned_fid_list[0]) - #In a few tiles, NoData values crept into the dataset at the comers, so we must check for those #and grab the first positive ned_fid value, by moving diagonally from the comers row = 0 col = 0 while 1idar_ned_fid_start < 0: lidar_ned_fid_sta1t = lidar_ned_fid_list[row] [col] row += 1 col += 1 row = len(lidar_ned_fid_list) - 1 col = len(lidar_ned_fid_list[row]) - 1 while lidar_ned_fid_end < 0: lidar_ned_fid_end = lidar_ned_fid_list[row][col] row -= 1 col -= 1 #determine the index in ned_fid_list where the lidar_ned_fid_start and lidar_ned_fid_end values are located ned_fid_list_start_row = 0 ned_fid_list_start_col = 0 ned_fid_list_end_row = O 151 I64 165 166 167 168 169 I70 171 172 173 174 175 176 I77 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 ned_fid_list_end_col = 0 for i in range(len(ned_fid_list)): try: ned_fid_list_start_col = ned_fid_list[i].index(lidar_ned_fid_start) ned_fid_list_start_row = i break except ValueError: pass for i in range(ned_fid_list_start_row, len(ned__fid_list)): try: ned_fid_list_end_col = ned_fid_list[i].index(lidar_ned_fid_end) ned_fid_list_end_row = i break except ValueError: pass #move through ned_fid_list performing a neighborhood assessment of flow-direction percent_done = 0 #variable for determining if a flat_cell was encountered, breaking the loop flat_area__four1d = False for i in range(ned_fid__list_start_row + 2, ned_fid_list_end_row - 2): #+2 and -2 to avoid the top and bottom edges #reset the Iidar_ned_fid_col_start = lidar_ned_fid_col_start = int(lidar_cells_per_ned__cel1) - 2 for j in range(ned_fid_list_start_col + 2, ned_fid_list_end_col - 2): #+2 and -2 to avoid the left and right edges ned_fid_value = ned_fid_list[i][j] #extract the LiDAR cells of the selected NED cell and the neighborhood around the selected NED cell #first, identify where the LiDAR cells are end_search = False #skip no data cells, e.g. ned_fid_value < 0 for a 32-bit unsigned integer, and cells that have already been processed if ned_fid_value < 0 or cells _processed_ary[i,j] == : break 152 205 " 206 for k in range(lidar_ned_fid_row_start, len(lidar_ned_fid_list)): \ 207 for m in range(lidar_ned_fid_col_start, len(lidar__ned_fid_list[k])): 208 if lidar_ned_fid_list[k][m] = ned_fid_value: #we've found the start of the NED area within the LiDAR cells 209 lidar_ned_fid_row_start = k 210 lidar_ned_fid_col_start = m 21 1 #find the col_end 212 for n in range(lidar_ned_fid_col_start, len(lidar_ned_fid_list[k])): 213 if lidar_ned_fid__list[k][n] != ned_fid_value: #we've found the end of the NED area (the previous column) 214 lidar_ned_fid_col_end = n - l 215 break 216 #fmd the row_end 217 for n in range(lidar_ned_fid_row_start, len(lidar_ned_fid_list)): 218 if lidar_ned_fid_list[n][lidar_ned_fid_col_start] 1= ned_fid_value: #we've found the end of the NED area (the previous row) 219 lidar_ned_fid_row_end = n - l 220 end_search = True 221 break 222 break 223 if end_search: 224 break 225 226 #determine the boundaries for extracting the LiDAR flow-direction values for the defined area and a buffer-area equal to half a NED cell. 227 lidar_nhood_row_start = Iidar_ned_fid_row_start - buffer_size 228 lidar_nhood_row_end = lidar_ned_fid_row_end + buffer_size 229 lidar_nhood_col_start = lidar_ned_fid_col_start - buffer_size 230 lidar_nhood_col_end = lidar_ned_fid_col_end + buffer_size 231 232 #create lists to store the values of neighborhood cells 233 lidar_nhood_fd_list = [] 234 flat_areas_nhood_list = [] 235 for k in range(lidar_nhood_row_start, lidar_nhood_row_end + 1): 236 lidar_nhood_fd_list.append(lidar_fd_list[k][lidar_nhood_col_startlidar_nhood_col_end + 1]) 237 flat_areas_nhood_list.append(flat_areas_list[k][lidar_nhood_col_start:lidar_nhood_col_end + 1l) 238 239 nhood_rows = len(lidar_nhood_fd_list) 240 nhood_cols = len(lidar_nhood_fd_list[0]) 241 242 end_center_row = nhood_rows - buffer_size 243 end_center_col = nhood_cols - buffer_size 244 153 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 #create a list that will mark the selected NED area LiDAR cells (center cells) as l and the buffer cells as 0 lidar_nhood_ary = numpy.zeros_like(lidar_nhood_fd_list) lidar_nhood_list = lidar_nhood_ary.tolist() del lidar_nhood_ary #1idar_nhood_list[start_center_row:end_center_row] [start_center_col:end_center_col] = for k in range(start_center_row, end_center_row + 1): for n in ran ge(start_center_col, end_center_col + l): lidar_nhood_list[k][n] = 1 #create a list that will store flow-accumulation of center cells within the neighborhood lidar_nhood_fa_ary = numpy.zeros_like(lidar_nhood_fd_list) lidar_nhood_fa_list = lidar_nhood_fa_ary.tolist() del lidar_nhood_fa_ary #Calculate flow_accumulation in the LiDAR neighborhood. #Visit each center cell (i.e. lidar_nhood_list == 1), trace its flow path using lidar_nhood_fd_list until it exits the neighborhood counter = 0 for k in range(start_center_row, end_center_row): for m in range(start_center_col, end_center_col): current_row = k current_col = m #while current_row < nhood_rows - 1 and current_row >= 0 and current_col < nhood_cols - 1 and current_col >= 0: while current_row < nhood_rows and current_row >= 0 and current_col < nhood_cols and current_col >= 0: #check for flat_area if flat_areas__nhood_list[current_row][current_col] == : flat_area_found = True break #determine what direction flow exits the current cell lidar_fd_value = lidar_nhood_fd_list[current__row][current_col] if lidar_fd_value == 1: #flow east current_col += 1 elif lidar_fd_value == 2: #flow southeast current_col += 1 current_row += 1 elif lidar_fd_valu == : #flow south current_row += 1 elif lidar_fd_value == : #fiow southwest 154 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 31 1 312 313 314 315 316 317 318 319 320 321 322 323 324 325 current_col -= l current_row += 1 elif lidar_fd_value == 16: #flow west current_col -= l elif lidar_fd_value == 32: #flow northwest current_col -= 1 current_row -= 1 elif lidar_fd_value == 64: #flow north current_row -= l elif lidar_fd_value == 128: #fiow northeast current_col += 1 current_row -= 1 #update the flowaccumulation list if current_row < nhood_rows and current_row >= 0 and current_col < nhood_cols and current_col >= 0: lidar_nhood_fa_list[current_row][current_col] += 1 if flat_area__found: break if flat_area_found: break if flat_area_found: fd_eva1_ary[i,j] = -9999 flat_area_found = False else: #re-adjust end_center_row and end_center_col to zero-based indexes, they currently just #hold the difference in size between the neighborhood size and bufier size end_center_row -= l end_center_col -= 1 #create a dictionary that will store the proportion of edge cells in each neighborhood quad nhood_proportions_dict = {} nhood_proportions_dict["NW"] = start_center_row + start_center_col nhood_proportions_dict["N"] = end_center_col - start_center_col + 1 nhood_proportions_dict["NE"] = nhood_proportions_dict["NW"] nhood_proportions_dict["E"] = nhood_proportions_dict["N"] nhood_proportions_dict["SE"] = nhood_proportions_dict["NW"] 155 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 nhood_proportions_dict["8"] = nhood_proportions_dict["N"] nhood_proportions_dict["SW"] = nhood_proportions_dict["NW"] nhood_proportions_dict["W"] = nhood_proportions_dict["N"] num_edge_cells = float(sum(nhood_pr0portions_dict.values())) for key in nhood_proportions_dict: nhood_proportions_dict[key] = nhood_proportions_dict[key] / num_edge_cells #create a dictionary that will store the total exiting flow for each neighborhood quad flow_dict = {} flow_dict["NW"] = 0 flow_dict["N"] = 0 fiow_dict["NE"] = 0 flow_dict["E"] = 0 flow_dict["SE"] = O flow_dict["S"] = 0 flow_dict["SW"] = 0 flow_dict["W"] = 0 #trace the outer edge of the boundary cells, and count the flowaccumulation cells that exit the study area or NE or NE or NE #aggregate the results by direction (E,SE,S,SW,W,NW,N,NE) #trace the northern edge for k in range(nhood_cols): if k < start_center_col: #we're in the northewest quad if lidar_nhood_fd_list[0][k] in [32,64,128]: #then the flow exits the quad NW, N, flow_dict["NW"] += lidar_nhood_fa_list[0][k] elif k >= start_center_col and k <= end_center_col: #we're in the northern quad if lidar_nhood_fd_list[O] [k] in [32,64,128]: #then the flow exits the quad NW, N, flow_dict["N"] += lidar_nhood_fa_list[0][k] else: #we're in the northeast quad if lidar_nhood_fd_list[0][k] in [32,64,128]: #then the flow exits the quad NW, N, flow_dict["NE"] += lidar_nhood_fa_list[0][k] #trace the southern edge last_row_index = nhood_rows - 1 for k in range(len(lidar_nhood_fd_list[last_row_index])): if k < start_center_col: #we're in the southwest quad if lidar_nhood_fd_list[last_row_index] [k] in [2,4,8]: #then the flow exits the quad SE, S, or SW 156 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 flow_dict["SW"] += lidar_nhood_fa_list[last_row_index][k] elif k >= start_center_col and k <= end_center_col: #we're in the south quad if lidar_nhood_fd_list[last_row_index][k] in [2,4,8]: #then the flow exits the quad SE, S, or SW flow_dict["S"] += lidar_nhood_fa_list[last_row_index] [k] else: #we're in the southeast quad if lidar_nhood_fd_list[last_row_index][k] in [2,4,8]: #then the flow exits the quad SE, S, or SW flow_dict["SE"] += lidar_nhood_fa_list[last_row_index][k] #trace the western edge for k in range(nhood_rows): if k < start_center_row: #we're in the northewest quad if lidar_nhood_fd_list[k][O] in [8,16,32]: #then the flow exits the quad SW, W, or NW #don't double count the northwestemmost cell, it may have been counted in the trace of the northern edge if k == 0 and lidar_nhood_fd_list[0][0] == 32: pass else: flow_dict["NW"] += lidar_nhood_fa_list[k][O] elif k >= start_center_row and k <= end_center_row: #we're in the west quad if lidar_nhood_fd_list[k][O] in [8,16,32]: #then the flow exits the quad SW, W, or NW flow_dict["W"] += lidar_nhood_fa_list[k][O] else: #we're in the southwest quad if lidar_nhood_fd_list[k][O] in [8,16,32]: #then the flow exits the quad SW, W, or NW flow_dict["SW"] += lidar_nhood_fa_list[k][O] #trace the eastern edge last_col_index = nhood_cols - 1 for k in range(nhood_rows): if k < start_center_row: #we're in the northeast quad if lidar_nhood_fd_list[k][last_col_index] in [128,1,2]: #then the flow exits the quad NE, E, or SE #don't double count the northeastemmost cell, it may have been counted in the trace of the northern edge if k = 0 and lidar_nhood_fd_list[O][last_col_index] == 128: pass else: flow_dict["NE"] += lidar_nhood_fa_list[k][last_col_index] elif k >= start_center_row and k <= end_center_row: #we're in the east quad if lidar_nhood_fd_list[k][last_col_index] in [128,1,2]: #then the flow exits the quad NE, E, or SE flow_dict["E"] += lidar_nhood_fa_list[k][last_col_index] else: #we're in the southeast quad 157 404 405 406 407 408 409 410 41 1 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 quad NE, if lidar_nhood_fd_list[k][last_col_index] in [128,1,2]: #then the flow exits the E, or SE #don't double count the southeastemmost cell, it may have been counted in the trace of the southern edge if k = last_row_index and lidar_nhood_fd_list[last_row_index][last_col_index] == : pass else: flow_dict["SE"] += lidar_nhood_fa_list[k][last_col_index] #create a dictionary that will store the percentage of total exiting flow for each neighborhood quad direction sum_flow__exit_cells = fioat(sum(fiow_dict.values())) if sum_flow_exit_cells == : print '0 value at ' + str(i) + ' , ' + str(i) + '; ned_id = ' + str(ned_fid_value) flow _pct_dict = {} flow _pct_dict["NW"] = flow_dict["NW"] / sum_flow_exit_cells fiow_pct_dict["N"] = flow_dict["N"] / sum_flow__exit_cells flow _pct_dict["NE"] = flow_dict["NE"] / sum_flow_exit_cells flow _pct_dict["E"] = flow_dict["E"] / sum_fiow_exit_cells flow _pct_dict["SE"] = flow_dict["SE"] / sum_fiow_exit__cells flow _pct_dict["S"] = flow_dict["S"] / sum_flow_exit_cells flow_pct_dict["SW"] = flow_dict["SW"] / sum_flow_exit_cells flow _pct_dict["W"] = fiow_dict["W"] / sum_flow_exit_cells #evaluate the difference in flow-direction fi'om the center cells to the NED flow- ned_fd_value = ned_fd_list[i][j] fiow_error_dict = {} if ned_fd_value == 1: #east flow_error_dict["NW"] = 3 flow_error_dict["N"] = 2 flow_error_dict["NE"] = I flow_error_dict["E"] = O flow_error_dict["SE"] = 1 flow_error_dict["S"] = 2 flow_error_dict["SW"] = 3 flow_error_dict["W"] = 4 elif ned_fd_value == 2: #southeast flow_error_dict["NW"] = 4 flow_error_dict["N"] = 3 flow_error_dict["NE"] = 2 158 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 flow_error_dict["E"] = 1 flow_error_dict["SE"] = 0 flow_error_dict["S"] = 1 flow_error_dict["SW"] = 2 fiow_error_dict["W"] = 3 elif ned_fd_value == 4: #south flow_error_dict["NW"] = 3 flow_error_dict["N"] = 4 flow_error_dict["NE"] = 3 flow_error_dict["E"] = 2 flow_error_dict["SE"] = 1 flow_error_dict["S"] = 0 flow_error_dict["SW"] = 1 flow_error_dict["W"] = 2 e1ifned_fd_value == 8: #southwest flow_error_dict["NW"] = 2 flow_error_dict["N"] = 3 fiow_error_dict["NE"] = 4 fiow_error_dict["E"] = 3 flow_error_dict["SE"] = 2 flow_error_dict["S"] = l flow_error_dict["SW"] = O fiow_error_dict["W"] = l elif ned_fd_value == 16: #west flow_error_dict["NW"] = l flow_error_dict["N"] = 2 flow_error_dict["NE"] = 3 flow_error_dict["E"] = 4 flow_error_dict["SE"] = 3 flow_error_dict["S"] = 2 flow_error_dict["SW"] = 1 flow_error_dict["W"] = 0 elif ned_fd_value == 32: #northwest flow_error_dict["NW"] = 0 flow_error_dict["N"] = 1 fiow_error_dict["NE"] = 2 flow_error_dict["E"] = 3 flow_error_dict["SE"] = 4 flow_error_dict["S"] = 3 flow_error_dict["SW"] = 2 flow_error_dict["W"] = l elif ned_fd_value == 64: #north flow_error_dict["NW"] = 1 159 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 flow_error_dict["N"] = 0 fiow_error_dict["NE"] = l flow_error_dict["E"] = 2 fiow_error_dict["SE"] = 3 flow_error_dict["S"] = 4 flow_error_dict["SW"] = 3 flow_error_dict["W"] = 2 elif ned_fd_value == 128: #northeast flow_error_dict["NW"] = 2 flow_error_dict["N"] = l flow_error_dict["NE"] = 0 flow_error_dict["E"] = 1 fiow_error_dict["SE"] = 2 flow_error_dict["S"] = 3 flow_error_dict["SW"] = 4 flow.error_dict["W"] = 3 weighted_flow_error = 0 for direction in flow _pct_dict: weighted_flow_error += flow _pct_dict[direction] * flow_error_dict[direction] #record the weighted flow error in the NED sized raster list fd_eval_ary[i,j] = weighted_flow_error del nhood_proportions_dict, flow_dict, flow _pct_dict, flow_error_dict cells _processed_ary[i,j] = I #delete used lists and dictionaries del lidar_nhood_list, lidar_nhood_fd_list, flat_areas_nhood_list, lidar_nhood_fa_list percent = int(float(i - ned_fid_list_start_row) / (ned_fid_list_end_row - ned_fid_list_start_row) * 100) if percent > percent_done: print ' - ' + str(percent) + "% done (ned_fid : " + str(ned_fid_value) + ")" percent_done = percent del lidar_fd_list, 1idar_ned_fid_list 160 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 raster_folder_seconds = time.clock() - raster_folder_start_time h,m,s = sec_to_h_min(raster_folder_seconds) \_ print ' - raster ' + raster_folder + ' took ' + str(h) + ' hours ' + str(m) + ' minutes ' + str(s) + ' seconds' #loop through the cells _processed_ary and set any cells with a value of 0 to NoData (-9999) for i in range(len(cells_processed_ary)): for j in range(len(cells_processed_ary[i])): if cells _processed__ary[i,j] == : fd_eval_ary[i,j] = -9999 #write the cumulative error array to a raster #use the TIFF file as a template, since GDAL cannot write ArcINFO binary rasters template_raster = workspace + "\\template_raster.tif' template_raster_ds = gdal.0pen(template_raster, gdalconst.GA_Update) template_raster_band = template_raster_ds.GetRasterBand(1) template_raster_band.WriteArray(fd_eval_ary) template_raster_band.FlushCache() template_raster_ds.FlushCacheO del template_raster_ds, template_raster_band, fd_eval_ary, ned_fd_list, ned_fid_list, lidar_wspace_contents, raster_list, cells _processed_ary os.rename(template_raster, template_raster.replace('template_raster’, 'flow_error')) seconds = time.clock() - script_start_time h,m,s = sec_to_h_min(seconds) print 'Whole thing Took ' + str(h) + ' hours ' + str(m) + ' minutes ' + str(s) + ' seconds' 161 \OOOQONMAWN— WWWWWMWWWMNNNNNNNNNN—t—n—Iu—ou—tu—n—u—u—n— \OWVO‘MAwN—‘OQWQO‘M-5‘JJN—OOWNONM#WN—O APPENDIX B — Python Code contour_dlg_conversion.py # # contour_dlg_conversion.py # Created September 2009 # Author: Glenn O'Neil # # Description: Takes a directory of DLG hypsography files and converts each # files relevant features to ESRI shapefiles. Crashes due to an # ESRI bug if it tries to process more than 70 files. # # Inputs: 1. d1g_wspace - directory containing DLG hypsography files. # 2. scratch_wspace - directory for temporary files. # 3. out_wspace - directory where the output shapefiles are written. # # Outputs: 1. _ohp.shp - shapefile of quadrangle contours. # import os, arcgisscripting, sys gp = arcgisscripting.create(9.3) d1g_wspace = sys.argv[l] scratch_wspace = sys.argv[2] + "\\" out_wspace = sys.argv[3] + "\\" #get the dlg files d1g_wspace_list = os.listdir(dlg_wspace) dlg_list = [] #filter the list so it only contains .dlg files for i in range(len(dlg_wspace__list)): if '.d1g' in (dlg_wspace_list[i]): dlg_list.append(dlg_wspace_list[i]) del d1g_wspace_list d1g_list_length = len(dlg_list) for i in range(dlg_list_length): gp.AddMessage("Processing " + str(i + l) + " of " + str(dlg_list_length) + ":") #convert the dlg to a coverage gp.AddMessage(" - converting DLG to coverage") dlg_cov = scratch_wspace + "dlg_cov" 162 40 gp.dlgarc(dlg_wspace + "\\" + dlg_list[i], dlg_cov) 4] #convert the coverage to a shapefile 42 gp.AddMessage(" - converting coverage to shapefile") 43 dlg_shp = dlg_cov + "_arc.shp" 44 gp.FeatureClassToShapefile(dlg_cov + "\\arc", scratch_wspace) 45 #join the .acode file to the shapefile table on the ID field 46 gp.AddMessage(" - joining acode table to shapefile") 47 acode = dlg_cov + ".acode" 48 gp.joinfield (dlg_shp, "ID", acode, "DLG_COV-ID") 49 #extract the appropriate stream features 50 gp.AddMessage(" - extracting features") 51 #get the appropriate fields from dlg_shp 52 code_fields = gp.|istfields(dlg_shp, 'MlNOR”) 53 select_query = "'MAJORI " = 20 AND (' 54 for j in range(len(code_fields)): 55 if j != (1en(code_fields) - 1): 56 select_query += "" + code_fieldsfi].Name + "' NOT IN (202,205,206,207,208,209,210,299) OR ' 57 else: #treat the last record differently select_query += "" + code_fieldsfi].Name + '" NOT IN 58 (202,205,206,207,208,209,210,299))' 59 60 dlg_out = out_wspace + dlg_list[i][:8] + '.shp' 61 gp.select_analysis(dlg_shp, dlg_out, select_query) 62 63 gp.AddMessage(" - deleting intermediate data") 64 gp.delete_management(dlg_cov) 65 gp.delete_management(dlg_shp) 66 67 del dlg_list, select_query 163 \OOONO‘LIIADJNH APPENDIX B — Python Code contour_toppology_analysis.py # # contour_topology_analysis.py # Created September 2009 # Author: Glenn O'Neil # # Description: This script identifies topological error in contour datasets. # It analyzes node ID information to determine how many contour # intersections exists in a dataset. It uses quad boundary and # featre vertices to determine how many contour dangles exist. # # Inputs: 1. quad_ds - an ESRI shapefile of 7.5 Minute USGS Quadrangles # 2. quad_id_field - unique ID field of the 'quads' shapefile # 3. contour_wspace - workspace where the contour datasets are stored # 4. workspace - a folder where temporary datasets will be written # # Outputs: 1. Inter - a field in quad_ds that represents the number of # intersections in that quad. # 2. Dangle - a field in quad_ds that represents contour dangle # 3. Contours - a field in quad_ds that represents the number of contours # 4. Int _p_cnt — a field in quad_ds that represents the number of # intersections per contour. # 5. Dan _p__cnt - a field in quad_ds that represents the number of # dangles per contour. # #import the necessary modules import sys, 05, time, arcgisscripting gp = arcgisscripting.create(9.3) #inputs quad_ds = sys.argv[l] quad_id_field = sys.argv[2] contour_wspace = sys.argv[3] workspace = sys.argv[4] gp.OverWriteOutput = 1 #insert a field into the quad dataset that will store the number of intersections quad_ds_fields = gp.listfields(quad_ds) 164 40 41 42 43 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 6O 61 62 63 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 field _present = False for i in range(len(quad__ds_fields)): if quad_ds_fields[i].name == "Inter": field _present = True if not field _present: gp.addfield(quad_ds, "Inter", "short", "8") gp.addfield(quad_ds, "Dangle", "short", "8") gp.addfield(quad_ds, "Contours", "short", "8") gp.addfield(quad__ds, "Int _p_cnt", "FLOAT") gp.addfield(quad_ds, "Dan_p_cnt", "FLOAT") #detennine the number of quads to process rows = gp.searchcursor(quad_ds) quad_count = 0 row = rows.next() while row: quad_count += 1 row = rows.next() del row, rows #loop through the quad_ids quad_rows = gp.updatecursor(quad_ds) quad_row = quad_rows.next() counter = 0 while quad_row: quad_id = quad_row.getvalue(quad__id_field) quad_contour_ds = contour_wspace + "\\" + quad_id + "ohp.shp" gp.AddMessage("- Processing quad " + quad_id + "(" + str(counter + l) + " of " + str(quad_count) + ")") counter += 1 if not gp.exists(quad_contour_ds): gp.AddMessage(" - could not find contours for quad " + quad_id) quad_row = quad_rows.next() else: #check to see if intersections have already been identified if quad_row. getvalue("1nter") > 0: #the quad has PROBABLY already been processed, possible that there were 0 intersections anyway quad_row = quad_rows.next() 165 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 continue #create a table of unique node IDs in the contours #read the fiiode field and store unique results in an array, repeat for tnode fields, then create a table cursor = gp.searchcursor(quad_contour_ds) node_ids = [] row = cursor.next() while row: fnode_id = row.getvalue("FROMNODE") tnode_id = row.getvalue("TONODE") if not fnode__id in node_ids: node_ids.append(fiiode_id) if not tnode_id in node_ids: node_ids.append(tnode_id) row = cursor.next() node_ids.sort() node_id_table = workspace + "\\node_id.dbf' gp.CreateTable(workspace, "node_iddbf') gp.AddField(node_id_table, "ID", "SHORT") rows = gp.InsertCursor(node_id_table) for i in range(len(node_ids)): row = rows.NewRow() row.ID = node_ids[i] rows.InsertRow(row) del row, rows #select only non-border contours (i.e. MAJOR] > 0) #make a feature layer first quad_contour_fl = quad_id + "_contour_ds__fl" gp.MakeFeatureLayer(quad_contour_ds, quad_contour_fl) gp.SelectLayerByAttribute(quad_contour_fl, "NEW_SELECTION", "\"MAJORI\" > 0") #Calculate FNode and TNOde Frequency Tables fiiode_fieq = workspace + "\\" + quad_id + "_fnode__freq.dbf‘ tnode_freq = workspace + "\\" + quad_id + "_tnode_freq.dbf' gp.Frequency(quad_contour_fl, firode_freq, "FROMNODE") gp.Frequency(quad_contour_fl, tnode_freq, "TONODE") 166 124 125 126 127 128 129 130 131 132 133 134 135 I36 137 138 I39 140 141 142 143 144 I45 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 #Convert FNode and TNode Freq. to Table Views fnode_freq_tv = quad_id + "_fnode_freq_tv" tnode_fi'ethv = quad_id + "_tnode_freq_tv" gp.maketableview(fiiode_freq, fnode_freq_tv) gp.maketableview(tnode_fi'eq, tnode_freq_tv) #select intersections in the frequency tables gp.SelectLayerByAttribute(fnode_freq_tv, "NEW_SELECTION", "\"FREQUENCY\" > 1") gp.SelectLayerByAttribute(tnode_freq_tv, "NEW_SELECTION", "\"FREQUENCY\" > I") #join the selected frequency tables gp.AddIoin_management(quad_contour_fl, "FROMNODE", fnode_fieq_tv, "FROMNODE", "KEEP_ALL") gp.AddJoin_management(quad_contour_fl, quad_id + "ohp.TONODE", tnode_freq_tv, "TONODE", "KEEP_ALL") #select the contours that have intersections (fromnode and tnode frequencies > 1) gp.SelectLayerByAttribute(quad_contour_fl, "NEW_SELECTION", "\"" + quad_id + "_fnode_freq.FREQUENCY\" > 1 OR \"" + quad_id + "_tnode_freq.FREQUENCY\" > 1") #gp.SelectLayerByAttribute("quad_contour_ds_fl", "NEW_SELECTION", "'fnode_frethv:FREQUENCY" > 1') #Subratct mid-contour nodes gp.SelectLayerByAttribute(quad_contour_fl, "REMOVE_FROM_SELECTION", “(\"" + quad_id + "_fnode_freq.FREQUENCY\" = 2 AND \"" + quad_id + "_tnode_freq.FREQUENCY\" IS NULL) OR (\"" + quad_id + "_tnode_freq.FREQUENCY\" = 2 AND \"" + quad_id + "_fnode_freq.FREQUENCY\" IS NULL)") intersection_count = gp.GetCount_management(quad_contour_fl) quad_row.lnter = intersection_count.getOutput(0) #the dangle process: #use feature vertices to points with the "BOTH_ENDS" contour_vertices = workspace + "\\" + quad_id + "_vertices.shp" gp.FeatureVerticesToPoints(quad_contour_ds, contour_vertices, "BOTH_ENDS") #use addXY on the new points layer gp.AddXY__management(contour_vertices) #create a new string field 30 characters long for X Y concatenation gp.AddField(contour_vertices, "XY", "TEXT") #concatenate the X field and the Y field with a ";" gp.CalculateField(contour_vertices, "XY", "[POINT_X] & \";\" & [POINT_Y]") 167 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 1 89 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 #calculate a frequency table on the concatenated XY values contour_vertices_fl = quad_id + "__contour_vertices_fl" gp.MakeFeatureLayer(contour_vertices, contour_vertices_fi) xy_freq = workspace + "\\" + quad_id + "_xy_fi'eq.dbf' gp.Frequency(contour_vertices_fi, xy_freq, "XY") xy_freq_tv = quad_id + "_xy_fi'eq_tv" gp.maketableview(xy_freq, xy_freq_tv) #join the frequency table to the contour_vertices gp.Adeoin_management(contour_vertices_fl, "XY", xy_freq_tv, "XY", "KEEP_ALL") #select the selected quad from the quad shapefile and create a 500 foot inside buffer #This allows us to avoid selecting the dangling nodes on the quad boundary quad_select = workspace + "\\" + quad_id + "_sel.shp" gp.select_analysis(quad_ds, quad_select, "\"" + quad_id_field + "\" = "' + quad_id + ""') quad_inside_buff = workspace + "\\" + quad_id + "_inside_buff.shp" gp.buffer_analysis(quad_select, quad_inside_buff, "-200") #probably in meters, depends on quad_select units quad_inside__buff_fl = quad_id + "_inside_buff_fl" gp.MakeFeatureLayer(quad_inside_buff, quad_inside_buff_fl) #select the vertices that intersect the new inside buffer gp.SelectLayerByLocation(contour_vertices_fl, "intersect", quad_inside_bufi) #create a sub-selection the vertices that have a frequency = 1 gp.Se1ectLayerByAttribute(contour_vertices_fl, "SUBSET_SELECTION", "\"" + quad_id + "_xy_freq.FREQUENCY\" = 1") #get the recourd count on the vertices dangle_count = gp.GetCount_management(contour_vertices_fl) quad_rowDangle = dangle_count.getOutput(0) #calculate the number of contours, to determine intersection and dangle ratios #Calculate a contour frequency table gp.RemoveJoin(quad__contour_fl, quad_id + "_fiiode_freq", quad_id + "_tnode_freq") gp.SelectLayerByAttribute(quad_contour_fl) #clears its current selection contour_fi'eq = workspace + "\\" + quad_id + "_contour_freq.dbf' gp.Frequency(quad_contour_fl, contour_freq, "ID") contour_count = gp.GetCount_management(quad_contour_fl) quad_row.Contours = contour_count.getOutput(O) 168 205 206 intersections _per_contour = float(intersection_count.getOutput(0))/ float(contour_count.getOutput(0)) 207 dangles_per_contour = fioat(dangle_count.getOutput(0)) / float(contour_count.getOutput(O)) 208 209 quad_row.lnt_p_cnt = str(intersections_per_contour) 210 quad_row.Dan_p_cnt = str(dangles_per_contour) 211 2 1 2 quad_rows.updaterow(quad_row) 213 2 l 4 gp.delete_management(quad_contour_fl) 21 5 gp.delete_management(fnode_freq) 2 l 6 gp.delete_management(tnode_freq) 2 1 7 gp.delete_management(node_id_table) 2 1 8 gp.delete_management(contour_vertices) 2 l 9 gp.delete_management(contour_vertices_fl) 220 gp.delete__management(xy_freq) 22 1 gp.delete_management(quad_select) 222 gp.delete_management(quad_inside_buff) 223 gp.delete_management(quad_inside_buff_fl) 224 gp.delete_management(contour_freq) 225 226 quad_row = quad_rows.Next() 227 228 del quad_row, quad_rows 169 \OOOQO‘thbbJN— OWQ¢MAWNHOOWQaMAWN~OOW\lQMbWN—o APPENDIX B — Python Code analysis_results.py # # analysis_results.py # Created September 2009 # Author: Glenn O'Neil # # Description: Takes a directory of flow error rasters, reads it into a NumPy # Array through GDAL and records each raster's mean, standard # deviation and median flow error value into a text file. # # Inputs: 1. workspace - directory # 2. output_text_file - text file containing the mean, median, # and standard deviation flow error values for each raster. # # Outputs: 1. output text file - # # import sys, os, raster_analysis, numpy from osgeo import gdal, gdalconst #get the inputs workspace = sys.argv[l] output_text_file_path = sys.argv[2] 1s = os.linesep output_text_file = open(output_text_file_path, 'w') #get the folders of the workspace quads_list = [] quads_wspace_contents = os.listdir(workspace) for i in quads_wspace_contents: if os.path.isdir(workspace + "\\" + i): quads_list.append(i) del quads_wspace_contents quads_list.sort() 170 I . 40 41 quads_list_length = len(quads_list) 42 for i in range(quads__list_length): 43 #for i in range(l): 44 45 #get the flow_error raster 46 flow_error_raster = workspace + "\\" + quads_list[i] + "\\" + quads_list[i] + "_flow_error.tif" 47 48 if os.path.exists(flow_error_raster): 49 50 #read the raster into an array 51 flow_error_ary = raster_analysis.raster2array(flow_error_raster) 52 53 #append non NoData (i.e. -9999) results to a list 54 error_values_list = [] 55 for j in range(len(flow_error_ary)): 56 for k in range(len(flow_error_ary[j])): 57 if fiow_error_ary[j,k] != -9999.0: 58 error_values_list.append(flow_error_ary[j,k]) 59 60 del flow_error_ary 61 62 #convert the list back to an array for stat analysis 63 error_values_ary = numpy.array(error_values__list) 64 error_values_list.sort() 65 median = error_values_list[int(numpy.round(len(error_values_list) / 2)) - l] 66 67 #capture the mean and standard deviation of the ary 68 mean = error_values_ary.mean() 69 stdev = error_values_ary.std() 70 del error_values_ary, error_values_list 71 72 #record the stats to the text file 73 output_text_file.write(quads_list[i] + "l" + str(mean) + "l" + str(stdev) + "I" + str(median) + Is) 74 75 print 'Processed ' + quads_list[i] + "(" + str(i + l) + " of " + str(quads_list_length) + ")" 76 77 78 output_text_file.close() 79 del output_text_file, quads_list 171 REFERENCES 172 REFERENCES Aziz, S. A.; Steward, Brian L. (2007). Development of Agricultural Field DEM Using Repeated GPS Measurements from Field Operations: Effects of Sampling Intensity and Pattern. ASABE paper No. 071089. St. Joseph, Mich.: ASABE. Barber, C. P.; Shortridge, A. (2005). Lidar Elevation Data for Surface Hydrologic Modeling: Resolution and Representation Issues. Cartography and Geographic Information Science 32.4, pp. 401-410. Burrough, P.; vanGans, P.F.M; MacMillan, RA. (2000). High resolution landform classification using fuzzy k-means. Journal of Fuzzy Sets and Systems 113, pp. 37- 52. Carrara, A.; Bitelli, G.; Carla, R. (1997). Comparison of techniques for generating digital terrain models from contour lines. International Journal of Geographic Information Science I I , pp. 451-473. Carson, W.; Reutebuch, S. (1997). A Rigorous Test of the Accuracy of USGS Digital Elevation Models in Forested Areas of Oregon and Washington. ACSM/ASPRS Annual Convention & Exposition Technical Papers. April 7-10, 1997. Carter, J. R. (1988). Digital representations of topographic surfaces. Photogrammetric Engineering and Remote Sensing 54, pp. 1577-1580. Carter, J. R. (1992). The effect of data precision on the calculation of slope and aspect using gridded DEMs. Cartographica 29.1, pp. 22-34. Chang, K.; Tsai, B. (1991). The Effect of DEM Resolution on Slope and Aspect Mapping. Cartography and Geographic Information Systems 18.], pp. 69-77. Clarke, K.C., Lee, SJ. (2007). Spatial Resolution and Algorithm Choice as Modifiers of Downslope Flow Computed from Digital Elevation Models. Cartography and Geographic Information Science 34. 3, pp. 215-230. Daratech Inc. (2009, August 20). Press Release — ‘GIS/Geospatial Industry Worldwide Growth Slows to 1% in 2009.” Directions Magazine. Retrieved February 16, 2010, from Directions Magazine: http://www.directionsmag.com/press.releases/?duty=Show&id=3 63 1 8. 173 Deng, Y.; Wilson, J .; Bauer, B. (2007). DEM resolution dependencies of terrain attributes across a landscape. International Journal of Geographical Information Science 21.2, pp. 187-213. Desmet, P. (1997). Effects of Interpolation Errors on the Analysis of DEMs. Earth Surface Processes and Landforms 22, pp. 563-5 80. Digital Cartographic Data Standards Task Force. (1998). A proposed standard for digital cartographic data. The American Cartographer 15.1, pp. 129 — 136. Endreny, T.A.; Wood, ER (2001). Representing elevation uncertainty in runoff modeling and flowpath mapping. Hydrological Processes 15, pp. 2223-2236. Erskine, R.H.; Green, T.R.; Ramiriez, J.A.; MacDonald, L.H. (2007). Digital Elevation Accuracy and Grid Cell Size: Effects on Estimated Terrain Attributes. Soil Science Society of America Journal 71. 4, pp. 1371-1380. Faintich, M. (1996). Digital Elevation Models. Cellular Business September, pp. 46-5 8. Fisher, P. (1993). Algorithm and implementation uncertainty in viewshed analysis. International Journal of Geographical Information Science 7. 4, pp. 331-347. Fisher, P. (1998). Improved Modeling of Elevation Error with Geostatistics. Geolnformatica 2.3, pp. 215-233. Fisher, P.; Tate, N. (2006). Causes and consequences of error in digital elevation models. Progress in Physical Geography 30. 4, pp. 467-489. Florinsky, I. (1998). Accuracy of local topographic variables derived from digital elevation models. International Journal of Geographical Information Science 12.], pp. 47-61. Freeman, T. (1991). Calculating Catchment Area With Divergent Flow Based on a Regular Grid. Computers & Geosciences 1 7.3, pp. 413-422. Garbrecht, J ., Starks, P. (1995). Note on the Use of USGS Level 1 7.5-Minute DEM Coverages for Landscape Drainage Analyses. Photogrammetric Engineering & Remote Sensing 61.5, pp. 519-522. ' Goodchild, M.; Guoqing, S.; Shiren, Y. (1992). Development and test of an error model for categorical data. International Journal of Geographic Information Science 6.2, pp. 87-104. Greenlee, D. (1987). Raster and Vector Processing for Scanned Linework. Photogrammetric Engineering and Remote Sensing 53.10, pp. 1383-1387. 174 Heuvelink, G. (1998). Error Propagation in Environmental Modelling. Bristol, PA: Taylor and Francis, Inc. Holmes, K.W.; Chadwick, O.A.; Kyriakidis, PC. (2000). Error in a USGS 30-meter digital elevation model and its impact on terrain modeling. Journal of Hydrology 233, pp. 154-173. Homer, C.; Dewitz, J .; Fry, J .; Coan, M.; Hossain, N.; Larson, C.; Herold, N.; McKerrow, A.; Van Driel, J .; Wickham, J. (2007). Completion of the 2001 National Land Cover Database for the Cotenninous United States. Photogrammetric Engineering and Remote Sensing April, pp. 829-840. Hutchinson, M. (1989). A New Procedure for Gridding Elevation and Stream Line Data with Automatic Removal of Spurious Pits. Journal of Hydrology 106, pp. 211-232. James, L.; Watson, D.; Hansen, W. (2007). Using LiDAR data to map gullies and headwater streams under forest canopy. CA TENA 71.], pp. 132-144. Jenson, S.; Domingue, J. (1988). Extracting Topographic Structure from Digital Elevation Data for Geographic Information System Analysis. Photogrammetric Engineering and Remote Sensing 54.11, pp. 1593-1600. Jones, K.H. (1998). A Comparison of Algorithms used to Compute Hill Slope as a Property of the DEM. Computers & Geosciences 24.4, pp. 315-323. Kienzle, S. (2004). The Effect of DEM Raster Resolution on First Order, Second Order and Compound Terrain Derivatives. Transactions in GIS 8.], pp. 83-111. Kyriakidis, P.C.; Shortridge, A.M.; Goodchild, M.F. (1999). Geostatistics for conflation and accuracy assessment of digital elevation models. International Journal of Geographical Information Science 13.7, pp. 677-707. Leckie, D.; Cloney, E.; Jay, C. (2005). Automated mapping of stream features with high- resolution multispectrai imagery: An example of the capabilities. Photogrammetric Engineering and Remote Sensing 71.2, pp. 145-155. MacMillan, R.; Martin, T.; Earle, T.; McNabb, D. (2003). Automated analysis and classification of landforms using high-resolution digital elevation data: applications and issues. Canadian Journal of Remote Sensing 29.5, pp. 592-606. Martinoni, D., Bernhard, L. (1998). A conceptual framework for reliable digital terrain modelling. In: Proceedings of the Eighth Symposium on Spatial Data Handling, Vancouver, Canada, pp. 737-750. 175 Mason, D.; Scott, T.; Wang, H. (2006). Extraction of tidal channel networks from airborne scanning laser altimetry. ISPRS Journal of Photogrammetry and Remote Sensing 61.2, pp. 67-83. Meyer, TH. (2004). The Discontinuous Nature of Kriging Interpolation for Digital Terrain Modeling. Cartography and Geographic Information Science 31. 4, pp. 209-216. Mitasova, H.; Hofierka, J .; Zlocha, M.; Iverson, L. (1995). Modeling Topographic Potential for Erosion and Deposition Using GIS. International Journal of GIS January, pp. 1-19. Mouton, A. (2005). Generating Stream Maps Using LiDAR Derived Digital Elevation Models and 10-m USGS DEM. Thesis. Washington State University. O’Callaghan, J .F., Mark, D.M. (1984). The Extraction of Drainage Networks From Digital Elevation Data. Computer Vision, Graphics and Image Processing 28, pp. 328-344. Ohio Office of Information Technology, GIS Support Center. (2005). Ohio IO-meter Digital Elevation Model (F GDC) / oh_dem10b (ISO). Retrieved July 17, 2008, from http://metadataexplorer.gis.state.oh.us/metadataexplorer/firll_metadata. j sp?docId= {DO820264-6E96-4E1A-86OF-F89F42C3 9F 64}&loggedln=false. Oksanen, J .; Tapani, S. (2005). Error propagation of DEM-based surface derivatives. Computers and Geosciences 31, pp. 1015-1027. Ouyang, D.; Bartholic, J.; Selegean, J. (2005). Assessing Sediment Loading from Agricultural Croplands in the Great Lakes Basin. Journal of American Science 1.2, pp. 14-21. Quinn, P.; Beven, K.; Chevallier, P.; Planchon, O. (1991). The Prediction of Hillslope Flowpaths for Distributed Hydrological Modelling Using Digital Terrain Models. Hydrological Processes 5, pp. 59-80. Raaflaub, L.D.; Collins, M.J. (2006). The effect of error in gridded digital elevation models on the estimation of topographic parameters. Environmental Modelling and Software 21, pp. 710-732. Riggs, P.D.; Dean, DJ. (2007). An Investigation into the Causes of Errors and Inconsistencies in Predicted Viewsheds. Transactions in GIS 11.2, pp. 175- 196. 176 Saunders, W. (1999). Preparations of DEMS for use in Environmental Modeling Analysis. 1999 ESRI User Conference. San Diego, CA, July 24—30, 1999. http:l/proceedings.esri.com/library/userconf/proc99/proceed/papers/pap802/p802. Htrn. Schmidt, F.; Persson, A. (2003). Comparison of DEM Data Capture and Topographic Wetness Indices. Precision Agriculture 4, p. 179-192. Shortridge, A. (2006). Shuttle Radar Topography Mission Elevation Data Error and Its Relationship to Land Cover. Cartography and Geographic Information Science 33.1, pp. 65-75. Skidmore, AK. (1989). A comparison of techniques for calculating gradient and aspect from a gridded digital elevation model. International Journal of Geographical Information Systems 2.4, pp. 323-334. Soille, P.; Vogt, J .; Colombo, R. (2003). Carving and adaptive drainage enforcement of grid digital elevation models.” Water Resources Research 39.12, pp. 10-1 to 10-3. Soille, P. (2004a). Morphological carving. Pattern Recognition Letters 25, pp. 543- 550. Soille, P. (2004b). Optimal removal of spurious pits in grid digital elevation models. Water Resources Research 40.12. Tarboton, D.G. (1991). On the Extraction of Channel Networks from Digital Elevation Data. Hydrological Processes 5, pp. 81-100. Tarboton, D.G. (1997). A New Method for the Determination of Flow Directions and Upslope Areas in Grid Digital Elevation Models. Water Resources Research 33.2, pp. 3 09-3 19. Thompson, J.A.; Bell, J .C.; Butler, CA. (2001). Digital elevation model resolution: effects on terrain attribute calculation and quantitative soil-landscape modeling. Geoderma I, pp. 67-89. US EPA. (2007, July 17). Analytical Tools Interface for Landscape Assessments (ATtILA). Environmental Protection Agency Website. Retrieved August 2009, from http://www.epa. gov/esd/land-sci/atti1a/index.htm. USGS. (2006, August). National Elevation Dataset. United States Geological Survey. Retrieved March 2009, fi'om http://ned.usgs.gov. USGS. (1998, January). Standards for Digital Elevation Models. Rocky Mountain Mapping Center. Retrieved March, 2009, from http://rockyweb.cr.usgs.gov/nmpstds/demstds.html. 177 USGS. (1999, September). Standards for Digital Line Graphs. Rocky Mountain Mapping Center. Retrieved March 2009, from http://rmmcweb.cr.usgs.gov/nmpstds/acrodocs/dlg-3/2d1g0999.pdf. Venteris, E.R.; Slater, B.K. (2005). A Comparison Between Contour Elevation Data Sources for DEM Creation and Soil Carbon Prediction, Coshocton, Ohio. Transactions in GIS 9.2, pp. 179-198. Vogt, J .; Colombo, R.; Bertolo, F. (2003). Deriving drainage networks and catchment boundaries: a new methodology combining digital elevation data and environmental characteristics. Geomorphology 53, pp. 281-298. Walker, J .P., Willgoose, GR. (1999). On the effect of digital elevation model accuracy on hydrology and geomorphology. Water Resources Research 53.7, pp. 2259-2268. Wilson, J .P.; Gallant, J.C. (2000). Terrain Analysis. New York: John Wiley and Sons, Inc. Wilson, J .P.; Lam, C.S.; Deng, Y. (2007) Comparison of the performance of flow- routing algorithms used in GIS-based hydrologic analysis. Hydrological Processes February, pp. 1026-1044. Wise, S. (1998). The Effect of GIS Interpolation Errors on the Use of Digital Elevation Models in Geomorphology, In: Landform Monitoring. Modellingand Analysis, Edited by S. N. Lane, K. S. Richards and J. H. Chandler, John Wiley and Sons, 300 pp. Wise, S. (2000). Assessing the quality for hydrological applications of digital elevation models derived from contours. Hydrological Processes 14, pp. 1909-1929. Wise, S. (2007). Effect of differing DEM creation methods on the results from a hydrological model. Computers & Geosciences 33, pp. 1351-1365. Wolock, D.M.; McCabe, G.J. Differences in topographic characteristics computed from 100- and lOOO-m resolution digital elevation model data. Hydrological Processes 14, pp. 987-1002. Woolpert Inc. (2007, October 11). OSIP DEM Tiles by County — GRID Format (FGDC) /N1440375 (ISO). OGRIP Website. Retrieved July 17, 2008, from http://metadataexplorer.gis.state.oh.us/metadataexplorer/full_metadata.j sp?docId= {6FABOC57-1A57-4142-BCF7-A3 8F 4D1 1 F CA3 }&loggedln=false. Wu, S.; Li, J .; Huang, G. (2005). An evaluation of grid size uncertainty in empirical soil loss modeling with digital elevation models. Environmental Modelling and Assessment 10, pp. 33-42. 178 Wu, 8.; Li, J ., Huang, G. (2007). Modeling the effects of elevation data resolution on the performance of topography-based watershed runoff simulation. Environmental Modelling & Soflware 22, pp. 1250-1260. Ziadat, RM. (2007). Effect of Contour Intervals and Grid Cell Size on the Accuracy of DBMS and Slope Derivatives. Transactions in GIS. 11.], pp. 67-81. 179 UNIVERSITY 030 63 61 ”11111111111111 11111111“ 3 12 9 3 6 Jr