A MINIMALISTIC DATA DISTRIBUTION SYSTEM TO SUPPORT UNCERTAINTY-AWARE GIS

By

Nicholas Oren Ronnei

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Geography – Master of Science

2017

ABSTRACT

A MINIMALISTIC DATA DISTRIBUTION SYSTEM TO SUPPORT UNCERTAINTY-AWARE GIS

By Nicholas Oren Ronnei

Error and uncertainty are inherent in all digital elevation models (DEMs) – representations of the Earth's terrain. It is essential to account for this uncertainty in any GIS operation that relies on these data, because uncertainty propagates through any derived products. This can have serious consequences, such as the potential invalidation of model results. Geostatistical methods like conditional stochastic simulation have been developed to mitigate this problem, but they require expert knowledge to apply to a project. Although uncertainty propagation has been discussed in the geographic literature for nearly three decades, there has been very little progress in making such analysis accessible to those who are not geostatistics experts — the majority of GIS users. This research uses open source software to build a system that makes the results of complex error models accessible to researchers worldwide without the need for expert knowledge. Then, I use this system to acquire data and perform a basic analysis, demonstrating how the average researcher might incorporate uncertainty propagation in their own work. In doing so, I hope to elucidate the ways in which conditional stochastic simulation changes the traditional spatial data model and set an example for others to follow.

Copyright by NICHOLAS OREN RONNEI 2017

For the people and puppies who have supported me on this trying and rewarding journey – I would not have made it without you.

ACKNOWLEDGEMENTS

I would like to thank my committee for their tremendous patience with me throughout the development of this manuscript. I would also like to thank Dr. Ashton Shortridge specifically for his support and guidance during my time at Michigan State University. Thank you to the many friends and family members who provided much needed moral support. Thank you as well to the innumerable discussion board contributors who helped me find sources and solve tough technical problems by sharing their expertise in a public forum. Finally, I would like to thank the administrative staff of the Department of Geography for keeping me on track through my absent-mindedness. This work was funded by the National Geospatial-Intelligence Agency, for which I am truly grateful.
TABLE OF CONTENTS

LIST OF FIGURES
KEY TO ABBREVIATIONS
1 Introduction
  1.1 The Problem
  1.2 The Solution
  1.3 Research Objectives
2 Literature Review
  2.1 Error and Uncertainty
    2.1.1 The Spatial Structure of Error
    2.1.2 Error Propagation
    2.1.3 Mitigating Error Impacts with Geostatistics
  2.2 GIS and Uncertainty
    2.2.1 Spatial Data Quality
    2.2.2 A Brief History of Error-aware GIS
    2.2.3 Uncertainty and GIS in Distributed Environments
  2.3 GIS in Distributed Environments
    2.3.1 Early Examples of Distributed GIS
    2.3.2 The Importance of Web Standards
    2.3.3 Web Standards and Geographic Information
    2.3.4 The Limitations of Web Standards
    2.3.5 The Importance of Systems Architecture
    2.3.6 Service Oriented Architectures
    2.3.7 Resource Oriented Architectures
    2.3.8 Modern Examples of Distributed GIS
3 Building a New Web-based Distribution System
  3.1 Design Philosophy
    3.1.1 Make it Open
    3.1.2 Make it Easy
    3.1.3 Make it Magic
    3.1.4 Standards Compliance
  3.2 Server Architecture
    3.2.1 Choosing the Right Architecture
    3.2.2 Node.js and Express
    3.2.3 PostGIS and R
  3.3 User Interface
    3.3.1 Front End Framework
    3.3.2 Web Mapping Library
4 From Planning to Practice
  4.1 The Back End
    4.1.1 PostGIS Lessons Learned
    4.1.2 Redesigning the Architecture
  4.2 The Front End
    4.2.1 User Interface
    4.2.2 User Experience
  4.3 Connecting the Two
    4.3.1 Feeding User Input to R
    4.3.2 Security concerns
  4.4 System Performance
5 Using the System
  5.1 Monte Carlo Viewshed Analysis
    5.1.1 Changing the Data Model
    5.1.2 An Example Analysis with R and GRASS
6 Conclusions
  6.1 Limitations
  6.2 Future Directions
REFERENCES

LIST OF FIGURES

Figure 1. A basic diagram of the simple kriging process. Phase A shows the sample data in red (boreholes from the example above). Phase B shows the regular grid to which values are interpolated in black. Phase C shows the calculation for a single point (green) in the grid. Phase D shows the final interpolated surface and its associated contours. The smooth surface generated is typical of interpolation kriging.
Figure 2. The standard method for calculating a viewshed.
Figure 3. A single run of the Monte Carlo Simulation viewshed model.
Figure 4. The entire Monte Carlo Simulation viewshed model.
Figure 5. Duckham's (2002) conceptual model of the error-sensitive GIS developer's perspective.
Figure 6. A timeline of some of the major contributions to the concept of error-aware GIS.
Figure 7. A family tree of Service and Resource Oriented Architectures and their constituent systems as discussed in the preceding paragraphs of this section.
Figure 8. A conceptual diagram of our system's architecture. The database both stores the data and performs the analysis using PL/R. After processing, the server sends the user a download link by email.
Figure 9. Left: Calculating slope on a tiled DEM – notice the dark colored lines laid across the image in a perfect grid. Right: Calculating slope with ST_Union to avoid edge effects.
Figure 10. The Data panel of the user interface allows users to submit data requests to our system.
Figure 11. This image shows the user interface with a color blindness filter applied and form validation feedback showing.
Figure 12. This image shows the user interface with a color blindness filter applied and form validation feedback showing.
Figure 13. Run times in hours across all 50 jobs for 720 arcsecond patches.
Figure 14. Run times in hours across all 50 jobs for 540 arcsecond patches.
Figure 15. Run times in hours across all 50 jobs for 360 arcsecond patches.
Figure 16. Average time to complete a job by size of requested area.
Figure 17. A comparison between the final result of the Monte Carlo Viewshed Analysis script and one of its intermediate outputs.

KEY TO ABBREVIATIONS

AESC American Engineering Standards Committee
AJAX Asynchronous JavaScript and XML
API Application Programming Interface
ANSI American National Standards Institute
AML ARC/INFO Macro Language
ASTER Advanced Spaceborne Thermal Emission and Reflection Radiometer
AWARE Available Water Resource in Mountain Environments
CRUD Create, Read, Update, Delete
CSS Conditional Stochastic Simulation
DEM Digital Elevation Model
DDoS Distributed Denial of Service
DUE Data Uncertainty Engine
ESRI Environmental Systems Research Institute
FEMA Federal Emergency Management Agency
FGDC Federal Geographic Data Committee
FOSS Free and Open Source Software
FTP File Transfer Protocol
GDAL Geospatial Data Abstraction Library
GDEM Global Digital Elevation Model
GeOnAS Geographic Online Analysis System
GEO Group on Earth Observation
GEOSS Global Earth Observation System of Systems
GIS Geographic Information Systems
GiST Generalized Search Tree
GNU GNU's Not Unix
GPL General Public License
GPS Global Positioning System
GRASS Geographic Resources Analysis Support System
GWASS GRASS Web Application Software System
GUI Graphical User Interface
HTML Hypertext Markup Language
HTTP Hypertext Transfer Protocol
INSPIRE Infrastructure for Spatial Information in Europe
IP Internet Protocol Address
ISO International Organization for Standardization
IT Information Technology
JPL Jet Propulsion Laboratory
JSON JavaScript Object Notation
JSON-RPC JavaScript Object Notation Remote Procedure Call
LAMP Linux, Apache, MySQL, and PHP
LAN Local Area Network
LP DAAC Land Processes Distributed Active Archive Center
MATCH Multidisciplinary Assessment of Technology Centre for Healthcare
MCA Monte Carlo Analysis
MEAN MongoDB, Express, Angular, and Node.js
MODIS Moderate Resolution Imaging Spectroradiometer
MPGC Multiple Protocol Geospatial Client
NASA National Aeronautics and Space Administration
NGA National Geospatial-Intelligence Agency (formerly NIMA)
NIMA National Imagery and Mapping Agency
NSDI National Spatial Data Infrastructure
OGC Open Geospatial Consortium
OGIS Open Geodata Interoperability Specification
OS Operating System
OSM OpenStreetMap
PC Personal Computer
QGIS Quantum GIS
PHP PHP Hypertext Preprocessor
REST Representational State Transfer
ROA Resource Oriented Architecture
RPC Remote Procedure Call
SDI Spatial Data Infrastructure
SDSS Spatial Decision Support Systems
SOA Service Oriented Architecture
SOAP Simple Object Access Protocol
SPA Single Page Application
SQL Structured Query Language
SRTM Shuttle Radar Topography Mission
TRI Topographic Roughness Index
UI User Interface
UMN University of Minnesota
URI Uniform Resource Identifier
URL Uniform Resource Locator
UX User Experience
W3C World Wide Web Consortium
WCS Web Coverage Service
WFS Web Feature Service
WMS Web Map Service
WMTS Web Map Tile Service
WIRM Web-based Interactive River Model
WPS Web Processing Service
WSDL Web Service Description Language
XML Extensible Markup Language

1 Introduction

In the information age, scientists are rarely short of data.
The incredible speed with which sensor technologies continue to grow, evolve, and miniaturize suggests that this trend will continue. Indeed, in the geosciences, we know more about our planet than we ever have. This data revolution occurred due to improved data collection techniques generally, and satellite data acquisition in particular. Since the dawn of flight, aerial imagery has played a critical role in improving our knowledge of our world. Now, with so many satellite constellations orbiting Earth and providing a constant flow of data in nearly real time, it is easy to think that we have answered all the data-related questions of remote sensing. Thanks to MODIS, we can assess the health of the forest canopy in a small patch of Madagascar that has never been surveyed. We have elevation data coverage for 99% of the Earth's land mass thanks to GDEM (https://asterweb.jpl.nasa.gov/GDEM.ASP). As a researcher, it can feel as though one has all the pieces and need only put them together to find the answer to a difficult problem.

But is this really the case? Of course not! Even though the sheer quantity of data in the public domain and the massive areas for which those data are available can create that perception, these data are attempts to model the natural world. By definition, models are simplified versions of the complex phenomena they are intended to represent. Therefore, there will always be some disagreement between the model and the reality. While researchers no longer have to worry about a lack of the essential data that sensor suites like MODIS, ASTER, and Landsat provide, they do have to carefully consider the quality of those data. Remotely sensed datasets can appear infallible to the end user. They cover the globe and can provide a discrete "measurement" of a phenomenon anywhere in their coverage zones, be it temperature, land classification, or soil moisture. However, those experienced in remote sensing know that those datasets contain a great deal of uncertainty and error.

This work focuses on the error and uncertainty inherent to elevation data. It discusses the existence of uncertainty in the data, the characteristics of that uncertainty, and its impact on the researcher. It briefly covers the ways in which such uncertainty has been handled in the past, and advances a new method of dealing with it in the future. Finally, it provides concrete examples for other researchers to follow so that they can avoid the negative impacts of uncertainty on their work.

1.1 The Problem

Elevation data is a fundamental requirement for all sorts of geospatial research. From modeling precipitation and climate to calculating viewsheds, the applications of digital elevation models (DEMs) are ubiquitous. Some of these applications have very significant impacts on the lives of everyday people, such as FEMA Floodplain Mapping in the United States. These maps designate how much people must pay for flood insurance and even delineate where people can and cannot be covered for flood insurance based on risk of inundation. But what if these maps were wrong altogether? People might be paying more than they should for flood insurance. Worse, their homes may be in danger of frequent flooding and they would not even know it. When uncertainty and error exist in the input data, they propagate through any operation on the data and remain embedded in the result (Heuvelink, 1999).
Error and uncertainty do exist in DEMs, as they exist in all geographic data, and their propagation can lead to a variety of negative outcomes like that of the floodplain example above (Fisher & Tate, 2006). No matter how careful researchers are, if their underlying assumptions are incorrect then their results will be flawed.

But what are these errors, this uncertainty? Succinctly, "Our lack of knowledge about the reliability of a measurement in its representation of the truth is referred to as uncertainty," whereas, "Error is defined as the departure of a measurement from its true value" (Wechsler, 1999). So, uncertainty in the context of DEMs refers to our lack of knowledge about the error in the dataset. Global datasets like ASTER GDEM promise 1 arcsecond horizontal resolution (approximately 30m) and 1m vertical precision with an overall accuracy of approximately 25m. So while GDEM can detect an elevation change as small as 1 meter, a vertical measurement could be 25 meters higher or lower than reported and occur 30 meters in any direction from where it is reported (Toutin & Cheng, 1999). In reality, the uncertainty is greater than reported. GDEM has a practical horizontal resolution of ~72m (Tachikawa et al., 2011). Furthermore, error in vertical measurements varies widely depending on landscape characteristics (Tachikawa, Kaku, & Iwasaki, 2011).

Fortunately, extensive research has addressed the problem of error and uncertainty in DEMs more broadly and in GDEM in particular. The research proves the existence of error in GDEM (Miliaresis & Paraschou, 2011; Gesch et al., 2012). It proves that the error in DEMs is non-random (Oksanen & Sarjakoski, 2006; Erdoğan, 2010). Researchers have even discovered effective ways to model and predict error in DEMs (Kyriakidis, Shortridge, & Goodchild, 1999; Fisher, 1998). Specifically, conditional stochastic simulation (CSS) has proven immensely useful when coupled with Monte Carlo analysis (MCA) for ameliorating the impact of error in DEMs (Aerts, Goodchild, & Heuvelink, 2003; Castrignanò et al., 2006). There remains only one real problem regarding uncertainty that has not been thoroughly researched: how best to put information about error and uncertainty in the hands of the average GIS user.

That is problematic because, whether the user is aware of it or not, the uncertainty and error in a dataset become embedded in the result of any GIS operation performed on that dataset or any derivative of it. This can lead to serious problems if, for example, those operations involve mapping a flood plain. While the analysis might be correct, the uncertainty in the data obscures the truth and may cause a person to make poor decisions about where to build their new home. There are ways to handle uncertainty in geographic data that help minimize its impact, such as conditional stochastic simulation (explained in later sections). Unfortunately, today's GIS user must be a geostatistical expert to apply conditional stochastic simulation to a given project. This effectively prevents the vast majority of GIS users from considering the impact of error and uncertainty in their research any further than writing a couple of sentences about it in the Limitations section of their papers. The major challenge facing these users is the task of developing the error model for their data. This research seeks to change that by bringing error modeling to them.

1.2 The Solution

This work is part of a larger project sponsored by the National Geospatial-Intelligence Agency (NGA) and led by Dr.
Ashton Shortridge and Dr. Joseph Messina at Michigan State University. The first stage of the project involved substantial research on modeling error and uncertainty in SRTM v4.1 and ASTER GDEM v2, the primary datasets on which this paper will focus. The second stage of the project – addressed in this thesis – involves distributing the results of these models so that researchers around the globe can take advantage of their work. The major difficulty in accounting for uncertainty in SRTM and GDEM with CSS is the development of the original error model. The second most difficult step is developing code to efficiently generate error surfaces from that error model. Removing those two steps from the chain of operations effectively removes the barrier to entry for the average GIS user, as MCA with pre-generated datasets is conceptually straightforward.

1.3 Research Objectives

The objectives of this research are as follows:

1. Synthesize the technical approaches of web-based spatial data distribution systems.
2. Create a system to distribute DEM error realizations on the Internet.
3. Create an example analysis to demonstrate how researchers might incorporate stochastic simulation in their own work.
4. Examine how the stochastic paradigm influences the traditional geographic data model.

Objective 1 is a precondition for achieving Objective 2. Without this foundation, it would be nearly impossible to build an effective system. Objective 2 will ensure that researchers around the globe can employ complex error modeling in their work without the need to be experts on the subject. However, simply accessing the results of error models will not be enough – researchers need to know what to do with these data. Objective 3 will provide examples of how to use the stochastic paradigm on three common and simple GIS operations: slope, aspect, and viewshed analysis. Objective 3 is also crucial to accomplishing stage two of the larger NGA project because education is just as important as access. Objective 4 addresses the fundamental changes the stochastic paradigm makes to the definition of geographic data. Goodchild, Shortridge, and Fohl discussed this issue as early as 1999. Rather than including measures of uncertainty in the metadata, our system effectively includes uncertainty information within the data themselves. Reconceptualizing the data model represents one of the first big steps towards the creation of the "error-aware GIS" alluded to throughout the literature (Duckham, 2002; Fisher & Tate, 2006).

We begin by exploring the literature surrounding the existence and characteristics of error and uncertainty in DEMs and the problems they cause. Then, we discuss methods for analyzing and ameliorating the impacts of error and uncertainty on GIS operations. Next, we turn to the literature surrounding web-based geographic data distribution systems: their origins, their construction, and modern examples. After that, we discuss the design philosophy and plan for building and implementing our own system, followed by an examination of why the original plan failed and how we adapted to the challenges we faced. Then, we provide an example of how to use the data distribution system we created over the course of our research. Finally, we conclude with a summary of what we accomplished and what remains to be done.

2 Literature Review

2.1 Error and Uncertainty

Error exists in all geographic datasets.
DEMs are no exception, regardless of what technologies are used to collect the data (Zandbergen, 2010; Oksanen & Sarjakoski, 2006; Holmes, Chadwick, & Kyriakidis, 2000). Fisher and Tate (2006) present four varieties of error in geographic data: error with bias, systematic error, random error, and spatially autocorrelated error. Many researchers have studied the accuracy of SRTM and GDEM using other DEMs as references (Bolten & Waldhoff, 2010; Rexer & Hirt, 2014). Others have done the same using highly accurate laser altimeters (Hengl & Reuter, 2011; Zhao, Xue, & Ling, 2010; Ni, Sun, & Ranson, 2015) or GPS benchmarking (Li et al., 2013). Together, these studies reveal that error in DEMs generally, and in SRTM and GDEM in particular, is spatially autocorrelated – the closer two positions are, the more likely they are to have similar levels of error.

2.1.1 The Spatial Structure of Error

While ubiquitous, error and uncertainty have characteristics that vary from dataset to dataset. If researchers can find the characteristics shared by spatially autocorrelated clusters of error, then they can model error based on the presence of those characteristics (Carlisle, 2005). Error in SRTM varies spatially depending on land cover, aspect, and slope (Shortridge & Messina, 2011). Error in GDEM is also affected by the same factors (Jing et al., 2014). Simply put, areas with forest cover, mountainous regions, and rugged areas whose slopes face a certain way are more error prone than areas which do not have those characteristics. In addition to those commonalities, GDEM uncertainty is heavily affected by the number of "scenes" (individual satellite images) used in the derivation of a particular value, known as its "stack number" (Miliaresis & Paraschou, 2011).

2.1.2 Error Propagation

All data have inherent error and uncertainty. When a GIS user performs an operation on a dataset, the uncertainty from the input dataset will be present in the output as well. This process is known as error propagation, and it is problematic because "the output may not be sufficiently reliable for correct conclusions to be drawn from it" (Heuvelink, 1999). Understanding how error and uncertainty propagate can help reduce their impact on the output. CSS is a statistical technique which researchers use to do just that. Petroleum exploration is one field which employs CSS quite extensively, and in a way very similar to how it is used in this research (Hohn, 2013).

To take an example from the oil and gas industry, imagine an oil reservoir deep underground. The goal is to make the most accurate map possible of that reservoir given the limited data available. There are many points at which field workers have collected bore samples, but because the boundaries of geological features are not quite as discrete as we typically imagine, we can only find them by taking measurements of continuous data (in this case, the concentration of oil). Naturally, nearby boreholes are likely to have similar sample values. Kriging, a linear estimation technique used by the mining industry since at least the 1960s, takes advantage of this (Cressie, 1990; Hohn, 2013). There are several types of kriging, but just two main reasons to use them: interpolation and simulation. In the case of the former, the modeler is simply trying to get a smooth, reliable prediction surface. The mechanics of interpolation kriging work like inverse distance weighting in that sample points close to a grid point have more weight in the prediction than those
farther away. Predictions in simple kriging (the only type discussed in this section) tend toward the global mean of the sample values, ensuring a nice, smooth surface. While this is great for making prediction maps, it totally ignores the frequently noisy structure of empirical data. It is like using the average of all historical maximum temperatures for a given day to estimate its high temperature: while it is not a bad guess, the actual value is likely to vary from the estimate. For a brief history and a good explanation of the origins and development of kriging for interpolation, see Cressie (1990).

Figure 1. A basic diagram of the simple kriging process. Phase A shows the sample data in red (boreholes from the example above). Phase B shows the regular grid to which values are interpolated in black. Phase C shows the calculation for a single point (green) in the grid. Phase D shows the final interpolated surface and its associated contours. The smooth surface generated is typical of interpolation kriging.

In the second case, and the one relevant to this project, the modeler does not predict grid values using the actual sample values, but values drawn from a normal distribution. This process is known as stochastic simulation, and it produces a realization of a random field per the Gauss-Markov theory. The modeler can go a step further and "condition" the distribution to make it reflect the sample data. In this case, known as conditional Gaussian simulation (also referred to by the broader term conditional stochastic simulation over the course of this paper), the sample values are converted to Z-scores, and these are used to predict Z-scores for each grid point. Then, a value for the grid cell is drawn from a normal distribution based on that Z-score. See Figure 1 for a graphical demonstration of the kriging process.

Let us revisit our imaginary oil reservoir to see the practical implication of CSS. By using simulation rather than interpolation, our survey team produces one possible realization of a conditional random field. Put another way, it is one equally probable representation of reality based on the characteristics of the available data. If we perform the simulation process a large number of times, we can create a probability surface that reveals how likely it is to find oil at a particular place, rather than where oil is. The former is preferable to the latter, because it captures the impact of uncertainty on the results of our oil discovery model. This is possible because the realizations have the same properties as the simulation model, preserving the texture of the data. The interpolation surface, on the other hand, is smooth and oversimplifies the complexity of reality. For geospatial analyses which depend on the texture of the landscape, such as slope, this is a crucial benefit.
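To make the distinction between interpolation and simulation concrete, the sketch below uses R with the gstat package – one of several tools that implement these methods. The package choice, the variogram parameters, and the sample values are illustrative assumptions, not the error model developed for this project; the same krige() call performs simple kriging when asked for a single prediction and conditional Gaussian simulation when asked for multiple realizations.

```r
library(sp)
library(gstat)

set.seed(42)
# Hypothetical "borehole" samples; locations and values are invented for illustration.
samples <- data.frame(x = runif(50, 0, 1000),
                      y = runif(50, 0, 1000),
                      z = rnorm(50, mean = 250, sd = 10))
coordinates(samples) <- ~ x + y

# Regular grid to which values are interpolated or simulated (Phase B in Figure 1).
grid <- expand.grid(x = seq(0, 1000, by = 25), y = seq(0, 1000, by = 25))
coordinates(grid) <- ~ x + y
gridded(grid) <- TRUE

# An assumed variogram model describing the spatial autocorrelation of the samples;
# in practice this would be fit to the data with variogram() and fit.variogram().
vgm_model <- vgm(psill = 80, model = "Exp", range = 300, nugget = 5)

# Interpolation: simple kriging (known global mean) yields one smooth surface.
smooth <- krige(z ~ 1, samples, grid, model = vgm_model, beta = mean(samples$z))

# Conditional Gaussian simulation: each of the nsim layers is one equally probable
# realization that honours the samples but preserves realistic texture.
realizations <- krige(z ~ 1, samples, grid, model = vgm_model,
                      beta = mean(samples$z), nsim = 8, nmax = 30)
spplot(realizations)  # compare the noisy realizations with the smooth kriged surface
```

Summarizing the realizations cell by cell – for example, the proportion of runs in which some threshold is exceeded – yields the kind of probability surface described above.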
2.1.3 Mitigating Error Impacts with Geostatistics

Many methods have been used to model uncertainty and error in DEMs (Erdoğan, 2010; Cuartero et al., 2014). The most thoroughly researched method for modeling and understanding error and uncertainty is conditional stochastic simulation. In their 2006 paper, Wechsler and Kroll discuss how CSS can be used "for evaluation of the effects of uncertainty on elevation and derived topographic parameters." Indeed, Heuvelink (1999) has long used CSS to model uncertainty in geographic data, including DEMs. CSS is often coupled with another technique called Monte Carlo simulation to study uncertainty propagation. Uncertainty and error propagate in all geographic operations (de Moel & Aerts, 2011; Heuvelink, 1999; Arbia, Griffith, & Haining, 1998), but CSS and MCS can be used to measure and mitigate their impact.

Combining CSS and MCS to study uncertainty propagation is by no means a technique unique to elevation data, nor even to the field of GIScience. The literatures of spatial decision support systems, geostatistics, and geophysics (to name just a few) contain many examples of studies on the impact of uncertainty on model outputs. More relevant to this work, Fisher (1998) and others such as Carlisle (2005) and Castrignanò et al. (2006) have shown that CSS and MCS greatly improve the quality of elevation error modeling, thus enhancing the accuracy of outputs. While these effects can be measured and accounted for, few people do so because they lack the experience necessary to apply CSS to their own work. The researchers mentioned above have done work with CSS and MCS, but they are domain experts. Furthermore, they are not necessarily interested in the data or the output of the model they are working with; they are interested in measuring uncertainty propagation. While this is scientifically worthy in its own right, for these techniques to be meaningful to the geosciences community they must have a practical application. Experts also need to be able to communicate effectively about uncertainty in data to end users who are not used to working with it – a challenge no one to date has been able to overcome sufficiently, as discussed in Section 2.2.1.

The process of repeating CSS to create a probability surface described in the previous subsection is known as Monte Carlo simulation. To better understand it, let us drop our oil reservoir example and consider a simple, widely known GIS operation: the viewshed calculation. In a standard viewshed calculation, our inputs are a digital elevation model, a point of origin, and other parameters depending on the specific implementation of the algorithm, such as observer height and atmospheric refraction. In a Monte Carlo simulation, the model looks very similar. In fact, one will notice that the only difference between the standard (Figure 2) and the MCS (Figure 3) versions of the model is that the latter accepts an error realization (one version of reality) instead of a DEM for its elevation input parameter. This is, however, a little deceptive, because Figure 3 only represents a single run of the full MCS model. The complete model contains numerous runs, as one can see from Figure 4. Each realization is, in theory, equally likely to represent reality, so it makes almost intuitive sense that more runs produce a better result. The more possibilities one considers, the clearer the picture of reality becomes. If a particular grid cell falls within the viewshed in over 90% of simulations, we can be quite certain that it would really be in the viewshed if we stood at our selected point of origin and looked for it.

This is, of course, a very simple example. Bearing in mind the practical and computational challenges involved in creating robust, effective models for generating error realizations, it is easy to see why most spatial research has been so slow to include them. That said, there are a variety of example applications that, while still conducted by domain experts, represent quintessential real-world use cases.
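The sketch below outlines how the full Monte Carlo loop of Figure 4 might look in R driving GRASS GIS, the toolchain used for the example analysis later in this thesis (Section 5.1.2). It assumes an already initialised GRASS session and a set of pre-generated error realizations imported as rasters named realization_1 through realization_50; the map names, viewpoint coordinates, and number of runs are placeholders rather than values from the actual study.

```r
library(rgrass7)  # assumes initGRASS() has already set up a GRASS location with the data

n_runs   <- 50
observer <- c(637500, 4765200)   # hypothetical viewpoint (easting, northing)

for (i in seq_len(n_runs)) {
  # One viewshed per error realization; the -b flag returns a boolean visible / not visible raster
  execGRASS("r.viewshed",
            input              = sprintf("realization_%d", i),
            output             = sprintf("viewshed_%d", i),
            coordinates        = observer,
            observer_elevation = 1.75,
            flags              = c("b", "overwrite"))
}

# Cell-by-cell average of the 50 boolean viewsheds: the probability that a cell is visible
execGRASS("r.series",
          input  = paste(sprintf("viewshed_%d", seq_len(n_runs)), collapse = ","),
          output = "viewshed_probability",
          method = "average",
          flags  = "overwrite")
```

Cells with values near 1 are visible in nearly every realization, while intermediate values flag locations whose visibility is genuinely uncertain given the error in the elevation data.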
Figure 2. The standard method for calculating a viewshed.

Figure 3. A single run of the Monte Carlo Simulation viewshed model.

Figure 4. The entire Monte Carlo Simulation viewshed model.

The CSS/MCS technique discussed above offers a way for users to understand how error and uncertainty propagate and to account for them in their analyses. One such example comes from Oksanen and Sarjakoski (2005), who detail the effects of error propagation in drainage basin delineation. Another example of using CSS to quantify error can be found in the work of Hengl, Heuvelink, and Van Loon (2010), who use CSS to generate stream networks from elevation data. One of the more influential examples of CSS/MCS and its benefits comes from Aerts, Goodchild, and Heuvelink (2003). Their case study provided ski resort owners with far more accurate estimates of cost and schedule to build a new ski run than traditional route optimization, potentially saving them millions of dollars in unforeseen costs. The author highly recommends this paper to the curious reader, as it offers a clear explanation of the processes involved and is relatively accessible even to those new to CSS/MCS.

2.2 GIS and Uncertainty

2.2.1 Spatial Data Quality

When beginning a new project, the quality of the data used should be of the utmost concern to the researcher. But what defines the quality of a spatial dataset? There are two core communities concerned with error in geographic data: spatial data quality and accuracy. The former focuses mostly on standards, while the latter is concerned with positional and attribute accuracy. There is some consensus within the discipline that this distinction is arbitrary, and that efforts should be made to draw the two research groups back together, as both of them are fundamentally concerned with the same thing – data quality (Devillers et al., 2010).

Questions of data quality arose long before GIS, but are particularly relevant since its rise to popularity. Questions of accuracy arise from the simple fact that GIS models the real world, and models are always imperfect: "These forcible deviations between a representation and actual circumstances constitute error" (Chrisman, 1991). The accuracy community's research has focused largely on the quantification of error and uncertainty (examples include Heuvelink, 1999; Goodchild, 1993; Heuvelink & Burrough, 1993; Kyriakidis, Shortridge, & Goodchild, 1999; Goodchild, Shortridge, & Fohl, 1999; Wechsler, 1999). The spatial data quality community concerns itself with how to communicate those imperfections in a way that is meaningful. Regardless of their differences and despite many advances in both areas, "a large body of scientific knowledge is still only in the hands of researchers and embedded in scientific publications" (Devillers et al., 2010).

The boon that GIS offers to researchers, planners, and analysts of all kinds is hard to overstate. Members of those communities have been able to accomplish things previously thought impossible, and in no time at all. It is easy to get caught up in the smooth flow of things and forget to ask hard questions. "Errors and uncertainties in data can lead to serious problems, not only in the form of inaccurate results but also in the consequences of decisions made on the basis of peer data," leading to repercussions ranging from getting directions to the wrong place to legal action (Goodchild, 1993). By paying attention to spatial data quality, GIS users can avoid those problems. Accuracy of the data is an important characteristic, but communicating information about accuracy in an actionable format is essential.
From the user's perspective, useful metadata answers one simple question: are the data fit for use in a particular application? Good metadata should help users with that "fitness for use" assessment (Chrisman, 1991; Kim, 1999; Veregin, 1999; Guptill, 1999; Devillers et al., 2002; Van Oort & Bregt, 2005). In the spatial community, as in others, there are metadata standards which require that metrics be included to help researchers answer that question. Unfortunately, metadata frequently fall short of that goal. Research shows that, contrary to popular belief, spatial data users often do attempt to assess fitness for use (Van Oort & Bregt, 2005). However, even experienced users can have trouble doing so (Devillers et al., 2010), and others feel that the cost in time and other resources to fully assess data quality outweighs the benefits of doing so (Van Oort & Bregt, 2005). One reason commonly suggested for this failure is that standards often document uncertainty using confusing terminology and obscure methodologies (Devillers et al., 2010). Early metadata standards work from the Federal Geographic Data Committee (FGDC) and the International Hydrographic Organization was essential for reasons discussed in the Web Standards and Geographic Information section of this paper, but failed to appropriately address fitness for use issues (Kim, 1999; Veregin, 1999; Guptill, 1999; Salgé, 1999). Since then, little practical progress has been made towards easing the assessment of fitness for use (Devillers et al., 2010). However, the literature reveals a wide variety of approaches to the problem, mostly from the accuracy community, including visualization (MacEachren, Robinson, & Harper, 2005) and web-based uncertainty simulation (Bowling & Shortridge, 2010; Byrne et al., 2010; Walker & Chapra, 2014). These developments are promising and reflect the beginnings of practical applications of research regarding fitness for use (Devillers et al., 2010).

This research constitutes another contribution to those developments. By removing the complicated statistical models which describe the error inherent to GDEM and SRTM from the user's workflow, it improves the accessibility of spatial data quality information. Instead of attempting to infer the accuracy of information at a particular spot, using error realizations from the web distribution system allows the GIS user to work with data in a format and manner he or she is familiar with. The only major changes are in the number of times the user must run the analysis and the addition of a final map algebra step. After that, fitness for use can be visualized using the resulting probability surface, unlike traditional analyses. See the Using the System section of this paper for a detailed example.

2.2.2 A Brief History of Error-aware GIS

The concept of an "error-aware GIS" capable of handling spatial data quality information is not a new one (Unwin, 1995). In fact, the impact of uncertainty on GIS and its uses has long been a research priority (Heuvelink, 1999; Goodchild & Gopal, 1989). Previous research has even led to the development of a few systems which aspired to the title of "error-aware", though they have all had drawbacks that tempered their usefulness.

Logsden, Bell, and Westerlund (1996) developed an uncertainty visualization tool for land transition probabilities using C, UNIX shell scripts, and ARC/INFO Macro Language (AML).
While undeniably useful for planners, this tool falls far short of the "error-aware GIS" described in the literature. Its primary contribution to that goal was the development of a visualization tool, and it may be better described as a "probability-aware GIS". While it did employ stochastic modeling, those models were used to determine the probability of land transition based on Landsat data, and data quality was not considered. Still, the ideas behind the visualization techniques they employed serve as a useful guide for future developers.

Another, much more robust example of error-aware GIS development comes from the work of Goodchild, Shortridge, and Fohl (1999). They proposed a method for encapsulating uncertainty within a geographic dataset. This is important for two reasons. First, "Choosing a data model that artificially separates quality and spatial data entails additional conceptual and implementational structures to maintain the connections between spatial data and its quality" (Duckham, 2002). The second is the fundamental change to the geographic data model which encapsulation creates. It prevents the geographic data and at least some of their associated metadata from being separated, a point which Beard (1997) says is increasingly important in a world driven by data sharing. It also precludes data creators from slapping unhelpful, generic spatial data quality information on their dataset, because the simulation models are specific to each dataset (Duckham, 2002). The approach taken in that paper is revolutionary and doubtless a good example to follow, but the key drawback is that data creators still need to be geostatistical experts, making it difficult for the average user to create his or her own data. It would also likely increase the cost of data production for data vendors.

Duckham (2002) attempts to address some of those issues in his own work on developing an error-sensitive GIS for Kingston Communications. Duckham employs an object-oriented database design to store data and their associated metadata (data quality information included) and then employs a user interface which assists in both the collection and analysis of those data. This concept is very similar to the encapsulation work of Goodchild, Shortridge, and Fohl (1999), but improves on it by providing a user interface to work with the data and their quality information. The artificial-intelligence-assisted data collection user interface also overcomes the data production cost problems of that work (Duckham, 2002; Goodchild, Shortridge, & Fohl, 1999).

Figure 5. Duckham's (2002) conceptual model of the error-sensitive GIS developer's perspective.

Another informative example in the timeline of error-aware GIS development is the work of Aerts, Clarke, and Keuper (2003), who qualitatively tested uncertainty visualization techniques. This work is essential to the development of an error-aware GIS because one of its core functions would be to aid the user in the conceptualization and visualization of error. Indeed, all of the research discussed so far in this subsection stresses the essential nature of an intuitive user interface in the development of an error-sensitive system.

Karssenberg and De Jong (2005a, 2005b) also worked on creating an error-aware GIS by extending the PCRaster environmental modeling package. PCRaster was created and is now maintained by the University of Utrecht (Karssenberg et al., 2010).
They focused on environmental modeling using the existing modeling language built into PCRaster, but extended the framework from two- to three-dimensional models and added tools for MCS. The flavor of error propagation analysis this enabled was unusual compared to the other variations discussed in this paper because its primary focus was continuous phenomena, such as temperature or precipitation, rather than discrete features (Karssenberg & De Jong, 2005a). Because elevation is also a continuous phenomenon, PCRaster is particularly relevant to this research.

On the other hand, PCRaster had its share of problems. For one, it still requires expert knowledge of geostatistics, as the functions it employs require the user to specify the semivariogram and other parameters used in the conditional stochastic simulation and Monte Carlo analysis process. Additionally, while the rather rigid structure of the program makes its use straightforward for the user, it is not computationally optimized or particularly extensible. Finally, Karssenberg and De Jong (2005b) point out that, at the time of writing, PCRaster was not truly a GIS. That is, it was not capable of visualizing or organizing data in its own right, and relied on other software to do so. Since then, the University of Utrecht has released the software under the GNU General Public License and added visualization support – two very important conditions for making these types of analysis accessible to the general public.

In 2007, Heuvelink and Brown introduced perhaps the most fully featured attempt at an error-sensitive GIS to date: a prototype known as the Data Uncertainty Engine (DUE). Briefly put, "The Data Uncertainty Engine (DUE) is a prototype software tool for assessing uncertainties in environmental data, storing them within a database, and for generating realisations of data to include in Monte Carlo uncertainty propagation studies" (Heuvelink, 2007). The DUE incorporates the ideas of Goodchild, Shortridge, and Fohl (1999) in that it encapsulates an uncertainty model within the data model, but it does so following the same object-oriented principles which Duckham (2002) espouses. Like PCRaster, it is GNU GPL licensed and can handle continuous data (Brown & Heuvelink, 2008). Unlike PCRaster, it uses a database to organize and store the models and data. The database can also be used to apply common models to standard geographic data which has no encapsulated model, similar to Duckham's (2002) system. Additionally, DUE has the capacity to assist not only positional uncertainty analysis, but also attribute and temporal uncertainty analyses, or any combination thereof (Brown & Heuvelink, 2007; Heuvelink, 2007).

Unfortunately, like the original PCRaster, DUE cannot rightly be called a GIS and must be used alongside another program which can perform GIS operations (Heuvelink, 2007). On the surface, this may seem an odd design choice. Why build an error-aware "GIS" when it cannot perform GIS operations? By separating the GIS from the assistance that the DUE offers, the end user can use whichever software they feel comfortable with – a major user experience bonus. Additionally, it avoids platform lock-in and ensures that if the development of a particular GIS falters, the DUE can live on; this point is not trivial, as it seems Duckham's platform died with the spatial software division of one of the companies with which he collaborated (according to http://www.laser-scan.com/demo/laser-scan-history/, accessed March 01, 2017).
2.2.3 Uncertainty and GIS in Distributed Environments

Recently, uncertainty analysis, like so many other computational challenges, has moved to the cloud. The Model Web, an idea created and undertaken by the Group on Earth Observation (GEO), "is a generic concept for increasing access to models and their outputs and to facilitate greater model-model interaction, resulting in webs of interacting models, databases, and websites" (Nativi, Mazzetti, & Geller, 2013). The Model Web (Bastin et al., 2013) would combine disparate models from various disciplines to make them accessible from a single interface, allowing the user to find, access, chain together, and run models entirely on the web. The UncertWeb would do something similar; while the Model Web offers access to traditional geospatial models, UncertWeb offers uncertainty-aware models.

As Nativi, Mazzetti, and Geller (2013) point out, "The long term vision of a consultative infrastructure is clearly an ambitious goal." Unfortunately, it seems to have proven too ambitious. Work on both the Model Web and UncertWeb appears to have ended at or before the summer of 2017. For the Model Web, there is no official website to check, as the plan was to include Model Web work within the Global Earth Observation System of Systems (GEOSS) framework, but the GEOSS interface does not include Web Processing Services (WPS) – the main Open Geospatial Consortium standard on which the Model Web was to rely – as a filter option. This suggests that the work was not actually completed within the timeline mandated by the project sponsors, and was discontinued thereafter (Nativi, Mazzetti, & Geller, 2013). As of summer 2017, the official UncertWeb website returns no response (http://www.uncertweb.org, accessed July 10, 2017), despite the fact that the domain is still registered to one of the project's main developers (https://who.is/whois/uncertweb.org, accessed July 10, 2017) and that the official project page (maintained by the leading university on the grant: Aston University, Birmingham, UK) lists the project as "completed". Additionally, development on the official UncertWeb GitHub repository ceased in 2013, with only minor modifications thereafter.

Despite their early end, the work that went into these platforms offers many valuable insights on system design philosophy. I believe that the Model Web and UncertWeb designers are visionaries in much the same way Duckham (2002) and Aerts, Clarke, and Keuper (2003) were. The discoverability and interoperability principles they espoused are important to consider when developing a sustainable, useful system, even if their plans were never fully realized. Additionally, the truly immense body of literature related to the Model Web and UncertWeb touches numerous disciplines from information theory to ecology – evidence itself of the value of abstract specifications. Distributed systems literature and the number of disciplines which contribute to it will only expand in the future. This research aims to continue down the path the Model Web and UncertWeb research created while avoiding some of the problems that brought them down. To accomplish that aim, a careful assessment of implementation decisions made then and the technological advances since is required.

Figure 6. A timeline of some of the major contributions to the concept of error-aware GIS.
2.3 GIS in Distributed Environments

In order to discuss distributed GIS specifically, one must first understand what a "distributed system" actually is. According to Coleman (1999), the term distributed computing was coined by Champine, Coop, and Heinselman (1980) "to describe a situation where processing tasks and data are distributed among separate hardware components connected by a network." In simple terms, distributed computing allows for the separation of data management, analysis, and data visualization across multiple machines connected by a network such as a Local Area Network (LAN) or the Internet.

The benefits of distributed computing are numerous, but especially so in the scientific community. Sometimes referred to as cloud computing, distributed computing frees researchers from the limitations of their own personal hardware. In a distributed system, researchers may remotely control intensive analyses that require very expensive hardware, allowing the centralization of computing resources and thereby reducing costs. Distributed systems can also allow researchers to select and analyze data without having the data itself or the analysis software installed on their local machines.

2.3.1 Early Examples of Distributed GIS

Until the mid-1980s, distributed GIS followed the model of most computer systems and consisted of a mainframe or mini-mainframe central host to which users would connect via a terminal with very minimal computation power of its own. This could be considered "distributed" computing in that each user shared central resources and accessed those resources from a separate machine. The advent of the PC changed that, and by 1986 PCs were rapidly becoming the preferred machine for GIS users due to their low cost and the fact that users no longer had to share scarce computation resources. This shift away from distributed computing had its own problems, particularly because data became difficult to share and lacked the quality control implicit to a centrally hosted system. By the late 1980s, central spatial database servers entered the network, supercharging distributed spatial analysis by reintroducing an authoritative data source and providing mass storage devices which could be accessed from low-capacity personal computers (Coleman, 1999).

As network technology continued to improve in the early 1990s, networks were no longer constrained to office buildings or universities. Instead, the numerous previously isolated networks spread across the U.S. (and later, the world) were connected to form the Internet (Coleman, 1999). It became possible for organizations to share information across vast distances at incredible speeds. For GIS users, it changed the way geographic information was discovered and accessed forever. The same ideas as the intra-organizational distributed systems described in the preceding paragraph, expanded to Internet scale, led to the FGDC's National Spatial Data Infrastructure (NSDI) initiative (FGDC, 1994). Though governments had been working on Spatial Data Infrastructures (SDIs) since at least the mid 1980s, the advent of the Internet changed the scale of those projects immensely (Coleman, 1999; Guptill, 1999; Maguire & Longley, 2005). Instead of a central database server on a closed network, these large scale SDIs had web-based data distribution systems commonly called (meta)data catalogs or clearinghouses.
Clearinghouses were specialized websites meant to be "the means by which data users can more economically find, share, and use geospatial data" (Guptill, 1999). The goal of all such systems, early or contemporary, is to help users save time and effort when acquiring new data and to lower costs for producers by reducing the amount of duplicated data. Clearinghouses followed a design known as Resource Oriented Architecture (ROA). They were typically filesystem-based and allowed clients to select datasets piece by piece with many options to choose from (Han et al., 2008). Often they relied on File Transfer Protocol (FTP), and the only graphical interface one could expect was from either a dedicated FTP client or a web browser capable of working as one. As technology improved and the Internet grew, it became common to access the same sort of systems via Hypertext Transfer Protocol (HTTP) rather than FTP – the ubiquitous download link. Even as databases and their dynamic query capabilities became more popular, ROA dominated the SDI environment.

The problem with a resource-oriented approach to portal development is that data access is often only one aspect of a user's problem. What if a user needs to process the data in a particular way, but does not have the software available on their local machine? What if a user has a postal address, but needs a point defined by latitude and longitude instead? At this point, the user needs a service, which, at the time, was unavailable in a resource oriented architecture. This would later change, as discussed in the Resource Oriented Architectures section of this paper, but by 2002 the spatial community was already frustrated with the limited capabilities of clearinghouses. To meet user needs, "The functional capabilities of clearinghouses should likely be changed from a data-oriented to a user and application-oriented focus" (Crompvoets et al., 2004). This shift reflects not only lessons learned during the development and use of first generation systems, but also the availability of new technologies such as Web Services.

Thanks to the standardization of Web Services, portal developers could design abstract systems to interact with third-party services, even without knowledge of what those services do or whether they exist yet. This design paradigm is much better at fulfilling a user's needs than first generation systems because it is more flexible. For example, portals can provide services that allow the user to view the data in his or her web browser before downloading. Services may also directly expose data, allowing a user to download only the data subset they require as opposed to an entire dataset. In 2017, when cell phones are as powerful as the desktop PCs of 10 years ago, these miraculous capabilities are easy to take for granted simply because they are so ubiquitous.

Obviously the Internet has been incredibly important to the sharing, analysis, and display of geographic data (Rinner, 2003). It is worth pointing out that the growth of the Internet in general is what has enabled the growth of distributed GIS in particular. Internet protocols were not invented with a geographic context in mind. Distributed GIS is built on the extensive set of existing standards which power the Internet, so understanding some of them, at least at a basic level, is important to understanding our proposed system.

2.3.2 The Importance of Web Standards

Standards are the key to interoperability in all technology.
The American National Standards Institute (ANSI) was originally founded as the American Engineering Standards Committee (AESC) in 1919 to promote and maintain standards in the engineering world (ANSI history: https://www.ansi.org/about_ansi/introduction/history). Since then, ANSI has developed specifications for everything from paper sizes (ANSI/ASME Y14.1) to programming languages such as "ANSI C" (ANSI X3.159-1989) and "American Standard Fortran" or, informally, "FORTRAN 66" (ASA X3.9-1966). In the case of safety equipment such as eye protection, without such standards employers would need to trust the word of private companies when it came to the safety of their employees' vision. In the case of software, it would mean that a package purchased from one vendor may not work with software from another. While this scenario is great for the software company, which would have a very captive user base, it is not so great for the software user. Imagine if ArcGIS was still the only program that could open a shapefile – ESRI would effectively have a monopoly on spatial analysis. Truly, standards are critical to the sharing of information.

The necessity of standards becomes even clearer in networked environments. What if web browsers such as Mozilla Firefox and Google Chrome used different methods of connecting to web servers? As Guptill (1999) points out, "The number of interfaces required (and the effort required) increases as the square of the number of communicating systems." With the need to design so many interfaces, the Internet could not possibly have grown at the rate it has. Despite the existence of numerous vendors who supply web browsers, all of them adhere absolutely to low level standards like HTTP. If they did not, they would not be able to access the Internet. However, precisely because HTTP is low level and focused, it gives developers a lot of flexibility in how to handle connections to a website after they have been established with HTTP.

The implications of this dynamic are best explained with an example. Imagine making a telephone call to a customer support line. You pick up the phone, dial a number, and you are connected to an automated system on the other end. Whatever happens next is up to you, the user. You can listen to the prerecorded messages and respond using predetermined prompts as explained by the system. You might, for example, "Press 2 for accounts and billing," after which you will be given more information and another set of ways to respond. Communicating over the Internet is very similar, in that while the communication between your phone and the automated system is standard – the tone produced when you depress a key on the keypad is the same every time – the response from the system is not. Pressing 2 when calling a different automated system, or even a different menu within the same system, is likely to have a much different outcome.

HTTP works in a very similar way. Without going into unnecessary detail, the HTTP specification contains several HTTP methods. These methods are often referred to as "HTTP verbs" because they have names like GET, POST, PATCH, and DELETE. In addition to these four there are several others (see https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods), each with their own purpose and semantics. HTTP verbs, conceptually, work in a way similar to pressing the buttons in the example above. Using certain methods on certain web addresses produces one set of results, while using those same methods at a different web address may produce different results. How the system responds is entirely up to the system designer.
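To ground the verb analogy, the sketch below issues two of those methods from R with the httr package against a hypothetical data-distribution endpoint; the URL, parameters, and response behaviour are invented purely for illustration and do not describe any real service.

```r
library(httr)

endpoint <- "https://data.example.com/realizations"   # hypothetical endpoint

# GET: ask the server for a representation of a resource, here a listing of
# available error realizations filtered by a query string
listing <- GET(endpoint, query = list(dataset = "gdem", limit = 10))
status_code(listing)            # e.g. 200 if the server chose to fulfil the request
content(listing, as = "parsed") # the body, however the designer chose to structure it

# POST: send data along with the request, here a body describing an area of
# interest and the address to which a download link should be emailed
job <- POST(endpoint,
            body = list(email = "researcher@example.edu",
                        bbox  = list(xmin = -84.6, ymin = 42.6,
                                     xmax = -84.3, ymax = 42.8)),
            encode = "json")
status_code(job)                # how the server responds is entirely up to its designer
```

The same two requests sent to a different endpoint, or handled by a different developer, could legitimately produce entirely different results – exactly the flexibility, and the ambiguity, described above.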
When a developer writes code to handle HTTP requests, he or she must decide explicitly how the server will respond to each individual request method at each endpoint (web page). Similarly, phone system designers have to decide how to respond to each individual tone for each individual menu. While good systems follow predictable patterns such as the REST style discussed in the “Resource Oriented Architectures” section of this paper, there is no standard procedure for how a system must respond to a particular input. What that means in practical terms is that when you type “http://www.example.com” into a web browser, there is no law of computing that says what has to happen next. You can be redirected to a different page, a download could begin, or anything else the developer wanted could happen so long as the basic rules of HTTP are followed. It all depends on how the server responds to the request your browser issued to the website. While different HTTP methods have slightly different properties (POST, for example, allows the requester to send some data along with the request), it is this flexibility that allows developers to create the interfaces which enable distributed systems that perform arbitrarily complex processes in response to requests.

In order to interact with data or have a server perform geoprocessing remotely, a developer must create a Web Service. IBM broadly describes a web service as “a generic term for a software function that is hosted at a network addressable location.”7 Confusingly, there is also a formal web service standard. The W3C formally defines Web Services as using SOAP and the Web Service Description Language (WSDL)8. In this paper, we retain the informal definition because the difference between the definitions lies in implementation details, not the concept. Furthermore, whether or not we follow a particular standard is generally not important to the users – they just want to have a working, reliable service.

In the terminology of web services, the web address to which the request is sent is referred to as an endpoint. Using a server-side processing architecture, a GIS developer can, for example, create an endpoint that accepts POST requests containing a user’s email address and an arbitrary geometry, runs an analysis on data stored locally using that geometry, and emails the result to the user after the analysis finishes. By employing a hybrid architecture, the server may do the processing and return the data to the user’s web browser for visualization. Under a client-side architecture, the endpoint could instead be designed to return data directly to the user, allowing the browser to perform analysis and visualize the result (Walker & Chapra, 2014; Bryne et al., 2010). The dynamic nature of such a web processing service makes the flexibility which HTTP offers a must-have, but there are downsides to that flexibility as well. To continue with the previous example, POST is a logical verb to use. However, there’s nothing to prevent that developer from only responding to the PATCH or PUT methods instead. Because there are no set rules for how a server responds to a request, it can be difficult to chain systems together.

6 List of HTTP Methods: https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods
7 JSON Web Services: https://www.ibm.com/support/knowledgecenter/en/SSGMCP_5.3.0/com.ibm.cics.ts.webservices.doc/concepts/concepts_json.html
8 W3C Glossary, Web Service: https://www.w3.org/TR/2004/NOTE-ws-gloss-20040211/#webservice
It can also be difficult to discover and understand what systems do. To combat these problems, programmers have come up with standards and specifications that operate at higher levels than HTTP, such as the Simple Object Access Protocol (SOAP), which defines a series of rules for machine-to-machine communication and is commonly used in distributed GIS solutions. For distributed computation to work, all the machines must have a previously agreed upon object model, and interoperability in GIS has long been a topic of research (Sondheim, Gardels, & Buehler, 1999). Whether sharing geographic information amongst researchers, processing it on a remote server, or downloading it from a data producer, the geographic data case is fundamentally concerned with interoperability and therefore standardization. Furthermore, standardization has many benefits beyond interoperability. The process of discovering geographic data and geoprocessing services is much easier when a standardized metadata format is in use. It is important to note, however, that standardization is challenging and only matters if the entire community participates. Because of this, standardization is almost always an organic, user-community driven process triggered in response to the need to communicate across various systems (Salgé, 1999).

2.3.3 Web Standards and Geographic Information

Geographers around the world have understood the necessity of standardization for a long time. In 1919, President Woodrow Wilson attempted to standardize the production of federal geographic information by creating the Board of Surveys and Maps, which would later go on to become the Federal Geographic Data Committee (FGDC)9. Since then, numerous national, regional, and international organizations have attempted to standardize geographic data (Salgé, 1999). Standardization in GIS is the key to interoperability, and interoperability is the path to distributed GIS systems (Sondheim, Gardels, & Buehler, 1999; Coleman, 1999). With this in mind, the Open Geospatial Consortium (OGC) is an international non-profit organization made up of representatives from the geospatial community that creates the standards necessary for sharing geographic data, creating web services, and more. Founded as the Open GRASS Foundation in 1992 to support private sector adoption of GRASS GIS, the OGC refocused its priorities in 1994 and became a standards organization. Their goal was to build a foundation for a rich ecosystem comprised of “diverse geoprocessing systems communicating directly over networks by means of a set of open interfaces based on the ‘Open Geodata Interoperability Specification’ (OGIS)”10. OGC went on to create several very important standards for geographic data on the web. A few of the particularly well-known ones are the Web Feature Service (WFS), Web Map Service (WMS), and Web Map Tile Service (WMTS), though there are many others.11 The OGC standard most relevant to this research is the Web Processing Service (WPS) specification (OGC, 2015).

9 FGDC History: https://www.fgdc.gov/who-we-are/history
10 OGC History: http://www.opengeospatial.org/ogc/history
11 List of OGC Standards: http://www.opengeospatial.org/standards

The WPS specification outlines a standard set of methods which a system must provide via SOAP in order to be considered WPS-compliant. The specification also outlines standards for service metadata and the specific ways in which that metadata must be made available.
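To give a flavor of what that standardized interface looks like in practice, the snippet below builds the kind of key–value GetCapabilities and DescribeProcess requests that WPS implementations typically answer over plain HTTP alongside their SOAP/XML bindings. The host name and the process identifier are hypothetical, and a real deployment may expose a different subset of bindings.

```javascript
// Hypothetical WPS endpoint: only the query parameters follow the WPS
// key-value convention; the host and process identifier are invented here.
const endpoint = 'https://example.org/wps';

// Ask the service to describe itself: which processes it offers, contact info, etc.
const getCapabilities = `${endpoint}?service=WPS&request=GetCapabilities`;

// Ask for the declared inputs and outputs of one advertised process.
const describeProcess =
  `${endpoint}?service=WPS&version=1.0.0&request=DescribeProcess&identifier=viewshed`;

console.log('DescribeProcess URL:', describeProcess);

// Node 18+ provides a global fetch; both responses are XML documents whose
// structure is dictated by the WPS specification.
fetch(getCapabilities)
  .then((res) => res.text())
  .then((xml) => console.log(xml.slice(0, 300)));
```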
These strict rules and codified principles can be overly constricting for small projects, but are necessary for large-scale systems. Following standards like the WPS allows users to access a service in a predictable, easy-to-understand way. It also allows a developer to chain several web services together, realizing the OGC’s original vision of an interoperable network of geographic data and analysis.

Standards may be considered either de facto or de jure depending on their origins, but are essential regardless of those origins (Salgé, 1999). De facto standards are created ad hoc by user communities in a direct response to their needs. They are informal, but ubiquitous. For example, the ESRI Shapefile (ESRI, 1998) is a de facto standard format for geospatial vector data simply because of its past popularity, despite serious design flaws. De jure standards often begin as de facto standards which evolve and become formalized under the guidance of some overseeing committee that represents the needs of a community, such as the OGC, ISO, FGDC, or the FGDC’s European counterpart INSPIRE. OGC services in particular followed this path, while older standards such as the metadata and file transfer standards developed by national and regional organizations as early as the 1980s were created in direct response to a need for formal standards (Salgé, 1999).

Traditionally, commercial software development has implemented its own de facto standards as needed to encourage interoperability amongst a company’s own software products. This is why ESRI’s ArcGIS Online works very well with all of ESRI’s software. Because the standards are de facto and not available to the public, those software products suffer from limited interoperability with systems outside a company’s software ecosystem. As early as 1999, the rise of the Internet began to put pressure on many geographic software companies to adopt open standards and improve interoperability (Goodchild & Longley, 1999). This trend continues today, especially as the toolchains researchers rely on become more complex, as is evident from the ever-increasing number of OGC standards. Further evidence of this pressure shows in some of ESRI’s choices over recent years, including the decision to switch from Visual Basic to a free and open source language called Python12 for scripting in the ArcGIS suite. Other good signs of interoperability improvements include the ESRI JavaScript API (similar to the Google Maps API), the esri-leaflet plugin for the popular Leaflet library, and ESRI’s decision to adopt the Mapbox Vector Tile Specification.

Despite these promising developments, some things do not seem to change. In December of 2016, ESRI introduced their new, proprietary “expression language” called Arcade (Barker, 2016). According to its dedicated page on ESRI’s website, Arcade can be used for anything from “writing simple scripts to control how features are rendered, or expressions to control label text”.13 The main benefit of the language, however, is that it provides a single expression-based interface for styling and even analyzing data which can be used in any ESRI product, from mobile to browser to desktop. While highly interoperable within the ESRI ecosystem, the proprietary nature of the language prevents it from being used in any other context and steers ESRI users away from learning other languages which may be more broadly useful.

12 https://www.python.org/
13 https://developers.arcgis.com/arcade/
ESRI is by no means unique in this behavior, and is only referenced here as an example due to the organization’s massive influence in the geographic community. While ESRI and other companies producing proprietary software do offer solutions for interoperability that meet open standards in addition to their own private standards, there are myriad open source options capable of serving geographic data on the web. Boundless14, for example, uses exclusively open source technology to offer a commercially supported enterprise GIS solution similar to that of the ESRI suite. For technically advanced users looking to build a system of their own, Steiniger and Hunter (2012) provide an outline of selected existing open source software that can be used to build up a Spatial Data Infrastructure (SDI). They point out that for every category of system that composes an SDI, open source alternatives to the proprietary components exist. The simple existence of such software and the standards supporting it is important. As Dalle and Jullien (2001) point out, “Most software technologies are indeed subject to network effects and thus tend to give rise to de facto standards and to monopolies.” Without open standards provided by the OGC, ISO, FGDC, and others, those monopolies would be held in private hands (Salgé, 1999). Without open standards, reliable and interoperable systems like those offered by Boundless or discussed by Steiniger and Hunter (2012), Pebesma et al. (2010), and Yue et al. (2015) would be highly difficult, if not outright impossible, to build. Furthermore, the existence of some standards, such as metadata standards, allows other standards to be built on top of them, providing further benefits in the process; the Web Catalog Service specification, for example, enables users to discover data services.

2.3.4 The Limitations of Web Standards

There are, of course, limits to the benefits standards can provide. The biggest obstacle to implementing standards like the OGC’s WPS specification is the complexity. The WPS specification dictates what data formats the service can accept, what data can be accepted in those formats, the names of methods used to describe and call the service, and many more details. At 133 pages of dense technical language, it is also hard to read (OGC, 2015). Because of these barriers to entry, a geographer looking to develop a new geoprocessing service may have a hard time designing it from the beginning to be OGC compliant (Yue et al., 2015). Researchers are typically not IT experts, and this is a major barrier to the widespread adoption and implementation of OGC compliant services, which require a service-driven approach and a SOAP-based interface (Mazzetti, Nativi, & Caron, 2009). Without some experience in implementing standards-compliant systems, the only way to do so with any ease is to use an existing piece of software that implements the standards by default.

Another reason one might find to not implement a particular standard when creating a new web service is simply that the goals of the specification do not align with the goals of the service, the organization creating the service, or the community who will use the service. One great example of a large project that made the decision to avoid a standard in favor of its goals and usefulness for users is OpenStreetMap. At the time of writing, OpenStreetMap uses a REST API15 that is not OGC compliant.

14 https://boundlessgeo.com/
15 According to the wiki page: http://wiki.openstreetmap.org/wiki/API_v0.6
While all of the operations the OSM API supports – reading, creating, editing, and deleting map data – could be handled in an OGC compliant fashion, doing so would add a lot of extra work to the project for very minimal benefit to the community that supports it16. Another instance in which standards may hinder more than help a project is if that project is doing something new and revolutionary. The best standards anticipate change, but innovation can quickly make them obsolete (Salgé, 1999). For instance, the GeoBrain system implements many OGC standards, but chooses to modify the WPS because it does not fit the system’s intended purpose well enough. GeoBrain is discussed in detail in the Modern Examples of Distributed GIS section of this paper.

Finally, not all standards apply to all situations. It does not necessarily make sense for someone building a car in the United States to follow European standards. Still, standards like the minimum strength of the materials used in that car should be followed regardless of where it is built. In software, machines that will be part of a network need to follow the basic standards which enable the Internet. Beyond that, the question of which standards to follow is determined largely by the purpose of the system and, thereby, the architectural choices of its designer.

2.3.5 The Importance of Systems Architecture

For those of us who gained our computing and system design knowledge outside of computer science departments, the idea of a “system architecture” may be abstract. As it turns out, that’s not our fault. In his 1987 paper on the subject, John Zachman points out that the term is rather ambiguous even amongst systems architects. In fact, the purpose of the paper is to use concepts from engineering, construction, and building architecture to draw analogies with software, because it was impossible for professionals to agree on a definition. In a follow-on piece a decade later, Zachman (1997) still cannot provide a concise definition, but the ISO Architecture Working Group (2011) defines systems architecture as the “fundamental concepts or properties of a system in its environment embodied in its elements, relationships, and in the principles of its design and evolution”. The concept, more or less, is thus: systems are made up of many small pieces. Those pieces need to fit together perfectly when assembled to create a system. The only way to achieve that goal is to plan it out from the start, just like an architect draws floor plans for a house before construction begins. A system’s architecture is the conceptual model to which it must adhere, or risk imperiling the rest of the system.

Just as contractors rely on architects to give them direction, so too do developers rely on system architects. Given time and training, contractors can make changes to the design on the fly. A contractor could also have an architect develop several modular floor plans, allowing the contractor to swap parts of the plan in or out as needed. The same holds true for software architects and developers. The home architect, in this case, is analogous to a very talented developer or team of developers who create some cornerstone piece of software, such as a web server. The architects may design that server so that pieces of it can be swapped out or added as needed – a user authentication strategy, a template engine, or other common subsystems which run on web servers.

16 An OSM help question on the topic: https://help.openstreetmap.org/questions/981/ogc-and-interoperability
Other developers can take that piece of software and work with it, so long as they follow the basic principles which allow the various pieces of the system to interact. They can modify the existing system directly, or extend it to add additional functionality, as long as they consult the “blueprint” first to make sure the changes will not break anything else. At the end of the day, it is up to the developer to put the architect’s plan into action.

As in geography, there is a question of scale involved in system architecture. The example above covers the scale of a single web server, but in many cases a web server is just one part of an application. In the application built during the course of this research, for example, users interact (indirectly) with a database server and system processes in addition to the web server. Each subsystem has its own architectural requirements. Additionally, the system as a whole needs its own architecture. At the highest level, architecture must consider how the client and server interact. Today there are two main types of architecture: Resource Oriented Architecture (ROA) and Service Oriented Architecture (SOA). Where ROAs are concerned with providing access to resources, SOAs are concerned with getting things done – with the execution of some operation over a network.

2.3.6 Service Oriented Architectures

SOA is “an organizing and delivery paradigm” which can provide remote access to data, processing tools, and even visualization tools across various domains (Mazzetti, Nativi, & Caron, 2009). SOAs exist to support a model called service-driven (Nativi, Mazzetti, & Geller, 2013) or service-oriented computing, “A concept in which a larger software system is decomposed into independent distributed components ... These components are deployed on remote servers and are designed to respond to individual requests from client applications“ (Castronova, Goodall, & Elag, 2013). Basically, SOA seeks to provide a full suite of computing capability by aggregating access to services (as opposed to the services themselves) of various types from various domains via a single interface. SOA is all about machine-to-machine communication, and service-oriented applications work by chaining services together via a standard interface to complete an operation. For example, a user may request that a particular operation be performed on a particular piece of data. The user only sends a single request, but the server translates that request into a chain of service calls which may, for example, include a service to provide the dataset, a service to process that dataset, and a service to visualize the output, as sketched below. Users may also choose to chain several processing steps together (Di, 2004b).

The modular nature of SOA is perfect for large groups like the geospatial community, which have diverse interests but rely on many of the same tools. It allows subgroups to use their expert knowledge to maintain their own datasets and tools, exposing them for use as necessary by the broader community while maintaining total control over those tools (Mazzetti, Nativi, & Caron, 2009). Separating the code base out into small, separately developed pieces also improves maintainability. The benefits provided by the loose coupling between the system and the service are only possible because of interoperability standards like the Web Processing Service specification, the importance of which is discussed in the Web Standards and Geographic Information section of this paper.
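The sketch below illustrates that chaining from the orchestrating server’s point of view. The service URLs, payload shapes, and the choice of JSON are all hypothetical; a standards-compliant SOA would exchange SOAP/XML messages instead, but the control flow – one user request fanned out into data, processing, and visualization calls – is the same.

```javascript
// Hypothetical illustration of SOA-style service chaining: one user request is
// translated into calls to a data service, a processing service, and a
// visualization service. Every URL and payload here is invented for the sketch.
// Requires Node 18+ (global fetch) or a browser environment.
async function handleUserRequest(bbox, operation) {
  // 1. Ask a data service for the dataset covering the requested area.
  const dataRes = await fetch(`https://data.example.org/dem?bbox=${bbox}`);
  const dem = await dataRes.json();

  // 2. Hand that dataset to a separate processing service.
  const procRes = await fetch('https://processing.example.org/run', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ operation, input: dem }),
  });
  const result = await procRes.json();

  // 3. Pass the result to a visualization service and return its rendering.
  const vizRes = await fetch('https://viz.example.org/render', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(result),
  });
  return vizRes.arrayBuffer();
}
```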
The primary benefit of SOA is that it allows disparate entities to combine their services under a single application without relinquishing ownership of the code or data, which is essential for the development of a truly distributed GIS (Granell, Díaz, & Gould, 2010). The distributed, modular nature of SOA makes for easier code maintenance, as developers are only responsible for their small part of the system. Furthermore, the loose coupling between systems makes it easy to add additional services as they are developed and prevents the entire application from failing should something happen to a single service. These benefits make SOA an ideal candidate for systems like UncertWeb, GEO Model Web, and many of the other tools discussed in the Modern Examples of Distributed GIS section of this paper (Bastin et al., 2012; Nativi, Mazzetti, & Geller, 2013).

SOA is well suited to the design of portals and has enjoyed great success in the geographic community, but it has several problems as well. By far the most serious of those problems is the immense complexity of service-oriented systems (Bastin et al., 2012). Because they are based entirely on machine-to-machine communication, they require advanced user interfaces to make them accessible to their human users (Nativi et al., 2011). Even more complex, services may require other services to mediate between them for the purposes of, amongst other things, describing and converting data (Nativi et al., 2011; Nativi, Mazzetti, & Geller, 2013). It is almost impossible to implement a small-scale SOA because SOAs are, at their core, systems of systems. A single system cannot accomplish the goals of SOA because each service must be its own system, which may be accessed through a service broker (another system), which a lay user must access via a user interface – yet another system. To cap it off, all of those systems must follow very specific rules to be able to transfer requests and data between themselves. These and other difficulties relating to the complexity of SOA are limitations which cannot be ignored. Should a developer decide these barriers to entry are too great, there is another family of architectures to explore.

2.3.7 Resource Oriented Architectures

A ROA is concerned only with providing access to a particular resource in response to a user request. This is a direct contrast to SOAs, which are concerned only with accomplishing a given task. In a ROA every resource has a Uniform Resource Identifier (URI), the most common type of which is the Uniform Resource Locator (URL). The URL, commonly known as a web address, is actually a standardized URI that contains not only the name of the resource, but a method for retrieving it.17 But what is a resource? According to Granell et al. (2013), “Any informational entity may be regarded as a resource in the target RESTful application.” Essentially, this means that anything, be it a document, model, raw data, or any other entity one can put on the Internet, can be a resource.

17 URL Spec: https://www.w3.org/html/wg/href/draft#url

One extremely common ROA is known as Representational State Transfer (REST) and was first described by Roy Fielding (2000) in his dissertation. Under a REST architecture, a user interacts with a server via an Internet protocol (typically HTTP, but not necessarily – see the Constrained Application Protocol, RFC 7252). REST usually employs HTTP’s GET, PUT, POST, and DELETE methods to create a uniform interface via which a user may interact with resources on a server.
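The sketch below shows what that uniform interface can look like for a single, hypothetical resource type. Again the code uses Node.js and Express purely for illustration: the point is that each resource is named by a URI and the standard verbs carry fixed meanings against it (create, retrieve, replace, remove), rather than designer-invented ones.

```javascript
// A minimal, hypothetical sketch of REST's uniform interface: the "dem"
// resource type and the in-memory Map standing in for storage are invented.
const express = require('express');
const app = express();
app.use(express.json());

const dems = new Map(); // stand-in for real storage

app.post('/dems', (req, res) => {            // POST creates a new resource
  const id = String(dems.size + 1);
  dems.set(id, req.body);
  res.status(201).location(`/dems/${id}`).end();
});

app.get('/dems/:id', (req, res) => {         // GET retrieves a representation
  const dem = dems.get(req.params.id);
  dem ? res.json(dem) : res.sendStatus(404);
});

app.put('/dems/:id', (req, res) => {         // PUT creates or replaces it
  dems.set(req.params.id, req.body);
  res.sendStatus(204);
});

app.delete('/dems/:id', (req, res) => {      // DELETE removes it
  dems.delete(req.params.id);
  res.sendStatus(204);
});

app.listen(3000);
```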
By mapping these verbs onto resources, REST elevates HTTP from a simple transport protocol to an application protocol capable of performing arbitrary operations on a remote machine (Granell et al., 2013). For an accessible yet detailed description of REST principles, see Mazzetti, Nativi, and Caron (2009). Because it relies only on HTTP and not on the heavy, complex requirements of SOAP or other SOA structures, REST is simple to deploy, reliable, and scalable, making it a very common architecture for building APIs, but it can also be limiting.

As explained above, early geographic information clearinghouses were almost entirely resource-oriented. Early clearinghouse development revealed the shortfalls of ROA in geographic data distribution; specifically, their static nature could not accommodate changing user demands. Today many geographic data distribution systems use REST APIs. These APIs differ significantly from traditional ROAs. Often referred to as RESTful web services, they stretch the boundaries of ROA by allowing users access to services (just like SOA) by mapping functions – conceptually modeled as resources – to specific URLs and HTTP methods. For example, ESRI offers RESTful web services to access geographic data, as discussed later in this paper. Rather than sending request parameters as part of a complex object, the parameters are encoded directly in the URL. Much of the same information is transferred, but using a different information model.

If this seems convoluted and confusing, that’s because, conceptually, it is. Several examples of this confusion are visible in the research on RESTful geoservices. Furthermore, for a service to be truly RESTful, requests and responses must be exchanged in a format which supports hypermedia (interactive media). JavaScript Object Notation (JSON) has no such capability even though it is commonly used in “RESTful” systems (Walker & Chapra, 2014; Yue et al., 2015). Even top engineers at large Internet companies like LinkedIn have trouble with this.18 To further complicate the matter, RESTful services blur the line between SOA and ROA. As discussed above, the term “resource” can refer to just about anything. When using a ROA to provide services as resources, does it inherently become a SOA? REST is all about transferring representations of the state of resources on a remote server, but Castronova, Goodall, and Elag (2013) attempt to use it to extend PyWPS – an OGC compliant SOA. This is a bothersome question, but fortunately it does not inhibit the creation of RESTful services in any way. REST is merely a style of architecture, not a standard or protocol fundamental to the proper operation of the Internet. Finally, some researchers have found that REST offers few unique practical benefits for geoportals, and the cost far outweighs the benefit if an SOA is already in place (Lucchi, Millot, & Elfers, 2008).

Despite the drawbacks listed above, many geospatial service providers do provide RESTful interfaces. Walker and Chapra (2014) present the Web-based Interactive River Model (WIRM), “An interactive web application for a simulation model of biochemical oxygen demand and dissolved oxygen in rivers,” which provided great insight into the inner workings of REST-based services and clients in the geosciences field.

18 JSON and REST: https://www.linkedin.com/pulse/rest-vs-rpc-soa-showdown-joshua-hartman
Mazzetti, Nativi, and Caron (2009) provide a detailed and thoughtful analysis of the benefits of REST over the traditional SOA/SOAP model for environmental systems modeling. Yue et al. (2015) include it in their vision for the future of distributed and “intelligent” GIS. Finally, Granell et al. (2013) offer the most extensive treatment of REST for geospatial web services I have discovered, including a summary of other architectures historically used by the online modelling community, definitions of REST and ROA, and practical implementation suggestions for deploying RESTful services for web-based modeling. As demonstrated in the following section, REST is also a common design pattern in many existing geospatial data distribution systems.

2.3.8 Modern Examples of Distributed GIS

GIS on the web is rapidly growing and changing, but its importance has been increasing ever since its inception (Rinner, 2003). Having discussed the fundamental architectures and standards which enable the development of modern distributed GIS, we are now in a much better position to compare them and gain useful insights in the process. As revolutionary as they were at the time, clearinghouses quickly became outdated. Succinctly, “For the first generation, data were the key driver for SDI development and the focus of initiative development. However, for the second generation, the use of that data (and data applications) and the need of users are the driving force for SDI development” (Crompvoets et al., 2004). While the core principles of improving discoverability and reducing cost remain, the manner in which they are addressed has changed and new functionality has been added.

Instead of clearinghouses, second generation web distribution systems are usually referred to as “portals”. According to Maguire and Longley (2005), “Portals are web sites that act as a door or gateway to a collection of information resources, including data sets, services, cookbooks, news, tutorials, tools and an organized collection of links to many other sites usually through catalogs.” Though there are many types of portal across the web (Maguire & Longley, 2005), this research is only concerned with portals for geographic information and will use the term portal or the more explicit “geoportal” interchangeably. Tait (2005) defines a geoportal as “a web site that presents an entry point to geographic content on the web or, more simply, a web site where geographic content can be discovered.”

Lessons learned from user dissatisfaction with clearinghouses encouraged more extensible designs in portals. The generic interfaces employed to achieve that extensibility allowed the evolution of the portal concept to better align with the Maguire and Longley (2005) definition of a portal. Today’s portals feature improved, much more interactive user interfaces which can accomplish the same tasks clearinghouses were built for in addition to connecting users to many additional resources such as processing and visualization services. Technologies like Asynchronous JavaScript and XML (AJAX) allow users to interact with portals as though they were desktop software (Han et al., 2009; Qiu et al., 2012; Han et al., 2012). There exist today several proprietary and open source solutions, commercially supported or otherwise, that can act as web-based data distribution systems. ESRI’s ArcGIS Server, Autodesk’s MapGuide, Boundless’ GeoServer, and UMN MapServer are just a few of the many options.
Combining any of these technologies with an online interface (e.g. ArcGIS Online and ArcGIS Server) creates a portal. Even constrained to Free and Open Source Software (FOSS) alone, there are numerous options for building a web-based GIS for data distribution. Steiniger and Hunter (2012) describe various recently developed FOSS tools useful for building a geodata portal. Lee (2009) offers another method, combining a FOSS back end with Google’s Web Map API front end for a private-public hybrid19 that’s easy to create and maintain.

While today’s online geographic data distribution systems are usually much more complicated than older systems, they are conceptually identical to the early second generation portals first described in the early to mid 2000s. Anyone wishing to design such a system must look to existing systems for guidance, or suffer the consequences of eschewing many years of trial and error. Unfortunately, diligent searches revealed very few journal articles describing the technical implementation of existing systems. On the other hand, there is a wealth of conceptual descriptions; all of the work relating to the Model Web and UncertWeb discussed in the “A Brief History of Error-aware GIS” subsection of this thesis falls under the “conceptual description” and “second generation” categories.

I postulate that there are three fairly simple explanations for the limited number of white papers regarding the construction of distributed GIS: the difficulty of translating the mundane activities of system design into formal literature, the prerogative of proprietary software companies to protect trade secrets, and the ad hoc nature of open source projects. First, understanding the implementation of a system design is, in my experience, best gained through trial and error, with a lot more error than success. Formal literature is mostly concerned with communicating what worked, not what did not work the first several hundred times. As to the second point, it does not make economic sense for SuperGeo, Autodesk, or ESRI to share detailed guides about how they build their software. Their distributed GIS systems are their highest value products, and giving competitors such valuable information would be a poor business decision. Finally, FOSS projects are built piece by piece, as needed, to serve whatever purpose the community requires at the moment they are conceived. These are typically not good conditions for detailed and accurate recording of a development process. According to research on open source innovation, “Open source software is typically created within open source software projects, often initiated by an individual or group that wants to develop a software product to meet their own needs” (von Krogh & von Hippel, 2006). Other research on open source projects points out that communications and organization are often very loose; they often lack “explicit system-level design, or even detailed design,” or a “project plan, schedule, or list of deliverables” (Mockus et al., 2002). There are no hard and fast rules for how or why to do things. Instead, issues are dealt with as they arise and are therefore typically not documented outside of a user group listserv or forum. Ironically, the loose organization and informality that make open source projects so flexible and agile are the same things that make them difficult to study.

19 With “private” being a reference to the proprietary data on which Google Maps rely.
Though the Mozilla and Apache case study proves it can be done (Mockus et al., 2002), a similar research effort on geospatial software would probably be worthy of its own thesis. As a result, while it might be fairly easy to download and use FOSS, it can be difficult to model a new system off of an existing one without the deep knowledge of that system’s developer community. One of the secondary goals of this research is to address this gap in the literature.

Details or no, there is ample experience for a developer to draw from when considering the design of a geoportal. The earliest example I could unearth of a second generation portal was called “G-portal”, and described a ROA-based system for geodata creation, dissemination, discovery, and display in educational settings (Lim et al., 2002). Liping Di (2004a) introduces G-portal’s SOA-based counterpart GeoBrain: “A three-tier standard-based open geospatial web service system which fully automates data discovery, access, and integration steps of the geospatial knowledge discovery process under the interoperable service framework.” System users could access GeoBrain’s functionality through a user interface called the Integrated Multiple-protocol Geoinformation Client (MPGC). In a separate paper, Di (2004b) also offers a detailed explanation of the fundamental concepts which make GeoBrain possible. SOA in general and GeoBrain in particular would go on to become major players in the modern distributed GIS field.

Essentially, GeoBrain is an interconnected web of distributed data providers, catalog services, geoprocessing services, and visualization services (Di, 2004b). It is the quintessential service-oriented geoportal. As discussed above, it is built for machine-to-machine communication, and can be very difficult for people to use (Nativi et al., 2011). As web technologies improved, the GeoBrain team introduced the GeoBrain Online Analysis System (GeOnAS), a web-browser-based Graphical User Interface (GUI) for interacting with GeoBrain services which replaced the MPGC (Han et al., 2008). GeOnAS is a distributed GIS designed for data discovery and visualization with limited processing capabilities (Zhao et al., 2012).

Improving on the GeOnAS concept, Han et al. (2012) authored a paper outlining the purpose, goals, and even some implementation details of a project called DEM Explorer. As the name suggests, DEM Explorer served as a portal where users could view and download digital elevation data. Powered by GeoBrain, DEM Explorer could access the Geospatial Data Abstraction Library (GDAL)’s DEM processing capabilities and limited GRASS GIS functions via web processing services to produce vector data such as watershed basins or raster data like the Topographic Roughness Index (TRI) in response to user queries. The project was very successful and NASA, its sponsor, adopted DEM Explorer as the interface for the Land Processes Distributed Active Archive Center (LP DAAC) portal, renaming it Earth Explorer. At LP DAAC, the connections to GeoBrain’s processing functions were severed, leaving users with access to only data retrieval and viewing services and making it the quintessence of a modern data distribution service. It still has all the download functionality, despite the fact that as of June 2017 the interface is slated for deprecation in favor of the newer Earthdata platform. DEM Explorer is a web-based GUI for GeoBrain, which relies on Apache’s Java-based Axis2 server (Han et al., 2012). GeoBrain offers OGC compliant WMS, WFS, and WCS.
Additionally, it offers Web Standards compliant (but not OGC compliant) services for geoprocessing (Di, 2004a). DEM Explorer helps a user consume these services by translating user input into a machine-readable format and sending requests to the server. On the server side, programmers use a servlet called WSClient to connect to the GeoBrain system; it converts standard HTTP GET requests and their associated parameters into SOAP messages and passes them along to the appropriate service. While DEM Explorer uses HTTP for its interactions, GeoBrain’s design also lets it interact over other transport protocols.

DEM Explorer is not the only web-based GUI created to interact with GeoBrain. Its cousin, the GRASS Web Application Software System (GWASS), also offers a fully-featured GUI which accesses GeOnAS’ GRASS GIS web services. Instead of using the WSClient servlet, GWASS’ application stack – the group of front end, back end, and database technologies which comprise the system – uses its own web server to translate between the GUI client’s requests and GeoBrain services. Unlike DEM Explorer and GeOnAS, which were built to demonstrate the capabilities of distributed GIS, GWASS was built to solve the specific problem of operating system incompatibility for GRASS GIS (Qiu et al., 2012). Apparently, though, it did not completely solve that problem, as it is no longer available online20.

Yet another example of a distributed GIS based on SOA comes from Granell, Díaz, and Gould (2010), who discuss the development of AWARE: “A tool for monitoring and forecasting Available WAter REsource in mountain environments.” AWARE implements OGC-compliant services including WMS, WCS, WFS, and WPS for environmental modeling using Java as the application language and Apache as the server. AWARE is built on the same concepts as its more broadly focused cousin INSPIRE, another European SOA-based portal similar to GeoBrain (INSPIRE, 2007; Lucchi, Millot, & Elfers, 2008). Like other SOAs, AWARE is extremely complicated, well featured, and a great example of a modern geoportal, offering everything from discovery to advanced processing (Granell, Díaz, & Gould, 2010). Unfortunately, like its siblings the GEO Model Web, UncertWeb, and GWASS, AWARE appears to have been abandoned after completion21.

The tendency of these large SOA-based systems to be left unfinished or abandoned after completion is notable. With the exception of extremely large systems like INSPIRE, GEOSS, Earth Explorer, and Earthdata, service-driven portals have not lived up to the hopes of their designers. I believe that the main reason for this is that these systems do not run themselves. Their immense complexity and interdependence, even distributed across so many organizations, is too much for developers to maintain. Reducing complexity by removing processing functions seems to be a good way to keep systems like Earth Explorer (Han et al., 2012) alive. Private companies like NextGIS have adopted this model as well, using an AJAX-enabled web-based GUI to consume OGC compliant services from a MapServer instance connected to PostgreSQL/PostGIS22. As a counter to the overly ambitious goals of previous geoportals, there has been a resurgence of ROA-based systems in distributed GIS.

20 GWASS Home Page: http://wastetoenergy.utdallas.edu/gwass
21 Official AWARE project page: http://www.copernicus.eu/projects/aware
Mazzetti, Nativi, and Caron (2009) offer one explanation for why this may be taking place: “Scientists’ main objective is not to develop and maintain the complex infrastructures required for SOA implementation, but to access and publish information in the easiest possible way.” RESTful services are simple, easy to deploy, and very lightweight, making them an attractive option. They are also widely accepted in the community. The OGC Standards Working Group is currently developing a standard for RESTful services23, but there are many RESTful geospatial services in the wild.

One of the best known of those services is OpenStreetMap (OSM). OSM is, first and foremost, a database. To interact with the database, OSM developers created a GUI through which clients may interact with the data stored in that database – querying existing data, creating new data, deleting records, or simply plotting them on the map interface. That GUI is very similar in function to those of GWASS, GeOnAS, or AWARE, except that it relies on the REST API which underlies the entire system, rather than on a SOA. While OSM’s architecture is well suited to REST, their system is also meant to function in a manner similar to the websites for which REST was designed. Performing standard Create, Read, Update, and Delete (CRUD) operations is the most common application of the REST architecture. In their 2013 paper, Granell et al. discuss how services may be exposed with REST, rather than just the resources exposed by OSM. Naturally, adding additional functionality adds additional complexity to the application as well. However, some big names in the geospatial community offer REST APIs. ESRI developed its ArcGIS Server API using the REST style, and exposes a full range of capabilities similar to that of GeoBrain via a ROA24. GIS Cloud is another GIS company which offers a REST API, though it is not as fully featured as ArcGIS Server. GIS Cloud does not allow any geoprocessing more complex than geolocation over their API, but they do allow users to dynamically create maps and services to share those maps.

Figure 7 offers a “family tree” of the systems discussed throughout this section, and the graphical representation of the connections between systems within each architectural type reveals several interesting things about each. First, the systems of the SOA branch have a lot more connections to one another than the systems of the ROA branch. This is because SOAs are designed to function within a system of systems and require numerous related systems to function properly, whereas ROAs are designed to operate either alone or within a system of systems. Put another way, a ROA functions the same way regardless of how the system is accessed. SOAs require one system for actual geoprocessing and an additional system to access that functionality. To see the difference, compare the GeoBrain system with OSM. GeoBrain has four connected systems, all of which serve to provide access to some subset of GeoBrain itself. Without these systems, the protocols used by GeoBrain are extremely difficult for a human user to access manually, due to the fact that SOAs are designed for machine-to-machine communication. OSM, on the other hand, has no connections to other systems in the ROA branch.

22 NextGIS: http://nextgis.com/
23 OGC SWG RESTful Services: http://www.opengeospatial.org/projects/groups/restfulswg
24 ESRI ArcGIS Server REST API: http://resources.arcgis.com/en/help/arcgis-rest-api/
While one can design an additional system to access OSM, that system would use the exact same methods as a human user who sought to access the data manually. Another interesting point illuminated by Figure 7 is the difference in attribution between the two branches. While the SOA branch is filled with academic citations, the ROA branch has far more web addresses than journal citations. ROA is far more popular than SOA on the Internet today, and this discrepancy seems to show that the same holds true within the geospatial industry. This idea is further supported by the fact that very few of the systems in the SOA branch of the tree are still operating – hence the lack of web addresses. This is a simple consideration, but it should not be overlooked. While the SOA paradigm is clearly preferred in the literature for the design of geospatial data distribution systems, there are some real benefits to choosing a ROA.

Figure 7. A family tree of Service and Resource Oriented Architectures and their constituent systems as discussed in the preceding paragraphs of this section.

3 Building a New Web-based Distribution System

This section of the paper discusses the technologies we use to accomplish our first research objective: create an easy way to distribute DEM error realizations on the Internet. Having reviewed the long and varied history of both uncertainty-aware GIS and distributed GIS, we are in a better position to discuss the motivations for the development of a new system. We are also in a better position to understand exactly how our system differs from past approaches to the same problem and what those differences mean for users. This section starts by discussing the design philosophy of the system, explaining which principles are most important to us and why. Then, it presents a discussion of the selected technologies and the reasons for their selection, beginning with the back end and concluding with the front end.

3.1 Design Philosophy

One of the stated objectives of this research is to put advanced error modeling in the hands of the average GIS user. The literature offers very clear advice on the selection of software for such a system to ensure that the average GIS user can actually use it. To promote widespread use, a system should “minimise user requirements and maximise simplicity” while ensuring that any software used is “accessible to all DEM users” (Darnell, Tate, & Brunsdon, 2008). The second requirement makes using FOSS a necessity – a recommendation this work adheres to strictly. While Darnell, Tate, and Brunsdon (2008) offer their own example of what this might look like, their decision to run the error realization process on the user’s machine has considerable drawbacks. Specifically, their model requires that the user be able to interpret a variogram and fit the variogram model – something most people lack the training for. They do, however, have a very robust visualization component that provides a great example for anyone to follow when building such a system. Instead, our system proposes a solution similar to that of the MATCH Uncertainty Elicitation Tool25 described by Morris, Oakley, and Crowe (2014), in that it relies on an uncertainty expert to do the actual modeling and allows end users to access the results of those models.

25 MATCH Uncertainty Elicitation Tool: http://optics.eee.nottingham.ac.uk/match/uncertainty.php
Uncertainty visualization techniques are a perfect example of why, when building a system to distribute error realizations and models to operate on them, there are usability considerations beyond the software alone. It is also very important to look at research on how users perceive and understand uncertainty. People have been working on visualizing uncertainty ever since computer hardware existed to do so (Logsdon, Bell, & Westerlund, 1996). In one particularly useful study, Aerts, Clarke, and Keuper (2003) performed research on various methods for clearly visualizing uncertainty in land use change scenarios and discovered that people are most comfortable with variations in the hue of a single color to represent uncertainty. The work of MacEachren, Robinson, and Harper (2005) offers more useful contributions to research regarding uncertainty visualization as it relates to user perception.

The considerations above, coupled with the desire for this work to be helpful to a broad community, will be addressed by adhering to three core design principles: Make it Open, Make it Easy, and Make it Magic. This subsection discusses each of them in turn, explaining what they mean and why they are important. It also tackles the question of whether or not to implement open standards.

3.1.1 Make it Open

Like all of the software used in its creation, this web distribution system must be free and open source software. While FOSS is not categorically free of cost, all of the projects this system relies on are available for download at no charge. This helps keep our development costs down, improving the likelihood that the project will live longer than the grant that supported it, unlike many of the systems discussed in the “Modern Examples of Distributed GIS” section. It also ensures that developers around the world will have access to the same software and can contribute to the project should they find it useful. Finally, both the author and his advisor have strong personal commitments to the open source community and would like to contribute to it in a meaningful way. Of course, none of these outcomes are likely unless the system achieves the next principle.

3.1.2 Make it Easy

Perhaps the most important message gleaned from the literature on both uncertainty and distributed GIS is that systems need to be easy if they are to be widely adopted. If a potential user finds a system difficult to navigate, they are not likely to use it again. The prototypical “user” we refer to throughout this paper – the “average GIS user” – is someone who may or may not have had formal GIS training, but uses GIS on a near daily basis. The “user” we target is someone who knows enough about GIS to know that the data they use have flaws, but not enough to know how to do something about it. The overarching goal of this project is not to give those users a deep understanding of error modeling or propagation, but to make it easy to access realizations and thereby consider uncertainty in their work. If the system is too complicated for a user to even download a set of error realizations, we have already failed. Even if the system is mildly inconvenient it is likely to turn some people away, so making the system easy to use is absolutely critical to the success of the project. While an obvious consideration for the user, ease is equally important for developers.
Choosing a platform with a lot of pre-existing software packages created to ease development is a must, as it significantly reduces the work required to produce the minimum viable product for serving error realizations on the web – the primary goal of this research. But the benefits go beyond initial development cost. Ensuring the code uses a common platform and is easy to maintain greatly improves the odds that other developers in the geospatial community will extend this system to suit their own needs. When groups of people contribute to a software project for their own reasons, it inevitably makes the software better in the long run (von Krogh & von Hippel, 2006). Building the system with tools that make it easy to extend encourages others to contribute to our open source project, thus improving the system and strengthening our ability to distribute DEM error realizations in the future.

3.1.3 Make it Magic

This final principle goes hand in hand with “Make it Easy”, but from a slightly different point of view. “Make it Magic” demands that the system “just works” for the researcher who simply wants to share his or her work without building a web server. The system design should promote a “set and forget” type of deployment. After the initial download, anyone should be able to get the system up and running on their local machine with minor configuration changes and the fewest possible interactions with the command line. After it is running, the server should take care of itself, only informing the person who deployed it when something goes wrong. The system needs to be capable of handling long-running processes, allowing users to hook it up to arbitrarily complex geoprocessing scripts. Neither system users nor the researchers who deploy it should have to concern themselves with how the system handles that. The system should accept incoming requests and respond immediately, though processing may not even begin until well after the response is issued. Whenever the process finally completes, the system should notify the requestor that the data are ready for download. This is called an asynchronous operation, and it ensures the best possible experience for the end user while still achieving the goals of the researcher who deployed the system.
Unfortunately, mainstream GIS has not developed to suit that data model. Our method is something of a hack: we distribute many uncertainty-adjusted versions of the same dataset in an open and widely supported format that mainstream GIS software is designed to handle – the GeoTIFF. This changes the data model because in one sense, “the dataset” refers to all of those versions collectively. In another, equally valid interpretation, we are distributing numerous related datasets. While the WPS specification can handle either, using it does not make the problem any easier.

There are existing Python libraries and frameworks which, when combined, can perform processes very similar to the ones we are using now. Specifically, the PyWPS26 library and the Flask27 microframework can be combined to create an easy, off-the-shelf solution for serving OGC-compliant processing services. The WPS specification does not require a specific way of returning data, which makes it flexible enough to handle our revised data model, but it also means an additional layer of complexity on top of the same storage and processing logic we would have to write anyway. Furthermore, our simulation scripts are all written in the R language (R Core Team, 2016) and require advanced tools unavailable in Python. While a port from R to Python is possible thanks to libraries like rpy228, it would be a massive undertaking.

On the other hand, not complying with standards makes it very easy to adhere to the main goal of this project: build a platform to distribute DEM error realizations on the web, and show researchers how they can use those realizations with their existing tools. While the realizations cannot be pulled into a desktop GIS over a service like WFS, they can be used just like any other GeoTIFF. This is a huge ease-of-use bonus, as virtually every GIS user has worked with a DEM in GeoTIFF format. If they need another format for some reason, any GIS can easily convert the data to the desired format because GeoTIFF is an OGC standard format. As the OSM development community discovered, it sometimes makes sense to sacrifice standards compliance in favor of meeting the needs of your users.

Based on lessons learned from the wide range of literature discussed in the Literature Review section of the paper, we needed to create a system that:
1. Uses exclusively Free and Open Source Software
2. Is easy to use
3. Is easy to maintain and extend
4. Requires little or no experience to deploy
5. Does not require scientists to learn about web technologies
6. Asynchronously generates and distributes data

Point number 1 obviously addresses the “Make it Open” principle. Points 2, 3, and 4 correspond to the “Make it Easy” principle, and “Make it Magic” to points 5 and 6. The sum of these requirements is a reliable system that empowers domain experts to provide access to their work regardless of their web development skills. While ideally the system should implement a standard architecture, that goal is unrealistic for prototype development.

26 http://pywps.org/
27 http://flask.pocoo.org/
28 https://rpy2.bitbucket.io/

3.2 Server Architecture

3.2.1 Choosing the Right Architecture

The extensive discussion of systems architecture in the Literature Review, rather than clarifying the issue, has somewhat muddied the waters. The concepts of SOA and ROA are easy enough to understand on their own, but in the context of implementation the lines between them blur. Some describe RESTful web services as SOAs, despite the fact that REST is an ROA style.
On the other hand, SOAs imply a tight contract between client and server that follows a strict set of rules defined by the W3C. To avoid all the difficulties that standards-compliant services involve, we could adopt a REST style and not worry about how it is categorized. Unfortunately, the REST model is hard to apply to distributed GIS services. For instance, consider the difficulty of modeling a raster as a web resource. Obviously, the entire raster can be a resource. But then how do you process that in a web architecture? The size of the data can preclude processing or even transmission of a large area. Also, this prevents dynamic area selection – a global raster would need to be cut up with some sort of tiling system. Another option would be to model each individual pixel as a resource, but that quickly becomes absurd! Each HTTP call has a cost in time, and how many calls might it take to perform a complex analysis? Even though the ease and simplicity of REST make it a tempting option, it does not appear to be the right tool for the job. To simplify the discussion, it is best to remember that these questions arise from trying to apply a standard pattern to a process it simply was not designed for. While standard patterns represent best practices, they are not the only practices. As discussed in the previous subsection, standards compliance can sometimes hinder a project's development rather than support it. Of course, web standards are important and a robust system should follow a standard architecture to facilitate community development and reduce failure points. However, when it comes to choosing between implementing a standard and achieving the project's goals, accomplishing the goals should always be more important. The main goal of this research is not to produce a robust, production-ready system. It is to create a prototype to test the feasibility of generating DEM error realizations and determine some of the obstacles to building a robust, production-ready system. Coming into the project, the author had no back end development experience whatsoever. The Make it Easy principle applies to both users and developers, so the easiest path to the minimum viable product is the best one. Ultimately the system should adopt a standard architecture, but without constructing a prototype first, especially given the lack of systems architecture experience, we may choose the wrong one. Based on the need to create a minimum viable product as quickly and easily as possible, the best approach is to take what we need from the principles of various architectural styles. From the SOA-based Remote Procedure Call (RPC) style, we will borrow the concept of an endpoint accepting a single call to invoke a single function. We are only exposing a single function, and we do not need to do anything else with the endpoint because users will be notified of success or failure by email, as the operations may take a very long time to complete. We do not want to deal with any of the standards that robust RPC APIs use, however, as that adds a layer of difficulty unnecessary for prototype development. Instead, we can use the "weakly" RESTful style described by Lucchi and Millot (2008). A weakly or accidentally RESTful style implements some of the principles of REST, but not all of them. Like REST, we can use HTTP for our standard interface. Instead of implementing all four of the standard methods, we will use a single method (POST) to perform an RPC-like call. This architecture is simple, very easy, and, most importantly, it works.
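To make that concrete, a minimal sketch of such an endpoint using Node's built-in http module is shown below. The route name, the field names, and the enqueueJob() helper (which hands the request off to the job queue described in Section 4.3.1) are illustrative assumptions rather than the system's actual code.

const http = require('http');

// A single RPC-like endpoint: accept a POST request, acknowledge it
// immediately, and queue the long-running geoprocessing work for later.
const server = http.createServer((req, res) => {
  if (req.method !== 'POST' || req.url !== '/requestData') {
    res.writeHead(404);
    return res.end();
  }
  let body = '';
  req.on('data', chunk => { body += chunk; });
  req.on('end', () => {
    // Validation and error handling are omitted for brevity.
    enqueueJob(JSON.parse(body));
    res.writeHead(202, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ message: 'Request accepted; a download link will be emailed.' }));
  });
});

server.listen(3000);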
Having settled on this hybrid architectural style, we need to determine what that function call actually does. There are two possible approaches for distributing DEM error realizations on the web: pre-generating and storing them in a database, or creating them on-the-fly in response to a user request. The former has the benefits of ease and durability, but raises potential storage problems. One thousand SRTM realizations would require approximately 4.25 terabytes of disk space based on the average storage size of an SRTM tile. The same number of GDEM realizations would require 272 terabytes (see further discussion in Section 4). While possible, such a storage solution is very expensive and would likely raise other computational issues for the database. The PostgreSQL website brags, "There are active PostgreSQL systems in production environments that manage in excess of 4 terabytes of data" (https://www.postgresql.org/about/), so while it could probably handle GDEM alone, managing both datasets would be uncharted territory. The latter approach is more difficult to implement, but relieves the burden of storing several thousand global raster datasets. In addition, on-the-fly processing seems a better fit for the random nature of the stochastic paradigm. Figure 8 and the following paragraph describe how that system works.
Figure 8. A conceptual diagram of our system's architecture. The database both stores the data and performs the analysis using PL/R. After processing, the server sends the user a download link by email.
The web server receives and immediately responds to incoming requests. Then, the data request is queued for execution as soon as the computing resources become available. When the database is ready, we issue it a new query. That query activates a custom function which produces error realizations for the area specified in the system user's request. The database runs that query, looping as necessary to produce the requested number of realizations. When the database finishes processing, it alerts the web server, which zips the freshly generated DEM error realizations and sends the requester an email informing them that their data are ready for download at the supplied link. The following subsections discuss the technologies chosen to implement this architecture and the reasons they were chosen.
3.2.2 Node.js and Express
Node.js (https://nodejs.org/), usually shortened to "Node", is an open source server platform written in JavaScript (https://www.javascript.com/). This is fairly revolutionary because until the creation of Node, JavaScript was a technology constrained solely to the browser. For front end developers, this is a dream come true. Previously, a move from client-side (front end) development to server-side (back end) development required learning a new language such as Java, C++, or Python. Now, a developer can use a single, widely-used language across the whole stack without compromising on functionality or flexibility – characteristics which definitely fit the Make it Easy and Make it Open principles of our system design. Node is a mature platform capable of performing any operation its more traditional cousins can. Node is used by some big names, including PayPal (https://www.paypal-engineering.com/2013/11/22/node-js-at-paypal/), Netflix (https://www.dev-metal.com/going-node-js-netflix-slides-micah-r-netflix/), and Uber (https://nodejs.org/static/documents/casestudies/Nodejs-at-Uber.pdf), because it is easy to use, extremely fast, and capable of doing more with the same computing resources. This subsection will explain some of those benefits as they relate to this project. Because Node uses JavaScript, there is a very small learning curve for most developers.
That's because JavaScript is one of the three main languages used in front end web development, the other two being Hypertext Markup Language (HTML) and Cascading Style Sheets (CSS). After a turbulent decade in the hands of private software companies, an open source revolution began around JavaScript in 2005 thanks to the benefits of AJAX (for a short history, see https://www.w3.org/community/webed/wiki/A_Short_History_of_JavaScript) – the same benefits discussed by Han et al. (2008) in relation to GeOnAS. As a result, JavaScript has very well developed documentation and plenty of grey literature (i.e. blog posts, forums) on how to accomplish an incredible array of tasks using the language, making it easy to learn by example. Also, the fact that it has been so widely utilized for so long increases the odds that a given developer has some previous experience with JavaScript. Node is powered by Google's V8 JavaScript engine (https://developers.google.com/v8/), an open source project that compiles JavaScript to machine code (Node.js, 2016). What that means in practice is that JavaScript can run even faster than C++ with the proper optimization (http://v8-io12.appspot.com/#93). While this might not be an important consideration for a prototype system, there are undeniable benefits to starting out on a scalable platform. In an ideal world, the data distribution system would be used by everyone who needs to perform spatial analysis using GDEM or SRTM. The term "optimistic" falls a mile short of describing that outcome, but choosing scalable technologies from the beginning makes growth later on much easier. In addition to speed, Node has other characteristics that make it scalable. The most important of those characteristics is the "event-driven, non-blocking I/O model that makes it lightweight and efficient" (https://nodejs.org/en/about/), which allows a Node server to handle many more connections than other servers using the same available resources. Node accomplishes this with the Event Loop, which is best understood with an example. Imagine going to a doctor's appointment. You walk in the door, head to the receptionist's desk to check in, and take a seat to await your appointment. When a doctor becomes available, a nurse notifies you, brings you back, and the appointment gets underway. This is a typical system for a doctor's office to follow, but it could be done another way. Imagine instead that your doctor handles every piece of the visit from check in to examination. You avoid the initial wait, but this quickly becomes inefficient. The intake phase costs the clinic a lot more money, as a doctor is paid much more per hour than a receptionist would be to perform the same tasks. But what if it is a busy day at the clinic, or the patient before you took longer than expected? If there are no doctors to greet new patients, sick people will be turned away. To prevent that, the clinic may have to hire additional doctors to handle high-demand situations. Node, on the other hand, follows the form of the first system.
By using a single core (the receptionist) to accept all incoming requests (patients) and only connecting them to processing cores (doctors) as they become available, the system (clinic) uses its resources more efficiently. When a new patient comes in, the receptionist processes all the system's information, puts the patient "in line" to see the doctor (asynchronicity), and is immediately ready (not blocked) to handle the next patient regardless of how complicated the first patient's appointment might be. Only after the doctor finishes with the current patient does he or she send a nurse (event) to begin the next examination, ensuring all patients are accommodated in a timely and responsive manner. Node's asynchronous, event-driven style is not the best solution for every problem. There are two specific cases for which Node is a bad option: performing heavy computation and interacting with a relational database (https://www.toptal.com/nodejs/why-the-hell-would-i-use-node-js). That said, it was designed specifically for use in distributed networks – an important consideration for research on distributed GIS (Node.js, 2016). This Node.js-based data distribution system does both of those things, but in ways that do not negatively impact Node's efficiency. Given the constraints that the system must be easy to use and should meet researchers where they are comfortable, these two weaknesses can even become strengths, as discussed in Section 4. In addition to the fact that Node's design principles fit our goals well, there are several practical reasons for selecting Node as the server platform. The first of those is Node's fantastic package management system, the Node Package Manager (npm). Built to manage project dependencies, npm allows developers to quickly and easily install any software published on the npm website. After installing the software, npm tracks it without storing the entire package so that code can be easily shared amongst developers. This benefit is especially powerful when combined with version control software like git, as it ensures developers will use the exact same software every time they update their local copy of the project. Node programs adopt a modular design philosophy similar to the Unix design philosophy (https://en.wikipedia.org/wiki/Unix_philosophy), which makes npm indispensable. Instead of writing large programs that offer complete functionality, Node developers usually write software that accomplishes one very specific task, no more. These "modules" are designed to work with others and inside of a larger application. Because Node is a mature and very popular open source project with a very large and diverse community, modules have already been written to accomplish nearly all of the common tasks a server might face. This project makes use of many of those modules, relying on original code only when necessary, improving system reliability and lowering the barrier to entry for future developers who may want to adapt the system to suit their own needs. The project uses the Express web server framework to handle and respond to incoming requests. Express has long been a favorite in the Node community because of its pluggable middleware system, which makes it easy to extend the server's functionality. With minimal configuration, a developer can add support for user authentication, data validation, static file serving, logging, and innumerable other essential functions. Strong documentation and continued community support are two more reasons to choose Express.
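As a small illustration of that pluggable style, the sketch below wires a few common middleware functions into an Express application. The specific modules shown (the community-maintained morgan logger and Express's own JSON body parser and static file helper) and the directory path are examples chosen for illustration, not the system's actual configuration.

const express = require('express');
const morgan = require('morgan'); // HTTP request logging middleware

const app = express();
app.use(morgan('combined'));                          // log every incoming request
app.use(express.json());                              // parse JSON request bodies
app.use('/downloads', express.static('data/zips'));   // serve packaged results as static files

app.listen(3000);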
In reality, however, the server framework is interchangeable. Express was the right choice for this project because it was well documented and easy to learn, but another developer could accomplish the same tasks just as effectively using different modules without much added difficulty. The goal of this project is not to create an innovative new web server, but to give users access to a geoprocessing service and the data it produces. The web server itself only supports this goal; it is not the entire solution – the actual geoprocessing happens outside the web server subsystem.
3.2.3 PostGIS and R
PostGIS is the spatial objects extension for the open source PostgreSQL database server. PostgreSQL is widely used and loved; at the time of writing, DB Engines ranks it as the 5th most popular database in the world, with its popularity growing steadily since 2014 (https://db-engines.com/en/ranking). PostGIS has enjoyed a similar meteoric rise in the geospatial industry as a comprehensive vector data storage and analysis system. More recently, it gained capabilities to work with raster data. The PostGIS Raster Module (https://postgis.net/docs/manual-2.2/RT_reference.html) promises numerous benefits for raster storage and analysis. There are many small but useful features which databases offer that make them preferable to file systems for spatial data storage. One of those is built-in, easy to use data backup. Another is the transaction log, which records all operations performed in the database. This log can be analyzed if something goes wrong with the system and can help bring it back to life. PostGIS also offers easy ways to restrict which database users can perform which operations, allowing developers to create separate "roles" (user accounts) for applications, researchers, and database administrators. These varying permission levels promote system security and protect data from accidents while also improving the usefulness of logs, which record information on who conducted a transaction. All of these useful details can be accomplished with simple file system storage as well, but they are much easier with a database and require a narrower range of skills to implement. PostGIS also offers several big-picture benefits, one of which is data organization. The models which generate the error realization data provided by the data distribution system rely on several spatial datasets. Using PostGIS allows us to keep all of that data in one central place, accessing the data through a common interface. This helps keep code simple and readable. While the same thing could be accomplished using just the file system, PostGIS offers other benefits that make it a good choice for storing raster data. One of the biggest benefits PostGIS offers is discussed in the "Early Examples of Distributed GIS" subsection of this paper: a centralized, authoritative source for geospatial data which can be accessed from anywhere in an enterprise system. While useful, that benefit does not directly impact this project because the data distribution system itself is the mechanism for providing users with data. However, this system is designed to help data producers quickly move from installation to production with the same tools they are already using. It is likely that those tools include a PostGIS database. The final benefit PostGIS offers this project is the numerous procedure languages available for writing database functions.
Procedure languages like PL/R and PL/Python allow users to create stored procedures in the database using languages other than SQL. That capability is very useful for this project because the scripts originally created to produce error realizations are written using the R language. The ability to wrap those scripts in database functions streamlines calls from the web server to the database server – the web server need only call the function using tools already available for interacting with the database. In addition, it offloads the intensive processing from the web server to the database server, freeing Node’s Event Loop and allowing it work efficiently. PostgreSQL is built to handle these operations, so there’s no concern about computation there, and PL/R allows us to work with a language we are already familiar with. 67 R offers a whole lot more than familiarity and simplicity. It is a very powerful open source tool for statistical analysis and is widely used in the spatial community. While not quite as popular as Python, R has recently seen a meteoric rise in popularity amongst geospatial developers43. This is likely because R offers extremely advanced statistical analysis capabilities which are unavailable in Python44. The original research which created the error models used in our system used R for that very reason. Additionally, R has long been popular in scientific disciplines which touch Geography such as Biology. This makes it a natural choice when trying to work along interdisciplinary lines and increases the likelihood that our project will be beneficial to other researchers who want to distribute their own research results. For these reasons, R is the best language choice to accomplish our goals. However, we will not be using it in isolation. The error model requires a few basic raster-derived inputs: slope, aspect, and MODIS ecoregion, all of which are stored in the database. These and other variables are used in a linear regression to calculate a global mean error surface. This global mean error surface, henceforth called the “Mean Layer”, is the main input to the actual error simulation calculation. The technique used in error simulation is called regression kriging, which uses a linear regression model to calculate the Mean Layer – the average expected error – and then simple kriging to calculate the spatially structured residuals of the linear regression (Hengl, Heuvelink, & Rossiter, 2007). Using native database functions, we can easily calculate the Mean Layer and store it for later use. This makes it much easier to generate error realizations as they are requested, because we can call a single function on a single dataset, significantly reducing database calls and processing time. 43 44 https://blogs.esri.com/esri/esri-insider/2015/07/20/building-a-bridge-to-the-r-community/ https://www.r-bloggers.com/r-an-integrated-statistical-programming-environment-and-gis/ 68 3.3 User Interface So far we have discussed the back end of the data distribution system, but not the part system users will actually interact with. In computer programming, people often use a house as a metaphor to understand the various parts of an web application. The database is the foundation, the back end (web server) is the house itself, and the front end (user interface) is interior design. While people might appreciate the layout of a house and its sturdy construction, they rarely think about these things because they are layers beneath the surface. 
The importance of a good user interface was stressed in both the "A Brief History of Error-aware GIS" and the "Modern Examples of Distributed GIS" subsections of this paper. Developers may build a fantastic system, but if it lacks a good user interface it will not attract many users. Building on the experiences and recommendations of previous researchers, our system implements a simple, intuitive user interface. It employs AJAX technology to ensure the page remains responsive for users. It uses a colorblind-friendly color palette to improve accessibility. Like the database and back end, the user interface is composed solely of open source technologies to ensure that the entire stack can be used by any researcher who would like to implement the system to distribute his or her own work. The following paragraphs describe those technologies and explain why they were selected.
3.3.1 Front End Framework
AngularJS (https://angularjs.org/) is the front end framework we selected for building the user interface. Angular is an extremely popular open source project developed by Google for creating beautiful web applications. For a surprisingly small size, AngularJS offers a lot of additional functionality over other libraries and frameworks commonly used in front end development (https://www.airpair.com/angularjs/posts/jquery-angularjs-comparison-migration-walkthrough). Additionally, instead of following traditional Model-View-Controller or Model-View-Viewmodel patterns, AngularJS is a Model-View-Whatever (as in, "whatever works for you") framework (https://plus.google.com/+AngularJS/posts/aZNVhj355G2), allowing for more flexible, task-oriented development. These benefits impact both developers and system end users in the following ways. Angular is built to create web applications, so its core design principles mandate that the page remain responsive to users at all times. This is a huge user experience benefit, as users are never waiting for a loading bar to fill up. For developers, AngularJS exposes straightforward APIs to implement the AJAX calls which allow that responsiveness. In addition, AngularJS offers two-way data binding that allows a developer to easily manage user input and send it to the server. These are just two of the numerous useful features AngularJS offers developers, but they are the most relevant to this project. Another aspect of AngularJS that is very beneficial to this project in particular and community-driven development in general is that it was designed for testability. In addition to AngularJS itself, Google has led the effort to create and maintain open source testing tools such as Karma and Protractor. They have also supported development on existing test suites such as Jasmine and Sinon. Tests are important because, in practice, they work as de facto standards for your project. Sometimes called specifications, tests are used to determine whether or not code runs as expected. After adding some new feature, developers can run tests to ensure that their changes have not broken some important part of the software. Though it offers far more functionality than the project currently needs, the flexibility, community support, and testability AngularJS offers make it a good platform for growth. This research yielded only a prototype of a data distribution system that, while functional, is a long way off from being used in high capacity production environments. That will change. Currently, our system supports a single processing service.
The system’s architecture does not mandate this in any way, so the system could, in theory, support an entire geoportal like those discussed in the “Modern Examples of Distributed GIS” subsection and offer services for everything from data discovery to data visualization. As the project continues to grow, AngularJS will be able to accommodate that growth. 3.3.2 Web Mapping Library Our application uses the Leaflet48 open source web mapping library. Leaflet is a good choice for this project for several reasons. First, it is a popular, mature open source project that is based on another, even more mature open source web mapping library called OpenLayers. Both are well featured and maintained by very active communities. For our purposes, Leaflet is a better choice than OpenLayers because it is smaller and faster. Leaflet also has a much simpler API than OpenLayers. Since the web map in our application is really only a convenient way to select data we do not need any of the advanced functionality OpenLayers offers, but the speed and simplicity of Leaflet are useful. Of additional benefit to our application is the Leaflet Angular Directive – an open source project that lets a developer easily integrate Leaflet with an AngularJS project. A similar project exists for Open Layers that is also well supported and widely used, but for the reasons described in the previous paragraph, Leaflet remains our library of choice. Pre-existing Directive code is a nice bonus because complicated directives are tough to write. While it is not strictly necessary to include Leaflet in a Directive, it does have some nice features. When using AngularJS, developers write JavaScript following a particular object model to produce what are known as Directives. After writing the JavaScript, directives can be included 48 http://leafletjs.com/ 71 on a web page simply by adding an HTML element. So, instead of writing all of the boilerplate code needed to display a simple web map, a developer can simply include the Leaflet Directive and write “” to easily render the map on the page. Using the Leaflet Directive also helps the map fit in better with the rest of the app’s components. AngularJS also includes the concepts of Controllers and Services. Controllers control how a Directive responds to user interactions, while Services are used to pass data back and forth between various Controllers. In our application, we use the Leaflet Directive and a map Controller to display and control the map respectively. We use a form Directive and Controller to accept and to respond to user input. Finally, the map Service communicates user interaction with the map to the form, and interaction with the form to the map. All of this happens in near real time thanks to the two-way data binding which Angular offers. What this means for the user is that when he or she resizes the bounding box delimiting the data request, the numbers on the form change as the corner of the box is dragged. 72 4 From Planning to Practice The previous section discussed our plan for building the system, the various components that make up our “stack”, and the reasons for choosing them. Developers use the term “stack” to refer to the combination of technologies which power a particular system. For example, the LAMP stack (Linux, Apache, MySQL, and PHP) is a very common setup used to power many software projects. It is the most common stack used for the popular Wordpress blogging platform. 
In the Node.js world, the MEAN stack (MongoDB, Express, AngularJS, and Node) is the most common. These labels are convenient, as they describe the architecture of a system just well enough for people to talk about such systems categorically. In reality, though, there are infinite variations on these common stacks. For instance, it is very common to swap out Apache for another server called nginx (pronounced "Engine Ex") in a WordPress stack. For Node web applications, it is not uncommon to run nginx as the main web server for serving static files, then connect to the local Node application server through some network magic. These implementation details are invisible when discussing technologies at acronym scale. Reducing an application to LAMP or MEAN makes it convenient to talk about, but it also hides the complexity of a system. In other words, it hides many of the things that can go wrong during the development process. Unfortunately, that is the scale at which all of the traditional literature on the topic of distributed GIS discusses the subject. More detailed treatments of system implementation do exist, but only in the grey literature of the web, such as blog posts from system developers or questions asked on help forums like the Stack Exchange family of websites. This section of the paper addresses that gap, and will directly discuss the challenges encountered during implementation and offer some critiques of the technologies used. Finally, it will address how these considerations relate to the research objectives put forth at the beginning of this paper.
4.1 The Back End
4.1.1 PostGIS Lessons Learned
PostGIS is undoubtedly a fantastic tool for geospatial research. It can certainly be a powerful analysis tool, particularly when it comes to vector data. It is a masterful storage solution for raster data, too. It is not, however, a good choice for large scale raster analysis. The PostGIS Raster Module needs to be developed further before it can rival other GIS solutions for the analysis of continuous spatial phenomena. This section discusses some of those shortcomings as they relate to the project. The first of many problems with the PostGIS Raster Module is the result of a frustrating Catch-22. Whether in a file system or a database, the best way to quickly access stored raster data is to cut it up into manageable pieces. A raster with coverage for the entire globe can be extremely large. ASTER GDEM version 2 has 22,702 1° by 1° tiles averaging about 12 megabytes each in GeoTIFF format, translating to about 272 gigabytes of data total (NASA JPL, 2009). SRTM version 4.1 at 3-arcsecond resolution has just under 15,000 1° by 1° tiles averaging about 3 megabytes each in GeoTIFF format, for a total of 45 gigabytes (NIMA, 2000). Obviously, that is too much data to handle at once, so the datasets are broken into smaller, more manageable pieces for storage and transfer. Typically, it is the responsibility of the user to merge these pieces back together if they need more than one of them to perform their research. Common raster data models like the GeoTIFF spatially reference pixels by their location relative to neighboring pixels. The only spatial information in the raster is attached to the upper left pixel, and the locations of all other pixels are defined by the rotation of that first pixel (i.e. degrees from North) and how many pixels down or to the right they are from the upper left corner.
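As a minimal sketch of that referencing scheme for the common north-up (zero rotation) case – with illustrative variable names rather than any particular library's API – the coordinates of a cell are derived from the upper left corner and the pixel size alone:

// Convert a raster row/column position to a map coordinate for a
// north-up raster. originX/originY locate the upper left corner and
// pixelWidth/pixelHeight give the cell size in map units.
function pixelToCoordinate(row, col, originX, originY, pixelWidth, pixelHeight) {
  const x = originX + col * pixelWidth;  // x increases to the right
  const y = originY - row * pixelHeight; // y decreases downward
  return { x, y };
}

A GIS must invert this arithmetic to find which row and column hold the value for an arbitrary coordinate.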
So, in order to spatially locate any point within a raster, a GIS must do the math to figure out how far down the column and across the row to look for the point. This operation becomes slow when it must be repeated for every point of interest in a large raster. Databases can improve query speed by creating spatial indexes, which partition space so that the computer can quickly find the features that fall within an area of interest, using a common tree data structure such as the GiST (Generalized Search Tree), B-tree, R-tree, or KD-tree, to name a few. Because a raster carries only that one explicitly georeferenced point, such indexes are not very useful on large rasters. One of the main goals of a database is to provide access to those data as though they were all together in one file. In order to maintain fast query times, however, the opposite must be true. Instead of combining the tiles, databases cut them up into even smaller pieces. Spatial indexes become much more effective for the minimum bounding rectangle of each small piece. Research on optimal tile size for SRTM and GDEM access in a PostGIS database shows that 100 by 100 pixels is the best for query times (Langley & Shortridge, 2015). These mini-rasters can usually be accessed just as if they were part of a larger whole, with one major exception: any operation which relies on a neighborhood of cells will encounter the edge of a raster much sooner than it would otherwise. In our case, we discovered this while trying to calculate slope on multi-tile areas. Slope and its cousin aspect are important because they are inputs to the models used for generating DEM error realizations. Slope is the first derivative of the elevation surface. To calculate slope, a neighborhood of at least 4 cells (bishop's or rook's case) is required (https://www.usna.edu/Users/oceano/pguth/md_help/html/demb1f3n.htm), though a 3x3 kernel (queen's case) is more common (Verdin et al., 2007). What this means in practice is that we cannot calculate slope at the edge of a raster, because there are not enough neighboring cells. This quickly becomes a major problem for a database composed of 100x100 pixel tiles. Figure 9 shows the results of a slope calculation on a tiled DEM. Fortunately, PostGIS has a tool to counter this problem.
Figure 9. Left: Calculating slope on a tiled DEM – notice the dark colored lines laid across the image in a perfect grid. Right: Calculating slope with ST_Union to avoid edge effects.
By using the ST_Union function, a user can combine those small tiles into a cohesive whole to perform operations that require neighboring cells. This method is also useful for retrieving data for analysis in an external software package such as QGIS or ArcGIS. Unfortunately, there are limits to the size of the area ST_Union can handle. In my experiments with GDEM v2, areas larger than 60 km² caused PostGIS to throw a memory exception, cancelling the process. I attempted to work around this problem by creating custom database functions to iteratively calculate slope for small areas around the globe using the PL/pgSQL language, because it has access to native PostGIS functions. To my dismay, this approach did not solve the problem. Cursors and For Loops – the iterator patterns required to implement the functions I just described – are notoriously slow in databases (https://stackoverflow.com/questions/287445/why-do-people-hate-sql-cursors-so-much). My custom solutions were taking more than 40 hours to run on the State of Colorado alone, so pre-generating a global slope layer was out of the question.
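For reference, the ST_Union approach shown in Figure 9 can be expressed as a single query issued from the Node server. This is only a sketch of the technique: the table name dem, the raster column rast, and the output pixel type are assumptions rather than the project's actual schema, and, as noted above, the query fails on large areas.

const { Pool } = require('pg');
const pool = new Pool(); // connection settings are read from environment variables

// Merge the tiles intersecting an area of interest before computing slope,
// so the calculation does not run into artificial tile edges.
async function slopeForArea(wktPolygon) {
  const sql = `
    SELECT ST_Slope(ST_Union(rast), 1, '32BF', 'DEGREES') AS slope
    FROM dem
    WHERE ST_Intersects(rast, ST_GeomFromText($1, 4326));
  `;
  const { rows } = await pool.query(sql, [wktPolygon]);
  return rows[0].slope; // a single merged slope raster for the requested area
}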
These limitations caused problems for our original plan for generating error realizations on-the-fly because they prevented us from creating and storing the Mean Layer. Without the benefits of a pre-generated Mean Layer, there was little benefit to using the database for any analysis. It became more practical to take an alternative architectural approach which would allow us to continue using R, but without the added difficulty of developing database functions. Instead, we used Node's ability to spawn child processes – operating system processes which Node manages. These can be any process which can be started from the command line, including geoprocessing scripts written in R or any other language.
4.1.2 Redesigning the Architecture
When this project began, one of the main goals was to shift a lot of the processing load to PostGIS to ease integration with the web server and streamline the system. The previous subsection explained that the analytical and data management tools available in PostGIS ultimately were not up to the task. Heavy computation on a Node server is bad. It blocks the event loop and prevents the server from processing incoming requests. When it became clear that wrapping our error realization generation functions in the database was not an option, we were unable to offload that heavy processing. Fortunately, we were able to work around that problem by using Node's ability to spawn child processes which, like any other OS process, can run on their own cores. Node is a single-threaded process, so it only requires one core for itself. Any remaining cores can be used however the server needs, be that for handling database connections or geoprocessing. Under the original architecture, those cores would be consumed with database operations. Under the new architecture, they will be used to run the R script we use to generate error realizations. Each running instance of the script pauses its own execution to pull in data from the database on its own thread. Those processes run completely independently of the web server, allowing it to achieve maximum efficiency under either architecture, so the redesign should not negatively impact performance. What will degrade performance under the new architecture is the increased amount of geoprocessing and database calls required to produce the data, but because we were unable to produce a global Mean Layer, that was a reality we faced anyway. Frustrating though it was, I believe this roadblock ultimately proved beneficial to the project because the new architecture further separates the distribution mechanism from the geoprocessing mechanism. The modular nature of the new architecture allows the model code to continue evolving separately from the distribution system. It also provides the system with the flexibility to let developers plug-and-play with different geoprocessing scripts. One consequence of that separation is that a script must be able to access the data it needs on its own, but unless written to run within a desktop GIS environment, most geoprocessing scripts already do that. As a result, researchers probably need to change very little of the business logic in their scripts before attaching them to the distribution system. The nature of those changes is further discussed in the "Connecting the Two" subsection.
4.2 The Front End
Unlike the back end, front end development proceeded exactly as planned. The AngularJS framework proved very easy to work with and readily handled our use case.
The reasons for choosing AngularJS were described in the "Front End Framework" subsection of this paper, so this subsection will instead focus exclusively on implementation details. It begins with a description of the user interface, moves on to discuss the UI Router module and its role in the project, and concludes with a discussion of user experience considerations.
4.2.1 User Interface
The system's user interface is designed as a Single Page Application (SPA). SPAs are built to look and feel like a desktop app with the intent of improving overall user experience. Users do not have to follow a series of links to navigate through the website. The same page they land on is the page they stay on throughout their time using the system. The layout of the application is similar to that of other mapping applications like Google Maps or Esri's Story Maps, also examples of SPAs, with a sidebar on the left taking up approximately 30% of the screen and a map on the right taking up the rest. This layout emphasizes the map as the primary tool of discovery and leaves the sidebar to provide information only when the user seeks it. Most importantly, it is a layout very familiar to users because it is commonly used in web mapping applications. Figure 10 shows a screenshot of the interface. Across the top in green are the three panel views: About, Help, and Data. The About panel just provides information about the project for the curious explorer. The Help panel answers some questions we have anticipated users might have, such as "How do I use this website?" and "How do I draw a shape?". The Data panel is open when a user lands on the page, and provides the data request form as well as a brief description of how to use the page and a link to the Help panel should the user require more information. On the Data panel, just below the description and above the form, are blue buttons which the user may click to either "Draw a Shape" or "Use View" to fill in the bounding box information required by the form. When a user clicks on either, the grey message at the top changes to a help message about how to use that method to complete the form. In addition, a red box appears on the map to show the selected area. Users may also fill out the bounding box manually if they know the coordinates and then verify their selection using the "Plot Area on Map" button at the bottom of the form. After filling out the form, a user may either submit their data request with the "Submit" button, or start over by pressing the "Reset" button. There's nothing complicated about our UI, and that's exactly the point.
Figure 10. The Data panel of the user interface allows users to submit data requests to our system.
4.2.2 User Experience
A good user experience is about more than intuitive, good looking design. While both of those qualities are important, good design both anticipates and enforces intuition. It is also inclusive. A user interface cannot be well designed if it fails to consider accessibility for vision impaired users. This subsection describes how our user interface addresses those concerns. We adhere closely to HTML standards for form design and use label elements to describe each input element. This ensures that visually impaired users who rely on screen readers will still be able to access and use the application.
We also considered colorblind users in our design, ensuring that the colors selected for warnings and success messages were distinct enough to catch the eye regardless of which type of colorblindness a user may have. These seem like small concerns, but they are not for the millions of people who deal with them every day. It is these subtle choices that make good interfaces. Another subtle but simple way to improve user experience is by reinforcing user interaction. This is especially important for SPAs which, by design, do not reload when a user clicks a button or provide other signals that user input has been received. Instead, a developer must conceive of ways to explicitly signal that the application is registering the user's actions. In our user interface, every interaction with the system is acknowledged and reinforced. For example, we use client-side validation to inform the user if the values they have entered are incorrect. If a user supplies impossible coordinates for the bounding box, the form will request that they enter a value between -180 and 180. All fields of the form are type and content validated in a similar manner to improve user experience. Additionally, the "Submit" button at the bottom of the form remains disabled until all the fields are filled and validated. After the user submits the request, the button is disabled again to prevent the user from accidentally submitting the request more than once. Figure 11 demonstrates what these features look like for a color blind user – note that while different from the colors in Figure 10, they are still very distinct. In addition, the "Submit" and "Plot Area on Map" buttons are disabled because the form is incomplete.
Figure 11. This image shows the user interface with a color blindness filter applied and form validation feedback showing.
The Data panel is not the only part of the interface for which reinforcement is important. Users should also be guided between panels. UI Router allows our interface to avoid a common pitfall of SPAs, which is poor navigation capabilities. Because SPAs are just a single web page, the URL in the address bar typically does not change when you navigate through the application. For example, when switching between the panels of our sidebar, the URL would not change to reflect which panel the user was currently viewing. UI Router allows us to do exactly that, using the address bar to signal when the panel has changed. In addition, it allows the user to share a link to that specific panel; visiting the URL for the Help panel overrides the interface's default behavior of showing the Data panel first. Deep linking also provides great opportunities for extending the user interface in the future. As this project matures, it is likely that we will continue adding new services. At some point, we may need to redesign the interface to allow for a catalog of possible services. UI Router allows us to deeply link each of those services within the application, allowing a user to bookmark a particular service so that they can quickly and easily find it in the future.
4.3 Connecting the Two
4.3.1 Feeding User Input to R
After the failure of the original system architecture plan, we had to rethink how we would manage user input in our new environment. Some digging online revealed a way to use Node to manage a batch of long-running geoprocessing tasks (https://contourline.wordpress.com/2013/10/08/700/). The API for creating child processes in Node allows a developer to pass environment options, including global environment variables.
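As a rough sketch of that mechanism – with the script name generate_realizations.R, the environment variable names, and the parameter object serving only as placeholders – launching the R script from Node with request-specific environment variables might look like this:

const { spawn } = require('child_process');

// Run the R geoprocessing script as an operating system child process,
// handing it the user's request parameters through environment variables.
function runRealizationJob(params, done) {
  const child = spawn('Rscript', ['generate_realizations.R'], {
    env: Object.assign({}, process.env, {
      XMIN: String(params.xmin),
      YMIN: String(params.ymin),
      XMAX: String(params.xmax),
      YMAX: String(params.ymax),
      N_REALIZATIONS: String(params.realizations)
    })
  });
  child.on('close', code => done(code === 0 ? null : new Error('geoprocessing job failed')));
}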
Our new system architecture works the same way, except that it parses user input to set the values of the variables rather than pulling them from a pre-created list. We rely on an asynchronous task queue to execute our geoprocessing script with its unique environment variables. The following paragraphs describe how that works. When a user requests data, the server may take anywhere from a few minutes to several hours to complete the request. In either case, that's far too long to wait for a response from the server. To avoid that problem, we implement what is commonly called a worker function. Worker functions just set up background processes. In JavaScript, functions are first class objects. This means that they, like any other object, can be put into an array, assigned to a variable, or even passed to another function as an argument. This includes worker functions. When a data request comes in, the server receives it, validates it, supplies the request data to the worker function, then inserts the worker function into a process queue. When system resources are available to execute the process, the queue calls the worker function, which begins processing the data based on the request parameters. After the task is complete, the worker process emits an event to tell the system it is done processing and ready for the next task. The "done" event in turn triggers the auto-mailer, which sends an email to inform the user that the data are ready for download.
Figure 12. This image shows the user interface with a color blindness filter applied and form validation feedback showing.
One important thing to note about the process described above is that the worker function can call processes in other languages. That is hugely beneficial to the goal of meeting researchers where they are comfortable. Not only can end users work with the resulting data in whatever workflow they are already using, but any researchers who want to use the system to distribute their own work can use their existing code with minimal modifications. While most other server platforms offer the same capability, none of them can match Node's ability to handle so many connections efficiently. Though we only tested the server with single-threaded operations, we anticipate that, so long as a developer using a multithreaded script accounts for it when configuring the server, the system should handle it just fine. Thanks to Node and child processes, the system is easy to use both as a data producer and a data consumer. In addition, it certainly meets people where they are comfortable, allowing them to work in languages they are already familiar with. These characteristics match the design philosophy of this project well and are a big part of the reason we retained Node as the server platform after the original architectural plan proved to be a failure. Another implication of the new architecture is that the system could easily be extended to run multiple scripts, allowing the developer to expose many geoprocessing services via the same web server. This would move the system much closer conceptually to service-oriented portals like GeOnAS. As discussed in the previous subsection, the user interface is also ready for extension after the system becomes production ready. Had the original plan for the system's architecture prevailed, the system would have none of the flexibility worker functions offer by loosely coupling data processing and the web server.
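Before turning to security, a minimal sketch of the queueing and notification flow described above is shown below. It assumes the async package's queue, the runRealizationJob() function from the previous sketch, and a hypothetical sendDownloadLink() mailer; the concurrency limit of eight matches the limit discussed in Section 4.4.

const async = require('async');

// Process at most eight geoprocessing jobs at a time; each task carries the
// validated request parameters supplied by the web server.
const jobQueue = async.queue((params, callback) => {
  runRealizationJob(params, callback);
}, 8);

// The web server simply pushes each accepted request onto the queue.
function enqueueJob(params) {
  jobQueue.push(params, err => {
    // Once the child process finishes, notify the requester by email.
    if (!err) sendDownloadLink(params.email);
  });
}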
4.3.2 Security concerns
In today's world, security must be a primary concern for any network-enabled system. Recent high-profile attacks on companies like Sony (https://www.washingtonpost.com/news/the-switch/wp/2014/12/18/the-sony-pictures-hack-explained/) and organizations like the Office of Personnel Management (http://nbcnews.to/1GlS9jm) have proven that no one is safe. Computer security researchers have even proven that attackers can take over a car as it is driving down the road (https://www.wired.com/2016/03/fbi-warns-car-hacking-real-risk/). Clearly, an analysis of potential security risks is of critical importance. The system described over the course of this paper has a very small "attack surface". The term attack surface refers to "all of the different points where an attacker could get into a system, and where they could get data out" (https://www.owasp.org/index.php?title=Attack_Surface_Analysis_Cheat_Sheet). Our system has exactly one entry point – the data request endpoint to which users send POST requests – and exactly one exit point – a static file server which only distributes data pre-packaged by the geoprocessing script. All other data manipulation takes place in the geoprocessing script or the server code, which cannot be accessed from the outside. While that makes an attack unlikely, it most certainly does not make it impossible. Any time a server accepts raw user input, security must be a concern (https://www.owasp.org/index.php/Input_Validation_Cheat_Sheet). That concern is especially valid when that user input is used to make database calls. Attackers can provide malformed input which can give them access to the database or even the entire system. The best way to avoid those problems is by rigorously validating any user-provided data on both the client and server side of the application. Another attack to which our system is vulnerable is the Distributed Denial of Service (DDoS) attack. This type of attack involves flooding a system with incoming requests and is a favorite tactic of hacktivist groups like Anonymous because such attacks are virtually impossible to defend against (https://www.wired.com/2016/01/hacker-lexicon-what-are-dos-and-ddos-attacks/). The only way to tell the difference between valid and malicious requests is to examine how quickly an IP address is making the requests. A common practice is to block requests occurring in rapid succession, e.g. less than one second apart, because humans do not typically make requests that quickly. Even then, the server has to do the work of analyzing and rejecting those requests. Fortunately, DDoS attacks are very unlikely to cause actual damage to a system. They do not help attackers gain control of a system; they only bring it to a crawling halt. The system would experience the same results should a large number of people suddenly decide to start using the system at once. Because of the obscurity of its purpose (security through obscurity, as they say) and the fact that it is a prototype, our system architecture does not have capabilities for defending against a DDoS attack. The final security consideration for any system must be ensuring user privacy, and ours is no exception. For the ease of development, the system currently uses HTTP to transfer data. In a production environment, the system should use HTTPS so that traffic between the server and client is encrypted.
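As a sketch of what that switch might involve with Node's built-in https module – the certificate and key paths shown are placeholders for files issued by a certificate authority – the change to the server is small:

const fs = require('fs');
const https = require('https');
const express = require('express');

const app = express(); // the existing Express application and its routes

// Read the certificate and private key provided by the certificate authority.
const options = {
  key: fs.readFileSync('/etc/ssl/private/server.key'),
  cert: fs.readFileSync('/etc/ssl/certs/server.crt')
};

// Serve the same application over an encrypted channel.
https.createServer(options, app).listen(443);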
We may not be requesting credit card information over an unsecured channel, but without SSL (Secure Sockets Layer) encryption attackers could easily read the email addresses sent in data requests. The switch from HTTP to HTTPS is trivial and simply requires registration with a certificate authority such as Let's Encrypt (https://www.letsencrypt.org) and making the provided certificate available to the server. Node and Express have built-in methods for handling HTTPS, so only minor adjustments to the server code will be required.
4.4 System Performance
In any computer system performance is important, but that holds especially true for web services. On a desktop computer, the user is the only one that controls the load on the system. If he or she is running a computationally intensive analysis which makes the computer unusable, the only person inconvenienced is himself or herself. For systems like ours which open themselves to the Internet, anyone can place a heavy load on the server. This is the entire point of the DDoS attacks discussed in the previous subsection. If our system goes down, it could affect anyone who has unprocessed data requests waiting for computation resources. But understanding our system's performance is not only about protecting users; it is also about affording them the best possible experience. By testing our system's strengths and weaknesses, we can learn how to optimize its performance. For example, if we know that our system processes realizations faster when there are fewer jobs processing at once, we can find the sweet spot between concurrency and speed which allows us to serve the most users as quickly as possible. Both to protect ourselves from the possibility of a crash due to overloading and to tune our system for maximum efficiency, we need to know exactly what our system is capable of. This section is about discovering its limits and describing its operational characteristics. There are three main factors which determine our processing time: the number of realizations generated, the size of the requested area, and the number of jobs running at once (concurrency). Based on the performance of our R scripts before using them in the context of the data distribution system, we have a few hypotheses as to how each will affect our system. First, we believe that the number of realizations has the least impact on overall performance, as Dr. Shortridge's experience with the code showed him that the most computationally expensive part of the process is variogram estimation. Since the variogram model only needs to be estimated once for a given area, it should be fairly trivial to produce large numbers of realizations for that area. The next consideration is the size of the requested area, which we believe greatly impacts performance. Our script has a problem with large areas because of the way we retrieve data from our PostGIS database. Some of the same limitations that prevented us from calculating the global Mean Layer inside the database using PL/pgSQL and PL/R prevent us from efficiently pulling those data into our R script. Tiling the database is necessary to maintain fast query times, but as discussed in Section 4.1.1 it causes problems when recombining the data. In PostGIS, we needed to use the ST_Union function, which could handle only small areas. Our R script avoids ST_Union by manually stitching together the tiles returned by our database query.
Unfortunately, this has its own performance problems which, in the end, are very comparable to ST_Union's. As a result, we expect that the size of the requested area will have a large negative impact on system performance. The final factor in our system's performance is concurrency – how many geoprocessing jobs are running at any given time. We expect to see that, regardless of the fact that each geoprocess the server spawns has its own dedicated CPU, the greater the number of running jobs, the poorer the system's performance will be. We attribute this to the fact that memory is still shared amongst all the processes. For that reason, we currently limit the number of concurrent processes to 8, preventing our server from ever coming close to its full capacity. The physical machine we run our application on has 64 cores and 64 gigabytes of memory. Based on what we observed in development, our processes usually use about 2 GB, but as much as 4 GB, of memory per process. By limiting the number of jobs to 8, we ensure that our application never consumes more than 50% of the machine's memory. This is important, as the server is a shared resource on which other researchers rely. Testing our system's capacity should help us discover if 8 jobs is really the best balance of capacity and speed. To test our system's capacity, we randomly generated 50 data requests for each of three different sized study areas and logged the processing time of each to a file. We used JavaScript's Math.random() function (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/random) to pull from a uniform distribution to get both random locations within the Continental United States and a random number of realizations between 50 and 500. The idea is that these conditions simulate the actual load a system might face. Not everyone is requesting the same number of realizations for the same area, so our method allows us to vary the request parameters in a way similar to what we expect to see from users. We tested square patches that were 360, 540, and 720 arcseconds on a side and compared run times both within their respective groups and on average across all groups.
Figure 13. Run times in hours across all 50 jobs for 720 arcsecond patches.
Figure 14. Run times in hours across all 50 jobs for 540 arcsecond patches.
Figure 15. Run times in hours across all 50 jobs for 360 arcsecond patches.
Figure 16. Average time to complete a job by size of requested area.
Based on the results of our tests, we feel confident that our hypotheses about size and concurrent processing are correct. Figures 13, 14, and 15 show the individual task run times for each group. Regardless of the area's size, processing times increased as more jobs were completed, suggesting that the number of concurrent processes impacts processing time. Ideally we would confirm that hypothesis with tests which slowly incremented concurrency over time. Unfortunately, such tests proved too difficult to create in the limited time between system development and the publication of this thesis, but they would be very interesting to see and may be worth the effort in the future. At first, it seemed that the numerous peaks and valleys across all of the sizes might be related to concurrency as well, but the way the test scripts are written ensures that 8 jobs will always be running until fewer than 8 jobs remain in the process queue.
Instead, they may suggest that the number of realizations requested matters more than we initially thought. Further testing should be done to see how incrementing the requested number of realizations affects processing time.

Finally, Figure 16 compares average processing times across request sizes. As expected, size severely impacts performance. There appears to be an exponential increase in processing time for linear increases in the size of the requested area. While it would have been nice to test this with larger sizes, the already long run times threatened to become prohibitive. The long run times are due to the limitations of the database. As described in Section 4.1, the tiling process required for storing large datasets is problematic when it is necessary to recombine the tiles for analysis, as it is in our case. Because of the memory limitations of the function PostGIS uses to recombine tiles natively, our error modeling script recombines them manually. Unfortunately, due to the way PostGIS provides access to the underlying data, our script suffers the same limitations. As with the database, our script will attempt to perform the requested operation regardless of whether or not it can be successfully completed. Until the data retrieval logic is improved, we should certainly limit the size of the area which users can request, both to avoid excessive stress on the system and to prevent users from entering data requests which cannot be fulfilled.

5 Using the System

Building a system to distribute error realizations only completes the first research objective. To truly empower researchers to get more accurate results from their work, the next step is to show them how to use the error realizations in their own work. The literature surrounding Spatial Decision Support Systems (SDSS) frequently considers uncertainty. Indeed, literature related to handling uncertainty within SDSS goes all the way back to the 1990s (Crossland, Wynne, & Perkins, 1995). One of the seminal papers on error propagation and uncertainty comes from the SDSS literature (Aerts, Goodchild, & Heuvelink, 2003). As such, this literature will play a critical role in the development of our own error-aware analysis.

In developing our analysis scripts we will continue to use only FOSS, as this ensures that they will be helpful to the greatest number of users. Hengl, Heuvelink, and Van Loon (2010) provide an example of an analysis tool created using FOSS that includes stochastic simulation, using the R Statistical Package (R Core Team, 2016) for stream network derivation. Darnell, Tate, and Brunsdon (2008) also use R to implement a FOSS model for stochastic simulation and error visualization. However, both of these examples leave something to be desired because they rely too heavily on the user having certain statistical knowledge. In reality, most of "these users undoubtedly lack expert knowledge about both the data collection methods employed by the data producer and the spatial simulation model theory and implementation in vogue with spatial information scientists" (Goodchild, Shortridge, & Fohl, 1999). Our system avoids that problem, as the analysis below reveals.

5.1 Monte Carlo Viewshed Analysis

In order to demonstrate how researchers might use our data distribution system, we looked to basic but common analyses which GIS users perform using DEMs. One of those, viewshed analysis, is an ideal operation for demonstrating the necessity and simplicity of performing uncertainty-aware analyses.
Given the location of an observer and a DEM, a basic viewshed analysis determines whether or not the land represented by a given cell in the DEM is visible to the observer. Uncertainty in the DEM can easily lead to bad results. Our data distribution system promotes a type of uncertainty-aware analysis known as Monte Carlo analysis, so named after the well-known casinos of that municipality. Monte Carlo analysis seeks to reduce uncertainty by anticipating all possible outcomes of a particular operation and the probability that those outcomes will occur (http://www.investopedia.com/articles/financial-theory/08/monte-carlo-multivariate-model.asp). In order to do that, a researcher needs to vary the inputs to their analysis based on a set of conditions that reflect possible variations in the data – a process known as conditional stochastic simulation, which we discussed in the Error and Uncertainty section of this thesis. Our system removes the difficulties associated with gathering those simulated inputs, letting researchers skip straight to the analysis phase.

Monte Carlo viewshed analysis is not very different from a regular viewshed calculation. In fact, a Monte Carlo viewshed analysis is composed of many regular viewshed analyses. In the context of our system, a Monte Carlo viewshed analysis performs a regular viewshed analysis for every DEM Error Realization the user downloads. After all realizations have been processed, the researcher takes the mean of all the outputs to create a viewshed probability surface. Instead of values of either 1 or 0 (visible or not visible), the viewshed probability surface contains values between 1 and 0 reflecting the probability that a particular DEM cell is visible from the observer's location. For a graphical representation of the process, refer to Figures 3 and 4 in Section 2.1.3.

5.1.1 Changing the Data Model

Monte Carlo analysis requires a fundamental rethinking of the traditional data model. In addition to the traditional combination of attribute data, geometric data, projection data, and metadata, the Monte Carlo paradigm requires a fifth component: an uncertainty model. Thanks to our system, researchers no longer need advanced geostatistical knowledge to perform Monte Carlo analysis. Unfortunately, it does not free them from the constraints imposed by the inability of mainstream GIS software to handle a new data model. Ideally, an error-aware GIS would support the ideas of Goodchild, Shortridge, and Fohl (1999) or Heuvelink, Brown, and van Loon (2007), who discuss the possibility and logistics of actually including uncertainty information in the data model. As this paper has thoroughly explained, while such software does exist, it is not accessible to the majority of users and is far from mainstream. As discussed in Section 3.1.4, our system works around this problem by changing the data model at a conceptual level rather than at the implementation level, enabling the use of mainstream tools for Monte Carlo analysis. Instead of encapsulating a simulation model within the data and requiring that users know how to produce realizations, we distribute only the model outputs. Conceptually, both methods deliver numerous versions of a single dataset. However, the latter only requires the user to adjust his or her thinking, while the former requires a software redesign.
Therefore, the latter is more likely to actually be used by researchers because it allows them to use tools they are already familiar with and requires minimal adjustment to existing analysis scripts. To demonstrate this, we wrote a simple Monte Carlo Viewshed Analysis script using R and GRASS GIS.

5.1.2 An Example Analysis with R and GRASS

We selected R because it is FOSS and available around the globe in many languages, maximizing the likelihood that researchers can apply this example to their own work. In addition, it is available for download free of charge and has a very active development community. As discussed previously, it is also widely used in the geospatial community. Finally, R can be used as a geoprocessing engine in QGIS – the most popular open source GIS in the world. Eventually this example could be fully integrated with the QGIS GUI, making it easy for anyone to use an uncertainty-aware GIS. These reasons, combined with the fact that we are already using R for our DEM Error Realization scripts, make R a natural choice for our application example.

The script we developed follows the method described at the start of this section. The code is free, open, and available online via the Help section of the distribution system. It relies on the very useful and greatly appreciated work of the GRASS Development Team (2016) for GRASS GIS; the R Core Team (2016) for the R language; Roger Bivand (2016) for the rgrass7 package, which connects the two; Pebesma and Bivand (2005) for the sp package; Hijmans (2016) for the raster package; Bivand, Keitt, and Rowlingson (2016) for the rgdal package; and Shum and Akimov (2015) for the hashids package.

Monte Carlo viewshed analysis is possible using these packages and some simple file manipulation logic. After downloading and extracting DEM error realizations for the desired study area, they should be projected, because the realizations are distributed in the WGS84 geographic coordinate system. The r.viewshed algorithm used in the analysis relies on map units to calculate slope, and degrees can be a problematic unit in that calculation (GRASS Development Team, 2016). The script provides a function to easily convert all realizations from EPSG:4326 to the desired coordinate reference system. After that, a user simply needs to supply one or more observation points in a SpatialPointsDataFrame object, the name of a column in that object to use for naming the output viewshed(s), the path of the DEM error realizations directory, and the path to the desired output directory. Behind the scenes, the code performs a viewshed calculation for each point using each DEM Error Realization and returns a single raster that is the mean of all viewshed calculations. A simplified sketch of this workflow appears below.

The analysis itself is only important as a proof of concept. The "study area" selected was one of the randomly selected test patches created during the system tests described in Section 4.4 and covers an approximately 11 square kilometer area near Modesto, California, USA. What is meaningful is how simple it becomes to conduct Monte Carlo analysis when the user no longer needs to create, tune, and run error simulation models to create DEM error realizations. The script also reveals the implications of the changes Monte Carlo analysis requires, accepting a directory of inputs rather than a single file. For comparison purposes, Figure 17 shows the final result of the entire analysis alongside one of the intermediate viewshed outputs used in its calculation.
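The actual script is distributed through the system's Help section rather than reproduced in this thesis, so the sketch below is only an illustration of the workflow just described, not the author's code. It assumes the realizations have already been reprojected out of EPSG:4326, and the GRASS installation path, file locations, observer coordinates, and map names are all hypothetical.

```r
# Illustrative sketch of a Monte Carlo viewshed: one binary viewshed per
# DEM error realization, averaged into a viewshed probability surface.
library(rgrass7)
library(raster)

# Hypothetical directory of realizations already projected to map units.
dem_files <- list.files("realizations_projected", pattern = "\\.tif$",
                        full.names = TRUE)

# Start a throwaway GRASS session; gisBase is system-specific.
initGRASS(gisBase = "/usr/lib/grass72", home = tempdir(),
          gisDbase = tempdir(), location = "mc_viewshed",
          mapset = "PERMANENT", override = TRUE)

# Match the location's projection to the first realization.
execGRASS("g.proj", flags = "c", georef = dem_files[1])

observer <- c(700000, 4170000)  # hypothetical projected coordinates (x, y)

viewsheds <- vector("list", length(dem_files))
for (i in seq_along(dem_files)) {
  # Import realization i and set the computational region to it.
  execGRASS("r.in.gdal", flags = c("overwrite", "o"),
            input = dem_files[i], output = "dem")
  execGRASS("g.region", raster = "dem")

  # Binary viewshed (-b flag): 1 = visible, 0 = not visible.
  execGRASS("r.viewshed", flags = c("b", "overwrite"),
            input = "dem", output = "vs", coordinates = observer)

  viewsheds[[i]] <- raster(readRAST("vs"))
}

# The cell-wise mean of the binary viewsheds is the probability surface.
probability <- mean(stack(viewsheds))
writeRaster(probability, "viewshed_probability.tif", overwrite = TRUE)
```

The real script wraps this loop in functions that also handle the reprojection from EPSG:4326, accept multiple observer points from a SpatialPointsDataFrame, and name the outputs from a user-specified column; exact behavior will also vary with the installed GRASS and rgrass7 versions.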
Returning to Figure 17, note the differences in the values displayed in the legends and in the maps' appearance – the intermediate result implies authority and truth by declaring cells either visible or not visible, while the final result reflects the fuzziness of the actual data. The intermediate viewshed in Figure 17 is just one of the 107 viewshed analyses (one for each DEM Error Realization) conducted on this test dataset. While it looks very similar to the Monte Carlo viewshed at first, the two have very different interpretations. In the intermediate result, cells may be either visible (value of 1) or invisible (value of 0) to the observer. The Monte Carlo result instead shows the percentage of intermediate viewsheds in which a particular cell is visible – the higher a cell's value (the lighter its color), the more frequently it is visible. Even the cells with the highest probabilities of being visible to the theoretical observer are invisible in about 10% of the intermediate analyses, clearly demonstrating the uncertainty inherent in SRTM v4.1. Without our system to provide easy access to DEM error realizations, this is the level of uncertainty researchers are absorbing. Rather than relying on abstract measures of accuracy stored in an associated metadata file to help them decide whether or not SRTM is fit for their use, they can visually interpret the uncertainty – a much more intuitive approach. This is just one of the numerous benefits Monte Carlo analysis has to offer.

Figure 17. A comparison between the final result of the Monte Carlo Viewshed Analysis script and one of its intermediate outputs.

6 Conclusions

Though many obstacles remain on the path to bringing uncertainty-aware spatial analysis to mainstream GIS users, the web-based data distribution system described in this thesis represents a step in the right direction. First and foremost, it accomplishes the primary research objective of developing a system for distributing DEM error realizations over the Internet. Second, we succeeded in creating a relatively simple example of how to incorporate Monte Carlo analysis into basic GIS operations to help researchers understand how they might apply it in their own work. We did both of these things using exclusively open source technologies, ensuring that anyone can benefit from them. Finally, we explored the implications of the stochastic paradigm for the geographic data model in a wide variety of contexts, from systems architecture to applied analysis. Though our system changes the data model at a conceptual level rather than an implementation level, it still presents some challenges for both users and developers. Nevertheless, it is the best way to address the data model problem, given that mainstream GIS software does not handle the stochastic paradigm well and that other software, though it exists, is both hard to acquire and difficult to use.

The key driver behind our primary research objectives – the reason for doing this research in the first place – is making it easier for geospatial researchers to consider the impact of uncertainty on their work. Our new system is the first uncertainty-aware analysis tool that truly offloads the burden of creating, tuning, and running the complicated geostatistical models which produce our DEM error realizations from end users and onto geostatistical experts. In the spirit of our core design principles, we have definitely made it easy and made it magic. We feel that the simple analysis scripts discussed in Section 5 in general, and Section 5.1.2 in particular, demonstrate that.
Despite all these successes, there were most definitely some disappointments as well. Without question, the biggest of those disappointments was the inability of PostgreSQL and PostGIS to meet our analysis needs. It proved extraordinarily difficult to work with the data the way we needed to, likely because SQL is a declarative language and lacks the functional control of languages typically used for analysis, such as Python and R (https://neo4j.com/blog/imperative-vs-declarative-query-languages/). Once we finally had code that worked for generating the Mean Layer, we quickly discovered that it would not scale to the size we needed. Far from being able to generate a Mean Layer for the entire globe, our queries could not even handle all of Colorado. Even after abandoning our hopes of storing the Mean Layer in the database, PostGIS still caused us performance problems, as discussed in Section 4.4.

Admittedly, being forced to change our architectural plans ended up being a fortuitous development, even though the process of discovering PostGIS's limitations was frustrating. The major advantage of the new approach is that it allows a researcher to, with minor adjustments, use the same code they developed in their research to create the final product. Python, R, or anything else that can be run from the command line and access system environment variables will probably work just fine. While this problem has a silver lining, others do not. Initially, we had hoped to develop more than an example for researchers to follow. At the beginning, we set out to create script tools capable of integrating with QGIS' user interface, making it even easier for researchers to use our system. Had we succeeded, a researcher would not even need to be familiar with R to use our code to perform his or her own Monte Carlo viewshed analysis. We also wanted to add at least two more Monte Carlo analysis tools (Slope and Aspect) to the portfolio, but the time spent trying to work around the database ultimately prevented us from achieving those goals. While disappointing, this failure was not total, as we still provided at least one example. In addition to these minor issues, this project also suffered from some major limitations.

6.1 Limitations

Without question, the biggest limitation to this research was our extremely limited back end development experience. Researchers in Geography are usually not concerned with matters of web server architecture, so online resources were the best we had. While the documentation for all of our selected modules and libraries is very good, there is no substitute for practical experience in software development. While not catastrophic, the implications of our inexperience certainly affected the project and cost us more than just time. For example, despite knowing the importance of testing and selecting frameworks known for their testability, our code has no test coverage whatsoever. This was not for lack of consideration or effort, but simply because we did not have enough experience to know how things were supposed to work, learning as we went along. Automated tests formally codify an application's logic, running a set of functions which determine whether the application is actually doing what the developer thinks it is – whether it is doing what it was designed to do. Automated software testing is very important, particularly in open source projects in which contributors may rarely meet face to face, because it defines a set of rules for the application to follow.
If a developer makes a change to the codebase, they can simply run the tests to ensure that their modifications have not negatively affected the system before sharing their changes with others. Not having these tests puts limits on the future of our project, and they need to be developed before any of the items discussed in the "Future Directions" subsection can be pursued.

Another limitation was our stubborn insistence on using PostGIS for our analysis, which cost us time on every other aspect of the project, including testing. The author's inexperience with web architectures led him to believe that it would be the only easy way to achieve the system objectives discussed in Section 3, and the ever-elusive solution to our query problems always seemed to be just around the corner. Even when we finally found a solution, it was not good enough to accomplish our goal of generating the global Mean Layer within the database. We would have seen far more return on our invested time and effort had we turned to other solutions, like Node's child processes, earlier. Had we settled on a different architecture sooner, we might have had time to write our tests, complete all of the QGIS tools we had originally hoped to create, or implement a standards-compliant service interface like JSON-RPC. PostGIS is a proven tool for data storage and vector-based geospatial analysis, but we strongly recommend against using it for advanced raster analysis.

The final limitation of this project which we will discuss is also a testament to its uniqueness and importance: no comparable system exists against which we can measure our system's performance. While Section 4.4 discusses system performance, there is no industry benchmark against which we may compare our results. To our knowledge, there are not even similar systems written in other languages which might truly compare to ours. Certainly Heuvelink and Brown's Data Uncertainty Engine is in the same class, but it has very different functionality and, according to a 2010 paper, the only way to obtain a copy is to contact the authors directly and request it. Without building a second system in a different language, we have no way to objectively assess the benefits one stack may provide over another. Obviously, we did not have the time to develop a second system purely for comparison purposes. While very understandable, this limitation remains: because we only created one server, we cannot really test it against an implementation in another language or one built with different technologies.

6.2 Future Directions

Though we succeeded in developing a prototype, the results do not constitute a production-ready system. The software was written by amateurs, for amateurs, and it needs a lot of work from people more experienced with web application design. Having recently accepted a position as a software developer with a company that uses a stack very similar to the system described in this thesis, the author is confident he will soon have the experience to make good decisions about the system's architecture and continue its development. One of the first goals for that improved architecture must be the adoption of a standardized connection interface such as OGC's WPS or JSON-RPC. Which one will be best depends largely on decisions about how the architecture should grow to interface with new clients. The main purpose of employing a Web Services standard is to allow third parties to confidently develop clients for the service we provide.
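As noted in the next paragraph, one such third-party client could be written in R. The sketch below is purely illustrative: the endpoint URL, method name, and parameters are hypothetical and do not correspond to the system's current API; it simply shows the shape of a JSON-RPC 2.0 call made with the httr package.

```r
# Illustrative only: a hypothetical JSON-RPC 2.0 request from an R client.
library(httr)

rpc_body <- list(
  jsonrpc = "2.0",
  method  = "requestRealizations",              # hypothetical method name
  params  = list(xmin = -121.1, ymin = 37.5,    # hypothetical request extent
                 xmax = -121.0, ymax = 37.6,
                 realizations = 100,
                 email = "researcher@example.org"),
  id = 1
)

# POST the request as JSON and parse the JSON response.
response <- POST("https://example.org/rpc", body = rpc_body, encode = "json")
stop_for_status(response)
content(response, as = "parsed")
```

A WPS client would look different – XML requests against the standard's DescribeProcess and Execute operations – but the benefit is the same: programmatic access that any third party can build against with confidence.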
Of course, a standards-compliant interface also offers other benefits, like chainability and discoverability, depending on which standard is chosen. Currently, our online GUI is the only client designed to access our API. Adopting a standard like WPS would allow interested parties to use existing clients designed to work with those standards; for example, QGIS and ArcGIS are both designed to work with the WPS standard. It would also be possible to connect to a WPS or JSON-RPC service from R. Should we find a good way to generate and store the global Mean Layer, we could transition the service from an entirely server-side architecture to a hybrid client-server architecture like the one discussed by Walker and Chapra (2014). By creating an R package designed around our API, we could transmit the global Mean Layer to the client using the WPS specification and let the package generate realizations on the client side. This would dramatically reduce the processing load on our application servers and improve the user experience, because researchers would no longer have to wait for extended periods to get their data, instead generating it themselves from the much smaller (and therefore faster to transfer) Mean Layer dataset.

After architecture improvements, the biggest goal for future research relating to our system must be spreading knowledge about it, and the best way to do that is by example. Future work must focus not only on applying our system in various analyses, but also on publishing tools that may be reused by others for the same purpose. Ideally, those tools would not be just code, though that would certainly be acceptable. Instead, those works should seek to fill the gap left by this thesis and create tools which leverage QGIS' GUI to improve user experience and reach a wider audience. This would be the best sort of evangelism our system could hope for, and it would certainly drive traffic. The authors of such tools could even provide access to them directly through our system's web interface, making them extremely accessible.

REFERENCES

Aerts, J. C. J. H., Goodchild, M. F., & Heuvelink, G. B. M. (2003). Accounting for Spatial Uncertainty in Optimization with Spatial Decision Support Systems. Transactions in GIS, 7(2), 211–230. http://doi.org/10.1111/1467-9671.00141

Aerts, J. C. J. H., Lin, N., Botzen, W., Emanuel, K., & de Moel, H. (2013). Low-Probability Flood Risk Modeling for New York City. Risk Analysis, 33(5), 772–788. http://doi.org/10.1111/risa.12008

Aerts, J. C. J. H., Clarke, K. C., & Keuper, A. D. (2003). Testing popular visualization techniques for representing model uncertainty. Cartography and Geographic Information Science, 30(3), 249–261. http://doi.org/10.1559/152304003100011180

Arbia, G., Griffith, D., & Haining, R. (1998). Error propagation modelling in raster GIS: overlay operations. International Journal of Geographical Information Science, 12(2), 145–167. http://doi.org/10.1080/136588198241932

Atkinson, P. M., & Foody, G. M. (2002). Uncertainty in Remote Sensing and GIS. http://doi.org/10.1002/0470035269

AWG, I. (2011). Systems and software engineering — Architecture description (ISO/IEC/IEEE 42010:2011).

Bastin, L., Cornford, D., Jones, R., Heuvelink, G. B. M., Stasch, C., Pebesma, E., … Williams, M. (2012). Managing Uncertainty in Integrated Environmental Modelling Frameworks: The UncertWeb framework. Environmental Modelling & Software, (in press), 116–134. http://doi.org/10.1016/j.envsoft.2012.02.008

Beard, K. (1997). Representations of data quality. Taylor and Francis.
Bédard, Y., Gervais, M., Devillers, R., Levesque, M.-A., & Bernier, E. (2009). Data Quality Issues and Geographic Knowledge Discovery. Geographic Data Mining and Knowledge Discovery, (May), 99–115. http://doi.org/10.1201/9781420073980.ch5 Bivand, R. (2016). rgrass7: Interface Between GRASS 7 Geographical Information System and R. Retrieved from https://cran.r-project.org/package=rgrass7 Bivand, R., Keitt, T., & Rowlingson, B. (2016). rgdal: Bindings for the Geospatial Data Abstraction Library. Retrieved from https://cran.r-project.org/package=rgdal Bolten, A., & Waldhoff, G. (2010). Error Estimation of Aster Gdem for Regional Applications Comparison To Aster Dem and Als Elevation Models. In 3rd ISDE Digital Earth Summit. Nessebar, Bulgaria. 107 Bowling, E., & Shortridge, A. (2010). A Dynamic Web-based Data Model for Representing Geographic Points with Uncertain Locations. In N. J. Tate & P. F. Fisher (Eds.), International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences (pp. 1–4). Leicester. Brown, J. D., & Heuvelink, G. B. M. (2007). The Data Uncertainty Engine ( DUE ): A software tool for assessing and simulating uncertain environmental variables. Computers & Geosciences, 33, 172–190. http://doi.org/10.1016/j.cageo.2006.06.015 Brown, J. D., & Heuvelink, G. B. M. (2008). Data Uncertainty Engine ( DUE ) User’s Manual. Byrne, J., Heavey, C., & Byrne, P. J. (2010). A review of Web-based simulation and supporting tools. Simulation Modelling Practice and Theory, 18(3), 253–276. http://doi.org/10.1016/j.simpat.2009.09.013 Carlisle, B. H. (2005). Modelling the Spatial Distribution of DEM Error. Transactions in GIS, 9(4), 521–540. Retrieved from http://dx.doi.org/10.1111/j.1467-9671.2005.00233.x Castrignanò, A., Buttafuoco, G., Comolli, R., & Ballabio, C. (2006). Accuracy assessment of digital elevation model using stochastic simulation. In M. Caetano & M. Painho (Eds.), 7th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences (pp. 490–498). Lisbon, Portugal. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.133.5561&rep=rep1&type=pdf Castronova, A. M., Goodall, J. L., & Elag, M. M. (2013). Models as web services using the Open Geospatial Consortium (OGC) Web Processing Service (WPS) standard. Environmental Modelling and Software, 41, 72–83. http://doi.org/10.1016/j.envsoft.2012.11.010 Champine, G. A., Coop, R. D., & Heinselman, R. C. (1980). Distributed Computer Systems: Impact on Management Design and Analysis. Amsterdam: Elsevier Science Inc. Chrisman, N. (1983). The Role of Quality Information in the Long-Term Functioning of a Geographic Information System. Proceedings of the International Symposium on Automated Cartography, 79–88. http://doi.org/10.3138/7146-4332-6J78-0671 Chrisman, N. R. (1991). The error component in spatial data. Geographical Information Systems: Principles and Applications, 165–174. Claessens, L., Heuvelink, G. B. M., Schoorl, J. M., & Veldkamp, A. (2005). DEM resolution effects on shallow landslide hazard and soil redistribution modelling. Earth Surface Processes and Landforms, 30(4), 461–477. http://doi.org/10.1002/esp.1155 Coleman, D. J. (1999). Geographical information systems in networked environments. In P. A. Longley, M. F. Goodchild, D. J. Maguire, & D. W. Rhind (Eds.), Geographical Information Systems: Principles and Applications (2nd ed., Vol. 1, pp. 317–329). New York: John Wiley and Sons. 108 Cressie, N. (1990). The origins of kriging. 
Mathematical Geology, 22(3), 239–252. http://doi.org/10.1007/BF00889887 Crompvoets, J., Bregt, A., Rajabifard, A., & Williamson, I. (2004). Assessing the worldwide developments of national spatial data clearinghouses. International Journal of Geographical Information Science, 18(7), 665–689. http://doi.org/10.1080/13658810410001702030 Crossland, M. D., Wynne, B. E., & Perkins, W. C. (1995). Spatial decision support systems: An overview of technology and a test of efficacy. Decision Support Systems, 14(995), 219–235. http://doi.org/10.1016/0167-9236(94)00018-N Cuartero, a., Polo, M. E., Rodriguez, P. G., Felicisimo, a. M., & Ruiz-Cuetos, J. C. (2014). The Use of Spherical Statistics to Analyze Digital Elevation Models: An Example From LIDAR and ASTER GDEM. IEEE Geoscience and Remote Sensing Letters, 11(7), 1200–1204. http://doi.org/10.1109/LGRS.2013.2288924 Dalle, J., & Jullien, N. (2001). OPEN-SOURCE vs . PROPRIETARY SOFTWARE, (33), 1–16. Retrieved from http://flosshub.org/sites/flosshub.org/files/dalle2.pdf Darnell, A. R., Tate, N. J., & Brunsdon, C. (2008). Improving user assessment of error implications in digital elevation models. Computers, Environment and Urban Systems, 32(4), 268–277. http://doi.org/10.1016/j.compenvurbsys.2008.02.003 de Moel, H., & Aerts, J. C. J. H. (2011). Effect of uncertainty in land use, damage models and inundation depth on flood damage estimates. Natural Hazards, 58(1), 407–425. http://doi.org/10.1007/s11069-010-9675-6 De Moel, H., Asselman, N. E. M., & H. Aerts, J. C. J. (2012). Uncertainty and sensitivity analysis of coastal flood damage estimates in the west of the Netherlands. Natural Hazards and Earth System Science, 12(4), 1045–1058. http://doi.org/10.5194/nhess-12-1045-2012 Devillers, R., Gervais, M., Bédard, Y., & Jeansoulin, R. (2002). 45 Spatial Data Quality: From Metadata To Quality Indicators and Contextual End - User Manual, (March 2002), 21–22. Devillers, R., Stein, A., Bédard, Y., Chrisman, N., Fisher, P., & Shi, W. (2010). Thirty Years of Research on Spatial Data Quality: Achievements, Failures, and Opportunities. Transactions in GIS, 14(4), 387–400. http://doi.org/10.1111/j.1467-9671.2010.01212.x Di, L. (2004a). GeoBrain-A Web Services based Geospatial Knowledge Building System. Proceeding of NASA Earth Science Technology Conference, 2004 June 22 - 24, Palo Alto CA USA, 8. Di, L. (2004b). Distributed Geospatial Information Services-architectures, Standards, and Research Issues. XXth ISPRS Congress Technical Commission II, 187–193. Duckham, M. (2002). A User-Oriented Perspective of Error-sensitive GIS Development. Transactions in GIS, 6(2), 179–193. http://doi.org/10.1111/1467-9671.00104 109 Erdoğan, S. (2010). Modelling the spatial distribution of DEM error with geographically weighted regression: An experimental study. Computers & Geosciences, 36(1), 34–43. http://doi.org/10.1016/j.cageo.2009.06.005 ESRI. (1998). ESRI Shapefile Technical Description. Computational Statistics, 16(July), 370–371. http://doi.org/10.1016/0167-9473(93)90138-J FGDC (1994). The 1994 plan for the national spatial data infrastructure––building the foundation of an information based society. Washington: FGDC. Fisher, P. E., & Tate, N. J. (2006). Causes and consequences of error in digital elevation models. Progress in Physical Geography, 30(4), 467–489. http://doi.org/Doi 10.1191/0309133306pp492ra Fisher, P. (1998). Improved Modeling of Elevation Error with Geostatistics. GeoInformatica, 2(3), 215–233. 
http://doi.org/10.1023/A:1009717704255 Fujisada, H., Urai, M., & Iwasaki, A. (2012). Technical Methodology for ASTER Global DEM. IEEE Transactions on Geoscience and Remote Sensing, 50(10), 3725–3736. http://doi.org/10.1109/TGRS.2012.2187300 Gesch, D., Oimoen, M., Zhang, Z., Meyer, D., & Danielson, J. (2012). Validation of the Aster Global Digital Elevation Model Version 2 Over the Conterminous United States. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XXXIX(September), 281–286. http://doi.org/10.5194/isprsarchives-XXXIX-B4-281-2012 Goodchild, M. F. (1993). Data models and data quality: problems and prospects. Environmental Modeling with GIS. Retrieved from http://www.geog.ucsb.edu/~good/papers/192.pdf Goodchild, M. F., & Gopal, S. (1989). The accuracy of spatial databases. CRC Press. Goodchild, M. F., Shortridge, A. M., & Fohl, P. (1999). Encapsulating Simulation Models With Geospatial Data Sets. Spatial Accuracy Assessment: Land Information Uncertainty in Natural Resources, 1–16. Goodchild, M. F., & Longley, P. A. (1999). The future of GIS and spatial analysis. In P. A. Longley, M. F. Goodchild, D. J. Maguire, & D. W. Rhind (Eds.), Geographical information systems (2nd ed., Vol. 1, pp. 567–580). New York. Retrieved from http://www.geos.ed.ac.uk/~gisteac/gis_book_abridged/files/ch40.pdf Granell, C., Díaz, L., & Gould, M. (2010). Service-oriented applications for environmental models: Reusable geospatial services. Environmental Modelling and Software, 25(2), 182–198. http://doi.org/10.1016/j.envsoft.2009.08.005 110 Granell, C., Díaz, L., Schade, S., Ostländer, N., & Huerta, J. (2013). Enhancing integrated environmental modelling by designing resource-oriented interfaces. Environmental Modelling and Software, 39, 229–246. http://doi.org/10.1016/j.envsoft.2012.04.013 GRASS Development Team. (2016). Geographic Resources Analysis Support System (GRASS GIS) Software, Version 7.0. Retrieved from http://grass.osgeo.org Guptill, S. C. (1999). Metadata and data catalogues. In P. A. Longley, M. F. Goodchild, D. J. Maguire, & D. W. Rhind (Eds.), Geographical Information Systems: Principles and Technical Issues (2nd ed., Vol. 2, pp. 677–692). New York: John Wiley and Sons. Han, W., Di, L., Zhao, P., & Li, X. (2009). Using Ajax for Desktop-like Geospatial Web Application Development. In 2009 17th International Conference on Geoinformatics. Fairfax, VA, USA: IEEE. http://doi.org/https://doi.org/10.1109/GEOINFORMATICS.2009.5293475 Han, W., Di, L., Zhao, P., & Shao, Y. (2012). DEM Explorer: An online interoperable DEM data sharing and analysis system. Environmental Modelling & Software, 38(JUNE 2012), 101–107. http://doi.org/10.1016/j.envsoft.2012.05.015 Han, W., Di, L., Zhao, P., Wei, Y., & Li, X. (2008). Design and implementation of geobrain online analysis system (GeOnAS). Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 5373 LNCS, 27–36. http://doi.org/10.1007/978-3-540-89903-7_4 Hengl, T., Heuvelink, G. B. M., & Van Loon, E. E. (2010). On the uncertainty of stream networks derived from elevation data: The error propagation approach. Hydrology and Earth System Sciences, 14(7), 1153–1165. http://doi.org/10.5194/hess-14-1153-2010 Hengl, T., Heuvelink, G. B. M., & Rossiter, D. G. (2007). About regression-kriging: From equations to case studies. Computers and Geosciences, 33(10), 1301–1315. http://doi.org/10.1016/j.cageo.2007.05.001 Hengl, T., & Reuter, H. (2011). 
How accurate and usable is GDEM? A statistical assessment of GDEM using LiDAR data. Handbook of Quantitative and Theoretical Geography or Advances in Quantitative and Theoretical Geography, (July), 000–046. Retrieved from http://www.geomorphometry.org/HenglReuter2011 Heuvelink, G. B. M., Brown, J. D., & van Loon, E. E. (2007). A probabilistic framework for representing and simulating uncertain environmental variables. International Journal of Geographical Information Science, 21(5), 497–513. http://doi.org/10.1080/13658810601063951 Heuvelink, G. B. M. (2007). ERROR − AWARE GIS AT WORK : REAL − WORLD APPLICATIONS OF THE DATA UNCERTAINTY ENGINE. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 34. 111 Heuvelink, G. B. M. (1996). Identification of field attribute error under different models of spatial variation. International Journal of Geographical Information Systems, 10(8), 921–935. Heuvelink, G. B. M. (1999). Propagation of errors in spatial modelling with GIS. Geographical Information Systems: Principles and Applications, Vol. 1., 3(4), 303–322. http://doi.org/10.1080/02693798908941518 Heuvelink, G. B. M., & Burrough, P. A. (1993). Error propagation in cartographic modelling using Boolean logic and continuous classification. International Journal of Geographical Information Systems, 7(3), 231–246. Heuvelink, G. B. M., & Burrough, P. a. (2002). Developments in statistical approaches to spatial uncertainty and its propagation. International Journal of Geographical Information Science, 16(2), 111–113. http://doi.org/10.1080/13658810110099071 Hijmans, R. J. (2016). raster: Geographic Data Analysis and Modeling. Retrieved from https://cran.r-project.org/package=raster Holmes, K. W., Chadwick, O. A., & Kyriakidis, P. C. (2000). Error in a USGS 30-meter digital elevation model and its impact on terrain modeling. Journal of Hydrology, 233(1–4), 154–173. http://doi.org/10.1016/S0022-1694(00)00229-8 I. N. S. P. I. R. E. Directive (2007). Directive 2007/2/EC of the European Parliament and of the Council of 14 March 2007 establishing an Infrastructure for Spatial Information in the European Community (INSPIRE). Published in the official Journal on the 25th April. Jing, C., Shortridge, A., Lin, S., & Wu, J. (2014). Comparison and validation of SRTM and ASTER GDEM for a subtropical landscape in Southeastern China. International Journal of Digital Earth, 7(12), 969–992. http://doi.org/10.1080/17538947.2013.807307 Karssenberg, D., & De Jong, K. (2005a). Dynamic environmental modelling in GIS: 1. Modelling in three spatial dimensions. International Journal of Geographical Information Science, 19(5), 559–579. http://doi.org/10.1080/13658810500032362 Karssenberg, D., & De Jong, K. (2005b). Dynamic environmental modelling in GIS: 2. Modelling error propagation. International Journal of Geographical Information Science, 19(6), 623–637. http://doi.org/10.1080/13658810500032362 Karssenberg, D., Schmitz, O., Salamon, P., de Jong, K., & Bierkens, M. F. P. (2010). A software framework for construction of process-based stochastic spatio-temporal models and data assimilation. Environmental Modelling and Software, 25(4), 489–502. http://doi.org/10.1016/j.envsoft.2009.10.004 Kim, T. J. (1999). Metadata for geo-spatial data sharing: A comparative analysis. Annals of Regional Science, 33, 171–181. http://doi.org/10.1007/s001680050099 112 Kyriakidis, P. C., Shortridge, A. M., & Goodchild, M. F. (1999). 
Geostatistics for conflation and accuracy assessment of digital elevation models. International Journal of Geographical Information Science, 13(7), 677–707. http://doi.org/10.1080/136588199241067 Li, P., Shi, C., Li, Z., Muller, J.-P., Drummond, J., Li, X., … Liu, J. (2013). Evaluation of ASTER GDEM using GPS benchmarks and SRTM in China. International Journal of Remote Sensing, 34(5), 1744–1771. http://doi.org/10.1080/01431161.2012.726752 Li, X., Di, L., Han, W., Zhao, P., & Dadi, U. (2009). Sharing and reuse of service-based geospatial processing through a Web Processing Service. 2009 17th International Conference on Geoinformatics, Geoinformatics 2009. http://doi.org/10.1109/GEOINFORMATICS.2009.5293431 Li, X., Di, L., Han, W., Zhao, P., & Dadi, U. (2010). Sharing geoscience algorithms in a Web service-oriented environment (GRASS GIS example). Computers and Geosciences, 36(8), 1060–1068. http://doi.org/10.1016/j.cageo.2010.03.004 Li, Z., Yang, C., Huang, Q., Liu, K., Sun, M., & Xia, J. (2014). Building Model as a Service to support geosciences. Computers, Environment and Urban Systems. http://doi.org/10.1016/j.compenvurbsys.2014.06.004 Lim, E.-P., Goh, D. H.-L., Liu, Z., Ng, W.-K., Khoo, C. S.-G., & Higgins, S. E. (2002). G-Portal: a map-based digital library for distributed geospatial and georeferenced resources. … on Digital Libraries, 351–358. http://doi.org/citeulike-article-id:329152 Logsdon, M. G., Bell, E. J., & Westerlund, F. V. (1996). Probability mapping of land use change: A GIS interface for visualizing transition probabilities. Computers, Environment and Urban Systems, 20(6), 389–398. http://doi.org/10.1016/S0198-9715(97)00004-5 Lucchi, R., Millot, M., & Elfers, C. (2008). Resource Oriented Architecture and REST. Assessment of Impact and Advantages on INSPIRE, Ispra: European Communities, 5–13. http://doi.org/10.2788/80035 MacEachren, A. M., Robinson, A., & Hopper, S. (2005). Visualizing geospatial information uncertainty: What we know and what we need to know. Cartography and Geographic Information Science, 32(3), 139–160. http://doi.org/10.1559/1523040054738936 Maguire, D. J., & Longley, P. A. (2005). The emergence of geoportals and their role in spatial data infrastructures. Computers, Environment and Urban Systems, 29(1 SPEC.ISS.), 3–14. http://doi.org/10.1016/j.compenvurbsys.2004.05.012 Mazzetti, P., Nativi, S., & Caron, J. (2009). RESTful implementation of geospatial services for Earth and Space Science applications. International Journal of Digital Earth, 2, 40–61. http://doi.org/10.1080/17538940902866153 Miliaresis, G. C., & Paraschou, C. V. E. (2011). An evaluation of the accuracy of the ASTER GDEM and the role of stack number: a case study of Nisiros Island, Greece. Remote Sensing Letters, 2(2), 127–135. http://doi.org/10.1080/01431161.2010.503667 113 Mockus, A. (2002). Two Case Studies of Open Source Software Development : Apache and Mozilla. Engineering, 11(3), 309–346. Morris, D. E., Oakley, J. E., & Crowe, J. A. (2014). A web-based tool for eliciting probability distributions from experts. Environmental Modelling and Software, 52, 1–4. http://doi.org/10.1016/j.envsoft.2013.10.010 NASA JPL. (2009). ASTER Global Digital Elevation Model [Data set]. NASA JPL. https://doi.org/10.5067/aster/astgtm.002 Nativi, S., Khalsa, S., Domenico, B., Craglia, M., Pearlman, J., Mazzetti, P., & Rew, R. (2011). The brokering approach for earth science cyberinfrastructure. EarthCube White Paper. US NSF [Online]. Nativi, S., Mazzetti, P., & Geller, G. N. (2013). 
Environmental model access and interoperability: The GEO Model Web initiative. Environmental Modelling and Software, 39, 214–228. http://doi.org/10.1016/j.envsoft.2012.03.007 Ni, W., Sun, G., & Ranson, K. J. (2015). Characterization of ASTER GDEM elevation data over vegetated area compared with lidar data. International Journal of Digital Earth, 8(3), 198–211. http://doi.org/10.1080/17538947.2013.861025 Node.js. (2016). How Uber Uses Node.js to Scale Their Business. Retrieved from https://nodejs.org/static/documents/casestudies/Nodejs-at-Uber.pdf OGC. (2015). OGC WPS 2.0 Interface Standard. Open Geospatial Consortium. Retrieved from https://portal.opengeospatial.org/files/14-065 Oksanen, J., & Sarjakoski, T. (2005). Error propagation analysis of DEM‐based drainage basin delineation. International Journal of Remote Sensing, 26(14), 3085–3102. http://doi.org/10.1080/01431160500057947 Oksanen, J., & Sarjakoski, T. (2006). Uncovering the statistical and spatial characteristics of fine toposcale DEM error. International Journal of Geographical Information Science, 20(4), 345–369. http://doi.org/10.1080/13658810500433891 Pebesma, E. J., & Bivand, R. S. (2005). Classes and methods for spatial data in {R}. R News, 5(2), 9–13. Retrieved from https://cran.r-project.org/doc/Rnews/ Pebesma, E., Cornford, D., Nativi, S., & Stasch, C. (2010). The uncertainty enabled model web (UncertWeb). CEUR Workshop Proceedings, 679. Qiu, F., Ni, F., Chastain, B., Huang, H., Zhao, P., Han, W., & Di, L. (2012). GWASS: GRASS web application software system based on the GeoBrain web service. Computers and Geosciences, 47, 143–150. http://doi.org/10.1016/j.cageo.2012.01.023 114 R Core Team. (2016). R: A Language and Environment for Statistical Computing. Vienna, Austria. Retrieved from https://www.r-project.org/ Reuter, H. I., Nelson, a., & Jarvis, a. (2007). An evaluation of void‐filling interpolation methods for SRTM data. International Journal of Geographical Information Science, 21(9), 983–1008. http://doi.org/10.1080/13658810601169899 Reuter, H. I., Nelson, A., Strobl, P., Mehl, W., & Jarvis, A. (2009). A first assessment of Aster GDEM tiles for absolute accuracy, relative accuracy and terrain parameters. IEEE Transactions on Geoscience and Remote Sensing, 240–243. http://doi.org/10.1109/IGARSS.2009.5417688 Rexer, M., & Hirt, C. (2014). Comparison of free high resolution digital elevation data sets (ASTER GDEM2, SRTM v2.1/v4.1) and validation against accurate heights from the Australian National Gravity Database. Australian Journal of Earth Sciences, 61(2), 213–226. http://doi.org/10.1080/08120099.2014.884983 Rinner, C. (2003). Web-based Spatial Decision Support : Status and Research Directions. Journal of Geographic Information and Decision Analysis, 7(1), 14–31. Roman, D., Schade, S., Berre, A. J., Bodsberg, N. R., & Langlois, J. (2009). Model as a Service (MaaS). In AGILE Workshop - Grid Technologies for Geospatial Application. Salgé, F. (1999). National and international data standards. In P. A. Longley, M. F. Goodchild, D. J. Maguire, & D. W. Rhind (Eds.), Geographical Information Systems: Principles and Applications (2nd ed., Vol. 2, pp. 693–706). New York: John Wiley and Sons. San, B. T., & Suzen, M. L. (2005). Digital elevation model (DEM) generation and accuracy assessment from ASTER stereo data. International Journal of Remote Sensing, 26(22), 5013–5027. http://doi.org/10.1080/01431160500177620 Seffino, L. A., Medeiros, C. B., Rocha, J. V., & Yi, B. (1999). 
WOODSS - a spatial decision support system based on workflows. Decision Support Systems, 27(1), 105–123. http://doi.org/10.1016/S0167-9236(99)00039-1 Shekhar, S., Coyle, M., Goyal, B., Liu, D., and Sarkar, S. (1997). Data models in geographic information systems. Communications of the ACM, 40(4), 103–111. Shortridge, A., & Messina, J. (2011). Spatial structure and landscape associations of SRTM error. Remote Sensing of Environment, 115(6), 1576–1587. http://doi.org/10.1016/j.rse.2011.02.017 Shum, A., & Akimov, I. (2015). hashids: Generate Short Unique YouTube-Like IDs (Hashes) from Integers. Retrieved from https://cran.r-project.org/package=hashids Sondheim, M., Gardels, K., & Buehler, K. (1999). GIS interoperability. In P. A. Longley, M. F. Goodchild, D. J. Maguire, & D. W. Rhind (Eds.), Geographical Information Systems (2nd 115 ed., Vol. 1, pp. 347–358). New York: John Wiley and Sons. Retrieved from http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:GIS+interoperability#1 Steiniger, S., & Hunter, A. J. S. (2012). Free and Open Source GIS Software for Building a Spatial Data Infrastructure. Geospatial free and open source software in the 21st century. http://doi.org/10.1007/978-3-642-10595-1_15 Tachikawa, T., Hato, M., Kaku, M., & Iwasaki, A. (2011). CHARACTERISTICS OF ASTER GDEM VERSION 2. Tachikawa, T., Kaku, M., & Iwasaki, A. (2011). ASTER GDEM Version 2 Validation Report. International Geoscience and Remote Sensing Symposium (IGARSS), 1–24. Tait, M. G. (2005). Implementing geoportals: Applications of distributed GIS. Computers, Environment and Urban Systems, 29(1 SPEC.ISS.), 33–47. http://doi.org/10.1016/j.compenvurbsys.2004.05.011 Toutin, T., & Cheng, P. (1999). DEM Generation with ASTER Stereo. EOM Current Issues, 48–51. Retrieved from http://www.eomonline.com/Common/currentissues/June01/thierry.htm Unwin, D. J. (1995). Geographical information systems and the problem of “error and uncertainty.” Progress in Human Geography, 19(4), 549–558. http://doi.org/10.1177/030913259501900408 Urai, M., Tachikawa, T., & Fujisada, H. (2012). Data Acquisition Strategies for Aster Global Dem Generation. ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, I-4(September), 199–202. http://doi.org/10.5194/isprsannals-I-4-199-2012 Uran, O., & Janssen, R. (2003). Why are spatial decision support systems not used? Some experiences from the Netherlands. Computers, Environment and Urban Systems, 27(5), 511–526. http://doi.org/10.1016/S0198-9715(02)00064-9 Van Oort, P. A. J., & Bregt, A. K. (2005). Do users ignore spatial data quality? A decision-theoretic perspective. Risk Analysis, 25(6), 1599–1610. http://doi.org/10.1111/j.1539-6924.2005.00678.x Van Oort, P. (2006). Spatial data quality: from description to application. Publications on Geodesy 60. Retrieved from http://library.wur.nl/WebQuery/wdab/1788022 Verdin, K. L., Godt, J. W., Funk, C., Pedreros, D., Worstell, B., & Verdin, J. (2007). Development of a Global Slope Dataset for Estimation of Landslide Occurrence Resulting from Earthquakes. Colorado: U.S. Geological Survey, Open-File Report, 1188, 25. Veregin, H. (1999). Data quality parameters. Geographical Information Systems: Principles and Applications, Vol. 1., 177–190. Retrieved from http://www.geos.ed.ac.uk/~gisteac/gis_book_abridged/files/ch12.pdf 116 von Krogh, G., & von Hippel, E. (2006). The promise of research on open source software. Management Science, 52(7), 975–983. http://doi.org/10.1287/mnsc.1060.0560 Walker, J. D., & Chapra, S. C. (2014). 
A client-side web application for interactive environmental simulation modeling. Environmental Modelling & Software, 55, 49–60. Wang, R. Y., & Strong, D. M. (1996). Beyond Accuracy : What Data Quality Means to Data Consumers. Journal of Management Information Systems, 12(4), 5–33. Wechsler, S. P. (1999). A Methodology for Digital Elevation Model (DEM) Uncertainty Evaluation: The Effect DEM Uncertainty on Topographic Parameters. ESRI Proceedings 1999. Retrieved from http://downloads2.esri.com/campus/uploads/library/pdfs/6283.pdf Wechsler, S. P., & Kroll, C. N. (2006). Quantifying DEM Uncertainty and its Effect on Topographic Parameters. Photogrammetric Engineering & Remote Sensing, 72(9), 1081–1090. http://doi.org/10.14358/PERS.72.9.1081 Welch, R., Jordan, T., Lang, H., & Murakami, H. (1998). ASTER as a source for topographic data in the late 1990’s. IEEE Transactions on Geoscience and Remote Sensing, 36(4), 1282–1289. http://doi.org/10.1109/36.701078 Wu, S., Li, J., & Huang, G. H. (2008). Characterization and evaluation of elevation data uncertainty in water resources modeling with GIS. Water Resources Management, 22(8), 959–972. http://doi.org/10.1007/s11269-007-9204-x Yamaguchi, Y., Kahle, A. B., Tsu, H., Kawakami, T., & Pniel, M. (1998). Overview of advanced spaceborne thermal emission and reflection radiometer (ASTER). IEEE Transactions on Geoscience and Remote Sensing, 36(4), 1062–1071. http://doi.org/10.1109/36.700991 Yue, P., Baumann, P., Bugbee, K., & Jiang, L. (2015). Towards intelligent GIServices. Earth Science Informatics, 8(3), 463–481. http://doi.org/10.1007/s12145-015-0229-z Zachman, J. A. (1987). A framework for information systems architecture. IBM Systems Journal. http://doi.org/10.1147/sj.263.0276 Zachman, J. A. (1997). Enterprise architecture: The issue of the century. Database Programming & Design, 44+. Zhao, G., Xue, H., & Ling, F. (2010). Assessment of ASTER GDEM performance by comparing with SRTM and ICESat/GLAS data in Central China. 2010 18th International Conference on Geoinformatics, (40801186), 1–5. http://doi.org/10.1109/GEOINFORMATICS.2010.5567970 Zhao, P., Di, L., Han, W., & Li, X. (2012). Building a web-services based geospatial online analysis system. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 5(6), 1780–1792. http://doi.org/10.1109/JSTARS.2012.2197372 117