BAYESIAN HIERARCHICAL SPATIAL MODELS TO IMPROVE FOREST VARIABLE
PREDICTION AND MAPPING WITH LIGHT DETECTION AND RANGING DATA
SETS
By
Chad Babcock

A THESIS
Submitted
to Michigan State University
in partial fulfillment of the requirements
for the degree of
Geography – Master of Science
2014

ABSTRACT
BAYESIAN HIERARCHICAL SPATIAL MODELS TO IMPROVE FOREST VARIABLE
PREDICTION AND MAPPING WITH LIGHT DETECTION AND RANGING DATA
SETS
By
Chad Babcock
Light Detection and Ranging (LiDAR) data has shown great potential to estimate spatially
explicit forest variables, including above-ground biomass, stem density, tree height, and more.
Due to its ability to garner information about the vertical and horizontal structure of forest
canopies effectively and efficiently, LiDAR sensors have played a key role in the development
of operational air and space-borne instruments capable of gathering information about forest structure at regional, continental, and global scales. Combining LiDAR datasets with
field-based validation measurements to build predictive models is becoming an attractive
solution to the problem of quantifying and mapping forest structure for private forest land
owners and local, state, and federal government entities alike. As with any statistical model
using spatially indexed data, the potential to violate modeling assumptions resulting from
spatial correlation is high. This thesis explores several different modeling frameworks that
aim to accommodate correlation structures within model residuals. The development is motivated using LiDAR and forest inventory datasets. Special attention is paid to estimation
and propagation of parameter and model uncertainty through to prediction units. Inference
follows a Bayesian statistical paradigm. Results suggest the proposed frameworks help ensure model assumptions are met and prediction performance can be improved by pursuing
spatially enabled models.

ACKNOWLEDGMENTS

I would like to thank my adviser Dr. Andrew Finley for his academic guidance over the
past four years. I’d also like to acknowledge my committee members Dr. Ashton Shortridge and Dr. David Macfarlane for their invaluable comments and criticisms of my thesis
work. Hopefully their inputs can help to turn the following thesis chapters to peer reviewed
publications.

iii

TABLE OF CONTENTS

LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

vi

LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
CHAPTER 1 OVERVIEW . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

1
4

CHAPTER 2 MULTIVARIATE SPATIAL REGRESSION MODELS FOR
PREDICTING INDIVIDUAL TREE STRUCTURE VARIABLES USING LIDAR DATA . . . . . . . . . . . . . . . . . .
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2 Materials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2.1 Study area and field data . . . . . . . . . . . . . . . . . . . . . . . .
2.2.2 LiDAR data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.3.1 Univariate spatial regression . . . . . . . . . . . . . . . . . . . . . . .
2.3.1.1 Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.3.2 Multivariate spatial regression . . . . . . . . . . . . . . . . . . . . . .
2.3.3 Model selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.4 Candidate models and implementation . . . . . . . . . . . . . . . . . . . . .
2.5 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.7 Acknowledgment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

7
7
9
9
10
11
11
13
14
17
19
20
26
28
29

CHAPTER 3 IMPROVING LIDAR BASED PREDICTION OF FOREST BIOMASS USING HIERARCHICAL MODELS WITH
SPATIALLY VARYING COEFFICIENTS . . . . . . . . . . . 33
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.2 Study Sites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2.1 Fraser Experimental Forest . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2.2 Marcell Experimental Forest . . . . . . . . . . . . . . . . . . . . . . . 36
3.2.3 Glacier Lake Ecosystem Experimental Site . . . . . . . . . . . . . . . 36
3.2.4 Niwot Long Term Ecological Research Site . . . . . . . . . . . . . . . 37
3.2.5 Field Data Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.2.6 LiDAR Data Preparation . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.3.1 Modeling Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.3.2 Candidate Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
iv

3.3.3 Fit and Prediction Accuracy
3.4 Results . . . . . . . . . . . . . . . .
3.4.1 FEF Results . . . . . . . . .
3.4.2 MEF Results . . . . . . . .
3.4.3 GLEES Results . . . . . . .
3.4.4 NIWOT Results . . . . . . .
3.5 Disscusion . . . . . . . . . . . . . .
BIBLIOGRAPHY . . . . . . . . . . . .

Assessment
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

46
47
52
57
59
61
65
67

CHAPTER 4 SUMMARY AND FUTURE WORK . . . . . . . . . . . . . . 71

v

LIST OF TABLES

Table 2.1

Non-spatial multivariate regression model parameter estimates. . . . . . .

23

Table 2.2

Spatial multivariate regression model parameter estimates.

. . . . . . . .

24

Table 2.3

Candidate models’ fit and predictive performance. . . . . . . . . . . . . .

25

Table 3.1

Candidate model parameter estimates and 95% credible intervals for FEF.

53

Table 3.2

Candidate model parameter estimates and 95% credible intervals for MEF.

58

Table 3.3

Candidate model parameter estimates and 95% credible intervals for GLEES. 59

Table 3.4

Candidate model parameter estimates and 95% credible intervals for NIWOT. 62

vi

LIST OF FIGURES

Figure 2.1

Location of the Penobscot Experimental Forest (PEF), Maine, is identified with a diamond symbol in the lower-left inset map. Point symbols
within the PEF’s two bounding polygons denote the location of the 19
inventory plots included in the study. Three of these plots with associated tree locations are illustrated in Figure 2.2. . . . . . . . . . . . . . .

11

Figure 2.2

Three of the 19 inventory plots with observed trees denoted with an
open circle symbol and validation holdout trees denoted with a plus symbol. 18

Figure 2.3

Histogram of the spatial multivariate regression model residuals for height. 22

Figure 2.4

Scatter plots of residuals from non-spatial version of the multivariate
candidate model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

26

Aerial photographs of FEF, MEF, GLEES, and NIWOT with inventory
sub-plot locations highlighted in red. . . . . . . . . . . . . . . . . . . . .

38

The cumulative sums of the standardized eigenvalues (in decreasing order) for FEF, MEF, GLEES, and NIWOT. . . . . . . . . . . . . . . . . .

41

Semivariogram models generated using the residuals from the non-spatial
model estimates for FEF, MEF, GLEES, and NIWOT. . . . . . . . . . .

49

Maps showing predicted AGB means for the full LiDAR extent at FEF,
MEF, GLEES, and NIWOT. . . . . . . . . . . . . . . . . . . . . . . . . .

50

Maps showing predicted AGB 95% credible interval width (a measure
of prediction uncertainty) at FEF, MEF, GLEES, and NIWOT. . . . . .

51

Plots showing fitted versus actual AGB values for FEF’s non-spatial
and SVC models. Vertical grey error bars depict 95% credible intervals
for fitted values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

54

Maps showing the spatially varying coefficients (βˆx + wˆx ) and associated
95% credible interval width (CIW) for the SVC model at FEF. . . . . .

55

Figure 3.1

Figure 3.2

Figure 3.3

Figure 3.4

Figure 3.5

Figure 3.6

Figure 3.7

vii

Maps showing the spatially varying coefficients βˆ0 + wˆ0 and βˆ2 + wˆ2
zoomed into the extent outlined in red in figures 3.7a and 3.7c. A
quantile interval color classification and smoothing filter are used to
highlight the more subtle spatial adjustments of these parameters. The
sub-plot locations are identified in green. . . . . . . . . . . . . . . . . . .

56

Plots showing fitted versus actual AGB values for MEF’s non-spatial
and SVI models. Vertical grey error bars depict 95% credible intervals
for fitted values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

57

Figure 3.10 Plots showing fitted versus actual AGB values for GLEES’s non-spatial
and SVI models. Vertical grey error bars depict 95% credible intervals
for fitted values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

60

Figure 3.11 Plots showing fitted versus actual AGB values for NIWOT’s non-spatial
and SVC-β2 models. Vertical grey error bars depict 95% credible intervals for fitted values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

62

Figure 3.12 Maps showing the spatially varying coefficients (βˆx + wˆx ) and associated
95% credible interval width (CIW) for the SVC-β2 model at NIWOT. . .

63

Figure 3.13 Maps showing the spatially varying coefficients βˆ0 + wˆ0 and βˆ2 + wˆ2
zoomed into the extent outlined in red in figures 3.12a and 3.12c. A
quantile interval color classification and smoothing filter are used to
highlight the more subtle spatial adjustments of these parameters. The
sub-plot locations are identified in green. . . . . . . . . . . . . . . . . . .

64

Figure 3.8

Figure 3.9

viii

CHAPTER 1
OVERVIEW

Over the past decade, there has been increasing interest in deriving spatially explicit estimates of forest structure variables ranging from the individual tree to landscape level (Brunner (1998); Frescino et al. (2001); Vepakomma et al. (2008); Honey-Ros´es et al. (2011)).
From a forest ecology and management perspective, researchers are often interested in the
spatial configuration of tree canopies, for either studying light penetration to the forest floor,
animal habitat fragmentation, or other tree canopy dependent phenomena. For example,
deGravelles et al. (2014) looked at Baldcypress sapling growth responses to over-story canopy
gap size. Garabedian et al. (2014) used LiDAR predicted forest attributes to map and
analyze red-cockaded woodpecker habitat in South Carolina. Yang et al. (2013) used a
LiDAR reconstructed forest and thermal images to better understand bat flight behavior.
Beyond exploration of small scale questions concerning individual trees or stands, there is
a need within the research community for statistical models appropriate for capturing spatial
and temporal processes at broad scales. For example, concerning climate change, the United
Nations has set forth their Reducing Emissions from Deforestation and forest Degradation
(REDD+) initiative which aims to reduce net greenhouse gas emissions, in part, by encouraging the use of enhanced forest management practices in developing countries. In order
to implement some of the incentive programs proposed in REDD+, methods to accurately
measure and map forest carbon stocks along with associated measures of uncertainty need
to be developed. The National Aeronautics and Space Administration (NASA) has created
the Carbon Monitoring System (CMS) to fund research activities leading to the development of carbon measurement and verification systems in effort to meet the goals set forth
by REDD+.

1

Calibrating LiDAR data with field validation measurements through linear regression
models is often used to accommodate the need for spatial prediction of forest structure
(Means et al. (2000); Montesano et al. (2014); Næsset et al. (2013)). Census collection of
spatially explicit tree data is prohibitively expensive even for small forest areas. As a result,
tree measurements are usually only collected for a subset of the trees in a domain. Given
that most stems, even in relatively small study areas, e.g. 10 ha, are never measured, effective methods are needed to predict forest characteristics at unmeasured locations. Because
LiDAR remote sensing is an effective and efficient way to collect spatially explicit data about
forest height and canopy size, relating variables derived from LiDAR point clouds or waveforms to spatially indexed field measurements can lead to powerful predictive models for
mapping forest structure (Wulder et al. (2008); Lefsky et al. (2002)). Special considerations
are needed when working with spatial data in linear modeling. Often when developing these
types of models, after accounting for LiDAR covariates, extraneous spatially structured variation is present in the residuals. This is a violation of the assumption of independent and
identically distributed errors, which calls into question any inference gained or predictions
carried out using the model.
This thesis details and illustrates Bayesian hierarchical spatial models appropriate for
addressing issues of spatially correlated regression model residuals when calibrating LiDAR
data in a forest setting. Chapter 2 explores the use of multivariate spatial regression modeling
to simultaneously predict tree height, canopy depth, crown radius and stem diameter. Spatial
random effects are used to ensure statistical validity and improve prediction performance.
Chapter 3 also uses spatial random effects to derive spatially varying coefficients models
to predict forest above-ground biomass at four study sites. Results from both Chapters
2 and 3 show that leveraging residual spatial dependence via spatial random effects can
improve predictions of forest structure variables. We see from Chapter 2 that when predicting
multiple, correlated, forest outcome variables, ‘borrowing strength’ both within and among

2

trees via random effects can lead to improved model fit and prediction. Chapter 3 shows
that accommodating LiDAR coefficient non-stationarity with strategically placed spatial
random effects can improve prediction accuracies. Such methods may enjoy expanded use as
researchers are increasingly focusing on modeling small scale environmental processes over
larger domains.

3

BIBLIOGRAPHY

4

BIBLIOGRAPHY

Brunner, A. (1998). A light model for spatially explicit forest stand models. Forest Ecology
and Management, 107, 19 – 46.
deGravelles, W.W., Hutchinson, S., & Conner, W.H. (2014). Survival and growth of suppressed baldcypress reproduction in response to canopy gap creation in a north carolina,
usa swamp. Wetlands, 34, 79–87.
Frescino, T.S., Edwards, T.C., & Moisen, G.G. (2001). Modeling spatially explicit forest
structural attributes using generalized additive models. Journal of Vegetation Science, 12,
15–26.
Garabedian, J.E., McGaughey, R.J., Reutebuch, S.E., Parresol, B.R., Kilgo, J.C., Moorman,
C.E., & Peterson, M.N. (2014). Quantitative analysis of woodpecker habitat using highresolution airborne lidar estimates of forest structure and composition. Remote Sensing
of Environment, 145, 68 – 80.
Honey-Ros´es, J., Baylis, K., & Ramirez, M.I. (2011). A spatially explicit estimate of avoided
forest loss. Conservation Biology, 25, 1032–1043.
Lefsky, M..A., Cohen, W.B., Parker, G.G., & Harding, D.J. (2002). Lidar remote sensing for
ecosystem studies: Lidar, an emerging remote sensing technology that directly measures
the three-dimensional distribution of plant canopies, can accurately estimate vegetation
structural attributes and should be of particular interest to forest, landscape, and global
ecologists. BioScience, 52, 19–30.
Means, J.E., Acker, S.A., Fitt, B.J., Renslow, M., Emerson, L., & Hendrix, C.J. (2000).
Predicting forest stand characteristics with airborne scanning lidar. Photogrammetric
Engineering and Remote Sensing, 66, 1367–1371.
Montesano, P., Nelson, R.F., Dubayah, R.O., Sun, G., Cook, B.D., Ranson, K.J.R., Næsset,
E., & Kharuk, V. (2014). The uncertainty of biomass estimates from lidar and sar across
a boreal forest structure gradient. Remote Sensing of Environment, , –.
Næsset, E., Bolland˚
as, O.M., Gobakken, T., Gregoire, T.G., & St˚
ahl, G. (2013). Modelassisted estimation of change in forest biomass over an 11 year period in a sample survey
supported by airborne lidar: A case study with post-stratification to provide “activity
data”. Remote Sensing of Environment, 128, 299 – 314.
Vepakomma, U., St-Onge, B., & Kneeshaw, D. (2008). Spatially explicit characterization of
boreal forest gap dynamics using multi-temporal lidar data. Remote Sensing of Environment, 112, 2326 – 2340.

5

Wulder, M.A., Bater, C.W., Coops, N.C., Hilker, T., & White, J.C. (2008). The role of lidar
in sustainable forest management. The Forestry Chronicle, 84, 807–826.
Yang, X., Schaaf, C., Strahler, A., Kunz, T., Fuller, N., Betke, M., Wu, Z., Wang, Z.,
Theriault, D., Culvenor, D., Jupp, D., Newnham, G., & Lovell, J. (2013). Study of bat
flight behavior by combining thermal image analysis with a lidar forest reconstruction.
Canadian Journal of Remote Sensing, 39, S112–S125.

6

CHAPTER 2
MULTIVARIATE SPATIAL REGRESSION MODELS FOR PREDICTING
INDIVIDUAL TREE STRUCTURE VARIABLES USING LIDAR DATA

2.1

Introduction

Recent advances in remote sensing, specifically Light Detection and Ranging (LiDAR) sensors, provide detailed data at unprecedented scales. At broad spatial scales, large amounts
of LiDAR data for temperate North American forests are being collected, or will soon
be collected by the U.S. Forest Service (USFS), NASA’s Laser Vegetation Imaging Sensor
(LVIS; https://lvis.gsfc.nasa.gov), National Ecological Observatory Network (NEON) Airborne Observation Platform (Kamper et al., 2010), and upcoming NASA missions. These
high-dimensional data contain information about individual tree and forest structure that
can be related to variables of interest through suitable modeling frameworks. Many studies
couple LiDAR variables with georeferenced forest inventory plot data through parametric or
non-parametric regression techniques, see, e.g., Salas et al. (2010), Dalponte et al. (2009),
Dinuls et al. (2012), Monnet et al. (2011), Niemann et al. (2012), Junttila et al. (2010), and
references therein. The inference garnered through these analyses often supports decisions
with important economic and ecological implications; therefore, it is critical to correctly
estimate uncertainty.
NEON and similar initiatives aim to assimilate high spatial resolution LiDAR data with
intensive forest inventories, e.g., stem maps, to better predict individual tree characteristics (Kamper et al., 2010). Given variable measurements on a subset of trees included in a
stem map and LiDAR variables for all trees, a regression model is used to predict variable
values for all unmeasured trees. Many studies have used regression analysis in this capacity; however, few explicitly accommodate residual spatial dependence, see, e.g., Anderson
7

et al. (2008), Gonzalez et al. (2010), Muss et al. (2011), Tonolli et al. (2011), Popescu et al.
(2011). These non-spatial regression models are adequate in the absence of extraneous structured variation, beyond what is explained by the covariates. However, when observations are
spatially indexed, we might expect similar outcomes, i.e., tree measurements, in proximate
locations, possibly resulting from common environmental conditions or disturbance regimes.
Ignoring this spatial dependence can result in spurious estimates of model parameters and
erroneous predictions. Hoeting (2009) offers a nice discussion on the consequences of not
meeting the assumption of uncorrelated model residuals. A common solution to spatial
dependence among the residuals is to add a spatially varying model intercept via spatial
random effects that account for spatial association through a decreasing function of distance
and perhaps direction between observed locations. Beyond helping to ensure the statistical
validity of the model, the addition of spatial random effects to the intercept allows for partitioning of residual uncertainty into a spatial and non-spatial component which can improve
model fit and predictive performance. Further, from a forest assessment standpoint, we are
often interested in predicting multiple tree variables. These variables are often correlated
within and between individual trees and therefore it is attractive to consider joint models
that are capable of estimating these covariances and leverage them for subsequent prediction
for unmeasured trees.
In this paper, we detail spatial regression models that incorporate information from
LiDAR along with information inherent in the observed forest inventory data to predict
multiple outcome variables. In particular, we define and assess the utility of a multivariate spatial regression model with spatial random effects that accommodate dependence of
outcome variables both within and between tree locations. The remainder of the paper is
organized as follows. The study area and data set used to illustrate the proposed models are
described in Section 2.2. The proposed regression models are detailed in Section 3.3 along
with the associated model validation methods. Details about candidate model implementa-

8

tion and subsequent analysis results are given in Sections 2.4 and 3.4, respectively. Finally,
some concluding remarks with an indication of future work are provided in Section 3.5.

2.2
2.2.1

Materials
Study area and field data

This study was conducted on the 1619 ha Penobscot Experimental Forest (PEF) located in
the towns of Bradley and Eddington, Maine (44◦ 52’ N, 68◦ 38’ W; Figure 1). Mean annual
temperature is 6.2◦ C and mean annual precipitation is 110 cm. The principal soil material
is glacial till and soils range from well-drained loams and sandy loams found on glacial till
ridges to poorly and very poorly drained silt loams found in flatter areas between ridges
(Sendak et al., 2003).
Located within the Acadian forest, the PEF is characterized by a mixture of northern conifer and hardwood species. Conifer species include red (Picea rubens Sarg.), white
(P. glauca (Moench) Voss), black spruce (P. mariana (Mill.) BSP), balsam fir (Abies balsamea (L.) Mill.), eastern hemlock (Tsuga canadensis (L.) Carr.), eastern white pine (Pinus
strobus L.), and northern white-cedar (Thuja occidentalis L.). Hardwood species include
red maple (Acer rubrum L.), paper (Betula papyrifera Marsh.), gray birch (B. populifolia Marsh.), quaking (Populus tremuloides Michx.), and bigtooth aspen (P. grandidentata
Michx.). An operational-scale experiment to compare ten silvicultural treatments was established by USFS researchers on the PEF between 1952 and 1957. The experiment was
designed to investigate the influence of silviculture on the growth, yield, and economics of
eastern spruce-fir (northern conifer) stands.
A network of permanent sample plots (PSPs) was established along transects nested
within each experimental unit, averaging 18 plots per unit. The PSPs consisted of a nested
design with 0.081-, 0.020-, and 0.008 ha circular plots sharing the same plot center. The

9

USFS routinely inventoried these PSPs: all pole- and sawtimber-sized trees ≥ 11.4 cm
diameter at breast height (DBH; measured at 1.37 m above the forest floor) were measured
in the 0.081 ha plot, large saplings (6.4 cm ≤ DBH <11.4 cm) were measured on the 0.020
ha plot, and small saplings (1.3 cm ≤ DBH <6.4 cm) were measured on the 0.008 ha plot.
Since 2000, PSPs have been remeasured by the USFS every 10 years, and, if harvesting
occurred, immediately pre- and post-harvest. Live trees were uniquely numbered in these
plots while experimental unit, plot and tree number, species, DBH, and status (e.g. live
tree versus snag), were recorded. On approximately 25% of the PSPs, all trees were stem
mapped and measured for total height (HT; nearest 0.1 m), height to crown base (HCB;
lowest live branch; nearest 0.1 m), and crown radii (CR; nearest 0.1 m) in four cardinal
directions. From this data, crown length (CL; m) was estimated as the absolute difference
between HT and HCB. For this analysis, a total of 494 trees from 19 plots was used.

2.2.2

LiDAR data

Discrete return LiDAR was acquired over the PEF in the early fall of 2010 (leaf-on) with
a small fixed-wing aircraft. The density of the LiDAR returns was 0.5 pulse hits per m2 .
LiDAR returns within a 5 m radius of each tree centroid were used to calculate a range
of percentile heights to serve as a candidate set of LiDAR covariates. Percentile heights
were calculated at 5% intervals, ranging from 5% to 100%. Exploratory data analysis using
univariate non-spatial regression models identified the 95th percentile height as the best
predictor variable for HT, CL, CR, and DBH. The 50th percentile height covariate also
explained a statistically significant portion of the variability in HT and DBH and was not
highly correlated with the 95th percentile height covariate. These covariates are referred to
as P50 and P95 in the subsequent development.

10

4
3
2
0

1

Northing (km)

0

1

2

3

Easting (km)

Figure 2.1: Location of the Penobscot Experimental Forest (PEF), Maine, is identified with
a diamond symbol in the lower-left inset map. Point symbols within the PEF’s two bounding
polygons denote the location of the 19 inventory plots included in the study. Three of these
plots with associated tree locations are illustrated in Figure 2.2.

2.3
2.3.1

Methods
Univariate spatial regression

Following the data description in Section 2.2, at tree location s ∈ D ⊆

2 , we observe an

outcome variable y(s), e.g., HT, CL, CR, or DBH, along with a p × 1 vector of spatially ref-

11

erenced covariates x(s) derived from the LiDAR data. The outcome variable and covariates
can be associated through a spatial regression model such as,
y (s) = x (s) β + w (s) + (s) ,

(2.1)

where the model residual comprises a spatial process, w(s), and an independent white-noise
process, (s), that captures measurement error or micro-scale spatial variation. With any
collection of n locations, say S = {s1 , . . . , sn }, we assume the independent and identically
distributed (si )’s follow a normal distribution N (0, τ 2 ), where τ 2 is often called the nugget.
The spatial random effects, w(s), provide local adjustment (with structured dependence) to the mean, interpreted as capturing the effect of unmeasured or unobserved covariates with spatial pattern. A popular modeling choice for the spatial random effects
is the Gaussian process, w(s) ∼ GP (0, C(·, ·; θ)), specified by a valid covariance function C(s, s∗ ; θ) = Cov(w(s), w(s∗ )) that models the covariance corresponding to a pair
of locations s and s∗ . The process realizations are collected into a n × 1 vector, say
w = (w(s1 ), . . . , w(sn )) , which follows a multivariate normal distribution M V N (0, Σw ),
where Σw is the n×n covariance matrix of w with (i, j)th element given by C(si , sj ; θ). Specification of C(s, s∗ ; θ) must ensure that the resulting Σw matrix is symmetric and positive
definite. This can be done by using a class of positive definite functions that are characterized
as the characteristic function of a symmetric random variable, see, e.g., Cressie (1993), Chil´es
& Delfiner (2008), and Banerjee et al. (2004). Here, we specify C(s, s∗ ; θ) = σ 2 ρ(s, s∗ ; φ)
where θ = {σ 2 , φ}, ρ(·; φ) is a correlation function, and φ controls the rate of correlation
decay of the surface w(s). Then Var(w(s)) = σ 2 represents a spatial variance component
in the model in (3.1). This specification assumes a stationary and isotropic process. Stationarity, in spatial modeling contexts refers to the setting where the correlation function
only depends on the separation between locations (i.e., translation-invariant). Isotropy goes
further and specifies the dependence through the Euclidean distance between the sites s and

12

s∗ , i.e., s − s∗ (see, e.g., Banerjee et al., 2004, p22).
To complete the Bayesian specification, we assign prior distributions to the model parameters and inference proceeds by sampling from the posterior distributions of the parameters,
see, e.g., Gelman et al. (2004) for further details. Note that the spatial process induces
a M V N (0, σ 2 R(φ)) distribution on w, where R(φ) is the n × n correlation matrix whose
(i, j)th element is ρ(si , sj ; φ). For the remaining parameters, we assign β a multivariate
normal prior, say M V N (µβ , Σβ ) and the spatial variance components σ 2 and τ 2 are assigned inverse-Gamma (IG) priors. The process correlation parameter, φ, is assigned an
informative prior based on characteristics of the underlying spatial domain.
Following notation similar to that used in Gelman et al. (2004), the posterior distribution
of p(β, w, σ 2 , τ 2 , φ | y), where y = (y(s1 ), . . . , y(sn )) , is proportional to
p(φ) × IG(τ 2 | aτ , bτ ) × IG(σ 2 | aσ , bσ )
× M V N (β | µβ , Σβ ) × M V N (w | 0, σ 2 R(φ))
×

n
∏

N (y(si ) | x(si ) β + w(si ), τ 2 ).

(2.2)

i=1

The posterior distribution of each parameter was estimated using a Markov chain Monte
Carlo (MCMC) algorithm. Specifically, β and w were updated using a Gibbs sampler
from their full conditional distributions and the remaining parameters were updated using a
Metropolis-Hastings sampler, where the target distribution for any (set of) parameter(s) is
proportional to the product of the terms in (3.2) that involve that (those) parameter(s). For
convenience, we collect the model parameters into Ω = {β, σ 2 , φ, τ 2 }. The MCMC algorithm
yielded posterior samples Ω and spatial random effects w.

2.3.1.1

Prediction

As detailed in Section 3.1, our interest is in predicting structure characteristics of n0 unmeasured trees using their locations, say S0 = {s0,1 , s0,2 , . . . , s0,n0 }, and associated LiDAR
13

covariates. The posterior predictive distribution of spatial random effects p(w0 | y), where
w0 = (w(s0,1 ), w(s0,2 ), . . . , w(s0,n0 )) , is proportional to
∫
p(w0 | Ω, w, y)p(w | Ω, y)p(Ω | y)dΩw.

(2.3)

Given posterior samples, {Ω(l) , w(l) }L
l=1 this distribution can be obtained via composition
(l)

sampling by drawing w0 from p(w0 | Ω(l) , w(l) , y). The process realizations over the new
locations are conditionally independent of the observed outcomes given the realizations over
the observed locations and the process parameters. That is, p(w0 | w, Ω, y) = p(w0 | Ω, w),
which is a multivariate normal distribution with mean and variance given by
E[w0 | w, Ω] = Cov(w0 , w)Var−1 (w)w = R0 (φ) R(φ)−1 w
and
Var[w0 | w, Ω] = σ 2

{
}
−1
R(φ) − R0 (φ) R(φ) R0 (φ) ,

where R0 (φ) is the n0 × n matrix with (i, j)th element given by ρ(s0,i , sj ; φ). Finally, given
a set of covariates at a new location s0 , samples from the posterior predictive distribution of
(l)
the outcome variable, y(s0 )(l) , are drawn from N (x(s0 ) β (l) + w0 , τ 2(l) ) for l = 1, 2, . . . , L.

2.3.2

Multivariate spatial regression

As described in Section 3.1, given the covariance among trees’ structure outcome variables
it is particularly attractive to specify a joint model capable of estimating and using this covariance to potentially improve prediction. Here, we extend the univariate spatial regression
to consider each tree’s m outcome variables. In our setting m = 4, i.e., HT, CL, CR, and
DBH. Let y(s) = (y1 (s), . . . , ym (s)) be the collection of these outcomes measured at tree
location s. Given such data our model should account for inherent association between the
outcomes within each location and between observations across locations. We consider the

14

following multivariate regression model
y(s) = X(s) β + w(s) + (s).
Here X(s) is an m × p block-diagonal matrix (p =

(2.4)

∑m

l=1 pl ), with l-th diagonal element

being the 1×pl vector xl (s) , and β = (β 1 , . . . , β p ) is a p×1 vector of regression coefficients,
with each β l a pl × 1 vector of regression coefficients corresponding to xl (s) . This specifies
the mean structure that accounts for large scale variation in the outcome. Spatial variation in
the outcome is modeled using a m × 1 vector of random effects w(s) = (w1 (s), . . . , wm (s)) .
Customarily, we assume the unstructured residuals (s), defined analogous to w(s), follow
a multivariate normal distribution with zero mean and an m × m dispersion matrix Ψ.
Spatially structured dependence is introduced in (2.4) through a multivariate (m × 1)
spatial process w(s) ∼ GP (0, C(·, ·; θ)), see, e.g., Cressie (1993), where the cross-covariance
function C(s, s∗ ; θ) is now defined to be the m × m matrix with (i, j)th entry Cij (s, s∗ ) =
Cov(wi (s), wj (s∗ )). As in the univariate case, this model specifies a stationary and isotropic
process.
The cross-covariance function completely determines the joint dispersion structure implied by the spatial process. Specifically, for any n and any arbitrary collection of locations S
the nm×1 vector w = (w(s1 ) , . . . , w(sn ) ) will have the covariance matrix given by Σw (θ),
an nm × nm block matrix whose (i, j)th block is the cross-covariance matrix C(si , sj ; θ),
which is a symmetric and positive definite function.
One possibility for C(s, s∗ ; θ), where θ = {Σ, φ}, is a separable specification, ρ(s, s∗ ; φ)Σ,
where Σ is an m × m covariance matrix between the outcomes and ρ(s, s∗ ; φ) is a spatial
correlation function. This implies Var(w) = R(φ) ⊗ Σ, where ⊗ is the kronecker product
operator and R(φ) was defined previously in Section 2.3.1. A separable spatial association
structure is rather restrictive here, assigning a single spatial correlation parameter φ to be
shared by all the outcomes. This can be inappropriate, e.g., why should all the tree structure

15

variables have the same strength of spatial associations?
More general cross-covariance functions are not routine to specify since they demand
that for any number of locations and any choice of these locations the resulting covariance
matrix for the associated data be positive definite. Various constructions are possible, see,
e.g., Hoef & Barry (1998), Higdon et al. (1999), and Fuentes & Smith (2002).
We motivate an alternative approach as follows. For the separable model, notice that
we can write C(s, s∗ ; θ) = AΘ(s, s∗ ; φ)A where Σ = AA is a Cholesky factorization and
Θ(s, s∗ ; φ) = ρ(s, s∗ ; φ)Im , where Im is the m × m identity matrix. We call Θ(s, s∗ ; φ) the
cross-correlation function which must satisfy Θ(s, s; φ) = Im for all φ = {φ1 , φ2 , . . . , φm }.
This implies that C(s, s) = AA and A identifies with a matrix square-root (e.g., Cholesky)
of C(s, s). For modeling A, without loss of generality one can assume that A = C 1/2 (s, s)
is a lower-triangular square-root; the one-to-one correspondence between the elements of A
and Cw (s, s) is well-known (see, e.g., Harville, 1997, p229). Therefore, A determines the
association between the elements of w(s) within s. Also we adopt a flexible and computationally feasible approach that specifies Θ(s, s∗ ; φ) as a diagonal matrix with ρk (s, s∗ ; φk ),
k = 1, . . . , m as its diagonal elements. This incorporates a set of m correlation parameters, or even m different correlation functions, offering an attractive, easily interpretable and
flexible approach. This approach resembles the linear model of coregionalization (LMC) as
in, e.g., Grzebyk & Wackernagel (1994), Wackernagel (2003), Schmidt & Gelfand (2003),
and Gelfand et al. (2004). See, also, Reich & Fuentes (2007) for a Bayesian nonparametric
adaptation. We will assume this structure in our subsequent development.
Once a cross-covariance function is specified for a multivariate Gaussian process, the
realizations of w(s) over the set of observed locations S is given by M V N (0, Σw (θ)), where
θ = {A, φ} and Σw (θ) is an mn × mn block matrix whose (i, j)th block is the m × m
cross-covariance C(si , sj ; θ), i, j = 1, . . . , n.
Collecting observations from the n locations into an mn×1 vector y = (y(s1 ) , . . . , y(sn ) ) ,

16

the multivariate analogue of (3.2) as

m
∏

p(φk ) × p(A) × M V N (β | µβ , Σβ )

k=1

× p(Ψ) × M V N (w | 0, Σw (θ))
×

n
∏

M V N (y(si ) | X(si ) β + w(si ), Ψ),

(2.5)

i=1

With the appropriate dimensional adjustments, the prior specifications for w, β, and
elements of φ are the same as the univariate case. We assumed cross-covariance matrix AA
and Ψ followed an inverse-Wishart (IW) distribution. Sampling from the parameters’ posterior distributions follows the same algorithm as the univariate case. Upon convergence, the
MCMC output generates L samples, {Ω(l) , w(l) }L
l=1 , where Ω is now {β, A, φ, Ψ}. Subsequent prediction of the spatial random effects w(s0 ) and the outcome y(s0 ) is also analogous
to the univariate setting.

2.3.3

Model selection

Model fit is assessed using a posterior predictive loss function based on simulating independent replicates for each observed outcome. For the multivariate spatial model (2.4), and
each location si we compute, for each i = 1, 2, . . . , n,
∫
p(y rep (si ) | y) =

M V N (y rep (si ) | Xβ + w(si ), Ψ)
× p(Ω, w(si ) | y) dΩ dw(si )

(2.6)

where the conditional distribution of the replicate y rep (si ) given the parameters is simply
the likelihood component corresponding to y(si ). Models that perform well under a decisiontheoretic balanced loss function, i) penalizing both departure of replicated means from their
observed values (lack of fit) and ii) excessive uncertainty in the replicated data, are preferred.
17

Let µrep,i and Σrep,i be the posterior predictive mean and variance for each y rep (si ). Then
using a squared error loss function (e.g., Gelfand & Ghosh, 1998), the measures for these two
∑
2
criteria are evaluated as G = n
i=1 y(si ) − µrep,i , where · is the standard Euclidean
∑
norm, and P = n
i=1 trace(Σrep,i ). Following Gelfand & Ghosh (1998), we used the score

60
50
30

40

Observed tree locations
Holdout tree locations

0

10

20

Northing (m)

70

80

90

D = G + P as a model selection criterion, with lower values of D indicating better models.

0

10

20

30

40

50

60

70

80

90

Easting (m)

Figure 2.2: Three of the 19 inventory plots with observed trees denoted with an open circle
symbol and validation holdout trees denoted with a plus symbol.

Finally, because prediction is our overarching objective, we assessed model predictive
performance using a split-set validation approach. Here, 85% of the observations were used
to estimate candidate models’ parameters and the remaining 15% were used for subsequent
prediction-based validation. Models were ranked based on their holdout set root mean
squared prediction error (RMSPE), where again lower values of RMSPE indicate greater
predictive performance.

18

2.4

Candidate models and implementation

We considered four candidate models for the tree structure data described in Section 2.2: i)
univariate non-spatial regression (3.1) with w = 0; ii) univariate spatial regression (3.1); iii)
multivariate non-spatial regression (2.4) with w = 0, and; iv) multivariate spatial regression
(2.4).
Prior to regression analysis HT, CL, CR, and DBH were log transformed to satisfy the
model assumption that (si )’s are normally distributed. This transformation also ensures a
correct positive support for the outcomes following back transformation.
As detailed in Section 2.3.3, model fit was evaluated using the D criterion. Predictive
performance was assessed using RMSPE calculated using 74 randomly selected trees i.e.,
a 15% holdout set. Candidate model parameters were estimated using the remaining 420
observations. The distribution of observed and holdout trees in three inventory plots are
illustrated in Figure 2.2.
To complete the proposed models’ Bayesian specification, prior distributions were assigned to each parameter. As is customary, a flat prior was assigned to the regression
intercept and slope parameters, i.e., Σβ = 0. The covariance matrices AA and Ψ, were
each assigned an IW (a, b) with the degrees of freedom, a, set to m + 1. For AA and Ψ, the
IW scale matrix, b, was constructed with zeros on the off-diagonal elements and diagonal
elements taken as the partial-sill and nugget values, respectively, from univariate semivariograms fit to the residuals of the non-spatial multivariate model. An exponential correlation
function, ρ(s, s∗ ; φ) = exp(−φ s − s∗ ), was used in the univariate and multivariate models.
The maximum distance between any two trees in the data set is 2479 m. Therefore the
spatial decay parameters, φ’s, were assigned a uniform U (0.0012, 3), which corresponds to
support for an effective spatial range between approximately 1 to 2500 m (i.e., where effective
spatial range is the distance at which the correlation drops to 0.05).

19

The candidate models were coded in C++ and used Intel’s Math Kernel Library threaded
BLAS and LAPACK routines for matrix computations. All analyses were conducted on a
Linux workstation using two Intel Nahalem hyperthreaded quad-Xeon processors. The samplers were run for three chains of 50,000 iterations. The first 10,000 iterations were discarded
as burn-in, following convergence diagnostics detailed in Gelman et al. (2004). Parameter
and predictive inference was based on 120,000 post burn-in samples (i.e., three chains each
of 40,000 iterations). The most computationally intensive model required approximately 3
hours to deliver the three chains.

2.5

Results and Discussion

Figure 2.3 offers histograms of the multivariate spatial regression model residuals and provides evidence that, following the transformation described in Section 2.4, the assumption
about the distribution of residuals is reasonable. Parameter estimates for the multivariate
non-spatial and spatial regression models, i.e., candidate models iii and iv, are provided in
Tables 2.1 and 2.2, respectively. In both tables, subscripts on the regression slope parameters indicate the outcome variable and associated LiDAR covariate. The β·,0 correspond to
the intercept whereas β·,P 50 and β·,P 95 correspond to the 50th and 95th LiDAR height percentile covariate, respectively. All of the non-spatial model slope parameters are significant
at the 0.05 level, i.e., 95% credible intervals do not include zero. This result suggests the
LiDAR covariates explain a substantial portion of the variability in the outcome variables.
A very different conclusion is drawn from the multivariate spatial regression model results,
Table 2.2, which suggest none of the slope parameters associated with the LiDAR covariates
differ from zero. These disparate conclusions occur because we violate the key model assumption of independent and identically distributed residuals when applying the non-spatial
model to these data. The non-spatial model assumes each tree observation contributes a

20

unique portion of information, however, this is not the case – due to strong spatial dependence among measurements between tree locations. The addition of the spatial random
effects help to meet this model assumption by explicitly modeling the residual spatial dependence and hence providing a more accurate summary of covariates’ importance. Comparing
Tables 2.1 and 2.2, apportioning residual variability from the non-spatial model variancecovariance Ψ to the spatial model cross-covariance AA confirms the presence of strong
residual spatial dependence. Further, the estimates of effective spatial range in Table 2.2
suggest there is spatial dependence beyond 0.5 km for all outcomes, even after accounting
for the LiDAR covariates.
Given the assumed exponential correlation function, the effective spatial range associated
with the first outcome variable in the multivariate vector, i.e., y 1 (s), is obtained by solving
ρ(d; φ) = 0.05 for d, i.e., d = − ln(0.05)/φ. However, because of the linear combination
induced by the cross-covariance matrix, the subsequent effective spatial ranges are obtained
by solving a system of equations (see, e.g., Gelfand et al., 2004, p292). For example, the effective spatial range for y 2 (s) is given by solving (a22,1 ρ(d; φ1 ) + a22,2 ρ(d; φ2 ))/(a22,1 + a22,2 ) =
0.05 for d, where a2,1 and a2,2 are the elements of A corresponding to the row and column subscripts. In a similar way, the effective spatial range for y 3 (s) is given by solving
(a23,1 ρ(d; φ1 ) + a23,2 ρ(d; φ2 ) + a23,3 ρ(d; φ3 ))/(a23,1 + a23,2 + a23,3 ) = 0.05 for d. The effective
spatial ranges for y 4 (s), . . . , y m (s) follow the same pattern.
Although not shown, the univariate non-spatial and spatial regression models, i.e., candidate models i and ii, parameter estimates do not differ substantially from their multivariate
counterparts.
Contrary to the spatial model results, one expects the P95 covariate to explain a substantial portion of variability in at least the tree height outcome, i.e., HT. The P95, or
similar upper canopy LiDAR return covariate, has been shown to be well correlated with
tree height in many studies. However, in the current data set there is just not enough unique

21

1200
400

600

Frequency

800

1000

2000
1500
1000

Frequency

0

200

500
0
−0.6

−0.4

−0.2

0.0

0.2

0.4

0.6

0.8

−0.5

0.0

CL residuals

800

Frequency

0

0

200

400

600

1500
1000
500

Frequency

1.0

1000 1200 1400

HT residuals

0.5

−0.5

0.0

0.5

1.0

−1.0

CR residuals

−0.5

0.0

0.5

1.0

DBH residuals

Figure 2.3: Histogram of the spatial multivariate regression model residuals for height.

information to reduce the width of βHT,P 95 ’s credible interval and exclude zero. Given the
estimated range of spatial dependence, we might conclude that tree measurements within
plots are redundant and that even plots in close proximity are not contributing much unique
information. In the end, we are probably only working with an effective sample size of

22

19,

Table 2.1: Non-spatial multivariate regression model parameter estimates.
Parameter

50 (2.5, 97.5) percentiles

βHT,0
βHT,P 50
βHT,P 95
βCL,0
βCL,P 95
βCR,0
βCR,P 95
βDBH,0
βDBH,P 50
βDBH,P 95
ΨHT,HT
ΨHT,CL
ΨHT,CR
ΨHT,DBH
ΨCL,CL
ΨCL,CR
ΨCL,DBH
ΨCR,CR
ΨCR,CL
ΨDBH,DBH

1.67 (1.60, 1.74)
-0.02 (-0.03, -0.01)
0.07 (0.06, 0.08)
1.29 (1.20, 1.38)
0.04 (0.03, 0.04)
-0.10 (-0.20, -0.00)
0.05 (0.04, 0.05)
1.41 (1.30, 1.54)
-0.01 (-0.02, -0.00)
0.09 (0.08, 0.10)
0.07 (0.06, 0.08)
0.06 (0.05, 0.07)
0.06 (0.05, 0.07)
0.10 (0.09, 0.12)
0.15 (0.14, 0.17)
0.11 (0.09, 0.12)
0.13 (0.11, 0.15)
0.18 (0.16, 0.21)
0.17 (0.14, 0.19)
0.25 (0.22, 0.28)

as opposed to the 420 tree observations assumed when applying the non-spatial model.
Model fit and prediction summaries are given in Table 2.3. Here, a lower value of D for
the spatial models compared to that of the non-spatial models suggests that the addition of
spatial random effects improves model fit.
As detailed in Section 3.1, our interest is in the predictive ability of the models. This is
assessed using the RMSPE based on prediction for the 15% holdout set and summarized in
the last four rows of Table 2.3. These results suggest the spatially structured random effects
result in greater predictive accuracy than that obtained from the non-spatial models, i.e.,
candidate models ii and iv have lower RMSPE than their non-spatial counterparts i and iii,
respectively.
There are marginal differences between the RMSPEs obtained from the univariate and
multivariate spatial models. However, with the exception of CL, the multivariate model
achieves a smaller RMSPE for the outcome variables. The lower RMSPE is likely due to

23

Table 2.2: Spatial multivariate regression model parameter estimates.
Parameter
βHT,0
βHT,P 50
βHT,P 95
βCL,0
βCL,P 95
βCR,0
βCR,P 95
βDBH,0
βDBH,P 50
βDBH,P 95
AAHT,HT
AAHT,CL
AAHT,CR
AAHT,DBH
AACL,CL
AACL,CR
AACL,DBH
AACR,CR
AACR,DBH
AADBH,DBH
ρHT,CL
ρHT,CR
ρHT,DBH
ρCL,CR
ρCL,DBH
ρCR,DBH
ΨHT,HT
ΨHT,CL
ΨHT,CR
ΨHT,DBH
ΨCL,CL
ΨCL,CR
ΨCL,DBH
ΨCR,CR
ΨCR,DBH
ΨDBH,DBH

50 (2.5, 97.5) percentiles
2.65 (2.36, 2.98)
0.00 (-0.00, 0.01)
0.00 (-0.01, 0.01)
2.15 (1.69, 2.39)
-0.01 (-0.02, 0.01)
0.97 (0.43, 1.24)
-0.00 (-0.02, 0.01)
3.07 (2.43, 3.54)
0.00 (-0.00, 0.02)
-0.00 (-0.02, 0.01)
0.14 (0.11, 0.17)
0.14 (0.11, 0.15)
0.17 (0.14, 0.18)
0.25 (0.21, 0.27)
0.17 (0.12, 0.24)
0.19 (0.16, 0.26)
0.29 (0.22, 0.34)
0.30 (0.24, 0.34)
0.37 (0.31, 0.42)
0.51 (0.43, 0.60)
0.87 (0.83, 0.89)
0.83 (0.78, 0.85)
0.91 (0.89, 0.95)
0.89 (0.86, 0.93)
0.94 (0.92, 0.96)
0.94 (0.92, 0.97)
0.03 (0.02, 0.03)
0.03 (0.02, 0.03)
0.01 (0.01, 0.02)
0.03 (0.02, 0.04)
0.09 (0.08, 0.10)
0.04 (0.03, 0.05)
0.04 (0.03, 0.05)
0.05 (0.04, 0.06)
0.03 (0.02, 0.04)
0.07 (0.04, 0.09)

Eff. range1 (m)
Eff. range2 (m)

598.22 (276.58, 804.28)

Eff. range3 (m)
Eff. range4 (m)

567.48 (352.99, 704.88)

609.64 (428.73, 740.14)
592.88 (338.04, 772.92)

explicit modeling of the covariance among the random effects via the cross-covariance matrix
AA . The cross-covariance matrix allows the model to borrow strength across the random

24

Table 2.3: Candidate models’ fit and predictive performance.
Univariate

Multivariate

Non-spatial

Spatial

Non-spatial

Spatial

G

–

–

265

75

P

–

–

278

121

D

–

–

543

196

RMSPEHT

3.84

2.60

3.82

2.56

RMSPECL

2.91

2.41

2.88

2.45

RMSPECR

1.06

0.81

1.05

0.76

RMSPEDBH

12.67

8.67

12.73

8.46

effects which can improve prediction. Figure 2.4 shows scatter plots of the univariate nonspatial model residuals, i.e., after accounting for the LiDAR covariates. These plots suggest
several strong positive linear relationships among the residuals, a portion of which is captured
by the AA which can also be viewed as the within tree outcome covariance. In Table 2.2, we
have converted the AA covariance estimates to correlations to simplify interpretation, e.g.,
ρHT,CL = AAHT,CL /(AAHT,HT × AACL,CL ). It is these strong correlations that help
improve predictive performance. Further, explicitly modeling these within tree covariances
can help to ensure a biologically feasible combination of predicted structure variables for new
locations – this is not guaranteed when using m separate univariate models for prediction.
Finally, although it is not an objective of this study, estimates of AA or corresponding
ρ’s can allow one to test hypotheses about the statistical significance of covariation among
outcomes – after accounting for covariates. For example, after accounting for the LiDAR
covariates, all outcomes have positive covariance or correlations that differ significantly from
zero.

25

0.0 0.5

−1.0

0.0 0.5

0.0

1.0 −1.0

HT

0.5

−1.0

CL

1.0
0.0

0.0

1.0

−1.5

−0.5

CR

−1.0

−1.0

DBH

−1.0

0.0 0.5

−1.0

0.0

1.0−1.5

−0.5

0.5

−1.0

0.0

1.0

Figure 2.4: Scatter plots of residuals from non-spatial version of the multivariate candidate
model.

2.6

Summary

In this paper we compared univariate and multivariate regression models with and without
spatially-structured random effects for predicting several tree structure variables. In addition
26

to helping meet the regression model’s assumption of independent and identically distributed
residuals, the results of the PEF data analysis suggest that the addition of univariate or multivariate spatial random effects improved model fit and prediction. In some cases, explicitly
modeling the residual spatial dependence among the outcomes via multivariate random effects resulted in improved inference.
The difference between inference gleaned from the non-spatial and spatial models about
the importance of the LiDAR covariates underscores the need to meet model assumptions
in this and similar studies where residual spatial dependence could be present.
By working in a Bayesian paradigm we have access to the full posterior predictive distribution of each outcome variable at each new location, which facilitates uncertainty assessment.
Access to posterior predictive samples could also be useful in settings where the investigator wishes to propagate uncertainty in outcome variables through subsequent economic or
ecological numerical models. For example, many forest yield models require spatially and
temporally explicit inputs that must be somehow imputed to the region of interest.
There are obviously many alternative approaches to deriving covariates from LiDAR data
and more exhaustive approaches for selecting a subset of covariates to serve in the regression
model. Regardless of the approach used, there is typically strong correlation among the
covariates and hence only a few can be included in the model – without risking issues related
to multicollinearity. Therefore, we do not expect that a more sophisticated covariate selection
approach would result in substantial gains in explained variance over that obtained using the
P50 and P95 covariates. Rather, in practice the addition of spatial random effects generally
results in larger gains in model fit and predictive ability.
Our future research will focus on developing modeling frameworks that automate the
selection of covariates from high-dimensional remotely sensed data. Also, we are considering application of non-stationary cross-covariance functions that allow covariance among
outcomes to vary by location.

27

2.7

Acknowledgment

The work of all authors was supported by the NASA grant 11-CMS11-0021. The third author
was supported by NSF EF-1137309, DMS-1106609, and NASA grant 11-CMS11-0028. This
work was also supported by The National Ecological Observatory Network (NEON) which
is a project sponsored by the NSF grants EF-1029808, EF-1138160, EF-1150319 and DBI0752017 and managed under cooperative agreement by NEON, Inc. Data for this study
were provided by a unit of the Northern Research Station, U.S. Forest Service, located at
the Penobscot Experimental Forest in Maine. The authors would like to thank Michael R.
Saunders for the use of his dissertation data.

28

BIBLIOGRAPHY

29

BIBLIOGRAPHY

Anderson, J.E., Plourde, L.C., Martin, M.E., Braswell, B.H., M. L. Smith, R. O. Dubayah,
M.A.H., & Blair, B.J. (2008). Integrating waveform lidar with hyperspectral imagery for
inventory of a northern temperate forest. Remote Sensing of Environment, 112, 1856–1870.
Banerjee, S., Carlin, B.P., & Gelfand, A.E. (2004). Hierarchical Modeling and Analysis for
Spatial Data. First ed., Chapman and Hall/CRC Press, Bocan Raton, FL.
Chil´es, J.P., & Delfiner, P. (2008). Geostatistics: Modeling Spatial Uncertainty. Second ed.,
Wiley, New York, NY.
Cressie, N.A.C. (1993). Statistics for Spatial Data. Second ed., Wiley, New York, NY.
Dalponte, M., Coops, N.C., Bruzzone, L., & Gianelle, D. (2009). Analysis on the use of
multiple returns lidar data for the estimation of tree stems volume. IEEE Journal of
Selected Topics in Applied Earth Observations and Remote Sensing, 2, 310–318.
Dinuls, R., Erins, G., Lorencs, A., Mednieks, I., & Sinica-Sinavskis, J. (2012). Tree species
identification in mixed baltic forest using lidar and multispectral data. IEEE Journal of
Selected Topics in Applied Earth Observations and Remote Sensing, 5, 594–603.
Fuentes, M., & Smith, R.L. (2002). A new class of nonstationary spatial models. Biometrika,
89, 197–210.
Gelfand, A.E., & Ghosh, S.K. (1998). Model choice: A minimum posterior predictive loss
approach. Biometrika, 85, 1–11.
Gelfand, A.E., Schmidt, A.M., Banerjee, S., & Sirmans, C.F. (2004). Nonstationary multivariate process modelling through spatially varying coregionalization (with discussion).
TEST, 13, 263–312.
Gelman, A., Carlin, J.B., Stern, H.S., & Rubin, D.B. (2004). Bayesian Data Analysis. Second
ed., Chapman and Hall/CRC Press, Boca Raton, FL.
Gonzalez, P., Asner, G.P., Battles, J.J., Lefsky, M.A., Waring, K.M., & Palace, M. (2010).
Forest carbon densities and uncertainties from lidar, quickbird, and field measurements in
california. Remote Sensing of Environment, 114, 1561–1575.
Grzebyk, M., & Wackernagel, H. (1994). Multivariate analysis and spatial/temporal scales:
Real and complex models, in: Proceedings of the XVIIth International Biometrics Conference, Hamilton, Ontario. pp. 19–33.
Harville, D.A. (1997). Matrix Algebra from a Statisticians Perspective. First ed., Springer,
New York, NY.
30

Higdon, D., Swall, J., & Kern, J. (1999). Non stationary spatial modeling. First ed., Oxford
University Press, Oxford.
Hoef, J.V., & Barry, R. (1998). Modelling crossvariograms for cokriging and multivariable
spatial prediction. Journal of Statistical Planning and Inference, 69, 275–294.
Hoeting, J.A. (2009). The importance of accounting for spatial and temporal correlation in
analyses of ecological data. Ecological Applications, 19, 574–577.
Junttila, V., Kauranne, T., & Lepp¨anen, V. (2010). Estimation of forest stand parameters
from airborne laser scanning using calibrated plot databases. Forest Science, 56, 257–270.
Kamper, T.U., Johnson, B.R., Kuester, M., & Keller, M. (2010). Neon: The first continentalscale ecological observatory with airborne remote sensing of vegetation canopy biochemistry and structure. Journal of Applied Remote Sensing, , 043510.
Monnet, J.M., Chanussot, J., & Berger, F. (2011). Support vector regression for the estimation of forest stand parameters using airborne laser scanning. IEEE Geoscience and
Remote Sensing Letters, 8, 580–584.
Muss, J.D., Mladenoff, D.J., & Townsend, P.A. (2011). A pseudo-waveform technique to
assess forest structure using discrete lidar data. Remote Sensing of Environment, 115, 824
– 835.
Niemann, K.O., Quinn, G., Goodenough, D.G., Visintini, F., & Loos, R. (2012). Addressing
the effects of canopy structure on the remote sensing of foliar chemistry of a 3-dimensional,
radiometrically porous surface. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 5, 584–593.
Popescu, S.C., Zhao, K., Neuenschwander, A., & Lin, C. (2011). Satellite lidar vs. small
footprint airborne lidar: comparing the accuracy of aboveground biomass estimates and
forest structure metrics at footprint level. Remote Sensing of Environment, 115, 2786–
2797.
Reich, B.J., & Fuentes, M. (2007). A multivariate semiparametric bayesian spatial modeling
framework for hurricane surface wind fields. The Annals of Applied Statistics, 1, 249–264.
Salas, C., Ene, L., Gregoire, T.G., Næsset, E., & Gobakken, T. (2010). Modelling tree
diameter from airborne laser scanning derived variables: A comparison of spatial statistical
models. Remote Sensing of Environment, 114, 1277 – 1285.
Schmidt, A.M., & Gelfand, A.E. (2003). A bayesian coregionalization model for multivariate
pollutant data. Journal of Geophysics Research-Atmospheres, 108, 8783.
Sendak, P.E., Brissette, J.C., & Frank, R.M. (2003). Silviculture affects composition, growth,
and yield in mixed northern conifers: 40-year results from the penobscot experimental
forest. Canadian Journal of Forest Research, 33, 2116–2128.
31

Tonolli, S., Dalponte, M., Neteler, M., Rodeghiero, M., Vescovo, L., & Gianelle, D. (2011).
Fusion of airborne lidar and satellite multispectral data for the estimation of timber volume
in the southern alps. Remote Sensing of Environment, 115, 24862498.
Wackernagel, H. (2003). Multivariate Geostatistics: An Introduction with Applications.
Third ed., Springer Verlag, Berlin.

©2013 IEEE. Reprinted, with permission, from Chad Babcock, Jason Matney, Andrew

O. Finley, Aaron Weiskittel and Bruce D. Cook, Multivariate Spatial Regression Models for
Predicting Individual Tree Structure Variables Using LiDAR Data, IEEE Journal of Selected
Topics In Applied Earth Observations and Remote Sensing, February 2013.

32

CHAPTER 3
IMPROVING LIDAR BASED PREDICTION OF FOREST BIOMASS USING
HIERARCHICAL MODELS WITH SPATIALLY VARYING COEFFICIENTS

3.1

Introduction

Estimating forest above-ground biomass (AGB), along with other structure variables, using
discrete Light Detection and Ranging (LiDAR) data is an active subject in ecological research. Studies in this area show a strong potential for discrete return LiDAR to be used as
a tool for developing spatially explicit estimates of many forest attributes, including AGB,
either on its own or in conjunction with other remote sensing technologies (see, e.g., Gonzalez
et al., 2010; Sherrill et al., 2008; Lucas et al., 2006; Lim & Treitz, 2004).
Regression models proposed for forest AGB mapping using LiDAR data often do not
explicitly accommodate residual spatial dependence, see, e.g, Anderson et al. (2008), Gonzalez et al. (2010), Muss et al. (2011), Tonolli et al. (2011), and Popescu et al. (2011). A
non-spatial model can be appropriate if all spatially structured variation in the outcome is
accounted for by the covariates used for model fitting; however, this is often an unrealistic
assumption when data are spatially indexed. Considering forest attribute data, it is reasonable to expect similar outcomes for neighboring locations. For example, inventory plots close
together could comprise like tree species and have comparable stem densities due to similar
disturbance histories and environmental conditions. In contrast, inventory plots far apart
are less likely to share common composition or structure attributions. A key assumption of
regression is that model residuals are not correlated. Given the spatially dependent nature
of forest inventory data, the potential for spatially correlated residuals in AGB models is
high. Not accounting for residual spatial dependence can result in falsely precise estimates
of model parameters and erroneous predictions (Hoeting, 2009). For these reasons, it is
33

important to check for spatially structured residuals when spatial data are used to fit AGB
models.
To ensure a model’s statistical validity, spatial dependence among residuals can be accommodated by introducing a spatially varying intercept via the addition of an appropriately
structured random effect. Including spatial random effects that account for spatial association between observed locations can not only help to reduce correlation among residuals
but also improve model fit and prediction accuracy via partitioning error uncertainty to a
spatial and non-spatial component and borrowing information from observed locations to
inform prediction at proximate locations.
In addition to a spatially varying intercept, it may be appealing to allow the regression
parameters associated with the covariates to vary across the study area. For instance, given
the heterogeneity of forest species composition, age classes, and resources (e.g., light, water,
soil characteristics), a single set of regression parameters might not adequately capture the
space varying relationship between forest outcome variables and the covariates. In such
cases, one might attempt to capture these localized effects by including forest characteristics
and/or environmental conditions as covariates in the regression model; however, it is not
always clear which variables should be included and, in many applied settings, the necessary
variables are not available. Rather, localized relationships between the outcome variable and
covariates might be more effectively captured by allowing the regression coefficients to vary
smoothly over the study area.
Such spatially varying coefficient models have received some attention in the statistical
literature (see, e.g., Gelfand et al., 2004; Finley et al., 2009, 2011), and more recently in
applied disciplines (Wheeler & Calder, 2007; Wheeler & Waller, 2009; Finley, 2010). Flexible
specification of these models follows the Bayesian paradigm of statistical inference (see,
e.g., Carlin & Louis, 2000; Gelman et al., 2013; Banerjee et al., 2004), where analysis uses
samples, obtained using Markov chain Monte Carlo (MCMC) methods, from the posterior

34

distributions of model parameters.
In an effort to more fully account for patterns of spatial dependence between AGB and
LiDAR covariates, we assess the utility of a Bayesian hierarchical modeling framework that
accommodates both residual spatial dependence and non-stationarity of model covariates
through the introduction of spatial random effects.
We explore this objective using four forest inventory datasets that are part of the North
American Carbon Program (NACP) each comprising point-referenced measures of AGB and
discrete LiDAR. For each dataset, we consider a set of regression model specifications of
varying complexity. Models are assessed based on fit criteria and predictive performance
using a ten-fold cross-validation approach.
The remainder of the paper evolves as follows. The motivating datasets and field and
LiDAR data processing are detailed in Section 3.2. The proposed modeling framework and
model assessment are described in Section 3.3, followed by analysis results and associated
discussion presented in Section 3.4. Finally, some concluding remarks with an indication of
future work are provided in Section 3.5.

3.2
3.2.1

Study Sites
Fraser Experimental Forest

The Fraser Experimental Forest (FEF) is located in central Colorado (39◦ 4’ N, 105◦ 52’
W) near the town of Fraser. Tree species at FEF consist primarily of Abies lasiocarpa and
Picea engelmannii at higher elevations and Pinus contorta at lower elevations. Climate at
FEF is characterized by cold and relatively long winters, with mean annual temperature
and precipitation of 0◦ C and 737 mm, respectively. FEF experienced a widespread standreplacing fire in approximately 1685. Selective clearcuts were conducted as watershed-scale
manipulations at FEF in the 1950s. Although FEF is currently experiencing mortality due

35

to mountain pine beetle, the field measurements and LiDAR acquisition for this study were
completed prior to beetle infestation.

3.2.2

Marcell Experimental Forest

Located in northern Minnesota, the Marcell Experimental Forest (MEF) consists of mixed
forests that include both upland forests and peatlands. Upland forests are generally dominated by Populus tremuloides and grandidentata, but contain substantial components of
Betula papyrifera, Pinus resinosa, Pinus strobus, and Pinus banksiana. Lowland tree species
include Larix laricina, Picea mariana, Fraxinus nigra, and Thuja occidentalis. Climate at
MEF is subhumid continental, with mean annual precipitation of 785 mm, mean annual
temperature of 3◦ C and air temperature extremes of −46◦ C and 38◦ C. Forests of the Lake
States region experienced widespread logging around the turn of the 20th century, including much of the MEF landscape, and natural disturbances at MEF include windstorms of
variable intensity and rare wildfires.

3.2.3

Glacier Lake Ecosystem Experimental Site

Glacier Lake Ecosystem Experimental Site (GLEES) is located approximately 55 km west of
Laramie, Wyoming at 3190 m elevation (41◦ 22’ N, 106◦ 14’ W). GLEES includes a mosaic of
trees and alpine meadows; the forest is dominated by Picea engelmannii and Abies lasiocarpa.
GLEES has a mean annual temperature of −2◦ C and a mean annual precipitation of 1000
mm, primarily as snow. Tree ages at GLEES suggest either a stand-replacing disturbance
more than 400 years ago with slow recovery, or a series of smaller disturbances over the last
400 years (Bradford et al., 2008).

36

3.2.4

Niwot Long Term Ecological Research Site

Niwot Long Term Ecological Research Site (NIWOT) is located at 3050 m elevation on the
front range of the Rocky Mountains (40◦ 2’ N, 105◦ 33’ W), near the town of Nederland,
Colorado. Tree species include primarily a mix of Abies lasiocarpa, Picea engelmannii and
Pinus contorta with minor components of Pinus flexilis and Populus tremuloides. Mean
annual temperature and precipitation are 4◦ C and 800 mm, respectively. Disturbance history
at NIWOT includes widespread clearcuts between 1900-1910.

3.2.5

Field Data Preparation

Field data at each site were collected using methods similar to the Forest Inventory and
Analysis style plot design (Bechtold & Patterson, 2005). Each plot location consists of four
subplots with radius 8–10 m (depending on site) with one being in the center and 3 others 35
m away from the plot center at 0◦ , 120◦ , and 240◦ . Additional plots consisting of only one
subplot were also used in this analysis. FEF, MEF, NIWOT and GLEES contained 61, 115,
62 and 46 subplots respectively. Figure 3.1 shows the spatial distribution of the plots within
the LiDAR coverage area. Within each subplot, individual tree diameter at breast height
and height measurements for both live and dead trees were taken and used in species specific
allometric equations to estimate AGB (stem, branch and foliage biomass). Additional details
about field measurements and allometric equations used for biomass estimation are available
in Bradford et al. (2010). The individual tree AGB estimates were totaled for each subplot
and converted to AGB Mg/ha. The AGB measurements were then square root transformed
to better approximate a normal distribution before model fitting.

37

0

0

500 m

500 m

(a) FEF

(b) MEF

0

0

500 m

(c) GLEES

500 m

(d) NIWOT

Figure 3.1: Aerial photographs of FEF, MEF, GLEES, and NIWOT with inventory sub-plot
locations highlighted in red.

3.2.6

LiDAR Data Preparation

Height-above-ground measurements for the first return pulses were calculated by subtracting
the point elevations from a digital terrain model constructed from the LiDAR data. LiDAR
return height empirical cumulative distribution curves were constructed over each sub-plot

38

and a percentile height dataset was compiled by extracting heights associated with 5% intervals (i.e., 5%–100%) for each curve. These percentile heights served as an initial set of
regression covariates. Using the same intervals, a percentile height dataset was constructed
for a 20 × 20 meter grid over the LiDAR coverage area and subsequently used to construct
AGB prediction maps. The cell size was chosen because it is approximately the same area
as the observed sub-plots.
Variable selection via dimension reduction was necessary because of high collinearity
among the percentile height covariates. We chose eigen (spectral) decomposition to accomplish this. Eigen decomposition is a matrix factorization technique that is useful for
characterizing patterns in high-dimensional data (Harville, 1997). Eigen decomposition (or
similar orthogonalization techniques) is used widely for data reduction or compression in
statistics, signal processing, pattern recognition, remote sensing, and other fields where
high-dimensional and/or highly correlated data are encountered (Guanter et al., 2012, 2013;
Huang et al., 2014; Wang et al., 2014).
Let A be the sub-plot and B be the grid percentile height sets with observations along
the rows and percentile heights along the columns. Their dimensions are n × p and m × p,
respectively, where n is the number of observations (i.e., sub-plot count) and m is the number
of prediction cells. Let A be the standardized scores for A calculated as follows,
Ai,j =

Ai,j − µj
,
σj

here µj and σ j are the mean and standard deviation of the j-th column of A. A correlation
matrix ρ is constructed for A by,
ρ=

AA
.
n−1

Let
ρ = P ΛP

39

be the eigen decomposition of ρ with the diagonal elements of Λ (p × p) holding the eigenvalues of ρ in decreasing order (off diagonal elements are zero) and P (p × p) holding the
corresponding eigenvectors along the columns. P is an orthogonal transformation matrix
that can be used to project A into a new vector space. Let A∗ = AP be the transformed
standardized scores of the sub-plot percentile height matrix. The columns of A∗ are now
uncorrelated.
To transform the gridded data B to the same vector space as A∗ , B was first standardized
via,
B i,j =

B i,j − µj
.
σj

Note that µj and σ j are the mean are standard deviation calculated from A. Then B ∗ =
BP is then the gridded percentile height data projected into the same vector space as A∗ .
In practice, a subset of columns from A∗ (i.e., principal component scores) can be used as
covariates in regression analysis—referred to as principal components regression (Chatterjee
& Hadi, 2006). This is the approach we pursued here. The columns of A∗ represent a
new, orthogonal (i.e., uncorrelated), candidate covariate set. From this set we retained the
minimum number of covariates that explained at least 80% of the variance in the percentile
height data. This was done by selecting the leftmost columns of A∗ whose standardized
eigenvalues total was > 0.8. To meet this criterion the two leftmost columns of A∗ were
retained for FEF, NIWOT and GLEES. It was necessary to keep three columns for MEF.
Figure 3.2 shows a plot of the cumulative sum of the eigenvalues for each site. We kept the
corresponding columns of B ∗ for each site to use in subsequent grid prediction and mapping.
Some grid cell locations contained covariate values that well exceeded the range of values
used for model fitting. This is a result of a relatively small set of observed field plots
compared to the size and heterogeneity of the study area. To avoid potential issues arising
from model extrapolation (Perrin, 1904), grid cells with values outside the support of the
observed covariate values were removed from the analysis. In the AGB maps that follow,
40

1.0
0.8
0.6
0.4
0.0

0.2

Eigenvalue Cumulative Sum

0.8
0.6
0.4
0.2
0.0

Eigenvalue Cumulative Sum

1.0

the excluded grid cells are shown in black.

5

10

15

20

5

Ordered Eigenvector

15

20

Ordered Eigenvector

0.8
0.6
0.4
0.2
0.0

0.0

0.2

0.4

0.6

0.8

Eigenvalue Cumulative Sum

1.0

(b) MEF

1.0

(a) FEF

Eigenvalue Cumulative Sum

10

5

10

15

20

5

Ordered Eigenvector

10

15

20

Ordered Eigenvector

(c) GLEES

(d) NIWOT

Figure 3.2: The cumulative sums of the standardized eigenvalues (in decreasing order) for
FEF, MEF, GLEES, and NIWOT.

41

3.3
3.3.1

Methods
Modeling Framework

Geostatistical settings typically assume at location s ∈ D ⊆ R2 , where s is a vector of
geographic coordinates in domain D, a Gaussian outcome variable y(s) is modeled using the
regression model,
˜ (s) w(s) + (s).
y(s) = x(s) β + x

(3.1)

Here the linear mean structure that accounts for large scale variation in the outcome is
composed of a p × 1 vector x(s), comprising an intercept and spatially referenced covariates,
˜ (s) is
and an associated column vector of regression coefficients β = (β1 , β2 , . . . , βp ) . The x
a q × 1 vector that includes the intercept and those covariates from x(s) whose impact on
the outcome is posited to vary spatially. This space varying impact is captured through the
vector of spatial random effects w(s) = (w1 (s), w2 (s), . . . , wq (s)) . Various sub-models are
˜ (s) and associated w(s). Customarily, (s)
formed by specifying different combinations of x
is modeled as an independent white-noise process that captures measurement error or microscale variation. With any collection of n locations, say S = {s1 , s2 , . . . , sn }, we assume that
the (si )’s are identically and independently distributed as N (0, τ 2 ), where τ 2 is often called
the nugget.
For this model, spatial structure is typically introduced through a multivariate Gaussian process (GP) (Cressie, 1993; Wackernagel, 2003; Banerjee et al., 2004), where a crosscovariance function explicitly models the covariance of w(s) both within and among locations. This additional flexibility is attractive in many settings (see, e.g., Gelfand et al.,
2004; Banerjee et al., 2011; Finley et al., 2011); however, for the objectives of this study
it is not clear the extra effort will fetch substantial advantages. Rather, for simplicity and
substantially reduced computational demand, we assumed the elements of w(s) arise from
q independent univariate GPs. Specifically, the process associated with the k-th covariate
42

is wk (s) ∼ GP (0, C(·, ·; θ k )) where C(s, s∗ ; θ k ) = Cov(wk (s), wk (s∗ )) is a valid covariance
function that models the covariance corresponding to a pair of locations s and s∗ . The
process realizations are collected into an n × 1 vector, say wk = (wk (s1 ), . . . , wk (sn )) , that
follows a multivariate normal distribution M V N (0, Σk ), where Σk is the n × n covariance
matrix of wk with (i, j)-th element given by C(si , sj ; θ k ). Clearly C(s, s∗ ; θ k ) cannot be
just any function; it must ensure that the resulting Σk matrix is symmetric and positive
definite. Such functions are known as positive definite functions and are characterized as
the characteristic function of a symmetric random variable. Further technical details about
positive definite functions can be found in Cressie (1993), Chil`es & Delfiner (1999), and
Banerjee et al. (2004).
We specify C(s, s∗ ; θ k ) = σk2 ρ(s, s∗ ; φk ) with θ k = {σk2 , φk }, ρ(·; φk ) is a valid spatial
correlation function, where φk quantifies the rate of correlation decay and Var(wk ) = σk2 . For
the subsequent analyses we assumed an exponential correlation function, ρ( s − s∗ ; φk ) =
exp(−φk s − s∗ ), where s − s∗ is the Euclidean distance between the locations s and s∗ .
To complete the Bayesian model specification, we assign prior distributions to the model
parameters and inference proceeds by sampling from the posterior distribution of the parameters. As customary, we assume β follows a M V N (µβ , Σβ ) prior, while the spatial variance components σk2 ’s and the measurement error variance τ 2 are assigned inverse-Gamma,
IG(a, b), priors. The process correlation decay parameters φk ’s follow a Uniform, U nif (a, b),
with support over the geographic range of the study area.
Using notations similar to Gelman et al. (2013), we can write the posterior distribution of
the model parameters as p(Ω | y), where Ω = {β, w1 , w2 , . . . , wq , σ12 , . . . , σq2 , φ1 , . . . , φq , τ 2 }

43

and y = (y(s1 ), . . . , y(sn )) , which is proportional to
q
∏

q
∏

U nif (φk | aφ , bφ ) ×
k
k

k=1

IG(σk2 | aσk , bσk ) × N (β | µβ , Σβ ) × IG(τ 2 | aτ , bτ )×

k=1

×

q
∏

N (wk | 0, Σk ) ×

n
∏

˜ (s) w(s), τ 2 ).
N (y(si ) | x(si ) β + x

i=1

k=1

(3.2)
An efficient MCMC algorithm for estimation of (3.2) is obtained by updating β from
its full conditional then using Metropolis steps for the remaining parameters. Alternatively,
the model can be reparametrized such that the spatial random effects w do not need to
be sampled directly (Banerjee et al., 2008). In either case, the MCMC algorithms yield
posterior samples of Ω.
For predictions, if S0 = {s0,1 , s0,2 , . . . , s0,m } is a collection of m new locations, the posterior predictive distribution of the spatial random effects associated with the k-th regression
coefficient is given by
∫
p(wk,0 | y) ∝

p(wk,0 | wk , Ω, y)p(wk | Ω, y)p(Ω | y)dΩwk ,

(3.3)

where wk,0 = (wk (s0,1 ), wk (s0,2 ), . . . , wk (s0,m )) .
Given L posterior samples, {Ω(l) }L
l=1 , this distribution can be obtained via composition
(l)

(l)

(l)

sampling by first draw wk and then draw wk,0 for each l from p(wk,0 | wk , Ω(l) , y), where
this last distribution is derived as a conditional distribution from a multivariate normal
and, hence, is multivariate normal. More precisely, the process realizations over the new
locations are conditionally independent of the observed outcomes given the realizations over
the observed locations and the process parameters. In other words, p(wk,0 | wk , Ω, y) =
p(wk,0 | wk , Ω), which is a multivariate normal distribution with mean and variance given

44

by
E[wk,0 | wk , Ω] = Cov(wk,0 , wk )Var−1 (wk )wk = R0 (φk ) R(φk )−1 wk
{
}
and Var[wk,0 | wk , Ω] = σk2 R(φk ) − R0 (φk ) R(φk )−1 R0 (φk ) ,
where R0 (φk ) is the n × m matrix with (i, j)-th element given by ρ(s0,i , sj ; φk ) and R(φk ) is
the n × n matrix with (i, j)-th element given by ρ(si , sj ; φk ). This procedure is repeated to
generate samples for all wk ’s. Finally, given a set of covariates at a new location s0 , samples
from the posterior predictive distribution of the outcome variable, y(s0 )(l) , are drawn from
(l)
˜ (s0 ) w0 , τ 2(l) ) for l = 1, 2, . . . , L.
N (x(s0 ) β (l) + x

3.3.2

Candidate Models

Five candidate models were derived from (3.1) and include: non-spatial, where wk ’s are set
to zero; spatially varying intercept (SVI ), where only the spatial random effects associated
with the model intercept are included; the full spatially varying coefficients (SVC ), where all
regression coefficients have associated spatial random effects; SVC-β1 , where spatial random
effects for the intercept and β1 are included, and; SVC-β2 , where spatial random effects
for the intercept and β2 are included. Due to MEF’s additional covariate, four additional
candidate models were tested. SVC-β3 has spatial random effects associated with β0 and β3 ;
SVC-β1 β2 has spatial random effects associated with β0 , β1 , and β2 ; SVC-β1 β3 has spatial
random effects associated with β0 , β1 , and β3 , and; SVC-β2 β3 has spatial random effects
associated with β0 , β2 , and β3 .
Empirical semivariograms fit to the residuals of the non-spatial models (Figure 3.3) were
used to help guide hyperprior specification for the candidate models’ IG and U nif priors.
Specifically, for the variance parameters we set the IG hyperprior a=2, which results in a
prior distribution mean equal to b and infinite variance (see, e.g., IG definition in Gelman
et al. (2013)). Then the b hyperpriors for the models’ τ 2 and σ 2 ’s were set according to the
45

nugget and partial sill of the semivariograms. The prior for the spatial decay parameters,
φ’s, was set to U nif (0.0006, 3) which, assuming the exponential spatial correlation function,
corresponds to support for an effective spatial range between ∼1-5000 m. We define the
effective spatial range as the distance at which the correlation equals 0.05.
The MCMC samplers were implemented in C++ and Fortran and leveraged Intel’s Math
Kernel Library threaded BLAS and LAPACK routines for matrix computations. All analyses
were conducted on a Linux workstation using two Intel Nehalem quad-Xeon processors.
Three MCMC chains were run for 25000 iterations each. The most demanding model
required ∼1 hour to complete a single MCMC chain. Convergence was diagnosed using the
CODA package in R by monitoring mixing of chains and the Gelman-Rubin statistic (Gelman &
Rubin, 1992). Satisfactory convergence was diagnosed within 10000 iterations for all models.
Posterior inference was based on a post burn-in sub-sample of 15000 iterations (5000 from
each chain).

3.3.3

Fit and Prediction Accuracy Assessment

We assessed model performance using the popular Deviance Information Criterion (DIC) to
rank models in terms of how well they fit the data (Spiegelhalter et al., 2002). This criteria
is the sum of the Bayesian deviance (a measure of model fit) and the effective number of
parameters (a penalty for model complexity), D and pD , respectively. Lower values of DIC
indicate better model fit.
Predictive performance was assessed using a ten-fold cross-validation approach. Here,
the full dataset was split into ten roughly equal sized subsets. Then AGB was predicted
for locations within each subset, given parameters estimated from the remaining subsets.
Root mean squared prediction error (RMSPE) was then calculated using the observed AGB
values and corresponding median of the posterior predictive distribution. Lower RMSPE
values signify more accurate predictions.
46

3.4

Results

Our experimental design produced 24 combinations of candidate models and sites. Tables 3.1,
3.2, 3.3, and 3.4 provide summaries of the candidate models’ parameter estimates, fit, and
predictive performance for FEF, MEF, GLEES and NIWOT, respectively. Concerning all
sites non-spatial models, the exclusion of zero from the regression slope parameters’ 95%
credible interval suggests the derived LiDAR covariates explain a substantial portion of
variability in AGB with the exception of NIWOT’s β2 parameter.
The results in Tables 3.1, 3.3, and 3.4 show that for the FEF, GLEES, and NIWOT sites,
the SVI models produce marginally to moderately lower D, DIC, and RM SP E compared
to the non-spatial models. This suggests the addition of a spatial random effect to the model
intercept results in improved fit and predictive performance. Given the clear spatial structure
seen in the exploratory semivariogram plots of the non-spatial models’ residuals (Figure 3.3
(a), (c), and (d)) the SVIs’ improved fit to the data is not surprising. These semivariogram
plots along with the formal assessment of model fit and predictive performance suggest
the non-spatial models violate the assumption of independent and identically distributed
residuals at FEF, GLEES, and NIWOT.
On the other hand, the semivariogram of the non-spatial model residuals for MEF (Figure 3.3 (b)), does not provide strong evidence of any residual spatial correlation. We also see
in Table 3.2 only slight gains in model fit along with a decrease in model prediction accuracy
when comparing the non-spatial and SVI candidates. This suggests the addition of a spatial
random effect to the intercept at MEF is not only unnecessary, but allowing the intercept
to vary spatially causes the model to over-fit the data and hence prediction suffers.
These observations are further corroborated by the candidate models’ parameter estimates provided in Tables 3.1, 3.2, 3.3, and 3.4. The ratio τ 2 /(τ 2 + σ02 ) calculated using
estimates from the SVI model is useful for assessing the need for spatial random effects.

47

The lower the ratio, the greater the spatial structure in the non-spatial models’ residuals
and, hence, need to include the spatially-varying intercept. This ratio is 0.15, 0.13, and 0.29
for FEF, GLEES, and NIWOT, respectively. Compared to MEF’s 0.45 ratio, we can see
that the non-spatial model residuals for FEF, GLEES, and NIWOT contain more spatial
structure.
The effective spatial range is another parameter that describes the strength of a spatial
component. A long spatial range suggests stronger and farther reaching spatial dependence.
We see from Tables 3.1, 3.2, 3.3, and 3.4 that MEF’s SVI spatial range of 29.5 m is much
shorter than the SVI ranges for FEF, GLEES and NIWOT. The long spatial ranges and
low nugget to total variance ratios suggest the spatial random effects are capturing substantial residual spatial structure and the non-spatial models are not appropriate for the FEF,
GLEES, and NIWOT datasets. In contrast, these same measures suggest there is little spatial structure in the MEF non-spatial model residuals and hence spatial random effects may
not be warranted.
The remainder of this section is divided into four subsections, one for each site. In each
subsection we identify which model was selected as the “best” and explore some site specific
results.

48

20

12
0

50

100

150

15
10
5

Model Form = exponential
Nugget = 6.5
Partial Sill = 5.5

0

semivariance

10
8
6

semivariance

0

2

4

Model Form = exponential
Nugget = 0.5
Partial Sill = 11
Effective Range = 70

Effective Range = 25

200

0

20

40

60

distance

80

100

120

distance

(b) MEF

50

100

150

200

4
3

semivariance

2

0
0

250

1

Model Form = exponential

Effective Range = 80

Nugget = 1.75
Partial Sill = 3.5

0

6
4

Model Form = exponential
Nugget = 1
Partial Sill = 6

2

semivariance

5

8

6

(a) FEF

Effective Range = 350
0

distance

500

1000

1500

distance

(c) GLEES

(d) NIWOT

Figure 3.3: Semivariogram models generated using the residuals from the non-spatial model
estimates for FEF, MEF, GLEES, and NIWOT.

49

600
350

500
300

250

400

200
300

150

200
100

100

50

(a) FEF

(b) MEF

250
500
200
400
150

300

100

200

50

100

(c) GLEES

(d) NIWOT

Figure 3.4: Maps showing predicted AGB means for the full LiDAR extent at FEF, MEF,
GLEES, and NIWOT.

50

1600
600
1400
500
1200

400

1000

800
300

600
200
400

100

200

(a) FEF

(b) MEF

1200
250
1000
200
800
150

600
400

100

200
50

(c) GLEES

(d) NIWOT

Figure 3.5: Maps showing predicted AGB 95% credible interval width (a measure of prediction uncertainty) at FEF, MEF, GLEES, and NIWOT.

51

3.4.1

FEF Results

At FEF the full SVC model yields the lowest D, DIC, and RM SP E. The SVC model
shows a 83% improvement in DIC (model fit) compared to the non-spatial model. This
improvement is clearly seen when comparing Figures 3.6a and 3.6b. Figure 3.6b shows
high correspondence between the observed and fitted values for the SVC model, where as
in Figure 3.6a, the fitted values are scattered further from the one-to-one line. It is also
evident that the credible intervals for the SVC fitted values are much tighter than for the
non-spatial fitted values. There is a 9% improvement in RM SP E (prediction accuracy)
when moving from the non-spatial to SVC model. Figures 3.4a and 3.5a show maps of SVC
model predicted biomass and prediction uncertainty, respectively. Most of the predictions in
Figure 3.4a fall within the range of observed AGB (0 – 430 Mg/ha); however, those pixels
that exceed this observed range exhibit high associated uncertainties (Figure 3.5a).
In addition to the regression coefficient estimates given in Table 3.1, it is also instructive to look at surfaces of βk (s) = βk + wk (s) which provide insight into the space-varying
relationship between the k-th covariate and AGB. For example, the β1 (s) coefficient map
(Figure 3.7b) shows clear spatial structure and suggests the associated LiDAR covariate explains more variability in AGB in the northern regions of the study area, i.e., β1 (s) is further
from zero in the north. Compared to β1 (s), random effects associated with β0 (s) and β2 (s)
have much shorter effect ranges (Table 3.1) which make their adjustments less noticeable at
the scale shown in Figure 3.7. In regions far from inventory sub-plot locations the random
effects on β0 and β2 retreat to zero. By zooming into a region with a high concentration
of sub-plot inventories, switching from an equal to quantile interval color classification, and
passing the predictions of the β(s) s through a smoothing filter we can highlight the more
subtle spatial adjustments to β0 and β2 . Figures 3.8a and 3.8b depict β0 (s) and β2 (s) at
the extents shown by the red outline in Figures 3.7a and 3.7c, respectively. These surfaces
show how the global estimates of β0 and β2 , given in Table 3.1 as ∼11.28 and ∼0.91 respec52

tively, are influenced by their corresponding spatial random effects to better accommodate
the non-stationary relationship between the covariates and AGB. These adjustments, along
with the increases in prediction accuracy and model fit support the use of the SVC model
for FEF.
Table 3.1: Candidate model parameter estimates and 95% credible intervals for FEF.

(2.5%, 97.5%)

50%

parameter C.I.

Non-Spatial

SVC-β1

11.16

(10.49, 11.80)

11.33

(10.44, 12.27)

11.58

β1

-1.08

(-1.25, -0.92)

-1.06

(-1.23, -0.88)

β2

1.10

(0.70, 1.51)

(0.57, 1.44)

τ2

6.49

(4.62, 9.58)

1.01
0.89

(0.32, 3.70)

3/φ0

—

3/φ1

—

—

3/φ2

—

—

σ02

—

σ12

—

σ22
fit
statistics

SVI

β0

D
pD
DIC
RMSPE

121.45

5.18

(20.05, 284.91)

(2.51, 8.71)

—

SVC-β2

SVC

(10.88, 12.30)

11.41

(10.69, 12.05)

11.28

-1.04

(-1.91, 0.17)

-0.97

(-1.16, -0.78)

-0.81

(-1.76, 0.12)

0.77

(0.35, 1.22)

(0.63, 3.51)

0.91

(0.35, 1.41)

3.38
21.14
2380.95

(1.61, 5.70)
(10.29, 241.36)

1.46
2.36
31.29

—

(160.09, 4838.71)

—
0.48

(0.21, 1.73)

0.66

(0.29, 1.91)

—

—

—

293.21

93.13

248.42

(0.79, 4.83)
(10.37, 1612.90)

194.74
0.53

(10.82, 2542.37)
(0.21, 2.19)

—
2.28

(0.80, 7.99)
224.71

0.07
20.62
2400.00
21.63

(10.69, 11.90)

(0.02, 0.75)
(10.27, 156.90)
(302.72, 4918.03)
(10.25, 185.87)

0.48

(0.20, 1.19)

0.74

(0.33, 1.95)

1.64

(0.89, 3.09)
22.43

3.93

38.24

24.17

29.57

25.71

297.14

131.37

272.59

254.28

48.14

71.70

69.44

71.16

74.88

65.27

53

600
500
400
300

Fitted Biomass (Mg/ha)

0

100

200

600
500
400
300
200
0

100

Fitted Biomass (Mg/ha)

0

100

200

300

400

0

Actual Biomass (Mg/ha)

100

200

300

400

Actual Biomass (Mg/ha)

(a) non-spatial

(b) SVC

Figure 3.6: Plots showing fitted versus actual AGB values for FEF’s non-spatial and SVC
models. Vertical grey error bars depict 95% credible intervals for fitted values.

54

11.7

0.0
2.0

11.6

−0.2
11.5

−0.4
11.4

1.5
−0.6

11.3
−0.8
11.2
1.0
−1.0
11.1
−1.2
11.0
0.5
−1.4
10.9
−1.6

(a) βˆ0 + wˆ0

(b) βˆ1 + wˆ1

(c) βˆ2 + wˆ2
4.0

3.8
6.0

3.5
3.6

5.5
3.0
3.4

2.5
5.0
3.2

2.0
4.5

3.0
1.5

2.8
4.0
1.0

(d) βˆ0 + wˆ0 95% CIW

(e) βˆ1 + wˆ1 95% CIW

(f) βˆ2 + wˆ2 95% CIW

Figure 3.7: Maps showing the spatially varying coefficients (βˆx + wˆx ) and associated 95%
credible interval width (CIW) for the SVC model at FEF.

55

2.0

11.5
11.4

1.5

11.3
11.2

1.0
11.1
11.0
0.5
10.9

(a) βˆ0 + wˆ0

(b) βˆ2 + wˆ2

Figure 3.8: Maps showing the spatially varying coefficients βˆ0 + wˆ0 and βˆ2 + wˆ2 zoomed into
the extent outlined in red in figures 3.7a and 3.7c. A quantile interval color classification
and smoothing filter are used to highlight the more subtle spatial adjustments of these
parameters. The sub-plot locations are identified in green.

56

3.4.2

MEF Results

As noted previously, of the eight spatial models fit to the MEF dataset none out performed
the non-spatial model with regard to prediction accuracy and only the SVI model shows an
improvement in DIC over the non-spatial model (Table 3.2).
Figures 3.4b and 3.5b show the non-spatial model AGB prediction and associated uncertainty for the extent of the LiDAR data. A large number of grid cells in the southwest
corner of MEF were removed from the prediction dataset because their covariate values far
exceeded those seen in the observed dataset, see Section 3.2.6. We also see that grid cells in
this area that were included in the prediction set have high uncertainties. The aerial photo
in Figure 3.1b suggests that the forest in this region of MEF shows very different qualities

600
500
400
300
0

100

200

Fitted Biomass (Mg/ha)

500
400
300
200
0

100

Fitted Biomass (Mg/ha)

600

than the areas where field data were collected.

0

100

200

300

400

500

0

Actual Biomass (Mg/ha)

100

200

300

400

500

Actual Biomass (Mg/ha)

(a) non-spatial

(b) SVI

Figure 3.9: Plots showing fitted versus actual AGB values for MEF’s non-spatial and SVI
models. Vertical grey error bars depict 95% credible intervals for fitted values.

57

Table 3.2: Candidate model parameter estimates and 95% credible intervals for MEF.

9.10

β1

-1.29

(8.45, 9.73)
(-1.49, -1.09)

β2

0.89

β3

-1.20

(-1.67, -0.74)

τ2

11.56

(8.99, 15.36)

-1.29

(-1.48, -1.09)
(0.61, 1.15)

SVC-β1
9.00
-1.23
0.90

(8.30, 9.69)
(-1.50, -0.93)
(0.57, 1.23)

(2.5%, 97.5%)

0.88

(8.43, 9.73)

50%

(0.61, 1.15)

Spatial
9.08

3/φ0
3/φ1

—

—

3/φ2

—

—

—

3/φ3

—

—

—

fit
statistics

.

parameter C.I.

Non-Spatial
β0

—

σ02

—

σ12

—

-1.18
4.98
29.50

5.98

(-1.64, -0.71)
(1.24, 11.25)
(12.04, 192.98)

(1.31, 11.37)

—

-1.26
10.51

(-1.84, -0.70)
(7.75, 14.37)

20.33

(10.26, 284.90)

24.52

(10.32, 1612.90)

0.36

(0.17, 1.02)

0.36

(0.19, 0.69)

SVC-β2
9.12
-1.28
0.83
-1.25
10.15

SVC-β3
9.09

(8.44, 9.80)

-1.28

(-1.51, -1.06)

0.86

(0.35, 1.29)

-1.21

(-1.83, -0.68)

10.30

(7.55, 13.75)

21.51

(10.28, 607.29)

22.33

(10.31, 699.30)

22.44

—
—
0.36

19.52
0.36

(0.17, 0.90)

—

(8.45, 9.73)
(-1.49, -1.06)
(0.53, 1.18)
(-1.85, -0.59)
(7.67, 13.93)

SVC-β1 β2
9.01
-1.22
0.84
-1.34
9.52

(8.27, 9.75)
(-1.51, -0.91)
(0.30, 1.38)
(-2.03, -0.64)
(6.67, 13.44)

SVC-β1 β3
8.98
-1.22
0.88
-1.29
9.27

(8.27, 9.69)
(-1.54, -0.84)
(0.48, 1.25)
(-2.00, -0.60)
(6.92, 13.59)

20.30

(10.29, 332.59)

21.23

(10.29, 417.25)

—

25.30

(10.32, 1363.64)

32.31

(10.36, 2439.02)

—

21.35

(10.29, 518.13)

(10.29, 928.79)

—

(10.25, 276.50)

—
20.07

(10.25, 323.62)

0.36

(0.17, 1.01)

0.36

(0.17, 0.96)

—

0.39

(0.21, 0.80)

0.39

(0.20, 0.80)

—

0.70

(0.32, 1.71)

1.08

(0.41, 3.23)

(0.17, 1.01)

SVC-β2 β3
9.09
-1.27
0.82
-1.29
9.85

(8.39, 9.79)
(-1.51, -1.03)
(0.33, 1.30)
(-2.00, -0.57)
(7.16, 13.58)

SVC
9.00
-1.21
0.83
-1.38
9.10

(8.22, 9.75)
(-1.52, -0.91)
(0.27, 1.38)
(-2.18, -0.60)
(6.19, 13.07)

21.02

(10.29, 572.52)

22.06

(10.32, 584.80)

22.29

(10.32, 390.12)

19.59

(10.26, 386.25)

19.76

(10.27, 250.63)

—

0.37

(0.17, 1.05)

—

20.96

(10.27, 466.56)

25.08

(10.31, 594.06)

0.36

(0.17, 0.92)

0.40

(0.21, 0.79)

σ22

—

—

—

σ32

—

—

—

—

D

608.41

501.53

600.15

595.76

597.53

588.21

590.97

592.34

pD

4.95

41.29

28.61

22.85

19.05

41.54

37.34

29.88

47.55

613.37

542.82

628.77

618.61

616.59

629.74

628.31

622.22

630.60

66.89

67.46

67.53

70.81

68.23

72.41

68.44

73.59

75.28

DIC
RMSPE

0.65

(0.30, 1.52)

0.99

58

(0.37, 2.75)

—

—

0.64

(0.28, 1.52)

0.70

(0.31, 1.78)

0.94

(0.38, 2.77)

1.07

(0.41, 3.10)
583.05

3.4.3

GLEES Results

The SVI model at GLEES has the lowest RM SP E value of the five models (Table 3.3);
however, we see lower D and DIC values in all three SVC variant models. Again the
increased flexibility afforded by the additional spatial random effects in the SVC models
improves the fit but may cause prediction accuracy to suffer due to over-fitting. For this
reason the SVI model is preferred for the GLEES dataset. We see a 33% improvement in
DIC and a 4.5% increase in prediction accuracy moving from the non-spatial to the SVI
model. Figure 3.10b shows stronger one-to-one correspondence between fitted and observed
AGB values comparing to the models non-spatial counterpart in Figure 3.10a.
Table 3.3: Candidate model parameter estimates and 95% credible intervals for GLEES.

(2.5%, 97.5%)

50%

parameter C.I.

Non-Spatial

SVC-β1

SVC-β2

SVC

8.27

β1

-0.62

β2

0.66

(0.37, 0.96)

0.67

(0.37, 0.99)

0.86

(0.49, 1.23)

0.74

(0.14, 1.29)

0.87

τ2

4.10

(2.75, 6.63)

0.53

(0.19, 2.01)

0.40

(0.10, 1.67)

0.32

(0.10, 1.22)

0.25

(7.66, 8.86)
(-0.79, -0.44)

3/φ0

—

3/φ1

—

3/φ2

—

σ02

—

σ12

—

σ22
fit
statistics

SVI

β0

D
pD
DIC
RMSPE

8.40
-0.63

44.11

(7.76, 9.14)
(-0.81, -0.45)

(12.90, 173.16)

—

8.31
-0.61

(1.95, 5.88)

—

(-0.89, -0.31)

22.85

(10.31, 125.63)

33.88

(10.46, 527.24)

—
3.45

(7.61, 9.03)

—
0.98

(0.35, 2.66)

0.44

(0.24, 0.88)

—

—

—

196.30

103.69

90.35

7.94
-0.61

20.96

(7.43, 8.51)
(-0.77, -0.46)

(10.27, 151.98)

—
58.09
0.71

(10.56, 552.49)
(0.29, 1.77)

—
1.11

(0.54, 2.48)
79.59

8.06
-0.62

(7.38, 8.74)
(-0.96, -0.34)
(0.33, 1.45)
(0.08, 0.95)

20.19

(10.27, 198.41)

29.80

(10.37, 1020.50)

28.21

(10.39, 643.78)

0.53

(0.24, 1.33)

0.39

(0.21, 0.85)

0.78

(0.35, 1.90)
67.79

3.90

29.79

27.58

30.25

33.74

200.20

133.48

117.93

109.85

101.53

35.79

34.21

35.63

35.02

35.20

59

300
200
150
100
0

50

Fitted Biomass (Mg/ha)

250

300
250
200
150
100
50
0

Fitted Biomass (Mg/ha)

0

50

100

150

0

Actual Biomass (Mg/ha)

50

100

150

Actual Biomass (Mg/ha)

(a) non-spatial

(b) SVI

Figure 3.10: Plots showing fitted versus actual AGB values for GLEES’s non-spatial and
SVI models. Vertical grey error bars depict 95% credible intervals for fitted values.

60

3.4.4

NIWOT Results

The SVC-β2 model provides the best fit and prediction for the NIWOT dataset. This model
yields a 24% lower DIC and 7% lower RM SP E value compared to the non-spatial model. As
with GLEES, the more complex SVC model provides better fit, but the increased flexibility
results in decreased predictive power. Comparing Figure 3.11a and 3.11b reveals greater
one-to-one correspondence and tighter credible intervals for the SVC-β2 over that of the
non-spatial model.
Predicted biomass and associated uncertainty maps for NIWOT are provided in Figures 3.4d and 3.5d, respectively. As in GLEES and FEF, AGB values predicted outside the
range of observed AGB have the largest associated uncertainties. Figures 3.12a and 3.12c
show the space varying impact of w0 and w2 on β0 and β2 , respectively. Zooming into the
region of observed data and applying a similar smoothing filter as used for the FEF surfaces,
we can again see local parameter adjustment to accommodate the non-stationary impact of
the covariates on AGB (Figure 3.13). As mentioned previously, β2 for NIWOTs’ non-spatial
model is insignificant. When β2 is allowed to vary across the domain, as in SVC-β2 , we
see that even though globally insignificant, in local regions the covariate can explain AGB
variability and contribute to an overall increase in prediction accuracy.

61

Table 3.4: Candidate model parameter estimates and 95% credible intervals for NIWOT.

β1
β2
τ2

SVI

(11.83, 12.85)

12.35

-0.82

(-0.95, -0.68)

0.21

(-0.04, 0.49)

4.01

(2.82, 5.98)

12.35

-0.77

(-0.93, -0.60)

0.21

(-0.05, 0.48)

1.29

3/φ1

—

—

3/φ2

—

—

σ02
σ12
σ22

—

pD
DIC

2.66

(13.64, 1444.64)

12.19

-0.84

(-1.17, -0.17)

0.31

(-0.01, 0.62)

(10.52, 1578.95)

50.58

(10.40, 2097.90)

—

(0.78, 5.07)

1.19

(0.34, 3.21)

0.37

(0.20, 1.03)

—

—

—

—

—

262.40

189.65

177.34

SVC

(11.38, 12.85)

12.21

-0.77

(-0.93, -0.59)

-0.84

(-1.22, -0.29)

0.23

(-0.52, 0.78)

0.34

(-0.17, 0.82)

0.99

(0.21, 3.14)

137.24

(11.40, 1167.32)

45.68

(10.45, 2343.75)

—
1.74

(0.41, 3.87)

—
0.74

(0.32, 2.03)

0.66

(11.47, 13.06)

(0.20, 2.10)

42.98

(10.43, 1401.87)

77.34

(10.46, 2419.35)

22.29

(10.30, 669.64)

0.87

(0.30, 2.51)

0.38

(0.20, 1.00)

0.78

172.77

(0.33, 1.90)
150.60

27.65

34.78

28.97

41.52

217.29

212.12

201.74

192.12

50.05

48.27

49.31

46.31

49.40

500
400
300
0

0

100

200

300

400

Fitted Biomass (Mg/ha)

500

600

3.93
266.32

100

Fitted Biomass (Mg/ha)

(0.25, 2.76)

60.19

600

RMSPE

116.58

SVC-β2

(11.42, 13.01)

1.09

(0.36, 3.74)

—

D

SVC-β1

(11.38, 13.18)

200

(2.5%, 97.5%)

50%

12.34

3/φ0

fit
statistics

parameter C.I.

Non-Spatial
β0

0

100

200

300

400

0

Actual Biomass (Mg/ha)

100

200

300

400

Actual Biomass (Mg/ha)

(a) non-spatial

(b) SVC-β2

Figure 3.11: Plots showing fitted versus actual AGB values for NIWOT’s non-spatial and
SVC-β2 models. Vertical grey error bars depict 95% credible intervals for fitted values.

62

13.5

6.5

13.0

6.0

12.5

5.5

12.0
5.0
11.5
4.5

11.0
10.5

4.0

10.0

3.5

(a) βˆ0 + wˆ0

(b) βˆ0 + wˆ0 95% CIW

0.6
4.0
0.4
3.5

0.2
0.0

3.0
−0.2
2.5

(c) βˆ2 + wˆ2

(d) βˆ2 + wˆ2 95% CIW

Figure 3.12: Maps showing the spatially varying coefficients (βˆx + wˆx ) and associated 95%
credible interval width (CIW) for the SVC-β2 model at NIWOT.

63

13.5
0.6
13.0
0.4

12.5
12.0

0.2
11.5
0.0

11.0
10.5

−0.2
10.0

(a) βˆ0 + wˆ0

(b) βˆ2 + wˆ2

Figure 3.13: Maps showing the spatially varying coefficients βˆ0 + wˆ0 and βˆ2 + wˆ2 zoomed into
the extent outlined in red in figures 3.12a and 3.12c. A quantile interval color classification
and smoothing filter are used to highlight the more subtle spatial adjustments of these
parameters. The sub-plot locations are identified in green.

64

3.5

Disscusion

For three of the four study sites, the non-spatial model residuals exhibit enough spatial correlation to warrant the use of a spatial random effect on the intercept. The presence of these
serial correlations among the residuals suggest the assumptions of the non-spatial model
are violated and the validity of subsequent inference is questionable. Further, this suggests
that beyond meeting model assumptions, explicitly accommodating the residual dependence
via spatially structured random effects might fetch improved prediction by borrowing information from proximate observed locations. This was illustrated in the analysis of the FEF,
GLEES, and NIWOT datasets, where the SVI models outperformed the non-spatial models.
Uncertainty quantification is an important component of carbon Measurement, Reporting, and Verification systems (MRVs) like those called for by the United Nations Collaborative Programme on Reducing Emissions from Deforestation and Forest Degradation in
Developing Countries and actively being developed by NASA’s Carbon Monitoring System
Science Team (UN-REDD, 2009; CMS, 2010). The uncertainty maps like the ones developed
in Figure 3.5 can aid uncertainty analysis by providing a tool to assess the reliability of AGB
predictions.
Mapping random effects can shed light on missing covariates. As an example, the random
effect associated with β1 at FEF (Figure 3.7b) is moderately correlated with elevation.
Although not explored here, after fitting a model as we did for FEF, a researcher could now
consider elevation as an additional covariate either as an additive component to the model
or test the multiplicative interaction with the LiDAR covariates.
Given the small size of the datasets considered here, the proposed modeling framework
was computationally feasible. However, when the datasets consist of several thousand observations, which is common in forest inventory databases, cubic order matrix operations
required for evaluating the model likelihood make parameter estimation computationally

65

onerous. Therefore, our future work will focus on exploring strategies for dimension reduction when fitting spatially varying coefficients models. Further, we will extend the spatially
varying coefficients models to accommodate multiple forest variables of interest.

66

BIBLIOGRAPHY

67

BIBLIOGRAPHY

Anderson, J., Plourde, L., Martin, M., Braswell, B., Smith, M.L., Dubayah, R., Hofton, M.,
& Blair, J. (2008). Integrating waveform lidar with hyperspectral imagery for inventory
of a northern temperate forest. Remote Sensing of Environment, 112, 1856 – 1870.
Banerjee, S., Carlin, C., & Gelfand, A. (2004). Hierarchical Modeling and Analysis for
Spatial Data. Chapman & Hall/CRC Monographs on Statistics & Applied Probability,
Chapman & Hall/CRC.
Banerjee, S., Finley, A., Waldmann, P., & Ericsson, T. (2011). Hierarchical spatial process
models for multiple traits in large genetic trials. Journal of the American Statistical
Association, 105, 506 – 521.
Banerjee, S., Gelfand, A., Finley, A., & Huiyan, S. (2008). Gaussian predictive process models for large spatial data sets. Journal of the Royal Statistical Society, Series B (Statistical
Methodology), 70, 825 – 848.
Bechtold, W., & Patterson, P. (2005). The enhanced forest inventory and analysis program–
national sampling design and estimation procedures. General technical report SRS, USDA
Forest Service, Southern Research Station.
Bradford, J., Birdsey, R., Joyce, L., & Ryan, M. (2008). Tree age, disturbance history, and
carbon stocks and fluxes in subalpine rocky mountain forests. Global Change Biology, 14,
2882 – 2897.
Bradford, J., Weishampel, P., Smith, M.L., Kolka, R., Birdsey, R., Ollinger, S., & Ryan,
M. (2010). Carbon pools and fluxes in small temperate forest landscapes: Variability and
implications for sampling design. Forest Ecology and Management, 259, 1245 – 1254.
Carlin, B., & Louis, T. (2000). Bayes and Empirical Bayes Methods for Data Analysis,
Second Edition. Chapman & Hall/CRC Texts in Statistical Science, Taylor & Francis.
Chatterjee, S., & Hadi, A. (2006). Regression Analysis by Example, 4th Edition. Wiley
Series in Probability and Statistics, John Wiley & Sons.
Chil`es, J., & Delfiner, P. (1999). Geostatistics: modeling spatial uncertainty. Wiley series
in probability and statistics, John Wiley & Sons.
CMS (2010). Nasa carbon monitoring system. http://carbon.nasa.gov. Accessed: 3-182014.
Cressie, N. (1993). Statistics for spatial data. Wiley series in probability and mathematical
statistics: Applied probability and statistics, John Wiley & Sons.

68

Finley, A. (2010). Comparing spatially-varying coefficients models for analysis of ecological
data with non-stationary and anisotropic residual dependence. Methods in Ecology and
Evolution, 2, 143 – 154.
Finley, A., Banerjee, S., & MacFarlane, D. (2011). A hierarchical model for quantifying
forest variables over large heterogeneous landscapes with uncertain forest areas. Journal
of the American Statistical Association, 106, 31 – 48.
Finley, A., Banerjee, S., & McRoberts, R. (2009). Hierarchical spatial models for predicting
tree species assemblages across large domains. Annals of Applied Statistics, 3, 1052 – 1079.
Gelfand, A., Schmidt, A., Banerjee, S., & Sirmans, C. (2004). Nonstationary multivariate
process modeling through spatially varying coregionalization. Test, 13, 263–312.
Gelman, A., Carlin, J., Stern, H., & Rubin, D. (2013). Bayesian Data Analysis, 3rd Edition.
Chapman & Hall/CRC Texts in Statistical Science, Chapman & Hall/CRC.
Gelman, A., & Rubin, D. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457–511.
Gonzalez, P., Asner, G., Battles, J., Lefsky, M., Waring, K., & Palace, M. (2010). Forest carbon densities and uncertainties from lidar, quickbird, and field measurements in california.
Remote Sensing of Environment, 114, 1561–1575.
Guanter, L., Frankenberg, C., Dudhia, A., Lewis, P., G´omez-Dans, J., Kuze, A., Suto, H., &
Grainger, R. (2012). Retrieval and global assessment of terrestrial chlorophyll fluorescence
from gosat space measurements. Remote Sensing of Environment, 121, 236–251.
Guanter, L., Rossini, M., Colombo, R., Meroni, M., Frankenberg, C., Lee, J.E., & Joiner, J.
(2013). Using field specroscopy to assess the potential opf statistical approaches for the
retrieval of sun-induced chlorophyll fluorescence from ground and space. Remote Sensing
of Environment, 133, 52–61.
Harville, D. (1997). Matrix Algebra from a Statistician’s Perspective. Springer.
Hoeting, J. (2009). The importance of accounting for spatial and temporal correlation in
analyses of ecological data. Ecological Applications, 19, 574–577.
Huang, B., Song, H., Cui, H., Peng, J., & Xu, Z. (2014). Spatial and spectral image fusion
using space matrix factorization. IEEE Transactions on Geoscience and Remote Sensing,
52, 1693–1704.
Lim, K., & Treitz, P. (2004). Estimation of above ground forest biomass from airborne
discrete return laser scanner data using canopy-based quantile estimators. Scandinavian
Journal of Forest Research, 19, 558–570.

69

Lucas, R., Cronin, N., Lee, A., Moghaddam, A., Witte, C., & Tickle, P. (2006). Empirical relationships between airsar backscatter and lidar-derived forest biomass, queensland,
australia. Remote Sensing of Environment, 100, 407–425.
Muss, J., Mladenoff, D., & Townsend, P. (2011). A pseudo-waveform technique to assess
forest structure using discrete lidar data. Remote Sensing of Environment, 115, 824–835.
Perrin, E. (1904). On some dangers of extrapolation. Biometrika, 3, 99–103.
Popescu, S., Zhao, K., Neuenschwander, A., & Lin, C. (2011). Satellite lidar vs. small
footprint airborne lidar: comparing the accuracy of aboveground biomass estimates and
forest structure metrics at footprint level. Remote Sensing of Environment, 115, 2786–
2797.
Sherrill, K., Lefsky, M., Bradford, J., & Ryan, M. (2008). Forest structure estimation and
pattern exploration from discrete-return lidar in subalpine forests of the central rockies.
Canadian Journal of Forest Research, 38, 2081–2096.
Spiegelhalter, D., Best, N., Carlin, B., & van der Linde, A. (2002). Bayesian measures of
model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series
B, 64, 583–639.
Tonolli, S., Dalponte, M., Neteler, M., Rodeghiero, M., Vescovo, L., & Gianelle, D. (2011).
Fusion of airborne lidar and satellite multispectral data for the estimation of timber volume
in the southern alps. Remote Sensing of Environment, 115, 2486–2498.
UN-REDD (2009). The un-redd programme. http://www.un-redd.org/. Accessed: 3-182014.
Wackernagel, H. (2003). Multivariate geostatistics: an introduction with applications, 3rd
edition. Springer.
Wang, C., Yu, W., Wang, R., Deng, Y., & Zhao, F. (2014). Comparison of nonnegative eigenvalue decompositions with and without reflection symmetry assumptions. IEEE
Transactions on Geoscience and Remote Sensing, 52, 2278–2287.
Wheeler, D., & Calder, C. (2007). An assessment of coefficient accuracy in linear regression
models with spatially varying coefficients. Journal of Geographical Systems, 9, 145–166.
Wheeler, D., & Waller, L. (2009). Comparing spatially varying coefficient models: a case
study examining violent crime rates and their relationships to alcohol outlets and illegal
drug arrests. Journal of Geographical Systems, 11, 1–22.

70

CHAPTER 4
SUMMARY AND FUTURE WORK

This thesis explored the use of flexible Bayesian hierarchical regression frameworks for calibarating LiDAR remote sensing information to forest inventory datasets. Chapters 2 and 3
provide examples where using Bayesian hierarchical spatial models can improve fit and predictive performance when modeling both univariate and multivariate forest structure data.
When model residuals exhibit spatial correlation, that relationship can be absorbed into a
spatially structured random effect to ensure statistically valid model inference and increase
the predictive ability of the model. Further, Chapter 2 shows how leveraging not only spatial
relationships, but correlations among different variables within the same tree can improve
predictive performance. Chapter 3 demonstrates how building models that accommodate
coefficient non-stationarity via the introduction of spatial random effects can yield increased
model fit and prediction accuracies when using LiDAR to model and map forest characteristics.
As the forestry and remote sensing research communities strive to answer questions
about more complex and dynamic ecological associations among differing environmental
variables across space and time, richer classes of models beyond ordinary least squares or
maximum likelihood methods will be needed. Bayesian hierarchical models for both spatial
and spatio-temporal processes are seeing rapid development in the applied environmental
sciences because of their immense flexibility to suit intricate variable associations and the
ease of propagation of uncertainty through complex hierarchies.
In future work, I plan to explore the use of Bayesian hierarchical modeling for forestry
and ecological processes in more detail. The research in this thesis has shown the potential
for spatial random effects to improve prediction accuracies. A next logical step is to examine

71

how spatial random effects can be used to uncover missing covariates. Exploring what
latent processes the spatial random effects might be capturing can lead to more ecologically
meaningful results by providing insight into the underlying environmental processes causing
the spatial distribution of the outcome variable, or variables, of interest.
The study sites examined in this thesis cover small spatial extents, i.e., stand level.
Particularly with the models explored in Chapter 3, spatially varying coefficients may play a
more significant role for fitting models over larger domains possibly crossing climatic zones
or having large altitude gradients. Changes in these types of macro environmental conditions
over one spatial domain usually indicate the need to fit more than one model, i.e., divide
the original extent into smaller regions and fit uniquely parametrized regressions to each.
Allowing coefficients to vary over space allows for parameter influence to change moving from
one region to another. Rather than having several models with potentially low numbers
of sample locations, one model can be formed that is able to borrow strength across the
observations of the full domain and be less likely to suffer inadequate sample size. Also,
instead of imposing divisions of the original domain based on speculation, the space varying
coefficients model allows the data to inform when and how parameters change.
As we move into a data rich era in the environmental sciences, there is increasing need for
computationally efficient methods for estimating cumbersome models with rich parameter
association structures. In the future, I plan to further develop and refine the computational
procedures used in this thesis. I hope to eventually author a package for the statistical
programming language R so that other researchers interested in ecological modeling can easily
execute the statistical methods explored in Chapters 2 and 3 in their research. The study sites
examined here were all reasonably sized which meant that the estimation of these models was
computationally feasible. As environmental datasets grow larger, new, more computationally
efficient fitting procedures are needed. In future work, I plan to look into speeding up
computation time for large datasets by examining the use of increased computational power

72

given by new technologies, such as processor hyper-threading and parallel processing to make
complex ecological modeling more accessible to the environmental research field.

73