HIERARCHICAL BAYESIAN MODELS FOR SMALL AREA ESTIMATION OF
BIOPHYSICAL AND SOCIAL FORESTRY VARIABLES
By
Neil Ryan Ver Planck

A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
Forestry â Doctor of Philosophy
2018

ABSTRACT
HIERARCHICAL BAYESIAN MODELS FOR SMALL AREA ESTIMATION OF
BIOPHYSICAL AND SOCIAL FORESTRY VARIABLES
By
Neil Ryan Ver Planck
Forest inventories and surveys, accounting for time and cost constraints, are typically designed to yield accurate and precise estimates of population means and totals over large
spatial domains. In many instances, these inventories and surveys also offer reliable inference for smaller subpopulations with sufficient sample observations; however, there is growing
demand for valid and precise estimates at levels that have smaller sample sizes based on the
original sample design. One solution to this problem is application of small area estimation
methods. Small area estimation (SAE) is a model-based approach that couples a direct estimate and possible covariates to improve the estimate precision and, in some cases, accuracy.
Unlike a standard linear regression, the SAE framework is comprised of two components: a
sampling model and a linking model. Estimation of the SAE parameter of interest accounts
for and balances between the sampling (i.e., direct estimator) and linking model errors. The
linking model is a linear model with random effects that relate the small areas of interest
with some error. Additional spatial structure might still remain in the linking model after
accounting for possible covariates. Such residual structure can be further modeled using
spatial random effects.
This dissertation presents SAE methods within a hierarchical Bayesian (HB) framework.
This framework is applied to common biophysical forest inventory outcomes of interest (i.e.,
aboveground biomass, basal area, volume, and tree density) at the stand level, and to the
social forestry survey outcomes of private forest landowner populations. Furthermore, an
in depth examination of the direst estimator, in the presence of nonresponse, is assessed
for private forest landowner population size. The primary objectives of this dissertation
are: i) to apply a HB framework to increase the precision of estimates for biophysical forest

variables at the stand level by borrowing strength across all stands through the use of
LiDAR covariates; ii) to apply a conditional autoregressive structure to the stand-level
random effects to assess gains in precision of biophysical forest variables; iii) to evaluate
the current National Woodland Owner Survey estimators of private forest area and private
forest landowner population size for a known population at the state level; iv) to present
an alternative estimator of private forest landowner population size that explicitly accounts
for various nonresponse scenarios; v) to evaluate the impacts of nonresponse biases on each
of these estimators; vi) to produce county-level private forest ownership datasets for two
complete states; vi) to define and assess SAE models to improve county-level inference of
the number of private forest ownerships, and; vii) to develop open source software to fit
proposed SAE models.

This dissertation is dedicated to Liisa.

iv

ACKNOWLEDGMENTS

I would like to sincerely thank my dissertation advisor and mentor Dr. Andrew O. Finley
for all of his guidance and help throughout my graduate studies. I would also like to thank
all of my committee members: Dr. David W. MacFarlane, Dr. Aaron R. Weiskittel, Dr.
Christopher W. Woodall, and Dr. Phoebe L. Zarnetske for all of their time and assistance
with this dissertation and my research program. Many additonal thanks to the following
collaborators: Dr. James C. Finley, Dr. Emily S. Huff, Dr. John A. Kershaw, Jr., and
Dr. Alexander L. Metcalf for providing data and thoughtful feedback regarding my research.
Additional acknowlegments to the editors and anonymous reviewers of this research at the
Canadian Journal of Forest Research, Forest Science, and Remote Sensing of Environment.
Additional thanks to the lab group members, during my tenure, of the Geospatial Lab:
Chad Babcock, Gloria Desanker, Malcolm Itter, Megan Kress, Jason Matney, Daniel Taylor,
Yuzhen Zhou for providing assistance, collaboration, and motivation.
Most importantly, a special thanks to my wife, Liisa, for all of her patience, love, and support. I would also like to thank my entire family for all of their support and encouragement
throughout my graduate study years.

v

TABLE OF CONTENTS

LIST OF TABLES

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii

LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

x

CHAPTER 1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . .

1

CHAPTER 2 HIERARCHICAL BAYESIAN MODELS FOR SMALL AREA
TIMATION OF FOREST VARIABLES USING LIDAR . . .
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2.1 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2.1.1 Study area . . . . . . . . . . . . . . . . . . . . . . .
2.2.1.2 Variable radius plot data . . . . . . . . . . . . . . . .
2.2.1.3 LiDAR data . . . . . . . . . . . . . . . . . . . . . . .
2.2.2 Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2.2.1 Direct estimator and small area models . . . . . . . .
2.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

ES. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .

7
7
10
10
10
10
11
12
12
15
22

CHAPTER 3 MULTIVARIATE HIERARCHICAL BAYESIAN MODELS FOR
SMALL AREA ESTIMATION OF FOREST VARIABLES USING
LIDAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.2.1 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.2.1.1 Study area . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.2.1.2 Variable radius plot data . . . . . . . . . . . . . . . . . . . .
3.2.1.3 LiDAR data . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.2.2 Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.2.2.1 Direct estimator and small area models . . . . . . . . . . . .
3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

26
26
28
28
28
29
30
31
31
34
44

CHAPTER 4 EVALUATION OF THE NATIONAL WOODLAND OWNER
SURVEY ESTIMATORS FOR PRIVATE FOREST AREA AND
LANDOWNERS: A CASE STUDY OF MONTANA . . . . . . . . .
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.2.1 GIS-based Baseline Population Estimate . . . . . . . . . . . . . . . .
4.2.2 Sampling Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.2.3 Estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.2.3.1 Private Forest Area . . . . . . . . . . . . . . . . . . . . . . .

47
47
50
50
52
53
53

vi

4.3

4.4

4.2.3.2 Private Forest Landowner Population
4.2.4 Nonresponse . . . . . . . . . . . . . . . . . . .
Results . . . . . . . . . . . . . . . . . . . . . . . . . .
4.3.1 Private Forest Area . . . . . . . . . . . . . . .
4.3.2 Total Private Forest Landowners . . . . . . .
4.3.2.1 Full Response . . . . . . . . . . . . .
4.3.2.2 Nonresponse . . . . . . . . . . . . .
Discussion . . . . . . . . . . . . . . . . . . . . . . . .

Size
. . .
. . .
. . .
. . .
. . .
. . .
. . .

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

54
56
58
58
61
62
64
65

CHAPTER 5 HIERARCHICAL BAYESIAN MODELS FOR SMALL AREA
ESTIMATION OF COUNTY-LEVEL PRIVATE FOREST
LANDOWNER POPULATION . . . . . . . . . . . . . . . . . . . .
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.2 Materials and methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.2.1 True number of private forest ownerships . . . . . . . . . . . . . . . .
5.2.2 Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.2.2.1 Direct estimator . . . . . . . . . . . . . . . . . . . . . . . .
5.2.2.2 Small area estimation models . . . . . . . . . . . . . . . . .
5.2.3 Simulation study . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.2.3.1 County-level covariates . . . . . . . . . . . . . . . . . . . . .
5.2.4 Simulation summary . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

69
69
71
71
72
72
73
75
76
76
78
88

CHAPTER 6 RECOMMENDATIONS FOR FUTURE WORK . . . . . . . . . . .

93

APPENDICES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
APPENDIX A
HB AND EBLUP PARAMETER ESTIMATES FOR FAYHERRIOT MODEL OF ABOVEGROUND BIOMASS . . . .
APPENDIX B
MODEL
SUMMARIES
OF
PRIVATE
FOREST
LANDOWNERS FOR MONTANA AND NEW JERSEY BY COUNTY . . . . . . . . . . . . . . . . . . . . . . .

96
97

98

BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

vii

LIST OF TABLES

Table 2.1: Summary statistics for stand area, number of plots, mean aboveground
biomass, sampling variances, and LiDAR covariates for the Noonan Forest (m = 226 stands) dataset. . . . . . . . . . . . . . . . . . . . . . . . . .

12

Table 2.2: The median and 95% credible intervals for Fay-Herriot (FH), FH with
conditional autoregressive (FHCAR), and FHCAR with smoothed sampling variances (FHCAR-SMOOTH) model parameters. . . . . . . . . . .

15

Table 2.3: Mean CV reduction (%) between SAE models by number of neighboring
stands. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

22

Table 3.1: Summary statistics for stand area, number of plots, mean aboveground
biomass, mean basal area, mean volume, mean tree density, sampling
variances, and LiDAR covariates for the Noonan Forest (m = 224 stands)
dataset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

31

Table 3.2: The median and 95% credible intervals for the univariate Fay-Herriot
(FH), and FH with conditional autoregressive (FHCAR) model parameters for the four outcomes of interest. . . . . . . . . . . . . . . . . . . . .

35

Table 4.1: Size classes of private forest area. . . . . . . . . . . . . . . . . . . . . . . .

57

Table 4.2: Summary of Montana private forest land area (ha) for the GIS-based
baseline (base), the estimate from simple random sampling of the baseline
(Method I), and additional publications. . . . . . . . . . . . . . . . . . . .

59

Table 4.3: Summary of the number of private forest landowners in Montana for the
GIS-based baseline (base), the estimate from two sampling methods, and
additional publications. . . . . . . . . . . . . . . . . . . . . . . . . . . . .

62

Table 5.1: The median and 95% credible intervals (in parantheses) for Fay-Herriot
(FH) and FH with conditional autoregressive (FHCAR) models of a single iteration for Montana (iteration 750) and New Jersey (iteration 350). .

82

Table 5.2: Summary of bias, root mean squared error (RMSE), empirical coverage
for a 95% nominal coverage rate and average 95% confidence interval
width of the direct estimate and 95% credible interval width of the small
area estimation model estimates for Montana and New Jersey across all
counties and iterations for the direct, Fay-Herriot (FH), and FH with
conditional autoregressive random effects (FHCAR) model estimates. . . .

85

viii

Table A.1: The median and 95% credible intervals for the model parameters of the
Fay-Herriot (FH) with hierarchical Bayesian (HB) inference and the point
estimates with standard errors in parentheses for the FH under empirical
best linear unbiased predictor (EBLUP) were fit via restricted maximum
likelihood. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

97

Table B.1: Summary of the first 28 Montana counties by identification number (ID)
and name for true population total (Truth), sample size (n), and bias
across all repeated samples for the direct, Fay-Herriot (FH), and FH with
conditional autoregressive random effects (FHCAR) model estimates. . . .

98

Table B.2: Summary of the second set of 28 Montana counties by identification
number (ID) and name for true population total (Truth), sample size
(n), and bias across all repeated samples for the direct, Fay-Herriot (FH),
and FH with conditional autoregressive random effects (FHCAR) model
estimates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

99

Table B.3: Summary of New Jersey counties by identification number (ID) and name
for true population total (Truth), sample size (n), and bias across all repeated samples for the direct, Fay-Herriot (FH), and FH with conditional
autoregressive random effects (FHCAR) model estimates. . . . . . . . . . 100
Table B.4: Summary of the first 28 Montana counties by identification number (ID)
for root mean squared error (RMSE), empirical coverage for a 95% nominal coverage rate, and 95% confidence interval width for the direct estimator and 95% credible interval width for the two small area estimation
models across all repeated samples for the direct, Fay-Herriot (FH), and
FH with conditional autoregressive random effects (FHCAR) model estimates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Table B.5: Summary of the second set of 28 Montana counties by identification
number (ID) for root mean squared error (RMSE), empirical coverage
for a 95% nominal coverage rate, and 95% confidence interval width for
the direct estimator and 95% credible interval width for the two small
area estimation models across all repeated samples for the direct, FayHerriot (FH), and FH with conditional autoregressive random effects
(FHCAR) model estimates. . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Table B.6: Summary of New Jersey counties by identification number (ID) for root
mean squared error (RMSE), empirical coverage for a 95% nominal coverage rate, and 95% confidence interval width for the direct estimator
and 95% credible interval width for the two small area estimation models across all repeated samples for the direct, the Fay-Herriot (FH), and
the FH with conditional autoregressive random effects (FHCAR) model
estimates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
ix

LIST OF FIGURES

Figure 2.1: Noonan Forest stands by (a) number of variable radius plots, (b) standlevel P 25 (m); (c) stand-level P 75 (m) LiDAR covariates. . . . . . . . . .

13

Figure 2.2: Posterior mean aboveground biomass (Mg haâ1 ) for the (a) direct estimate, (b) FH model, (c) FHCAR model; (d) FHCAR-SMOOTH model. .

17

Figure 2.3: Mean aboveground biomass (Mg haâ1 ) for direct estimates versus posterior means of FH, FHCAR, and FHCAR-SMOOTH models. . . . . . .

18

Figure 2.4: Coefficient of variation of the (a) direct estimate, (b) FH model, (c)
FHCAR model; (d) FHCAR-SMOOTH model. . . . . . . . . . . . . . . .

19

Figure 2.5: Estimates of coefficient of variation (CV) versus stands ordered by increasing CV of direct estimate. . . . . . . . . . . . . . . . . . . . . . . .

20

Figure 2.6: Percent reduction in coefficient of variation from (a) direct estimate to
FH model, (b) direct estimate to FHCAR model, (c) FH to FHCAR
model; (d) FHCAR to FHCAR-SMOOTH model. . . . . . . . . . . . . .

21

Figure 2.7: 95% credible interval widths of AGB (Mg haâ1 ) for the (a) FH, (b)
FHCAR; (c) FHCAR-smooth models. . . . . . . . . . . . . . . . . . . . .

23

Figure 3.1: Noonan Research Forest stands by (a) forest type (balsam fir; tolerant softwoods; black spruce; white pine; mixed woods dominated by
softwoods; mixed woods dominated by intolerant hardwoods; intolerant
hardwoods), (b) stand-level P 25 (m), (c) stand-level P 50 (m), and (d)
stand-level dns (%) LiDAR covariates. . . . . . . . . . . . . . . . . . . .

32

Figure 3.2: The posterior means for the outcomes of (a) mean aboveground
biomass (MgÂˇhaâ1 ), (b) mean basal area (m2 Âˇhaâ1 ), (c) mean volume
(m3 Âˇhaâ1 ); (d) mean tree density (number of treesÂˇhaâ1 ) for the FHIW
model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

37

Figure 3.3: (a) Mean aboveground biomass (MgÂˇhaâ1 ), (b) mean basal area
(m2 Âˇhaâ1 ), (c) mean volume (m3 Âˇhaâ1 ); (d) mean tree density (number of treesÂˇhaâ1 ) for direct estimates versus posterior means of FH,
FHCAR, and FHIW models. . . . . . . . . . . . . . . . . . . . . . . . . .

38

x

Figure 3.4: Estimates of coefficient of variation (CV) for direct estimates and FH,
FHCAR, FHIW models of (a) mean aboveground biomass (MgÂˇhaâ1 ),
(b) mean basal area (m2 Âˇhaâ1 ), (c) mean volume (m3 Âˇhaâ1 ); (d) mean
tree density (number of treesÂˇhaâ1 ) versus stands ordered by increasing
CV of direct estimate. The horizontal dashed line indicates a CV value
of 0.15; and the horizontal dotted line indicates a CV value of 0.25. . . .

40

Figure 3.5: Percent reduction in coefficient of variation from direct estimate to FH
model for (a) mean aboveground biomass (MgÂˇhaâ1 ), (b) mean basal
area (m2 Âˇhaâ1 ), (c) mean volume (m3 Âˇhaâ1 ); (d) mean tree density
(number of treesÂˇhaâ1 ). . . . . . . . . . . . . . . . . . . . . . . . . . . .

42

Figure 3.6: Percent reduction in coefficient of variation from FH model to FHIW
model for (a) mean aboveground biomass (MgÂˇhaâ1 ), (b) mean basal
area (m2 Âˇhaâ1 ), (c) mean volume (m3 Âˇhaâ1 ); (d) mean tree density
(number of treesÂˇhaâ1 ). . . . . . . . . . . . . . . . . . . . . . . . . . . .

43

Figure 4.1: (a) One realization of the sample distribution of unique private forest
landowners (PFLs) by size class, (b) trends of probabilities of nonresponse by size class for four different scenarios (A-D) with an overall
nonresponse rate of approximately 50%. . . . . . . . . . . . . . . . . . .

58

Figure 4.2: (inset) Location of Montana, highlighted in grey, within the conterminous United States, (a) 10 m x 10 m resolution for all forested lands
across the state of Montana, and (b) 10 m x 10 m resolution for private
forested lands across the state of Montana. . . . . . . . . . . . . . . . . .

60

Figure 4.3: (a) The distribution for proportion of total private forest area of family
forest landowners of Montana (NWOS, Butler et al. 2016b) and the
GIS-based baseline (Base) population of Montana by size class, (b) the
distribution for proportion of private forest landowners (PFLs) by size
class for the Rocky Mountain region (Rockies, Oswalt et al. 2014) and
the Base population of Montana, and (c) the distribution of number of
family forest landowners of Montana (NWOS, Butler et al. 2016b) and
PFLs for the Base population of Montana by size class. . . . . . . . . . .

61

xi

Figure 4.4: Horizontal line indicates the GIS-based baseline of PFL population size.
Horizontal dashed lines represent the nominal probability coverage rate
of 0.95. (a-c) Mean PFL population size, mean 95% confidence interval
width, and 95% actual probability coverage for full response and incremental response rates from 50 to 10% for the two different estimators
(I and II) based upon 1000 simulations, respectively. (d-f) Mean PFL
population size, mean 95% confidence interval width, and 95% actual
probability coverage for four different response scenarios (A-D) with an
overall 50% response rate of Methods I and II based upon 1000 simulations, respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

63

Figure 5.1: Montana counties by (a) sample size, (b) true population size with >2 ha
forest, (c) 2010 census population density (PD, number of peopleÂˇkmâ2 );
(d) total forest area (TFA, 1000s ha). . . . . . . . . . . . . . . . . . . . .

79

Figure 5.2: New Jersey counties by (a) sample size, (b) true population size with
>2 ha forest, (c) 2010 Census population density (PD, number of
peopleÂˇkmâ2 ); (d) total forest area (TFA, 1000s ha). . . . . . . . . . . .

80

Figure 5.3: Distribution of posterior means from all iterations for select SAE model
parameters: (a) the variance parameters of Montana, (b) the FHCAR
model autocorrelation parameter of Montana, (c) the variance parameters of New Jersey; (d) the FHCAR model autocorrelation parameter
of New Jersey. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

83

Figure 5.4: Relative mean squared error comparisons for Montana (eq. 5.8): (a)
direct to FH model estimates, (b) direct to FHCAR model estimates;
(c) FH to FHCAR model estimates. . . . . . . . . . . . . . . . . . . . . .

87

Figure 5.5: Relative mean squared error comparisons for New Jersey (eq. 5.8): (a)
direct to FH model estimates, (b) direct to FHCAR model estimates;
(c) FH to FHCAR model estimates. . . . . . . . . . . . . . . . . . . . . .

88

xii

CHAPTER 1
INTRODUCTION

The focus of this dissertation is the development and application of model-based small area
estimation (SAE) models for both biophysical and social forestry variables. The use of the
model-based framework is necessitated in SAE because the sample sizes for the design-based
framework do not provide adequate precision. There is considerable demand and interest
among stakeholders of large-scale forest inventories or surveys, such as the United States
Department of Agriculture (USDA) Forest Service Forest Inventory and Analysis (FIA)
program and the associated National Woodland Owner Survey (NWOS), because outcomes
are desired at resolutions smaller than that provided by the original design. Some of the
current forestry-related SAE applications range from the traditional inventory outcomes of
timber volume and biomass (e.g., timber products output) to the assessment of wildland
fires (FIA Stakeholder Science Meeting 2017).
The key to SAE is to borrow strength from related areas to improve estimation for the
outcome and small area of interest. A small area, here, is defined by the sample size for
the outcome of interest rather than by the geographic size, as our interest lies in both forest
stands and larger areas such as counties. Rao and Molina (2015) give an in-depth overview
of SAE and cover many different approaches. Following the definitions of Rao and Molina
(2015), SAE is divided into three classes: i) direct; ii) indirect; and iii) small area modelbased estimation. Direct estimators are generally design-unbiased for the area of interest
and derived directly from the sample. Model-assisted methods, also belonging to this class,
have proven useful for combining covariates with plot observations for biophysical variables
(see, e.g., Opsomer et al. 2007). Indirect estimators are developed based upon an implicit
model that borrows strength from either another domain, time, or both. Several frequently
applied examples of this class are synthetic and composite estimators (Goerndt et al. 2011,
Breidenbach and Astrup 2012, Goerndt et al. 2013). In Chapters 2, 3, and 5, we consider
1

the third class of SAE, small area model-based estimation, referred to as small area models.
Small area models differ from the previous two classes by including an explicit model with
random effects that account for variation not explained by covariates in the model mean. In
Chapter 4, we consider the NWOS direct estimators for various unit nonresponse scenarios.
For small area models, the analyst applies either a frequentist or Bayesian mode of inference. The frequentist approach uses empirical best linear unbiased prediction (EBLUP),
and the Bayesian approach uses either empirical Bayes (EB) or hierarchical Bayes (HB).
For forestry applications, EBLUP has been applied most frequently and exclusively to biophysical settings (Goerndt et al. 2011, Breidenbach and Astrup 2012, Goerndt et al. 2013,
Magnussen et al. 2014, Mauro et al. 2016). The most common small area model is a linear mixed effects model, called the Fay-Herriot (FH, Fay and Herriot 1979) model, which
links a direct estimator to covariates via a linear model. Only one of these preceding SAE
biophysical forestry studies examined spatial correlations among the area-level effects (see
Appendix B in Magnussen et al. 2014). EB is considered the Bayesian paradigm equivalent
to EBLUP. Alternatively, HB methods provide access to posterior distributions of the small
area parameters (You and Zhou 2011), and hence parameter inference that does not rely
upon potentially unrealistic asymptotic assumptions (Pfefferman 2013).
The outcomes of interest are most commonly modeled at either the unit-level or the
area-level. For biophysical outcomes, the models generally occur at the smallest resolution
possible, i.e., the unit-level. Light detection and ranging (LiDAR) has become a ubiquitous choice for covariates in extending and improving unit-level estimation of biophysical
variables. Development and application of unit-level regression models for spatially aligned
LiDAR and biophysical inventory measurements are quite common (see, e.g., Babcock et al.
2015; 2016, Finley et al. 2017, Gregoire et al. 2016). These unit-level analyses generally apply
fixed-area plots for measurements due to alignment between the outcomes and covariates.
There are some examples, where variable-radius plots (VRP) have been applied in unit-level
analyses (Hollaus et al. 2007; 2009). Several recent studies (Hayashi et al. 2015, Deo et al.

2

2016) have varied LiDAR resolutions to best align with VRP, at the unit-level, without a
consensus across forest types. As an alternative to a unit-level analysis, an area-level analysis can be used when there is spatial misalignment between LiDAR and plot measurements
(Goerndt et al. 2011). In this dissertation, the focus is on application of area-level models
with VRP to estimate biophysical outcomes, with inference at the stand level. For social
forestry outcomes of interest, a unit-level analysis is generally not possible to be combined
with covariates outside of the survey application due to privacy concerns. For our primary
social forestry outcome of interest, private forest landowner population size, an area-level
analysis is the natural choice.
The primary objective of this dissertation is to contribute to the growing body of literature of SAE applications in the field of forestry. The detailed objectives of this dissertation
are: i) to apply a HB framework to increase the precision of estimates for biophysical forest variables at the stand level by borrowing strength across all stands through the use of
LiDAR covariates; ii) to apply a conditional autoregressive structure to the stand-level random effects to assess gains in precision of biophysical forest variables; iii) to evaluate the
current NWOS estimators of private forest area and private forest landowner population size
for a known population at the state level; iv) to present an alternative estimator of private
forest landowner population size that explicitly accounts for various nonresponse scenarios;
v) to evaluate the impacts of nonresponse biases on each of these estimators; vi) to produce county-level private forest ownership datasets for two complete states; vi) to define
and assess SAE models to improve county-level inference of the number of private forest
ownerships, and; vii) to develop open source software to fit proposed SAE models. The
subsequent paragraphs, summarize the main chapters of the dissertation as three distinct
published peer-reviewed articles and one chapter still to be submitted.
Chapter 2 has been published in Remote Sensing of Environment. In this chapter, Ver
Planck et al. (2018) use forest inventory plot measurements and LiDAR covariates to inform
model-based estimators for SAE. There are many examples where such linking models pro-

3

vide the desired accuracy and precision of forest parameter estimates for small areas where
paucity of inventory plot observations preclude design-based inference. This work builds
on previous SAE work by linking LiDAR covariates with VRP measurements within a HB
framework. Using this framework, we compare SAE of forest aboveground biomass (AGB)
using: i) FH; ii) FH with conditional autoregressive random effects (FHCAR); and iii) FHCAR with smoothed sampling variance (FHCAR-SMOOTH) models. Candidate models
and the direct estimate based on plot measurements alone were compared using coefficient
of variation (CV). On average, the FH model reduced the CV by 52.3% compared to the
direct estimate. Incorporating spatial structure via the FHCAR model reduced the CV by
56.9% and 10.8% relative to the direct and the FH model estimates, respectively. Overall,
these results illustrate the applicability and utility of using a SAE framework for linking
LiDAR with typical forest inventory data.
Chapter 3 extends the work of the previous chapter to include the additional common
forest inventory variables of basal area (BA), volume (VOL), and tree density (DENS) into a
multivariate HB SAE modeling framework. Using this framework, we compare the univariate
case of SAE for the individual biophysical variables using the FH and FHCAR models
to the multivariate FH with inverse-Wishart (FHIW) model. Candidate models and the
direct estimate based on plot measurements alone were compared again via CV. On average,
the FH model reduced the CV by 52.9%, 53.6%, 53.7%, 52.7% for AGB, BA, VOL, and
DENS, respectively, compared to the direct estimate. Incorporating spatial structure via the
FHCAR model reduced the CV for AGB, BA, VOL, and DENS by 58.0%, 57.5%, 58.7%,
54.2% relative to the direct estimates, respectively. On average, the FHCAR reduced the
CV by 12.2%, 9.5%, 12.3%, 3.7% for AGB, BA, VOL, and DENS relative to the FH model
estimates. The incorporation of the correlation among outcomes in the FHIW model reduced
the CV even further relative to the FH model by 30.6%, 22.1%, 9.8% for AGB, BA, and
VOL; however the average reduction was -1.8% for DENS. Overall, these results illustrate
the additional utility of using a multivariate SAE framework for linking LiDAR with multiple

4

outcomes of interest in a forest inventory.
The following two chapters transition from biophysical to social outcomes of interest.
Private forest ownerships currently own a plurality of forested land in the US (Oswalt et al.
2014) and will play an increasing role in the management of the forested lands in the US for
the foreseeable future. These two chapters offer methods to increase or better represent precision in estimates of private forest landowner population size. Chapter 4 has been published
in Forest Science. In this chapter, Ver Planck et al. (2016) examine the current estimator
used by the NWOS for total private forest area and private forest landowner population size
along with one alternative estimator. The NWOS, conducted by the USDA Forest Service,
is the standard for state- and national-level estimates of private forestland and ownerships in
the US. The estimators are evaluated at the minimal resolution used by NWOS, of a state.
Montana is used as a case study by combining freely available cadastral data with remote
sensing in a geographic information system. These data allow us to evaluate the estimators
for a known private forest ownership population. In addition, the impacts of nonresponse
biases are assessed for each of the estimators. The results indicate that the current estimator performs as well as the alternative estimator under conditions of full response; however,
the estimator performance varied under conditions of nonresponse. We offer the alternative
estimator to explicitly account for nonresponse over the implicit nonresponse assumptions
of the current NWOS estimator under the various conditions of nonresponse examined.
Chapter 5 has been published in the Canadian Journal of Forest Research. In this chapter, Ver Planck et al. (2017c) examine the ability to expand statewide estimates similar
to the NWOS to county levels. Due to sample sizes prescribed for inference at the state
level, there are insufficient data to support county-level estimates. However, county-level
estimates of NWOS variables are desired because ownership programs and education initiatives often occur at the county level and such information could help tailor these efforts to
better match county-specific needs and demographics. Here, we present and assess methods to estimate the number of private forest ownerships at the county level for two states,

5

Montana and New Jersey. To assess model performance, true population parameters were
derived from cadastral and remote sensing data. Two SAE models, the FH and the FHCAR, improved estimated county-level population mean squared error over that achieved
by direct estimates. The proposed SAE models use covariates to improve accuracy and precision of county-level estimates. Results show total forest area, and 2010 decennial census
population density covariates explained a significant portion of variability in county-level
population size. These and other results suggest that the proposed SAE methods yield a
statistically robust approach to deliver reliable estimates of private ownership population
size and could be extended to additional important NWOS variables at the county level.
Chapter 6 concludes with potential avenues to pursue related to the topics covered in this
dissertation.

6

CHAPTER 2
HIERARCHICAL BAYESIAN MODELS FOR SMALL AREA ESTIMATION
OF FOREST VARIABLES USING LIDAR

2.1

Introduction

Forest inventory efforts typically follow a sampling design that aims to cover a potentially
broad range of stand conditions, e.g., capturing species and structural diversity. Generally,
a systematic grid of fixed-area plots (FAP) or variable-radius plots (VRP) are established
across the forest. The choice between FAP or VRP depends on the inventory objectives
(Maltamo et al. 2009). A greater cost efficiency generally results in the establishment of
VRP for operational inventories over FAP research inventories (Rice et al. 2014).
Light detection and ranging (LiDAR) data have become one of the remote sensing tools
of choice for extending and improving ground-based forest inventories and monitoring. There
has been considerable research relating forest attributes with LiDAR covariates (see reviews
by McRoberts et al. 2010, NĂŚsset et al. 2004). In the case of complete LiDAR coverage, a fine
grid is typically imposed on the area of interest and LiDAR covariates are calculated using
the point cloud within each grid pixel. Often there is some effort to matching the pixel size
to that of the inventory plot, especially in the setting where a regression model is developed
to relate the LiDAR covariates to the response forest variables of interest (see, e.g., Finley
et al. 2013, McRoberts et al. 2013). Development of regression models for spatially aligned
LiDAR and inventory plot measurements is often referred to as unit-level analysis and is
quite common (see, e.g., Babcock et al. 2015; 2016, Finley et al. 2017, Gregoire et al. 2016).
These and similar analyses use FAP, opposed to VRP, because the relationship between plot
extent can be directly matched with spatially coinciding LiDAR covariates. Truncated VRP,
which result in a comparable extent to FAP, used in the Finnish National Forest Inventory
have been successfully regressed on spatially aligned LiDAR covariates (Maltamo et al. 2007),

7

although this may not generalize to stands with diverse diameters (Scrinzi et al. 2015).
There are examples, where VRP have been used in unit-level analyses (Hollaus et al.
2007; 2009). More recently, Hayashi et al. (2015) and Deo et al. (2016) have applied a
variety of LiDAR resolutions that best match with the basal area factor applied in the VRP
sampling. As an alternative to a unit-level analysis, an area-level analysis can be used when
there is spatial misalignment between LiDAR and plot measurements (Goerndt et al. 2011).
In an area-level regression analysis, observations are at the stand-level, whereas a unit-level
analysis could consider multiple observations within each stand. Working at the area-level
affords some advantages. For example, one can combine the cost efficient VRP with LiDAR
covariates without spatial alignment between the plot data and LiDAR (Goerndt et al. 2011).
In an effort to align VRP measurements and LiDAR data, Kronseder et al. (2012) calculated
LiDAR covariates at a resolution of 1 ha based on the idea that VRP measurements are
expanded to a per ha basis. Additionally, van Aardt et al. (2006) and Hudak et al. (2014)
calculated the LiDAR variables at a segment- or stand-level in which the VRP samples
were established. In this study, our focus is on application of area-level models to estimate
aboveground forest biomass (AGB), with inference at the stand-level. The area-level models
developed here are general and could be applied to other forest variables or transformations
of AGB, e.g., for use in forest carbon accounting projects.
Given time and cost constraints, inventory designs often focus on achieving forest-level
accuracy and precision requirements, which results in a limited number of samples collected
within any given stand. These small sample sizes result in stand-level point estimate uncertainty that is too large for practical use. This limited inference at the stand-level, due to
paucity of samples, is referred to as the small area problem and is commonly tackled using
a small area estimation (SAE) method. We consider a forest stand to be the small area of
interest for the following development; however, this is setting specific and one can of course
consider other small area delineations.
The key of SAE is to borrow strength across related areas to improve estimation at the

8

small area of interest. Rao and Molina (2015) give an in-depth overview of SAE and cover
many different approaches. Following the definitions of Rao and Molina (2015), SAE is
divided into three classes: i) direct; ii) indirect; and iii) small area model-based estimation. Direct estimators are generally design-unbiased for the area of interest and derived
directly from the sample. Model-assisted methods, also belonging to this class, have proven
useful for combining remotely sensed data with ground observations for AGB estimation
(see, e.g., Opsomer et al. 2007). Indirect estimators are developed based upon an implicit
model that borrows strength from either another domain, time, or both. Several frequently
applied examples of this class are synthetic and composite estimators (Goerndt et al. 2011,
Breidenbach and Astrup 2012, Goerndt et al. 2013). In this study, we consider the third
class of SAE, small area model-based estimation, referred to as small area models. Small
area models differ from the previous two classes by including an explicit model with random
effects that account for variation not explained by covariates in the model mean.
For small area models, the analyst applies either a frequentist or Bayesian mode of
inference. The frequentist approach uses empirical best linear unbiased prediction (EBLUP),
and the Bayesian approach uses either empirical Bayes (EB) or hierarchical Bayes (HB).
For forestry applications, EBLUP has been applied most frequently (Goerndt et al. 2011,
Breidenbach and Astrup 2012, Goerndt et al. 2013, Magnussen et al. 2014, Mauro et al.
2016). Mauro et al. (2016) emphasized the correct specification of the estimator of mean
squared error (MSE) of EBLUP for different levels of aggregation, from a pixel to an entire
forest. The most common small area model is a linear mixed effects model, called the FayHerriot (FH, Fay and Herriot 1979) model, which links a direct estimator to covariates via a
linear model. Only one of the preceding SAE forestry studies examined spatial correlations
among the area-level effects (see Appendix B in Magnussen et al. 2014). EB is considered
the Bayesian paradigm equivalent to EBLUP. Alternatively, HB methods provide access
to posterior distributions of the small area parameters (You and Zhou 2011), and hence
parameter inference that does not rely upon potentially unrealistic asymptotic assumptions

9

(Pfefferman 2013).
The primary objective of this study was to apply a HB framework to increase the precision
of estimates for mean AGB at the stand-level by borrowing strength across all stands through
the use of LiDAR covariates. Additionally, we apply a conditional autoregressive structure
to the stand-level random effects to assess gains in precision of AGB. The remainder of the
manuscript follows with: i) a description of the study area along with relevant data for the
small area models; ii) a description and implementation of the small area models; and iii)
the results and discussion of applying small area models for AGB. All source code and data
are provided to facilitate reproducible research and application of the proposed methods.

2.2

Methods

2.2.1
2.2.1.1

Data
Study area

The area of interest for this study was the Noonan Research Forest (NRF) near Fredericton,
New Brunswick, Canada (N 45âŚ 59â12â, W 66âŚ 25â15â). The NRF has been managed by the
University of New Brunswick since 1985 and is approximately 1500 ha in size with a total
of 271 stands. The subsequent analysis uses a subset of 226 stands each with a minimum of
two VRP per stand. These stands ranged in size from 0.6 to 47 ha with an average size of
6.6 ha (Table 2.1; Figure 2.1). The forest is composed of hardwood, mixed, and softwood
stands with the major species being aspen (Populus spp.), balsam fir (Abies balsamea L.
(Mill.)), birch (Betula spp.), eastern white pine (Pinus strobus L.), red maple (Acer rubrum
L.), and spruce (Picea spp.), see Hayashi et al. (2015) for more details.

2.2.1.2

Variable radius plot data

In 2010, a 100Ă100 m grid was laid out across the NRF. At each grid intersection a VRP was
established and trees greater than 6.0 cm diameter at breast height (DBH) were selected into

10

the sample using a 2M basal area factor angle gauge. Species, DBH, and height were recorded
for each sample tree. Plot estimates of AGB Mg haâ1 were calculated using Jenkins et al.
(2003) species-group equations. Stand-level estimates were obtained by averaging plot-level
AGB Mg haâ1 estimates. Table 2.1 summarizes these stand-level estimates.

2.2.1.3

LiDAR data

The full waveform LiDAR data were collected on October 21 and 22, 2011 using a Riegl LMS
Q680i laser scanner mounted on an airplane. The sensor had a pulse repetition frequency of
180 kHz with a laser wavelength of 1550 nm and a scan angle < 28.5âŚ from nadir. The forest
was covered in overlapping strips to achieve at a minimum of six pulses per m2 , footprint of
0.35 m, and up to eight returns per pulse (Hayashi et al. 2015).
Stand-level LiDAR covariates were computed using the lascanopy function in the LAStools software suite (Isenburg 2016). The NRF stand polygons, LAS files, and arguments
to define the vertical extent of the LiDAR profile considered, were passed into the lascanopy function to generate the desired LiDAR covariates. Here, we consider stand-level
LiDAR signal derived percentilesâheight in meters at which some percent of the energy is
returnedâlabeled P followed by the percentile, as well as signal shape variables, e.g., kurtosis and skewness. Exploratory analysis using a backward selection procedure resulted in P 25
and P 75 having the lowest Akaike information criterion value. Although the two percentile
heights have a high correlation, the variance inflation factor of each was three, which is less
than the general rule of ten for assessing multicollinearity (see Kutner et al. 2004, p.409).
Therefore, P 25 and P 75 covariates were used for the SAE models.

11

Table 2.1: Summary statistics for stand area, number of plots, mean aboveground biomass,
sampling variances, and LiDAR covariates for the Noonan Forest (m = 226 stands) dataset.
Min
Stand Area (ha)
0.6
No. of plots
2
â1
Mean AGB (Mg ha ) 16.9
Ďi2
0.158
2
ĎĚi
36.5
P 25 (m)
2.2
P 75 (m)
5.1
2.2.2
2.2.2.1

Max
47.3
44
223.5
7948
804
8.4
20.7

Mean
6.1
5.9
117.8
1698
411
5.2
11.7

SD
5.6
5.5
44.8
1432
226
1.3
2.8

Models
Direct estimator and small area models

The direct estimator applied assumed simple random sampling for point and variance estimates of AGB. Beginning with the SAE FH model for stand i in 1, 2, . . . , m stands defined
as:

Yi = Î¸i + i ,

(2.1)

Î¸i = x0i Î˛ + vi ,
where Yi is the direct estimate, the parameter of interest Î¸i is mean AGB, and i is a normally distributed error with mean zero and variance Ďi2 . The additive mean of Î¸i comprises
an intercept and stand-level covariates held in the p Ă 1 column vector x and associated
p Ă 1 vector of regression coefficients Î˛. The stand-level random effects term vi is normally
distributed with mean zero and variance Ďv2 . For comparison with HB inference, the frequentist EBLUP was also applied for the FH model with available R statistical software (see
Appendix A).
It is reasonable to think that environmental variables, e.g., disturbance history and soil
characteristics, could exhibit spatial autocorrelation and affect the observed AGB. In our
setting, if direct estimate values are spatially correlated, i.e., adjacent stands have similar

12

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

25
20
15
10
5

â

â

â
â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â
â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â
â

â

â

â

â

â

â

â
â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â
â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â
â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

30

â

â
â

â

â

â

â
â

â

â

45

â

â

â

â

â

â

â

â

â

â

â

â

â

9

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â
â

â
â

â
â

8
â

7

â
â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

â

6

â

5
4

â

3

â

â

0

1000 m

1

2

(a)

(b)

23
20
17
14
11
8
5

(c)

Figure 2.1: Noonan Forest stands by (a) number of variable radius plots, (b) stand-level
P 25 (m); (c) stand-level P 75 (m) LiDAR covariates.

AGB values, then we should exploit this relationship to further improve inference by pooling
information across proximate stands. Hence, we augment model (2.1) by adding a spatially
structured random effect that follows a conditional autoregressive (CAR) prior distribution,
see, e.g., Banerjee et al. (2015) and You and Zhou (2011). This extended model called
FHCAR is defined analogous to FH, with the exception that the unstructured random effects
v = (v1 , v2 , . . . , vm )0 âź N (0, Ďv2 I) in model (2.1) are replaced with v âź N (0, ÎŁ(Ďv2 , Îť)).

13

Here, the m Ă m covariance matrix ÎŁ(Ďv2 , Îť) = Ďv2 [ÎťR + (1 â Îť)I]â1 , where Ďv2 is the spatial
variance parameter, Îť is the autocorrelation parameter, R is the neighborhood matrix with
diagonal elements equal to the number of neighbors and off diagonal elements equal negative
one or zero indicating if a neighbor is present or not, and I is the m Ă m identity matrix.
Stands are only considered neighbors with adjoining borders.
The previous models assume the sampling variances are fixed and known, which is a
common assumption for SAE. However, additional smoothing methods are also commonly
applied to the sampling variances to reduce instability in variance estimates of small sample
sizes. Here, the sampling variances were smoothed by stand area ai (similar to Goerndt
et al. 2011); smoothed sampling variances ĎĚi2 were defined as:

Ve
;
ni
Pm
ai Ďi2
i=1
,
V e = Pm
i=1 ai

ĎĚi2 =

(2.2)

where ni is the number of VRP in stand i. These ĎĚi2 were then used in place of the original
sampling variances in the FHCAR model. This model is referred to as the FHCAR-SMOOTH
model. The development of generalized variance functions (GVF) are also common in the
SAE literature (e.g., Dick 1995); however, we did not find a useful covariate to develop a
GVF.
The Bayesian SAE model specifications are completed by assigning prior distributions to
parameters (Gelman et al. 2014). Each regression coefficient in Î˛ was assigned a flat prior
distribution, Ďv2 was given an inverse-Gamma (IG) prior distribution, and, following You and
Zhou (2011), Îťâs prior was uniform with support between zero and one. The IGâs shape hyperparameter was set to two, which results in a prior mean equal to the scale hyperparameter
P
2
and infinite variance. The IGâs scale hyperparameter was set as m
i=1 Ďi /m to give equal
prior weight to the sampling and CAR variances. A Markov chain Monte Carlo (MCMC)
algorithm was used to sample from parametersâ posterior distributions. Specifically, a Gibbs
14

algorithm was developed to sample from Î¸ = (Î¸1 , Î¸2 , . . . , Î¸m ), Î˛, and Ďv2 with full conditional distributions given in You and Zhou (2011), and a Metropolis-Hastings algorithm was
used to sample from Îťâs posterior distribution. We completed all analyses using R statistical
software (R Core Team 2015). The data and R code to reproduce this analysis are available
at Mendeley Data (Ver Planck et al. 2017a).
Parameter posterior inference was based on 3,000 post burn-in MCMC samples from each
of L = 3 chains. For the FH and FHCAR models, we thinned the samples to every third
sample to reduce the autocorrelation in samples resulting in K = 3,000 samples. For the
FHCAR-SMOOTH model, thinning of the samples was not necessary based upon examination of plots of the autocorrelation function resulting in 9,000 samples. Chain mixing and
convergence were diagnosed using a multivariate potential scale reduction factor of less than
1.1 for all parameters considered (Gelman et al. 2014).

2.3

Results

The FH, FHCAR, and FHCAR-SMOOTH model parameter estimates are given in Table 2.2. Here, the 95% credible intervals for Î˛1 and Î˛2 do no include zero, which suggests the
associated LiDAR covariates explain a substantial portion of the direct estimateâs variability.
The HB and EBLUP FH model parameter estimates were similar, with the exception of Ďv2
(Table A.1). The discrepancy between the HB and EBLUP estimate for Ďv2 is likely due
Table 2.2: The median and 95% credible intervals for Fay-Herriot (FH), FH with
conditional autoregressive (FHCAR), and FHCAR with smoothed sampling variances
(FHCAR-SMOOTH) model parameters.
Parameter
FH
Î˛0
-51.99 (-69.15, -34.31)

FHCAR
-49.94 (-67.89, -31.02)

FHCAR-SMOOTH
-41.50 (-54.77, -27.33)

Î˛1 , P 25

24.24 (18.17, 30.08)

23.35 (16.90, 29.52)

21.48 (16.80, 25.95)

Î˛2 , P 75

3.66 (0.83, 6.42)

3.82 (1.01, 6.84)

4.08 (2.28, 6.01)

Ďv2

284.4 (201.2, 404.8)

471.5 (313.3, 720.8)

365.7 (201.3, 590.0)

Îť

â

0.61 (0.21, 0.93)

0.61 (0.15, 0.93)

15

to the fact that, unlike EBLUP, HB acknowledges uncertainty in this parameter and hence
greater variation around the AGB estimates.
The stand-level point estimates of AGB for the FH model ranged between 18.0 and 208.7
Mg haâ1 with a mean of 117.1 Mg haâ1 . The ranges of the FHCAR and FHCAR-SMOOTH
models were 18.2 â 210.1 Mg haâ1 and 21.3 â 215.0 Mg haâ1 with means of 116.2 and 118.4
Mg haâ1 , respectively. Figure 2.2 maps the direct and posterior means of AGB for FH,
FHCAR, and FHCAR-SMOOTH model estimates for the individual stands. A scatter plot
of these direct and posterior SAE model estimates does not reveal any regions of deviation
(Fig. 2.3). Following the methods of Brown et al. (2001) and You and Zhou (2011), bias
due to model misspecification was assessed through fitting linear models between the direct,
assumed to be design-unbiased, and the posterior means of the SAE model estimates. The
slopes of the individual linear models were 1.003, 1.002, and 1.063 for FH, FHCAR, and
FHCAR-SMOOTH posterior means, respectively, which suggests there is not a systematic
bias across the range of AGB values. The R2 for each was 0.843, 0.838, and 0.890 for the
FH, FHCAR, and FHCAR-SMOOTH posterior means.

16

225

225

195

195

165

165

135

135

105

105

75

75

45

45

15

15

(a)

(b)

225

225

195

195

165

165

135

135

105

105

75

75

45

45

15

15

(c)

(d)

Figure 2.2: Posterior mean aboveground biomass (Mg haâ1 ) for the (a) direct estimate, (b)
FH model, (c) FHCAR model; (d) FHCAR-SMOOTH model.

The coefficient of variation (CV), defined as the square root of the posterior variance
divided by the posterior mean, was used as the measure of precision among the direct, FH,
FHCAR, and FHCAR-SMOOTH model estimates with a smaller value indicating better
precision. For the direct estimates CV ranged between 0.004 and 1.156 with a mean of
0.363. The ranges for the FH, FHCAR, and FHCAR-SMOOTH models were 0.004 â 0.510,
0.004 â 0.398, 0.037 â 0.363 with means of 0.148, 0.133, and 0.106, respectively. Using a

17

â
â

200
150
100
50

Direct Estimate of Aboveground Biomass (Mg haâ1)

â

FH
FHCAR
FHCARâSMOOTH

ââ â

ââ
â
â
â
â â ââ â â â â
â
â
â
â
â
â ââ
â â
ââ â
ââ
ââ
ââ âââ â
ââ
âââ â
ââ â
â
ââ ââââ
âââ â
â
ââ
â â
ââ
ââ
â
â ââ
ââ â
ââ â â ââ
ââ
â â
â
ââ
â
ââ
â
â
ââ
ââ â
ââ
ââ
ââ
â
â
â
ââ â
ââ
ââ
â
â
â
â
â
âââââââ
â ââ
â ââ
â
â
ââ
â
ââ
âââ
ââ
â
ââ
â â
ââ
â
â
ââ
ââ
â
â
ââ
ââââââ
ââ
â
â
â
â
â
â
â
â
â
â
â
â
â
ââââ ââ
â
â
â
âââ
ââ
â â â ââ
â
ââ
â
â
â
â
â
â â ââ
ââ â ââââ
âââ
â
âââ
â â
â â
â
â
â
â
â
ââ â
â
â
â ââ
ââ â
â ââââ
ââ â
â
ââââ
â
âââ
â âââ
â
â
ââ
ââ
â ââ
â ââ ââ â â â
ââ
â
â
â
ââ
â
ââ
â
â
â
â
â
â
â
â
â
â
â
â
ââ
â
â
â
â
â
â
â
â
â
â
â
â
â â â ââ
â
â
â
â
â
â
â
â
â
â
â
â ââ â
ââ
â
â
ââ
ââ
â
â ââ
ââ â â
âââ
ââ
â â
ââ
â
â
â
â
â
ââ
â
â
ââ â ââ
â
â
ââ
â
ââ
â
â ââ ââ ââ
â
ââ
â â
â
â
â
â â ââ
â
âââââ â ââ â
ââ
â
âââ â
âââ ââ
ââ
ââ
ââ
ââââ
ââ
â
â ââ
â
ââ
â
â
â
â
â
â
â
â
â
ââ
â
â
â
â
â
ââââ
â
âââ
ââ âââââ
â â â
â
â
â
â
â
â
â
â
â
ââ â ââ â âââ
â
â â
â ââ â
â â
ââ
â
ââ
â âââ â
ââ âââ
â
â
ââ
ââ â
âââ
âââ
â
â
â
â
â
â
â
â
â
â
ââ
â
â
ââ
ââ
ââ

â
âââ
âââââ
ââ
â
ââ
â
â âââ
â
âââ
â
â
â ââ â
â
â ââ
â
â
â
â
â â

50

100

150

200

Posterior Mean of Aboveground Biomass (Mg haâ1)

Figure 2.3: Mean aboveground biomass (Mg haâ1 ) for direct estimates versus posterior
means of FH, FHCAR, and FHCAR-SMOOTH models.

common CV threshold of 0.15 for stand-level estimates (Mauro et al. 2016), 29 of the 226
stands were at or below this threshold for the direct and 161 for the FH model estimates.
The FHCAR and FHCAR-SMOOTH models had 168 and 195 stands, respectively, at or
below the threshold. Figure 2.4 maps the CV of the individual stands with the lightest
shaded stands meeting or exceeding the CV threshold. Figure 2.5 shows the CV of the
direct and three model estimates against the stands ordered from smallest to largest CV
of the direct estimator. Reductions in CV are observed between the direct estimator and
the FH model for all but two stands, and even greater reductions are observed between the
direct estimator and the FHCAR model. Further reductions in CV are observed between
the FH and FHCAR model for 186 stands. The smoothing of the sampling variance for the
FHCAR-SMOOTH model shows larger estimates of CV than even the direct estimator at
the low end of the CV direct estimator. A total of 17 stands had a CV for the FHCARSMOOTH model greater than the direct estimator. These stands were generally stands with
four or fewer variable radius plots that had low variation among the plots. The remaining

18

1.2

1.2

0.99

0.99

0.78

0.78

0.57

0.57

0.36

0.36

0.15

0.15

0.003

0.003

(a)

(b)

1.2

1.2

0.99

0.99

0.78

0.78

0.57

0.57

0.36

0.36

0.15

0.15

0.003

0.003

(c)

(d)

Figure 2.4: Coefficient of variation of the (a) direct estimate, (b) FH model, (c) FHCAR
model; (d) FHCAR-SMOOTH model.

standsâ CV values showed a similar pattern to the FHCAR model.
The percent reductions in CV from the direct estimator to the FH model ranged from
-0.9% to 88.0% with a mean of 52.3%. A wider range in percent reduction of -0.7% â 90.7%
was observed from the direct estimator to the FHCAR model and a mean of 56.9%. Gains
were also observed in percent CV reduction from the FH to the FHCAR model with a range
of -34.6% â 36.0% and mean of 10.8%. The largest range in percent reduction was seen for

19

1.2

â
â

0.8
0.6
0.4
0.0

0.2

Coefficient of Variation

1.0

â
â

DIRECT
FH
FHCAR
FHCARâSMOOTH

â
â
â
â
â
â
â

â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â ââ
â
â
â
â
â
ââ
â
â
â
â
â
â
ââ
â
â
ââ
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
ââ â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
ââ âââ â ââ
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
ââ
â
ââ
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
ââ
â
â
â
â
â
ââ
â
â ââ
â
â
â
â
â ââ
â
â
â
â
â
â
â
â
â
â
â
â
âââ â â ââ
â
â
ââ
ââ
â
â
â
â
ââ â
â
â
ââ
â
ââ â â
â
ââ
ââ
âââ â
â
â
ââ
â
â
â
â
â
â
ââ
â
â
â
â
â
â
ââ ââ
â â ââ
â
ââââ
â
ââââ
â
â
â
â
â
ââ
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â â
ââââ
ââ
â
ââââ ââ
â
ââ ââ
â
ââââââ
â
â ââ
â
ââ
â
ââ
ââ
â
â
ââ â
â
â
â â
â
â
âââ
â
âââ
ââ â
â
ââ â
â
â ââ
ââ
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â
â â â
â
â
â
ââ â
â
â ââââ â
â
â
â
â
â
â
â
ââ
ââ
ââ
âââ
â
â
ââ â
â
â
â
â
ââ
ââ
â
â
ââ â
ââ
â
ââ
ââ
â
âââ
â
â
â
â
â
â
ââ
ââ
â
â
ââ
âââ
â
âââ â
ââ
â
â
â
â
ââ
â
ââ
â
â
â
â
â
â
â
â
â
â
â
â
â
â
ââ â
â
â
â
â
â
â
ââ
ââ
â
ââ
ââ
â
ââ
ââ
ââ
â ââ
ââ
â
â
ââ
â
â
â
â
ââ
â
â
â
â
â
ââ
â
â
â
â
â
ââ
ââ
â
â
ââ
ââ â
â
â
ââ
â
â
â
â
â
â
â
â
â
ââ
ââ
â
ââ
ââ
â
â
âââ
âââââ
â
â
â
â
â
â
â
â
ââ
â
ââ
â
â
â
â
â
â
â
â
â
ââ
â
ââ
â ââ
â
â
â
â
ââââ
â
ââ
â
â
â
ââ
â
â
â
â
â
â
â
â
ââ
â
ââ â
â
â
â
â
â
âââ â âââ ââ
â
â
â
â â
ââ
ââ
â
â
â
â
â
â âââ
â
â
â â âââ
â â âââ ââ ââ â
â âââ
ââ â
â
â
â
â
â

Stands (ordered from smallest to largest CV Direct)

Figure 2.5: Estimates of coefficient of variation (CV) versus stands ordered by increasing
CV of direct estimate.
the FHCAR to the FHCAR-SMOOTH model of -2220% â 56.9% and a mean of -1.2%. The
average percent change across stands between the FHCAR and FHCAR-SMOOTH models
was negative mainly due to large percent changes (< -1000%) in a few of the stands as
depicted in Figure 2.6. Excluding these two stands, the mean percent change in CV between
the FHCAR and FHCAR-SMOOTH models was 13.9%.
Table 2.3 shows the mean percent reduction in CV for FH to FHCAR models and FHCAR
to FHCAR-SMOOTH models by number of neighboring stands. For the FH to FHCAR models, a general increasing trend was observed in percent CV reduction with increasing number
of neighbors (ranging from -25.2% to 36.0%) with a slight drop moving from ten to eleven
neighbors. A similar pattern was observed for the FHCAR to FHCAR-SMOOTH models.
The smaller number of neighbors (i.e., 1â3 and 5) had a negative value indicating reduction
from FHCAR-SMOOTH to FHCAR (ranging from -35.1% to -21.8%), and the remaining
number of neighbors showed reduction from FHCAR to FHCAR-SMOOTH (ranging from
20.3% to 43.9%).

20

95

95

80

80

60
50

60
50

30
20
10
0
â5
â10
â15
â25
â35
â1000
â2300

30
20
10
0
â5
â10
â15
â25
â35
â1000
â2300

(a)

(b)

95

95

80

80

60
50

60
50

30
20
10
0
â5
â10
â15
â25
â35
â1000
â2300

30
20
10
0
â5
â10
â15
â25
â35
â1000
â2300

(c)

(d)

Figure 2.6: Percent reduction in coefficient of variation from (a) direct estimate to FH
model, (b) direct estimate to FHCAR model, (c) FH to FHCAR model; (d) FHCAR to
FHCAR-SMOOTH model.
The precision of the SAE models was also assessed by the width of the credible intervals,
where narrower intervals indicate less uncertainty. The 95% credible interval widths of mean
AGB are shown in Figure 2.7 for the FH, FHCAR, and FHCAR-SMOOTH models. The
credible interval widths for the FH model ranged from 1.6 to 70.8 Mg haâ1 with a mean
width of 57.2 Mg haâ1 . For the FHCAR model, the range of the credible interval widths
were slightly wider, ranging from 1.5 to 87.3 Mg haâ1 , and the mean width was also smaller

21

Table 2.3: Mean CV reduction (%) between SAE models by number of neighboring stands.
Number of
Neighbors
1
2
3
4
5
6
7
8
9
10
11
13

FH to
FHCAR
-25.2
-7.3
1.5
9.3
15.3
20.1
23.6
27.6
30.4
34.2
28.1
36.0

FHCAR to
FHCAR-SMOOTH
-33.4
-35.1
-21.8
21.6
-32.8
21.9
32.7
24.6
29.1
20.3
39.3
43.9

at 50.3 Mg haâ1 . The range for the FHCAR-SMOOTH model was smaller than the previous
two models from 20.3 to 66.3 Mg haâ1 ; however, the minimum credible interval width was
much larger than both the FH and FHCAR models. The mean of the credible interval width
of the FHCAR-SMOOTH model was 41.5 Mg haâ1 , which was slightly smaller than the
FHCAR model.

2.4

Discussion

In this study, we found P 25 and P 75 to be the LiDAR covariates that explained the
greatest variation in stand-level AGB. Hayashi et al. (2015) using the same data from NRF
found the height of P 45 to be most significant in explaining AGB estimates. One possible
explanation for this difference is the LiDAR covariates used in this study were summaries
at the area-level (stand-level), while Hayashi et al. (2015) examined relationships between
AGB and LiDAR covariates at the unit-level (plot-level). Goerndt et al. (2011) also found
LiDAR covariates to differ in their final regression model for stand-level versus plot-level
relationships. Many studies (e.g., Lefsky et al. 2002) have also found P 95, generally regarded
as canopy height, to be a common explanatory variable; however, this was not the case in
our final linear model. We could have explored more LiDAR covariates; however, the focus

22

90
80
70
60
50
40
30
20
10
0

(a)

90

90

80

80

70

70

60

60

50

50

40

40

30

30

20

20

10

10

0

0

(b)

(c)

Figure 2.7: 95% credible interval widths of AGB (Mg haâ1 ) for the (a) FH, (b) FHCAR;
(c) FHCAR-smooth models.

of this paper was on applying the HB SAE framework and not to finding the best linking
model.
Recently, Breidenbach et al. (2016) and Mauro et al. (2016) examined estimation of
variance associated with model-based SAE from a frequentist perspective. In this study,
we offered the hierarchical Bayesian method as an alternative that provided access to the
full posterior distributions of the variables of interest and simplified interpretation of model

23

parameter inference. For the FH model, the EBLUP had smaller Ďv2 compared with HB. We
see this as a strength of the HB model, in that the HB mode of inference provides a more
realistic accounting of uncertainty associated with estimating AGB. As a contribution, we
have supplied the necessary functions to reproduce this research that could be applied to
datasets with similar applications. This study also highlights the suitability of applying VRP
for gains in sampling efficiency when coupled with LiDAR covariates that are summarized
at the stand-level.
Large reductions in CV were seen in the application of either the FHCAR or FHCARSMOOTH model to the NRF stands due to strong area effects. Magnussen et al. (2014) also
examined small area applications accounting for spatial autocorrelation with improvements
for Swiss forest districts. Additionally, Breidenbach et al. (2016) found large improvements
in coverage probability when incorporating spatial autocorrelation into the variance estimate
for the confidence intervals. These examples show that, when present, accounting for spatial
autocorrelation can better represent the uncertainty in the variables of interest.
Smoothing of sampling variances (Eq. 2.2) in the FHCAR-SMOOTH model decreased
credible interval widths compared to the FHCAR model; whereas, for stands with fewer
neighboring stands and smaller sample sizes the FHCAR model had smaller credible interval
widths. However, this may be a result of increasing the sampling variance for the smaller
stands by applying smoothing and may be more reflective of the instability of variance with
these small sample sizes. The method of smoothing based on stand area may not be reflective
of increased variation in smaller stands due to the practice of delineating stands based on
their similarities in management history, species composition, and age.
One potential drawback of applying SAE at the area-level is the inability to assess within
stand variation, as can be done when aligning LiDAR and plot data at the unit-level (e.g.,
Hayashi et al. 2016, Woods et al. 2011). However, as seen in Hayashi et al. (2016) a determination of prediction unit size influences inference at the stand- and forest-level. For
area-level analyses, the size of the areas can be adjusted to meet the application of interest

24

with the requirement of at least two ground samples within each area for estimating the
sampling variance.
A potential practical application of this research would be to extend AGB to carbon
estimates for forest carbon accounting projects. SAE models could be cost effective in
reducing uncertainty where paucity of field plots makes direct estimates unreliable for the
project.
Additionally, future work will be to extend this study to the multivariate setting similar
to the work of Porter et al. (2015). In this extension, cross-correlations in the multivariate
setting could improve AGB estimates along with other common forest inventory variables,
e.g., basal area, tree density, or volume. The HB SAE framework demonstrates the utility
of applying common ground inventory techniques, i.e., sampling with VRP, with available
remotely sensed data for reducing uncertainty at the area-level.

25

CHAPTER 3
MULTIVARIATE HIERARCHICAL BAYESIAN MODELS FOR SMALL
AREA ESTIMATION OF FOREST VARIABLES USING LIDAR

3.1

Introduction

Forest inventory efforts typically follow a sampling design to meet multiple objectives.
The inventories are often designed to collect information for a range of outcomes of interest.
Some of the most common forest inventory outcomes of interest related to forest management
are aboveground biomass (MgÂˇhaâ1 ), basal area (m2 Âˇhaâ1 ), cubic volume (m3 Âˇhaâ1 ), and
tree density (no. treesÂˇhaâ1 ); these are the outcomes of interest in this study. A multivariate,
rather than a univariate approach, to modeling these four outcomes may be beneficial for
increasing the precision of outcomes that are more difficult to model from remote sensing,
i.e., tree density (Finley et al. 2013).
Generally, the data are collected through the establishment of a systematic grid of fixedarea plots (FAP) or variable-radius plots (VRP) across the region of interest. The permanence of the plots and inventory objectives will determine the type of plots to be utilized
(Maltamo et al. 2009). A greater cost efficiency generally results from the establishment of
VRP for operational inventories over FAP research inventories (Rice et al. 2014). Another
solution for reducing the number of plots established and increasing sampling efficiency is
the collection of remote sensing covariates, e.g., light detection and ranging (LiDAR), that
cover the full range of natural variability. Numerous studies have been conducted to relate
these four outcomes of interest with LiDAR covariates (see reviews by McRoberts et al.
2010, NĂŚsset et al. 2004). Often the LiDAR covariates are extracted at a footprint that
aligns with ground-based sample plots, especially when models are developed for analysis at
the plot level, also referred to as the unit level (see, e.g., Babcock et al. 2015; 2016, Finley
et al. 2017, Gregoire et al. 2016). FAP are commonly the plot type of choice for unit-level

26

analysis; however there have been several studies that have examined the proper resolution
for combining LiDAR with VRP (see, e.g., Deo et al. 2016, Hayashi et al. 2015).
The alternative choice to a unit-level analysis is an area-level analysis for which the
inventory plots and remote sensing data may be misaligned (Goerndt et al. 2011). The
most common area of interest in an area-level analysis is the forest stand. Here, we will
use the efficiency of VRPs in combination with LiDAR data to determine stand-level per
ha values for aboveground biomass, basal area, cubic volume, and tree density. Besides
the choice between a unit- and area-level analysis, the cost and time to establish plots are
determining factors to the number of plots that are established. Generally, the inventory is
designed to meet forest-level accuracy and precision, and thus may result in small sample
sizes for certain stands. This problem is most often ameliorated through a model-based
solution known as small area estimation (SAE). SAE borrows strength across related areas
to improve estimation for the small area of interest, i.e., an individual stand. For more
in-depth details on SAE, see Rao and Molina (2015). The small area models are defined by
the inclusion of an explicit model with random effects that account for variation that was
unable to be explained by the covariates in the model mean.
In SAE forestry applications, the frequentist approach using empirical best linear unbiased prediction (EBLUP) has been most commonly employed for univariate outcomes (Goerndt et al. 2011, Breidenbach and Astrup 2012, Goerndt et al. 2013, Magnussen et al. 2014,
Mauro et al. 2016). One recent forestry application has examined a multivariate approach to
EBLUP for the outcomes of basal area and volume (Mauro et al. 2015). Here, we employ the
hierarchical Bayesian (HB) mode of inference that has been recently applied to the univariate
case of aboveground biomass by Ver Planck et al. (2018). We then expand these univariate
HB SAE models to the multivariate case to increase precision in each outcome by taking
advantage of potential strong correlations among our four outcomes of interest. A unit-level
multivariate approach has been demonstrated by Finley et al. (2013) to increase precision
in outcomes that are more difficult to model from remote sensing, e.g., tree density. The

27

HB method provides access to posterior distributions for the small area outcomes of interest
and is flexible for adding complex covariance structures (Porter et al. 2015), and parameter
inference does not rely upon asymptotic assumptions (Pfefferman 2013).
The primary objective of this study was to apply a multivariate HB framework to increase
the precision of estimates for mean aboveground biomass, mean basal area, mean volume,
and mean tree density at the stand level by borrowing strength across all stands through the
use of LiDAR covariates and through the correlation among outcomes. Here, we consider the
relationship among outcomes with the same number of observations for each outcome, where
the information from other outcomes can improve the estimation for an individual outcome
of interest. A potential extension of the multivariate framework could be to the case where
differences in sample size exist among the outcomes for an area of interest; however, only
the first case is examined in the current study. Additionally, we apply the univariate HB
framework for each of the outcomes for comparison to the multivariate case. Furthermore, we
add a separable conditional autoregressive structure to the stand-level random effects of the
multivariate model to assess potential gains in precision for the four outcomes. The remainder
of the study follows with: i) a description of the study area along with relevant data for the
univariate and multivariate small area models; ii) a description and implementation of the
small area models; and iii) the results and discussion of applying small area models for these
four common forest inventory outcomes.

3.2

Methods

3.2.1
3.2.1.1

Data
Study area

The area of interest for this study was the Noonan Research Forest (NRF) near Fredericton,
New Brunswick, Canada (N 45âŚ 59â12â, W 66âŚ 25â15â). The NRF has been managed by the
University of New Brunswick since 1985 and is approximately 1500 ha in size including 115 ha

28

of nonforested wetlands with a total of 271 stands. The subsequent analysis uses a subset of
224 stands each with a minimum of two VRP per stand and sampling variation greater than
zero for each outcome. These stands ranged in size from 0.6 to 47 ha with an average size of
6.6 ha (Table 3.1). The forest is composed of hardwood, mixed, and softwood stands with
the major species being aspen (Populus spp.), balsam fir (Abies balsamea L. (Mill.)), birch
(Betula spp.), eastern white pine (Pinus strobus L.), red maple (Acer rubrum L.), and spruce
(Picea spp.). Figure 3.1a shows the stands by forest type with 24 intolerant hardwood stands,
89 mixed woods dominated by intolerant hardwoods stands, 54 mixed woods dominated by
softwoods stands, 1 white pine stand, 11 black spruce stands, 38 tolerant softwoods stands,
and 7 balsam fir stands.

3.2.1.2

Variable radius plot data

In 2010, a 100Ă100 m grid was laid out across the NRF. At each grid intersection a VRP
was established with proper boundary corrections and trees greater than 6.0 cm diameter
at breast height (DBH) were selected into the sample using a 2M basal area factor (BAF)
angle gauge. Species and DBH were recorded for each sample tree. A larger, 27M BAF,
angle gauge was used to select a subset of trees for height measurements. Tree height
was then modeled for the trees without height measurements, see Hayashi et al. (2016) for
more details. For the first outcome of interest (aboveground biomass), plot estimates in
MgÂˇhaâ1 were calculated using Jenkins et al. (2003) species-group equations. The second
and fourth outcomes of basal area (m2 Âˇhaâ1 ) and tree density (treesÂˇhaâ1 ) were estimated
directly from the size and number of sample trees. For the third outcome of interest (cubic
volume), plot estimates in m3 Âˇhaâ1 were derived from individual tree volumes estimated with
local species-specific volume equations (Honer 1967). Stand-level estimates were obtained
by averaging plot-level estimates for each of the outcomes. Table 3.1 summarizes these
stand-level estimates.

29

3.2.1.3

LiDAR data

The full waveform LiDAR data were collected on October 21 and 22, 2011 using a Riegl LMS
Q680i laser scanner mounted on an airplane. The sensor had a pulse repetition frequency of
180 kHz with a laser wavelength of 1550 nm and a scan angle < 28.5âŚ from nadir. The forest
was covered in overlapping strips to achieve at a minimum of six pulses per m2 , footprint of
0.35 m, and up to eight returns per pulse (Hayashi et al. 2015).
Stand-level LiDAR covariates were computed using the lascanopy function in the LAStools software suite (Isenburg 2016). The NRF stand polygons, LAS files, and arguments
to define the vertical extent of the LiDAR profile considered, were passed into the lascanopy function to generate the desired LiDAR covariates. Here, we consider stand-level
LiDAR signal derived percentilesâheight in meters at which some percent of the energy is
returnedâlabeled P followed by the percentile, as well as coverage and signal shape variables, e.g., density (dns) of cover, kurtosis and skewness. The density covariate, expressed
as a percentage, is defined as the number of LiDAR returns above 2 meters divided by the
total number of returns. Exploratory analysis using a backward selection procedure resulted
in various combinations of percentile heights, dns, and kurtosis having the lowest Akaike
information criterion value for the individual outcomes under consideration. The variance
inflation factor, where values less than the general rule of ten for assessing multicollinearity
(see Kutner et al. 2004, p.409) are acceptable, was also calculated to reduce the number
of covariates for each of the outcomes. The final selected covariates for the SAE models
were P 25 for the aboveground biomass and basal area outcomes, P 50 for the cubic volume
outcome, and dns for the tree density outcome.

30

Table 3.1: Summary statistics for stand area, number of plots, mean aboveground biomass,
mean basal area, mean volume, mean tree density, sampling variances, and LiDAR
covariates for the Noonan Forest (m = 224 stands) dataset.

3.2.2
3.2.2.1

Stand Area (ha)

Min
0.6

Max
47.3

Mean
6.2

SD
5.6

Number of plots

2

44

6.0

5.5

Mean aboveground biomass (MgÂˇhaâ1 )

16.9

223.5

117.8

44.9

Mean basal area (m2 Âˇhaâ1 )

4.20

43.5

25.1

8.45

Mean volume (m3 Âˇhaâ1 )

23.8

313

153

60.9

Mean tree density (treesÂˇhaâ1 )

117

3628

1392

531

2
Ďi1

0.158

7948

1711

1431

2
Ďi2

1.33

393

75.7

61.5

2
Ďi3

0.019

16378

3253

2806

2
ĎĚi4

0.001

3.92

0.373

0.407

P 25 (m)

2.18

8.35

5.20

1.29

P 50 (m)

3.40

14.9

8.39

2.16

dns (%)

31.2

92.6

76.5

11.0

Models
Direct estimator and small area models

The direct estimator applied assumed simple random sampling for point and variance estimates for the four outcomes: aboveground biomass, basal area, cubic volume, and tree
density. The direct estimator for tree density was subsequently log transformed to meet
the SAE model normality assumption, whereas the other outcomes remained untransformed
into the small area models. The associated sampling variance for tree density was also transformed via the delta method (Casella and Berger 2002). We begin by fitting the univariate
SAE models of Ver Planck et al. (2018) for the Fay-Herriot (FH) and FH with conditional
autoregressive random effects (FHCAR) models for the individual outcomes. Then, we expanded the univariate FH model to the multivariate case, similar to Porter et al. (2015), for
stand i in 1, 2, . . . , m stands and outcome j in 1, 2, . . . , n defined as:

31

Balsam Fir

Tolerant
Softwoods

Black Spruce

White Pine
9
8

Softwoodsâ
Mixed Woods

7
6

Intolerant
Hardwoodsâ
Mixed Woods

5
4

Intolerant
Hardwoods

3
0

1000 m

2

(a)

(b)

16

100

14

90

12

80

10

70

8

60

6

50

4

40

2

30

(c)

(d)

Figure 3.1: Noonan Research Forest stands by (a) forest type (balsam fir; tolerant
softwoods; black spruce; white pine; mixed woods dominated by softwoods; mixed woods
dominated by intolerant hardwoods; intolerant hardwoods), (b) stand-level P 25 (m), (c)
stand-level P 50 (m), and (d) stand-level dns (%) LiDAR covariates.

Yij = Î¸ij + ij ,
Î¸ij = x0ij Î˛ j + vij ,

32

(3.1)

where Yij is the direct estimate, the parameter of interest Î¸ij is mean aboveground biomass
(j = 1), basal area (j = 2), cubic volume (j = 3), or log tree density (j = 4), and ij is a
2 for aboveground biomass, basal
normally distributed error with mean zero and variance Ďij
2 = Ď 2 /Y 2 . The sampling
area, and cubic volume. For log tree density, the variance is ĎĚi4
i4 i4

error terms are assumed to be known and independent between locations and combined
into Îľ âź N (0, ÎŁ), where ÎŁ is an nm Ă nm matrix containing the sampling errors stacked by
outcome within location along the diagonal. The additive mean of Î¸ij comprises an intercept
and stand-level covariates held in the pj Ă 1 column vector x and associated pj Ă 1 vector of
P
regression coefficients Î˛ j . The total number of covariates p is the n
j=1 pj and gathered into
the nmĂp matrix X stacked by location. The stand-level random effects term vij is normally
2 . Similar to the sampling variances, we gather
distributed with mean zero and variance Ďvj

the stand-level error terms in ÎŁv , an nm Ă nm matrix stacked by outcome within location.
We parameterize ÎŁv , similar to Porter et al. (2015), by ÎŁv = I m â ÎŁo with â defined as
the Kronecker product and ÎŁo is the n Ă n covariance matrix between the outcomes that is
given an inverse-Wishart (IW) prior distribution. We refer to this multivariate SAE model
as the FHIW model.
As in the univariate case, it is reasonable to think that unobserved environmental variables could exhibit spatial autocorrelation and affect the observed outcomes. For this, we
consider a separable covariance structure that uses a single conditional autoregressive (CAR)
model along with the multivariate dependence in outcomes through the Kronecker product.
We refer to this model as the FHSEP model by replacing the ÎŁv in the FHIW model with
ÎŁv = ÎŁs â ÎŁo , where ÎŁo has the same inverse-Wishart prior for the outcomes in the FHIW
model. The ÎŁs denotes the shared spatial CAR dependence for all of the outcomes as
ÎŁs = (D â ĎW )â1 , where D is a m Ă m matrix with the number of neighbors of the stands
along the diagonal. Here, stands are considered neighbors with adjoining borders. The
m Ă m matrix W has the value of 1 in off diagonal locations for stands that are neighbors
with zeros elsewhere, and Ď is the spatial dependence parameter.

33

The Bayesian SAE model specifications are completed by assigning prior distributions to
parameters (Gelman et al. 2014). The same prior distributions of Ver Planck et al. (2018)
for AGB were applied for the FH and FHCAR models for the three other outcomes of
interest. For the FHIW and FHSEP models, each regression coefficient in Î˛ was assigned
a flat prior distribution, ÎŁo was given an IW prior distribution, Ďâs prior was uniform with
support between zero and one. The IWâs shape hyperparameter was set to one plus the
number of outcomes. The IWâs scale hyperparameter was set to the identity matrix with
dimension of the number of outcomes. A Markov chain Monte Carlo (MCMC) algorithm was
used to sample from parametersâ posterior distributions. Specifically, a Gibbs algorithm was
developed to sample from Î¸ = (Î¸11 , Î¸21 , . . . , Î¸mn ), Î˛, and ÎŁo , and a Metropolis-Hastings
algorithm was used to sample from Ďâs posterior distribution. We completed all analyses
using R statistical software (R Core Team 2015). For all of the SAE models, parameter
posterior inference was based on 3,000 post burn-in MCMC samples from each of L = 3
chains. Chain mixing and convergence were diagnosed using a multivariate potential scale
reduction factor of less than 1.1 for all parameters considered (Gelman et al. 2014).

3.3

Results

The univariate FH and FHCAR model parameter estimates are given in Table 3.2. Here,
the 95% credible intervals for all of the Î˛ij do not include zero, which suggests the associated
LiDAR covariates explain a substantial portion for each of the direct estimateâs variability.
The credible intervals for the Îť values of the FHCAR models also do not include zero indicating that there is potential spatial structure in the random area effects; and the overlapping
credible intervals among outcomes suggest a single CAR model could be used for these outcomes in a multivariate setting, such as the proposed FHSEP model. However, the extension
of the FHIW model to the FHSEP model considered had posterior estimates of Ď near zero;
and therefore was not considered further.
For the first outcome of interest, the stand-level point estimates of aboveground biomass

34

Table 3.2: The median and 95% credible intervals for the univariate Fay-Herriot (FH), and
FH with conditional autoregressive (FHCAR) model parameters for the four outcomes of
interest.
Parameter
FH
Î˛01
116.5 (111.9, 120.9)

FHCAR
115.7 (108.0, 123.3)

Î˛11 , P 25

40.22 (36.20, 44.34)

39.82 (35.80, 43.90)

Î˛02 ,

25.0 (24.03, 25.94)

24.92 (23.58, 26.25)

7.56 (6.69, 8.46)

7.60 (6.70, 8.50)

Î˛03 ,

152.2 (145.9, 158.3)

151.2 (142.2, 160.1)

Î˛13 , P 50

54.04 (48.42, 59.79)

53.12 (47.48, 58.85)

7.17 (7.11, 7.24)

7.17 (7.07, 7.26)

Î˛14 , dns

0.135 (0.063, 0.210)

0.134 (0.056, 0.213)

2
Ďv1

284.1 (201.3, 403.3)

468.7 (309.6, 711.6)

2
Ďv2

12.49 (8.80, 17.96)

20.39 (12.86, 31.89)

2
Ďv3
2
Ďv4

504.5 (346.9, 737.1)

794.2 (499.9, 1258)

0.060 (0.039, 0.088)

0.109 (0.060, 0.179)

Îť1

â

0.681 (0.286, 0.960)

Îť2

â

0.514 (0.113, 0.893)

Îť3

â

0.589 (0.164, 0.899)

Îť4

â

0.465 (0.049, 0.891)

Î˛12 P 25

Î˛04 ,

for the FH model ranged between 18.8 â 215.2 MgÂˇhaâ1 with a mean of 116.5 MgÂˇhaâ1 .
The range of the FHCAR model was nearly identical, 18.3 â 218.1 MgÂˇhaâ1 , with a mean
of 115.6 MgÂˇhaâ1 . The FH and FHCAR models for basal area ranged between 5.37 â 43.4
m2 Âˇhaâ1 and 5.66 â 43.8 m2 Âˇhaâ1 with means of 25.0 and 24.9 m2 Âˇhaâ1 , respectively. Cubic
volume ranged between 24.69 â 313.6 m3 Âˇhaâ1 with a mean of 152.3 m3 Âˇhaâ1 for the FH
model; and for the FHCAR model, the cubic volume had a range of 22.13 â 310.9 m3 Âˇhaâ1
with a mean of 151.2 m3 Âˇhaâ1 . The final outcome of interest, tree density, had ranges of
501 â 2328 treesÂˇhaâ1 and 529 â 2488 treesÂˇhaâ1 with means of 1326 and 1323 treesÂˇhaâ1
for the FH and FHCAR models, respectively.
For the multivariate FHIW model, Figure 3.2 maps the posterior means for the esti-

35

mates of the four outcomes of interest for the individual NRF stands. The ranges of the
estimates were: 23.0 â 206 MgÂˇhaâ1 , 6.55 â 42.4 m2 Âˇhaâ1 , 10.36 â 308.7 m3 Âˇhaâ1 , and 484
â 2390 treesÂˇhaâ1 for aboveground biomass, basal area, cubic volume, and tree density, respectively. The mean estimates were 114.9 MgÂˇhaâ1 for aboveground biomass, 24.8 m2 Âˇhaâ1
for basal area, 154.9 m3 Âˇhaâ1 for cubic volume, and 1315 treesÂˇhaâ1 for tree density. These
multivariate estimates are comparable to those of the univariate models.
A scatter plot of these direct and posterior SAE model estimates does not reveal any
regions of deviation for the outcomes of aboveground biomass, basal area, and cubic volume
(Fig. 3.3a-c). However, tree density does show some larger deviations from the one-to-one
line for direct estimates of 2000 treesÂˇhaâ1 or greater for all of the SAE models examined
(Fig. 3.3d). Following the methods of Brown et al. (2001) and You and Zhou (2011), bias
due to model misspecification was assessed through fitting linear models between the direct,
assumed to be design-unbiased, and the posterior means of the SAE model estimates. The
slopes of the individual linear models were 1.636, 1.447, and 1.554 for FH, FHCAR, and
FHIW posterior means of tree density, respectively, which suggests there may be a systematic
bias across the range of tree density values. The R2 for each were also low at 0.498, 0.481, and
0.525 for the FH, FHCAR, and FHIW posterior means. The slopes for the other outcomes
had individual linear models ranging from 0.976 â 1.085 and R2 ranging between 0.809
â 0.878, which suggests no systematic bias for these outcomes across the range of values
examined.

36

225

44

195

38.29

165

32.57

135

26.86

105

21.14

75

15.43

45

9.71

15

4

(a)

(b)

350

3700

300

3186

250

2671

200

2157

150

1643

100

1129

50

614

0

100

(c)

(d)

Figure 3.2: The posterior means for the outcomes of (a) mean aboveground biomass
(MgÂˇhaâ1 ), (b) mean basal area (m2 Âˇhaâ1 ), (c) mean volume (m3 Âˇhaâ1 ); (d) mean tree
density (number of treesÂˇhaâ1 ) for the FHIW model.

The coefficient of variation (CV), defined as the square root of the posterior variance
divided by the posterior mean, was used as the measure of precision among the direct,
FH, FHCAR, and FHIW model estimates with a smaller value indicating better precision.
Figure 3.4 shows the CV of the direct and three SAE model estimates against the stands
ordered from smallest to largest CV of the direct estimator. For the direct estimates CV
ranges were 0.004 â 1.156, 0.034 â 1.273, 0.001 â 1.121, and 0.027 â 1.979 for aboveground
37

FH
40

FH

FHCAR

20

150
100

30

FHIW

Direct Estimate of Basal Area (m2haâ1)

FHIW

10

50

Direct Estimate of Aboveground Biomass (Mg haâ1)

200

FHCAR

50

100

150

200

10

Posterior Mean of Aboveground Biomass (Mg haâ1)

20

(b)

FH

3500

FH

FHCAR

FHIW

FHIW

2500
2000
1500

250
200
150
100

0

0

500

50

3000

FHCAR

Direct Estimate of Tree Density (No. trees haâ1)

300

40

1000

350

(a)

Direct Estimate of Volume (m3haâ1)

30

Posterior Mean of Basal Area (m2haâ1)

0

50

100

150

200

250

300

350

0

500

1000

1500

2000

2500

3000

3500

Posterior Mean of Tree Density (No. trees haâ1)

Posterior Mean of Volume (m3haâ1)

(c)

(d)

Figure 3.3: (a) Mean aboveground biomass (MgÂˇhaâ1 ), (b) mean basal area (m2 Âˇhaâ1 ), (c)
mean volume (m3 Âˇhaâ1 ); (d) mean tree density (number of treesÂˇhaâ1 ) for direct estimates
versus posterior means of FH, FHCAR, and FHIW models.

biomass, basal area, cubic volume, and tree density, respectively. The means of these direct
estimate CVs were 0.365, 0.358, 0.379, and 0.549 for the four outcomes. For aboveground
biomass, the ranges for FH, FHCAR, and FHIW models were all narrower than the direct
estimates (0.004 â 0.470, 0.004 â 0.429, 0.004â0.293) with means of 0.148, 0.130, and 0.100,

38

respectively. In the case of basal area, the ranges for the FH and FHCAR were 0.034 â
0.391 and 0.0343 â 0.360, while the FHIW was narrower with 0.019 â 0.269. The means of
the three SAE models were 0.140, 0.127, and 0.107. Cubic volume had ranges of 0.001 â
0.522, 0.001 â 0.463, and 0.001 â 0.754 for the FH, FHCAR, and FHIW models. Of the SAE
models for cubic volume, FHIW had the largest maximum value of 0.754 which is associated
with a regenerating forest stand and the lowest cubic volume direct estimate of all the NRF
stands. The means for these CVs were 0.153 and 0.141 for the FH and FHIW models. For
this outcome, FHCAR had the lowest mean value of 0.135. The CV for the FH model of
tree density ranged between 0.026 â 0.283 with a mean value of 0.219. The FHCAR model
for this outcome ranged between 0.026 â 0.337 with a mean of 0.210; and the FHIW model
ranged between 0.027 â 0.304 with a mean of 0.223.
Using a common CV threshold of 0.15 for stand-level estimates (Mauro et al. 2016), 27
of the 224 stands were at or below this threshold for the direct and 157 stands for the FH
model estimates of aboveground biomass. The FHCAR and FHIW models had 167 and 193
stands, respectively, at or below the threshold. For basal area, the direct and FH model
estimates had 33 and 164 stands that met this threshold, respectively. The FHCAR and
FHIW had slightly more stands that met the threshold with 167 and 192 stands, respectively.
In the case of cubic volume, the direct, FH, and FHIW model estimates had 22, 156, and
160 stands that met the threshold; the FHCAR model had the most stands with 166 stands.
Tree density had fewer than 12 stands that met the CV threshold for the direct and SAE
model estimates; so that, the CV threshold was increased to 0.25 for this outcome. The
direct, FH, FHCAR, and FHIW models had 20, 211, 190, and 186 stands, respectively, for
tree density that met this increased threshold.
Reductions in CV are observed between the direct estimator and the FH model for
all outcomes except one stand for cubic volume (Fig. 3.5), and even greater reductions
are observed between the direct estimator and the FHCAR and FHIW models. Directly
comparing the benefit of the multivariate FHIW model versus the univariate FH model, 223

39

1.2

DIRECT
1.2

DIRECT

FHCAR

FH
FHCAR

1.0

1.0

FH

FHIW

0.8
0.0

0.0

0.2

0.2

0.4

0.6

Coefficient of Variation

0.6
0.4

Coefficient of Variation

0.8

FHIW

Stands (ordered from smallest to largest CV Direct)

Stands (ordered from smallest to largest CV Direct)

DIRECT

FH

FH

FHCAR

FHCAR

FHIW

FHIW

1.5

DIRECT

1.0

Coefficient of Variation

0.6

0.0

0.0

0.2

0.5

0.4

Coefficient of Variation

0.8

1.0

2.0

(b)

1.2

(a)

Stands (ordered from smallest to largest CV Direct)

Stands (ordered from smallest to largest CV Direct)

(c)

(d)

Figure 3.4: Estimates of coefficient of variation (CV) for direct estimates and FH, FHCAR,
FHIW models of (a) mean aboveground biomass (MgÂˇhaâ1 ), (b) mean basal area
(m2 Âˇhaâ1 ), (c) mean volume (m3 Âˇhaâ1 ); (d) mean tree density (number of treesÂˇhaâ1 )
versus stands ordered by increasing CV of direct estimate. The horizontal dashed line
indicates a CV value of 0.15; and the horizontal dotted line indicates a CV value of 0.25.

stands, 224 stands, 192 stands, and 55 stands had further reductions in CV for the outcomes
of aboveground biomass, basal area, cubic volume, tree density, respectively. The percent
reductions in CV for aboveground biomass from the direct estimator to the FH model ranged

40

from 0.4% to 88.1% with a mean of 52.9% (Fig. 3.5a). For basal area, percent reductions in
CV from the direct estimator to the FH model ranged from 1.4% to 87.5% with a mean of
53.6% (Fig. 3.5b). Figure 3.5c shows the percent reductions in CV for cubic volume from
the direct estimator to the FH model ranged from -1.3% to 85.7% with a mean of 53.7%
for the NRF stands. For tree density, percent reductions in CV from the direct estimator
to the FH model ranged from 1.6% to 86.5% with a mean of 52.7% (Fig. 3.5d). Gains were
also observed in percent CV reduction from the FH to the FHIW model for aboveground
biomass with a range of -0.4% â 86.2% and mean of 30.6%, for basal area with a range of
5.4% â 78.4% and mean of 22.1%; and for cubic volume with a range of -76.5% â 74.6% and
mean of 9.8% (Fig. 3.6a-c). Gains in percent CV reduction were not as prevalent between
the FH and FHIW models for tree density with a range of -8.5% â 5.8% and mean of -1.8%
(Fig. 3.6d).

41

90

90

80

80

60

60

50

50

30

30

20

20

10

10

0
â5
â10

0
â5
â10

â20

â20

â40

â40

â80

â80

(a)

(b)

90

90

80

80

60

60

50

50

30

30

20

20

10

10

0
â5
â10

0
â5
â10

â20

â20

â40

â40

â80

â80

(c)

(d)

Figure 3.5: Percent reduction in coefficient of variation from direct estimate to FH model
for (a) mean aboveground biomass (MgÂˇhaâ1 ), (b) mean basal area (m2 Âˇhaâ1 ), (c) mean
volume (m3 Âˇhaâ1 ); (d) mean tree density (number of treesÂˇhaâ1 ).

The precision of the SAE models was also assessed by the width of the credible intervals,
where narrower intervals indicate less uncertainty. The 95% credible interval widths of mean
aboveground biomass for the FH and FHCAR models ranged between 1.54 â 69.1 MgÂˇhaâ1
and 1.55 â 88.3 MgÂˇhaâ1 with mean widths of 57.0 MgÂˇhaâ1 and 49.2 MgÂˇhaâ1 , respectively.
The FHIW model for aboveground biomass had a narrower range than the other two SAE
models (1.54 â 51.4 MgÂˇhaâ1 ) and a smaller mean width of 39.7 MgÂˇhaâ1 . Similar to the
42

90

90

80

80

60

60

50

50

30

30

20

20

10

10

0
â5
â10

0
â5
â10

â20

â20

â40

â40

â80

â80

(a)

(b)

90

90

80

80

60

60

50

50

30

30

20

20

10

10

0
â5
â10

0
â5
â10

â20

â20

â40

â40

â80

â80

(c)

(d)

Figure 3.6: Percent reduction in coefficient of variation from FH model to FHIW model for
(a) mean aboveground biomass (MgÂˇhaâ1 ), (b) mean basal area (m2 Âˇhaâ1 ), (c) mean
volume (m3 Âˇhaâ1 ); (d) mean tree density (number of treesÂˇhaâ1 ).

pattern of aboveground biomass SAE models, basal area had credible interval widths for
the FH and FHCAR models ranging from 4.35 m2 Âˇhaâ1 â 14.5 m2 Âˇhaâ1 and 4.22 m2 Âˇhaâ1
â 18.3 m2 Âˇhaâ1 with mean widths of 12.1 m2 Âˇhaâ1 and 10.8 m2 Âˇhaâ1 ; whereas, the FHIW
model had widths ranging from 2.19 m2 Âˇhaâ1 â 12.2 m2 Âˇhaâ1 with a mean width of 9.46
m2 Âˇhaâ1 . In the case of cubic volume, the three SAE model credible interval widths ranged
between 0.54 m3 Âˇhaâ1 â 114 m3 Âˇhaâ1 with mean widths of 76.7, 66.2, and 70.3 m3 Âˇhaâ1
43

for the FH, FHCAR, and FHIW models. For the SAE models of tree density, the credible
interval widths ranged between 227 â 1948 treesÂˇhaâ1 with mean widths of 1149, 1097, and
1162 treesÂˇhaâ1 . Based upon mean credible interval widths, the FHIW model had the best
precision for the outcomes of aboveground biomass and basal area; whereas the FHCAR
model performed best for cubic volume and tree density.

3.4

Discussion

To our knowledge, all of the current SAE literature related to forestry applications (e.g.,
Goerndt et al. 2011; 2013, Magnussen et al. 2014) has examined univariate outcomes with a
single proceedings paper by Mauro et al. (2015), which examined the multivariate outcomes
of basal area and volume at the area level from a frequentist perspective. The multivariate
FHIW model reduced, on average, the CV for the outcomes of aboveground biomass and
basal area more so than both the FH and FHCAR models indicating modeling multiple
outcomes may be more informative to increasing precision than applying separate univariate
models. However, the gains were not as substantial for cubic volume and actually did not
improve precision for tree density. The issues with poorer performance for tree density
in the SAE models considered here may be a result of a pattern of underestimating the
outcome for more dense stands with greater than 2000 treesÂˇhaâ1 . Goerndt et al. (2011)
found similar issues with the EBLUP SAE model for tree density, although with a pattern
of overestimation for high density stands and underestimation for low density stands. Due
to this potential model misspecification for tree density, a different metric for precision other
than CV may need to be considered, such as mean squared error, to account for additional
bias induced by the model. The model may have failed for tree density for multiple reasons.
Two potential causes are: i) the occlusion of trees in the variable-radius plot sampling, or
ii) the inability of LiDAR to penetrate the canopy for measuring subcanopy trees. Here,
large reductions in CV were seen in the application of the FHCAR model relative to direct
estimates and more modest gains compared to the FH model due to strong area effects in

44

the NRF. Magnussen et al. (2014) and Breidenbach et al. (2016) also found improvements
for small area applications in forestry that account for spatial autocorrelation.
In this study, we found P 25 to be the LiDAR covariate that explained the greatest variation in stand-level aboveground biomass and basal area, P 50 for stand-level cubic volume,
and dns for stand-level tree density. Hayashi et al. (2015) using the same data from NRF
found the height of P 45 to be most significant in explaining aboveground biomass estimates;
Hayashi et al. (2016) found a combination of the 45th quantile of LiDAR height, maximum
canopy height, and LiDAR height where maximum point density occurs were best in modeling plot-level volume estimates. Neither of these studies considered the outcomes of basal
area or tree density. A possible explanation for this difference is the LiDAR covariates used
in this study were a much smaller subset relative to the other two studies; and our LiDAR
covariates were summaries at the area-level (stand-level), while Hayashi et al. (2015) and
Hayashi et al. (2016) examined relationships between the outcomes of aboveground biomass
and volume with LiDAR covariates at the unit-level (plot-level). Also at the unit-level, Goerndt et al. (2010) examined univariate models for the outcomes of basal area and tree density
with LiDAR covariates in western Oregon. For basal area, Goerndt et al. (2010) found P 5,
percentage of first returns greater than 3 m above the ground, and the interquartile distance
in the distribution of first returns as the best model. As for tree density, they found the
minimum value for the distribution of all first returns above 3 m, P 10, and percentage of
first returns above 15 m and 21 m as the best model. Each of these models had relatively
high levels of variation explained of 0.823 and 0.877, respectively; whereas, our exploratory
analyses had values of 0.709 and 0.131 for these outcomes. These differences are not surprising as our LiDAR covariates are summaries at the area-level rather than the unit-level.
Goerndt et al. (2011) also found LiDAR covariates to differ in their final regression model for
area-level versus unit-level relationships with particularly low value of variation explained,
0.209, for tree density. Many studies (e.g., Lefsky et al. 2002) have also found P 95, generally
regarded as canopy height, to be a common explanatory variable; however, this was not the

45

case in any of our final linear models. The fairly low percent of variation explained by dns
for the tree density outcome also indicates the need for identifying better LiDAR covariates
from those explored here.
One potential drawback of applying SAE at the area-level is the inability to assess within
stand variation, as can be done when aligning LiDAR and plot data at the unit-level (e.g.,
Hayashi et al. 2016, Woods et al. 2011). As recommendations in the literature improve
for aligning VRP with LiDAR covariates, the multivariate model presented here could be
adapted to the unit-level. Additionally, the extension of the FHIW model to the FHSEP
model did not improve the precision due to the posterior of the correlation parameter being
near zero. However, this may not be true for other applications so continued consideration
for the FHSEP model may be warranted. Another possible solution may be to move from the
CAR model for the random effects to incorporate a spatial structure similar to the current
work of Datta et al. (2017). The precision in outcomes of interest may also benefit from
the incorporation of temporal and spatio-temporal models when areas have been inventoried
across repeated time periods.

46

CHAPTER 4
EVALUATION OF THE NATIONAL WOODLAND OWNER SURVEY
ESTIMATORS FOR PRIVATE FOREST AREA AND LANDOWNERS: A
CASE STUDY OF MONTANA

4.1

Introduction

Quantifying private forestland and understanding private landowner behavior at the state
level in the United States has been an interest of researchers for more than 40 years (Kingsley
and Finley 1975). Accurate and precise estimates inform outreach and assistance programs
to private forest landowners (PFLs) to achieve management and stewardship objectives. In
addition, researchers have been interested in understanding landowner behavior, objectives,
and values reflected in ownership and management decisions. According to Oswalt et al.
(2014), private forestland comprises more than 58% of all forest land in the United States
with larger proportions privately held in the North and South (74 and 87%, respectively) and
smaller proportions in the Rocky Mountains and Pacific Coast (26 and 39%, respectively)
regions. In part because of these large regional differences, many studies of private forest
ownership have been conducted at the state (Butler et al. 2005) and regional (Kaetzel et al.
2012) levels.
The US Department of Agriculture (USDA) Forest Service and Forest Inventory and
Analysis (FIA) program administer the National Woodland Owner Survey (NWOS) to provide the standard for state- and national-level estimates of private forest land area and
number of private landowners. Along with these estimates, additional related variables of
landowner attributes and behavior are collected. The FIA divides undifferentiated private
landowners into five ownership groups: i) corporate, including Native Corporations in Alaska
and private universities, ii) non-governmental conservation and natural resources organizations, iii) unincorporated local partnership, association, or club, iv) Native American, and
v) individual and family, including trusts, estates and family partnerships (OâConnell et al.
47

2015). In the past, Smith et al. (2004) designated private ownerships into just two groups:
industrial and non-industrial private forest (NIPF) landowners. More recently, Smith et al.
(2009) and Oswalt et al. (2014) have moved to new private designations of corporate (group
i) and non-corporate (groups ii - v). Whereas, the NWOS private ownership classifications
designate group i as corporate, groups ii - iv as other private, and group v as family forest.
Family forest is the primary population of interest in the NWOS (Butler 2008, Butler et al.
2016b). Here, our population of interest is the group of PFLs as defined by Finley et al.
(2001) and Metcalf et al. (2012). PFLs are defined as all NIPF owners except for tribal
owners. Tribal ownerships are treated as a group independent from PFLs because of their
unique and complex governance structures (Nettheim et al. 2002) that do not align with any
existing definition of PFLs. Although this definition differs from the NWOS classifications,
the main objective of this research is to evaluate the estimators for a known population.
Legal ownership boundaries have been used in the past to assess PFL populations through
acquisition of local property tax records in either hard copy or digital form (e.g., Kittredge
et al. 2008, Kilgore et al. 2013). The advent of geographic information systems (GIS) and
remote sensing and the increasing availability of digitized and regularly updated cadastral
data sets has increased the accessibility and affordability of these data. Although availability
is increasing, the development and administration of these data often lies at the local or
county levels of government; few states have complete and freely available data sets. The
state of Montana is one of several states currently known to have polygon-based parcel data
that allow the determination of a known population. We used these data to establish a known
population of PFLs and evaluate the current estimators of the NWOS with full response for
total private forest area and PFL population size. Further impetus for this current research
comes from previous studies (e.g., Metcalf 2010, Metcalf et al. 2012) identifying shortcomings
with the NWOS estimators for analyses conducted at the county level. Statewide testing of
the estimators had so far been impossible because of lack of data. Recent clarifications and
improvements to the descriptions of the NWOS estimators have been provided by Dickinson

48

and Butler (2013) and Metcalf et al. (2014).
Using Montana as a case study, our objective was to also evaluate the behavior of the
current estimators of the NWOS under a variety of nonresponse scenarios. Non-response
bias is an issue affecting human dimensions research, especially quantitative survey research,
and can significantly affect estimates of population parameters (Vaske 2008). Dickinson and
Butler (2013), in their presentation of the NWOS estimators, acknowledge the need for better
methods to handle nonresponse in the NWOS. Recently, Dickinson et al. (in press) outlined
some potential methods for handling nonresponse in the NWOS. Landholding size is a key
indicator of landowner decisions and behavior (e.g., Zhang et al. 2005) and is especially
important in the estimator as landholding size is used directly to determine the proper
weights for estimation of PFL population size. Thus, different trends in landholding size
in relation to response rates can affect the performance of the estimators. Others have
identified nonresponse as an important influence on biophysical estimates when stratifying
between private and public ownerships in the US National Forest Inventory (McRoberts
2003, Patterson et al. 2012, Domke et al. 2014).
In survey sampling, there are two types of nonresponse: unit and item (SaĚrndal et al.
1992, Lohr 2010). Here, our focus is exclusively on unit nonresponse related to the NWOS.
Little and Rubin (2014) describe the mechanisms of unit nonresponse as consisting of three
types: missing completely at random (MCAR); missing at random (MAR); and not missing
at random (NMAR). MCAR is defined by the mechanism of nonresponse being independent
of the data. In the past and present, the NWOS estimators have implicitly assumed a
nonresponse mechanism of MCAR and treated the survey respondents as representative of
the population (Butler et al. 2005, Dickinson and Butler 2013, Dickinson et al. in press). On
the other hand, MAR is defined as the mechanism of nonresponse depending only on the
components of responding data, and NMAR is defined by the distribution of nonresponse
depending upon the missing data values of nonresponse. Various modeling and weighting
methods can be used to deal with MAR and NMAR response mechanisms (SaĚrndal et al.

49

1992, Lohr 2010) with our proposed estimator focusing on an individual weighting method.
The three primary contributions of this manuscript are as follows: the evaluation of current NWOS estimators of private forest area and PFL population size for a known PFL
population at the state level; the presentation of an alternative estimator of PFL population
size that explicitly accounts for various nonresponse scenarios; and the evaluation of the
impacts of nonresponse biases on each of these estimators. The remainder of the article is
organized as follows. First, we provide a description of the steps for defining a GIS-based
baseline population of PFLs. Second, we describe the sampling frame and procedure for
simulation. Third, we outline the current estimator used in the NWOS for estimating private forest area and PFL population size along with an alternative estimator. Fourth, we
evaluate the impacts of different nonresponse scenarios through simulation on these estimators. Finally, we present the results of the estimators under different scenarios followed by
discussion.

4.2
4.2.1

Methods
GIS-based Baseline Population Estimate

To allow an objective assessment of estimator functionality, we combined Montana cadastral
data with remotely sensed land cover data to create a spatially explicit, statewide database
of Montana PFLs. We began by determining forested land cover types (i.e., deciduous, evergreen, mixed forests, and woody wetlands) from the National Land Cover Dataset (NLCD,
Jin et al. 2013) at a 30-m resolution and clipped to the extent of Montana. We then created
a separate raster layer indicating the presence of census water, as defined by Bechtold and
Patterson (2005), from the NLCD. To better align the raster and vector data, we resampled
the raster data to a 10-m resolution. After reclassifying the NLCD to include only those
pixels with forest cover types (the NLCD defines these pixels as at least 20% tree cover), we
considered a patch as forested when the total number of pixels in a patch was greater than 41
(for a 10-m resolution, an approximate area of 0.41 ha). The use of twenty percent tree cover

50

may underestimate the true amount of forest area across the state by an unknown amount.
Pixels with a common edge within a patch were considered neighbors. Forest patches were
then converted from raster to polygons for subsequent intersection with ownership parcels.
These characteristics of forest were chosen to align as closely as possible to the FIA definition
of forest which stipulates that forest must be at least 10% cover of trees, at least 0.41 ha in
area, and 36.6 m wide (OâConnell et al. 2015). Of the three characteristics, only the width
of the derived forest patches was slightly less than the FIA definition.
Ownership of the forested areas was determined by combining shapefiles for all statewide parcels from the Montana cadastral website (Base Map Service Center Montana State
Library 2015). We processed parcel data at the county level and then aggregated to the
state level. To begin processing parcels at the county level, we identified ownerships in
the data set without a given owner name. Additional shapefiles of county, public lands,
and reservation boundaries from the Montana Cadastral website (Base Map Service Center
Montana State Library 2015) were used to expedite and aid in classifying these ownerships.
Parcels for which ownership was unknown that fell within the public lands boundaries were
assumed public, and those parcels that fell within the reservation boundaries were assumed
tribal. The remainder of unknown parcels were given a unique identifier and treated as unique
ownerships. In addition, due to misalignment between some parcel and public lands or parcel
and tribal boundaries, we examined ownership names of all parcels that fell completely or
partially within public lands or tribal boundaries to confirm public or private ownership.
Only parcels clearly attributed as private were retained as potential PFLs.
Parcels are not a corollary for owners as individual owners may control multiple parcels
(Kittredge et al. 2008). We dissolved all parcels by ownership name and address so parcels
with the same owner were treated as singular ownerships. To account for potential different
spellings of the same landowner, an additional dissolve of the remaining parcels was done
by ownership address only. We then intersected those dissolved parcels with forest patch
polygons greater than 0.41 ha to calculate forest land on each ownership. We visually checked

51

the owner names of these parcels containing forest land (> 0.41 ha) to identify forest industry;
we expedited this process by using keywords such as âtimber,â âlumber,â âforest,â âtree,â
and others. Simultaneously during visual inspection, tribal ownerships were identified by
owner name. Private parcels containing forest land > 0.41 ha and not forest industry nor
tribal were considered PFLs. The NWOS (Butler et al. 2016b) and Oswalt et al. (2014)
classify tribal lands as private non-corporate ownerships. We aggregated county-level PFL
databases into a statewide layer and ran a final dissolve command to combine parcels owned
by the same owner in multiple counties. The final product was a statewide, spatially explicit
database of all unique PFLs. We completed all analyses using a combination of QGIS (QGIS
Development Team 2014) and R statistical software (R Core Team 2015).

4.2.2

Sampling Procedure

The FIA uses regionally based remote sensing techniques to establish phase 1 plots to stratify
the region of interest, at a minimum, into strata of forest, non-forest, and census water to
increase precision of estimation by stratified random sampling (Bechtold and Patterson 2005).
The NWOS primary sampling units of PFLs are tied to the plot location of FIA phase 2 plots
(Butler 2008, Butler et al. 2016b) within the FIA sampling design (Bechtold and Patterson
2005). PFLs are sent the NWOS at least one year after an FIA field crew has measured the
biophysical attributes of the plot and the landowner met the designation of a PFL. The FIA
use a tessellated hexagonal grid across the entire United States to randomly establish these
phase 2 plot locations within an individual hexagon. The actual hexagonal grid size for FIA
is reported as having equal areas of 2403 ha (Bechtold and Patterson 2005, Dickinson and
Butler 2013).
We generated a tessellated hexagonal grid, with equal hexagonal areas of 2407 ha, across
the state of Montana using DGGRID by Sahr (2011). This grid served to simulate the hexagonal grid that spatially distributes sampling plots used by the FIA and the NWOS. We then
located at random 1000 points in each hexagon, each point representing a unique, simulated

52

sample of the underlying population. Estimates of private forest area were calculated at each
iteration using the subsequently described estimators. For PFL population size, estimates
were also calculated at each iteration using two different estimators, subsequently referred
to as Methods I and II. For each of the two methods to be described, the means of the
distributions of the 1000 simulated estimates of total, NĚhd , and variance, vd
ar(NĚhd ), were
computed.

4.2.3
4.2.3.1

Estimators
Private Forest Area

The first estimator to be evaluated was total private forest area (in ha) for Montana. The
current NWOS estimator of forest area relies upon the FIA estimate, which uses stratified
random sampling, of private forest area (Dickinson et al. in press). Because we did not
stratify Montana as in the phase 1 plot procedures of FIA, we cannot apply the estimators
of stratified random sampling. Instead, we based our FIA estimator of total private forest
area on the assumption of equal probability sampling, in that each acre of forest is equally
likely to be sampled. The estimator and respective variance are similar to those presented
by Dickinson and Butler (2013):
n

h
Ah X
dhi = Ah pĚhd ;
AĚF IA =
nh

(4.1)

pĚ (1 â pĚhd )
vd
ar(AĚF IA ) = A2h hd
,
nh â 1

(4.2)

i=1

where AĚF IA is our FIA estimator of total private forest area across the stratum, Ah is the
total land area of the stratum, nh is the sample size in stratum h, dhi is an indicator variable
with value 1 if ownership i is within the stratum and private domain, and pĚhd is the ratio
between samples that fall on private forest land and the total sample size. Throughout this

53

study, the stratum is the state of Montana; the domain is privately held forest. The current
NWOS estimator for total private forest area is presented in Dickinson et al. (in press) as:
nhr
AĚF IA X
dhi = AĚF IA zĚ;
AĚhd =
nhr

(4.3)

vd
ar(AĚhd ) = AĚ2F IA vd
ar(zĚ) + zĚ 2 vd
ar(AĚF IA ),

(4.4)

i=1

where AĚhd is the total forest area in stratum h and domain d, AĚF IA in this case is from
equation 4.1, nhr is the number of respondents to the NWOS within the domain of interest
with replacement and assuming an MCAR mechanism of nonresponse, zĚ is the mean of
respondents within the domain of interest, and vd
ar(zĚ) is the sample variance of zĚ. Because
our domain of interest lies only in total private forest area, the estimates of equations 4.3
and 4.4 are equal to estimates of equations 4.1 and 4.2. We subsequently refer to these
estimators as Method I for private forest area estimation.

4.2.3.2

Private Forest Landowner Population Size

Using the GIS-based baseline PFL population size, we evaluated the current NWOS estimator and an alternative. We began with the current NWOS estimator for ownership total and
corresponding variance by Dickinson et al. (in press):
nhr
dhi
AĚF IA X
= AĚF IA xĚ;
NĚhd =
nhr
a
i=1 hi

(4.5)

vd
ar(NĚhd ) = AĚ2F IA vd
ar(xĚ) + xĚ2 vd
ar(AĚF IA ),

(4.6)

where NĚhd is the total number of owners in stratum h and domain d, AĚF IA is from equation
4.1, nhr is the number of respondents to the NWOS within the domain of interest with
replacement and assuming an MCAR mechanism of nonresponse, ahi is the forest area (in
ha) of the ownership in the stratum, xĚ is the mean of respondents within the domain of

54

interest weighted by the inverse of forest area owned, vd
ar(xĚ) is the sample variance of xĚ, and
vd
ar(AĚF IA ) is from equation 4.2. We subsequently refer to these estimators as Method I for
PFL population size estimation.
The alternative estimator for PFL population size begins with the previous NWOS estimators for ownership total and corresponding variance published by Dickinson and Butler
(2013):
n

n

h
h
dhi
dhi
A X
1 X
= h
;
NĚhd =
nh
phi
nh
ahi
i=1

VĚ ar(NĚhd ) =

(4.7)

i=1

n

h 
X
d

1

hi

nh (nh â 1)

i=1

phi

2
â NĚhd

,

(4.8)

where NĚhd is the total number of owners in stratum h and domain d, Ah is the total land area
of the stratum, nh is all samples that fall on land in stratum h with replacement, dhi is an
indicator variable with value 1 if ownership i is in the stratum and domain, ahi is the forest
a

area (in ha) of the ownership in the stratum, and phi ( Ahi ) is the selection probability of the
h
ownership in the stratum. In their current form, these estimators also implicitly assume an
MCAR mechanism of nonresponse.
In an attempt to correct the estimators of equations 4.7 and 4.8 for PFL population size
with nonresponse, we used a weighting method under the assumption of MAR across all
ten size classes, subsequently described, at different levels of response rates. The weighting
method was applied only to Equations 4.7 and 4.8 and accounts for the joint probability of
selection and response by:
P (selection, response) = P (selection)P (response|selection) = phi qhi ,

(4.9)

where qhi is the probability of landowner response to the survey given selection in stratum
a

h of landowner i. By replacing phi ( Ahi ) in Equations 4.7 and 4.8 with Equation 4.9, we
h

55

developed the alternative estimators, subsequently referred to as Method II, as:
rh
dhi
Ah X
;
NĚhd =
rh
ahi qhi
i=1

(4.10)

r

2
h 
X
dhi
VĚ ar(NĚhd ) =
â NĚhd ,
rh (rh â 1)
phi qhi
1

(4.11)

i=1

where rh is the sample size of responding PFLs with replacement plus all other plots that
fall on land within the stratum.

4.2.4

Nonresponse

We also generated estimates for each of the 1000 iterative samples under various nonresponse
scenarios for Methods I and II for PFL population size estimation. We began to evaluate
the effect of nonresponse on the estimators with an average response rate of 50% uniformly
distributed across 10 landholder size classes (referred to as Scenario A; see Table 4.1 for size
classifications). The size classifications were made to mimic size classes used in reporting by
Butler (2008) and Oswalt et al. (2014). Fifty percent was considered the starting point for
nonresponse based upon the results of Butler (2008). For total NIPF owners across the state
49 ) based on sending
of Montana, they found a response, or cooperation, rate of 44.1% ( 193â82

193 surveys (82 undeliverable and 49 total responses) to landowners. More recently, Butler
et al. (2016b) report a slightly better response rate of 48.2% based on sending 421 surveys
(21 undeliverable and 193 total respondents). We then decreased response incrementally
by 10%, equally distributed across all size classes, down to a low of 10%. Each of these
incremental response levels can be considered an MCAR mechanism of nonresponse. The
estimates of PFL population size from each of the two methods were then recalculated by
removing nonrespondents from the sample.
Our first evaluation of the two estimators assumed no pattern between landowner response and size class of properties, although in reality a trend may be present. Three
additional scenarios of nonresponse were examined, each with a unique relationship between
56

Table 4.1: Size classes of private forest area.
Size class Forest area (ha)
1
[0.41 â 4.05)
2
[4.05 â 8.09)
3
[8.09 â 20.2)
4
[20.2 â 40.5)
5
[40.5 â 80.9)
6
[80.9 â 202.3)
7
[202.3 â 404.7)
8
[404.7 â 2023.4)
9
[2023.4 â 4046.9)
10
4046.9 +
response rate and property size classes and corresponding to an NMAR mechanism. Across
all of these scenarios, we maintained an overall sample response rate of 50% (Figure 4.1b).
These scenarios were chosen to represent a range of possible trends that may be observed
when the survey is administered. The probabilities of nonresponse in each size class were
dependent on the sample size in each size class to maintain the overall sample response rate
of 50% per scenario (Figure 4.1b). Figure 4.1a shows an example of the sample distribution
of unique landowners by size class for a single realization with full response. Scenario B
had an extreme trend with few respondents in the smallest size class and progressively less
probability of nonresponse as size class increased. Scenario C was similar to Scenario B with
an opposite trend. The final scenario, referred to as D, was symmetric about the middle two
size classes (i.e., classes 5 and 6), each sharing the highest probability of nonresponse. The
mechanism of NMAR is clear in these scenarios: because of deliberate simulation of trends
in response based on landholding size, the evaluation of the estimators for the two methods
was once again based on removing the nonrespondents from the sample. We were only able
to apply Method II to the scenarios with a trend because we knew a priori the probabilities
of response based on the simulated data.
All estimator evaluations were assessed at the spatial extent of the state. The extent of the
entire state was chosen because it is the minimum resolution for reporting of estimates by the
NWOS. Throughout this article, the performance of each of the two methods was compared
57

150

Probability of Nonâresponse

Number of Unique Private Forest Landowners

1.00

100

50

0

0.75
Scenario
â

0.50

â

â

â

â

â

â

â

â

â

â

A
B
C
D

0.25

0.00
1

2

3

4

5

6

7

8

9

10

1

Size Class

2

3

4

5

6

7

8

9

10

Size Class

Figure 4.1: (a) One realization of the sample distribution of unique private forest
landowners (PFLs) by size class, (b) trends of probabilities of nonresponse by size class for
four different scenarios (A-D) with an overall nonresponse rate of approximately 50%.

based upon design-based inference (Gregoire 1998) with the average 95% confidence interval
width and the actual 95% probability coverage of 1000 simulations. The actual probability
coverage is the fraction of the number of simulations of the 1000 that include the GIS-based
baseline within the confidence interval of each simulation; the nominal probability coverage
is chosen to be 95%.

4.3
4.3.1

Results
Private Forest Area

From the procedures used to determine PFLs across the state of Montana, the GIS-based
baseline total private forest land area was 1,743,229 ha (Table 4.2). Table 4.2 also shows the
estimate of 1,727,868 ha along with a sampling error of 3.65% based on applying Equations
4.1 and 4.2 for 1000 simulations. The estimate of private forest area was 0.88% less than

58

the baseline private forest area and included the baseline estimate in the 95% confidence
interval. The estimates of private (i.e., nonindustrial and noncorporate) forest area for
Montana from the literature (Butler 2008, Oswalt et al. 2014, Butler et al. 2016b) ranged
from 11.4 to 26.1% greater than the baseline of the derived population. Only the estimates
of Butler et al. (2016b) from the literature contain the baseline private forest area within the
95% confidence interval. The majority of all forestland in Montana was concentrated in the
western half of the state (Figure 4.2a), whereas concentrations of private forestland in the
state were not as clearly distributed due, in part, to large areas of public forest compared to
private forest. The private forestland covered approximately 27% of all forest land (Figure
4.2b). The largest proportion of forest area was in ownership sizes between 404.7 and 2023.4
ha (i.e, size class 8) for both the baseline population and the current NWOS (Butler et al.
2016b) estimate (Figure 4.3a). Some differences existed in the proportions in each size class,
but the overall shape of the distribution appeared similar for both the baseline population
and the NWOS estimate.
Table 4.2: Summary of Montana private forest land area (ha) for the GIS-based baseline
(base), the estimate from simple random sampling of the baseline (Method I), and
additional publications.
Forest
Area
Owner
Estimate
Source
Groupa
(ha)b
Baseline
PFL
1,743,229
Method I
PFL
1,727,868
Butler (2008)
NIPF
2,197,446
Oswalt et al. (2014) Noncorporate 1,941,685
Butler et al. (2016b) Noncorporate 1,978,106
a

%
Sampling Change
Error
from
(%)
base
â
â
3.65
-0.88
5.8
26.1
0.72
11.4
7.98
13.5

95% CI
includes
base
â
Yes
No
No
Yes

PFL, private forest landowners. Ownership classifications were redefined during reporting years from industrial
and nonindustrial private forest land (NIPF) to new classifications of corporate and noncorporate.
b Forest area estimates include ownerships >0.41 ha.

59

Figure 4.2: (inset) Location of Montana, highlighted in grey, within the conterminous
United States, (a) 10 m x 10 m resolution for all forested lands across the state of Montana,
and (b) 10 m x 10 m resolution for private forested lands across the state of Montana.

60

group

group

b

Base

Base

NWOS

Rockies

0.6

0.3

0.2

0.1

0.0

0.4

0.2

0.0

group

c

Base
40000

Number of Private Landowners

a

Proportion of Private Landowners

Proportion of Private Forest Area

0.4

NWOS

30000

20000

10000

0

1 2 3 4 5 6 7 8 9 10

1 2 3 4 5 6 7 8 9+

Size Class

Size Class

1

2

3

4

5

6 7+

Size Class

Figure 4.3: (a) The distribution for proportion of total private forest area of family forest
landowners of Montana (NWOS, Butler et al. 2016b) and the GIS-based baseline (Base)
population of Montana by size class, (b) the distribution for proportion of private forest
landowners (PFLs) by size class for the Rocky Mountain region (Rockies, Oswalt et al.
2014) and the Base population of Montana, and (c) the distribution of number of family
forest landowners of Montana (NWOS, Butler et al. 2016b) and PFLs for the Base
population of Montana by size class.

4.3.2

Total Private Forest Landowners

The GIS-based baseline population totaled 66,316 unique landowners (Table 4.3). The proportion of PFLs among the 10 size classes for the baseline population was concentrated in
the lowest three size classes (< 20.2 ha) with the largest proportion owning less than 4.05
ha (Figure 4.3b). The baseline population had 36,822 landowners owning less than 4.05 ha
with the remaining 29,494 landowners in size classes 2 through 10 (Figure 4.3c).
The distribution for the proportions of PFLs closely followed the Rocky Mountain regional estimates of Oswalt et al. (2014) (Figure 4.3b). As with the regional estimates, the

61

distribution of the baseline PFL population closely followed the current NWOS (Butler et al.
2016b) (Figure 4.3c). The NWOS estimated slightly more landowners in the first size class
with 41,409 owners compared with the baseline population. The remaining size classes for
the current NWOS have an additional 29,000 PFLs across the state for a total population
estimate of 70,409 (Table 4.3).
Table 4.3: Summary of the number of private forest landowners in Montana for the
GIS-based baseline (base), the estimate from two sampling methods, and additional
publications.
Owner
No. of
a
Source
group
ownersb
Baseline
PFL
66316
Method I
PFL
68838
Method II
PFL
68811
Butler (2008)
NIPF
40000
Butler et al. (2016b) FAMILY 70409

Sampling
error (%)
â
14.3
14.3
22.9
22.8

% change
from base
â
3.80
3.76
-39.7
6.17

95% CI
includes base
â
Yes
Yes
No
Yes

a

PFL, private forest landowners; NIPF, nonindustrial private forest land owners; FAMILY, private owners classified
as family.
b Number of private owners includes ownerships >0.41 ha.

4.3.2.1

Full Response

For 1000 simulations of Method I with full response, the average estimate of the PFL population size was 68,838 (Table 4.3). Method II, with full response, had a slightly lower estimate
of 68,811. The sampling error of the two methods was approximately 14% and included the
baseline population in the 95% confidence interval. Methods I and II each had an increase
in percentage from the baseline population of less than four percent.
For Method I, the actual 95% probability coverage was 0.948 with an average 95% confidence interval width of 38,064 PFLs. Method II had a nearly identical actual 95% probability
coverage of 0.947 with a confidence interval width of 37,965 PFLs (Figure 4.4a-c).

62

120

â
â

â

â

â

â

65

60
Method
â

II

a
100

50

40

30

20

â

100

â

80
â

60

â
â

40

â

20

b

10

100

Response rate (%)

30

20

â

100
â

â

50

â

d
A

B

C

D

â

40

30

20

â

0.925

c
100

1.00

â
â

50

â

50

10

Response rate (%)

100

â

â

â

0.75

0.50

0.25

â
â

0

Scenario

0.950

10

150

Mean 95% Confidence Interval Width
(in thousands)

^
Mean Nhd (in thousands)

40

â
â

Response rate (%)

150

0

50

â

0.975

0.900

95% Probability Coverage

55

I

1.000

95% Probability Coverage

70

Mean 95% Confidence Interval Width
(in thousands)

^
Mean Nhd (in thousands)

75

e
A

0.00
B

C

Scenario

D

f

â

A

B

C

D

Scenario

Figure 4.4: Horizontal line indicates the GIS-based baseline of PFL population size.
Horizontal dashed lines represent the nominal probability coverage rate of 0.95. (a-c) Mean
PFL population size, mean 95% confidence interval width, and 95% actual probability
coverage for full response and incremental response rates from 50 to 10% for the two
different estimators (I and II) based upon 1000 simulations, respectively. (d-f) Mean PFL
population size, mean 95% confidence interval width, and 95% actual probability coverage
for four different response scenarios (A-D) with an overall 50% response rate of Methods I
and II based upon 1000 simulations, respectively.

63

4.3.2.2

Nonresponse

For Method I with an overall sample response rate of 50% (i.e., Scenario A), the average
estimate of PFL population size remained relatively the same as full response with a mean
estimate of 68,889 PFLs. The average confidence interval width increased from full response
to 52,493 PFLs. The actual 95% probability coverage remained relatively constant at 0.962
(Figure 4.4a-c). Method II, with 50% response rates, increased from full response with a
mean estimate of 70,475 PFLs. Method II had a wider average 95% confidence interval width
of 54,452 PFLs with an actual 95% probability coverage of 0.973. As response rates continued
to decrease, the estimates of PFL population size generated by Method I remained relatively
stable. However, both the mean confidence interval width and probability coverage showed
an increasing trend. Meanwhile, the estimates generated by Methods II showed a slight
increasing trend with decreasing response rate, and the confidence interval widths showed a
large pattern of increase. This resulted in inflated probability coverages for Methods I and
II of up to 0.976 and 0.989, respectively.
Figure 4.4d shows the results of PFL population size estimates generated by Methods I
and II for the four different scenarios of nonresponse examined with an overall 50% response
rate. The most inaccurate and imprecise estimates of Method I are seen in Scenario B, in
which small landowners had a high probability of nonresponse, with the probability coverage
dropping to zero. In contrast, Method II provided stable estimates of PFL population size
with an inflated probability coverage for Scenario B. Confidence interval widths of Method
I dropped considerably, whereas those of Method II drastically increased.
For Scenarios C and D, PFL population size estimates of Method I increased well above
the values of Scenario A. Method I had a large overestimate in Scenario C (high nonresponse
probability of large landowners) with an estimate of 118,533 PFLs, a confidence interval
width of 70,402 and a probability coverage rate of 0.063. Scenario D of Method I had a
slightly lower over estimate of 89,913 PFLs, a confidence interval width of 63,444, and a
probability coverage rate of 0.800. Method II remained stable with estimates of 70,446 and

64

70,266 for Scenarios C and D, respectively. The confidence interval width and probability
coverage for Scenarios C and D also remained relatively constant at 40,977, 43,881, 0.954 and
0.970, respectively. All of these metrics gave better results for Method II than for Method
I across all four scenarios. However, the confidence interval widths of Scenario C, which
occurred due to very low response rates in the smallest size class, are still concerning.

4.4

Discussion

The estimator for forest area was both accurate and precise for estimating the GISbased baseline private forest area for the state of Montana. However, the GIS-derived forest
area estimation showed large discrepancies from other published estimates because of the
exclusion of tribal lands considered in our definition of PFL, whereas tribal lands are included
as private in the other published definitions. By rerunning our outlined procedure for tribal
lands only, the derived total tribal forest area is 220,646 ha. Adding this to our baseline
estimate of private forest area gives a total forest area of 1,963,875 ha, which results in the
forest area estimate of Butler et al. (2016b) being 0.725% larger. We consider this difference
to be negligible. Although the derived tribal land would comprise approximately 11% of the
new total forest land, the additional number of landowners amounts to less than 0.2 percent.
The inclusion or exclusion of tribal lands did not affect the performance of the estimators
for private forest area or PFL population size.
Estimates of PFL population size generated by Methods I and II performed relatively
equally for full response. Either one of these estimators could reliably be used for PFL
population size estimation with full response. The assumption of a fixed and known total
land area is one advantage of Method II, whereas Method I must incorporate the FIA estimate
of total and variance for private forest area, respectively. Dickinson and Butler (2013) cite
this as a primary reason in their discussion for proposing that the NWOS switch to estimators
similar to Method II. For both Methods I and II, the PFL population size was on average
overestimated based on our simulations, and a corresponding decrease in sample size from

65

reduced response rates caused a decrease in precision. The reason for this overestimate on
average is unknown to the authors. As for the reduced precision, according to Lohr (2010),
the best way to reduce this impact is to invest in reducing the rate of nonresponse through
prevention.
Butler (2008) estimates the NIPF population size of 40,000 landowners in Montana generated by methodology corresponding to our Method I. At first this estimate seemed too low
because the 95% confidence interval did not include the GIS-based baseline estimate, but it
may be a reasonable estimate dependent on the sample actually drawn and the nearly 10-year
difference between our study and the completion of their survey. Based on our simulations,
the estimate of PFL population size could be as low as 42,787 for both Methods I and II.
However, Butler et al. (2016b) estimate the family population size (including ownerships >
0.41 ha in size) to be considerably larger than that of Butler (2008) with 70,409 landowners
in Montana, which includes the GIS-based baseline estimate in the 95% confidence interval.
The sample size and number of respondents between these two survey periods were 49 and
193, respectively. The larger sample size is related to the current NWOS reporting estimates
at the unit of an individual state with minimum sample size requirements to meet targets of
precision (Butler et al. 2016b). The increased sample size should only impact the precision
and not the estimate as long as nonresponse follows a MCAR mechanism. This increase
of 30,409 landowners could be an actual trend since both surveys used Method I and the
assumption of a MCAR nonresponse mechanism is valid for both survey periods in Montana.
Method I showed stable estimates of PFL population size in our simulations in which the
assumption of MCAR holds for decreasing response rates. However, if a MAR or NMAR
nonresponse mechanisms exist then Method I may drastically underestimate or overestimate
PFL population size dependent on a trend in nonresponse related to landholding size being
present. In the case of a MAR or NMAR mechanism, Method II would be the preferred
estimator for consistent estimates of PFL population size. For longitudinal studies of PFL
population size, this may highlight the need for continually assessing the nonresponse mech-

66

anism when the PFL population size is determined. One possible solution around issues
of nonresponse is to abandon the survey methods for population totals and adopt methods
similar to this study using cadastral data.
Nonresponse clearly has a large impact on both estimators across all rates and trends
considered. Most of all, our Method II would fail considerably without the weighting correction for decreasing response rates when nonrespondents are removed because the sample
size includes all plots that fall on land, and the nonrespondents on private land comprise a
very small proportion of the total number of plots that fall on land. This was not the case
for Method I, in which only respondents are considered, because the decrease in the sample
size of forested plots was proportionally offset by the terms in the summation of Equation
4.5. Here, we present an estimator of PFL population size that is an alternative to the current NWOS estimator for nonresponse. The new Method II performed best in all scenarios
with the trends considered. Although the confidence interval width increased for decreasing
response rates, this performance is no different from that of the currently used estimation
methods (i.e., Method I). The new method can easily be applied if no nonresponse trend is
found proportional to landholding size. If a trend is determined, the probability of response
for each size class must be estimated independently.
One common approach to assessing nonresponse in surveys is through follow-ups with
nonrespondents to determine any differences from respondents based on some characteristics, such as landholding size. The results of these follow-up surveys could then be used for
applying Method II to determine probability of response given selection by size class, if necessary. Dickinson et al. (in press) also propose logistic regression for assessing probabilities
of response with additional covariates that are available from the survey data. GIS-based
methods with cadastral data offer one alternative to follow-up surveys in determining differences in response rates according to landholding size by knowing an approximate estimate
for the amount of forestland owned before conduction of the survey. One possible drawback
of this method for implementation in the NWOS is that only 9 of 50 states currently have

67

freely available cadastral data. However, cadastral data for assessing landholding size are
available at the county level in many parts of the United States.
Our methodology of combining remote sensing and freely available cadastral data into a
GIS for determining PFL population size may obviate the need for sampling to determine
totals of private forest area and PFL population size. In addition, this methodology may
allow future researchers to employ novel sampling strategies or increase sample size and
corresponding precision of landowner attitude and behavior estimates. Sampling strategies
could use stratification techniques to increase proportions of âsmallâ landowners into the
sample, exploring issues relevant to more parcelized areas or at the wildland urban interface.
For example, the current NWOS for Montana bases its results on only eight landowners
owning less than 4.05 ha and has a total sample size of 193 respondents. The potential for
statewide estimation based on this methodology, previously mentioned by others (McRoberts
2003, Kittredge et al. 2008, Metcalf 2010, Metcalf et al. 2012), may now be readily available
for researchers and others exploring private forest ownership dynamics.

68

CHAPTER 5
HIERARCHICAL BAYESIAN MODELS FOR SMALL AREA ESTIMATION
OF COUNTY-LEVEL PRIVATE FOREST LANDOWNER POPULATION

5.1

Introduction

There are approximately 192 million ha of private forest land in the United States (US)
owned by corporations, families, individuals, nongovernmental organizations, and tribal entities (Butler et al. 2016c). This forested land provides many social and ecosystem benefits,
but is managed by millions of owners with potentially disparate goals and objectives (Butler et al. 2016a). Therefore, a deeper understanding of the demographics, attitudes, and
management behaviors of private forest landowners is paramount to designing effective incentives, outreach programs, and support mechanisms that enable these owners to engage
in sustainable forest management activities. The USDA Forest Service Forest Inventory and
Analysis (FIA) National Woodland Owner Survey (NWOS) is the primary source of information about national-, regional-, and state-level private forest characteristics and ownersâ
demographics, attitudes, and behaviors. NWOS results can be summarized by ownership
type, e.g., corporate, family, other private, and tribal and is currently implemented every
five years (Butler et al. 2016c).
Given limited resources to conduct the NWOS and low to moderate response rates, which
are common in social surveys, the small sample size restricts reliable inference to statelevel estimates. Although state-level estimates are informative, county-level estimates might
be more appropriate and effective when designing and delivering education and outreach
to private forest owners. Furthermore, from a management and conservation standpoint,
county-level estimates are more useful for tracking trends in private forest land parcellation,
fragmentation, composition, and ownership demographics and characteristics (see, e.g., Kittredge et al. 2008, Pan et al. 2009, Poudyal and Hodges 2009). Results from these, and

69

similar, studies underscore the value of ownership information at fine spatial scales to encourage sustainable forest management and curb loss of ecosystem services. To this end,
we present and assess methods to estimate the number of private forest ownerships at the
county-level using the current NWOS sampling design.
The NWOSâs target sample size is 250 respondents per state (Butler et al. 2016a) and a
minimum of 100 respondents is required to generate state survey summaries (Butler et al.
2016b). Again, such small state-level sample sizes effectively preclude robust inference at
county-levels, particularly when reporting results by ownership type. Small sample sizes can
result in undesirably low precision of parameter estimates within a design-based framework
(e.g., the NWOS design). Here, we refer to parameter estimates obtained using designbased estimators as direct estimates. These direct estimates are the current standard used in
operational settings that employ design-based estimators; therefore, the direct estimates will
be used for benchmarking against model-based approaches. Small area estimation (SAE) is a
model-based approach that couples a direct estimate and possible covariates to improve the
estimate precision and, in some cases, accuracy. Rao and Molina (2015) provide an excellent
review of available SAE methods. Unlike a standard linear regression, the SAE framework is
comprised of two component models: a sampling model and a linking model (You and Zhou
2011). Estimation of the SAE parameter of interest accounts for and balances between the
sampling (i.e., direct estimator) and linking model errors. The linking model is a linear model
with random effects that relate the small areas of interest with some error. Additional spatial
structure may still remain in the linking model after accounting for possible covariates. Such
residual structure can be further modeled using spatial random effects.
SAE is also of great interest to users of the core FIA biophysical variables with the
small area being dependent on the application. The modeling framework applied here for
private forest ownerships could easily be adapted to these biophysical variables. In the
case of biophysical variables, several recent forestry studies, e.g., Goerndt et al. (2013) and
Magnussen et al. (2014), use SAE to improve inference at county- and municipal-levels. A

70

thorough literature review yielded no application of SAE to private forest landowners or
related studies.
The primary contributions of this work are i) producing county-level private forest ownership datasets for Montana and New Jersey, ii) defining and assessing SAE models to improve
county-level inference of the number of private forest ownerships, and iii) developing open
source software to fit proposed SAE models.
The remainder of the manuscript is organized as follows. In section 5.2, we detail the
steps followed to create a spatially explicit private forest ownership dataset for Montana and
New Jersey. Then we define the direct estimator for the number of county-level private forest
ownerships along with two SAE models. We then describe our approach for comparing the
proposed SAE models. Results are given in section 5.3 followed by discussion and future
directions in section 5.4.

5.2
5.2.1

Materials and methods
True number of private forest ownerships

Recently, several states, including Montana and New Jersey, made cadastral data freely
available for determining property ownership. Using these 2015 cadastral data and the geographic information system (GIS) based analysis defined in Ver Planck et al. (2016), we
created a GIS layer that delineates our working definition of private forest. Here, âprivate
forestâ comprises NWOS ownership categories of corporate, family, and other private (excluding tribal lands) (Butler et al. 2016c). The number of ownerships with forest property
area greater than two ha within each county is the parameter of interest in the subsequent
analysis. The derived GIS layers provide the true parameter value within each county, denoted YTi , with subscript Ti noting the true number of private forest ownerships in county
i.
Developing the private forest ownership GIS layers began by downloading freely available
county-level cadastral data from the respective state repositories, Base Map Service Center

71

Montana State Library (2015) for Montana and New Jersey Office of Information Technology,
Office of GIS (2015) for New Jersey. These cadastral data were combined with remotely
sensed forest land cover data from the National Landcover Dataset (NLCD; Jin et al. 2013)
to determine forest ownerships greater than two ha. Based on the NLCD specifications
available, forested areas were defined to be at least 20% tree cover, at least 0.41 ha in area,
and 30 m wide.
To begin processing county-level properties, we identified ownerships in the dataset with
unknown names. Parcels for which ownership was unknown that fell within the boundaries
of public lands were assumed to be public. Additionally, the Protected Areas Database
of the US (US Geological Survey 2016) was used to identify nongovernmental conservation
organizations, e.g., the Nature Conservancy, listed as unknown in the cadastral dataset. All
other unknown properties were assigned a unique identifier and treated as unique private
ownerships. Remaining properties with known private ownership names were combined by
owner(s) name and street address so that multiple properties within a given county were
treated as a single ownership. Ownerships with forest industry or tribal affiliations were
omitted from the final county-level private forest ownership GIS layers to be consistent
with the private forest landowner definition used in Ver Planck et al. (2016) for Montana.
This portion of the analysis was completed using a combination of QGIS software (QGIS
Development Team 2014) and R statistical software (R Core Team 2015).

5.2.2
5.2.2.1

Models
Direct estimator

The direct estimator used for private forest ownership population at the state-level (Butler
et al. 2016a, Dickinson and Butler 2013) was applied to individual counties (note that the
notation was modified slightly for consistency with SAE models presented in section 5.2.2.2).
The direct estimator and associated variance are based on probability proportional to size

72

sampling (Hansen and Hurwitz 1943):
ni
dij
Ai X
;
Yi =
ni
aij

(5.1)

j=1

Ďi2

=

ni 
X
dij

1
ni (ni â 1)

j=1

pij

2
â Yi

,

(5.2)

where i indexes county, Yi is population total, Ai is total private forest area, ni is number
of samples with replacement, dij is an indicator variable that is one if sample j fell in
private forest and zero otherwise, aij is the forest area (ha) of the sampled ownership, and
pij = aij /Ai is the ownership selection probability. The steps for drawing the ownership
samples are described in section 5.2.3. The total county-level private forest area, Ai , was
fixed to equal the true private forest area derived from the GIS layer in section 5.2.1.

5.2.2.2

Small area estimation models

The direct estimator, Section 5.2.2.1, was log transformed to meet the SAE model normality
assumption. This transformation was also desirable because it ensures SAE model population estimates have the correct support, i.e., positive, following back transformation. Taking
the log of the direct estimator necessitates transformation of the associated variance, accomplished here via the delta method (Casella and Berger 2002). The Fay-Herriot (FH) SAE
model (Fay and Herriot 1979) for county i in 1, 2, . . . , m counties is defined:

Yei = Î¸Ěi + i ,

(5.3)

Î¸Ěi = x0i Î˛ + vi ,
where Yei is the log-transformed direct estimator, Î¸Ěi is the log population total, and i is a
normally distributed error with mean zero and variance ĎĚi2 = Ďi2 /Yi2 . The additive mean of
Î¸Ěi comprises an intercept and county-level covariates held in the p Ă 1 column vector x and

73

associated p Ă 1 vector of regression coefficients Î˛. The county-level random effects term vi
is normally distributed with mean zero and variance Ďv2 .
It is reasonable to think that forest and ownership patterns, e.g., property size and ownerâs
socioeconomic or demographic characteristics, could exhibit spatial autocorrelation. In our
setting, if direct estimate values are spatially correlated, i.e., adjacent counties have similar population values, then we should exploit this relationship to further improve inference
by pooling information across proximate counties. Counties could be represented as either
point locations (i.e., their centroid) or as areal units for defining this spatial structure. Representing counties by their centroid may misrepresent distances among neighboring counties
due to irregularly shaped counties; therefore, the areal approach is implemented as it maintains the desired neighborhood structure of the county lattice. Hence, we augment model
(5.3) by adding a spatially structured random effect that follows a conditional autoregressive
(CAR) prior distribution, see, e.g., Banerjee et al. (2015) and You and Zhou (2011). This
extended model called FHCAR is defined analogous to FH, with the exception that the unstructured random effects v = (v1 , v2 , . . . , vm )0 âź N (0, Ďv2 I) in model (5.3) are replaced with
v âź N (0, ÎŁ(Ďv2 , Îť)). Here, the m Ă m covariance matrix ÎŁ(Ďv2 , Îť) = Ďv2 [ÎťR + (1 â Îť)I]â1 ,
where Ďv2 is the spatial variance parameter, Îť is the autocorrelation parameter, R is the
neighborhood matrix with diagonal elements equal to the number of neighbors and off diagonal elements equal to negative one or zero indicating if a neighbor is present or not, and I is
the m Ă m identity matrix. Counties are only considered neighbors with adjoining borders.
The FH and FHCAR Bayesian model specifications are completed by assigning prior
distributions to parameters (Gelman et al. 2014). We selected non-informative priors for all
model parameters. Each regression coefficient in Î˛ was assigned a flat prior distribution, Ďv2
was given an inverse-Gamma (IG) prior distribution, and, following You and Zhou (2011),
Îťâs prior was uniform with support between zero and one. The IGâs shape hyperparameter
was set to two, which results in a prior mean equal to the scale hyperparameter and infinite
P
2
variance. The IGâs scale hyperparameter was set as m
i=1 ĎĚi /m to give equal prior weight

74

to the sampling and CAR variances. A Markov chain Monte Carlo (MCMC) algorithm was
used to sample from parametersâ posterior distributions. Specifically, a Gibbs algorithm was
developed to sample from Î¸Ě = (Î¸Ě1 , Î¸Ě2 , . . . , Î¸Ěm ), Î˛, and Ďv2 with full conditional distributions
given in You and Zhou (2011), and a Metropolis-Hastings algorithm was used to sample from
Îťâs posterior distribution. The data and associated code are available in (Ver Planck et al.
2017b).
Parameter posterior inference was based on 3000 post burn-in MCMC samples from each
of L = 3 chains resulting in K = 9000 samples. Chain mixing and convergence were diagnosed using a multivariate potential scale reduction factor of less than 1.1 for all parameters
considered (Gelman et al. 2014). For our parameter of interest Î¸Ěi (posterior mean of the
population total), the K posterior samples of Î¸Ě were exponentiated back to the original
units.

5.2.3

Simulation study

For the NWOS and similar efforts, SAE models are viable from a statistical perspective if
they improve county-level estimate precision without inducing substantial bias. Given the
actual private forest ownerships for Montana and New Jersey, section 5.2.1, we examine
SAE model inference against truth using a simulation study. One iteration in the simulation
study produces a set of county-level direct and SAE model estimates by i) drawing a random
probability proportional to size sample from the private forest ownership list sample frame,
ii) computing direct estimates (section 5.2.2.1), iii) estimating FH and FHCAR models
(section 5.2.2.2), and iv) evaluating differences between SAE model population estimates
and truth. Summarizing results from step iv for a large number of iterations allows us to
assess precision and bias in SAE model population estimates.
A county-specific sample size, ni , is needed to conduct step i. Here too, we want the
sample size to approximate that achieved by the NWOS design. To determine the sample
size in each county, we randomly located 1000 points within each hexagon of a tessellated

75

hexagonal grid laid over the states of Montana and New Jersey using DGGRID developed
by Sahr (2011). Each of the 1000 points represented a unique sample iteration across an
individual state. The area of each hexagon was 2407 ha. This grid roughly approximates
the one used to spatially distribute FIA and NWOS samples (Bechtold and Patterson 2005,
Dickinson and Butler 2013). From each set of hexagons sampled within a county, we calculated the mean number of points that fell within private forest ownerships across the 1000
sample iterations. This mean value was rounded up and used as ni . Given fixed ni , to reduce
variation among repeated samples, we repeated the simulation study steps i-iii N = 4000
times for Montana and New Jersey. These large numbers of repeated simulations should
empirically show unbiased estimates of truth for the direct estimator.

5.2.3.1

County-level covariates

Exploratory analysis using linear regression models showed that population density (PD;
number of peopleÂˇkmâ2 ) from the 2010 decennial census (US Census Bureau 2016), and
NLCD total forest area (TFA; ha) explain significant variability in log-transformed direct
estimates. For Montana, the linear regressions of the simulation runs explained 45% of the
variation on average with a range from 27% to 65%. For New Jersey, the explained variation
was higher with a mean of 76% and a range from 50% to 95%. Therefore, PD and TFA
covariates were used for the simulation study.

5.2.4

Simulation summary

The N iterations for the direct and SAE model estimates were evaluated for bias, relative
bias, mean squared error (MSE), root mean squared error (RMSE), percent coverage for a
95% nominal rate, and 95% confidence interval width for direct estimates and 95% credible
interval width for the SAE models. Each of these metrics were calculated in two ways: i)
for an individual county (i.e., eqs. 5.4 and 5.6) and ii) for an individual state (i.e., eqs. 5.5
and 5.7).
76

Using the following two equations, bias was calculated as the average difference between
the posterior mean of the population total, Î¸Ěij of county i and iteration j, and the truth YTi :

Biasi =

Bias =

N
1 X
Î¸Ěij â YTi ;
N

(5.4)

1
m

(5.5)

j=1
m
X

Biasi .

i=1

Additionally, relative bias, RBi , is defined as the bias relative to the truth for each county
1 Pm RB ).
(RBi = Biasi / YTi ) and summed across all counties for a state (RB = m
i
i=1
A trade-off between bias and precision is present when applying a SAE model. MSE was
calculated as the average squared deviations of the posterior mean of the population total
from the true population, and RMSE was the square root of these deviations:

N
1 X
M SEi =
(Î¸Ěij â YT i )2 ;
N

M SE =

1
m

j=1
m
X

M SEi .

(5.6)

(5.7)

i=1

Percent coverage was defined as the average number of times that the 95% SAE credible
interval for Î¸ij , or the direct estimate 95% confidence interval, included truth for each county.
The average 95% confidence interval width for the direct estimate was computed by a t
distribution with ni -1 degrees of freedom; whereas, the average 95% credible interval width
was determined from the 0.025 and 0.975 quantiles of the posterior distributions of Î¸ij . Both
percent coverage and average confidence interval width were also calculated at the state level.
A final relative MSE comparison among the direct and SAE model estimates was made
at the county level based on MSE (eq. 5.6) by
M SE1
1 M SE
1
2

â M SE2
+ 21 M SE2

(5.8)

where M SE1 indicates the direct or the first SAE model estimate to be compared with
M SE2 , the second SAE model estimate (Porter et al. 2015).
77

5.3

Results

Summing the sample sizes across all counties for a single iteration, Montana had a
statewide sample size of 751. Individual counties ranged from a minimum of two samples to
a maximum of 44 samples in Fergus County. The average sample size per county in Montana
was 13. Figure 5.1a maps the sample sizes across all of the counties for Montana. The true
population of Montana is 42625 ownerships. Liberty County, with 58 ownerships, had the
minimum population, and Flathead County, with 5209 ownerships, had the maximum (Fig.
5.1b). Montana is one of the least densely populated states in the US ranging from 0.097 to
21.6 peopleÂˇkmâ2 for an individual county with an average of 2.7 peopleÂˇkmâ2 (Fig. 5.1c).
Montanaâs forest area is primarily concentrated in the western portion of the state, with a
range of 1077 ha (Sheridan County in the east) to 919800 ha (Flathead County in the west)
and a mean 151200 ha per county (Fig. 5.1d).
New Jersey had a statewide sample size of 191 for each iteration. Individual counties
ranged from a minimum of two samples to a maximum of 20 samples in Burlington County.
The average sample size per county in New Jersey was nine. Figure 5.2a maps the sample
sizes across all of the counties for New Jersey. The true population of New Jersey is 35462
ownerships. Hudson County, with 33 ownerships, had the minimum population, and Atlantic
County, with 3842 ownerships, had the maximum (Fig. 5.2b). New Jersey is one of the most
densely populated states in the US, ranging from 73.4 to 4753 peopleÂˇkmâ2 , with an average
of 806.6 peopleÂˇkmâ2 in a single county (Fig. 5.2c). New Jerseyâs forest area is primarily
concentrated in the northwestern and southeastern portions of the state, with a range of 562
ha (Hudson County) to 121500 ha (Burlington County) and a mean of 41210 ha per county
(Fig. 5.2d).

78

49
42
35
28
21
14
7
0

5300
4550
3800
3050
2300
1550
800
50

0

100 km

(a)

(b)

925
793
661
529
397
265
133
1

21.62
18.53
15.44
12.36
9.27
6.18
3.09
0

(c)

(d)

Figure 5.1: Montana counties by (a) sample size, (b) true population size with >2 ha
forest, (c) 2010 census population density (PD, number of peopleÂˇkmâ2 ); (d) total forest
area (TFA, 1000s ha).

79

21

4100

18

3500

15

2900

12

2300

9

1700

6

1100

3

500

0

30

0

50 km

(a)

(b)

5405

140

4633

120

3861

100

3089

80

2317

60

1544

40

772

20

0

0

(c)

(d)

Figure 5.2: New Jersey counties by (a) sample size, (b) true population size with >2 ha
forest, (c) 2010 Census population density (PD, number of peopleÂˇkmâ2 ); (d) total forest
area (TFA, 1000s ha).

In logarithmic form, the regression coefficients were significant for the intercept, PD, and
TFA for the majority of the iterations. Averaged across all iterations for the Montana FH
model, the mean point estimate for the intercept was 5.01, the PD regression coefficient was
80

0.0733, and the TFA regression coefficient was 4.10x10â6 . For the FHCAR model, the point
estimates were 5.04, 0.0676, and 4.01x10â6 for the intercept, PD, and TFA, respectively.
None of the 95% credible intervals for the intercept or the TFA regression coefficient included
zero for the N iterations of either SAE model. For the PD regression coefficient, 3032 and
2914 iterations did not include zero in the credible interval of the FH and FHCAR models,
respectively.
In the case of the New Jersey FH model, the point estimates were 7.26, -9.35x10â4 ,
and 1.04x10â5 for the intercept, PD, and TFA regression coefficients, respectively. For the
FHCAR model, the point estimates were 7.25, -9.22x10â4 , and 1.08x10â5 for the intercept,
PD, and TFA regression coefficients, respectively. None of the 95% credible intervals for
the intercept included zero for any of the iterations of either model. None of the credible
intervals for the PD regression coefficient included zero for any iterations of the FH model
and 3996 iterations of the FHCAR model did not include zero. For the TFA regression
coefficient, 1735 and 2321 iterations did not include zero in the credible intervals of the FH
and FHCAR models, respectively.
SAE model parameter estimates for a randomly selected iteration are given in Table 5.1.
All of the regression coefficients were significant with the exception of TFA in New Jersey.
For Montana, the mean of the sampling variances was much smaller than the random effect
variance for both the FH and FHCAR models, whereas New Jersey had roughly equal mean
sampling and random effect variances. The autocorrelation parameter in both states was
fairly low with a much wider credible interval in New Jersey than Montana. Figure 5.3a
confirms that the posterior means of the sampling variances of all iterations in Montana were
lower than the random effect variances of the FH and FHCAR models; however, for New
Jersey, the FH and FHCAR random effect variances were smaller than the mean sampling
variances (Fig. 5.3c). The posterior mean of all iterations for the FHCAR autocorrelation
parameter was also fairly low in Montana and slightly greater in New Jersey (Figs. 5.3b and
d).

81

Table 5.1: The median and 95% credible intervals (in parantheses) for Fay-Herriot (FH) and FH with conditional
autoregressive (FHCAR) models of a single iteration for Montana (iteration 750) and New Jersey (iteration 350).
Montana

New Jersey

Parameter FH
Î˛0
4.92 (4.50, 5.34)

FHCAR
4.91 (4.38, 5.46)

FH
7.42 (6.65, 8.13)

FHCAR
7.41 (6.59, 8.14)

Î˛1 , PD

0.078 (0.001, 0.146)

0.072 (0.007, 0.135)

-1.11x10â3
(-1.48x10â3 , -7.42x10â4 )

-1.09x10â3
(-1.45x10â3 , -7.00x10â4 )

Î˛2 , TFA

4.40x10â6
(2.70x10â6 , 6.08x10â6 )

4.52x10â6
1.12x10â5
â6
â6
(2.72x10 , 6.35x10 ) (-3.56x10â7 , 2.27x10â5 )

1.16x10â5
(1.14x10â6 , 2.23x10â5 )

E(ĎĚi2 )

0.16 (0.02, 0.67)

0.16 (0.02, 0.67)

0.23 (0.09, 0.75)

0.23 (0.09, 0.75)

Ďv2

1.14 (0.76, 1.77)

1.71 (0.95, 3.73)

0.21 (0.07, 0.68)

0.22 (0.07, 0.93)

Îť

â

0.14 (0.007, 0.62)

â

0.29 (0.01,0.92)

Note: PD, number of peopleÂˇkmâ2 ; TFA, total forest area (ha).

82

15

4

Parameter
~2
Ď
i
FH Ď2v
FHCAR Ď2v

3

Density

Density

10

2

5
1

0

0
1

2

3

0.2

Posterior Means

(a)
6

0.4

0.6

Posterior Means

(b)
Parameter
~2
Ď
i

4

FH Ď2v
FHCAR Ď2v
4

Density

Density

3

2

2
1

0

0
0.5

1.0

1.5

2.0

0.2

Posterior Means

0.3

0.4

0.5

0.6

Posterior Means

(c)

(d)

Figure 5.3: Distribution of posterior means from all iterations for select SAE model
parameters: (a) the variance parameters of Montana, (b) the FHCAR model
autocorrelation parameter of Montana, (c) the variance parameters of New Jersey; (d) the
FHCAR model autocorrelation parameter of New Jersey.

Table 5.2 shows the simulation summaries for the individual states. As an aside and not
surprisingly, this simulation study can empirically show that the direct estimator is nearly
unbiased (see Appendix, Tables B.1 â B.3). For Montana, the biases of the direct, FH,
and FHCAR model estimates were -3, -25, and -28 ownerships, respectively. A negative
value indicates an underestimate of the true population and a positive value indicates an

83

overestimate of the true population. In terms of relative bias, these statewide estimates
represent less than negative one-tenth of a percent. The RMSE was largest for the direct
estimates with a value of 467. Both the FH and FHCAR models had similar RMSE values
of 407 and 405. The empirical coverage rates were low for the direct, FH, and FHCAR
model estimates with values of 75.0%, 75.3%, and 75.3%. The 95% confidence or credible
interval widths, ordered widest to narrowest, were 1663, 1109 and 1097 for the direct, FH
and FHCAR model estimates, respectively. Overall, the MSE is reduced by 24.0% from the
direct to the FH model, by 24.8% from the direct to the FHCAR model, and by 1.0% from
the FH to the FHCAR model.
For New Jersey, the bias of the direct estimates was one ownership, the FH model was
49 ownerships, and the FHCAR model was 40 ownerships. In terms of relative bias, these
statewide estimates represent less than two-tenths of a percent. The RMSE was largest
for the direct estimates with a value of 633. The FH model had the lowest RMSE of 545,
whereas the FHCAR model was between the direct estimates and the FH model with a
value of 564. The empirical coverage rate was highest for the direct estimates followed by
the FH and FHCAR models with values of 93.7%, 90.7%, and 89.2%, respectively. The direct
estimates had the widest 95% confidence interval width of 3979 followed by 95% credible
interval widths of 2207 for the FH model and 2050 for the FHCAR model. Overall, the
MSE is reduced by 25.9% from the direct to the FH model, by 20.6% from the direct to the
FHCAR model, and by 6.6% from the FHCAR to the FH model.
For individual counties of Montana, the bias (Eq. 5.4) of the direct estimates ranged
from -29 (for Lincoln County) to 12 (for Lewis and Clark County) ownerships (Appendix B,
Tables B.1 and B.2). Eight of the counties were unbiased for the direct estimates. In
terms of relative bias, the direct estimates ranged from -3% to 5%. The FH model had a
bias ranging from -289 ownerships (for Stillwater County) to 444 ownerships (for Flathead
County). Treasure County was the lone unbiased county for the FH model. The relative
bias ranged from -26% to 41%, with a mean of -3%. The FHCAR model had a bias ranging

84

Table 5.2: Summary of bias, root mean squared error (RMSE), empirical coverage for a
95% nominal coverage rate and average 95% confidence interval width of the direct
estimate and 95% credible interval width of the small area estimation model estimates for
Montana and New Jersey across all counties and iterations for the direct, Fay-Herriot
(FH), and FH with conditional autoregressive random effects (FHCAR) model estimates.
Montana

New Jersey

Metric
Bias

Direct
-3

FH
-25

FHCAR Direct FH
-28
1
49

FHCAR
40

RMSE

467

407

405

633

545

564

Percent coverage

75.0

75.3

75.3

93.7

90.7

89.2

Confidence / credible
interval width

1663

1109

1097

3979

2207

2050

from -288 ownerships (for Musselshell County) to 498 ownerships (for Flathead County).
Daniels County was the only unbiased county for the FHCAR model. The relative bias
ranged from -25% to 30%, with a mean of -4%. These biases can be explained by the tradeoff of increasing precision with an associated increasing bias. Based on the relative MSE
comparison of Eq. 5.8, 49 of 56 counties showed improvement from the direct to the FH
model estimates, and 52 counties showed improvement from the direct to the FHCAR model
estimates. Thirty counties showed improvement from the FH to the FHCAR model (Fig.
5.4). Both the FH and FHCAR models had 38 counties with greater percent coverage than
the direct estimates. The credible interval widths were narrower in all but Flathead County
for both the SAE models compared with the direct estimate confidence interval widths.
These estimates for individual counties can be seen in Appendix B (Table B.6).
For individual counties of New Jersey, the bias of the direct estimates ranged from -27
ownerships (for Cumberland County) to 18 ownerships (for Burlington County) (Appendix B,
Table B.3). Four of the counties were unbiased for the direct estimates. In terms of relative
bias, the direct estimates ranged from -1% to 1%. The FH model had a bias ranging from
-477 ownerships (for Hunterdon County) to 673 ownerships (for Ocean County). No county
was unbiased for either the FH or FHCAR model. The relative bias ranged from -15% to
372% (for Union County). Removing Union County, which has a small true ownership size of

85

46, the maximum becomes 55% (for Cape May County) and the mean is 9%. The FHCAR
model had a bias ranging from -572 ownerships (for Hunterdon County) to 764 ownerships
(for Ocean County). The relative bias ranged from -17% to 420% (for Union County).
Removing Union County again, the maximum becomes 60% (for Cape May County) and the
mean 9%. Sixteen of 21 counties showed improvement based on relative MSE (eq. 5.8) from
the direct to the FH model and to the FHCAR model. Ten counties showed improvement
from the FH to the FHCAR model (Fig. 5.5). The FH and FHCAR models had 16 and 14
counties, respectively, with greater percent coverage than the direct estimates. The credible
interval widths were narrower in all counties for both of the SAE models compared with the
direct estimate confidence interval widths. These estimates for individual counties can be
seen in Appendix B (Tables B.4 and B.5).

86

1
0.8
0.6
0.4
0.2
0
â0.2
â0.4
â0.6
â0.8
â1

(a)

1
0.8
0.6
0.4
0.2
0
â0.2
â0.4
â0.6
â0.8
â1

(b)

1
0.8
0.6
0.4
0.2
0
â0.2
â0.4
â0.6
â0.8
â1

(c)

Figure 5.4: Relative mean squared error comparisons for Montana (eq. 5.8): (a) direct to
FH model estimates, (b) direct to FHCAR model estimates; (c) FH to FHCAR model
estimates.

87

2

2

2

1.6

1.6

1.6

1.2

1.2

1.2

0.8

0.8

0.8

0.4

0.4

0.4

0

0

0

â0.4

â0.4

â0.4

â0.8

â0.8

â0.8

â1.2

â1.2

â1.2

â1.6

â1.6

â1.6

â2

â2

â2

(a)

(b)

(c)

Figure 5.5: Relative mean squared error comparisons for New Jersey (eq. 5.8): (a) direct to
FH model estimates, (b) direct to FHCAR model estimates; (c) FH to FHCAR model
estimates.

5.4

Discussion

Recent publications for the NWOS have focused on private ownerships 4.05 ha or greater
in size (e.g., Butler et al. 2016c). Our original objective in this study was to include private
forest ownerships of at least 0.41 ha, the minimum forest area considered by the FIA. These
smallest ownerships were to be included as they comprise a large proportion of ownerships
in states like New Jersey relative to Montana. However, preliminary analyses showed large
positive biases in the direct estimator so we adjusted the minimum area threshold to two ha
to attain empirically unbiased direct estimates rather than increasing the threshold to 4.05
ha as the two ha threshold still accounted for a large proportion of these smallest ownerships.
The unbiased direct estimates were desirable for benchmarking against the SAE models, and
the potential bias induced by these smallest ownerships may require further investigation.
We found population density and total forest area to be significant SAE model covariates
in the majority of the iterations for both Montana and New Jersey. However, population

88

density had a positive relationship with number of private ownerships for Montana and a
negative association for New Jersey counties. A possible explanation for the differing relationships could be the very low population densities of Montana counties having not reached
a critical threshold in the forested landscape, whereas New Jersey is the most densely populated state in the US, causing a corresponding loss of forested land. Kittredge et al. (2008)
found that population densities greater than 96.5â193 peopleÂˇkmâ2 (250â500 peopleÂˇmileâ2 )
for towns in Massachusetts increased the number of forested parcels less than eight ha and
secondarily reduced the proportion of the landbase in forest due to land use change. Poudyal
and Hodges (2009) also found a negative relationship between population density and woodland ownership population size for Texas. In studying all private forest landowners, Pan et al.
(2009) found population density to decrease the mean forest landholding size in Alabama,
which in turn could relate to a larger number of ownerships due to parcellation. The average
county population density of Alabama was reported as 29 peopleÂˇkmâ2 (76 peopleÂˇmileâ2 ),
slightly higher than the highest density found in Montana, so at these densities a positive
relationship with number of ownerships is plausible. The positive relationship between number of ownerships and total forest area in both states is not surprising as a county must have
a forested landbase for private forest ownerships to be present. Poudyal and Hodges (2009)
have also demonstrated the need for total private forest area at the county-level in modeling
private forest owner population size.
Development of a true ownership population at the county-level is not novel; however,
the methods here are a substantial contribution with complete coverage of two states. Other
studies have used similar approaches to develop forestland ownerships for a single county
with interest at the individual parcel-level (e.g., Cho and Newman 2005, King and Butler
2005). Cho and Newman (2005) were interested in the probability of land use conversion from
forest to developed land; King and Butler (2005) were interested in modeling landholding size
for individual parcels. Each of these response variables is important to county-level forest
management options available to an individual landowner. In each of these studies, the

89

distance to the nearest road from the parcel was the most important independent variable.
At the county level, road density may be another important covariate for future exploration;
however, the inclusion of road density may be collinear with population density currently
accounted for in the SAE models. Mehmood and Zhang (2001) found a decline in statelevel average forest landholding size, which relates to an increase in the number of private
forest ownerships for the period of 1978 to 1994. The five factors associated with this
decrease in landholding size were death, urbanization, income, regulatory uncertainty, and
financial assistance. A key objective of the current study was assessing the feasibility of
applying SAE models to county-level ownerships and hence we did not exhaustively explore
the full suite of potentially useful covariates, such as those identified in Mehmood and Zhang
(2001). Next steps in developing this line of work will identify potential transformations to
existing covariates, e.g., percentage of county forested, and an expanded set of useful SAE
model covariates that explain variability in an expanded set of ownership characteristics and
increase county-level estimate precision.
One interpretation of the conditional autoregressive (CAR) random effect in FHCAR
model is to capture unobserved covariates that are spatially structured. For example, neighboring counties may have similar regulations or demographics that impact the number of
ownerships. However, including a CAR random effect did not yield substantial gains in
RMSE in Montana and actually increased RMSE for New Jersey. This, combined with the
relatively small CAR correlation parameter estimates, suggests that there is little local residual spatial structure and that the FH model with unstructured random effects is adequate.
The FH model was overall more successful than the direct estimates in terms of MSE and
RMSE for each state and the majority of the individual counties of the two states in this
study. Alternate covariates may be needed for applying SAE models to other responses of
the NWOS, such as socioeconomic variables of Poudyal and Hodges (2009).
For simplicity, our simulation study considered only iterations with at least two samples
per county. This was done to ensure that the direct estimator yielded estimates of Î¸ and

90

Ď 2 for each county. In practice, however, when the statewide sample size is small, we would
expect zero samples in many counties. In such settings, a SAE model can still be applied
by imputing the âmissingâ Î¸ and Ď 2 estimates (ideally with uncertainty quantification).
This can be viewed as a missing data problem that is easily handled within a Bayesian
paradigm (Banerjee et al. 2015, Gelman et al. 2014). Here, given county-level observed
covariates, prediction of missing Î¸s follow directly from the posited model, i.e., second line
in model (5.3). The missing Ď 2 s can be predicted using a log-normal regression comprising
available covariates and spatially structured random effects, or other generalized variance
function (GVF) approaches common in SAE literature Dick (see, e.g., 1995). Our future
work will explore the efficacy of such SAE missing data problems for county-level ownership
characteristics.
With any survey implementation and sampling strategy, there is a trade-off between
sample size (e.g., number of contacts, determined by expected response rate) and resources
(e.g., time and money). Given limited resources, the number of contacts is typically constrained, and therefore, sample size is heavily dependent on survey response rates. Although
ownerships contacted in the NWOS respond at higher rates than many other mail-based
surveys, the potential lack of sufficient responses at the county level limits the usefulness of
the data at scales at which such data can enhance landowner outreach, incentive programs,
and information campaigns. For statewide surveys with county-level coverage and possibly
small county-level sample sizes, the FH model presented in this paper improves the ability of
researchers and forest management practitioners to use the NWOS data, by adding simple
and widely available covariates for population density and forest area. These covariates have
been shown to relate to forest management (Zhang et al. 2005), as well, making their utility
in SAE models applicable to more variables other than number of ownerships. Further research will assess SAE models use for county-level inference about other important response
variables from the NWOS, e.g., ownerships with management plans. Such models will consider jointly modeling multiple response variables and those variables with non-Gaussian

91

distributions, e.g., binary, count, and multinomial survey items.

92

CHAPTER 6
RECOMMENDATIONS FOR FUTURE WORK

The movement towards model-based small area estimation (SAE) models for both biophysical and social forestry variables is sure to continue as demand for reliable estimates at various
resolutions increases. The use of the model-based framework is necessitated in SAE because
the sample sizes for the design-based framework do not provide adequate precision. The
large-scale forest inventories and surveys of the United States Department of Agriculture
(USDA) Forest Service Forest Inventory and Analysis (FIA) program and the associated
National Woodland Owner Survey (NWOS) will continue to be a primary source for information on US forest resources. In combining this primary data source with other disparate
data sources, the application of the hierarchical Bayesian (HB) SAE framework offers the
primary advantage of fully propagating uncertainty to the final outcomes of interest.
The primary objective of this dissertation was to contribute to the growing body of literature for biophysical forestry-related SAE applications and introduce SAE to social forestry
applications. The detailed objectives of the four main chapters were to: i) apply a HB
framework to increase the precision of estimates for biophysical forest variables at the stand
level by borrowing strength across all stands through the use of LiDAR covariates; ii) apply
a conditional autoregressive structure to the stand-level random effects to assess gains in precision of biophysical forest variables; iii) evaluate the current NWOS estimators of private
forest area and private forest landowner population size for a known population at the state
level, iv) present an alternative estimator of private forest landowner population size that
explicitly accounts for various nonresponse scenarios; v) evaluate the impacts of nonresponse
biases on each of these estimators; vi) produce county-level private forest ownership datasets
for two complete states; vi) define and assess SAE models to improve county-level inference
of the number of private forest ownerships, and; vii) develop open source software to fit the
proposed SAE models.
93

This dissertation presented SAE within a HB framework; and additionally added spatial
random effects that had often been ignored in prior forestry-related SAE applications. The
research also offered open source software methods to fit the proposed SAE models, so that
this can no longer be a deterrent to applying a Bayesian framework for SAE in forestry.
There are several avenues related to this dissertation for further investigation. First, the
SAE models for aboveground biomass, basal area, and volume demonstrated an increase in
precision over the standard design-based estimates; however, tree density did not demonstrate the same results and could be improved through the discovery of more informative
covariates. Second, the biophysical variables also showed additional increases in precision
when a conditional autoregressive structure was added for the area-level random effects; but
these too could be improved as the credible intervals on these parameters were fairly broad.
Further exploration is also warranted for applying alternate neighborhood structures, e.g.,
by stand type rather than adjacency, for the conditional autoregressive structure. Application of newer spatial random effects models from the statistical literature (e.g., Porter
et al. 2015, Datta et al. 2017) may be incorporated into forestry-related SAE to improve
precision. Third, as stronger recommendations are made for combining variable-radius plots
with LiDAR covariates, we may prefer to employ the finer resolution unit-level SAE models.
Within the social forestry applications demonstrated in the last two chapters, the most
obvious avenues to pursue are to test for nonresponse at the state level and apply the
SAE models to actual NWOS data. The first of these avenues is currently being explored
by the Family Forest Research Center for the state of Wisconsin using modifications to the
current NWOS estimator, which are similar to the weighting methods described in Chapter 4.
The two statewide datasets developed here for Montana and New Jersey may be useful for
comparison to the outcomes for the state of Wisconsin, as nonresponse mechanisms may
differ by state. As for the SAE models, the county-level methods for expanding NWOS may
not be feasible due to the requirement of knowing the size of surveyed ownerships at the
county level, and this current level of detail seems to be too burdensome for respondents

94

to the previous iterations of the NWOS. However, the SAE models could easily be adapted
for the state-level NWOS data where desired precision of certain outcomes, e.g., the number
of ownerships in the 1â9 acreage, are not being met. The application of the multivariate
methods of Chapter 3 may be useful for increasing precision in estimating the 1â9 acre
ownership group with another correlated outcome of interest. Additionally, the SAE models
could be adapted for other NWOS variables with non-Gaussian distributions, e.g., binary,
count, and multinomial survey items.
Finally, there is still much research to be done that can more directly link the social
forestry responses of the NWOS to the core biophysical forest variables of the USDA Forest
Service FIA program. The linkage between the two into a single modeling framework may
offer better understanding and more precise estimation of outcomes related to the largest
ownership group and future of the US forest resource. As these forest inventories and ownerships surveys will continue into the future, additional improvement in precision for outcomes
of interest may benefit from applying temporal and spatio-temporal SAE models.

95

APPENDICES

96

APPENDIX A
HB AND EBLUP PARAMETER ESTIMATES FOR FAY-HERRIOT MODEL
OF ABOVEGROUND BIOMASS

The empirical best linear unbiased prediction (EBLUP) for the Fay-Herriot model was fit via
restricted maximum likelihood using the hbsae R package (Boonstra 2012) for aboveground
biomass. The comparison in parameter estimates between the EBLUP and HB inference are
given in Table A.1.
Table A.1: The median and 95% credible intervals for the model parameters of the
Fay-Herriot (FH) with hierarchical Bayesian (HB) inference and the point estimates with
standard errors in parentheses for the FH under empirical best linear unbiased predictor
(EBLUP) were fit via restricted maximum likelihood.
Parameter FH HB
Î˛0
-51.99 (-69.15, -34.31)

FH EBLUP
-52.57 (8.52)

Î˛1 , P 25

24.24 (18.17, 30.08)

24.27 (2.95)

Î˛2 , P 75

3.66 (0.83, 6.42)

3.69 (1.39)

Ďv2

284.4 (201.2, 404.8)

241.1

97

APPENDIX B
MODEL SUMMARIES OF PRIVATE FOREST LANDOWNERS FOR
MONTANA AND NEW JERSEY BY COUNTY

Table B.1: Summary of the first 28 Montana counties by identification number (ID) and
name for true population total (Truth), sample size (n), and bias across all repeated
samples for the direct, Fay-Herriot (FH), and FH with conditional autoregressive random
effects (FHCAR) model estimates.
Bias
County ID
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28

Name
Silver Bow
Cascade
Yellowstone
Missoula
Lewis & Clark
Gallatin
Flathead
Fergus
Powder River
Carbon
Phillips
Hill
Ravalli
Custer
Lake
Dawson
Roosevelt
Beaverhead
Chouteau
Valley
Toole
Big Horn
Musselshell
Blaine
Madison
Pondera
Richland
Powell

Truth n Direct FH FHCAR
439
7
-6
144
131
1309 23
-12
-130
-160
1062 12
4
54
97
2782 27
-15
218
216
1941 38
12
17
-35
1804 29
-9
17
17
5209 30
-16
444
498
1239 44
6
-173
-224
344
18
-1
-22
-14
868
9
3
-185
-142
304
5
0
-36
-27
209
5
-5
-17
-22
1743 13
-8
-64
-103
461
18
-11
-101
-114
1589 10
9
-206
-190
467
8
-10
-90
-97
264
2
1
-27
-33
411
9
-3
119
119
354
8
-4
-60
-85
505
7
-2
-90
-105
72
2
0
27
16
618
30
-9
-68
-56
1247 31
-13
-285
-288
321
9
-6
-40
-32
838
21
0
-81
-87
125
3
2
22
4
489
5
-2
-86
-91
825
30
1
1
-3

98

Table B.2: Summary of the second set of 28 Montana counties by identification number
(ID) and name for true population total (Truth), sample size (n), and bias across all
repeated samples for the direct, Fay-Herriot (FH), and FH with conditional autoregressive
random effects (FHCAR) model estimates.
Bias
County ID
Name
Truth n Direct
29
Rosebud
448
32
6
30
Deer Lodge
454
7
1
31
Teton
296
7
-9
32
Stillwater
1133 14
-11
33
Treasure
175
14
-1
34
Sheridan
79
2
0
35
Sanders
1816 16
-12
36
Judith Basin
412
8
2
37
Daniels
154
2
0
38
Glacier
310
6
-7
39
Fallon
223
3
2
40
Sweet Grass
477
16
-1
41
McCone
156
2
1
42
Carter
296
9
-4
43
Broadwater
310
10
-8
44
Wheatland
180
8
0
45
Prairie
153
3
0
46
Granite
754
17
-6
47
Meagher
388
30
1
48
Liberty
58
2
0
49
Park
1032 26
-4
50
Garfield
256
8
-3
51
Jefferson
1385 16
2
52
Wibaux
148
4
3
53
Golden Valley
153
7
8
54
Mineral
520
4
9
55
Petroleum
163
6
-2
56
Lincoln
2857 19
-29

99

FH FHCAR
-31
-34
-17
-30
-35
-42
-289
-278
0
7
26
23
-37
-59
-35
-28
-1
0
-10
-30
-18
-20
-62
-58
-4
-1
-34
-27
-33
-21
-7
-2
-2
-6
-35
-27
6
12
24
17
-92
-53
-31
-38
-192
-254
-3
1
6
27
-2
18
1
14
183
144

Table B.3: Summary of New Jersey counties by identification number (ID) and name for
true population total (Truth), sample size (n), and bias across all repeated samples for the
direct, Fay-Herriot (FH), and FH with conditional autoregressive random effects (FHCAR)
model estimates.
Bias
County ID
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21

Name
Truth n Direct FH FHCAR
Atlantic
3842 18
17
-214
-489
Bergen
350
2
5
113
102
Burlington
3210 20
18
491
649
Camden
627
3
0
157
178
Cape May
874
5
-9
483
524
Cumberland 2591 13
-27
-47
-93
Essex
132
2
0
49
49
Gloucester
2092
9
-2
-202
-266
Hudson
33
2
0
-5
-5
Hunterdon
3785 16
-2
-477
-572
Mercer
1001
6
2
63
103
Middlesex
917
5
8
-68
-75
Monmouth
2139 10
8
-249
-363
Morris
2762 14
9
-193
-313
Ocean
1507 12
1
673
764
Passaic
522
3
2
192
201
Salem
1831
9
4
54
90
Somerset
1629
8
-4
-60
-47
Sussex
3013 18
-4
224
284
Union
46
2
0
171
193
Warren
2739 14
-13
-123
-77

100

Table B.4: Summary of the first 28 Montana counties by identification number (ID) for
root mean squared error (RMSE), empirical coverage for a 95% nominal coverage rate, and
95% confidence interval width for the direct estimator and 95% credible interval width for
the two small area estimation models across all repeated samples for the direct, Fay-Herriot
(FH), and FH with conditional autoregressive random effects (FHCAR) model estimates.
RMSE
County ID
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28

Direct FH FHCAR
341
418
405
678
543
536
571
536
556
990
965
942
867
742
703
813
701
695
1220 1304
1323
531
414
410
224
151
159
495
382
381
264
174
175
223
152
153
780
643
643
384
245
235
710
586
594
319
213
214
244
165
161
314
365
367
295
187
173
314
215
214
73
78
70
372
269
270
570
488
486
251
164
169
531
394
395
163
126
107
308
217
219
465
382
369

Percent coverage
Direct
70.2
81.6
80.3
88.9
85.6
85.9
93.6
79.7
72.4
79.1
70.0
59.5
87.6
56.5
88.0
72.7
87.9
66.7
65.9
76.6
84.8
73.5
82.2
69.6
74.9
68.2
80.8
77.1

101

FH
78.9
84.2
84.0
93.1
90.3
91.3
94.7
79.2
75.5
76.1
72.5
62.5
90.5
59.0
87.4
72.4
64.1
76.1
67.8
73.0
68.9
77.2
77.2
73.0
78.8
70.7
77.0
82.1

FHCAR
79.0
83.5
84.5
93.3
90.4
91.5
94.6
77.2
76.1
78.2
73.5
61.7
89.8
58.2
87.6
71.6
63.8
75.6
66.1
71.6
69.6
78.1
76.8
73.6
78.5
69.5
76.3
82.5

Confidence or
credible interval width
Direct
1236
2513
2283
3964
3203
3117
5252
1828
625
2018
1031
749
3327
1089
3269
1116
4181
1061
962
1159
1196
1223
2072
795
1823
818
1391
1564

FH
1195
1960
2001
3928
2881
2813
5401
1399
469
1176
526
391
2646
663
2215
647
441
1007
551
662
204
918
1427
505
1349
330
691
1339

FHCAR
1165
1894
2054
3862
2738
2791
5426
1301
483
1260
534
391
2585
620
2272
636
421
1011
479
631
187
929
1410
520
1343
282
685
1301

Table B.5: Summary of the second set of 28 Montana counties by identification number
(ID) for root mean squared error (RMSE), empirical coverage for a 95% nominal coverage
rate, and 95% confidence interval width for the direct estimator and 95% credible interval
width for the two small area estimation models across all repeated samples for the direct,
Fay-Herriot (FH), and FH with conditional autoregressive random effects (FHCAR) model
estimates.
RMSE
County ID
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56

Direct
302
352
292
599
151
51
671
295
166
270
212
352
185
212
301
194
172
462
306
74
570
247
632
196
191
382
153
893

FH
204
253
193
499
104
63
586
210
118
195
138
232
142
140
198
122
117
356
245
82
441
159
519
122
121
300
105
896

FHCAR
190
240
187
491
116
62
572
211
122
188
137
230
140
149
209
125
109
362
237
73
449
147
520
127
137
328
117
886

Percent coverage
Direct
75.0
77.9
61.7
81.8
69.0
96.0
87.8
72.7
84.8
65.5
76.5
71.0
75.2
68.3
59.6
62.2
70.2
75.5
60.8
71.2
77.8
57.2
86.3
59.4
63.4
72.5
71.9
88.9

102

FH FHCAR
80.7
81.3
80.9
81.1
67.8
67.4
76.6
77.6
77.5
77.5
76.6
75.9
89.8
89.8
75.4
75.9
64.0
63.7
70.9
68.3
68.8
68.5
74.8
75.4
52.1
53.6
69.9
70.2
65.7
66.8
67.9
68.7
66.5
67.0
81.2
81.3
67.2
68.6
57.0
58.1
81.0
82.0
64.2
64.5
85.8
83.2
60.7
61.0
73.3
76.5
70.3
69.8
78.5
79.5
91.8
91.6

Confidence or
credible interval width
Direct
870
1377
952
2292
402
1139
2749
1057
2474
902
1147
1066
2854
650
864
509
842
1603
892
1111
2037
737
2509
674
532
1860
487
3701

FH FHCAR
665
630
889
837
552
530
1401
1406
305
327
199
202
2343
2290
679
683
292
304
557
532
384
381
723
721
315
307
417
439
551
581
323
327
297
279
1252
1270
727
696
184
167
1574
1628
436
400
1810
1678
306
316
332
363
944
1026
307
329
3589
3562

Table B.6: Summary of New Jersey counties by identification number (ID) for root mean
squared error (RMSE), empirical coverage for a 95% nominal coverage rate, and 95%
confidence interval width for the direct estimator and 95% credible interval width for the
two small area estimation models across all repeated samples for the direct, the Fay-Herriot
(FH), and the FH with conditional autoregressive random effects (FHCAR) model
estimates.
RMSE
County ID
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21

Direct
942
247
975
354
417
842
75
644
18
906
457
451
621
779
701
359
617
550
912
30
847

FH FHCAR
732
790
162
157
1041
1131
276
288
637
682
578
528
78
78
473
472
17
17
832
864
292
284
270
248
454
499
522
510
874
937
323
320
484
494
324
295
767
776
186
207
616
588

Percent coverage
Direct
97.3
95.7
93.1
91.4
90.2
93.1
98.9
94.9
96.2
96.1
91.4
89.8
97.2
97.8
88.4
85.2
90.8
96.4
92.0
100.0
91.6

103

FH
98.5
98.4
95.6
93.8
89.2
96.7
96.2
96.0
67.8
94.3
96.0
92.3
97.9
99.2
93.9
89.0
93.3
98.8
96.2
26.9
93.8

FHCAR
96.4
98.6
95.2
93.8
86.5
97.0
93.5
95.2
66.7
92.0
96.7
92.9
96.2
98.8
90.0
87.6
93.7
99.1
96.2
13.6
94.6

Confidence or
credible interval width
Direct
5254
12481
4499
3925
2668
4114
2933
3839
448
4898
2803
2777
4080
5137
3413
3480
3000
3972
4310
1521
4009

FH
3947
939
4439
1302
2058
2978
347
2362
54
3437
1587
1251
2425
3368
3215
1179
2177
2279
3660
448
2898

FHCAR
3359
843
4471
1265
2095
2712
293
2125
52
3190
1449
1125
2124
2761
3074
1061
2192
2015
3610
401
2833

BIBLIOGRAPHY

104

BIBLIOGRAPHY

C. Babcock, A. O. Finley, J. B. Bradford, R. Kolka, R. Birdsey, and M. G. Ryan. LiDAR based prediction of forest biomass using hierarchical models with spatially varying
coefficients. Remote Sens Environ, 169:113â127, 2015.
C. Babcock, A. O. Finley, B. D. Cook, A. R. Weiskittel, and C. W. Woodall. Modeling
forest biomass and growth: Coupling long-term inventory and LiDAR data. Remote Sens
Environ, 182:1â12, 2016.
S. Banerjee, B. P. Carlin, and A. E. Gelfand. Hierarchical modeling and analysis for spatial
data. CRC Press, Boca Raton, Florida, 2 edition, 2015. 558 p.
Base Map Service Center Montana State Library. Montana Cadastral Mapping Project, 2015.
URL https://svc.mt.gov/msl/mtcadastral. last accessed May 12, 2015.
W. A. Bechtold and P. L. Patterson. The enhanced Forest Inventory and Analysis programâ
national sampling design and estimation procedures. Technical report, Gen. Tech. Rep.
SRS-80. Asheville, NC: U.S. Department of Agriculture, Forest Service, Southern Research
Station, 2005. 85 p.
H. J. Boonstra. hbsae: Hierarchical Bayesian Small Area Estimation, 2012. URL https:
//CRAN.R-project.org/package=hbsae. R package version 1.0.
J. Breidenbach and R. Astrup. Small area estimation of forest attributes in the Norwegian
National Forest Inventory. Eur J Forest Res, 131:1255â1267, 2012.
J. Breidenbach, R. E. McRoberts, and R. Astrup. Empirical coverage of model-based variance
estimators for remote sensing assisted estimation of stand-level timber volume. Remote
Sens Environ, 173:274â281, 2016.
G. Brown, R. Chambers, P. Heady, and D. Heasman. Evaluation of small area estimation
methods-an application to unemployment estimates from the UK LFS. In Proceedings of
Statistics Canada Symposium, 2001.
B. J. Butler. Family forest owners of the united states, 2006. Technical report, Gen. Tech.
Rep. NRS-27. Newtown Square, PA: U.S. Department of Agriculture, Forest Service,
Northern Research Station, 2008.
B. J. Butler, E. C. Leatherberry, and M. S. Williams. Design, implementation, and analysis
methods for the National Woodland Owner Survey. Technical report, Gen. Tech. Rep.
NE-336. Newtown Square, PA: U.S. Department of Agriculture, Forest Service, Northern
Research Station, 2005. 43 p.
B. J. Butler, B. J. Dickinson, J. H. Hewes, S. M. Butler, K. Andrejczyk, and M. MarkowskiLindsay. USDA Forest Service National Woodland Owner Survey, 2011â2013: Design,
implementation, and estimation methods. Technical report, Gen. Tech. Rep. NRS-157.
105

Newtown Square, PA: U.S. Department of Agriculture, Forest Service, Northern Research
Station, 2016a. 43 p.
B. J. Butler, J. H. Hewes, B. J. Dickinson, K. Andrejczyk, M. Markowski-Lindsay, and S. M.
Butler. Usda forest service national woodland owner survey: national, regional, and state
statistics for family forest and woodland ownerships with 10+ acres, 2011â2013. Technical
report, Res. Bull. NRS-99. Newtown Square, PA: U.S. Department of Agriculture, Forest
Service, Northern Research Station, 2016b. 39 p.
B. J. Butler, J. H. Hewes, B. J. Dickinson, K. Andrejczyk, S. M. Butler, and M. MarkowskiLindsay. Family forest ownerships of the united states, 2013: Findings from the USDA
Forest Serviceâs National Woodland Owner Survey. J Forest, 114(6):638â647, 2016c. doi:
10.5849/jof.15-099.
G. Casella and R. Berger. Statistical Inference. Duxbury, Pacific Grove, Califonia, 2 edition,
2002. 660 p.
S. Cho and D. H. Newman. Spatial analysis of rural land development. Forest Policy Econ,
7(5):732â744, 2005. doi: 10.1016/j.forpol.2005.03.008.
A. Datta, S. Banerjee, and J. S. Hodges. Spatial disease mapping using directed acyclic
graph auto-regressive (dagar) models. arXiv preprint arXiv:1704.07848, 2017.
R. K. Deo, R. E. Froese, M. J. Falkowski, and A. T. Hudak. Optimizing variable radius plot
size and LiDAR resolution to model standing volume in conifer forests. Can J Remote
Sens, 42(5):428â442, 2016.
P. Dick. Modelling net undercoverage in the 1991 Canadian census. Surv Methodol, 21:
45â54, 1995.
B. J. Dickinson and B. J. Butler. Methods for estimating private forest ownership statistics:
Revised methods for the usda forest serviceâs national woodland owner survey. J Forest,
111(5):319â325, 2013. doi: 10.5849/jof.12-088.
B. J. Dickinson, B. J. Butler, S. M. Butler, and E. S. Huff. Usda forest service, national
woodland owner survey 2011-2013: Detailed estimation procedures. Technical report, Unpublished report. On file with: U.S. Department of Agriculture, Forest Service, Northern
Research Station, Newtown Square, PA 19073, in press.
G. M. Domke, C. W. Woodall, B. F. Walters, R. E. McRoberts, and M. A. Hatfield. Strategies
to compensate for the effects of nonresponse on forest carbon baseline estimates from the
national forest inventory of the united states. Forest Ecol Manag, 315:112â220, 2014.
R. E. Fay and R. A. Herriot. Estimation of income for small places: An application of
james-stein procedures to census data. J Am Stat Assoc, 74:268â277, 1979.
A. O. Finley, S. Banerjee, B. D. Cook, and J. B. Bradford. Hierarchical bayesian spatial
models for predicting multiple forest variables using waveform lidar, hyperspectral imagery,
and large inventory datasets. Int J Appl Earth Obs, 22:147â160, 2013.
106

A. O. Finley, S. Banerjee, Y. Zhou, B. D. Cook, and C. Babcock. Joint hierarchical models
for sparsely sampled high-dimensional LiDAR and forest variables. Remote Sens Environ,
190:149â161, 2017.
J. C. Finley, S. B. Jones, A. S. Reed, M. G. Jacobson, and G. R. Glover. Finding a name to
fit the owner. J For, 99(3):48, 2001.
A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian data analysis. Chapman
& Hall/CRC, Boca Raton, Florida, 3 edition, 2014. 661 p.
M. E. Goerndt, V. J. Monleon, and H. Temesgen. Relating forest attributes with area-and
tree-based light detection and ranging metrics for western oregon. West North J Appl For,
25(3):105â111, 2010.
M. E. Goerndt, V. J. Monleon, and H. Temesgen. A comparison of small-area estimation
techniques to estimate selected stand attributes using LiDAR-derived auxiliary variables.
Can J For Res, 41(6):1189â1201, 2011.
M. E. Goerndt, V. J. Monleon, and H. Temesgen. Small-area estimation of county-level
forest attributes using ground data and remote sensed auxiliary information. For Sci, 59
(5):536â548, 2013.
T. G. Gregoire. Design-based and model-based inference in survey sampling: appreciating
the difference. Can J For Res, 28(10):1429â1447, 1998.
T. G. Gregoire, E. NĂŚsset, R. E. McRoberts, G. StaĚhl, H.-E. Andersen, T. Gobakken, L. Ene,
and R. Nelson. Joint hierarchical models for sparsely sampled high-dimensional LiDAR
and forest variables. Remote Sens Environ, 173:98â108, 2016.
M. H. Hansen and W. N. Hurwitz. On the theory of sampling from finite populations. Ann
Math Stat, 14(4):333â362, 1943.
R. Hayashi, J. A. Kershaw, Jr, and A. Weiskittel. Evaluation of alternative methods for
using lidar to predict aboveground biomass in mixed species and structurally complex
forests in northeastern North America. Math Comput For Nat Res Sci, 7(2):49â65, 2015.
R. Hayashi, A. Weiskittel, and J. A. Kershaw, Jr. Influence of prediction cell size on LiDARderived area-based estimates of total volume in mixed-species and multicohort forests in
northeastern north america. Can J Remote Sens, 42(5):473â488, 2016.
M. Hollaus, W. Wagner, B. Maier, and K. Schadauer. Airborne laser scanning of forest stem
volume in a mountainous environment. Sensors, 7(8):1559â1577, 2007.
M. Hollaus, W. Wagner, K. Schadauer, B. Maier, and K. Gabler. Growing stock estimation
for alpine forests in Austria: a robust lidar-based approach. Can J For Res, 39(7):1387â
1400, 2009.
T. G. Honer. Standard volume tables and merchantable conversion factors for the commercial
tree species of central and eastern Canada. Technical report, Information Report FMRX-5. Ottawa, Ontario, Canada: Forest Management Research Institute, 1967.
107

A. T. Hudak, A. T. Haren, N. L. Crookston, R. J. Liebermann, and J. L. Ohmann. Imputing forest structure attributes from stand inventory and remotely sensed data in western
Oregon, USA. For Sci, 60(2):253â269, 2014.
M. Isenburg. LAStools â efficient LiDAR processing software (version 160721, unlicensed),
2016. URL http://rapidlasso.com/LAStools.
J. C. Jenkins, D. C. Chojnacky, L. S. Heath, and R. A. Birdsey. National-scale biomass
estimators for United States tree species. For Sci, 49(1):12â35, 2003.
S. Jin, L. Yang, P. Danielson, C. Homer, J. Fry, and G. Xian. A comprehensive change
detection method for updating the National Land Cover Database to circa 2011. Remote
Sens Environ, 132:159â175, 2013. doi: 10.1016/j.rse.2013.01.012.
B. R. Kaetzel, I. Majumdar, L. D. Teeter, and B. J. Butler. Regional differences among
family forest landowners using national woodland owner survey results. South J Appl For,
36(3):141â145, 2012.
M. A. Kilgore, S. A. Snyder, K. Block-Torgerson, and S. J. Taff. Challenges in characterizing
a parcelized forest landscape: Why metric, scale, threshold, and definitions matter. Landsc
Urban Plan, 110:36â47, 2013.
S. L. King and B. J. Butler. Generating a forest parcel map for Madison County, NY.
Technical report, In, edited by Michael Bevers and Tara M. Barrett, Gen. Tech. Rep.
PNW-GTR-656. Portland, OR: U.S. Department of Agriculture, Forest Service, Pacific
Northwest Research Station, 2005. 147â155.
N. P. Kingsley and J. C. Finley. The forest-land owners of delaware. Technical report,
Res. Bull. NE-38. U.S. Department of Agriculture, Forest Service, Northeastern Forest
Experiment Station, 1975. 19 p.
D. B. Kittredge, A. W. DâAmato, P. Catanzaro, J. Fish, and B. J. Butler. Estimating
ownerships and parcels of nonindustrial private forestland in massachusetts. North J Appl
For, 25(2):93â98, 2008.
K. Kronseder, U. Ballhorn, V. BoĚhm, and F. Siegert. Above ground biomass estimation
across forest types at different degradation levels in Central Kalimantan using LiDAR
data. Int J Appl Earth Obs, 18:37â48, 2012.
M. H. Kutner, C. J. Nachtsheim, and J. Neter. Applied linear regression models. McGrawHill/Irwin, New York, New York, 4 edition, 2004. 701 p.
M. A. Lefsky, W. B. Cohen, G. G. Parker, and D. J. Harding. LiDAR remote sensing for
ecosystem studies. Bioscience, 52(1):19â30, 2002.
R. J. A. Little and D. B. Rubin. Statistical analysis with missing data. John Wiley & Sons,
Inc, Hoboken, New Jersey, 2 edition, 2014. 381 p.
S. L. Lohr. Sampling: design and analysis. Duxbury Press, Pacific Grove, California, 2
edition, 2010. 608 p.
108

S. Magnussen, D. Mandallaz, J. Breidenbach, A. Lanz, and C. Ginzler. National forest
inventories in the service of small area estimation of stem volume. Can J For Res, 44(9):
1079â1090, 2014. doi: 10.1139/cjfr-2013-0448.
M. Maltamo, K. T. Korhonen, P. PackaleĚn, L. MehtaĚtalo, and A. Suvanto. A test on the
usability of truncated angle count sample plots as ground truth in airborne laser scanning
based forest inventory. Forestry, 80:73â81, 2007.
M. Maltamo, P. PackaleĚn, A. Suvanto, K. T. Korhonen, L. MehtaĚtalo, and P. HyvoĚnen.
Combining ALS and NFI training data for forest management planning: a case study in
Kuortane, Western Finland. Eur J For Res, 128:305â317, 2009.
F. Mauro, V. J. Monleon, and H. Temesgen. Using small area estimation and lidar-derived
variables for multivariate prediction of forest attributes. In S. M. Stanton and G. A.
Christensen, editors, Gen. Tech. Rep. PNW-GTR-931, pages 73â77. U.S. Department of
Agriculture, Forest Service, Pacific Northwest Research Station. 384 p., December 8â10
2015.
F. Mauro, I. Molina, A. GarcÄąĚa-Abril, R. Valbuena, and E. Ayuga-TeĚllez. Remote sensing
estimates and measures of uncertainty for forest variables at different aggregation levels.
Environmetrics, 27(4):225â238, 2016.
R. E. McRoberts. Compensating for missing plot observations in forest inventory estimation.
Can J For Res, 33(10):1990â1997, 2003.
R. E. McRoberts, W. B. Cohen, E. NĂŚsset, S. V. Stehman, and E. O. Tomppo. Using
remotely sensed data to construct and assess forest attribute maps and related spatial
products. Scand J Forest Res, 25(4):340â367, 2010.
R. E. McRoberts, E. NĂŚsset, and T. Gobakken. Inference for lidar-assisted estimation of
forest growing stock volume. Remote Sens Environ, 128:268â275, 2013.
S. R. Mehmood and D. Zhang. Forest parcelization in the United States: A study of contributing factors. J Forest, 99(4):30â34, 2001.
A. L. Metcalf. Human dimensions of private forestland ownership: sampling, estimation, decision making processes, and implications. PhD thesis, The Pennsylvania State University,
2010. 264 p.
A. L. Metcalf, J. C. Finley, A. E. Luloff, D. Shumway, and R. C. Stedman. Private forest
landowners: Estimating population parameters. J Forest, 110(7):362â370, 2012.
A. L. Metcalf, J. C. Finley, A. E. Luloff, R. C. Stedman, and D. Shumway. Progress in
private forest landowner estimation. J Forest, 112(3):312â315, 2014.
E. NĂŚsset, T. Gobakken, J. Holmgren, H. HyyppaĚ, J. HyyppaĚ, M. Maltamo, M. Nilsson,
H. Olsson, AĚ. Persson, and U. SoĚderman. Laser scanning of forest resources: the Nordic
experience. Scand J Forest Res, 19(6):482â499, 2004.

109

G. Nettheim, G. D. Meyers, and D. Craig. Environmental and Natural Resources Management by Indigenous Peoples in the United States [online], chapter Indigenous Peoples
and Governance Structures: A Comparative Analysis of Land and Resource Management Rights, pages 27â60. Aboriginal Studies Press, Canberra, Australia, 2002. URL
http://search.informit.com.au/documentSummary;dn=408415504221651;res=IELIND.
last accessed Apr 4, 2016.
New Jersey Office of Information Technology, Office of GIS. New Jersey Geographic Information Network, 2015. URL https://njgin.state.nj.us/NJ$ $NJGINExplorer/index.jsp.
last accessed Dec 21, 2015.
B. M. OâConnell, E. B. LaPoint, J. A. Turner, T. Ridley, S. A. Pugh, A. M. Wilson, K. L.
Waddell, and B. L. Conkling. The forest inventory and analysis database: Database
description and users manual version 6.0.2 for phase 2. Technical report, U.S. Department
of Agriculture, Forest Service, 2015. URL http://www.fia.fs.fed.us/library/databasedocumentation. 748 p, last accessed May 15, 2015.
J. D. Opsomer, F. J. Breidt, G. G. Moisen, and G. Kauermann. Model-assisted estimation
of forest resources with generalized additive models. J Am Stat Assoc, 102(478):400â409,
2007.
S. N. Oswalt, W. B. Smith, P. D. Miles, and S. A. Pugh. Forest resources of the united
states, 2012: a technical document supporting the forest service 2015 update of the rpa
assessment. Technical report, Gen. Tech. Rep. WO-91. Washington, DC: U.S. Department
of Agriculture, Forest Service, Washington Office, 2014. 218 p.
Y. Pan, Y. Zhang, and I. Majumdar. Population, economic welfare and holding size distribution of private forestland in Alabama, USA. Silva Fenn, 43(1):161â171, 2009.
P. L. Patterson, J. W. Coulston, F. A. Roesch, J. A. Westfall, and A. D. Hill. A primer for
nonresponse in the us forest inventory and analysis program. Environ Monit Assess, 184
(3):1423â1433, 2012.
D. Pfefferman. New important developments in small area estimation. Stat Sci, 28(1):40â68,
2013.
A. T. Porter, C. K. Wikle, and S. H. Holan. Small area estimation via multivariate FayHerriot models with latent spatial dependence. Aust Nz J of Stat, 57(1):15â29, 2015. doi:
10.1111/anzs.12101.
N. C. Poudyal and D. G. Hodges. Property taxation and rural land values: Their effect on
private forestland ownership structure in Texas. Land Use Policy, 26(4):1100â1105, 2009.
doi: 10.1016/j.landusepol.2009.01.005.
QGIS Development Team. QGIS Geographic Information System. Open Source Geospatial
Foundation, 2014. URL http:/qgis.osgeo.org.
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation
for Statistical Computing, Vienna, Austria, 2015. URL https://www.R-project.org/.
110

J. N. K. Rao and I. Molina. Small area estimation. John Wiley & Sons, Inc, Hoboken, New
Jersey, 2 edition, 2015. 441 p.
B. Rice, A. R. Weiskittel, and R. G. Wagner. Efficiency of alternative forest inventory
methods in partially harvested stands. Eur J For Res, 133(2):261â272, 2014.
K. Sahr. DGGRID version 6.1, 2011. URL http://www.discreteglobalgrids.org/software/.
C.-E. SaĚrndal, B. Swensson, and J. Wretman. Model assisted survey sampling. SpringerVerlag, New York, New York, 1992. 694 p.
G. Scrinzi, F. Clementel, and A. Floris. Angle count sampling reliability as ground truth for
area-based LiDAR applications in forest inventories. Can J For Res, 45(4):506â514, 2015.
W. B. Smith, P. D. Miles, J. S. Vissage, and S. A. Pugh. Forest resources of the united
states, 2002. Technical report, Gen. Tech. Rep. NC-241. St. Paul, MN: U.S. Department
of Agriculture, Forest Service, North Central Research Station, 2004. 137 p.
W. B. Smith, P. D. Miles, D. Patrick, C. H. Perry, and S. A. Pugh. Forest resources
of the united states, 2007: a technical document supporting the forest service 2010 rpa
assessment. Technical report, Gen. Tech. Rep. WO-78. Washington, DC: U.S. Department
of Agriculture, Forest Service, Washington Office, 2009. 337 p.
US Census Bureau. 2010 Census Data, 2016. URL https://www.census.gov/2010census/
data. last accessed Aug 12, 2016.
G. A. P. US Geological Survey. Protected Areas Database of the United States (PAD-US),
version 1.4 Combined Feature Class, 2016. URL https://gapanalysis.usgs.gov/padus. last
accessed Dec 29, 2015.
J. A. N. van Aardt, R. H. Wynne, and R. G. Oderwald. Forest volume and biomass estimation
using small-footprint lidar-distributional parameters on a per-segment basis. For Sci, 52
(6):636â649, 2006.
J. J. Vaske. Survey research and analysis: Applications in parks, recreation and human
dimensions. Venture Publishing, State College, PA, 2008. 635 p.
N. R. Ver Planck, A. L. Metcalf, A. O. Finley, and J. C. Finley. Evaluation of the USDA
Forest Service National Woodland Owner Survey estimators for private forest area and
landowners: a case study of Montana. For Sci, 62(5):525â534, 2016. doi: 10.5849/forsci.15196.
N. R. Ver Planck, A. O. Finley, J. A. Kershaw, Jr., A. R. Weiskittel, and M. C. Kress.
[dataset] hierarchical Bayesian models for small area estimation of forest variables using
LiDAR. Mendeley Data, v1, 2017a. doi: 10.17632/vyfrnkx39h.1.
N. R. Ver Planck, A. O. Finley, and E. S. Huff. Replication data for: Hierarchical bayesian
models for small area estimation of county-level private forest landowner population. Harvard Dataverse, 2017b. doi: 10.7910/DVN/A3ROX.

111

N. R. Ver Planck, A. O. Finley, and E. S. Huff. Hierarchical Bayesian models for small
area estimation of county-level private forest landowner population. Can J Forest Res, 47:
1577â1589, 2017c. doi: 10.1139/cjfr-2017-0154.
N. R. Ver Planck, A. O. Finley, J. A. Kershaw, Jr., A. R. Weiskittel, and M. C. Kress.
Hierarchical Bayesian models for small area estimation of forest variables using LiDAR.
Remote Sens Environ, 204:287â295, 2018. doi: 10.1016/j.rse.2017.10.024.
M. Woods, D. Pitt, M. Penner, K. Lim, D. Nesbitt, D. Etheridge, and P. Treitz. Operational
implementation of a LiDAR inventory in boreal Ontario. Forest Chron, 87(4):512â528,
2011.
Y. You and Q. M. Zhou. Hierarchical Bayes small area estimation under a spatial model
with application to health survey data. Surv Methodol, 37(1):25â37, 2011.
Y. Zhang, D. Zhang, and J. Schelhas. Small-scale non-industrial private forest ownership in
the united states: rationale and implications for forest management. Silva Fenn, 39(3):
443â454, 2005.

112