HIERARCHICAL BAYESIAN MODELS FOR SMALL AREA ESTIMATION OF BIOPHYSICAL AND SOCIAL FORESTRY VARIABLES By Neil Ryan Ver Planck A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Forestry – Doctor of Philosophy 2018 ABSTRACT HIERARCHICAL BAYESIAN MODELS FOR SMALL AREA ESTIMATION OF BIOPHYSICAL AND SOCIAL FORESTRY VARIABLES By Neil Ryan Ver Planck Forest inventories and surveys, accounting for time and cost constraints, are typically designed to yield accurate and precise estimates of population means and totals over large spatial domains. In many instances, these inventories and surveys also offer reliable inference for smaller subpopulations with sufficient sample observations; however, there is growing demand for valid and precise estimates at levels that have smaller sample sizes based on the original sample design. One solution to this problem is application of small area estimation methods. Small area estimation (SAE) is a model-based approach that couples a direct estimate and possible covariates to improve the estimate precision and, in some cases, accuracy. Unlike a standard linear regression, the SAE framework is comprised of two components: a sampling model and a linking model. Estimation of the SAE parameter of interest accounts for and balances between the sampling (i.e., direct estimator) and linking model errors. The linking model is a linear model with random effects that relate the small areas of interest with some error. Additional spatial structure might still remain in the linking model after accounting for possible covariates. Such residual structure can be further modeled using spatial random effects. This dissertation presents SAE methods within a hierarchical Bayesian (HB) framework. This framework is applied to common biophysical forest inventory outcomes of interest (i.e., aboveground biomass, basal area, volume, and tree density) at the stand level, and to the social forestry survey outcomes of private forest landowner populations. Furthermore, an in depth examination of the direst estimator, in the presence of nonresponse, is assessed for private forest landowner population size. The primary objectives of this dissertation are: i) to apply a HB framework to increase the precision of estimates for biophysical forest variables at the stand level by borrowing strength across all stands through the use of LiDAR covariates; ii) to apply a conditional autoregressive structure to the stand-level random effects to assess gains in precision of biophysical forest variables; iii) to evaluate the current National Woodland Owner Survey estimators of private forest area and private forest landowner population size for a known population at the state level; iv) to present an alternative estimator of private forest landowner population size that explicitly accounts for various nonresponse scenarios; v) to evaluate the impacts of nonresponse biases on each of these estimators; vi) to produce county-level private forest ownership datasets for two complete states; vi) to define and assess SAE models to improve county-level inference of the number of private forest ownerships, and; vii) to develop open source software to fit proposed SAE models. This dissertation is dedicated to Liisa. iv ACKNOWLEDGMENTS I would like to sincerely thank my dissertation advisor and mentor Dr. Andrew O. Finley for all of his guidance and help throughout my graduate studies. I would also like to thank all of my committee members: Dr. David W. MacFarlane, Dr. Aaron R. Weiskittel, Dr. Christopher W. Woodall, and Dr. Phoebe L. Zarnetske for all of their time and assistance with this dissertation and my research program. Many additonal thanks to the following collaborators: Dr. James C. Finley, Dr. Emily S. Huff, Dr. John A. Kershaw, Jr., and Dr. Alexander L. Metcalf for providing data and thoughtful feedback regarding my research. Additional acknowlegments to the editors and anonymous reviewers of this research at the Canadian Journal of Forest Research, Forest Science, and Remote Sensing of Environment. Additional thanks to the lab group members, during my tenure, of the Geospatial Lab: Chad Babcock, Gloria Desanker, Malcolm Itter, Megan Kress, Jason Matney, Daniel Taylor, Yuzhen Zhou for providing assistance, collaboration, and motivation. Most importantly, a special thanks to my wife, Liisa, for all of her patience, love, and support. I would also like to thank my entire family for all of their support and encouragement throughout my graduate study years. v TABLE OF CONTENTS LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x CHAPTER 1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 CHAPTER 2 HIERARCHICAL BAYESIAN MODELS FOR SMALL AREA TIMATION OF FOREST VARIABLES USING LIDAR . . . 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.1 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.1.1 Study area . . . . . . . . . . . . . . . . . . . . . . . 2.2.1.2 Variable radius plot data . . . . . . . . . . . . . . . . 2.2.1.3 LiDAR data . . . . . . . . . . . . . . . . . . . . . . . 2.2.2 Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.2.1 Direct estimator and small area models . . . . . . . . 2.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ES. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 7 10 10 10 10 11 12 12 15 22 CHAPTER 3 MULTIVARIATE HIERARCHICAL BAYESIAN MODELS FOR SMALL AREA ESTIMATION OF FOREST VARIABLES USING LIDAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2.1 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2.1.1 Study area . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2.1.2 Variable radius plot data . . . . . . . . . . . . . . . . . . . . 3.2.1.3 LiDAR data . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2.2 Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2.2.1 Direct estimator and small area models . . . . . . . . . . . . 3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 26 28 28 28 29 30 31 31 34 44 CHAPTER 4 EVALUATION OF THE NATIONAL WOODLAND OWNER SURVEY ESTIMATORS FOR PRIVATE FOREST AREA AND LANDOWNERS: A CASE STUDY OF MONTANA . . . . . . . . . 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.1 GIS-based Baseline Population Estimate . . . . . . . . . . . . . . . . 4.2.2 Sampling Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.3 Estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.3.1 Private Forest Area . . . . . . . . . . . . . . . . . . . . . . . 47 47 50 50 52 53 53 vi 4.3 4.4 4.2.3.2 Private Forest Landowner Population 4.2.4 Nonresponse . . . . . . . . . . . . . . . . . . . Results . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3.1 Private Forest Area . . . . . . . . . . . . . . . 4.3.2 Total Private Forest Landowners . . . . . . . 4.3.2.1 Full Response . . . . . . . . . . . . . 4.3.2.2 Nonresponse . . . . . . . . . . . . . Discussion . . . . . . . . . . . . . . . . . . . . . . . . Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 56 58 58 61 62 64 65 CHAPTER 5 HIERARCHICAL BAYESIAN MODELS FOR SMALL AREA ESTIMATION OF COUNTY-LEVEL PRIVATE FOREST LANDOWNER POPULATION . . . . . . . . . . . . . . . . . . . . 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 Materials and methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.1 True number of private forest ownerships . . . . . . . . . . . . . . . . 5.2.2 Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.2.1 Direct estimator . . . . . . . . . . . . . . . . . . . . . . . . 5.2.2.2 Small area estimation models . . . . . . . . . . . . . . . . . 5.2.3 Simulation study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.3.1 County-level covariates . . . . . . . . . . . . . . . . . . . . . 5.2.4 Simulation summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 69 71 71 72 72 73 75 76 76 78 88 CHAPTER 6 RECOMMENDATIONS FOR FUTURE WORK . . . . . . . . . . . 93 APPENDICES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . APPENDIX A HB AND EBLUP PARAMETER ESTIMATES FOR FAYHERRIOT MODEL OF ABOVEGROUND BIOMASS . . . . APPENDIX B MODEL SUMMARIES OF PRIVATE FOREST LANDOWNERS FOR MONTANA AND NEW JERSEY BY COUNTY . . . . . . . . . . . . . . . . . . . . . . . 96 97 98 BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 vii LIST OF TABLES Table 2.1: Summary statistics for stand area, number of plots, mean aboveground biomass, sampling variances, and LiDAR covariates for the Noonan Forest (m = 226 stands) dataset. . . . . . . . . . . . . . . . . . . . . . . . . . 12 Table 2.2: The median and 95% credible intervals for Fay-Herriot (FH), FH with conditional autoregressive (FHCAR), and FHCAR with smoothed sampling variances (FHCAR-SMOOTH) model parameters. . . . . . . . . . . 15 Table 2.3: Mean CV reduction (%) between SAE models by number of neighboring stands. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 Table 3.1: Summary statistics for stand area, number of plots, mean aboveground biomass, mean basal area, mean volume, mean tree density, sampling variances, and LiDAR covariates for the Noonan Forest (m = 224 stands) dataset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 Table 3.2: The median and 95% credible intervals for the univariate Fay-Herriot (FH), and FH with conditional autoregressive (FHCAR) model parameters for the four outcomes of interest. . . . . . . . . . . . . . . . . . . . . 35 Table 4.1: Size classes of private forest area. . . . . . . . . . . . . . . . . . . . . . . . 57 Table 4.2: Summary of Montana private forest land area (ha) for the GIS-based baseline (base), the estimate from simple random sampling of the baseline (Method I), and additional publications. . . . . . . . . . . . . . . . . . . . 59 Table 4.3: Summary of the number of private forest landowners in Montana for the GIS-based baseline (base), the estimate from two sampling methods, and additional publications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 Table 5.1: The median and 95% credible intervals (in parantheses) for Fay-Herriot (FH) and FH with conditional autoregressive (FHCAR) models of a single iteration for Montana (iteration 750) and New Jersey (iteration 350). . 82 Table 5.2: Summary of bias, root mean squared error (RMSE), empirical coverage for a 95% nominal coverage rate and average 95% confidence interval width of the direct estimate and 95% credible interval width of the small area estimation model estimates for Montana and New Jersey across all counties and iterations for the direct, Fay-Herriot (FH), and FH with conditional autoregressive random effects (FHCAR) model estimates. . . . 85 viii Table A.1: The median and 95% credible intervals for the model parameters of the Fay-Herriot (FH) with hierarchical Bayesian (HB) inference and the point estimates with standard errors in parentheses for the FH under empirical best linear unbiased predictor (EBLUP) were fit via restricted maximum likelihood. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 Table B.1: Summary of the first 28 Montana counties by identification number (ID) and name for true population total (Truth), sample size (n), and bias across all repeated samples for the direct, Fay-Herriot (FH), and FH with conditional autoregressive random effects (FHCAR) model estimates. . . . 98 Table B.2: Summary of the second set of 28 Montana counties by identification number (ID) and name for true population total (Truth), sample size (n), and bias across all repeated samples for the direct, Fay-Herriot (FH), and FH with conditional autoregressive random effects (FHCAR) model estimates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 Table B.3: Summary of New Jersey counties by identification number (ID) and name for true population total (Truth), sample size (n), and bias across all repeated samples for the direct, Fay-Herriot (FH), and FH with conditional autoregressive random effects (FHCAR) model estimates. . . . . . . . . . 100 Table B.4: Summary of the first 28 Montana counties by identification number (ID) for root mean squared error (RMSE), empirical coverage for a 95% nominal coverage rate, and 95% confidence interval width for the direct estimator and 95% credible interval width for the two small area estimation models across all repeated samples for the direct, Fay-Herriot (FH), and FH with conditional autoregressive random effects (FHCAR) model estimates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 Table B.5: Summary of the second set of 28 Montana counties by identification number (ID) for root mean squared error (RMSE), empirical coverage for a 95% nominal coverage rate, and 95% confidence interval width for the direct estimator and 95% credible interval width for the two small area estimation models across all repeated samples for the direct, FayHerriot (FH), and FH with conditional autoregressive random effects (FHCAR) model estimates. . . . . . . . . . . . . . . . . . . . . . . . . . . 102 Table B.6: Summary of New Jersey counties by identification number (ID) for root mean squared error (RMSE), empirical coverage for a 95% nominal coverage rate, and 95% confidence interval width for the direct estimator and 95% credible interval width for the two small area estimation models across all repeated samples for the direct, the Fay-Herriot (FH), and the FH with conditional autoregressive random effects (FHCAR) model estimates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 ix LIST OF FIGURES Figure 2.1: Noonan Forest stands by (a) number of variable radius plots, (b) standlevel P 25 (m); (c) stand-level P 75 (m) LiDAR covariates. . . . . . . . . . 13 Figure 2.2: Posterior mean aboveground biomass (Mg ha−1 ) for the (a) direct estimate, (b) FH model, (c) FHCAR model; (d) FHCAR-SMOOTH model. . 17 Figure 2.3: Mean aboveground biomass (Mg ha−1 ) for direct estimates versus posterior means of FH, FHCAR, and FHCAR-SMOOTH models. . . . . . . 18 Figure 2.4: Coefficient of variation of the (a) direct estimate, (b) FH model, (c) FHCAR model; (d) FHCAR-SMOOTH model. . . . . . . . . . . . . . . . 19 Figure 2.5: Estimates of coefficient of variation (CV) versus stands ordered by increasing CV of direct estimate. . . . . . . . . . . . . . . . . . . . . . . . 20 Figure 2.6: Percent reduction in coefficient of variation from (a) direct estimate to FH model, (b) direct estimate to FHCAR model, (c) FH to FHCAR model; (d) FHCAR to FHCAR-SMOOTH model. . . . . . . . . . . . . . 21 Figure 2.7: 95% credible interval widths of AGB (Mg ha−1 ) for the (a) FH, (b) FHCAR; (c) FHCAR-smooth models. . . . . . . . . . . . . . . . . . . . . 23 Figure 3.1: Noonan Research Forest stands by (a) forest type (balsam fir; tolerant softwoods; black spruce; white pine; mixed woods dominated by softwoods; mixed woods dominated by intolerant hardwoods; intolerant hardwoods), (b) stand-level P 25 (m), (c) stand-level P 50 (m), and (d) stand-level dns (%) LiDAR covariates. . . . . . . . . . . . . . . . . . . . 32 Figure 3.2: The posterior means for the outcomes of (a) mean aboveground biomass (Mg·ha−1 ), (b) mean basal area (m2 ·ha−1 ), (c) mean volume (m3 ·ha−1 ); (d) mean tree density (number of trees·ha−1 ) for the FHIW model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 Figure 3.3: (a) Mean aboveground biomass (Mg·ha−1 ), (b) mean basal area (m2 ·ha−1 ), (c) mean volume (m3 ·ha−1 ); (d) mean tree density (number of trees·ha−1 ) for direct estimates versus posterior means of FH, FHCAR, and FHIW models. . . . . . . . . . . . . . . . . . . . . . . . . . 38 x Figure 3.4: Estimates of coefficient of variation (CV) for direct estimates and FH, FHCAR, FHIW models of (a) mean aboveground biomass (Mg·ha−1 ), (b) mean basal area (m2 ·ha−1 ), (c) mean volume (m3 ·ha−1 ); (d) mean tree density (number of trees·ha−1 ) versus stands ordered by increasing CV of direct estimate. The horizontal dashed line indicates a CV value of 0.15; and the horizontal dotted line indicates a CV value of 0.25. . . . 40 Figure 3.5: Percent reduction in coefficient of variation from direct estimate to FH model for (a) mean aboveground biomass (Mg·ha−1 ), (b) mean basal area (m2 ·ha−1 ), (c) mean volume (m3 ·ha−1 ); (d) mean tree density (number of trees·ha−1 ). . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 Figure 3.6: Percent reduction in coefficient of variation from FH model to FHIW model for (a) mean aboveground biomass (Mg·ha−1 ), (b) mean basal area (m2 ·ha−1 ), (c) mean volume (m3 ·ha−1 ); (d) mean tree density (number of trees·ha−1 ). . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 Figure 4.1: (a) One realization of the sample distribution of unique private forest landowners (PFLs) by size class, (b) trends of probabilities of nonresponse by size class for four different scenarios (A-D) with an overall nonresponse rate of approximately 50%. . . . . . . . . . . . . . . . . . . 58 Figure 4.2: (inset) Location of Montana, highlighted in grey, within the conterminous United States, (a) 10 m x 10 m resolution for all forested lands across the state of Montana, and (b) 10 m x 10 m resolution for private forested lands across the state of Montana. . . . . . . . . . . . . . . . . . 60 Figure 4.3: (a) The distribution for proportion of total private forest area of family forest landowners of Montana (NWOS, Butler et al. 2016b) and the GIS-based baseline (Base) population of Montana by size class, (b) the distribution for proportion of private forest landowners (PFLs) by size class for the Rocky Mountain region (Rockies, Oswalt et al. 2014) and the Base population of Montana, and (c) the distribution of number of family forest landowners of Montana (NWOS, Butler et al. 2016b) and PFLs for the Base population of Montana by size class. . . . . . . . . . . 61 xi Figure 4.4: Horizontal line indicates the GIS-based baseline of PFL population size. Horizontal dashed lines represent the nominal probability coverage rate of 0.95. (a-c) Mean PFL population size, mean 95% confidence interval width, and 95% actual probability coverage for full response and incremental response rates from 50 to 10% for the two different estimators (I and II) based upon 1000 simulations, respectively. (d-f) Mean PFL population size, mean 95% confidence interval width, and 95% actual probability coverage for four different response scenarios (A-D) with an overall 50% response rate of Methods I and II based upon 1000 simulations, respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 Figure 5.1: Montana counties by (a) sample size, (b) true population size with >2 ha forest, (c) 2010 census population density (PD, number of people·km−2 ); (d) total forest area (TFA, 1000s ha). . . . . . . . . . . . . . . . . . . . . 79 Figure 5.2: New Jersey counties by (a) sample size, (b) true population size with >2 ha forest, (c) 2010 Census population density (PD, number of people·km−2 ); (d) total forest area (TFA, 1000s ha). . . . . . . . . . . . 80 Figure 5.3: Distribution of posterior means from all iterations for select SAE model parameters: (a) the variance parameters of Montana, (b) the FHCAR model autocorrelation parameter of Montana, (c) the variance parameters of New Jersey; (d) the FHCAR model autocorrelation parameter of New Jersey. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 Figure 5.4: Relative mean squared error comparisons for Montana (eq. 5.8): (a) direct to FH model estimates, (b) direct to FHCAR model estimates; (c) FH to FHCAR model estimates. . . . . . . . . . . . . . . . . . . . . . 87 Figure 5.5: Relative mean squared error comparisons for New Jersey (eq. 5.8): (a) direct to FH model estimates, (b) direct to FHCAR model estimates; (c) FH to FHCAR model estimates. . . . . . . . . . . . . . . . . . . . . . 88 xii CHAPTER 1 INTRODUCTION The focus of this dissertation is the development and application of model-based small area estimation (SAE) models for both biophysical and social forestry variables. The use of the model-based framework is necessitated in SAE because the sample sizes for the design-based framework do not provide adequate precision. There is considerable demand and interest among stakeholders of large-scale forest inventories or surveys, such as the United States Department of Agriculture (USDA) Forest Service Forest Inventory and Analysis (FIA) program and the associated National Woodland Owner Survey (NWOS), because outcomes are desired at resolutions smaller than that provided by the original design. Some of the current forestry-related SAE applications range from the traditional inventory outcomes of timber volume and biomass (e.g., timber products output) to the assessment of wildland fires (FIA Stakeholder Science Meeting 2017). The key to SAE is to borrow strength from related areas to improve estimation for the outcome and small area of interest. A small area, here, is defined by the sample size for the outcome of interest rather than by the geographic size, as our interest lies in both forest stands and larger areas such as counties. Rao and Molina (2015) give an in-depth overview of SAE and cover many different approaches. Following the definitions of Rao and Molina (2015), SAE is divided into three classes: i) direct; ii) indirect; and iii) small area modelbased estimation. Direct estimators are generally design-unbiased for the area of interest and derived directly from the sample. Model-assisted methods, also belonging to this class, have proven useful for combining covariates with plot observations for biophysical variables (see, e.g., Opsomer et al. 2007). Indirect estimators are developed based upon an implicit model that borrows strength from either another domain, time, or both. Several frequently applied examples of this class are synthetic and composite estimators (Goerndt et al. 2011, Breidenbach and Astrup 2012, Goerndt et al. 2013). In Chapters 2, 3, and 5, we consider 1 the third class of SAE, small area model-based estimation, referred to as small area models. Small area models differ from the previous two classes by including an explicit model with random effects that account for variation not explained by covariates in the model mean. In Chapter 4, we consider the NWOS direct estimators for various unit nonresponse scenarios. For small area models, the analyst applies either a frequentist or Bayesian mode of inference. The frequentist approach uses empirical best linear unbiased prediction (EBLUP), and the Bayesian approach uses either empirical Bayes (EB) or hierarchical Bayes (HB). For forestry applications, EBLUP has been applied most frequently and exclusively to biophysical settings (Goerndt et al. 2011, Breidenbach and Astrup 2012, Goerndt et al. 2013, Magnussen et al. 2014, Mauro et al. 2016). The most common small area model is a linear mixed effects model, called the Fay-Herriot (FH, Fay and Herriot 1979) model, which links a direct estimator to covariates via a linear model. Only one of these preceding SAE biophysical forestry studies examined spatial correlations among the area-level effects (see Appendix B in Magnussen et al. 2014). EB is considered the Bayesian paradigm equivalent to EBLUP. Alternatively, HB methods provide access to posterior distributions of the small area parameters (You and Zhou 2011), and hence parameter inference that does not rely upon potentially unrealistic asymptotic assumptions (Pfefferman 2013). The outcomes of interest are most commonly modeled at either the unit-level or the area-level. For biophysical outcomes, the models generally occur at the smallest resolution possible, i.e., the unit-level. Light detection and ranging (LiDAR) has become a ubiquitous choice for covariates in extending and improving unit-level estimation of biophysical variables. Development and application of unit-level regression models for spatially aligned LiDAR and biophysical inventory measurements are quite common (see, e.g., Babcock et al. 2015; 2016, Finley et al. 2017, Gregoire et al. 2016). These unit-level analyses generally apply fixed-area plots for measurements due to alignment between the outcomes and covariates. There are some examples, where variable-radius plots (VRP) have been applied in unit-level analyses (Hollaus et al. 2007; 2009). Several recent studies (Hayashi et al. 2015, Deo et al. 2 2016) have varied LiDAR resolutions to best align with VRP, at the unit-level, without a consensus across forest types. As an alternative to a unit-level analysis, an area-level analysis can be used when there is spatial misalignment between LiDAR and plot measurements (Goerndt et al. 2011). In this dissertation, the focus is on application of area-level models with VRP to estimate biophysical outcomes, with inference at the stand level. For social forestry outcomes of interest, a unit-level analysis is generally not possible to be combined with covariates outside of the survey application due to privacy concerns. For our primary social forestry outcome of interest, private forest landowner population size, an area-level analysis is the natural choice. The primary objective of this dissertation is to contribute to the growing body of literature of SAE applications in the field of forestry. The detailed objectives of this dissertation are: i) to apply a HB framework to increase the precision of estimates for biophysical forest variables at the stand level by borrowing strength across all stands through the use of LiDAR covariates; ii) to apply a conditional autoregressive structure to the stand-level random effects to assess gains in precision of biophysical forest variables; iii) to evaluate the current NWOS estimators of private forest area and private forest landowner population size for a known population at the state level; iv) to present an alternative estimator of private forest landowner population size that explicitly accounts for various nonresponse scenarios; v) to evaluate the impacts of nonresponse biases on each of these estimators; vi) to produce county-level private forest ownership datasets for two complete states; vi) to define and assess SAE models to improve county-level inference of the number of private forest ownerships, and; vii) to develop open source software to fit proposed SAE models. The subsequent paragraphs, summarize the main chapters of the dissertation as three distinct published peer-reviewed articles and one chapter still to be submitted. Chapter 2 has been published in Remote Sensing of Environment. In this chapter, Ver Planck et al. (2018) use forest inventory plot measurements and LiDAR covariates to inform model-based estimators for SAE. There are many examples where such linking models pro- 3 vide the desired accuracy and precision of forest parameter estimates for small areas where paucity of inventory plot observations preclude design-based inference. This work builds on previous SAE work by linking LiDAR covariates with VRP measurements within a HB framework. Using this framework, we compare SAE of forest aboveground biomass (AGB) using: i) FH; ii) FH with conditional autoregressive random effects (FHCAR); and iii) FHCAR with smoothed sampling variance (FHCAR-SMOOTH) models. Candidate models and the direct estimate based on plot measurements alone were compared using coefficient of variation (CV). On average, the FH model reduced the CV by 52.3% compared to the direct estimate. Incorporating spatial structure via the FHCAR model reduced the CV by 56.9% and 10.8% relative to the direct and the FH model estimates, respectively. Overall, these results illustrate the applicability and utility of using a SAE framework for linking LiDAR with typical forest inventory data. Chapter 3 extends the work of the previous chapter to include the additional common forest inventory variables of basal area (BA), volume (VOL), and tree density (DENS) into a multivariate HB SAE modeling framework. Using this framework, we compare the univariate case of SAE for the individual biophysical variables using the FH and FHCAR models to the multivariate FH with inverse-Wishart (FHIW) model. Candidate models and the direct estimate based on plot measurements alone were compared again via CV. On average, the FH model reduced the CV by 52.9%, 53.6%, 53.7%, 52.7% for AGB, BA, VOL, and DENS, respectively, compared to the direct estimate. Incorporating spatial structure via the FHCAR model reduced the CV for AGB, BA, VOL, and DENS by 58.0%, 57.5%, 58.7%, 54.2% relative to the direct estimates, respectively. On average, the FHCAR reduced the CV by 12.2%, 9.5%, 12.3%, 3.7% for AGB, BA, VOL, and DENS relative to the FH model estimates. The incorporation of the correlation among outcomes in the FHIW model reduced the CV even further relative to the FH model by 30.6%, 22.1%, 9.8% for AGB, BA, and VOL; however the average reduction was -1.8% for DENS. Overall, these results illustrate the additional utility of using a multivariate SAE framework for linking LiDAR with multiple 4 outcomes of interest in a forest inventory. The following two chapters transition from biophysical to social outcomes of interest. Private forest ownerships currently own a plurality of forested land in the US (Oswalt et al. 2014) and will play an increasing role in the management of the forested lands in the US for the foreseeable future. These two chapters offer methods to increase or better represent precision in estimates of private forest landowner population size. Chapter 4 has been published in Forest Science. In this chapter, Ver Planck et al. (2016) examine the current estimator used by the NWOS for total private forest area and private forest landowner population size along with one alternative estimator. The NWOS, conducted by the USDA Forest Service, is the standard for state- and national-level estimates of private forestland and ownerships in the US. The estimators are evaluated at the minimal resolution used by NWOS, of a state. Montana is used as a case study by combining freely available cadastral data with remote sensing in a geographic information system. These data allow us to evaluate the estimators for a known private forest ownership population. In addition, the impacts of nonresponse biases are assessed for each of the estimators. The results indicate that the current estimator performs as well as the alternative estimator under conditions of full response; however, the estimator performance varied under conditions of nonresponse. We offer the alternative estimator to explicitly account for nonresponse over the implicit nonresponse assumptions of the current NWOS estimator under the various conditions of nonresponse examined. Chapter 5 has been published in the Canadian Journal of Forest Research. In this chapter, Ver Planck et al. (2017c) examine the ability to expand statewide estimates similar to the NWOS to county levels. Due to sample sizes prescribed for inference at the state level, there are insufficient data to support county-level estimates. However, county-level estimates of NWOS variables are desired because ownership programs and education initiatives often occur at the county level and such information could help tailor these efforts to better match county-specific needs and demographics. Here, we present and assess methods to estimate the number of private forest ownerships at the county level for two states, 5 Montana and New Jersey. To assess model performance, true population parameters were derived from cadastral and remote sensing data. Two SAE models, the FH and the FHCAR, improved estimated county-level population mean squared error over that achieved by direct estimates. The proposed SAE models use covariates to improve accuracy and precision of county-level estimates. Results show total forest area, and 2010 decennial census population density covariates explained a significant portion of variability in county-level population size. These and other results suggest that the proposed SAE methods yield a statistically robust approach to deliver reliable estimates of private ownership population size and could be extended to additional important NWOS variables at the county level. Chapter 6 concludes with potential avenues to pursue related to the topics covered in this dissertation. 6 CHAPTER 2 HIERARCHICAL BAYESIAN MODELS FOR SMALL AREA ESTIMATION OF FOREST VARIABLES USING LIDAR 2.1 Introduction Forest inventory efforts typically follow a sampling design that aims to cover a potentially broad range of stand conditions, e.g., capturing species and structural diversity. Generally, a systematic grid of fixed-area plots (FAP) or variable-radius plots (VRP) are established across the forest. The choice between FAP or VRP depends on the inventory objectives (Maltamo et al. 2009). A greater cost efficiency generally results in the establishment of VRP for operational inventories over FAP research inventories (Rice et al. 2014). Light detection and ranging (LiDAR) data have become one of the remote sensing tools of choice for extending and improving ground-based forest inventories and monitoring. There has been considerable research relating forest attributes with LiDAR covariates (see reviews by McRoberts et al. 2010, Næsset et al. 2004). In the case of complete LiDAR coverage, a fine grid is typically imposed on the area of interest and LiDAR covariates are calculated using the point cloud within each grid pixel. Often there is some effort to matching the pixel size to that of the inventory plot, especially in the setting where a regression model is developed to relate the LiDAR covariates to the response forest variables of interest (see, e.g., Finley et al. 2013, McRoberts et al. 2013). Development of regression models for spatially aligned LiDAR and inventory plot measurements is often referred to as unit-level analysis and is quite common (see, e.g., Babcock et al. 2015; 2016, Finley et al. 2017, Gregoire et al. 2016). These and similar analyses use FAP, opposed to VRP, because the relationship between plot extent can be directly matched with spatially coinciding LiDAR covariates. Truncated VRP, which result in a comparable extent to FAP, used in the Finnish National Forest Inventory have been successfully regressed on spatially aligned LiDAR covariates (Maltamo et al. 2007), 7 although this may not generalize to stands with diverse diameters (Scrinzi et al. 2015). There are examples, where VRP have been used in unit-level analyses (Hollaus et al. 2007; 2009). More recently, Hayashi et al. (2015) and Deo et al. (2016) have applied a variety of LiDAR resolutions that best match with the basal area factor applied in the VRP sampling. As an alternative to a unit-level analysis, an area-level analysis can be used when there is spatial misalignment between LiDAR and plot measurements (Goerndt et al. 2011). In an area-level regression analysis, observations are at the stand-level, whereas a unit-level analysis could consider multiple observations within each stand. Working at the area-level affords some advantages. For example, one can combine the cost efficient VRP with LiDAR covariates without spatial alignment between the plot data and LiDAR (Goerndt et al. 2011). In an effort to align VRP measurements and LiDAR data, Kronseder et al. (2012) calculated LiDAR covariates at a resolution of 1 ha based on the idea that VRP measurements are expanded to a per ha basis. Additionally, van Aardt et al. (2006) and Hudak et al. (2014) calculated the LiDAR variables at a segment- or stand-level in which the VRP samples were established. In this study, our focus is on application of area-level models to estimate aboveground forest biomass (AGB), with inference at the stand-level. The area-level models developed here are general and could be applied to other forest variables or transformations of AGB, e.g., for use in forest carbon accounting projects. Given time and cost constraints, inventory designs often focus on achieving forest-level accuracy and precision requirements, which results in a limited number of samples collected within any given stand. These small sample sizes result in stand-level point estimate uncertainty that is too large for practical use. This limited inference at the stand-level, due to paucity of samples, is referred to as the small area problem and is commonly tackled using a small area estimation (SAE) method. We consider a forest stand to be the small area of interest for the following development; however, this is setting specific and one can of course consider other small area delineations. The key of SAE is to borrow strength across related areas to improve estimation at the 8 small area of interest. Rao and Molina (2015) give an in-depth overview of SAE and cover many different approaches. Following the definitions of Rao and Molina (2015), SAE is divided into three classes: i) direct; ii) indirect; and iii) small area model-based estimation. Direct estimators are generally design-unbiased for the area of interest and derived directly from the sample. Model-assisted methods, also belonging to this class, have proven useful for combining remotely sensed data with ground observations for AGB estimation (see, e.g., Opsomer et al. 2007). Indirect estimators are developed based upon an implicit model that borrows strength from either another domain, time, or both. Several frequently applied examples of this class are synthetic and composite estimators (Goerndt et al. 2011, Breidenbach and Astrup 2012, Goerndt et al. 2013). In this study, we consider the third class of SAE, small area model-based estimation, referred to as small area models. Small area models differ from the previous two classes by including an explicit model with random effects that account for variation not explained by covariates in the model mean. For small area models, the analyst applies either a frequentist or Bayesian mode of inference. The frequentist approach uses empirical best linear unbiased prediction (EBLUP), and the Bayesian approach uses either empirical Bayes (EB) or hierarchical Bayes (HB). For forestry applications, EBLUP has been applied most frequently (Goerndt et al. 2011, Breidenbach and Astrup 2012, Goerndt et al. 2013, Magnussen et al. 2014, Mauro et al. 2016). Mauro et al. (2016) emphasized the correct specification of the estimator of mean squared error (MSE) of EBLUP for different levels of aggregation, from a pixel to an entire forest. The most common small area model is a linear mixed effects model, called the FayHerriot (FH, Fay and Herriot 1979) model, which links a direct estimator to covariates via a linear model. Only one of the preceding SAE forestry studies examined spatial correlations among the area-level effects (see Appendix B in Magnussen et al. 2014). EB is considered the Bayesian paradigm equivalent to EBLUP. Alternatively, HB methods provide access to posterior distributions of the small area parameters (You and Zhou 2011), and hence parameter inference that does not rely upon potentially unrealistic asymptotic assumptions 9 (Pfefferman 2013). The primary objective of this study was to apply a HB framework to increase the precision of estimates for mean AGB at the stand-level by borrowing strength across all stands through the use of LiDAR covariates. Additionally, we apply a conditional autoregressive structure to the stand-level random effects to assess gains in precision of AGB. The remainder of the manuscript follows with: i) a description of the study area along with relevant data for the small area models; ii) a description and implementation of the small area models; and iii) the results and discussion of applying small area models for AGB. All source code and data are provided to facilitate reproducible research and application of the proposed methods. 2.2 Methods 2.2.1 2.2.1.1 Data Study area The area of interest for this study was the Noonan Research Forest (NRF) near Fredericton, New Brunswick, Canada (N 45◦ 59’12”, W 66◦ 25’15”). The NRF has been managed by the University of New Brunswick since 1985 and is approximately 1500 ha in size with a total of 271 stands. The subsequent analysis uses a subset of 226 stands each with a minimum of two VRP per stand. These stands ranged in size from 0.6 to 47 ha with an average size of 6.6 ha (Table 2.1; Figure 2.1). The forest is composed of hardwood, mixed, and softwood stands with the major species being aspen (Populus spp.), balsam fir (Abies balsamea L. (Mill.)), birch (Betula spp.), eastern white pine (Pinus strobus L.), red maple (Acer rubrum L.), and spruce (Picea spp.), see Hayashi et al. (2015) for more details. 2.2.1.2 Variable radius plot data In 2010, a 100×100 m grid was laid out across the NRF. At each grid intersection a VRP was established and trees greater than 6.0 cm diameter at breast height (DBH) were selected into 10 the sample using a 2M basal area factor angle gauge. Species, DBH, and height were recorded for each sample tree. Plot estimates of AGB Mg ha−1 were calculated using Jenkins et al. (2003) species-group equations. Stand-level estimates were obtained by averaging plot-level AGB Mg ha−1 estimates. Table 2.1 summarizes these stand-level estimates. 2.2.1.3 LiDAR data The full waveform LiDAR data were collected on October 21 and 22, 2011 using a Riegl LMS Q680i laser scanner mounted on an airplane. The sensor had a pulse repetition frequency of 180 kHz with a laser wavelength of 1550 nm and a scan angle < 28.5◦ from nadir. The forest was covered in overlapping strips to achieve at a minimum of six pulses per m2 , footprint of 0.35 m, and up to eight returns per pulse (Hayashi et al. 2015). Stand-level LiDAR covariates were computed using the lascanopy function in the LAStools software suite (Isenburg 2016). The NRF stand polygons, LAS files, and arguments to define the vertical extent of the LiDAR profile considered, were passed into the lascanopy function to generate the desired LiDAR covariates. Here, we consider stand-level LiDAR signal derived percentiles—height in meters at which some percent of the energy is returned—labeled P followed by the percentile, as well as signal shape variables, e.g., kurtosis and skewness. Exploratory analysis using a backward selection procedure resulted in P 25 and P 75 having the lowest Akaike information criterion value. Although the two percentile heights have a high correlation, the variance inflation factor of each was three, which is less than the general rule of ten for assessing multicollinearity (see Kutner et al. 2004, p.409). Therefore, P 25 and P 75 covariates were used for the SAE models. 11 Table 2.1: Summary statistics for stand area, number of plots, mean aboveground biomass, sampling variances, and LiDAR covariates for the Noonan Forest (m = 226 stands) dataset. Min Stand Area (ha) 0.6 No. of plots 2 −1 Mean AGB (Mg ha ) 16.9 σi2 0.158 2 σ̃i 36.5 P 25 (m) 2.2 P 75 (m) 5.1 2.2.2 2.2.2.1 Max 47.3 44 223.5 7948 804 8.4 20.7 Mean 6.1 5.9 117.8 1698 411 5.2 11.7 SD 5.6 5.5 44.8 1432 226 1.3 2.8 Models Direct estimator and small area models The direct estimator applied assumed simple random sampling for point and variance estimates of AGB. Beginning with the SAE FH model for stand i in 1, 2, . . . , m stands defined as: Yi = θi + i , (2.1) θi = x0i β + vi , where Yi is the direct estimate, the parameter of interest θi is mean AGB, and i is a normally distributed error with mean zero and variance σi2 . The additive mean of θi comprises an intercept and stand-level covariates held in the p × 1 column vector x and associated p × 1 vector of regression coefficients β. The stand-level random effects term vi is normally distributed with mean zero and variance σv2 . For comparison with HB inference, the frequentist EBLUP was also applied for the FH model with available R statistical software (see Appendix A). It is reasonable to think that environmental variables, e.g., disturbance history and soil characteristics, could exhibit spatial autocorrelation and affect the observed AGB. In our setting, if direct estimate values are spatially correlated, i.e., adjacent stands have similar 12 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 25 20 15 10 5 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 30 ● ● ● ● ● ● ● ● ● ● 45 ● ● ● ● ● ● ● ● ● ● ● ● ● 9 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 8 ● 7 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 6 ● 5 4 ● 3 ● ● 0 1000 m 1 2 (a) (b) 23 20 17 14 11 8 5 (c) Figure 2.1: Noonan Forest stands by (a) number of variable radius plots, (b) stand-level P 25 (m); (c) stand-level P 75 (m) LiDAR covariates. AGB values, then we should exploit this relationship to further improve inference by pooling information across proximate stands. Hence, we augment model (2.1) by adding a spatially structured random effect that follows a conditional autoregressive (CAR) prior distribution, see, e.g., Banerjee et al. (2015) and You and Zhou (2011). This extended model called FHCAR is defined analogous to FH, with the exception that the unstructured random effects v = (v1 , v2 , . . . , vm )0 ∼ N (0, σv2 I) in model (2.1) are replaced with v ∼ N (0, Σ(σv2 , λ)). 13 Here, the m × m covariance matrix Σ(σv2 , λ) = σv2 [λR + (1 − λ)I]−1 , where σv2 is the spatial variance parameter, λ is the autocorrelation parameter, R is the neighborhood matrix with diagonal elements equal to the number of neighbors and off diagonal elements equal negative one or zero indicating if a neighbor is present or not, and I is the m × m identity matrix. Stands are only considered neighbors with adjoining borders. The previous models assume the sampling variances are fixed and known, which is a common assumption for SAE. However, additional smoothing methods are also commonly applied to the sampling variances to reduce instability in variance estimates of small sample sizes. Here, the sampling variances were smoothed by stand area ai (similar to Goerndt et al. 2011); smoothed sampling variances σ̃i2 were defined as: Ve ; ni Pm ai σi2 i=1 , V e = Pm i=1 ai σ̃i2 = (2.2) where ni is the number of VRP in stand i. These σ̃i2 were then used in place of the original sampling variances in the FHCAR model. This model is referred to as the FHCAR-SMOOTH model. The development of generalized variance functions (GVF) are also common in the SAE literature (e.g., Dick 1995); however, we did not find a useful covariate to develop a GVF. The Bayesian SAE model specifications are completed by assigning prior distributions to parameters (Gelman et al. 2014). Each regression coefficient in β was assigned a flat prior distribution, σv2 was given an inverse-Gamma (IG) prior distribution, and, following You and Zhou (2011), λ’s prior was uniform with support between zero and one. The IG’s shape hyperparameter was set to two, which results in a prior mean equal to the scale hyperparameter P 2 and infinite variance. The IG’s scale hyperparameter was set as m i=1 σi /m to give equal prior weight to the sampling and CAR variances. A Markov chain Monte Carlo (MCMC) algorithm was used to sample from parameters’ posterior distributions. Specifically, a Gibbs 14 algorithm was developed to sample from θ = (θ1 , θ2 , . . . , θm ), β, and σv2 with full conditional distributions given in You and Zhou (2011), and a Metropolis-Hastings algorithm was used to sample from λ’s posterior distribution. We completed all analyses using R statistical software (R Core Team 2015). The data and R code to reproduce this analysis are available at Mendeley Data (Ver Planck et al. 2017a). Parameter posterior inference was based on 3,000 post burn-in MCMC samples from each of L = 3 chains. For the FH and FHCAR models, we thinned the samples to every third sample to reduce the autocorrelation in samples resulting in K = 3,000 samples. For the FHCAR-SMOOTH model, thinning of the samples was not necessary based upon examination of plots of the autocorrelation function resulting in 9,000 samples. Chain mixing and convergence were diagnosed using a multivariate potential scale reduction factor of less than 1.1 for all parameters considered (Gelman et al. 2014). 2.3 Results The FH, FHCAR, and FHCAR-SMOOTH model parameter estimates are given in Table 2.2. Here, the 95% credible intervals for β1 and β2 do no include zero, which suggests the associated LiDAR covariates explain a substantial portion of the direct estimate’s variability. The HB and EBLUP FH model parameter estimates were similar, with the exception of σv2 (Table A.1). The discrepancy between the HB and EBLUP estimate for σv2 is likely due Table 2.2: The median and 95% credible intervals for Fay-Herriot (FH), FH with conditional autoregressive (FHCAR), and FHCAR with smoothed sampling variances (FHCAR-SMOOTH) model parameters. Parameter FH β0 -51.99 (-69.15, -34.31) FHCAR -49.94 (-67.89, -31.02) FHCAR-SMOOTH -41.50 (-54.77, -27.33) β1 , P 25 24.24 (18.17, 30.08) 23.35 (16.90, 29.52) 21.48 (16.80, 25.95) β2 , P 75 3.66 (0.83, 6.42) 3.82 (1.01, 6.84) 4.08 (2.28, 6.01) σv2 284.4 (201.2, 404.8) 471.5 (313.3, 720.8) 365.7 (201.3, 590.0) λ — 0.61 (0.21, 0.93) 0.61 (0.15, 0.93) 15 to the fact that, unlike EBLUP, HB acknowledges uncertainty in this parameter and hence greater variation around the AGB estimates. The stand-level point estimates of AGB for the FH model ranged between 18.0 and 208.7 Mg ha−1 with a mean of 117.1 Mg ha−1 . The ranges of the FHCAR and FHCAR-SMOOTH models were 18.2 – 210.1 Mg ha−1 and 21.3 – 215.0 Mg ha−1 with means of 116.2 and 118.4 Mg ha−1 , respectively. Figure 2.2 maps the direct and posterior means of AGB for FH, FHCAR, and FHCAR-SMOOTH model estimates for the individual stands. A scatter plot of these direct and posterior SAE model estimates does not reveal any regions of deviation (Fig. 2.3). Following the methods of Brown et al. (2001) and You and Zhou (2011), bias due to model misspecification was assessed through fitting linear models between the direct, assumed to be design-unbiased, and the posterior means of the SAE model estimates. The slopes of the individual linear models were 1.003, 1.002, and 1.063 for FH, FHCAR, and FHCAR-SMOOTH posterior means, respectively, which suggests there is not a systematic bias across the range of AGB values. The R2 for each was 0.843, 0.838, and 0.890 for the FH, FHCAR, and FHCAR-SMOOTH posterior means. 16 225 225 195 195 165 165 135 135 105 105 75 75 45 45 15 15 (a) (b) 225 225 195 195 165 165 135 135 105 105 75 75 45 45 15 15 (c) (d) Figure 2.2: Posterior mean aboveground biomass (Mg ha−1 ) for the (a) direct estimate, (b) FH model, (c) FHCAR model; (d) FHCAR-SMOOTH model. The coefficient of variation (CV), defined as the square root of the posterior variance divided by the posterior mean, was used as the measure of precision among the direct, FH, FHCAR, and FHCAR-SMOOTH model estimates with a smaller value indicating better precision. For the direct estimates CV ranged between 0.004 and 1.156 with a mean of 0.363. The ranges for the FH, FHCAR, and FHCAR-SMOOTH models were 0.004 – 0.510, 0.004 – 0.398, 0.037 – 0.363 with means of 0.148, 0.133, and 0.106, respectively. Using a 17 ● ● 200 150 100 50 Direct Estimate of Aboveground Biomass (Mg ha−1) ● FH FHCAR FHCAR−SMOOTH ●● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ●● ●● ●● ●●● ● ●● ●●● ● ●● ● ● ●● ●●●● ●●● ● ● ●● ● ● ●● ●● ● ● ●● ●● ● ●● ● ● ●● ●● ● ● ● ●● ● ●● ● ● ●● ●● ● ●● ●● ●● ● ● ● ●● ● ●● ●● ● ● ● ● ● ●●●●●●● ● ●● ● ●● ● ● ●● ● ●● ●●● ●● ● ●● ● ● ●● ● ● ●● ●● ● ● ●● ●●●●●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●● ●● ● ● ● ●●● ●● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ●● ●● ● ●●●● ●●● ● ●●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ●● ● ● ●●●● ●● ● ● ●●●● ● ●●● ● ●●● ● ● ●● ●● ● ●● ● ●● ●● ● ● ● ●● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●● ●● ● ● ●● ●● ● ● ●●● ●● ● ● ●● ● ● ● ● ● ●● ● ● ●● ● ●● ● ● ●● ● ●● ● ● ●● ●● ●● ● ●● ● ● ● ● ● ● ● ●● ● ●●●●● ● ●● ● ●● ● ●●● ● ●●● ●● ●● ●● ●● ●●●● ●● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●●●● ● ●●● ●● ●●●●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ●●● ● ● ● ● ●● ● ● ● ●● ● ●● ● ●●● ● ●● ●●● ● ● ●● ●● ● ●●● ●●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ●● ●● ● ●●● ●●●●● ●● ● ●● ● ● ●●● ● ●●● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● 50 100 150 200 Posterior Mean of Aboveground Biomass (Mg ha−1) Figure 2.3: Mean aboveground biomass (Mg ha−1 ) for direct estimates versus posterior means of FH, FHCAR, and FHCAR-SMOOTH models. common CV threshold of 0.15 for stand-level estimates (Mauro et al. 2016), 29 of the 226 stands were at or below this threshold for the direct and 161 for the FH model estimates. The FHCAR and FHCAR-SMOOTH models had 168 and 195 stands, respectively, at or below the threshold. Figure 2.4 maps the CV of the individual stands with the lightest shaded stands meeting or exceeding the CV threshold. Figure 2.5 shows the CV of the direct and three model estimates against the stands ordered from smallest to largest CV of the direct estimator. Reductions in CV are observed between the direct estimator and the FH model for all but two stands, and even greater reductions are observed between the direct estimator and the FHCAR model. Further reductions in CV are observed between the FH and FHCAR model for 186 stands. The smoothing of the sampling variance for the FHCAR-SMOOTH model shows larger estimates of CV than even the direct estimator at the low end of the CV direct estimator. A total of 17 stands had a CV for the FHCARSMOOTH model greater than the direct estimator. These stands were generally stands with four or fewer variable radius plots that had low variation among the plots. The remaining 18 1.2 1.2 0.99 0.99 0.78 0.78 0.57 0.57 0.36 0.36 0.15 0.15 0.003 0.003 (a) (b) 1.2 1.2 0.99 0.99 0.78 0.78 0.57 0.57 0.36 0.36 0.15 0.15 0.003 0.003 (c) (d) Figure 2.4: Coefficient of variation of the (a) direct estimate, (b) FH model, (c) FHCAR model; (d) FHCAR-SMOOTH model. stands’ CV values showed a similar pattern to the FHCAR model. The percent reductions in CV from the direct estimator to the FH model ranged from -0.9% to 88.0% with a mean of 52.3%. A wider range in percent reduction of -0.7% – 90.7% was observed from the direct estimator to the FHCAR model and a mean of 56.9%. Gains were also observed in percent CV reduction from the FH to the FHCAR model with a range of -34.6% – 36.0% and mean of 10.8%. The largest range in percent reduction was seen for 19 1.2 ● ● 0.8 0.6 0.4 0.0 0.2 Coefficient of Variation 1.0 ● ● DIRECT FH FHCAR FHCAR−SMOOTH ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ●● ● ● ●● ●● ● ● ● ● ●● ● ● ● ●● ● ●● ● ● ● ●● ●● ●●● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ●● ● ● ●● ● ●●●● ● ●●●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●● ●● ● ●●●● ●● ● ●● ●● ● ●●●●●● ● ● ●● ● ●● ● ●● ●● ● ● ●● ● ● ● ● ● ● ● ●●● ● ●●● ●● ● ● ●● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●●●● ● ● ● ● ● ● ● ● ●● ●● ●● ●●● ● ● ●● ● ● ● ● ● ●● ●● ● ● ●● ● ●● ● ●● ●● ● ●●● ● ● ● ● ● ● ●● ●● ● ● ●● ●●● ● ●●● ● ●● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ●● ● ●● ●● ● ●● ●● ●● ● ●● ●● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ●● ●● ● ● ●● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ●● ● ● ●●● ●●●●● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ●● ● ● ● ● ●●●● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ●●● ● ●●● ●● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ●●● ● ● ● ● ●●● ● ● ●●● ●● ●● ● ● ●●● ●● ● ● ● ● ● ● Stands (ordered from smallest to largest CV Direct) Figure 2.5: Estimates of coefficient of variation (CV) versus stands ordered by increasing CV of direct estimate. the FHCAR to the FHCAR-SMOOTH model of -2220% – 56.9% and a mean of -1.2%. The average percent change across stands between the FHCAR and FHCAR-SMOOTH models was negative mainly due to large percent changes (< -1000%) in a few of the stands as depicted in Figure 2.6. Excluding these two stands, the mean percent change in CV between the FHCAR and FHCAR-SMOOTH models was 13.9%. Table 2.3 shows the mean percent reduction in CV for FH to FHCAR models and FHCAR to FHCAR-SMOOTH models by number of neighboring stands. For the FH to FHCAR models, a general increasing trend was observed in percent CV reduction with increasing number of neighbors (ranging from -25.2% to 36.0%) with a slight drop moving from ten to eleven neighbors. A similar pattern was observed for the FHCAR to FHCAR-SMOOTH models. The smaller number of neighbors (i.e., 1–3 and 5) had a negative value indicating reduction from FHCAR-SMOOTH to FHCAR (ranging from -35.1% to -21.8%), and the remaining number of neighbors showed reduction from FHCAR to FHCAR-SMOOTH (ranging from 20.3% to 43.9%). 20 95 95 80 80 60 50 60 50 30 20 10 0 −5 −10 −15 −25 −35 −1000 −2300 30 20 10 0 −5 −10 −15 −25 −35 −1000 −2300 (a) (b) 95 95 80 80 60 50 60 50 30 20 10 0 −5 −10 −15 −25 −35 −1000 −2300 30 20 10 0 −5 −10 −15 −25 −35 −1000 −2300 (c) (d) Figure 2.6: Percent reduction in coefficient of variation from (a) direct estimate to FH model, (b) direct estimate to FHCAR model, (c) FH to FHCAR model; (d) FHCAR to FHCAR-SMOOTH model. The precision of the SAE models was also assessed by the width of the credible intervals, where narrower intervals indicate less uncertainty. The 95% credible interval widths of mean AGB are shown in Figure 2.7 for the FH, FHCAR, and FHCAR-SMOOTH models. The credible interval widths for the FH model ranged from 1.6 to 70.8 Mg ha−1 with a mean width of 57.2 Mg ha−1 . For the FHCAR model, the range of the credible interval widths were slightly wider, ranging from 1.5 to 87.3 Mg ha−1 , and the mean width was also smaller 21 Table 2.3: Mean CV reduction (%) between SAE models by number of neighboring stands. Number of Neighbors 1 2 3 4 5 6 7 8 9 10 11 13 FH to FHCAR -25.2 -7.3 1.5 9.3 15.3 20.1 23.6 27.6 30.4 34.2 28.1 36.0 FHCAR to FHCAR-SMOOTH -33.4 -35.1 -21.8 21.6 -32.8 21.9 32.7 24.6 29.1 20.3 39.3 43.9 at 50.3 Mg ha−1 . The range for the FHCAR-SMOOTH model was smaller than the previous two models from 20.3 to 66.3 Mg ha−1 ; however, the minimum credible interval width was much larger than both the FH and FHCAR models. The mean of the credible interval width of the FHCAR-SMOOTH model was 41.5 Mg ha−1 , which was slightly smaller than the FHCAR model. 2.4 Discussion In this study, we found P 25 and P 75 to be the LiDAR covariates that explained the greatest variation in stand-level AGB. Hayashi et al. (2015) using the same data from NRF found the height of P 45 to be most significant in explaining AGB estimates. One possible explanation for this difference is the LiDAR covariates used in this study were summaries at the area-level (stand-level), while Hayashi et al. (2015) examined relationships between AGB and LiDAR covariates at the unit-level (plot-level). Goerndt et al. (2011) also found LiDAR covariates to differ in their final regression model for stand-level versus plot-level relationships. Many studies (e.g., Lefsky et al. 2002) have also found P 95, generally regarded as canopy height, to be a common explanatory variable; however, this was not the case in our final linear model. We could have explored more LiDAR covariates; however, the focus 22 90 80 70 60 50 40 30 20 10 0 (a) 90 90 80 80 70 70 60 60 50 50 40 40 30 30 20 20 10 10 0 0 (b) (c) Figure 2.7: 95% credible interval widths of AGB (Mg ha−1 ) for the (a) FH, (b) FHCAR; (c) FHCAR-smooth models. of this paper was on applying the HB SAE framework and not to finding the best linking model. Recently, Breidenbach et al. (2016) and Mauro et al. (2016) examined estimation of variance associated with model-based SAE from a frequentist perspective. In this study, we offered the hierarchical Bayesian method as an alternative that provided access to the full posterior distributions of the variables of interest and simplified interpretation of model 23 parameter inference. For the FH model, the EBLUP had smaller σv2 compared with HB. We see this as a strength of the HB model, in that the HB mode of inference provides a more realistic accounting of uncertainty associated with estimating AGB. As a contribution, we have supplied the necessary functions to reproduce this research that could be applied to datasets with similar applications. This study also highlights the suitability of applying VRP for gains in sampling efficiency when coupled with LiDAR covariates that are summarized at the stand-level. Large reductions in CV were seen in the application of either the FHCAR or FHCARSMOOTH model to the NRF stands due to strong area effects. Magnussen et al. (2014) also examined small area applications accounting for spatial autocorrelation with improvements for Swiss forest districts. Additionally, Breidenbach et al. (2016) found large improvements in coverage probability when incorporating spatial autocorrelation into the variance estimate for the confidence intervals. These examples show that, when present, accounting for spatial autocorrelation can better represent the uncertainty in the variables of interest. Smoothing of sampling variances (Eq. 2.2) in the FHCAR-SMOOTH model decreased credible interval widths compared to the FHCAR model; whereas, for stands with fewer neighboring stands and smaller sample sizes the FHCAR model had smaller credible interval widths. However, this may be a result of increasing the sampling variance for the smaller stands by applying smoothing and may be more reflective of the instability of variance with these small sample sizes. The method of smoothing based on stand area may not be reflective of increased variation in smaller stands due to the practice of delineating stands based on their similarities in management history, species composition, and age. One potential drawback of applying SAE at the area-level is the inability to assess within stand variation, as can be done when aligning LiDAR and plot data at the unit-level (e.g., Hayashi et al. 2016, Woods et al. 2011). However, as seen in Hayashi et al. (2016) a determination of prediction unit size influences inference at the stand- and forest-level. For area-level analyses, the size of the areas can be adjusted to meet the application of interest 24 with the requirement of at least two ground samples within each area for estimating the sampling variance. A potential practical application of this research would be to extend AGB to carbon estimates for forest carbon accounting projects. SAE models could be cost effective in reducing uncertainty where paucity of field plots makes direct estimates unreliable for the project. Additionally, future work will be to extend this study to the multivariate setting similar to the work of Porter et al. (2015). In this extension, cross-correlations in the multivariate setting could improve AGB estimates along with other common forest inventory variables, e.g., basal area, tree density, or volume. The HB SAE framework demonstrates the utility of applying common ground inventory techniques, i.e., sampling with VRP, with available remotely sensed data for reducing uncertainty at the area-level. 25 CHAPTER 3 MULTIVARIATE HIERARCHICAL BAYESIAN MODELS FOR SMALL AREA ESTIMATION OF FOREST VARIABLES USING LIDAR 3.1 Introduction Forest inventory efforts typically follow a sampling design to meet multiple objectives. The inventories are often designed to collect information for a range of outcomes of interest. Some of the most common forest inventory outcomes of interest related to forest management are aboveground biomass (Mg·ha−1 ), basal area (m2 ·ha−1 ), cubic volume (m3 ·ha−1 ), and tree density (no. trees·ha−1 ); these are the outcomes of interest in this study. A multivariate, rather than a univariate approach, to modeling these four outcomes may be beneficial for increasing the precision of outcomes that are more difficult to model from remote sensing, i.e., tree density (Finley et al. 2013). Generally, the data are collected through the establishment of a systematic grid of fixedarea plots (FAP) or variable-radius plots (VRP) across the region of interest. The permanence of the plots and inventory objectives will determine the type of plots to be utilized (Maltamo et al. 2009). A greater cost efficiency generally results from the establishment of VRP for operational inventories over FAP research inventories (Rice et al. 2014). Another solution for reducing the number of plots established and increasing sampling efficiency is the collection of remote sensing covariates, e.g., light detection and ranging (LiDAR), that cover the full range of natural variability. Numerous studies have been conducted to relate these four outcomes of interest with LiDAR covariates (see reviews by McRoberts et al. 2010, Næsset et al. 2004). Often the LiDAR covariates are extracted at a footprint that aligns with ground-based sample plots, especially when models are developed for analysis at the plot level, also referred to as the unit level (see, e.g., Babcock et al. 2015; 2016, Finley et al. 2017, Gregoire et al. 2016). FAP are commonly the plot type of choice for unit-level 26 analysis; however there have been several studies that have examined the proper resolution for combining LiDAR with VRP (see, e.g., Deo et al. 2016, Hayashi et al. 2015). The alternative choice to a unit-level analysis is an area-level analysis for which the inventory plots and remote sensing data may be misaligned (Goerndt et al. 2011). The most common area of interest in an area-level analysis is the forest stand. Here, we will use the efficiency of VRPs in combination with LiDAR data to determine stand-level per ha values for aboveground biomass, basal area, cubic volume, and tree density. Besides the choice between a unit- and area-level analysis, the cost and time to establish plots are determining factors to the number of plots that are established. Generally, the inventory is designed to meet forest-level accuracy and precision, and thus may result in small sample sizes for certain stands. This problem is most often ameliorated through a model-based solution known as small area estimation (SAE). SAE borrows strength across related areas to improve estimation for the small area of interest, i.e., an individual stand. For more in-depth details on SAE, see Rao and Molina (2015). The small area models are defined by the inclusion of an explicit model with random effects that account for variation that was unable to be explained by the covariates in the model mean. In SAE forestry applications, the frequentist approach using empirical best linear unbiased prediction (EBLUP) has been most commonly employed for univariate outcomes (Goerndt et al. 2011, Breidenbach and Astrup 2012, Goerndt et al. 2013, Magnussen et al. 2014, Mauro et al. 2016). One recent forestry application has examined a multivariate approach to EBLUP for the outcomes of basal area and volume (Mauro et al. 2015). Here, we employ the hierarchical Bayesian (HB) mode of inference that has been recently applied to the univariate case of aboveground biomass by Ver Planck et al. (2018). We then expand these univariate HB SAE models to the multivariate case to increase precision in each outcome by taking advantage of potential strong correlations among our four outcomes of interest. A unit-level multivariate approach has been demonstrated by Finley et al. (2013) to increase precision in outcomes that are more difficult to model from remote sensing, e.g., tree density. The 27 HB method provides access to posterior distributions for the small area outcomes of interest and is flexible for adding complex covariance structures (Porter et al. 2015), and parameter inference does not rely upon asymptotic assumptions (Pfefferman 2013). The primary objective of this study was to apply a multivariate HB framework to increase the precision of estimates for mean aboveground biomass, mean basal area, mean volume, and mean tree density at the stand level by borrowing strength across all stands through the use of LiDAR covariates and through the correlation among outcomes. Here, we consider the relationship among outcomes with the same number of observations for each outcome, where the information from other outcomes can improve the estimation for an individual outcome of interest. A potential extension of the multivariate framework could be to the case where differences in sample size exist among the outcomes for an area of interest; however, only the first case is examined in the current study. Additionally, we apply the univariate HB framework for each of the outcomes for comparison to the multivariate case. Furthermore, we add a separable conditional autoregressive structure to the stand-level random effects of the multivariate model to assess potential gains in precision for the four outcomes. The remainder of the study follows with: i) a description of the study area along with relevant data for the univariate and multivariate small area models; ii) a description and implementation of the small area models; and iii) the results and discussion of applying small area models for these four common forest inventory outcomes. 3.2 Methods 3.2.1 3.2.1.1 Data Study area The area of interest for this study was the Noonan Research Forest (NRF) near Fredericton, New Brunswick, Canada (N 45◦ 59’12”, W 66◦ 25’15”). The NRF has been managed by the University of New Brunswick since 1985 and is approximately 1500 ha in size including 115 ha 28 of nonforested wetlands with a total of 271 stands. The subsequent analysis uses a subset of 224 stands each with a minimum of two VRP per stand and sampling variation greater than zero for each outcome. These stands ranged in size from 0.6 to 47 ha with an average size of 6.6 ha (Table 3.1). The forest is composed of hardwood, mixed, and softwood stands with the major species being aspen (Populus spp.), balsam fir (Abies balsamea L. (Mill.)), birch (Betula spp.), eastern white pine (Pinus strobus L.), red maple (Acer rubrum L.), and spruce (Picea spp.). Figure 3.1a shows the stands by forest type with 24 intolerant hardwood stands, 89 mixed woods dominated by intolerant hardwoods stands, 54 mixed woods dominated by softwoods stands, 1 white pine stand, 11 black spruce stands, 38 tolerant softwoods stands, and 7 balsam fir stands. 3.2.1.2 Variable radius plot data In 2010, a 100×100 m grid was laid out across the NRF. At each grid intersection a VRP was established with proper boundary corrections and trees greater than 6.0 cm diameter at breast height (DBH) were selected into the sample using a 2M basal area factor (BAF) angle gauge. Species and DBH were recorded for each sample tree. A larger, 27M BAF, angle gauge was used to select a subset of trees for height measurements. Tree height was then modeled for the trees without height measurements, see Hayashi et al. (2016) for more details. For the first outcome of interest (aboveground biomass), plot estimates in Mg·ha−1 were calculated using Jenkins et al. (2003) species-group equations. The second and fourth outcomes of basal area (m2 ·ha−1 ) and tree density (trees·ha−1 ) were estimated directly from the size and number of sample trees. For the third outcome of interest (cubic volume), plot estimates in m3 ·ha−1 were derived from individual tree volumes estimated with local species-specific volume equations (Honer 1967). Stand-level estimates were obtained by averaging plot-level estimates for each of the outcomes. Table 3.1 summarizes these stand-level estimates. 29 3.2.1.3 LiDAR data The full waveform LiDAR data were collected on October 21 and 22, 2011 using a Riegl LMS Q680i laser scanner mounted on an airplane. The sensor had a pulse repetition frequency of 180 kHz with a laser wavelength of 1550 nm and a scan angle < 28.5◦ from nadir. The forest was covered in overlapping strips to achieve at a minimum of six pulses per m2 , footprint of 0.35 m, and up to eight returns per pulse (Hayashi et al. 2015). Stand-level LiDAR covariates were computed using the lascanopy function in the LAStools software suite (Isenburg 2016). The NRF stand polygons, LAS files, and arguments to define the vertical extent of the LiDAR profile considered, were passed into the lascanopy function to generate the desired LiDAR covariates. Here, we consider stand-level LiDAR signal derived percentiles—height in meters at which some percent of the energy is returned—labeled P followed by the percentile, as well as coverage and signal shape variables, e.g., density (dns) of cover, kurtosis and skewness. The density covariate, expressed as a percentage, is defined as the number of LiDAR returns above 2 meters divided by the total number of returns. Exploratory analysis using a backward selection procedure resulted in various combinations of percentile heights, dns, and kurtosis having the lowest Akaike information criterion value for the individual outcomes under consideration. The variance inflation factor, where values less than the general rule of ten for assessing multicollinearity (see Kutner et al. 2004, p.409) are acceptable, was also calculated to reduce the number of covariates for each of the outcomes. The final selected covariates for the SAE models were P 25 for the aboveground biomass and basal area outcomes, P 50 for the cubic volume outcome, and dns for the tree density outcome. 30 Table 3.1: Summary statistics for stand area, number of plots, mean aboveground biomass, mean basal area, mean volume, mean tree density, sampling variances, and LiDAR covariates for the Noonan Forest (m = 224 stands) dataset. 3.2.2 3.2.2.1 Stand Area (ha) Min 0.6 Max 47.3 Mean 6.2 SD 5.6 Number of plots 2 44 6.0 5.5 Mean aboveground biomass (Mg·ha−1 ) 16.9 223.5 117.8 44.9 Mean basal area (m2 ·ha−1 ) 4.20 43.5 25.1 8.45 Mean volume (m3 ·ha−1 ) 23.8 313 153 60.9 Mean tree density (trees·ha−1 ) 117 3628 1392 531 2 σi1 0.158 7948 1711 1431 2 σi2 1.33 393 75.7 61.5 2 σi3 0.019 16378 3253 2806 2 σ̃i4 0.001 3.92 0.373 0.407 P 25 (m) 2.18 8.35 5.20 1.29 P 50 (m) 3.40 14.9 8.39 2.16 dns (%) 31.2 92.6 76.5 11.0 Models Direct estimator and small area models The direct estimator applied assumed simple random sampling for point and variance estimates for the four outcomes: aboveground biomass, basal area, cubic volume, and tree density. The direct estimator for tree density was subsequently log transformed to meet the SAE model normality assumption, whereas the other outcomes remained untransformed into the small area models. The associated sampling variance for tree density was also transformed via the delta method (Casella and Berger 2002). We begin by fitting the univariate SAE models of Ver Planck et al. (2018) for the Fay-Herriot (FH) and FH with conditional autoregressive random effects (FHCAR) models for the individual outcomes. Then, we expanded the univariate FH model to the multivariate case, similar to Porter et al. (2015), for stand i in 1, 2, . . . , m stands and outcome j in 1, 2, . . . , n defined as: 31 Balsam Fir Tolerant Softwoods Black Spruce White Pine 9 8 Softwoods− Mixed Woods 7 6 Intolerant Hardwoods− Mixed Woods 5 4 Intolerant Hardwoods 3 0 1000 m 2 (a) (b) 16 100 14 90 12 80 10 70 8 60 6 50 4 40 2 30 (c) (d) Figure 3.1: Noonan Research Forest stands by (a) forest type (balsam fir; tolerant softwoods; black spruce; white pine; mixed woods dominated by softwoods; mixed woods dominated by intolerant hardwoods; intolerant hardwoods), (b) stand-level P 25 (m), (c) stand-level P 50 (m), and (d) stand-level dns (%) LiDAR covariates. Yij = θij + ij , θij = x0ij β j + vij , 32 (3.1) where Yij is the direct estimate, the parameter of interest θij is mean aboveground biomass (j = 1), basal area (j = 2), cubic volume (j = 3), or log tree density (j = 4), and ij is a 2 for aboveground biomass, basal normally distributed error with mean zero and variance σij 2 = σ 2 /Y 2 . The sampling area, and cubic volume. For log tree density, the variance is σ̃i4 i4 i4 error terms are assumed to be known and independent between locations and combined into ε ∼ N (0, Σ), where Σ is an nm × nm matrix containing the sampling errors stacked by outcome within location along the diagonal. The additive mean of θij comprises an intercept and stand-level covariates held in the pj × 1 column vector x and associated pj × 1 vector of P regression coefficients β j . The total number of covariates p is the n j=1 pj and gathered into the nm×p matrix X stacked by location. The stand-level random effects term vij is normally 2 . Similar to the sampling variances, we gather distributed with mean zero and variance σvj the stand-level error terms in Σv , an nm × nm matrix stacked by outcome within location. We parameterize Σv , similar to Porter et al. (2015), by Σv = I m ⊗ Σo with ⊗ defined as the Kronecker product and Σo is the n × n covariance matrix between the outcomes that is given an inverse-Wishart (IW) prior distribution. We refer to this multivariate SAE model as the FHIW model. As in the univariate case, it is reasonable to think that unobserved environmental variables could exhibit spatial autocorrelation and affect the observed outcomes. For this, we consider a separable covariance structure that uses a single conditional autoregressive (CAR) model along with the multivariate dependence in outcomes through the Kronecker product. We refer to this model as the FHSEP model by replacing the Σv in the FHIW model with Σv = Σs ⊗ Σo , where Σo has the same inverse-Wishart prior for the outcomes in the FHIW model. The Σs denotes the shared spatial CAR dependence for all of the outcomes as Σs = (D − ρW )−1 , where D is a m × m matrix with the number of neighbors of the stands along the diagonal. Here, stands are considered neighbors with adjoining borders. The m × m matrix W has the value of 1 in off diagonal locations for stands that are neighbors with zeros elsewhere, and ρ is the spatial dependence parameter. 33 The Bayesian SAE model specifications are completed by assigning prior distributions to parameters (Gelman et al. 2014). The same prior distributions of Ver Planck et al. (2018) for AGB were applied for the FH and FHCAR models for the three other outcomes of interest. For the FHIW and FHSEP models, each regression coefficient in β was assigned a flat prior distribution, Σo was given an IW prior distribution, ρ’s prior was uniform with support between zero and one. The IW’s shape hyperparameter was set to one plus the number of outcomes. The IW’s scale hyperparameter was set to the identity matrix with dimension of the number of outcomes. A Markov chain Monte Carlo (MCMC) algorithm was used to sample from parameters’ posterior distributions. Specifically, a Gibbs algorithm was developed to sample from θ = (θ11 , θ21 , . . . , θmn ), β, and Σo , and a Metropolis-Hastings algorithm was used to sample from ρ’s posterior distribution. We completed all analyses using R statistical software (R Core Team 2015). For all of the SAE models, parameter posterior inference was based on 3,000 post burn-in MCMC samples from each of L = 3 chains. Chain mixing and convergence were diagnosed using a multivariate potential scale reduction factor of less than 1.1 for all parameters considered (Gelman et al. 2014). 3.3 Results The univariate FH and FHCAR model parameter estimates are given in Table 3.2. Here, the 95% credible intervals for all of the βij do not include zero, which suggests the associated LiDAR covariates explain a substantial portion for each of the direct estimate’s variability. The credible intervals for the λ values of the FHCAR models also do not include zero indicating that there is potential spatial structure in the random area effects; and the overlapping credible intervals among outcomes suggest a single CAR model could be used for these outcomes in a multivariate setting, such as the proposed FHSEP model. However, the extension of the FHIW model to the FHSEP model considered had posterior estimates of ρ near zero; and therefore was not considered further. For the first outcome of interest, the stand-level point estimates of aboveground biomass 34 Table 3.2: The median and 95% credible intervals for the univariate Fay-Herriot (FH), and FH with conditional autoregressive (FHCAR) model parameters for the four outcomes of interest. Parameter FH β01 116.5 (111.9, 120.9) FHCAR 115.7 (108.0, 123.3) β11 , P 25 40.22 (36.20, 44.34) 39.82 (35.80, 43.90) β02 , 25.0 (24.03, 25.94) 24.92 (23.58, 26.25) 7.56 (6.69, 8.46) 7.60 (6.70, 8.50) β03 , 152.2 (145.9, 158.3) 151.2 (142.2, 160.1) β13 , P 50 54.04 (48.42, 59.79) 53.12 (47.48, 58.85) 7.17 (7.11, 7.24) 7.17 (7.07, 7.26) β14 , dns 0.135 (0.063, 0.210) 0.134 (0.056, 0.213) 2 σv1 284.1 (201.3, 403.3) 468.7 (309.6, 711.6) 2 σv2 12.49 (8.80, 17.96) 20.39 (12.86, 31.89) 2 σv3 2 σv4 504.5 (346.9, 737.1) 794.2 (499.9, 1258) 0.060 (0.039, 0.088) 0.109 (0.060, 0.179) λ1 — 0.681 (0.286, 0.960) λ2 — 0.514 (0.113, 0.893) λ3 — 0.589 (0.164, 0.899) λ4 — 0.465 (0.049, 0.891) β12 P 25 β04 , for the FH model ranged between 18.8 – 215.2 Mg·ha−1 with a mean of 116.5 Mg·ha−1 . The range of the FHCAR model was nearly identical, 18.3 – 218.1 Mg·ha−1 , with a mean of 115.6 Mg·ha−1 . The FH and FHCAR models for basal area ranged between 5.37 – 43.4 m2 ·ha−1 and 5.66 – 43.8 m2 ·ha−1 with means of 25.0 and 24.9 m2 ·ha−1 , respectively. Cubic volume ranged between 24.69 – 313.6 m3 ·ha−1 with a mean of 152.3 m3 ·ha−1 for the FH model; and for the FHCAR model, the cubic volume had a range of 22.13 – 310.9 m3 ·ha−1 with a mean of 151.2 m3 ·ha−1 . The final outcome of interest, tree density, had ranges of 501 – 2328 trees·ha−1 and 529 – 2488 trees·ha−1 with means of 1326 and 1323 trees·ha−1 for the FH and FHCAR models, respectively. For the multivariate FHIW model, Figure 3.2 maps the posterior means for the esti- 35 mates of the four outcomes of interest for the individual NRF stands. The ranges of the estimates were: 23.0 – 206 Mg·ha−1 , 6.55 – 42.4 m2 ·ha−1 , 10.36 – 308.7 m3 ·ha−1 , and 484 – 2390 trees·ha−1 for aboveground biomass, basal area, cubic volume, and tree density, respectively. The mean estimates were 114.9 Mg·ha−1 for aboveground biomass, 24.8 m2 ·ha−1 for basal area, 154.9 m3 ·ha−1 for cubic volume, and 1315 trees·ha−1 for tree density. These multivariate estimates are comparable to those of the univariate models. A scatter plot of these direct and posterior SAE model estimates does not reveal any regions of deviation for the outcomes of aboveground biomass, basal area, and cubic volume (Fig. 3.3a-c). However, tree density does show some larger deviations from the one-to-one line for direct estimates of 2000 trees·ha−1 or greater for all of the SAE models examined (Fig. 3.3d). Following the methods of Brown et al. (2001) and You and Zhou (2011), bias due to model misspecification was assessed through fitting linear models between the direct, assumed to be design-unbiased, and the posterior means of the SAE model estimates. The slopes of the individual linear models were 1.636, 1.447, and 1.554 for FH, FHCAR, and FHIW posterior means of tree density, respectively, which suggests there may be a systematic bias across the range of tree density values. The R2 for each were also low at 0.498, 0.481, and 0.525 for the FH, FHCAR, and FHIW posterior means. The slopes for the other outcomes had individual linear models ranging from 0.976 – 1.085 and R2 ranging between 0.809 – 0.878, which suggests no systematic bias for these outcomes across the range of values examined. 36 225 44 195 38.29 165 32.57 135 26.86 105 21.14 75 15.43 45 9.71 15 4 (a) (b) 350 3700 300 3186 250 2671 200 2157 150 1643 100 1129 50 614 0 100 (c) (d) Figure 3.2: The posterior means for the outcomes of (a) mean aboveground biomass (Mg·ha−1 ), (b) mean basal area (m2 ·ha−1 ), (c) mean volume (m3 ·ha−1 ); (d) mean tree density (number of trees·ha−1 ) for the FHIW model. The coefficient of variation (CV), defined as the square root of the posterior variance divided by the posterior mean, was used as the measure of precision among the direct, FH, FHCAR, and FHIW model estimates with a smaller value indicating better precision. Figure 3.4 shows the CV of the direct and three SAE model estimates against the stands ordered from smallest to largest CV of the direct estimator. For the direct estimates CV ranges were 0.004 – 1.156, 0.034 – 1.273, 0.001 – 1.121, and 0.027 – 1.979 for aboveground 37 FH 40 FH FHCAR 20 150 100 30 FHIW Direct Estimate of Basal Area (m2ha−1) FHIW 10 50 Direct Estimate of Aboveground Biomass (Mg ha−1) 200 FHCAR 50 100 150 200 10 Posterior Mean of Aboveground Biomass (Mg ha−1) 20 (b) FH 3500 FH FHCAR FHIW FHIW 2500 2000 1500 250 200 150 100 0 0 500 50 3000 FHCAR Direct Estimate of Tree Density (No. trees ha−1) 300 40 1000 350 (a) Direct Estimate of Volume (m3ha−1) 30 Posterior Mean of Basal Area (m2ha−1) 0 50 100 150 200 250 300 350 0 500 1000 1500 2000 2500 3000 3500 Posterior Mean of Tree Density (No. trees ha−1) Posterior Mean of Volume (m3ha−1) (c) (d) Figure 3.3: (a) Mean aboveground biomass (Mg·ha−1 ), (b) mean basal area (m2 ·ha−1 ), (c) mean volume (m3 ·ha−1 ); (d) mean tree density (number of trees·ha−1 ) for direct estimates versus posterior means of FH, FHCAR, and FHIW models. biomass, basal area, cubic volume, and tree density, respectively. The means of these direct estimate CVs were 0.365, 0.358, 0.379, and 0.549 for the four outcomes. For aboveground biomass, the ranges for FH, FHCAR, and FHIW models were all narrower than the direct estimates (0.004 – 0.470, 0.004 – 0.429, 0.004–0.293) with means of 0.148, 0.130, and 0.100, 38 respectively. In the case of basal area, the ranges for the FH and FHCAR were 0.034 – 0.391 and 0.0343 – 0.360, while the FHIW was narrower with 0.019 – 0.269. The means of the three SAE models were 0.140, 0.127, and 0.107. Cubic volume had ranges of 0.001 – 0.522, 0.001 – 0.463, and 0.001 – 0.754 for the FH, FHCAR, and FHIW models. Of the SAE models for cubic volume, FHIW had the largest maximum value of 0.754 which is associated with a regenerating forest stand and the lowest cubic volume direct estimate of all the NRF stands. The means for these CVs were 0.153 and 0.141 for the FH and FHIW models. For this outcome, FHCAR had the lowest mean value of 0.135. The CV for the FH model of tree density ranged between 0.026 – 0.283 with a mean value of 0.219. The FHCAR model for this outcome ranged between 0.026 – 0.337 with a mean of 0.210; and the FHIW model ranged between 0.027 – 0.304 with a mean of 0.223. Using a common CV threshold of 0.15 for stand-level estimates (Mauro et al. 2016), 27 of the 224 stands were at or below this threshold for the direct and 157 stands for the FH model estimates of aboveground biomass. The FHCAR and FHIW models had 167 and 193 stands, respectively, at or below the threshold. For basal area, the direct and FH model estimates had 33 and 164 stands that met this threshold, respectively. The FHCAR and FHIW had slightly more stands that met the threshold with 167 and 192 stands, respectively. In the case of cubic volume, the direct, FH, and FHIW model estimates had 22, 156, and 160 stands that met the threshold; the FHCAR model had the most stands with 166 stands. Tree density had fewer than 12 stands that met the CV threshold for the direct and SAE model estimates; so that, the CV threshold was increased to 0.25 for this outcome. The direct, FH, FHCAR, and FHIW models had 20, 211, 190, and 186 stands, respectively, for tree density that met this increased threshold. Reductions in CV are observed between the direct estimator and the FH model for all outcomes except one stand for cubic volume (Fig. 3.5), and even greater reductions are observed between the direct estimator and the FHCAR and FHIW models. Directly comparing the benefit of the multivariate FHIW model versus the univariate FH model, 223 39 1.2 DIRECT 1.2 DIRECT FHCAR FH FHCAR 1.0 1.0 FH FHIW 0.8 0.0 0.0 0.2 0.2 0.4 0.6 Coefficient of Variation 0.6 0.4 Coefficient of Variation 0.8 FHIW Stands (ordered from smallest to largest CV Direct) Stands (ordered from smallest to largest CV Direct) DIRECT FH FH FHCAR FHCAR FHIW FHIW 1.5 DIRECT 1.0 Coefficient of Variation 0.6 0.0 0.0 0.2 0.5 0.4 Coefficient of Variation 0.8 1.0 2.0 (b) 1.2 (a) Stands (ordered from smallest to largest CV Direct) Stands (ordered from smallest to largest CV Direct) (c) (d) Figure 3.4: Estimates of coefficient of variation (CV) for direct estimates and FH, FHCAR, FHIW models of (a) mean aboveground biomass (Mg·ha−1 ), (b) mean basal area (m2 ·ha−1 ), (c) mean volume (m3 ·ha−1 ); (d) mean tree density (number of trees·ha−1 ) versus stands ordered by increasing CV of direct estimate. The horizontal dashed line indicates a CV value of 0.15; and the horizontal dotted line indicates a CV value of 0.25. stands, 224 stands, 192 stands, and 55 stands had further reductions in CV for the outcomes of aboveground biomass, basal area, cubic volume, tree density, respectively. The percent reductions in CV for aboveground biomass from the direct estimator to the FH model ranged 40 from 0.4% to 88.1% with a mean of 52.9% (Fig. 3.5a). For basal area, percent reductions in CV from the direct estimator to the FH model ranged from 1.4% to 87.5% with a mean of 53.6% (Fig. 3.5b). Figure 3.5c shows the percent reductions in CV for cubic volume from the direct estimator to the FH model ranged from -1.3% to 85.7% with a mean of 53.7% for the NRF stands. For tree density, percent reductions in CV from the direct estimator to the FH model ranged from 1.6% to 86.5% with a mean of 52.7% (Fig. 3.5d). Gains were also observed in percent CV reduction from the FH to the FHIW model for aboveground biomass with a range of -0.4% – 86.2% and mean of 30.6%, for basal area with a range of 5.4% – 78.4% and mean of 22.1%; and for cubic volume with a range of -76.5% – 74.6% and mean of 9.8% (Fig. 3.6a-c). Gains in percent CV reduction were not as prevalent between the FH and FHIW models for tree density with a range of -8.5% – 5.8% and mean of -1.8% (Fig. 3.6d). 41 90 90 80 80 60 60 50 50 30 30 20 20 10 10 0 −5 −10 0 −5 −10 −20 −20 −40 −40 −80 −80 (a) (b) 90 90 80 80 60 60 50 50 30 30 20 20 10 10 0 −5 −10 0 −5 −10 −20 −20 −40 −40 −80 −80 (c) (d) Figure 3.5: Percent reduction in coefficient of variation from direct estimate to FH model for (a) mean aboveground biomass (Mg·ha−1 ), (b) mean basal area (m2 ·ha−1 ), (c) mean volume (m3 ·ha−1 ); (d) mean tree density (number of trees·ha−1 ). The precision of the SAE models was also assessed by the width of the credible intervals, where narrower intervals indicate less uncertainty. The 95% credible interval widths of mean aboveground biomass for the FH and FHCAR models ranged between 1.54 – 69.1 Mg·ha−1 and 1.55 – 88.3 Mg·ha−1 with mean widths of 57.0 Mg·ha−1 and 49.2 Mg·ha−1 , respectively. The FHIW model for aboveground biomass had a narrower range than the other two SAE models (1.54 – 51.4 Mg·ha−1 ) and a smaller mean width of 39.7 Mg·ha−1 . Similar to the 42 90 90 80 80 60 60 50 50 30 30 20 20 10 10 0 −5 −10 0 −5 −10 −20 −20 −40 −40 −80 −80 (a) (b) 90 90 80 80 60 60 50 50 30 30 20 20 10 10 0 −5 −10 0 −5 −10 −20 −20 −40 −40 −80 −80 (c) (d) Figure 3.6: Percent reduction in coefficient of variation from FH model to FHIW model for (a) mean aboveground biomass (Mg·ha−1 ), (b) mean basal area (m2 ·ha−1 ), (c) mean volume (m3 ·ha−1 ); (d) mean tree density (number of trees·ha−1 ). pattern of aboveground biomass SAE models, basal area had credible interval widths for the FH and FHCAR models ranging from 4.35 m2 ·ha−1 – 14.5 m2 ·ha−1 and 4.22 m2 ·ha−1 – 18.3 m2 ·ha−1 with mean widths of 12.1 m2 ·ha−1 and 10.8 m2 ·ha−1 ; whereas, the FHIW model had widths ranging from 2.19 m2 ·ha−1 – 12.2 m2 ·ha−1 with a mean width of 9.46 m2 ·ha−1 . In the case of cubic volume, the three SAE model credible interval widths ranged between 0.54 m3 ·ha−1 – 114 m3 ·ha−1 with mean widths of 76.7, 66.2, and 70.3 m3 ·ha−1 43 for the FH, FHCAR, and FHIW models. For the SAE models of tree density, the credible interval widths ranged between 227 – 1948 trees·ha−1 with mean widths of 1149, 1097, and 1162 trees·ha−1 . Based upon mean credible interval widths, the FHIW model had the best precision for the outcomes of aboveground biomass and basal area; whereas the FHCAR model performed best for cubic volume and tree density. 3.4 Discussion To our knowledge, all of the current SAE literature related to forestry applications (e.g., Goerndt et al. 2011; 2013, Magnussen et al. 2014) has examined univariate outcomes with a single proceedings paper by Mauro et al. (2015), which examined the multivariate outcomes of basal area and volume at the area level from a frequentist perspective. The multivariate FHIW model reduced, on average, the CV for the outcomes of aboveground biomass and basal area more so than both the FH and FHCAR models indicating modeling multiple outcomes may be more informative to increasing precision than applying separate univariate models. However, the gains were not as substantial for cubic volume and actually did not improve precision for tree density. The issues with poorer performance for tree density in the SAE models considered here may be a result of a pattern of underestimating the outcome for more dense stands with greater than 2000 trees·ha−1 . Goerndt et al. (2011) found similar issues with the EBLUP SAE model for tree density, although with a pattern of overestimation for high density stands and underestimation for low density stands. Due to this potential model misspecification for tree density, a different metric for precision other than CV may need to be considered, such as mean squared error, to account for additional bias induced by the model. The model may have failed for tree density for multiple reasons. Two potential causes are: i) the occlusion of trees in the variable-radius plot sampling, or ii) the inability of LiDAR to penetrate the canopy for measuring subcanopy trees. Here, large reductions in CV were seen in the application of the FHCAR model relative to direct estimates and more modest gains compared to the FH model due to strong area effects in 44 the NRF. Magnussen et al. (2014) and Breidenbach et al. (2016) also found improvements for small area applications in forestry that account for spatial autocorrelation. In this study, we found P 25 to be the LiDAR covariate that explained the greatest variation in stand-level aboveground biomass and basal area, P 50 for stand-level cubic volume, and dns for stand-level tree density. Hayashi et al. (2015) using the same data from NRF found the height of P 45 to be most significant in explaining aboveground biomass estimates; Hayashi et al. (2016) found a combination of the 45th quantile of LiDAR height, maximum canopy height, and LiDAR height where maximum point density occurs were best in modeling plot-level volume estimates. Neither of these studies considered the outcomes of basal area or tree density. A possible explanation for this difference is the LiDAR covariates used in this study were a much smaller subset relative to the other two studies; and our LiDAR covariates were summaries at the area-level (stand-level), while Hayashi et al. (2015) and Hayashi et al. (2016) examined relationships between the outcomes of aboveground biomass and volume with LiDAR covariates at the unit-level (plot-level). Also at the unit-level, Goerndt et al. (2010) examined univariate models for the outcomes of basal area and tree density with LiDAR covariates in western Oregon. For basal area, Goerndt et al. (2010) found P 5, percentage of first returns greater than 3 m above the ground, and the interquartile distance in the distribution of first returns as the best model. As for tree density, they found the minimum value for the distribution of all first returns above 3 m, P 10, and percentage of first returns above 15 m and 21 m as the best model. Each of these models had relatively high levels of variation explained of 0.823 and 0.877, respectively; whereas, our exploratory analyses had values of 0.709 and 0.131 for these outcomes. These differences are not surprising as our LiDAR covariates are summaries at the area-level rather than the unit-level. Goerndt et al. (2011) also found LiDAR covariates to differ in their final regression model for area-level versus unit-level relationships with particularly low value of variation explained, 0.209, for tree density. Many studies (e.g., Lefsky et al. 2002) have also found P 95, generally regarded as canopy height, to be a common explanatory variable; however, this was not the 45 case in any of our final linear models. The fairly low percent of variation explained by dns for the tree density outcome also indicates the need for identifying better LiDAR covariates from those explored here. One potential drawback of applying SAE at the area-level is the inability to assess within stand variation, as can be done when aligning LiDAR and plot data at the unit-level (e.g., Hayashi et al. 2016, Woods et al. 2011). As recommendations in the literature improve for aligning VRP with LiDAR covariates, the multivariate model presented here could be adapted to the unit-level. Additionally, the extension of the FHIW model to the FHSEP model did not improve the precision due to the posterior of the correlation parameter being near zero. However, this may not be true for other applications so continued consideration for the FHSEP model may be warranted. Another possible solution may be to move from the CAR model for the random effects to incorporate a spatial structure similar to the current work of Datta et al. (2017). The precision in outcomes of interest may also benefit from the incorporation of temporal and spatio-temporal models when areas have been inventoried across repeated time periods. 46 CHAPTER 4 EVALUATION OF THE NATIONAL WOODLAND OWNER SURVEY ESTIMATORS FOR PRIVATE FOREST AREA AND LANDOWNERS: A CASE STUDY OF MONTANA 4.1 Introduction Quantifying private forestland and understanding private landowner behavior at the state level in the United States has been an interest of researchers for more than 40 years (Kingsley and Finley 1975). Accurate and precise estimates inform outreach and assistance programs to private forest landowners (PFLs) to achieve management and stewardship objectives. In addition, researchers have been interested in understanding landowner behavior, objectives, and values reflected in ownership and management decisions. According to Oswalt et al. (2014), private forestland comprises more than 58% of all forest land in the United States with larger proportions privately held in the North and South (74 and 87%, respectively) and smaller proportions in the Rocky Mountains and Pacific Coast (26 and 39%, respectively) regions. In part because of these large regional differences, many studies of private forest ownership have been conducted at the state (Butler et al. 2005) and regional (Kaetzel et al. 2012) levels. The US Department of Agriculture (USDA) Forest Service and Forest Inventory and Analysis (FIA) program administer the National Woodland Owner Survey (NWOS) to provide the standard for state- and national-level estimates of private forest land area and number of private landowners. Along with these estimates, additional related variables of landowner attributes and behavior are collected. The FIA divides undifferentiated private landowners into five ownership groups: i) corporate, including Native Corporations in Alaska and private universities, ii) non-governmental conservation and natural resources organizations, iii) unincorporated local partnership, association, or club, iv) Native American, and v) individual and family, including trusts, estates and family partnerships (O’Connell et al. 47 2015). In the past, Smith et al. (2004) designated private ownerships into just two groups: industrial and non-industrial private forest (NIPF) landowners. More recently, Smith et al. (2009) and Oswalt et al. (2014) have moved to new private designations of corporate (group i) and non-corporate (groups ii - v). Whereas, the NWOS private ownership classifications designate group i as corporate, groups ii - iv as other private, and group v as family forest. Family forest is the primary population of interest in the NWOS (Butler 2008, Butler et al. 2016b). Here, our population of interest is the group of PFLs as defined by Finley et al. (2001) and Metcalf et al. (2012). PFLs are defined as all NIPF owners except for tribal owners. Tribal ownerships are treated as a group independent from PFLs because of their unique and complex governance structures (Nettheim et al. 2002) that do not align with any existing definition of PFLs. Although this definition differs from the NWOS classifications, the main objective of this research is to evaluate the estimators for a known population. Legal ownership boundaries have been used in the past to assess PFL populations through acquisition of local property tax records in either hard copy or digital form (e.g., Kittredge et al. 2008, Kilgore et al. 2013). The advent of geographic information systems (GIS) and remote sensing and the increasing availability of digitized and regularly updated cadastral data sets has increased the accessibility and affordability of these data. Although availability is increasing, the development and administration of these data often lies at the local or county levels of government; few states have complete and freely available data sets. The state of Montana is one of several states currently known to have polygon-based parcel data that allow the determination of a known population. We used these data to establish a known population of PFLs and evaluate the current estimators of the NWOS with full response for total private forest area and PFL population size. Further impetus for this current research comes from previous studies (e.g., Metcalf 2010, Metcalf et al. 2012) identifying shortcomings with the NWOS estimators for analyses conducted at the county level. Statewide testing of the estimators had so far been impossible because of lack of data. Recent clarifications and improvements to the descriptions of the NWOS estimators have been provided by Dickinson 48 and Butler (2013) and Metcalf et al. (2014). Using Montana as a case study, our objective was to also evaluate the behavior of the current estimators of the NWOS under a variety of nonresponse scenarios. Non-response bias is an issue affecting human dimensions research, especially quantitative survey research, and can significantly affect estimates of population parameters (Vaske 2008). Dickinson and Butler (2013), in their presentation of the NWOS estimators, acknowledge the need for better methods to handle nonresponse in the NWOS. Recently, Dickinson et al. (in press) outlined some potential methods for handling nonresponse in the NWOS. Landholding size is a key indicator of landowner decisions and behavior (e.g., Zhang et al. 2005) and is especially important in the estimator as landholding size is used directly to determine the proper weights for estimation of PFL population size. Thus, different trends in landholding size in relation to response rates can affect the performance of the estimators. Others have identified nonresponse as an important influence on biophysical estimates when stratifying between private and public ownerships in the US National Forest Inventory (McRoberts 2003, Patterson et al. 2012, Domke et al. 2014). In survey sampling, there are two types of nonresponse: unit and item (Särndal et al. 1992, Lohr 2010). Here, our focus is exclusively on unit nonresponse related to the NWOS. Little and Rubin (2014) describe the mechanisms of unit nonresponse as consisting of three types: missing completely at random (MCAR); missing at random (MAR); and not missing at random (NMAR). MCAR is defined by the mechanism of nonresponse being independent of the data. In the past and present, the NWOS estimators have implicitly assumed a nonresponse mechanism of MCAR and treated the survey respondents as representative of the population (Butler et al. 2005, Dickinson and Butler 2013, Dickinson et al. in press). On the other hand, MAR is defined as the mechanism of nonresponse depending only on the components of responding data, and NMAR is defined by the distribution of nonresponse depending upon the missing data values of nonresponse. Various modeling and weighting methods can be used to deal with MAR and NMAR response mechanisms (Särndal et al. 49 1992, Lohr 2010) with our proposed estimator focusing on an individual weighting method. The three primary contributions of this manuscript are as follows: the evaluation of current NWOS estimators of private forest area and PFL population size for a known PFL population at the state level; the presentation of an alternative estimator of PFL population size that explicitly accounts for various nonresponse scenarios; and the evaluation of the impacts of nonresponse biases on each of these estimators. The remainder of the article is organized as follows. First, we provide a description of the steps for defining a GIS-based baseline population of PFLs. Second, we describe the sampling frame and procedure for simulation. Third, we outline the current estimator used in the NWOS for estimating private forest area and PFL population size along with an alternative estimator. Fourth, we evaluate the impacts of different nonresponse scenarios through simulation on these estimators. Finally, we present the results of the estimators under different scenarios followed by discussion. 4.2 4.2.1 Methods GIS-based Baseline Population Estimate To allow an objective assessment of estimator functionality, we combined Montana cadastral data with remotely sensed land cover data to create a spatially explicit, statewide database of Montana PFLs. We began by determining forested land cover types (i.e., deciduous, evergreen, mixed forests, and woody wetlands) from the National Land Cover Dataset (NLCD, Jin et al. 2013) at a 30-m resolution and clipped to the extent of Montana. We then created a separate raster layer indicating the presence of census water, as defined by Bechtold and Patterson (2005), from the NLCD. To better align the raster and vector data, we resampled the raster data to a 10-m resolution. After reclassifying the NLCD to include only those pixels with forest cover types (the NLCD defines these pixels as at least 20% tree cover), we considered a patch as forested when the total number of pixels in a patch was greater than 41 (for a 10-m resolution, an approximate area of 0.41 ha). The use of twenty percent tree cover 50 may underestimate the true amount of forest area across the state by an unknown amount. Pixels with a common edge within a patch were considered neighbors. Forest patches were then converted from raster to polygons for subsequent intersection with ownership parcels. These characteristics of forest were chosen to align as closely as possible to the FIA definition of forest which stipulates that forest must be at least 10% cover of trees, at least 0.41 ha in area, and 36.6 m wide (O’Connell et al. 2015). Of the three characteristics, only the width of the derived forest patches was slightly less than the FIA definition. Ownership of the forested areas was determined by combining shapefiles for all statewide parcels from the Montana cadastral website (Base Map Service Center Montana State Library 2015). We processed parcel data at the county level and then aggregated to the state level. To begin processing parcels at the county level, we identified ownerships in the data set without a given owner name. Additional shapefiles of county, public lands, and reservation boundaries from the Montana Cadastral website (Base Map Service Center Montana State Library 2015) were used to expedite and aid in classifying these ownerships. Parcels for which ownership was unknown that fell within the public lands boundaries were assumed public, and those parcels that fell within the reservation boundaries were assumed tribal. The remainder of unknown parcels were given a unique identifier and treated as unique ownerships. In addition, due to misalignment between some parcel and public lands or parcel and tribal boundaries, we examined ownership names of all parcels that fell completely or partially within public lands or tribal boundaries to confirm public or private ownership. Only parcels clearly attributed as private were retained as potential PFLs. Parcels are not a corollary for owners as individual owners may control multiple parcels (Kittredge et al. 2008). We dissolved all parcels by ownership name and address so parcels with the same owner were treated as singular ownerships. To account for potential different spellings of the same landowner, an additional dissolve of the remaining parcels was done by ownership address only. We then intersected those dissolved parcels with forest patch polygons greater than 0.41 ha to calculate forest land on each ownership. We visually checked 51 the owner names of these parcels containing forest land (> 0.41 ha) to identify forest industry; we expedited this process by using keywords such as “timber,” “lumber,” “forest,” “tree,” and others. Simultaneously during visual inspection, tribal ownerships were identified by owner name. Private parcels containing forest land > 0.41 ha and not forest industry nor tribal were considered PFLs. The NWOS (Butler et al. 2016b) and Oswalt et al. (2014) classify tribal lands as private non-corporate ownerships. We aggregated county-level PFL databases into a statewide layer and ran a final dissolve command to combine parcels owned by the same owner in multiple counties. The final product was a statewide, spatially explicit database of all unique PFLs. We completed all analyses using a combination of QGIS (QGIS Development Team 2014) and R statistical software (R Core Team 2015). 4.2.2 Sampling Procedure The FIA uses regionally based remote sensing techniques to establish phase 1 plots to stratify the region of interest, at a minimum, into strata of forest, non-forest, and census water to increase precision of estimation by stratified random sampling (Bechtold and Patterson 2005). The NWOS primary sampling units of PFLs are tied to the plot location of FIA phase 2 plots (Butler 2008, Butler et al. 2016b) within the FIA sampling design (Bechtold and Patterson 2005). PFLs are sent the NWOS at least one year after an FIA field crew has measured the biophysical attributes of the plot and the landowner met the designation of a PFL. The FIA use a tessellated hexagonal grid across the entire United States to randomly establish these phase 2 plot locations within an individual hexagon. The actual hexagonal grid size for FIA is reported as having equal areas of 2403 ha (Bechtold and Patterson 2005, Dickinson and Butler 2013). We generated a tessellated hexagonal grid, with equal hexagonal areas of 2407 ha, across the state of Montana using DGGRID by Sahr (2011). This grid served to simulate the hexagonal grid that spatially distributes sampling plots used by the FIA and the NWOS. We then located at random 1000 points in each hexagon, each point representing a unique, simulated 52 sample of the underlying population. Estimates of private forest area were calculated at each iteration using the subsequently described estimators. For PFL population size, estimates were also calculated at each iteration using two different estimators, subsequently referred to as Methods I and II. For each of the two methods to be described, the means of the distributions of the 1000 simulated estimates of total, N̂hd , and variance, vd ar(N̂hd ), were computed. 4.2.3 4.2.3.1 Estimators Private Forest Area The first estimator to be evaluated was total private forest area (in ha) for Montana. The current NWOS estimator of forest area relies upon the FIA estimate, which uses stratified random sampling, of private forest area (Dickinson et al. in press). Because we did not stratify Montana as in the phase 1 plot procedures of FIA, we cannot apply the estimators of stratified random sampling. Instead, we based our FIA estimator of total private forest area on the assumption of equal probability sampling, in that each acre of forest is equally likely to be sampled. The estimator and respective variance are similar to those presented by Dickinson and Butler (2013): n h Ah X dhi = Ah p̂hd ; ÂF IA = nh (4.1) p̂ (1 − p̂hd ) vd ar(ÂF IA ) = A2h hd , nh − 1 (4.2) i=1 where ÂF IA is our FIA estimator of total private forest area across the stratum, Ah is the total land area of the stratum, nh is the sample size in stratum h, dhi is an indicator variable with value 1 if ownership i is within the stratum and private domain, and p̂hd is the ratio between samples that fall on private forest land and the total sample size. Throughout this 53 study, the stratum is the state of Montana; the domain is privately held forest. The current NWOS estimator for total private forest area is presented in Dickinson et al. (in press) as: nhr ÂF IA X dhi = ÂF IA z̄; Âhd = nhr (4.3) vd ar(Âhd ) = Â2F IA vd ar(z̄) + z̄ 2 vd ar(ÂF IA ), (4.4) i=1 where Âhd is the total forest area in stratum h and domain d, ÂF IA in this case is from equation 4.1, nhr is the number of respondents to the NWOS within the domain of interest with replacement and assuming an MCAR mechanism of nonresponse, z̄ is the mean of respondents within the domain of interest, and vd ar(z̄) is the sample variance of z̄. Because our domain of interest lies only in total private forest area, the estimates of equations 4.3 and 4.4 are equal to estimates of equations 4.1 and 4.2. We subsequently refer to these estimators as Method I for private forest area estimation. 4.2.3.2 Private Forest Landowner Population Size Using the GIS-based baseline PFL population size, we evaluated the current NWOS estimator and an alternative. We began with the current NWOS estimator for ownership total and corresponding variance by Dickinson et al. (in press): nhr dhi ÂF IA X = ÂF IA x̄; N̂hd = nhr a i=1 hi (4.5) vd ar(N̂hd ) = Â2F IA vd ar(x̄) + x̄2 vd ar(ÂF IA ), (4.6) where N̂hd is the total number of owners in stratum h and domain d, ÂF IA is from equation 4.1, nhr is the number of respondents to the NWOS within the domain of interest with replacement and assuming an MCAR mechanism of nonresponse, ahi is the forest area (in ha) of the ownership in the stratum, x̄ is the mean of respondents within the domain of 54 interest weighted by the inverse of forest area owned, vd ar(x̄) is the sample variance of x̄, and vd ar(ÂF IA ) is from equation 4.2. We subsequently refer to these estimators as Method I for PFL population size estimation. The alternative estimator for PFL population size begins with the previous NWOS estimators for ownership total and corresponding variance published by Dickinson and Butler (2013): n n h h dhi dhi A X 1 X = h ; N̂hd = nh phi nh ahi i=1 V̂ ar(N̂hd ) = (4.7) i=1 n h  X d 1 hi nh (nh − 1) i=1 phi 2 − N̂hd , (4.8) where N̂hd is the total number of owners in stratum h and domain d, Ah is the total land area of the stratum, nh is all samples that fall on land in stratum h with replacement, dhi is an indicator variable with value 1 if ownership i is in the stratum and domain, ahi is the forest a area (in ha) of the ownership in the stratum, and phi ( Ahi ) is the selection probability of the h ownership in the stratum. In their current form, these estimators also implicitly assume an MCAR mechanism of nonresponse. In an attempt to correct the estimators of equations 4.7 and 4.8 for PFL population size with nonresponse, we used a weighting method under the assumption of MAR across all ten size classes, subsequently described, at different levels of response rates. The weighting method was applied only to Equations 4.7 and 4.8 and accounts for the joint probability of selection and response by: P (selection, response) = P (selection)P (response|selection) = phi qhi , (4.9) where qhi is the probability of landowner response to the survey given selection in stratum a h of landowner i. By replacing phi ( Ahi ) in Equations 4.7 and 4.8 with Equation 4.9, we h 55 developed the alternative estimators, subsequently referred to as Method II, as: rh dhi Ah X ; N̂hd = rh ahi qhi i=1 (4.10) r 2 h  X dhi V̂ ar(N̂hd ) = − N̂hd , rh (rh − 1) phi qhi 1 (4.11) i=1 where rh is the sample size of responding PFLs with replacement plus all other plots that fall on land within the stratum. 4.2.4 Nonresponse We also generated estimates for each of the 1000 iterative samples under various nonresponse scenarios for Methods I and II for PFL population size estimation. We began to evaluate the effect of nonresponse on the estimators with an average response rate of 50% uniformly distributed across 10 landholder size classes (referred to as Scenario A; see Table 4.1 for size classifications). The size classifications were made to mimic size classes used in reporting by Butler (2008) and Oswalt et al. (2014). Fifty percent was considered the starting point for nonresponse based upon the results of Butler (2008). For total NIPF owners across the state 49 ) based on sending of Montana, they found a response, or cooperation, rate of 44.1% ( 193−82 193 surveys (82 undeliverable and 49 total responses) to landowners. More recently, Butler et al. (2016b) report a slightly better response rate of 48.2% based on sending 421 surveys (21 undeliverable and 193 total respondents). We then decreased response incrementally by 10%, equally distributed across all size classes, down to a low of 10%. Each of these incremental response levels can be considered an MCAR mechanism of nonresponse. The estimates of PFL population size from each of the two methods were then recalculated by removing nonrespondents from the sample. Our first evaluation of the two estimators assumed no pattern between landowner response and size class of properties, although in reality a trend may be present. Three additional scenarios of nonresponse were examined, each with a unique relationship between 56 Table 4.1: Size classes of private forest area. Size class Forest area (ha) 1 [0.41 — 4.05) 2 [4.05 — 8.09) 3 [8.09 — 20.2) 4 [20.2 — 40.5) 5 [40.5 — 80.9) 6 [80.9 — 202.3) 7 [202.3 — 404.7) 8 [404.7 — 2023.4) 9 [2023.4 — 4046.9) 10 4046.9 + response rate and property size classes and corresponding to an NMAR mechanism. Across all of these scenarios, we maintained an overall sample response rate of 50% (Figure 4.1b). These scenarios were chosen to represent a range of possible trends that may be observed when the survey is administered. The probabilities of nonresponse in each size class were dependent on the sample size in each size class to maintain the overall sample response rate of 50% per scenario (Figure 4.1b). Figure 4.1a shows an example of the sample distribution of unique landowners by size class for a single realization with full response. Scenario B had an extreme trend with few respondents in the smallest size class and progressively less probability of nonresponse as size class increased. Scenario C was similar to Scenario B with an opposite trend. The final scenario, referred to as D, was symmetric about the middle two size classes (i.e., classes 5 and 6), each sharing the highest probability of nonresponse. The mechanism of NMAR is clear in these scenarios: because of deliberate simulation of trends in response based on landholding size, the evaluation of the estimators for the two methods was once again based on removing the nonrespondents from the sample. We were only able to apply Method II to the scenarios with a trend because we knew a priori the probabilities of response based on the simulated data. All estimator evaluations were assessed at the spatial extent of the state. The extent of the entire state was chosen because it is the minimum resolution for reporting of estimates by the NWOS. Throughout this article, the performance of each of the two methods was compared 57 150 Probability of Non−response Number of Unique Private Forest Landowners 1.00 100 50 0 0.75 Scenario ● 0.50 ● ● ● ● ● ● ● ● ● ● A B C D 0.25 0.00 1 2 3 4 5 6 7 8 9 10 1 Size Class 2 3 4 5 6 7 8 9 10 Size Class Figure 4.1: (a) One realization of the sample distribution of unique private forest landowners (PFLs) by size class, (b) trends of probabilities of nonresponse by size class for four different scenarios (A-D) with an overall nonresponse rate of approximately 50%. based upon design-based inference (Gregoire 1998) with the average 95% confidence interval width and the actual 95% probability coverage of 1000 simulations. The actual probability coverage is the fraction of the number of simulations of the 1000 that include the GIS-based baseline within the confidence interval of each simulation; the nominal probability coverage is chosen to be 95%. 4.3 4.3.1 Results Private Forest Area From the procedures used to determine PFLs across the state of Montana, the GIS-based baseline total private forest land area was 1,743,229 ha (Table 4.2). Table 4.2 also shows the estimate of 1,727,868 ha along with a sampling error of 3.65% based on applying Equations 4.1 and 4.2 for 1000 simulations. The estimate of private forest area was 0.88% less than 58 the baseline private forest area and included the baseline estimate in the 95% confidence interval. The estimates of private (i.e., nonindustrial and noncorporate) forest area for Montana from the literature (Butler 2008, Oswalt et al. 2014, Butler et al. 2016b) ranged from 11.4 to 26.1% greater than the baseline of the derived population. Only the estimates of Butler et al. (2016b) from the literature contain the baseline private forest area within the 95% confidence interval. The majority of all forestland in Montana was concentrated in the western half of the state (Figure 4.2a), whereas concentrations of private forestland in the state were not as clearly distributed due, in part, to large areas of public forest compared to private forest. The private forestland covered approximately 27% of all forest land (Figure 4.2b). The largest proportion of forest area was in ownership sizes between 404.7 and 2023.4 ha (i.e, size class 8) for both the baseline population and the current NWOS (Butler et al. 2016b) estimate (Figure 4.3a). Some differences existed in the proportions in each size class, but the overall shape of the distribution appeared similar for both the baseline population and the NWOS estimate. Table 4.2: Summary of Montana private forest land area (ha) for the GIS-based baseline (base), the estimate from simple random sampling of the baseline (Method I), and additional publications. Forest Area Owner Estimate Source Groupa (ha)b Baseline PFL 1,743,229 Method I PFL 1,727,868 Butler (2008) NIPF 2,197,446 Oswalt et al. (2014) Noncorporate 1,941,685 Butler et al. (2016b) Noncorporate 1,978,106 a % Sampling Change Error from (%) base — — 3.65 -0.88 5.8 26.1 0.72 11.4 7.98 13.5 95% CI includes base — Yes No No Yes PFL, private forest landowners. Ownership classifications were redefined during reporting years from industrial and nonindustrial private forest land (NIPF) to new classifications of corporate and noncorporate. b Forest area estimates include ownerships >0.41 ha. 59 Figure 4.2: (inset) Location of Montana, highlighted in grey, within the conterminous United States, (a) 10 m x 10 m resolution for all forested lands across the state of Montana, and (b) 10 m x 10 m resolution for private forested lands across the state of Montana. 60 group group b Base Base NWOS Rockies 0.6 0.3 0.2 0.1 0.0 0.4 0.2 0.0 group c Base 40000 Number of Private Landowners a Proportion of Private Landowners Proportion of Private Forest Area 0.4 NWOS 30000 20000 10000 0 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9+ Size Class Size Class 1 2 3 4 5 6 7+ Size Class Figure 4.3: (a) The distribution for proportion of total private forest area of family forest landowners of Montana (NWOS, Butler et al. 2016b) and the GIS-based baseline (Base) population of Montana by size class, (b) the distribution for proportion of private forest landowners (PFLs) by size class for the Rocky Mountain region (Rockies, Oswalt et al. 2014) and the Base population of Montana, and (c) the distribution of number of family forest landowners of Montana (NWOS, Butler et al. 2016b) and PFLs for the Base population of Montana by size class. 4.3.2 Total Private Forest Landowners The GIS-based baseline population totaled 66,316 unique landowners (Table 4.3). The proportion of PFLs among the 10 size classes for the baseline population was concentrated in the lowest three size classes (< 20.2 ha) with the largest proportion owning less than 4.05 ha (Figure 4.3b). The baseline population had 36,822 landowners owning less than 4.05 ha with the remaining 29,494 landowners in size classes 2 through 10 (Figure 4.3c). The distribution for the proportions of PFLs closely followed the Rocky Mountain regional estimates of Oswalt et al. (2014) (Figure 4.3b). As with the regional estimates, the 61 distribution of the baseline PFL population closely followed the current NWOS (Butler et al. 2016b) (Figure 4.3c). The NWOS estimated slightly more landowners in the first size class with 41,409 owners compared with the baseline population. The remaining size classes for the current NWOS have an additional 29,000 PFLs across the state for a total population estimate of 70,409 (Table 4.3). Table 4.3: Summary of the number of private forest landowners in Montana for the GIS-based baseline (base), the estimate from two sampling methods, and additional publications. Owner No. of a Source group ownersb Baseline PFL 66316 Method I PFL 68838 Method II PFL 68811 Butler (2008) NIPF 40000 Butler et al. (2016b) FAMILY 70409 Sampling error (%) — 14.3 14.3 22.9 22.8 % change from base — 3.80 3.76 -39.7 6.17 95% CI includes base — Yes Yes No Yes a PFL, private forest landowners; NIPF, nonindustrial private forest land owners; FAMILY, private owners classified as family. b Number of private owners includes ownerships >0.41 ha. 4.3.2.1 Full Response For 1000 simulations of Method I with full response, the average estimate of the PFL population size was 68,838 (Table 4.3). Method II, with full response, had a slightly lower estimate of 68,811. The sampling error of the two methods was approximately 14% and included the baseline population in the 95% confidence interval. Methods I and II each had an increase in percentage from the baseline population of less than four percent. For Method I, the actual 95% probability coverage was 0.948 with an average 95% confidence interval width of 38,064 PFLs. Method II had a nearly identical actual 95% probability coverage of 0.947 with a confidence interval width of 37,965 PFLs (Figure 4.4a-c). 62 120 ● ● ● ● ● ● 65 60 Method ● II a 100 50 40 30 20 ● 100 ● 80 ● 60 ● ● 40 ● 20 b 10 100 Response rate (%) 30 20 ● 100 ● ● 50 ● d A B C D ● 40 30 20 ● 0.925 c 100 1.00 ● ● 50 ● 50 10 Response rate (%) 100 ● ● ● 0.75 0.50 0.25 ● ● 0 Scenario 0.950 10 150 Mean 95% Confidence Interval Width (in thousands) ^ Mean Nhd (in thousands) 40 ● ● Response rate (%) 150 0 50 ● 0.975 0.900 95% Probability Coverage 55 I 1.000 95% Probability Coverage 70 Mean 95% Confidence Interval Width (in thousands) ^ Mean Nhd (in thousands) 75 e A 0.00 B C Scenario D f ● A B C D Scenario Figure 4.4: Horizontal line indicates the GIS-based baseline of PFL population size. Horizontal dashed lines represent the nominal probability coverage rate of 0.95. (a-c) Mean PFL population size, mean 95% confidence interval width, and 95% actual probability coverage for full response and incremental response rates from 50 to 10% for the two different estimators (I and II) based upon 1000 simulations, respectively. (d-f) Mean PFL population size, mean 95% confidence interval width, and 95% actual probability coverage for four different response scenarios (A-D) with an overall 50% response rate of Methods I and II based upon 1000 simulations, respectively. 63 4.3.2.2 Nonresponse For Method I with an overall sample response rate of 50% (i.e., Scenario A), the average estimate of PFL population size remained relatively the same as full response with a mean estimate of 68,889 PFLs. The average confidence interval width increased from full response to 52,493 PFLs. The actual 95% probability coverage remained relatively constant at 0.962 (Figure 4.4a-c). Method II, with 50% response rates, increased from full response with a mean estimate of 70,475 PFLs. Method II had a wider average 95% confidence interval width of 54,452 PFLs with an actual 95% probability coverage of 0.973. As response rates continued to decrease, the estimates of PFL population size generated by Method I remained relatively stable. However, both the mean confidence interval width and probability coverage showed an increasing trend. Meanwhile, the estimates generated by Methods II showed a slight increasing trend with decreasing response rate, and the confidence interval widths showed a large pattern of increase. This resulted in inflated probability coverages for Methods I and II of up to 0.976 and 0.989, respectively. Figure 4.4d shows the results of PFL population size estimates generated by Methods I and II for the four different scenarios of nonresponse examined with an overall 50% response rate. The most inaccurate and imprecise estimates of Method I are seen in Scenario B, in which small landowners had a high probability of nonresponse, with the probability coverage dropping to zero. In contrast, Method II provided stable estimates of PFL population size with an inflated probability coverage for Scenario B. Confidence interval widths of Method I dropped considerably, whereas those of Method II drastically increased. For Scenarios C and D, PFL population size estimates of Method I increased well above the values of Scenario A. Method I had a large overestimate in Scenario C (high nonresponse probability of large landowners) with an estimate of 118,533 PFLs, a confidence interval width of 70,402 and a probability coverage rate of 0.063. Scenario D of Method I had a slightly lower over estimate of 89,913 PFLs, a confidence interval width of 63,444, and a probability coverage rate of 0.800. Method II remained stable with estimates of 70,446 and 64 70,266 for Scenarios C and D, respectively. The confidence interval width and probability coverage for Scenarios C and D also remained relatively constant at 40,977, 43,881, 0.954 and 0.970, respectively. All of these metrics gave better results for Method II than for Method I across all four scenarios. However, the confidence interval widths of Scenario C, which occurred due to very low response rates in the smallest size class, are still concerning. 4.4 Discussion The estimator for forest area was both accurate and precise for estimating the GISbased baseline private forest area for the state of Montana. However, the GIS-derived forest area estimation showed large discrepancies from other published estimates because of the exclusion of tribal lands considered in our definition of PFL, whereas tribal lands are included as private in the other published definitions. By rerunning our outlined procedure for tribal lands only, the derived total tribal forest area is 220,646 ha. Adding this to our baseline estimate of private forest area gives a total forest area of 1,963,875 ha, which results in the forest area estimate of Butler et al. (2016b) being 0.725% larger. We consider this difference to be negligible. Although the derived tribal land would comprise approximately 11% of the new total forest land, the additional number of landowners amounts to less than 0.2 percent. The inclusion or exclusion of tribal lands did not affect the performance of the estimators for private forest area or PFL population size. Estimates of PFL population size generated by Methods I and II performed relatively equally for full response. Either one of these estimators could reliably be used for PFL population size estimation with full response. The assumption of a fixed and known total land area is one advantage of Method II, whereas Method I must incorporate the FIA estimate of total and variance for private forest area, respectively. Dickinson and Butler (2013) cite this as a primary reason in their discussion for proposing that the NWOS switch to estimators similar to Method II. For both Methods I and II, the PFL population size was on average overestimated based on our simulations, and a corresponding decrease in sample size from 65 reduced response rates caused a decrease in precision. The reason for this overestimate on average is unknown to the authors. As for the reduced precision, according to Lohr (2010), the best way to reduce this impact is to invest in reducing the rate of nonresponse through prevention. Butler (2008) estimates the NIPF population size of 40,000 landowners in Montana generated by methodology corresponding to our Method I. At first this estimate seemed too low because the 95% confidence interval did not include the GIS-based baseline estimate, but it may be a reasonable estimate dependent on the sample actually drawn and the nearly 10-year difference between our study and the completion of their survey. Based on our simulations, the estimate of PFL population size could be as low as 42,787 for both Methods I and II. However, Butler et al. (2016b) estimate the family population size (including ownerships > 0.41 ha in size) to be considerably larger than that of Butler (2008) with 70,409 landowners in Montana, which includes the GIS-based baseline estimate in the 95% confidence interval. The sample size and number of respondents between these two survey periods were 49 and 193, respectively. The larger sample size is related to the current NWOS reporting estimates at the unit of an individual state with minimum sample size requirements to meet targets of precision (Butler et al. 2016b). The increased sample size should only impact the precision and not the estimate as long as nonresponse follows a MCAR mechanism. This increase of 30,409 landowners could be an actual trend since both surveys used Method I and the assumption of a MCAR nonresponse mechanism is valid for both survey periods in Montana. Method I showed stable estimates of PFL population size in our simulations in which the assumption of MCAR holds for decreasing response rates. However, if a MAR or NMAR nonresponse mechanisms exist then Method I may drastically underestimate or overestimate PFL population size dependent on a trend in nonresponse related to landholding size being present. In the case of a MAR or NMAR mechanism, Method II would be the preferred estimator for consistent estimates of PFL population size. For longitudinal studies of PFL population size, this may highlight the need for continually assessing the nonresponse mech- 66 anism when the PFL population size is determined. One possible solution around issues of nonresponse is to abandon the survey methods for population totals and adopt methods similar to this study using cadastral data. Nonresponse clearly has a large impact on both estimators across all rates and trends considered. Most of all, our Method II would fail considerably without the weighting correction for decreasing response rates when nonrespondents are removed because the sample size includes all plots that fall on land, and the nonrespondents on private land comprise a very small proportion of the total number of plots that fall on land. This was not the case for Method I, in which only respondents are considered, because the decrease in the sample size of forested plots was proportionally offset by the terms in the summation of Equation 4.5. Here, we present an estimator of PFL population size that is an alternative to the current NWOS estimator for nonresponse. The new Method II performed best in all scenarios with the trends considered. Although the confidence interval width increased for decreasing response rates, this performance is no different from that of the currently used estimation methods (i.e., Method I). The new method can easily be applied if no nonresponse trend is found proportional to landholding size. If a trend is determined, the probability of response for each size class must be estimated independently. One common approach to assessing nonresponse in surveys is through follow-ups with nonrespondents to determine any differences from respondents based on some characteristics, such as landholding size. The results of these follow-up surveys could then be used for applying Method II to determine probability of response given selection by size class, if necessary. Dickinson et al. (in press) also propose logistic regression for assessing probabilities of response with additional covariates that are available from the survey data. GIS-based methods with cadastral data offer one alternative to follow-up surveys in determining differences in response rates according to landholding size by knowing an approximate estimate for the amount of forestland owned before conduction of the survey. One possible drawback of this method for implementation in the NWOS is that only 9 of 50 states currently have 67 freely available cadastral data. However, cadastral data for assessing landholding size are available at the county level in many parts of the United States. Our methodology of combining remote sensing and freely available cadastral data into a GIS for determining PFL population size may obviate the need for sampling to determine totals of private forest area and PFL population size. In addition, this methodology may allow future researchers to employ novel sampling strategies or increase sample size and corresponding precision of landowner attitude and behavior estimates. Sampling strategies could use stratification techniques to increase proportions of “small” landowners into the sample, exploring issues relevant to more parcelized areas or at the wildland urban interface. For example, the current NWOS for Montana bases its results on only eight landowners owning less than 4.05 ha and has a total sample size of 193 respondents. The potential for statewide estimation based on this methodology, previously mentioned by others (McRoberts 2003, Kittredge et al. 2008, Metcalf 2010, Metcalf et al. 2012), may now be readily available for researchers and others exploring private forest ownership dynamics. 68 CHAPTER 5 HIERARCHICAL BAYESIAN MODELS FOR SMALL AREA ESTIMATION OF COUNTY-LEVEL PRIVATE FOREST LANDOWNER POPULATION 5.1 Introduction There are approximately 192 million ha of private forest land in the United States (US) owned by corporations, families, individuals, nongovernmental organizations, and tribal entities (Butler et al. 2016c). This forested land provides many social and ecosystem benefits, but is managed by millions of owners with potentially disparate goals and objectives (Butler et al. 2016a). Therefore, a deeper understanding of the demographics, attitudes, and management behaviors of private forest landowners is paramount to designing effective incentives, outreach programs, and support mechanisms that enable these owners to engage in sustainable forest management activities. The USDA Forest Service Forest Inventory and Analysis (FIA) National Woodland Owner Survey (NWOS) is the primary source of information about national-, regional-, and state-level private forest characteristics and owners’ demographics, attitudes, and behaviors. NWOS results can be summarized by ownership type, e.g., corporate, family, other private, and tribal and is currently implemented every five years (Butler et al. 2016c). Given limited resources to conduct the NWOS and low to moderate response rates, which are common in social surveys, the small sample size restricts reliable inference to statelevel estimates. Although state-level estimates are informative, county-level estimates might be more appropriate and effective when designing and delivering education and outreach to private forest owners. Furthermore, from a management and conservation standpoint, county-level estimates are more useful for tracking trends in private forest land parcellation, fragmentation, composition, and ownership demographics and characteristics (see, e.g., Kittredge et al. 2008, Pan et al. 2009, Poudyal and Hodges 2009). Results from these, and 69 similar, studies underscore the value of ownership information at fine spatial scales to encourage sustainable forest management and curb loss of ecosystem services. To this end, we present and assess methods to estimate the number of private forest ownerships at the county-level using the current NWOS sampling design. The NWOS’s target sample size is 250 respondents per state (Butler et al. 2016a) and a minimum of 100 respondents is required to generate state survey summaries (Butler et al. 2016b). Again, such small state-level sample sizes effectively preclude robust inference at county-levels, particularly when reporting results by ownership type. Small sample sizes can result in undesirably low precision of parameter estimates within a design-based framework (e.g., the NWOS design). Here, we refer to parameter estimates obtained using designbased estimators as direct estimates. These direct estimates are the current standard used in operational settings that employ design-based estimators; therefore, the direct estimates will be used for benchmarking against model-based approaches. Small area estimation (SAE) is a model-based approach that couples a direct estimate and possible covariates to improve the estimate precision and, in some cases, accuracy. Rao and Molina (2015) provide an excellent review of available SAE methods. Unlike a standard linear regression, the SAE framework is comprised of two component models: a sampling model and a linking model (You and Zhou 2011). Estimation of the SAE parameter of interest accounts for and balances between the sampling (i.e., direct estimator) and linking model errors. The linking model is a linear model with random effects that relate the small areas of interest with some error. Additional spatial structure may still remain in the linking model after accounting for possible covariates. Such residual structure can be further modeled using spatial random effects. SAE is also of great interest to users of the core FIA biophysical variables with the small area being dependent on the application. The modeling framework applied here for private forest ownerships could easily be adapted to these biophysical variables. In the case of biophysical variables, several recent forestry studies, e.g., Goerndt et al. (2013) and Magnussen et al. (2014), use SAE to improve inference at county- and municipal-levels. A 70 thorough literature review yielded no application of SAE to private forest landowners or related studies. The primary contributions of this work are i) producing county-level private forest ownership datasets for Montana and New Jersey, ii) defining and assessing SAE models to improve county-level inference of the number of private forest ownerships, and iii) developing open source software to fit proposed SAE models. The remainder of the manuscript is organized as follows. In section 5.2, we detail the steps followed to create a spatially explicit private forest ownership dataset for Montana and New Jersey. Then we define the direct estimator for the number of county-level private forest ownerships along with two SAE models. We then describe our approach for comparing the proposed SAE models. Results are given in section 5.3 followed by discussion and future directions in section 5.4. 5.2 5.2.1 Materials and methods True number of private forest ownerships Recently, several states, including Montana and New Jersey, made cadastral data freely available for determining property ownership. Using these 2015 cadastral data and the geographic information system (GIS) based analysis defined in Ver Planck et al. (2016), we created a GIS layer that delineates our working definition of private forest. Here, “private forest” comprises NWOS ownership categories of corporate, family, and other private (excluding tribal lands) (Butler et al. 2016c). The number of ownerships with forest property area greater than two ha within each county is the parameter of interest in the subsequent analysis. The derived GIS layers provide the true parameter value within each county, denoted YTi , with subscript Ti noting the true number of private forest ownerships in county i. Developing the private forest ownership GIS layers began by downloading freely available county-level cadastral data from the respective state repositories, Base Map Service Center 71 Montana State Library (2015) for Montana and New Jersey Office of Information Technology, Office of GIS (2015) for New Jersey. These cadastral data were combined with remotely sensed forest land cover data from the National Landcover Dataset (NLCD; Jin et al. 2013) to determine forest ownerships greater than two ha. Based on the NLCD specifications available, forested areas were defined to be at least 20% tree cover, at least 0.41 ha in area, and 30 m wide. To begin processing county-level properties, we identified ownerships in the dataset with unknown names. Parcels for which ownership was unknown that fell within the boundaries of public lands were assumed to be public. Additionally, the Protected Areas Database of the US (US Geological Survey 2016) was used to identify nongovernmental conservation organizations, e.g., the Nature Conservancy, listed as unknown in the cadastral dataset. All other unknown properties were assigned a unique identifier and treated as unique private ownerships. Remaining properties with known private ownership names were combined by owner(s) name and street address so that multiple properties within a given county were treated as a single ownership. Ownerships with forest industry or tribal affiliations were omitted from the final county-level private forest ownership GIS layers to be consistent with the private forest landowner definition used in Ver Planck et al. (2016) for Montana. This portion of the analysis was completed using a combination of QGIS software (QGIS Development Team 2014) and R statistical software (R Core Team 2015). 5.2.2 5.2.2.1 Models Direct estimator The direct estimator used for private forest ownership population at the state-level (Butler et al. 2016a, Dickinson and Butler 2013) was applied to individual counties (note that the notation was modified slightly for consistency with SAE models presented in section 5.2.2.2). The direct estimator and associated variance are based on probability proportional to size 72 sampling (Hansen and Hurwitz 1943): ni dij Ai X ; Yi = ni aij (5.1) j=1 σi2 = ni  X dij 1 ni (ni − 1) j=1 pij 2 − Yi , (5.2) where i indexes county, Yi is population total, Ai is total private forest area, ni is number of samples with replacement, dij is an indicator variable that is one if sample j fell in private forest and zero otherwise, aij is the forest area (ha) of the sampled ownership, and pij = aij /Ai is the ownership selection probability. The steps for drawing the ownership samples are described in section 5.2.3. The total county-level private forest area, Ai , was fixed to equal the true private forest area derived from the GIS layer in section 5.2.1. 5.2.2.2 Small area estimation models The direct estimator, Section 5.2.2.1, was log transformed to meet the SAE model normality assumption. This transformation was also desirable because it ensures SAE model population estimates have the correct support, i.e., positive, following back transformation. Taking the log of the direct estimator necessitates transformation of the associated variance, accomplished here via the delta method (Casella and Berger 2002). The Fay-Herriot (FH) SAE model (Fay and Herriot 1979) for county i in 1, 2, . . . , m counties is defined: Yei = θ̃i + i , (5.3) θ̃i = x0i β + vi , where Yei is the log-transformed direct estimator, θ̃i is the log population total, and i is a normally distributed error with mean zero and variance σ̃i2 = σi2 /Yi2 . The additive mean of θ̃i comprises an intercept and county-level covariates held in the p × 1 column vector x and 73 associated p × 1 vector of regression coefficients β. The county-level random effects term vi is normally distributed with mean zero and variance σv2 . It is reasonable to think that forest and ownership patterns, e.g., property size and owner’s socioeconomic or demographic characteristics, could exhibit spatial autocorrelation. In our setting, if direct estimate values are spatially correlated, i.e., adjacent counties have similar population values, then we should exploit this relationship to further improve inference by pooling information across proximate counties. Counties could be represented as either point locations (i.e., their centroid) or as areal units for defining this spatial structure. Representing counties by their centroid may misrepresent distances among neighboring counties due to irregularly shaped counties; therefore, the areal approach is implemented as it maintains the desired neighborhood structure of the county lattice. Hence, we augment model (5.3) by adding a spatially structured random effect that follows a conditional autoregressive (CAR) prior distribution, see, e.g., Banerjee et al. (2015) and You and Zhou (2011). This extended model called FHCAR is defined analogous to FH, with the exception that the unstructured random effects v = (v1 , v2 , . . . , vm )0 ∼ N (0, σv2 I) in model (5.3) are replaced with v ∼ N (0, Σ(σv2 , λ)). Here, the m × m covariance matrix Σ(σv2 , λ) = σv2 [λR + (1 − λ)I]−1 , where σv2 is the spatial variance parameter, λ is the autocorrelation parameter, R is the neighborhood matrix with diagonal elements equal to the number of neighbors and off diagonal elements equal to negative one or zero indicating if a neighbor is present or not, and I is the m × m identity matrix. Counties are only considered neighbors with adjoining borders. The FH and FHCAR Bayesian model specifications are completed by assigning prior distributions to parameters (Gelman et al. 2014). We selected non-informative priors for all model parameters. Each regression coefficient in β was assigned a flat prior distribution, σv2 was given an inverse-Gamma (IG) prior distribution, and, following You and Zhou (2011), λ’s prior was uniform with support between zero and one. The IG’s shape hyperparameter was set to two, which results in a prior mean equal to the scale hyperparameter and infinite P 2 variance. The IG’s scale hyperparameter was set as m i=1 σ̃i /m to give equal prior weight 74 to the sampling and CAR variances. A Markov chain Monte Carlo (MCMC) algorithm was used to sample from parameters’ posterior distributions. Specifically, a Gibbs algorithm was developed to sample from θ̃ = (θ̃1 , θ̃2 , . . . , θ̃m ), β, and σv2 with full conditional distributions given in You and Zhou (2011), and a Metropolis-Hastings algorithm was used to sample from λ’s posterior distribution. The data and associated code are available in (Ver Planck et al. 2017b). Parameter posterior inference was based on 3000 post burn-in MCMC samples from each of L = 3 chains resulting in K = 9000 samples. Chain mixing and convergence were diagnosed using a multivariate potential scale reduction factor of less than 1.1 for all parameters considered (Gelman et al. 2014). For our parameter of interest θ̂i (posterior mean of the population total), the K posterior samples of θ̃ were exponentiated back to the original units. 5.2.3 Simulation study For the NWOS and similar efforts, SAE models are viable from a statistical perspective if they improve county-level estimate precision without inducing substantial bias. Given the actual private forest ownerships for Montana and New Jersey, section 5.2.1, we examine SAE model inference against truth using a simulation study. One iteration in the simulation study produces a set of county-level direct and SAE model estimates by i) drawing a random probability proportional to size sample from the private forest ownership list sample frame, ii) computing direct estimates (section 5.2.2.1), iii) estimating FH and FHCAR models (section 5.2.2.2), and iv) evaluating differences between SAE model population estimates and truth. Summarizing results from step iv for a large number of iterations allows us to assess precision and bias in SAE model population estimates. A county-specific sample size, ni , is needed to conduct step i. Here too, we want the sample size to approximate that achieved by the NWOS design. To determine the sample size in each county, we randomly located 1000 points within each hexagon of a tessellated 75 hexagonal grid laid over the states of Montana and New Jersey using DGGRID developed by Sahr (2011). Each of the 1000 points represented a unique sample iteration across an individual state. The area of each hexagon was 2407 ha. This grid roughly approximates the one used to spatially distribute FIA and NWOS samples (Bechtold and Patterson 2005, Dickinson and Butler 2013). From each set of hexagons sampled within a county, we calculated the mean number of points that fell within private forest ownerships across the 1000 sample iterations. This mean value was rounded up and used as ni . Given fixed ni , to reduce variation among repeated samples, we repeated the simulation study steps i-iii N = 4000 times for Montana and New Jersey. These large numbers of repeated simulations should empirically show unbiased estimates of truth for the direct estimator. 5.2.3.1 County-level covariates Exploratory analysis using linear regression models showed that population density (PD; number of people·km−2 ) from the 2010 decennial census (US Census Bureau 2016), and NLCD total forest area (TFA; ha) explain significant variability in log-transformed direct estimates. For Montana, the linear regressions of the simulation runs explained 45% of the variation on average with a range from 27% to 65%. For New Jersey, the explained variation was higher with a mean of 76% and a range from 50% to 95%. Therefore, PD and TFA covariates were used for the simulation study. 5.2.4 Simulation summary The N iterations for the direct and SAE model estimates were evaluated for bias, relative bias, mean squared error (MSE), root mean squared error (RMSE), percent coverage for a 95% nominal rate, and 95% confidence interval width for direct estimates and 95% credible interval width for the SAE models. Each of these metrics were calculated in two ways: i) for an individual county (i.e., eqs. 5.4 and 5.6) and ii) for an individual state (i.e., eqs. 5.5 and 5.7). 76 Using the following two equations, bias was calculated as the average difference between the posterior mean of the population total, θ̂ij of county i and iteration j, and the truth YTi : Biasi = Bias = N 1 X θ̂ij − YTi ; N (5.4) 1 m (5.5) j=1 m X Biasi . i=1 Additionally, relative bias, RBi , is defined as the bias relative to the truth for each county 1 Pm RB ). (RBi = Biasi / YTi ) and summed across all counties for a state (RB = m i i=1 A trade-off between bias and precision is present when applying a SAE model. MSE was calculated as the average squared deviations of the posterior mean of the population total from the true population, and RMSE was the square root of these deviations: N 1 X M SEi = (θ̂ij − YT i )2 ; N M SE = 1 m j=1 m X M SEi . (5.6) (5.7) i=1 Percent coverage was defined as the average number of times that the 95% SAE credible interval for θij , or the direct estimate 95% confidence interval, included truth for each county. The average 95% confidence interval width for the direct estimate was computed by a t distribution with ni -1 degrees of freedom; whereas, the average 95% credible interval width was determined from the 0.025 and 0.975 quantiles of the posterior distributions of θij . Both percent coverage and average confidence interval width were also calculated at the state level. A final relative MSE comparison among the direct and SAE model estimates was made at the county level based on MSE (eq. 5.6) by M SE1 1 M SE 1 2 − M SE2 + 21 M SE2 (5.8) where M SE1 indicates the direct or the first SAE model estimate to be compared with M SE2 , the second SAE model estimate (Porter et al. 2015). 77 5.3 Results Summing the sample sizes across all counties for a single iteration, Montana had a statewide sample size of 751. Individual counties ranged from a minimum of two samples to a maximum of 44 samples in Fergus County. The average sample size per county in Montana was 13. Figure 5.1a maps the sample sizes across all of the counties for Montana. The true population of Montana is 42625 ownerships. Liberty County, with 58 ownerships, had the minimum population, and Flathead County, with 5209 ownerships, had the maximum (Fig. 5.1b). Montana is one of the least densely populated states in the US ranging from 0.097 to 21.6 people·km−2 for an individual county with an average of 2.7 people·km−2 (Fig. 5.1c). Montana’s forest area is primarily concentrated in the western portion of the state, with a range of 1077 ha (Sheridan County in the east) to 919800 ha (Flathead County in the west) and a mean 151200 ha per county (Fig. 5.1d). New Jersey had a statewide sample size of 191 for each iteration. Individual counties ranged from a minimum of two samples to a maximum of 20 samples in Burlington County. The average sample size per county in New Jersey was nine. Figure 5.2a maps the sample sizes across all of the counties for New Jersey. The true population of New Jersey is 35462 ownerships. Hudson County, with 33 ownerships, had the minimum population, and Atlantic County, with 3842 ownerships, had the maximum (Fig. 5.2b). New Jersey is one of the most densely populated states in the US, ranging from 73.4 to 4753 people·km−2 , with an average of 806.6 people·km−2 in a single county (Fig. 5.2c). New Jersey’s forest area is primarily concentrated in the northwestern and southeastern portions of the state, with a range of 562 ha (Hudson County) to 121500 ha (Burlington County) and a mean of 41210 ha per county (Fig. 5.2d). 78 49 42 35 28 21 14 7 0 5300 4550 3800 3050 2300 1550 800 50 0 100 km (a) (b) 925 793 661 529 397 265 133 1 21.62 18.53 15.44 12.36 9.27 6.18 3.09 0 (c) (d) Figure 5.1: Montana counties by (a) sample size, (b) true population size with >2 ha forest, (c) 2010 census population density (PD, number of people·km−2 ); (d) total forest area (TFA, 1000s ha). 79 21 4100 18 3500 15 2900 12 2300 9 1700 6 1100 3 500 0 30 0 50 km (a) (b) 5405 140 4633 120 3861 100 3089 80 2317 60 1544 40 772 20 0 0 (c) (d) Figure 5.2: New Jersey counties by (a) sample size, (b) true population size with >2 ha forest, (c) 2010 Census population density (PD, number of people·km−2 ); (d) total forest area (TFA, 1000s ha). In logarithmic form, the regression coefficients were significant for the intercept, PD, and TFA for the majority of the iterations. Averaged across all iterations for the Montana FH model, the mean point estimate for the intercept was 5.01, the PD regression coefficient was 80 0.0733, and the TFA regression coefficient was 4.10x10−6 . For the FHCAR model, the point estimates were 5.04, 0.0676, and 4.01x10−6 for the intercept, PD, and TFA, respectively. None of the 95% credible intervals for the intercept or the TFA regression coefficient included zero for the N iterations of either SAE model. For the PD regression coefficient, 3032 and 2914 iterations did not include zero in the credible interval of the FH and FHCAR models, respectively. In the case of the New Jersey FH model, the point estimates were 7.26, -9.35x10−4 , and 1.04x10−5 for the intercept, PD, and TFA regression coefficients, respectively. For the FHCAR model, the point estimates were 7.25, -9.22x10−4 , and 1.08x10−5 for the intercept, PD, and TFA regression coefficients, respectively. None of the 95% credible intervals for the intercept included zero for any of the iterations of either model. None of the credible intervals for the PD regression coefficient included zero for any iterations of the FH model and 3996 iterations of the FHCAR model did not include zero. For the TFA regression coefficient, 1735 and 2321 iterations did not include zero in the credible intervals of the FH and FHCAR models, respectively. SAE model parameter estimates for a randomly selected iteration are given in Table 5.1. All of the regression coefficients were significant with the exception of TFA in New Jersey. For Montana, the mean of the sampling variances was much smaller than the random effect variance for both the FH and FHCAR models, whereas New Jersey had roughly equal mean sampling and random effect variances. The autocorrelation parameter in both states was fairly low with a much wider credible interval in New Jersey than Montana. Figure 5.3a confirms that the posterior means of the sampling variances of all iterations in Montana were lower than the random effect variances of the FH and FHCAR models; however, for New Jersey, the FH and FHCAR random effect variances were smaller than the mean sampling variances (Fig. 5.3c). The posterior mean of all iterations for the FHCAR autocorrelation parameter was also fairly low in Montana and slightly greater in New Jersey (Figs. 5.3b and d). 81 Table 5.1: The median and 95% credible intervals (in parantheses) for Fay-Herriot (FH) and FH with conditional autoregressive (FHCAR) models of a single iteration for Montana (iteration 750) and New Jersey (iteration 350). Montana New Jersey Parameter FH β0 4.92 (4.50, 5.34) FHCAR 4.91 (4.38, 5.46) FH 7.42 (6.65, 8.13) FHCAR 7.41 (6.59, 8.14) β1 , PD 0.078 (0.001, 0.146) 0.072 (0.007, 0.135) -1.11x10−3 (-1.48x10−3 , -7.42x10−4 ) -1.09x10−3 (-1.45x10−3 , -7.00x10−4 ) β2 , TFA 4.40x10−6 (2.70x10−6 , 6.08x10−6 ) 4.52x10−6 1.12x10−5 −6 −6 (2.72x10 , 6.35x10 ) (-3.56x10−7 , 2.27x10−5 ) 1.16x10−5 (1.14x10−6 , 2.23x10−5 ) E(σ̃i2 ) 0.16 (0.02, 0.67) 0.16 (0.02, 0.67) 0.23 (0.09, 0.75) 0.23 (0.09, 0.75) σv2 1.14 (0.76, 1.77) 1.71 (0.95, 3.73) 0.21 (0.07, 0.68) 0.22 (0.07, 0.93) λ — 0.14 (0.007, 0.62) — 0.29 (0.01,0.92) Note: PD, number of people·km−2 ; TFA, total forest area (ha). 82 15 4 Parameter ~2 σ i FH σ2v FHCAR σ2v 3 Density Density 10 2 5 1 0 0 1 2 3 0.2 Posterior Means (a) 6 0.4 0.6 Posterior Means (b) Parameter ~2 σ i 4 FH σ2v FHCAR σ2v 4 Density Density 3 2 2 1 0 0 0.5 1.0 1.5 2.0 0.2 Posterior Means 0.3 0.4 0.5 0.6 Posterior Means (c) (d) Figure 5.3: Distribution of posterior means from all iterations for select SAE model parameters: (a) the variance parameters of Montana, (b) the FHCAR model autocorrelation parameter of Montana, (c) the variance parameters of New Jersey; (d) the FHCAR model autocorrelation parameter of New Jersey. Table 5.2 shows the simulation summaries for the individual states. As an aside and not surprisingly, this simulation study can empirically show that the direct estimator is nearly unbiased (see Appendix, Tables B.1 – B.3). For Montana, the biases of the direct, FH, and FHCAR model estimates were -3, -25, and -28 ownerships, respectively. A negative value indicates an underestimate of the true population and a positive value indicates an 83 overestimate of the true population. In terms of relative bias, these statewide estimates represent less than negative one-tenth of a percent. The RMSE was largest for the direct estimates with a value of 467. Both the FH and FHCAR models had similar RMSE values of 407 and 405. The empirical coverage rates were low for the direct, FH, and FHCAR model estimates with values of 75.0%, 75.3%, and 75.3%. The 95% confidence or credible interval widths, ordered widest to narrowest, were 1663, 1109 and 1097 for the direct, FH and FHCAR model estimates, respectively. Overall, the MSE is reduced by 24.0% from the direct to the FH model, by 24.8% from the direct to the FHCAR model, and by 1.0% from the FH to the FHCAR model. For New Jersey, the bias of the direct estimates was one ownership, the FH model was 49 ownerships, and the FHCAR model was 40 ownerships. In terms of relative bias, these statewide estimates represent less than two-tenths of a percent. The RMSE was largest for the direct estimates with a value of 633. The FH model had the lowest RMSE of 545, whereas the FHCAR model was between the direct estimates and the FH model with a value of 564. The empirical coverage rate was highest for the direct estimates followed by the FH and FHCAR models with values of 93.7%, 90.7%, and 89.2%, respectively. The direct estimates had the widest 95% confidence interval width of 3979 followed by 95% credible interval widths of 2207 for the FH model and 2050 for the FHCAR model. Overall, the MSE is reduced by 25.9% from the direct to the FH model, by 20.6% from the direct to the FHCAR model, and by 6.6% from the FHCAR to the FH model. For individual counties of Montana, the bias (Eq. 5.4) of the direct estimates ranged from -29 (for Lincoln County) to 12 (for Lewis and Clark County) ownerships (Appendix B, Tables B.1 and B.2). Eight of the counties were unbiased for the direct estimates. In terms of relative bias, the direct estimates ranged from -3% to 5%. The FH model had a bias ranging from -289 ownerships (for Stillwater County) to 444 ownerships (for Flathead County). Treasure County was the lone unbiased county for the FH model. The relative bias ranged from -26% to 41%, with a mean of -3%. The FHCAR model had a bias ranging 84 Table 5.2: Summary of bias, root mean squared error (RMSE), empirical coverage for a 95% nominal coverage rate and average 95% confidence interval width of the direct estimate and 95% credible interval width of the small area estimation model estimates for Montana and New Jersey across all counties and iterations for the direct, Fay-Herriot (FH), and FH with conditional autoregressive random effects (FHCAR) model estimates. Montana New Jersey Metric Bias Direct -3 FH -25 FHCAR Direct FH -28 1 49 FHCAR 40 RMSE 467 407 405 633 545 564 Percent coverage 75.0 75.3 75.3 93.7 90.7 89.2 Confidence / credible interval width 1663 1109 1097 3979 2207 2050 from -288 ownerships (for Musselshell County) to 498 ownerships (for Flathead County). Daniels County was the only unbiased county for the FHCAR model. The relative bias ranged from -25% to 30%, with a mean of -4%. These biases can be explained by the tradeoff of increasing precision with an associated increasing bias. Based on the relative MSE comparison of Eq. 5.8, 49 of 56 counties showed improvement from the direct to the FH model estimates, and 52 counties showed improvement from the direct to the FHCAR model estimates. Thirty counties showed improvement from the FH to the FHCAR model (Fig. 5.4). Both the FH and FHCAR models had 38 counties with greater percent coverage than the direct estimates. The credible interval widths were narrower in all but Flathead County for both the SAE models compared with the direct estimate confidence interval widths. These estimates for individual counties can be seen in Appendix B (Table B.6). For individual counties of New Jersey, the bias of the direct estimates ranged from -27 ownerships (for Cumberland County) to 18 ownerships (for Burlington County) (Appendix B, Table B.3). Four of the counties were unbiased for the direct estimates. In terms of relative bias, the direct estimates ranged from -1% to 1%. The FH model had a bias ranging from -477 ownerships (for Hunterdon County) to 673 ownerships (for Ocean County). No county was unbiased for either the FH or FHCAR model. The relative bias ranged from -15% to 372% (for Union County). Removing Union County, which has a small true ownership size of 85 46, the maximum becomes 55% (for Cape May County) and the mean is 9%. The FHCAR model had a bias ranging from -572 ownerships (for Hunterdon County) to 764 ownerships (for Ocean County). The relative bias ranged from -17% to 420% (for Union County). Removing Union County again, the maximum becomes 60% (for Cape May County) and the mean 9%. Sixteen of 21 counties showed improvement based on relative MSE (eq. 5.8) from the direct to the FH model and to the FHCAR model. Ten counties showed improvement from the FH to the FHCAR model (Fig. 5.5). The FH and FHCAR models had 16 and 14 counties, respectively, with greater percent coverage than the direct estimates. The credible interval widths were narrower in all counties for both of the SAE models compared with the direct estimate confidence interval widths. These estimates for individual counties can be seen in Appendix B (Tables B.4 and B.5). 86 1 0.8 0.6 0.4 0.2 0 −0.2 −0.4 −0.6 −0.8 −1 (a) 1 0.8 0.6 0.4 0.2 0 −0.2 −0.4 −0.6 −0.8 −1 (b) 1 0.8 0.6 0.4 0.2 0 −0.2 −0.4 −0.6 −0.8 −1 (c) Figure 5.4: Relative mean squared error comparisons for Montana (eq. 5.8): (a) direct to FH model estimates, (b) direct to FHCAR model estimates; (c) FH to FHCAR model estimates. 87 2 2 2 1.6 1.6 1.6 1.2 1.2 1.2 0.8 0.8 0.8 0.4 0.4 0.4 0 0 0 −0.4 −0.4 −0.4 −0.8 −0.8 −0.8 −1.2 −1.2 −1.2 −1.6 −1.6 −1.6 −2 −2 −2 (a) (b) (c) Figure 5.5: Relative mean squared error comparisons for New Jersey (eq. 5.8): (a) direct to FH model estimates, (b) direct to FHCAR model estimates; (c) FH to FHCAR model estimates. 5.4 Discussion Recent publications for the NWOS have focused on private ownerships 4.05 ha or greater in size (e.g., Butler et al. 2016c). Our original objective in this study was to include private forest ownerships of at least 0.41 ha, the minimum forest area considered by the FIA. These smallest ownerships were to be included as they comprise a large proportion of ownerships in states like New Jersey relative to Montana. However, preliminary analyses showed large positive biases in the direct estimator so we adjusted the minimum area threshold to two ha to attain empirically unbiased direct estimates rather than increasing the threshold to 4.05 ha as the two ha threshold still accounted for a large proportion of these smallest ownerships. The unbiased direct estimates were desirable for benchmarking against the SAE models, and the potential bias induced by these smallest ownerships may require further investigation. We found population density and total forest area to be significant SAE model covariates in the majority of the iterations for both Montana and New Jersey. However, population 88 density had a positive relationship with number of private ownerships for Montana and a negative association for New Jersey counties. A possible explanation for the differing relationships could be the very low population densities of Montana counties having not reached a critical threshold in the forested landscape, whereas New Jersey is the most densely populated state in the US, causing a corresponding loss of forested land. Kittredge et al. (2008) found that population densities greater than 96.5–193 people·km−2 (250–500 people·mile−2 ) for towns in Massachusetts increased the number of forested parcels less than eight ha and secondarily reduced the proportion of the landbase in forest due to land use change. Poudyal and Hodges (2009) also found a negative relationship between population density and woodland ownership population size for Texas. In studying all private forest landowners, Pan et al. (2009) found population density to decrease the mean forest landholding size in Alabama, which in turn could relate to a larger number of ownerships due to parcellation. The average county population density of Alabama was reported as 29 people·km−2 (76 people·mile−2 ), slightly higher than the highest density found in Montana, so at these densities a positive relationship with number of ownerships is plausible. The positive relationship between number of ownerships and total forest area in both states is not surprising as a county must have a forested landbase for private forest ownerships to be present. Poudyal and Hodges (2009) have also demonstrated the need for total private forest area at the county-level in modeling private forest owner population size. Development of a true ownership population at the county-level is not novel; however, the methods here are a substantial contribution with complete coverage of two states. Other studies have used similar approaches to develop forestland ownerships for a single county with interest at the individual parcel-level (e.g., Cho and Newman 2005, King and Butler 2005). Cho and Newman (2005) were interested in the probability of land use conversion from forest to developed land; King and Butler (2005) were interested in modeling landholding size for individual parcels. Each of these response variables is important to county-level forest management options available to an individual landowner. In each of these studies, the 89 distance to the nearest road from the parcel was the most important independent variable. At the county level, road density may be another important covariate for future exploration; however, the inclusion of road density may be collinear with population density currently accounted for in the SAE models. Mehmood and Zhang (2001) found a decline in statelevel average forest landholding size, which relates to an increase in the number of private forest ownerships for the period of 1978 to 1994. The five factors associated with this decrease in landholding size were death, urbanization, income, regulatory uncertainty, and financial assistance. A key objective of the current study was assessing the feasibility of applying SAE models to county-level ownerships and hence we did not exhaustively explore the full suite of potentially useful covariates, such as those identified in Mehmood and Zhang (2001). Next steps in developing this line of work will identify potential transformations to existing covariates, e.g., percentage of county forested, and an expanded set of useful SAE model covariates that explain variability in an expanded set of ownership characteristics and increase county-level estimate precision. One interpretation of the conditional autoregressive (CAR) random effect in FHCAR model is to capture unobserved covariates that are spatially structured. For example, neighboring counties may have similar regulations or demographics that impact the number of ownerships. However, including a CAR random effect did not yield substantial gains in RMSE in Montana and actually increased RMSE for New Jersey. This, combined with the relatively small CAR correlation parameter estimates, suggests that there is little local residual spatial structure and that the FH model with unstructured random effects is adequate. The FH model was overall more successful than the direct estimates in terms of MSE and RMSE for each state and the majority of the individual counties of the two states in this study. Alternate covariates may be needed for applying SAE models to other responses of the NWOS, such as socioeconomic variables of Poudyal and Hodges (2009). For simplicity, our simulation study considered only iterations with at least two samples per county. This was done to ensure that the direct estimator yielded estimates of θ and 90 σ 2 for each county. In practice, however, when the statewide sample size is small, we would expect zero samples in many counties. In such settings, a SAE model can still be applied by imputing the “missing” θ and σ 2 estimates (ideally with uncertainty quantification). This can be viewed as a missing data problem that is easily handled within a Bayesian paradigm (Banerjee et al. 2015, Gelman et al. 2014). Here, given county-level observed covariates, prediction of missing θs follow directly from the posited model, i.e., second line in model (5.3). The missing σ 2 s can be predicted using a log-normal regression comprising available covariates and spatially structured random effects, or other generalized variance function (GVF) approaches common in SAE literature Dick (see, e.g., 1995). Our future work will explore the efficacy of such SAE missing data problems for county-level ownership characteristics. With any survey implementation and sampling strategy, there is a trade-off between sample size (e.g., number of contacts, determined by expected response rate) and resources (e.g., time and money). Given limited resources, the number of contacts is typically constrained, and therefore, sample size is heavily dependent on survey response rates. Although ownerships contacted in the NWOS respond at higher rates than many other mail-based surveys, the potential lack of sufficient responses at the county level limits the usefulness of the data at scales at which such data can enhance landowner outreach, incentive programs, and information campaigns. For statewide surveys with county-level coverage and possibly small county-level sample sizes, the FH model presented in this paper improves the ability of researchers and forest management practitioners to use the NWOS data, by adding simple and widely available covariates for population density and forest area. These covariates have been shown to relate to forest management (Zhang et al. 2005), as well, making their utility in SAE models applicable to more variables other than number of ownerships. Further research will assess SAE models use for county-level inference about other important response variables from the NWOS, e.g., ownerships with management plans. Such models will consider jointly modeling multiple response variables and those variables with non-Gaussian 91 distributions, e.g., binary, count, and multinomial survey items. 92 CHAPTER 6 RECOMMENDATIONS FOR FUTURE WORK The movement towards model-based small area estimation (SAE) models for both biophysical and social forestry variables is sure to continue as demand for reliable estimates at various resolutions increases. The use of the model-based framework is necessitated in SAE because the sample sizes for the design-based framework do not provide adequate precision. The large-scale forest inventories and surveys of the United States Department of Agriculture (USDA) Forest Service Forest Inventory and Analysis (FIA) program and the associated National Woodland Owner Survey (NWOS) will continue to be a primary source for information on US forest resources. In combining this primary data source with other disparate data sources, the application of the hierarchical Bayesian (HB) SAE framework offers the primary advantage of fully propagating uncertainty to the final outcomes of interest. The primary objective of this dissertation was to contribute to the growing body of literature for biophysical forestry-related SAE applications and introduce SAE to social forestry applications. The detailed objectives of the four main chapters were to: i) apply a HB framework to increase the precision of estimates for biophysical forest variables at the stand level by borrowing strength across all stands through the use of LiDAR covariates; ii) apply a conditional autoregressive structure to the stand-level random effects to assess gains in precision of biophysical forest variables; iii) evaluate the current NWOS estimators of private forest area and private forest landowner population size for a known population at the state level, iv) present an alternative estimator of private forest landowner population size that explicitly accounts for various nonresponse scenarios; v) evaluate the impacts of nonresponse biases on each of these estimators; vi) produce county-level private forest ownership datasets for two complete states; vi) define and assess SAE models to improve county-level inference of the number of private forest ownerships, and; vii) develop open source software to fit the proposed SAE models. 93 This dissertation presented SAE within a HB framework; and additionally added spatial random effects that had often been ignored in prior forestry-related SAE applications. The research also offered open source software methods to fit the proposed SAE models, so that this can no longer be a deterrent to applying a Bayesian framework for SAE in forestry. There are several avenues related to this dissertation for further investigation. First, the SAE models for aboveground biomass, basal area, and volume demonstrated an increase in precision over the standard design-based estimates; however, tree density did not demonstrate the same results and could be improved through the discovery of more informative covariates. Second, the biophysical variables also showed additional increases in precision when a conditional autoregressive structure was added for the area-level random effects; but these too could be improved as the credible intervals on these parameters were fairly broad. Further exploration is also warranted for applying alternate neighborhood structures, e.g., by stand type rather than adjacency, for the conditional autoregressive structure. Application of newer spatial random effects models from the statistical literature (e.g., Porter et al. 2015, Datta et al. 2017) may be incorporated into forestry-related SAE to improve precision. Third, as stronger recommendations are made for combining variable-radius plots with LiDAR covariates, we may prefer to employ the finer resolution unit-level SAE models. Within the social forestry applications demonstrated in the last two chapters, the most obvious avenues to pursue are to test for nonresponse at the state level and apply the SAE models to actual NWOS data. The first of these avenues is currently being explored by the Family Forest Research Center for the state of Wisconsin using modifications to the current NWOS estimator, which are similar to the weighting methods described in Chapter 4. The two statewide datasets developed here for Montana and New Jersey may be useful for comparison to the outcomes for the state of Wisconsin, as nonresponse mechanisms may differ by state. As for the SAE models, the county-level methods for expanding NWOS may not be feasible due to the requirement of knowing the size of surveyed ownerships at the county level, and this current level of detail seems to be too burdensome for respondents 94 to the previous iterations of the NWOS. However, the SAE models could easily be adapted for the state-level NWOS data where desired precision of certain outcomes, e.g., the number of ownerships in the 1–9 acreage, are not being met. The application of the multivariate methods of Chapter 3 may be useful for increasing precision in estimating the 1–9 acre ownership group with another correlated outcome of interest. Additionally, the SAE models could be adapted for other NWOS variables with non-Gaussian distributions, e.g., binary, count, and multinomial survey items. Finally, there is still much research to be done that can more directly link the social forestry responses of the NWOS to the core biophysical forest variables of the USDA Forest Service FIA program. The linkage between the two into a single modeling framework may offer better understanding and more precise estimation of outcomes related to the largest ownership group and future of the US forest resource. As these forest inventories and ownerships surveys will continue into the future, additional improvement in precision for outcomes of interest may benefit from applying temporal and spatio-temporal SAE models. 95 APPENDICES 96 APPENDIX A HB AND EBLUP PARAMETER ESTIMATES FOR FAY-HERRIOT MODEL OF ABOVEGROUND BIOMASS The empirical best linear unbiased prediction (EBLUP) for the Fay-Herriot model was fit via restricted maximum likelihood using the hbsae R package (Boonstra 2012) for aboveground biomass. The comparison in parameter estimates between the EBLUP and HB inference are given in Table A.1. Table A.1: The median and 95% credible intervals for the model parameters of the Fay-Herriot (FH) with hierarchical Bayesian (HB) inference and the point estimates with standard errors in parentheses for the FH under empirical best linear unbiased predictor (EBLUP) were fit via restricted maximum likelihood. Parameter FH HB β0 -51.99 (-69.15, -34.31) FH EBLUP -52.57 (8.52) β1 , P 25 24.24 (18.17, 30.08) 24.27 (2.95) β2 , P 75 3.66 (0.83, 6.42) 3.69 (1.39) σv2 284.4 (201.2, 404.8) 241.1 97 APPENDIX B MODEL SUMMARIES OF PRIVATE FOREST LANDOWNERS FOR MONTANA AND NEW JERSEY BY COUNTY Table B.1: Summary of the first 28 Montana counties by identification number (ID) and name for true population total (Truth), sample size (n), and bias across all repeated samples for the direct, Fay-Herriot (FH), and FH with conditional autoregressive random effects (FHCAR) model estimates. Bias County ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 Name Silver Bow Cascade Yellowstone Missoula Lewis & Clark Gallatin Flathead Fergus Powder River Carbon Phillips Hill Ravalli Custer Lake Dawson Roosevelt Beaverhead Chouteau Valley Toole Big Horn Musselshell Blaine Madison Pondera Richland Powell Truth n Direct FH FHCAR 439 7 -6 144 131 1309 23 -12 -130 -160 1062 12 4 54 97 2782 27 -15 218 216 1941 38 12 17 -35 1804 29 -9 17 17 5209 30 -16 444 498 1239 44 6 -173 -224 344 18 -1 -22 -14 868 9 3 -185 -142 304 5 0 -36 -27 209 5 -5 -17 -22 1743 13 -8 -64 -103 461 18 -11 -101 -114 1589 10 9 -206 -190 467 8 -10 -90 -97 264 2 1 -27 -33 411 9 -3 119 119 354 8 -4 -60 -85 505 7 -2 -90 -105 72 2 0 27 16 618 30 -9 -68 -56 1247 31 -13 -285 -288 321 9 -6 -40 -32 838 21 0 -81 -87 125 3 2 22 4 489 5 -2 -86 -91 825 30 1 1 -3 98 Table B.2: Summary of the second set of 28 Montana counties by identification number (ID) and name for true population total (Truth), sample size (n), and bias across all repeated samples for the direct, Fay-Herriot (FH), and FH with conditional autoregressive random effects (FHCAR) model estimates. Bias County ID Name Truth n Direct 29 Rosebud 448 32 6 30 Deer Lodge 454 7 1 31 Teton 296 7 -9 32 Stillwater 1133 14 -11 33 Treasure 175 14 -1 34 Sheridan 79 2 0 35 Sanders 1816 16 -12 36 Judith Basin 412 8 2 37 Daniels 154 2 0 38 Glacier 310 6 -7 39 Fallon 223 3 2 40 Sweet Grass 477 16 -1 41 McCone 156 2 1 42 Carter 296 9 -4 43 Broadwater 310 10 -8 44 Wheatland 180 8 0 45 Prairie 153 3 0 46 Granite 754 17 -6 47 Meagher 388 30 1 48 Liberty 58 2 0 49 Park 1032 26 -4 50 Garfield 256 8 -3 51 Jefferson 1385 16 2 52 Wibaux 148 4 3 53 Golden Valley 153 7 8 54 Mineral 520 4 9 55 Petroleum 163 6 -2 56 Lincoln 2857 19 -29 99 FH FHCAR -31 -34 -17 -30 -35 -42 -289 -278 0 7 26 23 -37 -59 -35 -28 -1 0 -10 -30 -18 -20 -62 -58 -4 -1 -34 -27 -33 -21 -7 -2 -2 -6 -35 -27 6 12 24 17 -92 -53 -31 -38 -192 -254 -3 1 6 27 -2 18 1 14 183 144 Table B.3: Summary of New Jersey counties by identification number (ID) and name for true population total (Truth), sample size (n), and bias across all repeated samples for the direct, Fay-Herriot (FH), and FH with conditional autoregressive random effects (FHCAR) model estimates. Bias County ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 Name Truth n Direct FH FHCAR Atlantic 3842 18 17 -214 -489 Bergen 350 2 5 113 102 Burlington 3210 20 18 491 649 Camden 627 3 0 157 178 Cape May 874 5 -9 483 524 Cumberland 2591 13 -27 -47 -93 Essex 132 2 0 49 49 Gloucester 2092 9 -2 -202 -266 Hudson 33 2 0 -5 -5 Hunterdon 3785 16 -2 -477 -572 Mercer 1001 6 2 63 103 Middlesex 917 5 8 -68 -75 Monmouth 2139 10 8 -249 -363 Morris 2762 14 9 -193 -313 Ocean 1507 12 1 673 764 Passaic 522 3 2 192 201 Salem 1831 9 4 54 90 Somerset 1629 8 -4 -60 -47 Sussex 3013 18 -4 224 284 Union 46 2 0 171 193 Warren 2739 14 -13 -123 -77 100 Table B.4: Summary of the first 28 Montana counties by identification number (ID) for root mean squared error (RMSE), empirical coverage for a 95% nominal coverage rate, and 95% confidence interval width for the direct estimator and 95% credible interval width for the two small area estimation models across all repeated samples for the direct, Fay-Herriot (FH), and FH with conditional autoregressive random effects (FHCAR) model estimates. RMSE County ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 Direct FH FHCAR 341 418 405 678 543 536 571 536 556 990 965 942 867 742 703 813 701 695 1220 1304 1323 531 414 410 224 151 159 495 382 381 264 174 175 223 152 153 780 643 643 384 245 235 710 586 594 319 213 214 244 165 161 314 365 367 295 187 173 314 215 214 73 78 70 372 269 270 570 488 486 251 164 169 531 394 395 163 126 107 308 217 219 465 382 369 Percent coverage Direct 70.2 81.6 80.3 88.9 85.6 85.9 93.6 79.7 72.4 79.1 70.0 59.5 87.6 56.5 88.0 72.7 87.9 66.7 65.9 76.6 84.8 73.5 82.2 69.6 74.9 68.2 80.8 77.1 101 FH 78.9 84.2 84.0 93.1 90.3 91.3 94.7 79.2 75.5 76.1 72.5 62.5 90.5 59.0 87.4 72.4 64.1 76.1 67.8 73.0 68.9 77.2 77.2 73.0 78.8 70.7 77.0 82.1 FHCAR 79.0 83.5 84.5 93.3 90.4 91.5 94.6 77.2 76.1 78.2 73.5 61.7 89.8 58.2 87.6 71.6 63.8 75.6 66.1 71.6 69.6 78.1 76.8 73.6 78.5 69.5 76.3 82.5 Confidence or credible interval width Direct 1236 2513 2283 3964 3203 3117 5252 1828 625 2018 1031 749 3327 1089 3269 1116 4181 1061 962 1159 1196 1223 2072 795 1823 818 1391 1564 FH 1195 1960 2001 3928 2881 2813 5401 1399 469 1176 526 391 2646 663 2215 647 441 1007 551 662 204 918 1427 505 1349 330 691 1339 FHCAR 1165 1894 2054 3862 2738 2791 5426 1301 483 1260 534 391 2585 620 2272 636 421 1011 479 631 187 929 1410 520 1343 282 685 1301 Table B.5: Summary of the second set of 28 Montana counties by identification number (ID) for root mean squared error (RMSE), empirical coverage for a 95% nominal coverage rate, and 95% confidence interval width for the direct estimator and 95% credible interval width for the two small area estimation models across all repeated samples for the direct, Fay-Herriot (FH), and FH with conditional autoregressive random effects (FHCAR) model estimates. RMSE County ID 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 Direct 302 352 292 599 151 51 671 295 166 270 212 352 185 212 301 194 172 462 306 74 570 247 632 196 191 382 153 893 FH 204 253 193 499 104 63 586 210 118 195 138 232 142 140 198 122 117 356 245 82 441 159 519 122 121 300 105 896 FHCAR 190 240 187 491 116 62 572 211 122 188 137 230 140 149 209 125 109 362 237 73 449 147 520 127 137 328 117 886 Percent coverage Direct 75.0 77.9 61.7 81.8 69.0 96.0 87.8 72.7 84.8 65.5 76.5 71.0 75.2 68.3 59.6 62.2 70.2 75.5 60.8 71.2 77.8 57.2 86.3 59.4 63.4 72.5 71.9 88.9 102 FH FHCAR 80.7 81.3 80.9 81.1 67.8 67.4 76.6 77.6 77.5 77.5 76.6 75.9 89.8 89.8 75.4 75.9 64.0 63.7 70.9 68.3 68.8 68.5 74.8 75.4 52.1 53.6 69.9 70.2 65.7 66.8 67.9 68.7 66.5 67.0 81.2 81.3 67.2 68.6 57.0 58.1 81.0 82.0 64.2 64.5 85.8 83.2 60.7 61.0 73.3 76.5 70.3 69.8 78.5 79.5 91.8 91.6 Confidence or credible interval width Direct 870 1377 952 2292 402 1139 2749 1057 2474 902 1147 1066 2854 650 864 509 842 1603 892 1111 2037 737 2509 674 532 1860 487 3701 FH FHCAR 665 630 889 837 552 530 1401 1406 305 327 199 202 2343 2290 679 683 292 304 557 532 384 381 723 721 315 307 417 439 551 581 323 327 297 279 1252 1270 727 696 184 167 1574 1628 436 400 1810 1678 306 316 332 363 944 1026 307 329 3589 3562 Table B.6: Summary of New Jersey counties by identification number (ID) for root mean squared error (RMSE), empirical coverage for a 95% nominal coverage rate, and 95% confidence interval width for the direct estimator and 95% credible interval width for the two small area estimation models across all repeated samples for the direct, the Fay-Herriot (FH), and the FH with conditional autoregressive random effects (FHCAR) model estimates. RMSE County ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 Direct 942 247 975 354 417 842 75 644 18 906 457 451 621 779 701 359 617 550 912 30 847 FH FHCAR 732 790 162 157 1041 1131 276 288 637 682 578 528 78 78 473 472 17 17 832 864 292 284 270 248 454 499 522 510 874 937 323 320 484 494 324 295 767 776 186 207 616 588 Percent coverage Direct 97.3 95.7 93.1 91.4 90.2 93.1 98.9 94.9 96.2 96.1 91.4 89.8 97.2 97.8 88.4 85.2 90.8 96.4 92.0 100.0 91.6 103 FH 98.5 98.4 95.6 93.8 89.2 96.7 96.2 96.0 67.8 94.3 96.0 92.3 97.9 99.2 93.9 89.0 93.3 98.8 96.2 26.9 93.8 FHCAR 96.4 98.6 95.2 93.8 86.5 97.0 93.5 95.2 66.7 92.0 96.7 92.9 96.2 98.8 90.0 87.6 93.7 99.1 96.2 13.6 94.6 Confidence or credible interval width Direct 5254 12481 4499 3925 2668 4114 2933 3839 448 4898 2803 2777 4080 5137 3413 3480 3000 3972 4310 1521 4009 FH 3947 939 4439 1302 2058 2978 347 2362 54 3437 1587 1251 2425 3368 3215 1179 2177 2279 3660 448 2898 FHCAR 3359 843 4471 1265 2095 2712 293 2125 52 3190 1449 1125 2124 2761 3074 1061 2192 2015 3610 401 2833 BIBLIOGRAPHY 104 BIBLIOGRAPHY C. Babcock, A. O. Finley, J. B. Bradford, R. Kolka, R. Birdsey, and M. G. Ryan. LiDAR based prediction of forest biomass using hierarchical models with spatially varying coefficients. Remote Sens Environ, 169:113–127, 2015. C. Babcock, A. O. Finley, B. D. Cook, A. R. Weiskittel, and C. W. Woodall. Modeling forest biomass and growth: Coupling long-term inventory and LiDAR data. Remote Sens Environ, 182:1–12, 2016. S. Banerjee, B. P. Carlin, and A. E. Gelfand. Hierarchical modeling and analysis for spatial data. CRC Press, Boca Raton, Florida, 2 edition, 2015. 558 p. Base Map Service Center Montana State Library. Montana Cadastral Mapping Project, 2015. URL https://svc.mt.gov/msl/mtcadastral. last accessed May 12, 2015. W. A. Bechtold and P. L. Patterson. The enhanced Forest Inventory and Analysis program– national sampling design and estimation procedures. Technical report, Gen. Tech. Rep. SRS-80. Asheville, NC: U.S. Department of Agriculture, Forest Service, Southern Research Station, 2005. 85 p. H. J. Boonstra. hbsae: Hierarchical Bayesian Small Area Estimation, 2012. URL https: //CRAN.R-project.org/package=hbsae. R package version 1.0. J. Breidenbach and R. Astrup. Small area estimation of forest attributes in the Norwegian National Forest Inventory. Eur J Forest Res, 131:1255–1267, 2012. J. Breidenbach, R. E. McRoberts, and R. Astrup. Empirical coverage of model-based variance estimators for remote sensing assisted estimation of stand-level timber volume. Remote Sens Environ, 173:274–281, 2016. G. Brown, R. Chambers, P. Heady, and D. Heasman. Evaluation of small area estimation methods-an application to unemployment estimates from the UK LFS. In Proceedings of Statistics Canada Symposium, 2001. B. J. Butler. Family forest owners of the united states, 2006. Technical report, Gen. Tech. Rep. NRS-27. Newtown Square, PA: U.S. Department of Agriculture, Forest Service, Northern Research Station, 2008. B. J. Butler, E. C. Leatherberry, and M. S. Williams. Design, implementation, and analysis methods for the National Woodland Owner Survey. Technical report, Gen. Tech. Rep. NE-336. Newtown Square, PA: U.S. Department of Agriculture, Forest Service, Northern Research Station, 2005. 43 p. B. J. Butler, B. J. Dickinson, J. H. Hewes, S. M. Butler, K. Andrejczyk, and M. MarkowskiLindsay. USDA Forest Service National Woodland Owner Survey, 2011–2013: Design, implementation, and estimation methods. Technical report, Gen. Tech. Rep. NRS-157. 105 Newtown Square, PA: U.S. Department of Agriculture, Forest Service, Northern Research Station, 2016a. 43 p. B. J. Butler, J. H. Hewes, B. J. Dickinson, K. Andrejczyk, M. Markowski-Lindsay, and S. M. Butler. Usda forest service national woodland owner survey: national, regional, and state statistics for family forest and woodland ownerships with 10+ acres, 2011–2013. Technical report, Res. Bull. NRS-99. Newtown Square, PA: U.S. Department of Agriculture, Forest Service, Northern Research Station, 2016b. 39 p. B. J. Butler, J. H. Hewes, B. J. Dickinson, K. Andrejczyk, S. M. Butler, and M. MarkowskiLindsay. Family forest ownerships of the united states, 2013: Findings from the USDA Forest Service’s National Woodland Owner Survey. J Forest, 114(6):638–647, 2016c. doi: 10.5849/jof.15-099. G. Casella and R. Berger. Statistical Inference. Duxbury, Pacific Grove, Califonia, 2 edition, 2002. 660 p. S. Cho and D. H. Newman. Spatial analysis of rural land development. Forest Policy Econ, 7(5):732–744, 2005. doi: 10.1016/j.forpol.2005.03.008. A. Datta, S. Banerjee, and J. S. Hodges. Spatial disease mapping using directed acyclic graph auto-regressive (dagar) models. arXiv preprint arXiv:1704.07848, 2017. R. K. Deo, R. E. Froese, M. J. Falkowski, and A. T. Hudak. Optimizing variable radius plot size and LiDAR resolution to model standing volume in conifer forests. Can J Remote Sens, 42(5):428–442, 2016. P. Dick. Modelling net undercoverage in the 1991 Canadian census. Surv Methodol, 21: 45–54, 1995. B. J. Dickinson and B. J. Butler. Methods for estimating private forest ownership statistics: Revised methods for the usda forest service’s national woodland owner survey. J Forest, 111(5):319–325, 2013. doi: 10.5849/jof.12-088. B. J. Dickinson, B. J. Butler, S. M. Butler, and E. S. Huff. Usda forest service, national woodland owner survey 2011-2013: Detailed estimation procedures. Technical report, Unpublished report. On file with: U.S. Department of Agriculture, Forest Service, Northern Research Station, Newtown Square, PA 19073, in press. G. M. Domke, C. W. Woodall, B. F. Walters, R. E. McRoberts, and M. A. Hatfield. Strategies to compensate for the effects of nonresponse on forest carbon baseline estimates from the national forest inventory of the united states. Forest Ecol Manag, 315:112–220, 2014. R. E. Fay and R. A. Herriot. Estimation of income for small places: An application of james-stein procedures to census data. J Am Stat Assoc, 74:268–277, 1979. A. O. Finley, S. Banerjee, B. D. Cook, and J. B. Bradford. Hierarchical bayesian spatial models for predicting multiple forest variables using waveform lidar, hyperspectral imagery, and large inventory datasets. Int J Appl Earth Obs, 22:147–160, 2013. 106 A. O. Finley, S. Banerjee, Y. Zhou, B. D. Cook, and C. Babcock. Joint hierarchical models for sparsely sampled high-dimensional LiDAR and forest variables. Remote Sens Environ, 190:149–161, 2017. J. C. Finley, S. B. Jones, A. S. Reed, M. G. Jacobson, and G. R. Glover. Finding a name to fit the owner. J For, 99(3):48, 2001. A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian data analysis. Chapman & Hall/CRC, Boca Raton, Florida, 3 edition, 2014. 661 p. M. E. Goerndt, V. J. Monleon, and H. Temesgen. Relating forest attributes with area-and tree-based light detection and ranging metrics for western oregon. West North J Appl For, 25(3):105–111, 2010. M. E. Goerndt, V. J. Monleon, and H. Temesgen. A comparison of small-area estimation techniques to estimate selected stand attributes using LiDAR-derived auxiliary variables. Can J For Res, 41(6):1189–1201, 2011. M. E. Goerndt, V. J. Monleon, and H. Temesgen. Small-area estimation of county-level forest attributes using ground data and remote sensed auxiliary information. For Sci, 59 (5):536–548, 2013. T. G. Gregoire. Design-based and model-based inference in survey sampling: appreciating the difference. Can J For Res, 28(10):1429–1447, 1998. T. G. Gregoire, E. Næsset, R. E. McRoberts, G. Ståhl, H.-E. Andersen, T. Gobakken, L. Ene, and R. Nelson. Joint hierarchical models for sparsely sampled high-dimensional LiDAR and forest variables. Remote Sens Environ, 173:98–108, 2016. M. H. Hansen and W. N. Hurwitz. On the theory of sampling from finite populations. Ann Math Stat, 14(4):333–362, 1943. R. Hayashi, J. A. Kershaw, Jr, and A. Weiskittel. Evaluation of alternative methods for using lidar to predict aboveground biomass in mixed species and structurally complex forests in northeastern North America. Math Comput For Nat Res Sci, 7(2):49–65, 2015. R. Hayashi, A. Weiskittel, and J. A. Kershaw, Jr. Influence of prediction cell size on LiDARderived area-based estimates of total volume in mixed-species and multicohort forests in northeastern north america. Can J Remote Sens, 42(5):473–488, 2016. M. Hollaus, W. Wagner, B. Maier, and K. Schadauer. Airborne laser scanning of forest stem volume in a mountainous environment. Sensors, 7(8):1559–1577, 2007. M. Hollaus, W. Wagner, K. Schadauer, B. Maier, and K. Gabler. Growing stock estimation for alpine forests in Austria: a robust lidar-based approach. Can J For Res, 39(7):1387– 1400, 2009. T. G. Honer. Standard volume tables and merchantable conversion factors for the commercial tree species of central and eastern Canada. Technical report, Information Report FMRX-5. Ottawa, Ontario, Canada: Forest Management Research Institute, 1967. 107 A. T. Hudak, A. T. Haren, N. L. Crookston, R. J. Liebermann, and J. L. Ohmann. Imputing forest structure attributes from stand inventory and remotely sensed data in western Oregon, USA. For Sci, 60(2):253–269, 2014. M. Isenburg. LAStools – efficient LiDAR processing software (version 160721, unlicensed), 2016. URL http://rapidlasso.com/LAStools. J. C. Jenkins, D. C. Chojnacky, L. S. Heath, and R. A. Birdsey. National-scale biomass estimators for United States tree species. For Sci, 49(1):12–35, 2003. S. Jin, L. Yang, P. Danielson, C. Homer, J. Fry, and G. Xian. A comprehensive change detection method for updating the National Land Cover Database to circa 2011. Remote Sens Environ, 132:159–175, 2013. doi: 10.1016/j.rse.2013.01.012. B. R. Kaetzel, I. Majumdar, L. D. Teeter, and B. J. Butler. Regional differences among family forest landowners using national woodland owner survey results. South J Appl For, 36(3):141–145, 2012. M. A. Kilgore, S. A. Snyder, K. Block-Torgerson, and S. J. Taff. Challenges in characterizing a parcelized forest landscape: Why metric, scale, threshold, and definitions matter. Landsc Urban Plan, 110:36–47, 2013. S. L. King and B. J. Butler. Generating a forest parcel map for Madison County, NY. Technical report, In, edited by Michael Bevers and Tara M. Barrett, Gen. Tech. Rep. PNW-GTR-656. Portland, OR: U.S. Department of Agriculture, Forest Service, Pacific Northwest Research Station, 2005. 147–155. N. P. Kingsley and J. C. Finley. The forest-land owners of delaware. Technical report, Res. Bull. NE-38. U.S. Department of Agriculture, Forest Service, Northeastern Forest Experiment Station, 1975. 19 p. D. B. Kittredge, A. W. D’Amato, P. Catanzaro, J. Fish, and B. J. Butler. Estimating ownerships and parcels of nonindustrial private forestland in massachusetts. North J Appl For, 25(2):93–98, 2008. K. Kronseder, U. Ballhorn, V. Böhm, and F. Siegert. Above ground biomass estimation across forest types at different degradation levels in Central Kalimantan using LiDAR data. Int J Appl Earth Obs, 18:37–48, 2012. M. H. Kutner, C. J. Nachtsheim, and J. Neter. Applied linear regression models. McGrawHill/Irwin, New York, New York, 4 edition, 2004. 701 p. M. A. Lefsky, W. B. Cohen, G. G. Parker, and D. J. Harding. LiDAR remote sensing for ecosystem studies. Bioscience, 52(1):19–30, 2002. R. J. A. Little and D. B. Rubin. Statistical analysis with missing data. John Wiley & Sons, Inc, Hoboken, New Jersey, 2 edition, 2014. 381 p. S. L. Lohr. Sampling: design and analysis. Duxbury Press, Pacific Grove, California, 2 edition, 2010. 608 p. 108 S. Magnussen, D. Mandallaz, J. Breidenbach, A. Lanz, and C. Ginzler. National forest inventories in the service of small area estimation of stem volume. Can J For Res, 44(9): 1079–1090, 2014. doi: 10.1139/cjfr-2013-0448. M. Maltamo, K. T. Korhonen, P. Packalén, L. Mehtätalo, and A. Suvanto. A test on the usability of truncated angle count sample plots as ground truth in airborne laser scanning based forest inventory. Forestry, 80:73–81, 2007. M. Maltamo, P. Packalén, A. Suvanto, K. T. Korhonen, L. Mehtätalo, and P. Hyvönen. Combining ALS and NFI training data for forest management planning: a case study in Kuortane, Western Finland. Eur J For Res, 128:305–317, 2009. F. Mauro, V. J. Monleon, and H. Temesgen. Using small area estimation and lidar-derived variables for multivariate prediction of forest attributes. In S. M. Stanton and G. A. Christensen, editors, Gen. Tech. Rep. PNW-GTR-931, pages 73–77. U.S. Department of Agriculture, Forest Service, Pacific Northwest Research Station. 384 p., December 8–10 2015. F. Mauro, I. Molina, A. Garcı́a-Abril, R. Valbuena, and E. Ayuga-Téllez. Remote sensing estimates and measures of uncertainty for forest variables at different aggregation levels. Environmetrics, 27(4):225–238, 2016. R. E. McRoberts. Compensating for missing plot observations in forest inventory estimation. Can J For Res, 33(10):1990–1997, 2003. R. E. McRoberts, W. B. Cohen, E. Næsset, S. V. Stehman, and E. O. Tomppo. Using remotely sensed data to construct and assess forest attribute maps and related spatial products. Scand J Forest Res, 25(4):340–367, 2010. R. E. McRoberts, E. Næsset, and T. Gobakken. Inference for lidar-assisted estimation of forest growing stock volume. Remote Sens Environ, 128:268–275, 2013. S. R. Mehmood and D. Zhang. Forest parcelization in the United States: A study of contributing factors. J Forest, 99(4):30–34, 2001. A. L. Metcalf. Human dimensions of private forestland ownership: sampling, estimation, decision making processes, and implications. PhD thesis, The Pennsylvania State University, 2010. 264 p. A. L. Metcalf, J. C. Finley, A. E. Luloff, D. Shumway, and R. C. Stedman. Private forest landowners: Estimating population parameters. J Forest, 110(7):362–370, 2012. A. L. Metcalf, J. C. Finley, A. E. Luloff, R. C. Stedman, and D. Shumway. Progress in private forest landowner estimation. J Forest, 112(3):312–315, 2014. E. Næsset, T. Gobakken, J. Holmgren, H. Hyyppä, J. Hyyppä, M. Maltamo, M. Nilsson, H. Olsson, Å. Persson, and U. Söderman. Laser scanning of forest resources: the Nordic experience. Scand J Forest Res, 19(6):482–499, 2004. 109 G. Nettheim, G. D. Meyers, and D. Craig. Environmental and Natural Resources Management by Indigenous Peoples in the United States [online], chapter Indigenous Peoples and Governance Structures: A Comparative Analysis of Land and Resource Management Rights, pages 27–60. Aboriginal Studies Press, Canberra, Australia, 2002. URL http://search.informit.com.au/documentSummary;dn=408415504221651;res=IELIND. last accessed Apr 4, 2016. New Jersey Office of Information Technology, Office of GIS. New Jersey Geographic Information Network, 2015. URL https://njgin.state.nj.us/NJ$ $NJGINExplorer/index.jsp. last accessed Dec 21, 2015. B. M. O’Connell, E. B. LaPoint, J. A. Turner, T. Ridley, S. A. Pugh, A. M. Wilson, K. L. Waddell, and B. L. Conkling. The forest inventory and analysis database: Database description and users manual version 6.0.2 for phase 2. Technical report, U.S. Department of Agriculture, Forest Service, 2015. URL http://www.fia.fs.fed.us/library/databasedocumentation. 748 p, last accessed May 15, 2015. J. D. Opsomer, F. J. Breidt, G. G. Moisen, and G. Kauermann. Model-assisted estimation of forest resources with generalized additive models. J Am Stat Assoc, 102(478):400–409, 2007. S. N. Oswalt, W. B. Smith, P. D. Miles, and S. A. Pugh. Forest resources of the united states, 2012: a technical document supporting the forest service 2015 update of the rpa assessment. Technical report, Gen. Tech. Rep. WO-91. Washington, DC: U.S. Department of Agriculture, Forest Service, Washington Office, 2014. 218 p. Y. Pan, Y. Zhang, and I. Majumdar. Population, economic welfare and holding size distribution of private forestland in Alabama, USA. Silva Fenn, 43(1):161–171, 2009. P. L. Patterson, J. W. Coulston, F. A. Roesch, J. A. Westfall, and A. D. Hill. A primer for nonresponse in the us forest inventory and analysis program. Environ Monit Assess, 184 (3):1423–1433, 2012. D. Pfefferman. New important developments in small area estimation. Stat Sci, 28(1):40–68, 2013. A. T. Porter, C. K. Wikle, and S. H. Holan. Small area estimation via multivariate FayHerriot models with latent spatial dependence. Aust Nz J of Stat, 57(1):15–29, 2015. doi: 10.1111/anzs.12101. N. C. Poudyal and D. G. Hodges. Property taxation and rural land values: Their effect on private forestland ownership structure in Texas. Land Use Policy, 26(4):1100–1105, 2009. doi: 10.1016/j.landusepol.2009.01.005. QGIS Development Team. QGIS Geographic Information System. Open Source Geospatial Foundation, 2014. URL http:/qgis.osgeo.org. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2015. URL https://www.R-project.org/. 110 J. N. K. Rao and I. Molina. Small area estimation. John Wiley & Sons, Inc, Hoboken, New Jersey, 2 edition, 2015. 441 p. B. Rice, A. R. Weiskittel, and R. G. Wagner. Efficiency of alternative forest inventory methods in partially harvested stands. Eur J For Res, 133(2):261–272, 2014. K. Sahr. DGGRID version 6.1, 2011. URL http://www.discreteglobalgrids.org/software/. C.-E. Särndal, B. Swensson, and J. Wretman. Model assisted survey sampling. SpringerVerlag, New York, New York, 1992. 694 p. G. Scrinzi, F. Clementel, and A. Floris. Angle count sampling reliability as ground truth for area-based LiDAR applications in forest inventories. Can J For Res, 45(4):506–514, 2015. W. B. Smith, P. D. Miles, J. S. Vissage, and S. A. Pugh. Forest resources of the united states, 2002. Technical report, Gen. Tech. Rep. NC-241. St. Paul, MN: U.S. Department of Agriculture, Forest Service, North Central Research Station, 2004. 137 p. W. B. Smith, P. D. Miles, D. Patrick, C. H. Perry, and S. A. Pugh. Forest resources of the united states, 2007: a technical document supporting the forest service 2010 rpa assessment. Technical report, Gen. Tech. Rep. WO-78. Washington, DC: U.S. Department of Agriculture, Forest Service, Washington Office, 2009. 337 p. US Census Bureau. 2010 Census Data, 2016. URL https://www.census.gov/2010census/ data. last accessed Aug 12, 2016. G. A. P. US Geological Survey. Protected Areas Database of the United States (PAD-US), version 1.4 Combined Feature Class, 2016. URL https://gapanalysis.usgs.gov/padus. last accessed Dec 29, 2015. J. A. N. van Aardt, R. H. Wynne, and R. G. Oderwald. Forest volume and biomass estimation using small-footprint lidar-distributional parameters on a per-segment basis. For Sci, 52 (6):636–649, 2006. J. J. Vaske. Survey research and analysis: Applications in parks, recreation and human dimensions. Venture Publishing, State College, PA, 2008. 635 p. N. R. Ver Planck, A. L. Metcalf, A. O. Finley, and J. C. Finley. Evaluation of the USDA Forest Service National Woodland Owner Survey estimators for private forest area and landowners: a case study of Montana. For Sci, 62(5):525–534, 2016. doi: 10.5849/forsci.15196. N. R. Ver Planck, A. O. Finley, J. A. Kershaw, Jr., A. R. Weiskittel, and M. C. Kress. [dataset] hierarchical Bayesian models for small area estimation of forest variables using LiDAR. Mendeley Data, v1, 2017a. doi: 10.17632/vyfrnkx39h.1. N. R. Ver Planck, A. O. Finley, and E. S. Huff. Replication data for: Hierarchical bayesian models for small area estimation of county-level private forest landowner population. Harvard Dataverse, 2017b. doi: 10.7910/DVN/A3ROX. 111 N. R. Ver Planck, A. O. Finley, and E. S. Huff. Hierarchical Bayesian models for small area estimation of county-level private forest landowner population. Can J Forest Res, 47: 1577–1589, 2017c. doi: 10.1139/cjfr-2017-0154. N. R. Ver Planck, A. O. Finley, J. A. Kershaw, Jr., A. R. Weiskittel, and M. C. Kress. Hierarchical Bayesian models for small area estimation of forest variables using LiDAR. Remote Sens Environ, 204:287–295, 2018. doi: 10.1016/j.rse.2017.10.024. M. Woods, D. Pitt, M. Penner, K. Lim, D. Nesbitt, D. Etheridge, and P. Treitz. Operational implementation of a LiDAR inventory in boreal Ontario. Forest Chron, 87(4):512–528, 2011. Y. You and Q. M. Zhou. Hierarchical Bayes small area estimation under a spatial model with application to health survey data. Surv Methodol, 37(1):25–37, 2011. Y. Zhang, D. Zhang, and J. Schelhas. Small-scale non-industrial private forest ownership in the united states: rationale and implications for forest management. Silva Fenn, 39(3): 443–454, 2005. 112