A NEW MODEL-BASED METHOD FOR ESTIMATING THE ABUNDANCE OF STANDING DEAD TREES
BY
HONG SU AN

A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
Forestry
2011

ABSTRACT
A NEW MODEL-BASED METHOD FOR ESTIMATING THE ABUNDANCE OF STANDING DEAD TREES
BY
Hong Su An

Standing dead trees (SDT) are an important component of forest ecosystems. However, it can be a challenge to develop reliable estimates of their population parameters because dead trees are generally lower in abundance and have more complex spatial distributions (e.g., are more clustered) than live trees. In addition, most forest inventories are designed for sampling live trees. Previous studies (e.g., Bull et al. 1990) have recommended using higher sampling intensities or larger plot sizes for dead trees than for live trees, but this is more time-consuming and costly. Adding new plots, increasing plot sizes or otherwise modifying plot designs can be especially costly in the case of large-scale inventories (e.g., national forest inventories) and other permanent plot networks. This thesis sought to explore approaches to improving the estimation of standing dead tree abundance, other than adding more plots or modifying plot designs, in the context of the US National Forest Inventory and Analysis (FIA) Program (USDA Forest Service 2008), the US Forest Health Monitoring (FHM) Program (now merged with the FIA plot design) and other similar permanent plot networks. One major consequence of using sampling plots that are either too small or too few for sampling standing dead trees is that the data are likely to contain a large proportion of zero observations, typically referred to as "zero-inflated" data. Excess zero observations increase the variance of estimates of standing dead tree parameters. To reduce the variability caused by zero-inflated data, a new model-based approach, the Expected-Zero Hurdle (EZ-Hurdle) method, is proposed.
The EZ-Hurdle method replaces the observed zero proportion in the data with an expected zero probability obtained from auxiliary information describing the distance from a random point (plot center) to the nearest standing dead tree. When tested with both simulation and field studies, the EZ-Hurdle method greatly improved precision and showed less average bias than fixed-area sampling, with or without adjustment using the standard Hurdle model. The EZ-Hurdle method improved precision without adding fixed-area plots, although it required additional information to account for the uncertainty caused by zero observations in the data. Notably, precision improved even when this additional information was collected without adding new sample points. The method can therefore be used to improve the precision of estimates without changing existing plot designs, such as those of the FIA and FHM programs. The EZ-Hurdle method performs best when the density of standing dead trees is low or when small fixed-area plots are used to collect the data, because the expected zero probability, modeled from the auxiliary information, varies less than the observed zero proportion in the data. Although the EZ-Hurdle method gave better precision, it was less cost-efficient and less sampling-efficient than the fixed-area sampling method because of the time needed to search for the nearest standing dead tree. A distance-limited EZ-Hurdle method, which restricts the radius searched for the nearest standing dead tree, was therefore proposed to reduce the time needed to collect the auxiliary information. The distance-limited EZ-Hurdle method showed better precision than fixed-area sampling under all of the densities and spatial patterns examined, and it is more time-efficient than the original EZ-Hurdle method. The distance-limited EZ-Hurdle method can therefore be an alternative for improving the precision of standing dead tree density estimates without changing plot designs, at a reasonable cost and time for data collection.
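The replacement described above can be illustrated with a short sketch. It assumes a hurdle model whose positive part is a zero-truncated Poisson, for which the estimated mean count per plot is (1 - p0) times the mean of the positive counts, where p0 is the probability of a zero count. The expected zero probability is estimated here as the empirical share of random points whose nearest standing dead tree lies beyond the plot radius; this empirical proportion is a simplifying stand-in for a fitted inclusion-probability curve, and all function names and data values are hypothetical.

```python
import math


def observed_zero_proportion(counts):
    """Share of fixed-area plots with no standing dead trees (SDT)."""
    return sum(1 for c in counts if c == 0) / len(counts)


def expected_zero_probability(nearest_distances_m, plot_radius_m):
    """Empirical P(no SDT within plot_radius_m of a random point).

    A circular plot centred on a point is empty exactly when the nearest
    SDT lies farther away than the plot radius, so the share of points
    whose nearest-SDT distance exceeds the radius estimates P(zero count).
    Assumption: an empirical proportion replaces a fitted model here.
    """
    beyond = sum(1 for d in nearest_distances_m if d > plot_radius_m)
    return beyond / len(nearest_distances_m)


def hurdle_poisson_mean(counts, zero_prob=None):
    """Mean SDT count per plot from a hurdle model with Poisson positives.

    For a zero-truncated Poisson fitted by maximum likelihood, the fitted
    truncated mean lambda / (1 - exp(-lambda)) equals the sample mean of
    the positive counts, so that sample mean is used directly.
    zero_prob=None -> standard hurdle (observed zero proportion);
    otherwise the supplied expected zero probability replaces it.
    """
    positives = [c for c in counts if c > 0]
    if zero_prob is None:
        zero_prob = observed_zero_proportion(counts)
    if not positives:
        # No positive counts: the truncated mean cannot be estimated,
        # so 0.0 is returned as a simplifying choice.
        return 0.0
    truncated_mean = sum(positives) / len(positives)
    return (1.0 - zero_prob) * truncated_mean


def per_hectare(mean_per_plot, plot_radius_m):
    """Convert a mean count per circular plot to trees per hectare."""
    plot_area_ha = math.pi * plot_radius_m ** 2 / 10_000
    return mean_per_plot / plot_area_ha


if __name__ == "__main__":
    # Hypothetical data: SDT counts on ten 7.32 m radius plots and
    # nearest-SDT distances (m) measured from eight random points.
    counts = [0, 0, 0, 0, 0, 0, 1, 0, 2, 0]
    distances = [3.1, 12.0, 25.4, 9.8, 40.2, 6.5, 15.0, 30.3]
    radius = 7.32

    ezp = expected_zero_probability(distances, radius)
    standard = hurdle_poisson_mean(counts)
    ez = hurdle_poisson_mean(counts, zero_prob=ezp)
    print(f"observed zero proportion: {observed_zero_proportion(counts):.2f}")
    print(f"expected zero probability: {ezp:.2f}")
    print(f"hurdle estimate: {per_hectare(standard, radius):.1f} SDT/ha")
    print(f"EZ-style estimate: {per_hectare(ez, radius):.1f} SDT/ha")
```

With the observed zero proportion, this hurdle estimate reduces algebraically to the ordinary sample mean (here 0.3 SDT per plot), so in this sketch the two approaches differ only through the zero probability that is plugged in.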
Copyright by Hong Su An 2011

ACKNOWLEDGMENTS
I have been fortunate to have had the opportunity to interact with many people who have inspired me in my life and research at Michigan State University. I would especially like to sincerely thank Dr. David W. MacFarlane for his guidance and patience, and I deeply appreciate his friendship and understanding during my graduate study. His advice challenged me and led me to complete the Ph.D. program. I would also like to give many thanks to my advisory committee, Dr. Richard K. Kobe, Dr. Daniel Hayes, and Dr. Andrew O. Finley, who gave me a broad view of science and many comments and much advice on my research. I am grateful to the Forest Modeling Lab group: Zhonglei Wang, Neil Ver Planck, and Lisa Parker. It was an exciting and enjoyable time with them; collecting data together, both for my research and for other projects, was an especially exciting experience. I also appreciate Dr. Kyung-Hwan Han, Dr. Sang Choon Jeon, Dr. Sang Yeob Lee, and Dr. Hwan Jung, who encouraged me and prayed for me when I fell into a slump. In addition, I would like to thank my friends at New Hope Baptist Church. I am especially grateful to Dr. Man Yong Shin, my mentor not only in science but also in life. He encouraged me to pursue the Ph.D. program and has given me tremendous amounts of advice for my life and research; he is my role model. I would also like to thank Dr. Alla Sikorskii, who gave me the opportunity to expand my knowledge from forestry to nursing research. Working as a member of her research group was an unforgettable experience. Lastly, I would like to thank my family for all their love and encouragement, and for their endless commitment and efforts to make me a scholar.

TABLE OF CONTENTS
List of Tables
List of Figures
1. Introduction
1.1. The Importance of Dead Trees in Forest Ecosystems
1.2. Challenges for Estimating the Abundance of Standing Dead Trees
1.3. Goals and Objectives of this Dissertation
2. Review of General Sampling Methods and Models for Dealing with Zero-Inflated Data
2.1. Overview of Sampling Inference
2.1.1. Design-Based Estimation
2.1.2. Model-Based Estimation
2.2. Statistical Models for Count Data and the Problem of Zero-Inflation
2.2.1. Overview
2.2.2. Poisson Model
2.2.3. Negative Binomial Model
2.2.4. Hurdle Model
3. Development of the EZ-Hurdle Method
3.1. Model Development
4. A Simulation Study to Understand the Properties of the EZ-Hurdle Method and Compare it to Related Approaches
4.1. Overview
4.2. Data
4.3. Methods
4.3.1. Estimating the Expected Zero Probability
4.3.2. Comparison of EZ-Hurdle to Existing Methods and Models
4.3.3. Sensitivity Analysis
4.4. Results
4.4.1. Inclusion probability
4.4.2. Comparison of Estimated Density of SDT/ha by Methods
4.5. Discussion of Simulation Study
5. An Application of the EZ-Hurdle Method for the Estimation of Standing Dead Tree Abundance in a Real Forest Setting
5.1. Introduction
5.2. Methods
5.2.1. Data Collection
5.2.2. Statistical Analysis
5.3. Results
5.4. Discussion for Applied Study
6. Cost and Sampling Efficiency of EZ-Hurdle Method
6.1. Overview
6.2. Data
6.3. Analytical Methods
Estimation of time costs from field data for bootstrapping and simulation
6.4. Results
6.4.1. Comparison of Inclusion Probabilities Between Different Forest Type Conditions
6.4.2. Results of Time Requirement Studies
6.4.3. Comparison of the Coefficient of Variation by Method in Simulation
6.4.4. Comparison Sampling Efficiency by Method
6.5. Discussion for Cost and Sampling Efficiency
7. Developing a Distance-Limited EZ-Hurdle Method
7.1. Overview
7.2. Methods
7.3. Results
7.4. Discussion for Distance-Limited EZ-Hurdle Method
8. Conclusion
9. References

LIST OF TABLES
Table 4.1. Parameters to create clustering patterns from a Matérn Cluster Point Process.
Table 4.2. Number of samples by sampling intensity (%) and plot size (radius, m).
Table 4.3. Models to estimate the inclusion probability of a standing dead tree.
Table 4.4. Mean RMSE and AIC by model to estimate the inclusion probability of a standing dead tree.
Table 5.1. Guide to define the decay class by conditions (USDA Forest Service 2005).
Table 5.2. The number of fixed-area plots and random points by sampling intensity and method.
Table 5.3. Summary of statistics from fixed-area sampling method by infestation status.
Table 5.4. The results of variance-to-mean ratio and index of dispersion.
Table 5.5. The estimated parameters and standard errors (in parentheses) of the reduced Gompertz function.
Table 6.1. The classification of basal area (BA) class.
Table 6.2. The estimated parameters and R-square for the regression model to estimate the time requirement for a 0.08 ha fixed-area plot.
Table 6.3. The estimated parameters and R-square for the regression model to estimate the search time to measure the nearest SDT.
Table 6.4. Summary of results for BA of live trees, density of standing dead trees (No. SDT/ha), average distance (Dist.) from point to the nearest SDT, and average time (Time) to search for and measure the nearest standing dead tree, with standard deviations in parentheses.
Table 6.5. Estimated coefficients and standard errors (in parentheses) for reduced Gompertz function fitting by forest cover type and basal area class.
Table 6.6. Average of estimated time requirements (minutes) by survey type from calibrated simulations under different spatial patterns and standing dead tree densities.
Table 6.7. Estimated average time requirement (min.) per plot (& point) by method and plot type when time models are applied to the BBDMS data.
Table 6.8. Estimated time requirement (hr) for FAS with 7.32 m radius subplots and the additional time requirement for EZP by PPR for a specific number of fixed-area plots, with two-person crews.
Table 6.9. Relative sampling efficiency of the EZ-Hurdle method compared with FAS by density and spatial pattern.
Table 6.10. Relative sampling efficiency of the EZ-Hurdle method compared with FAS by infection status.
Table 7.1. Maximum search radii to find the nearest standing dead tree.
Table 7.2. The CV of estimates (N/ha) and expected zero probability (EZP) by PPR and search radius (m) for the EZ-Hurdle method when the spatial pattern of standing dead trees is random and density is 12/ha.
Table 7.3. The CV of estimates (N/ha) and expected zero probability (EZP) by PPR and search radius (m) for the EZ-Hurdle method when the spatial pattern of standing dead trees is clustered and density is 12/ha.

LIST OF FIGURES
Figure 4.1. The distribution of simulated standing dead trees by spatial pattern when the density is 12/ha. The area depicted for each pattern is about 18 ha.
Figure 4.2. The change of inclusion probability of a standing dead tree estimated by the reduced Gompertz function under different conditions.
Figure 4.3. The mean and 95% CI of inclusion probabilities by spatial patterns and densities of standing dead trees when the search radius is 7.32 m.
Figure 4.4. The mean and 95% CI of inclusion probabilities by spatial patterns and densities of standing dead trees when the search radius is 17.95 m.
Figure 4.5. Observed and expected zero proportion in 3,000 simulated data sets by spatial patterns, when the plot radius is 7.32 m and density is 24/ha.
Figure 4.6. The distribution of estimated density of standing dead trees by sampling intensities (1%, 3%, 5%, 7%) and methods when the plot radius is 7.32 m and spatial pattern is random. True density is 12 trees per ha. SRS is simple random sampling method with fixed-area circular plot. HDP is Hurdle model with Poisson distribution and EZP is EZ-Hurdle model with Poisson distribution.
Figure 4.7. The distribution of estimated density of standing dead trees by sampling intensities (1%, 3%, 5%, 7%) and methods when the plot radius is 7.32 m and spatial pattern is Cluster I. True density is 49 trees per ha. SRS is simple random sampling method with fixed-area circular plot. HDP is Hurdle model with Poisson distribution and EZP is EZ-Hurdle method with Poisson distribution.
Figure 4.8. RMSE by spatial patterns, sampling intensities and plot sizes. SRS is simple random sampling method, EZP is Expected-Zero Hurdle model with Poisson distribution, Ran. is random pattern, C-I is clustered pattern I, and C-II is clustered pattern II.
Figure 4.9. The errors by sampling intensity when density is 12/ha and plot radius is 7.32 m.
Figure 4.10. The errors by sampling intensity when density is 12/ha and plot radius is 17.95 m.
Figure 4.11. The errors by sampling intensity when density is 49/ha and plot radius is 7.32 m.
Figure 4.12. The errors by sampling intensity when density is 49/ha and plot radius is 17.95 m.
Figure 4.13. The changes of errors (%) by the changes of inclusion probability when sampling intensity is 5% and plot radius is 7.32 m.
Figure 5.1. The selected locations among beech bark monitoring plots (BBDMS) and Pigeon River Country Forest (PRC).
Figure 5.2. Plot design with 30 sampling points in the transect layout.
Figure 5.3. A standing dead tree is defined as a dead tree greater than 12.7 cm DBH, which is taller than 1.37 m and leans less than 45 degrees from vertical (USDA Forest Service 2005).
Figure 5.4. The number of standing dead trees (No. SDT/ha) in five decay classes in two forest types. AB is American beech.
Figure 5.5. Estimated inclusion probability by infection status. Dashed line is 7.32 m.
Figure 5.6. The change of coefficient of variation by the number of fixed-area plots. FAS is fixed-area sampling method, PPR is point to plot ratio of data for EZ-Hurdle method. Solid line is infected forests and dashed line is non-infected forests.
Figure 6.1. The location of Pigeon River Country Forest (PRC) in MI.
Figure 6.2. Distribution of the number of SDT per plot and time requirement to measure the SDT within plot (0.08 ha). Dashed lines are medians.
Figure 6.3. The scatter plot between time (second) and number of SDT within plot and residuals of regression model to estimate time requirement for fixed area (0.08 ha).
Figure 6.4. The distribution of search area (ha) and time (second) to measure the nearest SDT in PRC. Dashed lines are median values.
Figure 6.5. The scatter plot between search time (sec.) and search radius (m) and residuals of selected regression model to estimate time to search the nearest SDT.
Figure 6.6. The inclusion probabilities by forest type and basal area class. Dashed lines are the radii of subplot (7.32 m) and annular plot (17.95 m).
Figure 6.7. The distribution of distance between a random point and the nearest standing dead tree by density and spatial pattern under the simulation study. The dashed line is average distance.
Figure 6.8. Additional time requirement for EZ-Hurdle method above the time cost for FAS by PPR from field study. Solid line is infected forests and dashed line is non-infected forests.
Figure 6.9. The change of coefficient of variation of estimated density by PPR and the number of fixed-area plots.
Figure 7.1. The coefficients of variation (CV) of estimated density of standing dead trees by spatial pattern and search radius. FAS is the fixed-area sampling method, PPR is the point to plot ratio, and Inf. means unlimited search radius.
Figure 7.2. The coefficients of variation (CV) of estimated expected zero probability (EZP) by spatial pattern and search radius. FAS is the fixed-area sampling method, PPR is the point to plot ratio, and Inf. means an unlimited search radius was used.
Figure 7.3. The change of inclusion probability of a standing dead tree by increasing the search radius when density of standing dead trees is 24/ha and spatial pattern is clustered.
Figure 7.4. Additional time requirement for EZ-Hurdle method by maximum search radius. PPR is point to plot ratio, FAS is fixed-area sampling method, and Inf. is unlimited distance.

1. Introduction
1.1. The Importance of Dead Trees in Forest Ecosystems
Dead trees influence many aspects of forest ecosystems, such as soil fertility, hydrology, and wildlife habitat (Kimmins 1992). Tree death changes resource availability for other organisms in the ecosystem (Franklin et al. 1987): resources such as light, nutrients, and water become more available when trees die. For example, dead trees release nutrients and energy as they decompose (Harmon et al. 1986), so dead trees influence nutrient cycling (Maser and Trappe 1984). Dead wood is also an important component of the global carbon (C) cycle (Oswalt et al. 2008). Following international agreements to control C emissions, quantifying C sources and sinks by region has become an important issue (Rothstein et al. 2004), and dead wood holds a substantial amount of C, which is slowly released to the atmosphere and soil during decomposition (Harmon et al. 1990; Keenan et al. 1993; Krankina and Harmon 1995). Tree death therefore contributes to increases in atmospheric C levels (Clark et al. 2004). In some cases, dead components of the forest such as dead wood and litter may hold more C than live components (Delaney et al. 1997).
Dead trees also affect species diversity in forest ecosystems because they offer habitat for a variety of organisms (Franklin et al. 1987; Green and Peterken 1997; McCarthy and Bailey 1994). For example, standing dead trees and fallen dead trees provide habitat for birds (McClelland 1977), amphibians (Jaeger 1980), small mammals (Dueser and Shugart Jr 1978; McComb et al. 1993), and reptiles (Harmon et al. 1986). Brown (1985) reported that about 100 species of vertebrates use standing dead trees and 150 species use logs. Maser and Trappe (1984) reported that insects use dead trees as habitat and that fungi and mosses live in dead trees. Dead tree populations are also critical for assessing and monitoring the health of forest ecosystems (Gray 2003; Greif and Archibold 2000). The abundance or size of dead trees has been used to assess the stage of forest development. For example, the density of dead trees is one of the criteria used to define old-growth Douglas-fir and mixed-conifer forests in the Pacific Northwest and California (Franklin et al. 1986). Dead wood is often used as part of the criteria and indicators of sustainable forest management by international initiatives such as the Montreal Process and the International Tropical Timber Organization (ITTO) (Bütler and Schlaepfer 2004). Dead wood and dead trees were also proposed as an indicator of forest biodiversity by the Fourth Ministerial Conference on the Protection of Forests in Europe in 2003. Several forest health indicators, such as stand density, species composition, mortality/growth data, and growth-to-removals ratio, can be obtained from inventory data (O'Laughlin and Cook 2003). Dead trees and dead wood are closely connected with mortality/growth rates. Therefore, reliable estimation of dead tree population attributes is important for assessing and monitoring forest ecosystem processes (e.g., carbon cycling) and values (e.g., biodiversity).

1.2. Challenges for Estimating the Abundance of Standing Dead Trees
It can be a challenge to develop reliable estimates of population parameters because dead trees are generally lower in abundance and have more complex spatial distributions (e.g., are more clustered) than live trees. The abundance and spatial pattern of dead trees differ by geographical location, stand age, forest type and management regime (Cline et al. 1980; Guby and Dobbertin 1996; Green and Peterken 1997; Fridman and Walheim 2000). Fridman and Walheim (2000) reported that the volume of dead wood (standing dead trees and logs) differs by vegetation type and geographical location in Sweden. Stephens (2004) reported that standing dead tree abundance increases with stand age in pine-mixed conifer forests. However, the number of standing dead trees may not correlate with stand age (Vasiliauskas et al. 2004). The abundance of dead trees differs among management regimes in hemlock-hardwood forests (Tyrrell and Crow 1994), temperate deciduous forests (Green and Peterken 1997), pine forests (Montes and Canellas 2006; Reid et al. 1996), and Norway spruce forests (Ranius et al. 2003). The spatial pattern of mortality also differs by the agent that caused tree death (Franklin et al. 1987); for example, forest fires or windthrow cause clustering of dead trees. This wide variation in standing dead tree abundance among forest ecosystems increases sampling error. However, it also suggests that auxiliary information regarding forest condition might allow for better estimation of standing dead tree abundance. Another issue for sampling dead trees is that most forest inventories are designed for sampling live trees, with dead tree data often collected under the live-tree design (e.g., standing dead trees on the US National Forest Inventory and Analysis Program (FIA) plots (Bechtold and Patterson 2005)).
Since standing dead trees are generally less abundant and have a more variable spatial pattern than live trees, larger plot sizes or higher sampling intensities are needed to achieve the same relative accuracy for dead trees as for live ones. For example, 0.4 or 1.0 ha plots have been used to estimate the abundance of standing dead trees (Ganey 1999; Spiering and Knight 2005; Stephens 2004). Bull et al. (1990) recommended a factor-5 prism (using horizontal point sampling), a 1 ha fixed-area plot size, or an overall sampling intensity of about 5% when the mean density of standing dead trees is between 0.5 and 5 trees per hectare, in order to estimate standing dead tree density within 24% of the actual density; a complete count survey has been recommended when the mean density of standing dead trees is less than 0.5 per hectare. While intensifying sampling effort or changing plot designs is a straightforward solution to the problem, the cost of intensified sampling must ultimately be weighed against the value of increased accuracy (Curtis and Marshall 2005; Gregoire and Valentine 2008). Adding new plots, increasing plot sizes or otherwise modifying plot designs can be especially costly in the case of large-scale (e.g., national) forest inventories and other permanent plot networks, and may be infeasible in some cases. Thus, a solution utilizing relatively cheap auxiliary data or models to enhance estimates derived from standard forest inventory data might provide an efficient alternative to dramatically increasing sampling intensity or modifying plot designs. One major consequence of surveying too small an area (e.g., using plot sizes that are too small) is that there may be a large number of zero observations of standing dead trees. Excess zero observations (a.k.a.
zero-inflated data) increase the variation in estimates of standing dead tree parameters (Eskelson et al. 2009; Potts and Elith 2006). Because standing dead trees tend to be aggregated in space and are generally less abundant than live trees, the problem of zero-inflated data is likely to be large. For example, about 44% of US National Forest Inventory and Analysis (FIA) plots observed no dead trees (Woodall et al. 2011). Therefore, an estimator that uses relatively cheap auxiliary information to clarify the uncertainty created by zero observations of standing dead trees under standard forest inventory designs could provide cost-efficient estimation of standing dead tree abundance.

1.3. Goals and Objectives of this Dissertation
This thesis explores a new approach to improving the estimation of standing dead tree abundance, other than adding more plots or modifying plot designs, in the context of the US National Forest Inventory and Analysis Program (Bechtold and Patterson 2005), the US Forest Health Monitoring (FHM) Program (now merged with the FIA plot design) and other similar permanent plot networks. The main goal of this research was to develop a new estimator that predicts the abundance of standing dead trees more precisely, using auxiliary data and a model to reduce the estimation uncertainty associated with excessive zero observations in the data. The underlying hypothesis of this thesis was that auxiliary information regarding the distance to the nearest standing dead tree could reduce the variation of estimates caused by large numbers of zero observations in the data under a given plot design. In order to meet the main research goal, several objectives were undertaken and met:
1. Review the general problem of zero-inflated data (large numbers of zero observations), review sampling methods and models, and examine this problem in the specific context of estimating standing dead tree abundance.
2. Develop a new estimation method for standing dead trees which expands on existing zero-inflated modeling methods and which is compatible with typical estimation methods used by forestry practitioners.
3. Use simulation studies and field data to explore the problem of zero-inflated data for estimating the density of standing dead trees from fixed-radius plot sampling under different simulated and real forest conditions.
4. Examine the properties of the proposed new estimator under simulated and real forest conditions.
5. Compare the new estimator with existing methods, such as simple random sampling and simple random sampling combined with existing zero-inflated models, in terms of estimation error and costs of data collection.
6. Finally, suggest a sampling strategy for estimating standing dead tree abundance with reference to the FIA/FHM plot design.

2. Review of General Sampling Methods and Models for Dealing with Zero-Inflated Data
2.1. Overview of Sampling Inference
Most forest inventories focus on estimating population parameters, such as the mean per unit area (or per plot) and its associated variance, from sample data, which are then extrapolated to the larger population of interest (the reference population). Sampling inference can be classified into design-based and model-based inference. Model-based inference is usually applied at the estimation stage. Gregoire (1998) argued that "sample selection cannot be inherently model based". Sample selection can be described as probabilistic, sequential, or purposive (Gregoire 1998). On the other hand, models can be applied to estimate the population parameters during the estimation stage. Ratio and regression estimators are well-known model-based estimators. In this research, population parameters estimated by both design-based and model-based estimators were compared.
2.1.1. Design-Based Estimation
Suppose the population consists of $N$ sample units.
Each unit is identified by a label $i \in \{1, \ldots, N\}$. Let $y_i$ be the observed characteristic or measurement on the $i$th unit, such as the number of standing dead trees. The population is represented by the $N$ sample units and their characteristics, $Y = \{y_1, \ldots, y_N\}$, with unknown parameters. Let $\mathbf{y} = \{y_1, \ldots, y_n\}$ be the sample data, where $n$ is the number of sample units selected under a specific sampling design. In survey sampling, the interest is in making inference about population parameters from the sample data $\mathbf{y}$.

In the design-based view, the $y_i$ of the population are considered a fixed set of unknown constants, so the population parameters are also fixed constants; randomness enters only through the selection of the sample (Dorazio 1999; Gregoire 1998). The population mean is defined as

$$\mu = \frac{1}{N} \sum_{i=1}^{N} y_i .$$

The estimator used to calculate estimates of population parameters arises directly from the sampling design. The choice of sampling design is important for obtaining precise estimates because the design defines how the individual sample units that represent the population are selected. A sampling design is a rule for selecting sample units from the sample space with known selection probabilities (equal probabilities in the case of simple random sampling). The design-based approach therefore rests on probability sampling, in which randomness is introduced by the design itself, and no assumptions about the population $Y$ are needed (Dorazio 1999). Given the selected sampling design, an estimator is a rule for calculating population parameters from the sample data; the term "sampling strategy" refers to the combination of sampling design and estimator. An estimator is commonly derived so that its expected value equals the value of the population parameter. Such estimators are said to be design-unbiased.
For example, the estimator of the population mean $\mu$ under simple random sampling is $\hat{\mu} = \sum_{i=1}^{n} y_i / n$. Since $E(\hat{\mu})$, the expected value of $\hat{\mu}$ over all possible samples $\mathbf{y}$ under simple random sampling, equals the population mean $\mu$, the estimator is design-unbiased for $\mu$. This holds regardless of the nature of the population, an important consequence of the link between sampling design and estimation.

Most current forest inventory methods for standing dead trees rely on design-based inference using sampling strategies developed for standing live trees (Kenning et al. 2005). These methods include simple random sampling with fixed-area plots, variable radius sampling (point sampling), and strip cruising. Fixed-area sampling and point sampling have been recommended for estimating the density of standing dead trees (Bull et al. 1990). For fixed-area sampling, arguably the most common plot design, the shape and size of the plot and the sampling intensity must be decided at the sampling design stage. Any plot shape can be used, but circular or rectangular plots are common in practice. Ganey (1999), for example, used square 1 ha plots to collect information on standing dead trees, while Bull et al. (1990), Stephens (2004), Kenning et al. (2005), and the FIA program use circular plots. Plot size and sampling intensity are usually decided by the variable of interest, allowable costs, and desired precision (Avery and Burkhart 1983). Once these have been decided, one must choose the sampling design used to establish plots in the study area. In most situations, sampling locations are point locations; for example, a sampling location is the center of a circular plot or a corner point of a rectangular plot.
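The design-unbiasedness of the SRS mean can be checked by brute force on a tiny population: averaging $\hat{\mu}$ over every possible sample of size $n$ recovers $\mu$ exactly. The sketch below uses a hypothetical six-plot population of dead-tree counts (not data from this study) and Python's exact fractions to avoid rounding noise.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population of N = 6 plots (counts of standing dead trees).
population = [0, 0, 3, 0, 1, 5]
n = 3  # sample size, drawn without replacement

# In the design-based view the population mean is a fixed constant.
mu = Fraction(sum(population), len(population))

# Under SRS each of the C(6, 3) = 20 samples is equally likely, so the
# design expectation of mu_hat is the plain average over all samples.
samples = list(combinations(population, n))
sample_means = [Fraction(sum(s), n) for s in samples]
expected_mu_hat = sum(sample_means) / len(samples)

print(len(samples), mu, expected_mu_hat)  # 20 3/2 3/2
print(min(sample_means), max(sample_means))  # individual estimates still vary
```

Individual sample means range from 0 to 3, so unbiasedness is a property of the average over the design, not of any single estimate.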
In this research, simple random sampling (SRS) and systematic sampling (SS) designs were applied to define the baseline method for estimating the abundance of standing dead trees. Generally, the population parameters of interest are the mean per sampling unit or per unit area and the population total; because these differ only by known constants (the plot area and the area of the population), any one of them can be derived from the others. The estimator of the sample mean and the variance estimator of the sample mean are:

$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i , \qquad S_{\bar{y}}^2 = \frac{S_y^2}{n}$$

where $y_i$ is the measured characteristic of interest on sampling unit $i$ (e.g., number of trees), $n$ is the number of sampling units (plots) in the sample, and $S_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$ is the sample variance of $y$. The mean per unit area is estimated by

$$\bar{y}_{unit} = \bar{y} \cdot E , \qquad E = \frac{1}{A}$$

where $E$ is an expansion factor and $A$ is the area of the plot (in ha, for a per-hectare estimate).

2.1.2. Model-Based Estimation

In the model-based view, the $y_i$ of the population are considered a realization of one or more stochastic processes, and a model is a theoretical construction that defines the probability distribution of the values being sampled. The population $Y$ is therefore modeled as a random variable whose joint distribution $f_y(\mathbf{y} \mid \theta)$ is defined by one or more unknown, fixed parameters $\theta$, where $\mathbf{y} = \{y_1, \ldots, y_n\}$ is the observed data set. The model is built from information of interest such as density, age, or distribution pattern, which can be obtained from prior research and/or assumptions suggested by the structure of the population. Once the model is defined, the common objective is to estimate the unknown parameter $\theta$ from the sample $\mathbf{y}$ (Dorazio 1999). For example, suppose the underlying distribution of the standing dead tree population $Y$ is Poisson, $Y \sim \text{Pois}(\lambda)$, and that the observed data $\mathbf{y} = \{y_1, \ldots, y_n\}$ also follow a Poisson distribution.
Given this assumption, the number of standing dead trees $y_i$ on each plot is modeled as a random outcome of a Poisson process with probability mass function

$$f(y; \lambda) = \frac{\lambda^{y} e^{-\lambda}}{y!} .$$

The unknown parameter $\lambda$ of the Poisson distribution is the expected number of standing dead trees tallied per plot. Under the Poisson model, $E(\mu) = \lambda$, so inference about $\lambda$ and $\mu$ is equivalent; in other words, the model fitted with the estimated parameter $\hat{\lambda}$ is used to make inference about the standing dead tree population. The parameter $\theta$ and the population parameters are commonly estimated by the method of maximum likelihood (ML), which identifies the value of $\theta$ that is most likely for the given data and model. The ML estimate is obtained by maximizing the likelihood function

$$L(\theta \mid \mathbf{y}) = f(\mathbf{y} \mid \theta)$$

where $\mathbf{y}$ is the observed (selected) data set. Because the likelihood is a function of $\theta$ with $\mathbf{y}$ held fixed, different samples yield different estimates of $\theta$. In addition, the sampling design is not directly relevant to inference, because estimation is based on the likelihood function. Gregoire (1998) stated that, in model-based inference, an estimator $\hat{\mu}$ of the population mean is model-unbiased when $E(\hat{\mu} - \mu) = 0$.

Model-based estimators and inferential procedures have many advantages. Since the population $Y$ is regarded as a single realization of one or more stochastic processes, there is considerable flexibility in identifying and selecting classes of models that approximate the underlying processes believed to have generated $Y$. Prior and observed information can be used to determine the model. For example, several studies and prior information indicate that counts of standing dead trees per plot are nonnegative integers and are often dominated by values of 0 or 1 (e.g., Eskelson et al. 2009).
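For the Poisson model, the maximum-likelihood step has a closed form: maximizing $\ln L(\lambda \mid \mathbf{y}) = \sum_i [y_i \ln\lambda - \lambda - \ln y_i!]$ gives $\hat{\lambda} = \bar{y}$. The sketch below confirms this numerically with a crude grid search; the plot counts are hypothetical.

```python
import math

# Hypothetical counts of standing dead trees on 10 plots.
counts = [0, 0, 1, 0, 2, 0, 0, 1, 0, 3]

def poisson_loglik(lam, ys):
    """ln L(lambda | y) = sum_i [y_i ln(lambda) - lambda - ln(y_i!)]."""
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in ys)

# Crude grid search over lambda in (0, 3]; the closed-form MLE is the sample mean.
grid = [i / 1000 for i in range(1, 3001)]
lam_grid = max(grid, key=lambda lam: poisson_loglik(lam, counts))
lam_closed = sum(counts) / len(counts)

print(lam_grid, lam_closed)  # 0.7 0.7
```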
Based on this prior information, the underlying distribution can be assumed to be Poisson or negative binomial, and the model is then fitted to the sample data. However, it is not always easy to find the best model. If the model is not correct, the estimated parameters may be biased (Dorazio 1999), so care is needed when selecting the model used to make inference from sample data.

Count regression models have been applied to estimate the abundance of species, the abundance of dead trees, and mortality. They have been widely used in fields such as econometrics, epidemiology, and nursing, but have only recently been introduced into forest ecosystem sciences (e.g., Affleck 2006). In a comparison of five count regression models for estimating the abundance of a vulnerable plant species, Potts and Elith (2006) found that the Hurdle model, which assumes that the observed data follow a mixture of two statistical distributions, provided the best fit to the observed data. Eskelson et al. (2009) compared a negative binomial regression model and a nearest neighbor imputation method for predicting the abundance of standing dead trees and cavity trees. Affleck (2006) also compared different count regression models for predicting stand mortality and found that the Hurdle and negative binomial models provided good fits to the data. These studies aimed to find the best-fitting model for an observed data set in order to explain the relationship between abundance or mortality and environmental variables. They indicate that model performance differs depending on the underlying distributional assumption. Hence, the assumed statistical distribution of the model is an important factor in predicting the abundance of standing dead trees. Several count regression models used to model standing dead tree data are introduced in the next section.

2.2. Statistical Models for Count Data and the Problem of Zero-Inflation

2.2.1. Overview

In a statistical model, the count data $\mathbf{y} = (y_1, \ldots, y_n)$ are treated as random variables. They are modeled by a probability mass function $f(y \mid \theta)$, such as the Poisson or negative binomial (NB) distribution, characterized by one or more unknown, fixed parameters $\theta$. Statistical models for count data are discrete response models that aim to explain the number of occurrences, or count, of events (Hilbe 2007). Since count data take only nonnegative integer values, such as the number of standing dead trees per plot, they are modeled with probability mass functions defined on the nonnegative integers, such as the Poisson and negative binomial distributions. Here, the counts of interest are counts of standing dead trees in fixed-area plots, which form the base data for inference about standing dead tree density (standing dead trees per unit area).

2.2.2. Poisson Model

The Poisson model is a basic count regression model derived from the Poisson distribution. For example, let $Y_i$ be the number of standing dead trees at site $i$. Under simple random sampling (SRS), we typically assume that $Y_i$ is Poisson-distributed with mean $\mu_i$, so that the counts in the fixed-area plots are described by a Poisson distribution. The probability function of the Poisson distribution is

$$P\{Y = y\} = \frac{e^{-\mu} \mu^{y}}{y!}$$

where $y \in \{0, 1, 2, 3, \ldots\}$, the random variable $Y$ is the count response, and the parameter $\mu$ is both the expected value and the variance. An important property of the Poisson distribution is that the mean ($\mu = E[Y]$) and variance ($Var[Y]$) are equal.

One of the central problems associated with both the spatial aggregation and the low abundance of standing dead trees is the excess of zero observations in sample data (i.e., plots where no dead trees were observed).
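A quick way to see whether counts are zero-inflated relative to a Poisson model is to compare the observed proportion of empty plots with the Poisson zero probability $e^{-\bar{y}}$ implied by the sample mean. The counts below are hypothetical, chosen to mimic a clustered stand in which a few plots fall inside clusters and most fall between them.

```python
import math

# Hypothetical counts of standing dead trees on 20 fixed-area plots.
counts = [0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 5, 0, 0, 1, 0, 0, 0, 4, 0]

n = len(counts)
mean_count = sum(counts) / n                 # 13 / 20 = 0.65
observed_zero = counts.count(0) / n          # 16 / 20 = 0.80
poisson_zero = math.exp(-mean_count)         # Poisson P{Y = 0} = e^{-mu}, about 0.52
sample_var = sum((y - mean_count) ** 2 for y in counts) / (n - 1)

print(f"observed zeros {observed_zero:.2f} vs Poisson {poisson_zero:.3f}")
print(f"variance/mean ratio {sample_var / mean_count:.2f}")
```

Here 80% of the plots are empty where the Poisson fit expects about 52%, and the sample variance is more than three times the mean; both point to the zero inflation and over-dispersion discussed next.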
When the proportion of zero counts in the data is larger than that expected under the assumed statistical distribution (e.g., the Poisson distribution), the data are called "zero-inflated data" (Tu 2002). Typical statistical methods are unsuitable or difficult to apply to zero-inflated data. For example, Ganey (1999) used medians and interquartile ranges instead of means and variances for inference, because the frequency distribution of standing dead trees in his sample data was highly skewed. Zero-inflated data usually do not satisfy normality or homoscedasticity assumptions. In addition, insufficient nonzero observations increase the uncertainty of estimation. Therefore, statistical models based on other distributions have been developed to model large proportions of zero observations in data.

2.2.3. Negative Binomial Model

The negative binomial (NB) distribution can be applied when there is over-dispersion relative to a Poisson regression model. The NB distribution arises as a mixture of two distributions, the Poisson-gamma mixture: count responses $y$ are assumed to be Poisson distributed with mean $\mu$, and $\mu$ itself is assumed to follow a gamma distribution. The NB distribution is characterized as

$$P\{Y = y\} = \frac{\Gamma(y + k)}{\Gamma(y + 1)\,\Gamma(k)} \cdot \left(\frac{k}{\mu + k}\right)^{k} \cdot \left(1 - \frac{k}{\mu + k}\right)^{y}$$

where $y \in \{0, 1, 2, 3, \ldots\}$ and the random variable $Y$ is the count response. The expected value is $\mu$ and the variance is $\mu + \mu^2 / k$, where $k$ is a dispersion parameter. When $k$ is large, the term $\mu^2 / k$ is approximately 0 and the NB distribution approaches the Poisson. Although the NB model is more flexible than the Poisson model, it may still fit poorly when the zero probability in the data is greater or less than the zero probability implied by a regular count distribution.

2.2.4. Hurdle Model

The Hurdle model, proposed by Mullahy (1986), was developed to account for excess zero observations in count data. It consists of two component models representing two processes, one defining the proportion of zero counts and one the non-zero counts, and the "hurdle" is the partition between them. For standing dead trees, one process causes the absence of standing dead trees at a location and another influences the number of standing dead trees where they occur. In the Hurdle model, a binomial distribution can be used to model the absence and presence of standing dead trees, and a zero-truncated count distribution, such as the Poisson or NB distribution, can be used to model the portion of the count data where at least one standing dead tree was found. The combined probability function for the Hurdle model is:

$$P\{Y = y\} = \begin{cases} \pi, & y = 0 \\ (1 - \pi) \cdot \dfrac{f(Y = y \mid \mu)}{1 - f(Y = 0 \mid \mu)}, & y > 0 \end{cases}$$

where $\pi$ is the observed zero probability in the count data according to the binomial distribution, $f(Y = y \mid \mu) / [1 - f(Y = 0 \mid \mu)]$ is the zero-truncated form of the Poisson or NB distribution, and $f(Y = 0 \mid \mu)$ is the probability of zero under the Poisson or NB distribution. The expected value and variance of $Y$ in the Poisson Hurdle model are:

$$E_{hdp}[Y] = (1 - \pi) \cdot \frac{\mu}{1 - e^{-\mu}}$$

$$Var_{hdp}[Y] = (1 - \pi) \cdot \frac{\mu + \mu^2}{1 - e^{-\mu}} - \left[(1 - \pi) \cdot \frac{\mu}{1 - e^{-\mu}}\right]^2$$

where $\pi$ is the observed zero probability estimated by the binomial process and $\mu$ is the expected value of the Poisson distribution. For the NB Hurdle model, the expected value and variance of $Y$ are:

$$E_{hdNB}[Y] = (1 - \pi) \cdot \frac{\mu}{1 - P_0}, \qquad P_0 = \left(\frac{k}{\mu + k}\right)^{k}$$

$$Var_{hdNB}[Y] = \frac{1 - \pi}{1 - P_0} \cdot \left(\mu + \mu^2 + \frac{\mu^2}{k}\right) - \left[(1 - \pi) \cdot \frac{\mu}{1 - P_0}\right]^2$$

where $\pi$ is the observed zero probability estimated by the binomial process, $\mu$ is the expected value of the NB distribution, and $k$ is the dispersion parameter estimated for the NB distribution.
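One practical consequence of estimating $\pi$ by the observed zero proportion can be shown in a few lines. If $\mu$ is fitted by maximum likelihood to the zero-truncated Poisson part, the likelihood equation is $\mu/(1 - e^{-\mu}) = \bar{y}_+$, the mean of the positive counts, so $E_{hdp}[Y] = (1 - \hat{\pi})\,\bar{y}_+$, which is exactly the ordinary sample mean. The sketch below illustrates this with hypothetical counts; the function names and the bisection solver are our own choices, not part of the original method. The identity is consistent with the near-identical SRS and Hurdle results reported in Section 4.4.2.

```python
import math

def fit_truncated_poisson(positives, tol=1e-12):
    """ML estimate of mu for a zero-truncated Poisson.

    Solves mu / (1 - exp(-mu)) = mean(positives) by bisection; the left side
    rises from 1 (as mu -> 0) without bound, so a root exists when the mean
    of the positive counts exceeds 1.
    """
    target = sum(positives) / len(positives)
    if target <= 1.0:
        raise ValueError("mean of positive counts must exceed 1")
    lo, hi = 1e-12, target  # mu / (1 - exp(-mu)) > mu, so the root lies below target
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < target:   # -expm1(-mu) = 1 - exp(-mu)
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def poisson_hurdle_moments(pi, mu):
    """Mean and variance of the Poisson Hurdle model (Section 2.2.4)."""
    trunc = -math.expm1(-mu)                   # 1 - e^{-mu}
    mean = (1.0 - pi) * mu / trunc
    var = (1.0 - pi) * (mu + mu ** 2) / trunc - mean ** 2
    return mean, var

# Hypothetical plot counts from a clustered stand.
counts = [0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 5, 0, 0, 1, 0, 0, 0, 4, 0]
positives = [y for y in counts if y > 0]
pi_hat = counts.count(0) / len(counts)        # observed zero proportion, 0.8
mu_hat = fit_truncated_poisson(positives)
hurdle_mean, hurdle_var = poisson_hurdle_moments(pi_hat, mu_hat)

print(round(hurdle_mean, 6), sum(counts) / len(counts))  # 0.65 0.65
```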
The zero probabilities ($\pi$) for both the Poisson Hurdle and NB Hurdle models are estimated from the binomial process using observed absence and presence data.

3. Development of the EZ-Hurdle Method

3.1. Model Development

Since the Hurdle model estimates the zero probability from the observed proportion of zeros in the sample data, this estimate is subject to sampling error due to, e.g., a too-small plot size and/or variation in the spatial pattern and abundance of standing dead trees. To reduce this variability, we propose the Expected-Zero (EZ) Hurdle model, which replaces the observed proportion of zeros in the sample data with an expected zero probability obtained from auxiliary information. This expected zero probability provides additional information that, theoretically, should reduce the random variation associated with the observed zero proportion in the count data.

To use this additional information, the EZ-Hurdle method adds a stage to the estimation process. The first step is estimating the expected zero probability ($P_e$) for the given sampling intensity. Here, the expected zero probability is estimated from the predicted inclusion probability of a standing dead tree for a given search radius, but it could be estimated otherwise (e.g., from another data source or model). The relationship between the expected zero probability and the inclusion probability of a standing dead tree is:

$$P_{e,id} = 1 - P_{Inc}(id)$$

where $P_{e,id}$ is the expected zero probability for the given search radius $d$ under condition $i$ (such as the spatial pattern and density of standing dead trees), and $P_{Inc}(id)$ is the inclusion probability of a standing dead tree within search radius $d$.
In this application, a model is applied to find the inclusion probability of a standing dead tree for a given search radius at each sample point, using the observed distance from the sample point to the nearest standing dead tree as auxiliary information. As the fixed-area plot radius increases, the inclusion probability of a standing dead tree approaches one and the number of plots with no standing dead trees (the number of zeros) approaches zero. Thus, for any search radius employed during sampling, an expected number of zeros can be estimated from a data set of point-to-dead-tree distances, or from a function derived from such data.

The second step is estimating the non-zero counts using a zero-truncated count distribution, such as the Poisson or NB distribution, with the zero probability fixed at the value estimated in the first step. The expected zero probability $P_{e,id}$, obtained as a function of the search radius $d$, is plugged into the following probability function:

$$P\{Y = y\} = \begin{cases} P_{e,id}, & y = 0 \\ (1 - P_{e,id}) \cdot \dfrac{f(Y = y \mid \mu_{id})}{1 - f(Y = 0 \mid \mu_{id})}, & y > 0 \end{cases}$$

where $P_{e,id}$ is the modeled expected zero probability for search radius $d$, $\mu_{id}$ is the mean estimated by the truncated count distribution (e.g., Poisson or NB) for search radius $d$, and $i$ denotes the condition, such as the spatial pattern and density of standing dead trees. The part of the log-likelihood of the EZ-Hurdle model that depends on $\mu_{id}$ is:

$$\ell(\mu_{id} \mid y_{i1}, \ldots, y_{in}) = \sum_{j:\, y_{ij} > 0} \left\{ \ln f(y_{ij} \mid \mu_{id}) - \ln\left[1 - f(0 \mid \mu_{id})\right] \right\}$$

where the sum runs over the plots with at least one standing dead tree; because $P_{e,id}$ is fixed in advance, the zero counts contribute only the constant $\ln P_{e,id}$. If the expected zero probability for the given search radius $d$ can be estimated precisely, a more precise estimate of the number of standing dead trees can be obtained.
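The two-step EZ-Hurdle estimator can be sketched for the Poisson case. Step 1 supplies $P_{e,id}$ from the distance model; step 2 fits $\mu_{id}$ to the positive counts by maximizing the log-likelihood above, and the mean per plot then follows from the EZ-Hurdle probability function as $(1 - P_{e,id})\,\mu_{id}/(1 - e^{-\mu_{id}})$. Multiplying by the expansion factor $1/A$ (plot area in ha) gives trees per ha. The function names, the bisection solver, the counts, and the value $P_{e,id} = 0.70$ are hypothetical illustrations, not values from this study.

```python
import math

def fit_truncated_poisson(positives, tol=1e-12):
    """ML estimate of mu for a zero-truncated Poisson (bisection on the score equation)."""
    target = sum(positives) / len(positives)
    if target <= 1.0:
        raise ValueError("mean of positive counts must exceed 1")
    lo, hi = 1e-12, target
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < target:   # mu / (1 - e^{-mu})
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ezp_density_per_ha(counts, p_e, plot_radius_m):
    """Hypothetical EZ-Hurdle (Poisson) density estimate, trees per ha.

    p_e: modeled expected zero probability for this plot radius (step 1);
    it replaces the observed zero proportion used by the standard Hurdle model.
    """
    positives = [y for y in counts if y > 0]
    mu = fit_truncated_poisson(positives)                  # step 2
    mean_per_plot = (1.0 - p_e) * mu / -math.expm1(-mu)    # E_EZP[Y]
    plot_area_ha = math.pi * plot_radius_m ** 2 / 10_000
    return mean_per_plot / plot_area_ha                     # expansion factor E = 1/A

counts = [0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 5, 0, 0, 1, 0, 0, 0, 4, 0]
ezp = ezp_density_per_ha(counts, p_e=0.70, plot_radius_m=7.32)
srs = (sum(counts) / len(counts)) / (math.pi * 7.32 ** 2 / 10_000)
print(round(ezp, 1), round(srs, 1))
```

Because the fitted $\mu_{id}$ reproduces the mean of the positive counts, the estimate reduces to $(1 - P_{e,id})$ times that mean, divided by the plot area; the observed number of empty plots no longer enters, which is where the variance reduction is expected to come from.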
For the EZ-Hurdle model with a Poisson distribution (EZP), the expected value and variance of the count distribution are:

$$E_{EZP}[Y_i] = (1 - P_{e,id}) \cdot \frac{\mu_{id}}{1 - e^{-\mu_{id}}}$$

$$Var_{EZP}[Y_i] = (1 - P_{e,id}) \cdot \frac{\mu_{id} + \mu_{id}^2}{1 - e^{-\mu_{id}}} - \left[(1 - P_{e,id}) \cdot \frac{\mu_{id}}{1 - e^{-\mu_{id}}}\right]^2$$

and for the EZ-Hurdle model with an NB distribution (EZNB):

$$E_{EZNB}[Y_i] = (1 - P_{e,id}) \cdot \frac{\mu_{id}}{1 - P_0}, \qquad P_0 = \left(\frac{k_{id}}{\mu_{id} + k_{id}}\right)^{k_{id}}$$

$$Var_{EZNB}[Y_i] = \frac{1 - P_{e,id}}{1 - P_0} \cdot \left(\mu_{id} + \mu_{id}^2 + \frac{\mu_{id}^2}{k_{id}}\right) - \left[(1 - P_{e,id}) \cdot \frac{\mu_{id}}{1 - P_0}\right]^2$$

where $P_{e,id}$ is the expected zero probability estimated by the model for search radius $d$, $\mu_{id}$ is the expected value of the truncated Poisson or NB distribution under condition $i$ (such as spatial pattern and density), and $k_{id}$ is the dispersion parameter estimated for the NB distribution.

4. A Simulation Study to Understand the Properties of the EZ-Hurdle Method and Compare it to Related Approaches

4.1. Overview

A simulation study was chosen to illustrate the properties of the EZ-Hurdle method and to compare it with some existing approaches. A simulation study was advantageous because the true density of standing dead trees was known and because different spatial patterns and abundances of dead trees could be devised. A simulated area of 324 ha (1,800 × 1,800 m) was chosen to represent a large forest stand and to avoid small data sets (fewer than 30 observations) when the larger plot size was applied. Three spatial patterns (random, clustered, and highly clustered) were used to generate standing dead trees at three abundance levels ranging from 12/ha to 49/ha (Table 4.1).

Table 4.1. Parameters used to create clustering patterns from a Matern cluster point process.

Spatial pattern   No. of clusters/ha   Radius of cluster (m)   Density/ha
Cluster I         3                    30                      12, 24, 49
Cluster II        2                    30                      12, 24, 49

With typical tree abundances in forests ranging from hundreds to thousands of trees per hectare, these levels represent dead tree relative abundances of about 1 to 25% of all standing trees. Random patterns were generated by a Poisson process, and clustered patterns were generated by a Matern cluster point process, in which the number of clusters, cluster size, and mean density are fixed for each pattern (Matern 1986). The Spatstat package in R (Baddeley and Turner 2005) was used to create the simulated populations of standing dead trees. Figure 4.1 shows an example of standing dead trees generated for the simulation; cluster pattern II is more aggregated than cluster pattern I when the density is 12.36/ha.

Figure 4.1. The distribution of simulated standing dead trees by spatial pattern when the density is 12/ha. The area depicted for each pattern is about 18 ha.

4.2. Data

In the fixed-radius plot sampling method, sampling locations are point locations, commonly selected randomly or systematically from a continuous area frame; a sampling location is the center of a circular plot or a corner point of a rectangular plot. In this simulation study, two fixed-radius circular plots were used: a 7.32 m radius plot, the same as the 'subplot' in the FIA design, and a 17.95 m radius plot, the same as the 'annular plot' in the FIA design. To reduce edge effects, a 50 m buffer was applied, so plot locations (plot centers) were randomly selected within a core area of 289 ha. Ten sampling intensities, from 1% to 10% in 1% steps, were applied for the two plot sizes, where intensity is the percentage of the total forest area covered by the sum of the plot areas (Table 4.2).
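The sample sizes in Table 4.2 follow from dividing the sampled area by the area of one plot. The sketch below assumes (our assumption, not stated in the text) the exact FIA radii of 24.0 ft (7.3152 m) and 58.9 ft (17.953 m), which round to the 7.32 m and 17.95 m quoted here, and takes the intensity relative to the full 324 ha forest. Under these assumptions, rounding to the nearest whole plot reproduces every entry of Table 4.2; with a radius of exactly 7.32 m, the 1% entry would instead round to 192.

```python
import math

FT_TO_M = 0.3048
TOTAL_AREA_M2 = 1_800 * 1_800  # 324 ha simulated forest, in m^2

# Assumed exact FIA radii (ft converted to m): subplot and annular plot.
RADII_M = {"7.32": 24.0 * FT_TO_M, "17.95": 58.9 * FT_TO_M}

def n_plots(intensity_pct, radius_m):
    """Plots needed so that their combined area is intensity_pct % of the forest."""
    plot_area_m2 = math.pi * radius_m ** 2
    return round(intensity_pct * TOTAL_AREA_M2 / 100 / plot_area_m2)

for label, radius in RADII_M.items():
    print(label, [n_plots(pct, radius) for pct in range(1, 11)])
# 7.32 [193, 385, 578, 771, 964, 1156, 1349, 1542, 1735, 1927]
# 17.95 [32, 64, 96, 128, 160, 192, 224, 256, 288, 320]
```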
The number of standing dead trees in each plot was counted, and the distance from the plot center to the nearest standing dead tree was measured. Three thousand data sets were generated for each sampling intensity, plot size, and spatial pattern using a custom algorithm written in the R statistical computing language (R Development Core Team 2011), and these data sets were used to estimate the population parameters under each sampling method.

Table 4.2. Number of samples by sampling intensity (%) and plot size (radius, m).

Radius (m)   1%    2%    3%    4%    5%    6%      7%      8%      9%      10%
7.32         193   385   578   771   964   1,156   1,349   1,542   1,735   1,927
17.95        32    64    96    128   160   192     224     256     288     320

4.3. Methods

4.3.1. Estimating the Expected Zero Probability

The inclusion probability of a standing dead tree was modeled from the distance between random points and the nearest standing dead tree. Models fitted to the simulation data were used to calculate the inclusion probability of at least one standing dead tree ($P_{Inc}(d)$) and the corresponding expected zero probability ($1 - P_{Inc}(d)$) for each search radius. We considered an array of potential sigmoid-type models (Table 4.3), which match the pattern observed in the data: the inclusion probability changes little over short distances, then increases rapidly as the search radius grows, and finally saturates as it approaches one. Two models were selected, and the better of the two was determined by root mean squared error (RMSE) and Akaike's Information Criterion (AIC).

Table 4.3. Models to estimate the inclusion probability of a standing dead tree.
Model I (reduced logistic function):

$$P_{Inc}(id) = \begin{cases} 0, & d = 0 \\ \dfrac{1}{1 + b \cdot \exp(-c \cdot d)}, & d > 0 \end{cases}$$

Model II (reduced Gompertz function):

$$P_{Inc}(id) = \begin{cases} 0, & d = 0 \\ \exp(-b \cdot c^{d}), & d > 0 \end{cases}$$

$P_{Inc}(id)$ is the inclusion probability of a standing dead tree in forest $i$ and $d$ is the search radius (m).

Figure 4.2 shows the change in the inclusion probability of a standing dead tree estimated by the reduced Gompertz function. Both models are sigmoid curves, and the change in inclusion probability is closely tied to the inflection point. For example, the inflection point moves to the left (closer to 0) when the density of standing dead trees is high and their spatial pattern is random, because inclusion probabilities are then high. To find the best model, 964 points were randomly selected for each spatial pattern and density. The maximum number of random points was set at 964 because a previous study recommended a 5% sampling intensity for estimating the abundance of standing dead trees with fixed-area sampling (Bull et al. 1990), and 964 plots correspond to a 5% intensity with 7.32 m plots (Table 4.2). Hence, 964 random points were assumed to be enough to test the performance of the models in estimating the inclusion probability of a standing dead tree. Three hundred iterations were run for each spatial pattern and density of standing dead trees, RMSE and AIC were calculated for each iteration, and the model with the better RMSE and AIC was selected.

Figure 4.2. The change of inclusion probability of a standing dead tree estimated by the reduced Gompertz function under different conditions.

It was expected that the precision of the estimated inclusion probability would increase with the number of random points.
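Estimating $P_{Inc}$ from nearest-tree distances can be sketched as fitting Model II to the empirical cumulative distribution of the distances, since the share of random points whose nearest standing dead tree lies within $d$ is the share of radius-$d$ plots that would contain at least one. The fitting routine used in the dissertation is not stated here; as an assumption, a coarse least-squares grid search stands in for a nonlinear fitting routine, and the helper names are hypothetical. To have a known answer, the distances are simulated for a random (Poisson) pattern at 12 trees/ha, for which the true inclusion probability is $1 - e^{-\lambda \pi d^2}$.

```python
import math
import random

def gompertz(d, b, c):
    """Model II (reduced Gompertz): P_Inc = exp(-b * c**d) for d > 0, else 0."""
    return math.exp(-b * c ** d) if d > 0 else 0.0

def fit_gompertz(distances, eval_d=range(1, 41)):
    """Least-squares fit of Model II to the empirical CDF of nearest-tree distances.

    Assumption: a coarse grid over b (0.5 to 50, log scale) and c (0.70 to 0.99)
    stands in for a proper nonlinear least-squares routine.
    """
    n = len(distances)
    ecdf = [(d, sum(x <= d for x in distances) / n) for d in eval_d]
    best = (math.inf, None, None)
    for i in range(100):
        b = 0.5 * 100 ** (i / 99)
        for j in range(100):
            c = 0.70 + 0.29 * j / 99
            sse = sum((p - gompertz(d, b, c)) ** 2 for d, p in ecdf)
            if sse < best[0]:
                best = (sse, b, c)
    return best[1], best[2]

# Simulated distances from 500 random points to the nearest tree in a random
# (Poisson) pattern with 12 trees/ha, using P(D <= d) = 1 - exp(-lam * pi * d^2).
rng = random.Random(1)
lam = 12 / 10_000                                   # trees per m^2
distances = [math.sqrt(-math.log(1.0 - rng.random()) / (lam * math.pi))
             for _ in range(500)]

b_hat, c_hat = fit_gompertz(distances)
for radius in (7.32, 17.95):
    p_inc = gompertz(radius, b_hat, c_hat)
    p_true = 1.0 - math.exp(-lam * math.pi * radius ** 2)
    print(f"d = {radius} m: P_Inc = {p_inc:.3f} (true {p_true:.3f}), P_e = {1 - p_inc:.3f}")
```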
To find a reliable number of random points for estimating the inclusion probability, the variation of the estimated inclusion probability was examined for each spatial pattern and density using numbers of random points ranging from 30 to 1,050. The inclusion probabilities for search radii of 7.32 and 17.95 m were then estimated with the selected model from the 3,000 data sets for each spatial pattern and density.

4.3.2. Comparison of EZ-Hurdle to Existing Methods and Models

Using the 3,000 simulated data sets, the abundance of standing dead trees per ha was estimated for each spatial pattern and density of standing dead trees with each method: SRS, Poisson, Poisson-Hurdle, NB-Hurdle, Poisson EZ-Hurdle, and NB EZ-Hurdle. Because the true abundance of standing dead trees per ha was known in the simulation, the RMSE and the average error (Error) were used to compare the methods. For each simulation environment, they were computed as:

$$Error_{jkmp} = \frac{1}{n} \sum_{i=1}^{n} \left(E_{ijkmp} - T_{jkmp}\right), \qquad RMSE_{jkmp} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(E_{ijkmp} - T_{jkmp}\right)^2}$$

where $E$ is the density of standing dead trees per ha estimated by a method, $T$ is the true density of standing dead trees, $i$ indexes the iterations ($n = 3{,}000$), $j$ is the density of standing dead trees, $k$ is the spatial pattern, $m$ is the sampling intensity, and $p$ is the plot size.

4.3.3. Sensitivity Analysis

Because the EZ-Hurdle method is a model-based estimator, its estimates are sensitive to the modeled inclusion probability. Therefore, a sensitivity analysis was conducted to examine the sensitivity of the EZ-Hurdle method to errors in specifying the correct inclusion probability, i.e., when the expected zero probability differs from the "true" zero probability under a fixed sample size and plot size.
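The two summary statistics of Section 4.3.2 are straightforward to compute from the replicate estimates of one simulation environment. The sketch below uses five hypothetical estimates in place of the 3,000 per environment; it also checks the decomposition $RMSE^2 = Error^2 + \text{variance}$ (with divisor $n$), which separates bias from imprecision.

```python
import math

def error_and_rmse(estimates, true_value):
    """Average error (bias) and RMSE of replicate estimates against the known truth."""
    n = len(estimates)
    error = sum(e - true_value for e in estimates) / n
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    return error, rmse

# Hypothetical per-ha estimates for one environment with true density 12/ha.
estimates = [10.0, 14.0, 12.0, 11.0, 13.0]
error, rmse = error_and_rmse(estimates, 12.0)
print(error, rmse)  # 0.0 1.4142135623730951

# RMSE^2 = Error^2 + variance of the estimates (divisor n).
mean_est = sum(estimates) / len(estimates)
variance = sum((e - mean_est) ** 2 for e in estimates) / len(estimates)
assert math.isclose(rmse ** 2, error ** 2 + variance)
```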
The sensitivity was quantified as the percent change in the estimation error of the standing dead tree density estimated by the EZ-Hurdle method as a function of changes in the inclusion probability of a standing dead tree, holding all other factors (spatial pattern, density, and plot size) constant. The percent error (%) in standing dead tree density was calculated as:

$$\text{percent error} = \frac{T_D - E_D}{T_D} \times 100$$

where $T_D$ is the true density and $E_D$ is the density estimated by the EZ-Hurdle method.

4.4. Results

4.4.1. Inclusion Probability

The reduced Gompertz function (Model II) was selected to model the inclusion probability because it tended to have the smaller RMSE and AIC across spatial patterns and densities (Table 4.4). Both models show that RMSE increases as dead trees become more clustered relative to a random pattern (Table 4.4), since a clustered pattern yields a lower inclusion probability of a standing dead tree and correspondingly greater zero inflation at any given density.

Table 4.4. Mean RMSE and AIC by model to estimate the inclusion probability of a standing dead tree.

                             Model I            Model II
Spatial pattern  Density/ha  RMSE    AIC        RMSE    AIC
Random           12.4        0.159   -2932      0.061   -3479
Random           24.7        0.042   -678       0.042   -2454
Random           42.0        0.092   -1486      0.031   -1805
Random           49.4        0.082   -1544      0.032   -1822
Clustered I      12.4        0.651   -3513      0.093   -5119
Clustered I      24.7        0.980   -2719      0.276   -3642
Clustered I      42.0        1.353   -3130      0.574   -3875
Clustered I      49.4        1.423   -2488      0.641   -3074
Clustered II     12.4        0.842   -3321      0.412   -4835
Clustered II     24.7        0.652   -2343      0.326   -3538
Clustered II     42.0        0.843   -2874      0.756   -3427
Clustered II     49.4        1.321   -2784      0.847   -3173

The variation of the estimated inclusion probabilities decreased as the number of random points increased, for all spatial patterns and densities and for both search radii (Figs. 4.3 and 4.4). As the number of random points increases, the mean inclusion probability approaches the true value.
For example, the inclusion probability approaches 0.176 as the number of random points increases when the spatial pattern is random, density is 12/ha, and the search radius is 7.32 m (Fig. 4.3). The results clearly show that the 95% CIs narrow as the number of random points increases. They also show that the intervals overlap and do not narrow appreciably beyond 500 random points. Therefore, 500 random points were used to estimate the inclusion probability of a standing dead tree in the simulation. In absolute terms, however, the 95% CIs were less than 1% across all sampling intensities, spatial patterns, and densities (Figs. 4.3 and 4.4), so far fewer than 500 points can yield good estimates of the inclusion probabilities. The inclusion probabilities differed by the spatial pattern and density of standing dead trees for both search radii (Figs. 4.3 and 4.4). Inclusion probabilities were higher under a random distribution of trees than under either of the two clustered patterns for all densities and search radii. In addition, clustered pattern I showed higher inclusion probabilities than clustered pattern II, because clustered pattern II is more aggregated than clustered pattern I. These results indicate that inclusion probabilities are highest when standing dead trees are dense and randomly distributed, and lowest when a few trees occur in tight clusters. However, the increase in inclusion probability with increasing dead tree density was much smaller when trees were clustered than when they were randomly distributed (compare the top and bottom rows of sub-figures in Figs. 4.3 and 4.4). This is because under clustering the inclusion probability depended largely on whether a sample point landed in or near a cluster, rather than on how many dead trees were in the cluster.
Figure 4.3. The mean and 95% CI of inclusion probabilities by spatial pattern and density of standing dead trees when the search radius is 7.32 m.
Figure 4.4. The mean and 95% CI of inclusion probabilities by spatial pattern and density of standing dead trees when the search radius is 17.95 m.
One way to understand how the EZ-Hurdle method works is to examine how the observed zero proportion varies over many samples. This variation can then be compared with the variation in the expected zero probability predicted from the model. Figure 4.5 shows the variation of the observed and expected zero proportions (i.e., the expected zero probability) by spatial pattern in 3,000 simulated data sets.
Figure 4.5. Observed and expected zero proportions in 3,000 simulated data sets by spatial pattern, when the plot radius is 7.32 m and density is 24/ha.
In this example, each data set contains 578 random plots with a plot radius of 7.32 m, corresponding to a sampling intensity of 3%. The mean observed zero proportion increases as the density of standing dead trees decreases. When the spatial pattern is random, the mean observed zero proportion is approximately 0.67. In other words, on average approximately 67% of the plots in each data set contained no standing dead trees. For the same spatial pattern, the mean expected zero probability modeled from 500 random points is approximately 0.66, only slightly different from the observed value. However, the expected zero probability from the model varies an order of magnitude less than the observed zero proportion, and this reduction in variation is especially large when standing dead trees are more clustered. Therefore, the EZ-Hurdle method can produce better estimates than SRS or other methods that use the observed zero proportion, because the expected zero probability varies much less between samples than the observed zero proportion does.
4.4.2. Comparison of Estimated Density of SDT/ha by Methods
The EZ-Hurdle method estimated standing dead tree density more precisely than SRS and the various forms of the Hurdle model in all scenarios examined.
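The reason the standard Hurdle model tracks SRS, while the EZ-Hurdle method does not, can be seen from the hurdle mean. The following is a minimal sketch that assumes a Poisson count part. In that case the mean count per plot is (1 − π0) × E[count | count > 0], and for a zero-truncated Poisson fitted by maximum likelihood the fitted conditional mean equals the sample mean of the non-zero counts. When π0 is set to the observed zero proportion, the product collapses to the ordinary sample mean, which is the SRS estimate. Replacing π0 with an expected zero probability, as the EZ-Hurdle method does, changes the estimate. This is a simplified illustration, not the dissertation's EZ-Hurdle implementation. The function names, counts, and the value 0.66 are hypothetical, and the negative binomial variant is not covered.

```python
import math


def plot_area_ha(radius_m):
    """Area of a circular plot in hectares."""
    return math.pi * radius_m ** 2 / 10_000


def srs_density(counts, radius_m):
    """Fixed-area (SRS) estimate: mean count per plot divided by plot area (trees/ha)."""
    return sum(counts) / len(counts) / plot_area_ha(radius_m)


def hurdle_density(counts, radius_m, zero_prob=None):
    """Simplified hurdle estimate of trees/ha with a zero-truncated Poisson count part.

    Mean count per plot = (1 - zero_prob) * E[count | count > 0].
    For a zero-truncated Poisson fitted by maximum likelihood, the fitted
    E[count | count > 0] equals the sample mean of the non-zero counts,
    so that sample mean is used instead of solving for lambda numerically.

    zero_prob=None  -> observed zero proportion (standard hurdle model)
    zero_prob=value -> expected zero probability from auxiliary data (EZ-style)
    """
    positives = [c for c in counts if c > 0]
    if not positives:
        raise ValueError("the count part needs at least one plot with a standing dead tree")
    if zero_prob is None:
        zero_prob = 1 - len(positives) / len(counts)
    mean_positive = sum(positives) / len(positives)
    return (1 - zero_prob) * mean_positive / plot_area_ha(radius_m)


# Hypothetical SDT counts on ten 7.32 m subplots.
counts = [0, 0, 0, 1, 0, 2, 0, 0, 1, 0]
print(srs_density(counts, 7.32))                      # SRS estimate
print(hurdle_density(counts, 7.32))                   # equals SRS up to rounding
print(hurdle_density(counts, 7.32, zero_prob=0.66))   # hypothetical expected zero probability
```

Algebraically, (1 − π̂0) × ȳ+ with π̂0 equal to the observed share of empty plots is just the overall mean count. This means a hurdle model fed the observed zero proportion cannot differ from SRS in its point estimate, whichever sample is drawn.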
Figures 4.6 and 4.7 show the distributions of estimated standing dead tree densities per ha obtained with different estimators. They are based on 3,000 samples generated at different sampling intensities, for a lower (Fig. 4.6) and a higher (Fig. 4.7) dead tree density, respectively. When the EZ-Hurdle method is employed, the estimates clearly have a narrower distribution, with a greater proportion concentrated around the mean (Figs. 4.6 and 4.7). The EZ-Hurdle method with a Poisson distribution (EZP) and the EZ-Hurdle method with a negative binomial distribution (EZ-NB) yielded nearly identical distributions of estimates, so only results for EZP are shown. Likewise, SRS, the Hurdle model with a Poisson distribution (HDP), and the Hurdle model with a negative binomial distribution (NB-Hurdle) yielded nearly identical distributions of estimates. The lack of difference between the Hurdle-based approaches and SRS is not surprising, since they all use the observed zero proportion from the plot data. The EZ-Hurdle method, by contrast, uses nearest standing dead tree distances to reduce the variation associated with zero observations in the data.
Figure 4.6. The distribution of estimated density of standing dead trees by sampling intensity (1%, 3%, 5%, 7%) and method when the plot radius is 7.32 m and the spatial pattern is random. True density is 12 trees per ha. SRS is simple random sampling with fixed-area circular plots, HDP is the Hurdle model with a Poisson distribution, and EZP is the EZ-Hurdle model with a Poisson distribution.
Figure 4.7. The distribution of estimated density of standing dead trees by sampling intensity (1%, 3%, 5%, 7%) and method when the plot radius is 7.32 m and the spatial pattern is Cluster I. True density is 49 trees per ha. SRS is simple random sampling with fixed-area circular plots, HDP is the Hurdle model with a Poisson distribution, and EZP is the EZ-Hurdle method with a Poisson distribution.
As expected, the precision of all methods increased with increasing sampling intensity.
However, at the same sampling intensity, the distributions of estimates from the EZ-Hurdle method and the other methods differed more when the plot size was small and density was low (Figs. 4.6 and 4.7). Conversely, the distributions of estimates from all methods were much more similar when a larger plot size and a higher sampling intensity were employed. Even so, the EZ-Hurdle model did not converge on the same distribution of estimates as the other methods in any of the scenarios examined. Theoretically, SRS and the EZ-Hurdle method should converge only when the expected and observed zero proportions are the same. The RMSE summarizes the overall estimation error of each method relative to the true density. The RMSE of EZP was lower than that of SRS in almost all scenarios examined. The exception was the highest density and largest plot size at sampling intensities greater than 5% (Fig. 4.8; the other Hurdle methods are omitted from the figure because they give nearly identical results to SRS). Under the same conditions (plot size, density, and spatial pattern), the difference in RMSE among the methods decreases as sampling intensity increases. EZ-Hurdle always outperformed SRS when sampling intensity was less than 5%, and in many cases it still outperformed SRS up to the maximum sampling intensity examined (10%). All methods worked best when dead trees were distributed randomly in space. The advantage of the EZ-Hurdle method was greater when standing dead trees were clustered in space, for reasons explained previously (see Fig. 4.5). When zero inflation was highest (low density and small plots; upper left sub-figure, Fig. 4.8), EZ-Hurdle was superior regardless of spatial pattern.
Figure 4.8. RMSE by spatial pattern, sampling intensity, and plot size. SRS is simple random sampling, EZP is the Expected-Zero Hurdle model with a Poisson distribution, Ran. is the random pattern, C-I is clustered pattern I, and C-II is clustered pattern II.
The EZ-Hurdle method had less variation in estimation error for all spatial patterns, densities, and plot sizes (Figs. 4.9, 4.10, 4.11, and 4.12). For both SRS and the EZ-Hurdle method, variation in estimation error decreased with increasing sampling intensity and increased when standing dead trees were clustered. At the same density of standing dead trees, the EZ-Hurdle method showed a greater improvement in estimation error over SRS when the smaller plot size (7.32 m radius) was applied. The difference between the methods was also larger when standing dead tree density was lower (compare Figs. 4.9 and 4.10 with Figs. 4.11 and 4.12). These results were expected, because the EZ-Hurdle method should perform better when there are more zero observations in the data. Under either method, variation in estimation error was higher when the density of standing dead trees was 49 per ha than when it was 12 per ha (Figs. 4.9, 4.10, 4.11, and 4.12). This was expected, because larger variation is typically associated with a larger population mean (Avery and Burkhart 1983). SRS is a design-unbiased estimator, whereas the EZ-Hurdle method is a model-based estimator, which is expected to be biased (Gregoire 1998). The errors for both methods were centered near zero (Figs. 4.9, 4.10, 4.11, and 4.12), indicating that the bias of the EZ-Hurdle method is apparently low.
Figure 4.9. The errors by sampling intensity when density is 12/ha and plot radius is 7.32 m.
Figure 4.10. The errors by sampling intensity when density is 12/ha and plot radius is 17.95 m.
Figure 4.11. The errors by sampling intensity when density is 49/ha and plot radius is 7.32 m.
Figure 4.12. The errors by sampling intensity when density is 49/ha and plot radius is 17.95 m.
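The error measures used in this chapter (percent error, bias, and RMSE) can be computed from repeated estimates of a known true density. The sketch below uses made-up estimates and function names of our own. It also checks numerically the standard identity RMSE² = bias² + variance (population form), which is why RMSE reflects both accuracy and precision. Note the sign convention: the percent error defined earlier is positive for underestimation, while bias as written here is positive for overestimation.

```python
import math
import statistics


def percent_error(true_density, estimated_density):
    """Percent error (TD - ED) / TD * 100; positive values mean underestimation."""
    return (true_density - estimated_density) / true_density * 100


def bias(estimates, true_density):
    """Mean estimate minus the true density (opposite sign to percent_error)."""
    return statistics.fmean(estimates) - true_density


def rmse(estimates, true_density):
    """Root mean squared error of repeated estimates around the true density."""
    return math.sqrt(statistics.fmean((e - true_density) ** 2 for e in estimates))


# Hypothetical estimates of SDT/ha from six repeated samples; true density 12/ha.
estimates = [10.0, 13.5, 11.2, 12.8, 9.9, 14.1]
true_density = 12.0

errors = [percent_error(true_density, e) for e in estimates]
b = bias(estimates, true_density)
r = rmse(estimates, true_density)
spread = statistics.pstdev(estimates)

# RMSE^2 = bias^2 + variance (population form): RMSE mixes accuracy and precision.
print([round(x, 1) for x in errors])
print(f"bias = {b:.3f}, RMSE = {r:.3f}, sqrt(bias^2 + var) = {math.hypot(b, spread):.3f}")
```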
A relatively large number (n = 500) of random sample points was used to estimate the inclusion probability of a standing dead tree for a given search radius. It is therefore important to consider how sensitive the EZ-Hurdle method is to errors in specifying the expected zero probability, e.g., if fewer random-point-to-dead-tree distances were used to calibrate the EZ-Hurdle model. The results of the sensitivity analysis (Fig. 4.13) show that a 1% change in the estimated inclusion probability corresponds to less than a 0.1% change in the error of the estimated standing dead tree density. This indicates that the method is robust to relatively small errors in estimating the expected zero probabilities.
Figure 4.13. The changes of errors (%) by the changes of inclusion probability when sampling intensity is 5% and plot radius is 7.32 m.
This is supported by the results in Figures 4.3 and 4.4, which show that even with as few as 30 random points, the expected difference between the estimated and true inclusion probability is only a few percentage points. The results (Fig. 4.13) also show that the percent error is more sensitive to changes in the inclusion probability when the density of standing dead trees is lower and when the standing dead trees are more clustered in space.
4.5. Discussion of Simulation Study
The EZ-Hurdle method employs the EZ-Hurdle model to account for the variability associated with estimating the true proportion of zeros in count data, in this case counts of standing dead trees in fixed-radius plots. The results clearly show that additional information about the true proportion of zeros reduces estimation error, both by increasing precision and by reducing average bias, i.e., by improving accuracy. The exception is when the data contain few zero observations, either because dead tree density is high or because plot sizes are large and trees are distributed randomly in space.
On the other hand, the standard Hurdle model gives results similar to SRS for all combinations examined, because it uses the observed zero proportion from the sample data when estimating the population parameter. This result was expected, because previous studies reported that the Hurdle model performs well for fitting the distribution of observed data. These include studies estimating the abundance of snags (Eskelson et al. 2009) and species abundance (Potts and Elith 2006), and studies modeling stand mortality functions (Affleck 2006). In other words, Hurdle models parameterized with the observed proportion of zeros in the data should have an expected value similar to that estimated by SRS. The EZ-Hurdle method gave superior estimates in almost all cases examined when the sampling intensity was less than 5%. This is consistent with the recommendation by Bull et al. (1990) that a 5% sampling intensity is necessary for reliable estimates of standing dead trees using standard (e.g., simple random) sampling methods. Because the EZ-Hurdle method builds on an underlying sampling design, it can allow greater flexibility in choosing that design, for example in deciding the plot size used to estimate the abundance of standing dead trees. The EZ-Hurdle method can improve precision when plots smaller than those typically recommended for standing dead trees (e.g., 0.1 to 1 ha plots; Ganey 1999; Stephens 2004) are used. In effect, the EZ-Hurdle method helps to mitigate the increased variation and possible bias caused by too many plots with no standing dead trees, a problem that grows as plot size decreases (Kenning et al. 2005). Hence, the EZ-Hurdle method can allow reductions in plot size or plot numbers when auxiliary information, such as distances to the nearest standing dead trees, is available from previous studies or additional data collection.
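Where nearest-standing-dead-tree distances are available, the empirical inclusion probability for a radius r is the share of random points whose nearest SDT lies within r. The expected zero probability for a plot of that radius is its complement. The sketch below rests on several assumptions: a hypothetical 16 ha tract with 12 SDT/ha placed completely at random, 500 random points as in the simulation, and no edge correction. The function names are our own. The study additionally smooths these proportions with the reduced Gompertz function, which is not done here.

```python
import math
import random


def nearest_distance(point, trees):
    """Euclidean distance (m) from a sample point to the nearest standing dead tree."""
    return min(math.dist(point, tree) for tree in trees)


def inclusion_probability(distances, radius):
    """Share of random points whose nearest SDT lies within `radius` metres."""
    return sum(d <= radius for d in distances) / len(distances)


# Hypothetical tract: 12 SDT/ha placed completely at random on a 400 m x 400 m square (16 ha).
rng = random.Random(42)
side = 400.0
n_trees = round(12 * side * side / 10_000)          # 192 trees
trees = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n_trees)]

# 500 random points, as in the simulation; no edge correction is applied here.
points = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(500)]
distances = [nearest_distance(p, trees) for p in points]

p_inc = inclusion_probability(distances, radius=7.32)   # FIA subplot radius
p_zero = 1.0 - p_inc                                     # expected zero probability for that plot size
print(f"inclusion probability: {p_inc:.3f}  expected zero probability: {p_zero:.3f}")
```

For comparison, under complete spatial randomness the theoretical value is 1 − exp(−λπr²). This is about 0.18 for λ = 12 trees/ha and r = 7.32 m, so the printed inclusion probability should fall near it, apart from sampling noise and a small downward edge effect.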
The improved results shown here for estimating standing dead tree abundance are likely general. We believe that the EZ-Hurdle method should work well whenever zero observations are a major source of uncertainty in the data (i.e., zero-inflated data). For example, it may allow increased accuracy in estimating the abundance of rare species by enhancing the Hurdle model-based methods of Potts and Elith (2006). However, it is important to emphasize that the EZ-Hurdle method requires the expected zero probability to be estimable from some auxiliary data source. In this case, that source was as simple as measuring the distance to the nearest standing dead tree. In other situations, estimating the expected zero probability may not be so simple. According to Kenning et al. (2005), the time required to measure the distance to the nearest dead tree increased as dead tree density decreased, and low densities are precisely where the EZ-Hurdle model is most useful. Therefore, further work beyond this simulation is needed to examine the cost efficiency of the EZ-Hurdle method when the auxiliary information used to estimate the expected zero probability comes from locally collected distances to the nearest dead tree (this is addressed in later chapters of this dissertation). It might also be possible to estimate the inclusion probability of a standing dead tree from stand-level attributes or from models calibrated elsewhere, e.g., in similar stands (see, e.g., Eskelson et al. 2009). The data showed clear trends: the inclusion probability of a standing dead tree under a given plot design differed with both the density of standing dead trees and their spatial pattern. In fact, many studies suggest that the abundance and spatial pattern of standing dead trees vary by geographical location, stand age, forest type, and management regime (Cline et al. 1980; Guby and Dobbertin 1996; Green and Peterken 1997; Fridman and Walheim 2000).
Thus, it might be more cost-effective to predict the expected zero probability for a given sampling design and set of forest conditions from simple auxiliary variables, rather than collecting additional data such as nearest dead tree distances on site. The biggest concern would likely be that population parameters estimated by the EZ-Hurdle method from such auxiliary data would be more likely to be biased than those estimated from the observed data, and thus not representative of local conditions (Dorazio 1999). Further studies along these lines are needed. Nonetheless, studies of the relationship between expected zero probabilities for dead trees and forest attributes have intrinsic value. They might serve to establish expected baseline mortality rates against which observed rates can be compared, providing benchmarks for assessing and monitoring the health of forest ecosystems.
5. An Application of the EZ-Hurdle Method for the Estimation of Standing Dead Tree Abundance in a Real Forest Setting
5.1. Introduction
Beech bark disease is a disease of American beech (Fagus grandifolia Ehrhart). It is caused by an interaction between the exotic sap-feeding beech scale (Cryptococcus fagi Baer) and at least three species of Nectria fungi (Nectria galligena, N. coccinea var. faginata, and N. ochroleuca) (McCullough et al. 2001). Beech bark disease has several effects on beech trees: it reduces leaf size, discolors foliage, causes stem and crown dieback, reduces tree growth and masting, and eventually causes tree mortality (McCullough et al. 2001). In 2000, beech bark disease was reported in Michigan. More than 7.5 million beech trees could die following the introduction of beech bark disease (Petrillo et al. 2004). Therefore, the beech bark disease monitoring system (BBDMS) was developed to monitor the disease, and from 2001 to 2003, 202 monitoring plots were established in Michigan.
This monitoring system has been used to track temporal and spatial changes in the disease and to determine its impacts on beech trees and the northern hardwood forest ecosystem. In this study, a new estimator, the EZ-Hurdle method, was used to estimate the density of standing dead trees in the BBDMS. It was compared with a conventional simple random sampling method using fixed-area plots for estimating the abundance of standing dead trees. In addition, the inclusion probabilities of a standing dead tree on fixed-radius plots were examined by disease occurrence.
5.2. Methods
5.2.1. Data Collection
Based on previous BBDMS data, 20 forests with similar stand characteristics, such as basal area and location, were selected in the Lower Peninsula of Michigan (Fig. 5.1): 10 infected forests and 10 non-infected forests.
Figure 5.1. The selected locations among the beech bark disease monitoring system (BBDMS) plots and Pigeon River Country Forest (PRC).
The BBDMS used one of two transect matrix layouts, 5×6 or 10×3 (Fig. 5.2), employing systematic sampling with fixed-area plots (abbreviated as fixed-area sampling, FAS). Where possible, the matrix was positioned parallel to the nearest road. In this study, new sample plots were superimposed over the old BBDMS grid, which consisted of 30 plot centers spaced 40 m apart. The first plot center was always at least 40 m from a road or another forest type to remove edge effects. The forest areas were chosen to be at least 6 ha to ensure that all 30 plot centers were located within the same forest type. Two circular plots were established at each plot center.
FIA-sized subplots (7.32 m radius = 0.017 ha) and annular plots (17.95 m radius = 0.100 ha) were employed to make the study relevant to the FIA program (USDA Forest Service 2005) and the Forest Health Monitoring (FHM) program. Both programs mostly use subplots to collect data on standing dead trees. The exception is the Pacific Northwest region (e.g., Oregon and California), where the annular plots are sometimes used to collect additional data on standing dead trees with a DBH greater than 61 cm (24 in).
Figure 5.2. Plot design with 30 sampling points in the transect layout.
All data collection was conducted to match the basic protocols of the FIA program. All live and standing dead trees with a DBH ≥ 12.7 cm (5 in) were tallied on each subplot. Because the definition of dead material in forests differs among researchers (Voller and Harrison 1998), the FIA definition of a standing dead tree was applied in the field studies. Under this definition, a dead tree with a DBH ≥ 12.7 cm had to be taller than 1.37 m and lean less than 45 degrees from vertical to count as a standing dead tree; otherwise it was classified as coarse woody debris (CWD) (Fig. 5.3, from USDA Forest Service 2005).
Figure 5.3. A standing dead tree is defined as a dead tree greater than 12.7 cm DBH, which is taller than 1.37 m and leans less than 45 degrees from vertical (USDA Forest Service 2005).
For both live and standing dead trees, the species, DBH, and distance and azimuth from the plot center were recorded. For standing dead trees, decay class was also recorded according to the five-class system of the FIA program (Table 5.1, from USDA Forest Service 2005). Distance was measured with an electronic distance measuring device (Vertex III, Haglöf Inc.). When the distance was too far for the electronic device to measure, a measuring tape was used instead.
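The tally rules above amount to a small decision procedure for each dead stem. The sketch below encodes only the thresholds stated in the text (DBH ≥ 12.7 cm for tallying, height > 1.37 m, lean < 45 degrees). The function name and example stems are hypothetical. The full FIA field guide contains further rules that are not represented here.

```python
def classify_dead_stem(dbh_cm, height_m, lean_deg):
    """Classify one dead stem with the thresholds stated in the text (simplified).

    - DBH below 12.7 cm: not tallied as a tree in this study.
    - Taller than 1.37 m and leaning less than 45 degrees: standing dead tree (SDT).
    - Otherwise: coarse woody debris (CWD).
    """
    if dbh_cm < 12.7:
        return "not tallied"
    if height_m > 1.37 and lean_deg < 45.0:
        return "SDT"
    return "CWD"


# Hypothetical stems: (DBH in cm, height in m, lean in degrees from vertical)
stems = [(30.0, 12.0, 5.0), (30.0, 12.0, 60.0), (10.0, 8.0, 0.0), (25.0, 1.0, 0.0)]
for dbh, height, lean in stems:
    print(dbh, height, lean, classify_dead_stem(dbh, height, lean))
```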
Unlike the FIA protocol, the distance to the nearest standing dead tree was also measured when the nearest standing dead tree was outside the plot boundaries. The time required to completely measure the subplot and the annular plot was recorded.
Table 5.1. Guide to defining the decay class by condition (USDA Forest Service 2005).
Decay class 1. Limbs and branches: all present. Top: pointed. Remaining bark: 100%. Sapwood: intact; sound, incipient decay, hard, original color. Heartwood: sound, hard, original color.
Decay class 2. Limbs and branches: few limbs, no fine branches. Top: may be broken. Remaining bark: variable. Sapwood: sloughing; advanced decay, fibrous, firm to soft, light brown. Heartwood: sound at base, incipient decay in outer edge of upper bole, hard, light to reddish brown.
Decay class 3. Limbs and branches: limb stubs only. Top: broken. Remaining bark: variable. Sapwood: sloughing; fibrous, soft, light to reddish brown. Heartwood: incipient decay at base, advanced decay throughout upper bole, fibrous, hard to firm, reddish brown.
Decay class 4. Limbs and branches: few or no stubs. Top: broken. Remaining bark: variable. Sapwood: sloughing; cubical, soft, reddish to dark brown. Heartwood: advanced decay at base, sloughing from upper bole, fibrous to cubical, soft, dark reddish brown.
Decay class 5. Limbs and branches: none. Top: broken. Remaining bark: less than 20%. Sapwood: gone. Heartwood: sloughing, cubical, soft, dark brown, OR fibrous, very soft, dark reddish brown, encased in hardened shell.
5.2.2. Statistical Analysis
The density of standing dead trees per ha was estimated using the estimator for the fixed-area sampling method (FAS) and using the EZ-Hurdle method. In the simulation study, parameter estimates did not differ significantly between the EZ-Hurdle method with a Poisson distribution and with a negative binomial distribution. Thus, the EZ-Hurdle method with a Poisson distribution (EZP) was used to estimate the density of standing dead trees in the field study. The spatial pattern of standing dead trees in the field study was unknown, so several test statistics were used to test for spatial pattern.
To evaluate whether standing dead trees were aggregated in infected and non-infected forests, a variance-to-mean ratio was calculated from the fixed-area plot data as follows:
$$v = \frac{s^2}{\bar{x}}$$
where $s^2$ is the variance and $\bar{x}$ is the mean. The variance-to-mean ratio, which has an expected value of one under a Poisson distribution, has been used to measure spatial pattern. However, it can be a poor indicator of spatial pattern when sampling intensity is too low or plot sizes are not large enough (Young and Young 1998). An alternative metric, the index of dispersion (ID), which directly takes the sample size into account, was calculated as follows:
$$ID = \frac{(n - 1)\, s^2}{\bar{x}}$$
where n is the number of plots. If the variance equals the mean, ID has an approximately $\chi^2$ distribution with $(n - 1)$ degrees of freedom (Fisher 1922).
To compare FAS and EZP on the field data, a simple bootstrap resampling procedure (Efron and Tibshirani 1993) was used to estimate the variance of the estimated density of standing dead trees for both methods. For the FAS method, plot centers were resampled from all collected subplot data at each sampling intensity. Several sampling scenarios were tested for the EZP method, consisting of different point-to-plot ratios (PPR), calculated as:
$$PPR = \frac{\text{No. random points}}{\text{No. fixed-area plots}}$$
Recall that, in the field study, the distance to the nearest standing dead tree was measured from a point at the center of each fixed-area plot; this corresponds to a sampling scenario of PPR = 1 (Table 5.2). At PPR > 1, additional random points are added outside the plots. For each scenario, 1,000 repetitions were used to estimate the density of standing dead trees (Table 5.2).
Table 5.2. The number of fixed-area plots and random points by sampling intensity and method.
SI (%)  FAS No. plot  EZP PPR=1.00  PPR=1.25  PPR=1.50  PPR=1.75  PPR=2.00
1       36            36/36         45/36     54/36     63/36     72/36
2       72            72/72         90/72     108/72    126/72    144/72
3       108           108/108       135/108   162/108   189/108   216/108
4       144           144/144       180/144   216/144   252/144   288/144
5       180           180/180       225/180   270/180   -         -
SI is sampling intensity, FAS is the fixed-area sampling method, EZP is the EZ-Hurdle method with a Poisson distribution, No. plot is the number of fixed-area plots used for FAS, and PPR is the point-to-plot ratio (the number of random points / the number of fixed-area plots). EZP entries are given as random points/fixed-area plots.
Because the true density of standing dead trees per ha in these forests was unknown, the RMSE could not be calculated. Instead, the coefficient of variation (CV) of standing dead tree density from FAS and EZP was calculated from the 1,000 bootstrap repetitions of each sampling scenario. The CV is expressed as a percentage and allows comparison of relative variability about the mean (Avery and Burkhart 1983). The CV is calculated as:
$$CV = \frac{S_x}{\bar{x}} \times 100$$
where $S_x$ is the standard deviation and $\bar{x}$ is the mean.
5.3. Results
Table 5.3 summarizes the stand statistics obtained by traditional FAS. Beech bark disease-infected and non-infected forests had similar densities of live trees, but the infected forests had more than twice the density of standing dead trees found in non-infected areas. Expressed as basal area (BA, m²/ha), standing dead trees in infected forests had a significantly greater basal area, about 3 times that in non-infected stands. Results were similar whether the larger (annular) or smaller (sub) plots were used (Table 5.3; note that live trees were not measured on the annular plots).
Table 5.3. Summary statistics from the fixed-area sampling method by infection status.
Plot     Infected  n    No. live/ha   No. SDT/ha   BA (live)   BA (SDT)
Subplot  No        299  452.0±15.3    19.5±4.4     26.9±6.5    1.0±0.6
Subplot  Yes       299  434.7±6.3     52.3±7.7     32.4±3.9    3.0±1.8
Annular  No        299  -             23.3±2.49    -           1.2±0.7
Annular  Yes       299  -             50.8±3.87    -           3.2±1.6
Estimates are shown as mean ± 95% CI; differences were tested at α = 0.05. Infected is the beech bark disease infection status, BA is basal area (m²/ha), and "-" indicates that live trees were not measured on annular plots.
Decay class was assigned using the five-class system. It was compared by infection status, and also between all standing dead trees and American beech trees.
Figure 5.4 shows the distribution of the number of standing dead trees per ha by decay class. When all species are included, more than 50% of standing dead trees were classified as decay class 2 or 3 in both infected and non-infected forests. However, in infected areas more than 50% of dead American beech trees were classified as decay class 1 or 2. This indicates that many of the standing dead American beech trees died recently (between 2003 and 2008) from beech bark disease.
Figure 5.4. The number of standing dead trees (No. SDT/ha) in five decay classes in two forest types. AB is American beech.
Although the density of standing dead trees differed significantly, standing dead trees in both infected and non-infected forests showed a clustered pattern (Table 5.4). The variance-to-mean ratios are greater than one, and according to the index of dispersion they are significantly different from one at the 95% probability level for both infected and non-infected forests. In other words, the standing dead trees were aggregated in both infected and non-infected areas.
Table 5.4. The results of the variance-to-mean ratio and index of dispersion.
Plot     Infected  n    Variance-to-mean ratio  ID
Subplot  No        299  77.84*                  597.60
Subplot  Yes       299  87.96*                  387.71
Annular  No        299  20.74*                  276.22
Annular  Yes       299  23.01*                  201.32
Infected is the infection status and ID is the index of dispersion. * Significantly different from one according to the index of dispersion (α = 0.05).
Figure 5.5 shows the inclusion probability of a standing dead tree, estimated by the reduced Gompertz function (Table 5.5), by infection status. The inclusion probability differed significantly by infection status because the 95% confidence intervals (CIs) of the parameters do not overlap (Rubin and MacFarlane 2008). As expected, the inclusion probability is higher in infected forests than in non-infected forests, because the density of standing dead trees in infected forests is more than twice as high. Note that plot size is a major determinant of zero inflation in the data (Fig. 5.5).
Table 5.5. The estimated parameters (standard errors) of the reduced Gompertz function by infection status.
Parameter   Infected: Yes    Infected: No
b           4.11 (0.068)     4.85 (0.122)
c           0.78 (0.002)     0.85 (0.002)
The reduced Gompertz function is $P_{Inc}(d) = \exp(-b \times c^{d})$, where b and c are parameters and d is the search radius.
Figure 5.5. Estimated inclusion probability by infection status. The dashed line is 7.32 m.
Figure 5.6 shows the coefficient of variation of the estimated density of standing dead trees per ha by method and PPR. The results clearly show that, as indicated by the CV, the EZ-Hurdle method estimated standing dead tree abundance more precisely than FAS. The CV of standing dead tree abundance decreased as the number of fixed-area plots increased, or as the relative number of random points (i.e., PPR) used for the EZP method increased. The CV was lower under all methods in the infected forests, because standing dead tree density was higher and zero inflation was therefore lower (Fig. 5.6). Interestingly, the EZ-Hurdle method showed a smaller CV than FAS even when only the additional information (nearest-tree distances from the existing plot centers) was used to estimate the expected zero probability, without adding random points. The improvement in CV was much larger when PPR > 1 (compare, e.g., PPR = 1 and PPR = 1.25 in Fig. 5.6), because the additional sample locations extend the information content of the auxiliary data.
Figure 5.6. Coefficients of variation by the number of fixed-area plots. FAS is the fixed-area sampling method, and PPR is the point-to-plot ratio of the data for the EZ-Hurdle method. The solid line is infected forests and the dashed line is non-infected forests.
5.4. Discussion for Applied Study
The results of the field study reinforce those of the simulation study. They demonstrate that the EZ-Hurdle method (EZP) increased the precision of estimates of standing dead tree density relative to the FAS method in both infected and non-infected forests. The auxiliary information, i.e., the distance to the nearest standing dead tree, reduces the variation caused by zero observations, which is largest when dead tree densities are lowest (Fig. 5.5). When additional points are added to collect the auxiliary information, the EZP method shows greater improvements in precision. In addition, the EZ-Hurdle method is more precise than FAS even when only the additional information is collected, without adding points. This property makes it possible to improve precision without changing the plot design or adding more plots. Although the EZ-Hurdle method gave better estimates than FAS, it also required more data, which imposes additional costs. Thus, further examination was needed to confirm that the EZ-Hurdle method can be more cost-efficient than FAS. Cost efficiencies for the EZ-Hurdle method and strategies for applying it to FIA plots are discussed in the chapters that follow. The spatial pattern and density of dead trees have also been found to differ by species and site characteristics (Ganey 1999) and by the agent of tree death (Franklin et al. 1987). In this study, beech bark disease was the major source of beech tree mortality. Both the variance-to-mean ratio and the index of dispersion (ID) showed that standing dead trees were clustered in both infected and non-infected forests.
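The statistics used in Sections 5.2.2 and 5.3 can be sketched in a few lines of Python: the variance-to-mean ratio, the index of dispersion, the bootstrap CV, and the reduced Gompertz inclusion probability of Table 5.5. This is a minimal illustration under stated assumptions. The plot counts are hypothetical rather than BBDMS data, and the function names are our own. The bootstrap resamples plot records with replacement, as described for FAS; the EZP point-to-plot scenarios are not reproduced. Comparing ID with χ² quantiles on n − 1 degrees of freedom is left to a statistical table or a library such as the chi-square distribution in `scipy.stats`.

```python
import math
import random
import statistics


def variance_to_mean(counts):
    """v = s^2 / x-bar, with the sample variance (n - 1 denominator)."""
    return statistics.variance(counts) / statistics.mean(counts)


def index_of_dispersion(counts):
    """ID = (n - 1) * s^2 / x-bar; compare with chi-square quantiles on n - 1 df."""
    return (len(counts) - 1) * variance_to_mean(counts)


def bootstrap_cv(counts, n_plots, estimator, reps=1000, seed=1):
    """CV (%) of `estimator` over `reps` bootstrap resamples of `n_plots` plots."""
    rng = random.Random(seed)
    estimates = [estimator(rng.choices(counts, k=n_plots)) for _ in range(reps)]
    return statistics.stdev(estimates) / statistics.mean(estimates) * 100


def gompertz_inclusion(d, b, c):
    """Reduced Gompertz inclusion probability P_inc(d) = exp(-b * c**d)."""
    return math.exp(-b * c ** d)


SUBPLOT_HA = math.pi * 7.32 ** 2 / 10_000


def fas_density(sample):
    """FAS estimate of SDT/ha from a list of subplot counts."""
    return sum(sample) / len(sample) / SUBPLOT_HA


# Hypothetical subplot counts (not the BBDMS data).
counts = [0, 0, 3, 0, 0, 0, 5, 1, 0, 0, 0, 2, 0, 0, 4, 0]
print(variance_to_mean(counts), index_of_dispersion(counts))
print(bootstrap_cv(counts, n_plots=8, estimator=fas_density))

# Table 5.5 estimates evaluated at the subplot radius (7.32 m).
print(gompertz_inclusion(7.32, b=4.11, c=0.78))   # infected forests
print(gompertz_inclusion(7.32, b=4.85, c=0.85))   # non-infected forests
```

With the Table 5.5 estimates, P_inc(7.32) works out to roughly 0.51 for infected stands and 0.23 for non-infected stands. This is consistent with the higher inclusion probability reported for infected forests.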
The shape of the inclusion probability function can also be an indicator of the spatial pattern and density of standing dead trees. The inclusion probability increases rapidly over short distances because, when points are clustered, there are generally more points within short distances of one another (Bailey and Gatrell 1995).
6. Cost and Sampling Efficiency of EZ-Hurdle Method
6.1. Overview
A well-designed sampling strategy usually provides a more efficient alternative to a census. Because it leaves significant portions of the population unobserved, however, the choice of sampling strategy is critical to obtaining reliable estimates of the parameters of the population of interest. Efficiency relates the amount of information collected to the resources expended (Gregoire and Valentine 2008) and is a critical factor in selecting a sampling strategy. Sampling efficiency is typically evaluated on two aspects: the accuracy of the estimated population parameters and the cost of collecting the sample data. The previous chapters showed that the EZ-Hurdle method produces better estimates than FAS when additional information is collected. In this case, that information was the distance to the nearest standing dead tree, measured from fixed-radius plot centers and from additional random points. If such auxiliary information can be obtained at relatively low cost, the EZ-Hurdle method should be efficient as well as accurate. Auxiliary information describing the expected zero probability can be obtained in two ways. One is from a previous study or other prior information. This would be less costly than collecting new data, but it could produce biased estimates if the density and spatial pattern of standing dead trees at the locality of interest differ from those in the prior study. The more straightforward, and likely more reliable, approach is to derive the auxiliary data from additional field data collected at the site, although this should be more costly.
In this study, the EZ-Hurdle method is compared side by side with FAS on both the precision of estimation and the cost of collecting the data. Precision was evaluated by the coefficient of variation of the estimated standing dead tree density per hectare, and cost was evaluated by the time required to collect the data. A priori, the time required to collect the data for the EZ-Hurdle method is always greater than that for FAS, because the EZ-Hurdle method supplements FAS plot data with auxiliary data. In this study, the time requirement for the EZ-Hurdle method was measured in the field, and the efficiency of the method was examined by combining time cost information from the field data with an additional simulation. In the next chapter, this analysis is extended to determine the best sampling strategy for estimating the density of standing dead trees.

6.2. Data

The time requirement for the fixed-area sampling method was modeled from the BBDMS data used in the previous study, and the time requirement for searching for the nearest standing dead tree was modeled using new field data collected in the Pigeon River Country State Forest (PRC) in Michigan (Fig. 6.1). The density of standing dead trees was estimated from both actual data (BBDMS) and simulated data.

Data for estimating the time requirement of fixed-area plots

As in the FIA design, the (larger) annular plots were superimposed over the subplots at each sample point, so the time for a two-person crew to collect all the tree data on both sub- and annular plots in the BBDMS was recorded sequentially as follows: the start and end times for a subplot, and then the additional time to measure trees out to the boundary of the annular plot. Therefore, the time to finish the annular plot is the time to finish the subplot plus the additional time required to measure standing dead trees located in the area between the subplot and annular plot boundaries.
The time to measure the distance to the nearest dead tree within the plot areas was recorded, as these data are collected under the current FIA design. However, when there was no standing dead tree within the annular plot, the distance to the nearest standing dead tree was still measured, but the time spent searching for it was not recorded.

Figure 6.1. The location of Pigeon River Country Forest (PRC) in Michigan.

Data for estimating the search time to the nearest standing dead tree

Because the time to search for the nearest standing dead tree was not recorded for each tree in the BBDMS study, and because the BBDMS focused only on a limited range of beech-dominated northern hardwood stands, additional data were collected at the Pigeon River Country State Forest (PRC) in Michigan (Fig. 6.1) to (1) better understand the time-cost efficiency of the EZ-Hurdle method and (2) explore the prospects for predicting the expected zero probability from basic stand data (forest type and density). According to previous studies, the density of standing dead trees differs by geographical location, species, and management regime (Cline et al. 1980; Fridman and Walheim 2000; Wisdom and Bate 2008), but is not correlated with stand age (Lee et al. 1997; McCarthy and Bailey 1994; Sturtevant et al. 1997; Vasiliauskas et al. 2004). Therefore, four forest cover types were selected based on species composition (oak, pine, aspen, and northern hardwood forest), and each forest type was further subdivided into four stand density classes according to basal area (m²/ha) (Table 6.1) to represent a wide range of stand conditions for sampling.

Table 6.1. The classification of basal area (BA) class.

BA class    BA (m²/ha)
1           BA ≤ 9.2
2           9.2 < BA ≤ 16.1
3           16.1 < BA ≤ 23.0
4           23.0 < BA

Three replications of each forest cover type by stand density combination were installed to capture within-condition variation.
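The basal area classes of Table 6.1 amount to a simple threshold rule with inclusive upper bounds. A minimal Python sketch that assigns a stand to its class (the function name `ba_class` is hypothetical):

```python
def ba_class(basal_area: float) -> int:
    """Return the BA class (1-4) of Table 6.1 for a basal area in m^2/ha."""
    if basal_area < 0:
        raise ValueError("basal area cannot be negative")
    # Upper bounds are inclusive, matching the "<=" limits in Table 6.1.
    for upper, cls in ((9.2, 1), (16.1, 2), (23.0, 3)):
        if basal_area <= upper:
            return cls
    return 4


print([ba_class(ba) for ba in (8.5, 9.2, 14.0, 20.3, 28.3)])
```

A stand sitting exactly on a boundary (e.g., 9.2 m²/ha) falls into the lower class, as the "≤" limits in Table 6.1 require.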
The forest stands surveyed were at least 7 ha in size, and a 40 m buffer was applied to avoid edge effects (e.g., from roads or adjacent forest types), following protocols similar to those of the BBDMS design. Within each replication, 30 points were randomly selected. At each sample point, the species, DBH, decay class, and distance of the nearest standing dead tree were recorded. The characteristics of live trees (DBH and species) were also collected at five randomly selected points using fixed-area subplots. For each standing dead tree, the combined time for a two-person crew to search for the nearest standing dead tree and measure its attributes was recorded.

6.3. Analytical Methods

Bootstrapping and Simulation

As described earlier, a bootstrapping approach was used to resample the BBDMS data. Five sampling intensities, from 1% to 5% in 1% increments (see Table 5.2), were applied. A 5% sampling intensity was chosen as the upper limit because the previous chapters showed that the benefit of EZ-Hurdle is low when the FAS sampling intensity is greater than this. Five PPR strategies were applied to collect the distance information (see Table 5.2). For each scenario, 1,000 repetitions were applied.

A second simulation study was run with experimental treatments including two spatial patterns (random and cluster pattern I) and three densities (12, 24, and 49 SDT/ha) for each spatial pattern, to allow comparison with the field study. The same numbers of plots and points used for the bootstrapping study (Table 5.2) were applied. In both the bootstrapping and simulation studies, precision was compared using the coefficient of variation of the estimated density of standing dead trees per ha.

Estimation of time costs from field data for bootstrapping and simulation

The time costs to complete both fixed-area sampling and the EZ-Hurdle method varied from place to place depending on several factors.
The measured BBDMS and PRC time data were used to develop models that could represent time variability in the simulation and resampling environments. In the fixed-area sampling method, the time requirement is strongly influenced by the number of tallied standing dead trees (Kenning et al. 2005) and by the plot size (area). Hence, in this study, the time requirement for FAS is assumed to be proportional to the number of tallied trees times the plot size:

$$T_{FAS} \propto n_t \cdot plot$$

where $T_{FAS}$ is the time requirement for FAS, $n_t$ is the number of tallied trees, and $plot$ is the area of the plot. The time requirement for a fixed-area plot $i$ of a given area was modeled as:

$$t_i = \beta_0 + \beta_1 n_t + \varepsilon$$

where $t_i$ is the time required at a fixed-area plot of the specified size, $n_t$ is the number of standing dead trees tallied, $\varepsilon$ is an error term, and $\beta_0$ and $\beta_1$ are fitted coefficients. Because of the way the time data were collected, the time to count and measure all of the dead trees in a subplot was not directly available (live trees were also measured; see the previous chapter for details); only the time to count and measure the dead trees in the area between the boundaries of the sub- and annular plots was known. Therefore, the time requirement to complete FAS on a plot of the size of interest was obtained by weighting the regression prediction by $w$:

$$w = \frac{\text{Area of plot}_i}{\text{Area between subplot and annular plot}}$$

Finally, the time requirement for a subplot is

$$T_{FAS} = w \cdot t_i$$

Figure 6.2 shows the distribution of the number of dead trees within a 0.08 ha plot (the area between the annular plot and the subplot). Approximately 50% of these fixed-area plots had fewer than 2 standing dead trees within the 0.08 ha area, and the median time to complete the plot was approximately 7.30 minutes (Fig. 6.2).

Figure 6.2. Distribution of the number of SDT per plot and time requirement to measure the SDT within a plot (0.08 ha). Dashed lines are medians.
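The weighting step can be illustrated with the coefficients reported in Table 6.2. In this minimal Python sketch, the reference area is taken as the ring between a 7.32 m subplot and a 17.95 m annular plot (about 0.084 ha, the "0.08 ha" area above), and the weight is the ratio of the target plot area to that ring. The function names are hypothetical, and multiplying the whole prediction by the weight simply follows the formulas above; it is not the study's own code.

```python
import math

# Coefficients from Table 6.2 (seconds, two-person crew, 0.08 ha reference area).
BETA0, BETA1 = 303.4, 63.8
SUBPLOT_RADIUS, ANNULAR_RADIUS = 7.32, 17.95  # plot radii in metres


def circle_area(radius_m: float) -> float:
    return math.pi * radius_m ** 2


# Area between the subplot and annular plot boundaries (m^2), about 0.084 ha.
REFERENCE_AREA = circle_area(ANNULAR_RADIUS) - circle_area(SUBPLOT_RADIUS)


def fas_time_seconds(n_tallied: int, plot_radius_m: float) -> float:
    """T_FAS = w * t_i, with t_i = b0 + b1 * n_t and w = plot area / reference area."""
    t_i = BETA0 + BETA1 * n_tallied
    w = circle_area(plot_radius_m) / REFERENCE_AREA
    return w * t_i


for n_t in (0, 1, 2):
    print(n_t, round(fas_time_seconds(n_t, SUBPLOT_RADIUS), 1))
```

For a plot equal to the reference ring itself, w = 1 and the prediction reduces to t_i; with two tallied trees that is 303.4 + 2 × 63.8 = 431.0 s, or about 7.2 minutes. For an empty subplot the weight is about 0.2, giving roughly 60 s.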
Table 6.2 shows the estimated parameters of the regression model fitted to the BBDMS time data. The intercept is 303 seconds; therefore, it was assumed that 303 seconds are required when there is no standing dead tree within 0.08 ha. This parameter determined the minimum time for FAS on a plot of this size and was extrapolated to FIA plots using the weight (w) described above.

Table 6.2. The estimated parameters and R-square for the regression model to estimate the time requirement for a 0.08 ha fixed-area plot.

Response        Intercept (β0)   Parameter (β1)   R-square
Time (second)   303.4***         63.8***          0.56

Time was recorded with two-person crews. Statistical significance level: *** p < 0.001.

Figure 6.3 shows the relationship between the time to complete a 0.08 ha FAS plot and the number of standing dead trees tallied, along with the associated residuals. The results show a relatively unbiased model, but with considerable scatter in the relationship, since only about 56% of the variation in time was explained by the number of standing dead trees tallied (Table 6.2). The spread of residuals was roughly constant across the range of fitted values. Other factors affecting this relationship included the relative difficulty of navigating the terrain, the density of the vegetation, and the time spent identifying and classifying dead trees. Given the lack of bias in the model, it was assumed that the time requirement for plots of other sizes follows the same relationship, i.e., that time is proportional to the number of standing dead trees times the plot size ($T_{FAS} \propto n_t \cdot plot$) for the other (sub- and annular) plot sizes. Thus, for the bootstrapping and simulation procedures, the model (Table 6.2) and the procedure above were used to estimate the time to complete FAS for every plot in every iteration.

Figure 6.3.
The scatter plot between time (seconds) and the number of SDT within a plot, and residuals of the regression model used to estimate the time requirement for a fixed-area plot (0.08 ha).

Estimation of search time for finding and measuring the nearest standing dead tree

For the bootstrapping and simulation procedures, it was also necessary to estimate the time to find and measure the nearest standing dead tree for every plot in every iteration. In the PRC data, the distance to the nearest standing dead tree had a strong influence on the time requirement. Therefore, the time required to find and measure the nearest standing dead tree was modeled as

$$T_{near,i} = \beta_0 + \beta_1 \cdot \pi d_i^2 + \varepsilon$$

where $T_{near,i}$ is the time required at plot or point $i$, $d_i$ is the distance from the plot center (point) to the nearest standing dead tree, $\varepsilon$ is an error term describing other factors affecting the time requirement, and $\beta_0$ and $\beta_1$ are coefficients obtained by fitting the model to the PRC data. Because DBH, species, and decay class were also recorded for the nearest standing dead tree, the estimated time requirement ($T_{near,i}$) overestimates the time needed only to measure the distance to the nearest standing dead tree. However, since tree classification and measurement times were also included in the time estimates for the FAS method, both $T_{near,i}$ and $T_{FAS}$ can be considered standardized for the purpose of comparing the time cost of FAS with that of the EZ-Hurdle method. The raw PRC data show that the average time to find and measure the nearest standing dead tree increased dramatically because some standing dead trees were located very far from the random points (Fig. 6.4). Because the search area and the time to measure the nearest standing dead tree were right-skewed, several data transformations were applied to find the best-fitting model.

Figure 6.4. The distribution of search area (ha) and time (seconds) to measure the nearest SDT in the PRC. Dashed lines are median values.
Based on the residual sum of squares (RSS) and AIC, the best-fitting regression model (Fig. 6.5) was:

$$\sqrt{Time} = \beta_0 + \beta_1 \sqrt{radius} + \varepsilon$$

where $Time$ is in seconds, $radius$ is the radius of the search area in meters, $\varepsilon$ is an error term, and $\beta_0$ and $\beta_1$ are coefficients of the model fitted to the PRC data. There was a strong linear relationship between search time and search radius when both variables were square-root transformed. About 96% of the variation in time was explained by search radius alone, and there was no major pattern in the residual plot (Fig. 6.5) to indicate significant bias in the model, except a very small distortion for very small search areas and very short search times (the intercept of the model was -5.04 on the square-root scale, Table 6.3). The estimated parameters (Table 6.3) were used to estimate the search time in both the simulation and the bootstrapping study.

Figure 6.5. The scatter plot between search time (seconds) and search radius (m), and residuals of the selected regression model used to estimate the time to search for the nearest SDT.

Table 6.3. The estimated parameters and R-square for the regression model to estimate the search time to measure the nearest SDT.

Response      Intercept (β0)   Parameter (β1)   R-square
Time (sec.)   -5.04***         6.24***          0.96

Statistical significance level: *** p < 0.001.

Estimation of time requirement for EZ-Hurdle method

Recall that the EZ-Hurdle method is applied to FAS data, where auxiliary data are collected or otherwise available. The time cost for the EZ-Hurdle method was computed as the sum of three time components:

$$T_{EZ} = T_{FAS} + (n - n_0)(T_{near} - T_{FAS,0}) + n^{*} T_{near}$$

where $T_{EZ}$ is the time requirement for the EZ-Hurdle method, $n$ is the number of fixed-area plots, $n_0$ is the number of fixed-area plots with a standing dead tree, $T_{near}$ is the time requirement to find and measure the nearest standing dead tree, $T_{FAS,0}$ is the time requirement for FAS when there is no standing dead tree in the plot, and $n^{*}$ is the number of additional points at which distance information is collected.
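The back-transformed search-time model and the two cost formulas can be sketched together in Python. This minimal illustration assumes the square-root model with the Table 6.3 coefficients, a naive back-transformation (squaring, with no correction for retransformation bias and with negative predictions truncated at zero), and average per-plot times in the T_EZ formula. The function names and example inputs are hypothetical. The relative efficiency follows the E_r formula given in the next subsection.

```python
import math
import statistics

# Table 6.3 coefficients, fitted on the square-root scale.
B0, B1 = -5.04, 6.24


def search_time_seconds(radius_m: float) -> float:
    """Back-transform sqrt(Time) = b0 + b1 * sqrt(radius) to seconds."""
    if radius_m < 0:
        raise ValueError("radius must be non-negative")
    root = B0 + B1 * math.sqrt(radius_m)
    return max(root, 0.0) ** 2  # truncate negative predictions at very short radii


def ez_time(t_fas, n, n0, t_near, t_fas0, n_extra):
    """T_EZ = T_FAS + (n - n0) * (T_near - T_FAS,0) + n* * T_near."""
    return t_fas + (n - n0) * (t_near - t_fas0) + n_extra * t_near


def cv(values):
    """Coefficient of variation (sample standard deviation / mean)."""
    return statistics.stdev(values) / statistics.mean(values)


def relative_efficiency(est_ez, est_fas, times_ez, times_fas):
    """E_r = (CV_EZ^2 / CV_FAS^2) * (mean T_EZ / mean T_FAS); E_r < 1 favours EZ-Hurdle."""
    precision_ratio = cv(est_ez) ** 2 / cv(est_fas) ** 2
    time_ratio = statistics.mean(times_ez) / statistics.mean(times_fas)
    return precision_ratio * time_ratio


print(round(search_time_seconds(14.4)))  # search out to about 14.4 m
```

A 14.4 m radius is roughly the mean point-to-nearest-tree distance expected for a completely random pattern at 12 SDT/ha (1/(2√λ) with λ = 0.0012 trees/m²); the sketch predicts about 347 s for it. Note that E_r can fall below 1 only if the gain in precision outweighs the extra time.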
$T_{EZ}$ is equal to $T_{FAS}$ when $n^{*}$ and $(n - n_0)$ are 0.

Estimation of sampling efficiency

Sampling efficiency was evaluated by both the precision of the parameter estimates and the time cost to collect the data (Lessard et al. 1994). The efficiency relative to FAS was calculated as:

$$E_r = \frac{CV_{EZ}^2}{CV_{FAS}^2} \times \frac{\bar{T}_{EZ}}{\bar{T}_{FAS}}, \quad \bar{T}_{EZ} = \frac{1}{n}\sum_{i=1}^{n} T_{EZ_i}, \quad \bar{T}_{FAS} = \frac{1}{n}\sum_{i=1}^{n} T_{FAS_i}$$

where $CV_{EZ}$ is the coefficient of variation of the parameter estimated by the EZ-Hurdle method, $CV_{FAS}$ is the coefficient of variation of the parameter estimated by FAS, $i$ indexes the repetitions ($n$ = 1,000), $T_{EZ_i}$ is the time required to collect the data in repetition $i$ for EZ-Hurdle, and $T_{FAS_i}$ is the time required to collect the data in repetition $i$ for FAS. When $E_r < 1$, the EZ-Hurdle method is more efficient than FAS.

6.4. Results

6.4.1. Comparison of Inclusion Probabilities Between Different Forest Type Conditions

Table 6.4 summarizes the results of the random point to nearest standing dead tree data collected in the PRC.

Table 6.4. Summary of results for BA of live trees, density of standing dead trees (No. SDT/ha), average distance (Dist.) from a point to the nearest SDT, and average time (Time) to search for and measure the nearest standing dead tree, with standard deviations in parentheses.

Type    BA class   BA (m²/ha)     No. SDT/ha    Dist. (m)     Time (sec.) (a)
Aspen   1          8.83 (0.34)    6.4 (2.9)     18.0 (8.0)    398 (242)
Aspen   2          10.52 (0.71)   13.7 (10.2)   12.4 (8.0)    250 (228)
Aspen   3          19.57 (2.05)   33.5 (11.9)   8.1 (5.4)     139 (126)
Aspen   4          27.73 (4.23)   34.0 (28.3)   8.0 (5.3)     143 (139)
NHW     1          9.69 (0.37)    8.4 (10.9)    14.8 (8.7)    384 (288)
NHW     2          13.92 (1.63)   16.4 (8.5)    12.0 (7.0)    315 (243)
NHW     3          20.42 (1.77)   18.2 (12.2)   11.5 (6.4)    304 (233)
NHW     4          27.27 (2.88)   15.1 (10.9)   12.5 (6.4)    307 (224)
Oak     1          8.50 (0.83)    5.8 (1.9)     18.6 (8.5)    474 (280)
Oak     2          14.75 (0.51)   6.0 (5.7)     18.4 (9.1)    480 (299)
Oak     3          21.83 (0.39)   12.5 (16.9)   12.1 (8.0)    288 (253)
Oak     4          28.29 (1.94)   44.5 (13.0)   7.2 (4.4)     144 (123)
Pine    1          8.96 (1.18)    6.3 (3.1)     17.7 (9.6)    454 (326)
Pine    2          14.96 (0.42)   13.1 (11.6)   13.4 (7.9)    310 (256)
Pine    3          20.26 (1.78)   7.4 (1.7)     16.2 (8.9)    402 (297)
Pine    4          29.09 (2.67)   23.3 (21.8)   9.9 (6.1)     204 (187)

(a) Time per two-person crew.

Densities of standing dead trees generally increased with increasing basal area, with some variation in the trend for NHW and pine forests, indicating an accumulation of dead trees as the stands developed, consistent with the even-aged management that has been applied to the stands (even the NHW stands). Some of the anomalies in this trend (Table 6.4) may be explained by the fact that most forests in the PRC are managed for timber production and are sometimes treated with pre-commercial thinning, which often includes removal of low-vigor and dying trees. The average distance from random points to the nearest standing dead tree tended to decrease with increasing density of standing dead trees (Table 6.4). The mean time to search for and measure the nearest standing dead tree declined similarly in all forest types except NHW forest, which had very similar time costs, densities of dead trees, and distances to dead trees in the three higher basal area classes; this (anecdotally) appeared to be due to relatively thicker understory vegetation in dense forests of this type, which may have increased the search time. These data (Table 6.4) indicate that, in general, the inclusion probability of a standing dead tree may be estimable from stand-level data.

Table 6.5. Estimated coefficients and standard errors (se) for the reduced Gompertz function fitted by forest cover type and basal area class.
Type    BA class   b (se)          c (se)
Aspen   1          6.88 (0.286)    0.88 (0.002)
Aspen   2          3.60 (0.088)    0.86 (0.002)
Aspen   3          3.61 (0.092)    0.78 (0.003)
Aspen   4          4.34 (0.083)    0.76 (0.002)
NHW     1          3.35 (0.104)    0.89 (0.002)
NHW     2          4.58 (0.070)    0.84 (0.001)
NHW     3          4.71 (0.102)    0.83 (0.002)
NHW     4          4.32 (0.095)    0.84 (0.002)
Oak     1          7.06 (0.195)    0.88 (0.001)
Oak     2          5.26 (0.120)    0.89 (0.001)
Oak     3          4.26 (0.129)    0.83 (0.003)
Oak     4          4.94 (0.074)    0.73 (0.002)
Pine    1          4.09 (0.083)    0.90 (0.001)
Pine    2          4.67 (0.110)    0.85 (0.002)
Pine    3          4.49 (0.074)    0.88 (0.001)
Pine    4          4.52 (0.082)    0.80 (0.002)

BA is basal area, NHW is northern hardwood forest, and se is standard error.

When the reduced Gompertz function

$$P_{Inc}(d) = \exp(-b \cdot c^{d})$$

where $d$ is the distance from the plot center or random point, was fitted to the PRC data, the inclusion probability differed significantly by forest type and basal area class (the 95% confidence intervals of the estimated coefficients did not overlap, Table 6.5).

Figure 6.6. The inclusion probabilities by forest type and basal area class. Dashed lines are the radii of the subplot (7.32 m) and annular plot (17.95 m).

Figure 6.6 shows the inclusion probability of a standing dead tree by forest type and basal area class predicted from the models in Table 6.5. As expected from previous chapters, the inclusion probability increased with increasing density of standing dead trees. For all forest types, stands in the lowest and highest basal area classes were clearly different, but there was not always a noticeable difference among the intermediate classes (Fig. 6.6), especially in NHW forests, where BA class 1 was different but there was little difference among the other three BA classes. The most unusual case was that of the pine forests, where there was a higher inclusion probability (higher mortality, Table 6.4) in BA class 2 than in BA class 3 (Fig. 6.6). This indicates that under some circumstances the abundance of standing dead trees is not proportional to stand basal area, and that other, density-independent mortality factors are at work.
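The reduced Gompertz curve converts the Table 6.5 coefficients into an inclusion probability at any distance, and its complement at a plot radius gives the probability that a plot of that size contains no standing dead tree. A minimal Python sketch, assuming d is measured in metres from the plot center or random point and using three strata from Table 6.5 for illustration (the dictionary and function names are hypothetical):

```python
import math

# (b, c) coefficients for a few strata, taken from Table 6.5.
GOMPERTZ = {
    ("Aspen", 1): (6.88, 0.88),
    ("Pine", 2): (4.67, 0.85),
    ("Pine", 3): (4.49, 0.88),
}
SUBPLOT_RADIUS, ANNULAR_RADIUS = 7.32, 17.95  # metres


def inclusion_probability(d: float, b: float, c: float) -> float:
    """P_Inc(d) = exp(-b * c**d): chance that an SDT lies within distance d."""
    return math.exp(-b * c ** d)


def zero_probability(radius: float, b: float, c: float) -> float:
    """Probability that a circular plot of this radius contains no SDT."""
    return 1.0 - inclusion_probability(radius, b, c)


for key, (b, c) in GOMPERTZ.items():
    p_sub = inclusion_probability(SUBPLOT_RADIUS, b, c)
    p_ann = inclusion_probability(ANNULAR_RADIUS, b, c)
    print(key, f"subplot {p_sub:.3f}", f"annular {p_ann:.3f}")
```

Because 0 < c < 1, c^d shrinks toward zero as d grows, so the curve rises from exp(-b) at d = 0 toward 1; a smaller c (faster approach) corresponds to denser standing dead trees, which is why pine BA class 2 shows a higher subplot inclusion probability than class 3.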
6.4.2. Results of Time Requirement Studies

Table 6.6 summarizes the simulated average time required to count and measure standing dead trees using a 7.32 m (subplot) and a 17.95 m (annular) radius fixed-area plot, and the time to search for and measure the nearest standing dead tree (300 random points were used to generate the average times), in computer-generated forests of varying densities and spatial patterns.

Table 6.6. Average estimated time requirements (minutes) by survey type from calibrated simulations under different spatial patterns and standing dead tree densities.

Spatial pattern   Density (SDT/ha)   Subplot   Annular   Nearest
Cluster           12                 1.1       9.2       5.8
Cluster           24                 1.1       10.7      6.1
Cluster           49                 1.2       13.7      5.3
Random            12                 1.1       9.1       5.8
Random            24                 1.1       10.7      3.8
Random            49                 1.2       13.8      2.3

Nearest is the time to search for and measure the nearest standing dead tree.

The calibrated simulations (Table 6.6) show only relatively small differences in the FAS time requirement among spatial patterns and densities when the FIA subplot is used. However, there was a clear pattern of the time requirement for annular plots increasing with increasing density of standing dead trees, with no significant difference between the spatial pattern types. The time to find and measure the nearest standing dead tree tended to decrease with increasing density of standing dead trees in a random pattern, but did not differ among densities when the dead trees were clustered, because most of the time was spent finding the nearest standing dead tree regardless of density, since trees within a cluster were relatively close together. In terms of the differences between methods, the simulations showed that finding and measuring the nearest standing dead tree took about 2 to 5.5 times longer than counting and measuring dead trees in a subplot, but only about 1/6 to 6/10 of the time to complete an annular plot, depending on the conditions of the stand (Table 6.6).
The annular plots required about 8 to 11.5 times longer to complete than the subplots.

Table 6.7. Estimated average time requirement (min.) per plot (or point) by method and plot type when the time models are applied to the BBDMS data.

Infected status   FAS subplot   FAS annular   EZ-Hurdle (w/ subplot)
Yes               1.9           11.5          2.9
No                1.4           8.5           5.4

Time requirement was measured with two-person crews. FAS is the fixed-area sampling method, the subplot has a 7.32 m radius, the annular plot has a 17.95 m radius, and EZ-Hurdle is the sum of the time to find and measure the nearest standing dead tree and the time to measure standing dead trees within a subplot.

The time models were also applied to the BBDMS data, and the average time requirement per plot or point by survey method was estimated by infected status (Table 6.7). For FAS, in both the infected (about 52 SDT/ha) and non-infected plots (about 21 SDT/ha), annular plots required about 6 times longer to complete than subplots. The estimated time for the EZ-Hurdle method, which includes both the time to complete a subplot and the time to find and measure the nearest standing dead tree, was about 1.5 and 3.9 times longer than the time for the subplot alone in infected and non-infected forests, respectively; the latter because, with fewer standing dead trees, it took longer to find them. However, the EZ-Hurdle method required about 4 and 1.5 times less time than annular plots in infected and non-infected forests, respectively, indicating that the EZ-Hurdle method has a much lower time cost than FAS with annular-sized plots.

Table 6.8. Estimated time requirement (hr) for FAS with 7.32 m radius subplots and the additional time requirement for EZP by PPR for a given number of fixed-area plots, with two-person crews.

                       Random                                  Cluster
                       EZP by PPR                              EZP by PPR
N/ha  No. plot  FAS    1.00  1.25  1.50  1.75  2.00    FAS    1.00  1.25  1.50  1.75  2.00
12    36        0.7    3.5   4.4   5.3   6.1   7.0     0.7    6.0   7.5   9.0   10.5  12.0
12    72        1.5    7.1   8.8   10.6  12.3  14.1    1.5    12.1  15.2  18.2  21.3  24.3
12    108       2.2    10.6  13.2  15.8  18.5  21.1    2.2    18.0  22.5  27.1  31.6  36.1
12    144       3.0    14.0  17.5  21.0  24.5  28.0    3.0    24.2  30.2  36.3  42.3  48.3
12    180       3.7    17.5  21.9  26.3  30.7  35.0    3.7    30.1  37.6  45.2  52.8  60.3
24    36        0.9    2.1   2.6   3.2   3.7   4.2     0.9    4.4   5.4   6.5   7.6   8.7
24    72        1.7    4.2   5.3   6.3   7.4   8.4     1.7    8.7   10.8  13.0  15.2  17.4
24    108       2.6    6.3   7.9   9.5   11.1  12.7    2.6    13.1  16.4  19.6  22.9  26.1
24    144       3.5    8.4   10.5  12.6  14.7  16.8    3.5    17.3  21.7  26.1  30.4  34.7
24    180       4.4    10.5  13.2  15.8  18.4  21.1    4.3    21.8  27.2  32.7  38.1  43.6
49    36        1.1    1.4   1.7   2.0   2.4   2.7     1.1    3.8   4.8   5.7   6.7   7.7
49    72        2.3    2.7   3.4   4.1   4.8   5.4     2.2    7.7   9.7   11.6  13.5  15.4
49    108       3.4    4.1   5.1   6.1   7.1   8.1     3.4    11.6  14.4  17.3  20.2  23.1
49    144       4.5    5.4   6.8   8.2   9.5   10.9    4.5    15.3  19.2  23.0  26.8  30.7
49    180       5.7    6.8   8.5   10.2  11.9  13.6    5.6    19.1  23.9  28.7  33.6  38.4

FAS is the fixed-area sampling method, No. plot is the number of fixed-area plots, N/ha is the density of standing dead trees per ha, EZP is the EZ-Hurdle method, and PPR is the point-to-plot ratio.

The time data were further extrapolated via simulation to explore the time requirement under different sampling scenarios (Table 6.8). The data show that collecting the additional point-to-standing-dead-tree distances needed to gain greater precision under the EZ-Hurdle method can take a very large amount of additional time, and that this is sensitive both to stand conditions (density and spatial pattern) and to the quantity of additional data collected (Table 6.8). Much of this additional time comes from searching for the occasional dead tree that is very far from the plot center or random point (when PPR > 1). This is particularly problematic when dead trees are in a clustered pattern at low density, because the distribution of distances from a random point to the nearest standing dead tree is strongly right-skewed (Fig. 6.7).
The additional time was much lower when dead trees were randomly distributed (about one-third to one-half of that for clustered trees), but was still quite large compared with the base time for FAS (Table 6.8).

Figure 6.7. The distribution of distance from a random point to the nearest standing dead tree by density and spatial pattern in the simulation study. The dashed line is the average distance.

The additional time requirements were also extrapolated from the BBDMS data through the bootstrapping procedure. As in the simulation, times were much lower when the density of standing dead trees was higher (52/ha, infected forests), for all PPR values and numbers of plots, in comparison with non-infected forests, where densities were lower. The additional time required for the EZ-Hurdle method increased linearly with the number of additional random points outside the plots (Fig. 6.8).

Figure 6.8. Additional time requirement for the EZ-Hurdle method above the time cost for FAS by PPR from the field study. The solid line is infected forests and the dashed line is non-infected forests.

6.4.3. Comparison of the Coefficient of Variation by Method in Simulation

The coefficients of variation for the different sampling strategies explored via simulation are shown in Figure 6.9. The coefficients of variation decreased with increasing number of plots for both methods. The coefficients of variation of the EZP method were smaller than those of FAS for all PPR values and numbers of fixed-area plots. The improvement in the coefficient of variation from the EZP method was greater under clustered patterns than under random patterns, given the same number of fixed-area plots. As expected from the previous applied study, there were greater or similar improvements in the coefficients of variation when PPR is greater than 1, because additional locations are sampled outside the plot network. However, the data above show that these additional data come at a very high time cost.
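The decline of the coefficient of variation with the number of plots can be reproduced with a plain nonparametric bootstrap of per-plot counts. This minimal Python sketch assumes simple random resampling of plots with replacement and a density estimate equal to the mean count divided by plot area. The counts are hypothetical and zero-inflated, the helper name `bootstrap_cv` is illustrative, and the sketch covers only the FAS estimator, not the EZ-Hurdle model.

```python
import math
import random
import statistics

SUBPLOT_AREA_HA = math.pi * 7.32 ** 2 / 10_000  # about 0.0168 ha


def bootstrap_cv(plot_counts, n_plots, plot_area_ha=SUBPLOT_AREA_HA, n_reps=1000, seed=1):
    """CV (%) of the SDT/ha estimate over n_reps resamples of n_plots plots."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        sample = rng.choices(plot_counts, k=n_plots)  # resample plots with replacement
        estimates.append(statistics.mean(sample) / plot_area_ha)
    mean_estimate = statistics.mean(estimates)
    if mean_estimate == 0:
        raise ValueError("every resample was empty; CV is undefined")
    return 100 * statistics.stdev(estimates) / mean_estimate


# Hypothetical zero-inflated counts of standing dead trees per subplot.
counts = [0] * 40 + [1] * 6 + [2] * 3 + [4]
for n in (36, 72, 108, 144, 180):
    print(n, round(bootstrap_cv(counts, n), 1))
```

The CV falls roughly with the square root of the number of plots, which is the baseline FAS behaviour that the EZ-Hurdle method improves on without adding plots.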
Figure 6.9. The change of coefficient of variation of estimated density by PPR and the number of fixed-area plots.

6.4.4. Comparison of Sampling Efficiency by Method

First, relative sampling efficiency was examined via simulation. Relative sampling efficiencies for estimating the abundance of standing dead trees when the subplot was applied are shown in Table 6.9. Because the relative efficiency is greater than 1 for all densities and spatial patterns (Table 6.9), the FAS method has better sampling efficiency than the EZ-Hurdle method: the increase in precision required a relatively much larger increase in the time requirement.

Table 6.9. Relative sampling efficiency of the EZ-Hurdle method compared with FAS by density and spatial pattern.

                 Random (PPR)                      Cluster (PPR)
N/ha  No. plot   1.00  1.25  1.50  1.75  2.00     1.00  1.25  1.50  1.75  2.00
12    36         5.5   5.5   5.9   5.9   6.1      6.4   7.0   7.4   7.8   8.1
12    72         5.5   6.2   6.2   6.4   6.3      5.5   5.6   5.8   6.2   6.4
12    108        5.2   5.8   5.9   5.9   6.1      4.9   5.4   5.5   5.7   6.0
12    144        5.3   5.9   6.1   6.2   6.5      4.8   4.9   5.1   5.3   5.6
12    180        5.2   5.7   6.0   6.4   6.4      5.1   5.4   5.8   6.2   6.5
24    36         3.1   3.2   3.3   3.4   3.5      1.3   1.2   1.2   1.2   1.2
24    72         3.0   2.8   3.0   3.1   3.2      1.6   1.6   1.6   1.6   1.7
24    108        3.0   3.2   3.2   3.3   3.5      1.9   2.0   2.0   2.2   2.2
24    144        2.9   2.9   2.9   3.0   3.2      2.1   2.2   2.3   2.4   2.4
24    180        3.0   3.1   3.2   3.4   3.6      2.4   2.5   2.7   2.8   2.9
49    36         1.8   1.8   1.9   2.0   2.1      1.3   1.2   1.2   1.2   1.2
49    72         1.8   1.9   1.9   1.9   2.0      1.5   1.4   1.4   1.5   1.5
49    108        1.9   1.9   1.9   2.0   2.0      1.8   1.9   2.0   2.0   2.1
49    144        1.9   1.8   1.9   2.0   2.1      2.0   2.0   2.0   2.1   2.1
49    180        1.9   1.8   1.9   1.9   2.0      2.1   2.2   2.2   2.2   2.3

No. plot is the number of fixed-area plots, N/ha is the density of standing dead trees per ha, EZP is the EZ-Hurdle method with Poisson distribution, and PPR is the point-to-plot ratio.

In the applied study, the difference in efficiency between EZ-Hurdle and FAS (with small plots) was smaller. However, the FAS method with subplots still had better sampling efficiency than the EZ-Hurdle method in the field study (Table 6.10), and the overall patterns in the results mirrored those observed in the simulation.
Because the time requirement for the EZ-Hurdle method was more than double in non-infected forests, relative sampling efficiencies were worse when the density of standing dead trees was low (non-infected forests) than in high-density (infected) forests. In combination, the field study and simulation results indicate that, holding all else constant, it is more cost-effective to add more small (FIA-subplot-sized) plots, where possible, than to collect point-to-dead-tree distances to improve precision under the EZ-Hurdle method. However, this is not true for larger plot sizes (e.g., annular-size plots), and it is irrelevant where additional plots cannot be added because of other costs or data needs (e.g., permanent sample plots, such as those used by FIA).

Table 6.10. Relative sampling efficiency of the EZ-Hurdle method compared with FAS by infected status.

        Infected: Yes (No. plot)            Infected: No (No. plot)
PPR     36    72    108   144   180         36    72    108   144   180
1.00    1.55  1.60  1.51  1.63  1.55        3.97  3.82  3.94  3.68  3.83
1.25    1.70  1.69  1.68  1.78  1.82        3.40  4.10  4.56  4.06  4.79
1.50    1.81  1.86  1.81  1.95  1.96        4.49  4.46  4.97  4.57  5.34
1.75 1.93 2.04 1.97 2.11 2.00 2.11 2.18 2.09 2.26 4.71 4.76 5.20 4.99 4.93 4.95 5.36 5.63

6.5. Discussion for Cost and Sampling Efficiency

Sampling efficiency is an important factor in deciding the sampling strategy for estimating population parameters (Gregoire and Valentine 2008). To estimate the abundance of standing dead trees, inventory methods designed for live trees have been modified (Kenning et al. 2005). Modifications include increasing the intensity of typical sampling methods such as strip cruising, sampling with fixed-area plots, and horizontal point sampling (prism sampling); these can have poor precision and high variation, despite a considerable time investment, when applied to areas where the population is relatively low in abundance or highly variable (Bull et al. 1990).
In this study, the EZ-Hurdle method increased the precision of estimates under all conditions by introducing auxiliary data, but at a significantly increased cost under some conditions. Although the EZ-Hurdle model showed better precision than the FAS method when additional information was used to estimate the expected zero probability, it had worse sampling efficiency than the FAS method because of the relatively long time needed to find and measure the nearest dead tree, particularly at low standing dead tree densities, where zero inflation is most likely and where EZ-Hurdle performs best. In a previous study of N-tree sampling, one-tree sampling showed better sampling efficiency than the FAS method for estimating the abundance of snags when the density was greater than 70 per ha (Kenning et al. 2005); in that case, the FAS method required more than twice the time of one-tree sampling. In this study, the focus was on stands with lower abundances of standing dead trees, which made the EZ-Hurdle method less time-efficient than FAS with smaller (FIA-sized subplot) plots, even though EZ-Hurdle produced better estimates. In other words, for plots of this size it is more cost-efficient to add more FAS plots than to add more dead-tree distances in situations where zero inflation in the data is likely (very low densities of dead trees). On the other hand, the EZ-Hurdle method showed better cost efficiency than FAS with larger (FIA-sized annular) plots, which suggests, for example, that the EZ-Hurdle method might be beneficial in the Northwestern FIA region, where annular plots are currently used in some cases to estimate standing dead tree abundance.
Hence, where the number of plot locations that can be added is restricted, the EZ-Hurdle method can be an alternative approach to improving the precision of estimates of standing dead tree density, because it does so without establishing new plots. Where there is no such restriction, however, using many smaller FAS plots might be the most effective way to deal with the zero-inflated data sets that arise when dead tree abundances are relatively low. The EZ-Hurdle method might therefore be most useful in monitoring or inventory programs that use a fixed number of permanent sample plots or cannot change their sampling design, such as the BBDMS in Michigan, the FIA in the USA, and the National Inventory System in South Korea (which uses a design very similar to FIA's). The EZ-Hurdle method needs less than 5 minutes per point to find the nearest standing dead tree. In the FIA program, collecting all the data for one FIA plot, which consists of 4 subplots, takes about one day, so the additional 20 minutes or less (4 subplots × 5 minutes) needed to improve the precision of standing dead tree estimates is a relatively small portion of the total workload for that plot. Finally, given that the EZ-Hurdle method produces better estimates and its main limitation is the (time) cost of the auxiliary data, reducing that cost would improve the sampling efficiency of the method. One alternative could be to estimate the inclusion probability of a standing dead tree under the sampling design from stand attributes such as forest type, density, spatial pattern, or basal area. In this study, the inclusion probability differed predictably by density of standing dead trees, species, and spatial pattern in most cases, which suggests that this approach is promising. This will be the subject of further research on the EZ-Hurdle method.
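The per-plot workload figure above is simple arithmetic. A worked version follows, assuming an 8-hour (480-minute) field day as a stand-in for "about one day"; the day length is an assumption, not a figure from this study.

```latex
t_{\text{extra}} \le 4~\text{subplots} \times 5~\tfrac{\text{min}}{\text{subplot}} = 20~\text{min},
\qquad
\frac{t_{\text{extra}}}{t_{\text{plot}}} \le \frac{20~\text{min}}{480~\text{min}} \approx 4.2\%
```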
Another alternative could be a distance-limited method that restricts the search radius for the nearest standing dead tree; this is pursued in the next chapter.

7. Developing a Distance-Limited EZ-Hurdle Method

7.1. Overview

Both the field and simulation studies showed that the time required to find the nearest standing dead tree increased dramatically as the density of standing dead trees decreased, because at some points in space the nearest dead tree is far away and takes a long time to find. Therefore, a distance-limited EZ-Hurdle method was explored, in which various maximum search radii were applied when searching for the nearest standing dead tree.

7.2. Methods

Simulation data were used again, following the same basic design as described previously, except for a limit on the search distance; under the original EZ-Hurdle method, there is no distance restriction when searching for the nearest standing dead tree. In this new simulation study, a subplot (7.32 m radius) was again used to collect the standing dead tree data, but six scenarios were used to define the maximum search radius for the nearest standing dead tree: the first four at 2 m increments beyond the plot boundary, the fifth out to the boundary of an FIA annular plot, and the sixth at unlimited distance from the point (Table 7.1). Following the previous design (see Table 5.2), five different PPRs were applied to collect the data, and two spatial patterns, random and cluster I, were combined with three densities (12, 24, and 49/ha) to define the underlying populations. For each scenario, 1,000 repetitions were run to examine the properties of the distance-limited EZ-Hurdle method.

Table 7.1. Maximum search radii used to find the nearest standing dead tree.
Maximum search radius (m): 9.32, 11.32, 13.32, 15.32, 17.94, Unlimited

The coefficients of variation of the density estimates and of the expected zero probability were calculated for each search radius.
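The search rule in this simulation design can be sketched as follows. This is a minimal illustration, not the simulation code used in this study: it assumes a homogeneous Poisson (CSR) pattern in a square stand, so it stands in for the random pattern only (no cluster process), it ignores edge effects for the unlimited search, and all function names and the stand size are hypothetical. Each sample point records the FIA-subplot count and the nearest-tree distance, which is treated as not found (None) when it lies beyond the maximum search radius.

```python
import math
import random

PLOT_RADIUS_M = 7.32                                           # FIA subplot radius
SEARCH_RADII_M = [9.32, 11.32, 13.32, 15.32, 17.94, math.inf]  # Table 7.1


def csr_stand(density_per_ha, side_m, rng):
    """Hypothetical helper: homogeneous Poisson (CSR) tree locations in a square stand."""
    expected = density_per_ha * side_m * side_m / 10_000
    n, t = 0, rng.expovariate(1.0)
    while t < expected:            # unit-rate arrivals in [0, expected] give a Poisson(expected) count
        n += 1
        t += rng.expovariate(1.0)
    return [(rng.uniform(0, side_m), rng.uniform(0, side_m)) for _ in range(n)]


def measure_point(trees, x, y, max_radius):
    """Return (trees inside the subplot, nearest-tree distance or None if beyond max_radius)."""
    dists = [math.hypot(tx - x, ty - y) for tx, ty in trees]
    count = sum(1 for d in dists if d <= PLOT_RADIUS_M)
    nearest = min(dists, default=math.inf)
    if math.isfinite(nearest) and nearest <= max_radius:
        return count, nearest
    return count, None             # search stopped at the limit (or no tree at all)


def sample_stand(density_per_ha, n_points, max_radius, seed=1, side_m=400.0, buffer_m=20.0):
    """Simulate one stand and n_points sample points; the seed fixes both stand and points."""
    rng = random.Random(seed)
    trees = csr_stand(density_per_ha, side_m, rng)
    records = []
    for _ in range(n_points):
        # keep points away from the edge so finite-radius searches are not cut by the boundary
        x = rng.uniform(buffer_m, side_m - buffer_m)
        y = rng.uniform(buffer_m, side_m - buffer_m)
        records.append(measure_point(trees, x, y, max_radius))
    return records


if __name__ == "__main__":
    for r_max in SEARCH_RADII_M:
        recs = sample_stand(12, 200, r_max)
        zeros = sum(1 for c, _ in recs if c == 0) / len(recs)
        truncated = sum(1 for _, d in recs if d is None) / len(recs)
        print(f"R_max={r_max}: zero plots {zeros:.2f}, truncated searches {truncated:.2f}")
```

Because the same seed reproduces the same stand and points for every radius, the zero-plot proportion is identical across radii and only the share of truncated searches changes, mirroring a design in which the fixed-area data stay the same and only the auxiliary distances are limited.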
Changes in the coefficient of variation were compared across densities of standing dead trees and spatial patterns. To evaluate the time requirement by search radius, time was calculated using the regression models developed previously from the BBDMS and PRC data (see Tables 6.2 and 6.3).

7.3. Results

Comparison of coefficient of variation by search radii

Tables 7.2 and 7.3 show the coefficients of variation of the estimates (N/ha) and of the expected zero probability (EZP) by number of fixed-area plots and PPR when the density of standing dead trees is 12 per ha and the spatial pattern is random and clustered, respectively. As in previous analyses, the EZ-Hurdle method had a smaller coefficient of variation than FAS for all search radii and PPRs under both spatial patterns (Tables 7.2 and 7.3). The coefficients of variation of the estimates and of the expected zero probabilities were strongly related: when the coefficient of variation of the expected zero probability decreased, the coefficient of variation of the estimate also decreased, because the expected zero probability in the EZ-Hurdle model reduces the uncertainty caused by zero observations in the data. Because this held at every search radius tested, the EZ-Hurdle method still gives a benefit even when one searches only a short distance outside the FAS plot.

Table 7.2. The CV of estimates (N/ha) and expected zero probability (EZP) by PPR and search radius (m) for the EZ-Hurdle method when the spatial pattern of standing dead trees is random and density is 12/ha.
                                Search radius (m)
No. fixed  PPR   Value  FAS     9.32   11.32  13.32  15.32  17.94  ∞
36         1.00  N/ha   38.24   34.26  33.39  33.33  33.66  34.34  37.36
                 EZP    8.35    8.02   7.57   7.46   7.50   7.62   8.04
           1.25  N/ha   38.24   31.04  30.06  30.09  30.58  31.40  34.12
                 EZP    8.35    7.08   6.62   6.57   6.66   6.82   7.21
           1.50  N/ha   38.24   29.46  28.60  28.77  29.28  29.98  32.51
                 EZP    8.35    6.51   6.13   6.11   6.22   6.36   6.72
           1.75  N/ha   38.24   27.93  27.18  27.32  27.69  28.19  30.47
                 EZP    8.35    6.05   5.72   5.71   5.79   5.89   6.23
           2.00  N/ha   38.24   26.85  26.15  26.24  26.61  27.12  29.14
                 EZP    8.35    5.71   5.40   5.38   5.46   5.56   5.85
72         1.00  N/ha   25.58   23.57  22.94  22.85  23.07  23.86  24.91
                 EZP    5.55    5.21   4.97   4.93   4.97   5.12   5.43
           1.25  N/ha   25.58   21.83  21.34  21.30  21.43  22.11  24.06
                 EZP    5.55    4.75   4.54   4.52   4.54   4.67   4.95
           1.50  N/ha   25.58   20.22  19.72  19.71  19.85  20.38  22.24
                 EZP    5.55    4.34   4.13   4.11   4.14   4.25   4.52
           1.75  N/ha   25.58   19.03  18.57  18.58  18.82  19.39  21.10
                 EZP    5.55    4.02   3.82   3.81   3.87   3.99   4.25
           2.00  N/ha   25.58   17.91  17.46  17.47  17.68  18.14  19.70
                 EZP    5.55    3.73   3.55   3.54   3.59   3.69   3.93
108        1.00  N/ha   20.94   18.85  18.23  18.25  18.54  19.14  19.88
                 EZP    4.67    4.19   3.97   3.97   4.04   4.17   4.44
           1.25  N/ha   20.94   17.22  16.68  16.73  17.04  17.58  19.17
                 EZP    4.67    3.74   3.57   3.57   3.64   3.76   4.01
           1.50  N/ha   20.94   16.09  15.61  15.64  15.89  16.39  17.82
                 EZP    4.67    3.41   3.25   3.26   3.32   3.43   3.66
           1.75  N/ha   20.94   15.12  14.67  14.67  14.90  15.30  16.64
                 EZP    4.67    3.15   3.00   3.00   3.05   3.14   3.37
           2.00  N/ha   20.94   14.55  14.11  14.10  14.32  14.71  15.91
                 EZP    4.67    2.99   2.85   2.84   2.89   2.98   3.19
No. fixed is the number of fixed-area plots, PPR is point to plot ratio, FAS is fixed-area sampling method, and EZP is expected zero probability.

Table 7.2. Continued.
                                Search radius (m)
No. fixed  PPR   Value  FAS     9.32   11.32  13.32  15.32  17.94  ∞
144        1.00  N/ha   17.85   16.31  15.94  15.89  16.16  16.56  17.12
                 EZP    3.96    3.59   3.45   3.44   3.50   3.58   3.81
           1.25  N/ha   17.85   14.89  14.52  14.46  14.70  15.06  16.42
                 EZP    3.96    3.18   3.05   3.04   3.09   3.17   3.39
           1.50  N/ha   17.85   14.04  13.65  13.58  13.80  14.15  15.42
                 EZP    3.96    2.95   2.82   2.80   2.86   2.93   3.14
           1.75  N/ha   17.85   13.44  13.04  13.00  13.22  13.55  14.62
                 EZP    3.96    2.79   2.66   2.65   2.70   2.77   2.94
           2.00  N/ha   17.85   12.86  12.48  12.46  12.68  13.00  14.05
                 EZP    3.96    2.61   2.49   2.49   2.54   2.60   2.78
180        1.00  N/ha   16.16   14.75  14.26  14.23  14.45  14.90  15.42
                 EZP    3.54    3.20   3.03   3.02   3.07   3.17   3.36
           1.25  N/ha   16.16   13.47  12.99  12.99  13.17  13.57  14.71
                 EZP    3.54    2.88   2.72   2.72   2.76   2.84   3.02
           1.50  N/ha   16.16   12.75  12.29  12.28  12.45  12.81  13.92
                 EZP    3.54    2.68   2.53   2.52   2.56   2.64   2.81
           1.75  N/ha   16.16   12.27  11.86  11.84  12.01  12.37  13.42
                 EZP    3.54    2.55   2.41   2.40   2.44   2.52   2.69
           2.00  N/ha   16.16   11.58  11.22  11.19  11.33  11.66  12.66
                 EZP    3.54    2.37   2.25   2.24   2.27   2.34   2.51

For example, even when the maximum search radius was 9.32 m, which is 2 m beyond the FIA subplot boundary, the EZ-Hurdle method had a smaller coefficient of variation than FAS (Tables 7.2 and 7.3). Therefore, the distance-limited EZ-Hurdle method can improve precision without a heavy investment of search time to find the nearest standing dead trees. To confirm this trend, the coefficients of variation of the estimate and of the expected zero probability were also examined at two additional densities, 24 and 49 per ha.

Table 7.3. The CV of estimates (N/ha) and expected zero probability (EZP) by PPR and search radius (m) for the EZ-Hurdle method when the spatial pattern of standing dead trees is clustered and density is 12/ha.
                                Search radius (m)
No. fixed  PPR   Value  FAS     9.32   11.32  13.32  15.32  17.94  ∞
36         1.00  N/ha   39.51   36.54  35.59  34.54  34.02  33.44  33.12
                 EZP    7.53    7.32   6.64   6.28   6.16   6.07   6.14
           1.25  N/ha   39.51   34.68  33.61  32.61  32.11  31.67  31.35
                 EZP    7.53    6.52   5.93   5.60   5.49   5.43   5.53
           1.50  N/ha   39.51   32.37  31.61  30.94  30.47  29.95  29.59
                 EZP    7.53    5.78   5.31   5.07   4.98   4.91   5.02
           1.75  N/ha   39.51   31.07  30.27  29.63  29.14  28.71  28.32
                 EZP    7.53    5.45   5.00   4.78   4.69   4.64   4.73
           2.00  N/ha   39.51   29.73  28.99  28.45  27.99  27.59  27.11
                 EZP    7.53    5.00   4.60   4.41   4.32   4.28   4.34
72         1.00  N/ha   27.69   25.99  25.21  24.77  24.24  23.69  22.93
                 EZP    5.22    4.91   4.57   4.44   4.33   4.26   4.31
           1.25  N/ha   27.69   23.82  22.88  22.50  22.07  21.51  20.96
                 EZP    5.22    4.26   3.91   3.79   3.71   3.64   3.71
           1.50  N/ha   27.69   22.42  21.55  21.17  20.81  20.33  19.60
                 EZP    5.22    3.91   3.59   3.48   3.41   3.35   3.37
           1.75  N/ha   27.69   21.58  20.83  20.38  20.04  19.60  18.91
                 EZP    5.22    3.71   3.42   3.29   3.23   3.18   3.19
           2.00  N/ha   27.69   20.62  19.93  19.49  19.16  18.73  18.12
                 EZP    5.22    3.50   3.23   3.11   3.05   3.00   3.02
108        1.00  N/ha   23.28   21.46  20.71  20.26  19.86  19.42  18.77
                 EZP    4.42    3.98   3.68   3.56   3.50   3.44   3.50
           1.25  N/ha   23.28   20.52  19.79  19.38  19.00  18.58  17.83
                 EZP    4.42    3.66   3.38   3.27   3.21   3.15   3.18
           1.50  N/ha   23.28   19.12  18.47  18.07  17.72  17.35  16.63
                 EZP    4.42    3.31   3.05   2.95   2.89   2.85   2.87
           1.75  N/ha   23.28   17.89  17.30  16.95  16.63  16.31  15.69
                 EZP    4.42    3.06   2.83   2.73   2.67   2.63   2.66
           2.00  N/ha   23.28   17.33  16.79  16.47  16.17  15.85  15.19
                 EZP    4.42    2.89   2.68   2.59   2.54   2.50   2.51
No. fixed is the number of fixed-area plots, PPR is point to plot ratio, FAS is fixed-area sampling method, and EZP is expected zero probability.

Table 7.3. Continued.
                                Search radius (m)
No. fixed  PPR   Value  FAS     9.32   11.32  13.32  15.32  17.94  ∞
144        1.00  N/ha   20.17   18.67  17.90  17.55  17.22  16.79  16.25
                 EZP    3.86    3.49   3.22   3.13   3.08   3.03   3.07
           1.25  N/ha   20.17   17.04  16.28  15.92  15.61  15.24  14.78
                 EZP    3.86    3.10   2.84   2.75   2.70   2.65   2.70
           1.50  N/ha   20.17   16.02  15.33  14.99  14.70  14.39  13.92
                 EZP    3.86    2.89   2.65   2.57   2.52   2.49   2.53
           1.75  N/ha   20.17   15.36  14.70  14.38  14.11  13.83  13.25
                 EZP    3.86    2.72   2.50   2.42   2.37   2.35   2.36
           2.00  N/ha   20.17   14.73  14.11  13.80  13.56  13.30  12.74
                 EZP    3.86    2.58   2.37   2.29   2.25   2.23   2.23
180        1.00  N/ha   17.65   16.83  16.29  15.91  15.63  15.32  14.69
                 EZP    3.33    3.10   2.90   2.80   2.75   2.72   2.72
           1.25  N/ha   17.65   15.54  15.05  14.73  14.47  14.22  13.66
                 EZP    3.33    2.80   2.61   2.53   2.49   2.46   2.47
           1.50  N/ha   17.65   14.82  14.34  14.06  13.83  13.60  13.09
                 EZP    3.33    2.63   2.45   2.38   2.34   2.32   2.32
           1.75  N/ha   17.65   14.13  13.73  13.50  13.30  13.08  12.59
                 EZP    3.33    2.44   2.28   2.22   2.19   2.17   2.16
           2.00  N/ha   17.65   13.50  13.15  12.95  12.77  12.57  12.14
                 EZP    3.33    2.28   2.13   2.07   2.04   2.03   2.03

Figures 7.1 and 7.2 show the coefficients of variation of the estimated density of standing dead trees (N/ha) and of the expected zero probability, respectively, by spatial pattern and search radius when the density of standing dead trees is 24 and 49 per ha. The same trends were observed as when the density was 12 per ha. These results indicate that the coefficient of variation of the density estimate from the EZ-Hurdle method is always smaller than that of FAS for both spatial patterns and all PPRs whenever one searches beyond the plot radius. One unexpected pattern was that, under a random pattern, the coefficient of variation decreased until the search radius reached 11.32 to 13.32 m and then increased, or stayed very similar, as the search radius was extended up to an unlimited distance, for all numbers of fixed-area plots and PPRs (Tables 7.2 and 7.3 and Figs.
7.1 and 7.2).

Figure 7.1. The coefficients of variation (CV) of estimated density of standing dead trees by spatial pattern and search radius. FAS is the fixed-area sampling method, PPR is the point to plot ratio, and Inf. means an unlimited search radius.

Figure 7.2. The coefficients of variation (CV) of estimated expected zero probability (EZP) by spatial pattern and search radius. FAS is the fixed-area sampling method, PPR is the point to plot ratio, and Inf. means an unlimited search radius was used.

In contrast, under the clustered pattern the coefficient of variation decreased continuously as the search radius limit increased, up to an unlimited search. It was expected that the expected zero probability for the given plot radius (7.32 m in this study) would have less variation when there is no distance limit on the search for the nearest standing dead tree. However, according to Table 7.2 and Figure 7.1, the EZ-Hurdle method had the best precision when the maximum search radius was 11.32 or 13.32 m under a random spatial pattern. This unusual result was checked several times but persisted, suggesting that under some conditions the distance-limited approach can be more precise than the distance-unlimited approach. In general, the model selected as the best model for estimating the inclusion probability was the one with the smallest sum of squared errors for the data. For a parametric method in particular, the sum of squared errors can become worse when more information or data is added to fit the model, if the additional information introduces more variability. It is therefore not guaranteed that the estimated inclusion probability within the 7.32 m radius is the same for all search radii (Fig. 7.3). For example, the inclusion probability was 0.34 when the maximum search radius was 9.32 or 11.32 m, but it decreased to 0.33 when the maximum search radius was greater than 11.32 m (Fig. 7.3).
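The least-squares fitting step discussed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the model selected in this study: it assumes the CSR form F(r) = 1 − exp(−λπr²) for point-to-nearest-tree distances, fits λ by least squares on the linearised empirical distribution function (a regression through the origin), treats distances beyond the search limit as censored, and then plugs the model-based zero probability into a hurdle-style density estimate. All function names are hypothetical. Because the search limit changes which distances enter the fit, the fitted zero probability can shift with the radius, which is the mechanism invoked above; under a clustered pattern the CSR form misfits, so distant observations can pull the fit in either direction.

```python
import math

PLOT_RADIUS_M = 7.32                                   # FIA subplot radius
PLOT_AREA_HA = math.pi * PLOT_RADIUS_M ** 2 / 10_000   # roughly 0.0168 ha


def fit_csr_intensity(distances, max_radius=math.inf):
    """Least-squares fit of lambda (trees per m^2) in F(r) = 1 - exp(-lambda * pi * r^2).

    distances  -- nearest-dead-tree distance (m) for every sample point; math.inf if none found
    max_radius -- search limit (m); distances beyond it are treated as censored
    """
    n_points = len(distances)
    found = sorted(d for d in distances if math.isfinite(d) and d <= max_radius)
    sxy = sxx = 0.0
    for rank, d in enumerate(found, start=1):
        f_hat = rank / (n_points + 1)   # empirical CDF; censored points enter only through n_points
        x = math.pi * d * d             # linearised model: -ln(1 - F(r)) = lambda * pi * r^2
        y = -math.log(1.0 - f_hat)
        sxy += x * y
        sxx += x * x
    if sxx == 0.0:
        raise ValueError("no positive distances within the search radius")
    return sxy / sxx                    # least-squares slope through the origin


def expected_zero_probability(lam, plot_radius=PLOT_RADIUS_M):
    """Model-based probability that a plot of this radius contains no dead tree."""
    return math.exp(-lam * math.pi * plot_radius ** 2)


def ez_hurdle_density(plot_counts, p0):
    """Hurdle-style density (trees/ha): (1 - p0) * mean of the nonzero plot counts / plot area."""
    positive = [c for c in plot_counts if c > 0]
    if not positive:
        return 0.0                      # no nonzero plots: truncated mean undefined, report zero
    return (1.0 - p0) * (sum(positive) / len(positive)) / PLOT_AREA_HA


if __name__ == "__main__":
    # Hypothetical data: 8 sample points (PPR = 2); the first 4 are fixed-area plot centres.
    dists = [3.1, 12.4, 8.8, 15.9, 6.0, 21.7, 10.2, math.inf]
    counts = [1, 0, 0, 0]
    for r_max in (9.32, 13.32, 17.94, math.inf):
        p0 = expected_zero_probability(fit_csr_intensity(dists, r_max))
        print(f"R_max={r_max}: p0={p0:.3f}, N/ha={ez_hurdle_density(counts, p0):.1f}")
```

For any count distribution, (1 − p₀) times the mean nonzero count equals the mean count, so with the correct p₀ the last function targets the ordinary mean density; any precision gain comes from p̂₀ being estimated from more points (PPR > 1) or with less variance than the observed zero proportion.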
When events (standing dead trees in this study) are randomly distributed, the distribution of nearest-neighbor distances from random points should be the same as that of event-to-event nearest-neighbor distances. This means that there is no inter-event interaction, i.e., no attraction or repulsion reflected in point-to-event or event-to-event distances. However, searching a large area for the nearest standing dead tree may capture an inter-event interaction different from the one operating within a small search area, because the spatial pattern of events (standing dead trees), as defined by inter-event interaction, is very sensitive to spatial scale (Bailey and Gatrell 1995). Therefore, when information collected from search radii greater than 13.32 m was added to model the inclusion probability, the additional data could introduce noise, such as a different inter-event interaction, and increase the variation in the estimated inclusion probability for the given plot radius. In other words, adding more data does not guarantee better precision in estimating the inclusion probability.

Figure 7.3. The change of inclusion probability of a standing dead tree by increasing the search radius when density of standing dead trees is 24/ha and spatial pattern is clustered.

Comparison of time requirement by search radius

Figure 7.4 shows the additional time requirement of the EZ-Hurdle method by maximum search radius. When the search radius is limited, the EZ-Hurdle method takes much less time to collect the distance information. For example, when the maximum search radius is 9.32 m, more than 70% of the time requirement can be saved compared with the distance-unlimited method. If the maximum search radius is 17.95 m, more than 50% of the time requirement can be saved.
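The argument about random patterns rests on the standard nearest-neighbour result for complete spatial randomness (see, e.g., Bailey and Gatrell 1995). A short derivation, assuming a homogeneous Poisson pattern with intensity λ (trees per m²): the number of trees in a disc of radius r is Poisson with mean λπr², so the probability that a random point has no tree within r gives the point-to-tree distribution F, and under CSR the tree-to-tree distribution G has the same form.

```latex
\begin{aligned}
P(D > r) &= e^{-\lambda \pi r^{2}}, \qquad F(r) = G(r) = 1 - e^{-\lambda \pi r^{2}},\\
p_{0} &= 1 - F(7.32) = e^{-\lambda \pi (7.32)^{2}},\\
E[D] &= \int_{0}^{\infty} P(D > r)\,dr = \frac{1}{2\sqrt{\lambda}}.
\end{aligned}
```

Under this idealized model, densities of 12, 24, and 49/ha (λ = 0.0012, 0.0024, and 0.0049 m⁻²) give p₀ ≈ 0.82, 0.67, and 0.44 for a 7.32 m subplot and mean nearest-tree distances of about 14.4, 10.2, and 7.1 m. These are theoretical CSR values, not simulation results, but they show why unbounded searches become long at low densities, and why clustered patterns, which depart from this form, need not behave the same way.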
In general, for both spatial patterns and all densities, the additional time requirement increased rapidly when the search radius changed from 17.95 m to an unlimited distance (denoted Inf. in Fig. 7.4). Under a random pattern, the additional time requirement increased only moderately up to a search radius of 17.95 m, compared with a clustered pattern. In particular, when the density of standing dead trees was 49 per ha under a random pattern, the time requirement was similar for all search radii up to 17.95 m. Thus, much of the additional time cost comes from searching for a nearest standing dead tree that is, on occasion, very far from the point; this cost, which may even reduce the quality of the estimate under some conditions (Figs. 7.1 and 7.2), can be saved with the distance-limited method.

Figure 7.4. Additional time requirement for the EZ-Hurdle method by maximum search radius. PPR is point to plot ratio, FAS is fixed-area sampling method, and Inf. is unlimited distance.

7.4. Discussion for Distance-Limited EZ-Hurdle Method

The distance-limited EZ-Hurdle method showed better precision than FAS for all restricted search radii, spatial patterns, and densities. Moreover, it still showed better precision even when the PPR was 1. Therefore, the distance-limited EZ-Hurdle method can be applied to estimate the density of standing dead trees without changing plot designs such as those used in the FIA and FHM programs. According to the time requirement results, the distance-limited EZ-Hurdle method was much more cost-efficient than the standard EZ-Hurdle method, which sets no distance limit for finding the nearest standing dead tree. When the search radius was 2 m greater than the subplot radius, the additional time requirement was more than 70% lower than under the distance-unlimited method.
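The sharp rise in time between 17.95 m and an unlimited search is consistent with how often a bounded search would stop at its limit. A rough sketch follows, assuming complete spatial randomness (so it speaks only to the random pattern, not cluster I) and using hypothetical function names: under CSR the share of sample points with no dead tree inside radius R is exp(−λπR²), and those are exactly the points where an unlimited search would have to keep going. The finite radii are taken from Table 7.1, which lists the annular limit as 17.94 m.

```python
import math

SEARCH_RADII_M = [9.32, 11.32, 13.32, 15.32, 17.94]   # finite limits from Table 7.1
DENSITIES_PER_HA = [12, 24, 49]                       # simulated densities


def truncated_share(density_per_ha, max_radius_m):
    """CSR probability that no dead tree lies within max_radius_m of a random point."""
    lam = density_per_ha / 10_000                     # trees per m^2
    return math.exp(-lam * math.pi * max_radius_m ** 2)


def truncation_table():
    """Share of points whose search would stop at each limit, by density."""
    return {d: [truncated_share(d, r) for r in SEARCH_RADII_M] for d in DENSITIES_PER_HA}


if __name__ == "__main__":
    print("density/ha  " + "  ".join(f"{r:>5}" for r in SEARCH_RADII_M))
    for d, shares in truncation_table().items():
        print(f"{d:>10}  " + "  ".join(f"{s:5.2f}" for s in shares))
```

Under these assumptions, roughly 30% of points at 12/ha still have no dead tree within 17.94 m, whereas at 49/ha the share falls below 10% by the 13.32 m limit, which may help explain why the time curve for 49/ha under a random pattern stays nearly flat up to 17.95 m.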
In addition, more than 50% of the time requirement can be saved when the maximum search radius equals the FIA annular plot radius (17.95 m). Although the maximum search radius has to be decided based on budget or sampling design, the results of this study suggest that 13.32 m is the best choice when standing dead trees are randomly distributed. When standing dead trees are clustered, 17.95 m appears to be the best maximum search radius, because the precision of estimates increases as the search radius is extended and more than 50% of the time requirement of the standard EZ-Hurdle method can still be saved. With a maximum search radius of 17.95 m (equal to the annular plot), the average time investment per plot should be less than 2 minutes, based on the BBDMS data. In practice, however, the true spatial pattern of standing dead trees is unknown. These results therefore suggest that the most efficient application of the method for the FIA / FHM program would be the 13.32 m radius distance-limited EZ-Hurdle method, which should perform best overall in the typical case where the spatial pattern of standing dead trees is unknown. It should allow modest gains in the precision of standing dead tree density estimates under the current FIA or FHM plot design, without changing the plot design and with a relatively small investment of time and cost at each plot.

8. Conclusion

Based on both simulation studies and applied studies in real forests, the EZ-Hurdle method can improve the precision of an estimate where zero-inflated data are a source of estimation error. However, the method requires additional (auxiliary) data to estimate the expected zero proportion in the data, in this case data describing the distance between a fixed-area plot center or random point and the nearest standing dead tree.
These additional data come at an additional time cost, which can be quite large, as shown here, and can make the method less cost-efficient than cheaper methods, even though those methods produce lower-quality estimates. For the specific case of improving estimates of standing dead tree density from fixed-area plots, the EZ-Hurdle method produces better but less cost-efficient estimates than the FAS method with smaller (e.g., 7 m radius) fixed-area plots, because the time to search for standing dead trees can be quite long compared with establishing a small fixed-radius plot, which quite often contains few or no dead trees. However, it proved much more time-cost competitive than using larger (e.g., 17.95 m) fixed-area plots. A search-distance-limited variant of the EZ-Hurdle method showed promise for reducing the cost inefficiency of the method. The results of this study also suggest that the expected zero probability for including standing dead trees in fixed-area plots may be estimated from simple stand data, such as forest type and basal area, at little or no additional cost, which would make the method even more cost-efficient. For standing dead trees, the EZ-Hurdle method is best applied where zero inflation in the data is likely and where adding samples is not feasible. In this dissertation, the EZ-Hurdle method was applied to estimate the density of standing dead trees. Because the EZ-Hurdle method performs better when there are many or excess zero observations in the data, it might also be used to estimate the density of rare species or other low-abundance populations. Future work is planned to explore applying the EZ-Hurdle method to estimate the carbon sequestration of dead trees.
Further research is needed to examine the properties of the inclusion probability of a standing dead tree for different forest types, because it may be used to define an “expected mortality” benchmark for forest health monitoring and for understanding stand mortality processes.

9. References

Affleck, D.L.R. 2006. Poisson mixture models for regression analysis of stand-level mortality. Can. J. For. Res. 36(11):2994-3006.
Avery, T.E., and H.E. Burkhart. 1983. Forest measurements. McGraw-Hill Inc., New York.
Baddeley, A., and R. Turner. 2005. Spatstat: an R package for analyzing spatial point patterns. Journal of Statistical Software 12(6):1-42.
Bailey, T.C., and A.C. Gatrell. 1995. Interactive spatial data analysis. Longman Scientific & Technical, Essex.
Bechtold, W.A., and P.L. Patterson. 2005. The enhanced Forest Inventory and Analysis program: national sampling design and estimation procedures. Gen. Tech. Rep. SRS-80. Asheville, NC: USDA For. Serv., Southern Research Station. 85 p.
Bull, E.L., R.S. Holthausen, and D.B. Marx. 1990. How to determine snag density. West. J. Appl. For. 5(2):56-58.
Bütler, R., and R. Schlaepfer. 2004. Spruce snag quantification by coupling colour infrared aerial photos and a GIS. For. Ecol. Manag. 195(3):325-339.
Clark, D.B., C.S. Castro, L.D.A. Alvarado, and J.M. Read. 2004. Quantifying mortality of tropical rain forest trees using high-spatial-resolution satellite data. Ecology Letters 7(1):52-59.
Cline, S.P., A.B. Berg, and H.M. Wight. 1980. Snag characteristics and dynamics in Douglas-fir forests, western Oregon. J. Wildl. Manage. 44(4):773-786.
Curtis, R.O., and D.D. Marshall. 2005. Permanent-plot procedures for silvicultural and yield research. Gen. Tech. Rep. PNW-GTR-634. USDA For. Serv.
Delaney, M., S. Brown, A.E. Lugo, A. Torres-Lezama, and N. Bello Quintero. 1997. The distribution of organic carbon in major components of forests located in five life zones of Venezuela. J. Trop. Ecol. 13(5):697-708.
Dorazio, R.M. 1999. Design-based and model-based inference in surveys of freshwater mollusks. J. N. Am. Benthol. Soc. 18(1):118-131.
Dueser, R.D., and H.H. Shugart Jr. 1978. Microhabitats in a Forest-Floor Small Mammal Fauna. Ecology 59(1):89-98.
Efron, B., and R. Tibshirani. 1993. An introduction to the bootstrap. Chapman & Hall/CRC, New York.
Eskelson, B.N.I., H. Temesgen, and T.M. Barrett. 2009. Estimating cavity tree and snag abundance using negative binomial regression models and nearest neighbor imputation methods. Can. J. For. Res. 39(9):1749-1765.
Fisher, R.A. 1922. The Accuracy of the Plating Method of Estimating the Density of Bacterial Populations. Annals of Applied Biology 9:325-359.
Franklin, J.F., F. Hall, W. Laudenslayer, C. Maser, J. Nunan, J. Poppino, C.J. Ralph, and T. Spies. 1986. Interim definitions for old-growth Douglas-fir and mixed-conifer forests in the Pacific Northwest and California. Research note PN-447. USDA For. Serv., Portland, Oregon.
Franklin, J.F., H.H. Shugart, and M.E. Harmon. 1987. Tree Death as an Ecological Process. BioScience 37(8):550-556.
Fridman, J., and M. Walheim. 2000. Amount, structure, and dynamics of dead wood on managed forestland in Sweden. For. Ecol. Manag. 131(1-3):23-36.
Ganey, J.L. 1999. Snag density and composition of snag populations on two National Forests in northern Arizona. For. Ecol. Manag. 117(1-3):169-178.
Gray, A. 2003. Monitoring stand structure in mature coastal Douglas-fir forests: effect of plot size. For. Ecol. Manag. 175(1-3):1-16.
Green, P., and G.F. Peterken. 1997. Variation in the amount of dead wood in the woodlands of the Lower Wye Valley, UK in relation to the intensity of management. For. Ecol. Manag. 98(3):229-238.
Gregoire, T. 1998. Design-based and model-based inference in survey sampling: Appreciating the difference. Can. J. For. Res. 28(10):1429-1447.
Gregoire, T.G., and H.T. Valentine. 2008. Sampling techniques for natural and environmental resources. Chapman & Hall/CRC.
Greif, G.E., and O.W. Archibold. 2000. Standing-dead tree component of the boreal forest in central Saskatchewan. For. Ecol. Manag. 131(1-3):37-46.
Harmon, M.E., W.K. Ferrell, and J.F. Franklin. 1990. Effects on Carbon Storage of Conversion of Old-Growth Forests to Young Forests. Science 247(4943):699-702.
Harmon, M.E., J.F. Franklin, F.J. Swanson, P. Sollins, S.V. Gregory, J.D. Lattin, N.H. Anderson, S.P. Cline, and N.G. Aumen. 1986. Ecology of coarse woody debris in temperate ecosystems. Adv. Ecol. Res. 15:133-302.
Hilbe, J. 2007. Negative binomial regression. Cambridge University Press, New York.
Jaeger, R.G. 1980. Microhabitats of a terrestrial forest salamander. Copeia 1980(2):265-268.
Keenan, R.J., C.E. Prescott, and J.P.H. Kimmins. 1993. Mass and nutrient content of woody debris and forest floor in western red cedar and western hemlock forests on northern Vancouver Island. Can. J. For. Res. 23(6):1052-1059.
Kenning, R.S., M.J. Ducey, J.C. Brissette, and J.H. Gove. 2005. Field efficiency and bias of snag inventory methods. Can. J. For. Res. 35(12):2900-2910.
Kimmins, J.P. 1992. Balancing Act: Environmental Issues in Forestry. UBC Press, Vancouver.
Krankina, O.N., and M.E. Harmon. 1995. Dynamics of the dead wood carbon pool in northwestern Russian boreal forests. Water, Air, & Soil Pollution 82(1):227-238.
Lee, P.C., S. Crites, M. Nietfeld, H.V. Nguyen, and J.B. Stelfox. 1997. Characteristics and origins of deadwood material in aspen-dominated boreal forests. Ecological Applications 7(2):691-701.
Lessard, V., D.D. Reed, and N. Monkevich. 1994. Comparing n-tree distance sampling with point and plot sampling in northern Michigan forest types. North. J. Appl. For. 11(1):12-16.
Maser, C., and J.M. Trappe. 1984. The seen and unseen world of the fallen tree. USDA For. Serv. Gen. Tech. Rep. PNW-GTR-164. 153 p.
Matern, B. 1986. Spatial variation, volume 36 of Lecture Notes in Statistics. New York: Springer-Verlag, second edition.
McCarthy, B.C., and R.R. Bailey. 1994. Distribution and abundance of coarse woody debris in a managed forest landscape of the central Appalachians. Can. J. For. Res. 24(7):1317-1329.
McClelland, B.R. 1977. Relationships between hole-nesting birds, forest snags, and decay in western larch-douglas-fir forests of the northern Rocky Mountains. Dissertation, University of Montana, Missoula, Montana, USA.
McComb, W.C., T.A. Spies, and W.H. Emmingham. 1993. Douglas-Fir Forests: Managing for Timber and Mature-Forest Habitat. J. Forestry 91(12):31-42.
McCullough, D.G., R.L. Heyd, and J.G. O'Brien. 2001. Biology and management of beech bark disease. Extension Bull. E-2746, Michigan State University Extension Service.
Montes, F., and I. Canellas. 2006. Modelling coarse woody debris dynamics in even-aged Scots pine forests. For. Ecol. Manag. 221(1-3):220-232.
O'Laughlin, J., and P.S. Cook. 2003. Inventory-based forest health indicators: Implications for national forest management. J. Forestry 101(2):11-17.
Oswalt, S.N., T.J. Brandeis, and C.W. Woodall. 2008. Contribution of Dead Wood to Biomass and Carbon Stocks in the Caribbean: St. John, US Virgin Islands. Biotropica 40(1):20-27.
Petrillo, H.A., J.A. Witter, and E.M. Thompson. 2004. Michigan beech bark disease monitoring and impact analysis system. Unpublished, University of Michigan.
Potts, J.M., and J. Elith. 2006. Comparing species abundance models. Ecological Modelling 199(2):153-163.
R Development Core Team. 2011. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.
Ranius, T., O. Kindvall, N. Kruys, and B.G. Jonsson. 2003. Modelling dead wood in Norway spruce stands subject to different management regimes. For. Ecol. Manag. 182(1-3):13-29.
Reid, C.M., A. Foggo, and M. Speight. 1996. Dead wood in the Caledonian pine forest. Forestry 69(3):275.
Rothstein, D.E., Z. Yermakov, and A.L. Buell. 2004. Loss and recovery of ecosystem carbon pools following stand-replacing wildfire in Michigan jack pine forests. Can. J. For. Res. 34(9):1908-1918.
Rubin, B., and D. MacFarlane. 2008. Using the Space-Time Permutation Scan Statistic to Map Anomalous Diameter Distributions Drawn from Landscape-Scale Forest Inventories. For. Sci. 54(5):523-533.
Spiering, D.J., and R.L. Knight. 2005. Snag density and use by cavity-nesting birds in managed stands of the Black Hills National Forest. For. Ecol. Manag. 214(1-3):40-52.
Stephens, S.L. 2004. Fuel loads, snag abundance, and snag recruitment in an unmanaged Jeffrey pine-mixed conifer forest in Northwestern Mexico. For. Ecol. Manag. 199(1):103-113.
Sturtevant, B.R., J.A. Bissonette, J.N. Long, and D.W. Roberts. 1997. Coarse woody debris as a function of age, stand structure, and disturbance in boreal Newfoundland. Ecological Applications 7(2):702-712.
Tu, W. 2002. Zero-inflated data. In: El-Shaarawi, A.H., Piegorsch, W.W. (Eds.), Encyclopedia of Environmetrics. John Wiley and Sons, Chichester. 4:2387-2391.
Tyrrell, L.E., and T.R. Crow. 1994. Dynamics of dead wood in old-growth hemlock-hardwood forests of northern Wisconsin and northern Michigan. Can. J. For. Res. 24(8):1672-1683.
USDA Forest Service. 2005. Forest inventory and analysis national core field guide, volume 1: field data collection procedures for phase 2 plots, version 3.0. USDA For. Serv. 203 p.
USDA Forest Service. 2008. Forest Inventory and Analysis National Program. USDA Forest Service.
Vasiliauskas, R., A. Vasiliauskas, J. Stenlid, and A. Matelis. 2004. Dead trees and protected polypores in unmanaged north-temperate forest stands of Lithuania. For. Ecol. Manag. 193(3):355-370.
Voller, J., and S. Harrison. 1998. Conservation Biology Principles for Forested Landscapes. University of British Columbia Press.
Wisdom, M.J., and L.J. Bate. 2008. Snag density varies with intensity of timber harvest and human access. For. Ecol. Manag. 255(7):2085-2093.
Woodall, C.W., G.M. Domke, D.W. MacFarlane, and C.M. Oswalt. 2011. Comparing field- and model-based standing dead tree carbon stock estimates across forests of the United States. Forestry. In Review.
Young, L.J., and J.H. Young. 1998. Statistical ecology: a population perspective. Kluwer Academic Publishers.