DEVELOPMENT AND APPLICATION OF HIERARCHICAL MODELS FOR MONITORING AVIAN SOUNDSCAPES, POPULATIONS, AND COMMUNITIES By Jeffrey W. Doser A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Forestry – Doctor of Philosophy Ecology, Evolution, and Behavior – Dual Major 2022 ABSTRACT DEVELOPMENT AND APPLICATION OF HIERARCHICAL MODELS FOR MONITORING AVIAN SOUNDSCAPES, POPULATIONS, AND COMMUNITIES By Jeffrey W. Doser Climate change, land use change, and other anthropogenic pressures are increasing species extinc- tions, phenology shifts, and drastic population declines. Avian populations and communities are particularly vulnerable to global change given their mobile and migratory life history strategies. Avian abundance has drastically declined throughout North America over several decades, which is compounded by phenological shifts in breeding periods and migratory patterns. Informed man- agement and conservation of avian populations and communities requires large-scale monitoring programs, as well as associated inferential tools to provide statistically robust inference using mul- tiple data sources. In this dissertation, I develop a suite of hierarchical modeling approaches to understand avian soundscapes, populations, and communities. I leverage a hierarchical Bayesian modeling framework, which is ideally suited for complex wildlife data with numerous types of observation error and dependencies among data points. In Chapter 1, I provide a brief overview of avian monitoring approaches and their associated statistical analysis frameworks. In Chapters 2 and 3, I develop hierarchical models for the analysis of complex avian soundscape data, and apply these approaches to two case studies. In Chapter 2, I apply a two-stage hierarchical beta regression model to quantify the relationship between anthropogenic and biological sounds in avian soundscapes in western New York. In Chapter 3, I use a multivariate linear mixed model to assess disturbance impacts of a shelterwood logging on avian soundscapes in northern Michi- gan. In Chapter 4, I develop a multi-region, multi-species abundance model to quantify trends of avian species and communities using point count data across a network of National Parks in the northeastern US. In Chapters 5 and 6, I use a model-based data integration approach to yield improved inference on avian population and communities. In Chapter 5, I integrate automated acoustic recording data with point count data to estimate avian abundance, which I apply to a case study on the Eastern Wood Pewee (Contopus virens) in a National Historical Park in Vermont. In Chapter 6, I develop an integrated community occupancy model that combines multiple types of detection-nondetection data for inference on species-specific and community level occurrence dynamics, which I use to assess occurrence dynamics of a foliage-gleaning bird community in New Hampshire. These results exhibit the value of hierarchical models to partition ecological data into distinct observation and ecological components for improved inference on avian population and community dynamics. Future work should continue to leverage complex data sources within hierarchical modeling frameworks to address pressing conservation and management questions on avian populations, communities, and the ecosystem services they provide. ACKNOWLEDGEMENTS I would like to thank my major advisor, Dr. Andrew Finley, for his constant support, guidance, and friendship. I would like to thank Dr. Elise Zipkin, whose continued mentorship and guidance has been pivotal to my growth as a researcher and person. Thank you to the rest of my guidance committee, Dr. Gary Roloff and Dr. Robert Tempelman, for their guidance and feedback throughout my program. I would like to thank Dr. Aaron Weed for his invaluable mentorship, as well as the numerous NPS and NETN staff who collected and contributed to the data used in Chapters 4 and 5 of this work. I am indebted to the late Dr. Stuart Gage, who generously shared the data used in Chapter 3. I am beyond grateful to my additional collaborators for their critical contributions to this work: Dr. Kristina Hannam, Dr. Eric Kasten, Dr. Katherine Miller, Wendy Leuenberger, Dr. Scott Sillett, and Dr. Michael Hallworth. Additionally, I would like to thank all the forestry administrative staff members, in particular Katie James, Renee Tiley, and Sandra Dunnebacke, who went above and beyond to provide assistance throughout my program. I am also thankful to all members of the Geospatial and Zipkin labs for their support, feedback, and camaraderie. I would like to thank the numerous volunteers at the Hubbard Brook Experimental Forest, National Ecological Observatory Network, and North American Breeding Bird Survey that collected the data used in Chapter 6. I would especially like to thank my family, in particular my parents Jim and Betsy Doser, for their love and support that made this dissertation possible. Lastly, to my partner, Gabriela. I cannot express how much you have helped me grow not only as a person, but as a researcher as well. Spending time with you has truly been, and will continue to be, the brightest part of every day. iv TABLE OF CONTENTS LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix CHAPTER 1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Biodiversity crisis and avian declines . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Avian population and community monitoring . . . . . . . . . . . . . . . . . . . . 2 1.2.1 Point count surveys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2.2 Autonomous recording units . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3 Hierarchical models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.4 Recent hierarchical modeling developments in wildlife ecology . . . . . . . . . . . 6 1.4.1 Methods for ARU data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.4.2 Multiregion models and data integration . . . . . . . . . . . . . . . . . . . 7 1.5 Overview of chapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 CHAPTER 2 CHARACTERIZING FUNCTIONAL RELATIONSHIPS BETWEEN AN- THROPOGENIC AND BIOLOGICAL SOUNDS: A WESTERN NEW YORK STATE SOUNDSCAPE CASE STUDY . . . . . . . . . . . . . . . . 10 2.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 CHAPTER 3 ASSESSING SOUNDSCAPE DISTURBANCE THROUGH HIERAR- CHICAL MODELS AND ACOUSTIC INDICES: A CASE STUDY ON A SHELTERWOOD LOGGED NORTHERN MICHIGAN FOREST . . . . 12 3.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 CHAPTER 4 TRENDS IN BIRD ABUNDANCE DIFFER AMONG PROTECTED FORESTS BUT NOT BIRD GUILDS . . . . . . . . . . . . . . . . . . . . . 14 4.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 CHAPTER 5 INTEGRATING AUTOMATED ACOUSTIC VOCALIZATION DATA AND POINT COUNT SURVEYS FOR ESTIMATION OF BIRD ABUN- DANCE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 5.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 CHAPTER 6 INTEGRATED COMMUNITY OCCUPANCY MODELS: A FRAME- WORK TO ASSESS OCCURRENCE AND BIODIVERSITY DYNAM- ICS USING MULTIPLE DATA SOURCES . . . . . . . . . . . . . . . . . . 18 6.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 6.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 6.3 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 6.3.1 Integrated community occupancy model . . . . . . . . . . . . . . . . . . . 22 6.3.1.1 Ecological Process Model . . . . . . . . . . . . . . . . . . . . . 23 v 6.3.1.2 Observation Model: replicated detection-nondetection data . . . 24 6.3.1.3 Observation Model: nonreplicated detection-nondetection data . 25 6.3.1.4 Linking species models across the community . . . . . . . . . . 26 6.3.1.5 Data integration via joint likelihood . . . . . . . . . . . . . . . . 27 6.3.2 Simulation study 1: Assessing benefits of integration . . . . . . . . . . . . 27 6.3.3 Simulation study 2: Assessing benefits of community modeling . . . . . . 28 6.3.4 Case study: foliage-gleaning birds in the White Mountains . . . . . . . . . 29 6.3.4.1 Goodness of fit and model validation . . . . . . . . . . . . . . . 32 6.3.5 Model implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 6.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 6.4.1 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 6.4.2 Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 6.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 CHAPTER 7 CONCLUSIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 7.1 Research synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 7.2 Future research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 7.2.1 Integration of soundscape, population, and community level analyses . . . . 47 7.2.2 Software development for models of species and community level occur- rence dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 7.2.3 Nonstationary and multi-scaled climate and land use effects on avian communities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 7.3 Concluding remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 APPENDIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 vi LIST OF TABLES Table 6.1: Characteristics of the three data sources used to model occurrence dynamics of twelve foliage-gleaning birds in the White Mountain National Forest from 2010-2018. Forest Cover corresponds to the amount of forest within a 250m radius of a point count site. Values for elevation and forest cover are mean (minimum, maximum). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 Table 6.2: Precision and accuracy of species-specific parameter estimates when using the integrated community occupancy model (ICOM) compared to a single species integrated distribution model (IDM) for a simulated community of 25 species over six years across 100 simulations with one replicated (REP) data set and one nonreplicated (NREP) data set. Precision improvement is the percentage improvement in precision when using the ICOM compared to the IDM, where precision is defined as the difference between the 2.5% and 97.5% quantiles of the posterior means. Bias is the average magnitude of the posterior means minus the true simulated value. Values are averaged across all 25 species and six years. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 Table 6.3: Two-fold cross validation results comparing predictive performance across models using different combinations of the three data sets. The fitted model is shown in the first column and log predictive density measures are shown for the entire community with each data set and across all data sets. Values in parentheses show the average rank of the model for an individual species across all models, with 1 indicating the model is the best for all species and 7 indicating the the model is worst for all species. Bold values indicate the best performing model for each individual data set. Predictive performance of the model using only NEON data is only assessed at NEON locations because NEON data are only available for four of the nine study years. . . . . . . . . . . 38 Table A.1: Bayesian p-value goodness of fit assessment for the foliage-gleaning bird case study using the Chi-square fit statistic. Bayesian p-values between 0.1 and 0.9 indicate adequate model fit to each data source (Hobbs and Hooten, 2015). . . . 58 Table A.2: Number of unique observations at each site and year combination of the twelve foliage-gleaning bird species in the three data sets used in the case study in White Mountain National Forest from 2010-2018. Numbers in parentheses indicate total number of observations including pseudoreplicates (i.e., repeated detection of a bird species at a site in the same year). . . . . . . . . . . . . . . . 59 vii Table A.3: Posterior means and standard deviations of species-specific occurrence covari- ate effects from the integrated community occupancy model (Mean (SD)), as well as the overall community level effects (COMM). Boldface indicates 95% credible interval does not overlap zero. . . . . . . . . . . . . . . . . . . . . . . . 61 Table A.4: Precision of community-level occurrence parameters from models using dif- ferent amounts of the three data sets. Precision is measured as the width of the 95% credible interval. Boldface indicates the model with the highest precision for a given parameter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 Table A.5: Precision of species-level occurrence parameters from models using different amounts of the three data sets. Species-specific values are averaged across the twelve species, while species and year-specific intercepts are averaged across the twelve species and nine years. Precision is measured as the width of the 95% credible interval. Boldface indicates the model with the highest precision for a given parameter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 Table A.6: Two-fold cross validation results comparing predictive performance of single- species integrated distribution models (IDM) to the integrated community occupancy model (ICOM). Values are the log predictive density, where higher values indicate better predictive performance. Results are not shown for Black- poll Warbler due to insufficient data required to fit the single-species model. . . . 63 viii LIST OF FIGURES Figure 6.1: Study location for the case study. Panel (A) shows the White Mountain National Forest (shaded dark grey region) and the location of the Hubbard Brook Experimental Forest (HBEF; light grey region), the BBS routes (dark blue lines), and the NEON data from Bartlett Forest (purple region). Panel (B) shows the distribution of point count locations in HBEF, and Panel (C) shows the distribution of points in the NEON data set. Note different axis spacings across the three plots. . . . . . . . . . . . . . . . . . . . . . . . . . . 31 Figure 6.2: Sampling distribution of estimated bias in simulated species level occurrence intercepts (A) and covariate effects (B) under models using different combi- nations of a replicated (REP) data set and nonreplicated data sets with low (NREP L) and high (NREP H) detection probability. Points represent the median bias (posterior mean - true simulated value) in a species-level effect across 100 simulations for a community of 25 species. Lines represent the 95% quantiles of the bias values. The intercept parameter using only NREP L is not shown as it failed to converge. . . . . . . . . . . . . . . . . . . . . . . 34 Figure 6.3: Estimated average site-level species richness (A) and Jaccard index (B) of a community of twelve foliage-gleaning bird species in the White Mountain National Forest. Points are posterior means. Jaccard index values are relative to a single site in the Hubbard Brook Experimental Forest with value 1, with 0 indicating no species in common to the reference site, and 1 indicating identical community composition to the reference site. . . . . . . . . . . . . . . 36 Figure 6.4: Average occurrence probabilities of twelve foliage-gleaning bird species in the White Mountain National Forest from 2010 to 2018 from models using different subsets of the three data sources. Points show posterior mean oc- currence probabilities averaged across all sites in a given year. Gray shaded region indicates the 95% credible interval of estimates derived from the model that incorporated all three data sources. . . . . . . . . . . . . . . . . . . . . . . 37 Figure A.1: Sampling distribution of estimated bias in simulated community level occur- rence intercepts (A) and covariate effects (B) under models using different combinations of a replicated (REP) data set and a nonreplicated data set with low detection probability (NREP L) and with high detection probability (NREP H). Points represent the median bias (posterior mean - true simulated value) in a community-level effect across 100 simulations for a community of 25 species. Lines represent the 95% quantiles. . . . . . . . . . . . . . . . . . 57 ix Figure A.2: Sampling distributions of estimated occurrence intercepts for a single year for a simulated community of 25 species across 100 simulations when using the integrated community occupancy model (ICOM) or a single species inte- grated distribution model (IDM). Points represent the median posterior mean across all 100 simulations, and lines represent the 2.5% and 95% quantiles of the posterior means across the 100 simulations. Black points are the true simulated values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 Figure A.3: Probability of positive trends in occurrence probabilities of twelve foliage- gleaning bird species in the White Mountain National Forest from 2010-2018. Color indicates the posterior mean trend estimate from a post-hoc simple linear regression. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 Figure A.4: Estimated occurrence probabilities of twelve foliage-gleaning bird species across an elevation gradient. Black lines are posterior mean occurrence prob- abilities with 95% credible intervals denoted by the gray region. Occurrence probabilities correspond to the probability of a species occupying a site in 2014 that was not occupied in the previous year. . . . . . . . . . . . . . . . . . 61 x CHAPTER 1 INTRODUCTION 1.1 Biodiversity crisis and avian declines Biodiversity loss is one of the most pressing issues of the Anthropocene, with already high species extinction rates expected to increase further into the future as a result of climate change, land use change, and other anthropogenic pressures (Dirzo and Raven, 2003; Ceballos et al., 2010, 2015). Species extinctions alone are only one mechanism in which global change is influencing biodiversity, as the composition and functional diversity of communities have also rapidly shifted, in large part due to shifts in individual species ranges (Chen et al., 2011; Rushing et al., 2020). Such shifts in community composition can have potentially profound impacts on ecosystem functioning and the ecosystem services human society relies on (Tilman et al., 2014). Further, populations of a wide variety of organisms, including birds (Rosenberg et al., 2019), bats (Rodhouse et al., 2019), and insects (Wagner et al., 2021), have shown significant declines in their distributions and abundances across large geographical regions. Given the numerous ecosystem services provided by individual populations, such losses can have drastic impacts on ecosystem functioning. Avian populations and communities in particular are vulnerable to the effects of climate and land use change due to a reliance on numerous locations across broad spatial scales at specific times of the year (Smith et al., 2011; Hallworth et al., 2021). Key studies have shown large declines in avian abundance over several decades across vast spatial regions in both North America (Rosenberg et al., 2019) and Europe (Inger et al., 2015). Phenological changes in the timing of breeding periods and migrations are occurring in Europe (Hällfors et al., 2020) and North America (Youngflesh et al., 2021) but are inconsistent across different groups of species, which has important implications for reproductive success as global change progresses. Similar inconsistencies across avian assemblages have led to differential species range shifts in response to climate change (Rushing et al., 2020). Despite these findings, little is still understood about specific drivers of population declines and 1 community shifts, relative importance of drivers, and specific management and conservation actions most beneficial to avian populations and communities at local, regional, and continental scales (Breiner et al., 2015; Kindsvater et al., 2018). Such informed management and conservation requires efficient monitoring of avian populations and communities at fine resolution across large scales, as well as associated statistical tools to draw inference from monitoring initiatives. 1.2 Avian population and community monitoring 1.2.1 Point count surveys Birds are one of the most well-monitored animal groups. The primary survey method for avian population and community monitoring is the point count survey (Bibby et al., 2000), in which observers stand stationary at a specific location and count the number of individual birds seen or heard within some pre-specified distance from the location. Observers identify birds to the species level, most commonly resulting in counts of each species at each individual point count site. These data provide information pertinent to knowledge of avian habitat use and the abundance of different species across space and/or time. However, point count data are confounded by imperfect detection (i.e., false negatives) during the sampling process (MacKenzie et al., 2002). An observer may miss an individual of a given species (or miss an entire species) if it is truly absent, or the observer might not detect the individual even if it is truly present. This imperfect detection process is further exacerbated in highly mobile species like birds (Kéry and Royle, 2016), as birds may move out of the sampling area during the survey or may be in the sampling area but not available for detection (e.g., an individual is not vocalizing). Thus, observed counts of birds from point count surveys confound this imperfect detection process with the true distribution of birds across space and/or time. This is especially true when the imperfect detection process varies across space (e.g., probability of detection is lower in heavily forested areas) and/or time (e.g., avian vocalization rates vary over the duration of the breeding season; Kéry et al. 2005; Kéry and Schmidt 2008). Failing to account for this detection process in models of avian occurrence and/or abundance can result in biased estimates of population size as well as biased effects of environmental drivers (Kéry et al., 2 2010a). There exist a multitude of ways to separately estimate the detection process from the true ecological process of interest, all of which rely on the collection of additional information on detection probability (i.e., the probability of detecting an individual given it is present; MacKenzie and Royle 2005; Nichols et al. 2009). Most commonly, observers will perform multiple point count surveys at each specific location during some period of closure (i.e., no permanent immigration or emigration). The variation in the data collected in one survey versus another provides information on the detection probability of individual birds, which allows for separate estimation of the detection probability from the ecological process of interest (e.g., abundance, occurrence; Nichols et al. 2009). Alternative approaches to estimate detection probability include using multiple independent observers, measuring the detection distance to a bird (Burnham et al., 1980; Buckland et al., 2005), and recording the specific time a detection occurs (Farnsworth et al., 2002; MacKenzie and Royle, 2005). 1.2.2 Autonomous recording units Over the last twenty years, autonomous acoustic recording units (ARUs) have emerged as a potential alternative for monitoring acoustically active wildlife like birds (Darras et al., 2019; Gibb et al., 2019). ARUs are primarily used to monitor wildlife in two ways. First, ARUs can be deployed to monitor spatiotemporal changes in soundscapes, or the sounds present at a given location at any given point in time (Pijanowski et al., 2011a,b). The recently developed fields of soundscape ecology and ecoacoustics rely on the premise that the soundscape, and its variation over space and time, can reflect changes in ecological status and landscapes, ultimately suggesting soundscapes can be used as alternative ecological units to study change in wildlife populations and communities (Sueur and Farina, 2015; Farina, 2019). A typical soundscape analysis involves the collection of acoustic recordings using ARUs across a spatiotemporal gradient, quantification of the recordings using numerical summaries of the recordings referred to as acoustic indices (Sueur et al., 2008, 2014), and ecological interpretation of these acoustic indices in the context of the wildlife community 3 that comprises the vocalizations in the soundscape. Such ecoacoustic approaches have successfully revealed effects of natural gas extraction on biodiversity in tropical forests (Deichmann et al., 2017), timber harvesting on biodiversity (Law et al., 2018; Burivalova et al., 2021), and artificial light on bat communities (Mena et al., 2021). Despite the utility of acoustic indices, their relationship with biodiversity and landscape quality is variable across urbanization gradients and ecosystems (Fairbrass et al., 2017; Buxton et al., 2018). Alternatively, ARUs can provide information on the presence/absence (or abundance) of multiple species at different sites, analogous to traditional point count surveys (Shonfield and Bayne, 2017). Most commonly, experienced ornithologists will listen to acoustic recordings and extract detection-nondetection data for the species of interest (e.g., Furnas and Callas 2015). More recently, fully automated machine learning algorithms are used to determine whether or not a species is present in a specific recording (Ducrettet et al., 2020), although at least some manual verification of the recordings must occur to account for the potential of false positives. ARUs can also provide information on the abundance and/or density of individuals within a study area (Pérez- Granados and Traba, 2021), which typically relies on estimation of distances to the individuals identified in the recording (Darras et al., 2018; Yip et al., 2019) or calibration with other ARUs in a multi-sensor array using spatial capture-recapture techniques (Efford et al., 2009). Ultimately, ARUs are an attractive alternative to traditional point count surveys as they create a permanent record of the survey that can be reanalyzed, require minimal observer training, and usually provide more temporal replicates at a specific location than are possible using human observers (Darras et al., 2019). 1.3 Hierarchical models Hierarchical Bayesian modeling has emerged as a common statistical paradigm for working with complex ecological data, like those derived from point count surveys and ARUs (Royle and Dorazio, 2008; Hobbs and Hooten, 2015; Kéry and Royle, 2021). Hierarchical models contain multiple random variables that are structured according to their conditional probability, which provides 4 an intuitive way to separately model observation processes (i.e., imperfect detection) from the true ecological process of interest (Hobbs and Hooten, 2015; Hooten and Hefley, 2019). At the most general level, a Bayesian hierarchical model comprises the following series of conditional probability distributions (Berliner, 1996) [process, parameters | data] ∝[data | process, parameters]× [process | parameters]× (1.1) [parameters], where process refers to the ecological process of interest (e.g., occurrence, abundance), data refers to the observed data that are an imperfect representation of the true ecological process, and parameters are the parameters governing both the ecological process and observation process. The first stage of the hierarchical model is the likelihood ([data | process, parameters]), or observation model, which describes the imperfect relationship between the observed data and the true ecological process of interest. The second stage is the process model ([process | parameter]), which models the true ecological process of interest conditional on a set of parameters that potentially relate to the process of interest. The final stage is the prior models ([parameters]), which completes the Bayesian specification of a hierarchical model. Together, these three components enable inference on the joint posterior distribution ([process, parameters | data]), which describes understanding of the true ecological process and the parameters that govern it after observing the data. This general hierarchical model is ideally suited for imperfect data used to monitor wildlife such as birds because it provides an explicit way to account for observation error (e.g., imperfect detection; Royle and Dorazio 2006). Such a hierarchical framework allows for inclusion of covariate effects and observation processes separately from the model from the true ecological process of interest, which ultimately can lead to improved inference and predictive performance (Kéry and Schmidt, 2008; Guélat and Kéry, 2018). To this end, researchers have developed models that are specific forms of Equation 1.1 to address imperfect detection and observation errors associated with common wildlife data collection approaches, such as point count surveys and 5 ARUs. For example, occupancy models are a form of species distribution model (SDM) that leverage replicated detection-nondetection data to separately model the imperfect detection process from the true species occurrence process (MacKenzie et al., 2002). Numerous extensions of occupancy models exist, including dynamic occupancy models for modeling colonization and extinction dynamics (MacKenzie et al., 2003) and multispecies occupancy models (Dorazio and Royle, 2005; Zipkin et al., 2010) that leverage a hierarchical framework to share information across species. For modeling abundance, a series of hierarchical models of the form of Equation 1.1 have been developed to separately model true species abundance from an imperfect observation process, including N-mixture models (Royle, 2004), double observer methods (Nichols et al., 2000), distance sampling (Buckland et al., 2005), and removal sampling (Farnsworth et al., 2002). Such single species approaches can all be extended to a multispecies framework to model an entire community of species simultaneously (e.g., Yamaura et al. 2011; Sollmann et al. 2016; Chandler et al. 2013). 1.4 Recent hierarchical modeling developments in wildlife ecology 1.4.1 Methods for ARU data As ecological data become more complex and span larger spatio-temporal extents, there is widespread interest in the development of appropriate statistical tools to analyze such data. Acoustic data col- lected from ARUs are becoming more prominently used throughout monitoring programs designed for different research, management, and conservation initiatives. Given the vast amount of data ARUs provide, appropriate statistical methods must be applied to yield valid ecological inference from these data. Acoustic indices are often used as numerical summaries of large soundscape data sets in the field of ecoacoustics, resulting in complex data that are often multivariate, composi- tional, non-Gaussian, and/or highly correlated across space and/or time. Contemporary ecological statistics literature offers rigorous approaches to dealing with such data (e.g., Clark 2007; Hobbs and Hooten 2015), however, applied methodology is scarcely available in the ecoacoustics litera- ture. Despite recent advances in standardization of the use of acoustic indices (Bradfer-Lawrence et al., 2019, 2020), there is a clear need for flexible statistical approaches for analyzing complex 6 ecoacoustic data (Gibb et al., 2019; Sugai et al., 2019). In contrast to the field of ecoacoustics, there has been more statistical modeling development for the use of ARUs to monitor individual (or multiple) species distributions across space and/or time. Through manual validation of acoustic recordings arising from ARUs, researchers can obtain detection-nondetection histories of different species of interest, which can be used in a series of traditional occupancy modeling approaches to study species habitat use across potentially vast spatio-temporal regions (Furnas and Callas, 2015; Wood et al., 2019, 2021). Further, new hierarchical modeling approaches have been developed that directly incorporate semi-automated clustering or machine learning algorithms into occupancy modeling frameworks, which reduces the time required to manually validate individual vocalizations in acoustic recordings (Chambert et al., 2018; Wright et al., 2020b). While such approaches are promising for the evaluation of species distributions, there are few approaches to reliably estimate abundance/density from ARUs without the deployment of multi-sensor arrays (which is cost intensive across broad regions) or extensive validation of acoustic recordings (Pérez-Granados and Traba, 2021). 1.4.2 Multiregion models and data integration Two recent developments in statistical ecology for modeling species populations and communities across large spatio-temporal regions are multiregion models (Sutherland et al., 2016) and data integration (Zipkin and Saunders, 2018; Miller et al., 2019). Multiregion community models (Sutherland et al., 2016) extend the multispecies occupancy model (Dorazio and Royle, 2005) to estimate occurrence and species richness across multiple distinct spatial units (i.e., regions) that collect data following a similar protocol. This approach is ideally suited for monitoring and conservation networks that are structured hierarchically (e.g., a network of individual National Parks), as it allows for the estimation of distinct region level estimates as well as overall network level estimates. This approach has been used to reveal effects of habitat connectivity and area on amphibian communities (Wright et al., 2020a), effects of management regimes on tropical mammal diversity (Oberosler et al., 2020), and urbanization effects on North American mammal 7 communities (Fidino et al., 2020). Currently, multiregion community approaches have been limited to assessments of species occurrence and richness. Extensions of the multiregion community approach to model abundance across multiple distinct spatial regions could provide improved insight on wildlife populations and communities at multiple spatial scales. Data integration is a model-based approach to combining multiple disparate data sources (Zipkin and Saunders, 2018; Isaac et al., 2020). This approach is gaining widespread popularity due to the proliferation of large-scale citizen science monitoring programs (e.g., eBird) and continental scale governmental monitoring programs (e.g., BBS, NEON) in addition to smaller scale monitoring initiatives. The integration of multiple data sources can yield benefits, such as improved accuracy and precision (Dorazio, 2014), a larger spatio-temporal extent (Zipkin et al., 2021), and the ability to accommodate multiple sampling and detection biases that are not always possible in single data source models (Miller et al., 2019; Simmonds et al., 2020; Lauret et al., 2021). Although many data types have been used in integrated models, such as presence-only data (Dorazio, 2014), count and detection-nondetection data (Zipkin et al., 2017), distance sampling data (Farr et al., 2021), and capture-recapture data (Abadi et al., 2010b), many common data sources, such as automated and semi-automated acoustic vocalization data, have yet to be leveraged in these promising modeling frameworks. Further, despite the clear utility of single species integrated models, there are only a few instances of the use of model-based data integration to jointly assess multiple species in a community (Péron and Koons, 2012; Lahoz-Monfort et al., 2017; Barraquand and Gimenez, 2019). The combination of data integration together with hierarchical community modeling principles has the potential to improve understanding of community dynamics across potentially vast spatio- temporal regions. 1.5 Overview of chapters The following dissertation includes five main chapters. The overall goal of this work is to develop novel hierarchical modeling approaches for analyzing multiple data types for inference on avian soundscapes, populations, and communities. I subsequently apply these modeling approaches to a 8 series of case studies in which I showcase the ecological inference that is possible using advanced hierarchical Bayesian models and the benefits these approaches can provide for improved avian monitoring, management, and conservation. Below I provide a brief summary of the chapters. Chapters 2 and 3 focus on the development of statistical approaches for the analysis of complex avian soundscape data. In Chapter 2, I develop a two-stage hierarchical beta regression model to quantify the relationship between anthropogenic and biological sounds in avian soundscapes in western New York. In Chapter 3, I develop a multivariate linear mixed model for the assessment of disturbance impacts on soundscapes, which I use to explore the effects of a shelterwood logging on avian soundscapes in northern Michigan. Chapter 4 focuses on the development of a multi-region, multispecies abundance model to assess trends of avian communities across a network of spatial locations using traditional point count data. I apply the model to a network of National Parks in the Northeast Temperate Inventory and Monitoring Network in the northeastern US to assess species-level, guild-level, and community- level abundance trends from 2006-2019. In Chapters 5 and 6, I develop two integrated modeling frameworks for combining multiple data sources to yield improved inference on bird population and community dynamics. In Chapter 5, I integrate automated acoustic recording data with traditional point count surveys to yield improved estimates of bird abundance, and apply the framework to elucidate trends in the Eastern Wood Pewee (Contopus virens) in a National Historical Park in Vermont. In Chapter 6, I develop an integrated community occupancy model that integrates multiple types of replicated and nonrepli- cated detection-nondetection data for inference on species-specific and community level occurrence dynamics. I then apply this framework to assess varying trends in foliage-gleaning bird abundance from 2010-2018 in the White Mountain National Forest in New Hampshire. Finally, I provide a summary of key research findings from the five chapters and a discussion on future research directions in Chapter 7. 9 CHAPTER 2 CHARACTERIZING FUNCTIONAL RELATIONSHIPS BETWEEN ANTHROPOGENIC AND BIOLOGICAL SOUNDS: A WESTERN NEW YORK STATE SOUNDSCAPE CASE STUDY 2.1 Abstract Roads are a widespread feature of landscapes worldwide, and road traffic sound potentially makes nearby habitat unsuitable for acoustically communicating organisms. It is important to understand the influence of roads at the soundscape level to mitigate negative impacts of road sound on indi- vidual species as well as subsequent effects on the surrounding landscape. We seek to characterize the relationship between anthropogenic and biological sounds in western New York and assess the extent to which available traffic data explains variability in anthropogenic noise. Recordings were obtained in the spring of 2016 at 18 sites throughout western New York. We used the Welch Power Spectral Density (PSD) at low frequencies (0.5-2 kHz) to represent anthropogenic noise and PSD values at higher frequencies (2-11 kHz) to represent biological sound. Relationships were modeled using a novel two-stage hierarchical Bayesian model utilizing beta regression and basis splines. Model results and map predictions illustrate that anthropogenic noise and biological sound have an inverse relationship, and anthropogenic noise is greatest in close proximity to high traffic volume roads. The predictions have large uncertainty, resulting from the temporal coarseness of public road data used as a proxy for traffic sound. Results suggest that finer temporal resolution traffic sound data, such as crowd-sourced time-indexed traffic data from geographic positioning systems, might better account for observed temporal changes in the soundscape. The use of such data, in combination with the proposed modeling framework, could have important implications for the development of sound management policies. Material from: Doser, J. W., Hannam, K. M., & Finley, A. O. (2020). Characterizing functional relationships between anthropogenic and biological sounds: a western New York state soundscape 10 case study. Landscape Ecology, 35(3), 689-707. For full text of this work, please go to: https://doi.org/10.1007/s10980-020-00973-2 11 CHAPTER 3 ASSESSING SOUNDSCAPE DISTURBANCE THROUGH HIERARCHICAL MODELS AND ACOUSTIC INDICES: A CASE STUDY ON A SHELTERWOOD LOGGED NORTHERN MICHIGAN FOREST 3.1 Abstract Assessing the effects of anthropogenic disturbances on wildlife and natural resources is a necessary conservation task. The soundscape is a critical habitat component for acoustically communicating organisms, but the use of the soundscape as a tool for assessing disturbance impacts has been relatively unexplored until recently. Here we present a broad modeling framework for assessing disturbance impacts on soundscapes, which we apply to quantify the influence of a shelterwood logging on soundscapes in northern Michigan. Our modeling approach can be broadly applied to assess anthropogenic disturbance impacts on soundscapes. The approach accommodates inherent differences in control and treatment sites to improve inference about treatment effects, while also accounting for extraneous variables (e.g., rain) that influence acoustic indices. Recordings were obtained at 13 sites before and after a shelterwood logging. Four sites were in the logging region and nine sites served as control recordings outside the logging region. We quantify the soundscapes us- ing common acoustic indices (Normalized Difference Soundscape Index (NDSI), Acoustic Entropy (H), Acoustic Complexity Index (ACI), Acoustic Evenness Index (AEI)) and Welch Power Spectral Density (PSD) values. We build two hierarchical Bayesian models to quantify the changes in the soundscape over the study period. Our analysis reveals no long-lasting effects of the shelterwood logging on the soundscape as measured by the NDSI, but analysis of H, AEI, and PSD suggest changes in the evenness of sounds across the frequency spectrum, indicating a potential shift in the avian species communicating in the soundscapes as a result of the logging. Further, our analysis confirms previous findings that the ACI does not accurately reflect changes in landscape config- uration. Multiple model validation techniques (i.e., comparison of parameter estimates and the widely applicable information criterion (WAIC)) reveal our proposed hierarchical Bayesian models 12 outperform more simple models used for hypothesis testing. Acoustic recordings, in conjunction with this modeling framework, can deliver cost efficient assessment of disturbance impacts on the landscape and underlying biodiversity as represented through the soundscape. Material from: Doser, J. W., Finley, A. O., Kasten, E. P., & Gage, S. H. (2020). Assessing sound- scape disturbance through hierarchical models and acoustic indices: A case study on a shelterwood logged northern Michigan forest. Ecological Indicators, 113, 106244. For full text of this work, please go to: https://doi.org/10.1016/j.ecolind.2020.106244 13 CHAPTER 4 TRENDS IN BIRD ABUNDANCE DIFFER AMONG PROTECTED FORESTS BUT NOT BIRD GUILDS 4.1 Abstract Improved monitoring and associated inferential tools to efficiently identify declining bird pop- ulations, particularly of rare or sparsely distributed species, is key to informed conservation and management across large spatio-temporal regions. We assess abundance trends for 106 bird species in a network of eight forested national parks located within the northeast U.S.A. from 2006-2019 using a novel hierarchical model. We develop a multi-species, multi-region removal sampling model that shares information across species and parks to enable inference on rare species and sparsely sampled parks and to evaluate the effects of local forest structure. Trends in bird abun- dance over time varied widely across parks, but species showed similar trends within parks. Three parks (Acadia National Park and Marsh-Billings-Rockefeller and Morristown National Historical Parks (NHP)) decreased in bird abundance across all species, while three parks (Saratoga NHP and Roosevelt-Vanderbilt and Weir-Farm National Historic Sites) increased in abundance. Bird abundance peaked at medium levels of basal area and high levels of percent forest and forest regen- eration, with percent forest having the largest effect. Variation in these effects across parks could be a result of differences in forest structural stage and diversity. By sharing information across both communities and parks, our novel hierarchical model enables uncertainty-quantified estimates of abundance across multiple geographical (i.e., network, park) and taxonomic (i.e., community, guild, species) levels over a large spatio-temporal region. We found large variation in abundance trends across parks but not across bird guilds, suggesting that local forest condition might have a broad and consistent effect on the entire bird community within a given park. Research should target the three parks with overall decreasing trends in bird abundance to further identify what specific factors are driving observed declines across the bird community. Understanding how bird communities 14 respond to local forest structure and other stressors (e.g., pest outbreaks, climate change) is crucial for informed and lasting management. Material from: Doser, J. W., Weed, A. S., Zipkin, E. F., Miller, K. M., & Finley, A. O. (2021). Trends in bird abundance differ among protected forests but not bird guilds. Ecological Applica- tions, e2377. For full text of this work, please go to: https://doi.org/10.1002/eap.2377 15 CHAPTER 5 INTEGRATING AUTOMATED ACOUSTIC VOCALIZATION DATA AND POINT COUNT SURVEYS FOR ESTIMATION OF BIRD ABUNDANCE 5.1 Abstract Monitoring wildlife abundance across space and time is an essential task to study their population dynamics and inform effective management. Acoustic recording units are a promising technology for efficiently monitoring bird populations and communities. While current acoustic data models provide information on the presence/absence of individual species, new approaches are needed to monitor population abundance, ideally across large spatio-temporal regions. We present an integrated modeling framework that combines high-quality but temporally sparse bird point count survey data with acoustic recordings. Our models account for imperfect detection in both data types and false positive errors in the acoustic data. Using simulations, we compare the accuracy and precision of abundance estimates using differing amounts of acoustic vocalizations obtained from a clustering algorithm, point count data, and a subset of manually validated acoustic vocalizations. We also use our modeling framework in a case study to estimate abundance of the Eastern Wood- Pewee (Contopus virens) in Vermont, U.S.A. The simulation study reveals that combining acoustic and point count data via an integrated model improves accuracy and precision of abundance es- timates compared with models informed by either acoustic or point count data alone. Improved estimates are obtained across a wide-range of scenarios, with the largest gains occurring when detection probability for the point count data is low. Combining acoustic data with only a small number of point count surveys yields estimates of abundance without the need for validating any of the identified vocalizations from the acoustic data. Within our case study, the integrated models provided moderate support for a decline of the Eastern Wood-Pewee in this region. Our integrated modeling approach combines dense acoustic data with few point count surveys to deliver reliable estimates of species abundance without the need for manual identification of acoustic vocalizations 16 or a prohibitively expensive large number of repeated point count surveys. Our proposed approach offers an efficient monitoring alternative for large spatio-temporal regions when point count data are difficult to obtain or when monitoring is focused on rare species with low detection probability. Material from: Doser, J. W., Finley, A. O., Weed, A. S., & Zipkin, E. F. (2021). Integrating automated acoustic vocalization data and point count surveys for estimation of bird abundance. Methods in Ecology and Evolution, 12(6), 1040-1049. For full text of this work, please go to: https://doi.org/10.1111/2041-210X.13578 17 CHAPTER 6 INTEGRATED COMMUNITY OCCUPANCY MODELS: A FRAMEWORK TO ASSESS OCCURRENCE AND BIODIVERSITY DYNAMICS USING MULTIPLE DATA SOURCES 6.1 Abstract The occurrence and distributions of wildlife populations and communities are shifting as a result of global changes. To evaluate whether these shifts are negatively impacting biodiversity processes, it is critical to monitor the status, trends, and effects of environmental variables on entire communities. However, modeling the dynamics of multiple species simultaneously can require large amounts of diverse data, and few modeling approaches exist to simultaneously provide species and community level inferences. I present an “integrated community occupancy model” (ICOM) that unites principles of data integration and hierarchical community modeling in a single framework to provide inferences on species-specific and community occurrence dynamics using multiple data sources. The ICOM combines replicated and nonreplicated detection-nondetection data sources using a hierarchical framework that explicitly accounts for different detection and sampling processes across data sources. I use simulations to compare the ICOM to previously developed hierarchical community occupancy models and single species integrated distribution models. I then apply the model to assess the occurrence and biodiversity dynamics of foliage-gleaning birds in the White Mountain National Forest in the northeastern USA from 2010-2018 using three independent data sources. Simulations reveal that integrating multiple data sources in the ICOM increased precision and accuracy of species and community level inferences compared to single data source models, although benefits of integration were dependent on the information content of individual data sources (e.g., amount of replication). Compared to single species models, the ICOM yielded more precise species-level estimates. Within my case study, the ICOM had the highest out-of-sample predictive performance compared to single species models and models that used only a subset of the three data sources. The ICOM provides more precise estimates of occurrence dynamics compared 18 to multi-species models using single data sources or integrated single-species models. I further found that the ICOM had improved predictive performance across a broad region of interest with an empirical case study of forest birds. The ICOM offers an attractive approach to estimate species and biodiversity dynamics, which is additionally valuable to inform management objectives of both individual species and their broader communities. 6.2 Introduction Populations and communities of a wide range of organisms, including birds (Rosenberg et al., 2019), bats (Rodhouse et al., 2019), and insects (Wagner et al., 2021), have experienced severe declines in their distributions and abundances across large geographical regions as a result of habitat loss, climate change, and other anthropogenic stressors. As a result, there is growing interest in developing enhanced monitoring techniques to estimate species’ trends and distributions. While species-level assessments can be valuable, it is critical to quantify the status and dynamics of entire communities to understand global change effects on biodiversity. However, multi-species approaches require large numbers of observations for each species or make assumptions about equal detectability of species during sampling, precluding robust assessments of rare species and overall community dynamics (Sor et al., 2017). Despite these limitations, many sampling approaches (e.g., autonomous recording units for birds and bats, camera traps for mammals) naturally provide data on the occurrence patterns of multiple species in a community. However, occurrence data for each species are confounded by imperfect detection during sampling. An observer may record a species as absent if it is indeed absent, or the observer may fail to detect the species during sampling, and thus I refer to such data as detection-nondetection data. Replicated sampling can provide additional information on species’ detection rates, allowing for the separation of the occurrence process from the detection process to enable ecologically relevant inferences. Most commonly, this information is obtained by collecting multi-species detection-nondetection data on several occasions over short time periods when closure—no permanent extinction or colonization—can be reasonably assumed (MacKenzie 19 and Royle, 2005). By accounting for imperfect detection of species during sampling, these data (hereafter “replicated data”) allow for estimation of species distributions and occurrence patterns within an occupancy modeling framework (MacKenzie et al., 2002). This additional information required to separate detection from occurrence is often more costly and time consuming to collect, which motivates the desire to use nonreplicated detection-nondetection data to model species distribution patterns. Accordingly, there is widespread interest in determining statistically robust ways to incorporate nonreplicated data sources into species and community level analyses to inform effective biodiversity conservation. Two recent statistical modeling developments, data integration and hierarchical community occupancy models, have led to improved inference on individual species and community dynamics, respectively, to help inform biodiversity assessments. Data integration is a model-based approach to combining multiple data types (Zipkin and Saunders, 2018; Miller et al., 2019). This approach yields many key benefits compared to single data source models, such as higher accuracy and precision of parameter estimates (Dorazio, 2014), inference across broader spatio-temporal extents (Zipkin et al., 2021), and the opportunity to accommodate sampling biases and imperfect detection for the various data sources (Miller et al., 2019). Data integration is particularly useful for combining large-scale nonreplicated data with replicated data, as the replicated data allow for the explicit modeling of imperfect detection while still using the information contained in the nonreplicated data. The combination of these data types can thus greatly expand the spatial scope of analysis. Hierarchical community occupancy models provide combined inference on the occurrence of multiple species using a single replicated detection-nondetection data source (Dorazio and Royle, 2005; Gelfand et al., 2005). Species-specific parameters are viewed as random effects arising from a common, community-level distribution with a mean and variance parameter representing the average effect across species in the community and the variation of species-specific effects within the community, respectively (Dorazio and Royle, 2005). In addition to providing more precise estimates of species-specific effects (Zipkin et al., 2009), these models can estimate community- 20 level parameters, as well as biodiversity metrics (Guillera-Arroita et al., 2019), all with associated uncertainty. Community occupancy models have led to improved insights into understanding species occurrence patterns for insects (Mata et al., 2017), birds (Frishkoff et al., 2016), mammals (Gallo et al., 2017), and a variety of other taxa (Devarajan et al., 2020). While substantial development of both data integration and hierarchical community models has occurred over the last decade (reviewed in Zipkin and Saunders 2018; Guillera-Arroita et al. 2019; Miller et al. 2019), there has been comparatively less research focused on modeling multiple species using more than one data source. Multi-species integrated population models have been used to assess competition (Péron and Koons, 2012), synchrony (Lahoz-Monfort et al., 2017), and predator-prey dynamics (Barraquand and Gimenez, 2019; Clark, 2021), but these approaches are not designed for assessment of larger communities. Clark et al. (2017) introduced generalized joint attribute modeling (GJAM), a broad framework for using multiple data sets to estimate the distribution and abundance of multiple species, but their approach does not model the observation process hierarchically, making it difficult to account for differences in sampling protocols and species detections among data sources. Thus, a new modeling approach that combines the benefits of data integration and hierarchical community modeling in a single framework has the potential to yield detailed inferences on individual species occurrence patterns while simultaneously providing inference on community dynamics. Here I develop an “integrated community occupancy model” to simultaneously estimate occur- rence patterns of multiple species within a community as well as community metrics by incorpo- rating multiple available data sources into a single analysis. My modeling framework combines one or more “replicated” detection-nondetection data sources with one or more “nonreplicated” data sources. The integrated model consists of a single, ecological process model and multiple observation models that explicitly account for different detection processes in each data source. The detection models are linked to the process model using a joint-likelihood framework (Miller et al., 2019). My ecological process model is a dynamic community occupancy model in which I explicitly model the latent occurrence process as a function of space and/or time varying co- 21 variates and a temporal autologistic parameter (Royle and Dorazio, 2008). I validate the model using simulations and assess its ability to estimate species-specific and community level dynamics using different amounts of data sources. I then compare inferences from the integrated community occupancy model to those generated by single species integrated models to evaluate the marginal benefits of the approach. I apply the model to an empirical case study of a community of twelve foliage-gleaning insectivorous birds in the White Mountain National Forest in the northeastern USA to assess patterns and trends in species occurrence and community metrics (i.e., richness, composition) from 2010-2018. The integrated community occupancy model provides a rigorous approach to elucidate both species specific and community level dynamics that can provide crucial insights on the specific mechanisms driving biodiversity shifts and population declines. 6.3 Materials and Methods 6.3.1 Integrated community occupancy model I develop an “integrated community occupancy model” (ICOM) that leverages one or more repli- cated detection-nondetection data sources with one or more nonreplicated detection-nondetection data sources to provide inferences on species-specific and community occurrence dynamics. I present the model using one replicated and one nonreplicated data source, but my framework can be extended to incorporate additional data sources (if available) in an analogous manner. The ICOM consists of a single ecological process model for individual species, which is shared across the various data sets. Here, I demonstrate the model with a dynamic ecological process that de- scribes species occurrences as a function of spatio-temporally varying covariates and an autologistic parameter that accounts for temporal correlation (Royle and Dorazio, 2008). This process model is then linked to individual likelihoods for each data set via a hierarchical framework that assumes independence among the detection processes and which is conditional on the true latent occurrence state (i.e., whether a species is truly present or absent at sampling locations; Miller et al. 2019). To link the individual species models, I assume species-level occurrence and detection parameters are random effects coming from common, community level distributions (Dorazio and Royle, 2005; 22 Gelfand et al., 2005), enabling information sharing across the community to increase precision of species effects and estimation of biodiversity attributes. 6.3.1.1 Ecological Process Model My goal is to model the occurrence dynamics of multiple species at sites 𝑗 = 1, . . . 𝐽 within a specified region of interest 𝐴. Let 𝑧𝑖, 𝑗,𝑡 denote the true presence (1) or absence (0) of species 𝑖 at site 𝑗 during year 𝑡, where 𝑖 = 1, . . . , 𝐼 and 𝑡 = 1, . . . , 𝑇. I assume 𝑧𝑖, 𝑗,𝑡 arises from a Bernoulli process following 𝑧𝑖, 𝑗,𝑡 ∼ Bernoulli(𝜓𝑖, 𝑗,𝑡 ), (6.1) where 𝜓𝑖, 𝑗,𝑡 is the probability species 𝑖 occurs at site 𝑗 in year 𝑡. For the first year, I model 𝜓𝑖, 𝑗,𝑡 according to logit(𝜓𝑖, 𝑗,1 ) = 𝛽0𝑖,1 + 𝜷𝑖 · 𝒙 𝑗,1 , (6.2) where 𝛽0𝑖,1 is the species-specific occurrence probability (on the logit scale) in the first year (at average covariate values) and 𝜷𝑖 is a vector of species-specific regression coefficients that describe the effect of standardized covariates (i.e., mean 0 and standard deviation 1) 𝒙 𝑗,1 on the occurrence probability of species 𝑖. In subsequent years, the occurrence probability for species 𝑖 in year 𝑡 at site 𝑗 depends on whether or not the species was present at the site 𝑗 in the previous year 𝑡 − 1 in addition to covariates (which can vary spatially and/or temporally). I accommodate the temporal dependence by incorporating a species-specific autologistic parameter 𝜙𝑖 into the occurrence model, such that for 𝑡 > 1 logit(𝜓𝑖, 𝑗,𝑡 ) = 𝛽0𝑖,𝑡 + 𝜷𝑖 · 𝒙 𝑗,𝑡 + 𝜙𝑖 · 𝑧𝑖, 𝑗,𝑡−1 , (6.3) where 𝛽0𝑖,𝑡 + 𝜙𝑖 is the species-specific intercept in year 𝑡 when species 𝑖 occurred at site 𝑗 in the previous year 𝑡 − 1 and 𝛽0𝑖,𝑡 is the intercept in year 𝑡 when species 𝑖 did not occur at site 𝑗 in the 23 previous year 𝑡 − 1. I use the autologistic parameterization of the dynamic community occupancy model as it allows me to assess covariate effects directly on species occurrence probabilities (i.e., the covariate effects remain the same regardless of the value of 𝑧𝑖, 𝑗,𝑡−1 ) and derive species-specific trends post-hoc from the occurrence probabilities (𝜓𝑖, 𝑗,𝑡 ). However, the ecological process model can be readily modified to incorporate relevant biological processes of interest (provided sufficient data are available; Royle and Dorazio 2008). For example, the ecological process model can be modified to explicitly include a trend effect (covariate on year), assess covariate effects on colonization and persistence (Dorazio et al., 2010), or the autologisitc component can be removed if that is not relevant to the target community. 6.3.1.2 Observation Model: replicated detection-nondetection data For the replicated data type, I assume 𝐾 > 1 “sampling replicates” within each year 𝑡 are available at a subset of sites 𝑟 = 1, . . . , 𝑅. The sampling replicates can be observations from multiple independent surveys, multiple independent observers, spatial subsamples, or sampling intervals from a removal design, which enable separate estimation of occurrence and detection probabilities (MacKenzie et al., 2002). I assume the 𝑅 sites are a subset of the total 𝐽 sites (i.e., 𝑅 ≤ 𝐽), which may cover the entire region of interest 𝐴 or, more commonly, only a portion of it. Let 𝑦𝑖,𝑟,𝑘,𝑡 denote the detection (1) or nondetection (0) of species 𝑖 during replicate 𝑘 at site 𝑟 during year 𝑡. I model 𝑦𝑖,𝑟,𝑘,𝑡 as 𝑦𝑖,𝑟,𝑘,𝑡 ∼ Bernoulli( 𝑝𝑖,𝑟,𝑘,𝑡 · 𝑧𝑖, 𝑗 [𝑟],𝑡 ), (6.4) where 𝑝𝑖,𝑟,𝑘,𝑡 is the probability of detecting species 𝑖 during visit 𝑘 at site 𝑟 in year 𝑡 and 𝑧𝑖, 𝑗 [𝑟],𝑡 is the true occurrence status of species 𝑖 in year 𝑡 at site 𝑗 corresponding to the 𝑟th replicated data site. Species detection probabilities can vary by site and/or sampling covariates following 24 logit( 𝑝𝑖,𝑟,𝑘,𝑡 ) = 𝛼0𝑖,𝑡 + 𝜶𝑖 · 𝒘 𝑟,𝑘,𝑡 (6.5) where 𝛼0𝑖,𝑡 is the species-specific detection probability (on the logit scale) in year 𝑡 at average covariate values and 𝜶𝑖 is a vector of parameters that describe the effect of standardized covariates 𝒘 𝑟,𝑘,𝑡 on the detection probability of species 𝑖. 6.3.1.3 Observation Model: nonreplicated detection-nondetection data Let 𝑣 𝑖,𝑚,𝑡 be the detection (1) or non-detection (0) of species 𝑖 at site 𝑚 in year 𝑡 for the nonreplicated data source, where 𝑚 = 1, . . . , 𝑀. I assume the 𝑀 sites are a subset of all 𝐽 sites of interest (i.e., 𝑀 ≤ 𝐽) within the area of interest 𝐴. The replicated data may be available at different sites than the nonreplicated data in the same region 𝐴, the same sites, or a subset of the same sites in 𝐴. We model the detection-nondetection data 𝑣 𝑖,𝑚,𝑡 according to 𝑣 𝑖,𝑚,𝑡 ∼ Bernoulli(𝜋𝑖,𝑚,𝑡 · 𝑧𝑖, 𝑗 [𝑚],𝑡 ), (6.6) where 𝜋𝑖,𝑚,𝑡 is the probability of detecting species 𝑖 in site 𝑚 in year 𝑡 for the nonreplicated data set and 𝑧𝑖, 𝑗 [𝑚],𝑡 is the true occurrence status of species 𝑖 in year 𝑡 at the site 𝑗 corresponding to the 𝑚th nonreplicated data site. Detection probability 𝜋𝑖,𝑚,𝑡 can vary by species, site, and time following logit(𝜋𝑖,𝑚,𝑡 ) = 𝛾0𝑖,𝑡 + 𝜸𝑖 · 𝒔𝑚,𝑡 , (6.7) where 𝛾0𝑖,𝑡 is a species and year specific intercept and 𝜸𝑖 is a vector of parameters that describe the effect of standardized covariates 𝒔𝑚,𝑡 on the detection probability of species 𝑖. Nonreplicated data alone are unable to separate occurrence probabilities from detection prob- abilities as the model structure is generally unidentifiable (Dorazio, 2014). While detection and occurrence parameters can be weakly identifiable under certain circumstances (e.g., detection varies with covariates, large number of sites and years; Lele et al. 2012; Kéry and Royle 2021), estimates 25 often do not converge or have unreasonably large credible intervals, such that inferences are not ecologically useful. 6.3.1.4 Linking species models across the community Following the structure of the hierarchical community occupancy model (Dorazio and Royle, 2005; Gelfand et al., 2005), species-specific parameters in both the ecological process model and observation models are treated as random effects arising from community level normal distributions with associated community level mean and variance parameters. For example, 𝛽0𝑖,𝑡 , the intercept on occurrence probabilities for species 𝑖 in year 𝑡, is modeled as 2 𝛽0𝑖,𝑡 ∼ Normal(𝜇 𝛽0𝑡 , 𝜎𝛽0 𝑡 ), (6.8) where 𝜇 𝛽0𝑡 is the hyper-mean for occurrence probability (on the logit scale) of all species in the community in year 𝑡 (at average covariate values) and 𝜎𝛽0 2 is the hyper-variance for occurrence 𝑡 probability across all species in the community in year 𝑡. Models for all other species-specific effects in the ecological and observation models are defined analogously. By treating species- specific effects as random, I improve estimates for both rare and abundant species (Zipkin et al., 2009) while simultaneously estimating community level effects. A further benefit of the hierarchical community modeling approach is the ability to easily calculate biodiversity metrics (e.g., species richness (Dorazio and Royle, 2005), beta diversity (Dorazio et al., 2010), Hill numbers (Broms et al., 2015)) from the latent occurrence state (𝑧𝑖, 𝑗,𝑡 ) that account for imperfect detection of species. Under a Bayesian framework, I can calculate any biodiversity metric as a derived parameter at each iteration of the MCMC to obtain a full posterior distribution from which I can obtain estimates with fully propagated uncertainty. For example, I can estimate species richness at each site 𝑗 and year 𝑡 at each iteration of the MCMC by summing the latent occurrence state (𝑧𝑖, 𝑗,𝑡 ) for all species. As a metric of beta diversity, I can calculate the Jaccard index (Magurran, 2013), which describes the similarity between two sites in terms of the 26 number of species that occur at both sites. More specifically, I calculate the Jaccard index between site 𝑗 and 𝑗 0 in year 𝑡 as Í𝐼 𝑖=1 𝑧𝑖, 𝑗,𝑡 · 𝑧𝑖, 𝑗 0,𝑡 JACCARD 𝑗, 𝑗 0,𝑡 = Í 𝐼 Í𝐼 Í𝐼 , (6.9) 𝑖=1 𝑧𝑖, 𝑗,𝑡 + 𝑖=1 𝑧 𝑖, 𝑗 0 ,𝑡 − 𝑖=1 𝑧 𝑖, 𝑗,𝑡 · 𝑧 𝑖, 𝑗 0 ,𝑡 which takes value 0 if the two sites have no species in common and value 1 if the same species occur at the two sites. 6.3.1.5 Data integration via joint likelihood I use a joint likelihood framework to integrate the replicated and nonreplicated detection-nondetection data sources into a single model (the ICOM; Miller et al. 2019). To do this, I assume the likelihoods for the individual data sets are independent, conditional on the shared latent ecological process. This assumption can be interpreted as the detection of a species in one data set (conditional on the species being present) is independent of the detection of the species in any other data set (Schaub and Abadi, 2011; Kéry and Royle, 2021). Thus, the full joint likelihood, conditional on the true, shared ecological process, is the product of the individual conditional likelihoods for each data set: LICOM (𝜶0, 𝜶, 𝜸0, 𝜸 | 𝒛, 𝜷0, 𝜷, 𝒚, 𝒗) = LREP (𝜶0, 𝜶 | 𝒛, 𝜷0, 𝜷, 𝒚) · LNREP (𝜸0, 𝜸 | 𝒛, 𝜷0, 𝜷, 𝒗). (6.10) 6.3.2 Simulation study 1: Assessing benefits of integration I performed a simulation study to assess whether integration of multiple data sources in an ICOM framework could provide improved accuracy and precision for species and community level pa- rameters compared to individual analyses under a range of realistic parameter values (Appendix). I simulated data from one replicated data source with 𝐾 = 3 replicates and two nonreplicated data sources, with the replicated data source having medium community-level detection probability (mean hyper-parameter = 0.5), one nonreplicated data source having low community-level detec- 27 tion probability (mean = 0.22), and the other having high community-level detection probability (mean = 0.78), which allowed me to compare the benefits of integration across varying qualities of nonreplicated data. I generated 100 replicates of each data source under the ICOM framework using a range of community-level ecological parameter values and subsequently drew simulated species data for 𝐼 = 25 species for 𝑇 = 6 years. I generated species’ occurrence probabilities according to Equations 6.2 and 6.3 with a single spatially-varying covariate. I generated detection processes for each data source as a function of a species and year specific intercept, and a species-specific effect of a spatio-temporally varying covariate unique to each data set. I simulated all covariates as normally distributed random variables with mean zero and standard deviation one. I generated each data source at 50 distinct sites that were randomly distributed across the range of covariates, resulting in a total of 𝐽 = 150 sites. I compared model performance by fitting models individually for each of the seven unique combinations of the three data sources and subsequently computing the average bias (i.e., true simulated value minus estimated value) of the species-specific occurrence parameters across all 25 species and 100 simulations, as well as the average bias of the community level parameters. 6.3.3 Simulation study 2: Assessing benefits of community modeling To evaluate the benefits of the hierarchical community model approach used in the ICOM, I assessed how species-level estimates from the ICOM compared to estimates from single species integrated distribution models (IDMs). The IDM took the same form as the ICOM except species-specific parameters were no longer random effects from a community level distribution; rather, species- specific parameters were estimated individually in a model for each species. I simulated data from one replicated data source with 𝐾 = 3 replicates and one nonreplicated data source, both with medium community-level detection probabilities (mean = 0.5). I simulated a community of 𝐼 = 25 species, where each of the two data sources consisted of 50 unique locations sampled over 𝑇 = 6 years, where the locations were randomly distributed across the range of a spatially varying covariate influencing occurrence. I assumed detection for both data sets was a function of a species 28 and year specific intercept, and a species-specific spatio-temporally varying covariate unique to each survey replicate at each site. I generated species-specific occurrence and detection intercepts and covariate effects from uniform distributions (Supplemental Information S1.1), which allowed me to compare the ICOM to individual IDMs under the scenario when species level effects may not follow a normal distribution. I simulated 100 data sets from the community under realistic parameter values (Supplemental Information S1.1). I assessed model performance across the 100 simulated data sets by comparing the accuracy and precision of estimates from the ICOM to the IDMs. 6.3.4 Case study: foliage-gleaning birds in the White Mountains I applied the ICOM to characterize temporal trends and spatial variability in individual species occurrence, species richness, and species composition of a community of twelve foliage-gleaning birds from 2010-2018 across the White Mountain National Forest using two replicated data sets and one nonreplicated data set (Figure 6.1). The two replicated data sets come from the Hubbard Brook Experimental Forest (HBEF) and the National Ecological Observatory Network (NEON) at Bartlett Experimental Forest (Barnett et al., 2019; National Ecological Observatory Network (NEON), 2021), while the nonreplicated data set comes from the North American Breeding Bird Survey (BBS; Pardieck et al. 2020). At HBEF, observers performed three replicate surveys at each of 373 sites in each survey year to account for imperfect detection, while observers at NEON used a removal design at 81 sites to separate detection from occurrence. BBS observers performed point counts at 50 point count locations (called stops) along four routes (i.e., roads) that at least partially fell within the White Mountain National Forest, resulting in a total of 200 nonreplicated point count locations during each survey year. Integration of these three data sets is particularly valuable as each data source has clear advantages and disadvantages (Table 6.1) and covers disparate areas within the study region. Thus, data integration may yield parameter estimates more indicative of the entire White Mountains rather than analyses of the data sources independently. See Appendix for additional details on the three data sets. 29 Table 6.1: Characteristics of the three data sources used to model occurrence dynamics of twelve foliage-gleaning birds in the White Mountain National Forest from 2010-2018. Forest Cover corresponds to the amount of forest within a 250m radius of a point count site. Values for elevation and forest cover are mean (minimum, maximum). HBEF NEON BBS Data type Replicated Replicated Nonreplicated Years 2010-2018 2015-2018 2010-2018 Number of sites 373 81 200 Elevation (m) 607 (240, 932) 432 (268, 766) 352 (134, 917) Forest Cover (%) 97.7 (71, 100) 94.2 (75, 100) 70.6 (0, 92) Survey Location Experimental forest Experimental forest Roadside I modeled occurrence dynamics for the following twelve foliage-gleaning bird species: Ameri- can Redstart (Setophaga ruticilla), Black-and-white Warbler (Mniotilta varia), Blue-headed Vireo (Vireo solitarius), Blackburnian Warbler (Setophaga fusca), Blackpoll Warbler (Setophaga stri- ata), Black-throated Blue Warbler (Setophaga caerulescens), Black-throated Green Warbler (Se- tophaga virens), Canada Warbler (Cardellina canadensis), Magnolia Warbler (Setophaga magno- lia), Nashville Warbler (Leiothlypis ruficapilla), Ovenbird (Seiurus aurocapilla), and Red-eyed Vireo (Vireo olivaceus). I specified species occurrence following Equation 6.1, with occurrence probability in the first year, 𝜓𝑖, 𝑗,1 , modeled according to logit(𝜓𝑖, 𝑗,1 ) = 𝛽0𝑖,1 + 𝛽1𝑖 · ELEV 𝑗 + 𝛽2𝑖 · ELEV2𝑗 + 𝛽3𝑖 · FOR 𝑗 , (6.11) where 𝛽0𝑖,1 is the species-specific intercept in year 1, and 𝛽1𝑖 , 𝛽2𝑖 , and 𝛽3𝑖 are species-specific effects of elevation (ELEV, linear and quadratic) and local forest cover within a 250m radius (FOR), respectively. Occurrence in subsequent years is modeled analogously with a year-specific intercept and an autologistic parameter following Equation 6.3. I extracted elevation data at a 30 × 30 m resolution from the National Elevation Dataset (Gesch et al., 2002) and associated each point count site with the elevation at the center of the point count. I used the National Land Cover Database (Homer et al., 2015) to determine the amount of local forest cover in 2016 within a 250m radius of each point count location. To compute species-specific temporal trend estimates, 30 B A C Figure 6.1: Study location for the case study. Panel (A) shows the White Mountain National Forest (shaded dark grey region) and the location of the Hubbard Brook Experimental Forest (HBEF; light grey region), the BBS routes (dark blue lines), and the NEON data from Bartlett Forest (purple region). Panel (B) shows the distribution of point count locations in HBEF, and Panel (C) shows the distribution of points in the NEON data set. Note different axis spacings across the three plots. I performed a post-hoc linear regression using the average occurrence probability of each species during each year as a response variable and year as a covariate. Under a Bayesian framework, I obtain full uncertainty propagation by calculating the trend for each posterior sample of the average occurrence probabilities (Appendix). All species-specific occurrence intercepts and regression coefficients were modeled hierarchically following Equation 6.8. I incorporated multiple covariates in the conditional likelihoods of each data type to account for variation in detection probabilities following the species’ detection models described in the “Integrated community occupancy model” section. For the HBEF and NEON data, I included 31 species and year-specific intercepts, a species-specific linear effect of the time of the survey, and species-specific linear and quadratic effects of the day of the survey. For the BBS data, I modeled detection as a function of a species and year specific intercept, species-specific linear and quadratic effects of day of survey, and a random observer effect to account for variation in detection among observers. All detection covariates were modeled hierarchically following Equation 6.8. Species and year specific intercepts were also modeled hierarchically, but were drawn from a single distribution for all species and years within each data set (Appendix). 6.3.4.1 Goodness of fit and model validation Model assessment and validation for integrated models is an active area of research (Fletcher et al., 2019). I assessed model fit for the case study using a Bayesian p-value approach with a Chi-square fit statistic. I calculate a separate Bayesian p-value for each data source (Appendix). I used two-fold cross validation with the log predictive density (Vehtari et al., 2017) as a predictive performance metric to assess the out-of-sample predictive performance of the full ICOM compared to six models using subsets of the three data sets for the case study. Assessing out-of-sample predictive performance with occupancy models presents additional complexities since the ecological state of interest is not directly observed (Zipkin et al., 2012). I split each data set into two parts, such that each half consisted of the data from half of the spatial locations for all years those locations were sampled. For a given data set, I compared the occurrence predictions at each hold out location during each year it was sampled to the occurrence values generated from models that were fit using the data at the hold out locations. To account for model uncertainty, I compared the yearly occurrence predictions individually to latent occurrence values generated from the subset of the seven models that used the data set in the model fitting process (see Appendix for details). I summarize predictive performance for each species and the entire community individually at each data set location, as well as across the entire study region (i.e., White Mountains). I used a similar approach to compare the performance of the ICOM to individual species IDMs (Appendix). 32 6.3.5 Model implementation I estimated the parameters in all model versions (simulations and case study) with a Bayesian framework using Markov Chain Monte Carlo (MCMC). I fit the models in NIMBLE (de Valpine et al., 2017, 2021) within the R statistical environment (R Core Team, 2020) using vague priors for all hyper-parameters (Appendix). For all simulations, I ran three chains, each with 20,000 iterations with a burn-in period of 10,000 iterations and a thinning rate of four. For the case study, I ran models for three chains of 450,000 iterations with a burn-in period of 200,000 iterations and a thinning rate of 20, resulting in a total of 52,000 samples from the posterior distribution. I assessed model convergence using the Gelman-Rubin R-hat diagnostic (Brooks and Gelman, 1998) and visual assessment of trace plots using the coda package (Plummer et al., 2006). 6.4 Results 6.4.1 Simulations The ICOM using one replicated data set, one nonreplicated data set with low average detection probability across the community, and one nonreplicated data set with high detection probability yielded unbiased estimates and was generally more precise in community and species-level occur- rence parameter estimates than models using smaller combinations of the three data sets or data sets individually (Figure 6.2, Appendix Figure A.1). Patterns were similar across community effects and species-specific effects, with increases in precision more prominent in species-level effects. Despite the general improvement in estimates found when integrating all three data sources, models using only two data sources, particularly the replicated data source and the nonreplicated data source with high detection probability, also yielded estimates with low bias and high precision. Models using only the nonreplicated data source with low detection probability often failed to converge, while models using only the nonreplicated data source with high detection probability mostly converged but were less precise than models using replicated data (and with essentially unidentifiable intercept values; Figure 6.2A). 33 A B 1 0.5 Bias in covariate effect Bias in intercept 0 0.0 −0.5 −1 EP L H L H H H EP L H L H H H R EP EP EP EP EP EP R EP EP EP EP EP EP R R R R R R R R R R R R N N N N N N N N N N N N + + + + + + + + EP EP L L EP EP L L R R EP EP R R EP EP R R R R N N N N + + EP EP R R Data Sets Used Data Sets Used Figure 6.2: Sampling distribution of estimated bias in simulated species level occurrence intercepts (A) and covariate effects (B) under models using different combinations of a replicated (REP) data set and nonreplicated data sets with low (NREP L) and high (NREP H) detection probability. Points represent the median bias (posterior mean - true simulated value) in a species-level effect across 100 simulations for a community of 25 species. Lines represent the 95% quantiles of the bias values. The intercept parameter using only NREP L is not shown as it failed to converge. The ICOM also led to substantial improvements in precision of parameter estimates compared to single species IDMs (Table 6.2, Appendix Figure A.2). Species-level IDMs provided slightly more accurate estimates of species-specific occurrence parameters for some species as compared to the ICOM, which is a result of Bayesian shrinkage (i.e., borrowing strength) driving species-level parameters closer to the community average for those species with extreme parameter values in the ICOM. However, the true species-level parameters were contained within the 95% Bayesian credible interval of the estimated parameter values across 94.9% of all simulations, indicating that this loss in accuracy is negligible. Further, I simulated species-level effects from a uniform distribution rather than a normal distribution, which led to more extreme species values. Losses in accuracy would be much lower for communities of species where the normal assumption is adequate. Thus, in addition to providing community-level parameter effects, the ICOM provides 34 more precise estimates of species-specific effects compared to IDMs with only minor losses in accuracy for species with extreme intercepts and/or covariate effects. Table 6.2: Precision and accuracy of species-specific parameter estimates when using the integrated community occupancy model (ICOM) compared to a single species integrated distribution model (IDM) for a simulated community of 25 species over six years across 100 simulations with one replicated (REP) data set and one nonreplicated (NREP) data set. Precision improvement is the percentage improvement in precision when using the ICOM compared to the IDM, where precision is defined as the difference between the 2.5% and 97.5% quantiles of the posterior means. Bias is the average magnitude of the posterior means minus the true simulated value. Values are averaged across all 25 species and six years. Parameter Precision ICOM Bias IDM Bias Parameter Improvement (%) 𝛾0𝑖 41.9 0.235 0.101 NREP detection intercept 𝛾1𝑖 33.4 0.070 0.027 NREP detection covariate 𝜙𝑖 30.0 0.173 0.121 Auto-logistic 𝛽0𝑖 29.2 0.173 0.108 Occurrence intercept 𝛽1𝑖 22.5 0.042 0.017 Occurrence covariate 𝛼0𝑖 18.8 0.104 0.044 REP detection intercept 𝛼1𝑖 9.63 0.023 0.012 REP detection covariate 6.4.2 Case Study The ICOM estimated variable trends in occurrence for the twelve foliage-gleaning bird species across the White Mountain National Forest, with five species having >75% probability of increas- ing occurrence rates and three species having >75% probability of decreasing (Appendix A.3) from 2010-2018. Community level parameters revealed that average occurrence probability peaked at mid-level elevations and higher amounts of local forest cover across foliage-gleaning species, although species-specific parameters were highly variable (Appendix Table A.3). Occurrence probabilities peaked at a variety of elevations across the twelve species (Appendix Figure A.4), which resulted in species richness being maximized at mid-level elevations (600-800m; Figure 6.3A). Species composition of the community, as measured by the Jaccard index, largely followed similar patterns (Figure 6.3B). Estimated trends in species-specific occurrence probabilities were 35 A B 1.00 7 Average Jaccard Index 0.75 Average Richness 5 0.50 3 0.25 250 500 750 250 500 750 Elevation (m) Elevation (m) Data Set BBS HBEF NEON Figure 6.3: Estimated average site-level species richness (A) and Jaccard index (B) of a community of twelve foliage-gleaning bird species in the White Mountain National Forest. Points are posterior means. Jaccard index values are relative to a single site in the Hubbard Brook Experimental Forest with value 1, with 0 indicating no species in common to the reference site, and 1 indicating identical community composition to the reference site. highly dependent on the data sets included in the model (Figure 6.4), suggesting important spatial variability across the White Mountains. For example, while Red-eyed Vireo occurrence showed consistent trends across models from all data source combinations, trends for the Black-throated Blue Warbler and Black-throated Green Warbler were stable in estimates from most data combina- tions, but occurrence probabilities were estimated to have declined over this time period based on the model that used only BBS data. Integration of all data sources in the ICOM yielded better predictive performance for the community of birds across the three data set locations than models using only a subset of the available data (Table 6.3). This is likely a result of both a larger number of detections and a wider range of the covariate space when using all three data sources. The model using only HBEF and BBS data had the highest predictive performance for data at HBEF, while the model using NEON data only had the highest predictive performance for data at NEON. The ICOM had higher 36 American Redstart Black−and−white Warbler Black−throated Blue Warbler 1.00 0.75 0.50 0.25 0.00 Black−throated Green Warbler Blackburnian Warbler Blackpoll Warbler 1.00 0.75 Average Occurrence Probability 0.50 0.25 0.00 Blue−headed Vireo Canada Warbler Magnolia Warbler 1.00 0.75 0.50 0.25 0.00 Nashville Warbler Ovenbird Red−eyed Vireo 1.00 0.75 0.50 0.25 0.00 10 12 14 16 18 10 12 14 16 18 10 12 14 16 18 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 Year HBEF BBS HBEF + BBS HBEF + NEON + BBS Data Sets Used NEON HBEF + NEON NEON + BBS Figure 6.4: Average occurrence probabilities of twelve foliage-gleaning bird species in the White Mountain National Forest from 2010 to 2018 from models using different subsets of the three data sources. Points show posterior mean occurrence probabilities averaged across all sites in a given year. Gray shaded region indicates the 95% credible interval of estimates derived from the model that incorporated all three data sources. 37 predictive performance compared to single species IDMs across all 12 species for the three data sets, and outperformed single species IDMs individually for each species for 64%, 91%, and 100% of the species at HBEF, BBS, and NEON, respectively (Appendix Table A.6). Table 6.3: Two-fold cross validation results comparing predictive performance across models using different combinations of the three data sets. The fitted model is shown in the first column and log predictive density measures are shown for the entire community with each data set and across all data sets. Values in parentheses show the average rank of the model for an individual species across all models, with 1 indicating the model is the best for all species and 7 indicating the the model is worst for all species. Bold values indicate the best performing model for each individual data set. Predictive performance of the model using only NEON data is only assessed at NEON locations because NEON data are only available for four of the nine study years. Model HBEF BBS NEON All HBEF -10174 (3.75) -6200 (4.67) -947 (4.75) -17321 (4.5) NEON - - -792 (2.83) - BBS -13876 (3.75) -5702 (3.5) -1148 (5.08) -20726 (3.83) HBEF+NEON -10097 (3.58) -5878 (3.33) -852 (3.67) -16829 (3.33) HBEF+BBS -9732 (2.67) -5717 (3.67) -934 (4.42) -16383 (2.67) NEON+BBS -12560 (4.5) -5759 (3.25) -801 (3.83) -19121 (4.33) HBEF+NEON+BBS -9767 (2.75) -5691 (2.58) -836 (3.42) -16294 (2.33) 6.5 Discussion Assessing species distributions and occurrence dynamics of multiple species in a community is an important task for biodiversity conservation (Guillera-Arroita et al., 2019). Monitoring programs collect different types of data that vary in amount, spatial extent, quality, and information content, and incorporating these varied data into a unified analysis can yield improved estimates on quantities of interest (Zipkin and Saunders, 2018). I developed an “integrated community occupancy model” (ICOM) that uses replicated and nonreplicated detection-nondetection data to simultaneously provide inferences on species-specific and community-level dynamics. Using simulations and empirical bird data, I showed that the ICOM can provide more accurate and precise estimates of occurrence dynamics than analyses using single data sources (Figure 6.2) or single species models (Table 6.2) as well as improved predictive performance across a study region (Table 38 6.3, Appendix Table A.6). In my simulation study, the ICOM using one replicated data set, one nonreplicated data set with low detection probability, and one replicated data set with high detection probability provided unbiased and generally more precise parameter estimates than models using a subset of the three data sources (Figure 6.2), which aligns with previous single species data integration work (Fletcher et al., 2019). Despite this general improvement in the simulation study, integrating the single replicated data source with the high detection nonreplicated data source yielded comparable accuracy and precision to the model using one replicated and two nonreplicated data sources. This suggests that integrating a data source of particularly low quality (e.g., nonreplicated, potentially large detection variability) with higher quality data sources may not yield any practical benefits (Simmonds et al., 2020). I generated each data source at random locations across the range of the covariate with no systematic bias in sampling locations. In reality, many sources of detection-nondetection data are spatially biased (e.g., collected along road transects or near locations with high human density), which can lead to a narrow range of habitat variables (e.g., forest cover) that drive species and community occurrence dynamics. Such biases can result in different species being observed across data sets or estimated parameters that are either biased, only span a portion of the covariate range of interest, or are not indicative of the larger spatial areas of interest (Conn et al., 2017). By integrating disparate data sources from multiple locations in the ICOM, I can increase the likelihood that the sampled sites vary along important ecological and environmental gradients, which in turn enables more precise and accurate inference on the environmental conditions driving species occurrence (Zipkin et al., 2017). In the foliage-gleaning bird case study, I used a two-fold cross validation approach to show that integration of all three data sets yielded the best overall predictive performance across the White Mountain National Forest compared to models using smaller subsets of the three data sources (Table 6.3). Further, the ICOM generally yielded improved predictive performance for individual species and the overall community compared to single species integrated distribution models (Appendix Table A.5). In contrast to the simulation study, parameter precision was not 39 always highest when integrating all three data sets in the case study (Appendix Tables A.4, A.5). Additionally, models with smaller subsets of the three data sources outperformed the ICOM with all three data sets individually for the HBEF and NEON data sets, although the improvements in predictive performance were not large (Table 6.3). This is likely a result of different covariate ranges among the three data sources and only having NEON data for a subset of the time period of interest. For example, precision of species-specific effects of local forest cover was highest for the model using HBEF and NEON data. Sites at HBEF and NEON have low variability in forest cover, with average forest cover of sites being 94.2% and 97.7%, respectively. However, variability in forest cover is comparatively higher at BBS sites, which likely explains why precision is lower when incorporating BBS data. For most intercept parameters at the community and species-specific level, models using either NEON data alone or NEON and BBS were the most precise. In the model using NEON and BBS data, information for separating detection probability from true occurrence comes solely from the NEON data, which are only available for four of the nine years of the study period. This shorter time period likely results in less unexplained variability in occurrence probability and detection across years, which in turn leads to more precise intercept estimates. However, the ICOM using all three data sets enables inference across the entire temporal period of interest as well as a broader range of covariates, generating more informative estimates on trends and spatial patterns across the White Mountains. The large variation in temporal trends when using different combinations of the three data sources suggests spatially-varying occurrence trends for the community of twelve foliage-gleaning birds. NEON data were only available from 2015-2018, which may account for differences in estimated trends from the model using only NEON data compared to other models. BBS data were sampled along road transects and thus have less forest cover than both the NEON and HBEF sites, suggesting that occurrence trends could vary as a result of differences in amount of local forest cover and/or proximity to roads (Furnas, 2020). Estimates from the full ICOM are a weighted average across heterogeneity in trends across the region, where the weights are determined by the amount of available data from each data source (Fletcher et al., 2019). In the case study, the HBEF 40 data source comprised a majority of observations (77%; Appendix Table A.2) and thus contributed the most to estimates of model parameters, which I deemed acceptable because the HBEF data are a high-quality replicated data source. Alternatively, a profiling approach could be used within the MCMC sampler to change the weights for each data set similar to the maximum likelihood approach of Fletcher et al. 2019. By including information from multiple data sources within a region, the ICOM yields area-wide averaged species-specific trends. If an area-wide averaged trend is not desired, trends from different spatial locations could be estimated hierarchically in a multi-region framework (Doser et al., 2021d) or treated as spatially varying coefficients to explicitly model spatial heterogeneity (Finley, 2011). BBS data are a nonreplicated data source and so there is little information to separate the occurrence and detection processes when using only BBS data in a community model (Kéry and Royle, 2021). In the model using only BBS data, this resulted in large uncertainty in the autologistic parameter and subsequent occurrence estimates (Supplemental Table S5). This weak identifiability of the occurrence and detection intercepts contributed to substantial differences in average occurrence probabilities estimated using only BBS data compared to models using multiple data sources in the ICOM (Figure 6.4). When the BBS data were integrated with NEON or HBEF, the information used to separately estimate detection from occurrence came from only the replicated data source(s). This may explain why yearly occurrence probabilities from the NEON + BBS and BBS models for the Black-throated Blue Warbler, Black-throated Green Warbler, and Blue-headed Vireo were nearly identical in the years NEON data were available (2015-2018; Figure 6.4). The limited temporal span of the NEON data sometimes resulted in the BBS data effectively “pulling down” estimates of occurrence form 2010-2014 in the NEON + BBS model for certain species (e.g., Black-throated Green Warbler), while for other species (e.g., Blue-headed Vireo) estimates in 2010-2014 from the NEON + BBS model appeared to be less influenced by the BBS data (Figure 6.4). These varying species-level patterns highlight the complexities that can arise when integrating data sets of different qualities, types, and size, which should be carefully considered by end-users of integrated modeling approaches such as the ICOM. 41 The benefits of data integration for multi-species detection-nondetection data sets will depend on the characteristics and goals of each specific study. The simulations and foliage-gleaning bird case study demonstrate that the ICOM can yield more accurate and precise estimates of occurrence dynamics than alternative approaches. However, it is impossible to explore the full range of data integration complexities within a single study. Future analyses motivated by specific data sets, sample sizes, and ecological questions are necessary to provide further guidance on the practical benefits and potential drawbacks of integrating multiple data sources within an ICOM framework. While the ICOM generally leads to improved inference for species and community level effects, integrating multiple data sets leads to increased computation times and potential complications in achieving model convergence. For example, in the simulation study, the ICOM with three data sets took approximately twice as long to run as a community model using only one replicated data source (i.e., 27 minutes vs. 13 minutes for 20,000 iterations). The added computational costs will depend on numerous analysis-specific factors, such as the number of data sets, and the number of species, sites, and years available within each data set. When determining what data sources to use in an ICOM, I recommend considering the following factors: (1) the amount of the different data sources within the area of interest and how they are distributed across the range of ecological and environmental gradients; (2) the precision of estimates required for the analysis objectives; (3) amount of time and computational power available to run the models; (4) the spatial resolution of each data source; and (5) the quality and information content of each data source (e.g., replicated vs. nonreplicated, large detection variability vs. standardized protocol). For example, suppose multiple data sources are available for a study seeking to reveal the effect of a spatially-varying covariate (e.g., amount of urbanization) on species-specific and community-level occurrence dynamics. If one of the data sources is a high quality, replicated data set that spans the entire environmental gradient of the covariate, a community model using only this single data source would likely suffice. Alternatively, if each individual data source only spans a portion of the environmental gradient, integration of the data sets in an ICOM would be necessary to fully understand the covariate effect on species and community occurrence dynamics. I encourage 42 the use of model validation techniques, such as the cross-validation approach I presented in the foliage-gleaning bird case study to evaluate the benefits of the ICOM for specific study objectives. I envision numerous possible methodological extensions to the ICOM framework. While I used three data sources arising from the same method in the case study (i.e., point count surveys), the ICOM can incorporate detection-nondetection data from different data collection approaches, such as autonomous recording units (Doser et al., 2021b) and citizen science checklists (Kéry et al., 2010b) in an analogous manner. The ICOM assumes the detection processes of each data source are independent, conditional on the true occurrence process. This conditional independence could be violated for certain data sources where trained observers performing structured surveys also submit observations to citizen science repositories (e.g., eBird). The consequences of violating such conditional independence should be explored before combining data types, although results for integrated population models suggest this violation likely has negligible impacts on the accuracy of inferences (Abadi et al., 2010a). The ICOM can help address a variety of questions in applied population and community ecology. As in the single data source hierarchical community model (Dorazio and Royle, 2005), data augmentation (Royle et al., 2007) could be incorporated into the ICOM to account for missing species and to estimate richness of the community. If regional species pools differ between data source locations, the ICOM could be adapted to a multiregion framework (Sutherland et al., 2016) or extended to explore variation of species traits among disparate communities (Garrard et al., 2013; Tenan et al., 2017). Given the model’s ability to estimate species and community level effects, the ICOM can be applied to help elucidate individual species sensitivities to various global change drivers, determine factors causing shifts in taxonomic or functional diversity, or forecast future species and community shifts under varying climate and land use change scenarios to help prioritize conservation strategies. In particular, the ICOM will assist multi-species conservation planning by providing estimates for rare species that lack adequate data for common analysis approaches, while simultaneously obtaining inference on an entire community that may elucidate specific species traits linked to occurrence trends. 43 As changes in environmental and climate conditions continue globally, continued development of monitoring and analysis techniques that can effectively produce accurate and precise estimates of biodiversity metrics are needed to understand global change impacts and develop appropriate mitigation plans. Integrated community occupancy models provide a new approach to simul- taneously analyze multi-species data from numerous available sources. This framework can be used to elucidate both species specific and community level dynamics, improving understanding of the mechanisms driving biodiversity distributions and informing appropriate management and conservation actions to address ongoing global change. 44 CHAPTER 7 CONCLUSIONS 7.1 Research synthesis In this dissertation, I have presented a suite of hierarchical Bayesian modeling approaches for inference on avian soundscapes, populations, and communities. Using multiple data types, I have displayed the value of hierarchical models to partition ecological data into distinct observation and ecological components for improved inference on avian population and community dynamics, as well as a more traditional use of hierarchical models to accommodate complex dependency structures between data points and multivariate responses. As avian data sources become more accessible and span larger spatiotemporal regions, novel and clearly-communicated hierarchical models will play a crucial role in informed management and conservation of individual avian species and entire communities. In Chapter 2, I developed a two-stage hierarchical beta regression model leveraging basis splines to quantify the nonlinear relationship between anthropogenic and biological sounds in avian soundscapes in western New York. Through this approach, I revealed the extent to which publicly-available traffic data could explain the inverse relationship between anthropogenic and biological sounds. Importantly, I concluded the large uncertainty in predictions of biological soundscapes from traffic data was largely a result of temporal coarseness of the public road traffic data, indicating that time-indexed traffic data from traffic and navigation applications (e.g., Waze) might serve as a valuable tool for soundscape conservation (Dumyahn and Pijanowski, 2011), reducing anthropogenic noise impacts on wildlife communities (Herrera-Montes and Aide, 2011), and maintaining the plethora of ecological and human health benefits that biological sounds provide (Buxton et al., 2021). In Chapter 3, I used a multivariate linear mixed effects model to show a shelterwood logging in northern Michigan had no long-term impacts on overall soundscape quality, but did have impacts 45 on the composition of sounds in the soundscape. Given these soundscapes are comprised primarily of avian vocalizations, this indicated a shift in avian community composition. These results align with previous research documenting logging impacts on avian communities in temperate forest ecosystems (Franklin and Forman, 1987; Doyon et al., 2005), suggesting the potential for sound- scape monitoring, along with the developed modeling framework, as a rapid tool for assessment of disturbance impacts on acoustically-communicating communities. In Chapter 4, I found trends in bird abundance were highly variable across protected forests in the northeastern US, but species within a forest showed similar trends, suggesting local forest condition might have a broad and consistent effect on bird communities in this region. I developed a novel multi-species, multi-region, removal sampling model to leverage data from eight distinct parks within a single monitoring network, which enabled estimation of trends for rare species and sparsely sampled parks. Future work will seek to use these data together with additional data sources (e.g., Breeding Bird Survey, eBird, autonomous recording units) to understand the mechanisms driving these heterogeneous spatial trends. In Chapter 5, I developed a novel integrated model that combines automated acoustic vocaliza- tion data together with traditional point count survey data to estimate bird abundance. Given the ability of ARUs to record for multiple distinct time periods at a given location, my approach has the potential to greatly improve abundance estimates, as well as estimated effects of environmental drivers on abundance, in particular for rare and hard-to-detect species. Further, by leveraging information from point count survey data, my approach reduces the time necessary to validate ARU vocalizations obtained from an automated clustering or machine learning algorithm, which has been a major downside to the use of ARUs in wildlife monitoring programs (Gibb et al., 2019). In Chapter 6, I extended single species integrated distribution models to a multispecies frame- work, which enables estimation of occurrence and biodiversity dynamics using multiple data sources. I showed the novel modeling framework resulted in improved predictive performance of occurrence dynamics for a community of twelve-foliage gleaning birds in the White Mountain National Forest. This framework can improve estimates of both species specific and community 46 level dynamics, providing the opportunity to improve understanding of the specific environmental drivers of biodiversity shifts, and how such effects vary across species groups, space, and time. 7.2 Future research The previous analyses represent methodological improvements for analyses of avian soundscapes, populations, and communities. I presented applications of these approaches towards understanding disturbance effects, spatial and temporal variation in bird abundance, and population and commu- nity occurrence trends. However, areas of future work remain in terms of both methodological development and application to address important management and conservation questions, in par- ticular for addressing research questions that span regional to continental scales. My future work will build on the work presented in this dissertation and focus on the development and application of novel data integration approaches for macroscale analyses of bird populations, communities, and the ecosystem services they provide. Below I present three specific areas of future research. 7.2.1 Integration of soundscape, population, and community level analyses Analyses on avian soundscapes in the fields of soundscape ecology and ecoacoustics are typically performed separately from more traditional analyses that focus on avian populations and communi- ties. Both approaches can provide information on the impacts of global change and anthropogenic disturbance on natural ecosystems, as well as the ecosystem services they provide to humans (re- viewed in Gibb et al. 2019). Integration of soundscape analyses together with population and community analyses could provide a more holistic understanding of how avian communities are changing and the implications this has for ecosystems, natural resources and the human health benefits they provide (Buxton et al., 2021). Acoustic patterns in soundscapes are inherently a function of individual species abundance as well as the species composition of a community (Aide et al., 2017). However, soundscapes are nonlinearly related to both the abundance and composition of avian communities, as individual species have distinct contributions to soundscape complexity as a result of variation in song complexity, duration, and frequency (Rappaport et al., 2020). As a 47 result, changes in abundance and composition of avian communities are likely linked to changes in soundscape complexity and diversity in complex, non-linear patterns that vary across space and time. Recent research has recognized this deficiency and linked long-term avian population declines to shifts in soundscapes (Morrison et al., 2021). Ignoring the impacts of avian population and community shifts (e.g., biotic homogenization) on soundscapes underestimates the impacts such losses will have on human health and well-being, especially given the premier importance of bird song to human health and sense of place in nature (Buxton et al., 2021). I plan to develop integrated models that relate multiple soundscape characteristics (e.g., acoustic complexity, diver- sity, composition) to avian community composition to understand how predicted homogenization of avian communities as a result of land-use change and climate change will influence soundscape quality and the underlying human health and well-being benefits these soundscapes provide. 7.2.2 Software development for models of species and community level occurrence dynamics While there has been widespread development of hierarchical modeling approaches to address common complexities when modeling species-specific and community level occurrence dynamics (MacKenzie et al., 2002; Dorazio and Royle, 2005), there is a lack of user-friendly and computation- ally efficient software to fit these specialized models. This is particularly true for spatially-explicit models that accommodate sources of dependency among observations, as computationally com- plexity increases in cubic order with the number of spatial locations. Despite this computational bottleneck, known as the “big N” problem (Heaton et al., 2019), such spatially-explicit models often improve predictive performance when predicting species-specific and community level oc- currence probabilities across a region of interest (Swanson et al., 2013; Guélat and Kéry, 2018; Wright et al., 2021). The lack of specialized software to fit models that account for both imperfect detection and spatial dependence has resulted in either the use of more standard approaches for which such software exists (e.g., spatially-explicit logistic regression using spBayes; Finley et al. 2015) or the use of standard Bayesian software packages (e.g., JAGS, BUGS, NIMBLE), which can be drastically slow and requires learning new syntax. I recently developed spOccupancy, an R 48 package that fits single species occupancy models, multispecies occupancy models, and integrated occupancy models that can accommodate spatial autocorrelation (Doser et al., 2021a). I leverage Pólya-Gamma data augmentation (Polson et al., 2013) and Nearest Neighbor Gaussian Processes (Datta et al., 2016) to yield a computationally efficient Gibbs sampler for all models. I plan to continue development of spOccupancy with the following extensions: (1) Include functionality for dynamic occupancy models (MacKenzie et al., 2003); (2) Allow for spatially varying coefficiets (SVCs; Gelfand et al. 2004; Finley and Banerjee 2020); (3) Functionality for integrated community occupancy models (Doser et al., 2021c). Such computationally efficient software will make it easier to address both imperfect detection and spatial autocorrelation in species distribution models that use large data sets. 7.2.3 Nonstationary and multi-scaled climate and land use effects on avian communities Avian species have been severely impacted by global changes, with an estimated three billion birds lost in North America (Rosenberg et al., 2019) over the last 50 years and similar losses in Europe (Inger et al., 2015). The impacts of climate change and land-use/land-cover (LULC) change have additionally resulted in shifts in avian species distributions (Rushing et al., 2020) and timing of migratory patterns (Hällfors et al., 2020; Youngflesh et al., 2021). Despite increased recognition of global change impacts, a mechanistic understanding of the roles of climate and LULC change in shaping bird occurrence dynamics and distributional shifts is severely lacking for all but the most common species (Breiner et al., 2015; Kindsvater et al., 2018), resulting in critical information gaps to understand and predict future avian trajectories. Understanding the relative contribution of global change drivers to shifts in both species distributions and community composition is complex as it depends on the extent of spatiotemporal synchrony among local populations and species as well as the strength of competing drivers whose effects are nonstationary and multi-scaled. I seek to extend the integrated community occupancy model (ICOM) developed in Chapter 6 to a “macrosystems integrated community occupancy model” (mICOM) to estimate the effects of nonstationary and multi-scaled environmental drivers on avian occurrence dynamics across the continental United 49 States. I will integrate four continental-scale avian data sources (eBird, BBS, NEON, and National Park Service Inventory and Monitoring Network data) and incorporate spatially-varying coefficients (Finley, 2011) of climate and LULC effects at multiple spatial scales via a scale-selection submodel (Frishkoff et al., 2019) to reveal how these effects vary across space, across species, and within a given species range. This analysis will generate improved knowledge on how global change will impact avian communities, the ecosystems they use, and the ecosystem services they provide, as well as help shape management actions across networks of national parks and Bird Conservation Regions by identifying specific areas and communities that are particularly vulnerable to the effects of climate and LULC change. 7.3 Concluding remarks Novel and clearly-communicated hierarchical models that integrate multiple disparate data sources will be valuable tools for conservation and management efforts at both local and continental scales. In this work, I developed hierarchical Bayesian modeling approaches to understand avian soundscape, population, and community dynamics. My future work will continue to leverage complex data sources within hierarchical modeling frameworks to answer important questions on bird populations, communities, and the ecosystem services they provide. 50 APPENDIX 51 APPENDIX INTEGRATED COMMUNITY OCCUPANCY MODELS: A FRAMEWORK TO ASSESS MULTISPECIES OCCURRENCE AND BIODIVERSITY DYNAMICS USING MULTIPLE DATA SOURCES A.1 Simulation study details For the simulation study focused on assessing the benefits of integration (simulation study 1), my goal was to determine whether integration of multiple data sources in an ICOM could provide improved accuracy and precision for species and community level parameters compared to models using smaller only a subset of the available data sources. I therefore drew community-level parameters from a range of realistic values. Specifically, for the 100 simulations, community-level occurrence parameters were drawn from the following distributions 𝛽¯0,𝑡 = Uniform(−1.5, 0.5) for all 𝑡 = 1, . . . , 6, 𝛽¯1 = Uniform(−0.5, 0.5), 𝜙¯ = Uniform(−0.2, 0.8). The intercepts for the detection models for the replicated data set were randomly drawn from a uniform distribution from -1 to 1 in each year 𝑡. I used a uniform distribution from -2.5 to 0 for the nonreplicated data set with low detection probability and a uniform distribution from 0 to 2.5 for the nonreplicated data set with high detection probability. All detection covariate effects for all three data sources were randomly drawn from a uniform distribution from -0.75 to 0.75. Variance parameters controlling variation in the species and year-specific occurrence intercepts were drawn from a uniform distribution ranging from 0.5 to 1.5. All variance parameters controlling the variation in species-specific detection intercepts were drawn from a uniform distribution ranging from 1 to 3, and all variance parameters controlling the variation in species-specific covariate effects 52 were drawn from a uniform distribution ranging from 0.25 to 2, except for the the autologistic parameter, which was drawn from a uniform distribution that ranged from 0.25 to 1. For the simulation study focused on assessing the benefits of community modeling (simulation study 2), my goal was to compare the performance of the ICOM to single species integrated species distribution models (IDMs). Since in reality, species specific effects may not follow the normal distribution assumed by the ICOM framework, I generated intercepts and covariate effects for each species from uniform distributions. More specifically, year specific occurrence intercepts for each species were generated from a uniform distribution between -1.5 and 1.5, detection intercepts for each data set were generated uniformly from -1 to 1, the autologistic parameter was generated uniformly from -0.5 to 2, and all covariate effects on occurrence and detection were generated from a uniform distribution with lower bound -0.5 and upper bound 0.5. I then used either the ICOM framework to estimate the parameters for each species or used an individual IDM for each species. I subsequently compared results to assess the benefits of using the ICOM framework compared to individual IDMs. A.2 Case study data set details The first data set comes from the Hubbard Brook Experimental Forest (HBEF), a 3600 ha northern hardwood forest in the White Mountain National Forest. Observers recorded the numbers of all bird species in 50m radius point count surveys three times annually since 1999 at 373 sites along 15 transects. Transects are separated by 500m and points within a transect separated by either 100 or 200m. Here I use data from 2010-2018 and summarize the counts during each visit as either detections (at least one individual detected) or non-detections (no individuals detected) for the twelve species used in the analysis. The second data set comes from the National Ecological Observatory Network (NEON; Barnett et al. 2019; National Ecological Observatory Network (NEON) 2021) at Bartlett Experimental Forest, a long-term research site in the WMNF, where there are a total of nine sampling grids, each comprised of 7-10 point count sites separated by a minimum of 250m. From 2015-2018, observers 53 recorded the number of all bird species observed during a six minute, 125m radius point count survey once during the breeding season. The six minute survey was split into three two-minute intervals following a removal design (MacKenzie and Royle, 2005) where the observer recorded the interval during which a species was first observed (if any) with a 1, intervals prior to observation with a 0, and then mentally removed the species from subsequent intervals (marked with NA), which enables modeling of data in an occupancy modeling framework. The third data set comes from the North American Breeding Bird Survey (BBS; Pardieck et al. 2020), in which observers count birds along routes (i.e., roads) approximately 39.2km in length. At 50 points along the route (referred to as stops), the observer performs a three minute point count of birds seen or heard within a 0.4km radius. Here I use data from 2010-2018 from four routes that at least partially fell within the WMNF (Figure 1). I use the detection (1) or nondetection (0) of each species at each stop within each route as the detection-nondetection data. The BBS data are nonreplicated and alone cannot be used to explicitly separate occurrence from detection probabilities. I model all three data sources at the same spatial resolution despite differences in the radius of point counts (HBEF = 50m radius, NEON = 125m radius, BBS = 400m radius). Most birds during point count surveys are counted within a 125m radius of the point count center (Ralph, 1993) and so I am confident ignoring these resolution differences is inconsequential, especially given that I am modeling occurrence and not abundance. Further, the separate detection models effectively account for differences in the point count radii that result in differing species-level detection probabilities. Nevertheless, if available data sets are summarized at largely different spatial resolutions, I recommend incorporating a change of support into the occupancy modeling approach (Pacifici et al., 2019). A.3 Model implementation details I specified vague priors for all parameters in the model. I specified uniform priors from 0 to 1 for the community level occurrence intercept in each year 𝑡 and all community-level detection intercepts 54 in each year 𝑡 on the real scale. For example, for the community-level occurrence intercept in year t, I specified logit−1 ( 𝛽¯0,𝑡 ) ∼ Uniform(0, 1). For all covariate effects and the autologistic parameter (in the simulation study and case study) I assigned normal priors with mean 0 and variance 2.72 to the mean community-level effect on the logit scale, which results in a relatively flat prior between 0 and 1 on the true occurrence scale (Broms et al., 2016). I assigned a gamma prior with shape and rate parameters equal to 0.1 for all community level-dispersion parameters that govern the variance of species-specific effects around the common community-level distribution. See NIMBLE code on GitHub (https://github.com/zipkinlab/Doser_etal_2021_InReview) for more details on model implementation. In the foliage-gleaning bird case study, I modeled species and year-specific detection intercepts as random effects across all years and species, as opposed to the model description and simulation study where each intercept for a given year had its own community-level distribution. Thus, for a given data set, the intercept on detection probability for species 𝑖 in year 𝑡 was modeled as 2 𝛼0𝑖,𝑡 ∼ Normal(𝜇𝛼0 , 𝜎𝛼0 ), where 𝜇𝛼0 is the hyper-mean for detection probability (on the logit scale) of all species in the community across all years (at average covariate values) and 𝜎𝛼0 2 is the hyper-variance for detection probability across all species in the community across all years. For the case study, I performed a post-hoc simple linear regression analysis using the average occurrence probability of each species during each year as a response variable and year as a covariate. I wrote a Gibbs sampler in R (R Core Team, 2020) where at each iteration of the sampler I used a different individual sample from the posterior distributions of the average species-specific occurrence probabilities in each year, which enables full uncertainty propagation for the estimated trend parameter. I used vague normal priors with mean 0 and variance 1000 for the intercept and slope (trend) parameter and a vague inverse gamma prior with shape and scale parameter equal to 0.001 for the residual variance. 55 A.4 Bayesian P-value Calculation I performed a posterior predictive check for each data set to assess model fit. Because standard goodness of fit approaches are not valid for binary data (McCullagh, 2019), goodness of fit measures for occupancy models must be obtained by binning the data in some manner (Broms et al., 2016). Because I was interested primarily in assessing occurrence trends over time, I binned the data across all sites within a given year for each species. More specifically, for the HBEF data set, I predicted the observed detection/non-detection of each species 𝑖 at site 𝑗 during year 𝑡 and visit 𝑘, denoted 𝑦ˆ 𝑖, 𝑗,𝑘,𝑡 . I then summed across both the observed detection/non-detection data and replicated detection/non-detection data across all sites and visits, yielding, respectively, 𝐽ÕHBEF Õ 𝐾𝑗 𝑦 SUM,𝑖,𝑡 = 𝑦𝑖, 𝑗,𝑘,𝑡 𝑗=1 𝑘=1 𝐽ÕHBEF Õ 𝐾𝑗 𝑦ˆ SUM,𝑖,𝑡 = 𝑦ˆ 𝑖, 𝑗,𝑘,𝑡 . 𝑗=1 𝑘=1 I used these binned values in a posterior predictive check with a Chi-square discrepancy test statistic, which I summarized using a Bayesian p-value. More specifically, I compute (𝑦 SUM,𝑖,𝑡 − 𝐸 [𝑦 SUM,𝑖,𝑡 ]) 2 Õ 𝐼 Õ 𝑇 𝜒𝑦 = 𝑖=1 𝑡=1 𝐸 [𝑦 SUM,𝑖,𝑡 ] + 𝜖 ( 𝑦ˆ SUM,𝑖,𝑡 − 𝐸 [ 𝑦ˆ SUM,𝑖,𝑡 ]) 2 Õ 𝐼 Õ 𝑇 𝜒𝑦ˆ = , 𝑖=1 𝑡=1 𝐸 [ 𝑦ˆ SUM,𝑖,𝑡 ] + 𝜖 Í𝐽REP Í𝐾 𝑗 where 𝐸 [ 𝑦ˆ SUM,𝑖,𝑡 ] = 𝐸 [𝑦 SUM,𝑖,𝑡 ] = 𝑗=1 𝑘=1 𝑝𝑖, 𝑗,𝑘,𝑡 · 𝑧𝑖, 𝑗,𝑡 and 𝜖 = 0.0001 is an added constant to avoid problems with small expected values (Kéry and Royle, 2021). I then computed the Bayesian p-value as Õ 𝑀 BP𝑦 = 𝐼 ( 𝜒𝑦ˆ > 𝜒𝑦 ), 𝑚=1 56 where 𝐼 (·) is the indicator function. I compute a separate Bayesian p-value for the NEON and BBS data sets following the same procedure. A.5 Additional Simulation Study Results A 0.2 B 0.1 0.1 Bias in covariate effect Bias in intercept 0.0 0.0 −0.1 −0.1 −0.2 −0.3 EP L H L H H H EP L H L H H H R EP EP EP EP EP EP R EP EP EP EP EP EP R R R R R R R R R R R R N N N N N N N N N N N N + + + + + + + + EP EP L L EP EP L L R R EP EP R R EP EP R R R R N N N N + + EP EP R R Data Sets Used Data Sets Used Figure A.1: Sampling distribution of estimated bias in simulated community level occurrence intercepts (A) and covariate effects (B) under models using different combinations of a replicated (REP) data set and a nonreplicated data set with low detection probability (NREP L) and with high detection probability (NREP H). Points represent the median bias (posterior mean - true simulated value) in a community-level effect across 100 simulations for a community of 25 species. Lines represent the 95% quantiles. 57 2 Occupancy Intercept 1 0 Model −1 IDM ICOM −2 True 1 5 10 15 20 25 Species Figure A.2: Sampling distributions of estimated occurrence intercepts for a single year for a simulated community of 25 species across 100 simulations when using the integrated community occupancy model (ICOM) or a single species integrated distribution model (IDM). Points represent the median posterior mean across all 100 simulations, and lines represent the 2.5% and 95% quantiles of the posterior means across the 100 simulations. Black points are the true simulated values. A.6 Additional Case Study Results Table A.1: Bayesian p-value goodness of fit assessment for the foliage-gleaning bird case study using the Chi-square fit statistic. Bayesian p-values between 0.1 and 0.9 indicate adequate model fit to each data source (Hobbs and Hooten, 2015). Model 𝑦𝑖, 𝑗,𝑘,𝑡 𝑣 1,𝑖, 𝑗,𝑡 𝑣 2,𝑖, 𝑗,𝑡 HBEF 0.508 - - NEON - 0.503 - BBS - - 0.559 HBEF + NEON 0.491 0.490 - HBEF + BBS 0.474 - 0.521 NEON + BBS - 0.462 0.509 HBEF + NEON + BBS 0.478 0.494 0.524 58 Table A.2: Number of unique observations at each site and year combination of the twelve foliage-gleaning bird species in the three data sets used in the case study in White Mountain National Forest from 2010-2018. Numbers in parentheses indicate total number of observations including pseudoreplicates (i.e., repeated detection of a bird species at a site in the same year). Species HBEF NEON BBS Total American Redstart 93 (111) 45 270 408 (426) Black-and-white Warbler 264 (294) 30 41 335 (365) Blue-headed Vireo 684 (779) 54 190 928 (1023) Blackburnian Warbler 1974 (3348) 120 142 2236 (3610) Blackpoll Warbler 297 (469) 0 1 298 (470) Black-throated Blue Warbler 2226 (4030) 116 153 2494 (4299) Black-throated Green Warbler 2272 (3898) 195 138 2605 (4231) Canada Warbler 213 (269) 2 11 226 (282) Magnolia Warbler 565 (830) 12 113 690 (955) Nashville Warbler 72 (80) 0 57 129 (137) Ovenbird 1956 (3441) 245 545 2746 (4231) Red-eyed Vireo 1880 (3272) 246 1012 3138 (4530) Total 12496 (20821) 1065 2673 16233 (24559) 59 Black−and−white Warbler Canada Warbler Blue−headed Vireo Magnolia Warbler Ovenbird Trend Species American Redstart 0.04 0.03 0.02 Nashville Warbler 0.01 0.00 −0.01 Red−eyed Vireo Blackburnian Warbler Black−throated Blue Warbler Blackpoll Warbler Black−throated Green Warbler 0.00 0.25 0.50 0.75 1.00 Probability of Increasing Trend Figure A.3: Probability of positive trends in occurrence probabilities of twelve foliage-gleaning bird species in the White Mountain National Forest from 2010-2018. Color indicates the posterior mean trend estimate from a post-hoc simple linear regression. A.7 Model validation details I used two-fold cross validation to assess the out-of-sample predictive performance of the full ICOM compared to six models using subsets of the three data sets for the case study. I split each data set into two parts, such that each half of the data set consisted of the data from half of the spatial locations for all years those locations were sampled. I fit each model using each half of the data set and subsequently predicted the true occurrence (𝑧𝑖, 𝑗,𝑡 ) at the locations not used to fit the model for all years those locations were sampled. Assessing out-of-sample predictive performance with occupancy models presents additional complexities since the ecological state of interest is not directly observed (Zipkin et al., 2012). As such, I must generate the latent occurrence values using a separate model. For a given data set, I compared the yearly occurrence predictions at the hold out locations to the yearly occurrence values generated from models that were fit using the data at the hold out locations. To account for model uncertainty, I compared the yearly occurrence predictions 60 American Redstart Black−and−white Warbler Black−throated Blue Warbler Black−throated Green Warbler 1.00 0.75 0.50 0.25 0.00 Occurrence Probability Blackburnian Warbler Blackpoll Warbler Blue−headed Vireo Canada Warbler 1.00 0.75 0.50 0.25 0.00 Magnolia Warbler Nashville Warbler Ovenbird Red−eyed Vireo 1.00 0.75 0.50 0.25 0.00 250 500 750 250 500 750 250 500 750 250 500 750 Elevation (m) Figure A.4: Estimated occurrence probabilities of twelve foliage-gleaning bird species across an elevation gradient. Black lines are posterior mean occurrence probabilities with 95% credible intervals denoted by the gray region. Occurrence probabilities correspond to the probability of a species occupying a site in 2014 that was not occupied in the previous year. Table A.3: Posterior means and standard deviations of species-specific occurrence covariate effects from the integrated community occupancy model (Mean (SD)), as well as the overall community level effects (COMM). Boldface indicates 95% credible interval does not overlap zero. Species Elevation Elevation2 Percent Forest Autologistic AMRE -0.23 (0.089) -0.30 (0.072) -0.18 (0.074) 2.6 (0.70) BAWW 0.26 (0.11) -0.28 (0.084) 0.087 (0.13) 4.0 (0.84) BHVI -0.030 (0.099) -0.055 (0.080) 0.16 (0.12) 5.3 (0.99) BLBW 0.17 (0.068) -0.57 (0.055) 0.087 (0.090) 2.2 (0.18) BLPW 0.75 (0.20) 1.31 (0.14) 0.60 (0.28) 1.8 (0.31) BTBW 0.40 (0.084) -0.82 (0.075) -0.33 (0.20) 3.1 (0.26) BTNW 0.28 (0.082) -0.16 (0.069) 0.045 (0.13) 1.9 (0.22) CAWA 1.01 (0.20) -0.34 (0.13) 0.093 (0.19) 5.6 (0.47) MAWA 0.51 (0.086) 0.45 (0.071) -0.10 (0.090) 5.2 (0.32) NAWA 0.18 (0.12) 0.31 (0.095) -0.099 (0.11) 4.3 (0.76) OVEN -0.30 (0.064) -0.54 (0.054) 0.21 (0.071) 3.5 (0.25) REVI -0.42 (0.054) -0.45 (0.042) 0.22 (0.055) 2.7 (0.17) COMM 0.22 (0.15) -0.12 (0.19) 0.065 (0.11) 3.2 (0.48) 61 individually to latent occurrence values generated from the subset of the seven models that used the data set in the model fitting process. I used the log predictive density (Vehtari et al., 2017) to compare predictive performance of the models, where larger values indicate better model fit. Thus, for each species 𝑖 at each hold out location 𝑗 during each year 𝑡 that site was sampled, I obtained a measure of the log predictive density, which were then summed across all hold-out sites and years. For example, to assess the ability of the full ICOM to predict occurrence at HBEF, I compared out of sample predictions at HBEF generated from the full ICOM to latent occurrence values for each individual species generated from the following models that were fit using the out-of-sample data: (1) HBEF, (2) HBEF + NEON, (3) HBEF + BBS, (4) HBEF + NEON + BBS. This generated four log predictive density measures (after summing across sites and years), that I then averaged to account for model uncertainty, yielding a single measure of predictive performance at the locations of a specific data set. I followed the same process using the other half of the data and averaged these two values to obtain a single value of predictive performance for a model of a given data set. I obtained log predictive density measures individually for each species and summed these values Table A.4: Precision of community-level occurrence parameters from models using different amounts of the three data sets. Precision is measured as the width of the 95% credible interval. Boldface indicates the model with the highest precision for a given parameter. Parameter HBEF NEON BBS HBEF + HBEF + NEON + HBEF + NEON BBS BBS NEON + BBS 𝛽¯0,1 2.97 - 2.59 3.01 2.77 2.53 2.80 𝛽¯0,2 2.62 - 2.44 2.67 2.81 2.41 2.65 𝛽¯0,3 2.82 - 2.36 2.92 2.98 1.94 2.92 𝛽¯0,4 2.47 - 2.61 2.60 2.43 2.35 2.44 𝛽¯0,5 2.76 - 2.35 2.83 2.81 1.54 2.68 𝛽¯0,6 2.64 2.21 2.29 2.43 2.72 1.89 2.47 𝛽¯0,7 3.19 2.86 1.71 3.14 2.79 2.06 2.76 𝛽¯0,8 3.05 2.56 2.37 2.92 3.07 1.88 2.85 𝛽¯0,9 3.67 2.26 3.12 3.51 3.38 2.21 3.21 𝛽¯1 1.24 0.53 0.44 1.06 0.67 0.37 0.59 𝛽¯2 0.65 0.39 0.34 0.70 0.81 0.29 0.75 𝛽¯3 0.33 0.46 0.39 0.30 0.48 0.32 0.42 𝜙¯ 2.07 0.92 4.20 2.05 2.01 1.96 1.92 62 Table A.5: Precision of species-level occurrence parameters from models using different amounts of the three data sets. Species-specific values are averaged across the twelve species, while species and year-specific intercepts are averaged across the twelve species and nine years. Precision is measured as the width of the 95% credible interval. Boldface indicates the model with the highest precision for a given parameter. Model Intercept Elevation Elevation2 Percent Autologistic Forest HBEF 0.25 0.49 0.37 0.35 2.12 NEON 0.21 0.75 0.56 0.66 1.51 BBS 0.31 0.68 0.42 0.52 5.12 HBEF + NEON 0.22 0.45 0.36 0.33 1.95 HBEF + BBS 0.25 0.45 0.36 0.57 2.06 NEON + BBS 0.38 0.54 0.35 0.47 2.54 HBEF + NEON + BBS 0.21 0.41 0.32 0.50 1.80 to obtain a single metric for the entire community. The model with the best predictive performance across the community throughout the White Mountains across all study years will have the highest Table A.6: Two-fold cross validation results comparing predictive performance of single-species integrated distribution models (IDM) to the integrated community occupancy model (ICOM). Values are the log predictive density, where higher values indicate better predictive performance. Results are not shown for Blackpoll Warbler due to insufficient data required to fit the single-species model. Species HBEF BBS NEON ICOM IDM ICOM IDM ICOM IDM American Redstart -683 -846 -592 -562 -75 -88 Black-and-white Warbler -885 -1014 -492 -729 -82 -139 Blue-headed Vireo -993 -919 -573 -668 -107 -154 Blackburnian Warbler -925 -968 -514 -743 -118 -173 Black-throated Blue Warbler -714 -835 -418 -647 -117 -140 Black-throated Green Warbler -488 -600 -496 -127 -127 -201 Canda Warbler -620 -646 -100 -575 -10.5 -25.8 Magnolia Warbler -1153 -1083 -436 -574 -41.6 -74 Nashville Warbler -411 -1161 -218 -434 - - Ovenbird -1052 -1029 -293 -365 -78 -88 Red-eyed Vireo -1030 -988 -364 -397 -88 -101 Community -8953 –10091 -4496 -5942 -845 -1185 63 total log predictive density measure summed across the three values for the three data sets. Finally, I used two-fold cross validation to assess the performance of the ICOM compared to individual species IDMs. I fit each model to half of the data points (which were constructed in the same manner as described in the preceding paragraph), predicted the yearly occurrence values at the other half of the data points, and compared the yearly predicted values to the latent occurrence values generated from the model fit using the out of sample data points. I used the log predictive density as a measure of predictive performance, computing a single value for each species at each data set location (summed across all available years), which subsequently allowed me to compare predictive performance across species, data sets, and the entire community. 64 BIBLIOGRAPHY 65 BIBLIOGRAPHY Abadi, F., Gimenez, O., Arlettaz, R., and Schaub, M. (2010a). An assessment of integrated population models: bias, accuracy, and violation of the assumption of independence. Ecology, 91(1):7–14. Abadi, F., Gimenez, O., Ullrich, B., Arlettaz, R., and Schaub, M. (2010b). Estimation of immigra- tion rate using integrated population models. Journal of Applied Ecology, 47(2):393–400. Aide, T. M., Hernández-Serna, A., Campos-Cerqueira, M., Acevedo-Charry, O., and Deichmann, J. L. (2017). Species richness (of insects) drives the use of acoustic space in the tropics. Remote Sensing, 9(11):1096. Barnett, D. T., Duffy, P. A., Schimel, D. S., Krauss, R. E., Irvine, K. M., Davis, F. W., Gross, J. E., Azuaje, E. I., Thorpe, A. S., Gudex-Cross, D., et al. (2019). The terrestrial organism and biogeochemistry spatial sampling design for the national ecological observatory network. Ecosphere, 10(2):e02540. Barraquand, F. and Gimenez, O. (2019). Integrating multiple data sources to fit matrix population models for interacting species. Ecological Modelling, 411:108713. Berliner, L. M. (1996). Hierarchical bayesian time series models. In Maximum Entropy and Bayesian Methods, pages 15–22. Springer. Bibby, C. J., Burgess, N. D., Hillis, D. M., Hill, D. A., and Mustoe, S. (2000). Bird Census Techniques. Elsevier. Bradfer-Lawrence, T., Bunnefeld, N., Gardner, N., Willis, S. G., and Dent, D. H. (2020). Rapid as- sessment of avian species richness and abundance using acoustic indices. Ecological Indicators, 115:106400. Bradfer-Lawrence, T., Gardner, N., Bunnefeld, L., Bunnefeld, N., Willis, S. G., and Dent, D. H. (2019). Guidelines for the use of acoustic indices in environmental research. Methods in Ecology and Evolution, 10(10):1796–1807. Breiner, F. T., Guisan, A., Bergamini, A., and Nobis, M. P. (2015). Overcoming limitations of modelling rare species by using ensembles of small models. Methods in Ecology and Evolution, 6(10):1210–1218. Broms, K. M., Hooten, M. B., and Fitzpatrick, R. M. (2015). Accounting for imperfect detection in hill numbers for biodiversity studies. Methods in Ecology and Evolution, 6(1):99–108. Broms, K. M., Hooten, M. B., and Fitzpatrick, R. M. (2016). Model selection and assessment for 66 multi-species occupancy models. Ecology, 97(7):1759–1770. Brooks, S. P. and Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7(4):434–455. Buckland, S. T., Anderson, D. R., Burnham, K. P., and Laake, J. L. (2005). Distance sampling. Encyclopedia of Biostatistics, 2. Burivalova, Z., Orndorff, S., Truskinger, A., Roe, P., Game, E. T., et al. (2021). The sound of logging: Tropical forest soundscape before, during, and after selective timber extraction. Biological Conservation, 254:108812. Burnham, K. P., Anderson, D. R., and Laake, J. L. (1980). Estimation of density from line transect sampling of biological populations. Wildlife Monographs, (72):3–202. Buxton, R. T., McKenna, M. F., Clapp, M., Meyer, E., Stabenau, E., Angeloni, L. M., Crooks, K., and Wittemyer, G. (2018). Efficacy of extracting indices from large-scale acoustic recordings to monitor biodiversity. Conservation Biology, 32(5):1174–1184. Buxton, R. T., Pearson, A. L., Allou, C., Fristrup, K., and Wittemyer, G. (2021). A synthesis of health benefits of natural sounds and their distribution in national parks. Proceedings of the National Academy of Sciences, 118(14). Ceballos, G., Ehrlich, P. R., Barnosky, A. D., García, A., Pringle, R. M., and Palmer, T. M. (2015). Accelerated modern human–induced species losses: Entering the sixth mass extinction. Science Advances, 1(5):e1400253. Ceballos, G., García, A., and Ehrlich, P. R. (2010). The sixth extinction crisis: Loss of animal populations and species. Journal of Cosmology, 8(1821):31. Chambert, T., Waddle, J. H., Miller, D. A., Walls, S. C., and Nichols, J. D. (2018). A new framework for analysing automated acoustic species detection data: Occupancy estimation and optimization of recordings post-processing. Methods in Ecology and Evolution, 9(3):560–570. Chandler, R. B., King, D. I., Raudales, R., Trubey, R., Chandler, C., and Arce Chávez, V. J. (2013). A small-scale land-sparing approach to conserving biological diversity in tropical agricultural landscapes. Conservation Biology, 27(4):785–795. Chen, I.-C., Hill, J. K., Ohlemüller, R., Roy, D. B., and Thomas, C. D. (2011). Rapid range shifts of species associated with high levels of climate warming. Science, 333(6045):1024–1026. Clark, J. S. (2007). Models for Ecological Data. Princeton University Press. Clark, J. S., Nemergut, D., Seyednasrollah, B., Turner, P. J., and Zhang, S. (2017). Generalized joint attribute modeling for biodiversity analysis: Median-zero, multivariate, multifarious data. 67 Ecological Monographs, 87(1):34–56. Clark, T. J. (2021). Large carnivore recolonization reshapes population and community dynamics in the Rocky Mountains: Implications for harvest management. PhD thesis, University of Montana. Conn, P. B., Thorson, J. T., and Johnson, D. S. (2017). Confronting preferential sampling when analysing population distributions: diagnosis and model-based triage. Methods in Ecology and Evolution, 8(11):1535–1546. Darras, K., Batáry, P., Furnas, B., Celis-Murillo, A., Van Wilgenburg, S. L., Mulyani, Y. A., and Tscharntke, T. (2018). Comparing the sampling performance of sound recorders versus point counts in bird surveys: A meta-analysis. Journal of Applied Ecology, 55(6):2575–2586. Darras, K., Batáry, P., Furnas, B. J., Grass, I., Mulyani, Y. A., and Tscharntke, T. (2019). Au- tonomous sound recording outperforms human observation for sampling birds: a systematic map and user guide. Ecological Applications, 29(6):e01954. Datta, A., Banerjee, S., Finley, A. O., and Gelfand, A. E. (2016). Hierarchical nearest-neighbor gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, 111(514):800–812. de Valpine, P., Paciorek, C., Turek, D., Michaud, N., Anderson-Bergman, C., Obermeyer, F., Wehrhahn Cortes, C., Rodrìguez, A., Temple Lang, D., and Paganin, S. (2021). NIMBLE: MCMC, Particle Filtering, and Programmable Hierarchical Modeling. R package version 0.11.1. de Valpine, P., Turek, D., Paciorek, C., Anderson-Bergman, C., Temple Lang, D., and Bodik, R. (2017). Programming with models: writing statistical algorithms for general model structures with NIMBLE. Journal of Computational and Graphical Statistics, 26:403–413. Deichmann, J. L., Hernández-Serna, A., C., J. A. D., Campos-Cerqueira, M., and Aide, T. M. (2017). Soundscape analysis and acoustic monitoring document impacts of natural gas exploration on biodiversity in a tropical forest. Ecological Indicators, 74:39–48. Devarajan, K., Morelli, T. L., and Tenan, S. (2020). Multi-species occupancy models: review, roadmap, and recommendations. Ecography, 43(11):1612–1624. Dirzo, R. and Raven, P. H. (2003). Global state of biodiversity and loss. Annual Review of Environment and Resources, 28(1):137–167. Dorazio, R. M. (2014). Accounting for imperfect detection and survey bias in statistical analysis of presence-only data. Global Ecology and Biogeography, 23(12):1472–1484. Dorazio, R. M., Kéry, M., Royle, J. A., and Plattner, M. (2010). Models for inference in dynamic metacommunity systems. Ecology, 91(8):2466–2475. 68 Dorazio, R. M. and Royle, J. A. (2005). Estimating size and composition of biological commu- nities by modeling the occurrence of species. Journal of the American Statistical Association, 100(470):389–398. Doser, J. W., Finley, A. O., Kasten, E. P., and Gage, S. H. (2020a). Assessing soundscape disturbance through hierarchical models and acoustic indices: A case study on a shelterwood logged northern michigan forest. Ecological Indicators, 113:106244. Doser, J. W., Finley, A. O., Kéry, M., and Zipkin, E. F. (2021a). spOccupancy: An R pack- age for single species, multispecies, and integrated spatial occupancy models. arXiv preprint arxiv:2111.12163. Doser, J. W., Finley, A. O., Weed, A. S., and Zipkin, E. F. (2021b). Integrating automated acoustic vocalization data and point count surveys for estimation of bird abundance. Methods in Ecology and Evolution, 12(6):1040–1049. Doser, J. W., Hannam, K. M., and Finley, A. O. (2020b). Characterizing functional relationships between anthropogenic and biological sounds: a western New York state soundscape case study. Landscape Ecology, 35(3):689–707. Doser, J. W., Leuenberger, W., Sillett, T. S., Hallworth, M. T., and Zipkin, E. F. (2021c). Integrated community occupancy models: A framework to assess occurrence and biodiversity dynamics using multiple data sources. arXiv preprint arXiv:2109.01894. Doser, J. W., Weed, A. S., Zipkin, E. F., Miller, K. M., and Finley, A. O. (2021d). Trends in bird abundance differ among protected forests but not bird guilds. Ecological Applications, 31(6):e2377. Doyon, F., Gagnon, D., and Giroux, J.-F. (2005). Effects of strip and single-tree selection cutting on birds and their habitat in a southwestern Quebec northern hardwood forest. Forest Ecology and Management, 209:101–115. Ducrettet, M., Forget, P.-M., Ulloa, J. S., Yguel, B., Gaucher, P., Prince, K., Haupert, S., and Sueur, J. (2020). Monitoring canopy bird activity in disturbed landscapes with automatic recorders: A case study in the tropics. Biological Conservation, 245:108574. Dumyahn, S. L. and Pijanowski, B. C. (2011). Soundscape conservation. Landscape Ecology, 26(9):1327–1344. Efford, M. G., Dawson, D. K., and Borchers, D. L. (2009). Population density estimated from locations of individuals on a passive detector array. Ecology, 90(10):2676–2682. Fairbrass, A. J., Rennert, P., Williams, C., Titheridge, H., and Jones, K. E. (2017). Biases of acoustic indices measuring biodiversity in urban areas. Ecological Indicators, 83:169–177. 69 Farina, A. (2019). Ecoacoustics: a quantitative approach to investigate the ecological role of environmental sounds. Mathematics, 7(1):21. Farnsworth, G. L., Pollock, K. H., Nichols, J. D., Simons, T. R., Hines, J. E., and Sauer, J. R. (2002). A Removal Model for Estimating Detection Probabilities from Point-Count Surveys. The Auk, 119(2):414–425. Farr, M. T., Green, D. S., Holekamp, K. E., and Zipkin, E. F. (2021). Integrating distance sampling and presence-only data to estimate species abundance. Ecology, 102(1):e03204. Fidino, M., Gallo, T., Lehrer, E. W., Murray, M. H., et al. (2020). Landscape-scale differences among cities alter common species’ responses to urbanization. Ecological Applications, 31(2):e2253. Finley, A. O. (2011). Comparing spatially-varying coefficients models for analysis of ecological data with non-stationary and anisotropic residual dependence. Methods in Ecology and Evolution, 2(2):143–154. Finley, A. O. and Banerjee, S. (2020). Bayesian spatially varying coefficient models in the spBayes R package. Environmental Modelling & Software, 125:104608. Finley, A. O., Banerjee, S., and Gelfand, A. E. (2015). spBayes for Large Univariate and Multivariate Point-Referenced Spatio-Temporal Data Models. Journal of Statistical Software, 63(13):1–28. Fletcher, R. J., Hefley, T. J., Robertson, E. P., Zuckerberg, B., McCleery, R. A., and Dorazio, R. M. (2019). A practical guide for combining data to model species distributions. Ecology, 100(6):e02710. Franklin, J. F. and Forman, R. T. (1987). Creating landscape patterns by forest cutting: Ecological consequences and principles. Landscape Ecology, 1(1):5–18. Frishkoff, L. O., Karp, D. S., Flanders, J. R., Zook, J., Hadly, E. A., Daily, G. C., and M’Gonigle, L. K. (2016). Climate change and habitat conversion favour the same species. Ecology Letters, 19(9):1081–1090. Frishkoff, L. O., Mahler, D. L., and Fortin, M.-J. (2019). Integrating over uncertainty in spatial scale of response within multispecies occupancy models yields more accurate assessments of community composition. Ecography, 42(12):2132–2143. Furnas, B. J. (2020). Rapid and varied responses of songbirds to climate change in california coniferous forests. Biological Conservation, 241:108347. Furnas, B. J. and Callas, R. L. (2015). Using automated recorders and occupancy models to monitor common forest birds across a large geographic region. The Journal of Wildlife Management, 79(2):325–337. 70 Gallo, T., Fidino, M., Lehrer, E. W., and Magle, S. B. (2017). Mammal diversity and metacommu- nity dynamics in urban green spaces: implications for urban wildlife conservation. Ecological Applications, 27(8):2330–2341. Garrard, G. E., McCarthy, M. A., Williams, N. S., Bekessy, S. A., and Wintle, B. A. (2013). A general model of detectability using species traits. Methods in Ecology and Evolution, 4(1):45– 52. Gelfand, A. E., Schmidt, A. M., Banerjee, S., and Sirmans, C. F. (2004). Nonstationary multivariate process modeling through spatially varying coregionalization. Test, 13(2):263–312. Gelfand, A. E., Schmidt, A. M., Wu, S., Silander Jr, J. A., Latimer, A., and Rebelo, A. G. (2005). Modelling species diversity through species level hierarchical modelling. Journal of the Royal Statistical Society: Series C (Applied Statistics), 54(1):1–20. Gesch, D., Oimoen, M., Greenlee, S., Nelson, C., Steuck, M., and Tyler, D. (2002). The national elevation dataset. Photogrammetric Engineering and Remote Sensing, 68(1):5–32. Gibb, R., Browning, E., Glover-Kapfer, P., and Jones, K. E. (2019). Emerging opportunities and challenges for passive acoustics in ecological assessment and monitoring. Methods in Ecology and Evolution, 10(2):169–185. Guélat, J. and Kéry, M. (2018). Effects of spatial autocorrelation and imperfect detection on species distribution models. Methods in Ecology and Evolution, 9(6):1614–1625. Guillera-Arroita, G., Kéry, M., and Lahoz-Monfort, J. J. (2019). Inferring species richness using multispecies occupancy modeling: Estimation performance and interpretation. Ecology and Evolution, 9(2):780–792. Hällfors, M. H., Antão, L. H., Itter, M., Lehikoinen, A., Lindholm, T., Roslin, T., and Saastamoinen, M. (2020). Shifts in timing and duration of breeding for 73 boreal bird species over four decades. Proceedings of the National Academy of Sciences, 117(31):18557–18565. Hallworth, M. T., Bayne, E., McKinnon, E., Love, O., Tremblay, J. A., Drolet, B., Ibarzabal, J., Van Wilgenburg, S., and Marra, P. P. (2021). Habitat loss on the breeding grounds is a major contributor to population declines in a long-distance migratory songbird. Proceedings of the Royal Society B, 288(1949):20203164. Heaton, M. J., Datta, A., Finley, A. O., Furrer, R., Guinness, J., Guhaniyogi, R., Gerber, F., Gramacy, R. B., Hammerling, D., Katzfuss, M., et al. (2019). A case study competition among methods for analyzing large spatial data. Journal of Agricultural, Biological and Environmental Statistics, 24(3):398–425. Herrera-Montes, M. I. and Aide, T. M. (2011). Impacts of traffic noise on anuran and bird communities. Urban Ecosystems, 14(3):415–427. 71 Hobbs, N. T. and Hooten, M. B. (2015). Bayesian Models: A Statistical Primer for Ecologists. Princeton University Press. Homer, C., Dewitz, J., Yang, L., Jin, S., Danielson, P., Xian, G., Coulston, J., Herold, N., Wickham, J., and Megown, K. (2015). Completion of the 2011 national land cover database for the conterminous united states–representing a decade of land cover change information. Photogrammetric Engineering & Remote Sensing, 81(5):345–354. Hooten, M. B. and Hefley, T. J. (2019). Bringing Bayesian Models to Life. CRC Press. Inger, R., Gregory, R., Duffy, J. P., Stott, I., Voříšek, P., and Gaston, K. J. (2015). Common European birds are declining rapidly while less abundant species’ numbers are rising. Ecology Letters, 18(1):28–36. Isaac, N. J., Jarzyna, M. A., Keil, P., Dambly, L. I., Boersch-Supan, P. H., Browning, E., Freeman, S. N., Golding, N., Guillera-Arroita, G., Henrys, P. A., et al. (2020). Data integration for large-scale models of species distributions. Trends in Ecology & Evolution, 35(1):56–67. Kéry, M., Gardner, B., and Monnerat, C. (2010a). Predicting species distributions from checklist data using site-occupancy models. Journal of Biogeography, 37(10):1851–1862. Kéry, M. and Royle, J. A. (2016). Applied Hierarchical Modeling in Ecology: Analysis of distri- bution, abundance and species richness in R and BUGS: Volume 1: Prelude and Static Models. Academic Press. Kéry, M. and Royle, J. A. (2021). Applied Hierarchical Modeling in Ecology: Analysis of distri- bution, abundance, and species richness in R and BUGS: Volume 2: Dynamic and advanced models. Academic Press. Kéry, M., Royle, J. A., and Schmid, H. (2005). Modeling avian abundance from replicated counts using binomial mixture models. Ecological Applications, 15(4):1450–1461. Kéry, M., Royle, J. A., Schmid, H., Schaub, M., Volet, B., Haefliger, G., and Zbinden, N. (2010b). Site-occupancy distribution modeling to correct population-trend estimates derived from oppor- tunistic observations. Conservation Biology, 24(5):1388–1397. Kéry, M. and Schmidt, B. (2008). Imperfect detection and its consequences for monitoring for conservation. Community Ecology, 9(2):207–216. Kindsvater, H. K., Dulvy, N. K., Horswill, C., Juan-Jordá, M.-J., Mangel, M., and Matthiopoulos, J. (2018). Overcoming the data crisis in biodiversity conservation. Trends in Ecology & Evolution, 33(9):676–688. Lahoz-Monfort, J. J., Harris, M. P., Wanless, S., Freeman, S. N., and Morgan, B. J. (2017). Bringing it all together: multi-species integrated population modelling of a breeding community. Journal 72 of Agricultural, Biological and Environmental Statistics, 22(2):140–160. Lauret, V., Labach, H., Authier, M., and Gimenez, O. (2021). Using single visits into integrated occupancy models to make the most of existing monitoring programs. Ecology, page e03535. Law, B. S., Brassil, T., Gonsalves, L., Roe, P., Truskinger, A., and McConville, A. (2018). Passive acoustics and sound recognition provide new insights on status and resilience of an iconic endangered marsupial (koala phascolarctos cinereus) to timber harvesting. PloS One, 13(10):e0205075. Lele, S. R., Moreno, M., and Bayne, E. (2012). Dealing with detection error in site occupancy surveys: what can we do with a single survey? Journal of Plant Ecology, 5(1):22–31. MacKenzie, D. I., Nichols, J. D., Hines, J. E., Knutson, M. G., and Franklin, A. B. (2003). Esti- mating site occupancy, colonization, and local extinction when a species is detected imperfectly. Ecology, 84(8):2200–2207. MacKenzie, D. I., Nichols, J. D., Lachman, G. B., Droege, S., Royle, A. J., and Langtimm, C. A. (2002). Estimating site occupancy rates when detection probabilities are less than one. Ecology, 83(8):2248–2255. MacKenzie, D. I. and Royle, J. A. (2005). Designing occupancy studies: general advice and allocating survey effort. Journal of Applied Ecology, 42(6):1105–1114. Magurran, A. E. (2013). Measuring Biological Diversity. John Wiley & Sons. Mata, L., Threlfall, C. G., Williams, N. S., Hahs, A. K., Malipatil, M., Stork, N. E., and Livesley, S. J. (2017). Conserving herbivorous and predatory insects in urban green spaces. Scientific Reports, 7(1):1–12. McCullagh, P. (2019). Generalized Linear Models. Routledge. Mena, J. L., Rivero, J., Bonifaz, E., Pastor, P., Pacheco, J., and Aide, T. M. (2021). The effect of artificial light on bat richness and nocturnal soundscapes along an urbanization gradient in an arid landscape of central peru. Urban Ecosystems, pages 1–12. Miller, D. A., Pacifici, K., Sanderlin, J. S., and Reich, B. J. (2019). The recent past and promising future for data integration methods to estimate species’ distributions. Methods in Ecology and Evolution, 10(1):22–37. Morrison, C., Aunin, š, A., Benkő, Z., Brotons, L., Chodkiewicz, T., Chylarecki, P., Escandell, V., Eskildsen, D., Gamero, A., Herrando, S., et al. (2021). Bird population declines and species turnover are changing the acoustic properties of spring soundscapes. Nature Communications, 12(1):1–12. 73 National Ecological Observatory Network (NEON) (2021). Breeding landbird point counts (dp1.10003.001). Nichols, J. D., Hines, J. E., Sauer, J. R., Fallon, F. W., Fallon, J. E., and Heglund, P. J. (2000). A double-observer approach for estimating detection probability and abundance from point counts. The Auk, 117(2):393–408. Nichols, J. D., Thomas, L., and Conn, P. B. (2009). Inferences about landbird abundance from count data: Recent advances and future directions. In Environmental and Ecological Statistics, volume 3. Springer. Oberosler, V., Tenan, S., Zipkin, E., and Rovero, F. (2020). Poor management in protected areas is associated with lowered tropical mammal diversity. Animal Conservation, 23(2):171–181. Pacifici, K., Reich, B. J., Miller, D. A., and Pease, B. S. (2019). Resolving misaligned spatial data with integrated species distribution models. Ecology, 100(6):e02709. Pardieck, K., Ziolkowski Jr, D., Lutmerding, M., Aponte, V., and Hudson, M.-A. (2020). North american breeding bird survey dataset 1966–2019. U.S. Geological Survey data release, https://doi.org/10.5066/P9J6QUF6. Pérez-Granados, C. and Traba, J. (2021). Estimating bird density using passive acoustic monitoring: a review of methods and suggestions for further research. Ibis. Péron, G. and Koons, D. N. (2012). Integrated modeling of communities: parasitism, competition, and demographic synchrony in sympatric ducks. Ecology, 93(11):2456–2464. Pijanowski, B. C., Farina, A., Gage, S. H., Dumyahn, S. L., and Krause, B. L. (2011a). What is soundscape ecology? An introduction and overview of an emerging new science. Landscape ecology, 26(9):1213–1232. Pijanowski, B. C., Villanueva-Rivera, L. J., Dumyahn, S. L., Farina, A., Krause, B. L., Napoletano, B. M., Gage, S. H., and Pieretti, N. (2011b). Soundscape ecology: the science of sound in the landscape. BioScience, 61(3):203–216. Plummer, M., Best, N., Cowles, K., and Vines, K. (2006). CODA: Convergence Diagnosis and Output Analysis for MCMC. R News, 6(1):7–11. Polson, N. G., Scott, J. G., and Windle, J. (2013). Bayesian inference for logistic models using Pólya–Gamma latent variables. Journal of the American Statistical Association, 108(504):1339– 1349. R Core Team (2020). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. 74 Ralph, C. J. (1993). Handbook of field methods for monitoring landbirds, volume 144. Pacific Southwest Research Station. Rappaport, D. I., Royle, J. A., and Morton, D. C. (2020). Acoustic space occupancy: Combining ecoacoustics and lidar to model biodiversity variation and detection bias across heterogeneous landscapes. Ecological Indicators, 113:106172. Rodhouse, T. J., Rodriguez, R. M., Banner, K. M., Ormsbee, P. C., Barnett, J., and Irvine, K. M. (2019). Evidence of region-wide bat population decline from long-term monitoring and bayesian occupancy models with empirically informed priors. Ecology and Evolution, 9(19):11078–11088. Rosenberg, K. V., Dokter, A. M., Blancher, P. J., Sauer, J. R., Smith, A. C., Smith, P. A., Stanton, J. C., Panjabi, A., Helft, L., Parr, M., et al. (2019). Decline of the north american avifauna. Science, 366(6461):120–124. Royle, J. A. (2004). N-mixture models for estimating population size from spatially replicated counts. Biometrics, 60(1):108–115. Royle, J. A. and Dorazio, R. M. (2006). Hierarchical models of animal abundance and occurrence. Journal of Agricultural, Biological, and Environmental Statistics, 11(3):249–263. Royle, J. A. and Dorazio, R. M. (2008). Hierarchical modeling and inference in ecology: the analysis of data from populations, metapopulations and communities. Elsevier. Royle, J. A., Dorazio, R. M., and Link, W. A. (2007). Analysis of multinomial models with unknown index using data augmentation. Journal of Computational and Graphical Statistics, 16(1):67–85. Rushing, C. S., Royle, J. A., Ziolkowski, D. J., and Pardieck, K. L. (2020). Migratory behavior and winter geography drive differential range shifts of eastern birds in response to recent climate change. Proceedings of the National Academy of Sciences, 117(23):12897–12903. Schaub, M. and Abadi, F. (2011). Integrated population models: a novel analysis framework for deeper insights into population dynamics. Journal of Ornithology, 152(1):227–237. Shonfield, J. and Bayne, E. (2017). Autonomous recording units in avian ecological research: current use and future applications. Avian Conservation and Ecology, 12(1). Simmonds, E. G., Jarvis, S. G., Henrys, P. A., Isaac, N. J., and O’Hara, R. B. (2020). Is more data always better? A simulation study of benefits and limitations of integrated distribution models. Ecography, 43(10):1413–1422. Smith, A. C., Fahrig, L., and Francis, C. M. (2011). Landscape size affects the relative importance of habitat amount, habitat fragmentation, and matrix quality on forest birds. Ecography, 34(1):103– 75 113. Sollmann, R., Gardner, B., Williams, K. A., Gilbert, A. T., and Veit, R. R. (2016). A hierarchi- cal distance sampling model to estimate abundance and covariate associations of species and communities. Methods in Ecology and Evolution, 7(5):529–537. Sor, R., Park, Y.-S., Boets, P., Goethals, P. L., and Lek, S. (2017). Effects of species prevalence on the performance of predictive models. Ecological Modelling, 354:11–19. Sueur, J. and Farina, A. (2015). Ecoacoustics: the ecological investigation and interpretation of environmental sound. Biosemiotics, 8(3):493–502. Sueur, J., Farina, A., Gasc, A., Pieretti, N., and Pavoine, S. (2014). Acoustic indices for biodiversity assessment and landscape investigation. Acta Acustica united with Acustica, 100(4):772–781. Sueur, J., Pavoine, S., Hamerlynck, O., and Duvail, S. (2008). Rapid acoustic survey for biodiversity appraisal. PloS One, 3(12):e4065. Sugai, L. S. M., Silva, T. S. F., Ribeiro Jr, J. W., and Llusia, D. (2019). Terrestrial passive acoustic monitoring: review and perspectives. BioScience, 69(1):15–25. Sutherland, C., Brambilla, M., Pedrini, P., and Tenan, S. (2016). A multiregion community model for inference about geographic variation in species richness. Methods in Ecology and Evolution, 7(7):783–791. Swanson, A. K., Dobrowski, S. Z., Finley, A. O., Thorne, J. H., and Schwartz, M. K. (2013). Spatial regression methods capture prediction uncertainty in species distribution model projections through time. Global Ecology and Biogeography, 22(2):242–251. Tenan, S., Brambilla, M., Pedrini, P., and Sutherland, C. (2017). Quantifying spatial variation in the size and structure of ecologically stratified communities. Methods in Ecology and Evolution, 8(8):976–984. Tilman, D., Isbell, F., and Cowles, J. M. (2014). Biodiversity and ecosystem functioning. Annual Review of Ecology, Evolution, and Systematics, 45:471–493. Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave- one-out cross-validation and WAIC. Statistics and Computing, 27(5):1413–1432. Wagner, D. L., Grames, E. M., Forister, M. L., Berenbaum, M. R., and Stopak, D. (2021). Insect decline in the anthropocene: Death by a thousand cuts. Proceedings of the National Academy of Sciences, 118(2). Wood, C. M., Klinck, H., Gustafson, M., Keane, J. J., Sawyer, S. C., Gutiérrez, R., and Peery, M. Z. (2021). Using the ecological significance of animal vocalizations to improve inference in 76 acoustic monitoring programs. Conservation Biology, 35(1):336–345. Wood, C. M., Popescu, V. D., Klinck, H., Keane, J. J., Gutiérrez, R., Sawyer, S. C., and Peery, M. Z. (2019). Detecting small changes in populations at landscape scales: A bioacoustic site-occupancy framework. Ecological Indicators, 98:492–507. Wright, A. D., Grant, E. H. C., and Zipkin, E. F. (2020a). A hierarchical analysis of habitat area, connectivity, and quality on amphibian diversity across spatial scales. Landscape Ecology, 35(2):529–544. Wright, W. J., Irvine, K. M., Almberg, E. S., and Litt, A. R. (2020b). Modelling misclassification in multi-species acoustic data when estimating occupancy and relative activity. Methods in Ecology and Evolution, 11(1):71–81. Wright, W. J., Irvine, K. M., Rodhouse, T. J., and Litt, A. R. (2021). Spatial gaussian processes im- prove multi-species occupancy models when range boundaries are uncertain and nonoverlapping. Ecology and Evolution. Yamaura, Y., Andrew Royle, J., Kuboi, K., Tada, T., Ikeno, S., and Makino, S. (2011). Modelling community dynamics based on species-level abundance models from detection/nondetection data. Journal of Applied Ecology, 48(1):67–75. Yip, D. A., Knight, E. C., Haave-Audet, E., Wilson, S. J., Charchuk, C., Scott, C. D., Sólymos, P., and Bayne, E. M. (2019). Sound level measurements from audio recordings provide objective distance estimates for distance sampling wildlife populations. Remote Sensing in Ecology and Conservation. Youngflesh, C., Socolar, J., Amaral, B. R., Arab, A., Guralnick, R. P., Hurlbert, A. H., LaFrance, R., Mayor, S. J., Miller, D. A., and Tingley, M. W. (2021). Migratory strategy drives species-level variation in bird sensitivity to vegetation green-up. Nature Ecology & Evolution, pages 1–8. Zipkin, E. F., DeWan, A., and Andrew Royle, J. (2009). Impacts of forest fragmentation on species richness: a hierarchical approach to community modelling. Journal of Applied Ecology, 46(4):815–822. Zipkin, E. F., Grant, E. H. C., and Fagan, W. F. (2012). Evaluating the predictive abilities of community occupancy models using AUC while accounting for imperfect detection. Ecological Applications, 22(7):1962–1972. Zipkin, E. F., Rossman, S., Yackulic, C. B., Wiens, J. D., Thorson, J. T., Davis, R. J., and Grant, E. H. C. (2017). Integrating count and detection–nondetection data to model population dynamics. Ecology, 98(6):1640–1650. Zipkin, E. F., Royle, J. A., Dawson, D. K., and Bates, S. (2010). Multi-species occurrence models to evaluate the effects of conservation and management actions. Biological Conservation, 77 143(2):479–484. Zipkin, E. F. and Saunders, S. P. (2018). Synthesizing multiple data types for biological conservation using integrated population models. Biological Conservation, 217:240–250. Zipkin, E. F., Zylstra, E. R., Wright, A. D., Saunders, S. P., Finley, A. O., Dietze, M. C., Itter, M. S., and Tingley, M. W. (2021). Addressing data integration challenges to link ecological processes across scales. Frontiers in Ecology and the Environment, 19(1):30–38. 78