DATA INTEGRATION IN POPULATION AND COMMUNITY ECOLOGY USING HIERARCHICAL MODELING By Matthew T. Farr A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Zoology-Doctor of Philosophy Ecology, Evolutionary Biology and Behavior-Dual Major 2021 ABSTRACT DATA INTEGRATION IN POPULATION AND COMMUNITY ECOLOGY USING HIERARCHICAL MODELING By Matthew T. Farr In this dissertation, I develop and apply methods for data integration using hierarchical modeling to estimate the status, trends, and demography of wildlife populations and communities. I use multi-level statistical and mathematical models to explicitly link observed data to latent ecological processes. By separately modeling observational and ecological processes, I can integrate multiple disparate data sources into a unified framework to estimate ecologically relevant population and community parameters, often in the context of wildlife conservation. In Chapter One, I apply a multispecies hierarchical distance sampling model to assess the effect of management actions on a carnivore community in the Masai Mara National Reserve, Kenya. I assess variation in species-level responses to passive management, resulting in human disturbance and apex predator declines. In Chapter Two, I develop an integrated distribution model that uses distance sampling and presence-only data to jointly estimate species abundance. I apply this model to a case study on black-backed jackals (Canis mesomelas) to evaluate the effects of anthropogenic disturbance on the distribution of jackals across the Masai Mara National Reserve. In Chapter Three, I evaluate status and trends of species in a forest dwelling duiker community using detection-nondetection data. I develop a multispecies dynamic N-occupancy model to estimate species-level abundance, demographic parameters, and quasi- extinction probabilities. In Chapter Four, I create a spatiotemporal integrated model to estimate the effects of weather conditions on monarch butterflies (Danaus plexippus) during spring migration. Each chapter illustrates a unique application of data integration in wildlife ecology, either by combining data on multiple species to estimate population and community-level parameters or by combining disparate data sources on a single species to estimate demography and other population-level parameters. Data integration is a powerful framework that leverages all available information to address pressing conservation challenges. ACKNOWLEDGEMENTS The quantity and content of words within a dissertation do not accurately reflect the experiences of obtaining a PhD. The path to completing a PhD is strenuous, and behind any successful graduate is a community of supporting individuals. Since my matriculation at Michigan State University in 2015, I have developed numerous relationships that have fostered both my academic prowess and growth as an individual. I feel privileged to have received the amount of support in both my professional and personal lives while pursuing my Doctor of Philosophy degree. My coming to MSU was encouraged by my undergraduate advisors, Dr. Patrick Zollner and Dr. Robert Swihart. I owe them a huge debt of gratitude for recommending me to my graduate advisor, Dr. Elise Zipkin. To Elise, thank you for continuing to be such an exceptional advisor. Your commitment to helping and improving everyone and everything around you is inspiring. The Zipkin Lab and greater MSU community has truly been an outstanding group of scientists that have promoted my growth as a researcher. Beyond formal collaboration, the access to each lab member and colleague has been a great source of intellect and creativity in dealing with the idiosyncrasies of ecology and graduate school. A special thanks to Dr. Sonja Christensen, Dr. Savvas Constantinou, Kayla Davis, Dr. Andrew Dennhardt, Dr. David Green, Erika LaPlante, Dr. Connie Rojas, Dr. Sarah Saunders, Dr. Klara Scharnagl, Alli Sussman, Dr. Alex Wright, and Dr. Erin Zylstra. I am positive that our friendships will carry on. An additional thanks to my committee members, Dr. Andrew Finley, Dr. Kay Holekamp, and Dr. Gary Roloff for your commitment to improving my research. A specific thanks to Kay for the access to her outstanding long-term monitoring data and for my tenure at her field sites in Kenya. iv I also want to thank my family. My parents, Tom and Jane, and my sister, Rachel, have been constant in their support of me. I feel very grateful to them. I want to thank my friends as well for always keeping me grounded, focused, and motivated on pursing my passions. A specific thanks to Brandon Zinman and Cole Bleke for providing me an outlet to enjoy wildlife and the outdoors outside of academia. I cannot wait for our upcoming adventures. Lastly, to the love of my life, Chase. My appreciation for you is never fully expressed. We have been through so many trials and tribulations in the last six years, and I am forever grateful we were able to overcome them together. As we finally end this chapter, I cannot wait for what is next for both of us and revel in the uncertainty and endless possibilities of our next chapter. Matthew T. Farr v TABLE OF CONTENTS LIST OF TABLES ....................................................................................................................... viii LIST OF FIGURES ....................................................................................................................... ix INTRODUCTION ...........................................................................................................................1 CHAPTER 1: Multispecies hierarchical modeling reveals variable responses of African carnivores to management alternatives ............................................................................................7 Abstract .........................................................................................................................................7 CHAPTER 2: Integrating distance sampling and presence-only data to estimate species abundance ........................................................................................................................................9 Abstract .........................................................................................................................................9 CHAPTER 3: Quantifying the conservation status and trends of wildlife communities using detection-nondetection data ...........................................................................................................11 Abstract .......................................................................................................................................11 Introduction .................................................................................................................................12 Model ..........................................................................................................................................15 Observation process .................................................................................................................15 Biological process ....................................................................................................................17 Community component............................................................................................................18 Application ..................................................................................................................................19 Study area and data collection .................................................................................................19 Data analysis ............................................................................................................................21 Bayesian population viability analysis.....................................................................................23 Results .........................................................................................................................................24 Population abundance and growth ...........................................................................................24 Demographic rates ...................................................................................................................24 Population viability ..................................................................................................................25 Discussion ...................................................................................................................................26 CHAPTER 4: Overcoming data gaps using integrated modeling to estimate population abundance of a migratory species ..................................................................................................34 Abstract .......................................................................................................................................34 Introduction .................................................................................................................................34 The Data ......................................................................................................................................37 The Model ...................................................................................................................................38 Spatial heterogeneity in population density .............................................................................39 Temporal change in population density ...................................................................................40 Structured count data ...............................................................................................................40 Unstructured presence-only data..............................................................................................41 vi Spatiotemporal data integration ...............................................................................................42 Joint likelihood.........................................................................................................................43 Simulation Study.........................................................................................................................43 Scenario....................................................................................................................................43 Results ......................................................................................................................................45 Case Study ..................................................................................................................................46 Population monitoring .............................................................................................................46 Spatiotemporal stages ..............................................................................................................47 Datasets ....................................................................................................................................48 Environmental covariates.........................................................................................................49 Observation covariates .............................................................................................................49 Data analysis ............................................................................................................................50 Results ......................................................................................................................................51 Discussion ...................................................................................................................................52 APPENDICES ...............................................................................................................................61 APPENDIX A: Additional application details ...........................................................................62 APPENDIX B: Nimble model code ...........................................................................................65 APPENDIX C: Model output .....................................................................................................71 APPENDIX D: Simulation code, and output ..............................................................................80 APPENDIX E: Case study details, code, and output ................................................................100 BIBLIOGRAPHY ........................................................................................................................116 vii LIST OF TABLES Table 3.1 National parks and basic data information used in my multi-species dynamic N- occupancy analysis of the antelope community including the number of years sampled, species observed, number of sampling sites, and the number of recorded presences across species. Species are numerically identified as follows: (1) Cephalophus callipygus (2) Cephalophus dorsalis (3) Cephalophus harveyi (4) Cephalophus leucogaster (5) Philantomba monticola (6) Nesotragus moschatus (7) Cephalophus nigrifrons (8) Cephalophus ogilbyi (9) Tragelaphus scriptus (10) Cephalophus spadix (11) Tragelaphus spekii (12) Cephalophus silvicultor. ...........29 Table C.1 Mean annual population growth rates (i.e., geometric mean) with 95% credible intervals of each population (species-park combination) during the study period ........................71 Table C.2 Mean annual site-level abundance with 95% credible intervals of each population across the study period (including only the years sampled for each population/park combination) ........................................................................................................................................................73 Table C.3 Mean annual survival probability and gains with 95% credible intervals for each species across all parks. .................................................................................................................77 Table C.4 Park-level mean annual survival probability with 95% credible intervals ...................78 Table C.5 Quasi-extinction probability (for each of the 18 populations coming from parks with at least five years of data) for the projected window of 2020-2030 ..................................................79 Table D.1 The root mean squared error and percent relative bias (median, lower and upper quantiles) of each estimated parameter compared to the true simulated value for both the integrated (I) and presence-only (PO) analyses .............................................................................99 Table E.1 Estimates (mean and 95% confidence intervals) of monarch population density per 100-m2 for each year and stage ....................................................................................................114 Table E.2 Estimates (mean and 95% confidence intervals) of the effect of NDVI and averaged daily GDD per each stage on monarch population density on the log-scale ...............................115 viii LIST OF FIGURES Figure 3.1 Population density (i.e., abundance per site) of each species at each park across all years that a park was surveyed. Points show the mean abundance while the shaded area is the 95% credible interval around the mean. Estimates are only shown for parks and years during which sampling occurred and for species that were observed in each park. Each panel shows the trends for a specific park (labeled with the parks’ abbreviations) and species are each represented with a unique color.........................................................................................................................30 Figure 3.2 (A) Annual apparent survival probabilities for each species in the community across the network of parks (πœ”0,𝑖 ), mean survival probability at the community-level (πœ‡πœ”0 ; black coloring), and mean park-level survival probabilities for the entire community of species that reside in the park (πœ‡πœ”0 + πœ€πœ”,π‘Ÿ ; unique coloring). (B) Species-specific and community-level annual number of individuals gained (𝛾0,𝑖 , πœ‡π›Ύ0 ). (A,B) Boxplots show the mean, 50% credible intervals, and 95% credible intervals. The dashed-line separates the species-specific estimates from the community- and park-level estimates .............................................................................31 Figure 3.3 Retrospective population density (i.e., abundance per site during study period) and projected population density (up to 2030) for each species within the four parks that were sampled for at least five years. Lines show mean estimated (solid) and projected (dashed) density while the shaded areas depict the 95% credible intervals. Estimates are shown from the initial sampling year for a given park until 2030 (with missing years interpolated for parks when data were unavailable). Each panel shows a different park, labeled with the parks’ abbreviations. Species are depicted with a unique color .......................................................................................32 Figure 4.1 (A-D) Visualization of our simulation study for the spatiotemporal integrated model with stage, domain 𝑆1 on the left (A, C) and stage, domain 𝑆2 on the right (B, D) with dashed lines indicating the extent of each domain. The top row (A, B) depicts true population abundance and distribution with black dots representing observations of individuals. The middle row (C, D) shows the data for each stage within each domain. (C) 𝑆1 contains presence-only data (white dots) and 25 sites (dashed circles) with single-visit counts (magnitude of counts are within dashed circles). (D) 𝑆2 contains only presence-only data. (A-D) Background gradient indicates the expected population density (individuals per unit area) for each domain and stage (A, C are the same [stage 1] and B, D are the same [stage 2]). (E, F) Simulation results showing bias (estimated minus true value) from each of the integrated and presence-only models. Bottom left (E) shows results for biases in estimates of the baseline population density (πœ†0 ), covariate effect (𝛽), and population change (𝛿). Bottom right (F) shows results for biases in estimates of abundance for each domain and stage ...........................................................................................56 Figure 4.2 Predicted population density of monarchs from 2016 to 2018 during their spring migration. Density values were calculated using parameter estimates and values of NDVI and daily GDD averaged across each stage at a given location and year. Population density was estimated and reported at a resolution of 100-m2. Rows and columns correspond to years and stages, respectively ........................................................................................................................58 ix Figure 4.3 Effects of NDVI and averaged daily GDD on monarch population density. Top left (A) is the parameter estimates (mean and 95% confidence intervals) for the linear and quadratic effects of NDVI and averaged daily GDD at each stage. Middle left (B, C) is the estimated marginal effects (mean and 95% confidence intervals) of NDVI (B) and averaged daily GDD (C) on monarch population density at each stage, respectively. Right column (D-F) is the marginal effect (mean) of both NDVI and averaged daily GDD on monarch population density for early spring (D), late spring (E), and early summer (F). Crosses indicate observed values between 2016 and 2018 and dashed lines indicate the observed mean values during 2016 - 2018 .............59 x INTRODUCTION Human population growth and resulting pervasiveness of anthropogenic disturbance have left no wilderness or natural place unscathed (Foley et al. 2005, Watson et al. 2018). Anthropogenic-induced declines of many wildlife species have led to extirpations and extinctions, and consequential losses of biodiversity can have dramatic effects on the remaining wildlife (Ceballos et al. 2015, Zipkin et al. 2020, IUCN 2021). The goal of conservation science and wildlife management is to reverse these alarming trends through strategic protection of threatened species and their critical habitats (Watson & Venter 2017, CBD 2020). Numerous obstacles obstruct biodiversity conservation, but one of the daunting hurdles limiting such global-scale efforts has been extensive gaps in data availability on species status and trends (Meyer et al. 2015, Conde et al. 2019). Throughout both terrestrial and aquatic biomes, locations of data collection are often biased because of socio-economic constraints (e.g., financial resources for data collection are tied to developed countries), creating spatial gaps in data availability (Meyer et al. 2015). The temporal extent of data is also restricted as long-term ecological monitoring programs have existed for less than a century (Magnuson 1990). Additionally, data are often collected for only a subset of species (e.g., charismatic, indicator, umbrella) within ecological communities while many threatened species are categorized as data deficient by the International Union for Conservation of Nature (Troudet et al. 2017, IUCN 2021). Spatial, temporal, and species data gaps limit characterization of the status and trends of threatened populations (Conde et al. 2019). The growing threat to global biodiversity has prompted the establishment of many large- scale monitoring programs to overcome data gaps in estimating the status and trends of threatened species (Magnuson 1990). Data digitalization and online repositories (e.g., iNaturalist, 1 GBIF, Map of Life) have exponentially increased data availability for addressing both basic ecological questions and those focused on conservation (Jetz et al. 2012, Chandler et al. 2017). However, variation in the species, ecosystems, and motivations in data being collected by monitoring programs and a lack of standardization in monitoring have led to a plethora of disjointed datasets, often containing different types (e.g., presence-only, presence-absence, count, capture-recapture), which can make analysis challenging (Miller et al. 2019, Saunders et al. 2019a, Isaac et al. 2020). Data quality, or the information content of a given data type, largely depends on how the data were collected and whether the observation process was designed to account for errors such as, imperfect detection or location biases (Miller et al. 2019). A tradeoff often exists between information content and data quantity where information-rich data has a high logistical cost (e.g., financial, effort, infrastructure). Lower information content data, such as that collected via opportunistic or ad libitum sampling, can be conducted with relative ease as there is no protocol or design. Quantitative ecologists have rapidly been developing methods to strategically integrate multiple data sources into a unified analysis to estimate population- and community-level parameters (Dorazio & Royle 2005, Dorazio et al. 2006, Zipkin et al. 2009, Zipkin et al. 2010, Rossman et al. 2016, Pacifici et al. 2017, Zipkin & Saunders 2018, Fletcher et al. 2019, Miller et al. 2019, Isaac et al. 2020). Data integration, which I broadly define as any quantitative analysis framework that combines multiple data sources to estimate a shared ecological process (e.g., abundance, demographic rates, covariate effects), can improve accuracy and precision of estimates over independent analyses of each data source (Zipkin et al. 2009, Zipkin & Saunders 2018). Data integration may also enable estimation of parameters that would be unidentifiable in 2 an analysis of a single data source (Schaub & Abadi 2011, Dorazio et al. 2014) and/or resolve discrepancies among inferences based on disparate data sources (Saunders et al. 2019a). In my dissertation, I focus on a subset of data integration methods that use a hierarchical modeling framework to differentiate variation coming from the observation processes of one or more data sources and variation attributable to underlying ecological processes. This framework links the ecological parameter(s) of interest (e.g., abundance, survival) to the data by accounting for observation error, which may vary in structure or magnitude, among data sources (Pacifici et al. 2017, Zipkin & Saunders 2018, Fletcher et al. 2019, Miller et al. 2019, Isaac et al. 2020). I explore the utility of data integration within both community and population ecology by combining data from different species (i.e., multispecies models) and combining data of different types (single-species integrated models). Multispecies modeling uses a hierarchical framework to model the dynamics of multiple species simultaneously within a unified community analysis. Data from each species is of the same type, but observations and ecological parameters associated with each species are nested hierarchically within community-level components, allowing for variation among species within a common community (Dorazio & Royle 2005, Dorazio et al. 2006, KΓ©ry & Royle 2009, Zipkin et al. 2009). Similar in its use of hierarchical statistical models, single-species integrated modeling combines data of different types within a unified analysis, specifying a separate observation process for each data type (Schaub & Abadi 2011, Dorazio et al. 2014, Zipkin & Saunders 2018). Because different data types inform at least one common ecological parameter, the ecological process (e.g., population abundance, population dynamics) can be jointly estimated. Each chapter in my dissertation is a separate development and application of a hierarchical multispecies or single-species integrated model to make inferences about different taxa and ecological systems. 3 Modeling applications presented in Chapters One and Two occur within the Masai Mara National Reserve, Kenya. Data used for these chapters was collected by the Michigan State University Mara Hyena Project, which has been monitoring spotted hyena (Crocuta crocuta) in the Reserve for over 30 years. During this long-term study, the Mara Hyena Project has surveyed the larger carnivore community in the Reserve using a variety of techniques (Green 2015). In Chapter One, I develop a multispecies hierarchical model to evaluate the effect of management alternatives and resulting human disturbances on a carnivore community within the Reserve. By integrating data for multiple carnivore species, I find that passive management and high levels of human disturbance have variable effects on carnivore abundance. African lions (Leo panthera) are adversely affected, while other species like spotted hyena and black-backed jackals (Canis mesomelas) benefit from disturbance. Variation in species’ responses to disturbance could be driven by numerous processes, including direct mortality, competitive release, and habitat degradation. Although my goal was not to pinpoint explicit processes, my results indicate that passive management does not affect carnivores uniformly. By using a multispecies hierarchical model, I can both estimate abundance of rare and cryptic species that may have been discarded in analyses focused on single species, and characterize variation in the responses of multiple carnivores to human disturbance. In Chapter Two, I develop an integrated distribution model that combines distance sampling (higher information content) and presence-only (lower information content) data to generate spatially explicit estimates of species abundance. I use a simulation study to assess the data requirements of each type and verify that the model returns unbiased and precise estimates. I apply this model to a case study on black-backed jackals in the Reserve and find that black- backed jackals are thriving in more disturbed regions near high human density. Despite its lower 4 information content and potential biases, presence-only data can improve the precision of estimating ecological patterns when used in conjunction with structured data. Chapter Three characterizes the regional distribution and dynamics of forest antelopes across Central and East Africa using detection-nondetection data from The Tropical Ecology Assessment and Monitoring Network. Recent methodological developments have expanded the capabilities of detection-nondetection data (e.g., camera trap data) to estimate both abundance and dynamics of wildlife populations (Rossman et al. 2016). I extended a single-species framework (dynamic N-occupancy model) for analyzing detection-nondetection data to simultaneously estimate demographic rates for multiple species in a community. This approach is particularly valuable in that it can characterize population dynamics for cryptic or rare species when detection-nondetection data may be the only available data type. Although many antelope species in this region are experiencing declines from habitat loss and poaching, my results suggest that the antelope populations were mostly stable and had low probabilities of extirpation over the next ten years. In Chapter Four, I explore spring migration patterns of the eastern monarch butterfly (Danaus plexippus), which ranges from Central Mexico to the Midwestern United States and Southeast Ontario, Canada. While population dynamics of this species are affected by a range of seasonal factors, recent studies suggest that environmental conditions during spring migration may be particularly important (Zipkin et al. 2012, Saunders et al. 2016, 2018b, Zylstra et al. 2021). However, data deficiencies in spring limit our understanding of the processes affecting abundance during this critical stage of the migratory cycle. I developed a hierarchical model to integrate multiple types of unstructured opportunistic sightings of monarchs in the spring with structured data in early summer to characterize spatiotemporal variation in abundance on the 5 spring breeding grounds and assess the effects of spring climatic drivers. Using simulations, I assessed the ability of non-overlapping data to estimate abundance across spatiotemporal domains. When applying this integrated model to the eastern population of monarchs, I found that the effect of spring weather conditions on monarch abundance varies across its spring migration and early summer breeding grounds. These results demonstrate the potential for integrated models to circumvent data gaps during a species annual cycle. The variety of systems, taxa, data types, and corresponding analyses covered in my dissertation illustrates the utility of data integration in population and community ecology. The frameworks I develop have a broad range of applications because of their ability to produce unbiased and precise estimates by accounting for variation between species or data types. Whether it is a multispecies model or an integrated model, the hierarchy of observation and ecological process models allows for a seamless statistical and mathematical connection between whichever data sources are available and the target parameters of interest (e.g., abundance). Data integration is a powerful tool that can harness the wide range of data now available to help overcome knowledge gaps in promoting conservation of wildlife in the face of ongoing anthropogenic threats. 6 CHAPTER 1: Multispecies hierarchical modeling reveals variable responses of African carnivores to management alternatives Abstract Carnivore communities face unprecedented threats from humans. Yet, management regimes have variable effects on carnivores, where species may persist or decline in response to direct or indirect changes to the ecosystem. Using a hierarchical multispecies modeling approach, I examined the effects of alternative management regimes (i.e., active vs. passive enforcement of regulations) on carnivore abundances and group sizes at both species and community levels in the Masai Mara National Reserve, Kenya. Alternative management regimes have created a dichotomy in ecosystem conditions within the Reserve, where active enforcement of regulations maintains low levels of human disturbance in the Mara Triangle and passive enforcement of regulations in the Talek region permits multiple forms of human disturbance. My results demonstrate that these alternative management regimes have variable effects on 11 observed carnivore species. As predicted, some species, such as African lions and bat-eared foxes, have higher population densities in the Mara Triangle, where regulations are actively enforced. Yet, other species, including black-backed jackals and spotted hyenas, have higher population densities in the Talek region where enforcement is passive. Multiple underlying mechanisms, including behavioral plasticity and competitive release, are likely causing higher black-backed jackals and spotted hyena densities in the disturbed Talek region. My multispecies modeling framework reveals that carnivores do not react to management regimes uniformly, shaping carnivore communities by differentially producing winning and losing species. Some carnivore species require active enforcement of regulations for effective conservation, while 7 others more readily adapt (and in some instances thrive in response) to lax management enforcement and resulting anthropogenic disturbance. Yet, high levels of human disturbance appear to be negatively affecting the majority of carnivores, with potential consequences that may permeate throughout the rest of the ecosystem. Community approaches to monitoring carnivores should be adopted as single species monitoring may overlook important intra- community variability. Material from: Farr, M. T., Green, D. S., Holekamp, K. E., Roloff, G. J., & Zipkin, E. F. (2019). Multispecies hierarchical modeling reveals variable responses of African carnivores to management alternatives. Ecological Applications, e01845. For full text of this work, please go to: https://doi.org/10.1002/eap.1845 8 CHAPTER 2: Integrating distance sampling and presence-only data to estimate species abundance Abstract Integrated models combine multiple data types within a unified analysis to estimate species abundance and covariate effects. By sharing biological parameters, integrated models improve the accuracy and precision of estimates compared to separate analyses of individual data sets. I developed an integrated point process model to combine presence-only and distance sampling data for estimation of spatially explicit abundance patterns. Simulations across a range of parameter values demonstrate that my model can recover estimates of biological covariates, but parameter accuracy and precision varied with the quantity of each data type. I applied my model to a case study of black-backed jackals in the Masai Mara National Reserve, Kenya, to examine effects of spatially varying covariates on jackal abundance patterns. The model revealed that jackals were positively affected by anthropogenic disturbance on the landscape, with highest abundance estimated along the Reserve border near human activity. I found minimal effects of landscape cover, lion density, and distance to water source, suggesting that human use of the Reserve may be the biggest driver of jackal abundance patterns. My integrated model expands the scope of ecological inference by taking advantage of widely available presence-only data, while simultaneously leveraging richer, but typically limited, distance sampling data. Material from: Farr, M. T., Green, D. S., Holekamp, K. E., & Zipkin, E. F. (2021). Integrating distance sampling and presence-only data to estimate abundance. Ecology, e03204. 9 For full text of this work, please go to: https://doi.org/10.1002/ecy.3204 10 CHAPTER 3: Quantifying the conservation status and trends of wildlife communities using detection-nondetection data Abstract Effective conservation often requires understanding species’ abundance patterns and demographic rates across space and time. Ideally, such knowledge should be available for whole communities, as variations in individual species’ demographic rates and responses to environmental factors provides critical information on optimal approaches to mitigate accelerating biodiversity loss. However, collecting the data necessary to simultaneously estimate abundance and demographic rates is often prohibitively time-intensive and expensive for communities of species. I developed a β€œmulti-species dynamic N-occupancy model” that requires only detection-nondetection data (e.g., repeated presence-absence surveys) to estimate both the abundance and demographic rates of individual species over time as well as composite metrics of the whole community. Using data from a network of camera traps across tropical equatorial African national parks, I use my model to evaluate the statuses and trends of a forest-dwelling antelope community by estimating each species’ local population abundance, rates of recruitment (i.e., reproduction and immigration), and apparent survival probabilities. I paired my model’s results with a Bayesian population viability analysis to make mechanistic projections of species’ population sizes and quasi-extinction rates within each park out to 2030. Both retrospective and prospective analyses indicate that the antelope community is fairly stable in this region (although 17% of populations/species-park combinations declined during the study period), with variations in population statuses and trends linked more closely to differences among national parks rather than differences in individual species’ life histories. While certain assumptions must be met that may prevent implementation on highly mobile or socially grouping species, in many situations 11 my modeling framework can estimate abundance, demographic rates, and extinction risks for entire communities using commonly collected detection-nondetection data. Such frameworks can be valuable to conservation efforts seeking to understand and abate biodiversity loss. Introduction Knowledge of the population size and demographic rates (e.g., survival, recruitment) of species in wildlife communities is often needed to quantify, and ultimately address, threats leading to biodiversity loss (Conde et al. 2019). Traditionally, estimates of species abundance and demographic rates have relied upon β€˜marked data’ (e.g. capture-recapture data), in which individuals are identified and followed via tags, bands, genotypes, and/or phenotypes (Pollock et al. 1990). The expensive and labor-intensive monitoring needed to generate marked data often preclude collection beyond single species. Though single-species analyses based on marked data can provide robust inferences, they are often restricted to common or charismatic species (Troudet et al. 2017). Methods to extrapolate single-species inferences (e.g., umbrella, keystone, indicator) to unmonitored community members may miss important variations among species (Cushman et al. 2010, Runge et al. 2019). Yet, accelerating biodiversity loss demands diversified approaches to monitor multiple species simultaneously and whole communities when possible (Nicholson & Possingham 2006, Ceballos et al. 2015, Zipkin et al. 2020). Community-wide assessments can provide unique information about species and their variable responses to environmental factors, including disturbance (REF). While the need to scale up biodiversity assessments to community-levels is clear, the data to do so remain logistically challenging to obtain. Most community-wide data collection protocols rely on β€˜unmarked data’ (e.g., presence-only, presence-absence, detection- nondetection, count). Unmarked data do not require identification or recapture of individuals and 12 thus can be collected more easily than marked data for community-wide monitoring programs. Arguably the most ubiquitous unmarked data type is detection-nondetection data in which observers record the presence or absence of a species at a given time and place (MacKenzie et al. 2017). A common approach to analyzing detection-nondetection data is occupancy modelling, which makes use of replicate sampling over short time frames to estimate species distributions and occurrence patterns while accounting for imperfect observation during sampling (MacKenzie et al. 2002). The advent of multi-species occupancy models (Dorazio & Royle 2005, Dorazio et al. 2006) has allowed for estimation of community occurrence processes and trends across space and time, driving discoveries in population biology, biodiversity loss, macrosystem processes, and community ecology among other areas (KΓ©ry & Schaub 2012, MacKenzie et al. 2017, Devarajan et al. 2020). Yet without the ability to estimate demographic rates, traditional occupancy models have been restricted in their capacity to infer changes in population sizes and the underlying mechanisms driving trends. Recent development of the β€˜dynamic N-occupancy model’ has expanded the use of unmarked data to jointly estimates population abundance and dynamics using detection-nondetection data for a single species (Rossman et al. 2016). This is done by decomposing changes in abundance into apparent survival and populations gains via recruitment (i.e., the combination of fecundity and immigration) using the biological process model developed by Dail and Madsen (2011) and capitalizing on the link between species detection probability and abundance developed by Royle and Nichols (2003). Here, I expand the single-species dynamic N-occupancy modeling framework to a multi- species context, capable of estimating abundance and demographic rates for communities of related species using only detection-nondetection data. My β€˜multi-species dynamic N-occupancy model’ can estimate community-level responses to environmental covariates while also capturing 13 species-specific variation in demographic rates and the effects of covariates. By linking individual responses via community-level distributions (Dorazio & Royle 2005, Dorazio et al. 2006), my modeling framework provides abundance and demographic rate estimates for rarer species that otherwise would be unidentifiable using a single-species approach (KΓ©ry & Royle 2009, Zipkin et al. 2009, 2010). Scaling the single-species dynamic N-occupancy model to a multi-species context fills a needed gap in conservation ecology by creating a framework to explore the underlying mechanisms of community-wide population changes and biodiversity assessments. I apply the multi-species dynamic N-occupancy model to a case study of forest dwelling antelope species in tropical equatorial Africa. Despite growing concern for human induced biodiversity loss in the tropics (Bradshaw et al. 2009), tropical communities contain a disproportionate amount of data gaps worldwide (Collen et al. 2008, Meyer et al. 2015). Of the available data collection methods for communities, unmarked methods are typically the most feasible and as a result, are widely used in tropical regions (O’Brien 2008, Tobler et al. 2008). My target antelope community provides an ideal case study to demonstrate the untapped capabilities of unmarked data collected via remote camera trap surveys. Camera trap surveys provide one of the only suitable methods to monitor remote, forest dwelling terrestrial mammals. Limited information on demography is available for most of the species included in my analysis, despite intense environmental pressures. To evaluate the status, trends, and potential trajectory for this community across Central and East Africa, I apply the multi-species dynamic N- occupancy model to estimate species-specific abundance and demographic rates. I then project each populations’ quasi-extinction (i.e., local extinction/extirpation) probability (i.e., likelihood that a population size falls below a viable threshold where extinction/extirpation is inevitable) 14 using Bayesian population viability analysis (BPVA; KΓ©ry & Schaub 2012, Saunders et al. 2018a, Saunders et al. 2021). Projecting species’ viabilities under potential future scenarios provides a method to quantify uncertainty in species extirpation in response to environmental drivers and is a useful metric to evaluate potential conservation actions targeted at both species and communities. My multi-species dynamic N-occupancy model thus provides a multifaceted approach to address a knowledge gap (abundance and vital rate estimates, species-specific and community-level estimates, population projections, and quasi-extinction probabilities) that occurs persistently throughout wildlife communities. Model The multi-species dynamic N-occupancy model uses detection-nondetection data to estimate abundance and demographic rates at both species and community levels by combining the N-occupancy modeling framework (Rossman et al. 2016) with the multi-species occupancy modeling framework (Dorazio & Royle 2005). The model estimates latent biological processes (i.e., abundance, apparent survival, reproduction/immigration) for individual species and accounts for imperfect detection during data collection via an observation process component. Species’ biological and observation processes are then linked with a hierarchical statistical structure (i.e. through shared distributions) to estimate community-level parameters in addition to the species-level parameters. This approach leverages information across species to improve precision of species-level parameters, especially for those species that were observed less frequently due to being either rare or elusive (Zipkin et al. 2009). Observation process To estimate latent abundance, 𝑁𝑖,𝑗,𝑑 , of species 𝑖 at a sampling site 𝑗 during year 𝑑, detection- nondetection data, 𝑦𝑖,𝑗,π‘˜,𝑑 , are collected during π‘˜ = 1,2, … 𝐾 sampling replicates. I assume that 15 species abundance at a site 𝑗 is closed to changes within year 𝑑. Thus, the 𝐾 > 1 sampling replicates within a year allow us to estimate the probability that species 𝑖 was detected at site 𝑗 during sampling replicate π‘˜ (𝑦𝑖,𝑗,π‘˜,𝑑 = 1). I model the detection-nondetection data using a Bernoulli process: 𝑦𝑖,𝑗,π‘˜,𝑑 ~ π΅π‘’π‘Ÿπ‘›π‘œπ‘’π‘™π‘™π‘–(𝑝𝑖,𝑗,π‘˜,𝑑 ), where 𝑝𝑖,𝑗,π‘˜,𝑑 is the detection probability of species 𝑖 at site 𝑗 during replicate visit π‘˜ in year 𝑑. A nondetection of species 𝑖 can result from two separate processes: the species is truly absent at the site (i.e., latent abundance of species 𝑖 at site 𝑗 in year 𝑑 is zero, 𝑁𝑖,𝑗,𝑑 = 0) or because the species was present (𝑁𝑖,𝑗,𝑑 > 0) but no individuals were detected during sampling. Thus, 𝑝𝑖,𝑗,π‘˜,𝑑 can be defined as the probability that at least one of the 𝑁𝑖,𝑗,𝑑 individuals at the site was detected during the π‘˜th sampling event (Rossman et al. 2016, Royle & Nichols 2003): 𝑁𝑖,𝑗,𝑑 𝑝𝑖,𝑗,π‘˜,𝑑 = 1 βˆ’ (1 βˆ’ πœƒπ‘–,𝑗,π‘˜,𝑑 ) where πœƒπ‘–,𝑗,π‘˜,𝑑 is the detection probability of an individual of species 𝑖 at site 𝑗 during replicate visit π‘˜ in year 𝑑. If there are no individuals at the site (𝑁𝑖,𝑗,𝑑 = 0), then the detection probability collapses to zero. Likewise, as latent abundance, 𝑁𝑖,𝑗,𝑑 , increases, the overall detection probability, 𝑝𝑖,𝑗,π‘˜,𝑑 , of the species also increases because each individual has an independent probability of being detected, πœƒπ‘–,𝑗,π‘˜,𝑑 . Covariates can be added to πœƒπ‘–,𝑗,π‘˜,𝑑 to account for variation in detection by species, site, replicate visit, and/or year using a logit-link function: π‘™π‘œπ‘”π‘–π‘‘ (πœƒπ‘–,𝑗,π‘˜,𝑑 ) = 𝛼0,𝑖 + 𝛼𝑣,𝑖 βˆ™ π‘₯𝑣,𝑗,π‘˜,𝑑 . Here, 𝛼0,𝑖 is the intercept for species 𝑖, or average detection probability of individuals on the logit scale while 𝛼𝑣,𝑖 is a vector (1,2, … , 𝑉) of parameter effects (𝛼1,𝑖 , 𝛼2,𝑖 , … , 𝛼𝑉,𝑖 ) for species 𝑖 16 of corresponding standardized covariates π‘₯𝑣,𝑗,π‘˜,𝑑 , which may change by sampling site, replicate visit, and/or year. Biological process The biological process model focuses on estimating abundance, 𝑁𝑖,𝑗,𝑑 , for species 𝑖 across all 𝑗 sites in year 𝑑 using the approach developed by Dail and Madsen (2011). I assume that species abundance changes between years (i.e., from 𝑑 βˆ’ 1 to 𝑑) through processes of survival and recruitment. In the first year for which data are available (i.e., 𝑑 = 1), I estimate abundance, 𝑁𝑖,𝑗,1 , using a Poisson distribution: 𝑁𝑖,𝑗,1 ~ π‘ƒπ‘œπ‘–π‘ π‘ π‘œπ‘›(πœ†π‘–,𝑗 ), where πœ†π‘–,𝑗 is the expected abundance of species 𝑖 at site 𝑗 in the first year of sampling (Dail & Madsen 2011). I can model heterogeneity in initial abundance by adding covariates using a log- link function: π‘™π‘œπ‘”(πœ†π‘–,𝑗 ) = 𝛽0,𝑖 + 𝛽𝑣,𝑖 βˆ™ 𝑀𝑣,𝑗 , where 𝛽0,𝑖 is the intercept (i.e., average initial abundance on the log scale) for species 𝑖 and 𝛽𝑣,𝑖 is a vector (1,2, … , 𝑉) of parameter effects (𝛽1,𝑖 , 𝛽2,𝑖 , … , 𝛽𝑉,𝑖 ) for standardized covariates 𝑀𝑣,𝑗 . In subsequent years (𝑑 > 1), I assume that changes to latent abundance of species 𝑖 at site 𝑗 occur via births-deaths and immigration-emigration processes (Dail & Madsen 2011) and are dependent on the population size during the previous year, 𝑑 βˆ’ 1. I break this process into two components: 𝑆𝑖,𝑗,π‘‘βˆ’1, the number of individuals of species 𝑖 that survive from year 𝑑 βˆ’ 1 to 𝑑 and remain at site 𝑗, and 𝐺𝑖,𝑗,π‘‘βˆ’1 , the number of new individuals of species 𝑖 that are gained to site 𝑗 via recruitment (reproduction and/or immigration) from year 𝑑 βˆ’ 1 to 𝑑 (Dail & Madsen 2011, Rossman et al. 2016). Thus, total abundance in year 𝑑 > 1 is: 17 𝑁𝑖,𝑗,𝑑 = 𝑆𝑖,𝑗,π‘‘βˆ’1 + 𝐺𝑖,𝑗,π‘‘βˆ’1 . I model the number of surviving individuals between 𝑑 βˆ’ 1 and 𝑑 using a binomial distribution: 𝑆𝑖,𝑗,π‘‘βˆ’1 ~ π΅π‘–π‘›π‘œπ‘šπ‘–π‘Žπ‘™(𝑁𝑖,𝑗,π‘‘βˆ’1 , πœ”π‘–,𝑗,π‘‘βˆ’1 ), where πœ”π‘–,𝑗,π‘‘βˆ’1 is the apparent survival probability of each individual of species 𝑖 at site 𝑗 between 𝑑 βˆ’ 1 and 𝑑. Apparent survival is the product of true survival and site fidelity (i.e., the inverse of permanent emigration). I model the number of individuals of species 𝑖 gained into the population at site 𝑗 with a Poisson distribution: 𝐺𝑖,𝑗,π‘‘βˆ’1 ~ π‘ƒπ‘œπ‘–π‘ π‘ π‘œπ‘›(𝛾𝑖,𝑗,π‘‘βˆ’1 ), where 𝛾𝑖,𝑗,π‘‘βˆ’1 is the expected number of individuals gained at each site from reproduction and immigration. Provided there is sufficient available data, variation in apparent survival probability (πœ”π‘–,𝑗,π‘‘βˆ’1) and the expected number of individuals gained to a site (𝛾𝑖,𝑗,π‘‘βˆ’1 ) can be modeled with covariates that change by site and/or year using a logit-link function and a log-link function, respectively. Community component To link the species models at a community level, I assume that the species-level parameters (i.e., intercept and effect parameters on either logit- or log-link scales) in both the observation and biological process models are random effects drawn from a parameter-specific, community-level distribution shared across all species (Dorazio & Royle 2005, Dorazio et al. 2006, Zipkin et al. 2009, Zipkin et al. 2010). For example, the intercept parameter for initial species abundance, 𝛽0,𝑖 , is assumed to come from a normal distribution: 𝛽0,𝑖 ~ π‘π‘œπ‘Ÿπ‘šπ‘Žπ‘™(πœ‡π›½0 , πœŽπ›½20 ), with a hyper-mean πœ‡π›½0 (i.e., representing average expected abundance across all species in the community) and hyper-variance πœŽπ›½20 (i.e., representing the variation in initial expected abundance 18 across species). The random effects structure allows for information sharing across species within the community, improving parameter identifiability for species with low levels of data and increasing parameter precision for most other species (KΓ©ry & Royle 2009, Zipkin et al. 2009). In addition to estimating species-level biological and observational process parameters, the hierarchical structure of the model also produces estimates of community-level mean (e.g., πœ‡π›½0 ) and variance (e.g., πœŽπ›½20 ), which provide useful metrics for summarizing community dynamics. Application Study area and data collection My study encompasses a network of six national parks (Udzungwa Mountains National Park [UDZ], Tanzania; Volcanoes National Park [VNP] Rwanda; Bwindi Impenetrable Forest [BIF], Uganda; Nouabale-Ndoki National Park [NNNP], Republic of Congo; Korup National Park [KRP], Cameroon; and Nyungwe Forest National Park [NFNP], Rwanda) in the equatorial region of Central and East Africa between 2009 and 2019 (Table 3.1). Equatorial Africa has a tropical climate (i.e., wet and humid) and pronounced wet and dry seasons with annual rainfall in parks ranging from 1500 to over 5000 mm (Kingdon 2015, O’Brien et al. 2020). Vegetation in the region is determined by rainfall and elevation with the predominant habitats being lowland and montane rainforest that extend into forest mosaics further from the equator (Kingdon 2015). Elevation varies across parks with the lowest elevation near sea-level in KRP (92 - 463 m) and the highest elevation in VNP (2509 - 3884 m; O’Brien et al. 2020). Parks range in area from 342 - 4000 km2 and are unfenced with varying human densities (0.5 - 386 humans/km2) along the edges. 19 I focused on a metacommunity of forest dwelling antelopes across the network of parks (Table 3.1; Johnston & Anthony 2012). I included 12 closely-related species in my analysis: suni (Nesotragus moschatus), bushbuck (Tragelaphus scriptus), sitatunga (Tragelaphus spekii), and 9 species of duikers (Subfamily Cephalophinae; listed in Table 3.1). Predominately a browsing community, diet items for these species include fruit, flowers, and foliage with some species also eating grass (Kingdon 2015). This antelope metacommunity is geographically distributed across Sub-Saharan Africa, living in multiple forest types from lowland to alpine (Kingdon 2015), with each species range limited to only a subset of the parks in my study (Table 3.1). Forest fragmentation and resulting refugia during Pleistocene climate change is hypothesized to be responsible for the modern distribution of this community (Johnston & Anthony 2012). In the Anthropocene, humans are likely having the largest influence over the distribution and abundance patterns of this community. Antelopes are facing pervasive anthropogenic pressures including deforestation and poaching (Newing 2001), and the health, productivity, and persistence of their tropical rainforest habitat is threatened by climate change (Phillips et al. 2009, Sullivan et al. 2020). Recent assessments of this antelope community conflict on species estimated stability (O’Brien et al. 2020), and minimal information on species abundance and demographic rates have prevented conclusive inference on the current status and future trajectory of this vulnerable community. To estimate community wide abundance and demographic rates, I collated data from the Tropical Ecology Assessment and Monitoring (TEAM) network. The TEAM network was developed for monitoring tropical species worldwide using a standardized camera trapping protocol (see TEAM Network 2011 a, b for detailed protocols). Camera traps were deployed in each park, and images were taken whenever an animal triggered a camera’s motion sensor. Each 20 park contained camera traps across 60 sites (except NFNP, which had 97 sites) that were sampled once per year for 30 consecutive days. The available years of data varied for parks, ranging from three for VNP to 11 for UDZ (Table 3.1). In post-processing of images, individual species were identified and aggregated into detection-nondetection histories based on 5-day sampling periods (replicates) for a maximum of six replicates per year. In cases where camera traps malfunctioned, we used the amount of time that the camera was functional as a covariate within my observation model to account for variation in detection due to decreased effort. Species were only evaluated at parks within their range; thus, I estimated the abundance and dynamics of 23 populations (i.e., species-park combinations; Table 3.1). Data analysis I described the detection probability of individuals, πœƒπ‘–,𝑗,π‘Ÿ,π‘˜,𝑑 , of each species 𝑖 at site 𝑗 in park π‘Ÿ during replicate π‘˜ in year 𝑑 using a logit-link: π‘™π‘œπ‘”π‘–π‘‘(πœƒπ‘–,𝑗,π‘Ÿ,π‘˜,𝑑 ) = 𝛼0,𝑖 + 𝛼1 βˆ™ π‘‘π‘Žπ‘¦π‘ π‘—,π‘Ÿ,π‘˜,𝑑 , where 𝛼0,𝑖 is the species-specific intercept or detection probability of an individual during a sampling replicate when cameras were functional for the average amount of time (4.5 out of 5 days). I added the covariate π‘‘π‘Žπ‘¦π‘ π‘—,π‘Ÿ,π‘˜,𝑑 (standardized to a mean of 0 and standard deviation of 1) to incorporate variation in the amount of time that a camera was functional at site 𝑗 at park π‘Ÿ during replicate π‘˜ in year 𝑑, an effect that did not vary by species. For the biological process model, I used a log-link function to model species’ initial abundances across sites within parks: π‘™π‘œπ‘”(πœ†π‘–,π‘Ÿ ) = πœ†0,𝑖 + πœ€πœ†,π‘Ÿ . 21 The intercept, πœ†0,𝑖 , varies by species to account for differences in baseline abundance and a park- level random effect (πœ€πœ†,π‘Ÿ ) captures variation between parks. I similarly modeled species annual survival probabilities, πœ”π‘–,π‘Ÿ , as: π‘™π‘œπ‘”π‘–π‘‘(πœ”π‘–,π‘Ÿ ) = πœ”0,𝑖 + πœ€πœ”,π‘Ÿ , where πœ”0,𝑖 is the baseline species-specific survival probability and πœ€πœ”,π‘Ÿ is a park-level random effect. During model development, I attempted to include environmental covariates in the models of initial abundance and the demographic parameters but large variations in covariate values between parks prevented meaningful inference and led to overly complex statistical structures (Appendix A). Further, many of the site-level (e.g., elevation, temperature, distance to edge) and park-level covariates were co-linear such that it was difficult to determine the important factors influencing species. As such, I settled on including random effects in my estimates of initial abundance and survival to account for species-specific and community-level comparisons between parks without subscribing improper mechanism to estimated differences. I modeled the expected number of individuals gained to a site, 𝛾𝑖 as: π‘™π‘œπ‘”(𝛾𝑖 ) = 𝛾0,𝑖 , using only a species-specific intercept, 𝛾0,𝑖 , as I did not expect residual variation in gains at the park-level. I explored adding a park-level random effect but removed it due to poor convergence and lack of biological support. There is no evidence for differences in birth rates or sex ratios between parks, and immigration into sites is largely dictated by species’ territorial behavior (such that there is high site fidelity across all species, Kingdon 2015), which I assumed did not vary by park. To link the species models, I drew each species-specific parameter (𝛼0,𝑖 , πœ†0,𝑖 , πœ”0,𝑖 , 𝛾0,𝑖 ) from separate community-level normal distributions with corresponding hyper-means and hyper- 22 variances. I used the mean community-level estimate for apparent annual survival across all parks (πœ‡πœ”0 ) in combination with the park-level random effects on survival (πœ€πœ”,π‘Ÿ ) to derive community-level apparent annual survival for each park (i.e., πœ‡πœ”0 + πœ€πœ”,π‘Ÿ ). I also derived an index of annual population-level (i.e., species-park combination) abundance by summing across π½π‘Ÿ,𝑑 βˆ‘π‘—=1 𝑁𝑖,𝑗,π‘Ÿ,𝑑 sites surveyed in a year. I report average abundance per site, 𝑁 ̂𝑖,π‘Ÿ,𝑑 = , rather than total π½π‘Ÿ,𝑑 abundance within a park to account for variations in sampling effort (i.e., number of sites surveyed per year, π½π‘Ÿ,𝑑 ; Table 3.1). I calculated annual population growth rates for each population as: 𝑁̂𝑖,π‘Ÿ,𝑑 ⁄𝑁 ̂𝑖,π‘Ÿ,π‘‘βˆ’1 , and summarized across years by taking the geometric mean. I estimated parameters using a Bayesian framework via Nimble and R (de Valpine et al. 2017, R Core Team 2020 Version 4.0.2). All hyper-parameters and other fixed effect parameters were given vague priors (Appendix B). I ran 5 MCMC chains each for 100,000 iterations with a burn-in of 75,000 and thinning of 25 providing 5,000 samples from the posterior distribution for each parameter. I assessed convergence using the Gelman Rubin diagnostic (Rhat < 1.2) in addition to visually examining the chains. Bayesian population viability analysis I projected the population of each species to 2030 using the estimated parameter values from my multi-species dynamic N-occupancy model and calculated quasi-extinction probabilities for each population using a BPVA. I selected 2030 as the target year as this projection timeframe captures the next window for the Convention on Biological Diversity’s updated Strategic Plan for Biodiversity (2021 - 2030; CBD 2020). I used a 50% decline in population abundance from 2020 to 2030 as the threshold for quasi-extinction as this is the criteria for International Union for Conservation of Nature’s endangered species status (IUCN 2021): π‘‡β„Žπ‘Ÿπ‘’π‘ β„Žπ‘œπ‘™π‘‘ = 𝑁 ̂𝑖,π‘Ÿ,𝑑=2020 βˆ™ 23 0.5 where 𝑁 ̂𝑖,π‘Ÿ,𝑑=2020 is the posterior mean of park-level abundance in 2020. I calculate the quasi- extinction probability as the proportion of MCMC iterations where the population (i.e., species- park combination) abundance in 2030 was below the threshold: 𝑄𝑆𝐸𝑖,π‘Ÿ = π‘ƒπ‘Ÿ(𝑁 ̂𝑖,π‘Ÿ,𝑑=2030 < π‘‡β„Žπ‘Ÿπ‘’π‘ β„Žπ‘œπ‘™π‘‘) in which I defined extinction to be inevitable with probability > 0.95 (i.e., no overlap of the 𝑁 ̂𝑖,π‘Ÿ,𝑑=2030 95% CI with the threshold). I report population projections and quasi- extinction probabilities only for populations with at least five years of data as estimates with a smaller temporal extent may be unreliable (Zipkin et al. 2014). I calculated the population projection estimates for the BPVA from a separate model run than the retrospective analysis (Appendix B), again using both Nimble and R (de Valpine et al. 2017, R Core Team 2020 Version 4.0.2) with identical priors and MCMC settings. Results Population abundance and growth While estimates of population abundance and growth reveal that most of the antelope community was fairly stable over the study period (Figure 3.1, Appendix C Table C.1, C.2), four of the 23 populations (~17%) declined in abundance over the 11-year time frame (95% credible interval [CI] for growth rates were less than one). The four populations that had negative growth rates over the study period are: C. callipygus, C. dorsalis, P. monticola, C. silvicultor, all within the Nouabale-Ndoki National Park (NNNP). Eleven populations had stable population growth rates (95% CI for growth rates overlapped one) and eight populations increased in abundance (i.e., 95% CI above one). Demographic rates Though mean community-level annual apparent survival (πœ‡πœ”0 ; Figure 3.2A) across parks was estimated as 0.72 (CI 0.28, 0.96), there was large variation across species (πœŽπœ”0 = 0.85 [CI 24 0.48, 1.52], logit-scale, Appendix C Table C3) and parks (πœŽπœ€π‘Ÿ,πœ” = 2.72 [CI 1.14, 6.50], logit scale, Appendix C Table C.4). The larger variation in survival across parks than across species (i.e., πœŽπœ€π‘Ÿ,πœ” > πœŽπœ”0 ) suggests that environmental and/or anthropogenic factors at the park-level may be contributing more to annual survival than species life history processes or species-specific responses to particular environmental factors within a park. Park-level estimates of survival (Figure 3.2A) for VNP (0.99 [CI 0.97, 0.99]), BIF (0.81 [CI 0.69, 0.91]), NFNP (0.77 [CI 0.56, 0.92]), and UDZ (0.77 [CI 0.56, 0.89]) were higher than the community-level average, likely attributable to active and effective management enforcement (Oberosler et al 2020a, 2020b, O’Brien et al. 2020). Despite estimates of stable population growth rates of its three duiker populations (Figure 3.1), low survival within KRP (0.31 [CI 0.11, 0.52]; Figure 3.2A) may be attributable to hunting pressure as local extinction of other antelope species has likely already occurred there (Viquerat et al. 2012, O’Brien et al. 2020). However, C. silvicultor is at the edge of its geographic range in KRP (Kingdon 2015), which may also be contributing to its low survival in that park (0.54 [CI 0.27, 0.72]). Across species, C. dorsalis had the lowest average annual apparent survival probability (0.56 [CI 0.12, 0.92]) while C. silvicultor had the highest (0.90 [CI 0.51, 0.99]; Figure 3.2A). The mean number of individuals gained per species annually at sites across parks (πœ‡π›Ύ0 ) was 0.24 (CI 0.10 - 0.56) with P. monticola having the highest estimated site-level recruitment (i.e., sum of immigration and fecundity; (1.48 [CI 1.11 - 1.87]) and T. spekii having the lowest (0.02 [CI 0.01 - 0.05]) (Figure 3.2B). Population viability None of the 18 populations with sufficient time-series for projections (i.e., four of six parks had at least five years of data) reached a quasi-extinction threshold severe enough to trigger IUCN endangered species status (i.e., 50% loss with nonoverlapping 95% CI), with all 25 populations having quasi-extinction probabilities of less than 0.25 (Figure 3.1; Appendix C Table C.5). Of the four populations within NNNP that were estimated to be declining, C. silvicultor and T. spekii had the highest quasi-extinction probabilities at 0.23 and 0.24, respectively (Appendix C Table C.5). While all local populations were at a low risk of extirpation over the next decade (assuming that current conditions hold), several populations in NNNP and BIF were projected to decline from 2021 to 2030 (bottom panels of Figure 3.3), suggesting that close monitoring is warranted and that management interventions may be needed to ensure persistence in the longer term. Discussion Achieving conservation targets for biodiversity requires quantifiable measures of the status, trends, and dynamics of populations at both species and community levels (Nicholson & Possingham 2006). Using an extensive continental-scale camera trapping network in Central and East Africa, my multi-species dynamic N-occupancy model estimated (1) stable population growth of the antelope community across national parks during my study period (2009-2019), (2) varying abundance and vital rates across community members and between populations of the same species, (3) differences in annual apparent survival of community members between national parks, and (4) low probabilities of quasi-extinction for the majority of populations over the next decade. These results resemble patterns seen in protected areas across equatorial Africa where effective conservation of species communities depend on the enforcement of regulations to prevent or mitigate disturbance (Farr et al. 2019, Oberosler et al 2020a, 2020b, O’Brien et al. 2020). Here, I demonstrate the upper potential of detection-nondetection data to assess communities of species by developing an analytical framework to estimate population abundance and demographic rates of each community member within the parks they inhabit. The potential 26 of my modeling framework to support biodiversity conservation is underpinned by detection- nondetection being the most ubiquitous data type for monitoring multiple species and the ability to quantify demographic rates and extinction risk for multiple species simultaneously. Contrary to the perception that unmarked data provides limited information relative to marked data, I was able to estimate community-wide population abundance and demographic rates by combining multi-species models (Dorazio & Royle 2005, Dorazio et al. 2006) with the single-species dynamic N-occupancy (Rossman et al. 2016). While there is great potential for this approach, multiple limitations may restrict my model’s application to certain data scenarios and species. Dynamic unmarked models need at least 3-5 years (i.e., time periods) of data for parameters to be identifiable but simulation results reveal that these models typically do not perform well with less than 5-10 years of sampling (Dail & Madsen 2011, Zipkin et al. 2014). Precision of estimates also depend on the number of sites sampled where too few survey locations can even lead to inaccuracies (Rossman et al. 2016). In my application of the multi- species dynamic N-occupancy model, limited time series for certain populations prevented estimation of site-level variation using covariates or multi-scale processes across parks (Appendix A). Additionally, I was unable to propagate environmental stochasticity into my BPVA (such as in Saunders et al. 2018a) because random temporal effects were unidentifiable, likely leading to underestimation of uncertainty in my future projections. For some species life histories, the basic structure of the multi-species dynamic N-occupancy model may not be feasible. For example, highly mobile and non-territorial species may violate the geographic closure assumption (i.e., no immigration or emigration out of the site during replicate visits). Other assumptions related to demographic closure (i.e., no births or deaths between replicate visits within a year), independence of sites, independence of individual detections, and species 27 identification may lead to the necessary exclusion of some taxonomic groups (Royle & Nichols 2003, Devarajan et al. 2020). Difficulty with parameter estimation may also occur for species that are very rarely detected. Though the multi-species framework theoretically allows for estimation of rare and elusive species (Zipkin et al. 2009), few observations of a species can lead to imprecise or unidentifiable estimates of species-specific parameters, especially in the context of a model that aims to estimate demography as well as abundance. My case study provided an initial demonstration of the multi-species dynamic N-occupancy model and is only a prelude to this framework’s full potential for conservation biology. Further applications of this approach can help explain variation in species statuses and trends by linking environmental drivers (covariates) to demographic rates. Such approaches can help elucidate the mechanistic reasons behind biodiversity loss and species declines by partitioning the effects of specific environmental factors on species’ survival and/or recruitment. Coupled with a Bayesian population viability analysis, extinction risk can be linked to future environmental scenarios related to projected climate changes and management interventions (Saunders et al. 2018a, Saunders et al. 2021). Evaluating extinction risks at community levels can provide a useful metric for potential biodiversity loss while explicitly incorporating uncertainty. Quantifying extinction uncertainty across species will improve the utility of global biodiversity indices for setting conservation targets (Watermeyer et al. 2020). The comparatively lower cost of collecting unmarked data, such as detection-nondetection data, has made these data types globally available for conservation assessments. Thus, models that can use such data to estimate not only metrics of population trends, but also demographic rates, are exceptionally valuable to inform global conservation priorities and ultimately, the effectiveness of conservation interventions. 28 Table 3.1 National parks and basic data information used in my multi-species dynamic N-occupancy analysis of the antelope community including the number of years sampled, species observed, number of sampling sites, and the number of recorded presences across species. Species are numerically identified as follows: (1) Cephalophus callipygus (2) Cephalophus dorsalis (3) Cephalophus harveyi (4) Cephalophus leucogaster (5) Philantomba monticola (6) Nesotragus moschatus (7) Cephalophus nigrifrons (8) Cephalophus ogilbyi (9) Tragelaphus scriptus (10) Cephalophus spadix (11) Tragelaphus spekii (12) Cephalophus silvicultor. Park Country Years Species Sites Presences Korup NP 2011 - 2015 5,8,12 Cameroon 60 1100 (KRP) (𝑛 = 5) (𝑛 = 3) Nouabale-Ndoki NP Republic of 2010 - 2016 1,2,4,5,7,11,12 60 5502 (NNNP) Congo (𝑛 = 7) (𝑛 = 7) Udzungwa NP 2009 - 2019 3,6,9,10 Tanzania 60 2970 (UDZ) (𝑛 = 11) (𝑛 = 4) Bwindi NP 2010 - 2017 7,9,11,12 Uganda 60 1551 (BIF) (𝑛 = 8) (𝑛 = 4) Nyungwe Forest NP 2014 - 2017 7,9,12 Rwanda 97 284 (NFNP) (𝑛 = 4) (𝑛 = 3) Volcanoes NP 2014 - 2016 7,9 Rwanda 60 1432 (VNP) (𝑛 = 3) (𝑛 = 2) 29 Figure 3.1 Population density (i.e., abundance per site) of each species at each park across all years that a park was surveyed. Points show the mean abundance while the shaded area is the 95% credible interval around the mean. Estimates are only shown for parks and years during which sampling occurred and for species that were observed in each park. Each panel shows the trends for a specific park (labeled with the parks’ abbreviations) and species are each represented with a unique color. 30 Figure 3.2 (A) Annual apparent survival probabilities for each species in the community across the network of parks (πœ”0,𝑖 ), mean survival probability at the community-level (πœ‡πœ”0 ; black coloring), and mean park-level survival probabilities for the entire community of species that reside in the park (πœ‡πœ”0 + πœ€πœ”,π‘Ÿ ; unique coloring). (B) Species-specific and community-level annual number of individuals gained (𝛾0,𝑖 , πœ‡π›Ύ0 ). (A,B) Boxplots show the mean, 50% credible intervals, and 95% credible intervals. The dashed-line separates the species-specific estimates from the community- and park-level estimates. 31 Figure 3.3 Retrospective population density (i.e., abundance per site during study period) and projected population density (up to 2030) for each species within the four parks that were sampled for at least five years. Lines show mean estimated (solid) and projected (dashed) density while the shaded areas depict the 95% credible intervals. Estimates are shown from the initial sampling year for a given park until 2030 (with missing years interpolated for parks when data were unavailable). Each panel shows a different park, labeled with the parks’ abbreviations. Species are depicted with a unique color. 32 Figure 3.3 (cont’d) 33 CHAPTER 4: Overcoming data gaps using integrated modeling to estimate population abundance of a migratory species Abstract Effects of environmental or anthropogenic factors on migratory populations can carry over from one season and/or spatial location to another. However, identifying the mechanistic processes driving cross-seasonal effects on migratory species is difficult due to extensive spatiotemporal gaps in monitoring data. The continuous collection of opportunistic data by volunteer-based networks provides a solution to addressing these data gaps within migratory cycles. To estimate population abundance at broad spatiotemporal scales, I develop an integrated model that leverages unstructured data during seasons and spatial locations when structured monitoring is absent. I validate my approach through simulation and then apply the framework to estimate the population density of monarch butterflies throughout their spring migratory pathway. While climate conditions during the spring breeding seasons have been identified as a key driver of annual monarch population dynamics, this portion of the annual migration lacks structured survey data. Thus, I use my integrated model to combine all available data sources and reveal that vegetation and temperature conditions in eastern Texas affected the density of monarchs during their spring migration between 2016 and 2018. Not only do these results corroborate the hypothesized importance of spring migration conditions for monarchs, but they reveal a dynamic relationship between spring conditions and their effect on fine-scale population density. More broadly, this framework highlights the power of data integration for overcoming spatiotemporal gaps during the annual cycle of migratory species. Introduction 34 During the annual cycle of a species, their population may occupy only a subset of their range, which may vary across seasons (e.g., breeding, nonbreeding, migrating). Many threats to species (e.g., climate change, habitat loss) occur across broad spatiotemporal scales, but cross- seasonal connections or carry-over effects, in which the cause of an observed effect at one season is linked to another (Norris 2005, Harrison et al. 2011), can complicate species conservation (Martin et al. 2007). Migratory species are among the most threatened species groups because of the challenges in enacting conservation at a given place and time when cross-seasonal connections regularly manifest between breeding grounds, nonbreeding grounds, and/or migratory pathways (Webster et al. 2002, Saunders et al. 2021, Zylstra et al. 2021). Parsing cross-seasonal connections into mechanistic processes requires data that span the spatiotemporal extent of a species annual cycle; however, structured data, which come from standardized monitoring programs, are often limited to spatiotemporal subsets of the migratory cycle (e.g., breeding grounds, non-breeding grounds) because of logistical constraints and elusive or cryptic behaviors that hamper data collection (Marra et al. 2015, Rushing et al. 2016). The continuous collection of unstructured data via volunteer-based networks (e.g., iNaturalist, eBird, GBIF, Map of Life) provides a means to fill in gaps from traditional data collection methods (Jetz et al 2012, Chandler et al. 2017). However, biases in unstructured data can impede the ability to develop meaningful inferences due to the lack of survey design (Dorazio 2014, Fithian et al. 2015). Integrated modeling, in which multiple data sources and types are analyzed in a single, unified framework, can be a useful tool to address these issues and model dynamics of migratory species (Pacifici et al. 2017, Zipkin & Saunders 2018, Fletcher et al. 2019, Miller et al. 2019, Isaac et al. 2020). This framework can overcome the limitations of structured (i.e., limited extent 35 and quantity) and unstructured (i.e., limited information content) data types to estimate abundance (Dorazio 2014, Farr et al. 2021). Integrated modeling has great potential for modeling migratory species abundance across their full annual cycle (Hostetler et al. 2015). Beyond estimating abundance, this framework can resolve discrepancies in estimated population trends from separate monitoring programs by linking seasonal dynamics within a species annual cycle (Saunders et al. 2019a) or be used to estimate β€˜missing’ parameters that would be inestimable from independent analysis of each data source (Saunders et al. 2018a, Zipkin & Saunders 2018). Integrated models have also been used to evaluate the effects of environmental factors hypothesized to drive population trends across a species full annual cycle (Rushing et al. 2017, Rushing et al. 2021, Zylstra et al. 2021). Here, I develop an integrated model for a migratory species when both spatial and temporal data gaps exist within the annual cycle. I develop a point process model to describe spatiotemporal variation in abundance and use single-visit count data to identify baseline abundance. I then include spatiotemporal covariates and integrate opportunistic presence-only data to extend the analysis and estimate abundance during temporal portions of the annual cycle when single-visit count data are unavailable. I demonstrate the utility of this framework using a simulation study where abundance is estimated across space and time, despite limited structured sampling. I then apply our modeling framework to a case study of monarch butterflies, in which data integration over a species annual cycle may be the only means to address a pressing ecological question. The monarch butterfly has declined across its eastern North American range since the mid-1990s (Brower et al. 2012, Agrawal & Inamine 2018), and unfavorable climatic conditions are projected to further limit monarch population abundance (Zylstra et al. 2021). The eastern 36 North American population of monarchs has a unique, multi-generational migratory cycle. At the beginning of the year (January - February), adult monarchs that emerged during the previous year are residing in their overwintering grounds in central Mexico. In spring (March - April), monarchs migrate north to eastern Texas and Oklahoma where they locate milkweed host plants to deposit eggs. Individuals that successfully metamorphose become the first generation of the year and continue the migration north to the summer breeding grounds across the Midwestern U.S. and Ontario, Canada. Two to three more generations are produced during the summer (May - August) until the final generation of the year enters reproductive diapause and migrates south in the fall (September - October), back to the same overwintering grounds in central Mexico (arriving by mid-December). Multiple hypotheses have been proposed to explain the decline of monarchs in eastern North America (Thogmartin et al. 2017, Agrawal & Inamine 2018). Recent retrospective analyses demonstrate the importance of weather conditions during the spring breeding season and migratory pathway (Zipkin et al. 2012, Saunders et al. 2016, 2018b, Zylstra et al. 2021), highlighting a critical spatiotemporal window where population growth could either be inhibited or bolstered. Spring temperature and precipitation affect the recruitment of monarchs by impacting development and host plant resources (Zalucki 1982). Despite the importance of the spring migratory pathway for monarchs, structured survey data are only available in winter (non- breeding) and summer breeding portions of their full annual cycle. I use my spatiotemporal integrated model to estimate monarch abundance and associated effects of environmental conditions during their spring migration, allowing for inference during an understudied stage of their migratory cycle. The Data 37 Structured data are those resulting from surveys that were designed with the parameter of interest (i.e., abundance) and target species in mind, while unstructured data are collected without a planned survey design or monitoring protocol. Structured data typically have high information content as they minimize sampling biases and are collected in ways that allow for the explicit estimation of observation error. However, significant logistical constraints often limit the quantity of structured data that can be collected. Without a planned design, unstructured data can be obtained more easily at large scales and thus tend to be of higher volume. However, unstructured data contain sampling biases and involve data collection methods that make it difficult to account for observation errors in analyses, leading to lower information content than structured data. Many survey designs (i.e., protocols to sample space and time) exist for structured count data. For simplicity, I consider single-visit surveys where the numbers of individuals are counted once within multiple spatial units (i.e., sites). Without replicate samples, single-visit surveys cannot account for imperfect detection leading to estimates of relative rather than absolute abundance (Knape & Korner-Nievergelt 2016). Presence-only data are the outcome of opportunistic, unstructured sampling where presences of the target species are recorded; however, the absence of the species at a given location is not recorded. Without this information, presence-only data cannot be used to estimate abundance without integration of structured data (Dorazio 2014, Farr et al. 2021). Presence-only data are also subject to sampling biases including nonrandom sampling and imperfect detection. The Model The purpose of our integrated model is to estimate population abundance during one or more seasons (henceforth notated as stage) of the annual migratory cycle when structured data 38 are limited. To estimate abundance, I use a point process to describe the distribution of individuals within a spatial domain 𝑆𝑑 (e.g., breeding range, non-breeding range, migratory pathways) occupied by the species during each stage 𝑑 = 1, … , 𝑇 within its annual cycle. As the spatial domain of the population may change over its annual cycle, I allow 𝑆𝑑 to vary at each stage. Point process models estimate the density of individuals at stage 𝑑 and point location 𝑠 (𝑠 ∈ 𝑆𝑑 ) using an intensity (i.e., number of points per unit area) function, πœ†π‘‘ (𝑠), where 𝑑 is discrete and 𝑠 is continuous. Population abundance at a given stage, 𝑁𝑑 , is described as a random variable arising from a Poisson process, 𝑁𝑑 ~π‘ƒπ‘œπ‘–π‘ (Ξ› t ). Expected abundance, Ξ› t , is calculated as the integral over the continuous domain 𝑆𝑑 : Ξ› 𝑑 = βˆ«π‘† πœ†π‘‘ (𝑠)𝑑𝑠. 𝑑 Spatial heterogeneity in population density To identify an estimate of abundance, I begin by modeling population density during a stage of the annual cycle where both structured and unstructured data exist; I refer to this as baseline population density and assume for simplicity, that this occurs at 𝑑 = 1. As population density varies over 𝑆𝑑 due to environmental or anthropogenic factors, I can model baseline population density, πœ†π‘‘=1 (𝑠), with spatial covariates using an inhomogeneous point process, specifically a log-Gaussian Cox process (MΓΈller et al. 1998): log(πœ†π‘‘=1 (𝑠)) = log(πœ†0 ) + 𝛽 β€² 𝑋𝑑=1 (𝑠) + πœ”(𝑠), eqn 1 where 𝑋𝑑=1 (𝑠) is a vector of covariates at location 𝑠 during the baseline stage, 𝛽 β€² is the vector of corresponding coefficients, and πœ†0 is the baseline population density when covariates 𝑋 are at their mean values (standardized to mean of zero and standard deviation of 1). Both πœ†π‘‘=1 (𝑠) and πœ†0 are on the log scale as their support ranges from 0 to ∞ while the support of covariates are ℝ. Spatial heterogeneity that is unaccounted for by covariates may be captured with a spatial random effect, πœ”(𝑠). A Gaussian random field (GRF) is used to describe πœ”(𝑠) within the log- 39 Gaussian Cox process (MΓΈller et al. 1998). The GRF is described using a multivariate normal distribution, πœ”~𝑀𝑉𝑁(0, πœπ‘„) and implemented using a stochastic partial differential equation approximation via a triangulated spatial mesh (which may vary by stage) of π‘˜ = 1, … , 𝐾𝑑 nodes across 𝑆𝑑 (Simpson et al. 2016, Krainski et al. 2018). Here, 𝜏 is the precision hyperparameter, and 𝑄 is a matrix that describes the correlation structure across 𝑆𝑑 . I specify correlations between locations in 𝑆𝑑 using a MatΓ©rn correlation function with scale parameter πœ… and fixed/constant smoothness parameter 𝑣 (Simpson et al. 2016, Krainski et al. 2018), which indicates that correlation among πœ†π‘‘ (𝑠) values is driven by spatial proximity, with locations near one another having similar intensities. Temporal change in population density During subsequent stages of the annual cycle, 𝑑 = 2, … , 𝑇, population density will change because of demographic processes (i.e., mortality, recruitment, immigration, emigration) that are subject to spatiotemporally varying external factors. I extend eqn 1 to implicitly account for population change: log(πœ†π‘‘ (𝑠)) = log(πœ†0 ) + 𝛽 β€² 𝑋𝑑 (𝑠) + 𝛿𝑑 + πœ”(𝑠). eqn 2 Here 𝑋𝑑 (𝑠) is a vector of covariates for each subsequent stage. Population change not captured by fluctuations in covariate values is captured by 𝛿𝑑 , which is the change (i.e., combined gains and losses) in mean population density between stages across 𝑆𝑑 . Structured count data I assume counts, 𝐢𝑑𝑗 , occur at set of sites (i.e., spatial locations) 𝑗 = 1, … , 𝐽 with an associated area 𝐷𝑗 where 𝐷1,…,𝐽 βŠ‚ 𝑆𝑑 . Count data are also described with a log-Gaussian Cox process: 𝐢𝑑𝑗 ~π‘ƒπ‘œπ‘–π‘  (∫𝐷 πœ†π‘‘ (𝑠)𝑑𝑠). Count data and most structured data types cannot be used as 𝑗 direct measures of true latent abundance as there are naturally errors within the data collection 40 process. When counts are replicated, an observation process model can be included to account for imperfect detection (Royle 2004). However, for simplicity and similar to many structured monitoring programs (e.g., Breeding Bird Survey, Christmas Bird Count), I do not explicitly account for observation error, and use a single-visit count model which assumes constant detection probability over space and time to estimate relative abundance. As many count survey designs aggregate individual observations within a site (i.e., they do not include point location, 𝑠𝑖 , for each individual 𝑖), a change-of-support problem must be addressed (Pacifici et al. 2019). Given that the expected sum of Poisson random variables is equivalent to the sum of their means, I specify an area offset and approximate the integral over each site 𝑗 as: 𝐷𝑗 πœ†Μ…π‘‘π‘— β‰ˆ ∫𝐷 πœ†π‘‘ (𝑠)𝑑𝑠. 𝑗 This assumes a homogenous process within each site (i.e., πœ†Μ…π‘‘π‘— is the mean population density for 𝐷𝑗 ) and a linear relationship between area and intensity (i.e., no density dependence). Counts at each stage 𝑑 and site 𝑗 can be modeled as: 𝐢𝑑𝑗 ~π‘ƒπ‘œπ‘–π‘ (𝐷𝑗 πœ†Μ…π‘‘π‘— ), where log(πœ†Μ…π‘‘π‘— ) = log(πœ†0 ) + 𝛽 β€² 𝑋𝑑𝑗 + 𝛿𝑑 + πœ”π‘— . Covariates, 𝑋𝑑𝑗 , are summarized across each 𝐷𝑗 , and a projection matrix, π΄π‘˜π‘— , is needed to interpolate the random effects, πœ”π‘— , from each node π‘˜ in the spatial mesh to each site 𝑗 (Krainski et al. 2018). The log-likelihood, ℓ𝐢 (πœ†0 , 𝛽, 𝛿𝑑 , πœ”), of the count data follows the classical Poisson log-likelihood. Unstructured presence-only data Presence-only data resulting from opportunistic sampling are modeled using a thinned log-Gaussian Cox process, π‘Œπ‘‘ ~π‘ƒπ‘œπ‘–π‘  (βˆ«π‘† πœ†π‘‘ (𝑠)𝑝𝑑 (𝑠)𝑑𝑠), where π‘Œπ‘‘ is the number of presence-only 𝑑 observations for stage 𝑑 within spatial domain 𝑆𝑑 (Dorazio 2014, Farr et al. 2021). The thinning function, 𝑝𝑑 (𝑠), represents the observation process that relates true population density at location 𝑠, πœ†π‘‘ (𝑠), to the presence-only data (i.e., the number of detections at location 𝑠). Imperfect 41 detection and sampling bias can be modeled within a linear predictor of 𝑝𝑑 (𝑠) using a logit-link function, intercept (𝑝0 ), covariates (π‘Šπ‘‘ (𝑠)) and their effects (𝛼) that capture each of these observation processes (Dorazio 2014, Farr et al. 2021). The log-likelihood function for presence-only data is: Ξ»0 βˆ™exp(𝛽 β€² 𝑋𝑑 (𝑠)+𝛿𝑑 +πœ”(𝑠)+𝑝0 +𝛼′ π‘Šπ‘‘ (𝑠)) π‘Œ ℓ𝑃𝑂 (πœ†0 , 𝛽, 𝛿𝑑 , πœ”, 𝑝0 , 𝛼) = βˆ‘π‘‡π‘‘=1 (βˆ’ βˆ«π‘† 𝑑 𝑑𝑠 + βˆ‘π‘–=1 (log(πœ†0 ) + 𝑑 exp(𝑝0 +𝛼′ π‘Šπ‘‘ (𝑠))+1 𝛿𝑑 + 𝛽 β€² 𝑋𝑑 (𝑠𝑖 ) + πœ”(𝑠𝑖 ) + 𝑝0 + 𝛼 β€² π‘Šπ‘‘ (𝑠𝑖 ) βˆ’ log(exp(𝑝0 + 𝛼 β€² π‘Šπ‘‘ (𝑠𝑖 )) + 1))). eqn 3 Within a single stage 𝑑, the log-likelihood can be decomposed in two parts. The first is the integral across domain 𝑆𝑑 for the thinned process. To approximate the integral over 𝑆𝑑 , the thinned process is estimated at each of the mesh nodes π‘˜ specified above, and a weighted sum is calculated where the nodes are provided a spatial (i.e., area) weight π‘€π‘˜ : βˆ«π‘†π‘‘ πœ†(𝑠)𝑝(𝑠)𝑑𝑠 β‰ˆ βˆ‘πΎ π‘˜=1 π‘€π‘˜ πœ†π‘˜ π‘π‘˜ . eqn 4 The second part of the log-likelihood is the contribution of all the presence-only observations 𝑖 = 1 … , π‘Œπ‘‘ at stage 𝑑 for each of the locations 𝑠𝑖 . A projection matrix, π΄π‘˜π‘– is also needed to interpolate the random effects πœ”π‘– from each node π‘˜ to each presence-only observation 𝑖 at location 𝑠𝑖 . The covariates, 𝑋𝑑 and π‘Šπ‘‘ need to be extracted for each node π‘˜ in addition to the presence-only locations 𝑠𝑖 . Spatiotemporal data integration Within our framework, structured data provide an anchor point for estimating the baseline population density, πœ†0 , which is unidentifiable in a standalone presence-only data analysis (Dorazio 2014, Farr et al. 2021). I assume that structured data exist at β‰₯ 1 stages 𝑑 within a species annual cycle, whereas unstructured data exist throughout the cycle, 𝑑 = 1, … , 𝑇 (i.e., structured data are more limited temporally than unstructured data). During a stage when both 42 data types simultaneously exist, the structured data can be used to estimate baseline population density, πœ†0 . Thus, data integration creates a link between unstructured data and population density, which can be used to estimate population density during subsequent stages, 𝑑 = 2, … , 𝑇, when structured data are unavailable. If structured data exist during other portions of the annual cycle, they can also be integrated during an additional stage 𝑑. The implicit assumptions of this framework are that changes in population density are explained by covariates or temporal deviations (i.e., main effects of each stage, 𝛿𝑑 ) from baseline density. Since unstructured data are used to estimate 𝛿𝑑 , the relationship between unstructured data and population density is assumed to be constant over space and time, with the thinning rate accurately capturing changes in observation error. Joint likelihood To create the integrated model, a joint likelihood is formed by the product of the likelihoods of structured (e.g., count) and unstructured (e.g., presence-only) data: 𝐿𝐼𝐷𝑀 (πœ†0 , 𝛽, 𝛿𝑑 , πœ”, 𝑝0 , 𝛼) = 𝐿𝐢 (πœ†0 , 𝛽, 𝛿𝑑 , πœ”) βˆ™ 𝐿𝑃𝑂 (πœ†0 , 𝛽, 𝛿𝑑 , πœ”, 𝑝0 , 𝛼) eqn 5 The joint likelihood of the integrated distribution model assumes independence between data types. However, violations of this assumption are likely to have minimal impact on precision and accuracy of parameter estimation and associated uncertainty (Abadi et al. 2010, Fletcher et al. 2019). Simulation Study Scenario The objective of our simulation study is to evaluate whether it is reasonable to assume a shared baseline population density between structured and unstructured data types during multiple spatiotemporal stages of abundance for a hypothetical target species. I devised a 43 scenario that was applicable to migratory species, like the eastern monarch butterfly population, which occupies non-overlapping geographic regions during different stages (e.g., breeding, non- breeding, migrating) of their annual cycle. I established two stages, where each stage contained a spatially nonoverlapping domain, 𝑆1 or 𝑆2 , of distinct size and shape (Figures 4.1A, B), with 𝑆1 β‰ˆ 36% larger than 𝑆2 . Both unstructured and structured data were available during the first stage, while only unstructured data were available during the second stage. I simulated presence-only observations for the unstructured data and single-visit counts for the structured data. I specified πœ†0 to yield an expected baseline mean abundance across both 𝑆1 and 𝑆2 . To create a realistic scenario, I created an inhomogeneous point process via a single environmental covariate across both domains, 𝑋𝑑 (𝑠), and unmeasured spatial heterogeneity, πœ”(𝑠). I specified a single covariate effect 𝛽 that was constant between stages with unique covariate values, 𝑋𝑑 (𝑠) for each stage (Figures 4.1A, B; Appendix D). The spatial random effect, πœ”(𝑠), was simulated using a GRF with a MatΓ©rn correlation function and precision (𝜏), scale (πœ…), and smoothness, (𝑣; fixed at 𝑣 = 1; Appendix D) hyperparameters. I then simulated the location and number of individuals for each stage using a Poisson point process and the corresponding intensity function. I simulated presence-only data (i.e., observations of individuals) for both stages (Figures 4.1C, D) where the thinning process incorporated effect 𝛼 and a single covariate, π‘Š(𝑠), representing sampling bias (simulated from a GRF). I standardized the covariate by subtracting the minimum value and dividing by the mean, which forced areas with low sampling intensity (i.e., lower covariate values) towards the intercept, 𝑝0 . I set 𝑝0 = logit(0.01); thus, locations at the lower end of the covariate range represent low or no sampling effort (i.e., 𝑝𝑑 β‰ˆ 0). The log- likelihood was estimated using eqns 3 and 4. I simulated single-visit counts at 25 sites for stage 1 (Figure 4.1C). I incorporated imperfect detection to mimic a realistic scenario. For each 44 individual within a given site, I simulated their detection probability using a random Bernoulli trial with a spatially homogenous probability of 0.90. Without repeated surveys, count data cannot be used to estimate absolute abundance; thus, in this simulation, our model could only estimate relative abundance. I simulated 1000 datasets and estimated population density along with relative abundance to evaluate our ability to leverage structured data over space and time. Fixed effect parameters (πœ†0 , 𝛽, 𝛿, πœ…, 𝜏, 𝑝0 , 𝛼), along with covariates 𝑋𝑑 (𝑠) and π‘Š(𝑠), were constant across simulations. The spatial random effects, πœ”(𝑠), and individual locations changed between simulations. Parameters were estimated using R-INLA, Template Model Builder, and R (Lindgren & Rue 2015, Kristensen et al. 2016, R Core Team 2020). To assess convergence (i.e., negative log likelihood is at a minimum), I checked that the absolute values of the final gradient for each parameter were near 0 and checked that the Hessian matrix was positive definite (Skaug & Fournier 2006). To assess performance, I calculated root-mean-squared error, bias, and percent relative bias for the mean estimate of each parameter. I also derived estimates of abundance and associated uncertainty to compare with true abundance. Finally, I estimated abundance for each of the 1000 simulated datasets based only on the presence-only data to compare with estimates from our integrated framework to assess the improvements in precision and bias as a result of integrating structured and unstructured datasets. Results Our integrated framework successfully identified baseline population density, πœ†0 , with lower bias (-2% relative bias) than the presence-only analysis (231% relative bias; Figure 4.1E; see Table D.1 Appendix D for full results). The slight error in our integrated analysis likely reflects that I cannot adjust for imperfect detection with single-visit counts. When structured data 45 were unavailable, it was impossible to identify the intercept parameter of an intensity function, as has been demonstrated previously (Dorazio 2014, Farr et al. 2021). Estimates of covariate effect, 𝛽, were accurate using both frameworks, though more so using the integrated analysis (6% relative bias) relative to the presence-only analysis (10% relative bias; Figure 4.1E). These results were similar to those from other studies comparing integrated and presence-only analyses (Farr et al. 2021), and the small error in the integrated analysis was likely due to each domain, 𝑆1 and 𝑆2 , only capturing a portion of the covariate range (Figure 4.1). Unmeasured spatial variability, πœ”(𝑠), was similarly estimated between the integrated and presence-only analyses. Hyperparameters, 𝜏 and πœ…, were likely biased as 𝑆1 and 𝑆2 only captured a portion of the simulation GRF for πœ”(𝑠) (Figure 4.1). Unmeasured population change, 𝛿, was estimated with slight error for both the integrated (28% relative bias) and presence-only analyses (32% error; Figure 4.1E). Using the estimated parameters, I predicted abundance across each stage of the annual cycle. The integrated model was also able to estimate abundance for stage 2, when structured data were unavailable, with low bias (3% relative bias; Figure 4.1F). Abundance was also estimated with low bias in stage 1 (-9% relative bias; Figure 4.1F). Overall, abundances were underestimated because single-visit counts do not account for imperfect detection. Alternatively, abundance was overestimated in each stage (both β‰ˆ 1Γ—108% relative bias) when using only the presence-only data due to its inability to estimate πœ†0 . Case Study Population monitoring The vast geographic range that monarchs inhabit during their annual migratory cycle make them difficult to monitor throughout the year. Structured monitoring occurs annually at the overwintering grounds in central Mexico and across the summer breeding grounds in the 46 Midwestern U.S. and Ontario, Canada. However, structured monitoring does not occur during the spring breeding season nor during the spring or fall migrations. With spring conditions potentially becoming less favorable for monarch recruitment (Zylstra et al. 2021), a deeper understanding of the mechanistic links between climate and monarch abundance may aid decision-making for this population of conservation concern. Fortunately, multiple volunteer- based monitoring programs collect observations of adult monarchs in the late spring and early summer through unstructured, opportunistic sampling. Using our integrated framework, I leveraged these data to estimate abundance of monarchs during spring migration and assess the effects of weather conditions during this critical stage. Spatiotemporal stages Our analysis encompassed the spring migration of monarchs during 2016 - 2018. I focused on these three years because the availability of unstructured data during the spring migration has increased markedly in recent years (see www.journeynorth.org). Within a given year, I partitioned spring migration into three distinct spatiotemporal stages. I indexed years with π‘Ÿ (𝑅 = 3) and use 𝑆𝑑 (𝑇 = 3) to denote each spatiotemporal stage during spring migration in a given year. The spatiotemporal stages were constant across years and defined similarly to previous studies (Saunders et al. 2019b, Zylstra et al. 2021). Our first stage, 𝑆1 , encompassed eastern Texas and Oklahoma (93.5˚ W to 100˚ W, 25.8˚ N to 37˚ N) between 8 March and 4 April. This stage was specified to encompass adult monarchs arriving from Mexico as they lay eggs that become the first generation of the year. The next stage, 𝑆2 , covered the same spatial extent but occurred between 19 April and 2 May, which included the sightings of adults from the first generation of the year post-metamorphosis. The final stage, 𝑆3 , captured these adults after they arrived on the summer breeding grounds, an area that spans the Midwestern U.S. and 47 Ontario, Canada (74.3˚ W to 97.2˚ W, 39.8˚ N to 49.4˚ N), during the early part of the summer breeding season (3 May to 6 June). Although monarchs disperse outside of this area during summer, I focused on the Midwest because the majority of individuals that arrive on the overwintering grounds in Mexico originate from this region (Flockhart et al. 2017). This spatiotemporal stage is also important as it contains both structured and unstructured data, while 𝑆1 and 𝑆2 contain very few to no structured sampling data. Datasets I collated presence-only data on monarchs during spring and early summer from five unstructured monitoring programs: North American Butterfly Association’s Butterflies I’ve Seen program (BIS), iNaturalist, eButterfly, Butterflies and Moths of North America, and Journey North. These programs allow nonscientists to report butterfly sightings (e.g., date, location, species, count) to an online database that is then made available to scientists. From each program, I extracted monarch sightings (location and date) during our study period and specified them as individual points (i.e., a single presence). For structured monitoring data, I integrated single-visit counts from five separate programs. Unlike unstructured monitoring, variation in protocols exist between programs. The first program I used was North American Butterfly Association’s Fourth of July (NFJ) surveys. These structured surveys are conducted annually in summer, with optimal timing depending on local conditions, at specified sites throughout the summer breeding range. Volunteers survey areas within a 25-km diameter circle and record the number of adults butterflies observed, by species (Oberhauser et al. 2015). I summed monarch observations among volunteers during each survey, resulting in a single count for each site and year 𝐢𝑑𝑗 . The other four monitoring programs were β€˜Pollard walk’ surveys from state-level butterfly monitoring networks (BMNs) in Illinois, 48 Ohio, Iowa, and Michigan. Volunteers walked fixed transects and counted the number of adult butterflies observed. Unlike NFJ surveys, volunteers surveyed Pollard transects multiple times within a year; however, I only used the first replicate of the year to avoid pseudo-replication. As with NFJ surveys, I summarized monarch counts during each visit to a site. The majority of structured sampling existed during early summer, 𝑆3 , with less than 5 structured counts occurring during late spring, 𝑆2 , each year. Environmental covariates I included environmental covariates that are likely to influence monarch recruitment and abundance. Monarch larvae are obligate feeders on milkweed (Asclepias spp.; Pleasants & Oberhauser 2013); thus, milkweed availability is likely to drive monarch recruitment. Because data on milkweed distribution and abundance are lacking, I used normalized difference vegetation index (NDVI), a measure of landscape greenness, as a proxy for milkweed distribution during the spring and early summer (Flockhart et al. 2013, Lemoine 2015). Temperature is also vital to the recruitment of monarchs as it influences the rates of development and survival from eggs to adults (Zalucki 1982). I used the number of growing degree days (GDD) to describe thermal conditions across the spring migratory pathway (Zipkin et al. 2012, Saunders et al. 2016, 2018, Zylstra et al. 2021). GDD is the accumulation of temperature within a specific range (McMaster & Wilhelm 1997) that allows for monarch development. GDD values accumulated from a separate start date for each stage until the date each observation or survey occurred. To standardize for differences in days accumulated, I summarized GDD as the average daily GDD. For details about how NDVI and GDD variables were calculated for each observation, survey location, and mesh node, see Appendix E. Observation covariates 49 The amount of effort expended searching for monarchs on each structured survey was likely to affect the number of individuals observed. Because the area searched during each structured survey (e.g., length of each BMN transect or area of each NFJ survey) was not reported, I converted the number of person hours spent surveying into area sampled based on the average human walking speed (5-km/hr; Browning et al. 2006) and assuming that butterflies within 1-m of the observer were detected. For the presence-only data, I used two covariates in the thinning function to account for spatiotemporal variation in sampling intensity. First, I used the spatiotemporal density of non-monarch butterfly (superfamily Papilionoidea) observations in iNaturalist, assuming that areas of high-frequency butterfly observations correlate with high sampling intensity. I used human population density as another proxy for sampling intensity because the number of observers relates to the number of people in the surrounding area (Geldmann et al. 2016, Appendix E). Data analysis I used the integrated modeling framework described above with a few case-specific modifications. I selected a 100-m2 resolution for presence-only data, and converted the area offset to reflect this baseline resolution. In addition to the fixed effects describing population change between domains within a single annual cycle (𝛿𝑑 ), I included a fixed effect to capture population change between years (π›Ύπ‘Ÿ ). I included an interaction between each environmental covariate and stage to capture any nonstationarity in climatic effects (Rollinson et al. 2021). I also specified the effect of observation covariates to vary by stage and year to account for changes in their effects on sampling effort. I added quadratic effects for both NDVI and averaged daily GDD to account for peaks in optimal climatic conditions. For simplicity, I show the updated eqn 2 here and specify the entire likelihood within TMB in Appendix E: 50 log(πœ†π‘‘π‘Ÿ (𝑠)) = log(πœ†0 ) + 𝛽1,𝑑 βˆ™ π‘π·π‘‰πΌπ‘‘π‘Ÿ (𝑠) + 𝛽2,𝑑 βˆ™ π‘π·π‘‰πΌπ‘‘π‘Ÿ (𝑠)2 + 𝛽3,𝑑 βˆ™ πΊπ·π·π‘‘π‘Ÿ (𝑠) + 𝛽4,𝑑 βˆ™ πΊπ·π·π‘‘π‘Ÿ (𝑠)2 + 𝛿𝑑 + π›Ύπ‘Ÿ + πœ”(𝑠). eqn 6 I use location 𝑠 above as a generic description, but I estimate population density within the likelihood at each 𝑠𝑖 , node π‘˜, and site 𝑗 (Appendix E). Results Using our integrated model, I estimated monarch population density (i.e., relative density) during spring migration from 2016 - 2018 when structured data were limited. In each year, monarch density increased across each migratory stage respectively (Figure 4.2). Climatic conditions along the spring migratory pathway, particularly early in migration in Texas, had significant effects on the density of monarchs (Figure 4.3A). I identified nonlinear effects of both NDVI and averaged daily GDD during the spring and early summer (Figures 4.3B, C). The relationship between covariates and peak monarch density differed between the spring and summer, shifting from higher densities under moderately warm and highly vegetated conditions to higher densities under cooler and less vegetated conditions (Figures 4.3D-F). Densities were highest when conditions were near their spatiotemporal means of a given stage (during 2016 – 2018) of NDVI and averaged daily GDD, except for early spring when expected density peaked above the mean NDVI (Figure 4.3D-F). The influence of vegetation and climatic conditions was also weaker during the early summer compared to spring (Figure 4.3A), with monarchs likely to be present over a wide range of conditions (Figures 4.2C, F, I, 4.3B, C, F). I assessed seasonal and yearly changes in monarch densities, rather than total abundance, as single-visit counts do not account for imperfect detection. Monarch densities increased over the course of spring migration each year, with the lowest densities during early spring migration (Figure 4.2). Increases in densities from spring to summer (Figures 4.2A, B, D, E, G, H) reflects 51 spring breeding and recruitment of individuals as monarchs peak in abundance during the summer (Zylstra et al. 2021). Abundance trends across years reflect similar patterns seen in the overwintering grounds during 2016 – 2018, with densities considerably higher in 2018 than in the previous two years (Table E.1 Appendix E). Discussion In this study, I developed an integrated model using a log-Gaussian Cox process that can estimate abundance during multiple seasons of a species’ annual cycle when structured data are sparse. The continuous collection of unstructured data in combination with the comparatively higher information content inherent to structured data improves inferences on migratory populations, as compared to independent analyses of either data source. By addressing data gaps through data integration across multiple stages of a species’ migratory or annual cycle, our modeling framework can be used to elucidate cross-seasonal connections between environmental factors and observed patterns in population abundance and/or dynamics. I demonstrate both the validity and merit of my framework through a simulation study and by directly estimating monarch population density throughout the spring migration when structured data are minimal. The spring migration has been identified as a critical stage for the eastern monarch population because of a potential cross-seasonal effect of spring climate on the long-term trends and annual variation in summer monarch abundance (Saunders et al. 2016, Saunders et al. 2018b, Zylstra et al. 2021). Our model corroborates these long-term analyses, despite the use of a different spatiotemporal resolution than has been used in previous work. The demonstrated effects of spring NDVI and averaged daily GDD on monarch densities suggest that spring conditions may drive recruitment rates of monarchs in spring, either directly (by influencing growth rates) or indirectly (by influencing the availability of milkweed). Similar to 52 previous studies (Saunders et al. 2016, Zylstra et al. 2021), monarch densities were highest when conditions were mild and near the spatiotemporal (space and year) average of a given stage. One of the novel findings from our analysis relates to nonstationarity of the effects of environmental conditions on monarch population densities (i.e., when mean effects or their covariance are not constant across space or time; Rollinson et al. 2021) in spring vs early summer, with a shift from peak densities under warmer and greener conditions vs. cooler and less vegetated conditions, respectively (Figure 3D-F). The initiative to understand cross-seasonal processes has unveiled the frequency of nonstationary effects over space and time within ecology (Rollinson et al. 2021). In our case study, I explicitly captured nonstationarity over time using a temporal interaction for the covariates and implicitly captured it over space as the spring and summer domains were spatially distinct. Given the point process foundation of our model, it can be adapted to situations when stationarity is violated by specifying more complex statistical structures (Jarzyna et al. 2014, Ingebrigtsen et al. 2014, Lindgren & Rue 2015, Krainski et al. 2018). The ability to incorporate nonstationarity and spatial random effects into our framework helps maintain the assumption of a baseline population density by increasing the model’s flexibility in explaining spatiotemporal variation and dependences. Our integrated framework, which uses unstructured data to estimate population size during stages when structured data are unavailable, relies on one critical assumption: that the relationship between unstructured data and population density is constant over space and time, with the thinning rate accurately capturing changes in observation error. Thus, valid inferences rely on a model that properly accounts for the observation processes through which the data were collected. These observational covariates are critical to properly estimating both population density and the effect of the environmental covariates, and should be carefully examined. 53 Unstructured data use a thinned point process with relevant covariates; however, relevant covariates may be hard to identify or measure (KΓ©ry & Royle 2020). The use of covariates to correct for sampling bias of presence-only observations is the subject of much debate (Troudet et al. 2017) and warrants further study (Kelling et al. 2019). I selected observation covariates following other presence-only analyses (Geldmann et al. 2016, Momeni-Dehaghi et al. 2021), but improper selection can lead to biased inferences (Ries et al. 2019, Wepprich 2019). Log- Gaussian Cox processes may be a solution, as they are powerful tools to estimate spatiotemporal patterns in both ecological and observational processes and can be specified to improve estimation where spatial biases in sampling occur (Tang et al. 2021). Preferably, structured data should formally incorporate methods to account for error, such as imperfect detection. In addition to selecting relevant covariates to capture variation in observational and biological patterns, other choices made in the modeling process can affect inferences. Covariates must be summarized over space and time, and the scale at which summarizing occurs could influence our understanding of the effects of climate or other environmental factors on species populations. Emerging evidence from macrosystems ecology indicates that multi-scale connections may drive population patterns, and thus, the role of scale is critical in capturing cryptic yet important ecological processes (Fei et al. 2015). Inferences will also be affected by the unit of measurement (1-km2 vs 1Γ—6-m2) for the area offset, which accounts for effort in terms of the area sampled by structured surveys. The weight (log likelihood values) of the presence only data source is determined by the spatial resolution of the model, which is indirectly set by the area offset unit of measurement. Finally, data weighting between structured and unstructured sources can be a concern when structured surveys are unable to anchor baseline density estimates by incorporating imperfect detection. I encourage future practitioners to conduct sensitivity 54 analyses to ensure that inference is invariant to these choices or choices are made with biological support. My spatiotemporal integrated model offers a powerful approach for estimating densities of migratory species that range over vast extents, as well as covariate effects on those densities. The spatial point process framework also allows for easy incorporation of techniques for addressing nonstationary processes, which is a growing issue in macrosystems research. Cross- seasonal processes and data gaps are not only a challenge for migratory species. Thus, my modeling framework can be applied to other species or systems that would benefit from data integration to provide a complete picture of ecological processes operating across broad spatial ranges and the full annual cycle. 55 Figure 4.1 Figure 1. (A-D) Visualization of my simulation study for the spatiotemporal integrated model with stage, domain 𝑆1 on the left (A, C) and stage, domain 𝑆2 on the right (B, D) with dashed lines indicating the extent of each domain. The top row (A, B) depicts true population abundance and distribution with black dots representing observations of individuals. The middle row (C, D) shows the data for each stage within each domain. (C) 𝑆1 contains presence-only data (white dots) and 25 sites (dashed circles) with single-visit counts (magnitude of counts are within dashed circles). (D) 𝑆2 contains only presence-only data. (A-D) Background gradient indicates the expected population density (individuals per unit area) for each domain and stage (A, C are the same [stage 1] and B, D are the same [stage 2]). (E, F) Simulation results showing bias (estimated minus true value) from each of the integrated and presence-only models. Bottom left (E) shows results for biases in estimates of the baseline population density (πœ†0 ), covariate effect (𝛽), and population change (𝛿). Bottom right (F) shows results for biases in estimates of abundance for each domain and stage. 56 Figure 4.1 (cont’d) 57 Figure 4.2 Predicted population density of monarchs from 2016 to 2018 during their spring migration. Density values were calculated using parameter estimates and values of NDVI and daily GDD averaged across each stage at a given location and year. Population density was estimated and reported at a resolution of 100-m2. Rows and columns correspond to years and stages, respectively. 58 Figure 4.3 Effects of NDVI and averaged daily GDD on monarch population density. Top left (A) is the parameter estimates (mean and 95% confidence intervals) for the linear and quadratic effects of NDVI and averaged daily GDD at each stage. Middle left (B, C) is the estimated marginal effects (mean and 95% confidence intervals) of NDVI (B) and averaged daily GDD (C) on monarch population density at each stage, respectively. Right column (D-F) is the marginal effect (mean) of both NDVI and averaged daily GDD on monarch population density for early spring (D), late spring (E), and early summer (F). Crosses indicate observed values between 2016 and 2018 and dashed lines indicate the observed mean values during 2016 - 2018. 59 Figure 4.3 (cont’d) 60 APPENDICES 61 APPENDIX A: Additional application details For the multi-species dynamic N-occupancy model application, I assessed multiple anthropogenic and environmental factors hypothesized to affect the antelope community’s abundance and demographic rates. Though the final model did not contain covariates on the biological process, I describe the various covariates that I considered including and an explanation of why I refrained from including covariates in the final model. Anthropogenic covariates Humans may play a role in the spatial distributions and survival of antelope species within my target community as antelopes are often poached as part of the bushmeat trade (Newing 2001). I summarized multiple covariates as a proxy for hunting pressure because direct metrics were not available. I used human population density outside of parks and distance to edge as proxies for anthropogenic disturbance. Because human population density was either high (> 300 humans/km2) or low (< 35 humans/km2) at each park, I specified human density as a binary variable (i.e., 0 for low density [UDZ, NNNP, KRP] and 1 for high density [VNP, BIF, NFNP]). Distance to edge was used as a proxy for hunting pressure and other human disturbance. Distances were recorded from each camera trap site to the nearest accessible edge (e.g., public road, navigable river, park boundary adjacent to human land use). Environmental covariates Though previous research on tropical rainforest dwelling antelopes suggests neutral responses to weather conditions (O’Brien et al. 2020), ongoing climate change continues to be a concern for tropical rainforest (Phillips et al. 2009, Sullivan et al. 2020). I characterized local weather (i.e., rainfall) at the site-level to assess the potential influence of climate change on the antelope community. I hypothesized that climate may indirectly affect the population status and 62 trends of antelopes by altering forest productivity and resulting food availability (e.g., foliage, fruit). I used CHIRPS (Climate Hazards group Infrared Precipitation with Stations) annual precipitation values (mm), which combines satellite and station data to interpolate localized (i.e., 0.05Β° resolution) rainfall (Funk et al. 2015). Antelope species within this community vary in their diet composition (i.e., percent of foliage, fruit, grass); thus, each species is tied to a different forest types and corresponding elevation range (Kingdon 2015). To capture this variation, I also evaluated a site-level elevation covariate. Model building To interpret multiple covariate effects, I standardized each covariate to have a mean of zero and standard deviation of one. For site-level covariates I standardized across all parks as standardizing within each park prevented meaningful inference. During model building I first constructed a null model. To determine a final model, I attempted to add covariates to abundance, survival, and gains using a forward selection approach (Saunders et al. 2019b, 2019c, Saunders et al. 2021). However, parameter unidentifiability prevented this approach from working as convergence (Rhat < 1.2) for many species-specific effects was not achievable. I hypothesize that large variation in covariate values across parks prevented accurate estimation of effects. Ideally, population-specific (i.e., interaction between a given covariate, species, and park) effects could account for this issue, but data volumes for many parks were constrained by short temporal extents (Table 3.1). I also ruled out scaling up site-level covariates (i.e., mean of a park) to the park-level as the low number of parks (i.e., six in total) did not allow covariates to accurately represent (i.e., proxy for true relationship) the anthropogenic or environmental process of interest. Ultimately, I settled on using park-level random effects on initial abundance and apparent survival. The high similarities between sites within a park and the large variations that I 63 observed with covariate values among parks led us to determine that park-level random effects were the most practical solution. 64 APPENDIX B: Nimble model code Nimble code to fit the β€˜multi-species dynamic N-occupancy model’ for the antelope community across the network of TEAM sites within national parks between 2009-2019. Both the retrospective and projected analysis code are shown below. Definitions of constants nparks: number of national parks (6). Nspecs: number of species (12). parkS & parkE: indicates the parks where a species was observed (varies by species). nsite: number of sites within a park (varies by park). nstart & nend: indicates the years when a park was surveyed (varies by parks). nyears: number of years for projection corresponding to 2030 (22). nreps: number of replicates (6). Retrospective model code model.code <- nimbleCode({ #-Priors-# # Detection hyperparameters mu.a0 ~ dunif(0, 1) # Community-level intercept on probability scale mu.a0L <- logit(mu.a0) # Community-level intercept on logit scale tau.a0 ~ dgamma(0.1, 0.1) # Community-level precision # Effect of effort (days sampled) alpha1 ~ dnorm(0, 0.1) # Initial abundance hyperparameters mu.b0 ~ dnorm(0, 0.1) # Community-level intercept on log scale tau.b0 ~ dgamma(0.1, 0.1) # Community-level precision # Apparent survival hyperparameters mu.o0 ~ dunif(0, 1) # Community-level intercept on probability scale mu.o0L <- logit(mu.o0) # Community-level intercept on logit scale tau.o0 ~ dgamma(0.1, 0.1) # Community-level precision # Gains hyperparameters mu.g0 ~ dnorm(0, 0.1) # Community-level intercept on log scale tau.g0 ~ dgamma(0.1, 0.1) # Community-level precision # Precison of park-level random effect on initial abundance tau.eps.l ~ dgamma(0.1, 0.1) # Precison of park-level random effect on apparent survival tau.eps.o ~ dgamma(0.1, 0.1) 65 for(r in 1:nparks){ # Park-level random effect on initial abundance eps.l[r] ~ dnorm(0, tau.eps.l) # Park-level random effect on apparent survival eps.o[r] ~ dnorm(0, tau.eps.o) # Predicted community-level apparent survival for each park logit(park.surv[r]) <- mu.o0L + eps.o[r] } # End r for(i in 1:nspecs){ # Species-specific intercept on detection alpha0[i] ~ dnorm(mu.a0L, tau.a0) # Species-specific intercept on initial abundance beta0[i] ~ dnorm(mu.b0, tau.b0) # Species-specific intercept on apparent survival omega0[i] ~ dnorm(mu.o0L, tau.o0) # Species-specific intercept on gains gamma0[i] ~ dnorm(mu.g0, tau.g0) for(r in parkS[i]:parkE[i]){ # Predicted population-level apparent survival logit(pop.surv[r,i]) <- omega0[i] + eps.o[r] for(j in 1:nsite[r]){ # Linear predictor for initial abundance log(lambda[j,r,i]) <- beta0[i] + eps.l[r] # Initial abundance N[nstart[r],j,r,i] ~ dpois(lambda[j,r,i]) for(k in 1:nreps){ # Linear predictor for detection probability (in year 1) logit(r[k,nstart[r],j,r,i]) <- alpha0[i] + alpha1 * days[k,nstart[r],j,i] # Observation process (in year 1) y[k,nstart[r],j,r,i] ~ dbern(p[k,nstart[r],j,r,i]) # Site level detection (N-occupancy parameterization) (in year 1) p[k,nstart[r],j,r,i] <- 1 - pow((1 - r[k,nstart[r],j,r,i]), N[nstart[r],j,r,i]) } # End k for(t in (nstart[r]+1):nend[r]){ for(k in 1:nreps){ 66 # Linear predictor for detection probability (in year t+1) logit(r[k,t,j,r,i]) <- alpha0[i] + alpha1 * days[k,t,j,i] # Observation process (in year t+1) y[k,t,j,r,i] ~ dbern(p[k,t,j,r,i]) # Observation process (in year t+1) p[k,t,j,r,i] <- 1 - pow((1 - r[k,t,j,r,i]), N[t,j,r,i]) } # End k # Linear predictor for apparent survival logit(omega[t-1,j,r,i]) <- omega0[i] + eps.o[r] # Biological process N[t,j,r,i] <- S[t-1,j,r,i] + G[t-1,j,r,i] # Apparent survival S[t-1,j,r,i] ~ dbin(omega[t-1,j,r,i], N[t-1,j,r,i]) # Gains G[t-1,j,r,i] ~ dpois(gamma[t-1,j,r,i]) # Linear predictor for gains log(gamma[t-1,j,r,i]) <- gamma0[i] } # End t } # End j for(t in nstart[r]:nend[r]){ # Population (species-park) abundance per year Nhat[t,r,i] <- sum(N[t,1:nsite[r],r,i]) } # End t } # End r } # End s }) Prospective model code model.code <- nimbleCode({ #-Priors-# # Detection hyperparameters mu.a0 ~ dunif(0, 1) # Community-level intercept on probability scale mu.a0L <- logit(mu.a0) # Community-level intercept on logit scale tau.a0 ~ dgamma(0.1, 0.1) # Community-level precision 67 # Effect of effort (days sampled) alpha1 ~ dnorm(0, 0.1) # Initial abundance hyperparameters mu.b0 ~ dnorm(0, 0.1) # Community-level intercept on log scale tau.b0 ~ dgamma(0.1, 0.1) # Community-level precision # Apparent survival hyperparameters mu.o0 ~ dunif(0, 1) # Community-level intercept on probability scale mu.o0L <- logit(mu.o0) # Community-level intercept on logit scale tau.o0 ~ dgamma(0.1, 0.1) # Community-level precision # Gains hyperparameters mu.g0 ~ dnorm(0, 0.1) # Community-level intercept on log scale tau.g0 ~ dgamma(0.1, 0.1) # Community-level precision # Precison of park-level random effect on initial abundance tau.eps.l ~ dgamma(0.1, 0.1) # Precison of park-level random effect on apparent survival tau.eps.o ~ dgamma(0.1, 0.1) for(r in 1:nparks){ # Park-level random effect on initial abundance eps.l[r] ~ dnorm(0, tau.eps.l) # Park-level random effect on apparent survival eps.o[r] ~ dnorm(0, tau.eps.o) # Predicted community-level apparent survival for each park logit(park.surv[r]) <- mu.o0L + eps.o[r] } # End r for(i in 1:nspecs){ # Species-specific intercept on detection alpha0[i] ~ dnorm(mu.a0L, tau.a0) # Species-specific intercept on initial abundance beta0[i] ~ dnorm(mu.b0, tau.b0) # Species-specific intercept on apparent survival omega0[i] ~ dnorm(mu.o0L, tau.o0) # Species-specific intercept on gains gamma0[i] ~ dnorm(mu.g0, tau.g0) for(r in parkS[i]:parkE[i]){ # Predicted population-level apparent survival logit(pop.surv[r,i]) <- omega0[i] + eps.o[r] for(j in 1:nsite[r]){ # Linear predictor for initial abundance 68 log(lambda[j,r,i]) <- beta0[i] + eps.l[r] # Initial abundance N[nstart[r],j,r,i] ~ dpois(lambda[j,r,i]) for(k in 1:nreps){ # Linear predictor for detection probability (in year 1) logit(r[k,nstart[r],j,r,i]) <- alpha0[i] + alpha1 * days[k,nstart[r],j,i] # Observation process (in year 1) y[k,nstart[r],j,r,i] ~ dbern(p[k,nstart[r],j,r,i]) # Site level detection (N-occupancy parameterization) (in year 1) p[k,nstart[r],j,r,i] <- 1 - pow((1 - r[k,nstart[r],j,r,i]), N[nstart[r],j,r,i]) } # End k for(t in (nstart[r]+1):nend[r]){ for(k in 1:nreps){ # Linear predictor for detection probability (in year t+1) logit(r[k,t,j,r,i]) <- alpha0[i] + alpha1 * days[k,t,j,i] # Observation process (in year t+1) y[k,t,j,r,i] ~ dbern(p[k,t,j,r,i]) # Observation process (in year t+1) p[k,t,j,r,i] <- 1 - pow((1 - r[k,t,j,r,i]), N[t,j,r,i]) } # End k # Linear predictor for apparent survival logit(omega[t-1,j,r,i]) <- omega0[i] + eps.o[r] # Biological process N[t,j,r,i] <- S[t-1,j,r,i] + G[t-1,j,r,i] # Apparent survival S[t-1,j,r,i] ~ dbin(omega[t-1,j,r,i], N[t-1,j,r,i]) # Gains G[t-1,j,r,i] ~ dpois(gamma[t-1,j,r,i]) # Linear predictor for gains log(gamma[t-1,j,r,i]) <- gamma0[i] } # End t for(t in (nend[r]+1):nyears){ # Linear predictor for apparent survival logit(omega[t-1,j,r,i]) <- omega0[i] + eps.o[r] # Biological process 69 N[t,j,r,i] <- S[t-1,j,r,i] + G[t-1,j,r,i] # Apparent survival S[t-1,j,r,i] ~ dbin(omega[t-1,j,r,i], N[t-1,j,r,i]) # Gains G[t-1,j,r,i] ~ dpois(gamma[t-1,j,r,i]) # Linear predictor for gains log(gamma[t-1,j,r,i]) <- gamma0[i] } # End t } # End j for(t in nstart[r]:nyears){ # Population (species-park) abundance per year Nhat[t,r,i] <- sum(N[t,1:nsite[r],r,i]) } # End t } # End r } # End s }) 70 APPENDIX C: Model output Output from the multi-species dynamic N-occupancy model. For the retrospective analysis I provide estimates of annual population growth and population abundance during the study period. I also report species-specific, community-level, and park-level demographic rates. For the population projection and Bayesian viability analysis, I provide estimate of the quasi- extinction probability of each population. Retrospective analysis Table C.1 Mean annual population growth rates (i.e., geometric mean) with 95% credible intervals of each population (species-park combination) during the study period. National park Species Annual population growth rate UDZ C. harveyi 1.04 (CI 1.02, 1.06) UDZ N. moschatus 1.00 (CI 0.98, 1.02) UDZ T. scriptus 1.19 (CI 1.09, 1.26) UDZ C. spadix 1.01 (CI 0.98, 1.04) VNP C. nigrifrons 1.07 (CI 1.03, 1.10) VNP T. scriptus 1.01 (CI 0.99, 1.03) NFNP C. nigrifrons 1.41 (CI 1.29, 1.54) NFNP T. scriptus 1.41 (CI 0.91, 2.08) NFNP C. silvicultor 1.54 (CI 1.21, 1.94) BIF C. nigrifrons 1.08 (CI 1.06, 1.11) BIF T. scriptus 1.12 (CI 0.96, 1.29) BIF T. spekii 1.16 (CI 0.97, 1.36) BIF C. silvicultor 1.06 (CI 1.03, 1.09) NNNP C. callipygus 0.93 (CI 0.90, 0.95) NNNP C. dorsalis 0.96 (CI 0.93, 1.00) NNNP C. leucogaster 1.49 (CI 1.33, 1.69) NNNP P. monticola 0.92 (CI 0.90, 0.95) NNNP C. nigrifrons 1.07 (CI 0.89, 1.26) NNNP T. spekii 1.02 (CI 0.81, 1.23) NNNP C. silvicultor 0.94 (CI 0.91, 0.97) KRP P. monticola 1.03 (CI 0.98, 1.08) KRP C. ogilbyi 1.00 (CI 0.93, 1.07) 71 Table C1 (cont’d) KRP C. silvicultor 0.92 (CI 0.67, 1.19) 72 Table C.2 Mean annual site-level abundance with 95% credible intervals of each population across the study period (including only the years sampled for each population/park combination). National park Species Year Site-level abundance KRP P. monticola 2011 1.49 (CI 1.28, 1.73) KRP P. monticola 2012 1.59 (CI 1.37, 1.83) KRP P. monticola 2013 2.02 (CI 1.77, 2.32) KRP P. monticola 2014 1.90 (CI 1.65, 2.18) KRP P. monticola 2015 1.68 (CI 1.43, 1.95) KRP C. ogilbyi 2011 1.24 (CI 0.97, 1.60) KRP C. ogilbyi 2012 1.17 (CI 0.92, 1.50) KRP C. ogilbyi 2013 1.30 (CI 1.05, 1.62) KRP C. ogilbyi 2014 1.23 (CI 0.97, 1.57) KRP C. ogilbyi 2015 1.23 (CI 0.95, 1.58) KRP C. silvicultor 2011 0.06 (CI 0.03, 0.12) KRP C. silvicultor 2012 0.06 (CI 0.05, 0.10) KRP C. silvicultor 2013 0.06 (CI 0.03, 0.10) KRP C. silvicultor 2014 0.05 (CI 0.03, 0.08) KRP C. silvicultor 2015 0.04 (CI 0.02, 0.10) NNNP C. callipygus 2010 4.69 (CI 3.87, 5.65) NNNP C. callipygus 2011 3.57 (CI 3.02, 4.23) NNNP C. callipygus 2012 2.92 (CI 2.47, 3.43) NNNP C. callipygus 2013 2.97 (CI 2.52, 3.48) NNNP C. callipygus 2014 3.05 (CI 2.57, 3.58) NNNP C. callipygus 2015 2.61 (CI 2.17, 3.10) NNNP C. callipygus 2016 2.94 (CI 2.45, 3.48) NNNP C. dorsalis 2010 2.10 (CI 1.75, 2.52) NNNP C. dorsalis 2011 1.69 (CI 1.42, 2.02) NNNP C. dorsalis 2012 1.75 (CI 1.47, 2.08) NNNP C. dorsalis 2013 1.90 (CI 1.63, 2.22) NNNP C. dorsalis 2014 1.71 (CI 1.45, 2.02) NNNP C. dorsalis 2015 1.57 (CI 1.30, 1.88) NNNP C. dorsalis 2016 1.69 (CI 1.40, 2.02) NNNP C. leucogaster 2010 0.01 (CI 0.00, 0.03) NNNP C. leucogaster 2011 0.04 (CI 0.00, 0.12) NNNP C. leucogaster 2012 0.14 (CI 0.07, 0.23) NNNP C. leucogaster 2013 0.25 (CI 0.18, 0.35) NNNP C. leucogaster 2014 0.28 (CI 0.20, 0.40) NNNP C. leucogaster 2015 0.23 (CI 0.13, 0.37) NNNP C. leucogaster 2016 0.23 (CI 0.13, 0.38) 73 Table C2 (cont’d) NNNP P. monticola 2010 6.21 (CI 5.32, 7.18) NNNP P. monticola 2011 4.70 (CI 4.10, 5.37) NNNP P. monticola 2012 4.19 (CI 3.67, 4.80) NNNP P. monticola 2013 4.19 (CI 3.68, 4.77) NNNP P. monticola 2014 4.19 (CI 3.68, 4.75) NNNP P. monticola 2015 3.59 (CI 3.10, 4.15) NNNP P. monticola 2016 3.78 (CI 3.28, 4.33) NNNP C. nigrifrons 2010 0.05 (CI 0.03, 0.08) NNNP C. nigrifrons 2011 0.01 (CI 0.00, 0.05) NNNP C. nigrifrons 2012 0.08 (CI 0.07, 0.10) NNNP C. nigrifrons 2013 0.01 (CI 0.00, 0.05) NNNP C. nigrifrons 2014 0.02 (CI 0.00, 0.05) NNNP C. nigrifrons 2015 0.05 (CI 0.02, 0.10) NNNP C. nigrifrons 2016 0.07 (CI 0.02, 0.15) NNNP T. spekii 2010 0.07 (CI 0.03, 0.18) NNNP T. spekii 2011 0.06 (CI 0.00, 0.15) NNNP T. spekii 2012 0.07 (CI 0.03, 0.15) NNNP T. spekii 2013 0.05 (CI 0.00, 0.15) NNNP T. spekii 2014 0.06 (CI 0.00, 0.15) NNNP T. spekii 2015 0.09 (CI 0.05, 0.18) NNNP T. spekii 2016 0.08 (CI 0.02, 0.18) NNNP C. silvicultor 2010 2.17 (CI 1.87, 2.52) NNNP C. silvicultor 2011 2.08 (CI 1.83, 2.37) NNNP C. silvicultor 2012 1.99 (CI 1.77, 2.23) NNNP C. silvicultor 2013 1.84 (CI 1.62, 2.08) NNNP C. silvicultor 2014 1.72 (CI 1.52, 1.97) NNNP C. silvicultor 2015 1.54 (CI 1.33, 1.77) NNNP C. silvicultor 2016 1.50 (CI 1.28, 1.73) UDZ C. harveyi 2009 1.96 (CI 1.68, 2.28) UDZ C. harveyi 2010 1.99 (CI 1.73, 2.30) UDZ C. harveyi 2011 2.12 (CI 1.85, 2.43) UDZ C. harveyi 2012 2.19 (CI 1.92, 2.52) UDZ C. harveyi 2013 2.31 (CI 2.03, 2.65) UDZ C. harveyi 2014 2.47 (CI 2.17, 2.83) UDZ C. harveyi 2015 2.54 (CI 2.22, 2.92) UDZ C. harveyi 2016 2.65 (CI 2.32, 3.03) UDZ C. harveyi 2017 2.77 (CI 2.43, 3.17) UDZ C. harveyi 2018 2.80 (CI 2.43, 3.23) UDZ C. harveyi 2019 2.91 (CI 2.52, 3.35) UDZ N. moschatus 2009 0.91 (CI 0.73, 1.12) UDZ N. moschatus 2010 0.88 (CI 0.72, 1.07) 74 Table C2 (cont’d) UDZ N. moschatus 2011 0.86 (CI 0.70, 1.03) UDZ N. moschatus 2012 0.89 (CI 0.73, 1.07) UDZ N. moschatus 2013 1.02 (CI 0.88, 1.18) UDZ N. moschatus 2014 1.05 (CI 0.90, 1.22) UDZ N. moschatus 2015 1.02 (CI 0.87, 1.18) UDZ N. moschatus 2016 0.96 (CI 0.82, 1.12) UDZ N. moschatus 2017 0.95 (CI 0.80, 1.10) UDZ N. moschatus 2018 0.90 (CI 0.75, 1.07) UDZ N. moschatus 2019 0.91 (CI 0.75, 1.08) UDZ T. scriptus 2009 0.01 (CI 0.00, 0.05) UDZ T. scriptus 2010 0.06 (CI 0.05, 0.08) UDZ T. scriptus 2011 0.06 (CI 0.03, 0.10) UDZ T. scriptus 2012 0.13 (CI 0.12, 0.15) UDZ T. scriptus 2013 0.08 (CI 0.03, 0.13) UDZ T. scriptus 2014 0.11 (CI 0.07, 0.15) UDZ T. scriptus 2015 0.13 (CI 0.10, 0.17) UDZ T. scriptus 2016 0.12 (CI 0.08, 0.17) UDZ T. scriptus 2017 0.11 (CI 0.08, 0.15) UDZ T. scriptus 2018 0.10 (CI 0.07, 0.15) UDZ T. scriptus 2019 0.13 (CI 0.10, 0.17) UDZ C. spadix 2009 1.27 (CI 0.93, 1.67) UDZ C. spadix 2010 1.25 (CI 0.93, 1.63) UDZ C. spadix 2011 1.25 (CI 0.93, 1.63) UDZ C. spadix 2012 1.20 (CI 0.85, 1.60) UDZ C. spadix 2013 1.24 (CI 0.92, 1.63) UDZ C. spadix 2014 1.26 (CI 0.93, 1.67) UDZ C. spadix 2015 1.29 (CI 0.93, 1.70) UDZ C. spadix 2016 1.38 (CI 1.07, 1.78) UDZ C. spadix 2017 1.40 (CI 1.05, 1.80) UDZ C. spadix 2018 1.38 (CI 1.02, 1.82) UDZ C. spadix 2019 1.38 (CI 1.00, 1.83) BIF C. nigrifrons 2010 0.75 (CI 0.67, 0.85) BIF C. nigrifrons 2011 0.82 (CI 0.72, 0.93) BIF C. nigrifrons 2012 0.95 (CI 0.85, 1.08) BIF C. nigrifrons 2013 0.72 (CI 0.60, 0.85) BIF C. nigrifrons 2014 0.83 (CI 0.72, 0.95) BIF C. nigrifrons 2015 0.98 (CI 0.85, 1.12) BIF C. nigrifrons 2016 1.42 (CI 1.28, 1.57) BIF C. nigrifrons 2017 1.33 (CI 1.17, 1.50) BIF T. scriptus 2010 0.02 (CI 0.00, 0.07) BIF T. scriptus 2011 0.01 (CI 0.00, 0.03) 75 Table C2 (cont’d) BIF T. scriptus 2012 0.01 (CI 0.00, 0.03) BIF T. scriptus 2013 0.04 (CI 0.03, 0.07) BIF T. scriptus 2014 0.04 (CI 0.03, 0.07) BIF T. scriptus 2015 0.04 (CI 0.02, 0.07) BIF T. scriptus 2016 0.06 (CI 0.05, 0.08) BIF T. scriptus 2017 0.07 (CI 0.05, 0.10) BIF T. spekii 2010 0.04 (CI 0.02, 0.10) BIF T. spekii 2011 0.04 (CI 0.02, 0.12) BIF T. spekii 2012 0.05 (CI 0.02, 0.12) BIF T. spekii 2013 0.05 (CI 0.02, 0.13) BIF T. spekii 2014 0.06 (CI 0.02, 0.15) BIF T. spekii 2015 0.08 (CI 0.05, 0.17) BIF T. spekii 2016 0.09 (CI 0.05, 0.17) BIF T. spekii 2017 0.10 (CI 0.05, 0.20) BIF C. silvicultor 2010 1.02 (CI 0.88, 1.20) BIF C. silvicultor 2011 1.03 (CI 0.90, 1.18) BIF C. silvicultor 2012 1.06 (CI 0.92, 1.23) BIF C. silvicultor 2013 1.22 (CI 1.07, 1.38) BIF C. silvicultor 2014 1.42 (CI 1.27, 1.58) BIF C. silvicultor 2015 1.35 (CI 1.17, 1.53) BIF C. silvicultor 2016 1.44 (CI 1.27, 1.63) BIF C. silvicultor 2017 1.53 (CI 1.35, 1.73) NFNP C. nigrifrons 2014 0.29 (CI 0.25, 0.34) NFNP C. nigrifrons 2015 0.53 (CI 0.42, 0.64) NFNP C. nigrifrons 2016 0.65 (CI 0.52, 0.79) NFNP C. nigrifrons 2017 0.81 (CI 0.65, 0.99) NFNP T. scriptus 2014 0.02 (CI 0.00, 0.06) NFNP T. scriptus 2015 0.04 (CI 0.01, 0.08) NFNP T. scriptus 2016 0.05 (CI 0.02, 0.09) NFNP T. scriptus 2017 0.06 (CI 0.02, 0.10) NFNP C. silvicultor 2014 0.07 (CI 0.03, 0.11) NFNP C. silvicultor 2015 0.11 (CI 0.05, 0.18) NFNP C. silvicultor 2016 0.15 (CI 0.08, 0.24) NFNP C. silvicultor 2017 0.24 (CI 0.15, 0.34) VNP C. nigrifrons 2014 3.22 (CI 2.88, 3.62) VNP C. nigrifrons 2015 3.42 (CI 3.07, 3.83) VNP C. nigrifrons 2016 3.67 (CI 3.28, 4.08) VNP T. scriptus 2014 2.88 (CI 2.48, 3.32) VNP T. scriptus 2015 2.91 (CI 2.52, 3.32) VNP T. scriptus 2016 2.95 (CI 2.55, 3.37) 76 Table C.3 Mean annual survival probability and gains with 95% credible intervals for each species across all parks. Species Annual apparent survival Gains C. callipygus 0.61 (CI 0.15, 0.94) 1.34 (CI 0.88, 1.95) C. dorsalis 0.56 (CI 0.12, 0.92) 0.92 (CI 0.67, 1.21) C. harveyi 0.88 (CI 0.43, 0.98) 0.36 (CI 0.24, 0.51) C. leucogaster 0.73 (CI 0.24, 0.96) 0.09 (CI 0.06, 0.15) P. monticola 0.67 (CI 0.18, 0.95) 1.48 (CI 1.11, 1.87) N. moschatus 0.83 (CI 0.32, 0.98) 0.14 (CI 0.09, 0.20) C. nigrifrons 0.66 (CI 0.17, 0.95) 0.22 (CI 0.18, 0.27) C. ogilbyi 0.89 (CI 0.44, 0.99) 0.57 (CI 0.36, 0.85) T. scriptus 0.66 (CI 0.16, 0.95) 0.03 (CI 0.02, 0.05) C. spadix 0.82 (CI 0.31, 0.98) 0.22 (CI 0.11, 0.37) T. spekii 0.73 (CI 0.19, 0.97) 0.02 (CI 0.01, 0.05) C. silvicultor 0.90 (CI 0.51, 0.99) 0.12 (CI 0.09, 0.16) Community-level 0.72 (CI 0.28, 0.96) 0.24 (CI 0.10, 0.56) 77 Table C.4 Park-level mean annual survival probability with 95% credible intervals. National park Annual apparent survival UDZ 0.77 (CI 0.56, 0.89) VNP 0.99 (CI 0.97, 1.00) NFNP 0.77 (CI 0.56, 0.92) BIF 0.81 (CI 0.69, 0.91) NNNP 0.66 (CI 0.51, 0.80) KRP 0.31 (CI 0.11, 0.52) 78 Population projection Table C.5 Quasi-extinction probability (for each of the 18 populations coming from parks with at least five years of data) for the projected window of 2020-2030. National park Species Quasi-extinction probability KRP P. monticola 0.00 KRP C. ogilbyi 0.00 KRP C. silvicultor 0.05 NNNP C. callipygus 0.00 NNNP C. dorsalis 0.00 NNNP C. leucogaster 0.07 NNNP P. monticola 0.00 NNNP C. nigrifrons 0.00 NNNP T. spekii 0.24 NNNP C. silvicultor 0.23 UDZ C. harveyi 0.00 UDZ N. moschatus 0.01 UDZ T. scriptus 0.24 UDZ C. spadix 0.01 BIF C. nigrifrons 0.04 BIF T. scriptus 0.03 BIF T. spekii 0.23 BIF C. silvicultor 0.01 79 APPENDIX D: Simulation code, and output Simulation study script Attached here is the R script to run the simulation study. #-----------# #-Libraries-# #-----------# library(spatstat) library(RandomFields) library(INLA) library(rgeos) library(TMB) library(gridExtra) #-----------# #-Functions-# #-----------# #Function to create dual mesh (original code from Krainski et al. 2018) source("rDualMesh.R") #Logit function logit <- function(pp) { log(pp) - log(1-pp) } #Inverse logit expit <- function(eta) { 1/(1+exp(-eta)) } 80 #-------------------------# #-Spatio-temporal domains-# #-------------------------# #Full spatio-temporal extent win <- owin(c(0, 3), c(0, 3)) #Window loc.d <- 3 * cbind(c(0, 1, 1, 0, 0), c(0, 0, 1, 1, 0)) #Coordinates #Domain 1 win1 <- owin(c(1, 2.75), c(0.25, 2)) #Domain 2 win2 <- owin(c(0, 3), c(0, 3), poly = list(x=c(0.25,0.75,0.75,2.75,2.75,0.25), y=c(0.25,0.25,2.25,2.25,2.75,2.75))) #Number of pixels npix <- 300 spatstat.options(npixel = npix) #Create triangulated mesh over domain mesh <- inla.mesh.2d(loc.domain = loc.d, offset = c(0.3, 1), max.edge = c(0.3, 0.7), cutoff = 0.05) # mesh1 <- inla.mesh.2d(boundary = inla.sp2segment(as(win1, "SpatialPolygons")), max.edge = 0.3, cutoff = 0.05) # # mesh2 <- inla.mesh.2d(boundary = inla.sp2segment(as(win2, "SpatialPolygons")), max.edge = 0.3, cutoff = 0.05) #X range of mesh x0 <- seq(min(mesh$loc[, 1]), max(mesh$loc[, 1]), length = npix) 81 #Y range of mesh y0 <- seq(min(mesh$loc[, 2]), max(mesh$loc[, 2]), length = npix) #Mesh window Mwin <- owin(c(min(x0), max(x0)), c(min(y0), max(y0))) #---------------# #-Simulate LGCP-# #---------------# #Parameters of intensity function beta <- c(6, 0.5) #Change in intensity delta <- c(0.5) #Parameters of thinning function alpha <- c(logit(0.01), 0.85) #Ecological covariate set.seed(100) RandomFields::RFoptions(spConform=FALSE) xcov1 <- RFsimulate(model = RMgauss(scale = 1.25), x = x0, y = y0, grid = TRUE) xcov2 <- xcov1 + RFsimulate(model = RMgauss(var = 0.5, scale = 9.5), x = x0, y = y0, grid = TRUE) x1 <- (xcov1 - mean(c(xcov1, xcov2))/sd(c(xcov1, xcov2))) x2 <- (xcov2 - mean(c(xcov1, xcov2))/sd(c(xcov1, xcov2))) #Thinning covariate pcov <- RFsimulate(model = RMgauss(scale = 1), x = x0, y = y0, grid = TRUE) pcov <- (pcov - min(pcov))/sd(pcov) set.seed(NULL) 82 #Range of Matern covaraince range <- 1.2 #Scale of Matern covariance kappa <- sqrt(8)/range #Variance of Matern covaraince sigma2 <- 0.2 #Precision of Matern covariance tau <- sqrt(4*pi*kappa*kappa*sigma2) #Smoothness of Matern covaraince nu <- 1 #Simulate spatial covariance zcov <- RFsimulate(model = RMmatern(var = sigma2, scale = kappa, nu = nu), x = x0, y = y0, grid = TRUE) #Intensity function X1 <- as.im(x1, W = Mwin)[win] X2 <- as.im(x2, W = Mwin)[win] z <- as.im(zcov, W = Mwin)[win] lambda1 <- exp(beta[1] + beta[2] * X1 + z) lambda2 <- exp(beta[1] + delta[1] + beta[2] * X2 + z) #Format intensity function lambda1 <- as.im(lambda1, W=win) lambda2 <- as.im(lambda2, W=win) #Simulate LGCP PP1 <- rpoispp(lambda1)[win] PP2 <- rpoispp(lambda2)[win] #---------------------------------# #-Simulate opportunistic sampling-# #---------------------------------# 83 #Thinning function thin <- alpha[1] + alpha[2] * pcov thin <- expit(thin) #Format thinning function thin <- as.im(thin, W=Mwin)[win] #Latent observations Z1 <- rbinom(n = PP1$n, size = 1, prob = thin[PP1]) PO1 <- PP1[Z1==1] PO1 <- PO1[win1] Z2 <- rbinom(n = PP2$n, size = 1, prob = thin[PP2]) PO2 <- PP2[Z2==1] PO2 <- PO2[win2] #-----------------# #-Simulate counts-# #-----------------# #Number of counts ncount <- 25 #Count locations u.loc <- expand.grid(seq(1 + 0.175, 2.75 - 0.175, 0.35), seq(0.25 + 0.175, 2 - 0.175, 0.35)) #Latent abundance N <- NULL #Counts count <- rep(0, ncount) 84 #Covariate value @ location Cov <- NULL #Unit area unit.A <- NULL for(j in 1:ncount){ unit <- disc(radius = 0.15, centre = as.numeric(u.loc[j,])) count[j] <- PP1[unit]$n count[j] <- rbinom(1, count[j], 0.9) Cov[j] <- mean(as.im(x1, W=Mwin)[unit]) unit.A[j] <- area(unit) } #Count dataframe countdf <- data.frame(count, Cov, unit.A, u.loc) #---------------# #-SPDE approach-# #---------------# #Update mesh mesh <- inla.mesh.2d(boundary = loc.d, max.edge = 0.3, cutoff = 0.05) #Create SPDE objects (Matern covariance) spde <- inla.spde2.matern(mesh) #Create dual mesh for weights dmesh <- book.mesh.dual(mesh) #Format location domain domain.polys <- Polygons(list(Polygon(loc.d)), '0') domainSP <- SpatialPolygons(list(domain.polys)) 85 #Weights of each node (area of dual mesh) weight <- sapply(1:length(dmesh), function(i) { if (gIntersects(dmesh[i, ], domainSP)) return(gArea(gIntersection(dmesh[i, ], domainSP))) else return(0) }) rm(dmesh) #--------------# #-Compile Data-# #--------------# #Node index nodes <- mesh$n #Observation index nobs <- sum(PO1$n, PO2$n) ncount <- dim(countdf)[1] #Period index t_i <- rep(0:1, c(PO1$n, PO2$n)) t_n <- 2 #Counts counts <- countdf$count #Area @ each count location Area <- countdf$unit.A #Covaraites @ nodes nloc <- as.ppp(mesh$loc[,1:2], W = Mwin) 86 nCov <- matrix(NA, nrow = nodes, ncol = t_n) nCov[,1] <- as.im(x1, W = Mwin)[nloc] nCov[,2] <- as.im(x2, W = Mwin)[nloc] nBias <- as.im(pcov, W = Mwin)[nloc] #Covariates @ obs Cov <- c(as.im(x1, W = Mwin)[PO1], as.im(x2, W = Mwin)[PO2], countdf$Cov) Bias <- c(as.im(pcov, W = Mwin)[PO1], as.im(pcov, W = Mwin)[PO2]) #Location of obsevations locxy <- rbind(cbind(PO1$x, PO1$y)[,2:1], cbind(PO2$x, PO2$y)[,2:1], as.matrix(countdf[,4:5])) #Projection matrix of observations A <- as(inla.spde.make.A(mesh, locxy), "dgTMatrix") #------------# #-Prediction-# #------------# Grid <- as.matrix(expand.grid(seq(0.01,2.99,0.01), seq(0.01,2.99,0.01))) npred <- dim(Grid)[1] Apred <- as(inla.spde.make.A(mesh, Grid), "dgTMatrix") predX <- matrix(NA, ncol = t_n, nrow = npred) predX[,1] <- interp.im(X1, x = Grid[,1], y = Grid[,2]) predX[,2] <- interp.im(X2, x = Grid[,1], y = Grid[,2]) #------------------# 87 #-Integrated Model-# #------------------# #Compile TMB code (only once) compile("Simulation.cpp") #Load TMB code dyn.load(dynlib("Simulation")) #Compile data data <- list("nodes" = nodes, "nobs" = nobs, "ncount" = ncount, "t_i" = t_i, "t_n" = t_n, "weight" = weight, "area" = Area, "A" = A, "counts" = counts, "nBias" = nBias, "nCov" = nCov, "Bias" = Bias, "Cov" = Cov, "spde" = spde$param.inla[c("M0","M1","M2")], "npred" = npred, "Apred" = Apred, "predX" = predX) #PHASE 1: Fit fixed effects #Parameters to estimate params <- list("beta0" = 0, "beta1" = 1, "delta1" = 1, "alpha0" = logit(0.01), "alpha1" = 0, "log_kappa" = log(sqrt(8)/range), "log_tau_O" = 1, "omega" = rep(0, mesh$n)) #Define random effects random = c("omega") #Map map <- list( "log_kappa" = as.factor(NA), "log_tau_O" = as.factor(NA), "omega" = factor(rep(NA, mesh$n))) 88 #Make AD objective function obj1 <- MakeADFun(data = data, parameters = params, random = random, map = map, DLL="Simulation") #Trace parameters obj1$env$tracepar <- TRUE #Minimize objective function opt1 <- nlminb(obj1$par, obj1$fn, obj1$gr) #Calculate standard deviations out1 <- sdreport(obj1) #PHASE 2: Fit random effects params <- list("beta0" = opt1$par[1], "beta1" = opt1$par[2], "delta1" = opt1$par[3], "alpha0" = opt1$par[4], "alpha1" = opt1$par[5], "log_kappa" = log(kappa), "log_tau_O" = 1, "omega" = rep(0, mesh$n)) #Map map <- list( "beta0" = as.factor(NA), "beta1" = as.factor(NA), "delta1" = as.factor(NA), "alpha0" = as.factor(NA), "alpha1" = as.factor(NA)) #Make AD objective function obj2 <- MakeADFun(data = data, parameters = params, random = random, map = map, DLL="Simulation") #Trace parameters 89 obj2$env$tracepar <- TRUE #Minimize objective function opt2 <- nlminb(obj2$par, obj2$fn, obj2$gr) #Calculate standard deviations out2 <- sdreport(obj2) #Store output output <- as.data.frame(round(cbind(Truth = c(beta, delta, alpha, PP1$n, PP2$n, kappa, tau), rbind(summary(out1)[!grepl("omega|sigma_O|tau_O|kappa", rownames(summary(out1))),], summary(out2)[grepl("kappa|tau_O", rownames(summary(out2))),])), digits = 2)) colnames(output)[2:3] <- c("IM Estimate", "IM Std. Error") #---------------------# #-Presence-only model-# #---------------------# #Compile TMB code (only once) compile("Simulation_PO.cpp") #Load TMB code dyn.load(dynlib("Simulation_PO")) #Compile data data <- list("nodes" = nodes, "nobs" = nobs, "t_i" = t_i, "t_n" = t_n, "weight" = weight, "A" = A[1:nobs,], "nBias" = nBias, "nCov" = nCov, "Bias" = Bias, "Cov" = Cov[1:nobs], "spde" = spde$param.inla[c("M0","M1","M2")], 90 "npred" = npred, "Apred" = Apred, "predX" = predX) #PHASE 1: Fit fixed effects #Parameters to estimate params <- list("beta0" = 0, "beta1" = 1, "delta1" = 1, "alpha0" = logit(0.01), "alpha1" = 0, "log_kappa" = log(sqrt(8)/range), "log_tau_O" = 1, "omega" = rep(0, mesh$n)) #Map map <- list( "log_kappa" = as.factor(NA), "log_tau_O" = as.factor(NA), "omega" = factor(rep(NA, mesh$n))) #Make AD objective function obj3 <- MakeADFun(data = data, parameters = params, random = random, map = map, DLL="Simulation_PO") #Trace parameters obj3$env$tracepar <- TRUE #Minimize objective function opt3 <- nlminb(obj3$par, obj3$fn, obj3$gr) #Calculate standard deviations out3 <- sdreport(obj3) params <- list("beta0" = opt1$par[1], "beta1" = opt1$par[2], "delta1" = opt1$par[3], "alpha0" = opt1$par[4], "alpha1" = opt1$par[5], "log_kappa" = log(sqrt(8)/range), "log_tau_O" = 1, "omega" = rep(0, mesh$n)) 91 #Map map <- list( "beta0" = as.factor(NA), "beta1" = as.factor(NA), "delta1" = as.factor(NA), "alpha0" = as.factor(NA), "alpha1" = as.factor(NA)) #Define random effects random = c("omega") #Make AD objective function obj4 <- MakeADFun(data = data, parameters = params, map = map, random = random, DLL="Simulation_PO") #Trace parameters obj4$env$tracepar <- TRUE #Minimize objective function opt4 <- nlminb(obj4$par, obj4$fn, obj4$gr) #Calculate standard deviations out4 <- sdreport(obj4) #Store output output <- merge(output, as.data.frame(round(rbind(summary(out3)[!grepl("omega|sigma_O|tau_O|kappa", rownames(summary(out3))),], summary(out4)[grepl("kappa|tau_O", rownames(summary(out4))),]), digits = 2)), by='row.names', all=TRUE) colnames(output)[5:6] <- c("PO Estimate", "PO Std. Error") 92 output$Row.names[7:8] <- c("N[1]", "N[2]") rownames(output) <- output$Row.names output <- output[,-1] convergence <- data.frame("pdHess" = c(out1$pdHess, out2$pdHess, out3$pdHess, out4$pdHess), "grd.size" = c(all(abs(out1$gradient.fixed) < 0.01), all(abs(out2$gradient.fixed) < 0.01), all(abs(out3$gradient.fixed) < 0.01), all(abs(out4$gradient.fixed) < 0.01))) rownames(convergence) <- c("IM_fixed", "IM_random", "PO_fixed", "PO_random") out <- list(output, convergence) message("finished") #-----------# #-Save file-# #-----------# ID <- length(list.files("./Output/")) + 1 save(out, file = paste("./Output/output", ID, ".Rds", sep="")) tmp <- list(summary(out1), summary(out2), summary(out3), summary(out4)) save(tmp, file = "tmp_out.Rds") 93 Template Model Builder code for simulation #include // Integrated Log Gaussian Cox Process template Type objective_function::operator() () { // objective function -- joint negative log-likelihood using namespace R_inla; using namespace density; using namespace Eigen; // DATA // // Indices DATA_INTEGER( nodes ); // Number of nodes DATA_INTEGER( nobs ); // Number of presence-only observations DATA_INTEGER( ncount ); // Number of structured sampling sites DATA_INTEGER( t_n ); // Number of stages DATA_IVECTOR( t_i ); // Stage ID for data // Projection & weight data DATA_VECTOR( weight ); // Node weight DATA_VECTOR( area ); // Area sweept for counts DATA_SPARSE_MATRIX( A ); // Projection matrix // Count dataset DATA_VECTOR( counts ); // Number of counts per site // Covaraite dataset DATA_VECTOR( nBias ); //Sampling bias @ nodes DATA_MATRIX( nCov ); //Environmental covariate @ nodes 94 DATA_VECTOR( Bias ); //Sampling bias @ observations DATA_VECTOR( Cov ); //Environmental covariate @ data // SPDE objects DATA_STRUCT(spde,spde_t); // Sparse matrix for Matern covariance structure // Prediction DATA_INTEGER( npred ); // Number of pixels for prediction DATA_MATRIX( predX ); // Environmental covariate for prediction DATA_SPARSE_MATRIX( Apred ); // Projection matrix for prediction // PARAMETERS // // Fixed effects PARAMETER( beta0 ); // Baseline population density PARAMETER( beta1 ); // Effect of environmental covariate PARAMETER( delta1 ); // Effect of stage (population change) PARAMETER( alpha0 ); // Thinning function intercept PARAMETER( alpha1 ); // Effect of sampling bias PARAMETER( log_kappa ); // Scale parameter of Matern covariance PARAMETER( log_tau_O ); // Precision parameter of Matern covariance // Random effects PARAMETER_VECTOR( omega ); // Spatial random effect // Population density at each stage vector beta(t_n); beta(0) = beta0; beta(1) = beta0 + delta1; // Derived parameters Type kappa = exp(log_kappa); Type tau_O = exp(log_tau_O); 95 Type range = sqrt(8)/kappa; Type sigma_O = 1/sqrt(4*PI*tau_O*tau_O*kappa*kappa); vector jnll_comp(4); jnll_comp.setZero(); // Probability of random effects SparseMatrix Q = Q_spde(spde,kappa); // Matern covariance (see R_inla namespace) jnll_comp(0) += GMRF(Q)( omega ); // Holding values vector Omega(nodes); vector omg(nobs + ncount); // Transform GMRFs for(int k=0; k lambda(ncount); for(int j=0; j predO(npred); predO = Apred * Omega; vector pred(t_n); vector Npred(t_n); for(int t=0; t% filter(period == 1))), "dgTMatrix") A2 <- as(inla.spde.make.A(mesh[[2]], st_coordinates(Data %>% filter(period == 2))), "dgTMatrix") A3 <- as(inla.spde.make.A(mesh[[3]], st_coordinates(Data %>% filter(period == 3))), "dgTMatrix") #--------------# #-Compile data-# #--------------# #Number of years t_n <- Data %>% summarize(out = n_distinct(yr)) %>% select(out) %>% .$out #Number of stages p_n <- Data %>% summarize(out = n_distinct(period)) %>% select(out) %>% .$out #Number of nodes in early spring nodes1 <- mesh[[1]]$n #Number of nodes in late spring nodes2 <- mesh[[2]]$n #Number of nodes in early summer nodes3 <- mesh[[3]]$n #Number of nodes across stages and years nodes <- as.integer(t_n*(nodes1 + nodes2 + nodes3)) #MOVE TO NODE FORMATTING NodeDF$period <- as.factor(rep(1:3, c(nodes1 * 3, nodes2 * 3, nodes3 * 3))) #Year-stage identifier for nodes tp_k <- as.integer(fct_cross(NodeDF$period, NodeDF$Year)) - 1 #Stage identifier for nodes p_k <- as.integer(NodeDF$period) - 1 #Number of presence-only observations nobs <- dim(Data %>% filter(type == "obs"))[1] #Number of presence-only observations in early spring nobs1 <- dim(Data %>% filter(type == "obs" & period == 1))[1] #Number of presence-only observations in late spring nobs2 <- dim(Data %>% filter(type == "obs" & period == 2))[1] 103 #Number of presence-only observations in early summer nobs3 <- dim(Data %>% filter(type == "obs" & period == 3))[1] #Number of single-visit sites ncount <- dim(Data %>% filter(type == "count"))[1] #Number of single-visit sites in early spring ncount1 <- dim(Data %>% filter(type == "count" & period == 1))[1] #Number of single-visit sites in late spring ncount2 <- dim(Data %>% filter(type == "count" & period == 2))[1] #Number of single-visit sites in early summer ncount3 <- dim(Data %>% filter(type == "count" & period == 3))[1] #Single-visit counts @ each site counts <- as.numeric(Data %>% filter(type == "count") %>% select(count) %>% .$count) #Year-stage identifier for data tp_i <- as.integer(fct_cross(as.factor(Data$period), as.factor(Data$yr))) - 1 #CHECK THIS #Stage identifier for data p_i <- as.integer(as.factor(Data$period)) - 1 #Weight of each node @ 100 m^2 weight <- rep(c(weight1, weight2, weight3), t_n) * 1e4 #Area/effort offset area <- as.numeric(Data %>% filter(type == "count") %>% select(effort) %>% .$effort) #Covert to 100 m^2 area <- area*6*1000*1e-2 #Observation covariates Data <- Data %>% mutate(daysaccum = as.numeric(as.Date(paste0(yr, "-", mo_day)) - as.Date(paste0(yr, ifelse(period == 1, "-03-01", ifelse(period == 2, "-03-22", "-04-26"))))) + 1, gdd.avg = gdd2/daysaccum) #Normalized Difference Vegetation Index @ data NDVI <- Data$NDVI #Averaged daily growing degree days @ data GDD <- Data$gdd.avg #Number of other butterfly observations @ data Bias <- as.numeric(Data %>% filter(type == "obs") %>% select(Bias) %>% .$Bias) #Population density @ data 104 PopD <- as.numeric(Data %>% filter(type == "obs") %>% select(PopD) %>% .$PopD) #Node covariates NodeDF <- NodeDF %>% mutate(daysaccum = ifelse(period == 1, 35, 42), gdd.avg = gdd2/daysaccum) #Normalized Difference Vegetation Index @ nodes nNDVI <- NodeDF$NDVI #Averaged daily growing degree days @ nodes nGDD <- NodeDF$gdd.avg #Number of other butterfly observations @ nodes nBias <- NodeDF$Bias #Population density @ nodes nPopD <- NodeDF$PopD #Scale covaraites for both data & nodes tmp1 <- (NDVI - mean(c(NDVI, nNDVI)))/sd(c(NDVI, nNDVI)) tmp2 <- (GDD - mean(c(GDD, nGDD)))/sd(c(GDD, nGDD)) tmp3 <- (Bias - mean(c(Bias, nBias)))/sd(c(Bias, nBias)) tmp4 <- (PopD - mean(c(PopD, nPopD)))/sd(c(PopD, nPopD)) tmp5 <- (nNDVI - mean(c(NDVI, nNDVI)))/sd(c(NDVI, nNDVI)) tmp6 <- (nGDD - mean(c(GDD, nGDD)))/sd(c(GDD, nGDD)) tmp7 <- (nBias - mean(c(Bias, nBias)))/sd(c(Bias, nBias)) tmp8 <- (nPopD - mean(c(PopD, nPopD)))/sd(c(PopD, nPopD)) NDVI <- tmp1 GDD <- tmp2 Bias <- tmp3 PopD <- tmp4 nNDVI <- tmp5 nGDD <- tmp6 nBias <- tmp7 nPopD <- tmp8 #--------------# #-Optimize nll-# #--------------# #Compile TMB code (only once) # compile("./DataAnalysis/IntegratedModel_Final.cpp") #Load TMB code dyn.load(dynlib("./DataAnalysis/IntegratedModel_Final")) #PHASE 1: Fit fixed effects #Compile data for TMB data <- list("nodes1" = nodes1, "nodes2" = nodes2, "nodes3" = nodes3, "nobs1" = nobs1, "nobs2" = nobs2, "nobs3" = nobs3, 105 "ncount" = ncount, "ncount1" = ncount1, "ncount2" = ncount2, "ncount3" = ncount3, "counts" = counts, "t_n" = t_n, "p_n" = p_n, "tp_k" = tp_k, "tp_i" = tp_i, "p_i" = p_i, "p_k" = p_k, "weight" = weight, "area" = area, "A1" = A1, "spde1" = spde1$param.inla[c("M0","M1","M2")], "A2" = A2, "spde2" = spde2$param.inla[c("M0","M1","M2")], "A3" = A3, "spde3" = spde3$param.inla[c("M0","M1","M2")], "nBias" = nBias, "nPopD" = nPopD, "nNDVI" = nNDVI, "nGDD" = nGDD, "Bias" = Bias, "PopD" = PopD, "NDVI" = NDVI, "GDD" = GDD) #Parameters to estimate params <- list("beta0" = 0, "beta1" = rep(0,3), "beta2" = rep(0,3), "beta3" = rep(0,3), "beta4" = rep(0,3), "delta1" = 0, "delta2" = 0, "gamma1" = 0, "gamma2" = 0, "alpha0" = 0, "alpha1" = rep(0,9), "alpha2" = rep(0,9), "log_kappa" = 0, "log_tau" = 1, "omega1" = rep(0, nodes1), "omega2" = rep(0, nodes2), "omega3" = rep(0, nodes3)) #Hold random effects constant map <- list( "log_kappa" = as.factor(NA), "log_tau" = as.factor(NA), "omega1" = factor(rep(NA, nodes1)), "omega2" = factor(rep(NA, nodes2)), "omega3" = factor(rep(NA, nodes3)) ) #Random effects random <- c("omega1", "omega2", "omega3") #Make AD objective function obj1 <- MakeADFun(data = data, parameters = params, random = random, map = map, DLL="IntegratedModel_Final") #Trace parameters obj1$env$tracepar <- TRUE #Minimize objective function opt1 <- nlminb(obj1$par, obj1$fn, obj1$gr) #AIC 2 * length(obj1$par) - 2 * (-1 * sum(obj1$report()$jnll_comp[3:12])) #Output out1 <- sdreport(obj1) summary(out1) #Extract parameter values beta0 <- out1$par.fixed[1] 106 beta1 <- out1$par.fixed[2:4] beta2 <- out1$par.fixed[5:7] beta3 <- out1$par.fixed[8:10] beta4 <- out1$par.fixed[11:13] delta1 <- out1$par.fixed[14] delta2 <- out1$par.fixed[15] gamma1 <- out1$par.fixed[16] gamma2 <- out1$par.fixed[17] #Expected population densities per stage and year beta <- NULL #2016 stage 1 (early spring) beta[1] <- beta0 + gamma1 + gamma2 + delta1 + delta2 #2016 stage 2 (late spring) beta[2] <- beta0 + gamma1 + delta1 + delta2 #2016 stage 3 (early summer) beta[3] <- beta0 + delta1 + delta2 #2017 stage 1 (early spring) beta[4] <- beta0 + gamma1 + gamma2 + delta1 #2017 stage 2 (late spring) beta[5] <- beta0 + gamma1 + delta1 #2017 stage 3 (early summer) beta[6] <- beta0 + delta1 #2018 stage 1 (early spring) beta[7] <- beta0 + gamma1 + gamma2 #2018 stage 2 (late spring) beta[8] <- beta0 + gamma1 #2018 stage 3 (early summer) beta[9] <- beta0 #Compile output Output <- data.frame(year = factor(rep(2016:2018, each = 3)), period = factor(rep(1:3, 3)), mean.den = beta) Output$lower.den <- NA Output$upper.den <- NA #Profile confidence intervals Output[1,4:5] <- confint(tmbprofile(obj1, lincomb = as.numeric(names(obj1$par) %in% c("beta0","gamma1","gamma2","delta1","delta2")))) Output[2,4:5] <- confint(tmbprofile(obj1, lincomb = as.numeric(names(obj1$par) %in% c("beta0","gamma1","delta1","delta2")))) Output[3,4:5] <- confint(tmbprofile(obj1, lincomb = as.numeric(names(obj1$par) %in% c("beta0","delta1","delta2")))) Output[4,4:5] <- confint(tmbprofile(obj1, lincomb = as.numeric(names(obj1$par) %in% c("beta0","gamma1","gamma2","delta1")))) 107 Output[5,4:5] <- confint(tmbprofile(obj1, lincomb = as.numeric(names(obj1$par) %in% c("beta0","gamma1","delta1")))) Output[6,4:5] <- confint(tmbprofile(obj1, lincomb = as.numeric(names(obj1$par) %in% c("beta0","delta1")))) Output[7,4:5] <- confint(tmbprofile(obj1, lincomb = as.numeric(names(obj1$par) %in% c("beta0","gamma1","gamma2")))) Output[8,4:5] <- confint(tmbprofile(obj1, lincomb = as.numeric(names(obj1$par) %in% c("beta0","gamma1")))) Output[9,4:5] <- confint(tmbprofile(obj1, lincomb = as.numeric(names(obj1$par) %in% c("beta0")))) #PHASE 2: Fit random effects #Parameters to estimate params2 <- list("beta0" = beta0, "beta1" = beta1, "beta2" = beta2, "beta3" = beta3, "beta4" = beta4, "delta1" = delta1, "delta2" = delta2, "gamma1" = gamma1, "gamma2" = gamma2, "alpha0" = out1$par.fixed[18], "alpha1" = out1$par.fixed[19:27], "alpha2" = out1$par.fixed[28:36], "log_kappa" = 0, "log_tau" = 1, "omega1" = rep(0, nodes1), "omega2" = rep(0, nodes2), "omega3" = rep(0, nodes3)) #Hold fixed effects constant map2 <- list( "alpha0" = as.factor(NA), "alpha1" = factor(rep(NA, 9)), "alpha2" = factor(rep(NA, 9)), "beta0" = as.factor(NA), "beta1" = factor(rep(NA, 3)), "beta2" = factor(rep(NA, 3)), "beta3" = factor(rep(NA, 3)), "beta4" = factor(rep(NA, 3)), "delta1" = as.factor(NA), "delta2" = as.factor(NA), "gamma1" = as.factor(NA), "gamma2" = as.factor(NA) ) #Make AD objective function obj2 <- MakeADFun(data = data, parameters = params2, map = map2, random = random, DLL="IntegratedModel_Final") #Trace parameters obj2$env$tracepar <- TRUE #Minimize objective function opt2 <- nlminb(obj2$par, obj2$fn, obj2$gr) #Output out2 <- sdreport(obj2) #Estimated parameter values 108 Effect <- data.frame(param = rep(c("beta1", "beta2", "beta3", "beta4"), each = 3), period = factor(rep(1:3, 4)), mean.den = c(beta1, beta2, beta3, beta4)) Effect$lower <- NA Effect$upper <- NA Effect[1,4:5] <- confint(tmbprofile(obj1, lincomb = c(0,1,rep(0,34)))) Effect[2,4:5] <- confint(tmbprofile(obj1, lincomb = c(0,0,1,rep(0,33)))) Effect[3,4:5] <- confint(tmbprofile(obj1, lincomb = c(0,0,0,1,rep(0,32)))) Effect[4,4:5] <- confint(tmbprofile(obj1, lincomb = c(0,0,0,0,1,rep(0,31)))) Effect[5,4:5] <- confint(tmbprofile(obj1, lincomb = c(0,0,0,0,0,1,rep(0,30)))) Effect[6,4:5] <- confint(tmbprofile(obj1, lincomb = c(rep(0,6),1,rep(0,29)))) Effect[7,4:5] <- confint(tmbprofile(obj1, lincomb = c(rep(0,7),1,rep(0,28)))) Effect[8,4:5] <- confint(tmbprofile(obj1, lincomb = c(rep(0,8),1,rep(0,27)))) Effect[9,4:5] <- confint(tmbprofile(obj1, lincomb = c(rep(0,9),1,rep(0,26)))) Effect[10,4:5] <- confint(tmbprofile(obj1, lincomb = c(rep(0,10),1,rep(0,25)))) Effect[11,4:5] <- confint(tmbprofile(obj1, lincomb = c(rep(0,11),1,rep(0,24)))) Effect[12,4:5] <- confint(tmbprofile(obj1, lincomb = c(rep(0,12),1,rep(0,23)))) Template Model Builder code for case study The TMB code to generate the objective function containing the negative log likelihood for optimization. #include // Integrated Log Gaussian Cox Process template Type objective_function::operator() () { // Objective function -- joint negative log-likelihood using namespace R_inla; using namespace density; using namespace Eigen; // Indices DATA_INTEGER( nodes1 ); // Number of nodes in early spring DATA_INTEGER( nodes2 ); // Number of nodes in late spring DATA_INTEGER( nodes3 ); // Number of nodes in early summer DATA_INTEGER( nobs1 ); // Number of presence-only observations in early spring DATA_INTEGER( nobs2 ); // Number of presence-only observations in late spring DATA_INTEGER( nobs3 ); // Number of presence-only observations in early summer DATA_INTEGER( ncount ); // Number of structured sampling sites DATA_INTEGER( ncount1 ); // Number of structured sampling sites in early spring 109 DATA_INTEGER( ncount2 ); // Number of structured sampling sites in late spring DATA_INTEGER( ncount3 ); // Number of structured sampling sites in early summer DATA_INTEGER( t_n ); // Number of years DATA_INTEGER( p_n ); // Number of stages DATA_IVECTOR( tp_i ); // Index for year and stage combination for the data DATA_IVECTOR( tp_k ); // Index for year and stage combination for the nodes DATA_IVECTOR( p_i ); // Stage ID for the data DATA_IVECTOR( p_k ); // Stage ID for the nodes // Projection & weight data DATA_VECTOR( weight ); // Node weight DATA_VECTOR( area ); // Area sweept for counts DATA_SPARSE_MATRIX( A1 ); // Projection matrix early spring DATA_SPARSE_MATRIX( A2 ); // Projection matrix late spring DATA_SPARSE_MATRIX( A3 ); // Projection matrix early summer //Count data DATA_VECTOR( counts ); // Number of counts per site // Covaraite dataset DATA_VECTOR( nBias ); // Number of other butterfly observations @ nodes DATA_VECTOR( nPopD ); // Human population density @ nodes DATA_VECTOR( nNDVI ); // NDVI @ nodes DATA_VECTOR( nGDD ); // GDD @ nodes DATA_VECTOR( Bias ); // Number of other butterfly observations @ data DATA_VECTOR( PopD ); // Human population density @ data DATA_VECTOR( NDVI ); // NDVI @ data DATA_VECTOR( GDD ); // GDD @ data // SPDE objects DATA_STRUCT(spde1,spde_t); // Sparse matrix for Matern covariance structure in early spring DATA_STRUCT(spde2,spde_t); // Sparse matrix for Matern covariance structure in late spring DATA_STRUCT(spde3,spde_t); // Sparse matrix for Matern covariance structure in early summer // Fixed effects PARAMETER( beta0 ); // Baseline population density PARAMETER_VECTOR( beta1 ); // Effect of NDVI PARAMETER_VECTOR( beta2 ); // Quadratic effect of NDVI PARAMETER_VECTOR( beta3 ); // Effect of GDD PARAMETER_VECTOR( beta4 ); // Quadratic effect of GDD PARAMETER( delta1 ); // Effect of 2016 PARAMETER( delta2 ); // Effect of 2017 PARAMETER( gamma1 ); // Effect of early spring PARAMETER( gamma2 ); // Effect of late spring PARAMETER( alpha0 ); // Thinning function intercept PARAMETER_VECTOR( alpha1 ); // Effect of number of other butterfly observations PARAMETER_VECTOR( alpha2 ); // Effect of human population density PARAMETER( log_kappa ); // Scale parameter of Matern covariance 110 PARAMETER( log_tau ); // Precision parameter of Matern covariance // Random effects PARAMETER_VECTOR( omega1 ); // Spatial random effect in early spring PARAMETER_VECTOR( omega2 ); // Spatial random effect in late spring PARAMETER_VECTOR( omega3 ); // Spatial random effect in early summer // Population density at each stage and year vector beta(t_n * p_n); beta(0) = beta0 + gamma1 + gamma2 + delta1 + delta2; //2016 early spring beta(1) = beta0 + gamma1 + delta1 + delta2; //2016 late spring beta(2) = beta0 + delta1 + delta2; //2016 early summer beta(3) = beta0 + gamma1 + gamma2 + delta1; //2017 early spring beta(4) = beta0 + gamma1 + delta1; //2017 late spring beta(5) = beta0 + delta1; //2017 early summer beta(6) = beta0 + gamma1 + gamma2; //2018 early spring beta(7) = beta0 + gamma1; //2018 late spring beta(8) = beta0; //2018 early summer // Derived parameters for computational purposes Type kappa = exp(log_kappa); Type tau = exp(log_tau); Type range = sqrt(8)/kappa; Type sigma = 1/sqrt(4*PI*tau*tau*kappa*kappa); vector jnll_comp(12); jnll_comp.setZero(); // Probability of random effects SparseMatrix Q1 = Q_spde(spde1,kappa); SparseMatrix Q2 = Q_spde(spde2,kappa); SparseMatrix Q3 = Q_spde(spde3,kappa); jnll_comp(0) += GMRF(Q1)( omega1 ); jnll_comp(1) += GMRF(Q2)( omega2 ); jnll_comp(2) += GMRF(Q3)( omega3 ); // Holding values vector Omega1(nodes1); vector Omega2(nodes2); vector Omega3(nodes3); vector omg1(nobs1 + ncount1); vector omg2(nobs2 + ncount2); vector omg3(nobs3 + ncount3); // Transform GMRFs //Early spring for(int k=0; k lambda(ncount); int f=0; for(int i=0; i