EVALUATING THE PERFORMANCE AND APPLICABILITY OF STATE-SPACE STOCK ASSESSMENT MODELS By Emily Morgan Liljestrand A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Fisheries and Wildlife – Doctor of Philosophy Ecology, Evolutionary Biology and Behavior – Dual Major 2023 ABSTRACT Fisheries stock assessment models are used to estimate population demographics and dynamics such as abundance, biomass, and fishing mortality from input fishery data including total catch, composition of catch, and fishing effort. A goal of stock assessment is to accurately quantify population and fishery dynamics so stocks can be managed to achieve fishery objectives and long-term sustainability. Accurate and precise model estimates can be attained by using models and techniques that account for ecological complexity like variability in quantities across ages, years, or regions without overparameterization. The state-space framework is one such statistical technique that may allow for incorporating more stochasticity such that the model can better reflect reality. The state-space modeling framework assumes that unobserved “states” develop over time due to process error modeled as a random effect and that observed data have expected values based on these states but differ from expectations due to observation error. State-space stock assessment models (SSSAM) have experienced an outpouring of research and application in the past decade as computation processing power and novel software has facilitated the approximation of the high-level integrals necessary for SSSAM. SSSAM allows for several time-varying processes in recruitment, numbers at age, mortality, selectivity, and catchability, and has become an essential part of the contemporary fisheries modeling toolbox. With their swift advancement, it is important to understand best practices of applying state-space stock assessment models, and how data availability, the variability of process or observation error, and model structure may influence model results and accuracy. In Chapter 1 I built an age-based state-space stock assessment model that used fisheries dependent data, rather than fisheries independent surveys, as an index of abundance and was applied to Lake Michigan lake whitefish (Coregonus clupeaformis). The model predicted greater ii abundance and lower mortality compared to the non-state-space model and domed rather than asymptotic selectivity. The state-space model also had reduced retrospective patterns in recruitment. Chapters 2 and 3 each used a simulation-estimation framework to generate catch and index data using a state-space stock assessment that assumed process variability in recruitment, expected survival (abundance), and selectivity. Simulations were based on a Gulf of Maine haddock (Melanogrammus aeglefinus) model and assumed different degrees of process and observation variance (Chapter 2) or assumed observation error likelihood distributions for the proportions at age (Chapter 3) to generate data. Simulated data were input into several estimation models with alternative assumptions about contributing sources of process variability and/or observation error distributions. The results show that state-space models which assume several sources of process variability can produce unbiased estimates even when processes are constant over time. The state-space models were able to estimate process variance in several stochastic processes under a broad range of true values. However, assuming variability in expected survival when it is deterministic can lead to the model not converging. Unbiased results are achieved when the observation likelihood is structured to account for inter-age correlation and overdispersion though such a framework may have difficulty allocating variance between process and observation sub-models. The concluding chapter places this work in the context of other age-based stock assessment models and argues for the inclusion of state-space in the modeling toolbox, as they can account for multiple time-varying processes and be used in a broad range of data contexts. This work provides a blueprint for where and how SSSAM may be best utilized in the future, particularly with data limited or data poor stocks and in cases where the process variance is unknown and should be estimated within the model. iii ACKNOWLEDGMENTS I am deeply grateful to everyone who mentored me, guided me, taught me, or provided me advice and feedback during my graduate program at Michigan State University. From those who instructed me in hierarchical modeling in the classroom, to those who led me through the administrative difficulties of being a graduate student, to even those who showed me the best way to coax ketchup from a stubborn bottle, you know who you are and that you are in my thanks. I want to acknowledge my fellow members of the Quantitative Fisheries Center, graduate students and post-docs in the Fisheries and Wildlife and Ecology, Evolution, and Behavior programs at Michigan State University, and other NOAA/Sea Grant Population Dynamics Fellows (i.e., my “fellow fellows”). In particular, Grant Adams, Matt Damiano, Nick Fisch, and Amanda Hart have been consistent supporters and have given me extensive feedback and advice. Andie Perreault, Matt Cheng, and Andres Beita are stellar and have been generous with their time and expertise. Sara Hugentobler, Zac Johnston, and Miranda Wade thank you for providing my Michigan homes away from home and thanks Zac especially for teaching me to use the HPCC. Thank you to Ben Kline for being a constant source of optimism. Liz Stebbins, you are brilliant and the greatest roommate, coworker, and friend a person could ever ask for. Beth Gerstner, thank you for all the coffee dates. I could not have completed my work or shared it with quite as many people if not for the years of funding and support from all levels of the University. Thank you to my PhD committee, Travis Brenden, Mary Bremigan, and Elise Zipkin for your guidance and advice. Thank you to my many teachers and mentors in the Fisheries and Wildlife department who gave me the educational foundation to complete this work including Rique Campa, JP Steibel, Brian Roth, Mike Wagner, and Erin Dreelin. Members of the Modeling Sub-Committee of the 1836 Treaty iv Waters offered feedback and guidance. Ted Treska and Charles Perretti provided data. Stuart Ludsin at Ohio State University was an endless wellspring of wisdom and life advice. My work was made possible by the QFC Ricker Fellowship, MSU Graduate School Research Enhancement Award, NMFS-Sea Grant Joint Fellowship, The College of Agriculture and Natural Resources (CANR) Alumni Fellowship, the IAGLR David M. Dolan Scholarship, and the Fisheries and Wildlife Robert C. Ball and Betty A. Ball Fisheries and Wildlife Fellowship, as well as many travel grants, too many to name. Thank you to Jim Bence and Jon Deroba who have been patient and compassionate teachers, and my undeniable role models in mentorship. Your wisdom and passion for fisheries science is contagious. Everyone who works with Jon at the Northeast Fisheries Science Center during my “sabbatical” there were incredibly supporting, and I cannot thank Julie Nieland, Liz Brooks, Tim Miller, and Dan Hennen enough for creating a welcoming environment. And lastly, I could not have done this without the love and encouragement of my family. Avery Baker graciously offered me her home, her standing desk, and her dog Louie to take on much needed afternoon walks. My parents Howard Liljestrand and Blinda McClelland set me on my scientific path and gave me substantial financial and technical support. My brother Frasier prepared me for the realities and challenges of being a graduate student. And most importantly, thank you to my best friend and partner Laura de Castro Quaglia. Every word of encouragement, every time you restarted my stalled-out R code for me, and every cup of coffee you brewed made it possible for me to reach this goal. I owe everything to you. v TABLE OF CONTENTS LIST OF ABBREVIATIONS ...................................................................................................... vii INTRODUCTION ..........................................................................................................................1 REFERENCES ....................................................................................................................6 CHAPTER 1: APPLYING A NOVEL STATE-SPACE STOCK ASSESSMENT FRAMEWORK USING A FISHERIES-DEPENDENT INDEX OF FISHING MORTALITY ................................................................................................................................10 1.1 Abstract .........................................................................................................................10 1.2 Introduction ...................................................................................................................11 1.3 Methods ........................................................................................................................14 1.4 Results ...........................................................................................................................26 1.5 Discussion ....................................................................................................................35 REFERENCES ..................................................................................................................42 APPENDIX 1A: DATA COLLECTION AND PROCESSING ........................................49 APPENDIX 1B: CURRENT STATISTICAL CATCH AT AGE MODEL (SCA) ...........52 APPENDIX 1C: MODEL DIAGNOSTICS FOR THE STATE-SPACE MODEL ...........66 CHAPTER 2: THE EFFECT OF PROCESS VARIABILITY AND DATA QUALITY ON PERFORMANCE OF A STATE-SPACE STOCK ASSESSMENT MODEL ............................72 2.1 Abstract .........................................................................................................................72 2.2 Introduction ...................................................................................................................73 2.3 Methods ........................................................................................................................76 2.4 Results ...........................................................................................................................89 2.5 Discussion ..................................................................................................................102 REFERENCES ................................................................................................................108 APPENDIX 2A: INPUT OBSERVATION VARIANCE VALUES ...............................113 APPENDIX 2B: SUPPLEMENTARY FIGURES ...........................................................115 CHAPTER 3: THE PERFORMANCE OF ALTERNATIVE AGE COMPOSITION LIKELIHOODS IN STATE-SPACE MODELS WITH MULTIPLE SOURCES OF PROCESS VARIATION ..............................................................................................................................132 3.1 Abstract .......................................................................................................................132 3.2 Introduction .................................................................................................................133 3.3 Methods ......................................................................................................................136 3.4 Results .........................................................................................................................144 3.5 Discussion ..................................................................................................................164 REFERENCES ................................................................................................................169 APPENDIX 3A: PREDICTED VALUES AND SUPPLEMENTARY DATA ...............172 CONCLUSIONS AND RECOMMENDATIONS .....................................................................176 vi LIST OF ABBREVIATIONS SSSAM or SAM State-space Stock Assessment Model SCAA or SCA Statistical catch at age NEFSC Northeast Fisheries Science Center WHAM Woods Hole Assessment Model ASAP Age-structured Assessment Program SAM State-space Assessment Model CORA Chippewa Ottawa Resource Authority NOAA National Oceanic and Atmospheric Administration NMFS National Marine Fisheries Service ICES International Council for the Exploration of the Sea SSB Spawning Stock Biomass MN Multinomial Distribution DM Dirichlet-Multinomial Distribution LG Logistic Normal Distribution MVTW Multivariate Tweedie Distribution MVN Multivariate Lognormal Distribution kg Kilograms vii INTRODUCTION The aphorism “all models are wrong, some are useful” (attributed to statistician George Box) acknowledges that describing real-world systems as a collection of neat mathematical equations will inherently contain inaccuracies. Fisheries stock assessment models attempt to estimate the unobservable - abundance, natural mortality, exploitation rate - using observed catch and survey data by making explicit assumptions about how those observations are mathematically related to life history and fishery parameters. Though the resultant output will be “wrong” in some way, by designing models that account for as much of the natural system as is statistically and mathematically possible, we can strive to make the results more “useful” towards achieving long term fisheries management goals. The history of stock assessment modeling has been typified by progressively adding complexity and realism to the models as technology and data availability have permitted. The 1980s saw the development of the statistical catch at age model (SCAA), which assumed that observed data contained error as the result of observation variability or “observation error” which could be quantified with a statistical distribution. Models could then be fit by maximizing the likelihood of the data conditional on a set of parameter values (Fournier and Archibald, 1982). When such integrated models, (those that use several data sources to fit the model), became standard, scientists had to contend with data weighting (i.e., how closely to fit some data at the expense of poorly fitting others). Research focused on how to specify or calculate variance parameters of the likelihood function that would correctly assign variability (i.e., “weight”) among catch, survey indices of abundance, and/or proportions at age (Deriso et al., 2007; Maunder and Punt, 2013). Often, data were weighted a priori through iterative reweighting 1 algorithms, as a function of sampling trips, or based on expert judgement (Francis, 2011; Truesdell et al., 2017). Another important branch of stock assessment science aimed to introduce more stochasticity into previously constant or deterministic parameters like recruitment, catchability, natural mortality, and selectivity in SCAA (Deroba and Schueller, 2013; Linton and Bence, 2011; Maunder and Deriso, 2003; Wilberg and Bence, 2006). This was typically accomplished by constraining deviations from a mean or predicted value through a penalty term in a joint or “penalized” likelihood (in addition to the likelihood of the data). The variance of these deviations was usually specified using fixed ratios between the observation and process variation or assumed known a priori. The development of state-space stock assessment models (SSSAM) that use an integrated likelihood, rather than the penalized likelihood, have been able to address both data weighting and process variance concerns. Such models have been able to capitalize on advanced mathematical programs that can approximate complex high-level mathematical integrals of the marginal likelihood (Kristensen et al., 2016). This change has led to the estimability of process variance parameters and, under certain assumed likelihood functions for the observed catch, the observation variance parameters (Cadigan, 2015; Miller et al., 2016; Nielsen and Berg, 2014). Now the model, rather than the modeler, could determine the degree of process and observation error, and by doing so, reduce the risk of bias and model misspecification (de Valpine and Hilborn, 2005). Knowing the variability of ecological processes, especially recruitment, can guide model projections which is vital given that forecasting the future abundance of a stock is key to sustainability (Thorson et al., 2014). 2 A state-space stock assessment model (SSSAM) is a hierarchical model with two sub- model components - a “process” model that describes the population dynamics (birth, growth, death of the fish) and “observation” model that describes the fishery (how harvest or other data are collected) (Figure 0.1). The process model quantifies how underlying unobserved “states” like abundance, recruitment, or mortality progress over time as a function of the state in the previous year and the process error (Aeberhard et al., 2018; Auger-Méthé et al., 2016). The observation model describes how the observed data like catch or indices of abundance are a function of the state in each year and observation error. In a frequentist framework, the process and observation error are quantified with probability distributions and the former is integrated from a marginal likelihood that integrates across all possible state values. Figure 0.1. Simplified structure of a state-space stock assessment model. SSSAMs are now employed as a supplement to traditional stock assessment methods in the United States, Canada, and Europe (Cadigan, 2015; Miller et al., 2016; Nielsen and Berg, 2014). Two R modeling packages have been developed- WHAM in the Northeastern United States and SAM in Northern Europe- to facilitate fitting state-space stock assessment models (Nielsen and Berg, 2014; Stock and Miller, 2021). SSSAM have been used for multispecies models, multifleet models, and to flexibly estimate process variance where once it was fixed or 3 estimated with constraints (Albertsen et al., 2018; Nielsen et al., 2021; Perreault and Cadigan, 2021; Stock et al., 2021). Several SSSAMs have been able to estimate observation error variance and thus self- weight the data within the model (Berg and Nielsen, 2016; Cadigan, 2015). However, this ability is contingent on how the observation model is parameterized, usually by fitting catch-at-age data (rather than total catch and proportions at age) using a multivariate lognormal distribution. As research and application of SSSAM was flourishing in the past decade, so was research into alternative likelihood distributions for the proportion at age data, other than the conventional multinomial, that better reflect the procedures used to collect age information and adjust the information content of such data (Francis, 2011, 2014; Maunder, 2011). Alternatives such as the Dirichlet-multinomial, logistic normal, or multivariate Tweedie, may better quantify overdispersion and inter-age correlation (Fisch et al., 2021; Thorson et al., 2022, 2017). These two avenues of inquiry - into SSSAM and alternative likelihoods - have recently begun to intersect and there is considerable interest in how to parameterize both the process and observation sub-models to correctly apportion variability between the two and among multiple time-varying processes or multiple data sources within the process and observation models, respectively (Albertsen et al., 2017; Cronin-Fine and Punt, 2021; Xu et al., 2020). As the interest in applying SSSAM and alternative age composition likelihoods continues to expand, it is more important than ever to understand where and how this framework is best applied. This can and has involved 1) applying stock assessment models to real data in novel contexts and comparing the output against non-state-space models (e.g., Perreault et al., 2020) or 2) fitting state-space models with simulated data using a simulation-estimation procedure (e.g., Miller and Hyun, 2018) (Figure 0.2). Self- and cross- tests are especially elucidating and can 4 quantify model robustness and the illustrated consequences of model misspecification (Deroba et al., 2015). Simulation-estimation testing asks not only how well models perform under ideal (correctly specified) circumstances, but also what is the preferred model to choose when the true underlying dynamics are unknown. When faced with substantial uncertainty or poor data quality, which model configuration will “get it right” most of the time? Among many wrong models, which is the most useful? Figure 0.2. Simplified structure of a simulation-estimation experiment. This research employs both approaches- fitting both real and simulated data- to understand how SSSAM can be applied when fisheries independent indices of abundance are not available (Chapter 1), when the underlying recruitment, expected survival, and selectivity processes are highly variable or constant or the data are “noisy” (Chapter 2), and when the observation error in proportions at age has substantial overdispersion or inter-age correlation (i.e., non-multinomial distribution) (Chapter 3). The findings reveal some of the current upper limits of model complexity for SSSAM and how these models can be designed in the future to accurately and precisely estimate recruitment, abundance, spawning stock biomass, and exploitation rate, and process or observation error variances. 5 REFERENCES Aeberhard, W.H., Flemming, J.M., Nielsen, A., 2018. Review of state-space models for fisheries science. Annu. Rev. Stat. Appl. 5, 215-35. https://doi.org/10.1146/annurev-statistics- 031017-100427. Albertsen, C.M., Nielsen, A., Thygesen, U.H., 2018. Connecting single-stock assessment models through correlated survival. ICES J. Mar. Sci. 75, 235-244. https://doi.org/10.1093/icesjms/fsx114. Albertsen, C.M., Nielsen, A., Thygesen, U.H., 2017. Choosing the observational likelihood in state-space stock assessment models. Can. J. Fish. Aquat. Sci. 74, 779-789. https://doi.org/10.1139/cjfas-2015-0532. Auger-Méthé, M., Field, C., Albertsen, C.M., Derocher, A.E., Lewis, M.A., Jonsen, I.D., Flemming, J.M., 2016. State-space models’ dirty little secrets: Even simple linear Gaussian models can have estimation problems. Sci. Rep. 6, 1-10. https://doi.org/10.1038/srep26677. Berg, C.W., Nielsen, A., 2016. Accounting for correlated observations in an age-based state- space stock assessment model. ICES J. Mar. Sci. 73, 1788-1797. https://doi.org/10.1093/icesjms/fsw046. Cadigan, N.G., 2015. A state-space stock assessment model for northern cod, including under- reported catches and variable natural mortality rates. Can. J. Fish. Aquat. Sci. 308, 1-13. https://doi.org/10.1139/cjfas-2015-0047. Cronin-Fine, L., Punt, A.E., 2021. Modeling time-varying selectivity in size-structured assessment models. Fish. Res. 239, 105927. https://doi.org/10.1016/j.fishres.2021.105927. de Valpine, P., Hilborn, R., 2005. State-space likelihoods for nonlinear fisheries time-series. Can. J. Fish. Aquat. Sci. 62, 1937-1952. https://doi.org/10.1139/f05-116. Deriso, R.B., Maunder, M.N., Skalski, J.R., 2007. Variance estimation in integrated assessment models and its importance for hypothesis testing. Can. J. Fish. Aquat. Sci. 64, 187-197. https://doi.org/10.1139/f06-178. Deroba, J.J., Butterworth, D.S., Methot, R.D., DeOliveria, J.A.A., Fernandez, C., Nielsen, A., Cadrin, S.X., Dickey-Collas, M., Legault, C.M., Ianelli, J.N., Valero, J.L., Needle, C.L., Malley, J.M.O., Chang, Y., Thompson, G.G., Canales, C., Swain, D.P., Miller, D.C.M., Hintzen, N.T., Bertignac, M., Ibaibarriaga, L., Silva, A., Murta, A., Kell, L.T., de Moor, C.L., Parma, A.M., Dichmont, C.M., Restrepo, V.R., Ye, Y., Jardim, E., Spencer, P.D., Hanselman, D.H., Blaylock, J., Mood, M., Hulson, P.J.F., 2015. Simulation testing the robustness of stock assessment models to error: some results from the ICES strategic 6 initiative on stock assessment methods. ICES J. Mar. Sci. 72, 19-30. https://doi.org/10.1093/icesjms/fst237. Deroba, J.J., Schueller, A.M., 2013. Performance of stock assessments with misspecified age- and time-varying natural mortality. Fish. Res. 146, 27-40. https://doi.org/10.1016/j.fishres.2013.03.015. Fisch, N., Camp, E., Shertzer, K., Ahrens, R., 2021. Assessing likelihoods for fitting composition data within stock assessments, with emphasis on different degrees of process and observation error. Fish. Res. 243, 106069. https://doi.org/10.1016/j.fishres.2021.106069. Fournier, D.A., Archibald, C.P., 1982. A general theory for analyzing catch at age data. Can. J. Fish. Aquat. Sci. 39, 1195-1207. https://doi.org/10.1139/f82-157. Francis, R.I.C.C., 2011. Corrigendum: Data weighting in statistical fisheries stock assessment models. Can. J. Fish. Aquat. Sci. 68, 2228-2228. https://doi.org/10.1139/f2011-165. Francis, R.I.C.C., 2014. Replacing the multinomial in stock assessment models: A first step. Fish. Res. 151, 70-84. https://doi.org/10.1016/j.fishres.2013.12.015. Kristensen, K., Nielsen, A., Berg, C.W., Skaug, H., Bell, B.M., 2016. TMB: Automatic differentiation and laplace approximation. J. Stat. Softw. 70, 1-21. https://doi.org/10.18637/jss.v070.i05. Linton, B.C., Bence, J.R., 2011. Catch-at-age assessment in the face of time-varying selectivity. ICES J. Mar. Sci. 68, 611-625. https://doi.org/10.1093/icesjms/fsq173. Maunder, M.N., 2011. Review and evaluation of likelihood functions for composition data in stock-assessment models: Estimating the effective sample size. Fish. Res. 109, 311-319. https://doi.org/10.1016/j.fishres.2011.02.018. Maunder, M.N., Deriso, R.B., 2003. Estimation of recruitment in catch-at-age models. Can. J. Fish. Aquat. Sci. 60, 1204-1216. https://doi.org/10.1139/f03-104. Maunder, M.N., Punt, A.E., 2013. A review of integrated analysis in fisheries stock assessment. Fish. Res. 142, 61-74. https://doi.org/10.1016/j.fishres.2012.07.025. Miller, T.J., Hare, J.A., Alade, L.A., 2016. A state-space approach to incorporating environmental effects on recruitment in an age-structured assessment model with an application to southern new england yellowtail flounder. Can. J. Fish. Aquat. Sci. 73, 1261-1270. https://doi.org/10.1139/cjfas-2015-0339. Miller, T.J., Hyun, S., 2018. Evaluating evidence for alternative natural mortality and process error assumptions using a state-space, age-structured assessment model. Can. J. Fish. Aquat. Sci. 75, 691-703. https://doi.org/10.1139/cjfas-2017-0035. 7 Nielsen, A., Berg, C.W., 2014. Estimation of time-varying selectivity in stock assessments using state-space models. Fish. Res. 158, 96-101. https://doi.org/10.1016/j.fishres.2014.01.014. Nielsen, A., Berg, C.W., Hintzen, N.T., Mosegaard, H., Trijoulet, V., 2021. Multi-fleet state- space assessment model strengthens confidence in single-fleet SAM and provides fleet- specific forecast options. ICES J. Mar. Sci. 78, 2043-2052. https://doi.org/10.1093/icesjms/fsab078. Perreault, A.M.J., Cadigan, N.G., 2021. Natural mortality diagnostics for state-space stock assessment models. Fish. Res. 243, 106062. https://doi.org/10.1016/j.fishres.2021.106062. Perreault, A.M.J., Wheeland, L.J., Joanne Morgan, M., Cadigan, N.G., 2020. A state-space stock assessment model for American plaice on the Grand Bank of Newfoundland. J. Northwest Atl. Fish. Sci. 51, 45-104. https://doi.org/10.2960/J.v51.m727. Stock, B.C., Miller, T.J., 2021. The Woods Hole Assessment Model (WHAM): A general state- space assessment framework that incorporates time- and age- varying processes via random effects and links to environmental covariates. Fish. Res. 240, 105967. https://doi.org/10.1016/j.fishres.2021.105967. Stock, B.C., Xu, H., Miller, T.J., Thorson, J.T., Nye, J.A., 2021. Implementing two-dimensional autocorrelation in either survival or natural mortality improves a state-space assessment model for Southern New England-Mid Atlantic yellowtail flounder. Fish. Res. 237, 105873. https://doi.org/10.1016/j.fishres.2021.105873. Thorson, J.T., Jensen, O.P., Zipkin, E.F., 2014. How variable is recruitment for exploited marine fishes? A hierarchical model for testing life history theory. Can. J. Fish. Aquat. Sci. 71, 973-983. https://doi.org/10.1139/cjfas-2013-0645. Thorson, J.T., Johnson, K.F., Methot, R.D., Taylor, I.G., 2017. Model-based estimates of effective sample size in stock assessment models using the Dirichlet-multinomial distribution. Fish. Res. 192, 84-93. https://doi.org/10.1016/j.fishres.2016.06.005. Thorson, J.T., Miller, T.J., Stock, B.C., 2022. The multivariate-Tweedie: a self-weighting likelihood for age and length composition data arising from hierarchical sampling designs. ICES J. Mar. Sci. fsac159. https://doi.org/10.1093/icesjms/fsac159. Truesdell, S.B., Bence, J.R., Syslo, J.M., Ebener, M.P., 2017. Estimating multinomial effective sample size in catch-at-age and catch-at-size models. Fish. Res. 192, 66-83. https://doi.org/10.1016/j.fishres.2016.11.003. Wilberg, M.J., Bence, J.R., 2006. Performance of time-varying catchability estimators in statistical catch-at-age analysis. Can. J. Fish. Aquat. Sci. 63, 2275-2285. https://doi.org/10.1139/f06-111. 8 Xu, H., Thorson, J.T., Methot, R.D., 2020. Comparing the performance of three data-weighting methods when allowing for time-varying selectivity. Can. J. Fish. Aquat. Sci. 77, 247- 263. https://doi.org/10.1139/cjfas-2019-0107. 9 CHAPTER 1: APPLYING A NOVEL STATE-SPACE STOCK ASSESSMENT FRAMEWORK USING A FISHERIES-DEPENDENT INDEX OF FISHING MORTALITY 1.1 Abstract State-space models have a hierarchical framework that assumes the observed data are derived from a time series of unobserved latent states. State-space stock assessment models have emerged as an alternative framework to conduct stock assessment in Canada, the east coast of the United States of America, and in Europe though little research has investigated where and how they are optimally used and if they would be appropriate in other management contexts. I built a novel state-space stock assessment model with process variation in recruitment and time- and age- specific catchability. I fit the model to fisheries dependent commercial trap net and gill net catch and effort data of Lake Michigan lake whitefish (Coregonus clupeaformis) using a marginal likelihood that integrated over latent states. Compared to the previously employed statistical catch at age model, the state-space model estimated dome-shaped, rather than asymptotic selectivity for both fisheries, 15% lower average total instantaneous mortality, and 20% higher average recruitment. To my knowledge this is the first application of a state-space stock assessment model fit by maximum likelihood in the Laurentian Great Lakes and the first such model to exclude a fisheries-independent survey. These results demonstrate the feasibility of employing a maximum likelihood state-space framework in fisheries that lack such fishery independent indices of abundance and instead use catch per unit effort as an index of abundance. This work presents a novel approach to applying state-space stock assessment modeling and offers insights and suggestions for future in the Great Lakes and in similar circumstances of data availability. 10 1.2 Introduction Fisheries stock assessment must contend with multiple sources of model variability, both from the process sub-model (i.e., process variation) and from the observation sub-model (i.e., observation error) which introduce uncertainty and obscure the relationship between underlying model dynamics and the observed data (Fournier and Archibald, 1982). Until recently, age- structured fisheries stock assessment models have generally assumed the degree of process variation and/or observation error was known. Deriso et al. (2007) emphasized that it is possible to estimate the variances, even from multiple sources, when all variation is due to observation error. If process variation is present and deviations from the mean are treated like another source of data, the variance of those deviations cannot be independently estimated by maximizing the joint likelihood of the deviations and data, a procedure sometimes called penalized likelihood (see Schnute 1994). The penalized likelihood, also called “errors-in-variables” (EV) method, is still used in many stock assessment packages such as ASAP, MULTIFAN-CL, and stock synthesis (SS) (Fournier et al., 1998; Legault and Restrepo, 1998; Methot and Wetzel, 2013). However, because the estimates can be asymptotically biased, alternative approaches have been adopted in newer assessment platforms (de Valpine and Hilborn, 2005; Stock and Miller, 2021). State-space models (SSM) have been able to partition process variation and observation error and estimate both with high precision (Aeberhard et al., 2018b). SSM accomplish this by assuming the data are observations (with error) derived from unobserved states like population abundance and mortality, that are progressing through time following a stepwise Markovian process. The states, or their deviations from a shared mean, are specified as random effects and in a maximum likelihood framework are integrated over to obtain the marginal likelihood, which is maximized. 11 The theorized advantages of state-space stock assessment models have been realized, with models applied in several different geographical regions, using different model configurations, able to simultaneously estimate the magnitude of several different sources of process variation, including recruitment, survival, fishing mortality, selectivity, and natural mortality (Aeberhard et al., 2018; Cadigan, 2015; Stock et al., 2021; Stock and Miller, 2021; Yin et al., 2019). State-space stock assessment models (herein defined as those that use a marginal likelihood) have estimated mortality and abundance greater accuracy and precision than non- state-space models (herein defined as those that use a penalized likelihood), and exhibit less retrospective patterns (Gunnlaugsson, 2012; Perreault et al., 2020; Stock et al., 2021). A state- space modeling research track is currently exploring the feasibility of applying such models to NOAA-managed stocks in New England and the Mid-Atlantic and state-space stock assessment models are used for a large portion of stocks managed by the International Council for the Exploration of the Sea (ICES) (Aanes et al., 2020). Nevertheless, there remains considerable interest and uncertainty in what kinds of process variation can be considered, and under what circumstances and with what data the SSM approach can be practically implemented (Aanes et al., 2020). This work contributes to the ongoing inquiry by using a state-space modeling framework to fit total catch and age composition data for two fisheries (with two distinct gears) without a fisheries-independent index of abundance. Indices quantify trends in stock abundance over time and are often assumed to be proportional to abundance therefore allowing an estimation of stock size. In the absence of a fisheries independent survey, an index can be calculated from data collected during the fishing process, such as catch per unit effort (CPUE). However, CPUE is only a reliable index of abundance if the proportionality constant, catchability, is correctly 12 specified (Wilberg et al., 2009). Catchability is often assumed constant or varying around a constant mean, though can vary systematically due to density-dependent effects, gear changes, preferential sampling, or angler behavior (Ducharme-Barth et al., 2022; Hilborn and Walters, 1992; Quirijns et al., 2008). In such cases, potential causes of changes in catchability might be unknown, but can still be partially accounted for by allowing catchability to vary through a stochastic process, an approach I adopt herein (Wilberg et al., 2009). This novel state-space model was applied to lake whitefish (Coregonus clupeaformis) in the northmost region of Lake Michigan. This stock experiences no recreational fishing and two commercial fisheries, a trap net fishery and a gill net fishery, have yielded on average 222 and 159 thousand kilograms per year, respectively, since 1986, the first year included in the assessment. Fishing effort is reported in numbers of lifts of the trap nets and feet of gill net deployed. These effort data were used to inform fishing mortality within the model and catchability was a state variable estimated for each age and year that scaled effort to fishing mortality. This parameterization distinguishes this state-space model from previous applications and existing state-space platforms like SAM or WHAM where instantaneous fishing mortality is a state variable (Nielsen and Berg, 2014; Perreault et al., 2020). To my knowledge, this is the first state-space stock assessment model application in the Laurentian Great Lakes (fit by marginal maximum likelihood) and this work confirms the applicability of SSM without a fisheries independent survey, which is customary in previous applications (Cadigan, 2015; Nielsen et al., 2021; Perreault et al., 2020; Stock and Miller, 2021). My objective was to apply a novel state-space stock assessment model and to provide time- and age- varying estimates of recruitment and catchability and their associated variances. Ultimately, 13 I developed an assessment tool applicable in situations without fisheries independent surveys and from this tool I make general recommendations about the use of SSM models in such situations. 1.3 Methods Table 1.1. Symbols used in the text, appendices, tables, and figures. Whether the symbol is used in the statistical catch at age model (SCA), the state-space model (SSM), or both, is indicated. The value is provided for indices and time and age invariant data. Symbol Description Usage Value(s) Index Variables 𝑦 Year Both 1986-2017 𝑎, 𝑎̃ Age Both 4-15+ 𝐺 Fishery Both g- gill net, t-trap net Data and Priors 𝐸𝑦,𝐺 Observed effort by year and fishery Both See Figure 2 𝐶̃𝑦,𝐺 Observed harvest by year and fishery Both 𝑃̃𝑎,𝑦,𝐺 Observed proportions by age, year, and fishery Both 𝑀̃ Observed natural mortality SSM 0.2 𝑀̂ Median of prior on natural mortality SCA 0.2 𝜎𝑀 Standard deviation of observed or prior natural Both 0.1 mortality 𝑚𝑎,𝑦 Maturity by age and year Both (𝑠𝑝𝑎𝑤𝑛) Weight at the time of spawning by age and year Both 𝑊𝑎,𝑦 𝑒 Average number of eggs per kilogram SCA 19937 𝑓 Average proportion of females in spawning SCA 0.4845 population φ𝐺 Standard deviation multiplier for fishery-specific SCA 1.5 catchability, 𝑞𝑦,𝐺 , process variability φ𝑠 Standard deviation multiplier for trap net selectivity SCA 1.0 parameter 𝑝1,𝑦,𝑡 process variability φ𝑅 Standard deviation multiplier for recruitment SCA 15 process variability φ𝐶𝐺 Standard deviation multiplier for total catch SCA 4 observation error 𝜎𝐶𝐺 Standard deviation of likelihood of total catch SSM 0.067 Parameters and State Variables 𝑀 Instantaneous natural mortality Both 𝜎𝐺 Standard deviation of process variability of fishery- SSM specific catchability 𝜌𝐺 Correlation coefficient (among ages) of process SSM variability of fishery-specific catchability 𝜎𝑅 Standard deviation of process variability of SSM recruitment (𝑅) Process variability of recruitment SSM 𝜀𝑦 (𝐺) Vector of process variability of fishery-specific SSM 𝜺𝑦 catchability (𝑠) Process variability of trap net selectivity parameter SCA 𝜀𝑦 𝑞𝑦,𝐺 Catchability by year and fishery SCA 14 Table 1.1 (cont’d) 𝜎 Baseline standard deviation for process variability SCA and observation error 𝑝1,𝑦,𝑡 Trap net selectivity inflection point parameter by SCA year 𝑝1,𝑔 Gill net selectivity parameter SCA 𝑝2,𝐺 Trap and gill net selectivity parameters SCA 𝑅̅ Average recruitment and abundance in 1986 SCA 𝐷𝑦 Deviations by year from average recruitment SCA 𝑑𝑎 Deviations by age from average abundance in 1986 SCA 𝛼 Ricker stock recruitment parameter SCA 𝛽 Ricker stock recruitment parameter SCA Derived Quantities 𝑁𝑎,𝑦 Abundance by age and year Both 𝑁̂4,𝑦 Predicted abundance of age 4 individuals by year SCA (recruitment) 𝐹𝑎,𝑦,𝐺 Instantaneous fishing mortality by age, year, and Both fishery 𝒒𝑦,𝐺 Vector of age-specific catchability by year and SSM fishery 𝑠𝑎,𝑦,𝐺 Selectivity by age, year, and fishery SCA 𝑍𝑎,𝑦 Instantaneous total mortality by age and year Both 𝑍𝑎̅ Average instantaneous total mortality by age from SCA 1986-1988 𝐶𝑎,𝑦,𝐺 Expected harvest by age, year, and fishery Both 𝑃𝑎,𝑦,𝐺 Expected proportions by age, year, and fishery Both (𝑆𝑝𝑎𝑤𝑛) Spawning stock biomass by year Both 𝐵𝑦 ϵ𝑦 Spawning stock size in eggs by year SCA 𝑇 Total objective function value Both 𝜎𝑎 Standard deviation of multivariate normal random SSM Set to single value, walk deviations of fishery-specific catchability 𝜎𝐺 , for each fishery 𝑛𝐺 Effective sample size of multinomial likelihood of Both observing given proportion at age of catch 𝚺𝐺 Covariance matrix of process variability of fishery- SSM specific catchability 𝜎𝑠 Standard deviation of process variability of trap net SCA selectivity parameter 1.3.1 Data and Model Inputs The assessment model was fit using annual fishery-specific values of fishing effort (𝐸𝑦,𝐺 ), observed fishery harvest (𝐶̃𝑦,𝐺 ), and observed age compositions in units of proportions at age (𝑃̃𝑎,𝑦,𝐺 ) (variables are described in Table 1.1). Annual reported yield in weight from the catch reporting system was converted into estimates of annual catch in numbers using the annual 15 average weight of fish harvested by each fishery and estimates of underreporting (Appendix 1A). In addition, the model used input values of average weight at age, average length at age, and maturity schedules in each year, which were collected from harvest subsampling, to calculate selectivity and spawning stock biomass. 1.3.2 State-space Assessment Model A state-space stock assessment model (SSM) was developed in Template Model Builder (TMB) that assumed two state variables, recruitment and catchability at age, varied across years (Kristensen et al., 2016). The SSM was structurally similar to the existing statistical catch at age (SCA) models used to assess lake whitefish and lake trout in the Laurentian Great Lakes which fit data using a penalized likelihood (described in Appendix 1B). The SSM had structural elements that mimicked the state-space modeling software “SAM” including the state variables, process variance structure, and a marginal likelihood (Berg and Nielsen, 2016; Nielsen and Berg, 2014). The SSM had two sub-models: a process or “population” model, and an observation or “fishery” model. The process model described how the underlying states, recruitment and catchability, changed over time as a function of the state in the previous year and process variability. The observation model described how the predicted total catch and catch compositions derived from those true unobserved states owing to observation error. Table 1.2. Equations used in the statistical catch at age model (SCA), the state-space model (SSM), or both. Note that though equations 3.5a and 3.5b appear identical, each contributes to a different part of the objective function total, which is an important distinction in the integrated likelihood framework. Index Description Equation Usage Process Model (𝑅) (𝑅) log 𝑁4,𝑦 = log 𝑁4,𝑦−1 + 𝜀𝑦 ; 𝜀𝑦 ~𝑁(0, 𝜎𝑅2 ) 1.1 Recruitment random walk SSM 𝑎−1 log 𝑁𝑎,1986 = log 𝑁4,1986−(𝑎−4) − ∑ log 𝑍𝑎̅ , 4 < 𝑎 ≤ 9 1.2 Abundance at age 4-9 in initial year SSM 4 log 𝑁𝑎,1986 = 0, 𝑎 > 9 1.3 Abundance at age 9+ in initial year Both 16 Table 1.2 (cont’d) log 𝑁𝑎,𝑦 = log 𝑁𝑎−1,𝑦−1 − 𝑍𝑎−1,𝑦−1 , 4 ≤ 𝑎 < 𝐴 Abundance at age exponential 1.4 Both decay with plus group log 𝑁𝐴,𝑦 = log(𝑁𝐴−1,𝑦−1 𝑒 −𝑍𝐴−1,𝑦−1 + 𝑁𝐴,𝑦−1 𝑒 −𝑍𝐴,𝑦−1 ) 𝑍𝑎,𝑦 = 𝑀 + ∑ 𝐹𝑎,𝑦,𝐺 1.5 Total instantaneous mortality 𝐺=𝑔,𝑡 Both 𝐹𝑎,𝑦,𝐺 = 𝑞𝑦,𝐺 𝐸𝑦,𝐺 𝑠𝑎,𝑦,𝐺 , 𝐺 = 𝑔, 𝑡 1.6a Fishing mortality SCA 1.6b Fishing Mortality 𝐹𝑎,𝑦,𝐺 = 𝑞𝑎,𝑦,𝐺 𝐸𝑦,𝐺 , 𝐺 = 𝑔, 𝑡 SSM Year-specific catchability random (𝐺) (𝐺) 1.7a log 𝑞𝑦+1,𝐺 = log 𝑞𝑦,𝐺 + 𝜀𝑦 ; 𝜀𝑦 ~𝑁(0, 𝜎𝐺2 ), 𝐺 = 𝑔, 𝑡 SCA walk Year- and age- specific vector of (𝐺) (𝐺) 1.7bi log 𝒒𝑦,𝐺 = log 𝒒𝑦−1,𝐺 + 𝜺𝑦 ; 𝜺𝑦 ~𝑁(0, 𝚺𝐺 ), 𝐺 = 𝑔, 𝑡 SSM catchability random walk Correlated error of catchability 1.7bii 𝚺𝑎,𝑎̃ = 𝜌|𝑎−𝑎̃| 𝜎𝑎 𝜎𝑎̃ , 4 < 𝑎 < 𝐴, 4 < 𝑎̃ < 𝐴 SSM random walk (𝑆𝑝𝑎𝑤𝑛) (𝑠𝑝𝑎𝑤𝑛) 1.8 Spawning stock biomass 𝐵𝑦 = ∑𝐴𝑎=4 𝑚𝑎,𝑦 𝑊𝑎,𝑦 log 𝑁𝑎,𝑦 Both Observation Model 𝐹𝑎,𝑦,𝐺 2.1 Baranov catch equation 𝐶𝑎,𝑦,𝐺 = (1 − exp(−𝑍𝑎,𝑦 ))𝑁𝑎,𝑦 , 𝐺 = 𝑔, 𝑡 Both 𝑍𝑎,𝑦 𝐶𝑎,𝑦,𝐺 2.2 Yearly catch proportions at age 𝑃𝑎,𝑦,𝐺 = 𝐴 , 𝐺 = 𝑔, 𝑡 Both ∑𝑎=1 𝐶𝑎,𝑦,𝐺 Negative Log Likelihoods Total objective function on the log- 3.1 scale is the sum of prior/penalty 𝑇 = ∑ 𝑁𝐿𝑃 + ∑ 𝑁𝐿𝐿 Both and data likelihood components Recruitment deviations from those ̂4,𝑦 2 1 𝑁 3.2a expected by Ricker curve (see NLP (𝑅) = ∑2017𝑦=1991 2𝜎 2 (log ) + log 𝜎𝑅 SCA 𝑅 𝑁4,𝑦 Appendix B) Recruitment deviations from those 1 (𝑅) 2 3.2b NLP (𝑅) = ∑2017𝑦=1981 2𝜎 2 (𝜀𝑦 ) + log 𝜎𝑅 SSM expected around 0 𝑅 Fishery-specific deviations in 1 (𝐺) 2 3.3a catchability following a random NLP (𝐺) = ∑2017𝑦=1986 2𝜎 2 (𝜀𝑦 ) + log 𝜎𝐺 , 𝐺 = 𝑔, 𝑡 SCA 𝐺 walk Fishery-specific deviations in vector 1 (𝐺) ′ −1 (𝐺) NLP (𝐺) = ∑2017𝑦=1986 ⁄2 𝜺𝑦 Σ 𝑇 𝜺𝑦 + 3.3b of catchability at age following a SSM random walk log √|𝚺𝐺 | , 𝐺 = 𝑔, 𝑡 Deviations for random walk for trap 1 (𝑠) 2 3.4 NLP (𝑠) = ∑2017 𝑦=1986 2𝜎 2 (𝜀𝑦 ) + log 𝜎𝑠 SCA net selectivity parameter 𝑠 Penalty on natural mortality from 1 2 3.5a NLP (𝑀) ̂ = 2 (log 𝑀 − log 𝑀) + log 𝜎𝑀 SCA deviating from median of the prior 2𝜎𝑀 Likelihood of observing a given 3.5b natural mortality value based on 1 NLL(𝑀) = 2 (log 𝑀 ̃ − log 𝑀)2 + log 𝜎𝑀 SSM 2𝜎𝑀 assumed true value 2 Likelihood of observing the given 1 𝐶𝑦,𝐺 NLL(𝐶𝐺) = ∑2017 𝑦=1986 2 (log ̃ ) + log 𝜎𝐶𝐺 , 𝐺 = 3.6 total catch based on an assumed true 2𝜎𝐶𝐺 𝐶𝑦,𝐺 Both value 𝑔, 𝑡 Likelihood of observing the given ̃ NLL(𝑃𝐺) = − ∑2017 𝐴 𝑦=1986 ∑𝑎=4 𝑛𝑦,𝐺 𝑃𝑎,𝑦,𝐺 log 𝑃𝑎,𝑦,𝐺 , 𝐺 = 3.7 proportions at age of catch based on Both assumed true values 𝑔, 𝑡 17 Recruitment was modeled as a random walk which effectively penalizes recruitment estimates that change greatly from year to year with the extent of the penalty depending on the random walk variance. Though recruitment can differ greatly between adjacent years due to several abiotic and biotic processes, treating these changes as arising from random walk is a recommended approach in a state-space framework especially when estimating the variance of the process variability (Maunder and Thorson, 2018). Recruitment, 𝑁4,𝑦 , was calculated on the log-scale by adding normally distributed error to the log-scale recruitment in the previous year, starting 5 years prior to the first year being modeled (Table 1.2, Eq. 1.1). Abundances of individuals ages 5 to 9 in the initial year, 1986, were calculated using the recruitment from years preceding the time series, 1981-1985, which were estimated by extending the random walk process backwards in time (Eq. 1.2). This method assumed that recruitment prior to 1986 came from the same stochastic process as recruitment throughout the time series and the initial abundances are a function of prior recruitment, adjusted downward using average total mortality. Given that there was no information on mortality rates prior to 1986, the average estimated age- specific instantaneous mortality rates for 1986-1988, 𝑍𝑎̅ , were used as reasonable substitutes. The abundance of older ages in the initial year were set to 0, consistent with the very low or zero harvest for cohorts that were age-9 or older in 1986 (Eq. 1.3). Abundances in subsequent years and ages were a function of an exponential mortality model (Eqs. 1.4-1.5). Instantaneous fishing mortality was the product of age-, year-, and fishery- specific catchability, 𝑞𝑎,𝑦,𝐺 and year- and fishery- specific effort, 𝐸𝑦,𝐺 (Eq. 1.6a), a common approach in stock assessment models (Fournier and Archibald, 1982). Effectively, 𝑞𝑎,𝑦,𝐺 combines what is conventionally modeled as two separate processes, year-specific catchability, and age-specific selectivity, into a single age- and year- specific value. Time-varying selectivity 18 is commonly used in stock assessment to account for changes in size at age or changes in gear technology that may preferentially target certain ages, among other reasons (Martell and Stewart, 2014). The vector of log-scale catchability at age in a given year, for a given fishery, i.e. 𝒒𝑦,𝐺 = (𝑞4,𝑦,𝐺 , 𝑞5,𝑦,𝐺 , … , 𝑞𝐴,𝑦,𝐺 ), varied over time according to a random walk with multivariate normal error, analogous to how SAM models age specific instantaneous fishing mortality (Eq. 1.7bi) (Nielsen and Berg, 2014). The covariance matrix, 𝚺, is a flexible matrix that quantifies the variability and correlation among ages of the year- and fishery- specific process variability (𝐺) vectors, 𝜺𝑦 (Eq. 1.7bii). Though the covariance matrix can be parameterized in many ways depending on assumed similarity of age-specific process variability, in this model the correlation structure was a first order autoregressive process (AR(1)), where the degree of correlation was a function of the absolute difference in ages. This correlation structure, when applied to fishing mortality or selectivity, has a demonstrably better fit to data in other stocks and is now the default structure in SAM models and a popular option in WHAM models (Berg and Nielsen, 2016; Nielsen et al., 2021; Nielsen and Berg, 2014; Stock and Miller, 2021). The assumption of stronger correlation in catchability variations between adjacent ages, implicit in the AR(1) structure, is sensible given that many factors (such as size or depth distributions of fish) will tend to be more similar for fish closer in age (Nielsen and Berg, 2014). Catchability in this case is expressed as a proportionality between fishing effort and fishing mortality. This is also the proportionality between annual catch and average annual abundance (Ricker, 1975). Thus, my assumption that fishing mortality is proportional to fishing effort is equivalent to treating annual catch divided by annual effort as an abundance index. Fishery-specific correlation coefficients 𝜌𝐺 and standard deviation, 𝜎𝐺 , parameters were estimated for each fishery and were invariant among ages. The correlation coefficients were 19 estimated on the logit scale, and thus constrained between 0 and 1. These determined how the random walk errors of age-specific catchabilities were related among ages. I did not allow for negative correlations as I could not envision a mechanism that would cause age-specific catchability to differ most for adjacent ages. At one extreme of the allowed range, log-scale catchabilities would change by the same amount for every age from year to year, and their trajectories over time would be parallel (𝜌 = 1). At the other extreme age-specific catchabilities would develop over time completely independently (𝜌 = 0). The estimated value determined the model dynamics between these extremes. In the observation sub-model, the predicted age-, year-, and fishery- specific catch was calculated using the Baranov catch equation (Eq. 2.1). The predicted proportions at age were derived from those values (Eq. 2.2). The SSM assessment model was fit by maximizing a marginal (integrated) likelihood. The random effects, 𝜓, were integrated out of the likelihood, so the objective function was conditional only on the fixed effect parameters, 𝜆. The marginal likelihood contained two components, the likelihood of the data given the random effects and the parameters, 𝐿(𝜆|𝑋, 𝜓), and the probability density function of the random effects, 𝑓(𝜓), ∞ 𝐿(𝜆|𝑋) = ∫−∞ 𝐿(𝜆|𝑋, 𝜓)𝑓(𝜓)𝑑𝜓. The likelihood of observing the total catch and catch composition were modeled as a lognormal distribution and a multinomial distribution, respectively (Eqs. 3.6-3.7). Estimated age- and year- invariant 𝑀 was estimated using a normally distributed likelihood with mean 𝑀 ̃ = 0.2 2 and fixed standard deviation 𝜎𝑀 (Eq. 3.5b). Because of this distribution’s inclusion in the likelihood portion of the marginal likelihood, 𝑀 ̃ is effectively a single observed data point which the model fits by predicting 𝑀. Error from the random walk of log-scale recruitment, log 𝑁4,𝑦 , 20 was assumed to come from a 0-mean normal distribution (Eq. 3.2b). Error in the age- and fishery- specific random walk of catchability was assumed to arise from a 0-mean multivariate normal distribution (Eq. 3.3b). The standard deviations for the observed log-scale total catch by fishery, 𝜎𝐶𝐺 , were fixed at 0.067, the values used in the SCA model. These was found in the SCA model by following an iterative procedure during fitting that adjusted the ratios between process and observation error variance, until the target value of 𝜎𝐶𝐺 was achieved, as described by Richards et al. (1997). This effectively means variance was set based on expert judgment at a level that assumes that two-thirds of observed catches would be within about 6.7% of the true values of catch. I fixed 𝜎𝐶𝐺 at the same value in the SSM model because the model did not converge when I attempted to estimate it along with the other parameters. I conducted a sensitivity analysis that set 𝜎𝐶𝐺 at values 50% above or below 0.067. The point estimates of fixed effect parameters (including the standard deviation parameters for the random effects) were estimated by maximum likelihood. The predicted random effects and derived quantities (i.e., abundance, spawning stock biomass, mortality) were calculated using the “epsilon” method bias correction algorithm which is now standard in TMB (Thorson and Kristensen, 2016). 1.3.3 Case Study- Lake Michigan Lake Whitefish The lake whitefish stock used as a case study for the application of the novel state-space stock assessment is the northmost management region of Lake Michigan, from the Straits of Mackinac to Seul Choix Point, WFM-03 (Figure 1.1). Lake whitefish in the Laurentian Great Lakes have experienced a declining recruitment since the 1990s owing to changes in ice cover and food web dynamics (Ebener et al., 2021). Estimated recruitment in WFM-03 has ranged from 1.5 million individuals during peaks in the 90s and early 00s, to less than half a million in 21 the past five years (Caroffino and Seider, 2020). Individuals recruit to the fishery at age 4, and on average adults are equally susceptible to environmental and biological drivers of natural mortality over time. Changes in environmental conditions that are not fully understood, notably declines in an important food source, Diporeia, have driven declines in lake whitefish growth and condition which are reflected in the age, length, and weight data (Fera et al., 2015). Figure 1.1. Lake whitefish management units of the 1836 Treaty-Ceded Waters of Lakes Superior, Huron, and Michigan, including the management region of interest, WFM-03, in northern Lake Michigan. Reproduced from Caroffino and Barton (2019). Emigration or immigration were not included in the model, because sub-populations in each management unit tend to reproduce in the same region in which they were born, which has been observed from tagging and genetic studies (Ebener et al., 2010; VanDeHey et al., 2009). 22 However, some mixing does occur during harvest, so recruitment should be interpreted as both the reproduction rate and the net movement from other stock areas. Two commercial fisheries occur on lake whitefish in WFM-03, a trap net and gill net fishery, both of which are exclusively fished by members of tribes represented by the Chippewa Ottawa Resource Authority (CORA) (Caroffino and Lenart, 2012). Licensed fishers are obligated to report daily weight of landed fish and the amount of gear used (length of gill net or number of trap net lifts), which are aggregated to a single yearly value for the management region (Ebener et al., 2005) (Figure 1.2). The harvest and effort data are not aggregated across both fisheries because each are reported in distinct units and several features make the two fisheries unique. The gears tend to exhibit different age-specific selectivity curves; gill nets are dome shaped whereas the trap nets are believed to be asymptotic- catching more older and larger fish than the gill nets (Zhao and Morbey, 2017). Additionally, the temporal patterns of relative effort of the two fisheries differed considerably over time, often due to changes in the market and management actions, including a gill net conversion program, wherein Michigan exchanged fishers’ gill nets for trap nets, which was implemented with the 2000 consent decree (United States vs. Michigan Consent Decree, 2000). Lastly, an adjustment was necessary to define effective gill net effort because of changes to the average height of gill net gear over time (Ebener et al., 2005). 23 Figure 1.2. Fishery effort of the gill net and trap net fisheries on Lake Michigan lake whitefish in region WFM-03 from 1986-2017. For Lake Michigan lake whitefish, fisheries management is conducted through regulations intended to limit harvest so as to not exceed a constant annual mortality rate of 65% for any age, and to sustain a spawning potential ratio (SPR) of at least 20% (Ebener et al., 2005). SPR is the ratio between the spawning stock biomass per recruit (SSBR) obtained with the current age-specific fishing mortality schedule held constant over the life of a cohort and the spawning stock biomass per recruit given no fishing mortality (SSBRF=0). This equates, conditioned on life-history, to a constant fishing mortality rule because in Lake Michigan for lake whitefish natural mortality is assumed constant. The recommended harvest level is calculated by scaling age-specific fishing mortality rates (averaged over the last three years of the assessment) up or down to meet mortality and SPR conditions in projections. 24 I compared the estimated abundance, recruitment, spawning stock biomass, and instantaneous mortality, between the SCA and SSM, and determined if differences in output would change management metrics. Spawning stock biomass was the product of proportion mature, average weight at spawning, and abundance at age (Eq. 1.8). The percent difference of each year-specific value, and the minimum, the maximum, and average over the whole time series were calculated. The ranges (differences between maximum and minimum values) were compared between the SCA and SSM. The patterns of age- and year- specific catchability of the SSM was compared to the catchability and selectivity in the SCA. Because the SCA model estimates annual catchability and age- and year- specific selectivity, the product of these values was compared to the age- and year- specific catchabilities from the SSM. The annual total mortality in the final three years and the SPR based on those mortality rates were compared against the established limits. Other diagnostic values, like normalized one step ahead (OSA) residuals for the age composition data in both the SCA and SSM models, ordinary least squared (OLS) or “Pearson residuals” for the total catch in the SCA model, OSA residuals for the total catch in the SSM model, and retrospective patterns were examined to compare relative model fits (Appendix B and C) (Hurtado-Ferro et al., 2015; Trijoulet et al., 2023). OSA residuals account for random effects and the correlation between observations in the multinomial distribution and were calculated either externally (for SCA) or internally (for SSM) as described in Trijoulet et al., (2023). OLS residuals were used to evaluate the SCA model’s fit to total catch because these observations are continuous and normally distributed and this model does not have random effects. Mohn’s Rho statistic was used to quantify the degree of retrospective bias in recruitment and spawning stock biomass and compared against the standard acceptable range for short-lived species, -0.22 to 0.3 (Hurtado-Ferro et al., 2015; Mohn, 1999). 25 1.4 Results The SSM converged on a global minimum of the objective function (Eq. 3.1) and estimated the mean and standard error for the fixed effects: instantaneous natural mortality, 𝑀, the standard deviations of the process variability, 𝜎𝑅 and 𝜎𝐺 , and the degree of correlation in the process variability of the age-specific catchability for each fishery, 𝜌𝐺 (Table 1.3). The output of the SSM exhibited similar trends as that of the SCA model despite the differences in model configuration (Figures 1.3-1.5). A quantitative comparison among the average, minimum, maximum, and range of derived quantities revealed a -62 to +53% difference in some output values (Table 1.4). Table 1.3. Point estimates and asymptotic standard errors of fixed effect parameters in the State- space model (SSM) for Lake Michigan lake whitefish in WFM-03. Parameter Standard Estimate (Symbol) Error 𝑀 0.1896 0.0184 𝜎𝑅 0.2474 0.0428 𝜎𝑡 0.3652 0.0442 𝜎𝑔 0.3905 0.0492 𝜌𝑡 0.8438 0.0502 𝜌𝑔 0.9142 0.0356 Table 1.4. Estimates of derived quantities in the state-space model (SSM) and statistical catch at age (SCA) model for Lake Michigan lake whitefish in WFM-03. Quantity SCA Estimate SSM Estimate Percent (%) difference Average Abundance 2,633,377 3,001,214 13.97 Maximum Abundance 4,431,335 4,885,740 10.25 Minimum Abundance 953,555 1,126,510 18.14 Abundance Range 3,477,780 3,759,230 8.10 Average Recruitment 807,254 910,279 12.76 Average Recruitment in 492,468 650,837 Terminal 10 years 32.16 Minimum Recruitment 252,249 385,213 52.71 Maximum Abundance 1,545,150 1,460,345 -5.49 Recruitment Range 1,292,901 1,075,132 -16.84 Average Spawning Stock 3,931,830 4,489,802 Biomass (SSB) (lbs) 14.19 26 Table 1.4 (cont’d) Gill Net Mortality Range 0.691 0.740 7.09 Minimum SSB (lbs) 2,182,870 2,418,414 10.79 Maximum SSB (lbs) 5,985,860 6,542,546 9.30 SSB Range (lbs) 3,802,990 4,124,132 8.44 Average Total Mortality 0.647 0.543 -16.07 Maximum Total Mortality 0.303 0.268 -11.55 Minimum Total Mortality 1.288 1.211 -5.98 Total Mortality Range 0.985 0.942 -4.37 Average Trap Net Mortality 0.251 0.159 -36.65 Maximum Trap Net Mortality 0.060 0.036 -40.00 Minimum Trap Net Mortality 0.485 0.306 -36.91 Trap Net Mortality Range 0.424 0.270 -36.32 Average Gill Net Mortality 0.202 0.195 -3.47 Maximum Gill Net Mortality 0.013 0.005 -61.54 Minimum Gill Net Mortality 0.704 0.745 5.82 Spawning Potential Ratio (SPR) 0.53 0.63 18.87 Total abundance and recruitment (abundance at age 4) followed similar trends across time in the two models (Figure 1.3). The average abundance in the SSM model was 3 million and in the SCA model it was 2.6 million. Estimates of abundance for both models peaked in 2007 and were lowest in 1989 (Figure 1.3A). The range in abundance was smaller in the SSM model. In the SSM the maximum total abundance was 4.3 times larger than the minimum abundance, and in the SCA, the maximum was 4.6 times the minimum. Neither model estimated abundance consistently higher or lower across the entire time series, though the SSM estimates ranged from 11% lower to 64% higher than of those of the SCA model and was larger for most years. In the final 5 years, the SSM estimated a positive trend and the SCA model a negative trend. 27 Figure 1.3. Estimates of (A) total abundance and (B) recruitment of age 4 individuals of Lake Michigan lake whitefish in WFM-03 from 1986-2017, from the statistical catch at age (SCA) model (black, solid), and the state-space model (SSM) (green, dashed). 28 Estimates of recruitment differed more than abundance among the models, though both the SSM and SCA models estimated similar trends (Figure 1.3B). Estimated recruitment for both models were greatest in 1995 and lowest in 1988. The range in recruitment was smaller in the SSM. In the SSM the maximum estimated recruitment was 3.8 times larger than the minimum estimate and in the SCA model, the maximum was 6.1 times larger than the minimum. Neither model consistently estimated higher recruitment than the other. Recruitment estimates in the SSM were 14% lower to 103% higher than those of the SCA model (at the very beginning and penultimate point of the time series, respectively). Mean recruitment was 13% larger in the SSM (807,000 versus 889,000, for SCA and SSM respectively) and 32% larger in just the final 10 years (492,000 versus 633,000, for SCA and SSM respectively). Both models estimated similar trends in spawning stock biomass (SSB), except for near the end of the time series, when the SSM estimates rose and the SCA estimates fell (Figure 1.4). In both models, the maximum SSB occurred in 1998 and the minimum in 1989, and in both the maximum value was approximately 2.7 times larger than the minimum. SSM estimates of SSB ranged from 10% lower to 59% higher than those of the SCA model, and the average difference was 15%. The mean SSB was 3.9 million lbs. for the SSM and 4.5 million lbs. for the SCA. For both models the trends in SSB were different from trends in abundance and recruitment due to temporal patterns in mortality and weight at age. 29 Figure 1.4. Estimates of spawning stock biomass of Lake Michigan lake whitefish in WFM-03 from 1986-2017, from the statistical catch at age (SCA) model (black, solid), and the state-space model (SSM) (green, dashed). In both models, the total instantaneous mortality rate initially increased in the first third of the time series, then gradually declined (Figure 1.5). The SCA model estimated higher mortality than the SSM for most of the time series, reaching a maximum in 1993 at an instantaneous total mortality rate of 1.288 yr-1 (averaged across ages 4-15+) (Figure 1.5A). The SSM reached a maximum total mortality in the same year but at 1.211 yr-1 (Figure 1.5B). In only 1990 and 1992 did the SSM estimate a larger total instantaneous mortality rate than the SCA model, by 3% and 6%, respectively. The average total instantaneous mortality rate was 0.54 yr-1 in the SSM and 0.64 yr-1 in the SCA. The SSM estimated a yearly instantaneous trap net fishing mortality rate 15-54% lower than the SCA model. The yearly instantaneous gill net fishing mortality rate in the SSM ranged from 49% larger to 57% lower than the SCA model, though it was lower for most of the time series (24 out of 32 years). Time-invariant instantaneous natural mortality rate was approximately the same in both models- 0.190 ± 0.018 yr-1 and 0.184 ± 0.017 yr-1 in the SSM and SCA, respectively. 30 Figure 1.5. Estimated instantaneous fishing mortality rate of the gill net fishery (FG), the trap net fishery (FT) and the natural mortality rate (M) over time for (A) the statistical catch at age model (SCA) and (B) the state-space model (SSM). The instantaneous fishing mortality rate was averaged across ages 4-15+ for each year, and instantaneous natural morality rate was age- and year- invariant. In the SSM the catchabilities provide estimates of the age- and year- specific proportionality between instantaneous fishing mortality rates and fishing effort. These fishery- specific relationships are presented in two ways- as catchability over years for each age (Figure 1.6B, 1.6D) and as catchability over ages for each year, but normalized so that the maximum value is 1, as is typical for selectivity (Figure 1.7B, 1.7D). The estimated degree of correlation in the AR(1) random walk for catchability (± 95% confidence intervals) was 0.843 ± 0.050 for the trap net fishery and 0.914 ± 0.036 for the gill net fishery (Table 1.3). For the SCA, the product of year-specific catchability and age- and year- specific selectivity has the same meaning as SSM catchabilities. I calculated these derived age- and year- specific catchabilities for the SCA and present them in the same way to compare how models quantify the relationship between fishing mortality and fishing effort (Figures 1.6A, 1.6C, 1.7A, 1.7C). 31 Figure 1.6. Estimated catchability over years for each age (on scale from red to purple from age 4 to 15+) of the (A) gill net and (C) trap net fishery in the statistical catch at age (SCA) model, and the (B) gill net and (D) trap net fishery in the state-space model (SSM). 32 Figure 1.7. Estimated selectivity over ages for each year (on scale from red for early years to blue for later years) of the (A) gill net and (C) trap net fishery in the statistical catch at age (SCA) model, and of the (B) gill net and (D) trap net fishery in the state-space model (SSM). Selectivity was obtained by normalizing age-specific catchability such that the maximum value for each year was 1. Catchability at age was lower later in the time series than near the beginning across both models and both fisheries, but there were some marked differences in patterns between SSM and SCA (Figures 1.6). Catchability of the gill net fishery steadily declined after 1995 in the SSM but peaked mid-way through the time series, in 2001, in the SCA (contrast Figures 1.6A and 1.6B). Catchability of the trap net fishery had similar trends for both models, with peaks occurring in 1994 and 2004-2006 for both the SCA and SSM (contrast Figures 1.6C and 1.6D). The selectivity for both fisheries was dome shaped in the SSM, and largely asymptotic in the SCA (Figure 1.7). The one notable deviation from this pattern was the gill net fishery in the SCA 33 model, where some dome-like patterns occurred in the initial 10 years of the time series with a modest decline in selectivity at older ages. This pattern can be explained because selectivity in the SCA model was a function of mean length at age, and because fish got smaller on average through the time series, the old fish were as small late in the time series as the young highly selected fish earlier in the time series. In general, selectivity peaked at younger ages earlier in the time series for both models and both fisheries, and the age of full selectivity became progressively older in later years. Management reference points differed between the SCA and SSM though not to the extent that differences in recommended harvest limits would have differentially impinged on fishery operations. In the last three years of the SCA and SSM assessments, the maximum (over ages) annual total mortality was 31% and 25%, and the SPR based on these mortality rates was 0.53 and 0.63, respectively. Thus, mortality rates were estimated to be below 65% and SPR above 0.2. In short, harvest limits based on either assessment would correspond to higher than status quo levels of fishing mortality. Residuals for the fit to log scale total catch for both fisheries and both models were centered on zero, approximately normal, and generally did not trend through time, although the residuals for catch from the SSM model did exhibit more skew than the SCA model (Figures 1B.1-1B.3, 1C.2-1C.4). Residuals for the age compositions for both fleets from the SCA model were patterned for several cohorts, especially in the first half of the time series (Figures 1B.4- 1B.5). In the SSM model, the model overestimated the proportion of older ages in the trap net fishery and underestimated the proportion of individuals in the gill net fishery at the beginning of the time series, but the cohort effects were generally resolved relative to the SCA model (Figures 1C.5-1C.6). Both models exhibited retrospective patterns in recruitment, though not spawning 34 stock biomass (Figures 1B.6-1B.7, 1C.7-1C.8). The Mohn’s Rho for recruitment was 0.989 for the SCA and 0.448 for the SSM, which both suggest a systemic bias in recruitment, though less so in the SSM. The Mohn’s Rho for spawning stock biomass was marginally larger and in the opposite direction in the SSM model (0.092) compared to the SCA (-0.038). The retrospective bias on SSB was within the acceptable range of -0.22 to 0.3 suggested for short lived species. Sensitivity tests demonstrated that fixing the standard deviation of the lognormal distribution of the total catch at values 50% above or below the baseline did not change most estimated fixed effect parameters by more than 1.5% (Table 1.5). The estimated standard deviation of the trap net and gill net catchability process variability decreased by 2.5% and 8% respectively, when the observation error was increased, suggesting a tradeoff between these values. For a full description of the model diagnostics of SSM, including time series of one step ahead (OSA) residuals and retrospective analyses, see Appendix 1C. Table 1.5. Results of sensitivity analysis. The maximum likelihood estimates of fixed effect parameters are reported for each scenario as well as the percent (%) difference relative to the baseline. % % % % Scenario 𝑀 𝜎𝑅 𝜎𝑡 𝜎𝑔 % Dif 𝜌𝑡 𝜌𝑔 % Dif Dif Dif Dif Dif 𝜎𝐶𝐺 = 0.067 (Baseline) 0.1896 0.247 0.365 0.391 0.844 0.914 𝜎𝐶𝐺 = 0.034 0.1899 0.18 0.248 0.09 0.370 1.2 0.403 3.26 0.847 0.32 0.920 0.58 𝜎𝐶𝐺 = 0.100 0.1896 -0.17 0.247 -0.2 0.360 -2.5 0.371 -7.96 0.841 -0.7 0.905 -1.55 1.5 Discussion This work demonstrated the feasibility of applying a state-space stock assessment model (SSM) with age-specific catchability as a state variable akin to year- and age- specific fishing mortality in previous applications (Berg and Nielsen, 2016; Nielsen and Berg, 2014). This also confirmed the possibility of applying SSM in cases where there are no fisheries independent 35 indices of abundance. To capitalize on state-space modeling’s capacity to estimate several sources of process variability, the existing SCA model had to be re-parameterized such that recruitment and age-specific catchability were random walk processes. Merely specifying that yearly variability in catchability, recruitment, or selectivity were random effects, without changing the parameterization of the SCA model resulted in model non-convergence. These necessary changes to the process model parameterization, though increasing realism and model flexibility, inevitably led to changes in model output that cannot be parsed from changes due only to converting a statistical catch at age model (SCA) to an SSM. Some changes in output between the SCA and SSM, like trends in abundance and fishing mortality, were minimal. Others, like recruitment in the latter part of the time series, and the catchability and selectivity patterns, substantially altered the interpretation of population and fisheries dynamics. Estimates of time-varying recruitment from the SSM exhibited less interannual variation and less difference between the maximum and minimum value than the SCA model, which reflects autocorrelation in the random walk process (Thorson et al., 2014). During much of the assessment, recruitment can be strongly informed by how a cohort was represented in subsequent years of catch data which is why estimates at the end of the assessment period are generally the most uncertain (Brooks and Legault, 2016). In the absence of future information, the SCA model was guided by the explicit relationship between recruitment and spawning stock biomass (SSB) in past years while SSM, lacking such a function, remained largely the same in the final 5 years (Figure 3B). Ideally, the Lake Michigan lake whitefish model should have enough flexibility to account for the relationship between recruitment and SSB while also incorporating how ecosystem changes like ice cover, prey availability, and habitat degradation due to dreissenid 36 invasion may create additional yearly variability (Ebener et al., 2021). The solution may lie in one or several alternative recruitment structures including 1) parameterizing variable parameters of the stock recruitment curve, 2) employing more dynamic autocorrelation processes with time series models that estimate how much to “remember” from previous years (e.g., AR(1)), or 3) utilizing a “mixture-distribution” of two distributions controlled via a Bernoulli distribution that accommodates occasional spikes or dips in recruitment (Johnson et al., 2016; Maunder and Thorson, 2018; Thorson et al., 2014). The SSM estimated dome-shaped selectivity for both fisheries, so it predicted older (and thus larger) individuals were present but less vulnerable to capture. The SCA was constrained to fit an asymptotic selectivity curve for the trap net fishery such that the oldest fish were catchable at a comparable rate to middle-aged fish. An asymptote was estimated for most years for the gill net fishery as well. Sensitivity tests of the SSM which forced an asymptote for the trap net fishery confirmed that differences in the estimated selectivity curve drove the differences in abundance, and by extension, spawning stock biomass. Both asymptotic and dome-shaped selectivity have been previously reported for trap net gear and either relationship could be reasonably expected for Lake Michigan lake whitefish in WFM-03 (Dunlop et al., 2018; Hansen et al., 2008; Jeong et al., 2000). The SSM had limited data to inform selectivity because the average age of lake whitefish in this region is lower than in adjacent regions and older ages were not strongly represented in the harvest (Caroffino and Seider, 2020). This flexibility to estimate time-varying selectivity without specifying an explicit and somewhat arbitrary function, is a hallmark of state-space stock assessment models (Nielsen et al., 2021). However, if there is good reason to specify or restrain it, there are several future options to do so. The functional form could be implemented as a “prior” about which penalties are assessed, rather than requiring an 37 exact match. This approach could be implemented with the selectivity parameters fixed, changing gradually over time (perhaps by an autoregressive process), or changing at discrete time blocks (Cronin-Fine and Punt, 2021; Martell and Stewart, 2014; Stock and Miller, 2021). Future lake whitefish assessments might consider a thorough model selection and review procedure, with dome-shaped selectivity for the trap net fishery included as a possible interpretation of fishery dynamics. The simultaneous estimation of observation and process error variances is a touted hallmark of state-space modeling but was not achieved in this model. This inability to estimate the observation error variance within the model may be caused by the estimation of variance of time-varying catchability. Sensitivity tests demonstrated a tradeoff between observation error standard deviation, 𝜎𝐶𝐺 , and catchability standard deviation, 𝜎𝐺 , because when the former was fixed at a higher value, the latter was estimated at a lower value. The observation error standard deviation may have been estimable in this model if there were greater contrast in fishing mortality and catch over time, or if there were informative survey data. Observation error standard deviation can be approximated from the observed data, so estimating the standard deviation of catchability, an unobservable state variable, was prioritized in this model, a practice also used by others (Francis, 2011). The age composition data were fit using a multinomial distribution for which the effective sample sizes were specified a priori. Efforts to calculate effective sample sizes through established iterative reweighting techniques led to data overweighting (i.e. the effective sample sizes were orders of magnitude larger than observed sample size), fitting the age proportion data precisely at the expense of high process model variability (Francis, 2011; Truesdell et al., 2017). Estimability of observation error variance may be beyond the upper limits of model flexibility for this SSM given the process model 38 parameterization and data availability. Limitations on estimating the observation error variance have not been evident in other state-space models that included index data and assumed constant catchability (Nielsen and Berg, 2014; Perreault et al., 2020). Future research could investigate the reasons for this difference. In particular, it is an open question whether estimation of observation error variance and time-varying catchability is possible with different distributions of the catch data (e.g., Dirichlet multinomial for proportions at age or a multivariate lognormal for catch at age (Nielsen and Berg, 2014; Thorson et al., 2017) or if provision of external estimates of observation error variance is a fundamental requirement when allowing time- varying catchability. If it is a fundamental property, providing external estimates of observation error variance in order to allow for time-varying fishery catchability seems preferable. Though fisheries-dependent sampling is non-random and prone to bias, its continued use may be justified by its extensive availability, inexpensiveness, and the capabilities of modern assessment models to account for violations of the typical assumption of proportionality with abundance. This state-space stock assessment model uses variability in the proportionality coefficient, catchability, to account for a stochastic relationship between CPUE and abundance across age and time, combatting the concerns of hyperstability or hyperdepletion that surround the use of effort data (Rose and Kulka, 1999). Other research has explored standardization methods using generalized linear models (GLM) or generalized additive models (GAM) to explain how factors like area, time, and gear can influence the relationship between abundance and CPUE in a non-linear fashion (Deroba and Bence, 2009; Ducharme-Barth et al., 2022; Grüss et al., 2019). Specific covariates such as season, sampling depth, or hook and bait characteristics can be directly included in the model to explain variation in catchability (Grüss et al., 2019; Johnson et al., 2019). Even with fine scale catch data and informative covariates, specifying 39 catchability as a time-varying random effect can allow for temporal variability not explained by the covariates, without overfitting the model. Use of this SSM in future cases may challenge preconceived assumptions about population or fishery dynamics that were assumed impossible, unrealistic, or inestimable given the available data and limits to parameterization. The SSM structure easily lends itself to estimating process variation in natural mortality, selectivity, abundance, and many other factors previously assumed to be invariant or fixed. This research demonstrated that a lack of fisheries independent indices is not a hurdle to employing SSM. Many more fisheries may consider this approach a viable option, maintaining that there is sufficient informative effort data to relate catchability to fishing mortality. Highly migratory species like tunas, billfish, and sharks, data- poor species like elasmobranchs, deep sea fisheries, and developing fisheries are examples of those that have limited to no fisheries independent data, or the continued collection of such data may not be cost effective and may therefore benefit from this SSM approach (Costello et al., 2012; Dennis et al., 2015; Langley et al., 2009; Lynch et al., 2018; Victorero et al., 2018). Several Great Lakes stocks which have been previously assessed using a penalized likelihood statistical catch-at-age model may also be candidates for fitting a state-space stock assessment model, especially one such as this which can accommodate fisheries-dependent data. For example, yellow perch in Lake Michigan are assessed using recreational and commercial fisheries effort and time-varying catchability, though these models include informative fisheries- independent data (Wilberg et al., 2005). As another example, Chinook salmon stocks in Lakes Michigan and Huron lack reliable survey information and abundance is estimated entirely with catch and effort data (Brenden et al., 2012; Clark et al., 2016). Likewise, consideration of fishery dependent CPUE information has been suggested for use alongside fishery independent 40 information when stakeholder perceptions of stock status are inconsistent with a stock assessment, as in Gulf of Maine and Georges Bank Cod (NEFSC, 2022a). While a lack of fishery independent data may create new challenges, such as the inability to estimate observation error variance, they are not insurmountable, and many benefits of state-space modelling can still be achieved. 41 REFERENCES Aanes, S., Aeberhard, W.H., Albertsen, C.M., Aldrin, M., Babyn, J., Berg, C.W., Breivik, O., Cook, R., Flemming, J.M., Fryer, R., Liljestrand, E.M., Millar, C., Miller, T.J., Minto, C., Newman, K.B., Nielsen, A., Perreault, A., Regular, P., Skaug, H.J., Spence, M., Trijoulet, V., Vatnehol, S., 2020. Workshop on the review and future of state space stock assessment models in ICES (WKRFSAM). ICES Sci. Rep. 2:32. 1-23. http://doi.org/10.17895/ices.pub.6004. Aeberhard, W.H., Flemming, J.M., Nielsen, A., 2018. Review of state-space models for fisheries science. Annu. Rev. Stat. Appl. 5, 215-35. https://doi.org/10.1146/annurev-statistics- 031017-100427. Berg, C.W., Nielsen, A., 2016. Accounting for correlated observations in an age-based state- space stock assessment model. ICES J. Mar. Sci. 73, 1788-1797. https://doi.org/10.1093/icesjms/fsw046. Brenden, T.O., Bence, J.R., Szalai, E.B., 2012. An age-structured integrated assessment of chinook salmon population dynamics in Lake Huron’s main basin since 1968. T. Am. Fish. Soc. 141, 919-933. https://doi.org/10.1080/00028487.2012.675910. Brooks, E.N., Legault, C.M., 2016. Retrospective forecasting - evaluating performance of stock projections for New England groundfish stocks. Can. J. Fish. Aquat. Sci. 73, 935-950. https://doi.org/10.1139/cjfas-2015-0163. Cadigan, N.G., 2015. A state-space stock assessment model for northern cod, including under- reported catches and variable natural mortality rates. Can. J. Fish. Aquat. Sci. 308, 1-13. https://doi.org/10.1139/cjfas-2015-0047. Caroffino, D.C., Lenart, S.J., 2012. Executive summary in Caroffino, D.C. and Lenart, S.J., eds. Technical fisheries committee administrative report 2012: Status of lake trout and lake whitefish populations in the 1836 treaty-ceded waters of Lakes Superior, Huron and Michigan, with recommended yield and effort levels for 2012. http://www.michigan.gov/greatlakesconsentdecree. Caroffino, D.C., Seider, M.J., 2020. Executive Summary in Caroffino, D.C. and Seider, M.J., eds. Technical fisheries committee administrative report 2020: Status of lake trout and lake whitefish populations in the 1836 treaty-ceded waters of Lakes Superior, Huron and Michigan, with recommended yield and effort levels for 2020. https://www.michigan.gov/greatlakesconsentdecree. Clark, R.D., Bence, J.R., Claramunt, R.M., Johnson, J.E., Gonder, D., Legler, N.D., Robillard, S.R., Dickinson, B.D., 2016. A spatially explicit assessment of changes in chinook salmon fisheries in Lakes Michigan and Huron from 1986 to 2011. N. Am. J. Fish. Manage. 36, 1068–1083. https://doi.org/10.1080/02755947.2016.1185060. 42 Costello, C., Ovando, D., Hilborn, R., Gaines, S.D., Deschenes, O., Lester, S.E., 2012. Status and solutions for the world’s unassessed fisheries. Science 338, 517-520. https://doi.org/10.1126/science.1223389. Cronin-Fine, L., Punt, A.E., 2021. Modeling time-varying selectivity in size-structured assessment models. Fish. Res. 239, 105927. https://doi.org/10.1016/j.fishres.2021.105927. de Valpine, P., Hilborn, R., 2005. State-space likelihoods for nonlinear fisheries time-series. Can. J. Fish. Aquat. Sci. 62, 1937-1952. https://doi.org/10.1139/f05-116. Dennis, D., Plagányi, É., Van Putten, I., Hutton, T., Pascoe, S., 2015. Cost benefit of fishery- independent surveys: Are they worth the money? Mar. Policy 58, 108-115. https://doi.org/10.1016/j.marpol.2015.04.016. Deriso, R.B., Maunder, M.N., Skalski, J.R., 2007. Variance estimation in integrated assessment models and its importance for hypothesis testing. Can. J. Fish. Aquat. Sci. 64, 187-197. https://doi.org/10.1139/f06-178. Deroba, J.J., Bence, J.R., 2009. Developing model-based indices of Lake Whitefish abundance using commercial fishery catch and effort data in Lakes Huron, Michigan, and Superior. N. Am. J. Fish. Manage. 29, 50-63. https://doi.org/10.1577/m07-205.1. Ducharme-Barth, N.D., Grüss, A., Vincent, M.T., Kiyofuji, H., Aoki, Y., Pilling, G., Hampton, J., Thorson, J.T., 2022. Impacts of fisheries-dependent spatial sampling patterns on catch- per-unit-effort standardization: A simulation study and fishery application. Fish. Res. 246, 106169. https://doi.org/10.1016/j.fishres.2021.106169. Dunlop, E.S., Feiner, Z.S., Höök, T.O., 2018. Potential for fisheries-induced evolution in the Laurentian Great Lakes. J. Great Lakes R. 44, 735-747. https://doi.org/10.1016/j.jglr.2018.05.009. Ebener, M.P., Bence, J.R., Newman, K., Schneeberger, P., 2005. Application of statistical catch- at-age models to assess lake whitefish stocks in the 1836 treaty-ceded waters of the upper Great Lakes. In: Mohr, L.C., Nalepa, T.F. (Eds.), Proceedings of a Workshop on the Dynamics of Lake Whitefish and the Amphipod Diporeia spp. in the Great Lakes Great Lakes Fishery Commission Technical Report 66, pp. 271-309. Ebener, M.P., Brenden, T.O., Wright, G.M., Jones, M.L., Faisal, M., 2010. Spatial and temporal distributions of lake whitefish spawning stocks in Northern lakes Michigan and Huron, 2003-2008. J. Great Lakes Res. 36, 38-51. https://doi.org/10.1016/j.jglr.2010.02.002. Ebener, M.P., Dunlop, E.S., Muir, A.M., 2021. Declining recruitment of lake whitefish recruitment to fisheries in the Laurentian Great Lakes: Management considerations and research priorities. http://www.glfc.org/pubs/misc/2021-01.pdf. 43 Fera, S.A., Rennie, M.D., Dunlop, E.S., 2015. Cross-basin analysis of long-term trends in the growth of lake whitefish in the Laurentian Great Lakes. J. Great Lakes R. 41, 1138-1149. https://doi.org/10.1016/j.jglr.2015.08.010. Fournier, D.A., Archibald, C.P., 1982. A general theory for analyzing catch at age data. Can. J. Fish. Aquat. Sci. 39, 1195-1207. https://doi.org/10.1139/f82-157. Fournier, D.A., Hampton, J., Sibert, J.R., 1998. MULTIFAN-CL: a length-based, age-structured model for fisheries stock assessment, with application to South Pacific albacore. Can. J. Fish. Aquat. Sci. 55, 2105-2116. https://doi.org/10.1139/f98-100. Fournier, D.A., Skaug, H.J., Ancheta, J., Ianelli, J.N., Magnusson, A., Maunder, M.N., Nielsen, A., Sibert, J., 2012. AD Model Builder: Using automatic differentiation for statistical inference of highly parameterized complex nonlinear models. Optim. Method Softw. 27, 233-249. https://doi.org/10.1080/10556788.2011.597854. Francis, R.I.C.C., 2011. Corrigendum: Data weighting in statistical fisheries stock assessment models. Can. J. Fish. Aquat. Sci. 68, 2228-2228. https://doi.org/10.1139/f2011-165. Grüss, A., Walter, J.F., Babcock, E.A., Forrestal, F.C., Thorson, J.T., Lauretta, M.V., Schirripa, M.J., 2019. Evaluation of the impacts of different treatments of spatio-temporal variation in catch-per-unit-effort standardization models. Fish. Res. 213, 75-93. https://doi.org/10.1016/j.fishres.2019.01.008. Gunnlaugsson, T., 2012. Some catch-at-age analysis methods and models compared on simulated data. Open J. Mar. Sci. 2, 16-24. https://doi.org/10.4236/ojms.2012.21003. Hansen, M.J., Horner, N.J., Liter, M., Peterson, M.P., Maiolie, M.A., 2008. Dynamics of an increasing lake trout population in Lake Pend Oreille, Idaho. N. Am. J. Fish. Manage. 28, 1160-1171. https://doi.org/10.1577/M07-149.1. He, J.X., Bence, J.R., 2007. Modeling annual growth variation using a hierarchical Bayesian approach and the von Bertalanffy growth function, with application to lake trout in southern Lake Huron. T. Am. Fish. Soc. 136, 318-330. https://doi.org/10.1577/t06-108.1. Herbst, S.J., Marsden, J.E., 2011. Comparison of precision and bias of scale, fin ray, and otolith age estimates for lake whitefish (Coregonus clupeaformis) in Lake Champlain. J. Great Lakes R. 37, 386-389. https://doi.org/10.1016/j.jglr.2011.02.001. Hilborn, R., Walters, C.J., 1992. Quantitative fisheries stock assessment: choice, dynamics, and uncertainty. Chapman and Hall, New York, NY. Hurtado-Ferro, F., Szuwalski, C.S., Valero, J.L., Anderson, S.C., Cunningham, C.J., Johnson, K.F., Licandeo, R.R., Mcgilliard, C.R., Monnahan, C.C., Muradian, M.L., 2015. Looking in the rear-view mirror: bias and retrospective patterns in integrated, age-structured stock assessment models. ICES J. Mar. Sci. 72, 99-110. https://doi.org/10.1093/icesjms/fsu198. 44 Jeong, E.C., Park, C.D., Park, S.W., Lee, J.H., Tokai, T., 2000. Size selectivity of trap for male red queen crab Chionoecetes japonicus with the extended SELECT model. Fish. Res. 66, 494-501. https://doi.org/10.1046/j.1444-2906.2000.00079.x. Johnson, K.F., Councill, E., Thorson, J.T., Brooks, E., Methot, R.D., Punt, A.E., 2016. Can autocorrelated recruitment be estimated using integrated assessment models and how does it affect population forecasts? Fish. Res. 183, 222-232. https://doi.org/10.1016/j.fishres.2016.06.004. Johnson, K.F., Thorson, J.T., Punt, A.E., 2019. Investigating the value of including depth during spatiotemporal index standardization. Fish. Res. 216, 126-137. https://doi.org/10.1016/j.fishres.2019.04.004. Kristensen, K., Nielsen, A., Berg, C.W., Skaug, H., Bell, B.M., 2016. TMB: Automatic differentiation and laplace approximation. J. Stat. Softw. 70, 1-21. https://doi.org/10.18637/jss.v070.i05. Langley, A., Harley, S., Hoyle, S., Davies, N., Hampton, J., Kleiber, P., 2009. Stock assessment of yellowfin tuna in the western and central Pacific Ocean. WCPFC SC5 SA WP‐3, Port Vila, Vanuatu, 10–21. Legault, C.M., Restrepo, V.R., 1998. A flexible forward age-structured assessment program. Collect. Vol. Sci. Pap. ICCAT, 49, 246-253. Lynch, P.D., Shertzer, K.W., Cortés, E., Latour, R.J., 2018. Abundance trends of highly migratory species in the Atlantic Ocean: accounting for water temperature profiles. ICES J. Mar. Sci. 75, 1427-1438. https://doi.org/10.1093/icesjms/fsy008. Martell, S., Stewart, I., 2014. Towards defining good practices for modeling time-varying selectivity. Fish. Res. 158, 84-95. https://doi.org/10.1016/j.fishres.2013.11.001. Maunder, M.N., Thorson, J.T., 2018. Modeling temporal variation in recruitment in fisheries stock assessment: A review of theory and practice. Fish. Res. 217, 71-86. https://doi.org/10.1016/j.fishres.2018.12.014. Methot, R.D., Wetzel, C.R., 2013. Stock synthesis: A biological and statistical framework for fish stock assessment and fishery management. Fish. Res. 142, 86-99. https://doi.org/10.1016/j.fishres.2012.10.012. Mohn, R., 1999. The retrospective problem in sequential population analysis: An investigation using cod fishery and simulated data. ICES J. Mar. Sci. 56, 473-488. https://doi.org/10.1006/jmsc.1999.0481. [NEFSC] Northeast Fisheries Science Center. 2022. Management Track Assessments Fall 2021. US Dept Commer, Northeast Fish. Sci. Cent. Ref. Doc. 22-07, 53 p. 45 Nielsen, A., Berg, C.W., 2014. Estimation of time-varying selectivity in stock assessments using state-space models. Fish. Res. 158, 96-101. https://doi.org/10.1016/j.fishres.2014.01.014. Nielsen, A., Berg, C.W., Hintzen, N.T., Mosegaard, H., Trijoulet, V., 2021. Multi-fleet state- space assessment model strengthens confidence in single-fleet SAM and provides fleet- specific forecast options. ICES J. Mar. Sci. 78, 2043-2052. https://doi.org/10.1093/icesjms/fsab078. Patriarche, M.H., 1977. Biological basis for management of lake whitefish in the Michigan waters of northern Lake Michigan. T. Am. Fish. Soc. 106, 295-308. https://doi.org/10.1577/1548-8659(1977)106<295:bbfmol>2.0.co;2 Perreault, A.M.J., Wheeland, L.J., Joanne Morgan, M., Cadigan, N.G., 2020. A state-space stock assessment model for American plaice on the Grand Bank of Newfoundland. J. Northwest Atl. Fish. Sci. 51, 45-104. https://doi.org/10.2960/J.v51.m727. Quirijns, F.J., Poos, J.J., Rijnsdorp, A.D., 2008. Standardizing commercial CPUE data in monitoring stock dynamics: Accounting for targeting behaviour in mixed fisheries. Fish. Res. 89, 1-8. https://doi.org/10.1016/j.fishres.2007.08.016 Richards, L.J., Schnute, J.T., Olsen, N., 1997. Visualizing catch–age analysis: a case study. Can. J. Fish. Aquat. Sci. 54, 1636-1658. https://doi.org/10.1139/f97-073. Ricker, W.E., 1975. Computation and interpretation of biological statistics of fish populations. Bull. Fish. Res. Board. Can. 191, 1-382. Rose, G.A., Kulka, D.W., 1999. Hyperaggregation of fish and fisheries: how catch-per-unit- effort increased as the northern cod (Gadus morhua) declined. Can. J. Fish. Aquat. Sci. 56, 118-127. https://doi.org/10.1139/f99-207. Schnute, J.T., 1994. A general framework for developing sequential fisheries models. Can. J. Fish. Aquat. Sci. 51, 1676–1688. https://doi.org/10.1139/f94-168. Stock, B.C., Miller, T.J., 2021. The Woods Hole Assessment Model (WHAM): A general state- space assessment framework that incorporates time- and age- varying processes via random effects and links to environmental covariates. Fish. Res. 240, 105967. https://doi.org/10.1016/j.fishres.2021.105967. Stock, B.C., Xu, H., Miller, T.J., Thorson, J.T., Nye, J.A., 2021. Implementing two-dimensional autocorrelation in either survival or natural mortality improves a state-space assessment model for Southern New England-Mid Atlantic yellowtail flounder. Fish. Res. 237, 105873. https://doi.org/10.1016/j.fishres.2021.105873. Thorson, J.T., Jensen, O.P., Zipkin, E.F., 2014. How variable is recruitment for exploited marine fishes? A hierarchical model for testing life history theory. Can. J. Fish. Aquat. Sci. 71, 973-983. https://doi.org/10.1139/cjfas-2013-0645. 46 Thorson, J.T., Kristensen, K., 2016. Implementing a generic method for bias correction in statistical models using random effects, with spatial and population dynamics examples. Fish. Res. 175, 66-74. https://doi.org/10.1016/j.fishres.2015.11.016. Thorson, J.T., Johnson, K.F., Methot, R.D., Taylor, I.G., 2017. Model-based estimates of effective sample size in stock assessment models using the Dirichlet-multinomial distribution. Fish. Res. 192, 84-93. https://doi.org/10.1016/j.fishres.2016.06.005. Trijoulet, V., Albertsen, C.M., Kristensen, K., Legault, C.M., Miller, T.J., Nielsen, A., 2023. Model validation for compositional data in stock assessment models: Calculating residuals with correct properties. Fish. Res. 257, 106487. https://doi.org/10.1016/j.fishres.2022.106487. Truesdell, S.B., Bence, J.R., 2016. A Review of stock assessment methods for lake trout and lake whitefish in 1836 treaty waters of Lake Huron, Lake Michigan and Lake Superior. Quantitative Fisheries Center Technical Report T2016-01. https://doi.org/10.6084/m9.figshare.3123949. Truesdell, S.B., Bence, J.R., Syslo, J.M., Ebener, M.P., 2017. Estimating multinomial effective sample size in catch-at-age and catch-at-size models. Fish. Res. 192, 66-83. https://doi.org/10.1016/j.fishres.2016.11.003. U.S. v. Michigan. 2000. Stipulation for Entry of Consent Decree. U.S. District Court Western District of Michigan Southern Division, Case No. 2:73 CV 26. VanDeHey, J.A., Sloss, B.L., Peeters, P.J., Sutton, T.M., 2009. Genetic structure of lake whitefish (Coregonus clupeaformis) in Lake Michigan. Can. J. Fish. Aquat. Sci. 66 (3), 382-393. https://doi.org/10.1139/F08-213. Victorero, L., Watling, L., Deng Palomares, M.L., Nouvian, C., 2018. Out of Sight, But Within Reach: A Global History of Bottom-Trawled Deep-Sea Fisheries From >400 m Depth. Front. Mar. Sci. 5, 1-17. https://doi.org/10.3389/fmars.2018.00098. Wilberg, M.J., Bence, J.R., Eggold, B.T., Makauskas, D., Clapp, D.F., 2005. Yellow Perch dynamics in Southwestern Lake Michigan during 1986–2002. N. Am. J. Fish. Manage. 25, 1130-1152. https://doi.org/10.1577/M04-193.1. Wilberg, M.J., Thorson, J.T., Linton, B.C., Berkson, J., 2009. Incorporating time-varying catchability into population dynamic stock assessment models. Rev. Fish. Sci. 18, 7-24. https://doi.org/10.1080/10641260903294647. Yin, Y., Aeberhard, W.H., Smith, S.J., Flemming, J.M., 2019. Identifiable state-space models : A case study of the Bay of Fundy sea scallop fishery. Can. J. Stat. 47, 27-45. https://doi.org/10.1002/cjs.11470. 47 Zhao, Y., Morbey, Y.E., 2017. Estimating the size selectivity of trap nets using a gill-net selectivity experiment: Method development and application to lake whitefish in lake Huron. N. Am. J. Fish. Manage. 37, 1341-1349. https://doi.org/10.1080/02755947.2017.1381206. 48 APPENDIX 1A: DATA COLLECTION AND PROCESSING Fishery and biological data of lake trout and lake whitefish have been collected in the 1836 Treaty waters of the Great Lakes since 1985, though the time series for some regions begins later. Each year, the time series up to but not including that year are used to build stock assessments that are used to make harvest recommendations for the following year (i.e., the time series 1986-2017 is used in a model built in 2018 that makes recommendations for 2019). A catch reporting system provides monthly total harvest, 𝐻𝑦,𝐺 , in weight, and effort data, 𝐸𝑦,𝐺 , in lifts for the trap net fishery, or length of net for the gill net fishery from 1986 to 2017, which are pooled from daily reports mandated for each licensed commercial fisher (Ebener et al., 2005). Biological data were collected opportunistically by sampling catch from commercial trap and gill net fisheries from 1986-2017. The number of individual fish sampled each year varied considerably over time and between fisheries, from 86-1261 in the trap net fishery and 0-716 in the gill net fishery (Figure 1A.1). 49 Figure 1A.1. Number of fish sampled and aged from the gill net and trap net fishery of Lake Michigan lake whitefish in region WFM-03 from 1986-2017. In general, aging was done using scale samples for smaller (and presumably younger) fish and otoliths for larger (and presumably older) fish. A total of 2,484 scale samples were taken from individuals ranging from 375-669mm (averaging 464mm), and a total of 748 otolith samples were collected, from individuals ranging from 430-623mm (averaging 523mm). Otoliths are a more accurate and precise aging tool than either fin rays or scales to age lake whitefish and were assumed to be unbiased measurements (Herbst and Marsden, 2011). Because scale samples yield similar age estimates as otoliths at younger ages but are less precise at older ages, dividing the aging between the two structures should minimize aging error (Herbst and Marsden, 2011). (ℎ𝑎𝑟𝑣) Weight at age, 𝑊𝑎,𝑦 , and length at age, 𝐿𝑎,𝑦 , for the harvest in each year was determined using a growth model with year- and cohort- specific parameters, using data from both fisheries (He and Bence, 2007). These calculations were done prior to assessment model fitting and provided as inputs to the assessment model. Weight at age from the sampled harvest and samples from a graded-mesh gill net survey were used to calculate population weight at age at the 50 (𝑖𝑛𝑖𝑡) (𝑠𝑝𝑎𝑤𝑛) beginning of the year, 𝑊𝑎,𝑦 , and at the time of spawning, 𝑊𝑎,𝑦 , assuming that harvest occurs on June 30th, spawning occurs on October 30th, and growth from one year to the next follows an exponential model. The total harvest in weight by year and fishery, 𝐻𝑦,𝐺 , was converted into total harvest in numbers using the average weight by year and fishery, 𝑊 ̅𝑦,𝐺 . The observed total catch in each year by each fishery 𝐶𝑦,𝐺 was a function of total harvest, the mean weight, and a year- and fishery- specific adjustment term, 𝐴𝑦,𝐺 , that corrected for under reporting of each fishery, 𝐻𝑦,𝐺 ̅𝑊 ̅̅𝑦,𝐺 𝐶𝑦,𝐺 = . 𝐴𝑦,𝐺 Underreporting rate was estimated as the ratio of reported whitefish harvested by the fishers to the reported whitefish purchased by wholesalers (Ebener et al., 2005). Observed proportions at age for each fishery and year, 𝑃𝑎,𝑦,𝐺 were calculated from the commercial sampling data, by normalizing the number sampled at age and year by the total sampled in each year. Only fish age 4 and older were included in the stock (younger fish were extremely rarely caught) and all fish age 15 and older were aggregated into a plus group. The oldest recorded age was 32, however, individuals aged 20 or older were exceedingly rare in this region (<0.1% of sampled individuals). 51 APPENDIX 1B: CURRENT STATISTICAL CATCH AT AGE MODEL (SCA) I refer to this model as the current assessment, despite some minor modifications that occurred to the model in 2021, and it is an age-based statistical catch at age model (henceforth “SCA”) fit to commercial trap net and gill net fishery data with additional data inputs described below (Caroffino and Seider, 2020; Ebener et al., 2005; Truesdell and Bence, 2016). The model was built using AD Model Builder (Fournier et al., 2012). The years of data used in the assessment were from 1986 to 2017 (i.e., y = 1986, 1989 … 2017), the age classes ranged from 4 to 15+ (i.e., a= 4, 5, … A, where A=15+), and there were catch and effort data for each of the two fisheries (i.e., G=g, t). The model estimated the abundance of individuals ages 4-9 in 1986 and the abundance (i.e., recruitment) of age 4 individuals in each year. Log-scale recruitment was calculated as the sum of two log-scale parameters- an average recruitment, 𝑅̅ , and a vector of deviations, 𝐷𝑦 , log 𝑁4,𝑦 = log 𝑅̅ + log 𝐷𝑦 . The number of individuals from age 4 to 9 in the initial year were calculated in a similar way, using the same average recruitment and another set of deviation values 𝑑𝑎 , log 𝑁𝑎,1986 = log 𝑅̅ + log 𝑑𝑎 where 𝑎 < 10. The number of individuals in the initial year from age 10 to 15 were set to 0, because few or no fish of these ages were detected in the catch and there were insufficient data to reliably inform abundance and catchability. The effects of this simplification on model output, particularly in later years of the time series, are likely negligible. The vectors of 𝐷𝑦 and 𝑑𝑎 were constrained to collectively sum to 0. All other abundances at age and year, after the initial year and initial age, 𝑁𝑎,𝑦 , followed an exponential decay with all fish age A or older subject to the same year-specific instantaneous mortality (Table 1.2, Eq. 1.4). The instantaneous total mortality 52 at age and year, 𝑍𝑎,𝑦 , was the sum of age-, year-, and fishery- specific instantaneous fishing mortality, 𝐹𝑎,𝑦,𝐺 , and a single estimated instantaneous natural mortality, 𝑀 (Eq. 1.5). Instantaneous fishing mortality was the product of year- and fishery- specific catchability, 𝑞𝑦,𝐺 , year- and fishery- specific fishing effort, 𝐸𝑦,𝐺 , and age- and year- and fishery- specific selectivity, 𝑠𝑎,𝑦,𝐺 (Eq. 1.6a). Log-scale catchability varied over time according to a random walk (Eq. 1.7a). Selectivity at age for each fishery was a function of mean length at age and varied over time in the trap net fishery. The selectivity of the trap net fishery was modeled as a logistic function of mean length at age each year, normalized against a reference length, 455 mm (i.e., selectivity was 1.0 at this mean length at age): 1 1 + exp(−𝑝2,𝑡 (𝐿𝑦,𝑎 − 𝑝1,𝑦,𝑡 )) 𝑠𝑎,𝑦,𝑡 = 1 1 + exp (−𝑝2,𝑡 (455 − 𝑝1,𝑦,𝑡 )) (𝑠) (𝑠) log 𝑝1,𝑦+1,𝑡 = log 𝑝1,𝑦,𝑡 + 𝜀𝑦 ; 𝜀𝑦 ~𝑁(0, 𝜎𝑠2 ); 𝑦 < 2017. The parameters, 𝑝1,𝑦,𝑡 and 𝑝2,𝑡 , are the inflection point (varying over years), and the determinant of the slope at the inflection point (constant over years), respectively. The inflection point for the initial year was estimated and then varied over according to a random walk, the distribution of which was included in the total objective function (Eq. 3.4), Selectivity of the gill net fishery used a lognormal function with two estimated parameters, 𝑝1,𝑔 and 𝑝2,𝑔 , and was normalized to equal 1.0 for a mean length at age of exp (𝑝2,𝑔 ), 53 2 1 −(log 𝐿𝑦,𝑎 −𝑝2,𝑔 ) ∗exp( ) √2𝜋𝑝1,𝑔 𝐿𝑦,𝑎 2𝑝1,𝑔 2 𝑠𝑎,𝑦,𝑔 = 1 . √2𝜋𝑝1,𝑔 exp 𝑝2,𝑔 (𝑆𝑝𝑎𝑤𝑛) Spawning stock biomass, 𝐵𝑦 , in each year was calculated by multiplying the weight at age during spawning (adjusted with maturity schedule), by the number of individuals at age in each year (Eq. 1.8). The predicted age-, year-, and fishery- specific catch, 𝐶𝑎,𝑦,𝐺 , was calculated from the Baranov catch equation (Eq. 2.1). The annual proportions at age were calculated from the age- and year- specific total catch (Eq. 2.2). The objective function of the SCA model, T, was the sum of the negative log prior probability density of the parameters, NLP, and the negative log likelihood of the data given the parameters, NLL (Eq. 3.1). The NLP was the sum of explicit components explaining variability in recruitment, 𝑁4,𝑦 , log-scale catchability (trap and gill net), and selectivity (trap net). Priors for all other parameters were from uniform distributions, specified via bounds for the parameters. Recruitment was constrained such that estimated values did not deviate substantially from a mean predicted by a Ricker stock recruitment function, 𝑁 ̂4,𝑦 which depended on Ricker stock recruitment parameters, 𝛼 and 𝛽, and year-specific spawning stock size in eggs, ϵ𝑦 (−𝛽ϵ𝑦 ) log 𝑁̂4,𝑦+5 = 𝛼ϵ𝑦 . The predicted value was compared to the current estimates, 𝑁𝑎,𝑦 , using a likelihood framework during model fitting (Eq. 3.2a). Fecundity data, including age- and year- specific maturity, 𝑚𝑎,𝑦 , the average number of eggs per kilogram, 𝑒, and the average proportion of females in the spawning population, 𝑓, were input into the model. The maturity matrix was a 5- year running average that was the same across all 1836 treaty waters and calculated by applying 54 an age/length key to maturity at length data. Eggs per kilogram was also a universal value based on samples collected in 1983 and 1996 (Ebener et al., 2005). The proportion of females was calculated from samples of the commercial harvest and was a single age- and year-invariant value for each management unit, which was 0.4845 for WFM-03 (Ebener et al., 2005; Patriarche, 1977). Spawning stock size (i.e., eggs) was calculated from maturity at age, weight at age during (𝑠𝑝𝑎𝑤𝑛) spawning, 𝑊𝑦,𝑖 , eggs per kilogram, percent female, numbers at age, and total instantaneous mortality for the proportion of the year that occurs before spawning (0.838), 𝐴 (𝑠𝑝𝑎𝑤𝑛) −𝑍 ∗0.838 ϵ𝑦 = ∑ 𝑚𝑦,𝑖 𝑊𝑦,𝑖 𝑒𝑓𝑁𝑖,𝑦 𝑦,𝑖 𝑖=4 The deviations from predicted mean recruitment were assumed to be lognormally distributed (Eq. 3.2a). Note that these terms start in 1991, which is the year class produced by the spawning biomass in 1986 (the first year being modeled). The errors for the log-scale trap net and gill net catchability random walks were assumed to come from fishery-specific normal distributions (Eq. 3.3a). Similarly, the errors for the random walk of the time-varying trap net selectivity parameter, 𝑝1,𝑦,𝑡 , were assumed to come from a normal distribution (Eq. 3.4). The NLP also included a prior expectation on instantaneous natural mortality, that it was assumed to arise from a lognormal distribution with median 𝑀 = 0.2 and standard deviation, 𝜎𝑀 = 0.1 (Eq. 3.5a). The NLL included four summed components, for the total catch and proportions at age of the trap net and gill net fisheries. The observed total catch by year, for each fishery was assumed to come from a lognormal distribution, with median equal to the predicted catch (Eq. 3.6). The observed catch proportions at age by year for each fishery were assumed to behave as though 55 they resulted from sampled numbers at age following a multinomial distribution, with fishery- specific effective sample size, 𝑛𝐺 determined a priori using an iterative reweighting algorithm (Francis, 2011; Truesdell et al., 2017) (Eq. 3.7). The effective sample size in each year was the product of the number of fish sampled and the gear-specific scalar, 0.12 and 0.06 for the trap and gill net fisheries, respectively, rounded to the nearest whole number. Those effective sample sizes are reported in Table 1B.1. Table 1B.1. Year-specific effective sample size of the multinomial distribution of proportion at age of the trap net, 𝑛𝑡 , and gill net, 𝑛𝑔 , fisheries. Year 𝑛𝑡 𝑛𝑔 1986 40 0 1987 28 5 1988 24 4 1989 56 0 1990 40 0 1991 51 10 1992 46 12 1992 53 30 1993 75 14 1994 40 11 1995 97 21 1996 105 37 1997 107 41 1998 72 31 1999 60 18 2000 107 15 2001 147 23 2002 41 15 2003 79 10 2004 59 33 2005 78 21 2006 10 18 2007 27 11 2008 38 14 2009 14 10 2010 27 18 2011 41 42 2012 61 29 2013 65 12 56 Table 1B.1 (cont’d) 2014 52 24 2015 25 13 2016 41 17 2017 40 0 The standard deviations for the above distributions were calculated as the product of a single common standard deviation term, 𝜎, which was estimated, and a standard deviation specific multiplier, 𝜑, that was pre-specified. This constraint was implemented because in this model structure there was insufficient information to independently estimate the standard deviations of each process variability and observation error distribution. The model used a minimization algorithm to determine the point estimates for parameters that minimized the total objective value 𝑇. The estimated parameters and standard errors are reported in Table 1B.2. Table 1B.2. Parameter estimates and standard error of SCA model. Parameter Standard Estimate (Symbol) Error 𝑀 0.1838 0.0171 𝜎 0.0445 0.0026 𝑞1986,𝑡 0.0221 0.0051 𝑞1987,𝑡 0.0275 0.0055 𝑞1988,𝑡 0.0287 0.0047 𝑞1989,𝑡 0.0241 0.0040 𝑞1990,𝑡 0.0208 0.0035 𝑞1991,𝑡 0.0270 0.0042 𝑞1992,𝑡 0.0278 0.0040 𝑞1993,𝑡 0.0302 0.0044 𝑞1994,𝑡 0.0334 0.0048 𝑞1995,𝑡 0.0328 0.0051 𝑞1996,𝑡 0.0317 0.0048 𝑞1997,𝑡 0.0303 0.0047 𝑞1998,𝑡 0.0281 0.0043 𝑞1999,𝑡 0.0286 0.0038 𝑞2000,𝑡 0.0254 0.0037 𝑞2001,𝑡 0.0220 0.0029 𝑞2002,𝑡 0.0188 0.0029 𝑞2003,𝑡 0.0206 0.0032 57 Table 1B.2 (cont’d) 𝑞2004,𝑡 0.0280 0.0041 𝑞2005,𝑡 0.0258 0.0039 𝑞2006,𝑡 0.0247 0.0036 𝑞2007,𝑡 0.0235 0.0039 𝑞2008,𝑡 0.0189 0.0028 𝑞2009,𝑡 0.0164 0.0023 𝑞2010,𝑡 0.0142 0.0022 𝑞2011,𝑡 0.0155 0.0025 𝑞2012,𝑡 0.0147 0.0025 𝑞2013,𝑡 0.0116 0.0023 𝑞2014,𝑡 0.0093 0.0022 𝑞2015,𝑡 0.0080 0.0023 𝑞2016,𝑡 0.0064 0.0022 𝑞2017,𝑡 0.0067 0.0025 𝑞1986,𝑔 0.1015 0.0147 𝑞1987,𝑔 0.0858 0.0114 𝑞1988,𝑔 0.0809 0.0103 𝑞1989,𝑔 0.0811 0.0112 𝑞1990,𝑔 0.0486 0.0059 𝑞1991,𝑔 0.0687 0.0086 𝑞1992,𝑔 0.0608 0.0067 𝑞1993,𝑔 0.0682 0.0084 𝑞1994,𝑔 0.0614 0.0081 𝑞1995,𝑔 0.0659 0.0086 𝑞1996,𝑔 0.0484 0.0064 𝑞1997,𝑔 0.0525 0.0075 𝑞1998,𝑔 0.0454 0.0063 𝑞1999,𝑔 0.0389 0.0052 𝑞2000,𝑔 0.0486 0.0070 𝑞2001,𝑔 0.1061 0.0167 𝑞2002,𝑔 0.0734 0.0113 𝑞2003,𝑔 0.0435 0.0066 𝑞2004,𝑔 0.0631 0.0101 𝑞2005,𝑔 0.0656 0.0104 𝑞2006,𝑔 0.0754 0.0120 𝑞2007,𝑔 0.0701 0.0122 𝑞2008,𝑔 0.0743 0.0118 𝑞2009,𝑔 0.0532 0.0073 𝑞2010,𝑔 0.0448 0.0060 𝑞2011,𝑔 0.0555 0.0084 𝑞2012,𝑔 0.0435 0.0072 𝑞2013,𝑔 0.0329 0.0066 𝑞2014,𝑔 0.0509 0.0125 58 Table 1B.2 (cont’d) 𝑞2015,𝑔 0.0257 0.0070 𝑞2016,𝑔 0.0303 0.0088 𝑞2017,𝑔 0.0198 0.0063 𝑝1,1986,𝑡 475.38 9.5069 𝑝1,1987,𝑡 483 8.735 𝑝1,1988,𝑡 475.83 10.337 𝑝1,1989,𝑡 471.67 7.9452 𝑝1,1990,𝑡 479.03 10.072 𝑝1,1991,𝑡 463.82 8.2848 𝑝1,1992,𝑡 466.99 6.9187 𝑝1,1993,𝑡 470.45 6.5829 𝑝1,1994,𝑡 467.32 6.5738 𝑝1,1995,𝑡 479.12 5.7553 𝑝1,1996,𝑡 480.74 5.4075 𝑝1,1997,𝑡 478.89 5.1605 𝑝1,1998,𝑡 491.21 5.3963 𝑝1,1999,𝑡 470.89 6.7586 𝑝1,2000,𝑡 473.77 6.4606 𝑝1,2001,𝑡 447.77 5.9078 𝑝1,2002,𝑡 473.12 5.6466 𝑝1,2003,𝑡 472.59 7.4029 𝑝1,2004,𝑡 458.43 6.8059 𝑝1,2005,𝑡 461 8.519 𝑝1,2006,𝑡 437.27 7.8146 𝑝1,2007,𝑡 435 8.3108 𝑝1,2008,𝑡 435.06 7.9925 𝑝1,2009,𝑡 450.69 6.734 𝑝1,2010,𝑡 472.2 7.4435 𝑝1,2011,𝑡 469.65 6.3439 𝑝1,2012,𝑡 478.42 6.9458 𝑝1,2013,𝑡 477.52 7.1065 𝑝1,2014,𝑡 439.58 7.3909 𝑝1,2015,𝑡 463.28 9.3819 𝑝1,2016,𝑡 476.64 13.21 𝑝1,2017,𝑡 478.23 11.348 𝑝2,𝑡 0.0600 0.0040 𝑝1,𝑔 0.0833 0.0063 𝑝2,𝑔 6.29 0.0154 𝑅̅ 504140 49815 𝐷1986 1.4627 0.2157 𝐷1987 0.8984 0.1619 𝐷1988 0.5004 0.1086 𝐷1989 0.6276 0.1224 59 Table 1B.2 (cont’d) 𝐷1990 1.624 0.2177 𝐷1991 2.0457 0.2577 𝐷1992 2.3842 0.2723 𝐷1993 1.6725 0.2007 𝐷1994 1.3842 0.1679 𝐷1995 3.0649 0.2843 𝐷1996 2.5721 0.2435 𝐷1997 2.4709 0.2369 𝐷1998 2.2904 0.2348 𝐷1999 2.3116 0.2563 𝐷2000 2.0771 0.2498 𝐷2001 1.6651 0.2206 𝐷2002 1.5444 0.2248 𝐷2003 1.2869 0.2152 𝐷2004 2.1706 0.3452 𝐷2005 2.9389 0.4021 𝐷2006 2.4684 0.3507 𝐷2007 2.0104 0.2881 𝐷2008 1.657 0.2356 𝐷2009 1.4356 0.2023 𝐷2010 1.1113 0.1674 𝐷2011 0.9298 0.1522 𝐷2012 0.8216 0.1522 𝐷2013 0.8075 0.1805 𝐷2014 0.7559 0.1974 𝐷2015 0.9726 0.3089 𝐷2016 0.6199 0.2238 𝐷2017 0.6573 0.3222 𝑑5 1.006 0.1666 𝑑6 0.4012 0.0992 𝑑7 0.0923 0.0462 𝑑8 0.0242 0.0235 𝑑9 0.0134 0.0175 𝛼 0.0003 0.0001 𝛽 1.17E-10 3.64E-11 Several plots are included to illustrate model diagnostics. Histogram plots of ordinary least squared (OLS) residuals (observed-expected value) of (A) total trap net catch on the log- scale, (B) total gill net catch on the log-scale, and one step ahead (OSA) residuals of (C) trap net 60 proportions at age, and (D) gill net proportions at age are reported in Figure 1.B1. The OLS residuals of the total trap net and gill net fishery catch over time are reported in Figures 1B.2- 1B.3. The OSA residuals of proportions at age are plotted as a bubble plot to highlight potential patterns across age and time (Figures 1B.4-1B.5). Retrospective plots for the recruitment and spawning stock biomass are reported in Figures 1B.6-1B.7. Figure 1B.1. Histogram plots of ordinary least squared (OLS) residuals of (A) total trap net catch on the log-scale and (B) total gill net catch on the log-scale, and one step ahead (OSA) residuals of (C) trap net proportions at age and (D) gill net proportions at age. 61 Figure 1B.2. Ordinary least squared (OLS) residuals of total log-scale catch of the trap net fishery over time. Observed catch was modeled as a lognormal distribution with mean equal to the expected catch and standard deviation 0.067. Figure 1B.3. Ordinary least squared (OLS) residuals of total log-scale catch of the gill net fishery by year. Observed catch was modeled as a lognormal distribution with mean equal to the expected catch and standard deviation 0.067. 62 Figure 1B.4. One step ahead (OSA) residuals of proportions at age of the trap net fishery. Observed proportions were modeled as a multinomial distribution with year-specific effective sample size. No residuals were reported for year and age combination where the expected and observed value was 0 (older ages in the first 6 years) or for age 15+, because the way OSA residuals are calculated does not allow for estimates in the oldest age. 63 Figure 1B.5. One step ahead (OSA) residuals of proportions at age of the gill net fishery. Observed proportions were modeled as a multinomial distribution with year-specific effective sample size. No residuals were reported for year and age combination where the expected and observed value was 0 (older ages in the first 6 years) or for age 15+, because the way OSA residuals are calculated does not allow for estimates in the oldest age. Note that years 1986, 1989, and 1990 are also absent because the sample size was zero. 64 Figure 1B.6. Retrospective analysis of estimated recruitment. The SCA model was refit after removing sequential years of the most recent observations. Dark colors indicate more complete data sets and lighter colors indicate less complete data sets. Figure 1B.7. Retrospective analysis of estimated spawning stock biomass. The SCA model was refit after removing sequential years of the most recent observations. Dark colors indicate more complete data sets and lighter colors indicate less complete data sets. 65 APPENDIX 1C: MODEL DIAGNOSTICS FOR THE STATE-SPACE MODEL The state-space stock assessment model predicted temporally variability in age structure with older ages being more represented in the population in later years (Figure 1C.1). The state- space stock assessment model (SSM) fit the catch and catch composition data well for both fisheries with plots of the predicted and observed total harvest over time entirely overlapping, with no difference distinguishable through line graphs, and therefore are not reported here. The histograms of one step ahead (OSA) residuals for the total catch on the log-scale, proportions at age, and catchability process variability are plotted in Figure 1C.2. The residuals were plotted over time in Figures 1C.3-1C.6. Retrospective plots for the recruitment and spawning stock biomass are reported in Figures 1C.7-1C.8. Figure 1C.1. Predicted proportions at age over time in the state-space stock assessment model (SSM). 66 Figure 1C.2. Histogram plots of one step ahead (OSA) residuals of (A) total trap net catch on the log-scale, (B) total gill net catch on the log-scale, (C) trap net proportions at age and(D) gill net proportions at age, and the process variability at age of the (E) trap net catchability and (F) gill net catchability. 67 Figure 1C.3. One step ahead (OSA) residuals of total log-scale catch of the trap net fishery by year. Observed catch was modeled as a lognormal distribution with mean equal to the expected catch and standard deviation 0.067. Figure 1C.4. One step ahead (OSA) residuals of total log-scale catch of the gill net fishery by year. Observed catch was modeled as a lognormal distribution with mean equal to the expected catch and standard deviation 0.067. 68 Figure 1C.5. One step ahead (OSA) residuals of proportions at age of the trap net fishery. Observed proportions were modeled as a multinomial distribution with year-specific effective sample size. No residuals were reported for year and age combination where the expected and observed value was 0 (older ages in the first 6 years) or for age 15+, because the way OSA residuals are calculated does not allow for estimates in the oldest age. 69 Figure 1C.6. One step ahead (OSA) residuals of proportions at age of the gill net fishery. Observed proportions were modeled as a multinomial distribution with year-specific effective sample size. No residuals were reported for year and age combination where the expected and observed value was 0 (older ages in the first 6 years) or for age 15+, because the way OSA residuals are calculated does not allow for estimates in the oldest age. Note that years 1986, 1989, and 1990 are also absent because the sample size was zero. 70 Figure 1C.7. Retrospective analysis of estimated recruitment in number of fish. The SSM model was refit after removing sequential years of the most recent observations. Dark colors indicate more complete data sets and lighter colors indicate less complete data sets. Figure 1C.8. Retrospective analysis of estimated spawning stock biomass. The SSM model was refit after removing sequential years of the most recent observations. Dark colors indicate more complete data sets and lighter colors indicate less complete data sets. 71 CHAPTER 2: THE EFFECT OF PROCESS VARIABILITY AND DATA QUALITY ON PERFORMANCE OF A STATE-SPACE STOCK ASSESSMENT MODEL 2.1 Abstract State-space modeling is an emerging approach to age structured fisheries stock assessment that can accommodate multiple sources of variability in processes like recruitment, abundance, and selectivity. By maximizing the marginal likelihood by treating yearly deviations as random effects and then integrating them from the likelihood, these models can estimate multiple process variances. Several fisheries software packages have been developed that use a state-space framework with marginal likelihood, which has increased their popularity and usage across the U.S. Atlantic coast, Canada, and Europe. However, robust testing is still needed to gauge the applicability of these models and understand how they perform over a range of realistic variability in the process or observation error. Using an assessment model fit to Gulf of Maine haddock as a baseline, I used a simulation-estimation procedure to evaluate if state-space stock assessment models could produce unbiased and precise estimates over process variances that ranged from zero to well above the levels estimated in the Gulf of Maine haddock assessment, or when observations were noisier (i.e., more variable around their true value) than had been assumed in the assessment. I fit alternative estimation models that differed in which processes errors were included (of recruitment, expected survival, and fishery selectivity). State- space models that specify random effects in all three processes produced relatively unbiased and precise estimates of biomass and exploitation under most simulation scenarios and therefore are recommended except when variability in expected survival is absent (or very low), in which case the model is unlikely to converge. A conventional statistical-catch-at-age model with recruitment 72 estimated as a fixed effect for each year, deterministic expected survival, and constant selectivity produced estimates that were comparable to the best performing state-space model, but did not provide internal estimates of process variances and did not perform well when recruitment was highly variable. This work will facilitate the use of state-space stock assessment models and the choice of parameterization that will produce the most accurate output to inform future predictions and management. 2.2 Introduction Fisheries stock assessment modelers have long worked to parse process variation from observation noise (Fournier and Archibald, 1982). Process variation (often “process error”) is the stochasticity in underlying biological state processes like recruitment, natural mortality, abundance, catchability, and selectivity. Observation noise (often “observation error”) are factors that create differences between what is real and what is measured such as deviations from random sampling. Process and observation error can be quantified using probability distributions and in a maximum likelihood estimation framework, those probability distributions are included in the likelihood, either added on the log-scale in a joint likelihood or errors-in-variables framework or integrated from the likelihood in the marginal likelihood (de Valpine and Hilborn, 2005). When maximizing marginal likelihood, it is at least theoretically possible to estimate multiple process variances and in some cases observation variances (Cadigan, 2015; Nielsen and Berg, 2014). Thanks to advances in fisheries stock assessment modeling software like Template Model Builder (TMB) that can approximate the high-level integrals, state-space stock assessment models that maximize the marginal likelihood and estimate many of the variances have emerged as a popular addition to the modeling toolbox (Aanes et al., 2020; Kristensen et al., 2016). In fact, state-space and the marginal likelihood have become largely synonymous in quantitative 73 fisheries discourse, though Bayesian methods still flourish, particular in state-space surplus production models (Han et al., 2023; Soto et al., 2022). Where once a stock assessment model could perhaps only account for stochasticity in one underlying process like recruitment or catchability, state-space modeling software like SAM or the Woods Hole Assessment Model (WHAM) now default to including variability in several underlying processes (Johnson et al., 2014; Linton and Bence, 2008; Nielsen and Berg, 2014; Stock and Miller, 2021). State-space stock assessment models (i.e., those that use a marginal likelihood) have several demonstrated advantages over non-state-space assessment models. They can more realistically represent dynamics by including stochasticity in recruitment, abundance, selectivity, and natural mortality, and allowing these to vary among ages, years, and region (Perreault et al., 2020; Thorson, 2019; Trijoulet et al., 2020). This increased realism could decrease bias caused by misspecification (Deroba and Schueller, 2013). Specifying variation as a random effect in a marginal likelihood will decrease bias even further (Thorson et al., 2019). With such a framework, the standard deviation of the probability distribution of these random effects can be estimated as a parameter in the model, rather than being fixed a priori or by calculating the distribution of independent fixed effect parameters after the model is fit. Accurately estimating the variance of recruitment is especially important because it can inform our understanding of a species extinction probability or guide rebuilding plans, and estimating it within a statistical model can propagate uncertainty (Maunder and Deriso, 2003; Thorson et al., 2014a). Since they account for more of the process variability that actually occurs, state-space models should exhibit less retrospective patterns than non-state-space models (Hurtado-Ferro et al., 2015; Perreault et al., 2020; Stewart and Martell, 2014; Stock et al., 2021; Szuwalski et al., 2018). The additional 74 variation may also resolve data conflicts in integrated assessments (Maunder and Piner, 2017; Stewart and Monnahan, 2017). As stock assessment practitioners move to adopt state-space methods it is important to understand how well these models perform at producing unbiased and precise estimates of quantities like spawning stock biomass, recruitment, and exploitation, if the process variances can be estimated under a broad range of true values, and if they reduce retrospective patterns (the differences in model output when the most recent years are included in the analysis). My objective was to evaluate the performance of state-space stock assessment models using a simulation-estimation procedure wherein data were generated with different levels of variability in recruitment, numbers at age or “expected survival” (given the fishing mortality), and fishery selectivity, ranging from no variability (i.e., constant over time) to variability that was two times the value estimated from fitting to real Gulf of Maine haddock catch and index data. In a separate procedure, I tested two different levels of observation error by generating data with higher observation error variability in total catch, the survey indices, or the proportions at age of the catch or indices set at higher levels than in the baseline Gulf of Maine haddock assessment. I fit each simulated dataset with a suite of alternative estimation models that made different assumptions about the existence and nature of the variability in the three processes of recruitment, expected survival, and selectivity. 75 2.3 Methods 2.3.1 Overview My approach was to generate datasets using an operating model and fit alternative estimation models to each simulated dataset. I simulated 100 datasets for each scenario, which were defined by the variance of the process errors assumed for recruitment, expected survival, and selectivity, or the observation variance in total catch and index data and the “effective sample size” describing the observation noise in the respective proportions at age. The baseline estimation model was based on a state-space stock assessment model fit to Gulf of Maine haddock (Melanogrammus aeglefinus). In alternative estimation models I altered the treatment of process errors (including leaving out process errors in some or all the processes). The structure of the simulation model, as well as the baseline values of process error variances and the values of other parameters was also based on the state-space stock assessment model fit to Gulf of Maine haddock. 2.3.2 Determination of Baseline Estimation Model The baseline estimation model was an age-based state-space stock assessment model, adapted from the statistical catch at age model used to assess Gulf of Maine haddock using the program ASAP (Legault and Restrepo, 1998). Description of symbols are provided in Table 2.1. The state-space stock assessment model was built using the Woods Hole Assessment Model (WHAM) package which is an R package dependent on Template Model Builder (TMB) (Kristensen et al., 2016; Stock and Miller, 2021). WHAM offers several options for parameterizing process error in recruitment, numbers at age (i.e., expected survival), selectivity, and other state variables that were not explored in the current study. The process errors can be specified as random effects and integrated from the joint likelihood of data and random effects 76 (i.e., the marginal likelihood). Recruitment can be parameterized as a random walk or random about a mean, and deviations from the previous year’s value or mean, respectively, can be independent and identically distributed (“iid”) or correlated among years. Age- and year- specific deviations in expected survival or mean selectivity can be iid, correlated across years or ages in a 1-dimensional autoregressive process (“AR1_y” or “AR1_a”) or both ages and years simultaneously in a 2-dimentional autoregressive process (“2DAR1”). Refer to Stock and Miller (2021) for a full description of WHAM and its many parameterization options. Table 2.1. Symbols used in the text, tables, and figures. For indices, time and age invariant data, and some simple derived quantities, the value is provided. Symbol Description Usage Index Variables 𝑦, 𝑦̃ Year 1977-2019 𝑌 Terminal Year 2019 𝑎, 𝑎̃ Age 1-8 A Terminal age plus group 9+ 𝑖 Survey Index F (Fall), S (Spring) Data and Inputs 𝐶𝑦 Total observed catch by year 𝑃𝑎,𝑦 Observed proportion at age by year in the catch 𝐼𝑦,𝑖 Observed total index value by year and survey 𝑝𝑎,𝑦,𝑖 Observed proportion at age by year and survey of the index 𝑀 Instantaneous natural mortality 0.2 𝑚𝑎,𝑦 Maturity at age and year 𝑊𝑎,𝑦 Weight at age and year at time of capture (𝑆) Weight at age and year at time of spawning 𝑊𝑎,𝑦 𝑓 Fraction of the year that passes before spawning occurs 0.25 𝑁7:9+,1977 Numbers at age (in thousands of individuals) for the [5,1,20] oldest age groups in the first year 𝑆4:9+,𝐹 Selectivity of ages 4-9+ in the fall survey [1,1,1,1,1,1] 𝑆5:9+,𝑆 Selectivity of ages 5-9+ in the spring survey [0.9,1,1,1,1] 𝜎𝐶𝑦 Standard deviation of log-normally distributed See Appendix observation error of total catch 2A 𝜎𝐼2𝑦,𝑖 Standard deviation of log-normally distributed See Appendix observation error of index values 2A (𝐼) Effective sample size of multinomial distribution of See Appendix 𝑛𝑦,𝑖 observation error of proportions at age of the indices 2A States/Random Effects (𝑅𝑒𝑐) Deviations from mean recruitment 𝜀𝑦 (𝑆𝑢𝑟) Deviations from expected abundance at age, which is 0 𝜀𝑎,𝑦 (𝑆𝑒𝑙) Deviations from mean selectivity at age 𝜀𝑎,𝑦 77 Table 2.1 (cont’d) Fixed Effect Parameters 𝑅̅ Mean recruitment 𝑁1:6,1977 Numbers at age for the youngest age groups in the first year 𝑠̅𝑎 Mean selectivity at age of the catch 𝐹𝑦 Fishing morality at year 𝜎𝑅 Standard deviation of process error around mean recruitment 𝜎𝑆𝑢𝑟 Standard deviation of process error around survival 𝜎𝑆𝑒𝑙 Standard deviation of process error around selectivity at age of catch Correlation coefficient of process errors around 𝜌𝑁𝐴𝐴 recruitment and survival, which is correlated among years 𝜌𝑆𝑒𝑙 Correlation coefficient of process errors around selectivity which is correlated among years 𝑞𝑖 Catchability of indices 𝑆𝑎,𝑖 Selectivity of indices at age Derived Quantities 𝑁𝑎,1978:2019 Abundance (“numbers at age”) in thousands 𝐹𝑦 Fully selected instantaneous fishing mortality 𝐹𝑎,𝑦 Age-specific instantaneous fishing mortality 𝑠𝑎,𝑦 Selectivity at age and year 𝐶̂𝑎,𝑦 Expected catch at age in numbers 𝐶𝑦 ̂ Expected total catch in weight 𝑃̂𝑎,𝑦 Expected proportions at age of the catch ̂𝐼𝑎,𝑦,𝑖 Expected fisheries independent index value at age ̂𝐼𝑦,𝑖 Expected total fisheries independent index value 𝑝̂𝑎,𝑦,𝑖 Expected proportions at age of the index (𝑆) Spawning stock biomass 𝐵𝑦 𝐸𝑦 Exploitation rate 𝑇 Average spawning stock biomass trend To determine the structure of the baseline estimation model, the real Gulf of Maine haddock data were fit using several parameterizations of the WHAM model, each with different combinations of process error structure in recruitment, expected survival, or selectivity, from those options described above, as well as for several options for the age composition likelihood (multinomial, Dirichlet-multinomial, and logistic normal). The Gulf of Maine haddock data includes annual total biomass of landings (for commercial and recreational fisheries combined), survey indices from the spring and fall Northeast Fisheries Science Center (NEFSC) bottom trawl surveys, and the numerical proportion at age of those landings and surveys (from 1-9+) 78 from 1977 through 2019 (NEFSC, 2022b). Other input data included biological information like natural mortality, maturity at age, and weight at age at time of catch or spawning. The estimation model was chosen based on the minimum Akaike’s Information Criteria (AIC) value (Akaike, 1998), and by robustness in generating and fitting simulated data. In all the estimation models fit using the Gulf of Maine haddock data, observation error variances for total annual landings and annual survey indices of total abundance were pre- specified and treated as known. Similarly, effective sample sizes of the multinomial distribution for the fishery and survey proportions at age were pre-specified and not estimated. WHAM has optional parameters that can scale the input year-specific standard deviation of the lognormal distribution of total annual catch or annual survey indices. However, when I tried to estimate these parameters, the model failed to converge. The effective sample size of the multinomial distribution cannot be estimated within the model and I did not use iterative reweighting when fitting the haddock data or simulated data in WHAM because it is computationally costly and particularly challenging in state-space models (Maunder, 2011). The effective sample sizes were premised on an iterative reweighting procedure following Francis (2011) used during the haddock assessment using ASAP, but the values were not re-estimated for the WHAM model (NEFSC, 2022b). Alternative age composition likelihoods like the Dirichlet-multinomial or logistic normal do have weighting parameters, but both performed poorly at generating data that could be fit using the estimation model, likely owing to too much process and observation error variability. The values that determined observation error variance- the standard deviation of the lognormal distribution and the effective sample size of the multinomial, - were fixed at the values used in the original ASAP assessment model (NEFSC, 2022b). 79 In the selected estimation model, recruitment (i.e., the number of age 1 individuals) in each year, 𝑁1,𝑦 varied about the mean recruitment following a lognormal distribution and AR(1) process (Table 2.2, Eqs. 2.1-2.2). Under this model, deviations in a given year would tend to be more similar to those in years immediately prior or subsequent (Eq. 2.2). The model estimated the mean recruitment, 𝑅̅ , standard deviation, 𝜎𝑅 , and correlation coefficient, 𝜌𝑁𝐴𝐴 , as fixed (𝑅𝑒𝑐) effects, and the deviations from the mean, 𝜀𝑦 , as random effects. Log-scale abundance in the initial year, 𝑙𝑜𝑔𝑁𝑎,1977, was estimated individually for ages 1-6 and fixed for ages 7-9+. Fixing the abundance for the older ages in the initial year improved model stability and parameter estimability when fitting to both real and simulated data. Table 2.2. Equations used in the estimation and simulation models. Index Description Equation Process Model Yearly recruitment is a (𝑅𝑒𝑐) function of mean 𝑙𝑜𝑔𝑁1,𝑦 = 𝑙𝑜𝑔𝑅̅ + 𝜀𝑦 , 𝑦 > 1977 2.1 recruitment and random process error 2 Recruitment deviations (𝑅𝑒𝑐) (𝑅𝑒𝑐) 𝜎𝑅 𝜀𝑦 ~𝑁(𝜌𝑁𝐴𝐴 𝜀𝑦−1 − 2 , 𝜎𝑅2 ) 2.2 are randomly distributed 2(1−𝜌 𝑁𝐴𝐴 ) as stationary AR(1) Fishing mortality at age 𝐹𝑎,𝑦 = 𝐹𝑦 𝑠𝑎,𝑦 , 𝑎 < 7 and year is the product of 2.3 fully selected F, and age 𝐹𝑎,𝑦 = 𝐹𝑦 , 𝑎 ≥ 7 and year specific selectivity Selectivity at age and year are the sum of a 1 mean parameter for each 𝑠𝑎,𝑦 = (𝑆𝑒𝑙) 2.4 1+exp (−(𝑙𝑜𝑔𝑖𝑡(𝑠̅ 𝑎 )+𝜀𝑎,𝑦 )) age and deviations, transformed on the logit scale 2 2.5 Selectivity at age (𝑆𝑒𝑙) (𝑆𝑒𝑙) 𝜀𝑎,𝑦 ~𝑁 (𝜌𝑆𝑒𝑙 𝜀𝑎,𝑦−1 − 𝜎𝑆𝑒𝑙 2 , 𝜎𝑠𝑒𝑙 ),𝑎 < 7 2(1−𝜌 2 deviations are 𝑆𝑒𝑙 ) distributed as stationary AR(1) 2.6 Age- and year- (𝑆𝑢𝑟) 𝑙𝑜𝑔𝑁𝑎,𝑦 = log (𝑁𝑎−1,𝑦−1 ) + −𝐹𝑎−1,𝑦−1 − 𝑀 + 𝜀𝑎,𝑦 , 1 < 𝑎 < 𝐴 specific abundance is a function of 𝑙𝑜𝑔𝑁𝐴,𝑦 = log (𝑁𝐴−1,𝑦−1 𝑒 −𝐹𝐴−1,𝑦−1−𝑀 + 𝑁𝐴,𝑦−1 𝑒 −𝐹𝐴,𝑦−1 −𝑀 ) abundance in the previous age and (𝑆𝑢𝑟) year, survival, and + 𝜀𝐴,𝑦 , 𝑎 = 𝐴 random process error 80 Table 2.2 (cont’d) 2 2.7 Numbers at age (𝑆𝑢𝑟) (𝑆𝑢𝑟) 𝜀𝑎,𝑦 ~𝑁 (𝜌𝑁𝐴𝐴 𝜀𝑎,𝑦−1 − 𝜎𝑆𝑢𝑟 2 , 𝜎𝑠𝑢𝑟 ),𝑎 > 1 2(1−𝜌𝑁𝐴𝐴 2 ) deviations are distributed as stationary AR(1) Observation Model 2.8 Expected catch at age 𝐹𝑎,𝑦 𝐶̂𝑎,𝑦 = 𝑁𝑎,𝑦 (1 − 𝑒 −𝑀−𝐹𝑎,𝑦 ) is a function of 𝐹𝑎,𝑦 +𝑀 abundance and mortality 2.9 Expected total catch 𝐶̂𝑦 = ∑𝐴𝑎=1 𝐶̂𝑎,𝑦 𝑊𝑎,𝑦 in weight is a function of catch at age and weight at age and year 2.10 Expected catch ̂ 𝐶𝑎,𝑦 proportion at age is a 𝑃̂𝑎,𝑦 = ∑𝑎=1 𝐶̂𝑎,𝑦 𝐴 function of age specific values and total 2.11 Expected survey 𝐼̂𝑎,𝑦,𝑖 = 𝑞𝑖 𝑠𝑎,𝑖 𝑁𝑎,𝑦 𝑒 (−𝐹𝑎,𝑦 −𝑀)𝑓𝑦,𝑖 index in weight for each age, year, and index 2.12 Expected total survey 𝐼̂𝑦,𝑖 = ∑𝐴𝑎=1 𝐼̂𝑎,𝑦,𝑖 index in weight is a sum of age-specific values 2.13 Expected index 𝐼̂𝑎,𝑦,𝑖 𝑝̂𝑎,𝑦,𝑖 = proportion at age is a 𝐼̂𝑦,𝑖 function of age specific values and total Likelihood Components 2.14 Observed total catch 𝜎𝐶2𝑦 is derived from the log(𝐶𝑦 ) ~𝑁(log(𝐶̂𝑦 ) − , 𝜎𝐶2𝑦 ) 2 expected values and lognormal error (𝐶) 2.15 Observed catch 𝑃𝑎,𝑦 ~𝑀𝑁(𝑃̂𝑎,𝑦 , 𝑛𝑦 ) proportion at age is derived from multinomial distribution with input effective sample size 2.16 Observed total survey 𝜎𝐼2𝑦,𝑖 index is derived from log(𝐼𝑦,𝑖 ) ~𝑁(log(𝐼̂𝑦 ) − , 𝜎𝐼2𝑦,𝑖 ) 2 expected values and lognormal error 81 Table 2.2 (cont’d) 2.17 Observed index (𝐼) 𝑝𝑎,𝑦,𝑖 ~𝑀𝑁(𝑝̂𝑎,𝑦,𝑖 , 𝑛𝑦,𝑖 ) proportion at age is derived from multinomial distribution with input effective sample size Derived Quantities and Simulation Output 2.18 Average Recruitment ∑𝑌𝑦=𝑌−3 𝑁1,𝑦 in terminal 3 years ̅1,(𝑌−3:𝑌) = 𝑁 3 𝐴 2.19 Spawning stock (𝑆) (𝑆) biomass 𝐵𝑦 = ∑ 𝑁𝑎,𝑦 𝑊𝑎,𝑦 𝑚𝑎,𝑦 𝑒 𝑓(−𝑀−𝐹𝑎,𝑦 ) 𝑎=1 2.20 Exploitation rate of ̂ 𝐶𝑦 stock 𝐸𝑦 = ∑𝐴𝑎=1 𝑁𝑎,𝑦 𝑊𝑎,𝑦 2.21 Average spawning (𝑆) 𝐵𝑇 stock biomass trend in 𝐷 = (𝑆) recent 10 years 𝐵𝑇−10 2.22 Relative Error of a 𝑥̂ − 𝑥̃ performance metric 𝑅𝐸𝑥 = 𝑥̂ In the selected estimation model, fully selected fishing mortality in each year, 𝐹𝑦 , was a fixed effect parameter. Fishing mortality at age and year, 𝐹𝑎,𝑦 , for a<7 was the product of the fully selected value, 𝐹𝑦 and selectivity at age and year, 𝑠𝑎,𝑦 (Eq. 2.3). For age 7 and above 𝐹𝑎,𝑦 = 𝐹𝑦 . Selectivity at age and year was a function of an age specific mean, 𝑠̅𝑎 , and age- and year- (𝑆𝑒𝑙) specific selectivity deviations, 𝜀𝑎,𝑦 , and constrained to be between zero and one (Eq. 2.4). (𝑆𝑒𝑙) Deviations from the mean selectivity on the logit scale, 𝜀𝑎,𝑦 were random effects correlated among years but not among ages (Eq. 2.5). The standard deviation of this distribution, 𝜎𝑆𝑒𝑙 , and the correlation coefficient, 𝜌𝑆𝑒𝑙 , were fixed effects. After the initial year, abundance at age in the selected estimation model was a function of abundance at age of the previous age in the previous year, expected survival, itself a function of instantaneous fishing mortality, 𝐹𝑎,𝑦 , and natural mortality, 𝑀, and process error on expected (𝑆𝑢𝑟) survival, 𝜀𝑎,𝑦 (Eq. 2.6). Abundance in the 9+ group also included the number of 9+ individuals 82 that survived since the previous year. This parameterization mimics how process errors in numbers at age were incorporated in the “full state-space model” described previously in Nielsen and Berg, (2014) and Miller et al. (2016). The expected survival process errors had an AR(1) structure with correlation among years but not ages. The standard deviation for the expected survival process error 𝜎𝑆𝑢𝑟 , was estimated as a fixed effect, and the correlation coefficient was the same as that for the recruitment process error, 𝜌𝑁𝐴𝐴 (Eq. 2.7). The predicted catch at age in numbers, 𝐶̂𝑎,𝑦 , followed the Baranov catch equation (Eq. 2.8). The predicted total catch in weight (i.e., yield), 𝐶̂𝑦 , was a sum of the products of catch at age and weight at age, 𝑊𝑎,𝑦 , across ages (Eq. 2.9) The predicted proportions at age, 𝑃̂𝑎,𝑦 , were derived from the total catch in numbers and catch at age (Eq. 2.10). The predicted survey index at age and year and survey was the product of abundance at age, survey-specific catchability, 𝑞𝑖 , age- and survey- specific selectivity, 𝑠𝑎,𝑖 , and expected survival (adjusted for the fraction of the year that has elapsed at the time of the survey, 𝑓𝑦,𝑖 ) (Eq. 2.11). Ages 4-9 were assumed to be fully susceptible to survey gear in the fall and ages 5-9 were assumed to be fully susceptible to survey gear in the spring; hence survey selectivity was fixed at 1, apart from survey selectivity for age 5 individuals in the spring which was fixed at 0.9. Unlike for the fishery, age-specific selectivity for the surveys was time-invariant. The predicted total survey index value, 𝐼̂𝑦,𝑖 , and predicted survey proportions at age, 𝑝̂𝑎,𝑦,𝑖 , were calculated from the age-specific indices in the same way as the fishery total catch and proportions were from the age-specific values (Eqs. 2.12- 2.13). The observed total catch in each year, 𝐶𝑎,𝑦 , was assumed to have a lognormal distribution with mean equal to the predicted catch and the year-specific standard deviation that was input as data, 𝜎𝐶𝑦 (Eq. 2.14, Appendix 2A). The observed proportions were assumed to have a 83 (𝐶) multinomial distribution with year specific effective sample sizes 𝑛𝑦 , which were also input (Eq. 2.15, Appendix 2A). The observed total index values and proportions at age of the indices followed the same distributions (lognormal and multinomial) as total catch and catch proportions at age but had unique standard deviations and effective sample sizes, respectively that determined the degree of variation (Eq. 2.16-2.17, Appendix 2A). The estimation model was fit by maximizing the marginal likelihood in which the random effects on recruitment, expected survival, and selectivity were integrated from the joint distribution of random effects and data. Therefore, the objective function was minimized by varying the fixed effects. Mean recruitment, numbers at age in the first age, fishing mortality, and process variance parameters were estimated on the log-scale and survey catchability and selectivity and AR(1) process error correlation coefficients were estimated on the logit scale. The predicted values for the random effects and derived quantities were those that maximized the joint likelihood of data and process variability, conditioned on the estimates of the fixed effects, using the epsilon method for bias correction (Thorson and Kristensen, 2016). 2.2.3 Simulation/Operating Model Scenarios Data were simulated using the fitted WHAM model described above. The simulated data had the same number of age groups and years, the same natural mortality, maturity, weight at age at capture and spawning, numbers at age in the oldest age group in the first year, and observation error variance of the fall and spring surveys as estimated or assumed for the Gulf of Maine haddock assessment. In all simulation scenarios, data were simulated using most of the same fixed effect parameter estimates, including the same mean recruitment, yearly fully selected fishing mortality, mean selectivity at age, numbers at age for ages 1-6 in the first year, catchability and selectivity of the index data, and correlation coefficients of the process errors in 84 numbers at age or selectivity. The difference among simulation scenarios was the process or observation error variance or the effective sample size used in the simulation model. In the (𝑅𝑒𝑐) (𝑆𝑢𝑟) baseline simulation scenario, random effects on recruitment, 𝜀𝑦 , expected survival, 𝜀𝑎,𝑦 , and (𝑆𝑒𝑙) selectivity, 𝜀𝑎,𝑦 , were drawn from their respective distributions using the mean (for recruitment and selectivity), correlation coefficients, and standard deviations estimated in the model fit to Gulf of Maine haddock. The randomly generated values of deviations in yearly recruitment, survival, and selectivity then informed the realized catch and index values, 𝐶̃𝑦 , 𝑃̃𝑎,𝑦 , 𝐼̃𝑦 , and 𝑝̃𝑎,𝑦 (i.e., Eqs. 2.8-2.13 but 𝑋̃ rather than 𝑋̂) and the simulated catch and index values were a function of these realized values and observation error (i.e., Eqs. 2.14-2.17, but 𝑋̃ rather than 𝑋̂). Each alternative simulation scenario assumed that one of the process or observation error variance values differed from the baseline (Table 2.3), while the others were kept at their baseline values. The first set of simulation scenarios addressed how increases or decreases in the variance of process error in recruitment, expected survival, or selectivity affected model performance (Table 2.3A). The four alternative variances used for each source of process variability are categorized as “very low,” “low,” “high,” and “very high.” The “very low” simulation scenario meant the variance was set to 0. For the recruitment or age-specific selectivity processes, this meant that the year-specific values equaled the mean, R ̅ or s̅ a , respectively. For the expected survival process, this meant that the survival was deterministic and abundance at age did not deviate from what was expected given F𝑎,𝑦 and M and abundance at age in the previous year. The “low” simulation scenario meant the standard deviation of each process error distribution was half of the baseline value. “High” and “very high” scenarios for expected survival or selectivity meant the standard deviation was double or three times the baseline value. However, because the estimated standard deviation of recruitment process was 85 already quite high (1.17) the “high” and “very high” treatments were constrained to 1.5 and 2, which are 1.28 and 1.7 times the baseline value, respectively. Because there were four different levels of process error standard deviation for each of the three types of process errors, plus baseline levels, there were 13 total simulation scenarios for the process error (including the baseline). Table 2.3. Process error standard deviations in simulation scenarios (A) and multiplier for year- and index- specific observation error standard deviation (total catch and index value) and effective sample sizes (proportions at age) (B). Only increases in observation error variance were tested as these reflected increased “noisiness” in the simulated data and the consequences of having poor data. (A) Process Error Base Very Low Low High Very High 𝜎𝑅 1.17 0.00 0.58 1.50 2.00 𝜎𝑆𝑢𝑟 0.19 0.00 0.10 0.40 0.60 𝜎𝑆𝑒𝑙 0.30 0.00 0.15 0.60 0.90 (B) Observation Error Base High Very High 𝜎𝐶𝑦 x1.0 x1.1 x1.5 𝜎𝐼𝑦,𝑖 x1.0 x1.1 x1.5 (𝐶) x1.0 x0.9 x0.5 𝑛𝑦 (𝐼) x1.0 x0.9 x0.5 𝑛𝑦,𝑖 The second set of simulation scenarios involved decreasing the “quality” (i.e., increasing the “noisiness”) of the simulated data by either increasing the observation error standard deviation of the total catch or total index, or decreasing the effective sample size of the proportions at age of the catch or index, both of which assume that the observed data can deviate more from the predicted values than is assumed in the baseline model (Table 2.3B). Observation error variance or effective sample size was adjusted accordingly in the estimation models, so the data quality was poorer but correctly specified. The process error standard deviations were kept at the baseline values. The standard deviation of the lognormal error around total catch or total 86 index value (for both the spring and fall surveys), which were year-specific values, were increased by a factor of 1.1 or 1.5 for “high” or “very high” treatments, respectively. The effective sample size of the multinomial distribution for the catch proportions at age or index proportions at age (for both the spring and fall surveys), which were also year-specific values, were decreased by a factor of 0.9 or 0.5 for a “high” or “very high” treatment, respectively. As in the process error simulation scenarios, each component was tested individually, so in total there were nine simulation scenarios for observation error (including the baseline). 2.3.4 Alternative Estimation Model Scenarios The estimation model described in section 2.3.2 was fit to simulated data. The estimation model specified yearly deviations in recruitment, expected survival, and selectivity as correlated random effects. Seven other estimation models were also used to fit the data, each which specified one or more of these processes as a fixed effect or constant over time, rather than a random effect. Each of the eight versions of the estimation model were designated by a three- letter code that indicate whether the recruitment, expected survival, and selectivity processes were modeled as a random effect (R), fixed effect (F) or constant (C), in that order. For example, the estimation model described in section 2.3.2 was designated “RRR” and the model that assumed recruitment and expected survival were random effects but selectivity was constant was “RRC”. When recruitment was a fixed effect (F), the year-specific deviations were estimated parameters and their distribution was not constrained with a probability function. When a value was constant, the year-specific deviations were fixed at zero. When applied to expected survival, “constant” means no variation in the survival process, so survival is deterministic and abundance is only a function of abundance in the previous year and total mortality. Not every combination of random effect, fixed effect, or constant parameterization could be tested because of the 87 structure of WHAM. The intent with fitting these additional alternative versions of the estimation model was to examine the consequences of ignoring process error when it is present (in most cases) or trying to estimate it when it is absent (in the “very low” scenarios). The “FCC” model, which specifies yearly recruitment as a fixed effect parameter, expected survival as deterministic, and selectivity as constant, mimics the statistical catch-at-age model (SCAA) currently employed to assess many fisheries in the northwest Atlantic using program ASAP. Within each simulation scenario of process error variability or data quality, 100 data sets were simulated and fit using the eight estimation models. If the model did not converge, and/or did not result in an invertible Hessian matrix, the model was considered a failure and excluded from the analysis. Model performance was also evaluated by retrospective analysis where retrospective patterns were quantified by Mohn’s Rho calculated for spawning stock biomass, fishing mortality, and recruitment (Hurtado-Ferro et al., 2015). 2.4.5 Performance of Assessment Models Several metrics were used to evaluate model performance in fitting to simulated data, each of which reflected values of interest in fisheries management. Mean recruitment in the terminal three years, 𝑁̅1,(𝑌−3:𝑌) , spawning stock biomass in the terminal year, 𝐵𝑌(𝑆) , exploitation rate in the terminal year, 𝐸𝑌 , and depletion in spawning stock biomass in the terminal 10 years, 𝐷, were compared against the true values used to simulate the data (Eqs. 2.18-2.21). Depletion was defined as the ratio between spawning stock biomass in the terminal year and spawning stock biomass 10 years prior to the terminal year (Eq. 2.21). Relative error of a given model performance metric 𝑥 was the difference between the estimated, 𝑥̂, and realized, 𝑥̃, values relative to the realized value (Eq. 2.22). 88 For estimation models that assumed one or more process was a random effect the distributions of the estimated standard deviation and correlation coefficient of the AR(1) process of the random effect were compared against the true value used to generate the data. Convergence rate, the bias (systemic over- or under- estimation of a quantity), and precision (spread of relative error) were compared across treatments. For each of the thirteen simulation scenarios for process error variability, and nine simulation scenarios for observation error variability, simulated data were fit using eight estimation models. Thus, there were 104 simulation/estimation combinations for the process variability scenarios and 72 such combinations for the observation error scenarios. 2.4 Results The median convergence rate across the 104 simulation/estimation combinations for process error scenarios was 90% and ranged from 7-100% (Table 2.4A). Among process variability scenarios, average convergence rate was highest when the recruitment variability was very low (94%) or low (95%) and lowest when the recruitment variability was high (73%) or expected survival variability was very low (74%) or very high (76%). Models converged most frequently when recruitment included a random effect, expected survival was deterministic, and selectivity was constant (RCC) (98%), and least frequently when recruitment was constant, survival was deterministic, and selectivity included a random effect (CCR) (76%). The convergence rate was approximately the same across observation error simulation scenarios, as were the relative convergence rates for the different estimation models (Table 2.4B). 89 Table 2.4. Convergence rate in process variability simulation scenarios (A) and observation error simulation scenarios (B) across 100 simulated data sets. The criteria for convergence were that the model had to complete the optimization without error and all the parameters and standard error values had to be estimated (i.e., the Hessian matrix was invertible). (A) SIMULATION SCENARIO ESTIMATION MODELS Recruitment Survival Selectivity RRR RRC RCR RCC FCR FCC CCR CCC Average Median Min Max Baseline Baseline Baseline 94 100 93 96 81 88 82 84 89.75 90.5 81 100 Very High Baseline Baseline 90 100 94 97 38 46 61 65 73.88 77.5 38 100 High Baseline Baseline 91 100 94 98 71 77 65 77 84.13 84 65 100 Low Baseline Baseline 94 100 99 98 94 99 84 97 95.63 97.5 84 100 Very Low Baseline Baseline 91 93 94 98 90 98 97 98 94.88 95.5 90 98 Baseline Very High Baseline 87 98 90 90 54 59 59 73 76.25 80 54 98 Baseline High Baseline 86 98 89 98 74 87 77 81 86.25 86.5 74 98 Baseline Low Baseline 54 84 95 99 85 94 81 89 85.13 87 54 99 Baseline Very Low Baseline 7 48 98 99 82 96 79 86 74.38 84 7 99 Baseline Baseline Very High 97 98 95 95 92 90 77 85 91.13 93.5 77 98 Baseline Baseline High 92 100 96 100 88 92 73 82 90.38 92 73 100 Baseline Baseline Low 84 98 94 100 77 93 80 84 88.75 88.5 77 100 Baseline Baseline Very Low 85 98 86 100 64 91 72 82 84.75 85.5 64 100 Average 80.92 93.46 93.62 97.54 76.15 85.38 75.92 83.31 Median 90 98 94 98 81 91 77 84 Min 7 48 86 90 38 46 59 65 Max 97 100 99 100 94 99 97 98 90 Table 2.4 (cont’d) (B) SIMULATION SCENARIO ESTIMATION MODELS Total Catch Total Index Catch Proportions Index Proportions RRR RRC RCR RCC FCR FCC CCR CCC Average Median Min Max Baseline Baseline Baseline Baseline 94 100 93 96 81 88 82 84 89.75 90.5 81 100 Very High Baseline Baseline Baseline 97 100 94 98 84 93 87 93 93.25 93.5 84 100 High Baseline Baseline Baseline 90 100 93 97 83 88 80 86 89.63 89 80 100 Baseline Very High Baseline Baseline 94 99 91 98 77 88 80 86 89.13 89.5 77 99 Baseline High Baseline Baseline 90 97 93 97 76 84 77 81 86.88 87 76 97 Baseline Baseline Very Low Baseline 84 95 84 100 63 86 56 84 81.5 84 56 100 Baseline Baseline Low Baseline 91 98 90 97 86 90 75 82 88.63 90 75 98 Baseline Baseline Baseline Very Low 82 97 93 98 75 82 77 80 85.5 82 75 98 Baseline Baseline Baseline Low 93 98 99 98 77 89 71 88 89.13 91 71 99 Average 90.56 98.22 92.22 97.67 78 87.56 76.11 84.89 Median 91 98 93 98 77 88 77 84 Min 82 95 84 96 63 82 56 80 Max 97 100 99 100 86 93 87 93 91 Spawning stock biomass in the terminal year was relatively unbiased across all estimation models in all process variance simulation scenarios (Figure 2.1). Precision was lower in estimation models that assumed recruitment was constant (C--), except when the true recruitment variance was zero (left column, bottom row). Spawning stock biomass was estimated with low precision when the variation in expected survival was high or very high (middle column, top two rows), except when expected survival was included as a random effect. Average recruitment in the terminal three years was relatively unbiased across all estimation models in all process variance simulation scenarios but precision was lower in estimation models that assumed constant recruitment (Figure 2.2). When variability in selectivity was very high (right column, top row) and the estimation model incorrectly assumed it was constant and treated recruitment and expected survival as random effects (RRC), average recruitment was positively biased. The most precise and relatively unbiased estimates of average recruitment in the terminal three years were in cases where the true recruitment variance was zero (left column, bottom row). Exploitation rate in the terminal year was relatively unbiased except when the model incorrectly assumed recruitment was constant in which case it was underestimated (Figure 2.3). Exploitation rate was underestimated when selectivity variability was very high (right column, top row) and the model assumed it was constant and recruitment and expected survival were random effects (RRC), suggesting a tradeoff between exploitation rate and recruitment estimates. Precision in estimated exploitation rate was low when expected survival process variance was high or very high (middle column, top two rows) and the model assumed the expected survival was deterministic. Depletion was estimated without bias across all estimation models and process variance scenarios (Figure 2.4). Depletion had low precision when recruitment was 92 assumed to be constant and high precision when the variance in expected survival was zero (middle column, bottom row). Recruitment process error standard deviation was estimated without substantial bias in most estimation models and process variance simulation scenarios (Figure 2.5). When the true recruitment variance was zero or very low (left column, bottom two rows), the standard deviation was unbiased, as long as the estimation model assumed process error in expected survival, otherwise the variance was overestimated. The recruitment standard deviation was also overestimated when the true variation in expected survival was high or very high (middle column, top two rows) and the estimation model assumed the expected survival was deterministic. The survival process error standard deviation was underestimated in almost all scenarios, and only substantially overestimated when the true expected survival variance was zero and the model assumed expected survival process variation was present (Figure 2.6). The recruitment and expected survival process error correlation coefficient, which quantifies inter- year correlations was unbiased in most simulation scenarios, except when selectivity variation was very high, and the estimation model assumed selectivity was constant (Figure 2B.1). The selectivity process error standard deviation was underestimated in estimation models that assumed all processes were random effects (RRR) but was relatively unbiased or overestimated in other estimation models across other process error simulation scenarios (Figure 2.7). In particular, the standard deviation was substantially overestimated, sometimes by a factor of 2 or 3 when selectivity was the only source of process variability in the model (CCR), and the bias became more extreme as either recruitment, expected survival, or selectivity variability increased. The selectivity process error correlation coefficient was underestimated in all 93 scenarios, and the bias was greater when selectivity was the only source of process variability (Figure 2B.2). The Mohn’s Rho statistics showed low retrospective bias and high precision in spawning stock biomass or average fishing mortality, except in estimation models where recruitment was incorrectly specified as constant (Figures 2B.4-2B.5). When recruitment was specified as a random effect, there were positive retrospective patterns in recruitment, and when specified as a fixed effect, the retrospective patterns were less (Figure 2B.6). Increasing the variability in total catch or index data or the proportions at age did not noticeably change the precision or accuracy in model output, though the precision was lower in estimation models that assumed recruitment was constant across all observation error operating model scenarios (Figures 2B.7-2B.9). The retrospective patterns in spawning stock biomass, average fishing mortality, and recruitment was also equivalent across observation error scenarios (Figures 2B.10-2B.12). Recruitment and selectivity process error standard deviation, and both correlation coefficients were estimated about as well across observation error operating model scenarios (Figures 2B.13-2B.17). 94 Figure 2.1. Relative error in spawning stock biomass in the terminal year across process variability operating model scenarios. The black lines indicated zero relative error. Results presented in each column are for when process error variance was set at a range of values (left-recruitment, middle-survival, right-selectivity) with the standard deviations for the other processes at baseline values. Individual boxplots represent the eight estimation models that treat recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). The number above each boxplot indicates the number of models that converged and produced positive definite hessians, out of 100 simulated data sets. The black dot is the median and the red dot is the average relative error across the converged models. 95 Figure 2.2. Relative error in average recruitment in the terminal 3 years across process variability operating model scenarios. Individual boxplots represent the eight estimation models that treat recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C), in that order. See caption of Figure 2.1 for description of plot elements. 96 Figure 2.3. Relative error in average exploitation (predicted catch biomass/total biomass) in the terminal year across process variability operating model scenarios. Individual boxplots represent the eight estimation models that treat recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C), in that order. See caption of Figure 2.1 for description of plot elements. 97 Figure 2.4. Relative error in depletion in the terminal 10 years across process variability operating model scenarios. Individual boxplots represent the eight estimation models that treat recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C), in that order. See caption of Figure 2.1 for description of plot elements. 98 Figure 2.5. Estimated recruitment process error standard deviation across process variability operating model scenarios. Black lines indicate the true recruitment standard deviation. Results presented in each column are for when process error standard deviation was set at a range of values (left-recruitment, middle-survival, right-selectivity) with the standard deviations for the other processes at baseline values. Note that results in all three columns are for recruitment process variation. When the survival (middle column) or selectivity (right column) standard deviations are changed from their baseline value, the true recruitment standard deviation remains constant at its baseline value of 1.176 as indicated by the horizontal lines. Individual boxplots represent the eight estimation models that treat recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). Only estimation models that specify recruitment as a random effect (R--) are represented. See caption of Figure 2.1 for description of plot elements. Note slight differences in the y-axis. 99 Figure 2.6. Estimated expected survival process error standard deviation across process variability operating model scenarios. Black lines indicate the true expected survival standard deviation. Results presented in each column are for when process error standard deviation was set at a range of values (left-recruitment, middle-survival, right-selectivity) with the standard deviations for the other processes at baseline values. Note that results in all three columns are for survival process variation. When the recruitment (left column) or selectivity (right column) standard deviations are changed from their baseline value, the true expected survival standard deviation remains constant at its baseline value of 0.19 as indicated by the horizontal lines. Individual boxplots represent the eight estimation models that treat recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). Only estimation models that specify survival as a random effect (-R-) are represented. See caption of Figure 2.1 for description of plot elements. Note slight differences in the y-axis. 100 Figure 2.7. Estimated expected selectivity process error standard deviation across process variability operating model scenarios. Black lines indicate the true selectivity standard deviation. Results presented in each column are for when process error standard deviation was set at a range of values (left-recruitment, middle-survival, right-selectivity) with the standard deviations for the other processes at baseline values. Note that results in all three columns are for selectivity process variation. When the recruitment (left column) or expected survival (middle column) standard deviations are changed from their baseline value, the true selectivity standard deviation remains constant at its baseline value of 0.3 as indicated by the horizontal lines. Individual boxplots represent the eight estimation models that treat recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). Only estimation models that specify selectivity as a random effect (--R) are represented. See caption of Figure 2.1 for description of plot elements. Note slight differences in the y-axis. 101 2.5 Discussion Even under a wide range of true variability in recruitment, survival, and selectivity, estimation models that assumed recruitment varied and treated them as either random or fixed effects estimated spawning stock biomass, recruitment, exploitation, and depletion without substantial bias. Estimation models that assumed constant recruitment performed poorly under most circumstances, but recruitment is rarely, if ever, assumed constant in an integrated age- based stock assessment model because many biotic and abiotic factors will influence the production of eggs and their survival to recruitment from year to year (Maunder and Thorson, 2018b). Though several estimation models were used in this study to provide contrast and elucidate the influence of assuming random effects in each process individually or in combination, only a few resemble those that are conventionally employed in stock assessment. The FCC resembles the architecture of a program like ASAP and a traditional statistical-catch-at- age model (Legault and Restrepo, 1998). When to apply a state-space model that includes process error on everything (RRR), b) a “reduced” version of RRR that specifies only one or two of the processes as random effects, or c) a statistical-catch-at-age model for a particular system will depend on the assumed degree of process and observation error variance and whether the model converges. Ultimately, the decision to employ a state-space model depends strongly on what quantities of interest need to be estimated with high accuracy and if modelers want estimates of process variance that are estimated within the model. Internal estimates of process variance may be preferable in fisheries management because they may better reflect the range of future recruitment, abundance, and spawning stock biomass. If the recruitment or expected survival process variation is suspected to be very high, then it is better to employ an estimation model that specifies that both processes are random effects. 102 Such models tended to have higher convergence rates and less biased model estimates compared to models that specify recruitment as a fixed effect and survival as deterministic. Similar research that tested the estimability of recruitment and recruitment standard deviation in a deterministic, fixed effect, penalized likelihood, or marginal likelihood framework found similar results, i.e., that having a constraint on recruitment deviations leads to less bias in estimated recruitment and biomass (Maunder and Deriso, 2003; Thorson et al., 2019). If there is reason to believe the survival is deterministic (i.e., that the abundance at age can be fully explained by the previous abundance at age and mortality), the estimation model should not include survival as a random effect, because the convergence rate of such models under these conditions was very low. When recruitment or survival process variability is low RR- models are likely trying to estimate phantom variability and misallocating variance in recruitment or selectivity to expected survival (more on this discussed below). In the present case, there is little disadvantage to assuming recruitment or selectivity process error is present when it is absent. This aligns with similar work that found that specifying process error in selectivity when there is none did not create bias, as long as data were weighted correctly (Cronin-Fine and Punt, 2021; Martell and Stewart, 2014; Stewart and Monnahan, 2017). However, this conclusion is contingent upon the parameterization of selectivity, as other work has found that sometimes estimating time-varying selectivity when it varies minimally can introduce substantial bias (Linton and Bence, 2011). Modelers can anticipate the degree of variability in recruitment or expected survival before choosing among estimation models using information outside of the assessment model and/or by understanding what influences interannual variability. Recruitment variability has been weakly linked to fecundity, length of spawning season, and other life history characteristics, and to environmental conditions, though such relationships tend to become unstable over time 103 (Brosset et al., 2020; Cushing, 1990; Mertz and Myers, 1996; Rose et al., 2001). Thorson et al., (2014) developed a hierarchical stock-recruitment model that estimated the standard deviation and inter-annual correlation coefficient of recruitment for 154 marine species summarized across taxa, which could serve as a reference, though this work quantified variation in addition to that explained by a stock-recruitment relationship. Variation in expected survival includes any change in the numbers-at-age that is not accounted for in the exponential total mortality function. Expected survival variability can reflect immigration/emigration from the system, unaccounted for process variability in natural mortality or selectivity, or catch misreporting, which may be detected from process error residuals (Stock and Miller, 2021). Therefore, if movement is not occurring and other process variability is correctly specified, such as on natural mortality, expected survival can reasonably be excluded from the random effects in the estimation model. If substantial movement or stock mixing is occurring, models can account for this though auxiliary information like tagging data, and/or restructuring management units to reflect closed populations (Berger et al., 2021; Bosley et al., 2022; Goethel et al., 2021). Therefore state-space stock assessment models may perform best when such auxiliary information about the stock movement is routinely collected. The estimation model that specified random effects in recruitment, expected survival, and selectivity (RRR) did not reliably partition variability across the three processes in all circumstances. When all three processes varied, the standard deviation in recruitment was accurately estimated but expected survival and selectivity variability were underestimated, a phenomenon noted between recruitment and survival in Trijoulet et al., (2020). This underestimation may be because the survival and selectivity processes are confounded, regardless of the degree of process variation, or that the AR(1) process constrains the simulated 104 values to be closer to the mean and the full extent of variation is not realized with only 42 years of data (Linton and Bence, 2008). When recruitment and selectivity are at their baseline variability but expected survival variation is high, some will be allocated to recruitment. The problem of variance not being apportioned correctly was exacerbated when one or more processes were incorrectly specified. There is an apparent tradeoff between expected survival and selectivity process variance, because when one was assumed fixed (RCR and RRC), the standard deviation of the other process is overestimated. When only one source of variability is specified, either the recruitment or selectivity, in the RCC and CCR models, respectively, that process must absorb all model variance, so the standard deviation is substantially overestimated. The decision to employ a state-space model should not depend on perceived quality of the index or catch data. Other work has investigated the estimability of observation and process error variance more closely and noted trade-offs between data weighting or data availability (increased number of surveys and auxiliary data) and correctly specifying the process model (Dickey-Collas et al., 2015; Stewart and Monnahan, 2017). Within the scope of this work, having quality catch proportion at age data slightly increased the accuracy of expected survival standard deviation estimates. Age composition likelihoods like the Dirichlet-multinomial that include data-weighting parameters have been explored as an alternative to the multinomial distribution when there is time-varying selectivity (Albertsen et al., 2017; Cronin-Fine and Punt, 2021; Xu et al., 2020). The use of such alternative likelihoods may allow for estimation of observation error variance, which was not realized in my model. However, it may exacerbate the issue of correctly apportioning variability and is a topic of future investigation. Intriguingly, I saw little decline in the precision of biomass or exploitation estimates when data quality was low or process error in recruitment was substantial. This suggests that 105 assessments of comparable quality might be achieved with alternative data collection schemes involving fewer trips or sampled fish. Gulf of Maine haddock is a data rich case with multiple survey indices, so increasing the variance by no more than 50% and limiting the investigation to one data source at a time may not have been extensive enough to investigate how noisy data can still lead to unbiased and precise estimates. Future work may consider reductions of data quality in several data sources simultaneously, or removal of whole surveys or years of data. As research in data-limited or data-poor methods in fisheries becomes increasingly necessary to support more global fisheries, state-space models may be an important avenue of investigation (Cope et al., 2023). I did not use AIC or DIC as a model selection criterion but rather based model performance on bias and precision of relative error and retrospective patterns. Comparing the likelihood between state-space and non-state-space models can sometimes lead to choosing over- fitted models with poor predictive abilities (Trijoulet et al., 2020). Contemporary model selection is not based solely on AIC or DIC, but rather also using retrospective patterns, consistency, prediction skill, and (of particular importance in state-space models) one step ahead residuals (Carvalho et al., 2021; Trijoulet et al., 2023). Using retrospective patterns to select among models with different selectivity parameterization can lead to better performing models than if selection was done based on DIC (Linton and Bence, 2011). A state-space model is categorically recommended instead of a SCAA model for its ability to incorporate more stochasticity and estimate process error standard deviations. Correctly quantifying the process error variability in recruitment, expected survival, and selectivity can lead to model projections that are more likely to reflect reality. Accurate estimates of recruitment process variance in particular are vital to predicting the future trajectory of a stock because it can be used to approximate extinction probability and inform recovery plans (Maunder and Deriso, 106 2003). Using anything except the state-space model that assumes random effects on recruitment, expected survival, and selectivity is only recommended under a few circumstances explored in this study, which the modeler may be able to determine a priori. Though this work offers guidelines, its findings are rooted in a particular stock and a model with the variance of observation error distributions specified a priori. The observation error variance was fixed at the same value used in the SCAA model which may have been too large because observation error in SCAA tends to account for process variability which the state-space model explicitly specifies. Future work should expand this simulation approach to include even greater observation error bias, greater combinations of low/high variance in processes like recruitment, expected survival, selectivity, and with alternative observation error likelihoods that allow for self-weighting and variance estimation like the Dirichlet-multinomial, logistic normal, or multivariate Tweedie. 107 REFERENCES Aanes, S., Aeberhard, W.H., Albertsen, C.M., Aldrin, M., Babyn, J., Berg, C.W., Breivik, O., Cook, R., Flemming, J.M., Fryer, R., Liljestrand, E.M., Millar, C., Miller, T.J., Minto, C., Newman, K.B., Nielsen, A., Perreault, A., Regular, P., Skaug, H.J., Spence, M., Trijoulet, V., Vatnehol, S., 2020. Workshop on the review and future of state space stock assessment models in ICES (WKRFSAM). ICES Sci. Rep. 2:32. 1-23. http://doi.org/10.17895/ices.pub.6004. Akaike, H., 1998. Information theory and an extension of the maximum likelihood principle. In Selected Papers of Hirotugu Akaike, pp. 199-213. Ed. By E. Parzen, K. Tanabe, and G. Kitagawa. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-1694-0_15. Albertsen, C.M., Nielsen, A., Thygesen, U.H., 2017. Choosing the observational likelihood in state-space stock assessment models. Can. J. Fish. Aquat. Sci. 74, 779-789. https://doi.org/10.1139/cjfas-2015-0532. Berg, C.W., Nielsen, A., 2016. Accounting for correlated observations in an age-based state- space stock assessment model. ICES J. Mar. Sci. 73, 1788-1797. https://doi.org/10.1093/icesjms/fsw046. Berger, A.M., Deroba, J.J., Bosley, K.M., Goethel, D.R., Langseth, B.J., Schueller, A.M., Hanselman, D.H., 2021. Incoherent dimensionality in fisheries management: Consequences of misaligned stock assessment and population boundaries. ICES J. Mar. Sci. 78, 155-171. https://doi.org/10.1093/icesjms/fsaa203. Bosley, K.M., Schueller, A.M., Goethel, D.R., Hanselman, D.H., Fenske, K.H., Berger, A.M., Deroba, J.J., Langseth, B.J., 2022. Finding the perfect mismatch: Evaluating misspecification of population structure within spatially explicit integrated population models. Fish Fish. 23, 294-315. https://doi.org/10.1111/faf.12616. Brosset, P., Smith, A.D., Plourde, S., Castonguay, M., Lehoux, C., and Van Beveren, E., 2020. A fine-scale multi-step approach to understand fish recruitment variability. Sci. Rep., 10, 16064. https://doi.org/10.1038/s41598-020-73025-z. Cadigan, N.G., 2015. A state-space stock assessment model for northern cod, including under- reported catches and variable natural mortality rates. Can. J. Fish. Aquat. Sci. 308, 1-13. https://doi.org/10.1139/cjfas-2015-0047. Carvalho, F., Winker, H., Courtney, D., Kapur, M., Kell, L., Cardinale, M., Schirripa, M., Kitakado, T., Yemane, D., Piner, K.R., Maunder, M.N., Taylor, I., Wetzel, C.R., Doering, K., Johnson, K.F., Methot, R.D., 2021. A cookbook for using model diagnostics in integrated stock assessments. Fish. Res. 240, 105959. https://doi.org/10.1016/j.fishres.2021.105959. 108 Cope, J. M., Dowling, N. A., Hesp, S. A., Omori, K. L., Bessell-Browne, P., Castello, L., Chick, R., Dougherty, D., Holmes, S.J., McGarvey, R., Ovando, D., Nowlis, J., Prince, J., 2023. The stock assessment theory of relativity: deconstructing the term “data-limited” fisheries into components and guiding principles to support the science of fisheries management. Rev. Fish. Sci. 33, 241-263. https://doi.org/10.1007/s11160-022-09748-1. Cronin-Fine, L., Punt, A.E., 2021. Modeling time-varying selectivity in size-structured assessment models. Fish. Res. 239, 105927. https://doi.org/10.1016/j.fishres.2021.105927. Cushing, D. H., 1990. Plankton production and year-class strength in fish populations: an update of the match/mismatch hypothesis. Adv. Mar. Biol. 26, 249-293. https://doi.org/10.1016/S0065-2881(08)60202-3. De Valpine, P., Hilborn, R., 2005. State-space likelihoods for nonlinear fisheries time-series. Can. J. Fish. Aquat. Sci. 62, 1937-1952. https://doi.org/10.1139/f05-116. Deroba, J.J., Schueller, A.M., 2013. Performance of stock assessments with misspecified age- and time-varying natural mortality. Fish. Res. 146, 27-40. https://doi.org/10.1016/j.fishres.2013.03.015. Dickey-Collas, M., Hintzen, N.T., Nash, R.D.M., Schon, P.-J., Payne, M.R., 2015. Quirky patterns in time-series of estimates of recruitment could be artefacts. ICES J. Mar. Sci. 72, 111-116. https://doi.org/10.1093/icesjms/fsu022. Francis, R.I.C.C., 2011. Corrigendum: Data weighting in statistical fisheries stock assessment models. Can. J. Fish. Aquat. Sci. 68, 2228-2228. https://doi.org/10.1139/f2011-165. Goethel, D.R., Bosley, K.M., Langseth, B.J., Deroba, J.J., Berger, A.M., Hanselman, D.H., Schueller, A.M., 2021. Where do you think you’re going? Accounting for ontogenetic and climate-induced movement in spatially stratified integrated population assessment models. Fish Fish. 22, 141-160. https://doi.org/10.1111/faf.12510. Han, Q., Shan, X., Jin, X., Gorfine, H., 2023. Contrasting stock status trends obtained from survey and fishery CPUE, taking Larimichthys polyactis in Yellow Sea Large Marine Ecosystem as an example. Ecol. Indic. 147, 110032. https://doi.org/10.1016/j.ecolind.2023.110032. Hurtado-Ferro, F., Szuwalski, C.S., Valero, J.L., Anderson, S.C., Cunningham, C.J., Johnson, K.F., Licandeo, R.R., Mcgilliard, C.R., Monnahan, C.C., Muradian, M.L., 2015. Looking in the rear-view mirror: bias and retrospective patterns in integrated, age-structured stock assessment models. ICES J. Mar. Sci. 72, 99-110. https://doi.org/10.1093/icesjms/fsu198. Johnson, K.F., Monnahan, C.C., McGilliard, C.R., Vert-Pre, K.A., Anderson, S.C., Cunningham, C. J., Hurtado-Ferro, F., Licandeo, R.R., Muradian, M.L., Ono, K., Szuwalski, C.S., Valero, J.L., Whitten, A.R., Punt, A.E., 2014. Time-varying natural mortality in fisheries 109 stock assessment models: identifying a default approach. ICES J. Mar. Sci. 72, 137-150. https://doi.org/10.1093/icesjms/fsu055. Kristensen, K., Nielsen, A., Berg, C.W., Skaug, H., Bell, B.M., 2016. TMB: Automatic differentiation and laplace approximation. J. Stat. Softw. 70, 1-21. https://doi.org/10.18637/jss.v070.i05. Legault, C.M., Restrepo, V.R., 1998. A flexible forward age-structured assessment program. Collect. Vol. Sci. Pap. ICCAT, 49, 246-253. Linton, B.C., Bence, J.R., 2008. Evaluating methods for estimating process and observation error variances in statistical catch-at-age analysis. Fish. Res. 94, 26-35. https://doi.org/10.1016/j.fishres.2008.06.014. Linton, B.C., Bence, J.R., 2011. Catch-at-age assessment in the face of time-varying selectivity. ICES J. Mar. Sci. 68, 611-625. https://doi.org/10.1093/icesjms/fsq173. Martell, S., Stewart, I., 2014. Towards defining good practices for modeling time-varying selectivity. Fish. Res. 158, 84-95. https://doi.org/10.1016/j.fishres.2013.11.001. Maunder, M.N., Deriso, R.B., 2003. Estimation of recruitment in catch-at-age models. Can. J. Fish. Aquat. Sci. 60, 1204-1216. https://doi.org/10.1139/f03-104. Maunder, M.N., 2011. Review and evaluation of likelihood functions for composition data in stock-assessment models: Estimating the effective sample size. Fish. Res. 109, 311-319. https://doi.org/10.1016/j.fishres.2011.02.018. Maunder, M.N., Piner, K.R., 2017. Dealing with data conflicts in statistical inference of population assessment models that integrate information from multiple diverse data sets. Fish. Res. 192, 16-27. https://doi.org/10.1016/j.fishres.2016.04.022. Maunder, M.N., Thorson, J.T., 2018. Modeling temporal variation in recruitment in fisheries stock assessment: a review of theory and practice. Fish. Res. 217, 1-16. https://doi.org/10.1016/j.fishres.2018.12.014. Mertz, G., Myers, R.A., 1996. Influence of fecundity on recruitment variability of marine fish. Can. J. Fish. Aquat. Sci. 53, 1618-1625. https://doi.org/10.1139/f96-089. Miller, T.J., Hare, J.A., Alade, L.A., 2016. A state-space approach to incorporating environmental effects on recruitment in an age-structured assessment model with an application to southern new england yellowtail flounder. Can. J. Fish. Aquat. Sci. 73, 1261-1270. https://doi.org/10.1139/cjfas-2015-0339. [NEFSC] Northeast Fisheries Science Center. 2022. Stock Assessment Update of 14 Northeast Groundfish Stocks Through 2018. US Dept Commer, Northeast Fish. Sci. Cent. Ref. Doc. 22-06. 110 Nielsen, A., Berg, C.W., 2014. Estimation of time-varying selectivity in stock assessments using state-space models. Fish. Res. 158, 96-101. https://doi.org/10.1016/j.fishres.2014.01.014. Perreault, A.M.J., Wheeland, L.J., Joanne Morgan, M., Cadigan, N.G., 2020. A state-space stock assessment model for American plaice on the Grand Bank of Newfoundland. J. Northwest Atl. Fish. Sci. 51, 45-104. https://doi.org/10.2960/J.v51.m727. Rose, K.A., Cowan, J.H., Winemiller, K.O., Myers, R.A., Hilborn, R., 2001. Compensatory density dependence in fish populations: importance, controversy, understanding and prognosis. Fish and Fish. 2, 293-327. https://doi.org/10.1046/j.1467-2960.2001.00056.x. Rossi, S.P., Cox, S.P., Benoît, H.P., Swain, D.P., 2019. Inferring fisheries stock status from competing hypotheses. Fish. Res. 216, 155-166. https://doi.org/10.1016/j.fishres.2019.04.011. Soto, M., Fernández-Peralta, L., Pennino, M.G., Kokkalis, A., Rey, J., Salmerón, F., Liébana, M., Meissa, B., Kell, L., 2022. Effects of misreporting landings, discards, and Catch Per Unit of Effort index in state-space production models: the case of black hake in northwest Africa. ICES J. Mar. Sci. fsac188. https://doi.org/10.1093/icesjms/fsac188. Stewart, I.J., Martell, S.J.D., 2014. A historical review of selectivity approaches and retrospective patterns in the Pacific halibut stock assessment. Fish. Res. 158, 40-49. https://doi.org/10.1016/j.fishres.2013.09.012. Stewart, I.J., Monnahan, C.C., 2017. Implications of process error in selectivity for approaches to weighting compositional data in fisheries stock assessments. Fish. Res. 192, 126-134. https://doi.org/10.1016/j.fishres.2016.06.018. Stock, B.C., Miller, T.J., 2021. The Woods Hole Assessment Model (WHAM): A general state- space assessment framework that incorporates time- and age- varying processes via random effects and links to environmental covariates. Fish. Res. 240, 105967. https://doi.org/10.1016/j.fishres.2021.105967. Stock, B.C., Xu, H., Miller, T.J., Thorson, J.T., Nye, J.A., 2021. Implementing two-dimensional autocorrelation in either survival or natural mortality improves a state-space assessment model for Southern New England-Mid Atlantic yellowtail flounder. Fish. Res. 237, 105873. https://doi.org/10.1016/j.fishres.2021.105873. Szuwalski, C.S., Ianelli, J.N., Punt, A.E., 2018. Reducing retrospective patterns in stock assessment and impacts on management performance. ICES J. Mar. Sci. 75, 596-609. https://doi.org/10.1093/icesjms/fsx159. Thorson, J.T., Jensen, O.P., Zipkin, E.F., 2014. How variable is recruitment for exploited marine fishes? A hierarchical model for testing life history theory. Can. J. Fish. Aquat. Sci. 71, 973-983. https://doi.org/10.1139/cjfas-2013-0645. 111 Thorson, J.T., Kristensen, K., 2016. Implementing a generic method for bias correction in statistical models using random effects, with spatial and population dynamics examples. Fish. Res. 175, 66-74. https://doi.org/10.1016/j.fishres.2015.11.016. Thorson, J.T., 2019. Guidance for decisions using the Vector Autoregressive Spatio-Temporal (VAST) package in stock, ecosystem, habitat and climate assessments. Fish. Res. 210, 143-161. https://doi.org/10.1016/j.fishres.2018.10.013. Thorson, J. T., Rudd, M. B., Winker, H., 2019. The case for estimating recruitment variation in data-moderate and data-poor age-structured models. Fish. Res. 217, 87-97. https://doi.org/10.1016/j.fishres.2018.07.007. Trijoulet, V., Fay, G., Miller, T.J., 2020. Performance of a state-space multispecies model: What are the consequences of ignoring predation and process errors in stock assessments? J. Appl. Ecol. 57, 121-135. https://doi.org/10.1111/1365-2664.13515 Trijoulet, V., Albertsen, C.M., Kristensen, K., Legault, C.M., Miller, T.J., Nielsen, A., 2023. Model validation for compositional data in stock assessment models: Calculating residuals with correct properties. Fish. Res. 257, 106487. https://doi.org/10.1016/j.fishres.2022.106487. Xu, H., Thorson, J.T., Methot, R.D., 2020. Comparing the performance of three data-weighting methods when allowing for time-varying selectivity. Can. J. Fish. Aquat. Sci. 77, 247- 263. https://doi.org/10.1139/cjfas-2019-0107. 112 APPENDIX 2A: INPUT OBSERVATION VARIANCE VALUES Table 2A.1. Input standard deviation of lognormal distribution for total catch and total index, and effective sample size of multinomial distribution for catch and index proportions at age used to fit Gulf of Maine haddock. (𝐼) (𝐼) Year 𝜎𝐶𝑦 𝜎𝐼2𝑦,𝑆 𝜎𝐼2𝑦,𝐹 𝑛𝑦 (𝐶) 𝑛𝑦,𝑆 𝑛𝑦,𝐹 1977 0.2 0.6142 0.5225 60 15 25 1978 0.2 0.7069 0.3756 60 15 25 1979 0.2 0.5174 0.402 60 15 25 1980 0.2 0.6821 0.5415 60 15 25 1981 0.2 0.5442 0.3841 60 15 25 1982 0.2 0.6061 0.5385 60 15 25 1983 0.2 0.6917 0.4717 60 15 25 1984 0.2 0.7048 0.467 60 15 7 1985 0.2 0.6785 0.5869 60 15 7 1986 0.2 0.7443 0.6136 60 15 7 1987 0.2 0.6549 0.5188 60 5 7 1988 0.2 0.8185 0.8237 20 5 7 1989 0.15 1.0139 0.5951 20 5 7 1990 0.15 0.8431 0.5812 20 5 7 1991 0.15 0.8238 0.771 20 5 7 1992 0.15 0.915 0.7289 20 5 7 1993 0.15 0.7611 0.9208 60 10 7 1994 0.15 0.635 0.6189 60 10 7 1995 0.15 0.7618 0.7218 60 10 7 1996 0.15 0.6179 0.5799 60 10 7 1997 0.15 0.698 0.5582 60 10 7 1998 0.15 0.709 0.6899 60 10 30 1999 0.15 0.6945 0.4931 60 10 30 2000 0.15 0.69 0.6627 60 10 30 2001 0.1 0.8868 0.4699 60 10 30 2002 0.1 0.8074 0.5369 60 10 30 2003 0.1 0.543 0.4162 140 10 30 2004 0.1 0.6241 0.4569 140 10 30 2005 0.1 0.6961 0.3908 140 25 30 2006 0.1 0.744 0.4546 140 25 30 2007 0.1 0.6686 0.4973 140 25 30 2008 0.1 0.7815 0.5064 140 25 30 2009 0.1 0.666 0.5245 140 25 12 2010 0.1 0.6778 0.6653 140 25 12 2011 0.1 0.6538 0.6523 140 25 12 2012 0.1 0.6848 0.7937 140 25 12 113 Table 2A.1 (cont’d) 2013 0.1 0.73 0.4409 140 25 12 2014 0.1 0.492 0.5957 140 25 12 2015 0.1 0.8862 0.409 140 25 12 2016 0.1 0.5058 0.3803 140 25 12 2017 0.1 0.5909 0.3692 140 25 12 2018 0.1 0.4801 0.3968 140 25 12 2019 0.1 0.4973 0.4331 140 25 12 114 APPENDIX 2B: SUPPLEMENTARY FIGURES Figure 2B.1. Estimated recruitment and expected survival process error correlation coefficient. Black lines indicate the true correlation coefficient. Results presented in each column are for when process error standard deviation was set at a range of values (left- recruitment, middle-survival, right-selectivity) with the standard deviations for the other processes at baseline values. Individual boxplots represent the eight estimation models that treat recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). Only estimation models that specify recruitment as a random effect (R--) are represented. The number above each boxplot indicates the number of models that converged and produced positive definite hessians, out of 100 simulated data sets. The black dot is the median and the red dot is the average relative error across the converged models. Note slight differences in the y-axis. 115 Figure 2B.2. Estimated selectivity process error correlation coefficient. Black lines indicate the true correlation coefficient. Results presented in each column are for when process error standard deviation was set at a range of values (left-recruitment, middle-survival, right-selectivity) with the standard deviations for the other processes at baseline values. Individual boxplots represent the eight estimation models that treat recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). Only estimation models that specify selectivity as a random effect (--R) are represented. See caption of Figure 2B.1 for description of plot elements. Note slight differences in the y-axis. 116 Figure 2B.3. Mohn’s Rho of spawning stock biomass across process variability operating model scenarios. Black lines indicate zero retrospective bias. Results presented in each column are for when process error standard deviation was set at a range of values (left- recruitment, middle-survival, right-selectivity) with the standard deviations for the other processes at baseline values. Individual boxplots represent the eight estimation models that treat recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). See caption of Figure 2B.1 for description of plot elements. 117 Figure 2B.4. Mohn’s Rho of average fishing mortality across process variability operating model scenarios. Black lines indicate zero retrospective bias. Results presented in each column are for when process error standard deviation was set at a range of values (left- recruitment, middle-survival, right-selectivity) with the standard deviations for the other processes at baseline values. Individual boxplots represent the eight estimation models that treat recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). See caption of Figure 2B.1 for description of plot elements. 118 Figure 2B.5. Mohn’s Rho of recruitment across process variability operating model scenarios. Black lines indicate zero retrospective bias. Results presented in each column are for when process error standard deviation was set at a range of values (left-recruitment, middle-survival, right-selectivity) with the standard deviations for the other processes at baseline values. Individual boxplots represent the eight estimation models that treat recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). See caption of Figure 2B.1 for description of plot elements. 119 Figure 2B.6. Relative error in spawning stock biomass in the terminal year across observation error operating models scenarios where the standard deviation of the lognormal error of total catch or total indices (columns 1-2) or the effective sample size of the multinomial catch or index proportions (columns 3-4) are adjusted to make the data noisier. The black lines indicated zero relative error. Results presented in each column are for when process error variance was set at a range of values (left-recruitment, middle- survival, right-selectivity) with the standard deviations for the other processes at baseline values. Individual boxplots represent the eight estimation models that treat recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). The number above each boxplot indicates the number of models that converged and produced positive definite hessians, out of 100 simulated data sets. The black dot is the median and the red dot is the average relative error across the converged models. 120 Figure 2B.7. Relative error in average recruitment in the terminal 3 years across observation error operating models scenarios. Individual boxplots represent the eight estimation models that treat recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C), in that order. See caption of Figure 2B.6 for description of plot elements. 121 Figure 2B.8. Relative error in exploitation (predicted catch biomass/total biomass) in the terminal year across observation error operating models scenarios. Individual boxplots represent the eight estimation models that treat recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C), in that order. See caption of Figure 2B.6 for description of plot elements. 122 Figure 2B.9. Relative error in depletion in the terminal 10 years across observation error operating models scenarios. Individual boxplots represent the eight estimation models that treat recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C), in that order. See caption of Figure 2B.6 for description of plot elements. 123 Figure 2B.10. Mohn’s Rho of spawning stock biomass across observation error operating models scenarios. Black lines indicate zero retrospective bias. Individual boxplots represent the eight estimation models that treat recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). See caption of Figure 2B.6 for description of plot elements. 124 Figure 2B.11. Mohn’s Rho of average fishing mortality across observation error operating models scenarios. Black lines indicate zero retrospective bias. Individual boxplots represent the eight estimation models that treat recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). See caption of Figure 2B.6 for description of plot elements. 125 Figure 2B.12. Mohn’s Rho of recruitment across observation error operating models scenarios. Black lines indicate zero retrospective bias. Individual boxplots represent the eight estimation models that treat recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). See caption of Figure 2B.6 for description of plot elements. 126 Figure 2B.13. Estimated recruitment process error standard deviation across observation error operating model scenarios. Black lines indicate the true recruitment standard deviation. Individual boxplots represent the eight estimation models that treat recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). Only estimation models that specify recruitment as a random effect (R--) are represented. See caption of Figure 2B.6 for description of plot elements. 127 Figure 2B.14. Estimated expected survival process error standard deviation across observation error operating model scenarios. Black lines indicate the true expected survival standard deviation. Individual boxplots represent the eight estimation models that treat recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). Only estimation models that specify survival as a random effect (-R-) are represented. See caption of Figure 2B.6 for description of plot elements. 128 Figure 2B.15. Estimated selectivity process error standard deviation across observation error operating model scenarios. Black lines indicate the true selectivity standard deviation. Individual boxplots represent the eight estimation models that treat recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). Only estimation models that specify selectivity as a random effect (R--) are represented. See caption of Figure 2B.6 for description of plot elements. 129 Figure 2B.16. Estimated recruitment and expected survival process error correlation coefficient across observation error operating model scenarios. Black lines indicate the true correlation coefficient. Individual boxplots represent the eight estimation models that treat recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). Only estimation models that specify recruitment as a random effect (R--) are represented. See caption of Figure 2B.6 for description of plot elements. 130 Figure 2B.17. Estimated selectivity process error correlation coefficient across observation error operating model scenarios. Black lines indicate the true correlation coefficient. Individual boxplots represent the eight estimation models that treat recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). Only estimation models that specify selectivity as a random effect (R--) are represented. See caption of Figure 2B.6 for description of plot elements. 131 CHAPTER 3: THE PERFORMANCE OF ALTERNATIVE AGE COMPOSITION LIKELIHOODS IN STATE-SPACE MODELS WITH MULTIPLE SOURCES OF PROCESS VARIATION 3.1 Abstract Integrated stock assessment models incorporate multiple sources of data like catch, indices of abundance, and proportions at age, each with different degrees of quality in either fishery or survey catch. In a maximum likelihood estimation framework, integrated models explicitly apportion observation variability through values supplied to the assumed likelihood, such as the standard deviation of a lognormal distribution (for catch) or the effective sample size of a multinomial distribution (for proportions at age), and these terms are typically supplied by the user. Several alternatives to the multinomial (MN) distribution, such as the Dirichlet- multinomial (DM), logistic normal (LG), and multivariate Tweedie (MVTW) distributions, can allow observation variance to be approximated within the model by estimating parameters that scale or replace the input effective sample size. These distributions can better account for the intra-haul correlation and overdispersion of sampled ages. However, these distributions have yet to be extensively tested in a state-space stock assessment model with estimated process error variance, which has the added challenge of apportioning total model variability between the observation and process sub-models. I used a simulation-estimation framework to generate data with process variability in recruitment, expected survival (i.e., abundance), and selectivity at age, and with observation variability in the catch proportions at age derived from one of several distributions. Simulated data were fit using several estimation models that differed in which processes (recruitment, expected survival, or selectivity) were incorporated and in the assumed distribution for observation error influencing age composition data. Estimation models had 132 difficulty apportioning variability among processes, especially when the observation model for age compositions did not account for realized inter-age correlation and overdispersion. Estimation models that assumed the survival processes were stochastic produced biased model output, but only models that assumed an intermediate amount of process variability (either stochastic survival or constant selectivity at age) were able to accurately estimate the parameters determining the characteristics of observation error in age composition data. It is recommended that the logistic normal or multivariate Tweedie distributions be employed in fitting state-space stock assessment models, especially in the face of several time-varying processes. 3.2 Introduction Statistical estimation models fit data using a likelihood function, which quantifies the probability of observing the data conditional on the model and the values of the parameters. Choices are made about observation error (e.g., how it is structured, the magnitude relative to the process error, and how much is distributed across multiple data sources) through specification of the likelihood function. Through the variance parameters, such as the standard deviation of the normal distribution or the effective sample size of the multinomial distribution, the model can be “weighted” such that some data are fit more closely (with less observation error such that the predicted and observed values are more similar) than others. Correctly weighting among multiple data sources in integrated stock assessment models is an ongoing challenge especially when data have conflicting trends (Maunder and Piner, 2017; Maunder and Punt, 2013; Punt et al., 2013). Changing the data weighting or the likelihood function as a whole can cause substantial changes in model output and model evaluation tools like AIC as the model fits certain data points more accurately (Francis, 2014, 2011). It is therefore imperative to specify an observation error 133 structure that accurately reflects the data collection process and correctly weights among, for example, catch, discards, indices of abundance, and tagging data. Age composition, or the fraction of each age represented in the catch or index, is recorded by sub-sampling individuals from the catch or survey and aging from otoliths, scales, or other bony structures. In standard statistical-catch-at-age models, the proportions at age are often assumed to have observation error that has a multinomial distribution (MN) (Fournier and Archibald, 1982; Legault and Restrepo, 1998; Methot and Wetzel, 2013). However, such a distribution assumes that individuals are sampled independently, when in reality factors like schooling or targeted fishing may lead to overdispersion (more variability than expected) and “intra-haul correlation” among age groups (Pennington and Vølstad, 1994). Therefore, alternative composition distributions with more desirable properties that can account for overdispersion and inter-age correlation have been explored instead of the multinomial (Francis, 2014; Maunder, 2011). In a state-space modeling framework, which include one or more time- varying random effects which are integrated from the marginal likelihood, the process variation is self-weighting and the applicability of alternative proportions at age likelihoods may be contingent on how well they can parse variance among data sources and between the process and observation sub-models (Besbeas and Morgan, 2017; Francis, 2017; Linton and Bence, 2008). The multinomial is still recommended over some of the alternative proportions at age distributions (listed below) when the model accounts for time-varying selectivity (Cronin-Fine and Punt, 2021). The Dirichlet-multinomial (DM) is a hierarchical distribution that assumes that the effective sample size of a multinomial distribution follows a Dirichlet distribution (Francis, 2017; Thorson et al., 2017). The DM outperforms other data weighting schemes like iteratively 134 reweighting the multinomial effective sample size or estimating effective sample sizes with a strong prior (Thorson et al., 2017; Xu et al., 2020). The logistic normal (LG) is another hierarchical distribution, but unlike DM can account for substantial correlation among age classes. LG is a transformed multivariate-normal distribution that can be parameterized to be linear, or an autoregressive first order AR(1) or second order AR(2) process, or moving average ARMA (Fisch et al., 2021; Francis, 2014). Its flexibility and ability to self-weight has led the LG distribution, and others that can account for correlations, to outperform the DM especially when there’s moderate process error in selectivity and the composition sample size is large (Albertsen et al., 2017; Fisch et al., 2021). The LG distribution is a recommended option when employing state-space stock assessment software WHAM and when there is variance in numbers at age, natural mortality, selectivity, or environmental covariates (Stock et al., 2021; Stock and Miller, 2021). The multivariate Tweedie distribution (MVTW) has recently been adapted for application in fisheries stock assessment. The MVTW is an attractive alternative to the multinomial that assumes that the proportions at age are derived from a transformed gamma and arises naturally from the process of subsampling age and length information (Thorson et al., 2022). Though previous research and application are limited for MVTW, it has outperformed the DM and performed comparably to the LG in a state-space model with time-varying selectivity and is already included in the state-space modeling software WHAM (Stock and Miller, 2021; Thorson et al., 2022). The MVTW can upweight the data (produce effective sample sizes larger than the input sample size, which combats the risk of setting the input sample size too low) and estimate heteroscedasticity unlike the DM, and account for zeros in the data, unlike the LG, so it exhibits many of the stated important qualities for an catch proportions at age distribution (Francis, 2014; Thorson et al., 2022). 135 Models may perform differently and produce biased model output when they do not account for overdispersion or inter-age correlation when these factors are present or when they do try to account for these processes when they are absent, particularly when the model is also accounting for several other stochastic processes. To address this question, catch data were simulated using alternative age composition scenarios that assumed several observation error distributions for the proportions at age (of those described above), then the simulated data were fit using estimation models that assumed different likelihoods for the proportions at age, and different sources of process variability in recruitment, expected survival, and selectivity. The additional process variability scenarios were intended to test how well the model can apportion variability when there are one or several time-varying processes. 3.3 Methods 3.3.1 Overview Catch data were simulated using a state-space stock assessment operating model that was based on a model fit with Gulf of Maine haddock data. The operating model generated total catch and catch proportions at age using four scenarios that assumed the observation error in total catch had a lognormal distribution and the observation errors in catch proportions at age had a multinomial, Dirichlet-multinomial, logistic-normal with AR(1) correlations, or multivariate Tweedie distribution. A fifth simulation scenario modeled observation error in catch-at-age, rather than the total catch and proportions separately, using a multivariate lognormal distribution with AR(1) error (MVN). The MVN distribution for catch-at-age is the default assumption in the state-space modeling software SAM and a common observation error distribution used to fit Canadian stocks (Berg and Nielsen, 2016; Nielsen and Berg, 2014; Perreault et al., 2020). The operating model included intra-annual process variance in recruitment, expected survival, and 136 selectivity, with an autocorrelated AR(1) distribution. Yearly deviations in these processes were not simulated from their respective distributions, but rather fixed at the estimated values, such that the realized abundance, spawning stock biomass, and fishing mortality was the same for every generated data set. The simulated data were fit using several estimation models that differed in the assumed likelihood of the proportions at age, either MN, DM, LG or MVTW, but not MVN because no such option exists in the modeling package used for fitting. Several versions of the estimation models were also used which differed in which processes (of recruitment, expected survival or selectivity) had process errors. Model output from fitting simulated data were compared to the realized values used to generate that data to assess model performance. 3.3.2 Simulation/Operating Model Scenarios Each simulation model generated total catch, total index, and catch and index age composition data (for ages 1-9+) in each year using the predicted values from an state-space stock assessment model fit with Gulf of Maine haddock catch and index data from 1977-2019 using the WHAM state-space assessment package (Stock and Miller, 2021). The process and observation sub-model were identical to that described in Table 2.1 and summarized below. The process sub-model assumed time-varying random effects in recruitment, expected survival at age, and selectivity at age, and the yearly deviations were assumed to have an autoregressive AR(1) process with inter-annual correlation. In the observation sub-model, catch at age was predicted each year from a Baranov catch equation and the predicted total catch and proportions at age were derived from these age-specific values. Indices of abundance from a spring and fall bottom trawl survey were predicted from the abundance at age, index catchability, and index selectivity, and similarly decomposed into total index and index age compositions. The model 137 estimated fixed effect parameters of mean recruitment, mean selectivity at age, numbers at age for ages 1-6 in the first year, index catchability and selectivity, and the variance and correlation parameters of the random effects. The model integrated the random effects from the marginal likelihood and predicted values of the random effects and derived quantities were calculated using the epsilon method for bias correction (Thorson and Kristensen, 2016). For a full description of the process and observation sub-models see section 2.3.2. The operating model generated catch and index data using the same estimated fixed effect and random effect values from the model fitted to real data, i.e., the recruitment, expected survival, and selectivity deviations were kept the same and not re-drawn from their respective distributions. Therefore, the process variance was the same across simulation scenarios. The total index and index proportions at age were not simulated, but rather matched that of the Gulf of Maine Haddock data. What distinguished the five simulation scenarios was the assumed probability distribution of the observation error in total catch and catch proportions at age (Table 3.1). 138 Table 3.1. Simulation distributions of total catch, proportion at age, or catch-at-age used in likelihood simulation scenarios and likelihood equations used in likelihood estimation scenarios. LN() denotes lognormal distribution with mean and standard deviation. N() is a normal distribution with mean and standard deviation. MN() is a multinomial distribution with effective sample size and predicted proportions. Gam() denotes gamma distribution with shape and scale parameters. Pois() is a Poisson distribution with mean/variance parameter. Γ() is the gamma function and det() is the determinant of a matrix. Predicted values of total catch 𝐶̂𝑦 in weight, proportions at age 𝑃̂𝑎,𝑦 , catch-at-age in numbers, 𝐶̂𝑎,𝑦 , and average weight at age at time of capture, 𝑊𝑎,𝑦 , are in Appendix 3A. Variance and Parameter Index Equation Values Simulation Model Total Catch Sim 𝜎𝐶𝑦 = 0.2, Lognormal 1977 ≤ 𝑦 ≤ 1988 𝜎𝐶𝑦 = 0.15, 𝐶𝑦 ~𝐿𝑁(𝐶̂𝑦 , 𝜎𝐶𝑦 ) 1989 ≤ 𝑦 ≤ 2000 𝜎𝐶𝑦 = 0.1, 2001 ≤ 𝑦 ≤ 2019 Catch Proportions at Age SimMulti (𝐶) 𝑛𝑦 = 60, 1977 ≤ 𝑦 ≤ 1987 (𝐶) 𝑛𝑦 = 20, (𝐶) 1988 ≤ 𝑦 ≤ 1992 𝑃𝑎,𝑦 ~𝑀𝑁(𝑛𝑦 , 𝑃̂𝑎,𝑦 ) (𝐶) 𝑛𝑦 = 60, 1993 ≤ 𝑦 ≤ 2002 (𝐶) 𝑛𝑦 = 140, 2003 ≤ 𝑦 ≤ 2019 SimDirMult (𝐶) (𝐶) 𝑛𝑦 see SimMulti 𝑃𝑎,𝑦 ~𝑀𝑁(𝑛𝑦 , 𝑃̂𝑎,𝑦 ∗ ) 𝛽 = 18.504 𝑋 𝑎,𝑦 𝑃̂𝑎,𝑦 ∗ = 𝐴 ∑𝑎=1 𝑋𝑎,𝑦 𝑋𝑎,𝑦 ~𝐺𝑎𝑚(𝑘𝑎,𝑦 , 1) 𝑘𝑎,𝑦 = 𝑃̂𝑎,𝑦 ∗ 𝛽 𝐴−1 (𝐶) SimLogi 𝑛𝑦 see SimMulti 𝑃𝐴,𝑦 = 1 − ∑ 𝑃𝑎,𝑦 𝜎 = 1.95 𝑎=1 𝜎𝑦 = exp (𝜎 + ∗ 𝑃𝑎,𝑦 1 (𝐶) 𝑃𝑎,𝑦 = ,𝑎 < 𝐴 log(𝑛𝑦 )) 1 + ∑𝐴−1 𝑎=1 𝑃𝑎,𝑦 ∗ 2 ∗ 𝑃𝑎,𝑦 ~𝑁(𝜇𝑎,𝑦 , 𝚺𝑦 ), 𝑎 < 𝐴 𝜌 = 0.648 𝚺𝑦,𝑎,𝑎̃ = 𝜌|𝑎−𝑎̃| 𝜎𝑦2 , 𝑎 < 𝐴, 𝑎̃ < 𝐴 𝜇𝑎,𝑦 = log(𝑃̂𝑎,𝑦 ) − log(𝑃̂𝐴,𝑦 ) , 𝑎 < 𝐴 139 Table 3.1 (cont’d) ∗ SimTwe 𝑃𝑎,𝑦 (𝐶) 𝑛𝑦 see SimMulti 𝑃𝑎,𝑦 = 𝐴 ∗ ∑𝑎=1 𝑃𝑎,𝑦 𝜓 = 1.294 ∗ 𝜙 = 0.296 𝑃𝑎,𝑦 ~𝐺𝑎𝑚(𝑘𝑎,𝑦 , 𝛾𝑎,𝑦 ) 𝑘𝑎,𝑦 = −𝑃𝑜𝑖𝑠(𝜆𝑎,𝑦 )𝛼 𝛾𝑎,𝑦 = 𝜙(𝜓 − 1)𝜇𝑎,𝑦 𝜓−1 2−𝜓 𝛼= 1−𝜓 𝜇𝑎,𝑦 2−𝜓 𝜆𝑎,𝑦 = 𝜙(2 − 𝜓) (𝐶) 𝜇𝑎,𝑦 = 𝑃̂𝑎,𝑦 𝑛𝑦 Catch at Age SimMultiV 𝜌 = 0.024 𝜎1 = 0.599 𝜎2 = 0.478 𝜎3 = 0.424 𝐶𝑎,𝑦 ~𝑁(𝐶̂𝑎,𝑦 , 𝚺𝑦 ) |𝑎−𝑎̃| 𝜎4 = 0.389 𝚺𝑎,𝑎̃ = 𝜌 𝜎𝑎 𝜎𝑎̃ , 1 < 𝑎 < 𝐴, 1 < 𝑎̃ < 𝐴 𝜎5 = 0.412 𝜎6 = 0.430 𝜎7 = 0.444 𝜎8 = 0.490 𝜎9+ = 0.448 𝐴 𝐶𝑦 = ∑ 𝐶𝑎,𝑦 𝑊𝑎,𝑦 𝑎=1 𝐶𝑎,𝑦 𝑃𝑎,𝑦 = ∑𝐴𝑎=1 𝐶𝑎,𝑦 Estimation Model Negative Log Likelihood Total Catch Est 𝑌 𝜎𝐶𝑦 is same as 1 𝐶𝑦 Lognormal ∑ (log )2 + log 𝜎𝐶𝑦 SimLognormal 2𝜎𝐶2𝑦 𝐶̂𝑦 𝑦=1977 Catch Proportions at Age EstMult 𝑌 𝐴 (𝐶) (𝐶) 𝑛𝑦 see SimMulti − ∑ ∑ 𝑛𝑦 𝑃̂𝑎,𝑦 log 𝑃𝑎,𝑦 𝑦=1977 𝑎=1 EstDirMult 𝑌 (𝐶) (𝐶) (𝐶) 𝑛𝑦 see SimMulti − ∑ log (Γ(𝑛𝑦 + 1) + log (Γ(𝜑𝑦 ) − log (Γ(𝑛𝑦 + 𝜑𝑦 ) 𝛽 estimated on the log 𝑦=1977 scale 𝐴 (𝐶) + ∑(−log (Γ(𝑛𝑦 𝑃𝑎,𝑦 + 1) 𝑎=1 (𝐶) + log (Γ(𝑛𝑦 𝑃𝑎,𝑦 + 𝑃̂𝑎,𝑦 ∗ 𝛽) − log (Γ(𝑃̂𝑎,𝑦 ∗ 𝛽))] 𝐴 𝜑𝑦 = ∑ 𝑘𝑎,𝑦 𝑎=1 𝑘𝑎,𝑦 = 𝑃̂𝑎,𝑦 ∗ 𝛽 140 Table 3.1 (cont’d) 𝑌 EstLogi 1 𝜎 estimated on the log − ∑ (log ) scale, one value for all 𝐴−1 𝑦=1977 √(2𝜋) 2 det years (𝚺) 𝜌 estimated on the logit 1 𝑇 − ((𝑿𝑦 − 𝝁𝑦 ) 𝚺 −𝟏 (𝑿𝑦 − 𝝁𝑦 )) scale 2 𝑿𝑦 and 𝝁𝑦 are vector 𝐴−1 notation and contain all + ∑ log ( 𝑃𝑎,𝑦 )) ages for a given year 𝑎=1 𝚺𝑦,𝑎,𝑎̃ = 𝜌|𝑎−𝑎̃| 𝜎 2 , 𝑎 < 𝐴, 𝑎̃ < 𝐴 𝜇𝑎,𝑦 = log(𝑃̂𝑎,𝑦 ) − log(𝑃̂𝐴,𝑦 ) , 𝑎 < 𝐴 𝑋𝑎,𝑦 = log(𝑃𝑎,𝑦 ) − log(𝑃𝐴,𝑦 ) , 𝑎 < 𝐴 EstTwe (𝐶) 𝑛𝑦 see SimMulti (𝜓 -1) estimated on logit scale (to constrain between 1 and 2) 𝑌 𝐴 𝜙 estimated on log scale (𝐶) ∑ −log (∏ 𝑑𝑡𝑤𝑒𝑒𝑑𝑖𝑒( 𝑃𝑎,𝑦 |𝑃̂𝑎,𝑦 , 𝑛𝑦 , 𝜓, 𝜙) 𝑦=1977 𝑎=1 Note- there is no closed form version of the multivariate Tweedie distribution. See Thorson et al. (2022) The first scenario mimicked the structure of the model fit to real Gulf of Maine haddock data such that total catch was lognormally distributed with the mean equal to the predicted value in each year, and the catch proportions at age had a multinomial distribution (“SimMulti”) with the probability equal to the predicted values (for predicted values, see Appendix 3A). The standard deviation and effective sample size of these distributions, respectively, were the same as in the Gulf of Maine haddock data (see table 2A.1). Other simulation scenarios assumed the total catch was drawn from a lognormal distribution, like the first model, but the proportions at age were generated using either a Dirichlet-multinomial (“SimDirMult”), logistic-normal distribution with AR(1) correlations (“SimLogi”), or multivariate Tweedie distribution (“SimTwe”). A final simulation scenario assumed that the catch-at-age, rather than total catch and proportions at age, was simulated from a multivariate lognormal distribution (“SimMultiV”) but decomposed into 141 total catch and proportions at age before being supplied to the estimation models. The data were simulated in R rather than using the simulation options built into the package WHAM, and the package MASS was used for multivariate lognormal distributions (a component of “SimLogi” and “SimMultiV”), and the package tweedie was used for “SimTwe” (Venables and Ripley, 2002; Dunn, 2022). The Dirichlet-multinomial, logistic-normal, and multivariate Tweedie distributions all contain additional “age composition parameters” that determine the variance and/or correlation of the observation error distribution. To ensure that the observation error variance was approximately the same across scenarios, appropriate age composition parameters were found by fitting the Gulf of Maine haddock data assuming a DM, LG, or MVTW distribution for the proportions at age while fixing all the parameters at their estimated values except the age composition parameter. All the data inputs were kept the same as well, such that the alternative proportion at age models were fit using the same input sample size as the model fit using the multinomial distribution. Because there was no option to fit the catch at age data to a multivariate lognormal distribution in WHAM, an alternative approach was used to find appropriate standard deviation and correlation values for the multivariate log normal distribution that approximated the variance of the lognormal catch and multinomial proportions at age. 100,000 data sets of total catch and proportion at age were generated from the lognormal and multinomial distributions, converted into catch-at-age, and the average standard deviation at age and correlation of the residuals between the observed and predicted values were estimated. Each simulation scenario also necessitated an input sample size, which was assumed to equal the effective sample size in the Gulf of Maine haddock data. In total, 100 data sets were generated for each of the five simulation scenarios. 142 3.3.3 Estimation Model Scenarios Simulated data were fit by different estimation models that included four different assumptions about the distribution for the observed age-compositions (observation scenarios) and five different assumptions about process variability (process scenarios) estimation scenarios and five process variance scenarios. The observation scenarios assumed the catch proportions at age followed a multinomial “EstMulti,” Dirichlet-multinomial “EstDirMult,” logistic normal “EstLogi,” or multivariate Tweedie “EstTwe” distribution, each corresponding to one of the simulation scenarios (Table 3.1). The logistic normal distribution “EstLogi” cannot account for zeros in the proportions at age data and therefore zeros were pooled with adjacent age bins automatically in WHAM. Each observation scenario was crossed with process scenarios that assumed random effects in recruitment, expected survival, and selectivity which were integrated from the marginal likelihood (which matches the operating model), or that one or more process was constant or a fixed effect (which does not match the operating model). The process scenarios were designated by a three-letter code which indicated if the recruitment, expected survival, or selectivity was specified as a random effect (R), a fixed effect (F), or constant (C). See section 2.3.4 for a complete description of the process variation estimation scenarios. The intent was to investigate trade-offs in weighing total model variability between the observation and process variance. In total there were 100 combinations of simulation and estimation model. 3.3.4 Performance of Assessment Models The estimation models were assessed by how precisely and accurately they estimated spawning stock biomass in the terminal year, average recruitment in the terminal 3 years, exploitation rate in the terminal year, and the depletion, defined as the ratio of the predicted spawning stock biomass in the terminal year to that 10 years prior to the terminal year. Metrics in 143 the terminal year are often the basis for management reference points and management decisions, so evaluating their accuracy is of primary interest. A retrospective analysis was performed, which involved refitting the model after sequentially removing the ultimate seven years of data to investigate how recent observations influenced the estimates, and retrospective patterns were quantified by Mohn’s Rho statistic for spawning stock biomass, average fishing mortality, and recruitment (Hurtado-Ferro et al., 2015; Mohn, 1999). See section 2.4.5 and Table 2.2 “Derived Quantities and Simulation Output” for a complete description of these metrics and how relative error was calculated. Additional performance metrics were evaluated if the estimation model correctly specified the likelihood of the proportions at age i.e., a “self-test.” When the simulation and estimation models matched in the assumed distributions for proportions at age the estimated age composition parameters, which quantify the observation error variance, were compared against the true value used to generate the data. The bias in estimated process error variance and correlation coefficients in recruitment, expected survival, and selectivity cannot be determined because these processes were not simulated with a known variance value, however these estimates were compared against that estimated from fitting to the real Gulf of Maine Haddock. “Biased” and “unbiased” were used to explain how closely the estimates from simulated data matched those from the original estimation model on which the simulations are based. 3.4 Results The estimation models successfully converged and produced an invertible Hessian matrix for most replicates in most combinations of simulation and estimation scenario (Table 3.2). However, when the likelihood simulation scenario was multinomial and the estimation scenario was multivariate Tweedie (“SimMulti” + “EstTwe”), no replicates ran successfully for any 144 assumed process variance model. Similarly, when the data were generated from a multivariate Tweedie and fit assuming a Dirichlet-multinomial (“SimTwe” + “EstDirMult”), only 19 out of 500 simulated data sets successfully converged. Convergence rate was zero for several cases where the estimation model assumed that age compositions followed a -multinomial, (“EstDirMult”), especially for the (“RRR”) process scenario. Convergence rate was lowest when the simulated data had a multinomial distribution (64.7%), and highest when the simulated data had a Dirichlet-multinomial distribution (95.75%). Convergence rate was lowest when the model assumed process variance in all three processes (72.85%) and highest when the survival was deterministic and selectivity was constant (90.8% for RCC and 90.05% for FCC). Table 3.2. Convergence rate across likelihood simulation scenarios, likelihood estimation scenarios, and process variance estimation scenarios. Convergence criteria included a successful model run without errors and production of an invertible Hessian matrix. Likelihood Likelihood Process Variance Scenarios Simulation Estimation Scenario Scenario RRR RRC RCR RCC FCC Average Median Min Max SimMulti EstMulti 100 100 100 100 100 100 100 100 100 SimMulti EstDirMult 2 15 91 100 100 61.6 91 2 100 SimMulti EstLogi 93 100 93 100 100 97.2 100 93 100 SimMulti EstTwe 0 0 0 0 0 0 0 0 0 SimDirMult EstMulti 96 100 99 100 100 99 100 96 100 SimDirMult EstDirMult 96 100 94 100 100 98 100 94 100 SimDirMult EstLogi 83 100 59 100 100 88.4 100 59 100 SimDirMult EstTwe 98 100 90 100 100 97.6 100 90 100 SimLogi EstMulti 100 100 100 100 100 100 100 100 100 SimLogi EstDirMult 0 62 96 100 100 71.6 96 0 100 SimLogi EstLogi 96 100 95 100 100 98.2 100 95 100 SimLogi EstTwe 100 100 100 100 100 100 100 100 100 SimTwe EstMulti 100 100 100 100 100 100 100 100 100 SimTwe EstDirMult 0 0 0 17 2 3.8 0 0 17 SimTwe EstLogi 98 100 100 100 99 99.4 100 98 100 SimTwe EstTwe 100 100 100 100 100 100 100 100 100 SimMultiV EstMulti 100 100 100 100 100 100 100 100 100 SimMultiV EstDirMult 0 6 92 100 100 59.6 92 0 100 SimMultiV EstLogi 99 100 100 99 100 99.6 100 99 100 SimMultiV EstTwe 96 100 99 100 100 99 100 96 100 145 Table 3.2 (cont’d) Average 72.85 79.15 85.4 90.8 90.05 Median 96 100 97.5 100 100 Min 0 0 0 0 0 Max 100 100 100 100 100 Estimation models that assumed no random effects (“FCC”), only process variance in recruitment (“RCC”), or only process variance in recruitment and selectivity (“RCR”) had similar distributions of relative errors (i.e., similar precision and bias) across all model evaluation criteria. Collectively, these estimation models can be referred to as the “deterministic survival” scenarios because they all assume that the expected numbers at age do not deviate from that predicted by the abundance in the previous year and the instantaneous mortality, and the other scenarios assume a random effect on survival. Spawning stock biomass (SSB) in the terminal year was underestimated when the estimation model assumed deterministic survival (Figure 3.1). Conversely, SSB was overestimated when the estimation model assumed process variance in only recruitment and expected survival (“RRC”), except when the Dirichlet-multinomial or logistic normal was correctly specified (“SimDirMult” + “EstDirMult” or “SimLogi” + “EstLogi”). When the process scenario “RRR” was correctly specified in the estimation model, the model over-, under- or accurately estimated SSB, depending upon the simulated and assumed distribution for the age composition data and was least biased when the simulated data were logistic-normal (“SimLogi”). Average recruitment in the terminal three years was underestimated when the estimation model assumed deterministic survival (Figure 3.2). When the expected survival was a random effect, the average recruitment could be over-, under- or accurately estimated depending on the 146 scenario. The average recruitment was most precise and bias was lowest when the estimation model assumed a logistic normal distribution (“EstLogi”). Exploitation rate in the terminal year was overestimated when the estimation model assumed deterministic survival, except when the catch at age data were simulated from a multivariate lognormal distribution (“MultiV”) (Figure 3.3). Estimation models that assumed process variability in recruitment and expected survival (“RRC”) underestimated or accurately estimated exploitation rate and was the least biased when the Dirichlet-multinomial or logistic normal distributions were correctly specified. When the process model correctly assumed variability in all three processes, the exploitation rate was not substantially biased, especially when the estimation model assumed a Logistic normal or multivariate Tweedie distribution. Depletion was underestimated when the estimation model assumed deterministic survival (Figure 3.4). Depletion was slightly negatively biased even when the model did assume process variability in expected survival and the bias was often less when the selectivity was assumed to be constant. The retrospective patterns in spawning stock biomass (SSB) was negative and progressively larger in magnitude in estimation models that specified more of the processes as constant or fixed effects rather than as random effects (Figure 3.5). The same pattern was seen for the magnitude of retrospective patterns in average fishing mortality, though in this case the retrospective bias was positive (Figure 3.6). The retrospective patterns in recruitment was also consistently negative across all simulation and estimation scenarios (like SSB) but estimation models that assumed constant selectivity exhibited less retrospective patterns than models that specified selectivity was a random effect (Figure 3.7). 147 Recruitment process error variance was estimated with high precision and low bias, except when age composition data were simulated from a Dirichlet-multinomial in which case it was overestimated (Figure 3.8). The expected survival process variance tended to be overestimated, and the bias was greater when the simulated data were Dirichlet-multinomial (Figure 3.9). The selectivity process was underestimated or greatly overestimated, though the bias was less when the simulated data were multivariate lognormal (Figure 3.10). The correlation coefficient for the AR(1) process error in recruitment and expected survival was estimated with the least bias when the model assumed constant selectivity (--C) and overestimated when the expected survival was deterministic (-C-) (Figure 3.11). The correlation coefficient for the AR(1) process error in selectivity was strongly underestimated or slightly overestimated, and the bias was worse when the simulated data were logistic normal (Figure 3.12). When the simulation and estimation model both assumed a Dirichlet-multinomial distribution for the age compositions, the scaling parameter, 𝛽 was estimated with low bias when the estimation model assumed two of the processes were varying and one was constant (RRC or RCR), but not when all three were varying (Figure 3.13A). 𝛽 was overestimated (so the effective sample size was higher and the variance was lower) when the process variance scenario was RRR and was underestimated (corresponding to higher estimated variance) when the survival was deterministic and selectivity was constant (-CC). A similar pattern of bias was present in the logistic normal distribution which estimated variance parameter, 𝜎. The observation error variance was accurately estimated when only two processes were assumed to be random effects but predicted lower variance when all processes are varying and higher variance when survival was deterministic and selectivity was constant. (Figure 3.13B). The inter-age correlation of the age compositions, 𝜌, was estimated with little bias and high precision when selectivity was a 148 random effect but overestimated when selectivity was constant. The pattern was also present in the estimation of multivariate Tweedie parameters 𝜙 and 𝜓 (Figure 3.13C). Larger 𝜙 or 𝜓 parameters correspond to a smaller effective sample size and therefore greater observation variance. Like the Dirichlet-multinomial or the logistic-normal distributions, the multivariate Tweedie distribution predicted more observation variability when the process model assumed less process variability and less observation variability with the process model assumed substantial process variability. 149 Figure 3.1. Relative error of spawning stock biomass in the terminal year across alternative age composition simulation (rows) and estimation (columns) scenarios and process variance estimation scenarios (individual boxes). The number above each box indicates the number of the 100 generated data sets that were successfully fit. The catch proportions at age were either simulated (Sim) or estimated (Est) as a multinomial distribution (Mult), Dirichlet-multinomial distribution (DirMult), logistic normal distribution (Logi), multivariate Tweedie distribution (Twe), or multivariate normal distribution (MultiV) (catch at age rather than proportions). Individual boxplots represent the five estimation models that assume recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). Black lines are drawn through zero, red dots indicate the mean, and black dots indicate the median. When no plot is present, no replicates converged for that combination of simulation and estimation scenarios. 150 Figure 3.2. Relative error of average recruitment in the terminal three years across alternative age composition simulation (rows) and estimation (columns) scenarios and process variance estimation scenarios (individual boxes). The number above each box indicates the number of the 100 generated data sets that were successfully fit. The catch proportions at age were either simulated (Sim) or estimated (Est) as a multinomial distribution (Mult), Dirichlet-multinomial distribution (DirMult), logistic normal distribution (Logi), multivariate Tweedie distribution (Twe), or multivariate normal distribution (MultiV) (catch at age rather than proportions). Individual boxplots represent the five estimation models that assume recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). Black lines are drawn through zero, red dots indicate the mean, and black dots indicate the median. When no plot is present, no replicates converged for that combination of simulation and estimation scenarios. 151 Figure 3.3. Relative error of exploitation rate in the terminal year across alternative age composition simulation (rows) and estimation (columns) scenarios and process variance estimation scenarios (individual boxes). The number above each box indicates the number of the 100 generated data sets that were successfully fit. The catch proportions at age were either simulated (Sim) or estimated (Est) as a multinomial distribution (Mult), Dirichlet-multinomial distribution (DirMult), logistic normal distribution (Logi), multivariate Tweedie distribution (Twe), or multivariate normal distribution (MultiV) (catch at age rather than proportions). Individual boxplots represent the five estimation models that assume recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). Black lines are drawn through zero, red dots indicate the mean, and black dots indicate the median. When no plot is present, no replicates converged for that combination of simulation and estimation scenarios. 152 Figure 3.4. Relative error of depletion across alternative age composition simulation (rows) and estimation (columns) scenarios and process variance estimation scenarios (individual boxes). The number above each box indicates the number of the 100 generated data sets that were successfully fit. The catch proportions at age were either simulated (Sim) or estimated (Est) as a multinomial distribution (Mult), Dirichlet-multinomial distribution (DirMult), logistic normal distribution (Logi), multivariate Tweedie distribution (Twe), or multivariate normal distribution (MultiV) (catch at age rather than proportions). Individual boxplots represent the five estimation models that assume recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). Black lines are drawn through zero, red dots indicate the mean, and black dots indicate the median. When no plot is present, no replicates converged for that combination of simulation and estimation scenarios. 153 Figure 3.5. Mohn’s Rho in spawning stock biomass across alternative age composition simulation (rows) and estimation (columns) scenarios and process variance estimation scenarios (individual boxes). The number above each box indicates the number of the 100 generated data sets that were successfully fit. The catch proportions at age were either simulated (Sim) or estimated (Est) as a multinomial distribution (Mult), Dirichlet-multinomial distribution (DirMult), logistic normal distribution (Logi), multivariate Tweedie distribution (Twe), or multivariate normal distribution (MultiV) (catch at age rather than proportions). Individual boxplots represent the five estimation models that assume recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). Black lines are drawn through zero, red dots indicate the mean, and black dots indicate the median. When no plot is present, no replicates converged for that combination of simulation and estimation scenarios. 154 Figure 3.6. Mohn’s Rho in average fishing mortality across alternative age composition simulation (rows) and estimation (columns) scenarios and process variance estimation scenarios (individual boxes). The number above each box indicates the number of the 100 generated data sets that were successfully fit. The catch proportions at age were either simulated (Sim) or estimated (Est) as a multinomial distribution (Mult), Dirichlet-multinomial distribution (DirMult), logistic normal distribution (Logi), multivariate Tweedie distribution (Twe), or multivariate normal distribution (MultiV) (catch at age rather than proportions). Individual boxplots represent the five estimation models that assume recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). Black lines are drawn through zero, red dots indicate the mean, and black dots indicate the median. When no plot is present, no replicates converged for that combination of simulation and estimation scenarios. 155 Figure 3.7. Mohn’s Rho in recruitment across alternative age composition simulation (rows) and estimation (columns) scenarios and process variance estimation scenarios (individual boxes). The number above each box indicates the number of the 100 generated data sets that were successfully fit. The catch proportions at age were either simulated (Sim) or estimated (Est) as a multinomial distribution (Mult), Dirichlet-multinomial distribution (DirMult), logistic normal distribution (Logi), multivariate Tweedie distribution (Twe), or multivariate normal distribution (MultiV) (catch at age rather than proportions). Individual boxplots represent the five estimation models that assume recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). Black lines are drawn through zero, red dots indicate the mean, and black dots indicate the median. When no plot is present, no replicates converged for that combination of simulation and estimation scenarios. 156 Figure 3.8. Estimated recruitment process error variance across alternative age composition simulation (rows) and estimation (columns) scenarios and process variance estimation scenarios (individual boxes). The number above each box indicates the number of the 100 generated data sets that were successfully fit. The catch proportions at age were either simulated (Sim) or estimated (Est) as a multinomial distribution (Mult), Dirichlet-multinomial distribution (DirMult), logistic normal distribution (Logi), multivariate Tweedie distribution (Twe), or multivariate normal distribution (MultiV) (catch at age rather than proportions). Individual boxplots represent the five estimation models that assume recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). Black lines indicate the original estimate of the variance from the model on which the simulations are based, red dots indicate the mean, and black dots indicate the median. When no plot is present, no replicates converged for that combination of simulation and estimation scenarios. 157 Figure 3.9. Estimated expected survival process error variance across alternative age composition simulation (rows) and estimation (columns) scenarios and process variance estimation scenarios (individual boxes). The number above each box indicates the number of the 100 generated data sets that were successfully fit. The catch proportions at age were either simulated (Sim) or estimated (Est) as a multinomial distribution (Mult), Dirichlet-multinomial distribution (DirMult), logistic normal distribution (Logi), multivariate Tweedie distribution (Twe), or multivariate normal distribution (MultiV) (catch at age rather than proportions). Individual boxplots represent the five estimation models that assume recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). Black lines indicate the original estimate of the variance from the model on which the simulations are based, red dots indicate the mean, and black dots indicate the median. Two plots are removed because no models converged for those estimation models. 158 Figure 3.10. Estimated selectivity process error variance across alternative age composition simulation (rows) and estimation (columns) scenarios and process variance estimation scenarios (individual boxes). The number above each box indicates the number of the 100 generated data sets that were successfully fit. The catch proportions at age were either simulated (Sim) or estimated (Est) as a multinomial distribution (Mult), Dirichlet-multinomial distribution (DirMult), logistic normal distribution (Logi), multivariate Tweedie distribution (Twe), or multivariate normal distribution (MultiV) (catch at age rather than proportions). Individual boxplots represent the five estimation models that assume recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). Black lines are drawn through the predicted selectivity variance, red dots indicate the mean, and black dots indicate the median. Two plots are removed because no models converged for those estimation models. 159 Figure 3.11. Estimated recruitment and expected survival AR(1) correlation coefficient across alternative age composition simulation (rows) and estimation (columns) scenarios and process variance estimation scenarios (individual boxes). The number above each box indicates the number of the 100 generated data sets that were successfully fit. The catch proportions at age were either simulated (Sim) or estimated (Est) as a multinomial distribution (Mult), Dirichlet-multinomial distribution (DirMult), logistic normal distribution (Logi), multivariate Tweedie distribution (Twe), or multivariate normal distribution (MultiV) (catch at age rather than proportions). Individual boxplots represent the five estimation models that assume recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). Black lines indicate the original estimate of the correlation coefficient from the model on which the simulations are based, red dots indicate the mean, and black dots indicate the median. Plots are removed if no model converged. 160 Figure 3.12. Estimated selectivity correlation coefficient across alternative age composition simulation (rows) and estimation (columns) scenarios and process variance estimation scenarios (individual boxes). The number above each box indicates the number of the 100 generated data sets that were successfully fit. The catch proportions at age were either simulated (Sim) or estimated (Est) as a multinomial distribution (Mult), Dirichlet-multinomial distribution (DirMult), logistic normal distribution (Logi), multivariate Tweedie distribution (Twe), or multivariate normal distribution (MultiV) (catch at age rather than proportions). Individual boxplots represent the five estimation models that assume recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). Black lines indicate the original estimate of the correlation coefficient from the model on which the simulations are based, red dots indicate the mean, and black dots indicate the median. Two plots are removed because no models converged for those estimation models. 161 Figure 3.13. Estimated observation error parameters in self-tests that generate and estimate age compositions assuming a (A) Dirichlet-multinomial, (B) logistic normal, and (C) multivariate Tweedie distribution. The number above each box indicates the number of the 100 generated data sets that were successfully fit. Individual boxplots represent the five estimation models that assume recruitment, expected survival, or selectivity as a random effect (R), fixed effect (F), or constant (C). Black lines are drawn through the predicted value, red dots indicate the mean, and black dots indicate the median. 162 Figure 3.13 (cont’d) 163 3.5 Discussion The estimation models that specified random effects in recruitment, expected survival, and selectivity, were able to produce relatively unbiased and precise estimates of spawning stock biomass, recruitment, exploitation, and depletion under a wide range of real observation error distributions for catch proportions at age, with the important exception of when the Dirichlet- multinomial was assumed in the estimation model. In such cases, the model rarely converged unless the proportion at age data was actually from a Dirichlet-multinomial. This shortcoming was likely because the Dirichlet-multinomial could not account for inter-age correlation, only overdispersion. Likewise, using the multivariate Tweedie distribution should be avoided if the catch proportions at age contain observation error from a multinomial distribution. However there is substantial evidence that real age composition data is not sampled randomly and has inter-age correlation and overdispersion, so the data generated using the multinomial distribution is unlikely to resemble real conditions (Francis, 2014; Pennington and Vølstad, 1994). The logistic normal and the multivariate Tweedie distribution outperformed other estimation models when the process variation was correctly specified. This aligns with previous work that has recommended the logistic normal distribution or the multivariate Tweedie over the Dirichlet- multinomial in simulated cases when process error was substantial (Fisch et al., 2021) and when applied to real catch data (Albertsen et al., 2017; Francis, 2014; Thorson et al., 2022). If deciding between the logistic normal and multivariate Tweedie, the latter might be preferred for two reasons: its ability to accommodate zeros in the data and its flexibility to predict an effective sample size greater than the input sample size, which can correct for an input sample size set too low (Thorson et al., 2022). 164 In cases where the RRR model did produce biased outputs, model performance could not be reliably improved by assuming one or more processes was constant or a fixed effect rather than random effect. In some cases, specifying selectivity as constant, rather than time-varying, could increase the convergence rate and reduce bias, however, assuming survival was deterministic consistently led to under- or over- estimation of model outputs. Care should be taken to correctly specify the variability in recruitment and expected survival, regardless of whether the likelihood for the proportions at age includes weighting parameters. It is recommended to include selectivity as a random effect because there is little consequence to trying to estimate it even when it is constant (Cronin-Fine and Punt, 2021; Martell and Stewart, 2014; Stewart and Monnahan, 2017). And if the Dirichlet-multinomial is employed, the number of assumed processes will likely need to be reduced for the model to successfully run. A similar upper limit on the complexity of the process model was noted in Stock et al., (2021) who found that a model that assumed 2D autoregressive AR(1) process in expected survival and natural mortality and a logistic normal distribution for the catch proportions at age failed to converge. The RRR model had difficulty correctly allocating variability among the three processes across all age composition scenarios and had additional trouble when age composition quality was estimated (i.e., the Dirichlet-multinomial, logistic normal, or multivariate Tweedie). On average, the recruitment and expected survival process error variance tended to be overestimated and the selectivity process error variance was underestimated. When the data generating and estimation model both assume a multinomial distribution, the expected survival variance is usually underestimated under a wide arrange of true expected survival variances (see Chapter 2, Figure 2.6). In the current study, when the age composition likelihood is incorrectly specified, expected survival, more than recruitment and selectivity, may be absorbing the error that results 165 from model misspecification. The model could be accounting for differences between the expected and observed proportions at age by deviating the numbers at age much more than would be expected by the previous numbers at age and instantaneous mortality alone. This could also explain why models that assume that survival is deterministic will routinely underestimate recruitment, spawning stock biomass, and depletion and overestimate fishing mortality, and why those that assume variability are more likely to be unbiased. The principal disadvantage to assuming process error in recruitment, expected survival, and selectivity is that if the assumed observation error in the proportions at age is not multinomial, the observation error parameters- 𝛽 for the Dirichlet-multinomial, 𝜎 and 𝜌 for the logistic normal and 𝜙 and 𝜓 for the multivariate Tweedie, could be biased. Even when the observation model is correctly specified, the process model estimated variance in recruitment, expected survival, and selectivity at higher or lower values than the fit to real data. The logistic normal distribution produces estimates of process variance most similar to the fit to real data, though still underestimates the observation variance. Getting accurate estimates of the observation error variance may depend on reducing the amount of assumed variability in the process model. A two-step approach may be appropriate wherein the model is first fit assuming the survival is deterministic, then a second time assuming random effects on expected survival but the observation error parameter is fixed at its estimated value from the first step. Previous work has generated catch proportions at age data with overdispersion and inter- age correlation, either 1) by simulating from a distribution other than the multinomial as in Cronin-Fine and Punt (2021) or 2) by designing an operating model that accounts for clustering/schooling, aging error and targeted fishing, all factors that can lead to overdispersion and inter-age correlation as in Maunder (2011) or Fisch et al. (2021). My approach was the 166 former which is computationally simpler and allows for explicit self- tests that match the simulation and estimation models. However, I did not allow for different degrees of overdispersion and inter-age correlation to be evaluated and future work may focus on generating catch data with low, medium, or high inter-haul correlation to test what effect these have on the performance of age composition likelihoods. Additionally, it would be prudent to expand this work to several more parameterizations and options of the Dirichlet-multinomial and logistic normal including the “linear parameterization” of the former and other correlation structures of the variance-covariance like a second order autoregressive AR(2) or autoregressive moving average (ARMA) parameterization of the latter, either of which may be more flexible in its ability to adjust observation error variance (Fisch et al., 2021; Thorson et al., 2017). Because state-space stock assessment models with time-varying random effects have been shown to perform differently when the variability is large or small, this work could be further expanded to include cases where, for example, selectivity is not time-varying but the model assumes it is (see chapter 2). Future work should also consider generating new time series of recruitment, expected survival, and selectivity rather than limiting to those predicted from fitting the model to Gulf of Maine haddock. However, this may require use of new information or approaches, given that in Chapter 2 I found that when data were generated from a new set of states this often led to convergence failure for the self-weighting approaches, which is why I used the multinomial distribution for all simulations in Chapter 2. This study uniquely contends with the question of how to parameterize process and observation stochasticity using a simulation-estimation framework. My recommendations to use a likelihood function other than the multinomial to fit catch proportions at age data is consistent with research that used simulation-estimation methods with fewer varying processes (Cronin- 167 Fine and Punt, 2021; Fisch et al., 2021; Thorson et al., 2022; Xu et al., 2020) or applied models to real data with several time-varying processes (Albertsen et al., 2017). This work contributes to the ongoing inquiry in how to parameterize state-space stock assessment models in the face of substantial stochasticity and uncertainty. 168 REFERENCES Albertsen, C.M., Nielsen, A., Thygesen, U.H., 2017. Choosing the observational likelihood in state-space stock assessment models. Can. J. Fish. Aquat. Sci. 74, 779-789. https://doi.org/10.1139/cjfas-2015-0532. Berg, C.W., Nielsen, A., 2016. Accounting for correlated observations in an age-based state- space stock assessment model. ICES J. Mar. Sci. 73, 1788-1797. https://doi.org/10.1093/icesjms/fsw046. Besbeas, P., Morgan, B.J.T., 2017. Variance estimation for integrated population models. Adv. Stat. Anal. 101, 439-460. https://doi.org/10.1007/s10182-017-0304-5 Cronin-Fine, L., Punt, A.E., 2021. Modeling time-varying selectivity in size-structured assessment models. Fish. Res. 239, 105927. https://doi.org/10.1016/j.fishres.2021.105927. Dunn PK. 2022. Tweedie: Evaluation of Tweedie Exponential Family Models. R package version 2.3.5 Fisch, N., Camp, E., Shertzer, K., Ahrens, R., 2021. Assessing likelihoods for fitting composition data within stock assessments, with emphasis on different degrees of process and observation error. Fish. Res. 243, 106069. https://doi.org/10.1016/j.fishres.2021.106069. Fournier, D.A., Archibald, C.P., 1982. A general theory for analyzing catch at age data. Can. J. Fish. Aquat. Sci. 39, 1195-1207. https://doi.org/10.1139/f82-157. Francis, R.I.C.C., 2011. Corrigendum: Data weighting in statistical fisheries stock assessment models. Can. J. Fish. Aquat. Sci. 68, 2228-2228. https://doi.org/10.1139/f2011-165. Francis, R.I.C.C., 2014. Replacing the multinomial in stock assessment models: A first step. Fish. Res. 151, 70-84. https://doi.org/10.1016/j.fishres.2013.12.015. Francis, R.I.C.C., 2017. Revisiting data weighting in fisheries stock assessment models. Fish. Res. 151, 5-15. https://doi.org/10.1016/j.fishres.2016.06.006. Hurtado-Ferro, F., Szuwalski, C.S., Valero, J.L., Anderson, S.C., Cunningham, C.J., Johnson, K.F., Licandeo, R.R., Mcgilliard, C.R., Monnahan, C.C., Muradian, M.L., 2015. Looking in the rear-view mirror: bias and retrospective patterns in integrated, age-structured stock assessment models. ICES J. Mar. Sci. 72, 99-110. https://doi.org/10.1093/icesjms/fsu198. Legault, C.M., Restrepo, V.R., 1998. A flexible forward age-structured assessment program. Collect. Vol. Sci. Pap. ICCAT, 49, 246-253. 169 Linton, B.C., Bence, J.R., 2008. Evaluating methods for estimating process and observation error variances in statistical catch-at-age analysis. Fish. Res. 94, 26-35. https://doi.org/10.1016/j.fishres.2008.06.014. Martell, S., Stewart, I., 2014. Towards defining good practices for modeling time-varying selectivity. Fish. Res. 158, 84-95. https://doi.org/10.1016/j.fishres.2013.11.001. Maunder, M. N. 2011. Review and evaluation of likelihood functions for composition data in stock-assessment models: Estimating the effective sample size. Fisheries Research, 109: 311–319. Elsevier B.V. Maunder, M.N., Punt, A.E., 2013. A review of integrated analysis in fisheries stock assessment. Fish. Res. 142, 61-74. https://doi.org/10.1016/j.fishres.2012.07.025. Maunder, M.N., Piner, K.R., 2017. Dealing with data conflicts in statistical inference of population assessment models that integrate information from multiple diverse data sets. Fish. Res. 192, 16-27. https://doi.org/10.1016/j.fishres.2016.04.022. Methot, R.D., Wetzel, C.R., 2013. Stock synthesis: A biological and statistical framework for fish stock assessment and fishery management. Fish. Res. 142, 86-99. https://doi.org/10.1016/j.fishres.2012.10.012. Mohn, R., 1999. The retrospective problem in sequential population analysis: An investigation using cod fishery and simulated data. ICES J. Mar. Sci. 56, 473-488. https://doi.org/10.1006/jmsc.1999.0481. Nielsen, A., Berg, C.W., 2014. Estimation of time-varying selectivity in stock assessments using state-space models. Fish. Res. 158, 96-101. https://doi.org/10.1016/j.fishres.2014.01.014. Pennington, M., Vølstad, J.H., 1994. Assessing the effect of intra-haul correlation and variable density on estimates of population characteristics from marine surveys. Biometrics 50, 725-732. https://doi.org/10.2307/2532786. Perreault, A.M.J., Wheeland, L.J., Joanne Morgan, M., Cadigan, N.G., 2020. A state-space stock assessment model for American plaice on the Grand Bank of Newfoundland. J. Northwest Atl. Fish. Sci. 51, 45-104. https://doi.org/10.2960/J.v51.m727. Punt, A.E., Huang, T., Maunder, M.N., 2013. Review of integrated size-structured models for stock assessment of hard-to-age crustacean and mollusc species. ICES J. Mar. Sci. 70, 16-33. https://doi.org/10.1093/icesjms/fss185. Stewart, I.J., Monnahan, C.C., 2017. Implications of process error in selectivity for approaches to weighting compositional data in fisheries stock assessments. Fish. Res. 192, 126-134. https://doi.org/10.1016/j.fishres.2016.06.018. 170 Stock, B.C., Miller, T.J., 2021. The Woods Hole Assessment Model (WHAM): A general state- space assessment framework that incorporates time- and age- varying processes via random effects and links to environmental covariates. Fish. Res. 240, 105967. https://doi.org/10.1016/j.fishres.2021.105967. Stock, B.C., Xu, H., Miller, T.J., Thorson, J.T., Nye, J.A., 2021. Implementing two-dimensional autocorrelation in either survival or natural mortality improves a state-space assessment model for Southern New England-Mid Atlantic yellowtail flounder. Fish. Res. 237, 105873. https://doi.org/10.1016/j.fishres.2021.105873. Thorson, J.T., Kristensen, K., 2016. Implementing a generic method for bias correction in statistical models using random effects, with spatial and population dynamics examples. Fish. Res. 175, 66-74. https://doi.org/10.1016/j.fishres.2015.11.016. Thorson, J.T., Johnson, K.F., Methot, R.D., Taylor, I.G., 2017. Model-based estimates of effective sample size in stock assessment models using the Dirichlet-multinomial distribution. Fish. Res. 192, 84-93. https://doi.org/10.1016/j.fishres.2016.06.005. Thorson, J.T., Miller, T.J., Stock, B.C., 2022. The multivariate-Tweedie: a self-weighting likelihood for age and length composition data arising from hierarchical sampling designs. ICES J. Mar. Sci. fsac159. https://doi.org/10.1093/icesjms/fsac159. Venables WN, Ripley BD. 2002. Modern Applied Statistics with S, Fourth edition. Springer, New York. ISBN 0-387-95457-0, https://www.stats.ox.ac.uk/pub/MASS4/. Xu, H., Thorson, J.T., Methot, R.D., 2020. Comparing the performance of three data-weighting methods when allowing for time-varying selectivity. Can. J. Fish. Aquat. Sci. 77, 247- 263. https://doi.org/10.1139/cjfas-2019-0107. 171 APPENDIX 3A: PREDICTED VALUES AND SUPPLEMENTARY DATA Note that the catch at age is not the product of total catch and proportions because total catch is in weight and catch at age is in numbers of fish. Table 3A.1. Predicted total catch in weight, 𝐶̂𝑦 , in metric tons (mt) and proportions at age, 𝑃̂𝑎,𝑦 , for ages 1-9+. Year 𝑃̂1,𝑦 𝑃̂2,𝑦 𝑃̂3,𝑦 𝑃̂4,𝑦 𝑃̂5,𝑦 𝑃̂6,𝑦 𝑃̂7,𝑦 𝑃̂8,𝑦 𝑃̂9+,𝑦 𝐶̂𝑦 1977 0.02485 0.58862 0.07487 0.17968 0.08162 0.04801 0.00045 9.00E-05 0.00181 3306.993 1978 0.00279 0.1628 0.61264 0.05695 0.10153 0.04222 0.01996 0.00021 0.00091 5039.263 1979 0.01114 0.03139 0.24924 0.55267 0.04931 0.06854 0.02471 0.0123 0.00072 4667.412 1980 0.03503 0.16492 0.06502 0.23823 0.38412 0.03931 0.04692 0.01731 0.00915 6827.306 1981 0.01573 0.28606 0.25863 0.05399 0.13209 0.19393 0.0206 0.02567 0.01329 6078.218 1982 0.00286 0.13785 0.36099 0.21853 0.04023 0.08418 0.11801 0.01373 0.02362 6887.568 1983 0.00899 0.02676 0.26407 0.28601 0.20623 0.03496 0.06158 0.08651 0.0249 7162.477 1984 0.0028 0.10825 0.06465 0.30634 0.20092 0.18593 0.02612 0.03962 0.06538 4209.778 1985 0.00092 0.04324 0.30163 0.08128 0.23333 0.14017 0.13471 0.01534 0.04938 3309.395 1986 0.00189 0.02002 0.17546 0.36877 0.07214 0.16318 0.09312 0.07656 0.02886 2268.542 1987 0.00156 0.0418 0.08844 0.28995 0.26971 0.06605 0.12712 0.06071 0.05465 960.6168 1988 0.00714 0.02627 0.13866 0.13345 0.28472 0.2206 0.04714 0.0829 0.05912 493.2147 1989 0.00461 0.12793 0.08612 0.19807 0.13259 0.2184 0.13636 0.02739 0.06855 320.0106 1990 0.0076 0.06396 0.37221 0.09601 0.15385 0.08441 0.11544 0.0651 0.04141 481.9544 1991 0.01105 0.1002 0.19423 0.38361 0.07487 0.09713 0.04293 0.05112 0.04486 443.6066 1992 0.031 0.12773 0.28023 0.20016 0.22649 0.041 0.04201 0.01609 0.03529 311.7735 1993 0.03811 0.24328 0.25114 0.19286 0.10479 0.11576 0.01846 0.01602 0.01958 203.4478 1994 0.03622 0.24449 0.37519 0.14535 0.07797 0.05061 0.0499 0.00672 0.01354 186.6776 1995 0.01662 0.26103 0.35847 0.22601 0.05739 0.03433 0.02085 0.01792 0.00737 448.9902 1996 0.00417 0.11105 0.46693 0.26256 0.09567 0.02672 0.01519 0.00818 0.00955 340.4701 1997 0.00818 0.03312 0.24951 0.45465 0.16408 0.05755 0.01588 0.00796 0.00906 1058.044 1998 0.00531 0.06458 0.07915 0.31495 0.35244 0.12178 0.04109 0.01012 0.01057 975.1465 1999 0.02128 0.04706 0.14231 0.12113 0.28738 0.25265 0.08948 0.02566 0.01304 601.3936 2000 0.00305 0.17644 0.10578 0.17526 0.10612 0.21514 0.14825 0.04831 0.02163 1113.83 2001 0.00076 0.02783 0.35275 0.1401 0.15285 0.08534 0.12652 0.07679 0.03706 1206.591 2002 0.0009 0.0069 0.0546 0.47061 0.13761 0.13102 0.05239 0.07688 0.0691 1281.283 2003 0.00012 0.00907 0.01786 0.07707 0.60629 0.11505 0.07319 0.02612 0.07522 1385.141 2004 0.00171 0.00173 0.02746 0.0349 0.10919 0.65622 0.0712 0.04121 0.05637 1426.343 2005 0.00026 0.04256 0.00753 0.05709 0.06636 0.15168 0.54991 0.05463 0.06998 1755.596 2006 0.00114 0.00437 0.1818 0.01276 0.07291 0.07543 0.11914 0.4486 0.08385 1125.342 2007 0.00229 0.02097 0.01711 0.35806 0.01561 0.07277 0.05865 0.08623 0.36831 1885.741 2008 0.00041 0.03435 0.06598 0.02628 0.4774 0.01343 0.05577 0.03854 0.28785 1537.247 2009 0.0006 0.0063 0.0905 0.0909 0.03233 0.49476 0.01071 0.03986 0.23404 1341.602 2010 0.00202 0.01013 0.01803 0.11091 0.10726 0.03464 0.48652 0.00771 0.2228 1294.561 2011 0.01612 0.04239 0.03068 0.02508 0.13736 0.11964 0.03343 0.39921 0.19609 931.1966 172 Table 3A.1 (cont’d) 2012 0.00745 0.29621 0.09671 0.03835 0.02228 0.11626 0.07804 0.02063 0.32407 1066.45 2013 0.02434 0.07007 0.54144 0.08655 0.02602 0.01334 0.06311 0.03717 0.13796 1274.27 2014 0.04052 0.16908 0.11163 0.53616 0.0448 0.01266 0.00535 0.02362 0.05617 1413.875 2015 0.00178 0.3748 0.23443 0.07934 0.26332 0.01868 0.0043 0.00166 0.02169 1520.039 2016 0.001 0.0129 0.62262 0.1553 0.0422 0.14815 0.00786 0.00168 0.00832 3417.433 2017 0.00098 0.00984 0.02767 0.7187 0.11606 0.02976 0.08788 0.0041 0.005 4919.816 2018 0.00032 0.01227 0.02628 0.0379 0.77052 0.08417 0.01764 0.04601 0.0049 3735.021 2019 0.00019 0.00442 0.03543 0.04084 0.04193 0.77665 0.05632 0.01176 0.03247 4572.009 Table 3A.2. Predicted catch at age, 𝐶̂𝑎,𝑦 , in thousands of fish, for ages 1-9+. Year Ĉ1,y Ĉ2,y Ĉ3,y Ĉ4,y Ĉ5,y Ĉ6,y Ĉ7,y Ĉ8,y Ĉ9+,y 1977 64.36553 1524.642 193.9273 465.4 211.4082 124.3626 1.170178 0.234036 4.68071 1978 9.647603 563.2727 2119.684 197.0473 351.2694 146.074 69.06422 0.719318 3.132671 1979 30.11564 84.88222 674.0665 1494.711 133.3532 185.3607 66.8179 33.27062 1.947584 1980 119.8756 564.4389 222.5368 815.3669 1314.654 134.5246 160.574 59.23828 31.33018 1981 51.02478 927.7554 838.7944 175.1043 428.3966 628.936 66.7942 83.26101 43.11055 1982 10.54797 508.4561 1331.498 806.0282 148.3758 310.5089 435.2775 50.62773 87.10347 1983 31.29291 93.09956 918.8288 995.1624 717.5684 121.6312 214.2858 301.0209 86.62564 1984 5.073111 196.3108 117.2492 555.5566 364.3804 337.2003 47.36384 71.85188 118.5626 1985 1.400902 65.49444 456.8645 123.1039 353.4084 212.3017 204.0388 23.2416 74.80125 1986 2.119151 22.43891 196.6509 413.3133 80.85484 182.8851 104.364 85.81045 32.3497 1987 0.590107 15.80274 33.43479 109.6133 101.9615 24.97019 48.05652 22.95174 20.66113 1988 1.375507 5.062218 26.71892 25.71581 54.8635 42.50886 9.083762 15.97367 11.39138 1989 0.5895 16.37471 11.02297 25.35299 16.97125 27.95547 17.45419 3.505387 8.774151 1990 1.512147 12.71739 74.01403 19.09191 30.5929 16.78417 22.95563 12.94558 8.234291 1991 1.889746 17.14003 33.22292 65.61805 12.80715 16.61451 7.343159 8.743497 7.673595 1992 4.711277 19.41443 42.59333 30.42376 34.42487 6.231598 6.386077 2.445837 5.364227 1993 4.263278 27.21651 28.09623 21.57666 11.7232 12.95011 2.065694 1.79212 2.190663 1994 3.860754 26.06329 39.99627 15.49485 8.311407 5.395068 5.319272 0.716742 1.443894 1995 4.926308 77.36357 106.2421 66.98581 17.01032 10.17556 6.178856 5.311216 2.185125 1996 0.99144 26.38946 110.9612 62.39491 22.73529 6.348823 3.608618 1.943576 2.268914 1997 4.603832 18.64612 140.4697 255.9595 92.37351 32.3997 8.942162 4.482889 5.099105 1998 2.487198 30.23084 37.05621 147.4437 164.9968 57.00897 19.23646 4.738927 4.950635 1999 7.380731 16.32361 49.36437 42.01725 99.68484 87.63504 31.0393 8.900266 4.52376 2000 2.058675 118.9442 71.31067 118.146 71.54197 145.0352 99.94023 32.56987 14.5832 2001 0.535843 19.55248 247.8588 98.4428 107.4029 59.96165 88.89631 53.95432 26.04177 2002 0.684518 5.233093 41.39876 356.8458 104.3426 99.34635 39.72212 58.29197 52.39453 2003 0.105072 7.780729 15.31173 66.07663 519.8392 98.64247 62.75669 22.39426 64.49716 2004 1.487501 1.498638 23.85292 30.30837 94.83503 569.9329 61.83989 35.79431 48.95461 2005 0.289912 46.86504 8.292384 62.86154 73.06885 167.0184 605.5313 60.15509 77.05889 173 Table 3A.2 (cont’d) 2006 0.859622 3.296312 137.0759 9.620302 54.97207 56.87209 89.82911 338.2413 63.22189 2007 3.031117 27.77332 22.65559 474.1867 20.6707 96.37616 77.6682 114.2025 487.7589 2008 0.438864 37.03156 71.14057 28.32969 514.7272 14.4844 60.12627 41.55791 310.3514 2009 0.52068 5.456948 78.32872 78.67509 27.98236 428.2262 9.271401 34.49954 202.5658 2010 1.594541 8.000589 14.24218 87.63078 84.74826 27.36814 384.4188 6.091257 176.0378 2011 8.851628 23.28364 16.85143 13.7746 75.44232 65.70769 18.35991 219.2538 107.6999 2012 6.12203 243.4982 79.50022 31.52638 18.31593 95.5677 64.15483 16.95681 266.4025 2013 27.57266 79.38815 613.4074 98.05162 29.48127 15.11717 71.4959 42.10628 156.3008 2014 54.39156 226.9524 149.8382 719.6802 60.13748 16.99709 7.186617 31.70809 75.39438 2015 3.161245 663.892 415.255 140.5441 466.4308 33.08125 7.615274 2.934991 38.41228 2016 4.130328 53.50235 2582.956 644.2627 175.06 614.5949 32.59656 6.951739 34.50351 2017 5.118253 51.4821 144.7045 3758.474 606.9205 155.6328 459.5756 21.46536 26.13838 2018 1.190751 46.1663 98.88559 142.5834 2899.036 316.6825 66.37148 173.0962 18.44742 2019 0.868842 20.23003 162.1355 186.9135 191.9033 3554.357 257.7378 53.81914 148.5856 Table 3A.3. Weight at age at time of harvest, 𝑊𝑎,𝑦 , in kilograms (kg). Year 𝑊1,y 𝑊2,y 𝑊3,y 𝑊4,y 𝑊5,y 𝑊6,y 𝑊7,y 𝑊8,y 𝑊9+,y 1977 0.113 0.757 1.163 2.008 2.558 3.358 3.709 3.587 4.686 1978 0.113 0.777 1.234 1.684 2.438 3.108 4.642 4.075 6.088 1979 0.56 0.774 1.155 1.805 2.261 2.659 2.775 3.587 4.725 1980 0.468 0.76 1.168 1.852 2.389 3.354 3.602 4.562 4.204 1981 0.56 0.685 1.515 1.978 2.641 3.027 3.658 4.185 3.921 1982 0.376 0.62 0.995 2.138 2.598 3.107 3.646 4.129 4.293 1983 0.181 0.667 1.2 1.727 2.376 2.969 3.373 3.719 4.215 1984 0.313 0.803 1.23 1.801 2.324 3.166 3.917 4.498 4.071 1985 0.315 0.981 1.068 1.86 2.34 2.652 3.588 4.09 4.153 1986 0.503 0.507 1.193 1.457 2.264 2.494 3.063 3.636 4.589 1987 0.349 0.856 1.592 2.008 2.402 2.609 3.272 4.236 5.277 1988 0.331 0.412 1.116 1.626 2.562 2.583 3.872 4.664 5.18 1989 0.247 1.146 1.818 1.827 2.375 2.863 3.537 4.4 4.244 1990 0.296 0.831 1.541 3.331 2.454 3.043 3.738 3.618 4.189 1991 0.347 1.46 1.882 2.657 3.027 2.958 3.35 4.433 3.881 1992 0.448 1.192 1.764 1.973 2.654 3.067 2.079 3.721 3.45 1993 0.364 0.885 1.593 2.041 2.436 3.035 3.393 3.422 3.657 1994 0.361 0.776 1.589 2.187 3.065 2.788 3.623 3.41 3.73 1995 0.278 0.821 1.16 1.747 2.515 3.512 4.091 5.209 5.665 1996 0.336 0.674 1.074 1.803 2.197 3.149 2.472 2.388 3.163 1997 0.353 0.835 1.766 1.65 2.329 2.975 2.991 3.073 3.608 1998 0.25 0.976 1.447 1.828 2.213 2.843 3.375 3.152 2.989 1999 0.265 0.609 1.311 1.609 1.764 1.926 2.279 3.025 3.29 2000 0.261 0.607 1.005 1.54 1.782 2.009 2.369 2.669 3.081 174 Table 3A.3 (cont’d) 2001 0.242 0.899 1.271 1.495 1.813 2.213 2.245 2.481 2.528 2002 0.121 0.475 1.022 1.34 1.631 2.141 2.596 2.299 2.638 2003 0.318 0.583 0.89 1.228 1.466 1.766 2.13 2.418 2.507 2004 0.189 0.582 0.782 1.382 1.341 1.659 1.813 2.01 2.204 2005 0.173 0.55 0.752 1.134 1.458 1.436 1.674 1.946 2.273 2006 0.255 0.426 0.841 0.798 1.345 1.649 1.508 1.682 2.035 2007 0.243 0.589 0.799 1.138 1.216 1.512 1.635 1.576 1.708 2008 0.289 0.55 0.986 1.188 1.322 1.302 1.658 1.734 1.746 2009 0.304 0.714 0.781 1.118 1.277 1.608 1.533 1.849 1.906 2010 0.37 0.591 0.776 1.086 1.396 1.533 1.705 2.099 2.014 2011 0.365 0.593 1.03 1.177 1.371 1.57 1.773 1.896 2.096 2012 0.247 0.54 0.903 1.056 1.41 1.444 1.736 1.798 1.962 2013 0.292 0.572 0.842 1.207 1.341 1.566 1.657 1.851 2.088 2014 0.301 0.59 0.759 1.096 1.556 1.656 1.961 1.924 2.179 2015 0.187 0.441 0.799 0.976 1.285 1.672 1.575 1.991 2.218 2016 0.185 0.366 0.603 1.012 1.175 1.372 1.706 2.004 2.003 2017 0.182 0.399 0.598 0.832 1.137 1.29 1.503 1.608 2.627 2018 0.174 0.408 0.564 0.747 0.943 1.298 1.37 1.644 1.808 2019 0.234 0.391 0.575 0.719 0.851 0.984 1.279 1.495 1.786 175 CONCLUSIONS AND RECOMMENDATIONS The overarching intent of this work was to answer one central question – how can state- space stock assessment models (SSSAM) be designed and employed most effectively across a range of modeling contexts now that computing software has eased their use? In what contexts and with what data can this modeling framework be used to minimize bias in crucial management estimates like spawning stock biomass, exploitation, and recruitment, or the process and observation error variances. I sought to test the limits of SSSAM’s purported abilities to estimate process and observation variance, reduce retrospective patterns, and allow for more stochasticity to be included in the population model than was previously possible in age based statistical catch at age models. Chapter 1 demonstrated that existing SSSAM structures can be adapted to accommodate fisheries dependent data such as catch per unit effort in the absence of a fisheries independent index. Several fisheries across the world rely on fisheries dependent data either exclusively or alongside fisheries independent surveys. I recommend that such fisheries at least consider the use of a state-space stock assessment model in their management practices, alongside conventional approaches. This chapter also demonstrated that yearly recruitment, catchability-at-age, and their associated variances could be predicted in a stock assessment model without additional functions (a stock-recruitment curve or a selectivity curve, respectively) to inform them, though the observation error variance would need to be specified. This added flexibility contributed to observed differences between the state-space model and the non-state-space model in output recruitment and spawning stock biomass. There is some evidence that the estimated recruitment in the SSSAM may have been positively biased - the model had negative residuals in the proportions at age for the youngest age classes and there was positive bias in the recruitment 176 retrospective analysis. I recommend that careful consideration be taken when parameterizing and interpreting recruitment in SSSAMs. Without enough structure or data to inform it, the model may estimate constant recruitment or very little yearly variability. However, with a specific and restrictive structure such as a stock-recruitment curve, which constricts the expected recruitment to be a function of predicted stock size in a previous year, or a random walk, which constricts the recruitment to be similar to recruitment in previous and subsequent years, the model may be biased. I recommend further research, particularly simulation-estimation models or model evaluation by likelihood or AIC, to investigate the performance of SSSAM under several parameterizations of the recruitment. Chapter 2 demonstrated how well SSSAM can achieve unbiased estimates of recruitment, spawning stock biomass, exploitation, and depletion when the recruitment, abundance (or “expected survival”) and selectivity are highly varying, slightly variable, or constant. By using SSSAM, I was able to include more process variance than was previously possible for Gulf of Maine haddock. I used a simulation-estimation experiment to generate data with known underlying states like abundance and selectivity, such that I could compare the estimated values to the true values and thus quantify bias and precision. I found that a SSSAM that assumes a process is varying when it is constant is preferable to assuming a process is constant when it is varying, with one exception. If the survival is deterministic (i.e., the numbers at age are a function of numbers at age in the previous year and age and instantaneous mortality without an additional random error), the estimation model should be specified as such, or else the model is unlikely to converge. I recommend that process errors be added to natural mortality, rather than survival, if variability is expected in numbers at age due to stochasticity in natural mortality, and that models explicitly account for movement if the variability in numbers at age are due to 177 immigration or emigration. Using these approaches, the model may better reflect the true population dynamics and the output is likely to be more accurate. This work also revealed that when SSSAMs incorporate error in several different processes and estimate their variances, there is risk of misapportioning the variability across those sources. I would recommend that the expected survival and selectivity process variances especially be interpreted with caution, particularly if there is AR(1) autoregression in the process errors, as it might lead to underestimation of the variance. Chapter 3 compared the performance of several existing age composition likelihood options in a popular state-space stock assessment software, WHAM, to determine which of these choices may be preferable under the widest array of circumstances, and when the data have overdispersion, inter-age correlation, and zeros. If catch proportion at age data is to be used in SSSAM, alternatives to the multinomial must be employed in order to estimate the observation variance. When the estimation model accounts for several time-varying processes, age composition likelihoods that can account for overdispersion and inter-age correlation outperformed those that could not. I recommend employing a logistic normal or multivariate Tweedie distribution to fit catch proportions at age data in SSSAM, especially when the model accounts for two or more time-varying processes. Though this resulted in the least biased outputs, I caution that the estimated observation error variance may be underestimated and the process error variance biased in such cases. The findings from this work support my previous recommendation to account for multiple sources of process variation in the process model, as such models reduced relative error and retrospective patterns compared to those that assumed fixed effects and constant parameters regardless of how the observation model was specified. Data weighting and alternative likelihoods in SSSAM are ongoing areas of investigation, and I 178 recommend continued research into the logistic normal and multivariate Tweedie distributions using more simulation-estimation experiments to understand how much overdispersion and/or inter-age correlation can be accounted for by their parameters. This work encountered several reoccurring issues, none more omnipresent than difficulties surrounding model fitting and convergence. Model failure, more than anything, revealed the limits of SSSAM, such as trying to estimate standard deviation parameters of the total catch for Lake Michigan lake whitefish (Chapter 1), trying to estimate survival process variance when it was absent (Chapter 2), or trying to scale the effective sample size while failing to account for overdispersion and correlation (Chapter 3). When George Box revisited his famous aphorism, he included the clause: “the practical question is how wrong do [models] have to be to not be useful.” There is no more useless model than a failed model, and no stronger sign throughout this work that a model was inappropriate in a given context, either because processes were specified incorrectly, or the model tried to estimate too many processes with not enough data. A continued challenge in quantitative fishery science in general and SSSAM in particular is to find synergy among system complexity, data availability, and model structure. Another consistent theme throughout this work was questioning how to collect catch and index data. If there are fisheries dependent indices of abundance like catch per unit effort, a survey is not required to fit a SSSAM. And if a survey is used, fewer samples may be required than was previously assumed to achieve the same level of model performance. When the standard deviation of the realized total index values was increased (by 10% or 50%) or the effective sample size of the proportions at age were lowered (also by 10 or 50%) the model did just as well in estimating performance metrics like recruitment and spawning stock biomass or the process error variances, suggesting fewer sampling trips may be taken or fewer individuals 179 might be aged to achieve the same accuracy and precision. The multivariate Tweedie distribution can increase or decrease the effective sample size relative to the input sample size, suggesting that even if the total number of sampled individuals in the catch is low, the model may be able to right-weight the information internally. In the future, models may be able to do more with less, and resources can be allocated towards collecting as much informative data as possible without being redundant. My intent was to investigate and offer a supplement, rather than a replacement, to the quantitative fisheries modeling toolbox. SSSAMs present an exciting avenue to introduce additional stochasticity and estimate more varying processes but they are not without their limitations. It should never be assumed that one model or class of models will always be right for every fishery or circumstance, and it is wise to consider a suite of model parameterizations and structures, if nothing else to determine how models are sensitive to particular assumptions. I hope that the findings of this research will lend confidence to the continued inclusion of SSSAM in management plans of several fisheries stocks across the globe. 180