WIENER-CHAOS ANALYSIS ON BAYESIAN MODELS WITH APPLICATIONS IN AGRICULTURE AND CLIMATOLOGY By Han Wang A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Statistics — Doctor of Philosophy 2020 ABSTRACT WIENER-CHAOS ANALYSIS ON BAYESIAN MODELS WITH APPLICATIONS IN AGRICULTURE AND CLIMATOLOGY By Han Wang Understanding the challenges to increasing maize productivity in sub-Saharan Africa has important implications for policies to reduce national and global food insecurity. There is insufficient research on the key agronomic and environmental factors that influence maize yield in a smallholder-farm environment. We implement a Bayesian analysis with longi- tudinal household survey data covering 1,197 plots among 320 farms in central Malawi. The results reveal a high positive association between a leaf chlorophyll indicator and yield, with significance levels exceeding 95% Bayesian credibility at all sites, and the posterior mean of the regression coefficient ranging from 28% to 42% on a relative scale. A parasitic weed, Striga asiatica, is the variable that negatively associated with yield of high inten- sity. The impact of rainfall varies by site and season, either directly or indirectly. We conclude that the determinants preventing striga infestation and enhancing nitrogen fertility will lead to higher maize yield in Malawi. To improve plant nitrogen status, fertilizer is ef- fective at higher-productivity sites, whereas soil carbon and organic inputs are important at marginal sites. Uniquely, the Bayesian approach allows differentiation of response by site for a modest-sample-size study. Considering the biophysical constraints, our findings highlight area-specific recommendations as well as management strategies for crop yield. Quantifying the sensitivity of climate forcing factors such as greenhouse gas concentra- tion and solar irradiation, is critical in comprehending the evolution of the Earth’s climate. There exists a variety of statistical methods to reconstruct temperature in the past, but the same is not true for projecting future temperatures. We produce a multi-level stochastic model to systematically reconstruct and project the northern-hemisphere average tempera- ture anomalies, for the past millennium (1000-1999) and the next century (2019-2100), by coordinating with climatic forcings and natural proxies from diverse data sources. Additive noises are applied to the model to capture the unaccounted variability. Model parameters are estimated using Bayesian-inference techniques, resulting in complete distributional in- formation. Reconstructions with memory features (no, short, long) are evaluated through selected validation metrics, and the results constitute evidence in favor of using a moderate- memory length. For the purpose of temperature projections, we incorporate realistic climate forcing uncertainties to Year 2100. Similarly, we include an uncertainty component on top of using representative carbon pathway scenarios for global greenhouse gases. Our projections’ posterior means show a great level of agreement with the 95% confidence interval provided by the Coupled Model Intercomparison Project, while featuring differences in most cases. The models described above are both implemented via Gibbs sampler with 10,000 itera- tions. In order to avoid its potential computational heft, we combine the use of maximum likelihood estimators for regression elements with properties of Wiener chaos, to approxi- mate the predictive samples with specific chaos distributions that do not require sampling via numerics. Some of the approximations’ statistics, such as error variances are also ex- plicitly provided. The precision are relatively high (nearly 0.1% and 0.5%) depending on dimension circumstances. This allows practitioners to estimate approximation accuracy and convergence rates in practice, with no resort to heavy computational demands. ACKNOWLEDGMENTS First and foremost, I would love to express my deep gratitude to Dr. Frederi Viens. Not only is he a knowledgeable advisor providing constant guidance throughout the program, but also he is a great mentor who teaches me how to become a better person. Also, I am grateful for Dr. Sieglinde Snapp who helps me open the window of research world. It is my great pleasure to collaborate with her and get to learn how to do research meticulously. I’m thankful that Dr. R.V. Ramamoorthi and Dr. Shlomo Levental agree to be on my committee, offering crucial insights and expertise. Moreover, a special thank you to my parents’ consistent love and support. I would never be where I am today without them. My girlfriend Mengzi’s companionship through the years is very much appreciated as well. Last but not least, I’d like to thank all the staff members and my fellow PhD colleagues, with whom I have countless discussions on a variety of topics. iv TABLE OF CONTENTS LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii Chapter 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.1 Bayesian Regression Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2 Chaos Structure of Wiener Space . . . . . . . . . . . . . . . . . . . . . . . . 1.3 Dissertation Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Study Sites 2.1 2.2 Materials Chapter 2 Maize Yield Determinants and Management Strategies . . . . Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.2 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Modeling Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3.1 Explanatory Variables . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3.2 Regression Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4.1 Descriptive Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4.2 Bayesian Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4.2.1 Maize yield . . . . . . . . . . . . . . . . . . . . . . . . . . . SPAD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4.2.2 Striga . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4.2.3 . . . . . . . . . . . . . . . . . . . 2.4.2.4 Household (farmer) effect 2.4.2.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5 Discussions Socioeconomic factors Chapter 3 Global Temperature’s Reconstruction and Projection . . . . . 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Data Source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3 Model Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3.1 Forcing Transformation and Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3.2 Multi-level Autoregressive Model 3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.1 Validation Metrics 3.4.2 Reconstruction (1000-1999) . . . . . . . . . . . . . . . . . . . . . . . 3.4.3 Projection (2019-2100) . . . . . . . . . . . . . . . . . . . . . . . . . . 3.5 Conclusions and Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . Chapter 4 Predictive-distribution Approximation via Wiener Chaos . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1 Preliminaries v 1 1 2 4 5 5 6 6 7 9 9 10 13 13 16 16 18 19 21 22 23 27 27 29 31 31 34 36 36 37 42 47 49 49 4.2 Approximation Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.1 Wiener-chaos Expansion . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.2 Error Magnitude . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3 Prediction Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3.1 Explicit Boundaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simulation Results 4.4 Discussions 51 51 53 55 55 65 66 Chapter 5 Future Work Directions . . . . . . . . . . . . . . . . . . . . . . . . 69 APPENDICES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . APPENDIX A Model Robustness . . . . . . . . . . . . . . . . . . . . . . . . . . . APPENDIX B Exponential Decay . . . . . . . . . . . . . . . . . . . . . . . . . . . APPENDIX C Posterior-distribution Computation . . . . . . . . . . . . . . . . . APPENDIX D Proof of Theorem 4.3.5 . . . . . . . . . . . . . . . . . . . . . . . . APPENDIX E Boundaries Justification . . . . . . . . . . . . . . . . . . . . . . . . 70 71 74 76 82 86 BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 vi LIST OF TABLES Table 2.1: Mean and (standard error) of all model variables by location . . . . . . . 14 Table 2.2: Number and (proportion) of households where the y-intercept is signifi- cantly non-zero at 95% credibility level in all models . . . . . . . . . . . . 21 Table 3.1: Root mean square error (RMSE) for all combinations of memory coefficients 37 Table 3.2: Empirical coverage probability (ECP) at levels 80% (up) and 95% (down) 38 Table 3.3: Negative interval score (IS) at both 80% (up) and 95% (down) levels . . . 39 Table 3.4: Negative continuous ranked probability score (CRPS) for different memories 39 Table 3.5: ECP at two levels 80% (up) and 95% (down) for specific calibrations . . . 40 Table 3.6: Validation metrics (negative IS and CRPS) for both models . . . . . . . . 40 Table 3.7: The diagnosis of MCMC’s convergence (PSRF) for both models . . . . . . 41 Table 3.8: List of CMIP5 models used for multi-averaging temperature calculations . 43 Table 3.9: Temperature differences (in absolute values) between two projection periods 45 Table 3.10: Gibbs sampler convergence for the posterior means and (standard devia- tions) of model coefficients in Scenario D as well as validation metrics for each memory combination . . . . . . . . . . . . . . . . . . . . . . . . . . . Table 4.1: Approximation results for various combinations of (s2, σ2 significance levels (α), dimension (n), and number of coefficients (k) Y ∗) under different . . . 46 68 vii LIST OF FIGURES Figure 2.1: 10-day precipitation (CHIRPS) from December to May at all EPA sites . 15 Figure 2.2: 95% credible intervals for the determinants of maize yield . . . . . . . 17 Figure 2.3: 95% credible intervals for the determinants associated with SPAD . . . 18 Figure 2.4: 95% credible intervals for the drivers of striga prevention (logistic) . . 20 Figure 2.5: 95% credible intervals for the determinants of striga control . . . . . . 20 Figure 2.6: 95% credible intervals for determinants including two socioeconomic factors associated with maize yield . . . . . . . . . . . . . . . . . . . . . 23 Figure 3.1: Greenhouse gas projections of four RCP scenarios . . . . . . . . . . . . . 31 Figure 3.2: The first 36 re-normalized volcanic eruptions’ decay . . . . . . . . . . . . 33 Figure 3.3: Temperature anomalies’ reconstructions between our model and EIV . . 41 Figure 3.4: Temperature prediction for RCP8.5 between 2006 and 2100 (CMIP5) . . 44 Figure 3.5: Temperature projections until 2100 for all four RCPs: comparisons with CMIP5 multi-model average . . . . . . . . . . . . . . . . . . . . . . . . . Figure 3.6: Temperature projections’ comparisons with CMIP5 for medium scenarios (up: RCP4.5; down: RCP6.0) . . . . . . . . . . . . . . . . . . . . . . . . Figure A.1: Temperature projections’ comparisons (CMIP5) with no σC . . . . . . . Figure A.2: Temperature projections’ comparisons (CMIP5) with 1 2σC . . . . . . . . Figure A.3: Temperature projections’ comparisons (CMIP5) with 2σC . . . . . . . . Figure A.4: Temperature projections’ comparisons (CMIP5) with 3σC . . . . . . . . Figure A.5: 10,000 MCMC samples of all model parameters (from top to bottom): α1, . . . . . . . . . . . . . . . . . . . . . . . . . α0, βC , βV , βS, β0, σ2 P , σ2 T 44 45 71 72 72 72 73 viii Chapter 1 Introduction 1.1 Bayesian Regression Modeling Bayesian statistics is based on the concept of probability to predict the future, where probability is interpreted as a degree of belief in an event. The degree of belief can come from previous knowledge or personal beliefs about the event. Bayesian statistics was named after Thomas Bayes, who first described the outcome of a coin-toss experiment using Bayes’s theorem on his paper published in 1763. From the late 1700s to the early 1800s, Pierre-Simon Laplace established the Bayesian interpretation of probability, and many other Bayesian methods were developed by other scholars during that time. Over much of the 20th century, the widely-used statistical methods were based on the frequentist interpretation. Bayesian methods were not in favor of, even controversial to many statisticians, due to computational reliance and practical restrictions. Since the 1950s, researchers began to apply Bayes’s theorem to account for model un- certainties, by incorporating educated guesses about the likelihood of something happening, and then making predictions. However, it is difficult to implement these types of guesses. In the recent two decades, particularly with the emergence of high-performance computers and new repetitive algorithms like Markov chain Monte Carlo (MCMC), Bayesian methods have been more recognized as powerful tools in the scientific community after overcoming 1 the implementation difficulties. Bayesian linear regression is a modeling approach in which the statistical analysis is un- dertaken within the context of Bayesian inference. When the regression model has an error following a normal distribution, and a particular form of prior distribution is assumed, the full posterior probability distributions for every model parameter are explicitly available. Not only does it provide more information than point estimates like means and variances in classical frequentist statistics, but credible intervals are straightforward to interpret even by non-statisticians. P-values can be computed in a Bayesian way, with more power and flex- ibility in assessing the significance of explanatory variables [1], avoiding misinterpretations of p-values [2, 3]. Also, the Bayesian approach allows background knowledge from domain specialists to be incorporated into the analysis, as a type of participatory model building, improving the accuracy and credibility of the estimations [4]. Last but not least, Bayesian statistics has an advantage in statistical power for data- limited studies [5, 6]. There is an often quoted but rarely if ever formally cited rule of thumb, by which the number of parameters (or degrees of freedom) that one can estimate reliably (e.g., with credibility level higher than 90%) in a linear model, is a third of the total number of data points, compared to a tenth in ordinary frequentist linear regression [7]. 1.2 Chaos Structure of Wiener Space Polynomial chaos (also called Wiener chaos expansion) is a non-sampling-based method to determine evolution of uncertainty in a dynamic system in which the parameters have probabilistic uncertainties. It was first introduced by Norbert Wiener in 1938, where Hermite polynomials were used to model stochastic processes with Gaussian random variables [8]. 2 When the second moment is finite, such an expansion converges in the L2 sense for any arbitrary stochastic process, which is applicable in most physics systems. In any real and separable Hilbert space H, there exists an isonormal Gaussian process of a centered Gaussian family (G(ϕ), ϕ ∈ H) of random variables on a probability space (Ω,F,P), such that E[G(ϕ)G(ψ)] = (cid:104)ϕ, ψ(cid:105)H (1.1) The Wiener chaos of order n is defined as the closure in L2(Ω), a linear span of the random variables Hn(G(ϕ)), where Hn is the Hermite polynomial of degree n, and ||ϕ||H = 1. For any F ∈ L2(Ω), there is a unique sequence of functions fn ∈ H(cid:12)n (symmetric tensor product) such that ∞(cid:88) n=0 F = In(fn) (1.2) where In is the Wiener stochastic integral with respect to G, I0(f0) := E[F ], and all of them are mutually orthogonal in L2(Ω). This is the fundamental decomposition of L2(Ω) as a direct sum of all Wiener-chaos terms [9]. Moreover, L2(Ω) is closed under multiplication, for any p, q and f ∈ H(cid:12)p, g ∈ H(cid:12)q (symmetric), the product of Wiener integrals is calculated by: p∧q(cid:88) Ip(f )Iq(g) = r!Cr pCr q Ip+q−2r(f ⊗r g) (1.3) n=1 where the contraction f ⊗r g is an element of H⊗(p+q−2r). In particular, the special case when p = q = 1 is mostly used as follows: I1(f )I1(g) = I2(f ⊗ g + g ⊗ f ) + (cid:104)f, g(cid:105)H 1 2 (1.4) 3 1.3 Dissertation Overview This dissertation focuses on the applications of Bayesian regression models, specifically in agriculture and climatology, as well as Wiener-chaos analysis on the predictive distribution of the response variable under linear-regression-model setup. Here is the breakdown: • Chapter 2 introduces a multi-linear regression model to mainly explore the determi- nants of maize yield stability in Malawi. • Chapter 3 develops a hierarchical Bayesian model to investigate the reconstruction and the projection of temperature anomalies in northern hemisphere. • Chapter 4 proposes an estimation framework of predictive distribution using Wiener- chaos-expansion technique, which can give rise to a better understanding on the risk of a linear-regression model and the approximation error. • Chapter 5 discusses the possible work directions in the future. 4 Chapter 2 Maize Yield Determinants and Management Strategies 2.1 Introduction Yield gaps in African smallholder agriculture are pervasive and large. The yields achieved on the vast majority of African farms are 10-30% of their genetic potential [10]. Yield- limiting factors have been identified, such as environment, sub-optimal planting in terms of timing and spacing, deficiencies in soil nutrients, moisture, as well as damage from weeds and pests [11,12]. Agricultural economists commonly emphasize market prices, farmer education, and related socio-economic factors to influence on-farm production [13]. There are many challenges to carrying out effective diagnostic analysis of yield gaps, and often the focus has been on the size of the gap. Yet if research priorities and agronomic recommendations are to address farm-level constraints, there is urgent need for evidence-based examination of the main determinants of yield in specific contexts. This is the first study to understand maize yield determinants by applying a Bayesian approach to a unique survey dataset from central Malawi. Crop simulation models are often used for gap analysis, and are suited to providing insights into yield potential as well as technology response to weather variability; however, they do not reveal the drivers of 5 yield gaps [14]. In field experimentation, trials are often run under conditions that are not representative of on-farm conditions. Smallholders in sub-Saharan Africa often have marginal soils with less intensive management, facing weed, disease, and other pest problems [15]. The disconnect between soil conditions at research stations and those on smallholder farms is illustrated by a nationwide assessment in Malawi, where soil organic matter levels at research stations are 1.5 to 2 times as high as those observed on smallholder farms [16]. Researchers generally choose a field site and invest resources, so as to ensure a homogeneous environment, within which to evaluate a practice or to address a specific research question. Thus field research sites tend to be flat, uniform, and high-potential, given that conventional research experimentation usually tests one or two component practices while controlling other sources of variability [17]. The overall objective of the project is to conduct a Bayesian analysis of household-survey data that comprises multiple visits to maize-focal plots in central Malawi, to determine which variables influence the maize yield [18]. Specifically, we examine the predictive ability of time-series environmental factors and management practices regarding field observations of maize yield. Furthermore, we assess leaf chlorophyll status and parasitic weed incidence to provide site-specific models and recommendations. 2.2 Materials 2.2.1 Study Sites Central Malawi agriculture is dominated by mixed maize production systems with limited livestock presence, which is broadly typical for poor-resource smallholder farms in southern Africa [19]. Administrative units in the Malawi government are comprised of region, agricul- 6 tural development division, and extension planning area (EPA). The study sites are chosen using a stratified random sampling scheme, where all EPAs within central Malawi are clas- sified using the strata of marginal, moderate or mesic for plant growth based on rainfall and evapotranspiration. There is one marginal site that contains two adjacent EPAs–Golomoti and Mtakataka (referred herein as Golomoti), two moderate-potential sites which are Kandeu and Nsipe, and one high-potential site called Linthipe, a total of 22 village clusters included within these five EPAs [19]. Golomoti is located near the lakeshore at a low elevation and a high evapotranspiration, with a mix of soil types dominated by Eutric Cambisols and Eu- tric Fluvisols. Linthipe has well-distributed rainfall and a long history of maize-dominated agriculture. Soils in Linthipe are primarily Ferric Luvisols, whereas in Kandeu and Nsipe, soils are mixed with Chromic Luvisols and Orthic Ferrasols [20]. Market access also varies across locations, with Kandeu and Nsipe being moderately remote, Golomoti and Linthipe being proximate. 2.2.2 Data Collection The data are from a panel of 320 farm households, two maize plots per household surveyed in 2014/2015 and in 2015/2016, with a survey instrument approved through the Michigan State University Human Research Protection Program in the Office of Regulatory Affairs, following a human subjects’ protocol with informed consent obtained from all farmers, trans- lated into local languages. The farmers are asked to choose two maize plots at random, and the same plots and farmers are then revisited and surveyed at preseason (October 2014 and 2015), mid-season (March 2015 and 2016), and harvest (May 2015 and 2016). Enumerators are trained over a one-week period, and supervised by graduate students on the field. The data collection process involves close attention to data entry and quality control. The infor- 7 mation on the survey is voluntary and every effort is carried out to maintain confidentiality. The survey topics address crop production, socioeconomic information (household size and composition, household head’s educational level), farm management and practices (la- bor, seed, planting dates, plot history, residue, crop grown, time of sowing and weeding, fertilizer application), and soil characteristics (pH, total carbon, permanganate oxidizable carbon). Mid-season survey is to assess maize planting arrangements (including row spac- ing), and maize leaf chlorophyll, which is based on soil plant analysis development (SPAD) absorbance. Enumerators record three reading replicates per plant for four plants at each of the eight locations. To avoid edge effects, the enumerators observe at least two ridges from the plot border, and randomly choose three locations (two-ridge apart) along a diagonal transect. The spacing is from the center of one ridge to the center of the adjacent one. Additionally, two types of measurements are made to investigate the incidence of Striga asiatica (L.) Kuntze, commonly known as witchweed, a genus of parasitic plants. One is directly asking farmers if they have a problem with striga on a given maize plot. The other one is obtained by enumerators who make eight observations per plot at random sites along rows following a prescribed procedure; thus, striga information is recorded from 0 to 8 for each plot. At harvest, a survey is provided to measure maize yield by weighing biomass of stover and grain from three-square-meter plots per field, where grain is removed from cobs to allow for a dry weight basis. In summary, there are 1,197 plots in total, which are geographically located with GPS coordinates in October of 2014, involving a small amount of missing data. The plot-level data is longitudinal, and combine socioeconomic characteristics, maize production, plant and soil information, as well as farm management. 8 2.3 Modeling Framework 2.3.1 Explanatory Variables Environmental factors (rainfall and soil properties) are continuous variables, which are standardized for comparison with the estimated coefficients on the same scale. Rainfall data come from the Climate Hazards InfraRed Precipitation with Station (CHIRPS) resource, which is a public quasi-global rainfall data set starting from 1981. This is the only com- prehensive precipitation data source that is available for Malawi, and has been previously validated by comparisons to local rainfall records for three of the five EPAs [19]. We include three variables in the model that indicate the amount of rainfall (in millimeter) for (a) December, January, and February (the sowing period) (b) March (the end of the rainy season) (c) April and May (the harvest period) to explore the association between seasonal-rainfall variability and maize yield by measuring at three critical stages of the maize growing season. Using more than one subset of semi- annual rainfall is common in certain agricultural studies and practices, such as the definition and calculation of rain-index-based crop insurance [21]. Regarding soil properties, perman- ganate oxidizable carbon (POXC), which is a sensitive indicator of active soil organic carbon, and pH are selected as the main soil factors in the model. There are five farm-management variables, namely ridge (row) spacing, total ridge-weed biomass, fertilizer application, intercropping, and manure/compost use. Although fertilizer and SPAD are highly correlated, they both have explanatory power for maize yield. Ridge spacing (in centimeter) is the distance between two ridges at each of three locations within 9 a field. Total ridge-weed biomass (in quadrat of 0.5 m2) is weighted in the field at harvest, and treated as an index of weeding effectiveness. Intercropping and manure/compost use are both binary variables, and we do not consider the density or the genre of the crop that farmers intercropped with maize. Fertilizer is calculated by the amount of nitrogen applied with any type of fertilizer amendment. Endogeneity bias is possible, particularly for these choice variables. However, we are unable to address endogeneity because our data set does not include suitable instrumental variables, that strongly predict the endogenous explanatory variables but do not directly affect the response variables. As a result, coefficient estimates should be interpreted as indicating association rather than causality. 2.3.2 Regression Model A Bayesian framework is applied to estimate the statistical relationship in a linear re- gression setting, and an agronomy perspective based on expert knowledge is used to form the basis of this model as the following: Yijk = αi + Xijkβi + σεijk (2.1) • Yijk (response variable): maize yield (in kilogram per hectare) of plot k managed by farm household j at EPA site i • αi (y-intercept): can either be a constant or vary from location to location • Xijk (design matrix): incorporates all the data from 12 explanatory variables • βi (regression coefficients): measure how much of the variation of maize yield (Yijk) is explained by the covariates (Xijk) 10 • εijk (noise terms): independently and identically distributed N (0, 1) representing the error of the model • σ: scale of the error To evaluate the determinants of maize yield, we compare the effects of each explanatory variable on the response in any one of our three models (i.e., yield, SPAD, striga). Because all variables in the models have been standardized, the estimated regression coefficients (βi) of each variable give a magnitude of influence, which can be compared with the scale of the noise terms (σ), to find out which predictor(s) has the strongest effect. Usually, σ is quite large relative to a single regression coefficient, but one should add the absolute magnitudes of several independent variables for a more meaningful comparison, since the statistical error needs to be compared to the strength of all the explanatory factors combined. For each model, the set of explanatory variables is chosen to be consistent with agronomists’ beliefs about what factors may influence yield, SPAD, or striga. Each of the three models is specified in the simple linear framework, with appropriate logistic modifications in the case of the the striga model, to distinguish between incidence of striga and levels of striga. Such a linear framework can be seen as a first-order approximation for each response. We can gauge the random effect of location since αi and βi both depend on the EPA index i. Pooling the two-year data avoids model misspecification, and has the added benefit of increasing statistical power. Further models test response variables SPAD and the parasitic weed striga, to uncover the underlying key drivers of maize yield, where we expect striga to be a negative factor and SPAD to be positive. The noise terms (εijk) are assumed to be independent across all three models in order to minimize the number of parameters needed to be estimated. It also avoids the use of a large number of correlation parameters 11 (hyperparameters) at the prior level in the Bayesian context, which need to be consistent when investigating a system of 3 equations with 11 common explanatory variables between any pair of models. Consequently unobserved factors that might simultaneously affect more than one model are not taken into account. The eight striga records have been reverted to a scale of quantitative values from 0 to 8; with 0 indicating the absence of striga, and 1 to 8 revealing the level of striga infestation. The striga model can be thought of as a two-step procedure: • First, the zero and non-zero values provide two alternatives which allow us to estimate the influence of the presence or absence of striga via logistic regression, speaking for the possibility of prevention. • Next, when conditioning on the presence of striga (1–8), ordinary linear regression is employed, which links to the insights on effectiveness of striga control Although having a large number of observations equal to zero (known as zero inflation) may induce biased results, the proposed strategy mitigates this problem, since the standard linear regression model is relative to the non-zero striga values. In addition to the models described above, we conduct two sets of analysis to consider if socioeconomic indicators—educational level of the household head and total dependency ratio of each household—are of importance in predicting maize yield, by adding these two variables to the covariate matrix X. The available data is also used to investigate non-specific household effects, which utilizes simplified versions of the three linear models, allowing the y-intercept (α) to depend on the household identifier (dummy variable) to help determine whether households are predictive of maize yield, SPAD, or striga. This may be interpreted as pointing indirectly to socioeconomic effects, or directly to effects of farmer skill. 12 We apply the package ‘PyMC3’ built into Python to implement the methodology. This package provides a way to estimate the posterior distribution of our model parameters (i.e., regression coefficients and error terms) by implementing a sampling mechanisum for these distributions. It uses the ordinary Gibbs sampler to produce samples, with a burn-in period of 500 initial samples, and an additional 10,000 iterations with two independent MCMC chains after each burn-in. Without having prior knowledge on the distribution of the param- eters, we employ the classical weakly-informative prior distributions: the standard normal distributions (for βi) and inverse-gamma distribution (for σ). These prior choices present numerical advantages in terms of conjugacy [1,22]. We monitor the convergence of the proce- dure by keeping track of the discrepancy between the two aforementioned chains using R-hat statistics (a widely-accepted convergence diagnosis). All the R-hat values are below the ac- ceptable threshold of 1.1, which implies that the chains successfully converge, producing excellent parameters’ estimates as well as their credible intervals. In a word, we focus on biophysical determinants, yet there is more to be explored in the future regarding socioeconomic drivers. Given the limited amount of data, this project does not delve into the higher-complexity models, since they lie beyond the scope of our data set and thus of our analysis. Hence, the three structural models are uniquely characterized by their respective response variables and explanatory variables. 2.4 Results 2.4.1 Descriptive Statistics Table 2.1 provides descriptive statistics for the variables in the model categorized by EPA location. Figure 2.1 shows the precipitation for all study regions during the maize-growing 13 season over two years. Mean values for rainfall are consistent with earlier characterization of Golomoti as a low-rainfall (marginal) site. The other locations differ in terms of mean rainfall for March, with Linthipe (mesic site) having the highest rainfall level. Golomoti Linthipe Kandeu Nsipe 563.0 (78.0) 78.6 (26.8) 25.2 (10.4) 278.9 (152.43) 6.56 (0.61) 624.5 (27.7) 128.9 (42.0) 43.8 (11.6) 466.9 (220.5) 6.09 (0.46) 629.5 (94.6) 102.8 (26.5) 39.1 (9.1) 636.8 (132.8) 116.8 (25.4) 44.4 (8.8) 390.41 (191.15) 340.70 (160.22) 6.10 (0.53) 6.32 (0.61) 1567.44 (1039.3) 2636.3 (1526.0) 2069.4 (1471.5) 2320.9 (1452.9) 41.20 (8.85) 46.98 (7.15) 46.00 (8.62) 41.84 (8.11) 0.897 (0.11) 0.183 (0.16) 8.908 (13.60) 0.22 (0.42) 0.66 (0.47) 0.41 (0.49) 0.927 (0.11) 0.079 (0.07) 13.763 (25.48) 0.970 (0.13) 0.201(0.15) 15.901 (18.04) 0.914 (0.14) 0.246 (0.16) 12.411 (12.86) 0.31 (0.46) 0.77 (0.42) 0.45 (0.50) 282 0.16 (0.37) 0.74 (0.44) 0.31 (0.46) 298 0.28 (0.45) 0.60 (0.49) 0.27 (0.45) 305 Environment Dec.–Feb. precipitation (mm) March precipitation (mm) Apr.–May precipitation (mm) POXC (mg carbon/kg soil) Soil pH Crop performance Maize yield (kg/ha) SPAD Management practice Maize spacing (m) Weed biomass (kg/m2) Fertilizer (kg) Striga (binary) Intercrop (binary) Compost (binary) Total number of observations 312 Table 2.1: Mean and (standard error) of all model variables by location Soils are generally marginal at Golomoti sites, as evidenced by low mean value for soil active carbon (POXC), in accordance with previous reports [19]. The highest POXC value appears to be in Linthipe. Soil pH varies little among locations, and mean values are con- sistent with moderate acidity, and thus non-limiting pH conditions for the crops grown. Crop response consists of maize yield and leaf nitrogen content, as indicated by SPAD values. Average maize yield is the lowest in Golomoti, followed by Nsipe and Kandeu, and the highest is in Linthipe. SPAD data has a similar but not identical pattern: low in Golomoti and Nsipe, and high in Kandeu and Linthipe. Striga incidence and weed biomass are distributed across EPAs with no clear spatial pattern. Farmer-reported striga problems on about 16-30% of sampled fields, and from 0.18 to 0.25 kg/m2 dry-weight weed biomass remaining in the field at harvest. The latter is an 14 Figure 2.1: 10-day precipitation (CHIRPS) from December to May at all EPA sites indicator of how effective farmers’ weed management is, although the endogenous infestation levels of weeds at a site could also contribute to observed presence. Overall, fertilizer use is lower at Golomoti site than the other ones. This is accordant with the intuition that farmers in marginal environments have less motivation to invest in their lands and crops. Fertilizer application is similar in Linthipe, Kandeu, and Nsipe. Compost use is higher in Golomoti and Linthipe than it is in Kandeu and Nsipe. As expected, intercropping is more frequently practiced in Linthipe and in Kandeu, where many farmers grow bean and cowpea (legume) in mixed stands with maize. Average plant spacing is lower in Golomoti and Nsipe, and the highest in Kandeu. 15 2.4.2 Bayesian Inference Statistical significance is determined under a linear Bayesian circumstance. For example, an explanatory variable (such as SPAD) for a response variable (such as yield) is statistically significant at 95% Bayesian credibility if its regression coefficient has a posterior probability of being on one side of zero which exceeds 95%. It is true as soon as the 95% credible interval of that variable’s regression coefficient lies on either side of zero. If this condition fails to happen, then strictly speaking, one may accuse it of not being statistically significant. This is a steep threshold to apply in most cases, since a variable with 90% credibility, still holds some predictive information. In this project, however, when an explanatory variable fails to be significantly associated with the model’s response variable, the failure occurs at a much lower level than 90%. The size of an association needs to be distinguished from its significance. For variables which are significant in terms of size (or intensity), the significance are measured by their regression coefficients’ posterior mean. 2.4.2.1 Maize yield Figure 2.2 presents the 95% credibility intervals for the determinants of maize yield at different sites. In general, maize yield is positively associated with SPAD and negatively associated with plant spacing; the magnitude of the coefficients for SPAD are particularly large. There is also a small but positive direct relationship between fertilizer and maize yield for Kandeu and Nsipe. Yield is not consistently affected by soil pH, weed biomass, intercropping, or March rainfall. The α values differ markedly from each other by EPA, which suggests that location has an important influence on maize yield, independently of other factors in the model. Since those factors are capable of exacerbating differences in maize yield by location, we turn to consider specific locations from now on. 16 Figure 2.2: 95% credible intervals for the determinants of maize yield First, rainfall levels are only significant factors for yield in Linthipe and Nsipe. In both areas, early-season rainfall (i.e., December to February) has a positive relation with maize yield. In Nsipe, high rainfall in April/May is associated with low maize yield, perhaps a reflection of late-season disease harming the crop such as Fusarium ear rot [23]. Secondly, soil active carbon and compost application are positively associated with maize yield at Golomoti. Third, in Linthipe and Nsipe, striga has a large negative effect on maize yield, whereas it is not significantly related with maize yield in Golomoti or Kandeu. Slightly- streamlined yield models are also investigated, where either SPAD or striga is removed. It turns out that these reduced models resulted in decreasing explanatory power for all variables, which could be easily assessed via the posterior distributions of regression coefficients. Overall, the results indicate that maize leaf nitrogen (SPAD) and striga are the strongest 17 determinants of maize yield at all sites, while early and late rainfall are significant at some sites. As farmer behavior can directly influence striga and SPAD, we evaluate separate models to uncover the drivers of these two critical inputs. 2.4.2.2 SPAD As we can see from Figure 2.3, there are three main predictors for SPAD: rainfall, POXC, and application of fertilizer or compost. Rainfall is generally found to be influential on SPAD, except in Kandeu, where rainfall variables are not statistically significant. In Golomoti, SPAD increases with March rainfall, whereas in Linthipe and Nsipe it is the early rainfall that has a positive impact on SPAD. Also in Nsipe, a negative association with SPAD is observed for late-season rainfall. Figure 2.3: 95% credible intervals for the determinants associated with SPAD Fertilizer quantity is an important driver of SPAD at all sites except Golomoti, which 18 in turn is highly predictive of maize yield. The magnitude of its effect on SPAD is higher in Kandeu and Nsipe than it is in Linthipe. In dry and marginal sites, plant growth and response to fertilizer are often limited by insufficient soil moisture, thus fertilizer application does not necessarily lead to plant uptake of nitrogen (or yield). The results in the yield model also show a lack of response to fertilizer in Golomoti (see Figure 2.2), where compost/manure and POXC are positively related to SPAD. Fertilizer impacts on both SPAD and yield are substantial, however, suggesting the need to improve the effectiveness of fertilizer applied. 2.4.2.3 Striga Factors that influence striga incidence (Figure 2.4) and level of striga infestation (Figure 2.5) are shown below. Among farm management and soil properties, most are either not significant or of small magnitude in relationship to striga. There is some evidence of soil fertility amendments being useful in prevention, as fertilizer application is associated with striga absence (as reported by farmers) in Kandeu and in Linthipe. Fertilizer is also an apparent control factor in Linthipe, where it is a predictor of low striga-infestation level. Compost is related with both striga absence and low incidence in Nsipe, with contrasting results observed in Kandeu. At Golomoti site, fertility amendments have no striga control benefits, and the only farm-management effect is wide ridge spacing, which is negatively correlated to striga presence and intensity. In addition, intercropping is neither helpful nor harmful to striga prevention and control. The only exception is Linthipe where intercrops are associated with striga problems. 19 Figure 2.4: 95% credible intervals for the drivers of striga prevention (logistic) Figure 2.5: 95% credible intervals for the determinants of striga control 20 2.4.2.4 Household (farmer) effect With most of the 320 households, and up to four plots per household (two plots per household in two years), we find that it is not possible to determine if there is a connec- tion between any particular farmer and the corresponding data. However, some statistical significance is extracted from roughly 10% of cases, meaning that the variability among the four data points of such households is almost certainly not due to chance alone. Moreover, the effect is most likely to identify a favorable household environment. Specifically, setting the significance level at 5%, we compute the number of households for which the regression intercept α is away from zero with posterior probability at least 95% (see Table 2.2). Model Positive effect Negative effect Yield SPAD Striga 23 (7.5%) 11 (3.6%) 26 (8.5%) 5 (1.6%) 7 (2.3%) 0 (0.0%) Table 2.2: Number and (proportion) of households where the y-intercept is significantly non-zero at 95% credibility level in all models Despite the limited data per household, we are able to detect such effects with nearly 10% of the households. In the yield model, the total number of significant households exceeds 9%. Interestingly, among these households, there are far more cases where the yield is higher than it is lower (7.5% against 1.6%). In other words, when the data towards a farmer having a significant effect on yield, the odds are about 5:1 that this is a skilled farmer with high maize yield. For SPAD, about 6% of households have SPAD levels which cannot be explained by chance. The odds of having high SPAD against low SPAD is about 3 to 2. As for the striga model, there is no case that a farmer could avoid striga entirely, while 8.5% of farmers are likely to be associated with a striga problem. It may seem surprising to say that we cannot determine any farmer with the skills or the knowledge of practices to be 21 superior to others in preventing striga. This result is not to be taken as a discouraging fact. Instead, it reflects that over two thirds of all plots in the study regions are striga-free. Thus, while one third of the plots being infected with striga reaches epidemic levels, having two thirds of plots without striga makes its absence so prevalent that the 320 households cannot identify anyone with unusual striga-prevention skills. Readers are referred to the full striga model for more precise recommendations on striga. 2.4.2.5 Socioeconomic factors The credible intervals for the two socioeconomic drivers (2nd and 3rd line in Figure 2.6) overlap rather heavily with the zero vertical line in most cases, and are small in magnitude compared to other determinants, indicating inconclusive significance and small impact. It also shows that the regression coefficients of other variables are insensitive to whether or not one includes the additional two socioeconomic indicators. This could be a sign of insufficient data to draw conclusions on either rejecting or asserting socioeconomic importance at any reasonable level of significance (e.g 80% credibility or higher). We also carry out additional models by removing insignificant variables from X, and the remaining analysis (not reported) are largely unaffected, which are similar to Figure 2.6. In sum, these evidence imply that our regression model is robust. 22 Figure 2.6: 95% credible intervals for determinants including two socioeconomic factors associated with maize yield 2.5 Discussions The plot-level longitudinal data set and Bayesian regression models place biophysical constraints in sharp focus in the analysis of what influences maize yield in central Malawi. Some have strong effects with very high credibility, notably SPAD, striga, and sub-seasonal rainfall pattern, which is consistent with much of the literature on small-scale, mixed-maize production systems [12,15]. Yet, fertilizer application does not necessarily lead to improving leaf nitrogen nutrition (SPAD) or in subsequent maize yields. Hence, the results highlight the challenges to ensure effective nitrogen uptake and translation into grain, particularly in marginal environments. Fertilizer is generally associated with high plant nitrogen tissue at all areas but Golomoti. Soil organic carbon fractions tend to be sensitive for crop response at lower values, as Golo- 23 moti is 20-40% lower compared to the other locations, which both recommend farmers to apply manure or legume rotations at marginal sites, to build nitrogen supply capacity [14]. Integrated management of soil fertility combining manure and fertilizer has been shown pre- viously to be highly effective and profitable for raising maize yields [24]. In concurrence, a study finds out that maize yield response to nitrogen fertilizer application is low when soil organic matter is low [25]. Rainfall is another crucial factor, particularly early-to-mid season, which supports a pos- itive association between seasonal rainfall and maize yield based on an econometric analysis of farm-level data [24]. At Kandeu, we find a negative relationship of late-season rainfall to yield and SPAD. This could reflect a leaching problem, with rainfall inducing soil inorganic nitrogen losses and thus limiting nitrogen availability during the critical maize grain filling period, which requires high nitrogen availability [26]. It could also be related to Fusarium or other infections of the corn ear, induced by a late-season moist environment causing grain spoilage and yield loss [5]. To our knowledge, it is the first study to produce this type of differentiated conclusions based on panel survey data. Although weed biomass alone has no discernible effect on maize yield, parasitic weed striga is a strong negative determinant, especially in Linthipe and Nsipe. It is a bit surprising that the phenomena are only observed in the two relatively high-potential locations, as the presence of striga is typical throughout the study sites and is indeed ubiquitous nationwide [27]. This implies that a barrier to crop production that has been largely overlooked by agricultural research and policy makers, where the focus has often been on subsidized access to hybrid maize seeds and fertilizers. Effective and affordable means of striga prevention and control are in need. In Malawi, farmers primarily rely on hand weeding for striga control, which appears to 24 be apparently ineffective due to the parasitic nature of striga. Therefore, maize growth sup- pression has already occurred by the time hand weeding is done [28]. The utility of fertilizer and manure/compost links low-nitrogen soil to higher striga incidence [29, 30]. Similarly, farmers in a recent survey rank manure application as the best option for striga control [31]. Another study reveals complex relationships that early application of fertilizer helps maize plants overcome early effects of striga attachment, whereas late application is associated with worse striga [32]. Given the observed differences in soil fertility and fertilizer across the EPAs, we expect higher striga infestation in Golomoti over the other areas, instead there is a widespread presence of striga and lack of uniformity in what works for prevention and control. For instance, fertilizer is found to be a critical factor for striga, but only in Kan- deu and Linthipe. Manure/compost appears to have merits for reducing striga problems in Nsipe only. In short, our findings suggest a complex mode of action with difficult-to-predict reactions of maize to striga presence. Overall, the results from this project call for area-specific recommendations. The Malawi government’s suggestions for hybrid maize production have mainly focused on nitrogen rate [33]. We find evidence that shows targeting complementary investments and timing of application could potentially add value. For example, maize yield at the mesic sites would gain benefits on early and judicious use of fertilizer for not only striga control, but also for nitrogen nutrition. Whereas for the marginal sites, response is markedly different, with no fertilizer or striga drivers observed on yield, and instead soil active carbon and compost are positive determinants of yield, recommending practices should focus on managing soil organic matter. Soil carbon accumulation provides important environmental services at all sites; however, marginal locations benefit from stable production over time and space to substantial gains in nitrogen efficiency, and the incremental gains in soil carbon are high. 25 Compost preparation and utilization at modest amounts have beneficial influence on some sites, with gains in plant health as indicated by nitrogen status and striga suppression. This management practice is especially helpful during a poor-rainfall season in a marginal area. With little to no downside, government policies, extensions, and educational efforts by civil society, can all play crucial roles in building appreciation for compost advantages. 26 Chapter 3 Global Temperature’s Reconstruction and Projection 3.1 Introduction Understanding the evolution of the climate on Earth is one of the major scientific chal- lenges for this century. Quantifying the link between the sensitivity of the climate system constituents, and its response to the climate forcing factors can be critical to discover the underlying relationship in climate change [34]. Although the most updated instrument-base observational temperature has a record since 1850, it is desirable to extend the temperature- reconstruction methodology to make use of climate proxies, hence better capturing the cli- mate variability on different time scales [35]. The Intergovernmental Panel on Climate Change (IPCC) is dedicated to providing an objective and comprehensive view of climate change to the world, and pushes forward the development of accurate climatic models like General Circulation Models (GCMs), which are used for forecasting global and regional climate change [36]. Coupled Model Intercomparison Project Phase 6 (CMIP6) is another ongoing collaborative framework organized by the World Climate Research Program (WCRP), aiming to gather the efforts of the international climate research community, to improve the design of global climate model [37,38]. In addition, there 27 are plenty of research articles relating the past temperature to proxies [35,39,40], such as tree ring, ice cores, lacustrine deposits, etc., implying that the natural proxies are good indicators for paleoclimate because they accurately reflect a wide range of climate sensitivity [41]. Previous approaches used in temperature reconstruction including principal component regression (PCR) [42,43], canonical correlation analysis (CCA) [43–45], regularized expecta- tion maximization (RegEM) [46–48], and linear regression with various kinds of regulariza- tions [49,50]. These literatures significantly improve the comprehension of the climate in the past. However, the lack of spatiotemporal covariance specification and uncertainty quantifi- cation continue to be the statistical challenges [51–53]. Several recommendations like model simplification, model averaging, have been proposed in order to address these issues [54–56]. The idea of using Bayesian model to reconstruct past temperature first appears in [57], and further studied by [40, 58, 59]. An advantage of applying hierarchical modeling frame- work to paleo-climate reconstruction problem is that it can incorporate different sources of information (proxies, forcings), to capture the variability of temperature. Furthermore, parameters estimates are systematically updated within Bayesian inference, which simultane- ously provides the full posterior distributions of each model parameter to better understand the relation between each factor and the mean temperature, as well as reduces and quantifies the reconstruction uncertainty in a more statistically meaningful way [35, 40, 53]. Stochastic models have been applied to hydrology and coral archives involving proxies and paleo-data [60,61]. In this study, we develop a multi-level stochastic model to reconstruct northern hemisphere (NH) temperature over the past millennium (1000-1999), and extend the three forcing time series (solar, volcanic, greenhouse gases) to project the temperature in this century (2019-2100). The first level (data) embeds the climate proxies into the underly- ing temperature anomalies, and the second level (process) linearly relates the temperature to 28 solar irradiance, volcanic activity, and greenhouse gas concentration. Finally, we implement the Bayesian technique to estimate all the parameters in the model, due to a limited amount of data in the calibration interval. 3.2 Data Source There are three data sources employed in the project. Four main types of proxies includ- ing tree ring widths, lacustrine sediment cores, ice cores and speleothems (cave formations) between 1000 and 1999. Observational surface temperature over the period 1900-1999, and estimates of three natural climate forcings (solar, volcanic, greenhouse gas) from 1000 to 1999 (for reconstruction), as well as from 2019 to 2100 (for projection). The original proxy data comprises of 1209 annually and decadal series [62]. Owing to the focus of NH average temperatures reconstruction instead of spatial analysis, it is reasonable to reduce the dimension of proxies and maintain as much climatic information as possible at the same time. Moreover, proxy reduction can avoid model overfitting and decrease computational time of parameters estimations, yield a more parsimonious model due to a limiting calibration period. All the natural proxies are aggregated into a single variable through a rigorous selection and averaging process [41, 58, 63]. HadCRUT4 is the longest and most up-to-date global temperature data set, which combines observational near-surface air temperature [64] and sea-surface temperature [65] anomalies evolution since 1850 [66]. Anomalies are calculated relative to the average of 1961-1990 [65, 67]. Because the climate proxies may intrinsically carry notable noise, and the data before 1900 may not be reliable enough, we choose 1900-1999 to be the calibration time to circumvent the potential significant amplitude attenuation issue [68]. 29 Total solar irradiance (TSI) is the measurement of solar power per unit area (W/m2) on the Earth’s upper atmosphere. The solar activity is a complex phenomenon, whose fluctuation is considered to be around an overall constant 1361.0 W/m2 [38, 69]. We take a proxy-base reconstruction of TSI [70] that is in accordance with the climate reconstruction in [52]. Its relationship with NH temperature is studied in [71]. In terms of projection, we use the dataset recommended for CMIP6, where it calculates the weighted average of three statistical models, namely, analogue forecast (non-parametric), autoregressive model (parametric), and harmonic model (non-linear) [38]. The volcanic forcing is proposed based on ice core aerosol proxies, which is generated by the ejected particles during the explosive volcanic eruptions [72]. The impact of volcanism on global and regional temperature are also investigated [73, 74] . Greenhouse gas concentration (measured by parts per million (ppm), denoted by CO2), is the most dominant forcing account for climate variations since 1950 [72]. According to the level of greenhouse gas (equivalent to radiative forcing in W/m2) at Year 2100 (Figure 3.1), the IPCC chooses and names four possible representative concentration pathways (RCP) for the evolution of the global greenhouse gases, depending on how fast the human civilization adapts itself to reduce the emission of greenhouse gas [75, 76]. The first scenario (RCP2.6) describes a trajectory where the emissions stay very low, as a result of the environment-policy changes drastically made by governments and firms [77]. The medium scenarios are RCP4.5 and RCP6.0, a situation where the greenhouse gases increase but still stabilize before or after 2100 [78]. RCP8.5 is the worst scenario where the greenhouse gas concentration keeps increasing without stabilization [79]. All three forcings are available over 1000 till 1999 for reconstruction purpose. The full description regarding their derivation is in [72]. 30 Figure 3.1: Greenhouse gas projections of four RCP scenarios 3.3 Model Specification 3.3.1 Forcing Transformation and Extension The volcanic influence can be extremely strong among the years following eruptions, but fast decays over non-volcanic years, leading to a cooling effect in the long run [74,80]. Hence, we apply a decreasing logarithm transformation ˜Vt = log(−Vt + 1) (3.1) to the original data to mitigate the impact of large volcanic eruptions during calibration period [35]. Also, we consider the simplest transformation ˜Ct = log(Ct) (3.2) 31 given that the radiative forcing depends on greenhouse gases logarithmically [81, 82]. In order to project the NH mean temperature for the century, we need to expand the three forcing series to 2100. Following [38], we remove the 11-year cycle in the solar forcing via a moving-average process, since this periodic cycle does not appear in the reconstructed data that we use. In addition, the solar constant of the series has to be adjusted by about 5 W/m2, because of the technological update on its evaluation [38, 69]. As for greenhouse gas forcing, we add a random drifting noise term to each RCP scenario, to capture the uncertainty in the global evolution with respect to the climate policies’ change. Specifically, we have: C(cid:48) 2019+t = C2019+t + σC εC , t ∈ [0, 81] t 82 with εC ∼ N (0, 1) and σC is decided by (cid:0) max Ct − min Ct (cid:1), 2019 ≤ t ≤ 2100 σC = 1 10 (3.3) (3.4) which yields an uncertainty that grows with the RCPs’ magnitudes. Various uncertainties have been applied to test robustness of the model (refer Appendix A). To our knowledge, there has not been any literature about the long-term volcanic pre- diction on a global scale. Most studies concentrate on either a specific volcano or forecast- ing eruptions short-periodically ahead [83–85]. Therefore, we treat volcanic activity as a stochastic process, allowing to take volcanic uncertainty into account for the temperature’s projection. Note that the volcanic aerosol concentration decays similarly (up to a rescaling factor, see Figure 3.2) after eruptions, we thus model the volcanic time series as a succession 32 of spikes with random amplitudes and time intervals, which can be written as: Vt = V0 + τt + 1{t≥Ti}AiPt−Ti (3.5) ∞(cid:88) i=0 where • V0 is an overall constant • τt is a small random fluctuation • Ti are the dates of eruptions with amplitude Ai • Pt−Ti is the decreasing pattern of the aerosol concentration after each spike Figure 3.2: The first 36 re-normalized volcanic eruptions’ decay 33 Moreover, the time increment (Ti+1 − Ti) and the spike amplitude (Ai) are assumed to be independent and identically distributed. This assumption is verified by high p-values of Ljung-Box test (autocorrelations of a time series) applied to both series [86–88]. Eventually, the three natural forcing factors are normalized to make their influence on the temperature comparable among each other. 3.3.2 Multi-level Autoregressive Model The hierarchical stochastic model is established as follows: Pt = aPt−1 + α0 + α1Tt + σP εt Tt = bTt−1 + β0 + βSSt−1 + βV Vt−1 + βC Ct−1 + σT ηt (3.6) where • εt, ηt: white noises (mean 0, standard deviation 1) without any correlation structure • a, b: autoregressive parameters • α, β: coefficients associated with temperature and forcings At any given year, the non-stationary model has the advantage of taking into account the impact of forcings in the past years (with an exponential decay) to the current temperature (see Appendix B for further details). From a Bayesian perspective, we assign the following 34 prior distributions to the parameters in the model: α = (α0, α1) ∼ N ([0, 1], I2) β = (β0, βS, βV , βC ) ∼ N ([0, 1, 1, 1], I4) σ2 P , σ2 T ∼ IG(2, 0.1) a, b ∼ U(0, 1) (3.7) In is an n-dimensional identity matrix. IG, U represent inverse gamma distribution and uniform distribution respectively. The model is run by Gibbs sampling algorithm with 10,000 iterations, but we only take the last 5,000 samples since the beginning of the chain (burn-in period) is often discarded, owing to its lacking accuracy to represent the desired distribution. The choice of prior distributions (normal and inverse gamma) for the linear coefficients and variances, is for the sake of conjugacy (posterior distributions yield same distributions as priors with different parameters), thereby avoiding computational burden [89, 90]. The complete computation of posterior conditional densities for the model may be found in Appendix C. It appears that small variations in the memory coefficients (a and b) lead to convergence instability for other model coefficients (α and β). Thus, we explore all the combinations of (a, b) ∈ {0, 0.1, 0.3, 0.5, 0.7, 0.9}2 (no memory to strong memory), and run the Gibbs sampler for all other parameters. The model is implemented in Python using a few fundamental libraries such as NumPy and pandas. 35 3.4 Results 3.4.1 Validation Metrics Several metrics are produced to assess the quality of the temperature’s reconstructions as well as to evaluate the posterior distributions of the parameters, applying them as criteria to compare with other works. We first calculate the root mean square error (RMSE) by using the mean of the posterior distributions. It is a frequently employed measure of the differences between observed values and predicted values [91–93]. The empirical coverage probability (ECP) indicates the pro- portion of true value of interest (temperature) fall into the confidence interval [89]. Lower RMSE implies better model fit. Additionally, we provide two scoring rules which are interval score (IS) and continuous ranked probability score (CRPS) [94–97]. IS rates the posterior confidence interval in func- tion of the quantiles of posterior distribution for each observation, rewarding sharp prediction intervals and penalizing uncovered observations [96]. ECP and IS are both computed and compared at the levels of 80% and 95%. CRPS is defined as a squared distance between the cumulative distribution function F (y) and the indicator function 1y≥x of the predictive distribution as below: CRP S(F, x) = (cid:2)F (y) − 1y≥x (cid:3)2dy (3.8) (cid:90) ∞ −∞ Both scoring rules and ECP are positive-oriented with the designated model (higher values indicate a more appropriate model). A Markov chain tends to run through as much state space as possible, and continuously reduces its variability to focus on the areas of the space with high density for an invariant 36 distribution, which it converges toward in the end. Therefore, the total variance of the chain shrinks until it asymptotically converges to the variance of the invariant distribution. In order to determine the performance of the MCMC’s convergence, we compute the potential scale reduction factor (PSRF) for the posterior samples [98, 99]. The PSRF (optimal value is 1) estimates how much variance needs to be reduced before achieving convergence. 3.4.2 Reconstruction (1000-1999) Smaller memories are related to lower RMSE (better model performance) is not surpris- ing, because the no-memory model is equivalent to a linear regression, which comes down to minimizing the squared error (Table 3.1). a b 0 0.1 0.3 0.5 0.7 0.9 0 0.1 0.3 0.5 0.7 0.9 0.157 0.148 0.139 0.136 0.138 0.175 0.160 0.151 0.142 0.138 0.143 0.190 0.169 0.161 0.152 0.146 0.153 0.221 0.182 0.175 0.166 0.159 0.168 0.288 0.204 0.199 0.189 0.187 0.203 0.341 0.219 0.215 0.208 0.212 0.238 0.375 Table 3.1: Root mean square error (RMSE) for all combinations of memory coefficients For the ECPs, the credible intervals are widened at both levels 80% and 95% when increasing memory parameters, and a good compromise can be found with a between 0.5 and 0.7, b between 0.3 and 0.7 (Table 3.2). The IS (80% level) does not exhibit any specific trend, except for extreme memory value (b = 0.9) returning unsatisfactory results. Whereas at level of 95%, enlarge a and diminish b would lower the negative interval score overall indicating a better model fit (Table 3.3). The continuous ranked probability scores are quite similar for a and b below 0.7, but become larger for higher memory values (Table 3.4). Since the best performances are achieved with a and b at the order of magnitude described 37 a b 0 0.1 0.3 0.5 0.7 0.9 a b 0 0.1 0.3 0.5 0.7 0.9 0 0.1 0.3 0.5 0.7 0.9 71 68 60 58 50 42 69 67 62 57 52 42 67 66 67 64 62 41 73 69 68 70 71 47 81 81 80 80 79 65 88 87 84 88 96 96 0 0.1 0.3 0.5 0.7 0.9 89 84 78 74 72 58 90 86 80 74 72 60 91 89 82 76 75 63 96 94 94 91 84 65 99 98 97 97 98 83 99 99 99 99 99 100 Table 3.2: Empirical coverage probability (ECP) at levels 80% (up) and 95% (down) regarding the ECPs (i.e. the uncertainty quantification), we run another set of simulations refining this particular area (a ∈ [0.5, 0.7] and b ∈ [0.3, 0.7]), results in Table 3.5). Among the new simulations, the IS(80) and IS(95) are best at low memory parameters until a = 0.6 and b = 0.55. The CRPS does not vary much but seems to decrease as a and b increase. The highest ECP (80% level) is always attained when a = 0.7. However, the best ECP at level 95 is obtained for lower values of a. Eventually, considering all the other validation metrics, we choose the pair a = 0.6, b = 0.5 to be the memory coefficients. It confirms that even though the no-memory model achieves a better fit, it is necessary to include memories to properly address model uncertainties [58]. In [58], the authors build a hierarchical Bayesian model with eight scenarios based on memory or memoryless feature (controlled by two Hurst parameters H and K), with or without external forcings, and error terms’ structures: fractional Gaussian (fGn) or autore- gressive (AR). We compute all of the validation measurements for one case (scenario D in [58] 38 a b 0 0.1 0.3 0.5 0.7 0.9 a b 0 0.1 0.3 0.5 0.7 0.9 0 0.1 0.3 0.5 0.7 0.9 0.184 0.188 0.201 0.221 0.254 0.363 0.184 0.189 0.199 0.215 0.251 0.382 0.186 0.189 0.193 0.204 0.234 0.415 0.188 0.186 0.186 0.185 0.218 0.489 0.193 0.188 0.178 0.178 0.204 0.432 0.198 0.193 0.189 0.192 0.215 0.344 0 0.1 0.3 0.5 0.7 0.9 0.062 0.066 0.085 0.108 0.137 0.214 0.061 0.065 0.084 0.101 0.132 0.221 0.059 0.061 0.067 0.081 0.107 0.219 0.058 0.058 0.057 0.057 0.076 0.225 0.062 0.061 0.060 0.061 0.067 0.153 0.067 0.066 0.065 0.068 0.078 0.129 Table 3.3: Negative interval score (IS) at both 80% (up) and 95% (down) levels a b 0 0.1 0.3 0.5 0.7 0.9 0 0.1 0.3 0.5 0.7 0.9 0.075 0.074 0.075 0.078 0.084 0.117 0.076 0.075 0.074 0.078 0.086 0.126 0.076 0.076 0.076 0.076 0.083 0.138 0.081 0.082 0.077 0.074 0.079 0.173 0.083 0.084 0.079 0.078 0.079 0.168 0.086 0.086 0.084 0.081 0.081 0.114 Table 3.4: Negative continuous ranked probability score (CRPS) for different memories vs. a = 0.6, b = 0.5 in this work) in each model to compare the reconstructions (see Table 3.6 and Table 3.7). In general, scenario D performs the best among other scenarios, and is also the closest one to the model in this project, which put short memories in both equations. It turns out that our model obtains more precise ECPs at both 80% and 95% levels, and the CRPS is less than half as much as the model in [58]. The interval scores stay quite similar between both models, and the RMSE is a bit better for [58], but RMSE is not necessarily a performance indicator since it favors the memoryless models. 39 a b 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 a b 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.50 0.55 0.60 0.65 0.70 68 68 70 70 70 72 74 70 71 69 70 71 71 71 74 74 73 72 76 71 73 74 76 76 76 75 76 78 79 79 78 78 79 79 77 78 80 80 82 81 80 81 79 79 79 0.50 0.55 0.60 0.65 0.70 94 93 94 94 91 90 88 86 84 95 94 94 94 94 93 92 88 88 96 97 97 97 95 97 97 92 90 97 97 97 98 98 98 98 97 95 97 98 97 98 97 98 98 98 98 Table 3.5: ECP at two levels 80% (up) and 95% (down) for specific calibrations Additionally, both models appear to converge towards limit posterior distributions nu- merically well (PSRFs in Table 3.7), but our model seems to converge slightly better. This could be explained by the smaller variability in the Gibbs sampling process for our model, given that a and b are fixed to be 0.6 and 0.5 respectively, whereas the memory parameters (H and K) are not constants in scenario D [58]. Scenario D Model (a = 0.6, b = 0.5) RMSE ECP (80) ECP (95) 0.162 0.174 90.9 95.0 74.7 76.0 IS (80) 0.176 0.181 IS (95) CRPS 0.209 0.063 0.056 0.074 Table 3.6: Validation metrics (negative IS and CRPS) for both models We also compare our reconstruction results with another regression method in climatology domain, namely errors in variables (EIV), which is a scaling regularization allowing for errors 40 Scenario D Model (a = 0.6, b = 0.5) α0 1.01 1.00 α1 1.00 1.01 β0 1.05 1.01 βS 1.07 1.01 βV 1.00 1.00 βC 1.03 1.01 σ2 P 1.09 1.00 σ2 T 1.00 1.01 Table 3.7: The diagnosis of MCMC’s convergence (PSRF) for both models in both explanatory and response variables [41, 100]. The comparison may not be perfect, since our model also incorporates the reconstructions of the three climate forcings, which is proved to outperform those reconstitutions that only involving proxies [58, 101]. From the curves (displayed in Figure 3.3), we can see the trend is lower on our side as opposed to [41], implying a more radical temperature in the past millennium, which confirms that forcings’ incorporation helps produce a cooler reconstruction [58]. The local variations are rather similar, thanks to the common proxy data used in both methods. The gap between the two curves is more distinct during the period of notably low solar activity (the Maunder Minimum, see [102] for more details), which can be easily detected in the solar forcing series (St), because sunspot number is strongly correlated to the TSI [103, 104]. Figure 3.3: Temperature anomalies’ reconstructions between our model and EIV Furthermore, the memory included in our model takes into account the influence of 41 forcings of any given year on the following year. It is especially true for volcanism where eruptions may yield cooling periods, but whose spikes only last around two years. In [105], the authors show that tree rings underestimate the cooling effect of large volcanic events, which means that tree-ring based reconstructions (particularly proxy-only) may not properly address volcanic eruptions. In this study, the proxy is an aggregation of multiple proxy time series, less than half of which are tree rings. Hence, our reconstruction compensates this phenomenon by adding moderate memories in the model. 3.4.3 Projection (2019-2100) As part of the CMIP5, many institutes publish the result of climate simulations for this century using different models and data [106]. For the four RCPs from 2006 to 2100, we compute the means of the surface temperature on the northern hemisphere from different models (listed in Table 3.8), followed by converting them to temperature anomalies. Al- though the models and data exhibit difference, the sample paths of the various simulations have almost identical trends (see Figure 3.4 for an example). Thus, for each RCP, we consider the temperature projection is equal to the mean of all model simulations, which allowing us to compare our results with this multi-model average method (Figure 3.5 and Figure 3.6). Except RCP8.5, the temperature’s posterior means in our model exactly fall into the 95% confidence intervals of CMIP5 projections. For RCP4.5 and RCP6.0, we can also see that the 95% credible intervals cover CMIP5 means very well, especially for RCP6.0, the temperature’s posterior means are almost the same as the CMIP5 projections. We cannot explain why both our model and CMIP5 have a jump at the beginning, however, we do not intend to model the high-frequency fluctuations from year to year. In the initial 13-year (2006-2018) projections, we note that the discrepancy between the NH average temperature’s 42 Modeling center or group (Location) AORI, NIES, JAMSTEC (Japan) Canadian Centre for Climate Modelling and Analysis (Canada) Centre national de recherche m´et´eorologique (France) Commonwealth Scientific and Industrial Research Organisation (Australia) EC-Earth (Europe) Institute of Atmospheric Physics, Chinese Academy of Sciences (China) Institut Pierre Simon Laplace (France) Max Planck Institute for Meteorology (Germany) Meteorological Research Institute (Japan) Met Office Hadley Centre for Climate Science and Services (UK) NASA Goddard Institute for Space Studies (USA) Norwegian Climate Centre (Norway) Model (acronym) MIROC-ESM-CHEM MIROC-ESM MIROC5 CanESM2 CNRM-CM5 ACCESS1.0 ACCESS1.3 CSIRO-Mk3.6.0 CSIRO-Mk3L EC-EARTH FGOALS-s2 IPSL-CM5A-LR IPSL-CM5A-MR IPSL-CM5B-LR MPI-ESM-LR MPI-ESM-MR MRI-CGCM3 MRI-ESM1 HadGEM2-AO GISS-E2-H-CC GISS-E2-H NorESM1-ME NorESM1-M Table 3.8: List of CMIP5 models used for multi-averaging temperature calculations annual fluctuations and CMIP5, our projection is smoother with greater magnitude than this initial jump, which could be viewed as part of the fact that the feature of yearly fluctuations are missing from our model and from CMIP5. In terms of the model robustness, not only do we add various choices of σC to expand greenhouse gas concentration forcings (Appendix A), but we compute the differences over two separate projection periods (2006-2100 and 2019-2100) using the same model (Table 3.9). The trends in the graphs are relatively similar to each other (the progression re- sults can be found in the supplemental materials), and the values in Table 3.9 are rather minimal compared with the projected temperature anomalies in the corresponding RCP sce- narios. Therefore, both methods justify that our model is quite robust and the uncertainty 43 Figure 3.4: Temperature prediction for RCP8.5 between 2006 and 2100 (CMIP5) Figure 3.5: Temperature projections until 2100 for all four RCPs: comparisons with CMIP5 multi-model average quantification is reasonable, which is not allowed to be evaluated by the model averaging methodology of CMIP5. Besides, following [35], we extend and test all memory combinations-white noise (WN, no memory), AR(1) and AR(2) (short-term memory), and fGn (long-term memory) for the purpose of projection in [58]. The outcomes are shown in Table 3.10 below. In similar to the reconstruction results, the best one is the no-memory model (WN-WN) except the ECPs, where the AR(1)-fGn achieves better at both levels. Models with stronger memory yield worse RMSE, since memoryless model corresponds to the classical framework of linear regression, which directly minimizes the mean squared error. The scoring measures 44 Figure 3.6: Temperature projections’ comparisons with CMIP5 for medium scenarios (up: RCP4.5; down: RCP6.0) RCP2.6 RCP4.5 RCP6.0 RCP8.5 Year 2020 2030 2040 2050 2060 2070 2080 2090 2100 0.039 0.044 0.041 0.052 0.034 0.066 0.050 0.045 0.042 Average 0.045 Maximum 0.066 0.026 0.032 0.029 0.031 0.025 0.031 0.048 0.046 0.039 0.033 0.052 0.061 0.076 0.104 0.120 0.127 0.168 0.214 0.239 0.244 0.148 0.266 0.196 0.284 0.400 0.540 0.717 0.914 1.134 1.354 1.627 0.775 1.627 Table 3.9: Temperature differences (in absolute values) between two projection periods (IS and CRPS) do not have explicit pattern. Scenario AR(1)-AR(1) model attains decent validation metrics, and beats WN-WN model with respect to the ECPs, which assures to incorporate memories to more suitably quantify uncertainties. For the coefficients part, the values of α1 are much larger for the models with a white noise in the proxy equation, because the noise is anticipated to take a part of the signal. Among the forcings, the greenhouse gas concentration (βC ) is always the highest with low standard deviation, which identifies its statistical significance. This is also reasonable because it explains the temperature’s increase over the last century [72,100]. Solar coefficient (βS) is 45 H WN AR(1) AR(2) fGn K WN AR(1) AR(2) fGn WN AR(1) AR(2) fGn WN AR(1) AR(2) fGn WN AR(1) AR(2) fGn H WN AR(1) AR(2) fGn α0 -0.015 (0.008) -0.029 (0.005) -0.028 (0.005) -0.024 (0.005) -0.234 (0.034) -0.188 (0.028) -0.076 (0.022) -0.171 (0.034) -0.204 (0.040) -0.179 (0.055) -0.185 (0.044) -0.170 (0.062) -0.167 (0.313) -0.153 (0.159) -0.081 (0.103) -0.133 (0.158) α1 0.684 (0.008) 0.573 (0.005) 0.587 (0.005) 0.629 (0.005) 0.230 (0.034) 0.266 (0.028) 0.253 (0.022) 0.285 (0.034) 0.224 (0.040) 0.268 (0.055) 0.257 (0.044) 0.282 (0.062) 0.190 (0.313) 0.202 (0.159) 0.214 (0.103) 0.247 (0.158) β0 -0.474 (0.018) -0.538 (0.038) -0.526 (0.045) -0.444 (0.210) -0.053 (0.986) -0.550 (0.047) -1.006 (0.180) -0.574 (0.086) -0.543 (0.043) -0.535 (0.055) -0.550 (0.047) -0.531 (0.122) -0.542 (0.044) -0.648 (0.155) -1.086 (0.249) -0.618 (0.142) βS 0.026 (0.018) 0.046 (0.038) 0.041 (0.045) 0.007 (0.210) -0.012 (0.986) 0.084 (0.047) 0.071 (0.180) 0.071 (0.086) 0.089 (0.043) 0.077 (0.055) 0.083 (0.047) 0.048 (0.122) 0.078 (0.044) 0.065 (0.155) 0.075 (0.249) 0.048 (0.142) βV 0.010 (0.018) -0.008 (0.038) -0.008 (0.045) -0.005 (0.210) -0.413 (0.986) -0.017 (0.047) -0.019 (0.180) -0.018 (0.086) -0.018 (0.043) -0.017 (0.055) -0.018 (0.047) -0.017 (0.122) -0.015 (0.044) -0.020 (0.155) -0.022 (0.249) -0.018 (0.142) βC 0.127 (0.018) 0.153 (0.038) 0.158 (0.045) 0.151 (0.210) 0.895 (0.986) 0.113 (0.047) 0.278 (0.180) 0.130 (0.086) 0.107 (0.043) 0.113 (0.055) 0.114 (0.047) 0.132 (0.122) 0.113 (0.044) 0.159 (0.155) 0.291 (0.249) 0.154 (0.142) K WN AR(1) AR(2) fGn WN AR(1) AR(2) fGn WN AR(1) AR(2) fGn WN AR(1) AR(2) fGn RMSE ECP (80) ECP (95) 0.126 0.118 0.115 0.106 0.169 0.158 0.297 0.167 0.157 0.156 0.153 0.172 0.159 0.203 0.310 0.178 0.65 0.63 0.65 0.68 0.90 0.92 0.90 0.95 0.92 0.97 0.97 0.99 0.92 0.99 0.92 0.99 0.72 0.50 0.52 0.46 0.63 0.69 0.64 0.76 0.70 0.82 0.76 0.88 0.73 0.90 0.82 0.83 IS (80) 0.126 0.224 0.213 0.185 0.204 0.172 0.347 0.169 0.170 0.147 0.147 0.152 0.167 0.176 0.319 0.161 IS (95) CRPS 0.036 0.056 0.074 0.122 0.071 0.114 0.090 0.064 0.082 0.072 0.071 0.053 0.138 0.110 0.050 0.073 0.070 0.055 0.064 0.049 0.048 0.064 0.067 0.052 0.070 0.049 0.095 0.065 0.106 0.012 0.070 0.054 Table 3.10: Gibbs sampler convergence for the posterior means and (standard deviations) of model coefficients in Scenario D as well as validation metrics for each memory combination 46 the same order of magnitude even though it is smaller than βC , which is in accordance with the solar activity has a non-negligible influence on the climate before greenhouse gas explodes [71]. The volcanic coefficient (βV ) is always negative with probability greater than 99% in most models. Considering we apply a decreasing transformation on volcanism log(−Vt + 1), which means that the volcanic activity is expected to slightly warm the planet. However, the volcanic eruptions are known to produce an overall global cooling effect [105]. It cannot have any impact beyond five or six years since eruptions are represented as short-memory spikes (in Vt). Therefore, long-period volcanic activity is more appropriately addressed by autoregression proposed in our study. In sum, increasing the memory parameters would decrease all the coefficients except for volcanism. It is coherent because increasing memory grants more weights to the past climate forcing factors and temperature in the model. The stability of the volcanic coefficient indicates that stronger memories help taking the impact of past eruptions over the following years into consideration. 3.5 Conclusions and Discussions In the study, we build a multi-level stochastic model to reconstruct NH temperature anomalies over the past millennium and to make projections for this century, by incorporating natural climate forcings (solar irradiance, volcanism, greenhouse gases) with memories in both levels. Bayesian inference is adopted to systematically estimate the magnitude of all unknown quantities as well as model uncertainties. Moderate memory is suggested by the validation metrics in comparisons from both re- constructions and projections. The memory inclusion in our model (autoregression) not only 47 allows for the influence of external forcings at any given year on the years after, but also strengthens the overall decreasing trend appeared in reconstructions (Figure 3.3). It gener- ates lower temperatures (especially before Year 1900) compared to [41], noticing the model and data are different though. The projection deviation between this paper and CMIP5 is proportional to the greenhouse gas growth amplitude, which may be a consequence of the solar activity prediction took from the CMIP6, where the solar decay over the next century is a novelty that CMIP5 did not take into account [38]. We believe that the solar activ- ity’s decay is more impactful on the temperature when the greenhouse gas concentration is smaller, which explains the evolution of the gap. There are a few possible extensions of this paper. One is to simultaneously evaluate mem- ory parameters (a and b) with other prior distributions involving model coefficients (α and β), which most likely would cause higher computational demand. Another possibility is to include spatiotemporal component by designing a spatial pattern for proxies over time [59], providing more smooth reconstructions [107]. This implementation requires more compre- hensive understandings of climatic system to propose a more proper and feasible temporal covariance structure for the model, which probably might simplify computations and create further scientific insights. 48 Chapter 4 Predictive-distribution Approximation via Wiener Chaos 4.1 Preliminaries Assume we have the simplest linear regression model: k(cid:88) Yi = Xijβj + σεi (4.1) j=1 where i = 1, 2, ..., n, j = 1, 2, ..., k, and εi i.i.d.∼ N (0, 1). Let X, Y, βββ, εεε denote Xij, Yi, βj, εi respectively in matrix forms. The model (4.1) is rewritten as: Y = Xβββ + σεεε The log-likelihood function of the parameters (βββ, σ2) up to a constant term is: l(βββ, σ2) = −nln(σ) − 1 2σ2 (Y − Xβββ)2 (4.2) (4.3) 49 Then the maximum likelihood estimator (MLE) of (βββ, σ2) can be derived from (4.3) by letting ∂l(βββ,σ2) ∂σ2 = 0. Thus, the MLEs of the parameters are ∂βββ = 0 and ∂l(βββ,σ2)  ˆβββ = (XTX)−1XTY (Y − Xˆβββ)2 (cid:99)σ2 = 1 n (4.4) ˆβββ is known as the ordinary linear squared (OLS) estimator, which is unbiased for βββ (i.e., E(ˆβββ) = βββ). However,(cid:99)σ2 is biased for σ2. An unbiased estimator of σ2 (i.e., E(s2) = σ2) is s2 = where ˆβββ is independent from s2 and YTY − ˆβββ T XTY n − (k + 1) (n − (k + 1))s2 σ2 ∼ χ2 n−(k+1) The variance-covariance matrices of (ˆβββ, s2) are  C(ˆβββ) = (XTX)−1XTV(Y)((XTX)−1XT)T = σ2(XTX)−1 := C V(s2) = V(cid:16) (cid:1) = V(cid:0)χ2 χ2 n−(k+1) n−(k+1) (cid:17) σ4 σ2 n − (k + 1) = (n − (k + 1))2 (4.5) (4.6) (4.7) 2σ4 n − (k + 1) 50 4.2 Approximation Error 4.2.1 Wiener-chaos Expansion Intuitively, s2 is an unbiased estimator of σ2, so we decide to use s2 to replace σ2 in C(ˆβββ) and in V(s2). Then the error of the approximation es2 for V(s2) is es2 = 2σ4 n − (k + 1) − 2s4 n − (k + 1) Now let d = n − (k + 1). (4.6) becomes where E(Z) = d, V(Z) = 2d. Then Z := ds2 σ2 ∼ χ2 d ˜Z := Z − d ∼ χ2 d (4.8) (4.9) (4.10) is a centered chi-squared distribution (i.e., E( ˜Z) = 0) with degrees of freedom d. In the classic Wiener space L2[0, 1], a centered Gaussian family G(ψ), ψ ∈ L2[0, 1] of random variables can be identified as the stochastic differential of a Wiener process W : (cid:90) 1 G(ψ) := ψ(s)dW (s) (4.11) 0 In order to find how big the approximation error in (4.8) is, it is convenient to represent 51 ˜Z in terms of Wiener integrals for computational sake, that is ˜Z = ds2 σ2 − d D = (G2 i − 1) d(cid:88) i=1 where (cid:90) 1 0 Gi = εi(s)dW (s) := W (εi) = IW 1 (εi) and εi, i ≥ 1 are orthonormal family existed on L2[0, 1]. By contraction rule, (4.12) (4.13) W 2(εi) − 1 = IW 2 (ε⊗2 i ) (4.14) Hence, applying product formula and (4.14) to ˜Z2, we have )2 − IW 2 (ε⊗2 i − ε⊗2 j j )2(cid:3) 1 4 i 4 j (G2 ) = j i=1 j=1 i=1 j=1 i=1 j=1 i=1 j=1 i=1 j=1 i − 1)(G2 j − 1) = d(cid:88) d(cid:88) d(cid:88) d(cid:88) 2 (ε⊗2 2 (ε⊗2 IW )IW d(cid:88) d(cid:88) (cid:2)IW (cid:0)(ε⊗2 i + ε⊗2 ) ⊗ (ε⊗2 d(cid:88) d(cid:88) )(cid:1)(cid:3) − 1 i + ε⊗2 (cid:0)(ε⊗2 i − ε⊗2 d(cid:88) d(cid:88) (cid:2)4IW 4 (ε⊗2 (cid:18) d(cid:88) d(cid:88) d(cid:88) d(cid:88) (cid:0)W 2(εi) − 1(cid:1)(cid:0)W 2(εj) − 1(cid:1) d(cid:88) d(cid:88) (cid:2)IW 2 (ε⊗2 i + ε⊗2 )(cid:1) + 4IW (cid:0)(ε⊗2 i + ε⊗2 i + ε⊗2 )(cid:1) (cid:2)IW (cid:0)(ε⊗2 i − ε⊗2 i − ε⊗2 )(cid:1)(cid:3) i − ε⊗2 ) ⊗1 (ε⊗2 )(cid:3) i ⊗1 ε⊗2 2 (ε⊗2 (cid:18) d(cid:88) (cid:19) d(cid:88) i ⊗1 ε⊗2 ε⊗2 εi ⊗ εi ⊗ εj ⊗ εj i ⊗ ε⊗2 ) ⊗ (ε⊗2 ) + 16IW + 4IW 2 + 4IW 2 ⊗1 (ε⊗2 (cid:19) i=1 j=1 i=1 j=1 j 2 j j 1 4 1 4 4 j 4 j ) j j j j j ˜Z2 − E( ˜Z2) D = = = = = IW 4 i=1 j=1 i=1 j=1 (4.15) 52 where i ⊗1 ε⊗2 ε⊗2 j (s1, s2) = (cid:90) 1 0 εi(s1)εi(s)εj(s2)εj(s)ds = 0 εi(s1)εj(s2) = εi ⊗ εi i (cid:54)= j i = j (4.16) Note that E( ˜Z2) = V( ˜Z) + E2( ˜Z) = 2d. Therefore, ˜Z2 becomes (cid:18) d(cid:88) d(cid:88) ˜Z2 D = IW 4 (cid:19) (cid:18) d(cid:88) (cid:19) + 2d (4.17) εi ⊗ εi ⊗ εj ⊗ εj + 4IW 2 εi ⊗ εi i=1 j=1 i=1 4.2.2 Error Magnitude The error (4.8) in the replacement of σ2 by s2 for V(s2) is equal to − 2 d (cid:18) 2σ4 d es2 = D = −2σ4 d3 IW 4 (cid:17)2 (cid:16)σ2( ˜Z + d) (cid:16) d(cid:88) d(cid:88) d i=1 j=1 = −2σ4 d3 ( ˜Z2 + 2 ˜Zd) (cid:17) εi ⊗ εi ⊗ εj ⊗ εj + (2d + 4)IW 2 (cid:17)(cid:19) εi ⊗ εi (cid:16) d(cid:88) i=1 (4.18) with E(es2) = 0 because it only contains the sums of chaos terms. For any function g and q ≥ 1, V(Iq(g)) = q!(cid:107)g(cid:107)2 L2([0,1]q) (4.19) 53 Then the variance of the approximation given (4.19) is V( ˜Z2) = 4! (cid:13)(cid:13)(cid:13)(cid:13) d(cid:88) d(cid:88) d(cid:88) d(cid:88) j=1 i=1 (cid:13)(cid:13)(cid:13)(cid:13)2 εi ⊗ εi ⊗ εj ⊗ εj d(cid:88) d(cid:88) d(cid:88) d(cid:88) 1 + 32 i(cid:48)=1 i=1 j=1 j(cid:48)=1 i(cid:48)=1 i=1 = 24 (cid:13)(cid:13)(cid:13)(cid:13) d(cid:88) i=1 (cid:13)(cid:13)(cid:13)(cid:13)2 L2([0,1]2) εi ⊗ εi + 16 × 2! L2([0,1]4) 1 (4.20) = 24d2 + 32d Apply the results in (4.20), the variance of es2 is d3 (cid:16) − 2σ4 (cid:17)2(cid:0)24d2 + (2d + 4)2 × 2d(cid:1) (cid:0)8d3 + 56d2 + 32d(cid:1) d3 + O(cid:0) 1 4σ8 d6 32σ8 (cid:1) d4 V(es2) = = = (4.21) Similarly, we employ the same methodology to C to find its approximation error eˆβββ . eˆβββ = (σ2 − s2)(XTX)−1 (cid:17) (cid:16) σ2 − σ2( ˜Z + d) = = −σ2 ˜Z(XTX)−1 d D = −σ2 d (cid:16) d(cid:88) IW 2 d εi ⊗ εi i=1 (XTX)−1 (cid:17) (XTX)−1 where E(eˆβββ ) = 0 since it only includes the second-chaos terms. The variance of eˆβββ (cid:16) − σ2 (cid:17)2 × 2 × V(cid:0)(XTX)−1(cid:1) V(cid:0)(XTX)−1(cid:1) d 2σ4 d2 V(eˆβββ ) = = 54 (4.22) is (4.23) 4.3 Prediction Evaluation 4.3.1 Explicit Boundaries Considering prediction for a new observation X∗. The density of (cid:112) σ2ε∗ = law law 2σ4 d (cid:18)(cid:114) N(cid:0)σ2, (cid:18)(cid:115) (cid:114) (cid:114) 2 (cid:18) (cid:16) (cid:18) σε∗(cid:16) σ2 + 1 + 1 + 1 2 σ (cid:1) × ε∗(cid:19) η × ε∗(cid:19) (cid:17) × ε∗(cid:19) (cid:17)(cid:19) 2σ4 d η d 1√ 2d η = law ≈ law = law √ σ2ε∗ is (4.24) where η ∼ N (0, 1). ε∗ and εεε are independent, which is consistent with εi Based on the approximation error calculated above, replacing σ2 with s2 (data-base value), the approximate distribution of Y ∗ from (4.2) is i.i.d.∼ N (0, 1). (cid:99)law Y ∗ = law (X∗ ˆβββ + (cid:112) s2ε∗)  ˆβββ ∼ N(cid:0)(XTX)−1XTY,(cid:98)C(cid:1) s2 ∼ N(cid:0)s2, (cid:1) 2s4 d with where (cid:98)C = s2(XTX)−1. (4.25) (4.26) (4.27) Thus, the predictive distribution of Y ∗ in (4.25) is approximated by convolution as (cid:16) sε∗(cid:0)1 + 1√ 2d η(cid:1)(cid:17) (cid:99)law Y ∗ ≈ N(cid:0)µY ∗, σ2 Y ∗(cid:1) (cid:126) law 55 with Define µY ∗ = X∗(cid:0)XTX(cid:1)−1XTY Y ∗ = X∗(cid:98)CX∗T σ2 U1 = sε∗ ∼ N (0, s2), U2 = 1 + 1√ 2d η ∼ N (1, 1 2d ) U1 and U2 are independent by definition. Then U := U1 × U2 = sε∗ + ε∗η s√ 2d (4.28) (4.29) (4.30) is a product of two independent normal random variables. Lemma 4.3.1. ηε∗ is a product of two independent standard normal variables, which can be represented in the second Wiener chaos as follows: (cid:0)(η(cid:48))2 − (ε∗(cid:48))2(cid:1) ηε∗ = 1 2 where η = 1√ 2 (η(cid:48) + ε∗(cid:48)) and ε∗ = 1√ 2 (η(cid:48) − ε∗(cid:48)). η(cid:48), ε∗(cid:48) ∼ N (0, 1) are independent. The distribution of U can be seen as a non-convolution sum of two coupled (due to the common ε∗) random variables. The first piece is N (0, s2), the second piece is a product nor- mal, which can be characterized by two independent chi-square distributions (from Lemma 4.3.1), each with degree of freedom 1 (i.e., χ2(1)), and a scaling factor s√ 8d . Corollary 4.3.2. When σ2 = 1 (i.e., X is standard normal), for  > 0 − (+1)2 e 2 ≤ P(X ≥ ) ≤ 1 2 1 2 − 2 e 2 56 Proof. P(X ≥ ) = = ≥ − x2 2 dx − (x+)2 2 dx 1√ 2π 1√ 2π e e − (x+)2 e 2 dx 1√ 2π − 2+2 2 ≥ 0.34e − (+1)2 e 2 ≥ 1 2 (cid:90) ∞ (cid:90) ∞ (cid:90) 1 0  0 (cid:90) ∞ (cid:90) ∞ 0 − (x+)2 2 e dx − x2+2 e 2 dx 1√ 2π 1√ 2π P(X ≥ ) = ≤ = 0 1 2 − 2 e 2 Equivalently, P(X ≥ ) ≤ 1 2 e − 2 2 ≤ P(X ≥  − 1) (4.31) showing the upper bound is between the tail probabilities within one standard deviation. Hence, the cumulative distribution function (CDF) of U , namely ΦU (u) = P(U ≤ u), u > 0 is computed as ΦU (u) = 1 − P(U ≥ u) = 1 − Eη (cid:104)P ε∗(cid:0)ε∗ ≥ ε∗(cid:0)ε∗ < = 1 − Eη + P u s(1 + η√ 2d ) ηε∗ ≥ u(cid:12)(cid:12)η(cid:1)(cid:105) s√ 2d (cid:104)P ε∗(cid:0)sε∗ + (cid:12)(cid:12)η(cid:1)1{η>−√ (cid:105) (cid:12)(cid:12)η(cid:1)1{η<−√ 2d} ) u s(1 + η√ 2d 2d} (4.32) 57 Remark. η ∼ N (0, 1). Suppose d > 4, ε∗(cid:0)ε∗ < P (cid:12)(cid:12)η(cid:1) < P u s(1 + η√ 2d ) ε∗(ε∗ < 0) < 0.5 √ P(η < − 2d) < P(η < −3) < 0.0015 which makes it negligible compared to the first term inside Eη. Based on Corollary 4.3.2, the bounds of ΦU (u) are Φ1(u, d) ≤ ΦU (u) ≤ Φ2(u, d) (4.33) with Φ1(u, d) = 1 − Eη Φ2(u, d) = 1 − Eη )2(cid:21) )2 (cid:21) )]2 − u2 2s2 (1+ 1 η√ 2d − [u+s(1+ 2s2(1+ η√ 2d η√ 2d 2 (cid:20)1 (cid:20)1 (cid:114) 2 e e Theorem 4.3.3. s = YTY−ˆβββ d T XTY defined in (4.5). (cid:90) ∞ −√ (cid:90) ∞ −√ = 1 − 1 2 1√ 2π e 2d = 1 − 1 2 1√ 2π e 2d − u2 2s2 − x2 2 e 1 (1+ x√ 2d )2 dx − [u+s(1+ x√ 2d 2s2(1+ x√ 2d )]2 )2 − x2 2 e dx (4.34) 1 − 1 2 e − u2 2s2 ≤ lim d→∞ ΦU (u) ≤ 1 − 1 2 − (u+s)2 2s2 e Proof. Φ1(u, d): For each x ∈ (−√ 2d,∞), − u2 2s2 − x2 2 e lim d→∞ e 1 (1+ x√ 2d )2 − x2 2 e − u2 2s2 = e 58 Also, (cid:12)(cid:12)(cid:12)(cid:12)e − x2 2 e − u2 2s2 1 (1+ x√ 2d )2(cid:12)(cid:12)(cid:12)(cid:12) ≤ e − x2 2 , (cid:90) ∞ −∞ e √ − x2 2 = 2π < ∞ By Dominated Convergence Theorem, d→∞ Φ1(u, d) = 1 − 1 lim 2 (cid:90) ∞ −∞ 1√ 2π − x2 2 e lim d→∞ e − u2 2s2 1 (1+ x√ 2d )2 dx = 1 − 1 2 − u2 2s2 e In similar, for Φ2(u, d), using Dominated Convergence Theorem, − [u+s(1+ x√ 2d 2s2(1+ x√ 2d )]2 )2 − x2 2 e − (u+s)2 2s2 , = e − x2 2 e lim d→∞ e (cid:12)(cid:12)(cid:12)(cid:12)e − x2 2 e − [u+s(1+ x√ 2d 2s2(1+ x√ 2d )]2 )2 (cid:12)(cid:12)(cid:12)(cid:12) ≤ e − x2 2 (cid:90) ∞ −∞ ⇒ lim d→∞ Φ2(u, d) = 1 − 1 Therefore, when d −→ ∞, 2 1√ 2π lim d→∞ e − x2 2 e − [u+s(1+ x√ 2d 2s2(1+ x√ 2d )]2 )2 dx = 1 − 1 2 − (u+s)2 2s2 e 1 − 1 2 e − u2 2s2 = lim d→∞ Φ1(u, d) ≤ lim d→∞ ΦU (u) ≤ lim d→∞ Φ2(u, d) = 1 − 1 2 − (u+s)2 2s2 e Lemma 4.3.4. Let X ∼ N (µ, σ2) be a Gaussian random variable. For any non-negative integer m, the m-th moment of X is  E(Xm) = , 3 2 ,− µ 2σ2 ), m is odd (4.35) m is even 1 − m 2 ,− µ 2σ2 ), µσm−12 m+1 2 Γ( m 2 + 1)√ π M ( σm2 m 2 Γ( m+1 2 )√ π M (−m 2 , 1 2 59 (cid:90) ∞ 0 Γ(z) = tz−1e−tdt is the gamma function and M (a, b, z) = ∞(cid:88) n=0 a(a + 1)··· (a + n − 1)zn b(b + 1)··· (b + n − 1)n! is the Kummer’s confluent hypergeometric function. In particular, if µ = 0, E(Xm) = (cid:90) ∞ √ −∞ xm 1 2π = σm(m − 1)!! σ − x2 2σ2 dx e (4.36) when m is even. !! denotes double factorial, all odd moments are 0. Theorem 4.3.5. s = (same as in Theorem 4.3.3). For any u > 0, (cid:114) (cid:18) YTY−ˆβββ d T XTY − (u+s)2 e 1 − 1 2 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)ΦU (u) − (cid:19)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ δ2(u, d) − (u+s)2 2s2 e πd √ 4s2 2s2 − u(u + s)(2 − e−d) s + 1)2(cid:3) u2(cid:2)3 + ( u − (u+s)2 2s2 e δ2(u, d) = 8s2d where The proof uses second-order Taylor series expansion on e − (u+s)2 2s2 and Lemma 4.3.4 (see Appendix D for further reference). Finally, Y ∗ can be divided as the following: Y ∗ ≈ X∗ ˆβββ + = µY ∗ + σY ∗ε(cid:48) + sε∗(cid:0)1 + = µY ∗ +(cid:0)σY ∗ε(cid:48) + sε∗(cid:1) + η(cid:1) ηε∗ 1√ 2d s√ 2d (4.37) (cid:112) s2ε∗ 60 where ε(cid:48) ∼ N (0, 1) is independent from ε∗ and η. The CDF of V := σY ∗ε(cid:48) + U (4.38) is ΦV (v) = P(V ≤ v), v > 0 derived as ΦV (v) = P(σY ∗ε(cid:48) + U ≤ v) = E ε(cid:48) (cid:105) (cid:104)P(U ≤ v − σY ∗ε(cid:48)) (cid:104) ΦU (cid:0)v − σY ∗ε(cid:48)(cid:1)(cid:105) = E ε(cid:48) Thus, according to Theorem 4.3.5, the bound of ΦV (v) is (cid:104) ΦV (v) ≤ Ex − (v−xσY ∗ +s)2 2s2 − (v+s−xσY ∗ )2 1 − 1 2 e (cid:90) ∞ (cid:90) ∞ −∞ e = 1 − 1 2 e −∞ 1√ = 1 − 1 2 2π − (v+s)2 2(s2+σ2 √ 2 − (v+s)2 2(s2+σ2 = 1 − e 2π Y ∗) (cid:113) Y ∗) Y ∗ s2 + σ2 = 1 − se 2 (cid:105) + δ(cid:48) 2(v − xσY ∗, d) − 1 2 2s2 (cid:2)(cid:0)1+ (cid:90) ∞ −∞ e + Ex 2 dx + Ex (v+s)σY ∗ x+ (cid:0)x− (v+s)σY ∗ Y ∗ s2+σ2 s2 − x2 e 1√ 2π Y ∗ σ2 s2 (cid:1)x2−2 (cid:2) s2+σ2 (cid:2)δ(cid:48) 2(v − xσY ∗, d)(cid:3) − 1 2 Y ∗ s2 (cid:3) (v+s)2 (cid:2)δ(cid:48) 2(v − xσY ∗, d)(cid:3) (cid:2)δ(cid:48) 2(v − xσY ∗, d)(cid:3) (cid:2)δ(cid:48) 2(v − xσY ∗, d)(cid:3) dx + Ex dx + Ex (cid:1)2(cid:3) s2 given δ(cid:48) 2(u, d) = δ2(u, d) + − (u+s)2 2s2 e (4.39) (4.40) u(u + s)(2 − e−d) √ πd 4s2 61 Denote the density function of X(cid:48) ∼ N (µX(cid:48), σ2 X(cid:48)) as fX(cid:48)(x) = − (x−µX(cid:48) )2 X(cid:48) 2σ2 √ 1 σX(cid:48) e 2π  µX(cid:48) = σ2 X(cid:48) = (v + s)σY ∗ Y ∗ s2 + σ2 s2 Y ∗ s2 + σ2 (4.41) Using Lemma 4.3.4, we have 2(v − xσY ∗, d)(cid:3) = (cid:2)δ(cid:48) Ex = = + + 2s2 π Y ∗) √ π dx √ π s s (cid:90) ∞ − x2 2√ 2π (v − xσY ∗)e 4s2 √ 2 d − (v+s)2 2(s2+σ2 e − (v−xσY ∗ +s)2 √ d (cid:18)(v − xσY ∗ + s)(2 − e−d) + 1(cid:1)2(cid:105)(cid:19) e (cid:104) 3 +(cid:0) v − xσY ∗ (cid:18)(v − xσY ∗ + s)(2 − e−d) (cid:90) ∞ −∞(v − xσY ∗) + 1(cid:1)2(cid:105)(cid:19) (cid:104) 3 +(cid:0) v − xσY ∗ (cid:26)2 − e−d√ (cid:104) (cid:16) − 3v2σY ∗µX(cid:48) + 3vσ2 −∞ v − xσY ∗ (cid:113) 4s v − xσY ∗ (cid:113) X(cid:48)(cid:1)(cid:105) Y ∗ ) d(s2 + σ2 √ 2 d − (v+s)2 2(s2+σ2 e Y ∗ ) Y ∗) d(s2 + σ2 √ 1 2 v2 − 2vσY ∗µX(cid:48) + σ2 fX(cid:48)(x)dx Y ∗(cid:0)µ2 v(v + s) − (2v + s)σY ∗µX(cid:48) + σ2 X(cid:48)(cid:1)(cid:17) (cid:16) Y ∗(cid:0)µ2 X(cid:48)(cid:1)(cid:17) X(cid:48)(cid:1) − σ3 Y ∗(cid:0)µ3 X(cid:48)(cid:1) − 4vσ3 Y ∗(cid:0)µ2 Y ∗(cid:0)µ3 X(cid:48)(cid:1)(cid:17)(cid:105)(cid:27) 2 s X(cid:48) + 3µX(cid:48)σ2 X(cid:48) + 3σ4 X(cid:48) + 6µ2 X(cid:48) + σ2 X(cid:48) + σ2 X(cid:48)σ2 X(cid:48) v3 + 4 Y ∗(cid:0)µ2 X(cid:48)(cid:1) + σ4 Y ∗(cid:0)µ4 X(cid:48) + σ2 v4 − 4v3σY ∗µX(cid:48) + 6v2σ2 1 s2 + 3µX(cid:48)σ2 + (cid:16) (cid:104) d 4s + σ2 + X(cid:48) (4.42) 62 Hence, (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)ΦV (v) − (cid:18) (cid:113) − (v+s)2 2(s2+σ2 Y ∗) Y ∗ s2 + σ2 1 − se 2 with a precision as (cid:113) − (2 − e−d)e − (v+s)2 2(s2+σ2 Y ∗) Y ∗) 4s πd(s2 + σ2 − (2v + s)σY ∗µX(cid:48) + σ2 X(cid:48) + σ2 v(v + s) (cid:104) Y ∗(cid:0)µ2 X(cid:48)(cid:1)(cid:105)(cid:19)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ δ(v, d) (4.43) (4.44) δ(v, d) = v2 − 2vσY ∗µX(cid:48) + σ2 (cid:104) (cid:16) e − (v+s)2 2(s2+σ2 Y ∗) Y ∗ s2 + σ2 (cid:113) − 3v2σY ∗µX(cid:48) + 3vσ2 8sd 4 (cid:16) X(cid:48) + σ2 v4 − 4v3σY ∗µX(cid:48) + 6v2σ2 Y ∗(cid:0)µ2 X(cid:48)(cid:1) + σ4 Y ∗(cid:0)µ4 1 s2 + 3µX(cid:48)σ2 + 2 s X(cid:48) + σ2 (cid:16) X(cid:48)(cid:1)(cid:17) Y ∗(cid:0)µ2 X(cid:48)(cid:1)(cid:17) X(cid:48)(cid:1) − σ3 Y ∗(cid:0)µ3 Y ∗(cid:0)µ2 X(cid:48)(cid:1) − 4vσ3 Y ∗(cid:0)µ3 X(cid:48)(cid:1)(cid:17)(cid:105) X(cid:48) + 3µX(cid:48)σ2 X(cid:48) + 3σ4 X(cid:48) + σ2 X(cid:48) X(cid:48) + 6µ2 X(cid:48)σ2 + v3 and µX(cid:48), σ2 X(cid:48) defined in (4.41). The sharpness of the bound (4.43) is proved in Appendix E. Let g(v, d) = v(v + s) − (2v + s)σY ∗µX(cid:48) + σ2 X(cid:48) + σ2 (4.45) Y ∗(cid:0)µ2 X(cid:48)(cid:1) be the non-normal correction for ΦV (v). For the sake of simplicity and also without loss of generality, removing e−d, we have eV = 1 − e 2 (cid:113) − (v+s)2 2(s2+σ2 Y ∗) Y ∗ s2 + σ2 (cid:16) s + √ 1 πd s (cid:17) g(v, d) (4.46) 63 Plug in (4.41) to expand g(v, d). g(v, d) = = = s2 (s2 + σ2 s2 (s2 + σ2 s4 (s2 + σ2 Y ∗)2 Y ∗)2 Y ∗)2 (cid:2)s2v2 + (s3 − sσ2 (cid:104) s2(cid:0)v + (cid:0)v + Y ∗(cid:3) (cid:1)2 − (s2 − σ2 (cid:1)2 + Y ∗ − s2) Y ∗) Y ∗)v + σ4 s2 − σ2 Y ∗ 2s s2 − σ2 Y ∗ 2s s2(3σ2 4(s2 + σ2 4 Y ∗)2 (cid:105) Y ∗ + σ4 (4.47) is increasing in v when v > − s2−σ2 Y ∗ 2s , so is eV . Denote the precision term δ(v, d) in (4.44) as δ(v, d) = e (cid:113) − (v+s)2 2(s2+σ2 Y ∗ ) Y ∗ s2 + σ2 8sd (cid:16) (cid:17) h1(v, d) + h2(v, d) + h3(v, d) where h1(v, d) = 4 h2(v, d) = h3(v, d) = X(cid:48) + σ2 (cid:104) (cid:104) v2 − 2vσY ∗µX(cid:48) + σ2 (cid:104) v3 − 3v2σY ∗µX(cid:48) + 3vσ2 v4 − 4v3σY ∗µX(cid:48) + 6v2σ2 X(cid:48)(cid:1)(cid:105) Y ∗(cid:0)µ2 Y ∗(cid:0)µ2 X(cid:48)(cid:1) − σ3 Y ∗(cid:0)µ3 Y ∗(cid:0)µ2 X(cid:48)(cid:1) − 4vσ3 Y ∗(cid:0)µ3 X(cid:48)(cid:1)(cid:105) X(cid:48) + 3σ4 Y ∗(cid:0)µ4 X(cid:48) + σ2 X(cid:48) + 6µ2 X(cid:48)σ2 X(cid:48) + σ2 + σ4 2 s 1 s2 X(cid:48)(cid:1)(cid:105) X(cid:48)(cid:1) X(cid:48) + 3µX(cid:48)σ2 X(cid:48) + 3µX(cid:48)σ2 (4.48) (4.49) 64 Using the similar expansion in (4.47), h1(v, d) = = h2(v, d) = h3(v, d) = 4s2 (s2 + σ2 4s4 (s2 + σ2 2 (s2v2 − 2σ2 Y ∗ s Y ∗)2 Y ∗)3 Y ∗)2 (cid:0)v − σ2 (cid:1)2 + (cid:104)(cid:0)s6 + s4σ2 Y ∗(cid:1)v − s3σ4 Y ∗(cid:0)s2 + 2σ2 (cid:104) s8v4 − 4(cid:0)s7σ2 Y ∗(cid:1)v + s4σ4 Y ∗)4 Y ∗ + 4s5σ6 1 Y ∗) Y ∗sv + s2σ2 Y ∗ + 2σ4 Y ∗ 4s2σ2 Y ∗ s2 + σ2 Y ∗ + σ6 Y ∗ + 2s2σ4 Y ∗(cid:1)v3 − 3s5σ2 Y ∗(cid:1)(cid:105) Y ∗(cid:0)3s2 + 4σ2 Y ∗(cid:1)v3 + 6(cid:0)s8σ2 Y ∗(cid:0)3s4 + 12s2σ2 Y ∗ + 10σ4 Y ∗ + 3sσ8 s(s2 + σ2 + 3s4σ2 s2(s2 + σ2 − 4(cid:0)3s7σ4 Y ∗v2 Y ∗(cid:1)v2 Y ∗ + 2s6σ4 Y ∗(cid:1)(cid:105) Plug (4.50) into (4.48), δ(v, d) is δ(v, d) = e − (v+s)2 2(s2+σ2 (cid:104) + 2(cid:0)2s10 + 10s8σ2 Y ∗) Y ∗) 8s3d(s2 + σ2 9 2 s8v4 + 2(cid:0)s9 + 3s5σ4 Y ∗(cid:1)v3 Y ∗ − 5sσ8 Y ∗ + 11s6σ4 Y ∗(cid:0)4s6 + 13s4σ2 Y ∗ + 18s2σ4 + s4σ2 Y ∗ + 3s3σ6 Y ∗(cid:1)v2 − 2(cid:0)s9σ2 Y ∗(cid:1)(cid:105) Y ∗ + 10σ6 Y ∗ + 5s7σ4 Y ∗ + 6s5σ6 Y ∗(cid:1)v (4.50) (4.51) which shows that except for the well-expected 1 d, δ(v, d) is free of d. 4.3.2 Simulation Results Now we choose different values for s and σ2 Y ∗ under certain circumstances. Note that δ(v, d) is homogeneous degree 0 (i.e., dimension-free). Given a standardized data set, the values of v are one-standard-deviation percentile and two-standard-deviation percentile at α = 2.5%, 16%. As we can see from Table 4.1, when n is large, the precision of the approxi- 65 mation is quite high because δ(v, d) is extremely small (roughly 0.1% and 0.5% of eV in each respective scenario). We may lose some degree of accuracy when the significance level α is increasing. Even if n is not large enough compared to the number of explanatory variables k, our approximation is still reliably precise (later blocks in Table 4.1). The non-normal correc- tion eV is monotonously increasing when the magnitude of new noise σ2 The largest percentile v appears at s2 is comparable with σ2 Y ∗ becomes bigger. Y ∗, but overall not affected much by the the noise scale, which indicates its consistency. 4.4 Discussions In this chapter, not only do we construct an explicit sequence of closed-form functions to approximate the true predictive distribution of the response variable, but also we provide a comprehensive analysis on the approximation based on Wiener’s polynomial chaos. The approximation is with a great deal of accuracy, and the unbiased estimation convergences in a relatively-fast fashion. The biggest advantage of this approximation methodology is to avoid numeric sampling algorithms such as Markov chain Monte Carlo, which reduces the computational burden to a certain level [108]. Also, the boundary formulations as well as the asymptotic properties yield the benefits to closely monitor the performance of predictions. Second Wiener chaos is a linear space, and because the model is linear, its solution lies in the same chaos. In many practical situations, however, incomplete or inaccurate statistical knowledge about parameters’ uncertainties limits the utility of high-order polynomial chaos expansions [109]. Fortunately, in order to create a finite-order expansion, we just need some reliable information on the probability measure that can be represented by a finite 66 number of moments. One of the possible extensions is to explore the behavior in a nonlinear- model setting such as involving stochasticity. This may require the derivation of maximum likelihood estimators under approximated log-likelihood functions [110]. 67 α n k s2 0.5% 1000 3 0.15 2.5% 1000 3 0.15 10% 1000 3 0.15 16% 1000 3 0.15 0.5% 200 12 1.50 2.5% 200 12 1.50 10% 200 12 1.50 16% 200 12 1.50 Y ∗ σ2 0.01 0.05 0.13 0.48 1.50 0.01 0.05 0.13 0.48 1.50 0.01 0.05 0.13 0.48 1.50 0.01 0.05 0.13 0.48 1.50 0.15 0.47 1.52 4.33 12.5 0.15 0.47 1.52 4.33 12.5 0.15 0.47 1.52 4.33 12.5 0.15 0.47 1.52 4.33 12.5 v -0.3944 -0.3944 -0.3944 -0.3934 -0.3934 -0.3943 -0.3945 -0.3939 -0.3949 -0.3949 -0.3945 -0.3949 -0.3941 -0.3946 -0.3932 -0.3944 -0.3945 -0.3949 -0.3974 -0.4025 -1.2763 -1.2763 -1.2763 -1.2763 -1.2450 -1.2787 -1.2787 -1.2738 -1.3031 -1.3031 -1.2783 -1.2783 -1.2763 -1.2867 -1.3063 -1.2780 -1.2709 -1.2765 -1.2769 -1.2775 eV 0.5153 0.5650 0.6310 0.7527 0.8468 0.5153 0.5650 0.6310 0.7527 0.8468 0.5153 0.5650 0.6310 0.7527 0.8468 0.5153 0.5650 0.6310 0.7527 0.8468 0.5211 0.5591 0.6401 0.7386 0.8303 0.5211 0.5591 0.6402 0.7385 0.8303 0.5211 0.5591 0.6401 0.7386 0.8303 0.5211 0.5591 0.6401 0.7386 0.8303 δ(v, d) 0.0005 0.0006 0.0006 0.0007 0.0006 0.0005 0.0006 0.0006 0.0007 0.0006 0.0005 0.0006 0.0006 0.0007 0.0006 0.0005 0.0006 0.0006 0.0007 0.0006 0.0028 0.0033 0.0034 0.0037 0.0033 0.0028 0.0033 0.0034 0.0038 0.0035 0.0028 0.0033 0.0034 0.0037 0.0036 0.0028 0.0033 0.0034 0.0037 0.0035 Y ∗) under different sig- Table 4.1: Approximation results for various combinations of (s2, σ2 nificance levels (α), dimension (n), and number of coefficients (k) 68 Chapter 5 Future Work Directions In the agriculture study, one way to improve the analysis is to include more data points from later years. We could also remove the irrelevant explanatory variables from the model, or to develop a link among maize yield, SPAD, and striga to systematically analyze the determinants of each individual response. The latter proposal is most definitely going to generate more coefficients needed to be estimated [18]. For the climatology project, assigning more proper prior distributions to the model co- efficients may be able to achieve their estimates all at once. However, it is very likely to be accompanied with more computational burden. Another possibility is to incorporate a spa- tial pattern for the natural proxies over time [59], which needs an extensive understandings of climatic system to propose a more feasible temporal covariance structure, thus simplifying calculations and bringing more meaningful insights. There are several directions to extend the work in Chapter 4. For example, we can utilize Taylor-series expansion with higher order, so to obtain a more statistical power and faster convergence [111]. Moreover, we can either compare our estimates with the actual MCMC- method results, or calculate its total-variation distance between chaos terms and the normal law [112], to see how far the approximations are away from the true values. 69 APPENDICES 70 APPENDIX A Model Robustness In (3.4), σC is defined to grow with the RCPs’ magnitudes for each scenario. Here, we test the model robustness by applying different fraction values (i.e., 0, 0.5, 2, 3) in σC . Apparently, based on the prediction graphs (Figures A.1, A.2, A.3, A.4), the patterns are quite similar to Figure 3.6 except for reasonable fluctuations when σC increases. The convergence diagnosis (Figure A.5) shows that the Gibbs samplers indeed converge (and quite fast). Hence, the projection model is not heavily affected by the added greenhouse gas uncertainty with various magnitudes (robust in other words). Figure A.1: Temperature projections’ comparisons (CMIP5) with no σC 71 Figure A.2: Temperature projections’ comparisons (CMIP5) with 1 2σC Figure A.3: Temperature projections’ comparisons (CMIP5) with 2σC Figure A.4: Temperature projections’ comparisons (CMIP5) with 3σC 72 Figure A.5: 10,000 MCMC samples of all model parameters (from top to bottom): α1, α0, βC , βV , βS, β0, σ2 P , σ2 T 73 APPENDIX B Exponential Decay The hierarchical stochastic model described in (3.6) is equivalent as follows after iteration:  Pt = atP0 + = atP0 + (uα)t + Tt = btT0 + t−1(cid:88) i=0 t−1(cid:88) i=0 t−1(cid:88) i=0 aiσP εt−i (cid:1) + ai(cid:0)α0 + α1Tt−i t−1(cid:88) bi(cid:0)β0 + βSSt−i + βV Vt−i + βC Ct−i t−1(cid:88) aiσP εt−i i=0 = btT0 + (vβ)t + biσT ηt−i i=0 (cid:1) + t−1(cid:88) i=0 biσT ηt−i (B.1) Hereinafter, u ∈ MT,2(R) such that ∀t ≤ T : t−1(cid:88) i=0 t−1(cid:88) i=0 aiTt−i (B.2) ut,0 = ai, ut,1 = and v ∈ MT,4(R) is definite and same for all three forcings (t ≤ T ) such that t−1(cid:88) t−1(cid:88) t−1(cid:88) t−1(cid:88) vt,0 = ai, vt,1 = aiSt−i, vt,2 = aiVt−i, vt,3 = aiCt−i (B.3) i=0 i=0 i=0 i=0 These formalizations show that proxy (P) and temperature (T) at time t, take all tem- perature and forcings before t into consideration respectively with exponential decay. Moreover, the time series Pt and Tt are non-stationary, whose expectations depend on t. 74 When t is large enough, assume P0 = T0 = 0 and let σP = 1 − a2, σT = 1 − b2. From now on, we use [Y(cid:12)(cid:12)X] to represent the conditional probability distribution of the random variable Y given X. Then the variances of [P(cid:12)(cid:12)T, a, α, σP ] and [T(cid:12)(cid:12)b, β, σT ] are approximately equal to constants σP and σT separately. Namely, the conditional densities of P and T are two normal distributions with known covariance structures: (B.4)  (cid:2)P(cid:12)(cid:12)T, a, α, σP (cid:2)T(cid:12)(cid:12)b, β, σT (cid:3) ∼ N(cid:0)uα, σ2 (cid:3) ∼ N(cid:0)vβ, σ2 (cid:1) (cid:1) P ΣP T ΣT where ΣP and ΣT are the covaraince matrices of AR(1) processes with parameters a and b. 75 APPENDIX C Posterior-distribution Computation Recall the prior distributions: normal (N ), inverse gamma (IG), and uniform (U) that are assigned to the parameters in the model (3.6): α ∼ N(cid:0)µα, I2 β ∼ N(cid:0)µβ, I4 σ2 ∼ IG(cid:0)q, r(cid:1) a, b ∼ U(cid:0)0, 1(cid:1) (cid:1) (cid:1) (C.1) where α = (α0, α1), β = (β0, βS, βC , βV ), σ2 = (σ2 P , σ2 T ) q = (qP , qT ), r = (rP , rT ) In is an identity matrix of n dimensions. In particular, we also assign the initial values as: µα = (0, 1) µβ = (0, 1, 1, 1) q = (2, 2) r = (0.1, 0.1) (C.2) After some derivations, we obtain the full posterior distributions for all the parameters. 76 Regression coefficients: α, β (cid:2)α(cid:12)(cid:12)P, T, a, σ2 P (cid:3) ∝(cid:2)P(cid:12)(cid:12)α, T, a, σ2 P P (cid:3)(cid:2)α(cid:12)(cid:12)T, a, σ2 (cid:3) (cid:0)P − uα(cid:1)(cid:41) (cid:0)P − uα(cid:1)TΣ−1 αT(cid:0)uTΣ−1 (cid:1)α + (cid:27) P u + σ2 P I2 P − 1 2σ2 P − 1 2σ2 P (α − µα)TΩ−1 (cid:1) ∝ exp (cid:40) (cid:40) (cid:26) ∼ N(cid:0)µα, Ωα ∝ exp ∝ exp −1 2 α (α − µα) (cid:26) −1 2 exp (cid:107)α(cid:107)2 (cid:27) (cid:41) αTuTΣ−1 P P 1 σ2 P where exp means exponential distribution and (cid:2)β(cid:12)(cid:12)T, b, σ2 T with  µα = Ω−1 α = 1 σ2 P 1 σ2 P ΩαuTΣ−1 P P P u + I2 uTΣ−1 (cid:3) T T ∝ exp (cid:3) ∝(cid:2)T|β, b, σ2 (cid:40) (cid:40) (cid:26) ∼ N(cid:0)µβ, Ωβ (cid:3)(cid:2)β(cid:12)(cid:12)b, σ2 (cid:0)T − vβ(cid:1)TΣ−1 − 1 βT(cid:0)vTΣ−1 2σ2 T − 1 2σ2 T (β − µβ)TΩ−1 (cid:1) (cid:0)T − vβ(cid:1)(cid:41) (cid:1)β + (cid:27) β (β − µβ) T v + σ2 ∝ exp ∝ exp −1 2 T I4 T 1 σ2 T  µβ = Ω−1 β = ΩβvTΣ−1 T T vTΣ−1 T v + I4 1 σ2 T 1 σ2 T 77 (C.3) (C.4) (C.5) (C.6) (cid:26) exp (cid:107)β(cid:107)2 −1 2 βTvTΣ−1 T T (cid:27) (cid:41) where dim is short for dimension and Scale of noise terms: σ2 P , σ2 T (cid:2)σ2 P P ∝ (cid:1) σ2 P (cid:12)(cid:12)P, T, a, α(cid:3) ∝(cid:2)P(cid:12)(cid:12)σ2 P , T, a, α,(cid:3)(cid:2)σ2 (cid:40) (cid:19)dimT (cid:18) 1 (cid:41) (cid:40) (cid:19)qP + dimT σP × exp − rP σ2 P exp 2 +1 ∝ P = qP + r(cid:48) P = rP + (cid:18) 1 ∼ IG(cid:0)q(cid:48) P , r(cid:48) q(cid:48) (cid:12)(cid:12)T, b, β(cid:3) ∝(cid:2)T(cid:12)(cid:12)σ2 T , b, β(cid:3)(cid:2)σ2 T |b, β(cid:3) (cid:40) (cid:18) 1 (cid:19)dimT (cid:41) (cid:40) (cid:19)qT + dimT (cid:1) q(cid:48) (cid:18) 1 ∼ IG(cid:0)q(cid:48) T , r(cid:48) T = qT + r(cid:48) T = rT + σT × exp − rT σ2 T 2 +1 − 1 2σ2 T (cid:40) σ2 T exp exp ∝ ∝ T (cid:2)σ2 T with (C.7) (C.8) (cid:19)qT +1 dimT 2 (cid:0)P − uα(cid:1)TΣ−1 P 1 2 (cid:0)T − vβ(cid:1)TΣ−1 T (cid:0)P − uα(cid:1) (cid:0)T − vβ(cid:1)(cid:41)(cid:18) 1 σ2 T (cid:20) − 1 σ2 T rT + 1 2 (cid:0)T − vβ(cid:1)TΣ−1 T (cid:0)T − vβ(cid:1)(cid:21)(cid:41) (C.9) P |T, a, α(cid:3) − 1 2σ2 P (cid:0)P − uα(cid:1)TΣ−1 P (cid:0)P − uα(cid:1)(cid:41)(cid:18) 1 σ2 P (cid:19)qP +1 (cid:40) (cid:20) rP + (cid:0)P − uα(cid:1)TΣ−1 P 1 2 (cid:0)P − uα(cid:1)(cid:21)(cid:41) − 1 σ2 P exp dimT 2 (cid:0)T − vβ(cid:1)TΣ−1 (cid:0)T − vβ(cid:1) (C.10) T 1 2 78 Autoregressive coefficients: a, b P (cid:2)a(cid:12)(cid:12)P, T, α, σ2 (cid:2)b(cid:12)(cid:12)T, β, σ2 T P P ∝ (cid:3) ∝(cid:2)P|a, T, α, σ2 1(cid:112)det(ΣP ) (cid:3) ∝(cid:2)T|b, β, σ2 1(cid:112)det(ΣT ) (cid:3)(cid:2)a|T, α, σ2 (cid:40) (cid:3)(cid:2)b|β, σ2 (cid:40) (cid:3) (cid:0)P − uα(cid:1)TΣ−1 (cid:0)T − vβ(cid:1)TΣ−1 − 1 2σ2 P − 1 2σ2 T (cid:3) exp exp ∝ T T P T (cid:0)P − uα(cid:1)(cid:41) (cid:0)T − vβ(cid:1)(cid:41) (C.11) where det stands for determinant (of a matrix). Finally, we derive the posterior distribution for temperature (T), from which to draw samples using Gibbs sampler. (cid:2)T(cid:12)(cid:12)P, a, b, α, β, σ2 P , σ2 T (cid:3) ∝(cid:2)P(cid:12)(cid:12)T, a, b, α, β, σ2 P , σ2 T (cid:3)(cid:2)T|b, β, σ2 (cid:3) (cid:0)P − uα(cid:1)(cid:41) (cid:0)P − uα(cid:1)TΣ−1 (cid:0)T − vβ(cid:1)(cid:41) (cid:0)T − vβ(cid:1)TΣ−1 P T T ∝ exp × exp (cid:40) − 1 2σ2 P − 1 2σ2 T (C.12) (cid:40) t−1(cid:88) T(cid:88) i=0 where u relies on T. Let us re-write (P − uα) at time t by expanding u as defined above: (cid:0)P − uα(cid:1) t−1(cid:88) t = Pt − α0 aiTt−i − α1 i=0 at−i1{i≤t} − α1 aiTt−i T(cid:88) at−iTi1{i≤t} (C.13) = Pt − α0 i=0 i=0 = P − α0Me − α1MT where M ∈ MT,T (R) with Mi,j = 1{j≤i}ai−j, and e is a vector whose entries are all ones. 79 Therefore, (cid:2)T(cid:12)(cid:12)P, a, b, α, β, σ2 P , σ2 T (cid:3) ∝ exp (cid:40) × exp (cid:40) ∝ exp (cid:0)P − α0Me − α1MT(cid:1)(cid:41) (cid:41) (cid:41) TTΣ−1 T vβ 1 σ2 T (cid:40) − 1 2σ2 P − 1 2σ2 T (cid:0)P − α0Me − α1MT(cid:1)TΣ−1 (cid:0)T − vβ(cid:1)(cid:41) (cid:0)T − vβ(cid:1)TΣ−1 (cid:17) TT(cid:16) α2 MTΣ−1 Σ−1 P T T −1 2 (cid:40) 1 σ2 P 1 σ2 T T P M + (cid:0)P − α0Me(cid:1) + (cid:1)(cid:27) T (T − µT P × exp α1 σ2 P −1 2 (cid:26) ∼ N(cid:0)µT , ΩT ∝ exp TTMTΣ−1 (cid:0)T − µT )TΩ−1 (cid:1) with  µT = Ω−1 T = α1 σ2 P α2 1 σ2 P (cid:0)P − α0Me(cid:1) + ΩT Σ−1 T vβ 1 σ2 T P ΩT MTΣ−1 MTΣ−1 P M + Σ−1 T 1 σ2 T (C.14) (C.15) Theorem C.0.1. Assume X =  ∼ N X1 X2  , (cid:32)µ1 µ2  Σ1 Σ12 Σ21 Σ2 (cid:33) where Σ21 = ΣT 12. Then (cid:2)X1 (cid:12)(cid:12)X2 (cid:3) ∼ N(cid:16) (cid:0)X2 − µ2 (cid:1), Σ1 − Σ12Σ−1 2 Σ21 (cid:17) µ1 + Σ12Σ−1 2 Consider X1 = T1 (past) and X2 = T2 (calibration). According to Theorem C.0.1, the 80 posterior distribution of past temperature given the data is: (cid:12)(cid:12)T2, P, a, b, α, β, σ2 (cid:0)Ω (cid:0)Ω (1) T + Ω (1)(2) T − Ω (1)(2) T P , σ2 T (2)(2) T (cid:3) ∼ N(cid:0)µ, Ω(cid:1) (cid:1)−1(cid:0)T2 − µ (cid:1)−1(cid:0)Ω (2) T (2)(1) T (2)(2) T Ω = Ω (1)(1) T (cid:2)T1 µ = µ  (cid:1) (cid:1) (C.16) (i)(j) where Ω T (i = 1, 2 and j = 1, 2) is the partitioned matrix. The projection (denoted by T3) can be done via the second level in (3.6) after all climate forcings extended to Year 2100. 81 APPENDIX D Proof of Theorem 4.3.5 Proof. Let H(x) = e − 1 2 [ u s(1+ x√ 2d ) +1]2 . The first and second derivative of H(x) are − 1 2 [ H(cid:48)(x) = e u s(1+ x√ 2d ) +1]2 − 1 2 [ = e u s(1+ x√ 2d ) +1]2 := H(x)H1(x) ×(cid:16) −(cid:2) u s(1 + x√ 2d ) √ × u s 2d 1 (1 + x√ 2d )2 + 1(cid:3)(cid:17) × u (cid:2) u s (cid:2) − + 1(cid:3) s(1 + x√ 2d ) 1 (1 + x√ 2d (cid:3) 1√ 2d )2 H(cid:48)(cid:48)(x) = H(cid:48)(x)H1(x) + H(x)H(cid:48) 1 (x) + H(cid:48) 1(x)(cid:3) + 1(cid:3)2 + 2(cid:2) + 1]2 2s2d(1 + x√ 2d s(1 + x√ 2d ) )4 u2 1(x) = H(x)(cid:2)H2 (cid:2) (cid:16) (cid:26) u[ + 1(cid:3)(cid:17)(cid:27) )3 )3 u u u 2sd(1 + x√ 2d + 2(cid:2) 2sd(1 + x√ 2d s(1 + x√ 2d ) u u ) s(1 + x√ 2d u s(1+ x√ ) 2d s(1 + x√ 2d ) − 1 2 [ = e u2 s(1+ x√ 2d ) +1]2 − 1 2 [ u2 s(1+ x√ 2d ) +1]2 − e − 1 2 [ u2 s(1+ x√ 2d ) +1]2 = e −(cid:16) u s(1 + x√ 2d ) + 1(cid:3)(cid:17) ) (D.1) u s(1 + x√ 2d 82 Use Taylor expansion on H(x) at point a (a ∈ IR), H(x) = H(n)(a) (x − a)n n! ∞(cid:88) 0 − u2 2s2 = e 1 (1+ a√ 2d )2 − u2 2s2 1 (1+ a√ 2d )2 + e (x − a) u2 √ s2 2d 1 (1 + a√ 2d )3 + R1(x) where R1(x) is the mean-value (or Lagrange) form of the remainder: R1(x) = H(cid:48)(cid:48)(ξ) 2! (x − a)2 − u2 2s2 e (x − a)2 = 2 1 ξ√ 2d (1+ )2 u2 2s2d 1 (1 + ξ√ 2d )4 (cid:104) u2 s2(1 + ξ√ 2d )2 (cid:105) − 3 for ξ ∈ [a, x]. When a = 0 (i.e., Maclaurin series), 2s2 (cid:104) − (u+s)2 + 1(cid:1)(cid:105) (cid:0) u s + RH (x) (D.2) 1 + √ ux 2d s H(x) = e and RH (x) = x2 2! H(cid:48)(cid:48)(ξ) − 1 2 [ = e x2 2 −(cid:16) u2 ξ√ 2d s(1+ +1]2 ) u + 2(cid:2) 2sd(1 + ξ√ 2d u s(1 + ξ√ 2d ) u s(1 + ξ√ 2d ) (cid:26) u[ + 1(cid:3)(cid:17)(cid:27) )3 + 1]2 u ξ√ s(1+ ) 2d s(1 + ξ√ 2d ) (D.3) Note that (cid:90) ∞ −√ − x2 e 2 e 1√ 2π 2d − [u+s(1+ x√ 2d 2s2(1+ x√ 2d )]2 )2 dx, Φ2(u, d) = 1 − 1 2 83 d→∞ Φ2(u, d) = 1 − 1 lim 2 − (u+s)2 2s2 e Hence, plug in (D.2) and by triangle inequality, (cid:12)(cid:12)(cid:12)Φ2(u, d) −(cid:0)1 − 1 2s2 (cid:1)(cid:12)(cid:12)(cid:12) ≤ 1 − (u+s)2 e 2 (cid:12)(cid:12)(cid:12)dx − (u+s)2 2s2 − e + 1(cid:1)dx (cid:90) ∞ (cid:90) ∞ −√ (cid:90) ∞ −√ (cid:90) ∞ −√ −√ 2d 2d 2d 2d 1√ 2π 1√ 2π 1√ 2π 1√ 2π − x2 e 2 − x2 e +1]2 ) − 1 2 [ u s(1+ x√ 2d (cid:12)(cid:12)(cid:12)e 2 (cid:12)(cid:12)H(x) − H(0)(cid:12)(cid:12)dx (cid:0) u 2 (cid:12)(cid:12)RH (x)(cid:12)(cid:12)dx u|x| √ 2d s − (u+s)2 2s2 s − x2 2 e e − x2 e 2 ≤ 1 2 ≤ 1 2 + 1 2 := Φ21(u, d) + Φ22(u, d) (cid:16)(cid:90) ∞ (cid:90) 0 −√ 0 2d 1√ 2π 1√ 2π − x2 2 e e − (u+s)2 2s2 s − (u+s)2 − x2 2 e e Φ21(u, d) = 1 2 + √ 1 2π 2 − (u+s)2 2s2 √ u(u + s)e 2d − (u+s)2 2s2 √ u(u + s)e 4s2 (−e s2 πd − (u+s)2 2s2 √ u(u + s)e 4s2 πd = = = s + 1(cid:1)dx (cid:0) u √ ux (cid:17) (cid:0) u + 1(cid:1)dx 2d 2s2 −ux √ (cid:90) √ (cid:16)(cid:90) ∞ 2d (cid:12)(cid:12)(cid:12)√ (cid:12)(cid:12)(cid:12)∞ + (−e − x2 2 ) 2 dx + − x2 xe s s 0 0 0 2d (cid:17) 2d (cid:17) − x2 2 dx xe (D.4) H(cid:48)(cid:48)(ξ) is continuous for all ξ ∈ [0, x], x ∈ IR and (cid:12)(cid:12)H(cid:48)(cid:48)(ξ)(cid:12)(cid:12) ≤ e − (u+s)2 2s2 − (u+s)2 2s2 = e s + 1)2 s −(cid:104) u + 1(cid:1)2 − 3u s s + 1(cid:1)(cid:105)(cid:12)(cid:12)(cid:12)(cid:12) (D.5) + 2(cid:0) u (cid:12)(cid:12)(cid:12) − 2 s 0 − x2 2 ) (cid:16) (cid:0)2 − e−d(cid:1) (cid:12)(cid:12)(cid:12)(cid:12) u( u (cid:12)(cid:12)(cid:12) u (cid:0) u u 2sd u 2sd s s 84 (cid:26) u[ u 2sd(1 + ξ√ 2d )3 + 1]2 u ξ√ s(1+ ) 2d s(1 + ξ√ 2d ) 1√ 2π − x2 e 2 x2 2 dx (D.6) − (u+s)2 2s2 e (D.7) s + 1)2(cid:3) 8s2d 2s2 (cid:17)(cid:12)(cid:12)(cid:12)(cid:12) ≤ u2(cid:2)3 + ( u (cid:19)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:19)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) − (u+s)2 2s2 e − (u+s)2 2s2 √ πd 4s2 Plug (4.36), (D.5) into (D.3), Φ22(u, d) = (cid:90) ∞ −√ −(cid:16) 1 2 u2 ξ√ 2d ) s(1+ +1]2 + 1(cid:3)(cid:17)(cid:27) (cid:12)(cid:12)(cid:12)(cid:16)(cid:90) ∞ 0 − 2 dx s(1 + ξ√ 2d ) 1√ 2π 2d u s(1 + ξ√ 2d − (u+s)2 2s2 4sd (cid:90) 0 −√ − (u+s)2 2s2 2d 1√ 2π − 1 2 [ e − x2 2 e x2 2 ) u + 2(cid:2) + 1(cid:1)2 − 3u (cid:0) u (cid:17) + 1(cid:1)2(cid:105) (cid:12)(cid:12)(cid:12) u (cid:104) 3 +(cid:0) u s − x2 2 x2 2 dx s s e s ≤ ue + ≤ u2e 8s2d From (D.4) and (D.6), (cid:12)(cid:12)(cid:12)(cid:12)Φ2(u, d) −(cid:16) 1 − 1 2 e − (u+s)2 2s2 − u(u + s)(2 − e−d) √ e 4s2 πd − (u+s)2 Therefore, 1 − 1 2 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)ΦU (u) − (cid:18) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)Φ2(u, d) − (cid:18) s + 1)2(cid:3) ≤ u2(cid:2)3 + ( u ≤ 8s2d 1 − 1 2 − (u+s)2 2s2 e e √ − (u+s)2 2s2 − u(u + s)(2 − e−d) 2s2 − u(u + s)(2 − e−d) − (u+s)2 e 4s2 πd e := δ2(u, d) 85 APPENDIX E Boundaries Justification Moreover, let G(x) = e − u2 2s2 1 (1+ x√ 2d )2 . − u2 2s2 G(cid:48)(x) = e 1 (1+ x√ 2d )2 ×(cid:0) − u2 2s2 (cid:1)(cid:104) − 2 (cid:105) 1 (1 + x√ 2d )3 1√ 2d − u2 2s2 = e 1 (1+ x√ 2d := G(x)G1(x) )2 × u2 √ 2d s2 1 (1 + x√ 2d )3 G(cid:48)(cid:48)(x) = G(cid:48)(x)G1(x) + G(x)G(cid:48) )2(cid:20)(cid:16) u2 − u2 2s2 1 (1+ x√ 2d = e 1(x) √ 2d s2 1 (1 + x√ 2d (cid:17)2 − 3u2 √ 2d s2 − u2 2s2 = e 1 (1+ x√ 2d )2 × u2 2s2d 1 (1 + x√ 2d )4 u2 s2(1 + x√ 2d )3 (cid:104) 1 (1 + x√ 2d (cid:105) )2 − 3 (cid:21) 1√ 2d )4 (E.1) Then the Maclaurin series of G(x) is 2s2(cid:16) − u2 (cid:17) u2x √ s2 2d + RG(x) (E.2) G(x) = e 1 + 86 and the Lagrange form of the remainder for ξ ∈ [0, x] is RG(x) = = G(cid:48)(cid:48)(ξ) − u2 2s2 e x2 2! x2 2 1 ξ√ 2d (1+ )2 u2 2s2d 1 (1 + ξ√ 2d )4 (cid:104) u2 s2(1 + ξ√ 2d )2 (cid:105) − 3 , ξ ∈ [0, x] (E.3) Recall Φ1(u, d) = 1 − 1 2 (cid:90) ∞ −√ ⇒ Φ1(u, d) −(cid:0)1 − 1 − u2 2s2 − x2 2 e e 1 (1+ x√ 2d )2 dx, 1√ 2π 2d 2s2(cid:1) = − u2 e (cid:90) ∞ −√ 1 2 − x2 2 e 1√ 2π 2d (cid:104) e 2 − u2 2s2 e 2 lim d→∞ Φ1(u, d) = 1 − 1 )2(cid:105) − u2 2s2 1 (1+ x√ 2d − u2 2s2 − e dx By triangle inequality, (cid:12)(cid:12)(cid:12)Φ1(u, d) −(cid:0)1 − 1 2 e 2s2(cid:1)(cid:12)(cid:12)(cid:12) ≤ 1 − u2 2 ≤ 1 2 ≤ 1 2 + 2d (cid:90) ∞ (cid:90) ∞ −√ (cid:90) ∞ −√ (cid:90) ∞ −√ −√ 1 2 2d 2d 2s2(cid:12)(cid:12)(cid:12)dx − u2 )2 − e − x2 e 2 − x2 e − u2 2s2 1 (1+ x√ 2d (cid:12)(cid:12)(cid:12)e 2 (cid:12)(cid:12)G(x) − G(0)(cid:12)(cid:12)dx 2s2 u2|x| − u2 √ 2 (cid:12)(cid:12)RG(x)(cid:12)(cid:12)dx dx s2 2d − x2 e − x2 2 e e 1√ 2π 1√ 2π 1√ 2π 1√ 2π 2d := Φ11(u, d) + Φ12(u, d) 87 2s2 u2(−x) − u2 √ s2 2d (cid:17) dx (E.4) (cid:16)(cid:90) ∞ Φ11(u, d) = = = = 0 e 2d 1 2 − x2 2 e (cid:16)(cid:90) ∞ (cid:12)(cid:12)(cid:12)∞ 1√ 2π − u2 u2e 2s2 √ √ 1 s2 2π 2 − u2 2s2 √ πd − u2 2s2 √ πd (cid:16) (cid:0)2 − e−d(cid:1) − x2 2 ) u2e 4s2 u2e 4s2 (−e 0 0 − u2 2s2 u2x √ 2d s2 dx + − x2 2 e e 1√ 2π 2d − x2 2 dx + xe − x2 2 dx xe (cid:17) + (−e − x2 2 ) (cid:90) 0 −√ (cid:90) √ (cid:12)(cid:12)(cid:12)√ (cid:17) 2d 2d 0 0 For all ξ ∈ [0, x], x ∈ IR, G(cid:48)(cid:48)(ξ) is continuous and (cid:12)(cid:12)G(cid:48)(cid:48)(ξ)(cid:12)(cid:12) ≤ e − u2 2s2 u2 2s2d Applying the same calculation in (D.6) to Φ12(u, d), (cid:12)(cid:12)(cid:12)u2 s2 − 3 (cid:12)(cid:12)(cid:12) (E.5) (cid:12)(cid:12)(cid:12)(cid:12)dx − 3 (cid:12)(cid:12)(cid:12)(cid:12) )4 u2 s2(1 + ξ√ 2d )2 (cid:17) dx 1 (1 + ξ√ 2d (cid:90) 0 −√ 1√ 2π − x2 2 e x2 2 dx + 1√ 2π − x2 2 e x2 2 2d 1 ξ√ 2d (1+ )2 u2 2s2d − u2 2s2 e |x|2 2 (cid:12)(cid:12)(cid:12)(cid:16)(cid:90) ∞ 0 2s2(cid:17)(cid:12)(cid:12)(cid:12)(cid:12) ≤ u2(3 + u2 8s2d s2 ) − u2 e (E.6) (E.7) − u2 2s2 e Φ12(u, d) = 1 2 − x2 e 2 (cid:90) ∞ −√ − u2 2s2 u2 2s2d 2d 1√ 2π (cid:12)(cid:12)(cid:12) u2 s2 − 3 (cid:17) u2 s2 3 + ≤ 1 2 e 8s2d ≤ u2e − u2 2s2 (cid:16) (cid:12)(cid:12)(cid:12)(cid:12)Φ1(u, d) −(cid:16) So 1 − 1 2 2s2 − u2(2 − e−d) − u2 e √ πd 4s2 88 From (E.7), (cid:12)(cid:12)(cid:12)(cid:12)ΦU (u) −(cid:16) (cid:12)(cid:12)(cid:12)(cid:12)Φ1(u, d) −(cid:16) ≥ ≥ u2(3 + u2 s2 ) 8s2d e 1 − 1 2 1 − 1 2 − u2 √ 2s2 − u2(2 − e−d) − u2 2s2 − u2(2 − e−d) − u2 πd √ 4s2 e e e 4s2 πd 2s2(cid:17)(cid:12)(cid:12)(cid:12)(cid:12) 2s2(cid:17)(cid:12)(cid:12)(cid:12)(cid:12) − u2 − u2 e 2s2 := δ1(u, d) which shows that δ2(u, d) in Theorem 4.3.5 is sharp enough. By the similar computation in (4.39) and (4.42), ΦV (v) ≥ Ex 1 − 1 2 e (cid:104) 2s2 2s2 − x2 e 2 dx − Ex − (v−xσY ∗ )2 (cid:105) − δ1(v − xσY ∗, d) (cid:90) ∞ (cid:2)δ1(v − xσY ∗, d)(cid:3) − (v−xσY ∗ )2 (cid:2) s2+σ2 (cid:1)2(cid:3) −∞ e (cid:90) ∞ (cid:2)δ1(v − xσY ∗, d)(cid:3) (cid:113) (cid:2)δ1(v − xσY ∗, d)(cid:3) (cid:0)x− vσY ∗ 1√ 2π Y ∗ Y ∗) − Ex Y ∗ s2+σ2 1√ 2π 2(s2+σ2 2(s2+σ2 Y ∗) −∞ − e − e − 1 2 e dx v2 v2 s2 = 1 − 1 2 = 1 − 1 2 − Ex = 1 − s 2 Y ∗ s2 + σ2 The density function of X ∼ N (µX , σ2 X ) is − (x−µX )2 2σ2 X √ 1 e 2π σX fX (x) =  µX = σ2 X = vσY ∗ Y ∗ s2 + σ2 s2 Y ∗ s2 + σ2 89 (E.8) (cid:2)δ1(v − xσY ∗, d)(cid:3) = ⇒ Ex (cid:90) ∞ −∞ (cid:17) d (v − xσY ∗)2fX (x)dx (cid:16)2 − e−d√ π 2s2 − (v−xσY ∗)2 (v − xσY ∗)2e √ (cid:17)e 4s2 (v−xσY ∗ )2 √ (cid:20)(cid:90) ∞ 2 d − x2 2√ 2π s2 d v2 dx (cid:16)2 − e−d√ −∞ π Y ∗) (v − xσY ∗)4fX (x)dx + √ 3 2 (cid:21) (cid:17)(cid:104) 3 + + 4s + = = 2(s2+σ2 Y ∗ ) d(s2 + σ2 √ 1 2s2 v2 d e − (cid:113) (cid:90) ∞ −∞ − (cid:113) e 2(s2+σ2 Y ∗ ) Y ∗) d(s2 + σ2 √ 1 2s2 (cid:1)(cid:105) + d Y ∗(cid:0)µ3 − 4vσ3 4s + σ2 X (cid:26)(cid:16)2 − e−d√ (cid:104) π + √ 3 2 d v4 − 4v3σY ∗µX + 6v2σ2 (cid:1) + σ4 Y ∗(cid:0)µ4 X + 3µX σ2 X Y ∗(cid:0)µ2 v2 − 2vσY ∗µX + σ2 (cid:1) (cid:1)(cid:105)(cid:27) Y ∗(cid:0)µ2 X + σ2 X X X + 6µ2 X σ2 X + 3σ4 X Therefore, (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)ΦV (v) − (cid:18) − (cid:113) v2 2(s2+σ2 Y ∗) Y ∗ s2 + σ2 1 − se 2 δ(cid:48)(v, d) = e − v2 2(s2+σ2 Y ∗ ) Y ∗ s2 + σ2 (cid:113) Y ∗(cid:0)µ2 8sd + 6v2σ2 X + σ2 X (cid:104) (cid:16) (cid:1) − 4vσ3 Y ∗(cid:0)µ3 v2 − 2vσY ∗µX + σ2 3 (E.9) (E.10) − (cid:113) − (2 − e−d)e 4s πd(s2 + σ2 − 2vσY ∗µX + σ2 (cid:104) v2 v2 2(s2+σ2 Y ∗ ) Y ∗) Y ∗(cid:0)µ2 Y ∗(cid:0)µ2 X + σ2 X X + σ2 X (cid:1)(cid:17) (cid:1) + σ4 Y ∗(cid:0)µ4 (cid:1)(cid:105)(cid:19)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≥ δ(cid:48)(v, d) (cid:16) + 1 s2 v4 − 4v3σY ∗µX (cid:1)(cid:17)(cid:105) X + 3µX σ2 X X + 6µ2 X σ2 X + 3σ4 X and µX , σ2 X defined in (E.8), which implies the bounds of ΦV (v) cannot do any better. (E.11) 90 BIBLIOGRAPHY 91 BIBLIOGRAPHY [1] Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis, 1(3), 515-533. [2] Haller, H. and Krauss, S. (2002). Misinterpretations of significance: a problem students share with their teachers. Methods of Psychological Research, 7(1), 1-20. [3] McShane, B. B. and Gal, D. (2017). Statistical significance and the dichotomization of evidence. Journal of the American Statistical Association, 112(519), 885-895. [4] Dunson, D. B. (2001). Commentary: practical advantages of Bayesian analysis of epi- demiologic data. American journal of Epidemiology, 153(12), 1222-1226. [5] Baldos, U. L. C., Viens, F. G., Hertel, T. W., and Fuglie, K. O. (2019). R&D spend- ing, knowledge capital, and agricultural productivity growth: a Bayesian approach. American Journal of Agricultural Economics, 101(1), 291-310. [6] Zhang, Z., Hamagami, F., Wang, L., Nesselroade, J. R., and Grimm, K. J. (2005). Bayesian analysis of longitudinal data using growth curve models. International Journal of Behavioral Development, 31(4), 374-383. [7] Neufcourt, L., Cao, Y., Nazarewicz, W., and Viens, F. (2018). Bayesian approach to model-based extrapolation of nuclear observables. Physical Review C, 98(3), 034318. [8] Wiener, N. (1938). The homogeneous chaos. American Journal of Mathematics, 60(4), 897-936. [9] Nourdin, I. and Peccati, G. (2009). Noncentral convergence of multiple integrals. The Annals of Probability, 37(4), 1412-1426. [10] Lobell, D. B., Cassman, K. G., and Field, C. B. (2009). Crop yield gaps: their im- portance, magnitudes, and causes. Annual Review of Environment and Resources, 34, 179-204. [11] Fermont, A. M., van Asten, P. J. A., Tittonell, P., van Wijk, M. T., and Giller, K. E. (2009). Closing the cassava yield gap: an analysis from smallholder farms in East Africa. Field Crops Research, 112(1), 24-36. [12] Tamene, L., Mponela, P., Ndengu, G., and Kihara, J. (2016). Assessment of maize yield gap and major determinant factors between smallholder farmers in the Dedza district of Malawi. Nutrient Cycling in Agroecosystems, 105, 291-308. 92 [13] van Dijk, M., Morley, T., Jongeneel, R., van Ittersum, M., Reidsma, P., and Ruben, R. (2017). Disentangling agronomic and economic yield gaps: an integrated framework and application. Agricultural Systems, 154, 90-99. [14] Liu, Z., Yang, X., Hubbard, K. G., and Lin, X. (2012). Maize potential yields and yield gaps in the changing climate of northeast China. Global Change Biology, 18(11), 3441-3454. [15] Tittonell, P. and Giller, K. E. (2013). When yield gaps are poverty traps: The paradigm of ecological intensification in African smallholder agriculture. Field Crops Research, 143, 76-90. [16] Snapp, S. S. (1998). Soil nutrient status of smallholder farms in Malawi. Communica- tions in Soil Science and Plant Analysis, 29(17-18), 2571-2588. [17] Batie, S. S. (2008). Wicked problems and applied economics. American Journal of Agricultural Economics, 90(5), 1176-1191. [18] Wang, H., Snapp, S. S., Fisher, M., and Viens, F. (2019). A Bayesian analysis of longitudinal farm surveys in central Malawi reveals yield determinants and site-specific management strategies. PLoS ONE, 14(8), e0219296. [19] Mungai, L. M., Snapp, S., Messina, J. P., Chikowo, R., Smith, A., Anders, E., Richard- son, R. B., and Li, G. (2016). Smallholder farms and the potential for sustainable intensification. Frontiers in Plant Science, 7, 1720. [20] Lowole, M. W. (1983). Soil Map of Malawi. Department of Agricultural Research, Lilongwe, Malawi. [21] Gin´e, X., Townsend, R., and Vickery, J. (2007). Statistical analysis of rainfall insurance payouts in southern India. American Journal of Agricultural Economics, 89(5), 1248- 1254. [22] Diaconis, P. and Ylvisaker, D. (1979). Conjugate priors for exponential families. The Annals of Statistics, 7(2), 269-281. [23] Beukes, I., Rose, L. J., Shephard, G. S., Flett, B. C., and Viljoen, A. (2017). Mycotox- igenic Fusarium species associated with grain crops in South Africa – a review. South African Journal of Science, 113(3-4), 1-12. [24] Sauer, J. and Tchale, H. (2009). The economics of soil fertility management in Malawi. Review of Agricultural Economics, 31(3), 535-560. [25] Marenya, P. P. and Barrett, C. B. (2009). State-conditional fertilizer yield response on western Kenyan farms. American Journal of Agricultural Economics, 91(4), 991-1006. 93 [26] Scharf, P. C., Wiebold, W. J., and Lory, J. A. (2002). Corn yield response to nitrogen fertilizer timing and deficiency level. Agronomy Journal, 94(3), 435-441. [27] Kabambe, V. H., Nambuzi, S. C., and Kauwa A. E. (2008). Integrated management of witchweed (Striga asiatica [L.] Kuntze) by means of maize-legume rotations and intercropping systems in Malawi. Bunda Journal of Agriculture, Environmental Science and Technology, 3(2), 35-42. [28] Berner, D. K., Kling, J. G. and Singh, B. B. (1995). Striga research and control: a perspective from Africa. Plant Disease, 79(7), 652-660. [29] Jamil, M., Kanampiu, F. K., Karaya, H., Charnikhova, T., and Bouwmeester, H. J. (2012). Striga hermonthica parasitism in maize in response to N and P fertilisers. Field Crops Research, 134, 1-10. [30] Sauerborn, J., Kranz, B., and Mercer-Quarshie, H. (2003). Organic amendments mit- igate heterotrophic weed infestation in savannah agriculture. Applied Soil Ecology, 23(2), 181-186. [31] Atera, E. A., Itoh, K., Azuma, T. and Ishii, T. (2012). Farmers perception and con- straints to the adoption of weed control option: the case of Striga asiatica in Malawi. The Journal of Agricultural Science, 4(5), 41. [32] Shaxson, L. and Riches, C. (1998). Where once there was grain to burn: a farming system in crisis in eastern Malawi. Outlook on Agriculture, 27(2), 101-105. [33] Snapp, S. S. , Blackie, M. J., and Donovan, C. (2003). Realigning research and exten- sion to focus on farmers’ constraints and opportunities. Food Policy, 28(4), 349-363. [34] Stocker, T. F., Qin, D., Plattner, G.-K., Tignor, M., Allen, S. K. , Boschung, J., Nauels, A., Xia, Y., Bex V., and Midgley, P. M. (2013). Climate change 2013: the physical science basis. Working group I contribution to the fifth assessment report of the Intergovernmental Panel on Climate Change. Cambridge, New York: Cambridge University Press. [35] Li, B., Nychka, D. W., and Ammann, C. M. (2010). The value of multiproxy recon- struction of past climate. Journal of the American Statistical Association, 105(491), 883-895. [36] Grotch, S. L. and MacCracken, M. C. (1991). The use of general circulation models to predict regional climatic change. Journal of Climate, 4, 286-303. [37] Eyring, V., Bony, S., Meehl, G. A., Senior, C. A., Stevens, B., Stouffer, R. J., and Taylor, K. E. (2016). Overview of the coupled model intercomparison project phase 94 6 (CMIP6) experimental design and organization. Geoscientific Model Development, 9(5), 1937-1958. [38] Matthes, K., Andersson, F. B., Barnard, M. E., Beer, L., Charbonneau, J., Clilverd, P., Dudok de Wit, M. A., Haberreiter, T., Hendry, M., Jackman, A., Kretzschmar, C. H., Kruschke, M., Kunze, T., Langematz, M., Marsh, U.,Maycock, D. R., Misios, A. C. (2017). Solar Forcing for CMIP6 (v3.2). Geoscientific Model Development, 10(6), 2247-2302. [39] Jones, P. D., Briffa, K. R., Osborn, T. J., Lough, J. M., van Ommen, T. D., Vinther, B. M., Luterbacher, J., Wahl, E. R., Zwiers, F. W., Mann, M. E., Schmidt, G. A., Ammann, C. M., Buckley, B. M., Cobb, K. M., Esper, J., Goosse, H., Graham, N., Jansen, E., Kiefer, T., Kull, C., and K¨utte, M. (2009). High-resolution palaeoclima- tology of the last millennium: a review of current status and future prospects. The Holocene, 19, 3-49. [40] Werner, J. P. and Tingley, M. P. (2015). Technical note: probabilistically constraining proxy agedepth models within a Bayesian hierarchical reconstruction model. Climate of the Past, 11, 533-545. [41] Mann, M. E., Zhang, Z., Hughes, M. K., Bradley, R. S., Miller, S. K., Rutherford, S., and Ni, F. (2008). Proxy-based reconstructions of hemispheric and global surface tem- perature variations over the past two millennia. Proceedings of the National Academy of Sciences, 105(36), 13252-13257. [42] Mann, M. E., Bradley, R. S., and Hughes, M. K. (1998). Global-scale temperature patterns and climate forcing over the past six centuries. Nature, 392, 779787. [43] Yu, Z., Chu, P., and Schroeder, T. (1997). Predictive skills of seasonal to annual rainfall variations in the U.S. affiliated Pacific islands: canonical correlation analysis and multivariate principal component regression approaches. Journal of Climate, 10, 2586-2509. [44] Smerdon, J. E., Kaplan, A., Chang, D., and Evans, M. N. (2010). A pseudoproxy evaluation of the CCA and RegEM methods for reconstructing climate fields of the last millennium. Journal of Climate, 23, 4856-4880. [45] Werner, J. P., Luterbacher, J., and Smerdon, J. E. (2013). A pseudoproxy evaluation of Bayesian hierarchical modeling and canonical correlation analysis for climate field reconstructions over Europe. Journal of Climate, 26, 851-867. [46] Schneider, T. (2001). Analysis of incomplete climate data: estimation of mean values and covariance matrices and imputation of missing values. Journal of Climate, 14, 853-871. 95 [47] Rutherford, S., Mann, M. E., Delworth, T. L., and Stouffer, R. J. (2003). Climate field reconstruction under stationary and nonstationary forcing. Journal of Climate, 16, 462-479. [48] Zhang, Z., Mann, M. E., and Cook, E. R. (2004). Alternative methods of proxy-based climate field reconstruction: application to summer drought over the conterminous United States back to AD 1700 from tree-ring data. The Holocene, 14(4), 502-516. [49] Lee, T. C. K., Zwiers, F. W., and Tsao, M. (2008). Evaluation of proxy-based millennial reconstruction methods. Climate Dynamics, 31, 263-281. [50] Christiansen, B., Schmith, T., and Thejll, P. (2009). A surrogate ensemble study of climate reconstruction methods: stochasticity and robustness. Journal of Climate, 22, 951-976. [51] Jones, P. D., Briffa, K. R., Barnett, T. P., and Tett, S. F. B. (1998). High-resolution palaeoclimatic records for the last millennium: interpretation, integration and com- parison with general circulation model control-run temperatures. The Holocene, 8(4), 455-471. [52] Mann, M. E., Bradley, R. S., and Hughes, M. K. (1999). Northern hemisphere tem- peratures during the past millennium: inferences, uncertainties, and limitations. Geo- physical Research Letters, 26(6), 759-762. [53] Tingley, M. P., Craigmile, P. F., Haran, M., Li, B., Mannshardt, E., and Rajaratnam, B. (2012). Piecing together the past: statistical insights into paleoclimatic reconstruc- tions. Quaternary Science Reviews, 35, 1-22. [54] Hoeting, J. A., Madigan, D., Raft, A. E., and Volinsky, C. T. (1999). Bayesian model averaging: a tutorial. Statistical Science, 14(4), 382-417. [55] Katz, R. W. (2002). Techniques for estimating uncertainty in climate change scenarios and impact studies. Climate Research, 20, 167-185. [56] Cheaib, A. , Badeau, V., Boe, J., Chuine, I., Delire, C., Dufrˆene, E., Fran¸cois, C., Gritti, E. S., Legay, M., Pag´e, C., Thuiller, W., Viovy, N., and Leadley, P. (2012). Cli- mate change impacts on tree ranges: model intercomparison facilitates understanding and quantification of uncertainty. Ecology Letters, 15, 533-544. [57] Haslett, J. , Whiley, M., Bhattacharya, S., SalterTownshend, M., Wilson, S. P., Allen, J. R., Huntley, B., and Mitchell, F. J. (2006). Bayesian palaeoclimate reconstruction. Journal of the Royal Statistical Society: Series A (Statistics in Society), 169(3), 395- 438. 96 [58] Barboza, L., Li, B., Tingley, M. P., and Viens, F. G. (2014). Reconstructing past temperatures from natural proxies and estimated climate forcings using short- and long-memory models. The Annals of Applied Statistics, 8, 1966-2001. [59] Tingley, M. P. and Huybers, P. (2010). A Bayesian algorithm for reconstructing climate anomalies in space and time. Part I: development and applications to paleoclimate reconstruction problems. Journal of Climate, 23, 2759-2781. [60] Henley, B. J., Thyer, M. A., Kuczera, G., and Franks, S. W. (2011). Climate-informed stochastic hydrological modeling: Incorporating decadal-scale variability using paleo data. Water Resources Research, 47, W11509. [61] Comboul, M., Emile-Geay, J., Evans, M. N., Mirnateghi, N., Cobb, K. M., and Thomp- son, D. M. (2014). A probabilistic model of chronological errors in layer-counted cli- mate proxies: applications to annually banded coral archives. Climate of the Past, 10, 825-841. [62] Mann, M. E., Zhang, Z., Rutherford, S., Bradley, R. S., Hughes, M. K., Shindell, D., Ammann, C., Faluvegi, G., and Ni, F. (2009). Global signatures and dynamical origins of the Little Ice Age and Medieval Climate Anomaly. Science, 326, 1256-1260. [63] Loso, M. G. (2009). Summer temperatures during the Medieval Warm Period and Little Ice Age inferred from varved proglacial lake sediments in southern Alaska. Journal of Paleolimnology, 41, 117. [64] Brohan, P., Kennedy, J. J., Harris, I., Tett, S. F. B., and Jones, P. D. (2006). Uncer- tainty estimates in regional and global observed temperature changes: a new data set from 1850. Journal of Geophysical Research, 111, D12106. [65] Rayner, N. A., Brohan, P., Parker, D. E., Folland, C. K., Kennedy, J. J., Vanicek, M., Ansell, T. J., and Tett, S. F. B. (2006). Improved analyses of changes and uncertainties in sea surface temperature measured in situ since the mid-nineteenth century: the HadSST2 dataset.” Journal of Climate, 19, 446-469. [66] Morice, C. P., Kennedy, J. J., Rayner, N. A., and Jones, P. D. (2012). Quantifying uncertainties in global and regional temperature change using an ensemble of obser- vational estimates: the HadCRUT4 data set. Journal of Geophysical Research, 117, D08101. [67] Jones, P. D., Lister, D. H., Osborn, T. J., Harpham, C., Salmon, M., and Morice, C. P. (2012). Hemispheric and large-scale land surface air temperature variations: an ex- tensive revision and an update to 2010. Journal of Geophysical Research, 117, D05127. 97 [68] Ammann, C. M., Genton, M. G., and Li, B. (2010). Technical note: correcting for signal attenuation from noisy proxy data in climate reconstructions. Climate of the Past, 6, 273-279. [69] Gueymard, C. A. (2018). A reevaluation of the solar constant based on a 42-year total solar irradiance time series and a reconciliation of spaceborne observation. Solar Energy, 168, 2-9. [70] Bard, E., Raisbeck, G., Yiou, F., and Jouzel, J. (2000). Solar irradiance during the last 1200 years based on cosmogenic nuclides. Tellus B, 52, 985-992. [71] Lean, J., Beer, J., and Bradley, R. (1995). Reconstruction of solar irradiance since 1610: implications for climate change. Geophysical Research Letters, 22(23), 3195-3198. [72] Ammann, C. M., Joos, F., Schimel, D. S., Otto-Bliesner, B. L., and Tomas, R. A. (2007). Solar influence on climate during the past millennium: results from tran- sient simulations with the NCAR climate system model. Proceedings of the National Academy of Sciences, 104(10), 3713-3718. [73] Shindell, D. T., Schmidt, G. A., Miller, R. L., and Rind, D. (2001). Northern hemi- sphere winter climate response to greenhouse gas, ozone, solar, and volcanic forcing. Journal of Geophysical Research, 106(D7), 7193-7210. [74] Shindell, D. T. and Schmidt, G. A. (2003). Volcanic and solar forcing of climate change during the preindustrial era.” Journal of Climate, 16, 4094-4107. [75] Moss, R. H., Edmonds, J. A., Hibbard, K. A., Manning, M. R., Rose, S. K., van Vuuren, D. P., Carter, T. R., Emori, S., Kainuma, M., Kram, T., Meehl, G. A., Mitchell, J. F. B., Nakicenovic, N., Riahi, K., Smith, S. J., Stouffer, R. J., Thomson, A. M., Weyant, J. P., and Wilbanks, T. J. (2010). The next generation of scenarios for climate change research and assessment. Nature, 463, 747-756. [76] van Vuuren, D. P., Edmonds, J., Kainuma, M., Riahi, K., Thomson, A., Hibbard, K., Hurtt, G. C., Kram, T., Krey, V., Lamarque, J. F., Masui, T., Meinshausen, M., Nakicenovic, N., Smith, S. J., Rose, S. K. (2011). The representative concentration pathways: an overview. Climatic Change, 109, 5-31. [77] van Vuuren, D. P., den Elzen, M. G. J., Lucas, P. L., Eickhout, B., Strengers, B. J., van Ruijven, B., Wonink, S., and van Houdt, R. (2007). Stabilizing greenhouse gas concentrations at low levels: an assessment of reduction strategies and costs. Climatic Change, 81, 119-159. [78] Hijioka, Y., Matsuoka, Y., Nishimoto, H., Masui, T., and Kainuma, M. (2008). Global GHG emission scenarios under GHG concentration stabilization targets. Journal of Global Environment Engineering, 13, 97-108. 98 [79] Riahi, Keywan, Gr¨ubler, A., and Nakicenovicac, N. (2007). Scenarios of long-term socio-economic and environmental development under climate stabilization. Techno- logical Forecasting and Social Change, 74(7), 887-935. [80] Free, M. and Robock, A. (1999). Global warming in the context of the Little Ice Age. Journal of Geophysical Research, 104, 19057-19070. [81] Myhre, G., Highwood, E. J., Shine, K. P., and Stordal, F. (1998). New estimates of radiative forcing due to well mixed greenhouse gases. Geophysical Research Letters, 25, 2715-2718. [82] Collins, W. D., Ramaswamy, V., Schwarzkopf, M. D., Sun, Y., Portmann, R. W., Fu, Q., Casanova, S. E. B., Dufresne, J. L., Fillmore, D. W., Forster, P. M. D., Galin, V. Y., Gohar, L. K., Ingram, W. J., Kratz, D. P., Lefebvre, M. P., Li, J., Marquet, P., Oinas, V., and Tsush, Y. (2006). Radiative forcing by well-mixed greenhouse gases: estimates from climate models in the intergovernmental panel on climate change (IPCC) fourth assessment report (AR4). Journal of Geophysical Research, 111, D14317. [83] Chouet, B. A. (1996). Long-period volcano seismicity: its source and use in eruption forecasting. Nature, 380, 309-316. [84] Oppenheimer, C. (2003). Climatic, environmental and human consequences of the largest known historic eruption: Tambora volcano (Indonesia) 1815. Progress in Phys- ical Geography, 27 (2), 230-259. [85] Man, W., Zhou, T., and Jungclaus, J. H. (2014). Effects of large volcanic eruptions on global summer climate and East Asian monsoon changes during the last millennium: analysis of MPI-ESM simulations. Journal of Climate, 27, 7394-7409. [86] Ljung, G. M. and Box, G. E. P. (1978). On a measure of lack of fit in time series models. Biometrika, 65(2), 297-303. [87] Ter¨asvirta, T. (1994). Specification, estimation, and evaluation of smooth transition autoregressive models. Journal of the American Statistical Association, 89(425), 208- 218. [88] Eitrheim, Ø. and Ter¨asvirta, T. (1996). Testing the adequacy of smooth transition autoregressive models. Journal of Econometrics, 74 (1), 59-75. [89] Morris, C. N. (1983). Parametric empirical Bayes inference: theory and applications. Journal of the American Statistical Association, 78(381), 47-55. [90] Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457-472. 99 [91] McKeen, S., Wilczak, J., Grell, G., Djalalova, I., Peckham, S., Hsie, E. Y., Gong, W., Bouchet, V., Menard, S., Moffet, R., McHenry, J., McQueen, J., Tang, Y., Carmichael, G. R., Pagowski, M., Chan, A., Dye, T., Frost, G., Lee, P., and Mathur, R. (2005). As- sessment of an ensemble of seven real-time ozone forecasts over eastern North America during the summer of 2004. Journal of Geophysical Research, 110, D21307. [92] Savage, N. H., Agnew, P., Davis, L. S., Ord´o˜nnez, C., Thorpe, R., Johnson, C. E., O’Connor, F. M., and Dalvi, M. (2013). Air quality modelling using the met office uni- fied model (AQUM OS24-26): model description and initial evaluation. Geoscientific Model Development, 6, 353-372. [93] Chai, T. and Draxler, R. R. (2014). Root mean square error (RMSE) or mean absolute error (MAE)? – arguments against avoiding RMSE in the literature. Geoscientific Model Development, 7, 1247-1250. [94] Hersbach, H. (2000). Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting, 15, 559-570. [95] Grimit, E. P., Gneiting, T., Berrocal, V. J., and Johnson, N. A. (2006). The continuous ranked probability score for circular variables and its application to mesoscale forecast ensemble verification. Quarterly Journal of the Royal Meteorological Society, 132, 2925- 2942. [96] Gschl¨oßl, S. and Czado, C. (2007). Spatial modelling of claim frequency and claim size in non-life insurance. Scandinavian Actuarial Journal, 2007(3), 202-225. [97] Gneiting, T. and Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), 359-378. [98] Cowles, M. K. and Carlin, B. P. (1996). Markov chain Monte Carlo convergence diagnostics: a comparative review. Journal of the American Statistical Association, 91(434), 883-904. [99] Brooks, S. P. and Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7(4), 434- 455. [100] Hegerl, G. C., Br¨onnimann, S., Schurer, A., and Cowan, T. (2018). The early 20th cen- tury warming: anomalies, causes, and consequences. Wiley Interdisciplinary Reviews: Climate Change, 9(4), e522. [101] Phipps, S. J., McGregor, H. V., Gergis, J., Gallant, A. J. E., Neukom, R., Stevenson, S., Ackerley, D., Brown, J. R., Fischer, M. J., and van Ommen, T. D. (2013). Paleoclimate datamodel comparison and the role of climate forcings over the past 1500 years. Journal of Climate, 26, 6915-6936. 100 [102] Eddy, J. A. (1976). The Maunder Minimum. Science, 192(4245), 1189-1202. [103] Hempelmann, A. and Weber, W. (2012). Correlation between the sunspot number, the total solar irradiance, and the terrestrial insolation. Solar Physics, 277, 417-430. [104] Kopp, G., Krivova, N., Wu, C. J., and Lean, J. (2016). The impact of the revised sunspot record on solar irradiance reconstructions. Solar Physics, 291, 2951-2965. [105] Mann, M. E., Fuentes, J. D., and Rutherford, S. (2012). Underestimation of volcanic cooling in tree-ring-based reconstructions of hemispheric temperatures. Nature Geo- science, 5, 202-205. [106] Taylor, K. E., Stouffer, R. J., and Meehl, G. A. (2012). An overview of CMIP5 and the experiment design. Bulletin of the American Meteorological Society, 93, 485-498. [107] Holmstr¨om, L., Ilvonen, L., Sepp¨a, H., and Veski, S. (2015). A Bayesian spatiotemporal model for reconstructing climate from multiple pollen records. The Annals of Applied Statistics, 9(3), 1194-1225. [108] Kim, K. K., Shen, D. E., Nagy, Z. K., and Braatz, R. D. (2013). Wiener’s polynomial chaos for the analysis and control of nonlinear dynamical systems with probabilistic uncertainties [historical perspectives]. IEEE Control Systems, 33, 58-67. [109] Oladyshkin, S. and Nowak, W. (2018). Incomplete statistical information limits the utility of high-order polynomial chaos expansions. Reliability Engineering & System Safety, 169, 137-148. [110] Pedersen, A. R. (1995). A new approach to maximum likelihood estimation for stochas- tic differential equations based on discrete observations. Scandinavian Journal of Statis- tics, 22(1), 55-71. [111] A¨ıt-Sahalia, Y. (2002). Maximum likelihood estimation of discretely sampled diffusions: a closedform approximation approach. Econometrica, 70(1), 223-262. [112] Nourdin, I. and Peccati, G. (2015). The optimal fourth moment theorem. Proceedings of the American Mathematical Society, 143, 3123-3133. 101