WIENER-CHAOS ANALYSIS ON BAYESIAN MODELS WITH

APPLICATIONS IN AGRICULTURE AND CLIMATOLOGY

By

Han Wang

A DISSERTATION

Submitted to

Michigan State University

in partial fulﬁllment of the requirements

for the degree of

Statistics — Doctor of Philosophy

2020

ABSTRACT

WIENER-CHAOS ANALYSIS ON BAYESIAN MODELS WITH APPLICATIONS IN

AGRICULTURE AND CLIMATOLOGY

By

Han Wang

Understanding the challenges to increasing maize productivity in sub-Saharan Africa has

important implications for policies to reduce national and global food insecurity. There is

insuﬃcient research on the key agronomic and environmental factors that inﬂuence maize

yield in a smallholder-farm environment. We implement a Bayesian analysis with longi-

tudinal household survey data covering 1,197 plots among 320 farms in central Malawi.

The results reveal a high positive association between a leaf chlorophyll indicator and yield,

with signiﬁcance levels exceeding 95% Bayesian credibility at all sites, and the posterior

mean of the regression coeﬃcient ranging from 28% to 42% on a relative scale. A parasitic

weed, Striga asiatica, is the variable that negatively associated with yield of high inten-

sity. The impact of rainfall varies by site and season, either directly or indirectly. We

conclude that the determinants preventing striga infestation and enhancing nitrogen fertility

will lead to higher maize yield in Malawi. To improve plant nitrogen status, fertilizer is ef-

fective at higher-productivity sites, whereas soil carbon and organic inputs are important at

marginal sites. Uniquely, the Bayesian approach allows diﬀerentiation of response by site for

a modest-sample-size study. Considering the biophysical constraints, our ﬁndings highlight

area-speciﬁc recommendations as well as management strategies for crop yield.

Quantifying the sensitivity of climate forcing factors such as greenhouse gas concentra-

tion and solar irradiation, is critical in comprehending the evolution of the Earth’s climate.

There exists a variety of statistical methods to reconstruct temperature in the past, but the

same is not true for projecting future temperatures. We produce a multi-level stochastic

model to systematically reconstruct and project the northern-hemisphere average tempera-

ture anomalies, for the past millennium (1000-1999) and the next century (2019-2100), by

coordinating with climatic forcings and natural proxies from diverse data sources. Additive

noises are applied to the model to capture the unaccounted variability. Model parameters

are estimated using Bayesian-inference techniques, resulting in complete distributional in-

formation. Reconstructions with memory features (no, short, long) are evaluated through

selected validation metrics, and the results constitute evidence in favor of using a moderate-

memory length. For the purpose of temperature projections, we incorporate realistic climate

forcing uncertainties to Year 2100. Similarly, we include an uncertainty component on top of

using representative carbon pathway scenarios for global greenhouse gases. Our projections’

posterior means show a great level of agreement with the 95% conﬁdence interval provided

by the Coupled Model Intercomparison Project, while featuring diﬀerences in most cases.

The models described above are both implemented via Gibbs sampler with 10,000 itera-

tions. In order to avoid its potential computational heft, we combine the use of maximum

likelihood estimators for regression elements with properties of Wiener chaos, to approxi-

mate the predictive samples with speciﬁc chaos distributions that do not require sampling

via numerics. Some of the approximations’ statistics, such as error variances are also ex-

plicitly provided. The precision are relatively high (nearly 0.1% and 0.5%) depending on

dimension circumstances. This allows practitioners to estimate approximation accuracy and

convergence rates in practice, with no resort to heavy computational demands.

ACKNOWLEDGMENTS

First and foremost, I would love to express my deep gratitude to Dr. Frederi Viens. Not

only is he a knowledgeable advisor providing constant guidance throughout the program,

but also he is a great mentor who teaches me how to become a better person.

Also, I am grateful for Dr. Sieglinde Snapp who helps me open the window of research

world. It is my great pleasure to collaborate with her and get to learn how to do research

meticulously. I’m thankful that Dr. R.V. Ramamoorthi and Dr. Shlomo Levental agree to

be on my committee, oﬀering crucial insights and expertise.

Moreover, a special thank you to my parents’ consistent love and support. I would never

be where I am today without them. My girlfriend Mengzi’s companionship through the years

is very much appreciated as well.

Last but not least, I’d like to thank all the staﬀ members and my fellow PhD colleagues,

with whom I have countless discussions on a variety of topics.

iv

TABLE OF CONTENTS

LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii

LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii

Chapter 1

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.1 Bayesian Regression Modeling . . . . . . . . . . . . . . . . . . . . . . . . . .
1.2 Chaos Structure of Wiener Space . . . . . . . . . . . . . . . . . . . . . . . .
1.3 Dissertation Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Study Sites

2.1
2.2 Materials

Chapter 2 Maize Yield Determinants and Management Strategies . . . .
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2.1
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2.2 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.3 Modeling Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.3.1 Explanatory Variables
. . . . . . . . . . . . . . . . . . . . . . . . . .
2.3.2 Regression Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.4.1 Descriptive Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.4.2 Bayesian Inference
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.4.2.1 Maize yield . . . . . . . . . . . . . . . . . . . . . . . . . . .
SPAD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.4.2.2
Striga . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.4.2.3
. . . . . . . . . . . . . . . . . . .
2.4.2.4 Household (farmer) eﬀect
2.4.2.5
. . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2.5 Discussions

Socioeconomic factors

Chapter 3 Global Temperature’s Reconstruction and Projection . . . . .
3.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.2 Data Source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.3 Model Speciﬁcation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.3.1 Forcing Transformation and Extension . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
3.3.2 Multi-level Autoregressive Model
3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.4.1 Validation Metrics
3.4.2 Reconstruction (1000-1999)
. . . . . . . . . . . . . . . . . . . . . . .
3.4.3 Projection (2019-2100) . . . . . . . . . . . . . . . . . . . . . . . . . .
3.5 Conclusions and Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . .

Chapter 4 Predictive-distribution Approximation via Wiener Chaos . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

4.1 Preliminaries

v

1
1
2
4

5
5
6
6
7
9
9
10
13
13
16
16
18
19
21
22
23

27
27
29
31
31
34
36
36
37
42
47

49
49

4.2 Approximation Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.2.1 Wiener-chaos Expansion . . . . . . . . . . . . . . . . . . . . . . . . .
4.2.2 Error Magnitude . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.3 Prediction Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.3.1 Explicit Boundaries . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.3.2
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Simulation Results

4.4 Discussions

51
51
53
55
55
65
66

Chapter 5 Future Work Directions . . . . . . . . . . . . . . . . . . . . . . . .

69

APPENDICES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
APPENDIX A Model Robustness . . . . . . . . . . . . . . . . . . . . . . . . . . .
APPENDIX B Exponential Decay . . . . . . . . . . . . . . . . . . . . . . . . . . .
APPENDIX C Posterior-distribution Computation . . . . . . . . . . . . . . . . .
APPENDIX D Proof of Theorem 4.3.5 . . . . . . . . . . . . . . . . . . . . . . . .
APPENDIX E Boundaries Justiﬁcation . . . . . . . . . . . . . . . . . . . . . . . .

70
71
74
76
82
86

BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

91

vi

LIST OF TABLES

Table 2.1: Mean and (standard error) of all model variables by location . . . . . . .

14

Table 2.2: Number and (proportion) of households where the y-intercept is signiﬁ-
cantly non-zero at 95% credibility level in all models . . . . . . . . . . . .

21

Table 3.1: Root mean square error (RMSE) for all combinations of memory coeﬃcients 37

Table 3.2: Empirical coverage probability (ECP) at levels 80% (up) and 95% (down)

38

Table 3.3: Negative interval score (IS) at both 80% (up) and 95% (down) levels . . .

39

Table 3.4: Negative continuous ranked probability score (CRPS) for diﬀerent memories 39

Table 3.5: ECP at two levels 80% (up) and 95% (down) for speciﬁc calibrations . . .

40

Table 3.6: Validation metrics (negative IS and CRPS) for both models . . . . . . . .

40

Table 3.7: The diagnosis of MCMC’s convergence (PSRF) for both models . . . . . .

41

Table 3.8: List of CMIP5 models used for multi-averaging temperature calculations .

43

Table 3.9: Temperature diﬀerences (in absolute values) between two projection periods 45

Table 3.10: Gibbs sampler convergence for the posterior means and (standard devia-
tions) of model coeﬃcients in Scenario D as well as validation metrics for
each memory combination . . . . . . . . . . . . . . . . . . . . . . . . . . .

Table 4.1: Approximation results for various combinations of (s2, σ2

signiﬁcance levels (α), dimension (n), and number of coeﬃcients (k)

Y ∗) under diﬀerent
. . .

46

68

vii

LIST OF FIGURES

Figure 2.1: 10-day precipitation (CHIRPS) from December to May at all EPA sites .

15

Figure 2.2: 95% credible intervals for the determinants of maize yield . . . . . . .

17

Figure 2.3: 95% credible intervals for the determinants associated with SPAD . . .

18

Figure 2.4: 95% credible intervals for the drivers of striga prevention (logistic) . .

20

Figure 2.5: 95% credible intervals for the determinants of striga control

. . . . . .

20

Figure 2.6: 95% credible intervals for determinants including two socioeconomic
factors associated with maize yield . . . . . . . . . . . . . . . . . . . . .

23

Figure 3.1: Greenhouse gas projections of four RCP scenarios . . . . . . . . . . . . .

31

Figure 3.2: The ﬁrst 36 re-normalized volcanic eruptions’ decay . . . . . . . . . . . .

33

Figure 3.3: Temperature anomalies’ reconstructions between our model and EIV . .

41

Figure 3.4: Temperature prediction for RCP8.5 between 2006 and 2100 (CMIP5) . .

44

Figure 3.5: Temperature projections until 2100 for all four RCPs: comparisons with
CMIP5 multi-model average . . . . . . . . . . . . . . . . . . . . . . . . .

Figure 3.6: Temperature projections’ comparisons with CMIP5 for medium scenarios
(up: RCP4.5; down: RCP6.0) . . . . . . . . . . . . . . . . . . . . . . . .

Figure A.1: Temperature projections’ comparisons (CMIP5) with no σC . . . . . . .

Figure A.2: Temperature projections’ comparisons (CMIP5) with 1

2σC . . . . . . . .

Figure A.3: Temperature projections’ comparisons (CMIP5) with 2σC . . . . . . . .

Figure A.4: Temperature projections’ comparisons (CMIP5) with 3σC . . . . . . . .

Figure A.5: 10,000 MCMC samples of all model parameters (from top to bottom): α1,
. . . . . . . . . . . . . . . . . . . . . . . . .

α0, βC , βV , βS, β0, σ2

P , σ2
T

44

45

71

72

72

72

73

viii

Chapter 1

Introduction

1.1 Bayesian Regression Modeling

Bayesian statistics is based on the concept of probability to predict the future, where

probability is interpreted as a degree of belief in an event. The degree of belief can come

from previous knowledge or personal beliefs about the event. Bayesian statistics was named

after Thomas Bayes, who ﬁrst described the outcome of a coin-toss experiment using Bayes’s

theorem on his paper published in 1763.

From the late 1700s to the early 1800s, Pierre-Simon Laplace established the Bayesian

interpretation of probability, and many other Bayesian methods were developed by other

scholars during that time. Over much of the 20th century, the widely-used statistical methods

were based on the frequentist interpretation. Bayesian methods were not in favor of, even

controversial to many statisticians, due to computational reliance and practical restrictions.

Since the 1950s, researchers began to apply Bayes’s theorem to account for model un-

certainties, by incorporating educated guesses about the likelihood of something happening,

and then making predictions. However, it is diﬃcult to implement these types of guesses.

In the recent two decades, particularly with the emergence of high-performance computers

and new repetitive algorithms like Markov chain Monte Carlo (MCMC), Bayesian methods

have been more recognized as powerful tools in the scientiﬁc community after overcoming

1

the implementation diﬃculties.

Bayesian linear regression is a modeling approach in which the statistical analysis is un-

dertaken within the context of Bayesian inference. When the regression model has an error

following a normal distribution, and a particular form of prior distribution is assumed, the

full posterior probability distributions for every model parameter are explicitly available.

Not only does it provide more information than point estimates like means and variances in

classical frequentist statistics, but credible intervals are straightforward to interpret even by

non-statisticians. P-values can be computed in a Bayesian way, with more power and ﬂex-

ibility in assessing the signiﬁcance of explanatory variables [1], avoiding misinterpretations

of p-values [2, 3]. Also, the Bayesian approach allows background knowledge from domain

specialists to be incorporated into the analysis, as a type of participatory model building,

improving the accuracy and credibility of the estimations [4].

Last but not least, Bayesian statistics has an advantage in statistical power for data-

limited studies [5, 6]. There is an often quoted but rarely if ever formally cited rule of

thumb, by which the number of parameters (or degrees of freedom) that one can estimate

reliably (e.g., with credibility level higher than 90%) in a linear model, is a third of the total

number of data points, compared to a tenth in ordinary frequentist linear regression [7].

1.2 Chaos Structure of Wiener Space

Polynomial chaos (also called Wiener chaos expansion) is a non-sampling-based method

to determine evolution of uncertainty in a dynamic system in which the parameters have

probabilistic uncertainties. It was ﬁrst introduced by Norbert Wiener in 1938, where Hermite

polynomials were used to model stochastic processes with Gaussian random variables [8].

2

When the second moment is ﬁnite, such an expansion converges in the L2 sense for any

arbitrary stochastic process, which is applicable in most physics systems.

In any real and separable Hilbert space H, there exists an isonormal Gaussian process
of a centered Gaussian family (G(ϕ), ϕ ∈ H) of random variables on a probability space
(Ω,F,P), such that

E[G(ϕ)G(ψ)] = (cid:104)ϕ, ψ(cid:105)H

(1.1)

The Wiener chaos of order n is deﬁned as the closure in L2(Ω), a linear span of the random
variables Hn(G(ϕ)), where Hn is the Hermite polynomial of degree n, and ||ϕ||H = 1. For
any F ∈ L2(Ω), there is a unique sequence of functions fn ∈ H(cid:12)n (symmetric tensor

product) such that

∞(cid:88)

n=0

F =

In(fn)

(1.2)

where In is the Wiener stochastic integral with respect to G, I0(f0) := E[F ], and all of them

are mutually orthogonal in L2(Ω). This is the fundamental decomposition of L2(Ω) as a

direct sum of all Wiener-chaos terms [9].

Moreover, L2(Ω) is closed under multiplication, for any p, q and f ∈ H(cid:12)p, g ∈ H(cid:12)q

(symmetric), the product of Wiener integrals is calculated by:

p∧q(cid:88)

Ip(f )Iq(g) =

r!Cr

pCr

q Ip+q−2r(f ⊗r g)

(1.3)

n=1

where the contraction f ⊗r g is an element of H⊗(p+q−2r). In particular, the special case

when p = q = 1 is mostly used as follows:

I1(f )I1(g) =

I2(f ⊗ g + g ⊗ f ) + (cid:104)f, g(cid:105)H

1
2

(1.4)

3

1.3 Dissertation Overview

This dissertation focuses on the applications of Bayesian regression models, speciﬁcally in

agriculture and climatology, as well as Wiener-chaos analysis on the predictive distribution

of the response variable under linear-regression-model setup. Here is the breakdown:

• Chapter 2 introduces a multi-linear regression model to mainly explore the determi-

nants of maize yield stability in Malawi.

• Chapter 3 develops a hierarchical Bayesian model to investigate the reconstruction and

the projection of temperature anomalies in northern hemisphere.

• Chapter 4 proposes an estimation framework of predictive distribution using Wiener-

chaos-expansion technique, which can give rise to a better understanding on the risk

of a linear-regression model and the approximation error.

• Chapter 5 discusses the possible work directions in the future.

4

Chapter 2

Maize Yield Determinants and

Management Strategies

2.1

Introduction

Yield gaps in African smallholder agriculture are pervasive and large. The yields achieved

on the vast majority of African farms are 10-30% of their genetic potential [10]. Yield-

limiting factors have been identiﬁed, such as environment, sub-optimal planting in terms of

timing and spacing, deﬁciencies in soil nutrients, moisture, as well as damage from weeds and

pests [11,12]. Agricultural economists commonly emphasize market prices, farmer education,

and related socio-economic factors to inﬂuence on-farm production [13]. There are many

challenges to carrying out eﬀective diagnostic analysis of yield gaps, and often the focus has

been on the size of the gap. Yet if research priorities and agronomic recommendations are

to address farm-level constraints, there is urgent need for evidence-based examination of the

main determinants of yield in speciﬁc contexts.

This is the ﬁrst study to understand maize yield determinants by applying a Bayesian

approach to a unique survey dataset from central Malawi. Crop simulation models are

often used for gap analysis, and are suited to providing insights into yield potential as well

as technology response to weather variability; however, they do not reveal the drivers of

5

yield gaps [14]. In ﬁeld experimentation, trials are often run under conditions that are not

representative of on-farm conditions. Smallholders in sub-Saharan Africa often have marginal

soils with less intensive management, facing weed, disease, and other pest problems [15].

The disconnect between soil conditions at research stations and those on smallholder farms

is illustrated by a nationwide assessment in Malawi, where soil organic matter levels at

research stations are 1.5 to 2 times as high as those observed on smallholder farms [16].

Researchers generally choose a ﬁeld site and invest resources, so as to ensure a homogeneous

environment, within which to evaluate a practice or to address a speciﬁc research question.

Thus ﬁeld research sites tend to be ﬂat, uniform, and high-potential, given that conventional

research experimentation usually tests one or two component practices while controlling

other sources of variability [17].

The overall objective of the project is to conduct a Bayesian analysis of household-survey

data that comprises multiple visits to maize-focal plots in central Malawi, to determine

which variables inﬂuence the maize yield [18]. Speciﬁcally, we examine the predictive ability

of time-series environmental factors and management practices regarding ﬁeld observations

of maize yield. Furthermore, we assess leaf chlorophyll status and parasitic weed incidence

to provide site-speciﬁc models and recommendations.

2.2 Materials

2.2.1 Study Sites

Central Malawi agriculture is dominated by mixed maize production systems with limited

livestock presence, which is broadly typical for poor-resource smallholder farms in southern

Africa [19]. Administrative units in the Malawi government are comprised of region, agricul-

6

tural development division, and extension planning area (EPA). The study sites are chosen

using a stratiﬁed random sampling scheme, where all EPAs within central Malawi are clas-

siﬁed using the strata of marginal, moderate or mesic for plant growth based on rainfall and

evapotranspiration. There is one marginal site that contains two adjacent EPAs–Golomoti

and Mtakataka (referred herein as Golomoti), two moderate-potential sites which are Kandeu

and Nsipe, and one high-potential site called Linthipe, a total of 22 village clusters included

within these ﬁve EPAs [19]. Golomoti is located near the lakeshore at a low elevation and

a high evapotranspiration, with a mix of soil types dominated by Eutric Cambisols and Eu-

tric Fluvisols. Linthipe has well-distributed rainfall and a long history of maize-dominated

agriculture. Soils in Linthipe are primarily Ferric Luvisols, whereas in Kandeu and Nsipe,

soils are mixed with Chromic Luvisols and Orthic Ferrasols [20]. Market access also varies

across locations, with Kandeu and Nsipe being moderately remote, Golomoti and Linthipe

being proximate.

2.2.2 Data Collection

The data are from a panel of 320 farm households, two maize plots per household surveyed

in 2014/2015 and in 2015/2016, with a survey instrument approved through the Michigan

State University Human Research Protection Program in the Oﬃce of Regulatory Aﬀairs,

following a human subjects’ protocol with informed consent obtained from all farmers, trans-

lated into local languages. The farmers are asked to choose two maize plots at random, and

the same plots and farmers are then revisited and surveyed at preseason (October 2014 and

2015), mid-season (March 2015 and 2016), and harvest (May 2015 and 2016). Enumerators

are trained over a one-week period, and supervised by graduate students on the ﬁeld. The

data collection process involves close attention to data entry and quality control. The infor-

7

mation on the survey is voluntary and every eﬀort is carried out to maintain conﬁdentiality.

The survey topics address crop production, socioeconomic information (household size

and composition, household head’s educational level), farm management and practices (la-

bor, seed, planting dates, plot history, residue, crop grown, time of sowing and weeding,

fertilizer application), and soil characteristics (pH, total carbon, permanganate oxidizable

carbon). Mid-season survey is to assess maize planting arrangements (including row spac-

ing), and maize leaf chlorophyll, which is based on soil plant analysis development (SPAD)

absorbance. Enumerators record three reading replicates per plant for four plants at each of

the eight locations. To avoid edge eﬀects, the enumerators observe at least two ridges from

the plot border, and randomly choose three locations (two-ridge apart) along a diagonal

transect. The spacing is from the center of one ridge to the center of the adjacent one.

Additionally, two types of measurements are made to investigate the incidence of Striga

asiatica (L.) Kuntze, commonly known as witchweed, a genus of parasitic plants. One is

directly asking farmers if they have a problem with striga on a given maize plot. The other

one is obtained by enumerators who make eight observations per plot at random sites along

rows following a prescribed procedure; thus, striga information is recorded from 0 to 8 for

each plot. At harvest, a survey is provided to measure maize yield by weighing biomass of

stover and grain from three-square-meter plots per ﬁeld, where grain is removed from cobs

to allow for a dry weight basis.

In summary, there are 1,197 plots in total, which are geographically located with GPS

coordinates in October of 2014, involving a small amount of missing data. The plot-level

data is longitudinal, and combine socioeconomic characteristics, maize production, plant and

soil information, as well as farm management.

8

2.3 Modeling Framework

2.3.1 Explanatory Variables

Environmental factors (rainfall and soil properties) are continuous variables, which are

standardized for comparison with the estimated coeﬃcients on the same scale. Rainfall data

come from the Climate Hazards InfraRed Precipitation with Station (CHIRPS) resource,

which is a public quasi-global rainfall data set starting from 1981. This is the only com-

prehensive precipitation data source that is available for Malawi, and has been previously

validated by comparisons to local rainfall records for three of the ﬁve EPAs [19]. We include

three variables in the model that indicate the amount of rainfall (in millimeter) for

(a) December, January, and February (the sowing period)

(b) March (the end of the rainy season)

(c) April and May (the harvest period)

to explore the association between seasonal-rainfall variability and maize yield by measuring

at three critical stages of the maize growing season. Using more than one subset of semi-

annual rainfall is common in certain agricultural studies and practices, such as the deﬁnition

and calculation of rain-index-based crop insurance [21]. Regarding soil properties, perman-

ganate oxidizable carbon (POXC), which is a sensitive indicator of active soil organic carbon,

and pH are selected as the main soil factors in the model.

There are ﬁve farm-management variables, namely ridge (row) spacing, total ridge-weed

biomass, fertilizer application, intercropping, and manure/compost use. Although fertilizer

and SPAD are highly correlated, they both have explanatory power for maize yield. Ridge

spacing (in centimeter) is the distance between two ridges at each of three locations within

9

a ﬁeld. Total ridge-weed biomass (in quadrat of 0.5 m2) is weighted in the ﬁeld at harvest,

and treated as an index of weeding eﬀectiveness. Intercropping and manure/compost use

are both binary variables, and we do not consider the density or the genre of the crop that

farmers intercropped with maize. Fertilizer is calculated by the amount of nitrogen applied

with any type of fertilizer amendment. Endogeneity bias is possible, particularly for these

choice variables. However, we are unable to address endogeneity because our data set does

not include suitable instrumental variables, that strongly predict the endogenous explanatory

variables but do not directly aﬀect the response variables. As a result, coeﬃcient estimates

should be interpreted as indicating association rather than causality.

2.3.2 Regression Model

A Bayesian framework is applied to estimate the statistical relationship in a linear re-

gression setting, and an agronomy perspective based on expert knowledge is used to form

the basis of this model as the following:

Yijk = αi + Xijkβi + σεijk

(2.1)

• Yijk (response variable): maize yield (in kilogram per hectare) of plot k managed by

farm household j at EPA site i

• αi (y-intercept): can either be a constant or vary from location to location

• Xijk (design matrix): incorporates all the data from 12 explanatory variables

• βi (regression coeﬃcients): measure how much of the variation of maize yield (Yijk) is

explained by the covariates (Xijk)

10

• εijk (noise terms): independently and identically distributed N (0, 1) representing the

error of the model

• σ: scale of the error

To evaluate the determinants of maize yield, we compare the eﬀects of each explanatory

variable on the response in any one of our three models (i.e., yield, SPAD, striga). Because

all variables in the models have been standardized, the estimated regression coeﬃcients (βi)

of each variable give a magnitude of inﬂuence, which can be compared with the scale of the

noise terms (σ), to ﬁnd out which predictor(s) has the strongest eﬀect. Usually, σ is quite

large relative to a single regression coeﬃcient, but one should add the absolute magnitudes of

several independent variables for a more meaningful comparison, since the statistical error

needs to be compared to the strength of all the explanatory factors combined. For each

model, the set of explanatory variables is chosen to be consistent with agronomists’ beliefs

about what factors may inﬂuence yield, SPAD, or striga. Each of the three models is speciﬁed

in the simple linear framework, with appropriate logistic modiﬁcations in the case of the the

striga model, to distinguish between incidence of striga and levels of striga. Such a linear

framework can be seen as a ﬁrst-order approximation for each response.

We can gauge the random eﬀect of location since αi and βi both depend on the EPA index

i. Pooling the two-year data avoids model misspeciﬁcation, and has the added beneﬁt of

increasing statistical power. Further models test response variables SPAD and the parasitic

weed striga, to uncover the underlying key drivers of maize yield, where we expect striga

to be a negative factor and SPAD to be positive. The noise terms (εijk) are assumed

to be independent across all three models in order to minimize the number of parameters

needed to be estimated. It also avoids the use of a large number of correlation parameters

11

(hyperparameters) at the prior level in the Bayesian context, which need to be consistent

when investigating a system of 3 equations with 11 common explanatory variables between

any pair of models. Consequently unobserved factors that might simultaneously aﬀect more

than one model are not taken into account.

The eight striga records have been reverted to a scale of quantitative values from 0 to 8;

with 0 indicating the absence of striga, and 1 to 8 revealing the level of striga infestation.

The striga model can be thought of as a two-step procedure:

• First, the zero and non-zero values provide two alternatives which allow us to estimate

the inﬂuence of the presence or absence of striga via logistic regression, speaking for

the possibility of prevention.

• Next, when conditioning on the presence of striga (1–8), ordinary linear regression is

employed, which links to the insights on eﬀectiveness of striga control

Although having a large number of observations equal to zero (known as zero inﬂation) may

induce biased results, the proposed strategy mitigates this problem, since the standard linear

regression model is relative to the non-zero striga values.

In addition to the models described above, we conduct two sets of analysis to consider

if socioeconomic indicators—educational level of the household head and total dependency

ratio of each household—are of importance in predicting maize yield, by adding these two

variables to the covariate matrix X. The available data is also used to investigate non-speciﬁc

household eﬀects, which utilizes simpliﬁed versions of the three linear models, allowing the

y-intercept (α) to depend on the household identiﬁer (dummy variable) to help determine

whether households are predictive of maize yield, SPAD, or striga. This may be interpreted

as pointing indirectly to socioeconomic eﬀects, or directly to eﬀects of farmer skill.

12

We apply the package ‘PyMC3’ built into Python to implement the methodology. This

package provides a way to estimate the posterior distribution of our model parameters (i.e.,

regression coeﬃcients and error terms) by implementing a sampling mechanisum for these

distributions. It uses the ordinary Gibbs sampler to produce samples, with a burn-in period

of 500 initial samples, and an additional 10,000 iterations with two independent MCMC

chains after each burn-in. Without having prior knowledge on the distribution of the param-

eters, we employ the classical weakly-informative prior distributions: the standard normal

distributions (for βi) and inverse-gamma distribution (for σ). These prior choices present

numerical advantages in terms of conjugacy [1,22]. We monitor the convergence of the proce-

dure by keeping track of the discrepancy between the two aforementioned chains using R-hat

statistics (a widely-accepted convergence diagnosis). All the R-hat values are below the ac-

ceptable threshold of 1.1, which implies that the chains successfully converge, producing

excellent parameters’ estimates as well as their credible intervals.

In a word, we focus on biophysical determinants, yet there is more to be explored in the

future regarding socioeconomic drivers. Given the limited amount of data, this project does

not delve into the higher-complexity models, since they lie beyond the scope of our data set

and thus of our analysis. Hence, the three structural models are uniquely characterized by

their respective response variables and explanatory variables.

2.4 Results

2.4.1 Descriptive Statistics

Table 2.1 provides descriptive statistics for the variables in the model categorized by EPA

location. Figure 2.1 shows the precipitation for all study regions during the maize-growing

13

season over two years. Mean values for rainfall are consistent with earlier characterization

of Golomoti as a low-rainfall (marginal) site. The other locations diﬀer in terms of mean

rainfall for March, with Linthipe (mesic site) having the highest rainfall level.

Golomoti

Linthipe

Kandeu

Nsipe

563.0 (78.0)
78.6 (26.8)
25.2 (10.4)

278.9 (152.43)

6.56 (0.61)

624.5 (27.7)
128.9 (42.0)
43.8 (11.6)
466.9 (220.5)
6.09 (0.46)

629.5 (94.6)
102.8 (26.5)
39.1 (9.1)

636.8 (132.8)
116.8 (25.4)
44.4 (8.8)

390.41 (191.15)

340.70 (160.22)

6.10 (0.53)

6.32 (0.61)

1567.44 (1039.3)

2636.3 (1526.0)

2069.4 (1471.5)

2320.9 (1452.9)

41.20 (8.85)

46.98 (7.15)

46.00 (8.62)

41.84 (8.11)

0.897 (0.11)
0.183 (0.16)
8.908 (13.60)
0.22 (0.42)
0.66 (0.47)
0.41 (0.49)

0.927 (0.11)
0.079 (0.07)
13.763 (25.48)

0.970 (0.13)
0.201(0.15)

15.901 (18.04)

0.914 (0.14)
0.246 (0.16)
12.411 (12.86)

0.31 (0.46)
0.77 (0.42)
0.45 (0.50)

282

0.16 (0.37)
0.74 (0.44)
0.31 (0.46)

298

0.28 (0.45)
0.60 (0.49)
0.27 (0.45)

305

Environment

Dec.–Feb. precipitation (mm)

March precipitation (mm)

Apr.–May precipitation (mm)

POXC (mg carbon/kg soil)

Soil pH

Crop performance
Maize yield (kg/ha)

SPAD

Management practice

Maize spacing (m)

Weed biomass (kg/m2)

Fertilizer (kg)
Striga (binary)

Intercrop (binary)
Compost (binary)

Total number of observations

312

Table 2.1: Mean and (standard error) of all model variables by location

Soils are generally marginal at Golomoti sites, as evidenced by low mean value for soil

active carbon (POXC), in accordance with previous reports [19]. The highest POXC value

appears to be in Linthipe. Soil pH varies little among locations, and mean values are con-

sistent with moderate acidity, and thus non-limiting pH conditions for the crops grown.

Crop response consists of maize yield and leaf nitrogen content, as indicated by SPAD

values. Average maize yield is the lowest in Golomoti, followed by Nsipe and Kandeu, and the

highest is in Linthipe. SPAD data has a similar but not identical pattern: low in Golomoti

and Nsipe, and high in Kandeu and Linthipe.

Striga incidence and weed biomass are distributed across EPAs with no clear spatial

pattern. Farmer-reported striga problems on about 16-30% of sampled ﬁelds, and from 0.18

to 0.25 kg/m2 dry-weight weed biomass remaining in the ﬁeld at harvest. The latter is an

14

Figure 2.1: 10-day precipitation (CHIRPS) from December to May at all EPA sites

indicator of how eﬀective farmers’ weed management is, although the endogenous infestation

levels of weeds at a site could also contribute to observed presence.

Overall, fertilizer use is lower at Golomoti site than the other ones. This is accordant

with the intuition that farmers in marginal environments have less motivation to invest

in their lands and crops. Fertilizer application is similar in Linthipe, Kandeu, and Nsipe.

Compost use is higher in Golomoti and Linthipe than it is in Kandeu and Nsipe. As expected,

intercropping is more frequently practiced in Linthipe and in Kandeu, where many farmers

grow bean and cowpea (legume) in mixed stands with maize. Average plant spacing is lower

in Golomoti and Nsipe, and the highest in Kandeu.

15

2.4.2 Bayesian Inference

Statistical signiﬁcance is determined under a linear Bayesian circumstance. For example,

an explanatory variable (such as SPAD) for a response variable (such as yield) is statistically

signiﬁcant at 95% Bayesian credibility if its regression coeﬃcient has a posterior probability

of being on one side of zero which exceeds 95%.

It is true as soon as the 95% credible

interval of that variable’s regression coeﬃcient lies on either side of zero. If this condition

fails to happen, then strictly speaking, one may accuse it of not being statistically signiﬁcant.

This is a steep threshold to apply in most cases, since a variable with 90% credibility, still

holds some predictive information. In this project, however, when an explanatory variable

fails to be signiﬁcantly associated with the model’s response variable, the failure occurs at

a much lower level than 90%. The size of an association needs to be distinguished from its

signiﬁcance. For variables which are signiﬁcant in terms of size (or intensity), the signiﬁcance

are measured by their regression coeﬃcients’ posterior mean.

2.4.2.1 Maize yield

Figure 2.2 presents the 95% credibility intervals for the determinants of maize yield at

diﬀerent sites. In general, maize yield is positively associated with SPAD and negatively

associated with plant spacing; the magnitude of the coeﬃcients for SPAD are particularly

large. There is also a small but positive direct relationship between fertilizer and maize

yield for Kandeu and Nsipe. Yield is not consistently aﬀected by soil pH, weed biomass,

intercropping, or March rainfall. The α values diﬀer markedly from each other by EPA,

which suggests that location has an important inﬂuence on maize yield, independently of

other factors in the model. Since those factors are capable of exacerbating diﬀerences in

maize yield by location, we turn to consider speciﬁc locations from now on.

16

Figure 2.2: 95% credible intervals for the determinants of maize yield

First, rainfall levels are only signiﬁcant factors for yield in Linthipe and Nsipe. In both

areas, early-season rainfall (i.e., December to February) has a positive relation with maize

yield. In Nsipe, high rainfall in April/May is associated with low maize yield, perhaps a

reﬂection of late-season disease harming the crop such as Fusarium ear rot [23]. Secondly,

soil active carbon and compost application are positively associated with maize yield at

Golomoti. Third, in Linthipe and Nsipe, striga has a large negative eﬀect on maize yield,

whereas it is not signiﬁcantly related with maize yield in Golomoti or Kandeu. Slightly-

streamlined yield models are also investigated, where either SPAD or striga is removed. It

turns out that these reduced models resulted in decreasing explanatory power for all variables,

which could be easily assessed via the posterior distributions of regression coeﬃcients.

Overall, the results indicate that maize leaf nitrogen (SPAD) and striga are the strongest

17

determinants of maize yield at all sites, while early and late rainfall are signiﬁcant at some

sites. As farmer behavior can directly inﬂuence striga and SPAD, we evaluate separate

models to uncover the drivers of these two critical inputs.

2.4.2.2 SPAD

As we can see from Figure 2.3, there are three main predictors for SPAD: rainfall, POXC,

and application of fertilizer or compost. Rainfall is generally found to be inﬂuential on SPAD,

except in Kandeu, where rainfall variables are not statistically signiﬁcant.

In Golomoti,

SPAD increases with March rainfall, whereas in Linthipe and Nsipe it is the early rainfall

that has a positive impact on SPAD. Also in Nsipe, a negative association with SPAD is

observed for late-season rainfall.

Figure 2.3: 95% credible intervals for the determinants associated with SPAD

Fertilizer quantity is an important driver of SPAD at all sites except Golomoti, which

18

in turn is highly predictive of maize yield. The magnitude of its eﬀect on SPAD is higher

in Kandeu and Nsipe than it is in Linthipe. In dry and marginal sites, plant growth and

response to fertilizer are often limited by insuﬃcient soil moisture, thus fertilizer application

does not necessarily lead to plant uptake of nitrogen (or yield). The results in the yield model

also show a lack of response to fertilizer in Golomoti (see Figure 2.2), where compost/manure

and POXC are positively related to SPAD. Fertilizer impacts on both SPAD and yield are

substantial, however, suggesting the need to improve the eﬀectiveness of fertilizer applied.

2.4.2.3 Striga

Factors that inﬂuence striga incidence (Figure 2.4) and level of striga infestation (Figure

2.5) are shown below. Among farm management and soil properties, most are either not

signiﬁcant or of small magnitude in relationship to striga. There is some evidence of soil

fertility amendments being useful in prevention, as fertilizer application is associated with

striga absence (as reported by farmers) in Kandeu and in Linthipe. Fertilizer is also an

apparent control factor in Linthipe, where it is a predictor of low striga-infestation level.

Compost is related with both striga absence and low incidence in Nsipe, with contrasting

results observed in Kandeu. At Golomoti site, fertility amendments have no striga control

beneﬁts, and the only farm-management eﬀect is wide ridge spacing, which is negatively

correlated to striga presence and intensity. In addition, intercropping is neither helpful nor

harmful to striga prevention and control. The only exception is Linthipe where intercrops

are associated with striga problems.

19

Figure 2.4: 95% credible intervals for the drivers of striga prevention (logistic)

Figure 2.5: 95% credible intervals for the determinants of striga control

20

2.4.2.4 Household (farmer) eﬀect

With most of the 320 households, and up to four plots per household (two plots per

household in two years), we ﬁnd that it is not possible to determine if there is a connec-

tion between any particular farmer and the corresponding data. However, some statistical

signiﬁcance is extracted from roughly 10% of cases, meaning that the variability among the

four data points of such households is almost certainly not due to chance alone. Moreover,

the eﬀect is most likely to identify a favorable household environment. Speciﬁcally, setting

the signiﬁcance level at 5%, we compute the number of households for which the regression

intercept α is away from zero with posterior probability at least 95% (see Table 2.2).

Model Positive eﬀect Negative eﬀect
Yield
SPAD
Striga

23 (7.5%)
11 (3.6%)
26 (8.5%)

5 (1.6%)
7 (2.3%)
0 (0.0%)

Table 2.2: Number and (proportion) of households where the y-intercept is signiﬁcantly
non-zero at 95% credibility level in all models

Despite the limited data per household, we are able to detect such eﬀects with nearly 10%

of the households. In the yield model, the total number of signiﬁcant households exceeds 9%.

Interestingly, among these households, there are far more cases where the yield is higher than

it is lower (7.5% against 1.6%). In other words, when the data towards a farmer having a

signiﬁcant eﬀect on yield, the odds are about 5:1 that this is a skilled farmer with high maize

yield. For SPAD, about 6% of households have SPAD levels which cannot be explained by

chance. The odds of having high SPAD against low SPAD is about 3 to 2.

As for the striga model, there is no case that a farmer could avoid striga entirely, while

8.5% of farmers are likely to be associated with a striga problem. It may seem surprising to

say that we cannot determine any farmer with the skills or the knowledge of practices to be

21

superior to others in preventing striga. This result is not to be taken as a discouraging fact.

Instead, it reﬂects that over two thirds of all plots in the study regions are striga-free. Thus,

while one third of the plots being infected with striga reaches epidemic levels, having two

thirds of plots without striga makes its absence so prevalent that the 320 households cannot

identify anyone with unusual striga-prevention skills. Readers are referred to the full striga

model for more precise recommendations on striga.

2.4.2.5 Socioeconomic factors

The credible intervals for the two socioeconomic drivers (2nd and 3rd line in Figure 2.6)

overlap rather heavily with the zero vertical line in most cases, and are small in magnitude

compared to other determinants, indicating inconclusive signiﬁcance and small impact. It

also shows that the regression coeﬃcients of other variables are insensitive to whether or not

one includes the additional two socioeconomic indicators. This could be a sign of insuﬃcient

data to draw conclusions on either rejecting or asserting socioeconomic importance at any

reasonable level of signiﬁcance (e.g 80% credibility or higher). We also carry out additional

models by removing insigniﬁcant variables from X, and the remaining analysis (not reported)

are largely unaﬀected, which are similar to Figure 2.6. In sum, these evidence imply that

our regression model is robust.

22

Figure 2.6: 95% credible intervals for determinants including two socioeconomic factors
associated with maize yield

2.5 Discussions

The plot-level longitudinal data set and Bayesian regression models place biophysical

constraints in sharp focus in the analysis of what inﬂuences maize yield in central Malawi.

Some have strong eﬀects with very high credibility, notably SPAD, striga, and sub-seasonal

rainfall pattern, which is consistent with much of the literature on small-scale, mixed-maize

production systems [12,15]. Yet, fertilizer application does not necessarily lead to improving

leaf nitrogen nutrition (SPAD) or in subsequent maize yields. Hence, the results highlight

the challenges to ensure eﬀective nitrogen uptake and translation into grain, particularly in

marginal environments.

Fertilizer is generally associated with high plant nitrogen tissue at all areas but Golomoti.

Soil organic carbon fractions tend to be sensitive for crop response at lower values, as Golo-

23

moti is 20-40% lower compared to the other locations, which both recommend farmers to

apply manure or legume rotations at marginal sites, to build nitrogen supply capacity [14].

Integrated management of soil fertility combining manure and fertilizer has been shown pre-

viously to be highly eﬀective and proﬁtable for raising maize yields [24]. In concurrence, a

study ﬁnds out that maize yield response to nitrogen fertilizer application is low when soil

organic matter is low [25].

Rainfall is another crucial factor, particularly early-to-mid season, which supports a pos-

itive association between seasonal rainfall and maize yield based on an econometric analysis

of farm-level data [24]. At Kandeu, we ﬁnd a negative relationship of late-season rainfall to

yield and SPAD. This could reﬂect a leaching problem, with rainfall inducing soil inorganic

nitrogen losses and thus limiting nitrogen availability during the critical maize grain ﬁlling

period, which requires high nitrogen availability [26]. It could also be related to Fusarium or

other infections of the corn ear, induced by a late-season moist environment causing grain

spoilage and yield loss [5]. To our knowledge, it is the ﬁrst study to produce this type of

diﬀerentiated conclusions based on panel survey data.

Although weed biomass alone has no discernible eﬀect on maize yield, parasitic weed

striga is a strong negative determinant, especially in Linthipe and Nsipe. It is a bit surprising

that the phenomena are only observed in the two relatively high-potential locations, as the

presence of striga is typical throughout the study sites and is indeed ubiquitous nationwide

[27]. This implies that a barrier to crop production that has been largely overlooked by

agricultural research and policy makers, where the focus has often been on subsidized access

to hybrid maize seeds and fertilizers. Eﬀective and aﬀordable means of striga prevention and

control are in need.

In Malawi, farmers primarily rely on hand weeding for striga control, which appears to

24

be apparently ineﬀective due to the parasitic nature of striga. Therefore, maize growth sup-

pression has already occurred by the time hand weeding is done [28]. The utility of fertilizer

and manure/compost links low-nitrogen soil to higher striga incidence [29, 30]. Similarly,

farmers in a recent survey rank manure application as the best option for striga control [31].

Another study reveals complex relationships that early application of fertilizer helps maize

plants overcome early eﬀects of striga attachment, whereas late application is associated

with worse striga [32]. Given the observed diﬀerences in soil fertility and fertilizer across the

EPAs, we expect higher striga infestation in Golomoti over the other areas, instead there

is a widespread presence of striga and lack of uniformity in what works for prevention and

control. For instance, fertilizer is found to be a critical factor for striga, but only in Kan-

deu and Linthipe. Manure/compost appears to have merits for reducing striga problems in

Nsipe only. In short, our ﬁndings suggest a complex mode of action with diﬃcult-to-predict

reactions of maize to striga presence.

Overall, the results from this project call for area-speciﬁc recommendations. The Malawi

government’s suggestions for hybrid maize production have mainly focused on nitrogen

rate [33]. We ﬁnd evidence that shows targeting complementary investments and timing

of application could potentially add value. For example, maize yield at the mesic sites would

gain beneﬁts on early and judicious use of fertilizer for not only striga control, but also for

nitrogen nutrition. Whereas for the marginal sites, response is markedly diﬀerent, with no

fertilizer or striga drivers observed on yield, and instead soil active carbon and compost

are positive determinants of yield, recommending practices should focus on managing soil

organic matter. Soil carbon accumulation provides important environmental services at all

sites; however, marginal locations beneﬁt from stable production over time and space to

substantial gains in nitrogen eﬃciency, and the incremental gains in soil carbon are high.

25

Compost preparation and utilization at modest amounts have beneﬁcial inﬂuence on some

sites, with gains in plant health as indicated by nitrogen status and striga suppression. This

management practice is especially helpful during a poor-rainfall season in a marginal area.

With little to no downside, government policies, extensions, and educational eﬀorts by civil

society, can all play crucial roles in building appreciation for compost advantages.

26

Chapter 3

Global Temperature’s Reconstruction

and Projection

3.1

Introduction

Understanding the evolution of the climate on Earth is one of the major scientiﬁc chal-

lenges for this century. Quantifying the link between the sensitivity of the climate system

constituents, and its response to the climate forcing factors can be critical to discover the

underlying relationship in climate change [34]. Although the most updated instrument-base

observational temperature has a record since 1850, it is desirable to extend the temperature-

reconstruction methodology to make use of climate proxies, hence better capturing the cli-

mate variability on diﬀerent time scales [35].

The Intergovernmental Panel on Climate Change (IPCC) is dedicated to providing an

objective and comprehensive view of climate change to the world, and pushes forward the

development of accurate climatic models like General Circulation Models (GCMs), which are

used for forecasting global and regional climate change [36]. Coupled Model Intercomparison

Project Phase 6 (CMIP6) is another ongoing collaborative framework organized by the World

Climate Research Program (WCRP), aiming to gather the eﬀorts of the international climate

research community, to improve the design of global climate model [37,38]. In addition, there

27

are plenty of research articles relating the past temperature to proxies [35,39,40], such as tree

ring, ice cores, lacustrine deposits, etc., implying that the natural proxies are good indicators

for paleoclimate because they accurately reﬂect a wide range of climate sensitivity [41].

Previous approaches used in temperature reconstruction including principal component

regression (PCR) [42,43], canonical correlation analysis (CCA) [43–45], regularized expecta-

tion maximization (RegEM) [46–48], and linear regression with various kinds of regulariza-

tions [49,50]. These literatures signiﬁcantly improve the comprehension of the climate in the

past. However, the lack of spatiotemporal covariance speciﬁcation and uncertainty quantiﬁ-

cation continue to be the statistical challenges [51–53]. Several recommendations like model

simpliﬁcation, model averaging, have been proposed in order to address these issues [54–56].

The idea of using Bayesian model to reconstruct past temperature ﬁrst appears in [57],

and further studied by [40, 58, 59]. An advantage of applying hierarchical modeling frame-

work to paleo-climate reconstruction problem is that it can incorporate diﬀerent sources

of information (proxies, forcings), to capture the variability of temperature. Furthermore,

parameters estimates are systematically updated within Bayesian inference, which simultane-

ously provides the full posterior distributions of each model parameter to better understand

the relation between each factor and the mean temperature, as well as reduces and quantiﬁes

the reconstruction uncertainty in a more statistically meaningful way [35, 40, 53].

Stochastic models have been applied to hydrology and coral archives involving proxies

and paleo-data [60,61]. In this study, we develop a multi-level stochastic model to reconstruct

northern hemisphere (NH) temperature over the past millennium (1000-1999), and extend

the three forcing time series (solar, volcanic, greenhouse gases) to project the temperature in

this century (2019-2100). The ﬁrst level (data) embeds the climate proxies into the underly-

ing temperature anomalies, and the second level (process) linearly relates the temperature to

28

solar irradiance, volcanic activity, and greenhouse gas concentration. Finally, we implement

the Bayesian technique to estimate all the parameters in the model, due to a limited amount

of data in the calibration interval.

3.2 Data Source

There are three data sources employed in the project. Four main types of proxies includ-

ing tree ring widths, lacustrine sediment cores, ice cores and speleothems (cave formations)

between 1000 and 1999. Observational surface temperature over the period 1900-1999, and

estimates of three natural climate forcings (solar, volcanic, greenhouse gas) from 1000 to

1999 (for reconstruction), as well as from 2019 to 2100 (for projection).

The original proxy data comprises of 1209 annually and decadal series [62]. Owing to the

focus of NH average temperatures reconstruction instead of spatial analysis, it is reasonable

to reduce the dimension of proxies and maintain as much climatic information as possible

at the same time. Moreover, proxy reduction can avoid model overﬁtting and decrease

computational time of parameters estimations, yield a more parsimonious model due to a

limiting calibration period. All the natural proxies are aggregated into a single variable

through a rigorous selection and averaging process [41, 58, 63].

HadCRUT4 is the longest and most up-to-date global temperature data set, which

combines observational near-surface air temperature [64] and sea-surface temperature [65]

anomalies evolution since 1850 [66]. Anomalies are calculated relative to the average of

1961-1990 [65, 67]. Because the climate proxies may intrinsically carry notable noise, and

the data before 1900 may not be reliable enough, we choose 1900-1999 to be the calibration

time to circumvent the potential signiﬁcant amplitude attenuation issue [68].

29

Total solar irradiance (TSI) is the measurement of solar power per unit area (W/m2)

on the Earth’s upper atmosphere. The solar activity is a complex phenomenon, whose

ﬂuctuation is considered to be around an overall constant 1361.0 W/m2 [38, 69]. We take a

proxy-base reconstruction of TSI [70] that is in accordance with the climate reconstruction

in [52].

Its relationship with NH temperature is studied in [71].

In terms of projection,

we use the dataset recommended for CMIP6, where it calculates the weighted average of

three statistical models, namely, analogue forecast (non-parametric), autoregressive model

(parametric), and harmonic model (non-linear) [38]. The volcanic forcing is proposed based

on ice core aerosol proxies, which is generated by the ejected particles during the explosive

volcanic eruptions [72]. The impact of volcanism on global and regional temperature are

also investigated [73, 74] .

Greenhouse gas concentration (measured by parts per million (ppm), denoted by CO2),

is the most dominant forcing account for climate variations since 1950 [72]. According to the

level of greenhouse gas (equivalent to radiative forcing in W/m2) at Year 2100 (Figure 3.1),

the IPCC chooses and names four possible representative concentration pathways (RCP) for

the evolution of the global greenhouse gases, depending on how fast the human civilization

adapts itself to reduce the emission of greenhouse gas [75, 76]. The ﬁrst scenario (RCP2.6)

describes a trajectory where the emissions stay very low, as a result of the environment-policy

changes drastically made by governments and ﬁrms [77]. The medium scenarios are RCP4.5

and RCP6.0, a situation where the greenhouse gases increase but still stabilize before or

after 2100 [78]. RCP8.5 is the worst scenario where the greenhouse gas concentration keeps

increasing without stabilization [79].

All three forcings are available over 1000 till 1999 for reconstruction purpose. The full

description regarding their derivation is in [72].

30

Figure 3.1: Greenhouse gas projections of four RCP scenarios

3.3 Model Speciﬁcation

3.3.1 Forcing Transformation and Extension

The volcanic inﬂuence can be extremely strong among the years following eruptions, but

fast decays over non-volcanic years, leading to a cooling eﬀect in the long run [74,80]. Hence,

we apply a decreasing logarithm transformation

˜Vt = log(−Vt + 1)

(3.1)

to the original data to mitigate the impact of large volcanic eruptions during calibration

period [35]. Also, we consider the simplest transformation

˜Ct = log(Ct)

(3.2)

31

given that the radiative forcing depends on greenhouse gases logarithmically [81, 82].

In order to project the NH mean temperature for the century, we need to expand the

three forcing series to 2100. Following [38], we remove the 11-year cycle in the solar forcing

via a moving-average process, since this periodic cycle does not appear in the reconstructed

data that we use. In addition, the solar constant of the series has to be adjusted by about

5 W/m2, because of the technological update on its evaluation [38, 69].

As for greenhouse gas forcing, we add a random drifting noise term to each RCP scenario,

to capture the uncertainty in the global evolution with respect to the climate policies’ change.

Speciﬁcally, we have:

C(cid:48)
2019+t = C2019+t +

σC εC , t ∈ [0, 81]

t
82

with εC ∼ N (0, 1) and σC is decided by

(cid:0) max Ct − min Ct

(cid:1), 2019 ≤ t ≤ 2100

σC =

1
10

(3.3)

(3.4)

which yields an uncertainty that grows with the RCPs’ magnitudes. Various uncertainties

have been applied to test robustness of the model (refer Appendix A).

To our knowledge, there has not been any literature about the long-term volcanic pre-

diction on a global scale. Most studies concentrate on either a speciﬁc volcano or forecast-

ing eruptions short-periodically ahead [83–85]. Therefore, we treat volcanic activity as a

stochastic process, allowing to take volcanic uncertainty into account for the temperature’s

projection. Note that the volcanic aerosol concentration decays similarly (up to a rescaling

factor, see Figure 3.2) after eruptions, we thus model the volcanic time series as a succession

32

of spikes with random amplitudes and time intervals, which can be written as:

Vt = V0 + τt +

1{t≥Ti}AiPt−Ti

(3.5)

∞(cid:88)

i=0

where

• V0 is an overall constant

• τt is a small random ﬂuctuation

• Ti are the dates of eruptions with amplitude Ai

• Pt−Ti

is the decreasing pattern of the aerosol concentration after each spike

Figure 3.2: The ﬁrst 36 re-normalized volcanic eruptions’ decay

33

Moreover, the time increment (Ti+1 − Ti) and the spike amplitude (Ai) are assumed to

be independent and identically distributed. This assumption is veriﬁed by high p-values of

Ljung-Box test (autocorrelations of a time series) applied to both series [86–88]. Eventually,

the three natural forcing factors are normalized to make their inﬂuence on the temperature

comparable among each other.

3.3.2 Multi-level Autoregressive Model

The hierarchical stochastic model is established as follows:

Pt = aPt−1 + α0 + α1Tt + σP εt

Tt = bTt−1 + β0 + βSSt−1 + βV Vt−1 + βC Ct−1 + σT ηt

(3.6)

where

• εt, ηt: white noises (mean 0, standard deviation 1) without any correlation structure

• a, b: autoregressive parameters

• α, β: coeﬃcients associated with temperature and forcings

At any given year, the non-stationary model has the advantage of taking into account the

impact of forcings in the past years (with an exponential decay) to the current temperature

(see Appendix B for further details). From a Bayesian perspective, we assign the following

34

prior distributions to the parameters in the model:

α = (α0, α1) ∼ N ([0, 1], I2)

β = (β0, βS, βV , βC ) ∼ N ([0, 1, 1, 1], I4)

σ2
P , σ2

T ∼ IG(2, 0.1)
a, b ∼ U(0, 1)

(3.7)

In is an n-dimensional identity matrix. IG, U represent inverse gamma distribution and

uniform distribution respectively.

The model is run by Gibbs sampling algorithm with 10,000 iterations, but we only take

the last 5,000 samples since the beginning of the chain (burn-in period) is often discarded,

owing to its lacking accuracy to represent the desired distribution. The choice of prior

distributions (normal and inverse gamma) for the linear coeﬃcients and variances, is for the

sake of conjugacy (posterior distributions yield same distributions as priors with diﬀerent

parameters), thereby avoiding computational burden [89, 90]. The complete computation

of posterior conditional densities for the model may be found in Appendix C. It appears

that small variations in the memory coeﬃcients (a and b) lead to convergence instability
for other model coeﬃcients (α and β). Thus, we explore all the combinations of (a, b) ∈
{0, 0.1, 0.3, 0.5, 0.7, 0.9}2 (no memory to strong memory), and run the Gibbs sampler for all

other parameters. The model is implemented in Python using a few fundamental libraries

such as NumPy and pandas.

35

3.4 Results

3.4.1 Validation Metrics

Several metrics are produced to assess the quality of the temperature’s reconstructions

as well as to evaluate the posterior distributions of the parameters, applying them as criteria

to compare with other works.

We ﬁrst calculate the root mean square error (RMSE) by using the mean of the posterior

distributions. It is a frequently employed measure of the diﬀerences between observed values

and predicted values [91–93]. The empirical coverage probability (ECP) indicates the pro-

portion of true value of interest (temperature) fall into the conﬁdence interval [89]. Lower

RMSE implies better model ﬁt.

Additionally, we provide two scoring rules which are interval score (IS) and continuous

ranked probability score (CRPS) [94–97]. IS rates the posterior conﬁdence interval in func-

tion of the quantiles of posterior distribution for each observation, rewarding sharp prediction

intervals and penalizing uncovered observations [96]. ECP and IS are both computed and

compared at the levels of 80% and 95%. CRPS is deﬁned as a squared distance between

the cumulative distribution function F (y) and the indicator function 1y≥x of the predictive

distribution as below:

CRP S(F, x) =

(cid:2)F (y) − 1y≥x

(cid:3)2dy

(3.8)

(cid:90) ∞

−∞

Both scoring rules and ECP are positive-oriented with the designated model (higher values

indicate a more appropriate model).

A Markov chain tends to run through as much state space as possible, and continuously

reduces its variability to focus on the areas of the space with high density for an invariant

36

distribution, which it converges toward in the end. Therefore, the total variance of the chain

shrinks until it asymptotically converges to the variance of the invariant distribution.

In

order to determine the performance of the MCMC’s convergence, we compute the potential

scale reduction factor (PSRF) for the posterior samples [98, 99]. The PSRF (optimal value

is 1) estimates how much variance needs to be reduced before achieving convergence.

3.4.2 Reconstruction (1000-1999)

Smaller memories are related to lower RMSE (better model performance) is not surpris-

ing, because the no-memory model is equivalent to a linear regression, which comes down to

minimizing the squared error (Table 3.1).

a

b

0
0.1
0.3
0.5
0.7
0.9

0

0.1

0.3

0.5

0.7

0.9

0.157
0.148
0.139
0.136
0.138
0.175

0.160
0.151
0.142
0.138
0.143
0.190

0.169
0.161
0.152
0.146
0.153
0.221

0.182
0.175
0.166
0.159
0.168
0.288

0.204
0.199
0.189
0.187
0.203
0.341

0.219
0.215
0.208
0.212
0.238
0.375

Table 3.1: Root mean square error (RMSE) for all combinations of memory coeﬃcients

For the ECPs, the credible intervals are widened at both levels 80% and 95% when

increasing memory parameters, and a good compromise can be found with a between 0.5

and 0.7, b between 0.3 and 0.7 (Table 3.2). The IS (80% level) does not exhibit any speciﬁc

trend, except for extreme memory value (b = 0.9) returning unsatisfactory results. Whereas

at level of 95%, enlarge a and diminish b would lower the negative interval score overall

indicating a better model ﬁt (Table 3.3). The continuous ranked probability scores are quite

similar for a and b below 0.7, but become larger for higher memory values (Table 3.4).

Since the best performances are achieved with a and b at the order of magnitude described

37

a

b

0
0.1
0.3
0.5
0.7
0.9

a

b

0
0.1
0.3
0.5
0.7
0.9

0

0.1

0.3

0.5

0.7

0.9

71
68
60
58
50
42

69
67
62
57
52
42

67
66
67
64
62
41

73
69
68
70
71
47

81
81
80
80
79
65

88
87
84
88
96
96

0

0.1

0.3

0.5

0.7

0.9

89
84
78
74
72
58

90
86
80
74
72
60

91
89
82
76
75
63

96
94
94
91
84
65

99
98
97
97
98
83

99
99
99
99
99
100

Table 3.2: Empirical coverage probability (ECP) at levels 80% (up) and 95% (down)

regarding the ECPs (i.e. the uncertainty quantiﬁcation), we run another set of simulations
reﬁning this particular area (a ∈ [0.5, 0.7] and b ∈ [0.3, 0.7]), results in Table 3.5). Among

the new simulations, the IS(80) and IS(95) are best at low memory parameters until a = 0.6

and b = 0.55. The CRPS does not vary much but seems to decrease as a and b increase.

The highest ECP (80% level) is always attained when a = 0.7. However, the best ECP at

level 95 is obtained for lower values of a. Eventually, considering all the other validation

metrics, we choose the pair a = 0.6, b = 0.5 to be the memory coeﬃcients. It conﬁrms that

even though the no-memory model achieves a better ﬁt, it is necessary to include memories

to properly address model uncertainties [58].

In [58], the authors build a hierarchical Bayesian model with eight scenarios based on

memory or memoryless feature (controlled by two Hurst parameters H and K), with or

without external forcings, and error terms’ structures: fractional Gaussian (fGn) or autore-

gressive (AR). We compute all of the validation measurements for one case (scenario D in [58]

38

a

b

0
0.1
0.3
0.5
0.7
0.9

a

b

0
0.1
0.3
0.5
0.7
0.9

0

0.1

0.3

0.5

0.7

0.9

0.184
0.188
0.201
0.221
0.254
0.363

0.184
0.189
0.199
0.215
0.251
0.382

0.186
0.189
0.193
0.204
0.234
0.415

0.188
0.186
0.186
0.185
0.218
0.489

0.193
0.188
0.178
0.178
0.204
0.432

0.198
0.193
0.189
0.192
0.215
0.344

0

0.1

0.3

0.5

0.7

0.9

0.062
0.066
0.085
0.108
0.137
0.214

0.061
0.065
0.084
0.101
0.132
0.221

0.059
0.061
0.067
0.081
0.107
0.219

0.058
0.058
0.057
0.057
0.076
0.225

0.062
0.061
0.060
0.061
0.067
0.153

0.067
0.066
0.065
0.068
0.078
0.129

Table 3.3: Negative interval score (IS) at both 80% (up) and 95% (down) levels

a

b

0
0.1
0.3
0.5
0.7
0.9

0

0.1

0.3

0.5

0.7

0.9

0.075
0.074
0.075
0.078
0.084
0.117

0.076
0.075
0.074
0.078
0.086
0.126

0.076
0.076
0.076
0.076
0.083
0.138

0.081
0.082
0.077
0.074
0.079
0.173

0.083
0.084
0.079
0.078
0.079
0.168

0.086
0.086
0.084
0.081
0.081
0.114

Table 3.4: Negative continuous ranked probability score (CRPS) for diﬀerent memories

vs. a = 0.6, b = 0.5 in this work) in each model to compare the reconstructions (see Table

3.6 and Table 3.7).

In general, scenario D performs the best among other scenarios, and is also the closest

one to the model in this project, which put short memories in both equations. It turns out

that our model obtains more precise ECPs at both 80% and 95% levels, and the CRPS is less

than half as much as the model in [58]. The interval scores stay quite similar between both

models, and the RMSE is a bit better for [58], but RMSE is not necessarily a performance

indicator since it favors the memoryless models.

39

a

b
0.30
0.35
0.40
0.45
0.50
0.55
0.60
0.65
0.70

a

b
0.30
0.35
0.40
0.45
0.50
0.55
0.60
0.65
0.70

0.50

0.55

0.60

0.65

0.70

68
68
70
70
70
72
74
70
71

69
70
71
71
71
74
74
73
72

76
71
73
74
76
76
76
75
76

78
79
79
78
78
79
79
77
78

80
80
82
81
80
81
79
79
79

0.50

0.55

0.60

0.65

0.70

94
93
94
94
91
90
88
86
84

95
94
94
94
94
93
92
88
88

96
97
97
97
95
97
97
92
90

97
97
97
98
98
98
98
97
95

97
98
97
98
97
98
98
98
98

Table 3.5: ECP at two levels 80% (up) and 95% (down) for speciﬁc calibrations

Additionally, both models appear to converge towards limit posterior distributions nu-

merically well (PSRFs in Table 3.7), but our model seems to converge slightly better. This

could be explained by the smaller variability in the Gibbs sampling process for our model,

given that a and b are ﬁxed to be 0.6 and 0.5 respectively, whereas the memory parameters

(H and K) are not constants in scenario D [58].

Scenario D

Model (a = 0.6, b = 0.5)

RMSE ECP (80) ECP (95)
0.162
0.174

90.9
95.0

74.7
76.0

IS (80)
0.176
0.181

IS (95) CRPS
0.209
0.063
0.056
0.074

Table 3.6: Validation metrics (negative IS and CRPS) for both models

We also compare our reconstruction results with another regression method in climatology

domain, namely errors in variables (EIV), which is a scaling regularization allowing for errors

40

Scenario D

Model (a = 0.6, b = 0.5)

α0
1.01
1.00

α1
1.00
1.01

β0
1.05
1.01

βS
1.07
1.01

βV
1.00
1.00

βC
1.03
1.01

σ2
P
1.09
1.00

σ2
T
1.00
1.01

Table 3.7: The diagnosis of MCMC’s convergence (PSRF) for both models

in both explanatory and response variables [41, 100]. The comparison may not be perfect,

since our model also incorporates the reconstructions of the three climate forcings, which is

proved to outperform those reconstitutions that only involving proxies [58, 101].

From the curves (displayed in Figure 3.3), we can see the trend is lower on our side as

opposed to [41], implying a more radical temperature in the past millennium, which conﬁrms

that forcings’ incorporation helps produce a cooler reconstruction [58]. The local variations

are rather similar, thanks to the common proxy data used in both methods. The gap between

the two curves is more distinct during the period of notably low solar activity (the Maunder

Minimum, see [102] for more details), which can be easily detected in the solar forcing series

(St), because sunspot number is strongly correlated to the TSI [103, 104].

Figure 3.3: Temperature anomalies’ reconstructions between our model and EIV

Furthermore, the memory included in our model takes into account the inﬂuence of

41

forcings of any given year on the following year. It is especially true for volcanism where

eruptions may yield cooling periods, but whose spikes only last around two years. In [105],

the authors show that tree rings underestimate the cooling eﬀect of large volcanic events,

which means that tree-ring based reconstructions (particularly proxy-only) may not properly

address volcanic eruptions. In this study, the proxy is an aggregation of multiple proxy time

series, less than half of which are tree rings. Hence, our reconstruction compensates this

phenomenon by adding moderate memories in the model.

3.4.3 Projection (2019-2100)

As part of the CMIP5, many institutes publish the result of climate simulations for this

century using diﬀerent models and data [106]. For the four RCPs from 2006 to 2100, we

compute the means of the surface temperature on the northern hemisphere from diﬀerent

models (listed in Table 3.8), followed by converting them to temperature anomalies. Al-

though the models and data exhibit diﬀerence, the sample paths of the various simulations

have almost identical trends (see Figure 3.4 for an example). Thus, for each RCP, we consider

the temperature projection is equal to the mean of all model simulations, which allowing us

to compare our results with this multi-model average method (Figure 3.5 and Figure 3.6).

Except RCP8.5, the temperature’s posterior means in our model exactly fall into the

95% conﬁdence intervals of CMIP5 projections. For RCP4.5 and RCP6.0, we can also see

that the 95% credible intervals cover CMIP5 means very well, especially for RCP6.0, the

temperature’s posterior means are almost the same as the CMIP5 projections. We cannot

explain why both our model and CMIP5 have a jump at the beginning, however, we do

not intend to model the high-frequency ﬂuctuations from year to year. In the initial 13-year

(2006-2018) projections, we note that the discrepancy between the NH average temperature’s

42

Modeling center or group (Location)

AORI, NIES, JAMSTEC (Japan)

Canadian Centre for Climate Modelling and Analysis (Canada)
Centre national de recherche m´et´eorologique (France)

Commonwealth Scientiﬁc and Industrial Research Organisation (Australia)

EC-Earth (Europe)
Institute of Atmospheric Physics, Chinese Academy of Sciences (China)

Institut Pierre Simon Laplace (France)

Max Planck Institute for Meteorology (Germany)

Meteorological Research Institute (Japan)

Met Oﬃce Hadley Centre for Climate Science and Services (UK)

NASA Goddard Institute for Space Studies (USA)

Norwegian Climate Centre (Norway)

Model (acronym)
MIROC-ESM-CHEM
MIROC-ESM
MIROC5
CanESM2
CNRM-CM5
ACCESS1.0
ACCESS1.3
CSIRO-Mk3.6.0
CSIRO-Mk3L
EC-EARTH
FGOALS-s2
IPSL-CM5A-LR
IPSL-CM5A-MR
IPSL-CM5B-LR
MPI-ESM-LR
MPI-ESM-MR
MRI-CGCM3
MRI-ESM1
HadGEM2-AO
GISS-E2-H-CC
GISS-E2-H
NorESM1-ME
NorESM1-M

Table 3.8: List of CMIP5 models used for multi-averaging temperature calculations

annual ﬂuctuations and CMIP5, our projection is smoother with greater magnitude than this

initial jump, which could be viewed as part of the fact that the feature of yearly ﬂuctuations

are missing from our model and from CMIP5.

In terms of the model robustness, not only do we add various choices of σC to expand

greenhouse gas concentration forcings (Appendix A), but we compute the diﬀerences over

two separate projection periods (2006-2100 and 2019-2100) using the same model (Table

3.9). The trends in the graphs are relatively similar to each other (the progression re-

sults can be found in the supplemental materials), and the values in Table 3.9 are rather

minimal compared with the projected temperature anomalies in the corresponding RCP sce-

narios. Therefore, both methods justify that our model is quite robust and the uncertainty

43

Figure 3.4: Temperature prediction for RCP8.5 between 2006 and 2100 (CMIP5)

Figure 3.5: Temperature projections until 2100 for all four RCPs: comparisons with CMIP5
multi-model average

quantiﬁcation is reasonable, which is not allowed to be evaluated by the model averaging

methodology of CMIP5.

Besides, following [35], we extend and test all memory combinations-white noise (WN,

no memory), AR(1) and AR(2) (short-term memory), and fGn (long-term memory) for the

purpose of projection in [58]. The outcomes are shown in Table 3.10 below.

In similar to the reconstruction results, the best one is the no-memory model (WN-WN)

except the ECPs, where the AR(1)-fGn achieves better at both levels. Models with stronger

memory yield worse RMSE, since memoryless model corresponds to the classical framework

of linear regression, which directly minimizes the mean squared error. The scoring measures

44

Figure 3.6: Temperature projections’ comparisons with CMIP5 for medium scenarios (up:
RCP4.5; down: RCP6.0)

RCP2.6 RCP4.5 RCP6.0 RCP8.5

Year
2020
2030
2040
2050
2060
2070
2080
2090
2100

0.039
0.044
0.041
0.052
0.034
0.066
0.050
0.045
0.042
Average
0.045
Maximum 0.066

0.026
0.032
0.029
0.031
0.025
0.031
0.048
0.046
0.039
0.033
0.052

0.061
0.076
0.104
0.120
0.127
0.168
0.214
0.239
0.244
0.148
0.266

0.196
0.284
0.400
0.540
0.717
0.914
1.134
1.354
1.627
0.775
1.627

Table 3.9: Temperature diﬀerences (in absolute values) between two projection periods

(IS and CRPS) do not have explicit pattern. Scenario AR(1)-AR(1) model attains decent

validation metrics, and beats WN-WN model with respect to the ECPs, which assures to

incorporate memories to more suitably quantify uncertainties.

For the coeﬃcients part, the values of α1 are much larger for the models with a white

noise in the proxy equation, because the noise is anticipated to take a part of the signal.

Among the forcings, the greenhouse gas concentration (βC ) is always the highest with low

standard deviation, which identiﬁes its statistical signiﬁcance. This is also reasonable because

it explains the temperature’s increase over the last century [72,100]. Solar coeﬃcient (βS) is

45

H

WN

AR(1)

AR(2)

fGn

K
WN
AR(1)
AR(2)

fGn
WN
AR(1)
AR(2)

fGn
WN
AR(1)
AR(2)

fGn
WN
AR(1)
AR(2)

fGn

H

WN

AR(1)

AR(2)

fGn

α0

-0.015 (0.008)
-0.029 (0.005)
-0.028 (0.005)
-0.024 (0.005)
-0.234 (0.034)
-0.188 (0.028)
-0.076 (0.022)
-0.171 (0.034)
-0.204 (0.040)
-0.179 (0.055)
-0.185 (0.044)
-0.170 (0.062)
-0.167 (0.313)
-0.153 (0.159)
-0.081 (0.103)
-0.133 (0.158)

α1

0.684 (0.008)
0.573 (0.005)
0.587 (0.005)
0.629 (0.005)
0.230 (0.034)
0.266 (0.028)
0.253 (0.022)
0.285 (0.034)
0.224 (0.040)
0.268 (0.055)
0.257 (0.044)
0.282 (0.062)
0.190 (0.313)
0.202 (0.159)
0.214 (0.103)
0.247 (0.158)

β0

-0.474 (0.018)
-0.538 (0.038)
-0.526 (0.045)
-0.444 (0.210)
-0.053 (0.986)
-0.550 (0.047)
-1.006 (0.180)
-0.574 (0.086)
-0.543 (0.043)
-0.535 (0.055)
-0.550 (0.047)
-0.531 (0.122)
-0.542 (0.044)
-0.648 (0.155)
-1.086 (0.249)
-0.618 (0.142)

βS

0.026 (0.018)
0.046 (0.038)
0.041 (0.045)
0.007 (0.210)
-0.012 (0.986)
0.084 (0.047)
0.071 (0.180)
0.071 (0.086)
0.089 (0.043)
0.077 (0.055)
0.083 (0.047)
0.048 (0.122)
0.078 (0.044)
0.065 (0.155)
0.075 (0.249)
0.048 (0.142)

βV

0.010 (0.018)
-0.008 (0.038)
-0.008 (0.045)
-0.005 (0.210)
-0.413 (0.986)
-0.017 (0.047)
-0.019 (0.180)
-0.018 (0.086)
-0.018 (0.043)
-0.017 (0.055)
-0.018 (0.047)
-0.017 (0.122)
-0.015 (0.044)
-0.020 (0.155)
-0.022 (0.249)
-0.018 (0.142)

βC

0.127 (0.018)
0.153 (0.038)
0.158 (0.045)
0.151 (0.210)
0.895 (0.986)
0.113 (0.047)
0.278 (0.180)
0.130 (0.086)
0.107 (0.043)
0.113 (0.055)
0.114 (0.047)
0.132 (0.122)
0.113 (0.044)
0.159 (0.155)
0.291 (0.249)
0.154 (0.142)

K
WN
AR(1)
AR(2)

fGn
WN
AR(1)
AR(2)

fGn
WN
AR(1)
AR(2)

fGn
WN
AR(1)
AR(2)

fGn

RMSE ECP (80) ECP (95)
0.126
0.118
0.115
0.106
0.169
0.158
0.297
0.167
0.157
0.156
0.153
0.172
0.159
0.203
0.310
0.178

0.65
0.63
0.65
0.68
0.90
0.92
0.90
0.95
0.92
0.97
0.97
0.99
0.92
0.99
0.92
0.99

0.72
0.50
0.52
0.46
0.63
0.69
0.64
0.76
0.70
0.82
0.76
0.88
0.73
0.90
0.82
0.83

IS (80)
0.126
0.224
0.213
0.185
0.204
0.172
0.347
0.169
0.170
0.147
0.147
0.152
0.167
0.176
0.319
0.161

IS (95) CRPS
0.036
0.056
0.074
0.122
0.071
0.114
0.090
0.064
0.082
0.072
0.071
0.053
0.138
0.110
0.050
0.073
0.070
0.055
0.064
0.049
0.048
0.064
0.067
0.052
0.070
0.049
0.095
0.065
0.106
0.012
0.070
0.054

Table 3.10: Gibbs sampler convergence for the posterior means and (standard deviations) of
model coeﬃcients in Scenario D as well as validation metrics for each memory combination

46

the same order of magnitude even though it is smaller than βC , which is in accordance with

the solar activity has a non-negligible inﬂuence on the climate before greenhouse gas explodes

[71]. The volcanic coeﬃcient (βV ) is always negative with probability greater than 99% in
most models. Considering we apply a decreasing transformation on volcanism log(−Vt + 1),

which means that the volcanic activity is expected to slightly warm the planet. However,

the volcanic eruptions are known to produce an overall global cooling eﬀect [105]. It cannot

have any impact beyond ﬁve or six years since eruptions are represented as short-memory

spikes (in Vt). Therefore, long-period volcanic activity is more appropriately addressed by

autoregression proposed in our study.

In sum, increasing the memory parameters would decrease all the coeﬃcients except

for volcanism. It is coherent because increasing memory grants more weights to the past

climate forcing factors and temperature in the model. The stability of the volcanic coeﬃcient

indicates that stronger memories help taking the impact of past eruptions over the following

years into consideration.

3.5 Conclusions and Discussions

In the study, we build a multi-level stochastic model to reconstruct NH temperature

anomalies over the past millennium and to make projections for this century, by incorporating

natural climate forcings (solar irradiance, volcanism, greenhouse gases) with memories in

both levels. Bayesian inference is adopted to systematically estimate the magnitude of all

unknown quantities as well as model uncertainties.

Moderate memory is suggested by the validation metrics in comparisons from both re-

constructions and projections. The memory inclusion in our model (autoregression) not only

47

allows for the inﬂuence of external forcings at any given year on the years after, but also

strengthens the overall decreasing trend appeared in reconstructions (Figure 3.3). It gener-

ates lower temperatures (especially before Year 1900) compared to [41], noticing the model

and data are diﬀerent though. The projection deviation between this paper and CMIP5 is

proportional to the greenhouse gas growth amplitude, which may be a consequence of the

solar activity prediction took from the CMIP6, where the solar decay over the next century

is a novelty that CMIP5 did not take into account [38]. We believe that the solar activ-

ity’s decay is more impactful on the temperature when the greenhouse gas concentration is

smaller, which explains the evolution of the gap.

There are a few possible extensions of this paper. One is to simultaneously evaluate mem-

ory parameters (a and b) with other prior distributions involving model coeﬃcients (α and

β), which most likely would cause higher computational demand. Another possibility is to

include spatiotemporal component by designing a spatial pattern for proxies over time [59],

providing more smooth reconstructions [107]. This implementation requires more compre-

hensive understandings of climatic system to propose a more proper and feasible temporal

covariance structure for the model, which probably might simplify computations and create

further scientiﬁc insights.

48

Chapter 4

Predictive-distribution

Approximation via Wiener Chaos

4.1 Preliminaries

Assume we have the simplest linear regression model:

k(cid:88)

Yi =

Xijβj + σεi

(4.1)

j=1

where i = 1, 2, ..., n, j = 1, 2, ..., k, and εi

i.i.d.∼ N (0, 1). Let X, Y, βββ, εεε denote Xij, Yi, βj, εi

respectively in matrix forms. The model (4.1) is rewritten as:

Y = Xβββ + σεεε

The log-likelihood function of the parameters (βββ, σ2) up to a constant term is:

l(βββ, σ2) = −nln(σ) − 1

2σ2 (Y − Xβββ)2

(4.2)

(4.3)

49

Then the maximum likelihood estimator (MLE) of (βββ, σ2) can be derived from (4.3) by
letting ∂l(βββ,σ2)

∂σ2 = 0. Thus, the MLEs of the parameters are

∂βββ = 0 and ∂l(βββ,σ2)

 ˆβββ = (XTX)−1XTY

(Y − Xˆβββ)2

(cid:99)σ2 =

1
n

(4.4)

ˆβββ is known as the ordinary linear squared (OLS) estimator, which is unbiased for βββ (i.e.,

E(ˆβββ) = βββ). However,(cid:99)σ2 is biased for σ2. An unbiased estimator of σ2 (i.e., E(s2) = σ2) is

s2 =

where ˆβββ is independent from s2 and

YTY − ˆβββ

T

XTY

n − (k + 1)

(n − (k + 1))s2

σ2

∼ χ2

n−(k+1)

The variance-covariance matrices of (ˆβββ, s2) are

 C(ˆβββ) = (XTX)−1XTV(Y)((XTX)−1XT)T = σ2(XTX)−1 := C
V(s2) = V(cid:16)
(cid:1) =
V(cid:0)χ2

χ2
n−(k+1)

n−(k+1)

(cid:17)

σ4

σ2

n − (k + 1)

=

(n − (k + 1))2

(4.5)

(4.6)

(4.7)

2σ4

n − (k + 1)

50

4.2 Approximation Error

4.2.1 Wiener-chaos Expansion

Intuitively, s2 is an unbiased estimator of σ2, so we decide to use s2 to replace σ2 in C(ˆβββ)

and in V(s2). Then the error of the approximation es2 for V(s2) is

es2 =

2σ4

n − (k + 1)

−

2s4

n − (k + 1)

Now let d = n − (k + 1). (4.6) becomes

where E(Z) = d, V(Z) = 2d. Then

Z :=

ds2
σ2 ∼ χ2

d

˜Z := Z − d ∼ χ2

d

(4.8)

(4.9)

(4.10)

is a centered chi-squared distribution (i.e., E( ˜Z) = 0) with degrees of freedom d.

In the classic Wiener space L2[0, 1], a centered Gaussian family G(ψ), ψ ∈ L2[0, 1] of

random variables can be identiﬁed as the stochastic diﬀerential of a Wiener process W :

(cid:90) 1

G(ψ) :=

ψ(s)dW (s)

(4.11)

0

In order to ﬁnd how big the approximation error in (4.8) is, it is convenient to represent

51

˜Z in terms of Wiener integrals for computational sake, that is

˜Z =

ds2
σ2 − d

D
=

(G2

i − 1)

d(cid:88)

i=1

where

(cid:90) 1

0

Gi =

εi(s)dW (s) := W (εi) = IW

1 (εi)

and εi, i ≥ 1 are orthonormal family existed on L2[0, 1]. By contraction rule,

(4.12)

(4.13)

W 2(εi) − 1 = IW

2 (ε⊗2

i

)

(4.14)

Hence, applying product formula and (4.14) to ˜Z2, we have

)2 − IW

2 (ε⊗2

i − ε⊗2

j

j

)2(cid:3)

1
4

i

4

j

(G2

) =

j

i=1

j=1

i=1

j=1

i=1

j=1

i=1

j=1

i=1

j=1

i − 1)(G2

j − 1) =

d(cid:88)
d(cid:88)
d(cid:88)
d(cid:88)
2 (ε⊗2
2 (ε⊗2
IW
)IW
d(cid:88)
d(cid:88)
(cid:2)IW
(cid:0)(ε⊗2
i + ε⊗2
) ⊗ (ε⊗2
d(cid:88)
d(cid:88)
)(cid:1)(cid:3) − 1
i + ε⊗2
(cid:0)(ε⊗2
i − ε⊗2
d(cid:88)
d(cid:88)
(cid:2)4IW
4 (ε⊗2
(cid:18) d(cid:88)
d(cid:88)

d(cid:88)
d(cid:88)
(cid:0)W 2(εi) − 1(cid:1)(cid:0)W 2(εj) − 1(cid:1)
d(cid:88)
d(cid:88)
(cid:2)IW
2 (ε⊗2
i + ε⊗2
)(cid:1) + 4IW
(cid:0)(ε⊗2
i + ε⊗2
i + ε⊗2
)(cid:1)
(cid:2)IW
(cid:0)(ε⊗2
i − ε⊗2
i − ε⊗2
)(cid:1)(cid:3)
i − ε⊗2
) ⊗1 (ε⊗2
)(cid:3)
i ⊗1 ε⊗2
2 (ε⊗2
(cid:18) d(cid:88)
(cid:19)
d(cid:88)
i ⊗1 ε⊗2
ε⊗2

εi ⊗ εi ⊗ εj ⊗ εj

i ⊗ ε⊗2

) ⊗ (ε⊗2

) + 16IW

+ 4IW
2

+ 4IW
2

⊗1 (ε⊗2

(cid:19)

i=1

j=1

i=1

j=1

j

2

j

j

1
4

1
4

4

j

4

j

)

j

j

j

j

j

˜Z2 − E( ˜Z2)

D
=

=

=

=

= IW
4

i=1

j=1

i=1

j=1

(4.15)

52

where

i ⊗1 ε⊗2
ε⊗2

j

(s1, s2) =

(cid:90) 1

0

εi(s1)εi(s)εj(s2)εj(s)ds =

0

εi(s1)εj(s2) = εi ⊗ εi

i (cid:54)= j

i = j

(4.16)

Note that E( ˜Z2) = V( ˜Z) + E2( ˜Z) = 2d. Therefore, ˜Z2 becomes

(cid:18) d(cid:88)

d(cid:88)

˜Z2 D

= IW
4

(cid:19)

(cid:18) d(cid:88)

(cid:19)

+ 2d

(4.17)

εi ⊗ εi ⊗ εj ⊗ εj

+ 4IW
2

εi ⊗ εi

i=1

j=1

i=1

4.2.2 Error Magnitude

The error (4.8) in the replacement of σ2 by s2 for V(s2) is equal to

− 2
d

(cid:18)

2σ4
d

es2 =
D
= −2σ4
d3

IW
4

(cid:17)2

(cid:16)σ2( ˜Z + d)
(cid:16) d(cid:88)
d(cid:88)

d

i=1

j=1

= −2σ4

d3 ( ˜Z2 + 2 ˜Zd)

(cid:17)

εi ⊗ εi ⊗ εj ⊗ εj

+ (2d + 4)IW
2

(cid:17)(cid:19)

εi ⊗ εi

(cid:16) d(cid:88)

i=1

(4.18)

with E(es2) = 0 because it only contains the sums of chaos terms.

For any function g and q ≥ 1,

V(Iq(g)) = q!(cid:107)g(cid:107)2

L2([0,1]q)

(4.19)

53

Then the variance of the approximation given (4.19) is

V( ˜Z2) = 4!

(cid:13)(cid:13)(cid:13)(cid:13) d(cid:88)
d(cid:88)
d(cid:88)
d(cid:88)

j=1

i=1

(cid:13)(cid:13)(cid:13)(cid:13)2
εi ⊗ εi ⊗ εj ⊗ εj
d(cid:88)
d(cid:88)
d(cid:88)
d(cid:88)

1 + 32

i(cid:48)=1

i=1

j=1

j(cid:48)=1

i(cid:48)=1

i=1

= 24

(cid:13)(cid:13)(cid:13)(cid:13) d(cid:88)

i=1

(cid:13)(cid:13)(cid:13)(cid:13)2

L2([0,1]2)

εi ⊗ εi

+ 16 × 2!

L2([0,1]4)

1

(4.20)

= 24d2 + 32d

Apply the results in (4.20), the variance of es2 is

d3

(cid:16) − 2σ4
(cid:17)2(cid:0)24d2 + (2d + 4)2 × 2d(cid:1)
(cid:0)8d3 + 56d2 + 32d(cid:1)
d3 + O(cid:0) 1

4σ8
d6
32σ8

(cid:1)

d4

V(es2) =

=

=

(4.21)

Similarly, we employ the same methodology to C to ﬁnd its approximation error eˆβββ

.

eˆβββ

= (σ2 − s2)(XTX)−1
(cid:17)
(cid:16)
σ2 − σ2( ˜Z + d)
=
= −σ2
˜Z(XTX)−1
d
D
= −σ2
d

(cid:16) d(cid:88)

IW
2

d

εi ⊗ εi

i=1

(XTX)−1
(cid:17)

(XTX)−1

where E(eˆβββ

) = 0 since it only includes the second-chaos terms. The variance of eˆβββ

(cid:16) − σ2
(cid:17)2 × 2 × V(cid:0)(XTX)−1(cid:1)
V(cid:0)(XTX)−1(cid:1)

d

2σ4
d2

V(eˆβββ

) =

=

54

(4.22)

is

(4.23)

4.3 Prediction Evaluation

4.3.1 Explicit Boundaries

Considering prediction for a new observation X∗. The density of

(cid:112)
σ2ε∗ = law

law

2σ4
d

(cid:18)(cid:114)
N(cid:0)σ2,
(cid:18)(cid:115)
(cid:114)
(cid:114) 2
(cid:18)
(cid:16)
(cid:18)
σε∗(cid:16)

σ2 +

1 +

1 +

1
2

σ

(cid:1) × ε∗(cid:19)
η × ε∗(cid:19)
(cid:17) × ε∗(cid:19)
(cid:17)(cid:19)

2σ4
d

η

d
1√
2d

η

= law

≈ law

= law

√

σ2ε∗ is

(4.24)

where η ∼ N (0, 1). ε∗ and εεε are independent, which is consistent with εi
Based on the approximation error calculated above, replacing σ2 with s2 (data-base value),
the approximate distribution of Y ∗ from (4.2) is

i.i.d.∼ N (0, 1).

(cid:99)law Y ∗ = law (X∗ ˆβββ +

(cid:112)
s2ε∗)

 ˆβββ ∼ N(cid:0)(XTX)−1XTY,(cid:98)C(cid:1)
s2 ∼ N(cid:0)s2,

(cid:1)

2s4
d

with

where (cid:98)C = s2(XTX)−1.

(4.25)

(4.26)

(4.27)

Thus, the predictive distribution of Y ∗ in (4.25) is approximated by convolution as

(cid:16)

sε∗(cid:0)1 +

1√
2d

η(cid:1)(cid:17)

(cid:99)law Y ∗ ≈ N(cid:0)µY ∗, σ2

Y ∗(cid:1) (cid:126) law

55

with

Deﬁne

µY ∗ = X∗(cid:0)XTX(cid:1)−1XTY
Y ∗ = X∗(cid:98)CX∗T

σ2

U1 = sε∗ ∼ N (0, s2), U2 = 1 +

1√
2d

η ∼ N (1,

1
2d

)

U1 and U2 are independent by deﬁnition. Then

U := U1 × U2 = sε∗ +

ε∗η

s√
2d

(4.28)

(4.29)

(4.30)

is a product of two independent normal random variables.

Lemma 4.3.1. ηε∗ is a product of two independent standard normal variables, which can

be represented in the second Wiener chaos as follows:

(cid:0)(η(cid:48))2 − (ε∗(cid:48))2(cid:1)

ηε∗ =

1
2

where η = 1√
2

(η(cid:48) + ε∗(cid:48)) and ε∗ = 1√

2

(η(cid:48) − ε∗(cid:48)). η(cid:48), ε∗(cid:48) ∼ N (0, 1) are independent.

The distribution of U can be seen as a non-convolution sum of two coupled (due to the
common ε∗) random variables. The ﬁrst piece is N (0, s2), the second piece is a product nor-

mal, which can be characterized by two independent chi-square distributions (from Lemma

4.3.1), each with degree of freedom 1 (i.e., χ2(1)), and a scaling factor

s√
8d

.

Corollary 4.3.2. When σ2 = 1 (i.e., X is standard normal), for  > 0

− (+1)2
e

2 ≤ P(X ≥ ) ≤ 1
2

1
2

− 2
e
2

56

Proof.

P(X ≥ ) =

=

≥

− x2

2 dx

− (x+)2

2

dx

1√
2π
1√
2π

e

e

− (x+)2
e

2

dx

1√
2π
− 2+2

2

≥ 0.34e

− (+1)2
e

2

≥ 1
2

(cid:90) ∞
(cid:90) ∞
(cid:90) 1

0


0

(cid:90) ∞
(cid:90) ∞

0

− (x+)2

2

e

dx

− x2+2
e

2

dx

1√
2π
1√
2π

P(X ≥ ) =

≤

=

0
1
2

− 2
e
2

Equivalently,

P(X ≥ ) ≤ 1
2

e

− 2

2 ≤ P(X ≥  − 1)

(4.31)

showing the upper bound is between the tail probabilities within one standard deviation.

Hence, the cumulative distribution function (CDF) of U , namely ΦU (u) = P(U ≤ u), u >

0 is computed as

ΦU (u) = 1 − P(U ≥ u) = 1 − Eη

(cid:104)P
ε∗(cid:0)ε∗ ≥
ε∗(cid:0)ε∗ <

= 1 − Eη

+ P

u

s(1 + η√
2d

)

ηε∗ ≥ u(cid:12)(cid:12)η(cid:1)(cid:105)

s√
2d

(cid:104)P
ε∗(cid:0)sε∗ +
(cid:12)(cid:12)η(cid:1)1{η>−√
(cid:105)
(cid:12)(cid:12)η(cid:1)1{η<−√

2d}

)

u

s(1 + η√
2d

2d}

(4.32)

57

Remark. η ∼ N (0, 1). Suppose d > 4,

ε∗(cid:0)ε∗ <

P

(cid:12)(cid:12)η(cid:1) < P

u

s(1 + η√
2d

)

ε∗(ε∗ < 0) < 0.5

√
P(η < −

2d) < P(η < −3) < 0.0015

which makes it negligible compared to the ﬁrst term inside Eη.

Based on Corollary 4.3.2, the bounds of ΦU (u) are

Φ1(u, d) ≤ ΦU (u) ≤ Φ2(u, d)

(4.33)

with

Φ1(u, d) = 1 − Eη

Φ2(u, d) = 1 − Eη

)2(cid:21)
)2 (cid:21)

)]2

− u2
2s2

(1+

1
η√
2d

− [u+s(1+
2s2(1+

η√
2d
η√
2d

2

(cid:20)1
(cid:20)1
(cid:114)

2

e

e

Theorem 4.3.3. s =

YTY−ˆβββ
d

T

XTY

deﬁned in (4.5).

(cid:90) ∞
−√
(cid:90) ∞
−√

= 1 − 1
2

1√
2π

e

2d

= 1 − 1
2

1√
2π

e

2d

− u2
2s2

− x2
2 e

1
(1+ x√
2d

)2

dx

− [u+s(1+ x√
2d
2s2(1+ x√
2d

)]2
)2

− x2
2 e

dx

(4.34)

1 − 1
2

e

− u2
2s2 ≤ lim

d→∞ ΦU (u) ≤ 1 − 1

2

− (u+s)2
2s2
e

Proof. Φ1(u, d): For each x ∈ (−√

2d,∞),

− u2
2s2

− x2
2 e

lim
d→∞ e

1
(1+ x√
2d

)2

− x2
2 e

− u2
2s2

= e

58

Also,

(cid:12)(cid:12)(cid:12)(cid:12)e

− x2
2 e

− u2
2s2

1
(1+ x√
2d

)2(cid:12)(cid:12)(cid:12)(cid:12) ≤ e

− x2
2 ,

(cid:90) ∞
−∞ e

√

− x2

2 =

2π < ∞

By Dominated Convergence Theorem,

d→∞ Φ1(u, d) = 1 − 1

lim

2

(cid:90) ∞

−∞

1√
2π

− x2
2 e

lim
d→∞ e

− u2
2s2

1
(1+ x√
2d

)2

dx = 1 − 1
2

− u2
2s2
e

In similar, for Φ2(u, d), using Dominated Convergence Theorem,

− [u+s(1+ x√
2d
2s2(1+ x√
2d

)]2
)2

− x2
2 e

− (u+s)2
2s2

,

= e

− x2
2 e

lim
d→∞ e

(cid:12)(cid:12)(cid:12)(cid:12)e

− x2
2 e

− [u+s(1+ x√
2d
2s2(1+ x√
2d

)]2

)2 (cid:12)(cid:12)(cid:12)(cid:12) ≤ e

− x2
2

(cid:90) ∞

−∞

⇒ lim

d→∞ Φ2(u, d) = 1 − 1
Therefore, when d −→ ∞,

2

1√
2π

lim
d→∞ e

− x2
2 e

− [u+s(1+ x√
2d
2s2(1+ x√
2d

)]2
)2

dx = 1 − 1
2

− (u+s)2
2s2
e

1 − 1
2

e

− u2

2s2 = lim

d→∞ Φ1(u, d) ≤ lim

d→∞ ΦU (u) ≤ lim

d→∞ Φ2(u, d) = 1 − 1

2

− (u+s)2
2s2

e

Lemma 4.3.4. Let X ∼ N (µ, σ2) be a Gaussian random variable. For any non-negative

integer m, the m-th moment of X is



E(Xm) =

,

3
2

,− µ

2σ2 ), m is odd

(4.35)

m is even

1 − m

2
,− µ

2σ2 ),

µσm−12

m+1

2

Γ( m

2 + 1)√

π

M (

σm2

m
2

Γ( m+1
2 )√
π

M (−m
2

,

1
2

59

(cid:90) ∞

0

Γ(z) =

tz−1e−tdt

is the gamma function and

M (a, b, z) =

∞(cid:88)

n=0

a(a + 1)··· (a + n − 1)zn
b(b + 1)··· (b + n − 1)n!

is the Kummer’s conﬂuent hypergeometric function. In particular, if µ = 0,

E(Xm) =

(cid:90) ∞
√
−∞ xm 1
2π
= σm(m − 1)!!

σ

− x2

2σ2 dx

e

(4.36)

when m is even. !! denotes double factorial, all odd moments are 0.

Theorem 4.3.5. s =

(same as in Theorem 4.3.3). For any u > 0,

(cid:114)
(cid:18)

YTY−ˆβββ
d

T

XTY

− (u+s)2
e

1 − 1
2

(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)ΦU (u) −

(cid:19)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ δ2(u, d)

− (u+s)2
2s2
e

πd

√

4s2

2s2 − u(u + s)(2 − e−d)
s + 1)2(cid:3)

u2(cid:2)3 + ( u

− (u+s)2
2s2

e

δ2(u, d) =

8s2d

where

The proof uses second-order Taylor series expansion on e

− (u+s)2
2s2

and Lemma 4.3.4 (see

Appendix D for further reference). Finally, Y ∗ can be divided as the following:

Y ∗ ≈ X∗ ˆβββ +

= µY ∗ + σY ∗ε(cid:48) + sε∗(cid:0)1 +
= µY ∗ +(cid:0)σY ∗ε(cid:48) + sε∗(cid:1) +

η(cid:1)

ηε∗

1√
2d
s√
2d

(4.37)

(cid:112)
s2ε∗

60

where ε(cid:48) ∼ N (0, 1) is independent from ε∗ and η. The CDF of

V := σY ∗ε(cid:48) + U

(4.38)

is ΦV (v) = P(V ≤ v), v > 0 derived as

ΦV (v) = P(σY ∗ε(cid:48) + U ≤ v) = E
ε(cid:48)

(cid:105)
(cid:104)P(U ≤ v − σY ∗ε(cid:48))

(cid:104)

ΦU

(cid:0)v − σY ∗ε(cid:48)(cid:1)(cid:105)

= E
ε(cid:48)

Thus, according to Theorem 4.3.5, the bound of ΦV (v) is

(cid:104)

ΦV (v) ≤ Ex

− (v−xσY ∗ +s)2

2s2

− (v+s−xσY ∗ )2

1 − 1
2

e

(cid:90) ∞
(cid:90) ∞
−∞ e

= 1 − 1
2

e

−∞

1√
= 1 − 1
2
2π
− (v+s)2
2(s2+σ2
√
2
− (v+s)2
2(s2+σ2

= 1 − e

2π

Y ∗)

(cid:113)

Y ∗)
Y ∗
s2 + σ2

= 1 − se
2

(cid:105)

+ δ(cid:48)

2(v − xσY ∗, d)

− 1
2

2s2

(cid:2)(cid:0)1+
(cid:90) ∞
−∞ e

+ Ex

2 dx + Ex
(v+s)σY ∗

x+

(cid:0)x− (v+s)σY ∗

Y ∗
s2+σ2

s2

− x2
e

1√
2π

Y ∗
σ2
s2

(cid:1)x2−2
(cid:2) s2+σ2
(cid:2)δ(cid:48)
2(v − xσY ∗, d)(cid:3)

− 1
2

Y ∗

s2

(cid:3)

(v+s)2

(cid:2)δ(cid:48)
2(v − xσY ∗, d)(cid:3)
(cid:2)δ(cid:48)
2(v − xσY ∗, d)(cid:3)
(cid:2)δ(cid:48)
2(v − xσY ∗, d)(cid:3)

dx + Ex

dx + Ex

(cid:1)2(cid:3)

s2

given

δ(cid:48)
2(u, d) = δ2(u, d) +

− (u+s)2
2s2

e

(4.39)

(4.40)

u(u + s)(2 − e−d)

√

πd

4s2

61

Denote the density function of X(cid:48) ∼ N (µX(cid:48), σ2

X(cid:48)) as

fX(cid:48)(x) =

− (x−µX(cid:48) )2
X(cid:48)
2σ2

√
1
σX(cid:48)

e

2π



µX(cid:48) =

σ2
X(cid:48) =

(v + s)σY ∗
Y ∗
s2 + σ2

s2

Y ∗
s2 + σ2

(4.41)

Using Lemma 4.3.4, we have

2(v − xσY ∗, d)(cid:3) =
(cid:2)δ(cid:48)

Ex

=

=

+

+

2s2

π

Y ∗)

√
π

dx

√

π

s

s

(cid:90) ∞

− x2
2√
2π

(v − xσY ∗)e
4s2

√
2
d
− (v+s)2
2(s2+σ2
e

− (v−xσY ∗ +s)2
√
d

(cid:18)(v − xσY ∗ + s)(2 − e−d)
+ 1(cid:1)2(cid:105)(cid:19) e
(cid:104)
3 +(cid:0) v − xσY ∗
(cid:18)(v − xσY ∗ + s)(2 − e−d)
(cid:90) ∞
−∞(v − xσY ∗)
+ 1(cid:1)2(cid:105)(cid:19)
(cid:104)
3 +(cid:0) v − xσY ∗
(cid:26)2 − e−d√
(cid:104)
(cid:16)
− 3v2σY ∗µX(cid:48) + 3vσ2

−∞
v − xσY ∗
(cid:113)
4s
v − xσY ∗
(cid:113)
X(cid:48)(cid:1)(cid:105)

Y ∗ )
d(s2 + σ2
√
2
d
− (v+s)2
2(s2+σ2
e

Y ∗ )
Y ∗)
d(s2 + σ2
√
1
2

v2 − 2vσY ∗µX(cid:48) + σ2

fX(cid:48)(x)dx

Y ∗(cid:0)µ2
v(v + s) − (2v + s)σY ∗µX(cid:48) + σ2
X(cid:48)(cid:1)(cid:17)
(cid:16)
Y ∗(cid:0)µ2
X(cid:48)(cid:1)(cid:17)
X(cid:48)(cid:1) − σ3
Y ∗(cid:0)µ3
X(cid:48)(cid:1) − 4vσ3
Y ∗(cid:0)µ2
Y ∗(cid:0)µ3
X(cid:48)(cid:1)(cid:17)(cid:105)(cid:27)

2
s
X(cid:48) + 3µX(cid:48)σ2

X(cid:48) + 3σ4

X(cid:48) + 6µ2

X(cid:48) + σ2

X(cid:48) + σ2

X(cid:48)σ2

X(cid:48)

v3

+

4

Y ∗(cid:0)µ2
X(cid:48)(cid:1) + σ4
Y ∗(cid:0)µ4

X(cid:48) + σ2
v4 − 4v3σY ∗µX(cid:48) + 6v2σ2

1
s2
+ 3µX(cid:48)σ2

+

(cid:16)

(cid:104)

d

4s

+ σ2

+

X(cid:48)

(4.42)

62

Hence,

(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)ΦV (v) −

(cid:18)

(cid:113)

− (v+s)2
2(s2+σ2

Y ∗)
Y ∗
s2 + σ2

1 − se
2

with a precision as

(cid:113)
− (2 − e−d)e

− (v+s)2
2(s2+σ2

Y ∗)
Y ∗)

4s

πd(s2 + σ2

− (2v + s)σY ∗µX(cid:48) + σ2

X(cid:48) + σ2

v(v + s)

(cid:104)
Y ∗(cid:0)µ2

X(cid:48)(cid:1)(cid:105)(cid:19)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ δ(v, d)

(4.43)

(4.44)

δ(v, d) =

v2 − 2vσY ∗µX(cid:48) + σ2

(cid:104)

(cid:16)

e

− (v+s)2
2(s2+σ2

Y ∗)
Y ∗
s2 + σ2

(cid:113)
− 3v2σY ∗µX(cid:48) + 3vσ2

8sd

4

(cid:16)

X(cid:48) + σ2
v4 − 4v3σY ∗µX(cid:48) + 6v2σ2

Y ∗(cid:0)µ2
X(cid:48)(cid:1) + σ4
Y ∗(cid:0)µ4

1
s2
+ 3µX(cid:48)σ2

+

2
s

X(cid:48) + σ2

(cid:16)
X(cid:48)(cid:1)(cid:17)
Y ∗(cid:0)µ2
X(cid:48)(cid:1)(cid:17)
X(cid:48)(cid:1) − σ3
Y ∗(cid:0)µ3
Y ∗(cid:0)µ2
X(cid:48)(cid:1) − 4vσ3
Y ∗(cid:0)µ3
X(cid:48)(cid:1)(cid:17)(cid:105)

X(cid:48) + 3µX(cid:48)σ2

X(cid:48) + 3σ4

X(cid:48) + σ2

X(cid:48)

X(cid:48) + 6µ2

X(cid:48)σ2

+

v3

and µX(cid:48), σ2

X(cid:48) deﬁned in (4.41). The sharpness of the bound (4.43) is proved in Appendix E.

Let

g(v, d) = v(v + s) − (2v + s)σY ∗µX(cid:48) + σ2

X(cid:48) + σ2

(4.45)

Y ∗(cid:0)µ2

X(cid:48)(cid:1)

be the non-normal correction for ΦV (v). For the sake of simplicity and also without loss of
generality, removing e−d, we have

eV = 1 − e
2

(cid:113)

− (v+s)2
2(s2+σ2

Y ∗)
Y ∗
s2 + σ2

(cid:16)

s +

√
1
πd

s

(cid:17)

g(v, d)

(4.46)

63

Plug in (4.41) to expand g(v, d).

g(v, d) =

=

=

s2
(s2 + σ2
s2
(s2 + σ2
s4
(s2 + σ2

Y ∗)2

Y ∗)2

Y ∗)2

(cid:2)s2v2 + (s3 − sσ2
(cid:104)
s2(cid:0)v +
(cid:0)v +

Y ∗(cid:3)
(cid:1)2 − (s2 − σ2
(cid:1)2 +
Y ∗ − s2)
Y ∗)

Y ∗)v + σ4
s2 − σ2
Y ∗
2s
s2 − σ2
Y ∗
2s

s2(3σ2
4(s2 + σ2

4

Y ∗)2

(cid:105)

Y ∗
+ σ4

(4.47)

is increasing in v when v > − s2−σ2
Y ∗

2s

, so is eV .

Denote the precision term δ(v, d) in (4.44) as

δ(v, d) =

e

(cid:113)

− (v+s)2
2(s2+σ2

Y ∗ )
Y ∗
s2 + σ2

8sd

(cid:16)

(cid:17)

h1(v, d) + h2(v, d) + h3(v, d)

where

h1(v, d) = 4

h2(v, d) =

h3(v, d) =

X(cid:48) + σ2

(cid:104)
(cid:104)
v2 − 2vσY ∗µX(cid:48) + σ2
(cid:104)
v3 − 3v2σY ∗µX(cid:48) + 3vσ2
v4 − 4v3σY ∗µX(cid:48) + 6v2σ2

X(cid:48)(cid:1)(cid:105)
Y ∗(cid:0)µ2
Y ∗(cid:0)µ2
X(cid:48)(cid:1) − σ3
Y ∗(cid:0)µ3
Y ∗(cid:0)µ2
X(cid:48)(cid:1) − 4vσ3
Y ∗(cid:0)µ3
X(cid:48)(cid:1)(cid:105)

X(cid:48) + 3σ4

Y ∗(cid:0)µ4

X(cid:48) + σ2

X(cid:48) + 6µ2

X(cid:48)σ2

X(cid:48) + σ2

+ σ4

2
s
1
s2

X(cid:48)(cid:1)(cid:105)
X(cid:48)(cid:1)

X(cid:48) + 3µX(cid:48)σ2

X(cid:48) + 3µX(cid:48)σ2

(4.48)

(4.49)

64

Using the similar expansion in (4.47),

h1(v, d) =

=

h2(v, d) =

h3(v, d) =

4s2
(s2 + σ2
4s4
(s2 + σ2

2

(s2v2 − 2σ2

Y ∗
s

Y ∗)2

Y ∗)3

Y ∗)2

(cid:0)v − σ2
(cid:1)2 +
(cid:104)(cid:0)s6 + s4σ2
Y ∗(cid:1)v − s3σ4
Y ∗(cid:0)s2 + 2σ2
(cid:104)
s8v4 − 4(cid:0)s7σ2
Y ∗(cid:1)v + s4σ4

Y ∗)4
Y ∗ + 4s5σ6

1

Y ∗)

Y ∗sv + s2σ2
Y ∗ + 2σ4
Y ∗
4s2σ2
Y ∗
s2 + σ2
Y ∗ + σ6
Y ∗ + 2s2σ4

Y ∗(cid:1)v3 − 3s5σ2
Y ∗(cid:1)(cid:105)
Y ∗(cid:0)3s2 + 4σ2
Y ∗(cid:1)v3 + 6(cid:0)s8σ2
Y ∗(cid:0)3s4 + 12s2σ2

Y ∗ + 10σ4

Y ∗ + 3sσ8

s(s2 + σ2

+ 3s4σ2

s2(s2 + σ2

− 4(cid:0)3s7σ4

Y ∗v2

Y ∗(cid:1)v2

Y ∗ + 2s6σ4

Y ∗(cid:1)(cid:105)

Plug (4.50) into (4.48), δ(v, d) is

δ(v, d) =

e

− (v+s)2
2(s2+σ2

(cid:104)
+ 2(cid:0)2s10 + 10s8σ2

Y ∗)
Y ∗)

8s3d(s2 + σ2

9
2

s8v4 + 2(cid:0)s9 + 3s5σ4

Y ∗(cid:1)v3

Y ∗ − 5sσ8

Y ∗ + 11s6σ4

Y ∗(cid:0)4s6 + 13s4σ2

Y ∗ + 18s2σ4

+ s4σ2

Y ∗ + 3s3σ6

Y ∗(cid:1)v2 − 2(cid:0)s9σ2
Y ∗(cid:1)(cid:105)

Y ∗ + 10σ6

Y ∗ + 5s7σ4

Y ∗ + 6s5σ6

Y ∗(cid:1)v

(4.50)

(4.51)

which shows that except for the well-expected 1

d, δ(v, d) is free of d.

4.3.2 Simulation Results

Now we choose diﬀerent values for s and σ2

Y ∗ under certain circumstances. Note that

δ(v, d) is homogeneous degree 0 (i.e., dimension-free). Given a standardized data set, the

values of v are one-standard-deviation percentile and two-standard-deviation percentile at

α = 2.5%, 16%. As we can see from Table 4.1, when n is large, the precision of the approxi-

65

mation is quite high because δ(v, d) is extremely small (roughly 0.1% and 0.5% of eV in each

respective scenario). We may lose some degree of accuracy when the signiﬁcance level α is

increasing. Even if n is not large enough compared to the number of explanatory variables k,

our approximation is still reliably precise (later blocks in Table 4.1). The non-normal correc-

tion eV is monotonously increasing when the magnitude of new noise σ2

The largest percentile v appears at s2 is comparable with σ2

Y ∗ becomes bigger.
Y ∗, but overall not aﬀected much

by the the noise scale, which indicates its consistency.

4.4 Discussions

In this chapter, not only do we construct an explicit sequence of closed-form functions

to approximate the true predictive distribution of the response variable, but also we provide

a comprehensive analysis on the approximation based on Wiener’s polynomial chaos. The

approximation is with a great deal of accuracy, and the unbiased estimation convergences in

a relatively-fast fashion.

The biggest advantage of this approximation methodology is to avoid numeric sampling

algorithms such as Markov chain Monte Carlo, which reduces the computational burden to

a certain level [108]. Also, the boundary formulations as well as the asymptotic properties

yield the beneﬁts to closely monitor the performance of predictions.

Second Wiener chaos is a linear space, and because the model is linear, its solution lies in

the same chaos. In many practical situations, however, incomplete or inaccurate statistical

knowledge about parameters’ uncertainties limits the utility of high-order polynomial chaos

expansions [109]. Fortunately, in order to create a ﬁnite-order expansion, we just need

some reliable information on the probability measure that can be represented by a ﬁnite

66

number of moments. One of the possible extensions is to explore the behavior in a nonlinear-

model setting such as involving stochasticity. This may require the derivation of maximum

likelihood estimators under approximated log-likelihood functions [110].

67

α

n

k

s2

0.5% 1000

3

0.15

2.5% 1000

3

0.15

10% 1000

3

0.15

16% 1000

3

0.15

0.5% 200

12

1.50

2.5% 200

12

1.50

10% 200

12

1.50

16% 200

12

1.50

Y ∗
σ2
0.01
0.05
0.13
0.48
1.50
0.01
0.05
0.13
0.48
1.50
0.01
0.05
0.13
0.48
1.50
0.01
0.05
0.13
0.48
1.50
0.15
0.47
1.52
4.33
12.5
0.15
0.47
1.52
4.33
12.5
0.15
0.47
1.52
4.33
12.5
0.15
0.47
1.52
4.33
12.5

v

-0.3944
-0.3944
-0.3944
-0.3934
-0.3934
-0.3943
-0.3945
-0.3939
-0.3949
-0.3949
-0.3945
-0.3949
-0.3941
-0.3946
-0.3932
-0.3944
-0.3945
-0.3949
-0.3974
-0.4025
-1.2763
-1.2763
-1.2763
-1.2763
-1.2450
-1.2787
-1.2787
-1.2738
-1.3031
-1.3031
-1.2783
-1.2783
-1.2763
-1.2867
-1.3063
-1.2780
-1.2709
-1.2765
-1.2769
-1.2775

eV

0.5153
0.5650
0.6310
0.7527
0.8468
0.5153
0.5650
0.6310
0.7527
0.8468
0.5153
0.5650
0.6310
0.7527
0.8468
0.5153
0.5650
0.6310
0.7527
0.8468
0.5211
0.5591
0.6401
0.7386
0.8303
0.5211
0.5591
0.6402
0.7385
0.8303
0.5211
0.5591
0.6401
0.7386
0.8303
0.5211
0.5591
0.6401
0.7386
0.8303

δ(v, d)
0.0005
0.0006
0.0006
0.0007
0.0006
0.0005
0.0006
0.0006
0.0007
0.0006
0.0005
0.0006
0.0006
0.0007
0.0006
0.0005
0.0006
0.0006
0.0007
0.0006
0.0028
0.0033
0.0034
0.0037
0.0033
0.0028
0.0033
0.0034
0.0038
0.0035
0.0028
0.0033
0.0034
0.0037
0.0036
0.0028
0.0033
0.0034
0.0037
0.0035
Y ∗) under diﬀerent sig-

Table 4.1: Approximation results for various combinations of (s2, σ2
niﬁcance levels (α), dimension (n), and number of coeﬃcients (k)

68

Chapter 5

Future Work Directions

In the agriculture study, one way to improve the analysis is to include more data points

from later years. We could also remove the irrelevant explanatory variables from the model,

or to develop a link among maize yield, SPAD, and striga to systematically analyze the

determinants of each individual response. The latter proposal is most deﬁnitely going to

generate more coeﬃcients needed to be estimated [18].

For the climatology project, assigning more proper prior distributions to the model co-

eﬃcients may be able to achieve their estimates all at once. However, it is very likely to be

accompanied with more computational burden. Another possibility is to incorporate a spa-

tial pattern for the natural proxies over time [59], which needs an extensive understandings

of climatic system to propose a more feasible temporal covariance structure, thus simplifying

calculations and bringing more meaningful insights.

There are several directions to extend the work in Chapter 4. For example, we can utilize

Taylor-series expansion with higher order, so to obtain a more statistical power and faster

convergence [111]. Moreover, we can either compare our estimates with the actual MCMC-

method results, or calculate its total-variation distance between chaos terms and the normal

law [112], to see how far the approximations are away from the true values.

69

APPENDICES

70

APPENDIX A

Model Robustness

In (3.4), σC is deﬁned to grow with the RCPs’ magnitudes for each scenario. Here,

we test the model robustness by applying diﬀerent fraction values (i.e., 0, 0.5, 2, 3) in

σC . Apparently, based on the prediction graphs (Figures A.1, A.2, A.3, A.4), the patterns

are quite similar to Figure 3.6 except for reasonable ﬂuctuations when σC increases. The

convergence diagnosis (Figure A.5) shows that the Gibbs samplers indeed converge (and

quite fast). Hence, the projection model is not heavily aﬀected by the added greenhouse gas

uncertainty with various magnitudes (robust in other words).

Figure A.1: Temperature projections’ comparisons (CMIP5) with no σC

71

Figure A.2: Temperature projections’ comparisons (CMIP5) with 1

2σC

Figure A.3: Temperature projections’ comparisons (CMIP5) with 2σC

Figure A.4: Temperature projections’ comparisons (CMIP5) with 3σC

72

Figure A.5: 10,000 MCMC samples of all model parameters (from top to bottom): α1, α0,
βC , βV , βS, β0, σ2

P , σ2
T

73

APPENDIX B

Exponential Decay

The hierarchical stochastic model described in (3.6) is equivalent as follows after iteration:



Pt = atP0 +

= atP0 + (uα)t +

Tt = btT0 +

t−1(cid:88)

i=0

t−1(cid:88)

i=0

t−1(cid:88)

i=0

aiσP εt−i

(cid:1) +

ai(cid:0)α0 + α1Tt−i
t−1(cid:88)
bi(cid:0)β0 + βSSt−i + βV Vt−i + βC Ct−i
t−1(cid:88)

aiσP εt−i

i=0

= btT0 + (vβ)t +

biσT ηt−i

i=0

(cid:1) +

t−1(cid:88)

i=0

biσT ηt−i

(B.1)

Hereinafter, u ∈ MT,2(R) such that ∀t ≤ T :

t−1(cid:88)

i=0

t−1(cid:88)

i=0

aiTt−i

(B.2)

ut,0 =

ai, ut,1 =

and v ∈ MT,4(R) is deﬁnite and same for all three forcings (t ≤ T ) such that

t−1(cid:88)

t−1(cid:88)

t−1(cid:88)

t−1(cid:88)

vt,0 =

ai, vt,1 =

aiSt−i, vt,2 =

aiVt−i, vt,3 =

aiCt−i

(B.3)

i=0

i=0

i=0

i=0

These formalizations show that proxy (P) and temperature (T) at time t, take all tem-

perature and forcings before t into consideration respectively with exponential decay.

Moreover, the time series Pt and Tt are non-stationary, whose expectations depend on t.

74

When t is large enough, assume P0 = T0 = 0 and let σP = 1 − a2, σT = 1 − b2. From now

on, we use [Y(cid:12)(cid:12)X] to represent the conditional probability distribution of the random variable
Y given X. Then the variances of [P(cid:12)(cid:12)T, a, α, σP ] and [T(cid:12)(cid:12)b, β, σT ] are approximately equal

to constants σP and σT separately. Namely, the conditional densities of P and T are two

normal distributions with known covariance structures:

(B.4)



(cid:2)P(cid:12)(cid:12)T, a, α, σP
(cid:2)T(cid:12)(cid:12)b, β, σT

(cid:3) ∼ N(cid:0)uα, σ2
(cid:3) ∼ N(cid:0)vβ, σ2

(cid:1)
(cid:1)

P ΣP

T ΣT

where ΣP and ΣT are the covaraince matrices of AR(1) processes with parameters a and b.

75

APPENDIX C

Posterior-distribution Computation

Recall the prior distributions: normal (N ), inverse gamma (IG), and uniform (U) that

are assigned to the parameters in the model (3.6):

α ∼ N(cid:0)µα, I2
β ∼ N(cid:0)µβ, I4
σ2 ∼ IG(cid:0)q, r(cid:1)
a, b ∼ U(cid:0)0, 1(cid:1)

(cid:1)
(cid:1)

(C.1)

where

α = (α0, α1), β = (β0, βS, βC , βV ), σ2 = (σ2

P , σ2
T )

q = (qP , qT ), r = (rP , rT )

In is an identity matrix of n dimensions. In particular, we also assign the initial values as:

µα = (0, 1)

µβ = (0, 1, 1, 1)

q = (2, 2)

r = (0.1, 0.1)

(C.2)

After some derivations, we obtain the full posterior distributions for all the parameters.

76

Regression coeﬃcients: α, β

(cid:2)α(cid:12)(cid:12)P, T, a, σ2

P

(cid:3) ∝(cid:2)P(cid:12)(cid:12)α, T, a, σ2

P

P

(cid:3)(cid:2)α(cid:12)(cid:12)T, a, σ2
(cid:3)
(cid:0)P − uα(cid:1)(cid:41)
(cid:0)P − uα(cid:1)TΣ−1
αT(cid:0)uTΣ−1
(cid:1)α +
(cid:27)

P u + σ2

P I2

P

− 1
2σ2
P
− 1
2σ2
P
(α − µα)TΩ−1
(cid:1)

∝ exp

(cid:40)
(cid:40)
(cid:26)
∼ N(cid:0)µα, Ωα

∝ exp

∝ exp

−1
2

α (α − µα)

(cid:26)

−1
2

exp

(cid:107)α(cid:107)2

(cid:27)
(cid:41)

αTuTΣ−1
P P

1
σ2
P

where exp means exponential distribution and

(cid:2)β(cid:12)(cid:12)T, b, σ2

T

with



µα =

Ω−1
α =

1
σ2
P
1
σ2
P

ΩαuTΣ−1
P P

P u + I2

uTΣ−1
(cid:3)

T

T

∝ exp

(cid:3) ∝(cid:2)T|β, b, σ2
(cid:40)
(cid:40)
(cid:26)
∼ N(cid:0)µβ, Ωβ

(cid:3)(cid:2)β(cid:12)(cid:12)b, σ2
(cid:0)T − vβ(cid:1)TΣ−1
− 1
βT(cid:0)vTΣ−1
2σ2
T
− 1
2σ2
T
(β − µβ)TΩ−1
(cid:1)

(cid:0)T − vβ(cid:1)(cid:41)
(cid:1)β +
(cid:27)
β (β − µβ)

T v + σ2

∝ exp

∝ exp

−1
2

T I4

T

1
σ2
T



µβ =

Ω−1
β =

ΩβvTΣ−1
T T
vTΣ−1

T v + I4

1
σ2
T
1
σ2
T

77

(C.3)

(C.4)

(C.5)

(C.6)

(cid:26)

exp

(cid:107)β(cid:107)2

−1
2
βTvTΣ−1
T T

(cid:27)
(cid:41)

where dim is short for dimension and

Scale of noise terms: σ2

P , σ2
T

(cid:2)σ2

P

P

∝

(cid:1)

σ2
P

(cid:12)(cid:12)P, T, a, α(cid:3) ∝(cid:2)P(cid:12)(cid:12)σ2
P , T, a, α,(cid:3)(cid:2)σ2
(cid:40)
(cid:19)dimT
(cid:18) 1
(cid:41)
(cid:40)
(cid:19)qP + dimT

σP
× exp

− rP
σ2
P

exp

2 +1

∝

P = qP +
r(cid:48)
P = rP +

(cid:18) 1
∼ IG(cid:0)q(cid:48)
P , r(cid:48)
q(cid:48)
(cid:12)(cid:12)T, b, β(cid:3) ∝(cid:2)T(cid:12)(cid:12)σ2
T , b, β(cid:3)(cid:2)σ2
T |b, β(cid:3)
(cid:40)
(cid:18) 1
(cid:19)dimT
(cid:41)
(cid:40)
(cid:19)qT + dimT
(cid:1)
q(cid:48)

(cid:18) 1
∼ IG(cid:0)q(cid:48)
T , r(cid:48)

T = qT +
r(cid:48)
T = rT +

σT
× exp

− rT
σ2
T

2 +1

− 1
2σ2
T

(cid:40)

σ2
T

exp

exp

∝

∝

T

(cid:2)σ2

T

with

(C.7)

(C.8)

(cid:19)qT +1

dimT

2

(cid:0)P − uα(cid:1)TΣ−1

P

1
2

(cid:0)T − vβ(cid:1)TΣ−1

T

(cid:0)P − uα(cid:1)
(cid:0)T − vβ(cid:1)(cid:41)(cid:18) 1

σ2
T

(cid:20)

− 1
σ2
T

rT +

1
2

(cid:0)T − vβ(cid:1)TΣ−1

T

(cid:0)T − vβ(cid:1)(cid:21)(cid:41)

(C.9)

P |T, a, α(cid:3)

− 1
2σ2
P

(cid:0)P − uα(cid:1)TΣ−1

P

(cid:0)P − uα(cid:1)(cid:41)(cid:18) 1

σ2
P

(cid:19)qP +1

(cid:40)

(cid:20)

rP +

(cid:0)P − uα(cid:1)TΣ−1

P

1
2

(cid:0)P − uα(cid:1)(cid:21)(cid:41)

− 1
σ2
P

exp

dimT

2

(cid:0)T − vβ(cid:1)TΣ−1

(cid:0)T − vβ(cid:1)

(C.10)

T

1
2

78

Autoregressive coeﬃcients: a, b

P

(cid:2)a(cid:12)(cid:12)P, T, α, σ2
(cid:2)b(cid:12)(cid:12)T, β, σ2

T

P

P

∝

(cid:3) ∝(cid:2)P|a, T, α, σ2
1(cid:112)det(ΣP )
(cid:3) ∝(cid:2)T|b, β, σ2
1(cid:112)det(ΣT )

(cid:3)(cid:2)a|T, α, σ2
(cid:40)
(cid:3)(cid:2)b|β, σ2
(cid:40)

(cid:3)
(cid:0)P − uα(cid:1)TΣ−1
(cid:0)T − vβ(cid:1)TΣ−1

− 1
2σ2
P

− 1
2σ2
T

(cid:3)

exp

exp

∝

T

T

P

T

(cid:0)P − uα(cid:1)(cid:41)
(cid:0)T − vβ(cid:1)(cid:41)

(C.11)

where det stands for determinant (of a matrix). Finally, we derive the posterior distribution

for temperature (T), from which to draw samples using Gibbs sampler.

(cid:2)T(cid:12)(cid:12)P, a, b, α, β, σ2

P , σ2
T

(cid:3) ∝(cid:2)P(cid:12)(cid:12)T, a, b, α, β, σ2

P , σ2
T

(cid:3)(cid:2)T|b, β, σ2
(cid:3)
(cid:0)P − uα(cid:1)(cid:41)
(cid:0)P − uα(cid:1)TΣ−1
(cid:0)T − vβ(cid:1)(cid:41)
(cid:0)T − vβ(cid:1)TΣ−1

P

T

T

∝ exp

× exp

(cid:40)

− 1
2σ2
P
− 1
2σ2
T

(C.12)

(cid:40)

t−1(cid:88)
T(cid:88)

i=0

where u relies on T. Let us re-write (P − uα) at time t by expanding u as deﬁned above:

(cid:0)P − uα(cid:1)

t−1(cid:88)

t = Pt − α0

aiTt−i − α1

i=0

at−i1{i≤t} − α1

aiTt−i

T(cid:88)

at−iTi1{i≤t}

(C.13)

= Pt − α0

i=0

i=0

= P − α0Me − α1MT

where M ∈ MT,T (R) with Mi,j = 1{j≤i}ai−j, and e is a vector whose entries are all ones.

79

Therefore,

(cid:2)T(cid:12)(cid:12)P, a, b, α, β, σ2

P , σ2
T

(cid:3) ∝ exp

(cid:40)

× exp

(cid:40)

∝ exp

(cid:0)P − α0Me − α1MT(cid:1)(cid:41)
(cid:41)

(cid:41)

TTΣ−1

T vβ

1
σ2
T

(cid:40)

− 1
2σ2
P
− 1
2σ2
T

(cid:0)P − α0Me − α1MT(cid:1)TΣ−1
(cid:0)T − vβ(cid:1)(cid:41)
(cid:0)T − vβ(cid:1)TΣ−1
(cid:17)
TT(cid:16) α2

MTΣ−1

Σ−1

P

T

T

−1
2

(cid:40)

1
σ2
P

1
σ2
T

T

P M +

(cid:0)P − α0Me(cid:1) +
(cid:1)(cid:27)

T (T − µT

P

× exp

α1
σ2
P
−1
2

(cid:26)
∼ N(cid:0)µT , ΩT

∝ exp

TTMTΣ−1

(cid:0)T − µT )TΩ−1
(cid:1)

with



µT =

Ω−1
T =

α1
σ2
P
α2
1
σ2
P

(cid:0)P − α0Me(cid:1) +

ΩT Σ−1

T vβ

1
σ2
T

P

ΩT MTΣ−1
MTΣ−1

P M +

Σ−1

T

1
σ2
T

(C.14)

(C.15)

Theorem C.0.1. Assume

X =

 ∼ N

X1

X2

 ,

(cid:32)µ1

µ2

 Σ1 Σ12

Σ21 Σ2

(cid:33)

where Σ21 = ΣT

12. Then

(cid:2)X1

(cid:12)(cid:12)X2

(cid:3) ∼ N(cid:16)

(cid:0)X2 − µ2

(cid:1), Σ1 − Σ12Σ−1

2 Σ21

(cid:17)

µ1 + Σ12Σ−1

2

Consider X1 = T1 (past) and X2 = T2 (calibration). According to Theorem C.0.1, the

80

posterior distribution of past temperature given the data is:

(cid:12)(cid:12)T2, P, a, b, α, β, σ2
(cid:0)Ω
(cid:0)Ω

(1)
T + Ω

(1)(2)
T

− Ω

(1)(2)
T

P , σ2
T

(2)(2)
T

(cid:3) ∼ N(cid:0)µ, Ω(cid:1)
(cid:1)−1(cid:0)T2 − µ
(cid:1)−1(cid:0)Ω

(2)
T

(2)(1)
T

(2)(2)
T

Ω = Ω

(1)(1)
T

(cid:2)T1

µ = µ



(cid:1)
(cid:1)

(C.16)

(i)(j)
where Ω
T

(i = 1, 2 and j = 1, 2) is the partitioned matrix. The projection (denoted by

T3) can be done via the second level in (3.6) after all climate forcings extended to Year 2100.

81

APPENDIX D

Proof of Theorem 4.3.5

Proof. Let H(x) = e

− 1
2 [

u
s(1+ x√
2d

)

+1]2

. The ﬁrst and second derivative of H(x) are

− 1
2 [

H(cid:48)(x) = e

u
s(1+ x√
2d

)

+1]2

− 1
2 [

= e

u
s(1+ x√
2d

)

+1]2

:= H(x)H1(x)

×(cid:16) −(cid:2)

u

s(1 + x√
2d

)

√
× u

s

2d

1
(1 + x√
2d

)2

+ 1(cid:3)(cid:17) × u
(cid:2)

u

s

(cid:2) −
+ 1(cid:3)

s(1 + x√
2d

)

1
(1 + x√
2d

(cid:3)

1√
2d

)2

H(cid:48)(cid:48)(x) = H(cid:48)(x)H1(x) + H(x)H(cid:48)

1 (x) + H(cid:48)

1(x)(cid:3)
+ 1(cid:3)2
+ 2(cid:2)

+ 1]2

2s2d(1 + x√
2d

s(1 + x√
2d

)

)4

u2

1(x) = H(x)(cid:2)H2
(cid:2)
(cid:16)
(cid:26) u[
+ 1(cid:3)(cid:17)(cid:27)

)3

)3

u

u

u

2sd(1 + x√
2d

+ 2(cid:2)

2sd(1 + x√
2d

s(1 + x√
2d

)

u

u

)

s(1 + x√
2d
u
s(1+ x√
)
2d
s(1 + x√
2d

)

− 1
2 [

= e

u2
s(1+ x√
2d

)

+1]2

− 1
2 [

u2
s(1+ x√
2d

)

+1]2

− e

− 1
2 [

u2
s(1+ x√
2d

)

+1]2

= e

−(cid:16)

u

s(1 + x√
2d

)

+ 1(cid:3)(cid:17)

)

(D.1)

u

s(1 + x√
2d

82

Use Taylor expansion on H(x) at point a (a ∈ IR),

H(x) =

H(n)(a)

(x − a)n

n!

∞(cid:88)

0
− u2
2s2

= e

1
(1+ a√
2d

)2

− u2
2s2

1
(1+ a√
2d

)2

+ e

(x − a)

u2
√

s2

2d

1
(1 + a√
2d

)3 + R1(x)

where R1(x) is the mean-value (or Lagrange) form of the remainder:

R1(x) =

H(cid:48)(cid:48)(ξ)

2!

(x − a)2
− u2
2s2
e

(x − a)2

=

2

1
ξ√
2d

(1+

)2 u2
2s2d

1
(1 + ξ√
2d

)4

(cid:104)

u2
s2(1 + ξ√
2d

)2

(cid:105)

− 3

for ξ ∈ [a, x]. When a = 0 (i.e., Maclaurin series),

2s2 (cid:104)

− (u+s)2

+ 1(cid:1)(cid:105)

(cid:0) u

s

+ RH (x)

(D.2)

1 +

√
ux

2d

s

H(x) = e

and

RH (x) =

x2
2!

H(cid:48)(cid:48)(ξ)
− 1
2 [

=

e

x2
2

−(cid:16)

u2
ξ√
2d

s(1+

+1]2

)

u

+ 2(cid:2)

2sd(1 + ξ√
2d

u

s(1 + ξ√
2d

)

u

s(1 + ξ√
2d

)

(cid:26) u[
+ 1(cid:3)(cid:17)(cid:27)

)3

+ 1]2

u
ξ√
s(1+
)
2d
s(1 + ξ√
2d

)

(D.3)

Note that

(cid:90) ∞
−√

− x2
e
2 e

1√
2π

2d

− [u+s(1+ x√
2d
2s2(1+ x√
2d

)]2
)2

dx,

Φ2(u, d) = 1 − 1
2

83

d→∞ Φ2(u, d) = 1 − 1

lim

2

− (u+s)2
2s2

e

Hence, plug in (D.2) and by triangle inequality,

(cid:12)(cid:12)(cid:12)Φ2(u, d) −(cid:0)1 − 1

2s2 (cid:1)(cid:12)(cid:12)(cid:12) ≤ 1

− (u+s)2

e

2

(cid:12)(cid:12)(cid:12)dx

− (u+s)2
2s2

− e

+ 1(cid:1)dx

(cid:90) ∞
(cid:90) ∞
−√
(cid:90) ∞
−√
(cid:90) ∞
−√
−√

2d

2d

2d

2d

1√
2π
1√
2π
1√
2π
1√
2π

− x2
e
2

− x2
e

+1]2

)

− 1
2 [

u
s(1+ x√
2d

(cid:12)(cid:12)(cid:12)e
2 (cid:12)(cid:12)H(x) − H(0)(cid:12)(cid:12)dx
(cid:0) u
2 (cid:12)(cid:12)RH (x)(cid:12)(cid:12)dx

u|x|
√
2d
s

− (u+s)2
2s2

s

− x2
2 e
e

− x2
e

2
≤ 1
2

≤ 1
2

+

1
2

:= Φ21(u, d) + Φ22(u, d)

(cid:16)(cid:90) ∞
(cid:90) 0
−√

0

2d

1√
2π
1√
2π

− x2
2 e
e

− (u+s)2
2s2

s
− (u+s)2

− x2
2 e
e

Φ21(u, d) =

1
2

+

√
1
2π
2

− (u+s)2
2s2
√
u(u + s)e
2d
− (u+s)2
2s2
√
u(u + s)e
4s2

(−e

s2

πd
− (u+s)2
2s2
√
u(u + s)e
4s2

πd

=

=

=

s

+ 1(cid:1)dx
(cid:0) u
√
ux
(cid:17)
(cid:0) u
+ 1(cid:1)dx
2d
2s2 −ux
√
(cid:90) √
(cid:16)(cid:90) ∞
2d
(cid:12)(cid:12)(cid:12)√
(cid:12)(cid:12)(cid:12)∞

+ (−e

− x2
2 )

2 dx +

− x2

xe

s

s

0

0

0

2d

(cid:17)

2d

(cid:17)

− x2

2 dx

xe

(D.4)

H(cid:48)(cid:48)(ξ) is continuous for all ξ ∈ [0, x], x ∈ IR and

(cid:12)(cid:12)H(cid:48)(cid:48)(ξ)(cid:12)(cid:12) ≤ e

− (u+s)2
2s2

− (u+s)2
2s2

= e

s + 1)2
s

−(cid:104) u
+ 1(cid:1)2 − 3u

s

s

+ 1(cid:1)(cid:105)(cid:12)(cid:12)(cid:12)(cid:12)

(D.5)

+ 2(cid:0) u
(cid:12)(cid:12)(cid:12)

− 2

s

0

− x2
2 )

(cid:16)
(cid:0)2 − e−d(cid:1)
(cid:12)(cid:12)(cid:12)(cid:12) u( u
(cid:12)(cid:12)(cid:12) u
(cid:0) u

u
2sd

u
2sd

s

s

84

(cid:26) u[

u

2sd(1 + ξ√
2d

)3

+ 1]2

u
ξ√
s(1+
)
2d
s(1 + ξ√
2d

)

1√
2π

− x2
e
2

x2
2

dx

(D.6)

− (u+s)2
2s2
e

(D.7)

s + 1)2(cid:3)

8s2d

2s2 (cid:17)(cid:12)(cid:12)(cid:12)(cid:12) ≤ u2(cid:2)3 + ( u
(cid:19)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)
(cid:19)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)

− (u+s)2
2s2
e

− (u+s)2
2s2

√

πd

4s2

Plug (4.36), (D.5) into (D.3),

Φ22(u, d) =

(cid:90) ∞
−√
−(cid:16)

1
2

u2
ξ√
2d

)

s(1+

+1]2

+ 1(cid:3)(cid:17)(cid:27)
(cid:12)(cid:12)(cid:12)(cid:16)(cid:90) ∞

0

− 2

dx

s(1 + ξ√
2d

)

1√
2π

2d

u

s(1 + ξ√
2d
− (u+s)2
2s2

4sd

(cid:90) 0
−√
− (u+s)2
2s2

2d

1√
2π

− 1
2 [
e

− x2
2

e

x2
2

)

u

+ 2(cid:2)
+ 1(cid:1)2 − 3u
(cid:0) u
(cid:17)
+ 1(cid:1)2(cid:105)

(cid:12)(cid:12)(cid:12) u
(cid:104)
3 +(cid:0) u

s
− x2
2

x2
2

dx

s

s

e

s

≤ ue

+

≤ u2e

8s2d

From (D.4) and (D.6),

(cid:12)(cid:12)(cid:12)(cid:12)Φ2(u, d) −(cid:16)

1 − 1
2

e

− (u+s)2

2s2 − u(u + s)(2 − e−d)

√

e

4s2

πd

− (u+s)2

Therefore,

1 − 1
2

(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)ΦU (u) −
(cid:18)
(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)Φ2(u, d) −
(cid:18)
s + 1)2(cid:3)
≤ u2(cid:2)3 + ( u

≤

8s2d

1 − 1
2
− (u+s)2
2s2

e

e

√

− (u+s)2

2s2 − u(u + s)(2 − e−d)
2s2 − u(u + s)(2 − e−d)
− (u+s)2
e

4s2

πd

e

:= δ2(u, d)

85

APPENDIX E

Boundaries Justiﬁcation

Moreover, let G(x) = e

− u2
2s2

1
(1+ x√
2d

)2

.

− u2
2s2

G(cid:48)(x) = e

1
(1+ x√
2d

)2 ×(cid:0) − u2

2s2

(cid:1)(cid:104) − 2

(cid:105)

1
(1 + x√
2d

)3

1√
2d

− u2
2s2

= e

1
(1+ x√
2d

:= G(x)G1(x)

)2 × u2
√
2d

s2

1
(1 + x√
2d

)3

G(cid:48)(cid:48)(x) = G(cid:48)(x)G1(x) + G(x)G(cid:48)
)2(cid:20)(cid:16) u2

− u2
2s2

1
(1+ x√
2d

= e

1(x)

√
2d

s2

1
(1 + x√
2d

(cid:17)2 − 3u2

√
2d

s2

− u2
2s2

= e

1
(1+ x√
2d

)2 × u2
2s2d

1
(1 + x√
2d

)4

u2

s2(1 + x√
2d

)3

(cid:104)

1
(1 + x√
2d

(cid:105)
)2 − 3

(cid:21)

1√
2d

)4

(E.1)

Then the Maclaurin series of G(x) is

2s2(cid:16)

− u2

(cid:17)

u2x
√
s2
2d

+ RG(x)

(E.2)

G(x) = e

1 +

86

and the Lagrange form of the remainder for ξ ∈ [0, x] is

RG(x) =

=

G(cid:48)(cid:48)(ξ)
− u2
2s2
e

x2
2!

x2
2

1
ξ√
2d

(1+

)2 u2
2s2d

1
(1 + ξ√
2d

)4

(cid:104)

u2
s2(1 + ξ√
2d

)2

(cid:105)

− 3

, ξ ∈ [0, x]

(E.3)

Recall

Φ1(u, d) = 1 − 1
2

(cid:90) ∞
−√
⇒ Φ1(u, d) −(cid:0)1 − 1

− u2
2s2

− x2
2 e
e

1
(1+ x√
2d

)2

dx,

1√
2π

2d

2s2(cid:1) =

− u2
e

(cid:90) ∞
−√

1
2

− x2
2

e

1√
2π

2d

(cid:104)

e

2

− u2
2s2
e

2

lim

d→∞ Φ1(u, d) = 1 − 1
)2(cid:105)

− u2
2s2

1
(1+ x√
2d

− u2
2s2 − e

dx

By triangle inequality,

(cid:12)(cid:12)(cid:12)Φ1(u, d) −(cid:0)1 − 1

2

e

2s2(cid:1)(cid:12)(cid:12)(cid:12) ≤ 1

− u2

2
≤ 1
2
≤ 1
2

+

2d

(cid:90) ∞
(cid:90) ∞
−√
(cid:90) ∞
−√
(cid:90) ∞
−√
−√

1
2

2d

2d

2s2(cid:12)(cid:12)(cid:12)dx

− u2

)2 − e

− x2
e
2

− x2
e

− u2
2s2

1
(1+ x√
2d

(cid:12)(cid:12)(cid:12)e
2 (cid:12)(cid:12)G(x) − G(0)(cid:12)(cid:12)dx
2s2 u2|x|
− u2
√
2 (cid:12)(cid:12)RG(x)(cid:12)(cid:12)dx

dx

s2

2d

− x2

e

− x2
2 e
e

1√
2π
1√
2π
1√
2π
1√
2π

2d

:= Φ11(u, d) + Φ12(u, d)

87

2s2 u2(−x)
− u2

√

s2

2d

(cid:17)

dx

(E.4)

(cid:16)(cid:90) ∞

Φ11(u, d) =

=

=

=

0

e

2d

1
2

− x2
2 e

(cid:16)(cid:90) ∞
(cid:12)(cid:12)(cid:12)∞

1√
2π
− u2
u2e
2s2
√
√
1
s2
2π
2
− u2
2s2
√
πd
− u2
2s2
√
πd

(cid:16)
(cid:0)2 − e−d(cid:1)

− x2
2 )

u2e
4s2

u2e
4s2

(−e

0

0

− u2
2s2 u2x
√
2d

s2

dx +

− x2
2 e
e

1√
2π

2d

− x2

2 dx +

xe

− x2

2 dx

xe

(cid:17)

+ (−e

− x2
2 )

(cid:90) 0
−√
(cid:90) √
(cid:12)(cid:12)(cid:12)√

(cid:17)

2d

2d

0

0

For all ξ ∈ [0, x], x ∈ IR, G(cid:48)(cid:48)(ξ) is continuous and

(cid:12)(cid:12)G(cid:48)(cid:48)(ξ)(cid:12)(cid:12) ≤ e

− u2
2s2 u2
2s2d

Applying the same calculation in (D.6) to Φ12(u, d),

(cid:12)(cid:12)(cid:12)u2

s2 − 3

(cid:12)(cid:12)(cid:12)

(E.5)

(cid:12)(cid:12)(cid:12)(cid:12)dx

− 3

(cid:12)(cid:12)(cid:12)(cid:12)

)4

u2
s2(1 + ξ√
2d

)2

(cid:17)

dx

1
(1 + ξ√
2d

(cid:90) 0
−√

1√
2π

− x2
2

e

x2
2

dx +

1√
2π

− x2
2

e

x2
2

2d

1
ξ√
2d

(1+

)2 u2
2s2d

− u2
2s2
e

|x|2
2

(cid:12)(cid:12)(cid:12)(cid:16)(cid:90) ∞

0

2s2(cid:17)(cid:12)(cid:12)(cid:12)(cid:12) ≤ u2(3 + u2

8s2d

s2 )

− u2
e

(E.6)

(E.7)

− u2
2s2

e

Φ12(u, d) =

1
2

− x2
e
2

(cid:90) ∞
−√
− u2
2s2 u2
2s2d

2d

1√
2π

(cid:12)(cid:12)(cid:12) u2

s2 − 3
(cid:17)

u2
s2

3 +

≤ 1
2

e

8s2d

≤ u2e

− u2
2s2

(cid:16)
(cid:12)(cid:12)(cid:12)(cid:12)Φ1(u, d) −(cid:16)

So

1 − 1
2

2s2 − u2(2 − e−d)
− u2
e

√
πd

4s2

88

From (E.7),

(cid:12)(cid:12)(cid:12)(cid:12)ΦU (u) −(cid:16)
(cid:12)(cid:12)(cid:12)(cid:12)Φ1(u, d) −(cid:16)

≥
≥ u2(3 + u2
s2 )

8s2d

e

1 − 1
2
1 − 1
2

− u2

√

2s2 − u2(2 − e−d)
− u2
2s2 − u2(2 − e−d)
− u2

πd
√

4s2

e

e

e

4s2

πd

2s2(cid:17)(cid:12)(cid:12)(cid:12)(cid:12)
2s2(cid:17)(cid:12)(cid:12)(cid:12)(cid:12)

− u2

− u2
e

2s2 := δ1(u, d)

which shows that δ2(u, d) in Theorem 4.3.5 is sharp enough.

By the similar computation in (4.39) and (4.42),

ΦV (v) ≥ Ex

1 − 1
2

e

(cid:104)

2s2

2s2

− x2

e

2 dx − Ex

− (v−xσY ∗ )2

(cid:105)
− δ1(v − xσY ∗, d)
(cid:90) ∞
(cid:2)δ1(v − xσY ∗, d)(cid:3)
− (v−xσY ∗ )2
(cid:2) s2+σ2
(cid:1)2(cid:3)
−∞ e
(cid:90) ∞
(cid:2)δ1(v − xσY ∗, d)(cid:3)
(cid:113)

(cid:2)δ1(v − xσY ∗, d)(cid:3)

(cid:0)x− vσY ∗

1√
2π
Y ∗

Y ∗) − Ex

Y ∗
s2+σ2

1√
2π

2(s2+σ2

2(s2+σ2

Y ∗)

−∞

−

e

−

e

− 1
2

e

dx

v2

v2

s2

= 1 − 1
2

= 1 − 1
2
− Ex

= 1 −

s

2

Y ∗
s2 + σ2

The density function of X ∼ N (µX , σ2

X ) is

− (x−µX )2

2σ2
X

√
1

e

2π

σX

fX (x) =



µX =

σ2
X =

vσY ∗
Y ∗
s2 + σ2

s2

Y ∗
s2 + σ2

89

(E.8)

(cid:2)δ1(v − xσY ∗, d)(cid:3) =

⇒ Ex

(cid:90) ∞

−∞

(cid:17)

d

(v − xσY ∗)2fX (x)dx

(cid:16)2 − e−d√

π

2s2

− (v−xσY ∗)2
(v − xσY ∗)2e
√
(cid:17)e
4s2
(v−xσY ∗ )2
√
(cid:20)(cid:90) ∞
2

d
− x2
2√
2π

s2
d
v2

dx

(cid:16)2 − e−d√

−∞

π

Y ∗)
(v − xσY ∗)4fX (x)dx

+

√
3
2

(cid:21)
(cid:17)(cid:104)

3 +

+

4s

+

=

=

2(s2+σ2

Y ∗ )
d(s2 + σ2
√
1
2s2
v2

d

e

−

(cid:113)
(cid:90) ∞

−∞
−

(cid:113)

e

2(s2+σ2

Y ∗ )
Y ∗)
d(s2 + σ2
√
1
2s2

(cid:1)(cid:105)

+

d

Y ∗(cid:0)µ3

− 4vσ3

4s

+ σ2
X

(cid:26)(cid:16)2 − e−d√
(cid:104)

π

+

√
3
2
d

v4 − 4v3σY ∗µX + 6v2σ2

(cid:1) + σ4
Y ∗(cid:0)µ4

X + 3µX σ2
X

Y ∗(cid:0)µ2
v2 − 2vσY ∗µX + σ2
(cid:1)
(cid:1)(cid:105)(cid:27)

Y ∗(cid:0)µ2

X + σ2
X

X

X + 6µ2

X σ2

X + 3σ4
X

Therefore,

(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)ΦV (v) −

(cid:18)

−

(cid:113)

v2

2(s2+σ2

Y ∗)
Y ∗
s2 + σ2

1 − se
2

δ(cid:48)(v, d) =

e

−

v2

2(s2+σ2

Y ∗ )
Y ∗
s2 + σ2

(cid:113)
Y ∗(cid:0)µ2

8sd

+ 6v2σ2

X + σ2
X

(cid:104)

(cid:16)
(cid:1) − 4vσ3
Y ∗(cid:0)µ3

v2 − 2vσY ∗µX + σ2

3

(E.9)

(E.10)

−

(cid:113)
− (2 − e−d)e

4s

πd(s2 + σ2
− 2vσY ∗µX + σ2

(cid:104)

v2

v2

2(s2+σ2

Y ∗ )
Y ∗)

Y ∗(cid:0)µ2
Y ∗(cid:0)µ2

X + σ2
X

X + σ2
X

(cid:1)(cid:17)
(cid:1) + σ4
Y ∗(cid:0)µ4

(cid:1)(cid:105)(cid:19)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≥ δ(cid:48)(v, d)
(cid:16)

+

1
s2

v4 − 4v3σY ∗µX

(cid:1)(cid:17)(cid:105)

X + 3µX σ2
X

X + 6µ2

X σ2

X + 3σ4
X

and µX , σ2

X deﬁned in (E.8), which implies the bounds of ΦV (v) cannot do any better.

(E.11)

90

BIBLIOGRAPHY

91

BIBLIOGRAPHY

[1] Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models.

Bayesian Analysis, 1(3), 515-533.

[2] Haller, H. and Krauss, S. (2002). Misinterpretations of signiﬁcance: a problem students

share with their teachers. Methods of Psychological Research, 7(1), 1-20.

[3] McShane, B. B. and Gal, D. (2017). Statistical signiﬁcance and the dichotomization

of evidence. Journal of the American Statistical Association, 112(519), 885-895.

[4] Dunson, D. B. (2001). Commentary: practical advantages of Bayesian analysis of epi-

demiologic data. American journal of Epidemiology, 153(12), 1222-1226.

[5] Baldos, U. L. C., Viens, F. G., Hertel, T. W., and Fuglie, K. O. (2019). R&D spend-
ing, knowledge capital, and agricultural productivity growth: a Bayesian approach.
American Journal of Agricultural Economics, 101(1), 291-310.

[6] Zhang, Z., Hamagami, F., Wang, L., Nesselroade, J. R., and Grimm, K. J. (2005).
Bayesian analysis of longitudinal data using growth curve models. International Journal
of Behavioral Development, 31(4), 374-383.

[7] Neufcourt, L., Cao, Y., Nazarewicz, W., and Viens, F. (2018). Bayesian approach to
model-based extrapolation of nuclear observables. Physical Review C, 98(3), 034318.

[8] Wiener, N. (1938). The homogeneous chaos. American Journal of Mathematics, 60(4),

897-936.

[9] Nourdin, I. and Peccati, G. (2009). Noncentral convergence of multiple integrals. The

Annals of Probability, 37(4), 1412-1426.

[10] Lobell, D. B., Cassman, K. G., and Field, C. B. (2009). Crop yield gaps: their im-
portance, magnitudes, and causes. Annual Review of Environment and Resources, 34,
179-204.

[11] Fermont, A. M., van Asten, P. J. A., Tittonell, P., van Wijk, M. T., and Giller, K.
E. (2009). Closing the cassava yield gap: an analysis from smallholder farms in East
Africa. Field Crops Research, 112(1), 24-36.

[12] Tamene, L., Mponela, P., Ndengu, G., and Kihara, J. (2016). Assessment of maize
yield gap and major determinant factors between smallholder farmers in the Dedza
district of Malawi. Nutrient Cycling in Agroecosystems, 105, 291-308.

92

[13] van Dijk, M., Morley, T., Jongeneel, R., van Ittersum, M., Reidsma, P., and Ruben,
R. (2017). Disentangling agronomic and economic yield gaps: an integrated framework
and application. Agricultural Systems, 154, 90-99.

[14] Liu, Z., Yang, X., Hubbard, K. G., and Lin, X. (2012). Maize potential yields and
yield gaps in the changing climate of northeast China. Global Change Biology, 18(11),
3441-3454.

[15] Tittonell, P. and Giller, K. E. (2013). When yield gaps are poverty traps: The paradigm
of ecological intensiﬁcation in African smallholder agriculture. Field Crops Research,
143, 76-90.

[16] Snapp, S. S. (1998). Soil nutrient status of smallholder farms in Malawi. Communica-

tions in Soil Science and Plant Analysis, 29(17-18), 2571-2588.

[17] Batie, S. S. (2008). Wicked problems and applied economics. American Journal of

Agricultural Economics, 90(5), 1176-1191.

[18] Wang, H., Snapp, S. S., Fisher, M., and Viens, F. (2019). A Bayesian analysis of
longitudinal farm surveys in central Malawi reveals yield determinants and site-speciﬁc
management strategies. PLoS ONE, 14(8), e0219296.

[19] Mungai, L. M., Snapp, S., Messina, J. P., Chikowo, R., Smith, A., Anders, E., Richard-
son, R. B., and Li, G. (2016). Smallholder farms and the potential for sustainable
intensiﬁcation. Frontiers in Plant Science, 7, 1720.

[20] Lowole, M. W. (1983). Soil Map of Malawi. Department of Agricultural Research,

Lilongwe, Malawi.

[21] Gin´e, X., Townsend, R., and Vickery, J. (2007). Statistical analysis of rainfall insurance
payouts in southern India. American Journal of Agricultural Economics, 89(5), 1248-
1254.

[22] Diaconis, P. and Ylvisaker, D. (1979). Conjugate priors for exponential families. The

Annals of Statistics, 7(2), 269-281.

[23] Beukes, I., Rose, L. J., Shephard, G. S., Flett, B. C., and Viljoen, A. (2017). Mycotox-
igenic Fusarium species associated with grain crops in South Africa – a review. South
African Journal of Science, 113(3-4), 1-12.

[24] Sauer, J. and Tchale, H. (2009). The economics of soil fertility management in Malawi.

Review of Agricultural Economics, 31(3), 535-560.

[25] Marenya, P. P. and Barrett, C. B. (2009). State-conditional fertilizer yield response on
western Kenyan farms. American Journal of Agricultural Economics, 91(4), 991-1006.

93

[26] Scharf, P. C., Wiebold, W. J., and Lory, J. A. (2002). Corn yield response to nitrogen

fertilizer timing and deﬁciency level. Agronomy Journal, 94(3), 435-441.

[27] Kabambe, V. H., Nambuzi, S. C., and Kauwa A. E. (2008). Integrated management
of witchweed (Striga asiatica [L.] Kuntze) by means of maize-legume rotations and
intercropping systems in Malawi. Bunda Journal of Agriculture, Environmental Science
and Technology, 3(2), 35-42.

[28] Berner, D. K., Kling, J. G. and Singh, B. B. (1995). Striga research and control: a

perspective from Africa. Plant Disease, 79(7), 652-660.

[29] Jamil, M., Kanampiu, F. K., Karaya, H., Charnikhova, T., and Bouwmeester, H. J.
(2012). Striga hermonthica parasitism in maize in response to N and P fertilisers. Field
Crops Research, 134, 1-10.

[30] Sauerborn, J., Kranz, B., and Mercer-Quarshie, H. (2003). Organic amendments mit-
igate heterotrophic weed infestation in savannah agriculture. Applied Soil Ecology,
23(2), 181-186.

[31] Atera, E. A., Itoh, K., Azuma, T. and Ishii, T. (2012). Farmers perception and con-
straints to the adoption of weed control option: the case of Striga asiatica in Malawi.
The Journal of Agricultural Science, 4(5), 41.

[32] Shaxson, L. and Riches, C. (1998). Where once there was grain to burn: a farming

system in crisis in eastern Malawi. Outlook on Agriculture, 27(2), 101-105.

[33] Snapp, S. S. , Blackie, M. J., and Donovan, C. (2003). Realigning research and exten-

sion to focus on farmers’ constraints and opportunities. Food Policy, 28(4), 349-363.

[34] Stocker, T. F., Qin, D., Plattner, G.-K., Tignor, M., Allen, S. K. , Boschung, J.,
Nauels, A., Xia, Y., Bex V., and Midgley, P. M. (2013). Climate change 2013: the
physical science basis. Working group I contribution to the ﬁfth assessment report of
the Intergovernmental Panel on Climate Change. Cambridge, New York: Cambridge
University Press.

[35] Li, B., Nychka, D. W., and Ammann, C. M. (2010). The value of multiproxy recon-
struction of past climate. Journal of the American Statistical Association, 105(491),
883-895.

[36] Grotch, S. L. and MacCracken, M. C. (1991). The use of general circulation models to

predict regional climatic change. Journal of Climate, 4, 286-303.

[37] Eyring, V., Bony, S., Meehl, G. A., Senior, C. A., Stevens, B., Stouﬀer, R. J., and
Taylor, K. E. (2016). Overview of the coupled model intercomparison project phase

94

6 (CMIP6) experimental design and organization. Geoscientiﬁc Model Development,
9(5), 1937-1958.

[38] Matthes, K., Andersson, F. B., Barnard, M. E., Beer, L., Charbonneau, J., Clilverd,
P., Dudok de Wit, M. A., Haberreiter, T., Hendry, M., Jackman, A., Kretzschmar, C.
H., Kruschke, M., Kunze, T., Langematz, M., Marsh, U.,Maycock, D. R., Misios, A.
C. (2017). Solar Forcing for CMIP6 (v3.2). Geoscientiﬁc Model Development, 10(6),
2247-2302.

[39] Jones, P. D., Briﬀa, K. R., Osborn, T. J., Lough, J. M., van Ommen, T. D., Vinther,
B. M., Luterbacher, J., Wahl, E. R., Zwiers, F. W., Mann, M. E., Schmidt, G. A.,
Ammann, C. M., Buckley, B. M., Cobb, K. M., Esper, J., Goosse, H., Graham, N.,
Jansen, E., Kiefer, T., Kull, C., and K¨utte, M. (2009). High-resolution palaeoclima-
tology of the last millennium: a review of current status and future prospects. The
Holocene, 19, 3-49.

[40] Werner, J. P. and Tingley, M. P. (2015). Technical note: probabilistically constraining
proxy agedepth models within a Bayesian hierarchical reconstruction model. Climate
of the Past, 11, 533-545.

[41] Mann, M. E., Zhang, Z., Hughes, M. K., Bradley, R. S., Miller, S. K., Rutherford, S.,
and Ni, F. (2008). Proxy-based reconstructions of hemispheric and global surface tem-
perature variations over the past two millennia. Proceedings of the National Academy
of Sciences, 105(36), 13252-13257.

[42] Mann, M. E., Bradley, R. S., and Hughes, M. K. (1998). Global-scale temperature

patterns and climate forcing over the past six centuries. Nature, 392, 779787.

[43] Yu, Z., Chu, P., and Schroeder, T. (1997). Predictive skills of seasonal to annual
rainfall variations in the U.S. aﬃliated Paciﬁc islands: canonical correlation analysis
and multivariate principal component regression approaches. Journal of Climate, 10,
2586-2509.

[44] Smerdon, J. E., Kaplan, A., Chang, D., and Evans, M. N. (2010). A pseudoproxy
evaluation of the CCA and RegEM methods for reconstructing climate ﬁelds of the
last millennium. Journal of Climate, 23, 4856-4880.

[45] Werner, J. P., Luterbacher, J., and Smerdon, J. E. (2013). A pseudoproxy evaluation
of Bayesian hierarchical modeling and canonical correlation analysis for climate ﬁeld
reconstructions over Europe. Journal of Climate, 26, 851-867.

[46] Schneider, T. (2001). Analysis of incomplete climate data: estimation of mean values
and covariance matrices and imputation of missing values. Journal of Climate, 14,
853-871.

95

[47] Rutherford, S., Mann, M. E., Delworth, T. L., and Stouﬀer, R. J. (2003). Climate
ﬁeld reconstruction under stationary and nonstationary forcing. Journal of Climate,
16, 462-479.

[48] Zhang, Z., Mann, M. E., and Cook, E. R. (2004). Alternative methods of proxy-based
climate ﬁeld reconstruction: application to summer drought over the conterminous
United States back to AD 1700 from tree-ring data. The Holocene, 14(4), 502-516.

[49] Lee, T. C. K., Zwiers, F. W., and Tsao, M. (2008). Evaluation of proxy-based millennial

reconstruction methods. Climate Dynamics, 31, 263-281.

[50] Christiansen, B., Schmith, T., and Thejll, P. (2009). A surrogate ensemble study of
climate reconstruction methods: stochasticity and robustness. Journal of Climate, 22,
951-976.

[51] Jones, P. D., Briﬀa, K. R., Barnett, T. P., and Tett, S. F. B. (1998). High-resolution
palaeoclimatic records for the last millennium:
interpretation, integration and com-
parison with general circulation model control-run temperatures. The Holocene, 8(4),
455-471.

[52] Mann, M. E., Bradley, R. S., and Hughes, M. K. (1999). Northern hemisphere tem-
peratures during the past millennium: inferences, uncertainties, and limitations. Geo-
physical Research Letters, 26(6), 759-762.

[53] Tingley, M. P., Craigmile, P. F., Haran, M., Li, B., Mannshardt, E., and Rajaratnam,
B. (2012). Piecing together the past: statistical insights into paleoclimatic reconstruc-
tions. Quaternary Science Reviews, 35, 1-22.

[54] Hoeting, J. A., Madigan, D., Raft, A. E., and Volinsky, C. T. (1999). Bayesian model

averaging: a tutorial. Statistical Science, 14(4), 382-417.

[55] Katz, R. W. (2002). Techniques for estimating uncertainty in climate change scenarios

and impact studies. Climate Research, 20, 167-185.

[56] Cheaib, A. , Badeau, V., Boe, J., Chuine, I., Delire, C., Dufrˆene, E., Fran¸cois, C.,
Gritti, E. S., Legay, M., Pag´e, C., Thuiller, W., Viovy, N., and Leadley, P. (2012). Cli-
mate change impacts on tree ranges: model intercomparison facilitates understanding
and quantiﬁcation of uncertainty. Ecology Letters, 15, 533-544.

[57] Haslett, J. , Whiley, M., Bhattacharya, S., SalterTownshend, M., Wilson, S. P., Allen,
J. R., Huntley, B., and Mitchell, F. J. (2006). Bayesian palaeoclimate reconstruction.
Journal of the Royal Statistical Society: Series A (Statistics in Society), 169(3), 395-
438.

96

[58] Barboza, L., Li, B., Tingley, M. P., and Viens, F. G. (2014). Reconstructing past
temperatures from natural proxies and estimated climate forcings using short- and
long-memory models. The Annals of Applied Statistics, 8, 1966-2001.

[59] Tingley, M. P. and Huybers, P. (2010). A Bayesian algorithm for reconstructing climate
anomalies in space and time. Part I: development and applications to paleoclimate
reconstruction problems. Journal of Climate, 23, 2759-2781.

[60] Henley, B. J., Thyer, M. A., Kuczera, G., and Franks, S. W. (2011). Climate-informed
stochastic hydrological modeling: Incorporating decadal-scale variability using paleo
data. Water Resources Research, 47, W11509.

[61] Comboul, M., Emile-Geay, J., Evans, M. N., Mirnateghi, N., Cobb, K. M., and Thomp-
son, D. M. (2014). A probabilistic model of chronological errors in layer-counted cli-
mate proxies: applications to annually banded coral archives. Climate of the Past, 10,
825-841.

[62] Mann, M. E., Zhang, Z., Rutherford, S., Bradley, R. S., Hughes, M. K., Shindell, D.,
Ammann, C., Faluvegi, G., and Ni, F. (2009). Global signatures and dynamical origins
of the Little Ice Age and Medieval Climate Anomaly. Science, 326, 1256-1260.

[63] Loso, M. G. (2009). Summer temperatures during the Medieval Warm Period and Little
Ice Age inferred from varved proglacial lake sediments in southern Alaska. Journal of
Paleolimnology, 41, 117.

[64] Brohan, P., Kennedy, J. J., Harris, I., Tett, S. F. B., and Jones, P. D. (2006). Uncer-
tainty estimates in regional and global observed temperature changes: a new data set
from 1850. Journal of Geophysical Research, 111, D12106.

[65] Rayner, N. A., Brohan, P., Parker, D. E., Folland, C. K., Kennedy, J. J., Vanicek, M.,
Ansell, T. J., and Tett, S. F. B. (2006). Improved analyses of changes and uncertainties
in sea surface temperature measured in situ since the mid-nineteenth century: the
HadSST2 dataset.” Journal of Climate, 19, 446-469.

[66] Morice, C. P., Kennedy, J. J., Rayner, N. A., and Jones, P. D. (2012). Quantifying
uncertainties in global and regional temperature change using an ensemble of obser-
vational estimates: the HadCRUT4 data set. Journal of Geophysical Research, 117,
D08101.

[67] Jones, P. D., Lister, D. H., Osborn, T. J., Harpham, C., Salmon, M., and Morice, C.
P. (2012). Hemispheric and large-scale land surface air temperature variations: an ex-
tensive revision and an update to 2010. Journal of Geophysical Research, 117, D05127.

97

[68] Ammann, C. M., Genton, M. G., and Li, B. (2010). Technical note: correcting for
signal attenuation from noisy proxy data in climate reconstructions. Climate of the
Past, 6, 273-279.

[69] Gueymard, C. A. (2018). A reevaluation of the solar constant based on a 42-year
total solar irradiance time series and a reconciliation of spaceborne observation. Solar
Energy, 168, 2-9.

[70] Bard, E., Raisbeck, G., Yiou, F., and Jouzel, J. (2000). Solar irradiance during the

last 1200 years based on cosmogenic nuclides. Tellus B, 52, 985-992.

[71] Lean, J., Beer, J., and Bradley, R. (1995). Reconstruction of solar irradiance since 1610:

implications for climate change. Geophysical Research Letters, 22(23), 3195-3198.

[72] Ammann, C. M., Joos, F., Schimel, D. S., Otto-Bliesner, B. L., and Tomas, R. A.
(2007). Solar inﬂuence on climate during the past millennium:
results from tran-
sient simulations with the NCAR climate system model. Proceedings of the National
Academy of Sciences, 104(10), 3713-3718.

[73] Shindell, D. T., Schmidt, G. A., Miller, R. L., and Rind, D. (2001). Northern hemi-
sphere winter climate response to greenhouse gas, ozone, solar, and volcanic forcing.
Journal of Geophysical Research, 106(D7), 7193-7210.

[74] Shindell, D. T. and Schmidt, G. A. (2003). Volcanic and solar forcing of climate change

during the preindustrial era.” Journal of Climate, 16, 4094-4107.

[75] Moss, R. H., Edmonds, J. A., Hibbard, K. A., Manning, M. R., Rose, S. K., van Vuuren,
D. P., Carter, T. R., Emori, S., Kainuma, M., Kram, T., Meehl, G. A., Mitchell, J. F.
B., Nakicenovic, N., Riahi, K., Smith, S. J., Stouﬀer, R. J., Thomson, A. M., Weyant,
J. P., and Wilbanks, T. J. (2010). The next generation of scenarios for climate change
research and assessment. Nature, 463, 747-756.

[76] van Vuuren, D. P., Edmonds, J., Kainuma, M., Riahi, K., Thomson, A., Hibbard,
K., Hurtt, G. C., Kram, T., Krey, V., Lamarque, J. F., Masui, T., Meinshausen, M.,
Nakicenovic, N., Smith, S. J., Rose, S. K. (2011). The representative concentration
pathways: an overview. Climatic Change, 109, 5-31.

[77] van Vuuren, D. P., den Elzen, M. G. J., Lucas, P. L., Eickhout, B., Strengers, B. J.,
van Ruijven, B., Wonink, S., and van Houdt, R. (2007). Stabilizing greenhouse gas
concentrations at low levels: an assessment of reduction strategies and costs. Climatic
Change, 81, 119-159.

[78] Hijioka, Y., Matsuoka, Y., Nishimoto, H., Masui, T., and Kainuma, M. (2008). Global
GHG emission scenarios under GHG concentration stabilization targets. Journal of
Global Environment Engineering, 13, 97-108.

98

[79] Riahi, Keywan, Gr¨ubler, A., and Nakicenovicac, N. (2007). Scenarios of long-term
socio-economic and environmental development under climate stabilization. Techno-
logical Forecasting and Social Change, 74(7), 887-935.

[80] Free, M. and Robock, A. (1999). Global warming in the context of the Little Ice Age.

Journal of Geophysical Research, 104, 19057-19070.

[81] Myhre, G., Highwood, E. J., Shine, K. P., and Stordal, F. (1998). New estimates of
radiative forcing due to well mixed greenhouse gases. Geophysical Research Letters,
25, 2715-2718.

[82] Collins, W. D., Ramaswamy, V., Schwarzkopf, M. D., Sun, Y., Portmann, R. W., Fu,
Q., Casanova, S. E. B., Dufresne, J. L., Fillmore, D. W., Forster, P. M. D., Galin, V. Y.,
Gohar, L. K., Ingram, W. J., Kratz, D. P., Lefebvre, M. P., Li, J., Marquet, P., Oinas,
V., and Tsush, Y. (2006). Radiative forcing by well-mixed greenhouse gases: estimates
from climate models in the intergovernmental panel on climate change (IPCC) fourth
assessment report (AR4). Journal of Geophysical Research, 111, D14317.

[83] Chouet, B. A. (1996). Long-period volcano seismicity: its source and use in eruption

forecasting. Nature, 380, 309-316.

[84] Oppenheimer, C. (2003). Climatic, environmental and human consequences of the
largest known historic eruption: Tambora volcano (Indonesia) 1815. Progress in Phys-
ical Geography, 27 (2), 230-259.

[85] Man, W., Zhou, T., and Jungclaus, J. H. (2014). Eﬀects of large volcanic eruptions on
global summer climate and East Asian monsoon changes during the last millennium:
analysis of MPI-ESM simulations. Journal of Climate, 27, 7394-7409.

[86] Ljung, G. M. and Box, G. E. P. (1978). On a measure of lack of ﬁt in time series

models. Biometrika, 65(2), 297-303.

[87] Ter¨asvirta, T. (1994). Speciﬁcation, estimation, and evaluation of smooth transition
autoregressive models. Journal of the American Statistical Association, 89(425), 208-
218.

[88] Eitrheim, Ø. and Ter¨asvirta, T. (1996). Testing the adequacy of smooth transition

autoregressive models. Journal of Econometrics, 74 (1), 59-75.

[89] Morris, C. N. (1983). Parametric empirical Bayes inference: theory and applications.

Journal of the American Statistical Association, 78(381), 47-55.

[90] Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation using multiple

sequences. Statistical Science, 7(4), 457-472.

99

[91] McKeen, S., Wilczak, J., Grell, G., Djalalova, I., Peckham, S., Hsie, E. Y., Gong, W.,
Bouchet, V., Menard, S., Moﬀet, R., McHenry, J., McQueen, J., Tang, Y., Carmichael,
G. R., Pagowski, M., Chan, A., Dye, T., Frost, G., Lee, P., and Mathur, R. (2005). As-
sessment of an ensemble of seven real-time ozone forecasts over eastern North America
during the summer of 2004. Journal of Geophysical Research, 110, D21307.

[92] Savage, N. H., Agnew, P., Davis, L. S., Ord´o˜nnez, C., Thorpe, R., Johnson, C. E.,
O’Connor, F. M., and Dalvi, M. (2013). Air quality modelling using the met oﬃce uni-
ﬁed model (AQUM OS24-26): model description and initial evaluation. Geoscientiﬁc
Model Development, 6, 353-372.

[93] Chai, T. and Draxler, R. R. (2014). Root mean square error (RMSE) or mean absolute
error (MAE)? – arguments against avoiding RMSE in the literature. Geoscientiﬁc
Model Development, 7, 1247-1250.

[94] Hersbach, H. (2000). Decomposition of the continuous ranked probability score for

ensemble prediction systems. Weather and Forecasting, 15, 559-570.

[95] Grimit, E. P., Gneiting, T., Berrocal, V. J., and Johnson, N. A. (2006). The continuous
ranked probability score for circular variables and its application to mesoscale forecast
ensemble veriﬁcation. Quarterly Journal of the Royal Meteorological Society, 132, 2925-
2942.

[96] Gschl¨oßl, S. and Czado, C. (2007). Spatial modelling of claim frequency and claim size

in non-life insurance. Scandinavian Actuarial Journal, 2007(3), 202-225.

[97] Gneiting, T. and Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and

estimation. Journal of the American Statistical Association, 102(477), 359-378.

[98] Cowles, M. K. and Carlin, B. P. (1996). Markov chain Monte Carlo convergence
diagnostics: a comparative review. Journal of the American Statistical Association,
91(434), 883-904.

[99] Brooks, S. P. and Gelman, A. (1998). General methods for monitoring convergence of
iterative simulations. Journal of Computational and Graphical Statistics, 7(4), 434-
455.

[100] Hegerl, G. C., Br¨onnimann, S., Schurer, A., and Cowan, T. (2018). The early 20th cen-
tury warming: anomalies, causes, and consequences. Wiley Interdisciplinary Reviews:
Climate Change, 9(4), e522.

[101] Phipps, S. J., McGregor, H. V., Gergis, J., Gallant, A. J. E., Neukom, R., Stevenson, S.,
Ackerley, D., Brown, J. R., Fischer, M. J., and van Ommen, T. D. (2013). Paleoclimate
datamodel comparison and the role of climate forcings over the past 1500 years. Journal
of Climate, 26, 6915-6936.

100

[102] Eddy, J. A. (1976). The Maunder Minimum. Science, 192(4245), 1189-1202.

[103] Hempelmann, A. and Weber, W. (2012). Correlation between the sunspot number, the

total solar irradiance, and the terrestrial insolation. Solar Physics, 277, 417-430.

[104] Kopp, G., Krivova, N., Wu, C. J., and Lean, J. (2016). The impact of the revised

sunspot record on solar irradiance reconstructions. Solar Physics, 291, 2951-2965.

[105] Mann, M. E., Fuentes, J. D., and Rutherford, S. (2012). Underestimation of volcanic
cooling in tree-ring-based reconstructions of hemispheric temperatures. Nature Geo-
science, 5, 202-205.

[106] Taylor, K. E., Stouﬀer, R. J., and Meehl, G. A. (2012). An overview of CMIP5 and
the experiment design. Bulletin of the American Meteorological Society, 93, 485-498.

[107] Holmstr¨om, L., Ilvonen, L., Sepp¨a, H., and Veski, S. (2015). A Bayesian spatiotemporal
model for reconstructing climate from multiple pollen records. The Annals of Applied
Statistics, 9(3), 1194-1225.

[108] Kim, K. K., Shen, D. E., Nagy, Z. K., and Braatz, R. D. (2013). Wiener’s polynomial
chaos for the analysis and control of nonlinear dynamical systems with probabilistic
uncertainties [historical perspectives]. IEEE Control Systems, 33, 58-67.

[109] Oladyshkin, S. and Nowak, W. (2018). Incomplete statistical information limits the
utility of high-order polynomial chaos expansions. Reliability Engineering & System
Safety, 169, 137-148.

[110] Pedersen, A. R. (1995). A new approach to maximum likelihood estimation for stochas-
tic diﬀerential equations based on discrete observations. Scandinavian Journal of Statis-
tics, 22(1), 55-71.

[111] A¨ıt-Sahalia, Y. (2002). Maximum likelihood estimation of discretely sampled diﬀusions:

a closedform approximation approach. Econometrica, 70(1), 223-262.

[112] Nourdin, I. and Peccati, G. (2015). The optimal fourth moment theorem. Proceedings

of the American Mathematical Society, 143, 3123-3133.

101