HIGH DIMENSIONAL COMPUTATIONAL MODELS FOR BIOMEDICAL IMAGING DATA ANALYSIS

By

Liangliang Zhang

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Statistics - Doctor of Philosophy

2017

ABSTRACT

HIGH DIMENSIONAL COMPUTATIONAL MODELS FOR BIOMEDICAL IMAGING DATA ANALYSIS

By Liangliang Zhang

The importance of big data does not revolve around how much data you have, but what you do with it. You can take data from any source and analyze it to find answers that enable 1) cost reductions, 2) time reductions, 3) new product development and optimized offerings, and 4) smart decision making. My thesis is mainly focused on high-dimensional image data analysis and computation. The image data I worked on are CT images of abdominal aortic aneurysms (AAA) and brain images of Alzheimer's Disease. I developed a Bayesian calibration method for the former and supervised learning with a Markov chain for the latter.

Bayesian calibration has a long history within computer modeling in general. It is an iterative process of updating uncertainty distributions on the calibration parameters in a way that is consistent with the observed data. Because of advances in complex mathematical models and fast computer codes, Bayesian calibration of computer experiments is now popular in scientific research. Compared to a computer model, a complex system in real life is expensive in both time and money to observe. Therefore, computer models can be used as stand-alone tools or combined with (typically smaller) data from physical experiments or field observations, and Bayesian calibration is powerful in integrating all sources of uncertainty into the model definition and calibration procedure.

For the AAA data, I first modeled only one patient (patient-specific prediction, discussed in Chapter 2), and then built a more advanced model that can incorporate all patients (multi-patient prediction, Chapter 3). In the process, semi-parametric functional data analysis, covariance modeling, and Bayesian methods were practiced and used extensively. The contributions are as follows. First, we formulate the Bayesian calibration of our AAA G&R computation model taking into account model inadequacy, prior distributions of model parameters, measurement errors, and, most importantly, longitudinal CT scan images. Next, we demonstrate how to achieve the proposed aims by solving the formulated Bayesian calibration problem using a simulation study and real data analysis. In particular, we compare and discuss the performance and computation time under different sampling cases for the computation model output data and (synthesized) patient data, both of which are synthesized by the G&R computation. We apply our Bayesian calibration to the real CT data and validate our prediction, showing the usefulness of our approach to the computational science and medical communities in aiding decision making.

For the Alzheimer's Disease data, the causes are being intensively researched, but no definitive answers exist as yet. Genetic predisposition, abnormal protein deposits in the brain, and environmental factors are suspected to play a role in the development of the disease.
In Chapter 4, my main goal is to model the progression of Alzheimer's Disease by applying a multi-state Markov model; to investigate the significance of known risk factors such as age, ApoE4, and brain structural volumetric variables obtained from MRI, such as hippocampus volume; and, at the same time, to predict the transitions between different clinical diagnosis states based on transition rates and transition probabilities. We found that age is not a significant covariate (p-value 0.1733 in the likelihood ratio test), while ApoE4 is a significant risk factor in our Markov model. Predictions based on transition rates and transition probabilities were made and validated, with accuracy as high as 0.7849.

ACKNOWLEDGMENTS

Thanks to the ancestors who paved the path before me, upon whose shoulders I stand. This is also dedicated to my family and the many friends who supported me on this journey. I appreciate all the failures, trials, and suffering for giving me one more chance each time to grow into a full-grown man. Thanks to the statisticians and other scientists before and in my time. No matter how ignorant I was at the beginning, I have always felt lucky to have been part of Statistics since I was a sophomore. Statistics is omnipotent, as the saying goes: God also gambles. We live in a world full of uncertainties. Determinacy in mathematics is generalized in statistics; the natural randomness of quantum mechanics can be systematically described in statistics; predictive models in the financial market are cautious of the Black Swan...

I would like to express my deepest gratitude to my supervisor, Prof. Taps Maiti, for his unwavering support of my five years of Ph.D. study and life; for his encouragement, motivation, collegiality, and mentorship throughout all the projects; and for the time and effort he put into cultivating me from a dependent researcher into an independent academician. I would like to express my special appreciation to my co-advisor, Dr. Chae Young Lim, not only for her tremendous academic support, but also for her patience, genuine caring, and concern. Without her contributions of time and ideas, it would not have been possible to make this thesis complete and comprehensive. Besides my supervisors, I would like to extend my thanks to those who offered research opportunities, collegial guidance, and continuous support over the years: my thesis committee, Dr. Jongeun Choi and Dr. R.V. Ramamoorthi, for their encouragement and for serving as committee members.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

Chapter 1 Introduction
  1.1 Computer model
  1.2 Gaussian Process Emulation
  1.3 Design of Computer Experiments
  1.4 Bayesian Statistics
  1.5 Calibration
  1.6 Bayesian calibration
  1.7 Advantages and disadvantages

Chapter 2 Patient-Specific Prediction from Single-subject Data
  2.1 Introduction
  2.2 Models
    2.2.1 AAA G&R computation model
    2.2.2 Quantity of interest (QoI) for AAA G&R
    2.2.3 Calibration model
    2.2.4 Statistical models
  2.3 Bayesian analysis for calibration
    2.3.1 Likelihood of computer model
    2.3.2 Likelihood of real observations
    2.3.3 Joint likelihood
    2.3.4 Calibration
    2.3.5 Prediction
  2.4 Data Description
    2.4.1 Simulated Observation Cases
    2.4.2 Real Observation Case
  2.5 Results
    2.5.1 Results from Simulated Observation Cases
      2.5.1.1 Robustness of prior selection
      2.5.1.2 Computation time
    2.5.2 Results from Real Observation Case
  2.6 Discussion and Conclusion
  2.7 Data Accessibility
  APPENDICES
    Appendix 1 Programming steps for simulated observations
    Appendix 2 Other Results

Chapter 3 Patient-Specific Prediction from Multi-subject Data
  3.1 Introduction
  3.2 Calibration model
  3.3 Statistical models
  3.4 Computer outputs
  3.5 Observations
  3.6 Full data
  3.7 Bayesian analysis
    3.7.1 Joint posterior distribution
    3.7.2 Full conditional of ψ1
    3.7.3 Full conditional of ψ2
    3.7.4 Posterior distribution of calibration parameter θ
    3.7.5 Predictive distribution of true process ζ(·)
  3.8 Data Analysis
    3.8.1 Small sample
    3.8.2 Big sample
  3.9 Conclusion
Chapter 4 Alzheimer's Disease Progression Analysis Using Multi-state Markov Model
  4.1 Introduction
  4.2 Data
  4.3 Methods
  4.4 Results
    4.4.1 Model without covariates
    4.4.2 Model with a single covariate
      4.4.2.1 Age
      4.4.2.2 ApoE4
    4.4.3 Model with multiple covariates
      4.4.3.1 Prediction and validation
      4.4.3.2 Impact of transition rates and transition probabilities
  4.5 Conclusion and discussion
  APPENDICES
    Appendix 1 Multi-state Markov model
    Appendix 2 Specification for AAA progression

Chapter 5 Conclusions, Discussion, and Directions for Future Research
  5.1 Conclusions and discussion
  5.2 Directions for future research
    5.2.1 Bayesian consistency
    5.2.2 Variable selection in Bayesian statistics
    5.2.3 Bayesian Dynamic network

BIBLIOGRAPHY

LIST OF TABLES

Table 2.1. Differences in Xc and σ2² for simulation cases
Table 2.2. Case details
Table 2.3. Posterior estimates of θ1 and θ2
Table 2.4. Computation times for Bayesian calibration and prediction. Estimation (time) describes the time spent in maximizing to estimate ψ1 and ψ2, while sampling (time) describes the time spent in calculating predictions and credible intervals via sampling from posteriors.
Table A.1. Priors and Estimates of Hyperparameters for Case 1a
Table A.2. Priors and Estimates of Hyperparameters for Case 1b
Table A.3. Priors and Estimates of Hyperparameters for Case 2a
Table A.4. Priors and Estimates of Hyperparameters for Case 2b
Table A.5. Priors and Estimates of Hyperparameters for Case 2c
Table A.6. Priors and Estimates of Hyperparameters for Case 3a
Table A.7. Priors and Estimates of Hyperparameters for Case 3b
Table A.8. Priors and Estimates of Hyperparameters for real observations
Table 3.1. Priors and Estimates of Hyperparameters
Table 4.1. Distribution of age in each diagnosis group
Table 4.1. Table of transition rates and probabilities for the model without covariates.
Table 4.2. Table of transition rates (95% confidence intervals) for the model with covariate ApoE4.
Table 4.3. Table of transition probabilities (95% confidence intervals) at year 2 for the model with covariate ApoE4.
Table 4.4. Table of transition probabilities (95% confidence intervals) at year 4 for the model with covariate ApoE4.
Table 4.5. Table of mean sojourn times (95% confidence intervals) for the model with covariate ApoE4.
Table 4.6. Likelihood ratio tests between different models.
Table 4.7. Prediction for two data sets with different alignments in time. Data1 includes all the observations before AD for each subject, while Data2 includes only 2 observations before AD for each subject. Data1 is divided into Group1 and Group2; Group1 in Data1 includes all the patients whose transition probability is ever more than 0.45 or whose transition rate is more than 1.2. Data2 is likewise divided into Group1 and Group2; Group1 in Data2 includes all the patients whose transition probability is ever more than 0.6 or whose transition rate is more than 1.2.

LIST OF FIGURES

Figure 2.1. (a) The location of an abdominal aortic aneurysm, colored in red, is indicated by an arrow, while the green region on the left indicates the inferior vena cava; (b) four longitudinal AAA CT images 1, 2, 3, and 4, screened at 4 consecutive times. Regarding the technique to align the CT scan images, please refer to Kwon et al. [61].
Figure 2.2. The flow-chart of Bayesian calibration.
Figure 2.3. 3D point clouds sampled from a CT scan image. (a) An inscribed sphere with point clouds from the CT scan images and (b) a centerline computed from the point clouds.
Figure 2.4. The diagram of data preparation.
Figure 2.5. Radius vs. height coordinate on the centerline measured from CT scans of a patient over the surveillance time. The lines from bottom to top represent images 1, 2, 3, and 4, respectively.
Figure 2.6. Diagram of the Bayesian calibration process.
Figure 2.7. Two stages to estimate ψ.
Figure 2.8. Predictions of the true processes. T[t] denotes the true values at year t, while P[t] denotes the predictions at year t. The solid line denotes the true values, while the dashed line denotes the predictions.
Figure 2.9. 3D 95% credible band of predictions. The blue stars denote the true values lying inside the credible band. The red marks denote the true values lying outside the credible band.
Figure 2.10. Predictions and credible bands for Case 2c.
Figure 2.11. Predictions and credible bands for the real observation.
Figure A.1. Relative errors between predictions and true values.
Figure A.2. Relative errors for Case 2c (a) and for the real observation case (b).
Figure A.3. Prior and posterior densities of θ1 and θ2.
Figure A.4. Prior and posterior densities of θ1 and θ2.
Figure 3.1. Prediction results for subject 1.
Figure 3.2. Prediction results for subject 2.
Figure 3.3. Prior and posterior results for the data consisting of 6 patients.
Figure 4.1. Individual plot of the percentage of hippocampus volume to ICV vs. age. In each graph, each black line segment with solid points denotes how the hippocampus of each patient changes with age. The solid points denote the ages at which the patient was screened. The x-axis is the real age of the patients. The y-axis is the percentage of hippocampus volume to ICV (intracranial volume).
Figure 4.2. Longitudinal plots of the fraction of hippocampus volume to ICV and MMSE in 4 different diagnosis groups. 'D11' denotes the group of patients who stay within NC all the time. 'D22' denotes the group of patients who stay within MCI all the time. 'D23' denotes the group of patients who ever transit from MCI to AD. 'D33' denotes the group of patients who stay within AD all the time.
Figure 4.1. Plot of survival curves (not entering the final state AD). The dashed blue line denotes the fitted survival curve from NC to AD. The solid red line denotes the fitted survival curve from MCI to AD. The dotted lines denote the 95% confidence bands of the corresponding fitted survival curves.
Figure 4.2. Fitted transition rates changing with age in different groups. The red solid line with circle marks denotes the transition rates from the NC state to the MCI state. The blue dashed line with triangle marks denotes the transition rates from the MCI state to the NC state. The purple dotted line with plus marks denotes the transition rates from the MCI state to the AD state. The x-axis denotes age, while the y-axis denotes the transition rates.
Figure 4.3. Transition probabilities changing with time for the model with covariate ApoE4. The red solid line with circle marks describes transition probabilities from the NC state to the MCI state. The blue dashed line with triangle marks describes transition probabilities from the MCI state to the NC state. The purple dotted line with plus marks describes transition probabilities from the MCI state to the AD state. The x-axis denotes the scanning time, while the y-axis denotes the transition probabilities. The left panel describes the ApoE4 carriers (positive), while the right panel describes the ApoE4 non-carriers (negative).
Figure 4.4. Longitudinal plots of predicted transition rates and probabilities in different diagnosis groups. 'D11' denotes the group of patients who stay within NC all the time. 'D22' denotes the group of patients who stay within MCI all the time. 'D23' denotes the group of patients who ever transit from MCI to AD. 'D33' denotes the group of patients who stay within AD all the time.
Figure 5.1. Inverted Beta distribution when the parameters a and b take different values.
Figure 5.2. Group 1, Group 2, and Group 3 (big circles) denote the nodes of the brain network. The thick solid lines denote the connectivity between nodes; as we can see, the connectivity changes with time. Voxels (small circles) denote the real observations.
Chapter 1 Introduction

In this thesis, the image data I worked on are CT images of abdominal aortic aneurysms (AAA) and brain images of Alzheimer's Disease. I developed a Bayesian calibration method for the former and supervised learning with a Markov chain for the latter. Chapter 1 reviews the background, research questions, and development of the Bayesian calibration method. Chapter 2 introduces patient-specific prediction for single-subject data using Bayesian calibration. Chapter 3 introduces patient-specific prediction for multi-subject data using Bayesian calibration. Chapter 4 introduces an analysis of the conversion of Alzheimer's Disease using a multi-state Markov model. Chapter 5 presents conclusions, discussion, and suggestions for future research.

1.1 Computer model

Complex models (also called computer models) are used to resemble real-world systems and are thus used in many different fields, such as the clinical sciences, physics, engineering, and climate science. A computer model is defined by a set of equations and is implemented as computer code. People typically use the model to predict how the real-world system may behave in the future. Running a computer model at a number of different input values is known as a computer experiment. Depending on the setting, the output can be either a scalar or a vector; however, we consider only the case of univariate output here. The computer models considered are deterministic: each time they are run with the same inputs, they produce the same output [99].

The inputs are usually used to describe properties of the real system that need to be observed, so the true values of these inputs may be unknown. Even when values of the inputs can be obtained, some form of measurement error is likely to have occurred. Therefore, the uncertainty in the inputs needs to be quantified, as it relates to how uncertain we are that the computer model output matches reality. In other words, uncertainty analysis aims to quantify the uncertainty in model outputs caused by uncertainty in inputs. Another source of uncertainty is the model itself: even when the true values of the inputs are known, running the model at these points will not produce an output that exactly matches the observation of the real-world system.

Complex models are computationally expensive and take a long time to run, because a simulation of a real-world system can involve a high number of dimensions. Sensitivity analysis is defined as the process of evaluating how the output of a model is modified by changes in the inputs. Performing analyses such as sensitivity and uncertainty analysis can require many runs of the computer model, and this quickly becomes impractical with a computationally expensive model.

1.2 Gaussian Process Emulation

In statistics, a Gaussian process emulator is one name for a general type of statistical model that has been used in contexts where the problem is to make maximum use of the outputs of a complex (often non-random) computer model. The main element of the Gaussian process emulator is that it models the outputs as a Gaussian process on a space that is defined by the computer model inputs. The analysis of computer models with Gaussian process emulators has been studied for about 20 years.
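To make this idea concrete, here is a minimal sketch of Gaussian process emulation in Python; the toy simulator, the design points, and the scikit-learn kernel settings are illustrative assumptions, not models used later in this thesis.

```python
# A minimal sketch of emulating a deterministic computer model with a GP;
# the simulator, design, and kernel below are illustrative stand-ins.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def simulator(x):
    # Stand-in for an expensive deterministic computer model.
    return np.sin(3.0 * x) + 0.5 * x

# A small design of training runs of the computer model.
X_train = np.linspace(0.0, 2.0, 8).reshape(-1, 1)
y_train = simulator(X_train).ravel()

# The emulator: a GP on the space defined by the computer model inputs.
kernel = ConstantKernel(1.0) * RBF(length_scale=0.5)
emulator = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
emulator.fit(X_train, y_train)

# Cheap predictions, with uncertainty, at untested inputs.
X_new = np.linspace(0.0, 2.0, 50).reshape(-1, 1)
mean, sd = emulator.predict(X_new, return_std=True)
print(mean[:3], sd[:3])
```

The emulator is cheap to evaluate and returns a predictive standard deviation, which is what makes sensitivity and uncertainty analyses feasible for an expensive model.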
Sacks et al. (1989) [89] describe a technique for approximating an unknown deterministic computer experiment by modeling the output as a realization of a stochastic process. A Bayesian method of predicting a computer model at untested inputs is given in Currin et al. (1991) [22]. O'Hagan (2006) [77] gives an introduction to Gaussian process emulation, as well as details of Bayesian methods for various analyses.

1.3 Design of Computer Experiments

In order to obtain the training data, we need to select the design; the choice of design points is known as the design of the experiment. Without greatly increasing the computational time, we aim to gain more information from the training data, so that the emulator can better approximate the computer model. Various criteria are defined for different designs; for example, under a minimax design, points are selected such that the maximum distance between any two design points is minimized.

McKay et al. (1979) [69] compare three methods for choosing designs for Monte Carlo studies: simple random sampling, stratified sampling, and Latin hypercube sampling. A Latin hypercube sample ensures that the points are spread evenly over the range of each input dimension. The three methods are assessed by comparing estimators of the mean, variance, and distribution function of the output; they conclude that Latin hypercube sampling is preferred. Santner et al. (2003) [91] also comment on the widespread use of this method for generating designs in other kinds of computer experiments.

1.4 Bayesian Statistics

Bayesian statistics began with a posthumous publication in 1763 by Thomas Bayes [5], a non-conformist minister from Tunbridge Wells [51]. His work was formalized as Bayes's theorem, which is a simple and uncontroversial result in probability theory. Bayesian statistics is a system for describing epistemological uncertainty using the mathematical language of probability [98]. The methods can be outlined as formal combinations, through the use of Bayes's theorem, of: 1) a prior distribution about the value of a quantity of interest, based on evidence not derived from the study, with 2) a summary of the information (likelihood) concerning the same quantity available from the data collected in the study, to yield 3) an updated or posterior distribution of the quantity of interest.

These methods address the question of how new information should update what we currently believe. They extend naturally into making predictions, synthesizing information from different sources, and designing studies. Nevertheless, Bayesian methods are a controversial topic, because they may involve the explicit use of subjective judgment in a rigorous scientific exercise. Compared with traditional methods, a Bayesian perspective leads to more flexible and ethical methods of analyzing clinical trials and observational studies [56], and to more elegant ways of handling multiple substudies, for instance when simultaneously estimating the effects of a treatment on many subgroups [10]. A Bayesian approach enables one to provide suitable conclusions for making decisions for specific patients, for planning research, or for public policy [65].

1.5 Calibration

For a conventional calibration, an expert modeler repeatedly adjusts calibration parameters to minimize the difference between observed data and model outputs (Reddy et al., 2007) [82]. One main drawback of this process is that it demands intensive manual labor and inherently introduces modeler biases into the calibrated model. To improve on this, optimization schemes have been employed to automate calibration (New and Chandler, 2013 [74]; Raftery et al., 2011 [81]). These schemes determine the combination of model parameters that minimizes the error between data and model outputs, as sketched below.
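As a concrete illustration, the following minimal sketch automates calibration by minimizing a sum-of-squares objective; the model form, the synthetic data, and the optimizer choice are illustrative assumptions.

```python
# A minimal sketch of automated calibration by optimization, assuming a
# generic model function and synthetic "measured" data; the model form,
# objective, and optimizer settings are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

def model(params, x):
    # Hypothetical computer model with two calibration parameters.
    a, b = params
    return a * np.exp(-b * x)

x_obs = np.linspace(0.0, 5.0, 20)
z_obs = model([2.0, 0.7], x_obs) + 0.05 * np.random.default_rng(1).standard_normal(20)

def objective(params):
    # Sum of squared errors between data and model outputs.
    return np.sum((z_obs - model(params, x_obs)) ** 2)

result = minimize(objective, x0=[1.0, 1.0], method="Nelder-Mead")
print(result.x)  # calibrated parameter values
```

If the objective is replaced by a negative log-likelihood, this procedure coincides with maximum likelihood estimation, as noted below.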
Therefore, the solution of the calibration problem can be understood as identifying the global optimum of the objective function within the feasible domain of the calibration parameters. When both the objective function and the feasible domain are convex, or can be restated as convex, mathematical algorithms can stably find the single optimum. For problems that are not convex and cannot be recast as convex, exact solutions often become computationally expensive, and only approximate solutions may be practical [9]. For more details, please refer to [25]. In most cases, calibration is the same as statistical estimation [46], [27]: both processes aim at finding input values that lead to the best possible model fit. For example, if the objective function of the calibration is a likelihood function, calibration is equivalent to maximum likelihood estimation.

1.6 Bayesian calibration

Many methods of automated calibration have been developed which reduce costs, time, and biases. Among them, Bayesian methods differ significantly from the others in that the inputs are assumed to be uncertain, and the main goal is not to match the prediction to the measured data as closely as possible, but to reduce the uncertainty in the inputs in a manner consistent with the measured data. When the sample size of the measured data is limited and the model inputs have high sensitivity and high uncertainty, Bayesian methods are particularly useful for calibration. For more details, please refer to Muehleisen and Bergerson (2016) [73].

Bayesian calibration is an advanced method of calibration and is thus fundamentally different from conventional calibration methods. Bayesian calibration has a long history within computer modeling in general ([30]; [80]; [58]). Bayesian calibration is an automated process of updating uncertainty distributions on the calibration parameters in a way that is consistent with the observed data. Unlike conventional calibration, which minimizes the difference between observed data and model outputs, Bayesian calibration determines the most likely uncertainties for input parameters that yield an output uncertainty under which the observed data are most likely. Bayesian calibration is an application of Bayes' Theorem, which combines prior information with the information contained in the likelihood of observed outputs from the model ([6]). As commented by Kennedy and O'Hagan (2001) [58], none of the existing methods of calibration fully recognizes all sources of uncertainty.

As in a regular Bayesian problem, the general process of Bayesian calibration is to 1) define PDFs (probability density functions) for uncertain model parameters, 2) collect simulations and observations of the real system for known input parameter values, and 3) calibrate the (assumed) prior parameters based on the observed data by iteratively applying Bayes' Theorem until the iterations converge to an acceptable level. Establishing prior PDFs for input parameters generally requires both expert opinion solicitation and literature review. For example, a uniform distribution may be assumed when only a range of appropriate values can be determined for a parameter (Riddle and Muehleisen, 2014 [85]). A triangular distribution may be used when both a range of values and a most likely value can be determined; both choices are sketched below.
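The following minimal sketch encodes these two prior choices with scipy.stats; the range [2.0, 5.0] and the mode 3.0 are illustrative assumptions.

```python
# A minimal sketch of the two prior specifications just described;
# the range and mode below are illustrative, not real parameter values.
from scipy.stats import uniform, triang

# Uniform prior: only a plausible range [2.0, 5.0] is known.
lo, hi = 2.0, 5.0
range_prior = uniform(loc=lo, scale=hi - lo)

# Triangular prior: a range [2.0, 5.0] plus a most likely value 3.0.
mode = 3.0
mode_prior = triang(c=(mode - lo) / (hi - lo), loc=lo, scale=hi - lo)

print(range_prior.pdf(3.0), mode_prior.pdf(3.0))
```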
1.7 Advantages and disadvantages

As Bayesian calibration integrates uncertainty into the model definition and calibration procedure, one of its primary advantages is that a modeler can quantify a confidence level in the calibrated model. Another advantage is that a probabilistic risk analysis can be performed and competing retrofits can be ranked according to a desired risk level, based on mean values, variances, etc. Finally, Bayesian calibration reduces the tendency to overfit the model to the observed data. Overfitting is a problem in regression analysis, multi-objective optimization, and machine learning, in which the fitting/optimization routine fits the model to the noise in the measured data (Dietterich, 1995 [29]). Unlike conventional calibration, which minimizes the difference between observed data and model outputs, Bayesian calibration tries to maximize the likelihood that the model output is statistically consistent with the measured data. This consistency naturally takes into account the uncertainty in the measured data (Heckerman, 1998 [48]).

There is one common disadvantage of all calibration methods, both conventional and Bayesian: it is typically very challenging, time consuming, and expensive to collect observed data from the real system. Another disadvantage is that Bayesian calibration requires a significant number of iterations to converge to the most likely PDFs, especially when the prior probability density functions are poorly chosen. Finally, because of the high computational demand, Bayesian calibration of large models often requires a pre-step of parameter screening and selection to obtain the most significant parameters to use in the calibration process and thereby reduce run time [73].

Chapter 2 Patient-Specific Prediction from Single-subject Data

2.1 Introduction

Figure 2.1. (a) The location of an abdominal aortic aneurysm, colored in red, is indicated by an arrow, while the green region on the left indicates the inferior vena cava; (b) four longitudinal AAA CT images 1, 2, 3, and 4, screened at 4 consecutive times. Regarding the technique to align the CT scan images, please refer to Kwon et al. [61].

An abdominal aortic aneurysm (AAA) is an enlarged, localized volume in the lower part of the aorta, which supplies blood to a large part of the body (see Fig. 2.1). Enlargement of the aorta by more than 50% of its normal diameter is defined as an aortic aneurysm. The vast majority (over 90%) of aortic aneurysms occur in the abdominal region, specifically the infrarenal aorta [79] [114]. In general, an aorta with a diameter larger than 3 cm is considered an aneurysm. A ruptured aneurysm can cause life-threatening internal bleeding; if rupture occurs, patient mortality rates are greater than 80% [59, 60]. Depending on the size and rate of growth, treatment of an AAA may vary from watchful waiting to emergency surgery. Once an AAA is found, doctors closely monitor it so that surgery can be planned if it becomes necessary. Since elective repair of AAAs can result in peri-operative deaths (4-8%) [63], performing unnecessary surgeries increases patient risk. A thorough understanding of the expansion and rupture of AAAs is thus needed in order to minimize patient risk. While significant advances have been made in the management of AAA patients [13], this disease still carries a high mortality rate.
During the last decade, bio-chemo-mechanical studies have been integrating computational modeling with an increased understanding of the expansion and weakening of aneurysms. Recently, such a computational platform, called a growth and remodeling (G&R) model, has been developed and incorporated with patient-specific anatomical information, which aids in treatment planning on a per-patient basis [2, 36, 108, 117]. Zeinali-Davarani et al. [117] developed the G&R model to take into account both elastic degeneration and stress-mediated collagen turnover during AAA development using finite element analysis (FEA). Other models of AAAs without G&R have also been shown to be potentially useful in aiding patient-specific treatment planning [31, 68]. A coupled simulation of G&R with hemodynamics was conducted to study its effects on AAA expansion [95]. Geometric, kinetic, and material parameters have been identified for individual patients using inverse optimization techniques for modeling the growth of AAAs. Furthermore, it has been shown that the same material parameters for AAA expansion can help to predict intrasac-pressure dependent vascular adaptation after endovascular repair [62].

Translating recent computational advances into a predictive tool for individualized clinical treatment, however, requires a major paradigm shift due to the incompleteness of the model, limited information, and uncertainty associated with clinical measurements for each individual patient. Most importantly, the uncertainty in the prediction propagated from various sources needs to be correctly quantified. For example, the G&R model's internal parameters need to be carefully adjusted according to patient-specific data, e.g., longitudinal computed tomography (CT) images, in order to make better predictions and thus be useful for clinical decisions.

The aims of this paper are to develop a framework that 1) calibrates the physical AAA G&R model using patient-specific longitudinal CT scan images, 2) predicts the expansion of an AAA in future time, and 3) analyzes the associated uncertainty in the prediction. To achieve these aims, we perform Bayesian calibration of our computational AAA G&R model [58]. In particular, Bayesian calibration will be used to incorporate the computational G&R model, patient-specific data (e.g., CT scan images), and various uncertainties, as well as to compute the uncertainty level of the prediction of AAA expansion. As computational science advances, there is a growing interest in applying Bayesian calibration, developed in the statistical community, to engineering applications (see more details in [49, 50, 58, 77, 112] and references therein). To the best of our knowledge, we are the first to apply the Bayesian calibration method to the AAA G&R computation model. The successful outcomes of our aims will make the G&R computation model viable for aiding clinicians in decision making.

Hawkins-Daarud et al. [47] used a Bayesian framework to address questions on validation, model selection, and uncertainty quantification for tumor growth. Biehler et al. [7] present an uncertainty quantification framework based on multi-fidelity sampling and Bayesian formulations and analyze the impact of the uncertainty in the input parameters on mechanical quantities typically related to abdominal aortic aneurysm rupture potential.
Zhao X. and Pelegri A. [119] used a Bayesian approach to estimate a reasonable quantification of the probability distributions of soft tissue mechanical properties in the presence of measurement noise and model parameter uncertainty. While Hawkins-Daarud et al. [47] considered a set of discrete model candidates to select and validate a model, we consider a statistical model for the true physical process by introducing two Gaussian random fields, for the G&R computation model and for the inadequacy of the model. We adopt the Bayesian calibration technique proposed in [58] to calibrate parameters in the G&R computation model and predict AAA expansion. The proposed statistical model and Bayesian calibration can take into account different sources of uncertainty; therefore, they are well suited to achieving our aims of predicting the AAA expansion process as well as computing the propagated uncertainty given the patient-specific data.

The contributions of our paper are as follows. First, we formulate the Bayesian calibration of our AAA G&R computation model taking into account model inadequacy, prior distributions of model parameters (Seyedsalehi S. and Zhang L., et al. [94]), measurement errors, and, most importantly, patient-specific longitudinal CT scan images. Next, we demonstrate how to achieve the proposed aims by solving the formulated Bayesian calibration problem using a simulation study and real data analysis. In particular, we compare and discuss the performance and computation time under different sampling cases for the computation model output data and (synthesized) patient data, both of which are synthesized by the G&R computation. We apply our Bayesian calibration to the real patient-specific CT data and validate our prediction, showing the usefulness of our approach to the computational science and medical communities in aiding decision making.

The organization of the paper is as follows. Section 2.2 introduces the AAA G&R model, the quantity of interest used in making predictions, and the full statistical model with hyperparameters for Bayesian calibration. In Section 2.3, we discuss assumptions for the priors and then provide the posterior and predictive distributions under the Bayesian framework. Section 2.4 describes the design of the simulation study with synthetic observation data and a case study with real patient data. In Section 2.5, we present the results of the Bayesian calibration for both the simulation and real data cases. Finally, we provide some discussion and concluding remarks in Section 2.6.

2.2 Models

Figure 2.2. The flow-chart of Bayesian calibration (inputs: prior of parameters, CT images, computer model; center: statistical models; outputs: posterior of parameters, predictive distribution, prediction and credible band).

Bayesian calibration can be defined as a Bayesian approach to calibrating the investigated parameters in a theoretical model of a real complex system (here, the G&R computer model for AAA) that enables the incorporation of uncertainty regarding the parameters, the real observations, and possible simulations from the theoretical model.
The rectangle with double edges in the middle depicts that Bayesian calibration is actually a specially designed complex system capsulating a bunch of statistical models. As you can see, Bayesian calibration is including a bunch of models and integrating different sources of data together to generate products, such as calibrated parameters (posterior distribution), prediction (predictive distribution) and statistical inference (credible bands). In this section, we introduce the data and all the basic models needed in our Bayesian calibration (basically things in left two rectangles and the rectangle with double edges in the middle). The following parts in this section are organized as follows. 2.2.1 introduces a computational model of AAA growth. 2.2.2 introduces how the real data from CT images are prepared for Bayesian calibration. 2.2.3 introduces a calibration model to combine the computational model with real data. 2.2.4 introduces statistical specifications for implementing Bayesian calibration. 2.2.1 AAA G&R computation model The Bayesian calibration framework includes a computational G&R model of AAA as a data input, where the detailed computational G&R model was described in Zeinali-Davarani et al. [117]. Briefly, the computational G&R model has three parts: constitutive relations of intrinsic material behavior, a stress-mediated production function, and a damage function. To describe material behavior, we assume that the aorta is comprised of three stress-bearing constituents, viz. elastin, collagen fiber families, and vasoactive smooth muscle cells. Each 13 constituent, in addition to its contribution to the construction and strength of the artery’s wall, has its own individual properties, where the population-based material parameter distributions of abdominal aortas was given by Seyedsalehi et al. [94]. Also, the stress-mediated production functions connect the stress-state of the artery to changes of the mass rates with a stress-mediated feedback approach. Moreover, in the G&R model, the AAA is initialized by imposing damage function to the elastin of normal aorta. The initialization can be justified by the previous study [87], which shows that one of the main features of AAA is that elastin is decreased and the study [55, 115] that elaborate that the degradation of elastin can directly form patient-specific shapes of aneurysms. However, the other factors, such as alteration of intrinsic material parameters [83], disturbed collagen production [23, 24] and hemodynamics [3], are taken as the minor reason of initialization of AAA, and be considered as statistical error later. In this paper, we aim to calibrate the computational model with the expansion rate and evolution of aneurysm shapes along the longitudinal scan images. Our focus of interest, called quantity of interest (QoI) (described in section 2.2 for the Bayesian calibration), is used to calibrate the aneurysm growth with medical data. However, those parameters related to intrinsic material function and stress-mediated production function cannot directly be used to form the patient-specific geometry, but have a strong relation with patient’s health, age and other patient behavior, while with respect to [115], the degradation of elastin play a leading role in forming a patient-specific geometry, which is to say different patient-specific geometry can be achieved given different damage parameters in G&R simulation. 
Consequently, we prescribe the mean values of the material parameters [94] and the stress-mediated parameters [116] and calibrate the damage parameters statistically with the evolution of the AAA geometry. In the statistical calibration model introduced later, random variations of the prescribed parameters are treated as errors.

In our model, the damage function acts on elastin and vasoactive smooth muscle, wherein elastin plays an important role in the mechanical behavior of the aorta. Elastin contributes resilience and elasticity to the aortic tissue, but at an advanced age elastin cannot be replaced. For an AAA, the localized dilation of the aorta is initiated by the degradation of elastin. This degradation (or damage) reduces the amount of elastin, leading to a weakening of the wall. The damage results in an increase of the diameter and wall stress of the aneurysm. The increase of the stress in the constituents results in an increased accumulation of collagen and smooth muscle as a way to compensate for the elastin's loss and decrease the stress in the wall. All the relations and details of the model have been previously reported by Zeinali-Davarani et al. [116] and Kwon et al. [62]. Although the AAA's shape has a wide variability and its expansion is associated with the complexity of the G&R process, in the current study we decided to use the 2D axisymmetric G&R model and first study the effectiveness of the Bayesian calibration.

As discussed, the elastin damage in the aortic wall initiates the growth of the aneurysm. Here we define the initial elastin damage function as

d(s) = θ1 exp( −|s − α1|^α2 / (2 θ2²) ),  (2.1)

where s is the coordinate defined on the centerline, and g(s) = 1 − d(s) is the ratio of remaining elastin to the initial amount at s.

The damage function in (2.1) contains two parameters of interest, {θ1, θ2}, to be calibrated from the real data, and two other quantities, {α1, α2}, that can easily be identified via CT images without Bayesian calibration. They have their own specific effects on the shape of the damage function and thus on the stress-stretch and geometrical state of the AAA at a given time. In particular, θ1 is a scaling factor with θ1 ∈ [0, 1). An increase in θ1 toward 1 will increase the degradation of elastin and thus increase the dilatation of the artery; θ1 = 0 means no degradation, so the artery will retain its initial state. α1 corresponds to the location on the centerline at which the maximum damage occurs, and α1 and α2 control the areal distribution of damage on the sides of the peak location. Note that α1 and α2 are relatively easy to estimate from the CT images compared to θ1 and θ2. In particular, the maximum diameter occurs at or very close to the location of the maximum damage; hence, we estimated α1 by finding the location of the maximum diameter in the CT images. α2 is chosen by minimizing the error between the model simulation and the CT images. Thus, we fix α1 and α2 a priori at appropriate values obtained directly from the CT images and focus on calibrating θ1 and θ2 in a Bayesian way. To see our simulation results in a symmetric and straightforward configuration, we chose α1 = 7.5 (i.e., an averaged value) and α2 = 2 (i.e., the simplest shape). On the other hand, we used α1 = 12.1 and α2 = 5, estimated directly from the patient CT images, for the real data analysis in Section 2.5.2. Additionally, the choice of α2 = 2 for the simulation study covers the other end of the range of possible values as compared to the real observation case with α2 = 5.
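As a quick illustration of how these parameters shape the damage profile, the following minimal sketch evaluates d(s) from (2.1) under the two (α1, α2) settings above; the θ1 and θ2 values are illustrative, not calibrated, values.

```python
# A minimal sketch of the elastin damage function d(s) in (2.1);
# the theta values below are illustrative, not calibrated values.
import numpy as np

def damage(s, theta1, theta2, alpha1, alpha2):
    # d(s) = theta1 * exp(-|s - alpha1|^alpha2 / (2 * theta2^2))
    return theta1 * np.exp(-np.abs(s - alpha1) ** alpha2 / (2.0 * theta2 ** 2))

s = np.linspace(0.0, 15.0, 151)          # coordinate on the centerline
d_sim = damage(s, 0.8, 2.0, 7.5, 2.0)    # simulation-study setting
d_real = damage(s, 0.8, 2.0, 12.1, 5.0)  # real-data setting

# g(s) = 1 - d(s): ratio of remaining elastin to the initial amount.
g_sim = 1.0 - d_sim
print(float(d_sim.max()), float(g_sim.min()))
```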
2.2.2 Quantity of interest (QoI) for AAA G&R

The quantity of interest (QoI) of AAA G&R is what we want to predict about AAA growth at future times. The selection of the QoI subsequently determines the statistical models and the associated uncertainties to investigate. In our study, we select as the QoI the radius of the inscribed sphere with respect to a centerline, where the centerline is computed from inscribed spheres in the AAA 3D images [40]. Figs. 2.3a and 2.3b show the inscribed spheres and the resulting centerline, respectively, for a given 3D point cloud sampled from the CT scan image data of a patient. This QoI selection is consistent with medical practice, in which the diameter of the AAA is used as an important decision variable [59, 60].

To give a brief idea of the relationship among the G&R model, the 2D axisymmetric representation, and the QoI, we plot a diagram showing how the 3D images are transformed into 2D data and how the computer model simulates them. Looking at Fig. 2.4, there are two lines of processing. First, consider how the data are transformed, as shown in Figs. 2.4a, 2.4b, and 2.4c. Fig. 2.4a shows the same process of transforming 3D images into 2D data as Fig. 2.3. Thereafter, taking the length along the centerline as the coordinate, we obtain the radius of the inscribed sphere r versus the height coordinate of the centerline s in Fig. 2.4b, the details of which are explained in [40]. We then truncate the curve in Fig. 2.4b and obtain the QoI in Fig. 2.4c. In short, the 3D images are transformed into 2D data and then cut down to the QoI. Second, consider how the G&R model simulates the QoI. Fig. 2.4d shows the computational model of a healthy aorta; the healthy aorta then evolves into an AAA following the mechanisms introduced by the computational G&R model, which is actually a 2D axisymmetric geometric shape (Fig. 2.4e). We then obtain the radius versus the height coordinate of the centerline in Fig. 2.4f, which can be called the simulation. Finally, Fig. 2.4g shows the simulation result obtained by minimizing the difference between the QoI and the simulation.

Fig. 2.5 shows the real data needed in the Bayesian calibration procedure, namely the QoIs from scan images of patient K taken at 4 time points. To obtain each curve, we repeat the same process from (a) to (c) in Fig. 2.4 on each scan image. Bayesian calibration combines the QoIs and the simulation data to calibrate the selected parameters in the computer model. The radii in the figure are the target quantities we ultimately want to predict; that is, we would like to make predictions of the QoI, with associated uncertainty, at future times. In this paper we consider the following questions. Can we predict the radius vs. height profiles for CT scan images 3 and 4 at future times, given CT scan images 1 and 2 in Fig. 2.5? What is the uncertainty associated with such a prediction? We answer these questions by applying the proposed Bayesian calibration method; the answers for the particular patient-specific data set shown in Fig. 2.5 are given in Section 2.5.2 as a real data case study.

Figure 2.3. 3D point clouds sampled from a CT scan image. (a) An inscribed sphere with point clouds from the CT scan images and (b) a centerline computed from the point clouds.

Figure 2.4. The diagram of data preparation.
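The following is a minimal sketch of the inscribed-sphere idea behind the QoI. It assumes the centerline is already available and approximates the inscribed-sphere radius at each centerline point by the distance to the nearest wall point; the synthetic tube-shaped point cloud stands in for a segmented CT surface, and the full pipeline of [40] is more sophisticated.

```python
# A minimal sketch of turning a 3D surface point cloud into a
# radius-vs-height profile along a given centerline; assumes the
# centerline is already computed (see [40] for the full pipeline).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical wall point cloud: a tube of radius 1.5 around the z-axis.
z = rng.uniform(0.0, 15.0, 2000)
phi = rng.uniform(0.0, 2.0 * np.pi, 2000)
wall = np.column_stack([1.5 * np.cos(phi), 1.5 * np.sin(phi), z])

# Centerline samples (here simply along the z-axis).
s = np.linspace(0.0, 15.0, 50)
centerline = np.column_stack([np.zeros_like(s), np.zeros_like(s), s])

# Radius of the inscribed sphere at each centerline point: the distance
# to the nearest wall point.
diffs = centerline[:, None, :] - wall[None, :, :]
r = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)
print(r.round(2)[:5])  # approximately 1.5 along the tube
```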
2.2.3 Calibration model

Let ζ(x) be the QoI of the true AAA G&R process, whose input variable is denoted by x and defined as x = [t s], where t is the time and s is the height on the centerline, as illustrated in Fig. 2.3b. Suppose that we have n observations from CT scan images of a given patient; then "image n" is the n-th image of that patient. To model possible observation error, e.g., resolution and segmentation errors in CT scan images, we consider the noisy observations

zi = ζ(xi) + εi, ∀i ∈ {1, · · · , n},  (2.2)

where the difference (or error) between the observation and the true process is denoted by εi. We further assume that each εi is independently distributed as N(0, λ), a normal (Gaussian) random variable with mean 0 and variance λ.

Figure 2.5. Radius vs. height coordinate on the centerline measured from CT scans of a patient over the surveillance time. The lines from bottom to top represent images 1, 2, 3, and 4, respectively.

We denote the QoI of the G&R computation model output at x as r(x, θ), where θ is called the set of calibration parameters, or calibration inputs. In the G&R computation model for AAA expansion, the damage parameters serve as the calibration parameters, which are patient specific for AAA growth; i.e., the calibration parameters are θ = [θ1 θ2] as defined in Section 2.2.1. Given the available QoI of the G&R computation model, we model the true process as

ζ(xi) = r(xi, θ) + δ(xi),  (2.3)

where δ(·) is a model inadequacy function, i.e., model error, which is independent of the computation outputs. It is natural to assume that the true AAA expansion process ζ(x) cannot be fully described by the computation model; therefore, we introduce δ(x) to represent the discrepancy in (2.3). By combining (2.2) and (2.3), we have

zi = r(xi, θ) + δ(xi) + εi,  (2.4)

which gives a calibration model that relates the G&R computation outputs to the true process and the observations. The Bayesian calibration method [58] we adopt in this paper introduces Gaussian process priors for the computational model and the model error in order to calibrate θ and predict the QoI. Fig. 2.2 shows the flow-chart for Bayesian calibration; the statistical models in Fig. 2.2 build on two Gaussian process priors, for the G&R computation model and the model error, which are discussed in the next section.

2.2.4 Statistical models

Let GP(m(·), k(·, ·)) denote a Gaussian process with mean function m(·) and covariance function k(·, ·). GPs are flexible and popularly used as prior models for functions [110, 111]. We introduce the following Gaussian processes as prior beliefs for the G&R computation model and the model error:

r(x, θ) ∼ GP(m1(x, θ), k1(x, θ; x′, θ′)),
δ(x) ∼ GP(m2(x), k2(x, x′)).  (2.5)
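To make the ingredients of (2.4) and (2.5) concrete, the following minimal sketch generates synthetic observations zi = r(xi, θ) + δ(xi) + εi, drawing the model inadequacy δ from a Gaussian process; the closed form used for r and every numeric value are illustrative stand-ins, not the G&R model.

```python
# A minimal sketch of the data-generating structure in (2.4)-(2.5):
# z_i = r(x_i, theta) + delta(x_i) + eps_i, with delta drawn from a GP.
# The form of r and all numeric values here are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(42)

def r(x, theta):
    # Stand-in for the G&R computation model output at x = [t, s].
    t, s = x
    return 1.5 + 0.1 * t + theta[0] * np.exp(-(s - 7.5) ** 2 / (2 * theta[1] ** 2))

def k2(x, x2, sigma2_sq=0.05, omega=(0.4, 0.1)):
    # Separable exponential covariance for the model error (see 2.2.4).
    dx = np.subtract(x, x2)
    return sigma2_sq * np.exp(-np.dot(omega, dx ** 2))

theta = (0.8, 2.0)   # "true" calibration parameters (illustrative)
lam = 0.05 ** 2      # observation error variance lambda
X = [(t, s) for t in (0.0, 1.0) for s in np.linspace(5.0, 10.0, 6)]

# One GP draw for the model inadequacy delta over the inputs X.
K2 = np.array([[k2(a, b) for b in X] for a in X])
delta = rng.multivariate_normal(np.zeros(len(X)), K2)

z = np.array([r(x, theta) for x in X]) + delta + rng.normal(0.0, np.sqrt(lam), len(X))
print(np.round(z[:4], 3))
```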
In this paper, we consider the following linear mean structures for computational efficiency: m1 (x, θ) = h1 (x, θ)β1T = β10 + β11 t + β12 θ1 + β13 θ2 , (2.6) where h1 (x, θ) = [1 t θ1 θ2 ] and β1 = [β10 β11 β12 β13 ], and m2 (x) = h2 (x)β2T = β21 t + β22 s, (2.7) where h2 (x) = [t s] and β2 = [β21 β22 ]. These mean structures imply that the mean function of the G&R computation model is linear in time t and calibration parameters, {θ1 , θ2 }. The mean function of the model error is linear in time t and location s. β = [β1 β2 ] are hyperparameters for the mean functions in a Bayesian context. For a covariance structure, we use the following exponential functions as follows. k1 (x, θ; x0 , θ 0 ) = σ12 exp{−(x − x0 )Ωx (x − x0 )T } exp{−(θ − θ 0 )Ωθ (θ − θ 0 )T }, k2 (x, x0 ) = σ22 exp{−(x − x0 )Ω?x (x − x0 )T }, 22 where Ωx , Ωθ , Ω?x are diagonal matrices such that      ? ωx1  ωx1 0   ωθ1 0  ?  Ωx =   , Ωθ =   , Ωx =  0 ωx2 0 ωθ2 0  0   ? ωx2 ? ’s. so that the hyperparameters for the covariance functions are σ12 , σ22 and wxj , wθj , wxj Note that the resulting covariance functions are not isotropic since they allow different scaling for each coordinate. Additionally, they can be regarded as separable covariance functions since the dependence due to one coordinate is independent of the dependence due to another coordinate. One can consider a more generalized covariance structure at the cost of higher computational complexity. For the parameters controlling the covariances, we introduce ψ = [ψ1 ψ2 ], with ψ1 = [ωx1 ωx2 ωθ1 ωθ2 σ12 ], (2.8) ? ω ? σ 2 ]. ψ2 = [ωx1 x2 2 ψ1 is the set of hyperparameters related to the G&R computation model. ψ2 is the set of hyperparameters related to model error. For the remainder of the paper, the AAA G&R computation model is referred to as the computation model for notational simplicity. 2.3 Bayesian analysis for calibration Bayesian calibration considers computation model outputs as the data in addition to the observations and combines these two to calibrate θ under the Bayesian framework. Calibration helps improve the prediction of the QoI compared to the prediction using only the computation model outputs or using only observations. The diagram of the Bayesian cali23 bration process Fig. 2.6 gives us an overview of the whole process of Bayesian analysis which combines the information of priors, computer model, statistical model and observations to calibrate the parameter θ to make predictions. The top right part shows that computer model data covers the whole time range of real observations and predictions. And the computer model data helps to estimate the covariance hyperparamter ψ1 , which is defined as State 1. The left middle part shows the real observations helps to estimate the covariance hyperparameter ψ2 , which is defined as Stage 2. The right bottom part shows the calibration and prediction which is called Stage 3. As we can see calibration parameters only exist in computer models, and we use real observations to calibrate the parameters in the computer model so that the prediction can be improved. 2.3.1 Likelihood of computer model We need computation model outputs generated at various sets of θ and x as the data for calibration. To avoid confusion, we use θ ∗ = [θ1∗ θ2∗ ] and x∗ instead of θ and x, respectively, when they were used to generate computation model outputs. For example, at (x∗ , θ ∗ ), we obtain one computation model output r(x∗ , θ ∗ ). 
Computation model outputs which correspond to a set of various (x∗ , θ ∗ ) will be used in Bayesian calibration as a part of the data set. We call θ ∗ calibration inputs and x∗ variable inputs. To illustrate the Bayesian calibration method [58, 77], we introduce necessary notations. Regarding the computation model outputs, let N be the total number of pairs of variable inputs and calibration inputs. Then, for the ith set of inputs, (x∗i , θi∗ ), we let yi be the corresponding computation model output, that is, yi = r(x∗i , θi∗ ). We further define y = [y1 · · · yN ]T ∈ RN ×1 as the computation model output vector and ∗ )T ]T ∈ RN ×4 as the input matrix that corresponds to y. Xc = [(x∗1 , θ1∗ )T , · · · , (x∗N , θN 24 D1 θ (t, S) Computer data θ (t, S) Stage1 (t,S) θ (t,S) (t, S) (t,S) (t,S) (t,S) D2 θ (t,S) (t,S) (t, S) (t,S) D3 (t, S) θ (t, S) Real Observations Stage2 Bayesian Prediction t years Stage3 Stage1 Prediction Calibration Stage3 Stage2 Figure 2.6. Diagram of the Bayesian calibration process. Then, the assumption of the Gaussian process prior on the computation model gives y ∼ N (H1 (Xc )β1T , V1 (Xc )), ∗ )T ]T and the (i, j) entry of V (X ) is k ((x∗ , θ ∗ ), (x∗ , θ ∗ )). where H1 (Xc ) = [h1 (x∗1 , θ1∗ )T , · · · , h1 (x∗N , θN 1 c 1 i i j j 2.3.2 Likelihood of real observations Let n be the number of observations and z = [z1 · · · zn ]T ∈ Rn×1 be the set of observations corresponding to the variable input matrix Xo = [xT1 , · · · , xTn ]T ∈ Rn×2 . Note that Xo is not 25 necessarily the same as the set of variable inputs for the computation model outputs. In general, the number of observations is smaller than that of the computation model outputs since we can control the amount of the computation model outputs. To calibrate θ from the observations, we augment variable inputs Xo with θ such that Xo (θ) = [(x1 , θ)T , · · · , (xn , θ)T ]T . Then, from the calibration model, we have z ∼ N (H1 (Xo (θ))β1T + H2 (Xo )β2T , λIn + V1 (Xo (θ)) + V2 (Xo )), where H1 (Xo (θ)) = [h1 (x1 , θ)T , · · · , h1 (xn , θ)T ]T and H2 (Xo ) = [h2 (x1 )T , · · · , h2 (xn )T ]T . V1 (Xo (θ)) is defined in a similar way to define V1 (Xc ). The (i, j) entry of V2 (Xo ) is k2 (xi , xj ). 2.3.3 Joint likelihood We then combine the computation model outputs and observations, d = [y T z T ]T ∈ R(N +n)×1 , which we call a data vector.    y  d =   ∼ N (md (θ), Vd (θ)), z where md (θ) := E(d|θ, β, ψ) = H(θ)β T , 26 (2.9) with   H(θ) =   H1 (Xc ) 0 H1 (Xo (θ)) H2 (Xo )  , and   Vd (θ) := var(d|θ, β, ψ) =  V1 (Xc ) C1 (Xc , Xo (θ))T C1 (Xc , Xo (θ)) λIn + V1 (Xo (θ)) + V2 (Xo )   , where In is the n × n identity matrix and C1 (Xc , Xo (θ)) is a cross-covariance matrix whose (i, j) entry is k1 ((x∗i , θi∗ ), (xj , θj )). We note that md (θ) and Vd (θ) also depend on β and ψ. We drop them to reduce notational complexity. Remark 2.3.1 Recall that the calibration model assumes the true process as the sum of the computation model process and the model error process by assuming they are Gaussian processes as shown in (2.4) and (2.5). If we have observations of the true process only, we cannot identify these two Gaussian processes. However, the Bayesian calibration method makes use of both computation model outputs and observations as data. Computation model outputs contribute to the first Gaussian process only while observations contributes to both Gaussian processes as well as observation errors. 
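As a concrete illustration, the joint mean and covariance in (2.9) can be assembled block-wise from the kernels sketched above. The sketch below is again purely illustrative (it reuses the hypothetical h1, h2, k1, k2 from the previous sketch); Xc and Xo hold one input per row.

```r
# Sketch: assemble m_d(theta) and V_d(theta) for d = c(y, z).
# Xc: N x 4 matrix of rows (t, s, th1, th2); Xo: n x 2 matrix of rows (t, s).
cov_block <- function(A, B, kfun) {
  outer(seq_len(nrow(A)), seq_len(nrow(B)),
        Vectorize(function(i, j) kfun(A[i, ], B[j, ])))
}
joint_moments <- function(Xc, Xo, th, beta1, beta2, psi1, psi2, lambda) {
  Xo_th <- cbind(Xo, matrix(th, nrow(Xo), 2, byrow = TRUE))  # augment Xo with theta
  kc <- function(a, b) k1(a[1:2], a[3:4], b[1:2], b[3:4], psi1)
  H1c <- t(apply(Xc, 1, function(r) h1(r[1:2], r[3:4])))
  H1o <- t(apply(Xo_th, 1, function(r) h1(r[1:2], r[3:4])))
  H2o <- t(apply(Xo, 1, h2))
  md <- c(H1c %*% beta1, H1o %*% beta1 + H2o %*% beta2)      # H(theta) beta^T
  V11 <- cov_block(Xc, Xc, kc)                               # V1(Xc)
  C1  <- cov_block(Xo_th, Xc, kc)                            # C1(Xc, Xo(theta))
  V22 <- cov_block(Xo_th, Xo_th, kc) +
         cov_block(Xo, Xo, function(a, b) k2(a, b, psi2)) +
         lambda * diag(nrow(Xo))                             # lambda*I + V1 + V2
  list(md = md, Vd = rbind(cbind(V11, t(C1)), cbind(C1, V22)))
}
```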
2.3.4 Calibration

To estimate (θ, β, ψ) under the Bayesian framework, we consider the following prior distributions and assumptions.

A.1 β1 in (2.6) and β2 in (2.7) have non-informative priors, i.e., p(β1, β2) ∝ 1.
A.2 θ is independent of the other parameters.
A.3 θ follows a normal distribution.
A.4 ψ1 and ψ2 in (2.8) follow lognormal distributions.
A.5 log(λ) has a non-informative prior, i.e., p(log(λ)) ∝ 1.

Remark 2.3.2 Note that based on A.1 and A.2, the joint prior distribution takes the form p(θ, β, ψ) ∝ p(θ)p(ψ). For A.3, we use the sample mean and variance of the calibration parameter inputs that were used to generate computation model outputs as the mean and variance of the normal prior density. A.4 guarantees that ψ1 and ψ2 in (2.8) remain positive in the calculation, as required of hyperparameters in covariance functions.

Together with the prior specification given above, we have the following full joint posterior distribution of (θ, β, ψ) given d:

$$ p(\theta,\beta,\psi \mid d) \propto p(\theta)p(\psi) f(d; m_d(\theta), V_d(\theta)) \propto p(\theta)p(\psi)\,|V_d(\theta)|^{-\frac{1}{2}} \exp\left[-\frac{1}{2}\{(d-m_d(\theta))^T V_d(\theta)^{-1}(d-m_d(\theta))\}\right], \qquad (2.10) $$

where f(d; m, V) is the multivariate Gaussian density function with mean m and variance V. Note that the posterior distribution of (θ, β, ψ) given d is proper even though we assume non-informative priors for β.

To calibrate (estimate) θ from the posterior distribution of θ | d, we need to integrate out β and ψ. As shown explicitly in (2.9), m_d(θ) depends on β, while V_d(θ) depends on ψ. Clearly, m_d(θ) is a linear function of β, so we find

$$ \beta \mid \theta, \psi, d \sim N(\hat{\beta}(\theta), W(\theta)), \qquad (2.11) $$

where

$$ \hat{\beta}(\theta) = W(\theta)H(\theta)^T V_d(\theta)^{-1} d, \qquad W(\theta) = (H(\theta)^T V_d(\theta)^{-1} H(\theta))^{-1}. $$

Thus, we can integrate out β in (2.10) and get

$$ p(\theta,\psi \mid d) \propto p(\theta)p(\psi)\,|V_d(\theta)|^{-\frac{1}{2}} |W(\theta)|^{\frac{1}{2}} \exp\left[-\frac{1}{2}\{(d-H(\theta)\hat{\beta}(\theta))^T V_d(\theta)^{-1}(d-H(\theta)\hat{\beta}(\theta))\}\right]. \qquad (2.12) $$

While β is integrated out easily, ψ is not. Numerical integration can be considered but may not give an accurate result, since the integration is multi-dimensional. Thus, we use the conditional posterior distribution of θ given d with a plausible estimate of ψ plugged in, as proposed by Kennedy and O'Hagan (2001) [58]. The plausible estimate of ψ is obtained in a two-step approach. First, we estimate ψ1 by maximizing p(ψ1 | y) (Stage 1). Then, we estimate ψ2 by maximizing p(ψ2 | d, ψ1) after plugging in the estimated ψ1 (Stage 2). This process is shown in Fig. 2.7. In this study, numerical maximization is used with 500 iterations for Stage 1 and 100 iterations for Stage 2.

[Figure 2.7. Two stages to estimate ψ: Stage 1 estimates ψ1 by maximizing p(ψ1 | y); Stage 2 estimates λ and ψ2 by maximizing p(λ, ψ2 | d, ψ1); the resulting ψ̂ is plugged into p(θ | λ, ψ, d).]

Having estimated ψ and plugged it into the distribution of θ | d, ψ, we obtain the posterior distribution of the calibration parameter θ as

$$ p(\theta \mid \hat{\psi}, d) \propto p(\theta)\,|V_d(\theta)|^{-\frac{1}{2}} |W(\theta)|^{\frac{1}{2}} \exp\left[-\frac{1}{2}\{(d-H(\theta)\hat{\beta}(\theta))^T V_d(\theta)^{-1}(d-H(\theta)\hat{\beta}(\theta))\}\right] \qquad (2.13) $$

and use (2.13) to calibrate and make inference on θ.
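The two-stage plug-in estimation can be sketched as two numerical maximizations. In the sketch below, log_post_psi1 and log_post_psi2 are hypothetical functions evaluating log p(ψ1 | y) and log p(λ, ψ2 | d, ψ̂1); the optimization is performed on the log scale so that the positivity implied by the lognormal priors in A.4 is enforced automatically.

```r
# Sketch of the two-stage plug-in estimation of psi (illustrative).
two_stage <- function(log_post_psi1, log_post_psi2, init1, init2) {
  # Stage 1: maximize p(psi1 | y) over the log-parameters.
  s1 <- optim(log(init1), function(lp) -log_post_psi1(exp(lp)),
              method = "BFGS", control = list(maxit = 500))
  psi1_hat <- exp(s1$par)
  # Stage 2: maximize p(lambda, psi2 | d, psi1_hat), psi1_hat held fixed.
  s2 <- optim(log(init2), function(lp) -log_post_psi2(exp(lp)),
              method = "BFGS", control = list(maxit = 100))
  list(psi1 = psi1_hat, psi2 = exp(s2$par))
}
```

The iteration limits mirror the 500 and 100 iterations used for Stages 1 and 2 in this study.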
2.3.5 Prediction

We introduce another set of variable inputs X_p = [x_{n+1}^T, · · · , x_{n+m}^T]^T ∈ R^{m×2} to be used for prediction. The corresponding prediction of the QoI at the variable inputs X_p is denoted P = [P_1 · · · P_m]^T ∈ R^{m×1}. For good prediction quality, we can choose variable inputs in X_c that cover X_p.

Under the Bayesian framework, a prediction can be made using the predictive distribution of the (unobserved) true process ζ(x) given the full data d. More specifically, we consider the predictive distribution of ζ(x) given d and ψ̂, since we fix ψ at the estimate ψ̂ in our approach. We first analytically obtain the distribution of ζ(x) | θ, ψ, d, β, which is conditionally normal since ζ(x) and d are jointly normally distributed. Using the laws of total expectation and total variance to integrate out β, we obtain the distribution of ζ(·) conditional on θ and ψ̂, which is also normal. Its mean function is given by

$$ E(\zeta(x) \mid \theta, \hat{\psi}, d) = h(x,\theta)^T \hat{\beta}(\theta) + v(x,\theta)^T V_d(\theta)^{-1} (d - H(\theta)\hat{\beta}(\theta)), \qquad (2.14) $$

where

$$ h(x,\theta) = \begin{bmatrix} h_1(x,\theta) \\ h_2(x) \end{bmatrix}, \qquad v(x,\theta) = \begin{bmatrix} V_1((x,\theta), X_c) \\ V_1((x,\theta), X_o(\theta)) + V_2(x, X_o) \end{bmatrix}. $$

Additionally, its covariance function is given by

$$ \mathrm{cov}(\zeta(x), \zeta(x') \mid \theta, \hat{\psi}, d) = k_1((x,\theta),(x',\theta)) + k_2(x,x') - v(x,\theta)^T V_d(\theta)^{-1} v(x',\theta) $$
$$ \qquad + \ (h(x,\theta) - H(\theta)^T V_d(\theta)^{-1} v(x,\theta))^T W(\theta) (h(x',\theta) - H(\theta)^T V_d(\theta)^{-1} v(x',\theta)). \qquad (2.15) $$

We can then obtain the predictive distribution of ζ(x) given d and ψ̂ by integrating θ out of the distribution of ζ(x) | θ, ψ̂, d with respect to the posterior distribution of θ given in (2.13). From (2.14) and (2.15), we obtain the predictive expectation and variance of ζ(·) evaluated at the inputs X_p as

$$ E[\zeta(X_p) \mid \hat{\psi}, d] = E_\theta\{E[\zeta(X_p) \mid \theta, \hat{\psi}, d]\}, \qquad (2.16) $$

and

$$ \mathrm{var}[\zeta(X_p) \mid \hat{\psi}, d] = E_\theta\{\mathrm{var}[\zeta(X_p) \mid \theta, \hat{\psi}, d]\} + \mathrm{var}_\theta\{E[\zeta(X_p) \mid \theta, \hat{\psi}, d]\}, \qquad (2.17) $$

where ζ(X_p) = [ζ(x_{n+1}) · · · ζ(x_{n+m})]^T.

Based on the posterior density (2.13) of the calibration parameter θ, we can draw Markov chain Monte Carlo (MCMC) samples of θ, so that the predictive expectation and variance (2.16) and (2.17) can be approximated as follows. First, we draw M samples {θ(1), · · · , θ(M)} from (2.13) using the Metropolis-Hastings algorithm [15] and plug these samples into (2.14) and (2.15). Next, we obtain the predictions P as the MCMC estimate of (2.16):

$$ P := \hat{E}[\zeta(X_p) \mid \hat{\psi}, d] = \frac{1}{M}\sum_{i=1}^{M} E[\zeta(X_p) \mid \theta(i), \hat{\psi}, d]. \qquad (2.18) $$

Similarly, the outermost expectation and variance on the right-hand side of (2.17) are estimated by the corresponding sample mean and sample variance with respect to θ. We then obtain the predictive variances V_p as

$$ V_p := \hat{\mathrm{var}}[\zeta(X_p) \mid \hat{\psi}, d] = \hat{E}_\theta\{\mathrm{var}[\zeta(X_p) \mid \theta, \hat{\psi}, d]\} + \hat{\mathrm{var}}_\theta\{E[\zeta(X_p) \mid \theta, \hat{\psi}, d]\}, \qquad (2.19) $$

where

$$ \hat{E}_\theta\{\mathrm{var}[\zeta(X_p) \mid \theta, \hat{\psi}, d]\} = \frac{1}{M}\sum_{i=1}^{M} \mathrm{var}[\zeta(X_p) \mid \theta(i), \hat{\psi}, d], $$
$$ \hat{\mathrm{var}}_\theta\{E[\zeta(X_p) \mid \theta, \hat{\psi}, d]\} = \frac{1}{M-1}\sum_{i=1}^{M} \{E[\zeta(X_p) \mid \theta(i), \hat{\psi}, d] - \hat{E}[\zeta(X_p) \mid \hat{\psi}, d]\}^2. $$

We construct the 95% credible intervals based on these predictions P and their predictive variances V_p.
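As a sketch of this sampling step, a random-walk Metropolis-Hastings kernel and the Monte Carlo forms of (2.18)-(2.19) might look as follows; log_post_theta, pred_mean, and pred_var are hypothetical wrappers around (2.13), (2.14), and (2.15), not functions from any package.

```r
# Sketch: random-walk Metropolis-Hastings for theta, then MC prediction.
mh_theta <- function(log_post_theta, th0, M, step = 0.1) {
  d <- length(th0); out <- matrix(NA_real_, M, d)
  th <- th0; lp <- log_post_theta(th)
  for (i in seq_len(M)) {
    prop <- th + rnorm(d, sd = step)            # symmetric proposal
    lp_prop <- log_post_theta(prop)
    if (log(runif(1)) < lp_prop - lp) { th <- prop; lp <- lp_prop }
    out[i, ] <- th
  }
  out
}
predict_qoi <- function(draws, pred_mean, pred_var) {
  means <- apply(draws, 1, pred_mean)   # m x M matrix of E[zeta | theta(i)]
  vars  <- apply(draws, 1, pred_var)    # m x M matrix of pointwise variances
  P  <- rowMeans(means)                           # MC estimate of (2.18)
  Vp <- rowMeans(vars) + apply(means, 1, var)     # MC estimate of (2.19)
  list(P = P, Vp = Vp,
       lower = P - 1.96 * sqrt(Vp), upper = P + 1.96 * sqrt(Vp))
}
```

The last two lines produce the 95% credible bands used in the figures that follow.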
2.4 Data Description

We consider two cases of data sets. In the first case, we use the G&R computation model of AAA expansion to generate synthetic observations in addition to producing computation model outputs. This simulated observation case validates our approach, since we know the realized true process, along with the parameters that generated it, for comparison. In the second case, we use real data, i.e., observations from a patient's CT scan images; in this real-world clinical application, the G&R computation model is used only to obtain computation model outputs. When implementing the Bayesian analysis, we standardize all the inputs x, θ and outputs y, z to stabilize the computation. In particular, for the rest of the paper we use θ = [θ1 θ2] to denote the standardized values and θ^org = [θ1^org θ2^org] to denote the original values.

2.4.1 Simulated Observation Cases

For the simulated observation case, we generate r(x, θ) from the G&R computation model. To validate our approach, we set patient-specific values of the calibration parameters (say θ0) for the synthesized true process, so that we can evaluate the estimated calibration parameters against them. Recall that for the damage function, we need to set θ1, θ2, α1, and α2 to produce the G&R computation model outputs. For the simulated observation case, we set θ1^org = 0.65, θ2^org = 6, α1 = 7.5 and α2 = 2. Thus, the calibration input for the true process is θ0^org = [0.65 6]. For the variable inputs x = [t s], we consider 8 equally spaced values of s from 0 to 15 cm and a time step of 5 days, which are fixed throughout the study. The time duration is 7 years, but several different choices of sampling times are considered. Note that when we implement Bayesian calibration, the values of the QoI (the radius with respect to the centerline) are standardized as well, so that the mean is 0 and the variance is 1.

Once we obtain a realization of r(x, θ0), we add model and observation errors to get the final observations. The model error δ(x) is realized from the model assumption (2.5) with β2 = [0.001 0.001], ω*_{x1} = 1, ω*_{x2} = 1, and σ2² ∈ {0.005, 0.001}. The standard deviation of the model error δ(x) at each height and time is then about 0.071 or 0.032, respectively. The observation error ε is generated from N(0, λ) with λ = 0.001, i.e., ε ~ N(0, 0.032²). Since the standard deviation of the standardized computation outputs r is 1, generating the simulated observations amounts to adding a model error process of 7.1% or 3.2% (with respect to r) and an observation error process of 3.2%. We chose these values for the model and observation errors to produce a data set that best illustrates the effects of different noise levels and sampling schemes.

We also need computation model outputs at various combinations of variable inputs and calibration inputs. For the calibration inputs, we consider θ^org = [θ1^org θ2^org] ∈ {0.5, 0.65, 0.8} × {2, 6, 10}, giving 9 combinations of θ1 and θ2 in total. For the variable inputs, s takes 8 equally spaced values from 0 to 15 cm. For the time t, we consider three different scenarios to see the effects of different time resolutions. With the two choices of σ2², there are a total of 6 different scenarios. Table 2.1 gives the label of each scenario, and further details are given in Table 2.2. With three cases of sampling time grids and two levels of σ2², we want to investigate the interaction between the computation model and Bayesian calibration. In Case 1, the computation model inputs have a sparse time grid, while those in Case 3 have a denser one. In Case 2, the computation model inputs do not cover the full time range we want to predict, while those in Cases 1 and 3 do.
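Putting the generation recipe above together, one simulated observation set can be produced as in the following sketch. It reuses the hypothetical h2, k2, and cov_block helpers from the earlier sketches, assumes the standardized G&R outputs r_true at the observation inputs have been precomputed by the solver, and uses mvrnorm from the MASS package to draw the Gaussian-process model error.

```r
# Sketch: synthesize observations z = r + delta + eps at inputs Xo (n x 2).
library(MASS)  # for mvrnorm

set.seed(1)
simulate_obs <- function(r_true, Xo, beta2, psi2, lambda) {
  m2 <- as.vector(t(apply(Xo, 1, h2)) %*% beta2)         # mean of delta
  K2 <- cov_block(Xo, Xo, function(a, b) k2(a, b, psi2)) # cov of delta
  delta <- mvrnorm(1, mu = m2, Sigma = K2)               # model error draw
  eps <- rnorm(nrow(Xo), 0, sqrt(lambda))                # observation error
  r_true + delta + eps
}
# e.g. beta2 = c(0.001, 0.001), psi2 = c(1, 1, 0.005), lambda = 0.001,
# matching the settings described above.
```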
We note that the variable inputs and calibration inputs for the proposed Bayesian calibration are standardized in the actual implementation, so we can assume the realized calibration inputs are centered at zero. We then use 0.3 as the prior mean of θ1 and θ2 (away from zero) to investigate the robustness of the prior distributions, and we use 0.1 for the prior variances of θ1 and θ2. The same prior setting for θ1 and θ2 is used later in the real observation case. Note that a normal prior can be used for θ1 even though θ1^org is constrained to the interval [0, 1]. As θ1 was standardized (mean 0, standard deviation 1), its range becomes [−1.3, 1.3] based on the parameter values we use. We set the prior mean at 0.3 and the prior variance at 0.1 (standard deviation 0.32). By the empirical rule, 99.7% of the samples from the prior lie in the interval [−0.66, 1.26], which is contained in [−1.3, 1.3]. The sensitivity of the posterior estimates of θ1 and θ2 to the prior choice is investigated and presented in Supplemental Fig. A.3 (in Supplementary Section Appendix 2), where we can see that the posterior distributions are robust against the subjective choice of prior distributions.

Table 2.1. Differences in Xc and σ2² for simulation cases

  sampling time in Xc    σ2² = 0.005    σ2² = 0.001
  every 2 years          Case 1a        Case 1b
  every 1.5 years        Case 2a        Case 2b/2c
  every 1 year           Case 3a        Case 3b

2.4.2 Real Observation Case

For the real observation case, we consider 4 longitudinal CT scan images of a patient, as shown in Fig. 2.1. This particular patient's CT scan examination spans a period of 3 years. For each image, we measured the corresponding radius (QoI) at each height on the centerline, as illustrated in Fig. 2.5, using the maximally-inscribed-sphere method developed in [40]. From a preliminary study of the damage shape with respect to the CT scan images, we selected α1 = 12.6 and α2 = 5 for (2.1) a priori. In addition, we selected a reasonable range of the calibration parameter θ a priori to produce computation model outputs comparable to the particular observations (Fig. 2.5). More specifically, as prior information to determine the range of the calibration parameter θ^org, we fit the diameter of the computation model outputs to the diameter calculated from the CT scan images using fminsearch in MATLAB (MathWorks, Natick, USA) and found the optimum value to be [θ1^org θ2^org] = [0.2 0.9]. From this information, we consider θ^org = [θ1^org θ2^org] ∈ {0.05, 0.2, 0.35} × {0.7, 0.90, 0.99}. For these 9 combinations of θ^org, we run the G&R model to produce the computation model outputs. More details for this real observation case are given in Table 2.2.

Table 2.2. Case details

                  Case 1a / 1b           Case 2a / 2b/2c               Case 3a / 3b                  Real Data
  Xc times        [0 2 4 6]              [0 1.5 3 4.5 6]               [0 1 2 3 4 5 6 7]             [1 2 3 4]
  Xc dimension    288 × 4                360 × 4                       576 × 4                       288 × 4
  Xo times        [0 0.5 1 1.5 2 2.5 3]  [0 0.5 1 1.5 2 2.5 3 3.5 4]   [0 0.5 1 1.5 2 2.5 3 3.5 4]   [1 2]
  Xo dimension    56 × 2                 72 × 2                        72 × 2                        16 × 2
  Xp times        [3.5 4 4.5 5 5.5 6]    [4.5 5 5.5 6 6.5 7]           [4.5 5 5.5 6 6.5 7]           [3 4]
  Xp dimension    48 × 2                 48 × 2                        48 × 2                        16 × 2
  y dimension     288 × 1                360 × 1                       576 × 1                       288 × 1
  z dimension     56 × 1                 72 × 1                        72 × 1                        16 × 1
  σ2²             0.005 / 0.001          0.005 / 0.001                 0.005 / 0.001                 0.01
  δ as % of r     7.2% / 3.2%            7.2% / 3.2%                   7.2% / 3.2%                   10%

2.5 Results

In this section, we present the results of the Bayesian calibration method on the simulated observations and the real observations described in Section 2.4. For detailed implementation steps, readers are referred to Section Appendix 1 in the supplementary document. For the simulated observations, we show three plots: prediction results with the true processes (Fig. 2.8), 95% credible bands with the true processes (Fig. 2.9), and relative errors for prediction (Fig. A.1). The relative errors R = [R_1 · · · R_m]^T over the height of the AAA are calculated as

$$ R_i = \frac{|P_i - T_i|}{T_i}, \quad \forall i \in \{1, \cdots, m\}, \qquad (2.20) $$

where T = [T_1 · · · T_m]^T is the realized true process for the simulated observation case described in Section 2.4.1.
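In code, (2.20) and a credible-band coverage check are one-liners; a minimal sketch, using the output of the hypothetical predict_qoi() from the earlier sketch:

```r
# Sketch: relative errors (2.20) and 95% band coverage for the simulated case.
relative_errors <- function(P, T) abs(P - T) / T
coverage <- function(T, pred) mean(T >= pred$lower & T <= pred$upper)
```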
For the real observations, the prediction and credible band graphs are given together in Fig. 2.11. Since only observations are available in this case, the relative errors for prediction are calculated by replacing {T_i} in (2.20) with the observations (i.e., noisy true values), as shown in Fig. A.2b.

2.5.1 Results from Simulated Observation Cases

From Figs. 2.8, 2.9, and A.1 of the simulation study, we observe the following findings. Prediction quality improves when more computation model outputs are used for calibration: across different sampling time resolutions, prediction quality monotonically improves from Case 1 (coarse resolution) to Case 3 (fine resolution). Smaller model errors (Cases 1b, 2b, and 3b) provide better prediction results; compared to Cases 1a, 2a, and 3a (left column of each plot), the prediction quality of Cases 1b, 2b, and 3b (right column of each plot) is better. As one would expect, prediction quality decreases in years for which neither computation model outputs nor observations are available for calibration. For example, at years 4.5, 5, and 5.5, the errors between the true processes and the predictions are more pronounced in Case 1a, since both computation model outputs and observations are unavailable in those years. In particular, predictions at years 6.5 and 7 for Case 2 are the worst in terms of relative errors; such results are reflected in wide credible bands as well. A possible reason is that calibration for Case 2 had no information at year 6.5 or later with which to predict at years 6.5 and 7; recall that its computation model outputs extend to year 6 and its observations to year 4. On the other hand, calibration for Case 1 has computation model outputs at year 6 with which to predict at years 4.5, 5, and 5.5. This result suggests that we can use computation model outputs at future times during calibration for better prediction, and in this regard the results from Case 3 are better than those from Cases 1 and 2. However, using more computation model outputs in the calibration process implies a higher computational burden, as shown in Table 2.4.

[Figure 2.8. Predictions of the true processes (radius vs. height) for panels (a) Case 1a, (b) Case 1b, (c) Case 2a, (d) Case 2b, (e) Case 3a, and (f) Case 3b. T[t] denotes the true values at year t, while P[t] denotes the predictions at year t. The solid lines denote the true values, while the dashed lines denote the predictions.]

From the Bayesian calibration, we obtain posterior samples of the calibration parameters as well as of the hyperparameters.

[Figure 2.9. 3D 95% credible band of predictions for panels (a) Case 1a through (f) Case 3b. The blue stars denote true values lying inside the credible band; the red marks denote true values lying outside the credible band.]
For the simulated observation cases, the true parameter values (the values used to generate the data) are known, so we can compare the performance of Bayesian calibration across the various simulation scenarios. The calibrated estimates of the standardized θ1 and θ2 for each case are provided in Table 2.3. Posterior densities of the standardized θ1 and θ2 are shown in Supplementary Fig. A.3, and the prior distributions and estimates of the hyperparameters are shown in Supplementary Tables A.1–A.8. Comparing the posteriors of θ1 and θ2 across all cases, we clearly notice that the posteriors of Case 3 have the sharpest peaks around the true values, and these peaks can be utilized as point estimators. As shown in Table 2.3, the posterior estimates of θ1 and θ2 in Case 3b are the closest to their true (standardized) values of 0 among all cases. This again illustrates that more information from computation model outputs gives more accurate estimates of the calibrated parameters; better estimation of the calibrated parameters is in turn likely to give better quality predictions.

[Figure 2.10. Predictions and credible bands for Case 2c. (a) Predictions of the true processes (radius vs. height): T[t] denotes the true values at year t, P[t] the predictions at year t; solid lines are true values, dashed lines predictions. (b) 3D 95% credible band of predictions: blue stars denote true values inside the credible band, red marks true values outside it.]

2.5.1.1 Robustness of prior selection

To investigate the impact of different priors on Bayesian calibration, we compare the results of Case 2b and Case 2c, where Case 2c is identical to Case 2b except that the prior variances of the calibration parameters are 10 times larger. The posteriors of θ1 and θ2 for Case 2c (Fig. A.4a in the supplementary document) cover the true values 0 and 0, but they have larger posterior variances than those from Case 2b due to the larger prior variances. We see that the predictions (Figs. 2.8d and 2.10a) and the 95% credible bands (Figs. 2.9d and 2.10b) are similar even though the prior variances are quite different; this can also be seen in the relative errors (Figs. A.1d and A.2a). This implies that our Bayesian calibration method is robust to such changes in priors. Therefore, we may use a diffuse prior that covers all possible values of the parameters θ, learning the statistical information from the background and the computation model. The priors for the real observation case can be chosen in a similar way.

2.5.1.2 Computation time

In this study, computation model outputs for calibration were generated separately; calibration and prediction were then implemented using the computation model outputs together with the observations. Our method requires computation model outputs for each set of calibration parameters, so we report the computation time needed to generate outputs per parameter set. The time to generate computation outputs by solving the G&R computation model is 10 minutes for one combination of the parameters (θ1, θ2), producing outputs at 8 height grid points (total height 15 cm) every 5 days over a total of 2750 days.
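Since each parameter combination is solved independently of the others, the output-generation step parallelizes trivially. The following sketch distributes the 9 combinations across cores; run_gr is a hypothetical wrapper around the MATLAB G&R solver, not an existing function.

```r
# Sketch: generate outputs for all 9 calibration-input combinations in parallel.
library(parallel)

theta_grid <- expand.grid(th1 = c(0.5, 0.65, 0.8), th2 = c(2, 6, 10))
# run_gr(th1, th2) is hypothetical: it would invoke the G&R solver and
# return the QoI on the (t, s) grid for that parameter combination.
outputs <- mclapply(seq_len(nrow(theta_grid)), function(i) {
  run_gr(theta_grid$th1[i], theta_grid$th2[i])
}, mc.cores = 4)
```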
The G&R computation has been coded and implemented using MATLAB R2012a on an Intel Core i7-3770 3.4 GHz processor with 12 GB of RAM. If we generate a large number of computation model outputs for densely sampled calibration parameter values, the accuracy of calibration and prediction will improve at the expense of computational cost. Note that this computation is naturally parallel, since we run the computation model on a given parameter vector and repeat the procedure to cover a defined range of parameter combinations. As the computation model becomes increasingly complicated, this time will also increase, but the total computation will still remain parallel.

In Table 2.4, we show the computation time for Bayesian calibration and prediction. Case 1, Case 2, and Case 3 took around 6.5, 12, and 30 hours, respectively. As expected, Case 3 takes the longest time while yielding the best prediction results. To provide more insight, we divide the total computational time into two parts in Table 2.4: one spent in optimization to find ψ1 and ψ2, and the other spent in computing estimates, predictions, and uncertainties via sampling. As can be seen in Table 2.4, the latter part takes more time, due to the sequential computation of the Bayesian analysis steps. All computations for the Bayesian analysis were implemented on a twelve-core 2.90 GHz CPU running Windows Server 2008 using R (R version 3.1.1, Copyright (C) 2014 The R Foundation for Statistical Computing). We used the R package "calibrator" [45] with appropriate modifications to adapt it to our application and to reduce the computation time significantly (e.g., 100 times faster). The results from Cases 1–3 illustrate that, if affordable, the best results are achieved when Xc has a dense time grid covering the whole targeted prediction range of interest.

Table 2.3. Posterior estimates of θ1 and θ2 (std denotes the standard deviation)

  Case   θ1^org (mean, std)   θ2^org (mean, std)   θ1 (mean, std)    θ2 (mean, std)
  1a     (0.678, 0.035)       (6.05, 0.105)        (0.225, 0.288)    (0.015, 0.032)
  1b     (0.668, 0.023)       (5.912, 0.147)       (0.149, 0.281)    (−0.027, 0.045)
  2a     (0.681, 0.035)       (6.072, 0.105)       (0.249, 0.079)    (0.022, 0.032)
  2b     (0.672, 0.024)       (6.013, 0.080)       (0.177, 0.195)    (0.004, 0.025)
  3a     (0.669, 0.016)       (5.997, 0.056)       (0.155, 0.127)    (−0.001, 0.017)
  3b     (0.660, 0.010)       (6.003, 0.040)       (0.083, 0.084)    (0.001, 0.012)
  2c     (0.665, 0.032)       (5.992, 0.115)       (0.122, 0.264)    (−0.003, 0.035)
  Real   (0.250, 0.096)       (0.858, 0.062)       (0.406, 0.784)    (−0.041, 0.515)

Table 2.4. Computation times for Bayesian calibration and prediction. Estimation denotes the time spent maximizing to estimate ψ1 and ψ2, while sampling denotes the time spent calculating predictions and credible intervals via sampling from the posteriors.

  Case   Estimation     Sampling         Case   Estimation     Sampling
  1a     1.177 hours    5.305 hours      1b     1.195 hours    5.160 hours
  2a     2.828 hours    8.785 hours      2b     2.796 hours    8.941 hours
  3a     7.314 hours    24.377 hours     3b     7.221 hours    24.240 hours
  2c     2.875 hours    9.044 hours      Real   8.237 mins     2.998 hours

2.5.2 Results from Real Observation Case

In this section, we discuss the calibration results from the real observations using CT images 1–4 shown in Fig. 2.5 (CT images taken at 0, 1.2, 2.3, and 3.2 years, respectively). To generate computation model outputs for calibration, we consider the inputs Xc for the computation model as follows. Eight heights uniformly ranging from 9.3 to 17.5 cm are chosen.
For the calibration inputs, θ1^org is chosen from {0.05, 0.2, 0.35} and θ2^org from {0.7, 0.90, 0.99}. 500 iterations are used for Stage 1 and 100 iterations for Stage 2. The computation time is 30 minutes, as shown in Table 2.4.

Table 2.3 shows the estimates of all calibration parameters. The prior means of θ1^org and θ2^org are 0.2 and 0.863, and their prior standard deviations are 0.150 and 0.148, respectively. The resulting posterior means of θ1^org and θ2^org from the Bayesian analysis are 0.250 and 0.858, with standard deviations of 0.096 and 0.062, respectively.

We withhold the observations for the last two years during the calibration procedure in order to validate our approach by comparing the predictions for those years (CT images 3 and 4, taken at 2.3 and 3.2 years, respectively). In contrast to the simulation study, we compare the results with the observations (noisy true values), since the true process is not available for the real observations. The difference between the true process and the observation (the measurement error) is estimated to be N(0, 0.01) from the real observation data used for Bayesian calibration, as shown in Table 2.2. There are some larger discrepancies in the lower part of the predictions at 3.2 years, reflected in the correspondingly larger relative errors in Fig. A.2b; most of the relative errors are less than 5%. Moreover, Figure 2.11a shows how closely the Bayesian calibration model predicts the observation (i.e., noisy true) values, and from Fig. 2.11b we find that only one observation value lies outside the credible bands. These results support the capability of our approach to predict AAA expansion using real data.

[Figure 2.11. Predictions and credible bands for the real observation case. (a) Predictions of the observation processes (radius vs. height): O[t] denotes the observation values at year t, P[t] the predictions at year t; the solid line denotes the observations, the dashed line the predictions. (b) 3D 95% credible band of predictions: blue stars denote observations lying inside the credible band; the red mark denotes an observation value lying outside the credible band.]

2.6 Discussion and Conclusion

From the prediction graphs in Figs. 2.8e and 2.8f of Case 3 (simulation study), we notice that the diameters of the proximal and distal ends of the true line at year 7 tend to be smaller than the previous ones. When the AAA is gradually expanding, both end regions can be axially compressed, which means the radii of the two ends often contract; the computation model successfully captures this phenomenon. Currently we are studying longitudinal images of abdominal aortic aneurysms registered with the vertebral column. We speculate that the renal vein and artery, the superior mesenteric artery, and the iliac bifurcation can serve as anchors (in both the longitudinal and circumferential directions at the superior and inferior boundaries) for the infrarenal AAA during expansion. Hence, it may be that the physical constraints of the tethering of those vessels provide a strong confinement, or anchor, at that region of the aorta. During AAA expansion, the volume of the AAA sac gradually increases while stretching mostly in the circumferential direction and slightly in the axial direction.
Hence, because of the AAA expansion and the confinement in the axial direction, the neck and distal part of the aorta (from the renal branches to the AAA sac) will be compressed axially. Using a 3D growth and remodeling simulation, Zeinali et al. (2012) [115] show the local change in the stress distribution, in which the stress of the sac increases while the neck's stress decreases. Recall that we estimated α1 and α2 a priori, before Bayesian calibration, since we assume that the peak location of the future AAA and its overall geometry do not change significantly from the previous scans. As illustrated in the real observation case, the AAA peak location and the aneurysm shape did not change significantly across the follow-up images, which supports our approach.

At present, implementing a full Bayesian analysis, especially for a large data case, is not practical, since Stage 2 (the process of estimating ψ2) in Fig. 2.7 is time-consuming even for one iteration. If we adopted a full Bayesian method, e.g., Gibbs sampling, we would need to update ψ2 in every iteration, and the computational time would quickly accumulate to an infeasible level. This is the main reason we use point estimators of ψ1 and ψ2 instead of a full Bayesian method. The case studies presented in this paper show the feasibility and good performance of our approach with point estimators.

The estimates of the hyperparameters in all cases are shown in Supplementary Tables A.1–A.8. Estimates of β11 (the coefficient for time t) are always positive, which means the radii of AAAs increase in time due to AAA expansion. Estimates of β12 (the coefficient for θ1) are also positive in all cases, which means the QoI, i.e., the radius of the AAA, increases as θ1 increases. This can be explained by the damage function (2.1): the QoI increases, while the elastin content decreases, as the amplitude of the damage function (e.g., θ1) increases. Estimates of β13 (the coefficient for θ2) are negative in the real observation case, while all β13's are positive in the simulation cases. This could be due to the fact that the θ2 values differ greatly between the real and simulated observations: θ2 takes values in {0.7, 0.9, 0.99} in the real observation case, while it takes values in {2, 6, 10} in the simulation cases (with 6 being the true value). Such largely different parameter values can give different local sensitivities of the QoI with respect to the parameters when the influence of the parameters is complex. On the other hand, the estimates of β21, β22 and σ2² are small in all cases. This implies that the computation model explains most of the linear and covariance structure of the true process.

The results from the simulation case study suggest that Bayesian calibration can be used to combine the computation model, the prior and uncertainty models, and real observations to predict the QoI at future times or at unobserved locations. The computation model output data provide a deterministic trend of the expansion process, which is modeled by a Gaussian process; computation model discrepancies obtained from real observations are modeled by another Gaussian process. When we have more information from the computation model (in the form of finer grids for the inputs Xc), we achieve lower prediction errors, and the posteriors of the parameters θ1 and θ2 concentrate more tightly around their true values, as illustrated in our simulation results.
We also find that the posteriors and predictions from our approach are robust to the selection of priors for θ1 and θ2. In the real observation case of one patient, the results of Bayesian calibration indicate that the predictions are reasonably good when compared with the withheld last two observations: most of the withheld observations match the predictions well and lie inside the 95% credible bands. The model and observation errors collectively capture the structure of the true process in a manner consistent with the Bayesian perspective. Considering these results, we conclude that the Bayesian calibration process is capable of predicting complex G&R AAA processes, and we provide evidence that this holds for a set of real longitudinal observations, where Bayesian calibration produces good predictions. However, our approach still needs to be validated on a large number of real observation cases to evaluate its performance and efficacy in a clinical sense. We believe that, given past CT images of a patient, Bayesian calibration can help guide the scheduling of future CT scans according to the predictions and their credible bands.

We used the 2D axisymmetric G&R model, ignoring the time evolution of the centerline, in order to simplify the formulation of the Bayesian calibration process and to study the effectiveness of Bayesian calibration with only two parameters. As future work, we plan to consider more complicated damage functional shapes with more parameters, as well as asymmetric AAA expansion in 3D space. Additionally, the linearity assumption on the mean functions of the Gaussian processes in (2.5) will be relaxed to be more flexible for better prediction performance. We also plan to investigate how to deal with high-dimensional calibration parameters when flexible and complex functions are used for the damage and the rate of mass production. Finally, we will investigate how to incorporate other available patient-specific data into the Bayesian calibration for a particular patient.

2.7 Data Accessibility

More information about the results is summarized in the supplementary document in the form of tables and figures. The real observation data set used for the real observation case in this paper (Fig. 2.5) is available on the websites of the corresponding authors (http://www.egr.msu.edu/~jchoi/files/papers/index.html and http://www.egr.msu.edu/~sbaek/Publications.html).

APPENDICES

Appendix 1: Programming steps for simulated observations

Figure 2.6 shows the whole process of the Bayesian analysis, which combines the information from the priors, the computer model, the statistical model, and the observations to calibrate the parameter θ and make predictions. More specifically, the programming steps for the simulation study are given as follows.

Step 0: Specify the statistical models for m1(x, θ) and m2(x) (see Section 2.2.4).
Step 1: Set the initial values, including the true values of ψ2 and β2, the normal priors for θ, and the lognormal priors for ψ1 and ψ2 (see the prior assumptions in Section 2.3).
Step 2: Determine the inputs Xc, Xo and Xp as described in Table 2.2.
Step 3: Simulate the full data d = [y^T z^T]^T using Xc, Xo and the true values of ψ2 and β2, as described in Section 2.4.
Step 4: Run Stage 1 with 500 iterations to obtain the estimate ψ̂1 (see Section 2.3).
Step 5: Run Stage 2 with 100 iterations to obtain the estimate ψ̂2 (see Section 2.3).
Step 6: Draw samples of θ from the posterior distribution p(θ | ψ̂, d) with the estimate ψ̂ = [ψ̂1^T, ψ̂2^T]^T plugged in (see Section 2.3).
Step 7: Plug the samples of θ, the estimate ψ̂, the full data d, and Xp into (2.18) and (2.19) to get the predictions and their corresponding variances (see Section 2.3).
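Tying the steps together, a skeleton driver might read as follows. Every function named here is one of the hypothetical helpers sketched earlier in this chapter, not the calibrator package API, and Steps 0-3 are assumed to have been carried out beforehand (e.g., with joint_moments() and simulate_obs()).

```r
# Sketch: driver mirroring Steps 0-7 with the earlier hypothetical helpers.
psi_hat <- two_stage(log_post_psi1,        # Step 4: maximize p(psi1 | y)
                     log_post_psi2,        # Step 5: maximize p(lambda, psi2 | d, psi1_hat)
                     init1 = rep(1, 5),    # (w_x1, w_x2, w_th1, w_th2, sigma1_sq)
                     init2 = c(0.01, 1, 1, 0.01))  # (lambda, w_x1*, w_x2*, sigma2_sq)
draws <- mh_theta(log_post_theta, th0 = c(0, 0), M = 5000)  # Step 6: sample (2.13)
pred  <- predict_qoi(draws, pred_mean, pred_var)            # Step 7: (2.18)-(2.19)
```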
Appendix 2: Other Results

Table A.1. Priors and estimates of hyperparameters for Case 1a

  ψ1                        ωx1      ωx2      ωθ1      ωθ2      σ1²
  Lognormal prior mean      1        1        1        1        1
  Lognormal prior variance  1.1      1.1      1.1      1.3      1.2
  Estimates from Stage 1    0.641    6.552    0.044    0.725    0.355

  ψ2                        λ        ωx1*     ωx2*     σ2²
  True values               0.001    1        1        0.005
  Lognormal prior mean      0.01     1        1        0.005
  Lognormal prior variance  0.1      1.1      1.1      0.2
  Estimates from Stage 2    0.0007   0.270    0.185    0.022

  β                         β10      β11      β12      β13      β21      β22
  Estimates                 0.008    0.626    0.096    0.290    0.016    −0.048

Table A.2. Priors and estimates of hyperparameters for Case 1b

  ψ1                        ωx1      ωx2      ωθ1      ωθ2      σ1²
  Lognormal prior mean      1        1        1        1        1
  Lognormal prior variance  1.1      1.1      1.1      1.3      1.2
  Estimates from Stage 1    0.703    6.406    0.046    0.702    0.290

  ψ2                        λ        ωx1*     ωx2*     σ2²
  True values               0.001    1        1        0.001
  Lognormal prior mean      0.01     1        1        0.001
  Lognormal prior variance  0.1      1.1      1.1      0.2
  Estimates from Stage 2    0.002    0.442    0.791    0.0007

  β                         β10      β11      β12      β13      β21      β22
  Estimates                 0.0008   0.628    0.096    0.291    0.024    0.017

Table A.3. Priors and estimates of hyperparameters for Case 2a

  ψ1                        ωx1      ωx2      ωθ1      ωθ2      σ1²
  Lognormal prior mean      1        1        1        1        1
  Lognormal prior variance  1.1      1.1      1.1      1.3      1.2
  Estimates from Stage 1    0.803    6.815    0.059    0.675    0.261

  ψ2                        λ        ωx1*     ωx2*     σ2²
  True values               0.001    1        1        0.005
  Lognormal prior mean      0.01     1        1        0.005
  Lognormal prior variance  0.1      1.1      1.1      0.2
  Estimates from Stage 2    0.001    0.384    0.272    0.122

  β                         β10      β11      β12      β13      β21      β22
  Estimates                 0.009    0.637    0.101    0.315    −0.086   −0.016

Table A.4. Priors and estimates of hyperparameters for Case 2b

  ψ1                        ωx1      ωx2      ωθ1      ωθ2      σ1²
  Lognormal prior mean      1        1        1        1        1
  Lognormal prior variance  1.1      1.1      1.1      1.3      1.2
  Estimates from Stage 1    0.700    6.836    0.054    0.990    0.304

  ψ2                        λ        ωx1*     ωx2*     σ2²
  True values               0.001    1        1        0.001
  Lognormal prior mean      0.01     1        1        0.001
  Lognormal prior variance  0.1      1.1      1.1      0.2
  Estimates from Stage 2    0.004    0.331    1.103    0.003

  β                         β10      β11      β12      β13      β21      β22
  Estimates                 0.030    0.638    0.102    0.313    −0.014   −0.009

Table A.5. Priors and estimates of hyperparameters for Case 2c

  ψ1                        ωx1      ωx2      ωθ1      ωθ2      σ1²
  Lognormal prior mean      1        1        1        1        1
  Lognormal prior variance  1.1      1.1      1.1      1.3      1.2
  Estimates from Stage 1    0.794    6.467    0.054    0.818    0.283

  ψ2                        λ        ωx1*     ωx2*     σ2²
  True values               0.001    1        1        0.001
  Lognormal prior mean      0.01     1        1        0.001
  Lognormal prior variance  0.1      1.1      1.1      0.2
  Estimates from Stage 2    0.001    0.353    2.925    0.008

  β                         β10      β11      β12      β13      β21      β22
  Estimates                 0.009    0.641    0.101    0.312    0.006    −0.003

Table A.6. Priors and estimates of hyperparameters for Case 3a

  ψ1                        ωx1      ωx2      ωθ1      ωθ2      σ1²
  Lognormal prior mean      1        1        1        1        1
  Lognormal prior variance  1.1      1.1      1.1      1.3      1.2
  Estimates from Stage 1    0.898    15.920   0.250    1.640    0.533

  ψ2                        λ        ωx1*     ωx2*     σ2²
  True values               0.001    1        1        0.005
  Lognormal prior mean      0.01     1        1        0.005
  Lognormal prior variance  0.1      1.1      1.1      0.2
  Estimates from Stage 2    0.003    0.210    0.054    0.032

  β                         β10      β11      β12      β13      β21      β22
  Estimates                 0.070    0.621    0.094    0.341    −0.030   0.004

Table A.7. Priors and estimates of hyperparameters for Case 3b

  ψ1                        ωx1      ωx2      ωθ1      ωθ2      σ1²
  Lognormal prior mean      1        1        1        1        1
  Lognormal prior variance  1.1      1.1      1.1      1.3      1.2
  Estimates from Stage 1    0.951    15.786   0.217    1.971    0.526

  ψ2                        λ        ωx1*     ωx2*     σ2²
  True values               0.001    1        1        0.001
  Lognormal prior mean      0.01     1        1        0.001
  Lognormal prior variance  0.1      1.1      1.1      0.2
  Estimates from Stage 2    0.001    0.637    1.705    0.0006

  β                         β10      β11      β12      β13      β21      β22
  Estimates                 0.066    0.616    0.094    0.345    −0.020   0.015

Table A.8. Priors and estimates of hyperparameters for real observations

  ψ1                        ωx1      ωx2      ωθ1      ωθ2      σ1²
  Lognormal prior mean      1        1        1        1        1
  Lognormal prior variance  1.1      1.1      1.1      1.3      1.2
  Estimates from Stage 1    0.242    1.702    0.082    0.008    0.729

  ψ2                        λ        ωx1*     ωx2*     σ2²
  Lognormal prior mean      0.01     1        1        0.1
  Lognormal prior variance  0.1      1.1      1.1      0.2
  Estimates from Stage 2    0.01     1        1        0.001

  β                         β10      β11      β12      β13      β21      β22
  Estimates                 0.182    0.537    0.163    −0.189   0.021    −0.058
[Figure A.1. Relative errors between predictions and true values, by year and height, for panels (a) Case 1a, (b) Case 1b, (c) Case 2a, (d) Case 2b, (e) Case 3a, and (f) Case 3b.]

[Figure A.2. Relative errors for Case 2c (a) and for the real observation case (b).]

[Figure A.3. Prior and posterior densities of θ1 and θ2 for panels (a) Case 1a, (b) Case 1b, (c) Case 2a, (d) Case 2b, (e) Case 3a, and (f) Case 3b.]

[Figure A.4. Prior and posterior densities of θ1 and θ2 for (a) Case 2c and (b) the real observation case.]

Chapter 3

Patient-Specific Prediction from Multi-subject Data

3.1 Introduction

The computer model is built on all 26 subjects, which together determine the biomedical system of AAA enlargement. The mechanism resembled by the computer model therefore reflects both within-subject and between-subject effects. Patient-specific Bayesian calibration, however, calibrates the unobserved parameters of the computer model using real observations from only one subject, which may skew the calibration results toward the mechanism of that specific patient. It is therefore of practical value to build a Bayesian calibration model for multi-subject data, so that the calibrated parameters account for the uncertainties in the real observations across all subjects. Bayesian calibration analysis of longitudinal imaging data is itself very challenging and interesting.
On the one hand, the dimension of imaging data for a single patient at a single time point is already very high, let alone that of multi-subject data collected at multiple time points, so any model built for such data carries a heavy computational burden. On the other hand, the variability of the real observations occurs both within and between subjects across time, and the calibration parameters are unknown inputs of the computer model proposed by the biomedical engineers. How the uncertainties in the real observations across all subjects affect the uncertainties of the calibration parameters is therefore an important and, to our knowledge, previously unaddressed problem in Bayesian analysis; mixing and separating so many uncertainties with different properties is challenging but worthwhile work. For these reasons, we model the Bayesian calibration for multi-subject data as follows, keeping mostly the same notation and symbols as the single-subject case of Chapter 2.

3.2 Calibration model

Suppose the total number of subjects is I. For the i-th subject, the real observation z_ij can be modeled as

$$ z_{ij} = r(x_{ij}, \theta_i) + \delta(x_{ij}) + \varepsilon_{ij}, \quad \forall i \in \{1, \cdots, I\},\ j = 1, \cdots, n_i, \qquad (3.1) $$

where n_i is the number of observations from CT scan images of subject i, r(·, ·) is the computer model, δ(·) is the model error, and ε_ij is the observation error. We assume that each ε_ij is independently distributed as N(0, λ), a normal (Gaussian) random variable with mean 0 and variance λ, and that the observational errors are independent both among and within subjects.

3.3 Statistical models

Let GP(m(·), k(·, ·)) be the Gaussian process with mean function m(·) and covariance function k(·, ·). We introduce the following Gaussian processes as prior beliefs for the G&R computation model and the model error:

$$ r(x_i, \theta_i) \sim GP(m_1(x_i, \theta_i), k_1(x_i, x_i')), \qquad \delta(x_i) \sim GP(m_2(x_i), k_2(x_i, x_i')). \qquad (3.2) $$

To specify the Gaussian process priors in (3.2) further, we introduce mean and covariance structures for both processes. For the mean structure, one can consider a linear combination of basis functions to approximate the general mean structure:

$$ m_1(x_i, \theta_i) = h_1(x_i, \theta_i)^T \beta_1 = \beta_{10} + \beta_{11} t_i + \beta_{12}\theta_{i1} + \beta_{13}\theta_{i2}, \qquad (3.3) $$

where h_1(x_i, θ_i) = [1 t_i θ_{i1} θ_{i2}]^T and β_1 = [β_{10} β_{11} β_{12} β_{13}]^T, and

$$ m_2(x_i) = h_2(x_i)^T \beta_2 = \beta_{21} t_i + \beta_{22} s_i, \qquad (3.4) $$

where h_2(x_i) = [t_i s_i]^T and β_2 = [β_{21} β_{22}]^T. These mean structures imply that the mean function of the G&R computation model is linear in time t_i and in the calibration parameters {θ_{i1}, θ_{i2}}, while the mean function of the model error is linear in time t_i and location s_i. β = [β_1^T β_2^T]^T are hyperparameters for the mean functions in the Bayesian context. For the covariance structure, we use the following exponential functions:

$$ k_1(x_i, x_i') = \sigma_1^2 \exp\{-(x_i - x_i')\Omega_x (x_i - x_i')^T\}\exp\{-(\theta_i - \theta_i')\Omega_\theta (\theta_i - \theta_i')^T\}, $$
$$ k_2(x_i, x_i') = \sigma_2^2 \exp\{-(x_i - x_i')\Omega_x^* (x_i - x_i')^T\}, $$

where Ω_x, Ω_θ, Ω_x* are diagonal matrices such that

$$ \Omega_x = \begin{bmatrix} \omega_{x1} & 0 \\ 0 & \omega_{x2} \end{bmatrix}, \quad \Omega_\theta = \begin{bmatrix} \omega_{\theta 1} & 0 \\ 0 & \omega_{\theta 2} \end{bmatrix}, \quad \Omega_x^* = \begin{bmatrix} \omega_{x1}^* & 0 \\ 0 & \omega_{x2}^* \end{bmatrix}, $$

so that the hyperparameters for the covariance functions are σ1², σ2² and the ωxj, ωθj, ωxj*. For the parameters controlling the covariances, we introduce ψ = [ψ1 ψ2], with

$$ \psi_1 = [\omega_{x1}\ \omega_{x2}\ \omega_{\theta 1}\ \omega_{\theta 2}\ \sigma_1^2], \qquad \psi_2 = [\omega_{x1}^*\ \omega_{x2}^*\ \sigma_2^2]. \qquad (3.5) $$

ψ1 is the set of hyperparameters related to the G&R computation model; ψ2 is the set of hyperparameters related to the model error.
3.4 Computer outputs

Let N_i be the number of computer outputs for subject i and y_i = [y_{i1} · · · y_{iN_i}]^T ∈ R^{N_i×1}. Given the j-th set of inputs for subject i, (x*_{ij}, a_{ij}), where a_{ij} denotes the calibration input used to generate the output, we have the corresponding computer output y_{ij} = r(x*_{ij}, a_{ij}). The corresponding variable input matrix is X_{i,c} = [(x*_{i1})^T, · · · , (x*_{iN_i})^T]^T ∈ R^{N_i×2}. We further define y = [y_1^T · · · y_I^T]^T ∈ R^{N×1} as the computer model output vector, with the final variable input matrix X_c = [(X_{1,c})^T, · · · , (X_{I,c})^T]^T ∈ R^{N×2}. We augment the variable inputs X_{i,c} with the parameter inputs a_i = [a_{i1} · · · a_{iN_i}] such that X_{i,c}(a_i) = [(x*_{i1}, a_{i1})^T, · · · , (x*_{iN_i}, a_{iN_i})^T]^T ∈ R^{N_i×4}. The final input matrix for the computer outputs is X_c(a) = [X_{1,c}(a_1)^T, · · · , X_{I,c}(a_I)^T]^T ∈ R^{N×4}. Then, the GP assumption for the computer model gives

$$ y_i \sim N(H_1(X_{i,c}(a_i))\beta_1, V_1(X_{i,c})), $$

where H_1(X_{i,c}(a_i)) denotes the matrix with rows h_1(x*_{i1}, a_{i1})^T, · · · , h_1(x*_{iN_i}, a_{iN_i})^T, and the (j, k) entry of V_1(X_{i,c}) for the i-th subject is k_1(x*_{ij}, x*_{ik}), for j, k ∈ {1, 2, · · · , N_i}. We denote the mean and covariance by μ_{y_i} = H_1(X_{i,c}(a_i))β_1 and V_{y_i} = V_1(X_{i,c}), respectively. Finally, the full computer output vector y = [y_1^T · · · y_I^T]^T ∈ R^{N×1} is distributed as

$$ y \sim N(\mu_y, V_y), \quad \mu_y = \begin{bmatrix} \mu_{y_1} \\ \vdots \\ \mu_{y_I} \end{bmatrix}, \quad V_y = \begin{bmatrix} V_{y_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & V_{y_I} \end{bmatrix}. $$

3.5 Observations

Let n_i be the number of observations for subject i, and let z_i = [z_{i1} · · · z_{in_i}]^T ∈ R^{n_i×1} be the set of observations corresponding to the variable input matrix X_{i,o} = [x_{i1}^T, · · · , x_{in_i}^T]^T ∈ R^{n_i×2}. Note that X_o = [X_{1,o}^T, · · · , X_{I,o}^T]^T ∈ R^{n×2} is not necessarily the same as the set of variable inputs for the computer model outputs. In general, the number of observations is smaller than that of the computer model outputs, since we can control the amount of the latter. To calibrate θ = [θ_1, · · · , θ_I] from the observations, we augment the variable inputs X_{i,o} with θ_i such that X_{i,o}(θ_i) = [(x_{i1}, θ_i)^T, · · · , (x_{in_i}, θ_i)^T]^T. Then, from the calibration model, we have

$$ z_i \sim N(H_1(X_{i,o}(\theta_i))\beta_1 + H_2(X_{i,o})\beta_2, \ \lambda I_{n_i} + V_1(X_{i,o}(\theta_i)) + V_2(X_{i,o})), $$

where H_1(X_{i,o}(θ_i)) denotes the matrix with rows h_1(x_{i1}, θ_i)^T, · · · , h_1(x_{in_i}, θ_i)^T, and H_2(X_{i,o}) the matrix with rows h_2(x_{i1})^T, · · · , h_2(x_{in_i})^T. V_1(X_{i,o}(θ_i)) is defined analogously to V_1(X_{i,c}), and the (j, k) entry of V_2(X_{i,o}) is k_2(x_{ij}, x_{ik}), for j, k ∈ {1, 2, · · · , n_i}. Denote μ_{z_i} = H_1(X_{i,o}(θ_i))β_1 + H_2(X_{i,o})β_2 and V_{z_i} = λ I_{n_i} + V_1(X_{i,o}(θ_i)) + V_2(X_{i,o}). The full observation vector z = [z_1^T · · · z_I^T]^T ∈ R^{n×1} is then distributed as

$$ z \sim N(\mu_z, V_z), \quad \mu_z = \begin{bmatrix} \mu_{z_1} \\ \vdots \\ \mu_{z_I} \end{bmatrix}, \quad V_z = \begin{bmatrix} V_{z_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & V_{z_I} \end{bmatrix}. $$

3.6 Full data

We then combine the computer model outputs and observations into d = [d_1^T · · · d_I^T]^T ∈ R^{(N+n)×1}, which we call the data vector, with

$$ d_i = \begin{bmatrix} y_i \\ z_i \end{bmatrix} \sim N(m_{d_i}(\theta_i), V_{d_i}), \qquad (3.6) $$

where

$$ m_{d_i}(\theta_i) := E(d_i \mid \theta_i, \beta, \psi) = H(\theta_i)\beta, \qquad H(\theta_i) = \begin{bmatrix} H_1(X_{i,c}(a_i)) & 0 \\ H_1(X_{i,o}(\theta_i)) & H_2(X_{i,o}) \end{bmatrix}, $$

and β = (β_1^T, β_2^T)^T. The full mean is m_d(θ) = [m_{d_1}(θ_1)^T, · · · , m_{d_I}(θ_I)^T]^T, and we denote H(θ) = [H(θ_1)^T, · · · , H(θ_I)^T]^T. We assume different subjects are independent, so the full covariance is

$$ V_d(\psi) = \begin{bmatrix} V_{d_1}(\psi) & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & V_{d_I}(\psi) \end{bmatrix}, $$
where

$$ V_{d_i}(\psi) := \mathrm{var}(d_i \mid \psi) = \begin{bmatrix} V_{y_i} & C_{y_i z_i}^T \\ C_{y_i z_i} & V_{z_i} \end{bmatrix}, $$

and C_{y_i z_i} ∈ R^{n_i×N_i} is the cross-covariance matrix whose (j, k) entry is k_1(x_{ij}, x*_{ik}), for j ∈ {1, 2, · · · , n_i} and k ∈ {1, 2, · · · , N_i}. Note that C_{y_i z_i} need not be a square matrix, since usually N_i ≠ n_i.

3.7 Bayesian analysis

3.7.1 Joint posterior distribution

We note that m_d(θ) also depends on β; we drop it from the notation to reduce complexity. We have the following full joint posterior distribution of (θ, β, ψ) given d:

$$ p(\theta, \beta, \psi \mid d) \propto p(\theta)p(\psi) f(d; m_d(\theta), V_d(\psi)) \propto \prod_{i=1}^{I} p(\theta_i)\; p(\psi)\,|V_d(\psi)|^{-\frac{1}{2}} \exp\left[-\frac{1}{2}\{(d - m_d(\theta))^T V_d(\psi)^{-1} (d - m_d(\theta))\}\right], \qquad (3.7) $$

where f(d; m, V) is the multivariate Gaussian density function with mean m and variance V. Clearly, m_d(θ) is a linear function of β, so we can find

$$ \beta \mid \theta, \psi, d \sim N(\hat{\beta}(\theta, \psi), W(\theta, \psi)), \qquad (3.8) $$

where

$$ \hat{\beta}(\theta,\psi) = W(\theta,\psi)H(\theta)^T V_d(\psi)^{-1} d, \qquad W(\theta,\psi) = (H(\theta)^T V_d(\psi)^{-1} H(\theta))^{-1}. $$

We can then integrate out β in (3.7) and get

$$ p(\theta, \psi \mid d) \propto p(\psi)\,|W(\theta,\psi)|^{\frac{1}{2}} \prod_{i=1}^{I} p(\theta_i)\,|V_{d_i}(\psi)|^{-\frac{1}{2}} \exp\left[-\frac{1}{2}\{(d_i - H(\theta_i)\hat{\beta}(\theta,\psi))^T V_{d_i}(\psi)^{-1}(d_i - H(\theta_i)\hat{\beta}(\theta,\psi))\}\right]. \qquad (3.9) $$

Because of independence across subjects, β̂ and W in (3.9) can be simplified as

$$ W(\theta,\psi) = \left(\sum_{i=1}^{I} H(\theta_i)^T V_{d_i}(\psi)^{-1} H(\theta_i)\right)^{-1}, \qquad \hat{\beta}(\theta,\psi) = W(\theta,\psi)\sum_{i=1}^{I} H(\theta_i)^T V_{d_i}(\psi)^{-1} d_i. $$

3.7.2 Full conditional of ψ1

We can estimate ψ1 by maximizing

$$ p(\psi_1 \mid y) \propto |V_1(X_c)|^{-1/2}\, p(\psi_1)\, |W_1(X_c(a))|^{1/2} \exp\left[-\frac{1}{2}(y - H_1(X_c(a))\hat{\beta}_1)^T V_1(X_c)^{-1}(y - H_1(X_c(a))\hat{\beta}_1)\right], \qquad (3.10) $$

where

$$ \hat{\beta}_1 = W_1(X_c(a)) H_1(X_c(a))^T V_1(X_c)^{-1} y, \qquad W_1(X_c(a)) = (H_1(X_c(a))^T V_1(X_c)^{-1} H_1(X_c(a)))^{-1}. $$

3.7.3 Full conditional of ψ2

We can estimate ψ2 by maximizing p(ψ2 | d, ψ1). To obtain this density, we first write

$$ p(\beta_2, \psi_2 \mid d, \psi_1) \propto p(\beta_2, \psi_2)\, p(z \mid y, \beta_2, \psi), $$

since y is independent of the second-stage hyperparameters. We cannot obtain the distribution of [z | y, β2, ψ] analytically, but we can start with [z | y, β2, ψ, θ], which is normally distributed:

$$ z_{ij} \mid \beta_1, \beta_2, \psi, \theta_i \sim N(h_1(x_{ij},\theta_i)^T\beta_1 + h_2(x_{ij})^T\beta_2,\ \lambda + k_1(x_{ij},x_{ij}) + k_2(x_{ij},x_{ij})), $$

with conditional mean

$$ E(z_{ij} \mid y_i, \beta_2, \psi, \theta_i, \beta_1) = h_2(x_{ij})^T\beta_2 + h_1(x_{ij},\theta_i)^T\beta_1 + t(x_{ij})^T V_1(X_{i,c})^{-1}(y_i - H_1(X_{i,c}(a_i))\beta_1), $$

where the k-th element of t(x_{ij}) is k_1(x_{ij}, x*_{ik}), for k = 1, . . . , N_i. Integrating over θ_i, we obtain an expression for the mean vector:

$$ E(z_{ij} \mid y_i, \beta_2, \psi) = \int E(z_{ij} \mid y_i, \beta_2, \psi, \theta_i)\,p(\theta_i)\,d\theta_i = h_2(x_{ij})^T\beta_2 + \left[\int h_1(x_{ij},\theta_i)^T p(\theta_i)\,d\theta_i\right]\hat{\beta}_1 + t(x_{ij})^T V_1(X_{i,c})^{-1}(y_i - H_1(X_{i,c}(a_i))\hat{\beta}_1). $$

We can likewise obtain expressions for its variance matrix.
The covariance matrix is V_{i,ψ2} = λ I_{n_i} + V_2(X_{i,o}) + C_i, where the (j, k) element of C_i is

$$ \int \mathrm{cov}(r(x_{ij},\theta_i), r(x_{ik},\theta_i) \mid y_i, \psi_1)\,p(\theta_i)\,d\theta_i = k_1(x_{ij},x_{ik}) - \mathrm{tr}\{V_1(X_{i,c})^{-1}\, t(x_{ij})\, t(x_{ik})^T\} $$
$$ \quad + \ \mathrm{tr}\Big\{W_1(X_{i,c}(a_i)) \int h_1(x_{ij},\theta_i)\, h_1(x_{ik},\theta_i)^T p(\theta_i)\,d\theta_i\Big\} - \mathrm{tr}\Big\{W_1(X_{i,c}(a_i))\, H_1(X_{i,c}(a_i))^T V_1(X_{i,c})^{-1}\, t(x_{ij}) \int h_1(x_{ik},\theta_i)^T p(\theta_i)\,d\theta_i\Big\} $$
$$ \quad - \ \mathrm{tr}\Big\{V_1(X_{i,c})^{-1} H_1(X_{i,c}(a_i))\, W_1(X_{i,c}(a_i)) \int h_1(x_{ij},\theta_i)\,p(\theta_i)\,d\theta_i\; t(x_{ik})^T\Big\} + \mathrm{tr}\{V_1(X_{i,c})^{-1} H_1(X_{i,c}(a_i))\, W_1(X_{i,c}(a_i))\, H_1(X_{i,c}(a_i))^T V_1(X_{i,c})^{-1}\, t(x_{ij})\, t(x_{ik})^T\}. $$

To see this, note that the joint covariance of (r(x_{ij}, θ_i), r(x_{ik}, θ_i), y_i) given (ψ1, θ_i) is

$$ \mathrm{var}\big(r(x_{ij},\theta_i), r(x_{ik},\theta_i), y_i \mid \psi_1, \theta_i\big) = \begin{bmatrix} k_1(x_{ij},x_{ij}) & k_1(x_{ij},x_{ik}) & t(x_{ij})^T \\ k_1(x_{ij},x_{ik}) & k_1(x_{ik},x_{ik}) & t(x_{ik})^T \\ t(x_{ij}) & t(x_{ik}) & V_1(X_{i,c}) \end{bmatrix}, $$

so that the conditional mean and covariance are

$$ E\big(r(x_{ij},\theta_i) \mid y_i, \psi_1, \theta_i\big) = h_1(x_{ij},\theta_i)^T\beta_1 + t(x_{ij})^T V_1(X_{i,c})^{-1}(y_i - H_1(X_{i,c}(a_i))\beta_1), $$
$$ \mathrm{cov}\big(r(x_{ij},\theta_i), r(x_{ik},\theta_i) \mid y_i, \psi_1, \theta_i\big) = k_1(x_{ij},x_{ik}) - t(x_{ij})^T V_1(X_{i,c})^{-1} t(x_{ik}) = k_1(x_{ij},x_{ik}) - \mathrm{tr}\{V_1(X_{i,c})^{-1}\, t(x_{ij})\, t(x_{ik})^T\}. $$

Plugging these two expressions into the law of total covariance yields C_i. We then approximate [z_i | y_i, β2, ψ] by a normal distribution with the above moments for the purpose of estimating ψ2. To simplify the notation, we write the mean vector as μ_{i,ψ2} = H_2(X_{i,o})β_2 + η̂(X_{i,o}), where η̂(X_{i,o}) is the vector with elements E(r(x_{ij}, θ_i) | y_i), j = 1, . . . , n_i. We can now integrate out β2 to obtain the approximation

$$ p(\psi_2 \mid d_i, \psi_1) \propto p(\psi_2)\,|V_{i,\psi_2}|^{-1/2}\,|W_{i,2}|^{1/2} \exp\left[-\frac{1}{2}(z_i - H_2(X_{i,o})\hat{\beta}_{i,2} - \hat{\eta}(X_{i,o}))^T V_{i,\psi_2}^{-1}(z_i - H_2(X_{i,o})\hat{\beta}_{i,2} - \hat{\eta}(X_{i,o}))\right], \qquad (3.11) $$

where

$$ \hat{\beta}_{i,2} = W_{i,2}\, H_2(X_{i,o})^T V_{i,\psi_2}^{-1}(z_i - \hat{\eta}(X_{i,o})), \qquad W_{i,2} = (H_2(X_{i,o})^T V_{i,\psi_2}^{-1} H_2(X_{i,o}))^{-1}. $$

The full mean is μ_{ψ2} = H_2(X_o)β_2 + η̂(X_o), where η̂(X_o) = [η̂(X_{1,o})^T, · · · , η̂(X_{I,o})^T]^T, and the full covariance is the block-diagonal matrix V_{ψ2} = diag(V_{1,ψ2}, · · · , V_{I,ψ2}), so that

$$ p(\psi_2 \mid d, \psi_1) \propto p(\psi_2)\,|V_{\psi_2}|^{-1/2}\,|W_2|^{1/2} \exp\left[-\frac{1}{2}(z - H_2(X_o)\hat{\beta}_2 - \hat{\eta}(X_o))^T V_{\psi_2}^{-1}(z - H_2(X_o)\hat{\beta}_2 - \hat{\eta}(X_o))\right], \qquad (3.12) $$

where

$$ \hat{\beta}_2 = W_2\, H_2(X_o)^T V_{\psi_2}^{-1}(z - \hat{\eta}(X_o)), \qquad W_2 = (H_2(X_o)^T V_{\psi_2}^{-1} H_2(X_o))^{-1}. $$

Consider the quadratic form

$$ (z - H_2(X_o)\hat{\beta}_2 - \hat{\eta}(X_o))^T V_{\psi_2}^{-1}(z - H_2(X_o)\hat{\beta}_2 - \hat{\eta}(X_o)). \qquad (3.13) $$

Denote u = z − η̂(X_o) and write V = V_{ψ2} and H_2 = H_2(X_o) for short. Then (3.13) can be simplified as

$$ [(I - H_2(H_2^T V^{-1} H_2)^{-1} H_2^T V^{-1})u]^T\, V^{-1}\, [(I - H_2(H_2^T V^{-1} H_2)^{-1} H_2^T V^{-1})u] = u^T [V^{-1} - V^{-1} H_2 (H_2^T V^{-1} H_2)^{-1} H_2^T V^{-1}] u = u^T R u, $$

where R = V^{-1} − V^{-1} H_2 (H_2^T V^{-1} H_2)^{-1} H_2^T V^{-1}. Since V_{ψ2}, W_2 and R are symmetric, we can decompose (3.12) into a product of patient-wise likelihoods.

3.7.4 Posterior distribution of calibration parameter θ

$$ p(\theta \mid \psi, d) \propto |W(\theta,\psi)|^{\frac{1}{2}} \prod_{i=1}^{I} p(\theta_i) \times \exp\left[-\frac{1}{2}\{(d - H(\theta)\hat{\beta}(\theta,\psi))^T V_d(\psi)^{-1}(d - H(\theta)\hat{\beta}(\theta,\psi))\}\right]. \qquad (3.14) $$

3.7.5 Predictive distribution of true process ζ(·)

Its mean function is given by

$$ E(\zeta(x) \mid \theta, \psi, d) = h(x,\theta)^T \hat{\beta}(\theta,\psi) + t(x)^T V_d(\psi)^{-1}(d - H(\theta)\hat{\beta}(\theta,\psi)), \qquad (3.15) $$

where

$$ h(x,\theta) = \begin{bmatrix} h_1(x,\theta) \\ h_2(x) \end{bmatrix}, \qquad t(x) = \begin{bmatrix} V_1(x, X_c) \\ V_1(x, X_o) + V_2(x, X_o) \end{bmatrix}. $$

Additionally, its covariance function is given by

$$ \mathrm{cov}(\zeta(x_i), \zeta(x_i') \mid \theta_i, \psi, d_i) = k_1(x_i, x_i') + k_2(x_i, x_i') - t(x_i)^T V_{d_i}(\psi)^{-1} t(x_i') $$
$$ \qquad + \ (h(x_i,\theta_i) - H(\theta_i)^T V_{d_i}(\psi)^{-1} t(x_i))^T W(\theta,\psi)\, (h(x_i',\theta_i) - H(\theta_i)^T V_{d_i}(\psi)^{-1} t(x_i')). \qquad (3.16) $$
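Because subjects are assumed independent, V_d(ψ) is block diagonal and the joint log-likelihood decomposes into per-subject terms. A minimal sketch under that assumption follows; md_i and Vd_i stand for hypothetical per-subject mean and covariance builders (e.g., adapted from the joint_moments() sketch of Chapter 2), and dmvnorm is from the mvtnorm package.

```r
# Sketch: multi-subject joint log-likelihood via the block structure.
loglik_multi <- function(d_list, theta_list, psi, md_i, Vd_i) {
  ll <- 0
  for (i in seq_along(d_list)) {
    m <- md_i(i, theta_list[[i]])    # mean of d_i given theta_i
    V <- Vd_i(i, psi)                # covariance of d_i given psi
    # Independence across subjects lets us simply add the subject terms.
    ll <- ll + mvtnorm::dmvnorm(d_list[[i]], mean = m, sigma = V, log = TRUE)
  }
  ll
}
```

Working subject-by-subject this way avoids ever forming the full (N + n) × (N + n) matrix V_d(ψ), which is what makes the multi-subject computation tractable.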
3.8 Data Analysis

Because readers are familiar with classical regression, it is natural to compare statistical methods against it. In terms of modeling, calibration is the reverse of regression: instead of predicting a future dependent variable from known explanatory variables, a known observation of the dependent variable is used to infer the corresponding explanatory variables. In our application, the observed dependent variable is the bio-geometrical shape of the enlarging AAA, while the corresponding explanatory variables are the calibration parameters. In terms of application, Bayesian calibration mimics the mechanisms of a physical system (here, a biomedical system) by calibrating the unobserved internal parameters so that the model outputs match the real observations as closely as possible, whereas in medical science, for example, a regression model is often used to identify significant covariates or treatments. Bayesian calibration integrates the computer model as an independent source of information and expertise alongside the real observations, while a regular regression usually has only one data source, the real observations.

Bayesian calibration thus combines two sources of data: computer model output and real observations. The latter is usually more expensive to collect than the former, so the computer data typically has much higher dimension than the real observations. Patient-specific prediction with this model is therefore more computationally intensive than a regular regression with a single data source, and the computation of the Bayesian calibration model for multi-subject data is more demanding still. We therefore present the data analysis in two parts: one for a small sample (two patients) and one for a larger sample (six patients).

3.8.1 Small sample

First, we analyze the real observations from two subjects. For each subject, the computer data covers the time span of the real data plus future inputs. We use the real data to calibrate the model at one time point, and the future inputs to make predictions at a second time point. We choose eight equally spaced points along the center line to describe the AAA shape. Priors and estimates of the hyperparameters are shown in Table 3.1. The estimates of ψ2 are close to the prior means, which equal the true values. As shown in Fig. 3.1, all the real observations are captured by the credible intervals, and the maximum relative error is 2.5%. The results in Fig. 3.2 are not as good as those in Fig. 3.1, but this is expected: as discussed before, the acceleration of AAA expansion sometimes cannot be completely captured by the computer model. Because the calibration parameter values of the two subjects are close, we use the same prior for both subject 1 and subject 2 (Figures 3.1d and 3.2d).

[Figure 3.1. Prediction results for subject 1. (a) Predictions of the observation processes (radius vs. height): the solid line denotes the observations, the dashed line the predictions. (b) 95% credible band of the predictions: blue stars denote observations lying inside the band; red marks denote observations lying outside it. (c) Relative errors between predictions and true values. (d) Prior and posterior densities of θ1 and θ2.]
3.8.2 Big sample

The design of the Bayesian calibration is as follows. For each subject, the computer data covers the time span of the real data plus future inputs. The real data are used to calibrate the model at the first two time points, while future inputs are used to make predictions at the last two time points; the computer data therefore cover four time points in total. Again, we choose eight equally spaced points along the center line to describe the AAA shape.

[Figure 3.2. Prediction results for subject 2. (a) Predictions of the observation processes (radius vs. height): the solid line denotes the observations, the dashed line the predictions. (b) 95% credible band of the predictions: blue stars denote observations lying inside the band; red marks denote observations lying outside it. (c) Relative errors between predictions and true values. (d) Prior and posterior densities of θ1 and θ2.]

With 6 patients there are 12 calibration parameters to infer, since each patient has two calibration parameters, and drawing samples from an unknown density over 12 parameters is not an easy task. As discussed above, Bayesian calibration iteratively updates the uncertainty distributions of the calibration parameters so that they are consistent with the observed data, so it is important that the model updates these distributions using the parameter information from all the subjects. We therefore assume all the subjects share the same calibration parameter values and distributions. From the computer simulations, we chose values of θ1 ranging from 0.5 to 0.8 and values of θ2 ranging from 1 to 9 as the calibration inputs, and used normal priors covering these ranges (see Figure 3.3). The posterior then determines a narrower range of values, updating those obtained from the computer simulations.

Table 3.1. Priors and estimates of hyperparameters.

  ψ1 (lognormal priors)     ω_x1     ω_x2     ω_θ1     ω_θ2     σ1²
  Prior mean                1        1        1        1        1
  Prior variance            1.1      1.1      1.1      1.3      1.2
  Estimates from Stage 1    0.186    1.738    0.0832   0.007    0.640

  ψ2 (lognormal priors)     λ        ω_x1     ω_x2     σ2²
  Prior mean                0.01     1        1        0.1
  Prior variance            0.1      1.1      1.1      0.2
  Estimates from Stage 2    0.01     1        1        0.001

  β estimates: β10 = 0.006, β11 = 0.418, β12 = 0.146, β13 = -0.176, β21 = 0.053, β22 = -0.016.

3.9 Conclusion

The results show that this individualized Bayesian calibration effectively integrates subject-level information (uncertainty) about the calibration parameters and updates their plausible ranges. These updated ranges and distributions of the calibration parameters improve the consistency between estimates and observations, whether researchers are generating simulations from computer code or predicting future observations.

[Figure 3.3. Prior and posterior results for the data consisting of 6 patients. (a) Prior and posterior densities of θ1. (b) Prior and posterior densities of θ2.]

The main computational difficulty lies in drawing samples of the calibration parameters from their posterior using Metropolis-Hastings algorithms. As equation (3.14) shows, the posterior density of the calibration parameters is not analytically tractable. Once a large number of subjects is incorporated into the model, the number of calibration parameters is high, and a high-dimensional Metropolis-Hastings sampler becomes hard to tune and slow to converge, so better computational strategies are needed. A random-walk Metropolis-Hastings step of the kind used here is sketched below.
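The following is a minimal base-R sketch of such a random-walk Metropolis-Hastings sampler. It assumes only that a function `log_post(theta)` evaluating the unnormalized log posterior of equation (3.14) is available (a hypothetical name, not an existing implementation); the proposal scale `step` would need tuning in practice.

```r
# Random-walk Metropolis-Hastings sketch for the calibration parameters.
# log_post() is a hypothetical function returning the unnormalized
# log posterior (3.14) at a parameter vector theta.
rw_metropolis <- function(log_post, theta0, n_iter = 5000, step = 0.05) {
  p <- length(theta0)
  draws <- matrix(NA_real_, n_iter, p)
  theta <- theta0
  lp <- log_post(theta)
  for (it in seq_len(n_iter)) {
    prop <- theta + rnorm(p, sd = step)   # symmetric Gaussian proposal
    lp_prop <- log_post(prop)
    if (log(runif(1)) < lp_prop - lp) {   # accept with MH probability
      theta <- prop
      lp <- lp_prop
    }
    draws[it, ] <- theta
  }
  draws
}
```

With 12 calibration parameters (six subjects), a single proposal scale is rarely adequate; adaptive or blocked updates are the usual remedies.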
Additionally, as introduced at the beginning of this dissertation, Latin hypercube sampling should be considered to improve the design of the computer experiments, especially in the Bayesian calibration model for multiple subjects, because the computational time increases significantly as more subjects are included in the model and its predictions. A Latin hypercube design for the two calibration inputs can be generated as sketched below.
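As an illustration, here is a minimal base-R sketch of a Latin hypercube design over the calibration ranges used above (θ1 in [0.5, 0.8], θ2 in [1, 9]); the function name `latin_hypercube` is ours, not from an existing package.

```r
# One stratified uniform draw per row interval, independently permuted
# across columns: a basic Latin hypercube design on the unit square.
latin_hypercube <- function(n, k) {
  sapply(seq_len(k), function(j) (sample(n) - runif(n)) / n)
}

set.seed(1)
U <- latin_hypercube(n = 20, k = 2)
design <- cbind(theta1 = 0.5 + 0.3 * U[, 1],  # map to [0.5, 0.8]
                theta2 = 1.0 + 8.0 * U[, 2])  # map to [1, 9]
```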
Chapter 4

Alzheimer's Disease Progression Analysis Using Multi-state Markov Model

4.1 Introduction

With the rapid aging of the world population, approximately 10 million people worldwide are affected by Alzheimer's Disease, a leading cause of death behind cardiovascular disease and cancer. Nearly 10% of people 65 years of age and older are affected. While the disease is most common in the elderly, it has been diagnosed in patients in their 40s and 50s. Alzheimer's Disease (AD) is a progressive, irreversible, degenerative brain disorder causing impaired memory, thinking and behavior [76]. These impairments are related to the death of brain cells and the breakdown of the connections between them. From the onset of symptoms, the disease usually progresses over a period of eight years, but it can last for up to 20 years. Three stages mark the advance of AD: early signs, mild cognitive impairment, and severe dementia. Generally speaking, the early signs of AD appear after the age of 60 and often include loss of recent memory, faulty judgment, and changes in personality. Individuals in the mild stage may forget how to do simple routine tasks such as dressing or bathing. In the final stages of the disease, patients are bedridden and frequently develop coexisting illnesses; most commonly, individuals with AD die from pneumonia. Although the risk of developing AD increases with age, AD is quite different from the normal aging process: AD is a disease, while normal aging is not. Without this disease, the human brain often can operate to the age of 100 and beyond.

Research on Alzheimer's Disease falls into three broad categories: causes and risk factors, diagnosis, and treatment. Although researchers cannot yet provide definitive answers about the causes of Alzheimer's Disease, they have made progress toward the truth in many ways. For example, genetic predisposition, abnormal protein deposits in the brain, and environmental factors are suspected to play a role in the development of the disease. To understand the pathology of the AD brain, scientists need to understand the structure and function of the aging nervous system; in particular, to find the causes and risk factors of AD, they should understand the loss of communication between neurons and the death of neurons in the AD brain. Research on diagnosis focuses on finding ways to identify markers (indicators) of dementias, developing and improving ways to test patients, determining causes and assessing risk factors, and improving methods for finding cases and sampling populations. As for treatment, researchers are working to slow AD's progression, delay its onset, or eventually prevent it altogether; they are also developing ways to improve patients' ability to function and to support caregivers of people with AD. For full descriptions, please refer to the Laboratory of Neuro Imaging website [76].

The data we use in this chapter to understand the structure of the brain is magnetic resonance imaging (MRI). MRI is currently the imaging modality of choice for assessing neurologic disorders because of its unique capacity to provide excellent anatomical detail in a safe setting. MRI is an imaging technique based on the principles of nuclear magnetic resonance; it detects proton signals from water molecules and produces high-quality images of the internal structure of the living brain. To distinguish between gray matter, white matter, and cerebrospinal fluid (CSF), certain protocols are designed to create anatomical images of the brain; as a result, MRI is able to differentiate between tissue types. By providing the contrast needed to distinguish them, MRI allows researchers to measure the sizes as well as various other properties of different parts of the brain [71]. Among these measurements, quantitative assessment of brain volumes, obtained through volumetric MRI, has been increasingly applied in recent years to a wide range of neurologic conditions thanks to advances in computational technology.

Typical anatomical images of the brain captured by MRI have a spatial resolution of approximately 1 cubic millimeter (mm³). The volume of an adult human brain is, on average, between 1,131,000 and 1,273,000 mm³, with substantial variation between individuals [1]. Although the spatial resolution of modern MRI protocols is very high, 1 mm³ of cortical gray matter can contain between 10,000 and 60,000 neurons, up to four times as many glial cells per neuron [84], as well as neuronal processes, blood vessels, intracortical myelin, and dendritic spines. Measures of brain volumes have been shown to be valid biomarkers of clinical state and progression, offering high reliability and robust inferences on the underlying disease-related mechanisms [42]. In general, atrophy rates in different brain structures of AD patients are related to deterioration of cognitive performance [34].

Over the years, the clinical progression of AD has been studied for subjects in different stages. Subjects with traditional amnestic MCI (mild cognitive impairment) have a higher annual rate of conversion to AD than NC (normal controls) [78].
In particular, MCI subjects with abnormalities in both CSF and the medial temporal lobe have a very high risk of progression to AD [8]; their hippocampal atrophy rate is on the order of 3.5% per year, compared to about 1% per year for MCI subjects not converting to AD [28]. However, in addition to structures of the medial temporal lobe, volumes of cortical regions in the parietal and lateral temporal lobes are also used to predict the likelihood of progression to AD in MCI subjects [64]. Moreover, an MRI score reflecting the degree of AD-like brain atrophy showed slightly higher predictive value for future cognitive impairment than CSF measures [105].

In longitudinal medical studies, patients are observed over time and covariate information is collected at several occasions. When individuals may experience several events, the analysis is often performed using multi-state models. A multi-state model (MSM) is a model for a continuous-time stochastic process allowing individuals to move among a finite number of states [70]. For example, survival analysis is in effect a two-state model, while the continuous-time Markov model is a special kind of multi-state model. In our study, each subject is scanned at each visit, so MRI data are collected longitudinally. The changing volumes of each brain region traced by these MRI scans can be used as covariate information to predict how AD individuals progress from early signs through mild cognitive impairment to severe dementia.

Commenges [17, 18] has reviewed the analysis of interval-censored observations from multi-state models. Sukkar et al. [100] used a Hidden Markov Model (HMM) to improve the assessment of Alzheimer's Disease progression, while Wang et al. [107] proposed a spatio-temporal methodology based on HMMs and applied it to four-dimensional structural brain MRI series of older individuals. Yu et al. [113] applied a multi-state Markov model to explore the effects of known risk factors on each of the possible transitions in the progression of Alzheimer's Disease. A number of other references [14, 35, 43, 93, 101, 118] pursue various research interests using different prediction techniques.

Nevertheless, before modeling AD progression, two things should be emphasized and understood carefully: measurement of progression and limitations of prediction. Gelb [37], from a clinician's perspective, reviewed questions that can be answered by the natural course of AD, specifically information regarding measures of functional impairment and how they change over time, whereas Barnes [4] pointed out the limitations of most AD risk prediction strategies and suggested that future techniques should simultaneously model the risk of mortality as well as the risk of AD over the full preclinical spectrum, and should consider the potential harm as well as the benefit of identifying and treating high-risk older patients.

In this chapter, we apply a multi-state Markov model to represent the progression of Alzheimer's Disease. We treat the clinical diagnosis as the response variable (state space), which consists of three levels (states): NC (normal control), MCI (mild cognitive impairment), and AD (Alzheimer's Disease). Meanwhile, we treat the screening time (the time when MRI scans are collected) as the timeline along which the Markov chain undergoes transitions from one state to another.
We further assume that the transition rates are independent of time (i.e., the Markov model is homogeneous), but that they depend on patient-related variables, such as MRI regional volumetric information, cognitive assessment information, and demographic information. More specifically, we incorporate these into the Markov model by taking the logarithm of a transition intensity to be a linear function of the relevant predictors. We then perform likelihood ratio tests (LRT) to compare which models fit the data better. Finally, we make predictions based on transition rates and transition probabilities using a supervised learning method, and we provide leave-one-out cross-validation results. As a whole, our contributions are as follows. First, for this data set, age is not recognized as a statistically significant risk factor in our Markov model. Second, different sources of information are integrated to describe the overall trend of the progression of Alzheimer's Disease. Third, transition rates and transition probabilities from the Markov model provide a potential direction for predicting the course of Alzheimer's Disease.

The remainder of this chapter is structured as follows: Section 4.2 introduces how we obtained and processed the data. Section 4.3 explains in detail the multi-state Markov model used for prediction, the likelihood ratio test (LRT) for comparison, and the survival analysis for interpretation. Section 4.4 presents four models: the model without any covariates, the model with the single covariate Age, the model with the single covariate ApoE4, and the model for prediction with all the known risk factors in the data. Section 4.5 summarizes the implications of our findings and provides future perspectives.

4.2 Data

ADNI is an ongoing, longitudinal, multi-center study designed to develop clinical, imaging, genetic, and biochemical biomarkers for the early detection and tracking of Alzheimer's Disease (AD). The ADNI study began in 2004 and included 400 subjects diagnosed with mild cognitive impairment (MCI), 200 subjects with early AD, and 200 elderly control subjects. This initial phase of the study is known as ADNI1. In 2009, ADNI1 was extended with ADNI GO, which assessed the existing ADNI1 cohort and added 200 participants identified as having early mild cognitive impairment (EMCI); the objective of this phase was to examine biomarkers in an earlier stage of disease. In 2011, as ADNI GO was ending, ADNI2 began. ADNI2 assesses participants from the ADNI1/ADNI GO cohort in addition to the following new participants: 150 elderly controls, 100 EMCI participants, 150 LMCI (late mild cognitive impairment) participants, and 150 mild AD patients.

We downloaded the "adnimerge" data from ADNI on 12/8/2014. The "adnimerge" data merges several of the key variables from various case report forms and biomarker lab summaries across the ADNI protocols (ADNI1, ADNI GO, and ADNI2). It is updated daily directly from the Alzheimer's Disease Cooperative Study (ADCS; http://adcs.org/) data system and posted to LONI. We separate out all 570 subjects in ADNI1 as our target data set. To reduce the impact of the unbalanced longitudinal design and to characterize how the response changes in time, we keep only the 456 subjects who have more than 4 observations and remove all the others; a sketch of this filtering step follows.
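A minimal sketch of this filtering step in R, assuming the merged table has been read into a data frame `adnimerge` with a subject identifier column `RID` (the column name follows the ADNI release; treat it as an assumption here):

```r
# Keep only subjects with more than 4 longitudinal records, as described
# above. Assumes a data frame `adnimerge` with subject identifier `RID`.
visits  <- table(adnimerge$RID)
keep_id <- names(visits)[visits > 4]
adni1   <- adnimerge[adnimerge$RID %in% keep_id, ]
length(unique(adni1$RID))  # 456 subjects in our extract
```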
Among these 456 subjects, 139, 105 and 66 stay in NC (normal control), MCI (mild cognitive impairment) and AD (Alzheimer's Disease) throughout, respectively, while 24, 8, 105 and 2 subjects convert from NC to MCI, from MCI to NC, from MCI to AD, and from AD to MCI, respectively. In addition, 3 subjects convert from NC to MCI to AD, 2 subjects oscillate between NC and MCI, and another 2 oscillate between MCI and AD. The cleaned dataset has 272 male and 184 female subjects. Each observation on every subject comprises 3 sources of information: longitudinal MRI regional volumetric information ("Ventricles", "Hippocampus", "WholeBrain", "Entorhinal", "Fusiform", "MidTemp", and "ICV" (intracranial volume)), cognitive assessment information such as "MMSE" (Mini Mental State Exam [21]) and "CDR-SB" (Clinical Dementia Rating-Sum of Boxes [12]), demographic information such as "Age", "Gender", "Race", "Education" and "Marital Status", and one pivotal genetic characteristic, "ApoE4". Up to 9 structural MRI scans are available per subject. The average MRI data-collection span among the 456 subjects throughout the ADNI1 study is about 3.5 years; the minimum is about 1.5 years and the maximum about 8 years. For patients diagnosed as NC or MCI at baseline (first scan), MRI scans are collected every half year within the first 2 years and every year thereafter. For patients diagnosed as AD at baseline, MRI scans are collected every half year within the first year and once in the second year, so we get 4 time points for subjects staying in AD the whole time.

Table 4.1. Distribution of age and MMSE in each diagnosis group.

             Age                                   MMSE
  Diagnosis  mean  median  sd    min  max          mean  median  sd    min  max
  NC         75    75      5.09  60   90           29    29      1.15  22   30
  NC-MCI     76    76      4.58  63   85           29    29      1.26  24   30
  MCI-NC     71    74      7.16  62   80           29    29      1.33  24   30
  MCI        75    75      7.18  58   87           28    28      2.14  18   30
  MCI-AD     73    73      7.09  55   88           24    25      4.09  0    30
  AD-MCI     83    83      0.11  83   83           26    26      2.10  22   29
  AD         74    74      7.18  56   88           22    23      4.42  5    30

Based on each patient's age at baseline, the 456 subjects fall into 4 age groups: 13 subjects in 50-60, 74 in 60-70, 275 in 70-80, and 94 in 80-90. In other words, almost 60.31% of the subjects are between 70 and 80. This percentage is consistent with the natural history timeline, in which most Alzheimer's Disease patients are diagnosed between the ages of 70 and 79 [54]. Distributions of age in each diagnosis group are given in Table 4.1, but we have not observed any significant differences in age distribution between diagnosis groups. As a matter of fact, the relationship between brain aging and Alzheimer's Disease (AD) is contentious: one view holds that AD results when brain aging surpasses a threshold, while the other postulates that AD is not a consequence of brain aging; for more details, refer to Swerdlow [102]. To describe and visualize all 456 subjects, we made the longitudinal trace plot (Figure 4.1) and the longitudinal interaction plot (Figure 4.2). As shown in Figure 4.1, the percentage of hippocampus volume to ICV decreases with age, as is well known.
[Figure 4.1. Individual plots of the percentage of hippocampus volume to ICV vs. age: (a) subjects within NC; (b) subjects within MCI; (c) subjects who ever transit from MCI to AD; (d) subjects within AD. In each panel, each black line segment with solid points denotes how one patient's hippocampus changes with age, with each solid point marking the age at a screening visit. The x-axis is the patient's real age; the y-axis is the percentage of hippocampus volume to ICV (intracranial volume).]

Moreover, the slope of the average decreasing trend is steeper for AD subjects than for NC and MCI subjects. As shown in Figures 4.2a and 4.2b, the paths of hippocampus volume and MMSE aggregated by age at baseline cannot be clearly separated between diagnosis groups, while the paths aggregated by screening time (months) are separated very well in Figures 4.2c and 4.2d. This implies that the natural aging process interacts with the progression of Alzheimer's Disease, so it is better to position the observations on the time axis by screening time, since we aim to detect how Alzheimer's Disease affects the transitions between different states of severity.

4.3 Methods

A multi-state Markov model is used to model the progression of diseases (see, e.g., Jackson et al., 2003 [53]); it describes disease progression through the transition probabilities between states and the mean sojourn time in each state. Specifically, we use a homogeneous model, assuming that a transition rate (also called a transition intensity), the rate of change of a transition probability, is independent of time. In other words, the model allows a transition probability that changes over time while the rate remains constant. Generally speaking, the transition rate here can be interpreted as the overall speed (or risk) of the progression of Alzheimer's Disease. Commenges et al. [19] proved that, for large sample sizes, the incidence (the number of new cases) in a time period Δt approximately equals the transition rate times Δt; if Δt is the time unit, for instance one year, the two coincide. The transition rate is therefore not just a concept in the statistical model but has a direct interpretation in clinical science.

Although the transition rates are assumed independent of time, the speed or risk of change plausibly depends on patient-related variables, such as MRI regional volumetric information, cognitive assessment information, and demographic information. We therefore incorporate these into the model by taking the logarithm of a transition intensity to be a linear function of the relevant predictors (a Cox proportional hazards form [67]).
[Figure 4.2. Longitudinal plots of the fraction of hippocampus volume to ICV and of MMSE in four diagnosis groups, aggregated by age at baseline (panels a, b) and by screening time in months (panels c, d). 'D11' denotes patients who stay within NC all the time; 'D22' patients who stay within MCI all the time; 'D23' patients who ever transit from MCI to AD; 'D33' patients who stay within AD all the time.]

For a detailed explanation of the multi-state Markov model, see Appendix 1 and Appendix 2.

When comparing longitudinal observations from different clinical groups at different aging stages, it is crucial to position the observations correctly on the time axis. This is not straightforward, since the disease appears at different ages and chronologically older brains may have greater structural integrity than younger ones affected by the pathology [66]. In our longitudinal study there are two timelines, screening time and age, so the way the two timelines enter our Markov model should be clearly described. We treat the screening time as the timeline along which the Markov chain undergoes transitions from one state to another, as the observations in the original data are aligned by screening time. Meanwhile, we include age as a covariate in the Markov model, so that the continuous effect of age on the progression of Alzheimer's Disease is fully identified.

To investigate which model fits the data better, we perform likelihood ratio tests (LRT). The LRT helps determine whether known risk factors increase the risk of progression from one state to another. It is a statistical hypothesis test whose statistic is the ratio of two likelihoods (or the -2 log likelihood ratio), where the likelihood in the numerator is from the null model and the likelihood in the denominator is from the alternative model. A larger likelihood ratio (or a smaller -2 log likelihood ratio) indicates that the data support the null model.

We can also describe results with a survival probability curve. Survival probability in the multi-state Markov model is the chance that a patient has not entered the final state (i.e., "survived" when the final state is death); note that the final state in this study is not death but the clinical diagnosis Alzheimer's Disease. Compared with the Kaplan-Meier (KM) curve [44], which generally models a process with two states and one possible transition from an alive state to a dead state, a fitted non-AD probability curve from the Markov model is more powerful and flexible in modeling longitudinal data with multiple states and any possible transitions among them.

4.4 Results

In this section, we explore several homogeneous continuous-time Markov models to fit the growth curve of the severity of dementia (clinical diagnosis) by including different sets of covariates in the transition rates. We first consider the model without any covariates, then models with relevant covariates. In particular, we incorporate real age, ApoE4, MMSE, CDR-SB, and regional volumetric variables such as "Ventricles", "Hippocampus", "WholeBrain", "Entorhinal", "Fusiform", "MidTemp", and "ICV". The data used here include all the subjects whose diagnosis is NC or MCI at baseline and who become MCI or AD in the end, and we treat the AD state as absorbing, meaning that once a subject enters the AD state, s/he never transits to other states. A minimal sketch of how such a model can be fitted is given below.
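For concreteness, the following is a minimal sketch of fitting this three-state model with the R package msm (Jackson, 2011 [16]), which we use for our analysis. The data frame `adni1` and its column names (`state` coded 1 = NC, 2 = MCI, 3 = AD, `years`, `RID`, `age`, `apoe4`) are our hypothetical naming, not the package's.

```r
library(msm)

# Allowed instantaneous transitions: NC <-> MCI and MCI -> AD; AD is
# absorbing. Positive off-diagonal entries of Q mark allowed moves and
# serve as initial values for the intensities.
Q <- rbind(c(0,   0.1, 0  ),
           c(0.1, 0,   0.1),
           c(0,   0,   0  ))

fit <- msm(state ~ years, subject = RID, data = adni1,
           qmatrix = Q, covariates = ~ age + apoe4)

qmatrix.msm(fit)         # estimated transition intensities
pmatrix.msm(fit, t = 2)  # estimated 2-year transition probabilities
sojourn.msm(fit)         # mean sojourn time in each transient state
```

By default `pmatrix.msm` evaluates the probabilities at the mean covariate values; covariate-specific summaries (e.g., for ApoE4 carriers) can be requested through its `covariates` argument.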
4.4.1 Model without covariates

A homogeneous Markov model assumes that the transition rate from state r to state r', q_{rr'}, is independent of time (Appendix 1). The model without any covariates serves as a baseline model that simply describes how the transition probability from the previous state to the next evolves between two screening times. In other words, the transition rates (the changing speed of severity progression) do not depend on other conditions but remain constant for each pair of transitions across the observation time. The estimates of the baseline intensities (q_{0,rr'}) and their 95% confidence intervals (CI) are given in Table 4.1. The transition rate is the rate of change of the transition probabilities with respect to time (equation (A.2) in Appendix 1). From Table 4.1, the transition rate from NC to MCI is about 2 percentage points higher than that from MCI to NC (0.0441 vs. 0.0269), and moving from MCI to AD (0.1968) is about 15 percentage points higher in rate than moving from NC to MCI (0.0441). The fitted survival probability plots in Figure 4.1 show that subjects in state NC have higher fitted survival probabilities (red solid line) than empirical survival probabilities (blue dashed line), while subjects in state MCI have lower ones. The empirical survival curve describes the mean trend of Alzheimer's Disease deterioration over time. In fact, more than half of the subjects never convert to MCI or AD, so the lowest value the empirical survival probability curve can reach is around 0.6 rather than the usual 0. Note that an empirical survival probability curve is calculated directly from the data without assuming any model, while the homogeneous continuous-time Markov model gives a smooth fitted survival probability curve that differs by the patient's state, as shown in Figure 4.1.

An estimated transition probability describes the chance of a subject moving from one state to another within a given time. We provide estimated transition probabilities at 2 years and 4 years in Table 4.1. Overall, compared to year 2, subjects at year 4 have higher transition probabilities from NC to MCI, from NC to AD, and from MCI to AD. In other words, as time passes, more and more subjects naturally convert into MCI and AD; in particular, we will observe more and more complete progression traces from NC to MCI and then to AD, although there are only 3 of them at present.

Table 4.1. Transition rates and transition probabilities (95% confidence intervals) for the model without covariates.

  Transition  Rates (95% CI)            Probabilities, Year 2 (95% CI)  Probabilities, Year 4 (95% CI)
  NC-NC                                 0.9176 (0.8836, 0.9417)         0.8447 (0.7794, 0.8908)
  NC-MCI      0.0441 (0.0306, 0.0635)   0.0679 (0.0478, 0.0970)         0.1058 (0.0739, 0.1477)
  NC-AD       0                         0.0146 (0.0100, 0.0221)         0.0495 (0.0337, 0.0719)
  MCI-NC      0.0269 (0.0159, 0.0454)   0.0414 (0.0250, 0.0716)         0.0645 (0.0383, 0.1071)
  MCI-MCI                               0.6410 (0.5858, 0.6886)         0.4136 (0.3494, 0.4745)
  MCI-AD      0.1968 (0.1629, 0.2379)   0.3176 (0.2720, 0.3670)         0.5218 (0.4546, 0.5856)
More precisely, the result indicates that the chance of staying in the same state is higher than that of moving to a following progression state within 2 years when a patient is in either NC or MCI (the probabilities of remaining in the same state, 0.9176 and 0.6410, exceed the other values). Once a subject is in the MCI state, there is a 32% chance that s/he will move to the AD state within 2 years. However, the chance of moving from MCI to AD within 4 years (52%) is higher than that of staying in MCI, which is reasonable to expect. The estimated mean sojourn times in Table 4.5 show that a subject stays in the NC state about 23 years, while a subject stays in the MCI state about 4.5 years. This implies that the progression of Alzheimer's Disease is a slowly changing process, especially for subjects in NC; this is also confirmed by the estimated transition rates in Table 4.1, where the transition rates at early states are less than 5%. But once a subject becomes MCI, the speed of conversion to AD is much faster (0.1968) and the mean sojourn time within MCI is much shorter (about 4.5 years).

4.4.2 Model with a single covariate

In this section, we investigate the effects of two main risk factors, age and apolipoprotein E e4 (ApoE4), by building each of them as a single covariate into two separate models. Details follow.

[Figure 4.1. Plot of survival curves (not entering the final state AD). The dashed blue line denotes the fitted survival curve from NC to AD; the solid red line denotes the fitted survival curve from MCI to AD; the dotted lines denote the 95% confidence band of the corresponding fitted curve.]

4.4.2.1 Age

The major risk factor for Alzheimer's Disease (AD) is age, with a sharp increase in incidence after 60 years (Kawas et al., 2000 [57]). Thus, the inherent link between aging and AD should be understood thoroughly. As noted above, the morphology observed in the brains of patients affected by AD is a combination of different biological processes, such as normal aging and the pathological matter loss specific to AD (Lorenzi et al. [66]). Aging might therefore be a contributing effect that accelerates the progression of Alzheimer's Disease and the transitions from mild status (MCI) to severe status (AD). These considerations motivate us to include age (the real age when MRI scans are collected) as a continuous covariate in our Markov model.

To get a brief view of the contributing effect of age, we plot the fitted transition rates against age for three transition groups. As shown in Figure 4.2, on average, transition rates from NC to MCI increase with age, consistent with the common understanding that brain function deteriorates more readily in older people. Meanwhile, however, transition rates from MCI to AD decrease with age, which actually contradicts the known facts.
The reason might lie in the limitations of cross-sectional and longitudinal study designs [32]. Cross-sectional studies may suffer from cohort effects and, potentially more seriously, from different recruitment biases across age groups, while longitudinal data may suffer from selective attrition and test-retest effects that can be larger than the change across time points (Salthouse, 2012 [90]). It is thus not surprising that, compared to the model without any covariates, the model with age is not significant (p-value 0.1733) according to the likelihood ratio test (Table 4.6).

4.4.2.2 ApoE4

The apolipoprotein E (APOE) e4 allele is the most prevalent genetic risk factor for Alzheimer's Disease (AD) (Corder et al., 1993 [20]; Saunders et al., 1993 [92]) and is present in roughly 20-25% of North Americans and Europeans (Gerdes et al., 1992 [39]).

[Figure 4.2. Fitted transition rates changing with age in different groups. The red solid line with circle marks denotes the transition rates from NC to MCI; the blue dashed line with triangle marks, from MCI to NC; the purple dotted line with plus marks, from MCI to AD. The x-axis denotes age; the y-axis denotes the transition rate.]

The presence of an e4 allele confers a significantly higher likelihood of developing AD [86]. Shi et al. [96] found significant hippocampal morphological deformation in APOE e4 carriers relative to non-carriers in the entire cohort as well as in the non-demented (pooled MCI and control) subjects. Spampinato et al. [97] found, in a longitudinal magnetic resonance imaging (MRI) study using voxel-based morphometry (VBM), that carriers of the APOE e4 allele with MCI developed atrophy in the hippocampus, insula, and temporal and parietal cortex before converting to AD, while structural changes underlying the conversion to dementia in non-carriers did not become apparent.

In our study, ApoE4 is a categorical variable taking values 0, 1, and 2. We group ApoE4 into two levels, ApoE4 negative (value 0) and ApoE4 positive (value 1 or 2), because only 7.51% of the subjects have ApoE4 equal to 2, too few for powerful comparisons against the other levels. We then include ApoE4 as a two-level categorical covariate in our Markov model. As expected from the model fit, ApoE4 carriers (ApoE4 positive) are more likely to transit from NC to MCI and from MCI to AD; the evidence is shown in Table 4.2, Table 4.3, Table 4.4, and Figure 4.3. In the transition rates table (Table 4.2), ApoE4 carriers (0.0906) have a transition rate from NC to MCI three times that of non-carriers (0.0299), and carriers (0.2611) have a transition rate from MCI to AD twice that of non-carriers (0.1338). In the table of transition probabilities at year 2 (Table 4.3), carriers (0.1247) have a more than twofold higher transition probability from NC to MCI than non-carriers (0.0495), and carriers (0.3973) have an almost twofold higher transition probability from MCI to AD than non-carriers (0.2291). Comparing the transition probability tables at year 4 (Table 4.4) and year 2 (Table 4.3), the same pattern holds between ApoE4 carriers and non-carriers.
Table 4.2. Transition rates (95% confidence intervals) for the model with covariate ApoE4.

  Transition  ApoE4 positive            ApoE4 negative
  NC-MCI      0.0906 (0.0535, 0.1527)   0.0299 (0.0180, 0.0495)
  NC-AD       0                         0
  MCI-NC      0.0277 (0.0132, 0.0582)   0.0265 (0.0126, 0.0556)
  MCI-AD      0.2611 (0.2067, 0.3297)   0.1338 (0.0969, 0.1848)

Table 4.3. Transition probabilities (95% confidence intervals) at year 2 for the model with covariate ApoE4.

  Transition  ApoE4 positive            ApoE4 negative
  NC-NC       0.8383 (0.7422, 0.9016)   0.9434 (0.9070, 0.9653)
  NC-MCI      0.1247 (0.0760, 0.1993)   0.0495 (0.0307, 0.0825)
  NC-AD       0.0369 (0.0220, 0.0615)   0.0071 (0.0039, 0.0125)
  MCI-NC      0.0382 (0.0182, 0.0747)   0.0439 (0.0214, 0.0900)
  MCI-MCI     0.5645 (0.4920, 0.6264)   0.7270 (0.6445, 0.7889)
  MCI-AD      0.3973 (0.3297, 0.4699)   0.2291 (0.1684, 0.3022)

Table 4.4. Transition probabilities (95% confidence intervals) at year 4 for the model with covariate ApoE4.

  Transition  ApoE4 positive            ApoE4 negative
  NC-NC       0.7076 (0.5558, 0.8161)   0.8922 (0.8312, 0.9331)
  NC-MCI      0.1750 (0.1101, 0.2610)   0.0827 (0.0505, 0.1278)
  NC-AD       0.1175 (0.0732, 0.1885)   0.0251 (0.0143, 0.0423)
  MCI-NC      0.0536 (0.0240, 0.1111)   0.0734 (0.0366, 0.1495)
  MCI-MCI     0.3235 (0.2411, 0.3969)   0.5307 (0.4331, 0.6196)
  MCI-AD      0.6229 (0.5423, 0.7087)   0.3959 (0.3069, 0.4989)

Table 4.5. Mean sojourn time (95% confidence intervals) for the model with covariate ApoE4.

  State  Model without covariates     ApoE4 positive             ApoE4 negative
  NC     22.6864 (15.7601, 32.6567)   11.0653 (6.5486, 18.6972)  33.5035 (20.1918, 55.5913)
  MCI    4.4695 (3.7402, 5.3411)      3.4632 (0.3939, 2.7712)    6.2389 (4.6385, 8.3914)

In both tables, transition probabilities from MCI to AD are higher than those from NC to MCI. In the mean sojourn time table (Table 4.5), ApoE4 carriers have a much shorter mean sojourn time than non-carriers, which means that ApoE4 accelerates the progression of Alzheimer's Disease. Compared to the model without any covariates, the model with ApoE4 is significant (p-value 0.0002) according to the likelihood ratio test (Table 4.6).

[Figure 4.3. Transition probabilities changing with time for the model with covariate ApoE4. The red solid line with circle marks describes transition probabilities from NC to MCI; the blue dashed line with triangle marks, from MCI to NC; the purple dotted line with plus marks, from MCI to AD. The x-axis denotes the screening time; the y-axis, the transition probability. The left panel shows ApoE4 carriers (positive); the right panel, non-carriers (negative).]

4.4.3 Model with multiple covariates

In this section, we make predictions of possible transitions based on transition probabilities and transition rates by including the known risk factors as covariates in the model. The covariates incorporated are real age, ApoE4, MMSE, CDR-SB, and regional volumetric variables such as "Ventricles", "Hippocampus", "WholeBrain", "Entorhinal", "Fusiform", "MidTemp", and "ICV". Note that we believe that real age contributes to prediction and thus should be included in the model, although it was not recognized as a significant risk factor in the previous section (p-value 0.1733 in the likelihood ratio test). Model comparisons of this kind can be carried out directly in msm, as sketched below.
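A minimal sketch of the likelihood ratio tests summarized in Table 4.6, assuming fitted msm objects `fit0` (no covariates), `fit_age`, and `fit_apoe4` built as in the earlier sketch (the object names are ours):

```r
# Likelihood ratio tests against the no-covariate baseline, as in
# Table 4.6; lrtest.msm() reports -2 log LR, degrees of freedom and
# p-values for each alternative model against the first (base) model.
lrtest.msm(fit0, fit_age)
lrtest.msm(fit0, fit_apoe4)

# Hazard ratios exp(beta) of each covariate for each allowed transition.
hazard.msm(fit_apoe4)
```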
Table 4.6. Likelihood ratio tests between different models.

  Baseline: model without covariates
              Age       ApoE4     Volumes   Scores     Full
  -2 log LR   4.9796    19.7598   69.1031   129.8185   166.6650
  df          3         3         18        6          30
  p-value     0.1733    0.0002    0         0          0

  Baseline: model with Volumes
              Age       ApoE4     Volumes   Scores     Full
  -2 log LR   -64.1235  -49.3432            60.7155    97.5619
  df          15        15                  12         12
  p-value     1         1                   0          0

4.4.3.1 Prediction and validation

As described in the previous sections, MCI subjects have a much higher risk of developing AD. We are therefore most interested in predicting transitions from MCI to AD (a two-state model). The data used here include only the subjects whose diagnosis is MCI at baseline and who become AD in the end, which means we do not predict the transition from NC to MCI; we still treat the AD state as absorbing. Cross-validation is performed by processing the testing set with the trained Markov model. More specifically, the general two-step training and testing procedure is as follows (a sketch is given below). All transition rates and probabilities here refer to the transition from MCI to AD, and logistic regression is used only as a tool for discriminating observations with lower or higher transition rates and probabilities into the MCI or AD state. First, the Markov model is trained on the training set, and we collect its outputs, the estimated transition rates and transition probabilities; a logistic regression of the diagnosis state (MCI or AD) is then trained on these estimated rates and probabilities. Second, plugging the testing set into the trained Markov model, we obtain new estimated transition rates and probabilities; plugging these into the trained logistic regression yields the predicted diagnosis state. Note that predictions are performed within each subject and are one-step-ahead: each subject's information at the previous scanning time is used to predict the diagnosis state at the next scanning time.

As is well known, correct registration and alignment of subjects on the timeline is crucial in longitudinal medical studies. To investigate how registration and alignment affect the prediction of Alzheimer's Disease progression, we prepared two data sets for cross-validation, Data1 and Data2. Data1 keeps all the observations before the AD state for each subject, so that all subjects are aligned to the baseline of scanning time and the numbers of replications (observations) across subjects are unbalanced. Data2 keeps only two observations before the AD state for each subject, so that all subjects are aligned to the last scanning time point (AD state) and the numbers of replications are balanced (3 observations in total per subject). We believe that, in Data2, subjects share more similarities in their structural volumetric change patterns over the two time points before reaching the AD state. But things are usually not as perfect as expected, because the two time points before the AD state cannot be fully aligned for all subjects: as prespecified by ADNI, for subjects diagnosed as NC or MCI at baseline (first scan), MRI scans are collected every half year within the first 2 years and every year thereafter. Thus it is quite possible that, within the two years before AD, one subject is observed every half year while another is observed every year.
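A minimal sketch of the two-step procedure above, assuming a fitted msm model on the training subjects and data frames `train` and `test` containing one row per (subject, visit) with the hypothetical columns `state01` (0 = MCI, 1 = AD at the next visit), `rate`, and `prob` holding the msm-estimated MCI-to-AD transition rate and one-step transition probability:

```r
# Step 1: logistic regression of the next-visit state on the two msm
# outputs (transition rate and transition probability from MCI to AD).
clf <- glm(state01 ~ rate + prob, family = binomial, data = train)

# Step 2: one-step-ahead prediction on the held-out subject's visits.
p_hat <- predict(clf, newdata = test, type = "response")
pred  <- ifelse(p_hat > 0.5, "AD", "MCI")

table(predicted = pred, truth = ifelse(test$state01 == 1, "AD", "MCI"))
```

For leave-one-out cross-validation, this pair of steps is simply repeated with each subject held out in turn.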
Furthermore, based on the analysis in the previous section, we know that subjects with higher transition rates and transition probabilities tend to deteriorate faster. This motivates us to divide the subjects in each data set (Data1 and Data2) into groups according to the scale of their transition rates and transition probabilities: Group1 (Table 4.7) is the higher-risk group and Group2 the lower-risk group. Prediction performance should differ between the two groups, just as they should be treated differently in clinical practice.

Table 4.7 presents all the leave-one-out cross-validation results for the different groups in the two data sets. Note that true positives here mean that a true AD is predicted as AD, while true negatives mean that a true MCI is predicted as MCI. Comparing the overall model based on Data1 with that based on Data2, the latter has higher sensitivity (0.5158 vs. 0.4000), lower specificity (0.6947 vs. 0.8930), and lower accuracy (0.6053 vs. 0.7313). This implies that more cases of true MCI or AD are predicted as AD (for Data2, both true positives and false positives increase); in other words, the model trained on Data2 is more inclined to transition to AD than the model trained on Data1. We see two reasons. The first is clear: on average there are more MCI-state observations in Data1 than in Data2, so the observed transitions from MCI to AD are relatively diluted in Data1, and the Markov model trained on Data1 consequently shrinks the transition rates and probabilities. The second is that, in Data2, each subject has only 3 observations and the MCI-state observations are not perfectly aligned on the timeline, as explained above; this in some sense amplifies the effect of misalignment, increasing the numbers of false positives and false negatives. Comparing Group1 with Group2 in either data set, Group1 has much higher sensitivity, specificity, precision and accuracy. This implies that transition rates and transition probabilities not only provide integrated information for prediction, but also offer a way to cluster individuals into groups sharing similar progression of brain functional impairment.

Table 4.7. Prediction on two data sets with different alignments in time. Data1 includes all the observations before AD for each subject, while Data2 includes only the 2 observations before AD. Each data set is divided into Group1 and Group2: Group1 in Data1 contains all patients whose transition probability ever exceeds 0.45 or whose transition rate exceeds 1.2; Group1 in Data2 contains all patients whose transition probability ever exceeds 0.6 or whose transition rate exceeds 1.2.
  Data1    Subjects  Sensitivity  Specificity  Precision  Accuracy  AUC
  Group1   51        0.5098       0.9008       0.6842     0.7849    0.8583
  Group2   54        0.4074       0.8404       0.5946     0.6824    0.6843
  Overall  105       0.4000       0.8930       0.6461     0.7313    0.7404

  Data2    Subjects  Sensitivity  Specificity  Precision  Accuracy  AUC
  Group1   59        0.7119       0.6610       0.6774     0.6864    0.7672
  Group2   36        0.5556       0.6389       0.6061     0.5972    0.5829
  Overall  95        0.5158       0.6947       0.6282     0.6053    0.6504

4.4.3.2 Impact of transition rates and transition probabilities

To get a clearer picture of how transition rates and transition probabilities work as integrated predictors of Alzheimer's Disease progression, we make predictions on the original data set. The training set here includes all subjects whose diagnosis is NC or MCI at baseline and who become MCI or AD in the end, with the AD state treated as absorbing. The prediction set is the most original one, which allows transitions from AD to AD. After training the Markov model on the training set, we obtain one-step-ahead predictions of the transition rates and transition probabilities (from NC to MCI and from MCI to AD) on the testing set. We then aggregate them by screening time (months) and true diagnosis state to obtain the longitudinal plots of predicted transition rates and probabilities in the different diagnosis groups (Figure 4.4). Note that we take the logarithm of the transition rates, since they span a wide range.

Comparing the predicted transition probabilities from NC to MCI (Figure 4.4b) with those from MCI to AD (Figure 4.4d), we observe that neither diagnosis group 'D23' nor 'D33' is separated well from the other groups in Figure 4.4b, which tells us that when predicting the transition from MCI to AD, we should use the transition intensities from MCI to AD rather than from NC to MCI. The reason for the difference is clear: transition probabilities from NC to MCI mostly reflect the overall risk of progressing from the NC state to the MCI state. In groups 'D23' and 'D33', the transition probabilities from NC to MCI eventually decrease over time, since most subjects in 'D23' and 'D33' are converting from MCI to AD rather than from NC to MCI. By contrast, the transition probabilities within the four groups in Figure 4.4d show a continuously increasing trend, since all diagnosis states eventually convert to the AD state (in the order NC to MCI to AD).

Comparing the predicted transition rates from MCI to AD (Figure 4.4c) with the predicted transition probabilities from MCI to AD (Figure 4.4d), we find that in both cases 'D23' and 'D33' are not separated well from each other, consistent with the fact that the diagnostic boundary between transitioning from MCI to AD and remaining in AD is not very sharp. However, all the other diagnosis groups (NC, MCI, AD) are separated very well, indicating that transition rates and transition probabilities can help discriminate transition states and thus predict the progression of brain functional impairment. Put differently, our Markov model is capable of integrating different sources of information into two simple indices that represent and predict the mechanism of Alzheimer's Disease progression.
One index is the transition rate, which reflects the overall speed of progression; the other is the transition probability, which gives the chance of a particular transition within a given time.

[Figure 4.4. Longitudinal plots of predicted transition rates and probabilities in different diagnosis groups: (a) predicted transition rates from NC to MCI by screening months; (b) predicted transition probabilities from NC to MCI; (c) predicted transition rates from MCI to AD; (d) predicted transition probabilities from MCI to AD. 'D11' denotes patients who stay within NC all the time; 'D22' patients who stay within MCI all the time; 'D23' patients who ever transit from MCI to AD; 'D33' patients who stay within AD all the time.]

4.5 Conclusion and discussion

When positioning the observations on the time axis, we found that aligning them by screening time rather than age yields more distinct pathways for differentiating the transition patterns between diagnosis groups. Technically, we treated the screening time as the timeline along which the Markov chain undergoes transitions from one state to another, while including age as a covariate in the Markov model so that the continuous effect of age on the progression of Alzheimer's Disease is fully identified.

Results from the model without covariates imply that the progression of Alzheimer's Disease is a slowly changing process, especially for subjects in NC. The transition speed from MCI to AD is much faster than that from NC to MCI or from NC to AD, which means MCI subjects face a higher risk of developing Alzheimer's Disease: the average sojourn time in MCI is 4.47 years, so after 4.47 years on average, MCI subjects transit to other states, most likely to AD. Compared with the Kaplan-Meier (KM) curve, which generally models a process with two states and one possible transition from an alive state to a dead state, a fitted non-AD probability curve from the Markov model is more powerful and flexible in modeling longitudinal data with multiple states and any possible transitions among them.

Compared to the model without any covariates, the model with age is not significant (p-value 0.1733), while the models with ApoE4 (p-value 0.0002), with volumes (p-value 0), with cognitive scores (p-value 0), and the full model (p-value 0) are all significant. Compared to the model with volumes, the models with age (p-value 1) and with ApoE4 (p-value 1) are not significant, while the model with cognitive testing scores (p-value 0) and the full model (p-value 0) are significant.

Besides the survival probability and the mean sojourn time, the Markov chain model gives us two main products that can be used for prediction: transition rates and transition probabilities.
First, we fitted the Markov model to obtain the corresponding transition rates and transition probabilities, and then used machine learning methods to build the connection between these two products and the transition states. From the leave-one-out cross-validation results, the highest AUC (area under the curve) we obtained is 0.8583. The key property of a continuous-time Markov chain is that the next state depends on the past only through the current state. Because of this property, the Markov chain is better suited to one-step prediction than to multi-step prediction. Moreover, the longitudinal data used here are clearly interval censored; we did not design a special scheme to model the interval censoring in this work. There are other limitations as well. On the one hand, the results and findings are based only on this particular data set. On the other hand, although ADNI1 is the largest available longitudinal study of this kind, we did not consider the data from ADNI-GO or ADNI2.

APPENDICES

Appendix 1 Multi-state Markov model

Let {S(t), t ≥ 0} be a multi-state Markov process with n different states, and let S be the set of states; for simplicity, assume S = {1, . . . , n}. The process is Markovian if for all t, t' > 0 and r, r' ∈ S,

P(S(t + t') = r' | S(t) = r, F_u, u < t) = P(S(t + t') = r' | S(t) = r),   (A.1)

where F_u is the observation history of the Markov process up to time u. That is, the distribution of the state at a future time t + t', given the history of the process up to the present time t, depends on that history only through the state at time t. Given the initial distribution of the process at time 0, a multi-state Markov process can be characterized by the matrix of transition probabilities, P(t, t + t') = (p_{rr'}(t, t + t')), where p_{rr'}(t, t + t') = P(S(t + t') = r' | S(t) = r). Note that each row of P(t, t + t') always sums to 1. Alternatively, we can introduce the matrix of transition intensities, Q(t) = (q_{rr'}(t)), with

q_{rr'}(t) = \lim_{t' \to 0} P(S(t + t') = r' | S(t) = r) / t'.   (A.2)

q_{rr'} can be interpreted as the instantaneous hazard of progression to stage r' from stage r (see, e.g., Jackson et al., 2003 [53]). If the process S(t) is assumed to be homogeneous in time, q_{rr'}(t) is independent of t. Equation (A.2) implies that the intensity matrix Q behaves like the derivative of the transition probability matrix P. Thus each row of Q sums to 0, so that q_{rr}(t) = -\sum_{r' \neq r} q_{rr'}(t) for each r ∈ S. From the Chapman-Kolmogorov equations, one can show that P(t) = \exp(tQ) for a homogeneous continuous-time Markov process (see, e.g., Whitt, 2012 [109]).

It is of interest to incorporate additional information about each subject into the model for AAA progression. This can be done by introducing a regression component into the intensity matrix Q. Let X(t) be the set of explanatory variables available at time t. We then model the intensity matrix Q by

q_{rr'}(t, X(t)) = q_{0,rr'} \exp(\beta_{rr'}^T X(t)),   (A.3)

where \beta_{rr'} is the set of regression parameters corresponding to the explanatory variables X(t) for r ≠ r'. Note that q_{rr'} must be positive for r ≠ r', while \beta_{rr'}^T X(t) need not be, so an exponential link between the two quantities is natural. The diagonal entries are specified via q_{rr}(t) = -\sum_{r' \neq r} q_{rr'}(t), and q_{0,rr'} can be interpreted as a baseline intensity that applies when X(t) does not affect the process.
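As a companion to (A.3), here is a brief Python sketch of how a covariate-adjusted intensity matrix could be assembled from baseline intensities and regression coefficients; the three-state layout, the single covariate, and all numbers are hypothetical.

import numpy as np

def intensity_matrix(Q0, B, x):
    # q_{rr'}(x) = q0_{rr'} * exp(b_{rr'} * x) for r != r', as in (A.3) with one
    # scalar covariate x; the diagonal is then set so each row sums to zero.
    Q = Q0 * np.exp(B * x)
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q

# Hypothetical baseline rates q0 and coefficients b; x could be centered age.
Q0 = np.array([[0.00, 0.15, 0.00],
               [0.02, 0.00, 0.23],
               [0.00, 0.00, 0.00]])
B  = np.array([[0.00, 0.03, 0.00],
               [0.01, 0.00, 0.05],
               [0.00, 0.00, 0.00]])
print(intensity_matrix(Q0, B, x=1.5))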
Note that the term 'homogeneous' in homogeneous Markov chains means that the transition intensities q_{rs} are independent of the time t. In the homogeneous Markov model, the sojourn time in each state r is exponentially distributed with mean -1/q_{rr}, and the probability that an individual in state r moves to state s is -q_{rs}/q_{rr}. For more details, please refer to [75].

Appendix 2 Specification for AAA progression

For AAA stages, we consider the following intensity matrix:

Q = \begin{pmatrix} -q_{12} & q_{12} & 0 & 0 \\ 0 & -q_{23} & q_{23} & 0 \\ 0 & q_{32} & -(q_{32} + q_{34}) & q_{34} \\ 0 & 0 & 0 & 0 \end{pmatrix}.   (A.4)

The structure in (A.4) represents a multi-state model. The model also assumes that the last stage is absorbing, i.e., the process cannot return to another stage once it enters the absorbing stage. Since an AAA that has been treated with surgery or has ruptured cannot move from state 4 (fatal) back to a previous state, the absorbing-stage assumption is reasonable for AAA data. We estimate q_{12}, q_{23}, q_{32} and q_{34} by maximum likelihood via the EM algorithm (Jackson et al., 2003 [53]). The EM algorithm requires initial values. The mean sojourn time, -1/q_{rr}, can be approximated by (total years spent in state r) / (total transitions out of state r), and initial values for q_{rr'} then follow from q_{rr} = -\sum_{r' \neq r} q_{rr'}. For the AAA data, the mean period in state 1 before moving to state 2 is about 2 years; thus, we take q_{11} = -0.5 and q_{12} = 0.5 as initial values. Jackson (2011) [16] developed the R package msm (available at www.r-project.org) for multi-state and hidden Markov models; we use this package for our analysis.
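To make the specification concrete, the small Python sketch below builds the intensity matrix in (A.4) from made-up rate values and reads off the implied mean sojourn times -1/q_{rr} discussed above.

import numpy as np

# Hypothetical rates for the four-state AAA model in (A.4); state 4 is absorbing.
q12, q23, q32, q34 = 0.5, 0.3, 0.1, 0.4

Q = np.array([[-q12,   q12,          0.0,  0.0],
              [ 0.0,  -q23,          q23,  0.0],
              [ 0.0,   q32, -(q32 + q34),  q34],
              [ 0.0,   0.0,          0.0,  0.0]])

# Mean sojourn time in each transient state r is -1/q_rr.
sojourn = -1.0 / np.diag(Q)[:3]
print(sojourn)  # state 1: 1/0.5 = 2.0 years, matching the initial values above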
Chapter 5 Conclusions, Discussion, and Directions for Future Research

5.1 Conclusions and discussion

Data analytics is playing an increasingly crucial role in this era of information explosion. Countless events, situations, systems and phenomena are waiting for statisticians to explore. Great effort and intelligence have been devoted to exploring the intractable sources of uncertainty, the complex structures hidden behind observations, the unknown interactions and dependencies across space and time, the ultra-sparse signals immersed in noise, the evolutionary paths of high-dimensional networks, and much more. In this dissertation, Bayesian calibration is used to explore the intractable sources of uncertainty and the complex structures hidden in the system of AAA enlargement, and a Markov model is used to explore the unknown interactions and dependencies across space and time in the brain images of patients with Alzheimer's Disease. Still, many research questions and aspects remain to be addressed in the future. Although Bayesian calibration has been widely applied to big and complex data from engineering, clinical sciences, climate change, physics, and other fields, theoretical results on convergence and consistency have not been fully developed yet. For brain imaging analysis, although we used a Markov chain to analyze the regional volumes obtained from MRI of patients with Alzheimer's Disease, further effort should be dedicated to other types of data, such as genetic data, PET scans, DTI and fMRI, and to the related research questions. Another important concept that has not been discussed is dimension reduction.

To tackle the high dimensionality, variability and complexity of big data, I mainly pursue two directions of statistical research toward innovative theories and methodologies with broad impact in data science: hierarchical modeling and dimension reduction (penalization). The future research therefore focuses on three aspects: consistency theory for Bayesian calibration, Bayesian variable selection, and dynamic networks with sparsity regularization.

5.2 Directions for future research

5.2.1 Bayesian consistency

In most cases, the true value of the calibration parameters cannot be measured or observed physically. Therefore, it is necessary to define the true parameter value and then provide a theoretical study showing that the estimate of the calibration parameter converges to the true value in probability. Tuo and Wu (2016) [103] provided a theoretical framework for calibration in computer models by considering simplifications such as dropping the prior. I first introduce Tuo and Wu's work, and then propose future directions of research.

The real observation and the computer model are related through

z^p(x) = r^s(x, \theta_0) + \delta(x) + e,   (5.1)

where \theta_0 is the true value of the calibration parameter and \delta(\cdot) is an unknown discrepancy function. If the physical system is deterministic, the observational error is neglected and we get

z^p(x) = r^s(x, \theta_0) + \delta(x),   (5.2)

where \theta_0 is the true value of the calibration parameter and \delta is the discrepancy between z^p and r^s(\cdot, \theta_0). Usually \delta is a nonzero function, because the computer code is built on assumptions or simplifications that do not match reality exactly. Define

\varepsilon(x, \theta) := z^p(x) - r^s(x, \theta).   (5.3)

The discrepancy function \delta is a realization of a Gaussian process with mean 0 and covariance function \sigma^2 \Phi, where \sigma^2 is an unknown parameter and the function \Phi is known. We adopt the L2 norm in defining the distance. The L2 projection of \theta is given by

\theta^* = \arg\min_{\theta \in \Theta} \| \varepsilon(\cdot, \theta) \|_{L_2(\Omega)}.   (5.4)

The L2 norm comes from the common use of the quadratic loss criterion: the expected quadratic loss between the prediction and the computer model given \theta is

\int_\Omega (z^p(x) - r^s(x, \theta))^2 \, dx = \| \varepsilon(\cdot, \theta) \|_{L_2(\Omega)}^2.

Because we do not know the true value of \theta, the authors define the true calibration parameter as the value \theta^* that minimizes the average predictive error in (5.4). Maximum likelihood estimation (MLE) is used to estimate (\theta, \sigma^2) instead of a Bayesian analysis. Under these assumptions, the functions \varepsilon(x_i, \cdot) are known for i = 1, . . . , n. It can then be shown that the log-likelihood function for (\theta, \sigma^2) is

l(\theta, \sigma^2; Y) = -\frac{n}{2} \log \sigma^2 - \frac{1}{2} \log |\Phi| - \frac{1}{2\sigma^2} \varepsilon(x, \theta)^T \Phi^{-1} \varepsilon(x, \theta),   (5.5)

where \varepsilon(x, \theta) = (\varepsilon(x_1, \theta), . . . , \varepsilon(x_n, \theta))^T and \Phi = (\Phi(x_i, x_j))_{ij}. The authors then discuss the consistency of the maximum likelihood estimate for the true parameter so defined.

I would like to generalize the above theory from the frequentist setup to the Bayesian setup. Following the same logic, I first need to define the true value. Define the true process as \zeta(\cdot) = r^s(\cdot, \theta) + \delta(\cdot). The conditional posterior predictive density of the true process is f(\zeta(x) | \theta, \hat{\psi}, d), while the posterior distribution of the parameter \theta is f(\theta | \hat{\psi}, d). I would like to show that when f(\zeta(x) | \theta, \hat{\psi}, d) is consistent for f(\zeta(x) | \theta_0, \hat{\psi}, d), then f(\theta | \hat{\psi}, d) is consistent for f(\theta_0 | \hat{\psi}, d).
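Before turning to consistency notions, here is a toy numerical illustration of the L2 projection in (5.4): a Python sketch that computes \theta^* for a one-dimensional example by minimizing a discretized L2 criterion. The model r^s, the "physical" response z^p, and the grid are all hypothetical.

import numpy as np
from scipy.optimize import minimize_scalar

def r_s(x, theta):
    return np.sin(theta * x)               # hypothetical computer model

def z_p(x):
    return np.sin(2.0 * x) + 0.05 * x**2   # hypothetical truth with discrepancy

x = np.linspace(0.0, 1.0, 200)             # discretization of Omega = [0, 1]

def l2_crit(theta):
    eps = z_p(x) - r_s(x, theta)           # eps(x, theta) as in (5.3)
    return np.mean(eps**2)                 # discretized L2 criterion from (5.4)

res = minimize_scalar(l2_crit, bounds=(0.0, 5.0), method="bounded")
print(res.x)  # theta*; close to 2 here because the discrepancy term is small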
Two popular methods for proving Bayesian consistency are based on the Kullback-Leibler divergence and on the Hellinger distance. Walker (2004) [106] derives sufficient conditions for both Hellinger and Kullback-Leibler consistency and proves an equivalence between them. The Hellinger distance is defined as

d_H(f, f_0) = \left( \int (\sqrt{f} - \sqrt{f_0})^2 \right)^{1/2} = \left( 2 \left( 1 - \int \sqrt{f f_0} \right) \right)^{1/2}.

For Hellinger consistency, the required result we are aiming for is

\Pi_n(\{f : d_H(f, f_0) > \epsilon\}) \to 0 \quad a.s. \ [F_0^\infty].

The Kullback-Leibler divergence between f and f_0 is d_K(f, f_0) = \int f_0 \log(f_0/f), and the Kullback-Leibler property is given by

\Pi(\{f : d_K(f, f_0) < \epsilon\}) > 0.

For more details on the consistency and convergence of posterior distributions, please refer to Ghosal (1997) [41].

5.2.2 Variable selection in Bayesian statistics

Penalization methods for high-dimensional problems, the flavor of the last decade, have enjoyed much attention, resulting in a rich theoretical literature establishing their optimality properties. There are also fast algorithms and compelling applied results underlining the success story of these methods. In the regression setup, it is common to view the negative log prior density as a penalty function. As a result, a Gaussian prior is equivalent to L2-penalty regularization, while a Laplacian prior is equivalent to L1-penalty regularization; similarly, the idea of a weighted mixture prior resembles that of elastic-net regularization. Model improvement typically comes from priors that place more probability around zero and have heavier tails over the nonzero region, because variable selection techniques aim to balance shrinking truly zero coefficients to be exactly zero against keeping truly nonzero coefficients away from zero. With these considerations, the horseshoe prior (Carvalho et al., 2010 [11]) comes into the picture, since its density is unbounded at zero and has Cauchy-like tails. Although exact zeros can be achieved with spike-and-slab priors (George and McCulloch, 1997 [38]), the computational burden is high, since sampling from the posterior of the binary indicator variables is not easy, especially in high dimensions. Several optimality results are available for the horseshoe and horseshoe+ priors (Carvalho et al., 2010 [11]; Datta et al., 2013 [26]; van der Pas et al., 2014 [104]). However, to the best of my knowledge, all of these results concern the estimation of the multivariate normal mean, and a satisfactory theory of uncertainty quantification for penalized estimators is not yet in place. I hope to be able to choose a default shrinkage prior that leads to optimality properties similar to those shown for L1 penalization and other approaches, but with the added advantage that the entire posterior distribution concentrates at the optimal rate, instead of just the point estimate.

In the following, I introduce Bayesian variable selection using the inverted beta distribution (the horseshoe prior is a special case). For the regression situation involving the observation of a dependent variable Y and a set of potential predictors X_1, . . . , X_p, we consider the canonical regression setup

Y | X, \beta, \sigma^2 \sim N_n(X\beta, \sigma^2 I),   (5.6)

where Y is n × 1, X = [X_1, . . . , X_p] is n × p, \beta = (\beta_1, . . . , \beta_p)', and \sigma^2 is a scalar. Both \beta and \sigma^2 are considered unknown. We describe the prior of \beta as a multivariate normal,

\pi(\beta | \sigma^2, \lambda) = N_p(0, \Upsilon_{(\sigma, \lambda)}).   (5.7)
The residual variance \sigma^2 is conveniently modeled as a realization from an inverse gamma prior,

\pi(\sigma^2) = IG(\nu/2, \nu\omega/2),   (5.8)

which is equivalent to \nu\omega/\sigma^2 \sim \chi^2_\nu. It might be desirable to have \omega decrease with p. We consider the special case in which \Upsilon_{(\sigma, \lambda)} in (5.7) is of the form

\pi(\beta | \sigma, \lambda) = N_p(0, \sigma^2 D(\lambda) R D(\lambda)),   (5.9)

where D(\lambda) is diagonal and R is a correlation matrix. We denote the i-th diagonal element of D^2(\lambda) by (D^2(\lambda))_{ii} = \lambda^2. Each component of \beta is modeled as a normal distribution,

\pi(\beta_i | \sigma^2, \lambda^2) = N(0, \lambda^2 \sigma^2).   (5.10)

The scale parameter \lambda^2 follows an inverted beta distribution:

p(\lambda^2 | a, b) = \frac{(\lambda^2)^{b-1} (1 + \lambda^2)^{-(a+b)}}{\mathrm{Beta}(a, b)},   (5.11)

where Beta(a, b) denotes the beta function and a and b are positive reals. The inverted beta density is plotted for different values of the parameters a and b in Figure 5.1. When b ≤ 1, the density is unbounded at the origin; when b > 1, the density is 0 at the origin. The behaviors (convergence rates) at the tail and near the origin change with the parameters a and b, which suggests that the inverted beta distribution can serve as a flexible and versatile prior for Bayesian variable selection: it suffices to model different levels of sparsity if a and b are carefully chosen. The horseshoe prior is just the special case a = 0.5 and b = 0.5.

[Figure 5.1. Inverted Beta distribution when parameters a and b take different values.]

The generalized beta prime distribution has probability density function

f(x; \alpha, \beta, p, q) = \frac{p \, (x/q)^{\alpha p - 1} \left( 1 + (x/q)^p \right)^{-(\alpha + \beta)}}{q \, \mathrm{Beta}(\alpha, \beta)},

where p > 0 controls the shape and q > 0 controls the scale. Shown below are the two modeling options I aim to pursue if we would like to use a global-local prior. The first puts the global parameter \tau at a level above the local parameters \lambda_i in the hierarchy: each regression coefficient \beta_i is controlled by a local parameter \lambda_i, while each \lambda_i is governed by the global parameter \tau:

Y | \beta, \sigma^2 \sim N_n(X\beta, \sigma^2 I),
\beta | \sigma^2, \lambda_i \sim N_p(0, \sigma^2 D(\lambda_i) R D(\lambda_i)),
\beta_i | \sigma^2, \lambda_i^2 \sim N(0, \lambda_i^2 \sigma^2),
\pi(\lambda_i | \tau, a_1, b_1) = \frac{(\lambda_i/\tau)^{a_1 - 1} (1 + \lambda_i/\tau)^{-(a_1 + b_1)}}{\tau \, \mathrm{Beta}(a_1, b_1)},
\pi(\tau | a_0, b_0) = \frac{\tau^{a_0 - 1} (1 + \tau)^{-(a_0 + b_0)}}{\mathrm{Beta}(a_0, b_0)}.   (5.12)

The second puts the global parameter \tau and the local parameters \lambda_i at the same level of the hierarchy, but uses their product to control the magnitude of each regression coefficient \beta_i:

Y | \beta, \sigma^2 \sim N_n(X\beta, \sigma^2 I),
\beta | \sigma^2, \lambda_i, \tau \sim N_p(0, \sigma^2 D(\lambda_i, \tau) R D(\lambda_i, \tau)),
\beta_i | \sigma^2, \lambda_i^2, \tau^2 \sim N(0, \lambda_i^2 \tau^2 \sigma^2),
\pi(\lambda_i | a_1, b_1) = \frac{\lambda_i^{a_1 - 1} (1 + \lambda_i)^{-(a_1 + b_1)}}{\mathrm{Beta}(a_1, b_1)},
\pi(\tau | a_0, b_0) = \frac{\tau^{a_0 - 1} (1 + \tau)^{-(a_0 + b_0)}}{\mathrm{Beta}(a_0, b_0)}.   (5.13)
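To illustrate the second hierarchy (5.13), the Python sketch below draws regression coefficients under the product global-local scale. It relies on the fact that the inverted beta density above coincides with scipy's betaprime distribution; the dimension and all shape values are illustrative.

import numpy as np
from scipy.stats import betaprime, norm

rng = np.random.default_rng(0)
p, sigma = 50, 1.0
a1, b1 = 0.5, 0.5   # local shape parameters (a horseshoe-like choice)
a0, b0 = 0.5, 0.5   # global shape parameters

# Per (5.13): lambda_i and tau follow inverted beta (beta prime) priors, and
# beta_i | lambda_i, tau, sigma ~ N(0, lambda_i^2 * tau^2 * sigma^2).
lam  = betaprime.rvs(a1, b1, size=p, random_state=rng)  # local scales
tau  = betaprime.rvs(a0, b0, random_state=rng)          # global scale
beta = norm.rvs(scale=lam * tau * sigma, random_state=rng)

print(np.sum(np.abs(beta) < 0.1), "of", p, "coefficients shrunk near zero")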
5.2.3 Bayesian Dynamic network

When applied to the brain, the term connectivity refers to several different and interrelated aspects of brain organization (Horwitz, 2003 [52]). A fundamental distinction is between structural connectivity, functional connectivity, and effective connectivity. Functional connectivity is the connectivity between brain regions that share functional properties. More specifically, it can be defined as the temporal correlation between spatially remote neurophysiological events, expressed as deviation from statistical independence across these events in distributed neuronal groups and areas. This applies to both resting-state and task-state studies.

Let G = (V, E) be a graph with vertex (node) set V and edge set E. A node u ∈ V represents an entity (the intensity of a voxel) and an edge (u, v) ∈ E represents a connectivity. The precision matrix (inverse covariance matrix) is a popular way to describe an undirected graph, since a zero entry encodes conditional independence between the corresponding two nodes. Friedman et al. (2008) [33] developed a simple algorithm for estimating sparse undirected graphical models through L1 (lasso) regularization.

For each patient, the data Y = (y_{t,v}) form a matrix in which row t (t ∈ {1, . . . , T}) indexes time, column v (v ∈ {1, . . . , V}) indexes a voxel, and the entry y_{t,v} is an intensity. Each column is time-ordered. The precision matrix estimated from Y should therefore be a function of time, since the data are both spatial and temporal. From the application's perspective, we obtain a time-varying undirected graph, called a dynamic network. As shown in Figure 5.2, the network topology at time t − 1 differs from that at time t. The topology includes several aspects: the organization of voxels into nodes, the number of nodes, and the connectivities between nodes. When considering network changes between two time points, we may consider changes in node intensity, changes in the number of nodes, and changes in network connectivities; together we call these topology transitions. For more details, see Zhou et al. (2010) [120] and Monti et al. (2014) [72].

[Figure 5.2: schematic of the brain network at time t − 1 and at time t.]

Figure 5.2. Group 1, Group 2 and Group 3 (big circles) denote the nodes of the brain network. The thick solid lines denote the connectivities between nodes; as we can see, the connectivity changes with time. Voxels (small circles) denote the actual observations.

Because the precision matrix is positive definite, modeling and penalizing the time-varying precision matrix directly is difficult. We have to decompose the precision matrix into parts that do not carry a positive-definiteness requirement. Suppose each precision matrix \Omega_t has an "LDL"-type decomposition \Omega_t = L D_t L', where D_t is a diagonal matrix and L is an upper triangular matrix; this form is unique by the Cholesky decomposition. We assume that the diagonal matrix D_t evolves with time, and we model the sparsity of the dynamic network by penalizing the Cholesky factor. A sparse Cholesky factor, however, does not guarantee a sparse symmetric positive-definite matrix; Rothman et al. (2010) [88] uncovered a subtle relationship between the sparsity of the Cholesky factor and the sparsity of the covariance matrix. When the connectivity representation L changes with time, it is more plausible that every \Omega_t can be decomposed exactly as L_t D_t D_t (L_t)'. The penalized graphical model likelihood, with the other hierarchies, then becomes

Q(L, D, \beta, \sigma) = \sum_{t=1}^{T} \left( \mathrm{tr}(S_t L_t D_t D_t (L_t)') - \log\det(L_t D_t D_t (L_t)') - \lambda_1 \|L_t\|_1 \right)
                       + \sum_{t=2}^{T} \left( \frac{1}{2(\sigma^t)^2} \sum_{i=1}^{p} \left( \log(d_i^t) - \beta_i^t \log(d_i^{t-1}) \right)^2 + \frac{p}{2} \log(\sigma^t) \right),

where d_i^t is the i-th diagonal entry of D_t.

BIBLIOGRAPHY

[1] Allen, J. S., Damasio, H., and Grabowski, T. J. Normal neuroanatomical variation in the human brain: An mri-volumetric study. American journal of physical anthropology 118, 4 (2002), 341–358.
[2] Aparicio, P., Mandaltsi, A., Boamah, J., Chen, H., Selimovic, A., Bratby, M., Uberoi, R., Ventikos, Y., and Watton, P.
Modelling the influence of endothelial heterogeneity on the progression of arterial disease: application to abdominal aortic aneurysm evolution. International Journal for Numerical Methods in Biomedical Engineering 30, 5 (2014), 563–586.
[3] Arzani, A., Les, A., Dalman, R., and Shadden, S. Effect of exercise on patient specific abdominal aortic aneurysm flow topology and mixing. International Journal for Numerical Methods in Biomedical Engineering 30, 2 (2014), 280–295.
[4] Barnes, D. E., and Lee, S. J. Predicting alzheimer's risk: why and how? Alzheimer's Research & Therapy 3, 6 (2011), 1–3.
[5] Berger, J. Statistical decision theory and bayesian inference. Springer-Verlag, New York, 1985.
[6] Bergerson, J., and Muehleisen, R. Bayesian large model calibration using simulation and measured data for improved predictions. SAE International Journal of Passenger Cars-Mechanical Systems 8, 2015-01-0481 (2015), 415–420.
[7] Biehler, J., Gee, M. W., and Wall, W. A. Towards efficient uncertainty quantification in complex and large-scale biomechanical problems based on a bayesian multifidelity scheme. Biomechanics and modeling in mechanobiology 14, 3 (2015), 489–513.
[8] Bouwman, F., Schoonenboom, S., van Der Flier, W., Van Elk, E., Kok, A., Barkhof, F., Blankenstein, M., and Scheltens, P. Csf biomarkers and medial temporal lobe atrophy predict dementia in mild cognitive impairment. Neurobiology of aging 28, 7 (2007), 1070–1074.
[9] Boyd, S., and Vandenberghe, L. Convex optimization. Cambridge university press, 2004.
[10] Breslow, N. Biostatistics and bayes. Statistical Science (1990), 269–284.
[11] Carvalho, C. M., Polson, N. G., and Scott, J. G. The horseshoe estimator for sparse signals. Biometrika 97, 2 (2010), 465–480.
[12] Cedarbaum, J. M., Jaros, M., Hernandez, C., Coley, N., Andrieu, S., Grundman, M., Vellas, B., Initiative, A. D. N., et al. Rationale for use of the clinical dementia rating sum of boxes as a primary outcome measure for alzheimers disease clinical trials. Alzheimer's & Dementia 9, 1 (2013), S45–S55.
[13] Chaikof, E. L., Brewster, D. C., Dalman, R. L., Makaroun, M. S., Illig, K. A., Sicard, G. A., Timaran, C. H., Upchurch, G. R., and Veith, F. J. The care of patients with an abdominal aortic aneurysm: the society for vascular surgery practice guidelines. Journal of Vascular Surgery 50, 4 (2009), S2–S49.
[14] Chen, Y., and Pham, T. D. Development of a brain mri-based hidden markov model for dementia recognition. Biomedical engineering online 12, Suppl 1 (2013), S2.
[15] Chib, S., and Greenberg, E. Understanding the Metropolis-Hastings Algorithm. American Statistician 49, 4 (1995), 327–335.
[16] Jackson, C. H. Multi-state models for panel data: The msm package for R. Journal of Statistical Software 38, 8 (2011), 1–28.
[17] Commenges, D. Multi-state models in epidemiology. Lifetime data analysis 5, 4 (1999), 315–327.
[18] Commenges, D. Inference for multi-state models from interval-censored data. Statistical methods in medical research 11, 2 (2002), 167–182.
[19] Commenges, D., Joly, P., Letenneur, L., and Dartigues, J.-F. Incidence and mortality of alzheimer's disease or dementia using an illness-death model. Statistics in medicine 23, 2 (2004), 199–210.
[20] Corder, E., Saunders, A., Strittmatter, W., Schmechel, D., Gaskell, P., Small, G., Roses, A. D., Haines, J., and Pericak-Vance, M. A. Gene dose of apolipoprotein e type 4 allele and the risk of alzheimer's disease in late onset families. Science 261, 5123 (1993), 921–923.
[21] Crane, P. K., Carle, A., Gibbons, L.
E., Insel, P., Mackin, R. S., Gross, A., Jones, R. N., Mukherjee, S., Curtis, S. M., Harvey, D., et al. Development and assessment of a composite score for memory in the alzheimers disease neuroimaging initiative (adni). Brain imaging and behavior 6, 4 (2012), 502–516.
[22] Currin, C., Mitchell, T., Morris, M., and Ylvisaker, D. Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments. Journal of the American Statistical Association 86, 416 (1991), 953–963.
[23] Cyron, C., and Humphrey, J. Vascular homeostasis and the concept of mechanobiological stability. International journal of engineering science 85 (2014), 203–223.
[24] Cyron, C., Wilson, J., and Humphrey, J. Mechanobiological stability: a new paradigm to understand the enlargement of aneurysms? Journal of The Royal Society Interface 11, 100 (2014), 20140680.
[25] Dahabreh, I., Chan, J., Earley, A., Moorthy, D., Avendano, E., Trikalinos, T., Balk, E., and Wong, J. Modeling and simulation in the context of health technology assessment: Review of existing guidance, future research needs, and validity assessment.
[26] Datta, J., Ghosh, J. K., et al. Asymptotic properties of bayes risk for the horseshoe prior. Bayesian Analysis 8, 1 (2013), 111–132.
[27] Dawkins, C., Srinivasan, T. N., and Whalley, J. Calibration. Handbook of econometrics 5 (2001), 3653–3703.
[28] Desikan, R., Fischl, B., Cabral, H., Kemper, T., Guttmann, C., Blacker, D., Hyman, B., Albert, M., and Killiany, R. Mri measures of temporoparietal regions show differential rates of atrophy during prodromal ad. Neurology 71, 11 (2008), 819–825.
[29] Dietterich, T. Overfitting and undercomputing in machine learning. ACM computing surveys (CSUR) 27, 3 (1995), 326–327.
[30] Dunsmore, I. A bayesian approach to calibration. Journal of the Royal Statistical Society. Series B (Methodological) 30, 2 (1968), 396–405.
[31] Erhart, P., Hyhlik-Dürr, A., Geisbüsch, P., Kotelis, D., Müller-Eschner, M., Gasser, T. C., von Tengg-Kobligk, H., and Böckler, D. Finite element analysis in asymptomatic, symptomatic, and ruptured abdominal aortic aneurysms: in search of new rupture risk predictors. European Journal of Vascular and Endovascular Surgery 49, 3 (2015), 239–245.
[32] Fjell, A. M., McEvoy, L., Holland, D., Dale, A. M., Walhovd, K. B., Initiative, A. D. N., et al. What is normal in normal aging? effects of aging, amyloid and alzheimer's disease on the cerebral cortex and the hippocampus. Progress in neurobiology 117 (2014), 20–40.
[33] Friedman, J., Hastie, T., and Tibshirani, R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 3 (2008), 432–441.
[34] Frisoni, G. B., Fox, N. C., Jack, C. R., Scheltens, P., and Thompson, P. M. The clinical use of structural mri in alzheimer disease. Nature Reviews Neurology 6, 2 (2010), 67–77.
[35] Gaser, C., Franke, K., Klöppel, S., Koutsouleris, N., Sauer, H., Initiative, A. D. N., et al. Brainage in mild cognitive impaired patients: predicting the conversion to alzheimers disease. PloS one 8, 6 (2013), e67346.
[36] Gasser, T. C. An irreversible constitutive model for fibrous soft biological tissue: a 3d microfiber approach with demonstrative application to abdominal aortic aneurysms. Acta biomaterialia 7, 6 (2011), 2457–2466.
[37] Gelb, D. J. Measurement of progression in alzheimer's disease: a clinician's perspective. Statistics in Medicine 19, 11-12 (2000), 1393–1400.
[38] George, E. I., and McCulloch, R. E. Approaches for bayesian variable selection.
Statistica sinica (1997), 339–373.
[39] Gerdes, L. U., Klausen, I. C., Sihm, I., Faergeman, O., and Vogler, G. Apolipoprotein e polymorphism in a danish population compared to findings in 45 other study populations around the world. Genetic epidemiology 9, 3 (1992), 155–167.
[40] Gharahi, H., Zambrano, B., Lim, C., Choi, J., Lee, W., and Baek, S. On growth measurements of abdominal aortic aneurysms using maximally inscribed sphere, 2015.
[41] Ghosal, S. A review of consistency and convergence of posterior distribution. In Varanashi Symposium in Bayesian Inference, Banaras Hindu University (1997).
[42] Giorgio, A., and De Stefano, N. Clinical use of brain volumetry. Journal of Magnetic Resonance Imaging 37, 1 (2013), 1–14.
[43] Godbey, M. W. The use of variable-bagging and the cross-validation selector in the prediction of alzheimers using the adni database.
[44] Goel, M., Khanna, P., and Kishore, J. Understanding survival analysis: Kaplan-Meier estimate. International journal of Ayurveda research 1, 4 (2010), 274.
[45] Hankin, R. K. Introducing BACCO, an R bundle for Bayesian Analysis of Computer Code Output. Journal of Statistical Software 14, 16 (2005), 1–21.
[46] Hansen, L. P., and Heckman, J. J. The empirical foundations of calibration. The Journal of Economic Perspectives 10, 1 (1996), 87–104.
[47] Hawkins-Daarud, A., Prudhomme, S., van der Zee, K. G., and Oden, J. T. Bayesian calibration, validation, and uncertainty quantification of diffuse interface models of tumor growth. Journal of mathematical biology 67, 6-7 (2013), 1457–1485.
[48] Heckerman, D., et al. A tutorial on learning with bayesian networks. Nato Asi Series D Behavioural And Social Sciences 89 (1998), 301–354.
[49] Higdon, D., Gattiker, J., Williams, B., and Rightley, M. Computer model calibration using high-dimensional output. Journal of the American Statistical Association 103, 482 (2008).
[50] Higdon, D., Kennedy, M., Cavendish, J. C., Cafeo, J. A., and Ryne, R. D. Combining field data and computer simulations for calibration and prediction. SIAM Journal on Scientific Computing 26, 2 (2004), 448–466.
[51] Holland, J. The reverend thomas bayes, frs (1702-61). Journal of the Royal Statistical Society. Series A (General) 125, 3 (1962), 451–461.
[52] Horwitz, B. The elusive concept of brain connectivity. Neuroimage 19, 2 (2003), 466–470.
[53] Jackson, C. H., Sharples, L. D., Thompson, S. G., Duffy, S. W., and Couto, E. Multistate markov models for disease progression with classification error. Journal of the Royal Statistical Society: Series D (The Statistician) 52, 2 (2003), 193–209.
[54] Jost, B. C., and Grossberg, G. T. The natural history of alzheimer's disease: A brain bank study. Journal of the American Geriatrics Society 43, 11 (1995), 1248–1255.
[55] Wilson, J. S., Baek, S., and Humphrey, J. Importance of initial aortic properties on the evolving regional anisotropy, stiffness and wall thickness of human abdominal aortic aneurysms. Journal of the Royal Society Interface 9, 74 (2012), 2047–2058.
[56] Kadane, J. B. Prime time for bayes. Controlled Clinical Trials 16, 5 (1995), 313–318.
[57] Kawas, C., Gray, S., Brookmeyer, R., Fozard, J., and Zonderman, A. Age-specific incidence rates of alzheimers disease the baltimore longitudinal study of aging. Neurology 54, 11 (2000), 2072–2077.
[58] Kennedy, M. C., and O'Hagan, A. Bayesian calibration of computer models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63, 3 (2001), 425–464.
[59] Klink, A., Hyafil, F., Rudd, J., Faries, P., Fuster, V., Mallat, Z., Meilhac, O., Mulder, W. J., Michel, J.-B., Ramirez, F., et al. Diagnostic and therapeutic strategies for small abdominal aortic aneurysms. Nature Reviews Cardiology 8, 6 (2011), 338–347.
[60] Kniemeyer, H., Kessler, T., Reber, P., Ris, H., Hakki, H., and Widmer, M. Treatment of ruptured abdominal aortic aneurysm, a permanent challenge or a waste of resources? prediction of outcome using a multi-organ-dysfunction score. European Journal of Vascular and Endovascular Surgery 19, 2 (2000), 190–196.
[61] Kwon, S., Burek, W., Dupay, A., Farsad, M., Baek, S., Park, E., and Lee, W. Interaction of expanding abdominal aortic aneurysm with surrounding tissue: Retrospective ct image studies. Journal of nature and science 1, 8 (2015), e150.
[62] Kwon, S., Rectenwald, J., and Baek, S. Intrasac pressure changes and vascular remodeling after endovascular repair of abdominal aortic aneurysms: review and biomechanical model simulation. Journal of biomechanical engineering 133, 1 (2011), 011011.
[63] Lederle, F. A., Wilson, S. E., Johnson, G. R., Reinke, D. B., Littooy, F. N., Acher, C. W., Ballard, D. J., Messina, L. M., Gordon, I. L., Chute, E. P., et al. Immediate repair compared with surveillance of small abdominal aortic aneurysms. New England Journal of Medicine 346, 19 (2002), 1437–1444.
[64] Likeman, M., Anderson, V. M., Stevens, J. M., Waldman, A. D., Godbolt, A. K., Frost, C., Rossor, M. N., and Fox, N. C. Visual assessment of atrophy on magnetic resonance imaging in the diagnosis of pathologically confirmed young-onset dementias. Archives of neurology 62, 9 (2005), 1410–1415.
[65] Lilford, R., and Braunholtz, D. For debate: The statistical basis of public policy: a paradigm shift is overdue. Bmj 313, 7057 (1996), 603–607.
[66] Lorenzi, M., Pennec, X., Frisoni, G. B., Ayache, N., Initiative, A. D. N., et al. Disentangling normal aging from alzheimer's disease in structural magnetic resonance images. Neurobiology of aging 36 (2015), S42–S52.
[67] Machin, D., Cheung, Y. B., and Parmar, M. K. Cox's proportional hazards model. Survival Analysis: A Practical Approach, Second Edition (2006), 121–153.
[68] Maier, A., Gee, M., Reeps, C., Pongratz, J., Eckstein, H.-H., and Wall, W. A comparison of diameter, wall stress, and rupture potential index for abdominal aortic aneurysm rupture risk prediction. Annals of biomedical engineering 38, 10 (2010), 3124–3134.
[69] McKay, M. D., Beckman, R. J., and Conover, W. J. Comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21, 2 (1979), 239–245.
[70] Meira-Machado, L. F., de Uña-Álvarez, J., Cadarso-Suárez, C., and Andersen, P. Multi-state models for the analysis of time-to-event data. Statistical methods in medical research (2008).
[71] Mills, K. L., and Tamnes, C. K. Methods and considerations for longitudinal structural brain imaging analysis across development. Developmental cognitive neuroscience 9 (2014), 172–190.
[72] Monti, R. P., Hellyer, P., Sharp, D., Leech, R., Anagnostopoulos, C., and Montana, G. Estimating time-varying brain connectivity networks from functional mri time series. NeuroImage 103 (2014), 427–443.
[73] Muehleisen, R. T., and Bergerson, J. Bayesian calibration-what, why and how.
[74] New, J., and Chandler, T. Evolutionary tuning of building models to monthly electrical consumption. ASHRAE Transactions 119 (2013), 89.
[75] Norris, J. R. Markov Chains. No. 7.
Cambridge University Press, 1998.
[76] Laboratory of Neuro Imaging. Alzheimer's disease.
[77] O'Hagan, A. Bayesian analysis of computer code outputs: a tutorial. Reliability Engineering & System Safety 91, 10 (2006), 1290–1300.
[78] Petersen, R. C., Smith, G. E., Waring, S. C., Ivnik, R. J., Tangalos, E. G., and Kokmen, E. Mild cognitive impairment: clinical characterization and outcome. Archives of neurology 56, 3 (1999), 303–308.
[79] Porth, C. Essentials of pathophysiology: Concepts of altered health states, 3rd ed. Lippincott Williams & Wilkins, 2010.
[80] Racine-Poon, A. A bayesian approach to nonlinear calibration problems. Journal of the American Statistical Association 83, 403 (1988), 650–656.
[81] Raftery, P., Keane, M., and O'Donnell, J. Calibrating whole building energy models: An evidence-based methodology. Energy and Buildings 43, 9 (2011), 2356–2364.
[82] Reddy, T. A., Maor, I., and Panjapornpon, C. Calibrating detailed building energy simulation programs with measured data, part i: General methodology (rp-1051). Hvac&R Research 13, 2 (2007), 221–241.
[83] Reeps, C., Maier, A., Pelisek, J., Härtl, F., Grabher-Meier, V., Wall, W. A., Essler, M., Eckstein, H.-H., and Gee, M. W. Measuring and modeling patient-specific distributions of material properties in abdominal aortic aneurysm wall. Biomechanics and Modeling in Mechanobiology 12, 4 (2013), 717–733.
[84] Ribeiro, P. F., Ventura-Antunes, L., Gabi, M., Mota, B., Grinberg, L. T., Farfel, J. M., Ferretti-Rebustini, R. E., Leite, R. E., Herculano-Houzel, S., et al. The human cerebral cortex is neither one nor many: neuronal distribution reveals two quantitatively different zones in the gray matter, three in the white matter, and explains local variations in cortical folding. Front. Neuroanat 7, 28 (2013), 10–3389.
[85] Riddle, M., and Muehleisen, R. T. A guide to bayesian calibration of building energy models. In Building Simulation Conference (2014).
[86] Risacher, S. L., Kim, S., Shen, L., Nho, K., Foroud, T., Green, R. C., Petersen, R. C., Jack Jr, C. R., Aisen, P. S., Koeppe, R. A., et al. The role of apolipoprotein e (apoe) genotype in early mild cognitive impairment (e-mci).
[87] Rizzo, R. J., McCarthy, W. J., Dixit, S. N., Lilly, M. P., Shively, V. P., Flinn, W. R., and Yao, J. S. Collagen types and matrix protein content in human abdominal aortic aneurysms. Journal of Vascular Surgery 10, 4 (1989), 365–373.
[88] Rothman, A. J., Levina, E., and Zhu, J. A new approach to cholesky-based covariance regularization in high dimensions. Biometrika 97, 3 (2010), 539–550.
[89] Sacks, J., Welch, W. J., Mitchell, T. J., and Wynn, H. P. Design and analysis of computer experiments. Statistical science (1989), 409–423.
[90] Salthouse, T. A. Robust cognitive change. Journal of the International Neuropsychological Society 18, 04 (2012), 749–756.
[91] Santner, T. J., Williams, B. J., and Notz, W. I. The design and analysis of computer experiments. Springer Science & Business Media, 2013.
[92] Saunders, A., Strittmatter, W., Schmechel, D., George-Hyslop, P. S., Pericak-Vance, M., Joo, S., Rosi, B., Gusella, J., Crapper-MacLachlan, D., Alberts, M., et al. Association of apolipoprotein e allele 4 with late-onset familial and sporadic alzheimer's disease. Neurology 43, 8 (1993), 1467–1467.
[93] Schmitter, D., Roche, A., Maréchal, B., Ribes, D., Abdulkadir, A., Bach-Cuadra, M., Daducci, A., Granziera, C., Klöppel, S., Maeder, P., et al. An evaluation of volume-based morphometry for prediction of mild cognitive impairment and alzheimer's disease.
NeuroImage: Clinical 7 (2015), 7–17.
[94] Seyedsalehi, S., Zhang, L., Choi, J., and Baek, S. Prior distributions of material parameters for bayesian calibration of growth and remodeling computational model of abdominal aortic wall. J Biomech Eng 137, 10 (2015), 101001.
[95] Sheidaei, A., Hunley, S., Zeinali-Davarani, S., Raguin, L., and Baek, S. Simulation of abdominal aortic aneurysm growth with updating hemodynamic loads using a realistic geometry. Medical engineering & physics 33, 1 (2011), 80–88.
[96] Shi, J., Lepore, N., Gutman, B. A., Thompson, P. M., Baxter, L. C., Caselli, R. J., and Wang, Y. Genetic influence of apolipoprotein e4 genotype on hippocampal morphometry: An n = 725 surface-based alzheimer's disease neuroimaging initiative study. Human brain mapping 35, 8 (2014), 3903–3918.
[97] Spampinato, M. V., Rumboldt, Z., Hosker, R. J., and Mintzer, J. E. Apolipoprotein e and gray matter volume loss in patients with mild cognitive impairment and alzheimer disease. Radiology 258, 3 (2011), 843–852.
[98] Spiegelhalter, D. J., Myles, J. P., Jones, D. R., and Abrams, K. R. Bayesian methods in health technology assessment: a review.
[99] Stephenson, G. Statistical analysis of computer models and the use of derivative information.
[100] Sukkar, R., Katz, E., Zhang, Y., Raunig, D., and Wyman, B. T. Disease progression modeling using hidden markov models. In Engineering in Medicine and Biology Society (EMBC), 2012 Annual International Conference of the IEEE (2012), IEEE, pp. 2845–2848.
[101] Sweeting, M., Farewell, V., and De Angelis, D. Multi-state markov models for disease progression in the presence of informative examination times: An application to hepatitis c. Statistics in medicine 29, 11 (2010), 1161–1174.
[102] Swerdlow, R. H. Brain aging, alzheimer's disease, and mitochondria. Biochimica et Biophysica Acta (BBA)-Molecular Basis of Disease 1812, 12 (2011), 1630–1639.
[103] Tuo, R., and Jeff Wu, C. A theoretical framework for calibration in computer models: parametrization, estimation and convergence properties. SIAM/ASA Journal on Uncertainty Quantification 4, 1 (2016), 767–795.
[104] Van Der Pas, S., Kleijn, B., Van Der Vaart, A., et al. The horseshoe estimator: Posterior concentration around nearly black vectors. Electronic Journal of Statistics 8, 2 (2014), 2585–2618.
[105] Vemuri, P., Wiste, H., Weigand, S., Shaw, L., Trojanowski, J., Weiner, M., Knopman, D., Petersen, R., Jack, C., Initiative, A. D. N., et al. Mri and csf biomarkers in normal, mci, and ad subjects predicting future clinical change. Neurology 73, 4 (2009), 294–301.
[106] Walker, S. New approaches to bayesian consistency. Annals of Statistics (2004), 2028–2043.
[107] Wang, Y., Resnick, S. M., and Davatzikos, C. Analysis of spatio-temporal brain imaging patterns by hidden markov models and serial mri images. Human brain mapping 35, 9 (2014), 4777–4794.
[108] Watton, P. N., Huang, H., and Ventikos, Y. Multi-scale modelling of vascular disease: Abdominal aortic aneurysm evolution. Computational Modeling in Tissue Engineering 7 (2013), 309–339.
[109] Whitt, W. Continuous-time markov chains. Lecture notes, www.columbia.edu/~ww2040 (2006).
[110] Williams, C. K., and Rasmussen, C. E. Gaussian processes for machine learning. MIT Press (2006).
[111] Xu, Y., Choi, J., Dass, S., and Maiti, T. Sequential Bayesian prediction and adaptive sampling algorithms for mobile sensor networks. IEEE Transactions on Automatic Control 57, 8 (2012), 2078–2084.
[112] Yamamoto, S., Miyazaki, M., Iwano, T., and Kitazawa, S.
Bayesian calibration of simultaneity in audiovisual temporal order judgments. PLoS ONE 7 (2012), 1–9.
[113] Yu, H.-m., Yang, S.-s., Gao, J.-w., Zhou, L.-y., Liang, R.-f., and Qu, C.-y. Multi-state markov model in outcome of mild cognitive impairments among community elderly residents in mainland china. International Psychogeriatrics 25, 05 (2013), 797–804.
[114] Zankl, A., Schumacher, H., Krumsdorf, U., Katus, H., Jahn, L., et al. Pathology, natural history and treatment of abdominal aortic aneurysms. Clinical Research in Cardiology 96, 3 (2007), 140–151.
[115] Zeinali-Davarani, S., and Baek, S. Medical image-based simulation of abdominal aortic aneurysm growth. Mechanics Research Communications 42 (2012), 107–117.
[116] Zeinali-Davarani, S., Raguin, L. G., Vorp, D. A., and Baek, S. Identification of in vivo material and geometric parameters of a human aorta: toward patient-specific modeling of abdominal aortic aneurysm. Biomechanics and modeling in mechanobiology 10, 5 (2011), 689–699.
[117] Zeinali-Davarani, S., Sheidaei, A., and Baek, S. A finite element model of stress-mediated vascular adaptation: application to abdominal aortic aneurysms. Computer methods in biomechanics and biomedical engineering 14, 9 (2011), 803–817.
[118] Zhang, D., Shen, D., Initiative, A. D. N., et al. Predicting future clinical changes of mci patients using longitudinal and multimodal biomarkers. PloS one 7, 3 (2012), e33182.
[119] Zhao, X., and Pelegri, A. A. A bayesian approach for characterization of soft tissue viscoelasticity in acoustic radiation force imaging. Int. J. Numer. Meth. Biomed. Engng. (2015).
[120] Zhou, S., Lafferty, J., and Wasserman, L. Time varying undirected graphs. Machine Learning 80, 2 (2010), 295–319.