HIGH DIMENSIONAL COMPUTATIONAL MODELS FOR BIOMEDICAL IMAGING DATA ANALYSIS

By

Liangliang Zhang

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Statistics - Doctor of Philosophy

2017

ABSTRACT

HIGH DIMENSIONAL COMPUTATIONAL MODELS FOR BIOMEDICAL IMAGING DATA ANALYSIS

By Liangliang Zhang

The importance of big data does not revolve around how much data you have, but what you do with it. You can take data from any source and analyze it to find answers that enable 1) cost reductions, 2) time reductions, 3) new product development and optimized offerings, and 4) smart decision making. My thesis is mainly focused on high-dimensional image data analysis and computation. The image data I worked on are CT images of abdominal aortic aneurysms (AAA) and brain images of Alzheimer's Disease. I developed a Bayesian calibration method for the former and supervised learning with a Markov chain for the latter.

Bayesian calibration has a long history within computer modeling in general. It is an iterative process of updating uncertainty distributions on the calibration parameters in a way that is consistent with the observed data. Because of advances in complex mathematical models and fast computer codes, Bayesian calibration of computer experiments is now popular in scientific research. Compared to a computer model, a complex system in real life is expensive in both time and money to observe. Therefore, computer models can be used as stand-alone tools or combined with (typically smaller) data from physical experiments or field observations, and Bayesian calibration is powerful in integrating all sources of uncertainty into the model definition and calibration procedure.

For the AAA data, I first modeled only one patient (patient-specific prediction, discussed in Chapter 2), and then built a more advanced model that can incorporate all patients (multi-patient prediction, Chapter 3). In the process, semi-parametric functional data analysis, covariance modeling, and Bayesian methods were practiced and used extensively. The contributions are as follows. First, we formulate the Bayesian calibration of our AAA G&R computation model taking into account model inadequacy, prior distributions of model parameters, measurement errors, and, most importantly, longitudinal CT scan images. Next, we demonstrate how to achieve the proposed aims by solving the formulated Bayesian calibration problem using a simulation study and real data analysis. In particular, we compare and discuss the performance and computation time under different sampling cases for the computation model output data and (synthesized) patient data, both of which are synthesized by the G&R computation. We apply our Bayesian calibration to the real CT data and validate our prediction, showing the usefulness of our approach to the computational science and medical communities in aiding decision making.

For the Alzheimer's Disease data, the causes are being intensively researched, but no definitive answers exist as yet. Genetic predisposition, abnormal protein deposits in the brain, and environmental factors are suspected to play a role in the development of the disease.
In Chapter 4, my main goal is to model the progression of Alzheimer's Disease by applying a multi-state Markov model; to investigate the significance of known risk factors such as age, ApoE4, and brain structural volumetric variables obtained from MRI, such as hippocampus volume; and, at the same time, to predict the transitions between different clinical diagnosis states based on transition rates and transition probabilities. We found that age is not a significant covariate (p-value 0.1733 in the likelihood ratio test), while ApoE4 is a significant risk factor in our Markov model. Predictions based on transition rates and transition probabilities were made and validated, with accuracy as high as 0.7849.

ACKNOWLEDGMENTS

Thanks to the ancestors who paved the path before me, upon whose shoulders I stand. This is also dedicated to my family and the many friends who supported me on this journey. I appreciate all the failures, trials, and suffering for giving me one more chance each time to grow into a full-grown man. Thanks to the statisticians and other scientists before and in my time. No matter how ignorant I was at the beginning, I have always felt lucky to have been part of Statistics since I was a sophomore. Statistics is omnipotent, as the saying goes: God also gambles. We live in a world full of uncertainties. Determinacy in mathematics is generalized in statistics; the natural randomness of quantum mechanics can be systematically described in statistics; predictive models in the financial market are cautious of the Black Swan...

I would like to express my deepest gratitude to my supervisor, Prof. Taps Maiti, for his unwavering support of my five years of Ph.D. study and life; for his encouragement, motivation, collegiality, and mentorship throughout all the projects; and for the time and effort he put into cultivating me from a dependent researcher into an independent academician. I would like to express my special appreciation to my co-advisor, Dr. Chae Young Lim, not only for her tremendous academic support, but also for her patience, genuine caring, and concern. Without her contributions of time and ideas, it would not have been possible to make this thesis complete and comprehensive. Besides my supervisors, I would like to extend my thanks to those who offered research opportunities, collegial guidance, and continuous support over the years: my thesis committee, Dr. Jongeun Choi and Dr. R.V. Ramamoorthi, for their encouragement and for serving as committee members.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

Chapter 1 Introduction
  1.1 Computer model
  1.2 Gaussian Process Emulation
  1.3 Design of Computer Experiments
  1.4 Bayesian Statistics
  1.5 Calibration
  1.6 Bayesian calibration
  1.7 Advantages and disadvantages

Chapter 2 Patient-Specific Prediction from Single-subject Data
  2.1 Introduction
  2.2 Models
    2.2.1 AAA G&R computation model
    2.2.2 Quantity of interest (QoI) for AAA G&R
    2.2.3 Calibration model
    2.2.4 Statistical models
  2.3 Bayesian analysis for calibration
    2.3.1 Likelihood of computer model
    2.3.2 Likelihood of real observations
    2.3.3 Joint likelihood
    2.3.4 Calibration
    2.3.5 Prediction
  2.4 Data Description
    2.4.1 Simulated Observation Cases
    2.4.2 Real Observation Case
  2.5 Results
    2.5.1 Results from Simulated Observation Cases
      2.5.1.1 Robustness of prior selection
      2.5.1.2 Computation time
    2.5.2 Results from Real Observation Case
  2.6 Discussion and Conclusion
  2.7 Data Accessibility
  APPENDICES
    Appendix 1 Programming steps for simulated observations
    Appendix 2 Other Results

Chapter 3 Patient-Specific Prediction from Multi-subject Data
  3.1 Introduction
  3.2 Calibration model
  3.3 Statistical models
  3.4 Computer outputs
  3.5 Observations
  3.6 Full data
  3.7 Bayesian analysis
    3.7.1 Joint posterior distribution
    3.7.2 Full conditional of ψ1
    3.7.3 Full conditional of ψ2
    3.7.4 Posterior distribution of calibration parameter θ
    3.7.5 Predictive distribution of true process ζ(·)
  3.8 Data Analysis
    3.8.1 Small sample
    3.8.2 Big sample
  3.9 Conclusion
Chapter 4 Alzheimer's Disease Progression Analysis Using Multi-state Markov Model
  4.1 Introduction
  4.2 Data
  4.3 Methods
  4.4 Results
    4.4.1 Model without covariates
    4.4.2 Model with a single covariate
      4.4.2.1 Age
      4.4.2.2 ApoE4
    4.4.3 Model with multiple covariates
      4.4.3.1 Prediction and validation
      4.4.3.2 Impact of transition rates and transition probabilities
  4.5 Conclusion and discussion
  APPENDICES
    Appendix 1 Multi-state Markov model
    Appendix 2 Specification for AAA progression

Chapter 5 Conclusions, Discussion, and Directions for Future Research
  5.1 Conclusions and discussion
  5.2 Directions for future research
    5.2.1 Bayesian consistency
    5.2.2 Variable selection in Bayesian statistics
    5.2.3 Bayesian Dynamic network

BIBLIOGRAPHY

LIST OF TABLES

Table 2.1. Differences in Xc and σ2² for simulation cases
Table 2.2. Case details
Table 2.3. Posterior estimates of θ1 and θ2
Table 2.4. Computation times for Bayesian calibration and prediction. Estimation (time) describes the time spent in maximizing to estimate ψ1 and ψ2, while sampling (time) describes the time spent in calculating predictions and credible intervals via sampling from posteriors.
Table A.1. Priors and Estimates of Hyperparameters for Case 1a
Table A.2. Priors and Estimates of Hyperparameters for Case 1b
Table A.3. Priors and Estimates of Hyperparameters for Case 2a
Table A.4. Priors and Estimates of Hyperparameters for Case 2b
Table A.5. Priors and Estimates of Hyperparameters for Case 2c
Table A.6. Priors and Estimates of Hyperparameters for Case 3a
Table A.7. Priors and Estimates of Hyperparameters for Case 3b
Table A.8. Priors and Estimates of Hyperparameters for real observations
Table 3.1. Priors and Estimates of Hyperparameters
Table 4.1. Distribution of age in each diagnosis group
Table 4.1. Table of transition rates and probabilities for the model without covariates.
Table 4.2. Table of transition rates (95% confidence intervals) for the model with covariate ApoE4.
Table 4.3. Table of transition probabilities (95% confidence intervals) at year 2 for the model with covariate ApoE4.
Table 4.4. Table of transition probabilities (95% confidence intervals) at year 4 for the model with covariate ApoE4.
Table 4.5. Table of mean sojourn times (95% confidence intervals) for the model with covariate ApoE4.
Table 4.6. Likelihood ratio tests between different models.
Table 4.7. Prediction for two data sets with different alignments in time. Data1 includes all the observations before AD for each subject, while Data2 includes only 2 observations before AD for each subject. Data1 is divided into Group1 and Group2; Group1 in Data1 includes all the patients whose transition probability is ever more than 0.45 or whose transition rate is more than 1.2. Data2 is likewise divided into Group1 and Group2; Group1 in Data2 includes all the patients whose transition probability is ever more than 0.6 or whose transition rate is more than 1.2.

LIST OF FIGURES

Figure 2.1. (a) The location of an abdominal aortic aneurysm, colored in red, is indicated by an arrow, while the green region on the left indicates the inferior vena cava; (b) four longitudinal AAA CT images 1, 2, 3, and 4, screened at 4 consecutive times. Regarding the technique to align the CT scan images, please refer to Kwon et al. [61].
Figure 2.2. The flow-chart of Bayesian calibration.
Figure 2.3. 3D point clouds sampled from a CT scan image. (a) An inscribed sphere with point clouds from the CT scan images and (b) a centerline computed from the point clouds.
Figure 2.4. The diagram of data preparation.
Figure 2.5. Radius vs. height coordinate on the centerline measured from CT scans of a patient over the surveillance time. The lines from bottom to top represent images 1, 2, 3, and 4, respectively.
Figure 2.6. Diagram of the Bayesian calibration process.
Figure 2.7. Two stages to estimate ψ.
Figure 2.8. Predictions of the true processes. T[t] denotes the true values at year t, while P[t] denotes the predictions at year t. The solid line denotes the true values, while the dashed line denotes the predictions.
Figure 2.9. 3D 95% credible band of predictions. The blue stars denote the true values lying inside the credible band. The red marks denote the true values lying outside the credible band.
Figure 2.10. Predictions and credible bands for Case 2c.
Figure 2.11. Predictions and credible bands for the real observation.
Figure A.1. Relative errors between predictions and true values.
Figure A.2. Relative errors for Case 2c (a) and for the real observation case (b).
Figure A.3. Prior and posterior densities of θ1 and θ2.
Figure A.4. Prior and posterior densities of θ1 and θ2.
Figure 3.1. Prediction results for subject 1.
Figure 3.2. Prediction results for subject 2.
Figure 3.3. Prior and posterior results for the data consisting of 6 patients.
Figure 4.1. Individual plot of the percentage of hippocampus volume to ICV vs. age. In each graph, each black line segment with solid points denotes how the hippocampus of each patient changes with age. The solid points denote the ages at which the patient was screened. The x-axis is the real age of the patients. The y-axis is the percentage of hippocampus volume to ICV (intracranial volume).
Figure 4.2. Longitudinal plots of the fraction of hippocampus volume to ICV and MMSE in 4 different diagnosis groups. 'D11' denotes the group of patients who stay within NC all the time. 'D22' denotes the group of patients who stay within MCI all the time. 'D23' denotes the group of patients who ever transit from MCI to AD. 'D33' denotes the group of patients who stay within AD all the time.
Figure 4.1. Plot of survival curves (not entering the final state AD). The dashed blue line denotes the fitted survival curve from NC to AD. The solid red line denotes the fitted survival curve from MCI to AD. The dotted lines denote the 95% confidence bands of the corresponding fitted survival curves.
Figure 4.2. Fitted transition rates changing with age in different groups. The red solid line with circle marks denotes the transition rates from the NC state to the MCI state. The blue dashed line with triangle marks denotes the transition rates from the MCI state to the NC state. The purple dotted line with plus marks denotes the transition rates from the MCI state to the AD state. The x-axis denotes age, while the y-axis denotes the transition rates.
Figure 4.3. Transition probabilities changing with time for the model with covariate ApoE4. The red solid line with circle marks describes transition probabilities from the NC state to the MCI state. The blue dashed line with triangle marks describes transition probabilities from the MCI state to the NC state. The purple dotted line with plus marks describes transition probabilities from the MCI state to the AD state. The x-axis denotes the scanning time, while the y-axis denotes the transition probabilities. The left panel describes the ApoE4 carriers (positive), while the right panel describes the ApoE4 non-carriers (negative).
Figure 4.4. Longitudinal plots of predicted transition rates and probabilities in different diagnosis groups. 'D11' denotes the group of patients who stay within NC all the time. 'D22' denotes the group of patients who stay within MCI all the time. 'D23' denotes the group of patients who ever transit from MCI to AD. 'D33' denotes the group of patients who stay within AD all the time.
Figure 5.1. Inverted Beta distribution when the parameters a and b take different values.
Figure 5.2. Group 1, Group 2, and Group 3 (big circles) denote the nodes of the brain network. The thick solid lines denote the connectivity between nodes; as we can see, the connectivity changes with time. Voxels (small circles) denote the real observations.
Chapter 1 Introduction

In this thesis, the image data I worked on are CT images of abdominal aortic aneurysms (AAA) and brain images of Alzheimer's Disease. I developed a Bayesian calibration method for the former and supervised learning with a Markov chain for the latter. Chapter 1 reviews the background, research questions, and development of the Bayesian calibration method. Chapter 2 introduces patient-specific prediction for single-subject data using Bayesian calibration. Chapter 3 introduces patient-specific prediction for multi-subject data using Bayesian calibration. Chapter 4 introduces an analysis of the conversion of Alzheimer's Disease using a multi-state Markov model. Chapter 5 presents conclusions, discussion, and suggestions for future research.

1.1 Computer model

Complex models (also called computer models) are used to resemble real-world systems and are thus used in many different fields, such as the clinical sciences, physics, engineering, and climate science. A computer model is defined by a set of equations and is implemented as computer code. People typically use the model to predict how the real-world system may behave in the future. Running a computer model at a number of different input values is known as a computer experiment. Depending on the setting, the output can be either a scalar or a vector; however, we consider only the case of univariate output here. The computer models considered are deterministic: each time they are run with the same inputs, they produce the same output [99].

The inputs are usually used to describe properties of the real system that need to be observed, so the true values of these inputs may be unknown. Even when values of the inputs can be obtained, some form of measurement error is likely to have occurred. Therefore, the uncertainty in the inputs needs to be quantified, as it relates to how uncertain we are that the computer model output matches reality. In other words, uncertainty analysis aims to quantify the uncertainty in model outputs caused by uncertainty in inputs. Another source of uncertainty is the model itself: even when the true values of the inputs are known, running the model at these points will not produce an output that exactly matches the observation of the real-world system.

Complex models are computationally expensive and take a long time to run, because a simulation of a real-world system can involve a high number of dimensions. Sensitivity analysis is defined as the process of evaluating how the output of a model is modified by changes in the inputs. Performing analyses such as sensitivity and uncertainty analysis can require many runs of the computer model, and this quickly becomes impractical with a computationally expensive model.

1.2 Gaussian Process Emulation

In statistics, a Gaussian process emulator is one name for a general type of statistical model that has been used in contexts where the problem is to make maximum use of the outputs of a complex (often non-random) computer model. The main element of the Gaussian process emulator is that it models the outputs as a Gaussian process on a space that is defined by the computer model inputs. The analysis of computer models with Gaussian process emulators has been studied for about 20 years.
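To make this idea concrete, here is a minimal sketch of Gaussian process emulation in Python; the toy simulator, the design points, and the scikit-learn kernel settings are illustrative assumptions, not models used later in this thesis.

```python
# A minimal sketch of emulating a deterministic computer model with a GP;
# the simulator, design, and kernel below are illustrative stand-ins.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def simulator(x):
    # Stand-in for an expensive deterministic computer model.
    return np.sin(3.0 * x) + 0.5 * x

# A small design of training runs of the computer model.
X_train = np.linspace(0.0, 2.0, 8).reshape(-1, 1)
y_train = simulator(X_train).ravel()

# The emulator: a GP on the space defined by the computer model inputs.
kernel = ConstantKernel(1.0) * RBF(length_scale=0.5)
emulator = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
emulator.fit(X_train, y_train)

# Cheap predictions, with uncertainty, at untested inputs.
X_new = np.linspace(0.0, 2.0, 50).reshape(-1, 1)
mean, sd = emulator.predict(X_new, return_std=True)
print(mean[:3], sd[:3])
```

The emulator is cheap to evaluate and returns a predictive standard deviation, which is what makes sensitivity and uncertainty analyses feasible for an expensive model.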
Sacks et al. (1989) [89] describe a technique for approximating an unknown deterministic computer experiment by modeling the output as a realization of a stochastic process. A Bayesian method of predicting a computer model at untested inputs is given in Currin et al. (1991) [22]. O'Hagan (2006) [77] gives an introduction to Gaussian process emulation, as well as details of Bayesian methods for various analyses.

1.3 Design of Computer Experiments

In order to obtain the training data, we need to select the design; the choice of design points is known as the design of the experiment. Without greatly increasing the computational time, we aim to gain more information from the training data, so that the emulator can better approximate the computer model. Various criteria are defined for different designs; for example, under a minimax design, points are selected such that the maximum distance between any two design points is minimized.

McKay et al. (1979) [69] compare three methods for choosing designs for Monte Carlo studies: simple random sampling, stratified sampling, and Latin hypercube sampling. A Latin hypercube sample ensures that the points are spread evenly over the range of each input dimension. The three methods are assessed by comparing estimators of the mean, variance, and distribution function of the output; they conclude that Latin hypercube sampling is preferred. Santner et al. (2003) [91] also comment on the widespread use of this method for generating designs in other kinds of computer experiments.

1.4 Bayesian Statistics

Bayesian statistics began with a posthumous publication in 1763 by Thomas Bayes [5], a non-conformist minister from Tunbridge Wells [51]. His work was formalized as Bayes's theorem, which is a simple and uncontroversial result in probability theory. Bayesian statistics is a system for describing epistemological uncertainty using the mathematical language of probability [98]. The methods can be outlined as formal combinations, through the use of Bayes's theorem, of: 1) a prior distribution about the value of a quantity of interest, based on evidence not derived from the study, with 2) a summary of the information (likelihood) concerning the same quantity available from the data collected in the study, to yield 3) an updated or posterior distribution of the quantity of interest.

These methods address the question of how new information should update what we currently believe. They extend naturally into making predictions, synthesizing information from different sources, and designing studies. Nevertheless, Bayesian methods are a controversial topic, because they may involve the explicit use of subjective judgment in a rigorous scientific exercise. Compared with traditional methods, a Bayesian perspective leads to more flexible and ethical methods of analyzing clinical trials and observational studies [56], and to more elegant ways of handling multiple substudies, for instance when simultaneously estimating the effects of a treatment on many subgroups [10]. A Bayesian approach enables one to provide suitable conclusions for making decisions for specific patients, for planning research, or for public policy [65].

1.5 Calibration

For a conventional calibration, an expert modeler repeatedly adjusts calibration parameters to minimize the difference between observed data and model outputs (Reddy et al., 2007) [82]. One main drawback of this process is that it demands intensive manual labor and inherently introduces modeler biases into the calibrated model. To improve on this, optimization schemes have been employed to automate calibration (New and Chandler, 2013 [74]; Raftery et al., 2011 [81]). These schemes determine the combination of model parameters that minimizes the error between data and model outputs, as sketched below.
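As a concrete illustration, the following minimal sketch automates calibration by minimizing a sum-of-squares objective; the model form, the synthetic data, and the optimizer choice are illustrative assumptions.

```python
# A minimal sketch of automated calibration by optimization, assuming a
# generic model function and synthetic "measured" data; the model form,
# objective, and optimizer settings are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

def model(params, x):
    # Hypothetical computer model with two calibration parameters.
    a, b = params
    return a * np.exp(-b * x)

x_obs = np.linspace(0.0, 5.0, 20)
z_obs = model([2.0, 0.7], x_obs) + 0.05 * np.random.default_rng(1).standard_normal(20)

def objective(params):
    # Sum of squared errors between data and model outputs.
    return np.sum((z_obs - model(params, x_obs)) ** 2)

result = minimize(objective, x0=[1.0, 1.0], method="Nelder-Mead")
print(result.x)  # calibrated parameter values
```

If the objective is replaced by a negative log-likelihood, this procedure coincides with maximum likelihood estimation, as noted below.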
Therefore, the solution of the calibration problem can be understood as identifying the global optimum of the objective function within the feasible domain of the calibration parameters. When both the objective function and the feasible domain are convex, or can be restated as convex, mathematical algorithms can stably find the single optimum. For problems that are not convex and cannot be recast as convex, exact solutions often become computationally expensive, and only approximate solutions may be practical [9]. For more details, please refer to [25]. In most cases, calibration is the same as statistical estimation [46], [27]: both processes aim at finding input values that lead to the best possible model fit. For example, if the objective function of the calibration is a likelihood function, calibration is equivalent to maximum likelihood estimation.

1.6 Bayesian calibration

Many methods of automated calibration have been developed which reduce costs, time, and biases. Among them, Bayesian methods differ significantly from the others in that the inputs are assumed to be uncertain, and the main goal is not to match the prediction to the measured data as closely as possible, but to reduce the uncertainty in the inputs in a manner consistent with the measured data. When the sample size of the measured data is limited and the model inputs have high sensitivity and high uncertainty, Bayesian methods are particularly useful for calibration. For more details, please refer to Muehleisen and Bergerson (2016) [73].

Bayesian calibration is an advanced method of calibration and is thus fundamentally different from conventional calibration methods. Bayesian calibration has a long history within computer modeling in general ([30]; [80]; [58]). Bayesian calibration is an automated process of updating uncertainty distributions on the calibration parameters in a way that is consistent with the observed data. Unlike conventional calibration, which minimizes the difference between observed data and model outputs, Bayesian calibration determines the most likely uncertainties for input parameters that yield an output uncertainty under which the observed data are most likely. Bayesian calibration is an application of Bayes' Theorem, which combines prior information with the information contained in the likelihood of observed outputs from the model ([6]). As commented by Kennedy and O'Hagan (2001) [58], none of the existing methods of calibration fully recognizes all sources of uncertainty.

As in a regular Bayesian problem, the general process of Bayesian calibration is to 1) define PDFs (probability density functions) for uncertain model parameters, 2) collect simulations and observations of the real system for known input parameter values, and 3) calibrate the (assumed) prior parameters based on the observed data by iteratively applying Bayes' Theorem until the iterations converge to an acceptable level. Establishing prior PDFs for input parameters generally requires both expert opinion solicitation and literature review. For example, a uniform distribution may be assumed when only a range of appropriate values can be determined for a parameter (Riddle and Muehleisen, 2014 [85]). A triangular distribution may be used when both a range of values and a most likely value can be determined; both choices are sketched below.
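The following minimal sketch encodes these two prior choices with scipy.stats; the range [2.0, 5.0] and the mode 3.0 are illustrative assumptions.

```python
# A minimal sketch of the two prior specifications just described;
# the range and mode below are illustrative, not real parameter values.
from scipy.stats import uniform, triang

# Uniform prior: only a plausible range [2.0, 5.0] is known.
lo, hi = 2.0, 5.0
range_prior = uniform(loc=lo, scale=hi - lo)

# Triangular prior: a range [2.0, 5.0] plus a most likely value 3.0.
mode = 3.0
mode_prior = triang(c=(mode - lo) / (hi - lo), loc=lo, scale=hi - lo)

print(range_prior.pdf(3.0), mode_prior.pdf(3.0))
```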
1.7 Advantages and disadvantages

As Bayesian calibration integrates uncertainty into the model definition and calibration procedure, one of its primary advantages is that a modeler can quantify a confidence level in the calibrated model. Another advantage is that a probabilistic risk analysis can be performed and competing retrofits can be ranked according to a desired risk level, based on mean values, variances, etc. Finally, Bayesian calibration reduces the tendency to overfit the model to the observed data. Overfitting is a problem in regression analysis, multi-objective optimization, and machine learning, in which the fitting/optimization routine fits the model to the noise in the measured data (Dietterich, 1995 [29]). Unlike conventional calibration, which minimizes the difference between observed data and model outputs, Bayesian calibration tries to maximize the likelihood that the model output is statistically consistent with the measured data. This consistency naturally takes into account the uncertainty in the measured data (Heckerman, 1998 [48]).

There is one common disadvantage of all calibration methods, both conventional and Bayesian: it is typically very challenging, time consuming, and expensive to collect observed data from the real system. Another disadvantage is that Bayesian calibration requires a significant number of iterations to converge to the most likely PDFs, especially when the prior probability density functions are poorly chosen. Finally, because of the high computational demand, Bayesian calibration of large models often requires a pre-step of parameter screening and selection to obtain the most significant parameters to use in the calibration process and thereby reduce run time [73].

Chapter 2 Patient-Specific Prediction from Single-subject Data

2.1 Introduction

Figure 2.1. (a) The location of an abdominal aortic aneurysm, colored in red, is indicated by an arrow, while the green region on the left indicates the inferior vena cava; (b) four longitudinal AAA CT images 1, 2, 3, and 4, screened at 4 consecutive times. Regarding the technique to align the CT scan images, please refer to Kwon et al. [61].

An abdominal aortic aneurysm (AAA) is an enlarged, localized volume in the lower part of the aorta, which supplies blood to a large part of the body (see Fig. 2.1). Enlargement of the aorta by more than 50% of its normal diameter is defined as an aortic aneurysm. The vast majority (over 90%) of aortic aneurysms occur in the abdominal region, specifically the infrarenal aorta [79] [114]. In general, an aorta with a diameter larger than 3 cm is considered an aneurysm. A ruptured aneurysm can cause life-threatening internal bleeding; if rupture occurs, patient mortality rates are greater than 80% [59, 60]. Depending on the size and rate of growth, treatment of an AAA may vary from watchful waiting to emergency surgery. Once an AAA is found, doctors closely monitor it so that surgery can be planned if it becomes necessary. Since elective repair of AAAs can result in peri-operative deaths (4-8%) [63], performing unnecessary surgeries increases patient risk. A thorough understanding of the expansion and rupture of AAAs is thus needed in order to minimize patient risk. While significant advances have been made in the management of AAA patients [13], this disease still carries a high mortality rate.
During the last decade, bio-chemo-mechanical studies have been integrating computational modeling with an increased understanding of the expansion and weakening of aneurysms. Recently, such a computational platform, called a growth and remodeling (G&R) model, has been developed and incorporated with patient-specific anatomical information, which aids in treatment planning on a per-patient basis [2, 36, 108, 117]. Zeinali-Davarani et al. [117] developed the G&R model to take into account both elastic degeneration and stress-mediated collagen turnover during AAA development using finite element analysis (FEA). Other models of AAAs without G&R have also been shown to be potentially useful in aiding patient-specific treatment planning [31, 68]. A coupled simulation of G&R with hemodynamics was conducted to study its effects on AAA expansion [95]. Geometric, kinetic, and material parameters have been identified for individual patients using inverse optimization techniques for modeling the growth of AAAs. Furthermore, it has been shown that the same material parameters for AAA expansion can help to predict intrasac-pressure dependent vascular adaptation after endovascular repair [62].

Translating recent computational advances into a predictive tool for individualized clinical treatment, however, requires a major paradigm shift due to the incompleteness of the model, limited information, and uncertainty associated with clinical measurements for each individual patient. Most importantly, the uncertainty in the prediction propagated from various sources needs to be correctly quantified. For example, the G&R model's internal parameters need to be carefully adjusted according to patient-specific data, e.g., longitudinal computed tomography (CT) images, in order to make better predictions and thus be useful for clinical decisions.

The aims of this paper are to develop a framework that 1) calibrates the physical AAA G&R model using patient-specific longitudinal CT scan images, 2) predicts the expansion of an AAA in future time, and 3) analyzes the associated uncertainty in the prediction. To achieve these aims, we perform Bayesian calibration of our computational AAA G&R model [58]. In particular, Bayesian calibration will be used to incorporate the computational G&R model, patient-specific data (e.g., CT scan images), and various uncertainties, as well as to compute the uncertainty level of the prediction of AAA expansion. As computational science advances, there is a growing interest in applying Bayesian calibration, developed in the statistical community, to engineering applications (see more details in [49, 50, 58, 77, 112] and references therein). To the best of our knowledge, we are the first to apply the Bayesian calibration method to the AAA G&R computation model. The successful outcomes of our aims will make the G&R computation model viable for aiding clinicians in decision making.

Hawkins-Daarud et al. [47] used a Bayesian framework to address questions on validation, model selection, and uncertainty quantification for tumor growth. Biehler et al. [7] present an uncertainty quantification framework based on multi-fidelity sampling and Bayesian formulations and analyze the impact of the uncertainty in the input parameters on mechanical quantities typically related to abdominal aortic aneurysm rupture potential.
Zhao X. and Pelegri A. [119] used a Bayesian approach to estimate a reasonable quantification of the probability distributions of soft tissue mechanical properties in the presence of measurement noise and model parameter uncertainty. While Hawkins-Daarud et al. [47] considered a set of discrete model candidates to select and validate a model, we consider a statistical model for the true physical process by introducing two Gaussian random fields, for the G&R computation model and for the inadequacy of the model. We adopt the Bayesian calibration technique proposed in [58] to calibrate parameters in the G&R computation model and predict AAA expansion. The proposed statistical model and Bayesian calibration can take into account different sources of uncertainty; therefore, they are well suited to achieving our aims of predicting the AAA expansion process as well as computing the propagated uncertainty given the patient-specific data.

The contributions of our paper are as follows. First, we formulate the Bayesian calibration of our AAA G&R computation model taking into account model inadequacy, prior distributions of model parameters (Seyedsalehi S. and Zhang L., et al. [94]), measurement errors, and, most importantly, patient-specific longitudinal CT scan images. Next, we demonstrate how to achieve the proposed aims by solving the formulated Bayesian calibration problem using a simulation study and real data analysis. In particular, we compare and discuss the performance and computation time under different sampling cases for the computation model output data and (synthesized) patient data, both of which are synthesized by the G&R computation. We apply our Bayesian calibration to the real patient-specific CT data and validate our prediction, showing the usefulness of our approach to the computational science and medical communities in aiding decision making.

The organization of the paper is as follows. Section 2.2 introduces the AAA G&R model, the quantity of interest used in making predictions, and the full statistical model with hyperparameters for Bayesian calibration. In Section 2.3, we discuss assumptions for the priors and then provide the posterior and predictive distributions under the Bayesian framework. Section 2.4 describes the design of the simulation study with synthetic observation data and a case study with real patient data. In Section 2.5, we present the results of the Bayesian calibration for both the simulation and real data cases. Finally, we provide some discussion and concluding remarks in Section 2.6.

2.2 Models

Figure 2.2. The flow-chart of Bayesian calibration (inputs: prior of parameters, CT images, computer model; center: statistical models; outputs: posterior of parameters, predictive distribution, prediction and credible band).

Bayesian calibration can be defined as a Bayesian approach to calibrating the investigated parameters in a theoretical model of a real complex system (here, the G&R computer model for AAA) that enables the incorporation of uncertainty regarding the parameters, the real observations, and possible simulations from the theoretical model.
The rectangle with double edges in the middle depicts that Bayesian calibration is actually a specially designed complex system capsulating a bunch of statistical models. As you can see, Bayesian calibration is including a bunch of models and integrating different sources of data together to generate products, such as calibrated parameters (posterior distribution), prediction (predictive distribution) and statistical inference (credible bands). In this section, we introduce the data and all the basic models needed in our Bayesian calibration (basically things in left two rectangles and the rectangle with double edges in the middle). The following parts in this section are organized as follows. 2.2.1 introduces a computational model of AAA growth. 2.2.2 introduces how the real data from CT images are prepared for Bayesian calibration. 2.2.3 introduces a calibration model to combine the computational model with real data. 2.2.4 introduces statistical specifications for implementing Bayesian calibration. 2.2.1 AAA G&R computation model The Bayesian calibration framework includes a computational G&R model of AAA as a data input, where the detailed computational G&R model was described in Zeinali-Davarani et al. [117]. Briefly, the computational G&R model has three parts: constitutive relations of intrinsic material behavior, a stress-mediated production function, and a damage function. To describe material behavior, we assume that the aorta is comprised of three stress-bearing constituents, viz. elastin, collagen fiber families, and vasoactive smooth muscle cells. Each 13 constituent, in addition to its contribution to the construction and strength of the artery’s wall, has its own individual properties, where the population-based material parameter distributions of abdominal aortas was given by Seyedsalehi et al. [94]. Also, the stress-mediated production functions connect the stress-state of the artery to changes of the mass rates with a stress-mediated feedback approach. Moreover, in the G&R model, the AAA is initialized by imposing damage function to the elastin of normal aorta. The initialization can be justified by the previous study [87], which shows that one of the main features of AAA is that elastin is decreased and the study [55, 115] that elaborate that the degradation of elastin can directly form patient-specific shapes of aneurysms. However, the other factors, such as alteration of intrinsic material parameters [83], disturbed collagen production [23, 24] and hemodynamics [3], are taken as the minor reason of initialization of AAA, and be considered as statistical error later. In this paper, we aim to calibrate the computational model with the expansion rate and evolution of aneurysm shapes along the longitudinal scan images. Our focus of interest, called quantity of interest (QoI) (described in section 2.2 for the Bayesian calibration), is used to calibrate the aneurysm growth with medical data. However, those parameters related to intrinsic material function and stress-mediated production function cannot directly be used to form the patient-specific geometry, but have a strong relation with patient’s health, age and other patient behavior, while with respect to [115], the degradation of elastin play a leading role in forming a patient-specific geometry, which is to say different patient-specific geometry can be achieved given different damage parameters in G&R simulation. 
Consequently, we prescribe the mean values of the material parameters [94] and the stress-mediated parameters [116] and calibrate the damage parameters statistically with the evolution of the AAA geometry. In the statistical calibration model introduced later, random variations of the prescribed parameters are treated as errors.

In our model, the damage function acts on elastin and vasoactive smooth muscle, wherein elastin plays an important role in the mechanical behavior of the aorta. Elastin contributes resilience and elasticity to the aortic tissue, but at an advanced age elastin cannot be replaced. For an AAA, the localized dilation of the aorta is initiated by the degradation of elastin. This degradation (or damage) reduces the amount of elastin, leading to a weakening of the wall. The damage results in an increase of the diameter and wall stress of the aneurysm. The increase of the stress in the constituents results in an increased accumulation of collagen and smooth muscle as a way to compensate for the elastin's loss and decrease the stress in the wall. All the relations and details of the model have been previously reported by Zeinali-Davarani et al. [116] and Kwon et al. [62]. Although the AAA's shape has a wide variability and its expansion is associated with the complexity of the G&R process, in the current study we decided to use the 2D axisymmetric G&R model and first study the effectiveness of the Bayesian calibration.

As discussed, the elastin damage in the aortic wall initiates the growth of the aneurysm. Here we define the initial elastin damage function as

d(s) = θ1 exp( −|s − α1|^α2 / (2 θ2²) ),  (2.1)

where s is the coordinate defined on the centerline, and g(s) = 1 − d(s) is the ratio of remaining elastin to the initial amount at s.

The damage function in (2.1) contains two parameters of interest, {θ1, θ2}, to be calibrated from the real data, and two other quantities, {α1, α2}, that can easily be identified via CT images without Bayesian calibration. They have their own specific effects on the shape of the damage function and thus on the stress-stretch and geometrical state of the AAA at a given time. In particular, θ1 is a scaling factor with θ1 ∈ [0, 1). An increase in θ1 toward 1 will increase the degradation of elastin and thus increase the dilatation of the artery; θ1 = 0 means no degradation, so the artery will retain its initial state. α1 corresponds to the location on the centerline at which the maximum damage occurs, and α1 and α2 control the areal distribution of damage on the sides of the peak location. Note that α1 and α2 are relatively easy to estimate from the CT images compared to θ1 and θ2. In particular, the maximum diameter occurs at or very close to the location of the maximum damage; hence, we estimated α1 by finding the location of the maximum diameter in the CT images. α2 is chosen by minimizing the error between the model simulation and the CT images. Thus, we fix α1 and α2 a priori at appropriate values obtained directly from the CT images and focus on calibrating θ1 and θ2 in a Bayesian way. To see our simulation results in a symmetric and straightforward configuration, we chose α1 = 7.5 (i.e., an averaged value) and α2 = 2 (i.e., the simplest shape). On the other hand, we used α1 = 12.1 and α2 = 5, estimated directly from the patient CT images, for the real data analysis in Section 2.5.2. Additionally, the choice of α2 = 2 for the simulation study covers the other end of the range of possible values as compared to the real observation case with α2 = 5.
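As a quick illustration of how these parameters shape the damage profile, the following minimal sketch evaluates d(s) from (2.1) under the two (α1, α2) settings above; the θ1 and θ2 values are illustrative, not calibrated, values.

```python
# A minimal sketch of the elastin damage function d(s) in (2.1);
# the theta values below are illustrative, not calibrated values.
import numpy as np

def damage(s, theta1, theta2, alpha1, alpha2):
    # d(s) = theta1 * exp(-|s - alpha1|^alpha2 / (2 * theta2^2))
    return theta1 * np.exp(-np.abs(s - alpha1) ** alpha2 / (2.0 * theta2 ** 2))

s = np.linspace(0.0, 15.0, 151)          # coordinate on the centerline
d_sim = damage(s, 0.8, 2.0, 7.5, 2.0)    # simulation-study setting
d_real = damage(s, 0.8, 2.0, 12.1, 5.0)  # real-data setting

# g(s) = 1 - d(s): ratio of remaining elastin to the initial amount.
g_sim = 1.0 - d_sim
print(float(d_sim.max()), float(g_sim.min()))
```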
2.2.2 Quantity of interest (QoI) for AAA G&R

The quantity of interest (QoI) of AAA G&R is what we want to predict about AAA growth at future times. The selection of the QoI subsequently determines the statistical models and the associated uncertainties to investigate. In our study, we select as the QoI the radius of the inscribed sphere with respect to a centerline, where the centerline is computed from inscribed spheres in the AAA 3D images [40]. Figs. 2.3a and 2.3b show the inscribed spheres and the resulting centerline, respectively, for a given 3D point cloud sampled from the CT scan image data of a patient. This QoI selection is consistent with medical practice, in which the diameter of the AAA is used as an important decision variable [59, 60].

To give a brief idea of the relationship among the G&R model, the 2D axisymmetric representation, and the QoI, we plot a diagram showing how the 3D images are transformed into 2D data and how the computer model simulates them. Looking at Fig. 2.4, there are two lines of processing. First, consider how the data are transformed, as shown in Figs. 2.4a, 2.4b, and 2.4c. Fig. 2.4a shows the same process of transforming 3D images into 2D data as Fig. 2.3. Thereafter, taking the length along the centerline as the coordinate, we obtain the radius of the inscribed sphere r versus the height coordinate of the centerline s in Fig. 2.4b, the details of which are explained in [40]. We then truncate the curve in Fig. 2.4b and obtain the QoI in Fig. 2.4c. In short, the 3D images are transformed into 2D data and then cut down to the QoI. Second, consider how the G&R model simulates the QoI. Fig. 2.4d shows the computational model of a healthy aorta; the healthy aorta then evolves into an AAA following the mechanisms introduced by the computational G&R model, which is actually a 2D axisymmetric geometric shape (Fig. 2.4e). We then obtain the radius versus the height coordinate of the centerline in Fig. 2.4f, which can be called the simulation. Finally, Fig. 2.4g shows the simulation result obtained by minimizing the difference between the QoI and the simulation.

Fig. 2.5 shows the real data needed in the Bayesian calibration procedure, namely the QoIs from scan images of patient K taken at 4 time points. To obtain each curve, we repeat the same process from (a) to (c) in Fig. 2.4 on each scan image. Bayesian calibration combines the QoIs and the simulation data to calibrate the selected parameters in the computer model. The radii in the figure are the target quantities we ultimately want to predict; that is, we would like to make predictions of the QoI, with associated uncertainty, at future times. In this paper we consider the following questions. Can we predict the radius vs. height profiles for CT scan images 3 and 4 at future times, given CT scan images 1 and 2 in Fig. 2.5? What is the uncertainty associated with such a prediction? We answer these questions by applying the proposed Bayesian calibration method; the answers for the particular patient-specific data set shown in Fig. 2.5 are given in Section 2.5.2 as a real data case study.

Figure 2.3. 3D point clouds sampled from a CT scan image. (a) An inscribed sphere with point clouds from the CT scan images and (b) a centerline computed from the point clouds.

Figure 2.4. The diagram of data preparation.
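The following is a minimal sketch of the inscribed-sphere idea behind the QoI. It assumes the centerline is already available and approximates the inscribed-sphere radius at each centerline point by the distance to the nearest wall point; the synthetic tube-shaped point cloud stands in for a segmented CT surface, and the full pipeline of [40] is more sophisticated.

```python
# A minimal sketch of turning a 3D surface point cloud into a
# radius-vs-height profile along a given centerline; assumes the
# centerline is already computed (see [40] for the full pipeline).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical wall point cloud: a tube of radius 1.5 around the z-axis.
z = rng.uniform(0.0, 15.0, 2000)
phi = rng.uniform(0.0, 2.0 * np.pi, 2000)
wall = np.column_stack([1.5 * np.cos(phi), 1.5 * np.sin(phi), z])

# Centerline samples (here simply along the z-axis).
s = np.linspace(0.0, 15.0, 50)
centerline = np.column_stack([np.zeros_like(s), np.zeros_like(s), s])

# Radius of the inscribed sphere at each centerline point: the distance
# to the nearest wall point.
diffs = centerline[:, None, :] - wall[None, :, :]
r = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)
print(r.round(2)[:5])  # approximately 1.5 along the tube
```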
2.2.3 Calibration model

Let ζ(x) be the QoI of the true AAA G&R process, whose input variable is denoted by x and defined as x = [t s], where t is the time and s is the height on the centerline, as illustrated in Fig. 2.3b. Suppose that we have n observations from CT scan images of a given patient; then "image n" is the n-th image of that patient. To model possible observation error, e.g., resolution and segmentation errors in CT scan images, we consider the noisy observations

zi = ζ(xi) + εi, ∀i ∈ {1, · · · , n},  (2.2)

where the difference (or error) between the observation and the true process is denoted by εi. We further assume that each εi is independently distributed as N(0, λ), a normal (Gaussian) random variable with mean 0 and variance λ.

Figure 2.5. Radius vs. height coordinate on the centerline measured from CT scans of a patient over the surveillance time. The lines from bottom to top represent images 1, 2, 3, and 4, respectively.

We denote the QoI of the G&R computation model output at x as r(x, θ), where θ is called the set of calibration parameters, or calibration inputs. In the G&R computation model for AAA expansion, the damage parameters serve as the calibration parameters, which are patient specific for AAA growth; i.e., the calibration parameters are θ = [θ1 θ2] as defined in Section 2.2.1. Given the available QoI of the G&R computation model, we model the true process as

ζ(xi) = r(xi, θ) + δ(xi),  (2.3)

where δ(·) is a model inadequacy function, i.e., model error, which is independent of the computation outputs. It is natural to assume that the true AAA expansion process ζ(x) cannot be fully described by the computation model; therefore, we introduce δ(x) to represent the discrepancy in (2.3). By combining (2.2) and (2.3), we have

zi = r(xi, θ) + δ(xi) + εi,  (2.4)

which gives a calibration model that relates the G&R computation outputs to the true process and the observations. The Bayesian calibration method [58] we adopt in this paper introduces Gaussian process priors for the computational model and the model error in order to calibrate θ and predict the QoI. Fig. 2.2 shows the flow-chart for Bayesian calibration; the statistical models in Fig. 2.2 build on two Gaussian process priors, for the G&R computation model and the model error, which are discussed in the next section.

2.2.4 Statistical models

Let GP(m(·), k(·, ·)) denote a Gaussian process with mean function m(·) and covariance function k(·, ·). GPs are flexible and popularly used as prior models for functions [110, 111]. We introduce the following Gaussian processes as prior beliefs for the G&R computation model and the model error:

r(x, θ) ∼ GP(m1(x, θ), k1(x, θ; x′, θ′)),
δ(x) ∼ GP(m2(x), k2(x, x′)).  (2.5)
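To make the ingredients of (2.4) and (2.5) concrete, the following minimal sketch generates synthetic observations zi = r(xi, θ) + δ(xi) + εi, drawing the model inadequacy δ from a Gaussian process; the closed form used for r and every numeric value are illustrative stand-ins, not the G&R model.

```python
# A minimal sketch of the data-generating structure in (2.4)-(2.5):
# z_i = r(x_i, theta) + delta(x_i) + eps_i, with delta drawn from a GP.
# The form of r and all numeric values here are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(42)

def r(x, theta):
    # Stand-in for the G&R computation model output at x = [t, s].
    t, s = x
    return 1.5 + 0.1 * t + theta[0] * np.exp(-(s - 7.5) ** 2 / (2 * theta[1] ** 2))

def k2(x, x2, sigma2_sq=0.05, omega=(0.4, 0.1)):
    # Separable exponential covariance for the model error (see 2.2.4).
    dx = np.subtract(x, x2)
    return sigma2_sq * np.exp(-np.dot(omega, dx ** 2))

theta = (0.8, 2.0)   # "true" calibration parameters (illustrative)
lam = 0.05 ** 2      # observation error variance lambda
X = [(t, s) for t in (0.0, 1.0) for s in np.linspace(5.0, 10.0, 6)]

# One GP draw for the model inadequacy delta over the inputs X.
K2 = np.array([[k2(a, b) for b in X] for a in X])
delta = rng.multivariate_normal(np.zeros(len(X)), K2)

z = np.array([r(x, theta) for x in X]) + delta + rng.normal(0.0, np.sqrt(lam), len(X))
print(np.round(z[:4], 3))
```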
In this paper, we consider the following linear mean structures for computational efficiency: m1 (x, θ) = h1 (x, θ)β1T = β10 + β11 t + β12 θ1 + β13 θ2 , (2.6) where h1 (x, θ) = [1 t θ1 θ2 ] and β1 = [β10 β11 β12 β13 ], and m2 (x) = h2 (x)β2T = β21 t + β22 s, (2.7) where h2 (x) = [t s] and β2 = [β21 β22 ]. These mean structures imply that the mean function of the G&R computation model is linear in time t and calibration parameters, {θ1 , θ2 }. The mean function of the model error is linear in time t and location s. β = [β1 β2 ] are hyperparameters for the mean functions in a Bayesian context. For a covariance structure, we use the following exponential functions as follows. k1 (x, θ; x0 , θ 0 ) = σ12 exp{−(x − x0 )Ωx (x − x0 )T } exp{−(θ − θ 0 )Ωθ (θ − θ 0 )T }, k2 (x, x0 ) = σ22 exp{−(x − x0 )Ω?x (x − x0 )T }, 22 where Ωx , Ωθ , Ω?x are diagonal matrices such that      ? ωx1  ωx1 0   ωθ1 0  ?  Ωx =   , Ωθ =   , Ωx =  0 ωx2 0 ωθ2 0  0   ? ωx2 ? ’s. so that the hyperparameters for the covariance functions are σ12 , σ22 and wxj , wθj , wxj Note that the resulting covariance functions are not isotropic since they allow different scaling for each coordinate. Additionally, they can be regarded as separable covariance functions since the dependence due to one coordinate is independent of the dependence due to another coordinate. One can consider a more generalized covariance structure at the cost of higher computational complexity. For the parameters controlling the covariances, we introduce ψ = [ψ1 ψ2 ], with ψ1 = [ωx1 ωx2 ωθ1 ωθ2 σ12 ], (2.8) ? ω ? σ 2 ]. ψ2 = [ωx1 x2 2 ψ1 is the set of hyperparameters related to the G&R computation model. ψ2 is the set of hyperparameters related to model error. For the remainder of the paper, the AAA G&R computation model is referred to as the computation model for notational simplicity. 2.3 Bayesian analysis for calibration Bayesian calibration considers computation model outputs as the data in addition to the observations and combines these two to calibrate θ under the Bayesian framework. Calibration helps improve the prediction of the QoI compared to the prediction using only the computation model outputs or using only observations. The diagram of the Bayesian cali23 bration process Fig. 2.6 gives us an overview of the whole process of Bayesian analysis which combines the information of priors, computer model, statistical model and observations to calibrate the parameter θ to make predictions. The top right part shows that computer model data covers the whole time range of real observations and predictions. And the computer model data helps to estimate the covariance hyperparamter ψ1 , which is defined as State 1. The left middle part shows the real observations helps to estimate the covariance hyperparameter ψ2 , which is defined as Stage 2. The right bottom part shows the calibration and prediction which is called Stage 3. As we can see calibration parameters only exist in computer models, and we use real observations to calibrate the parameters in the computer model so that the prediction can be improved. 2.3.1 Likelihood of computer model We need computation model outputs generated at various sets of θ and x as the data for calibration. To avoid confusion, we use θ ∗ = [θ1∗ θ2∗ ] and x∗ instead of θ and x, respectively, when they were used to generate computation model outputs. For example, at (x∗ , θ ∗ ), we obtain one computation model output r(x∗ , θ ∗ ). 
Computation model outputs which correspond to a set of various (x∗ , θ ∗ ) will be used in Bayesian calibration as a part of the data set. We call θ ∗ calibration inputs and x∗ variable inputs. To illustrate the Bayesian calibration method [58, 77], we introduce necessary notations. Regarding the computation model outputs, let N be the total number of pairs of variable inputs and calibration inputs. Then, for the ith set of inputs, (x∗i , θi∗ ), we let yi be the corresponding computation model output, that is, yi = r(x∗i , θi∗ ). We further define y = [y1 · · · yN ]T ∈ RN ×1 as the computation model output vector and ∗ )T ]T ∈ RN ×4 as the input matrix that corresponds to y. Xc = [(x∗1 , θ1∗ )T , · · · , (x∗N , θN 24 D1 θ (t, S) Computer data θ (t, S) Stage1 (t,S) θ (t,S) (t, S) (t,S) (t,S) (t,S) D2 θ (t,S) (t,S) (t, S) (t,S) D3 (t, S) θ (t, S) Real Observations Stage2 Bayesian Prediction t years Stage3 Stage1 Prediction Calibration Stage3 Stage2 Figure 2.6. Diagram of the Bayesian calibration process. Then, the assumption of the Gaussian process prior on the computation model gives y ∼ N (H1 (Xc )β1T , V1 (Xc )), ∗ )T ]T and the (i, j) entry of V (X ) is k ((x∗ , θ ∗ ), (x∗ , θ ∗ )). where H1 (Xc ) = [h1 (x∗1 , θ1∗ )T , · · · , h1 (x∗N , θN 1 c 1 i i j j 2.3.2 Likelihood of real observations Let n be the number of observations and z = [z1 · · · zn ]T ∈ Rn×1 be the set of observations corresponding to the variable input matrix Xo = [xT1 , · · · , xTn ]T ∈ Rn×2 . Note that Xo is not 25 necessarily the same as the set of variable inputs for the computation model outputs. In general, the number of observations is smaller than that of the computation model outputs since we can control the amount of the computation model outputs. To calibrate θ from the observations, we augment variable inputs Xo with θ such that Xo (θ) = [(x1 , θ)T , · · · , (xn , θ)T ]T . Then, from the calibration model, we have z ∼ N (H1 (Xo (θ))β1T + H2 (Xo )β2T , λIn + V1 (Xo (θ)) + V2 (Xo )), where H1 (Xo (θ)) = [h1 (x1 , θ)T , · · · , h1 (xn , θ)T ]T and H2 (Xo ) = [h2 (x1 )T , · · · , h2 (xn )T ]T . V1 (Xo (θ)) is defined in a similar way to define V1 (Xc ). The (i, j) entry of V2 (Xo ) is k2 (xi , xj ). 2.3.3 Joint likelihood We then combine the computation model outputs and observations, d = [y T z T ]T ∈ R(N +n)×1 , which we call a data vector.    y  d =   ∼ N (md (θ), Vd (θ)), z where md (θ) := E(d|θ, β, ψ) = H(θ)β T , 26 (2.9) with   H(θ) =   H1 (Xc ) 0 H1 (Xo (θ)) H2 (Xo )  , and   Vd (θ) := var(d|θ, β, ψ) =  V1 (Xc ) C1 (Xc , Xo (θ))T C1 (Xc , Xo (θ)) λIn + V1 (Xo (θ)) + V2 (Xo )   , where In is the n × n identity matrix and C1 (Xc , Xo (θ)) is a cross-covariance matrix whose (i, j) entry is k1 ((x∗i , θi∗ ), (xj , θj )). We note that md (θ) and Vd (θ) also depend on β and ψ. We drop them to reduce notational complexity. Remark 2.3.1 Recall that the calibration model assumes the true process as the sum of the computation model process and the model error process by assuming they are Gaussian processes as shown in (2.4) and (2.5). If we have observations of the true process only, we cannot identify these two Gaussian processes. However, the Bayesian calibration method makes use of both computation model outputs and observations as data. Computation model outputs contribute to the first Gaussian process only while observations contributes to both Gaussian processes as well as observation errors. 
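As a concrete illustration, the joint mean and covariance in (2.9) can be assembled block-wise from the kernels sketched above. The sketch below is again purely illustrative (it reuses the hypothetical h1, h2, k1, k2 from the previous sketch); Xc and Xo hold one input per row.

```r
# Sketch: assemble m_d(theta) and V_d(theta) for d = c(y, z).
# Xc: N x 4 matrix of rows (t, s, th1, th2); Xo: n x 2 matrix of rows (t, s).
cov_block <- function(A, B, kfun) {
  outer(seq_len(nrow(A)), seq_len(nrow(B)),
        Vectorize(function(i, j) kfun(A[i, ], B[j, ])))
}
joint_moments <- function(Xc, Xo, th, beta1, beta2, psi1, psi2, lambda) {
  Xo_th <- cbind(Xo, matrix(th, nrow(Xo), 2, byrow = TRUE))  # augment Xo with theta
  kc <- function(a, b) k1(a[1:2], a[3:4], b[1:2], b[3:4], psi1)
  H1c <- t(apply(Xc, 1, function(r) h1(r[1:2], r[3:4])))
  H1o <- t(apply(Xo_th, 1, function(r) h1(r[1:2], r[3:4])))
  H2o <- t(apply(Xo, 1, h2))
  md <- c(H1c %*% beta1, H1o %*% beta1 + H2o %*% beta2)      # H(theta) beta^T
  V11 <- cov_block(Xc, Xc, kc)                               # V1(Xc)
  C1  <- cov_block(Xo_th, Xc, kc)                            # C1(Xc, Xo(theta))
  V22 <- cov_block(Xo_th, Xo_th, kc) +
         cov_block(Xo, Xo, function(a, b) k2(a, b, psi2)) +
         lambda * diag(nrow(Xo))                             # lambda*I + V1 + V2
  list(md = md, Vd = rbind(cbind(V11, t(C1)), cbind(C1, V22)))
}
```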
2.3.4 Calibration

To estimate (θ, β, ψ) under the Bayesian framework, we consider the following prior distributions and assumptions.

A.1 β1 in (2.6) and β2 in (2.7) have non-informative priors, i.e., p(β1, β2) ∝ 1.
A.2 θ is independent of the other parameters.
A.3 θ follows a normal distribution.
A.4 ψ1 and ψ2 in (2.8) follow lognormal distributions.
A.5 log(λ) has a non-informative prior, i.e., p(log(λ)) ∝ 1.

Remark 2.3.2 Note that based on A.1 and A.2, the joint prior distribution takes the form p(θ, β, ψ) ∝ p(θ)p(ψ). For A.3, we use the sample mean and variance of the calibration parameter inputs that were used to generate computation model outputs as the mean and variance of the normal prior density. A.4 guarantees that ψ1 and ψ2 in (2.8) remain positive in the calculation, as required of hyperparameters in covariance functions.

Together with the prior specification given above, we have the following full joint posterior distribution of (θ, β, ψ) given d:

$$ p(\theta,\beta,\psi \mid d) \propto p(\theta)p(\psi) f(d; m_d(\theta), V_d(\theta)) \propto p(\theta)p(\psi)\,|V_d(\theta)|^{-\frac{1}{2}} \exp\left[-\frac{1}{2}\{(d-m_d(\theta))^T V_d(\theta)^{-1}(d-m_d(\theta))\}\right], \qquad (2.10) $$

where f(d; m, V) is the multivariate Gaussian density function with mean m and variance V. Note that the posterior distribution of (θ, β, ψ) given d is proper even though we assume non-informative priors for β.

To calibrate (estimate) θ from the posterior distribution of θ | d, we need to integrate out β and ψ. As shown explicitly in (2.9), m_d(θ) depends on β, while V_d(θ) depends on ψ. Clearly, m_d(θ) is a linear function of β, so we find

$$ \beta \mid \theta, \psi, d \sim N(\hat{\beta}(\theta), W(\theta)), \qquad (2.11) $$

where

$$ \hat{\beta}(\theta) = W(\theta)H(\theta)^T V_d(\theta)^{-1} d, \qquad W(\theta) = (H(\theta)^T V_d(\theta)^{-1} H(\theta))^{-1}. $$

Thus, we can integrate out β in (2.10) and get

$$ p(\theta,\psi \mid d) \propto p(\theta)p(\psi)\,|V_d(\theta)|^{-\frac{1}{2}} |W(\theta)|^{\frac{1}{2}} \exp\left[-\frac{1}{2}\{(d-H(\theta)\hat{\beta}(\theta))^T V_d(\theta)^{-1}(d-H(\theta)\hat{\beta}(\theta))\}\right]. \qquad (2.12) $$

While β is integrated out easily, ψ is not. Numerical integration can be considered but may not give an accurate result, since the integration is multi-dimensional. Thus, we use the conditional posterior distribution of θ given d with a plausible estimate of ψ plugged in, as proposed by Kennedy and O'Hagan (2001) [58]. The plausible estimate of ψ is obtained in a two-step approach. First, we estimate ψ1 by maximizing p(ψ1 | y) (Stage 1). Then, we estimate ψ2 by maximizing p(ψ2 | d, ψ1) after plugging in the estimated ψ1 (Stage 2). This process is shown in Fig. 2.7. In this study, numerical maximization is used with 500 iterations for Stage 1 and 100 iterations for Stage 2.

[Figure 2.7. Two stages to estimate ψ: Stage 1 estimates ψ1 by maximizing p(ψ1 | y); Stage 2 estimates λ and ψ2 by maximizing p(λ, ψ2 | d, ψ1); the resulting ψ̂ is plugged into p(θ | λ, ψ, d).]

Having estimated ψ and plugged it into the distribution of θ | d, ψ, we obtain the posterior distribution of the calibration parameter θ as

$$ p(\theta \mid \hat{\psi}, d) \propto p(\theta)\,|V_d(\theta)|^{-\frac{1}{2}} |W(\theta)|^{\frac{1}{2}} \exp\left[-\frac{1}{2}\{(d-H(\theta)\hat{\beta}(\theta))^T V_d(\theta)^{-1}(d-H(\theta)\hat{\beta}(\theta))\}\right] \qquad (2.13) $$

and use (2.13) to calibrate and make inference on θ.
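The two-stage plug-in estimation can be sketched as two numerical maximizations. In the sketch below, log_post_psi1 and log_post_psi2 are hypothetical functions evaluating log p(ψ1 | y) and log p(λ, ψ2 | d, ψ̂1); the optimization is performed on the log scale so that the positivity implied by the lognormal priors in A.4 is enforced automatically.

```r
# Sketch of the two-stage plug-in estimation of psi (illustrative).
two_stage <- function(log_post_psi1, log_post_psi2, init1, init2) {
  # Stage 1: maximize p(psi1 | y) over the log-parameters.
  s1 <- optim(log(init1), function(lp) -log_post_psi1(exp(lp)),
              method = "BFGS", control = list(maxit = 500))
  psi1_hat <- exp(s1$par)
  # Stage 2: maximize p(lambda, psi2 | d, psi1_hat), psi1_hat held fixed.
  s2 <- optim(log(init2), function(lp) -log_post_psi2(exp(lp)),
              method = "BFGS", control = list(maxit = 100))
  list(psi1 = psi1_hat, psi2 = exp(s2$par))
}
```

The iteration limits mirror the 500 and 100 iterations used for Stages 1 and 2 in this study.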
2.3.5 Prediction

We introduce another set of variable inputs X_p = [x_{n+1}^T, · · · , x_{n+m}^T]^T ∈ R^{m×2} to be used for prediction. The corresponding prediction of the QoI at the variable inputs X_p is denoted P = [P_1 · · · P_m]^T ∈ R^{m×1}. For good prediction quality, we can choose variable inputs in X_c that cover X_p.

Under the Bayesian framework, a prediction can be made using the predictive distribution of the (unobserved) true process ζ(x) given the full data d. More specifically, we consider the predictive distribution of ζ(x) given d and ψ̂, since we fix ψ at the estimate ψ̂ in our approach. We first analytically obtain the distribution of ζ(x) | θ, ψ, d, β, which is conditionally normal since ζ(x) and d are jointly normally distributed. Using the laws of total expectation and total variance to integrate out β, we obtain the distribution of ζ(·) conditional on θ and ψ̂, which is also normal. Its mean function is given by

$$ E(\zeta(x) \mid \theta, \hat{\psi}, d) = h(x,\theta)^T \hat{\beta}(\theta) + v(x,\theta)^T V_d(\theta)^{-1} (d - H(\theta)\hat{\beta}(\theta)), \qquad (2.14) $$

where

$$ h(x,\theta) = \begin{bmatrix} h_1(x,\theta) \\ h_2(x) \end{bmatrix}, \qquad v(x,\theta) = \begin{bmatrix} V_1((x,\theta), X_c) \\ V_1((x,\theta), X_o(\theta)) + V_2(x, X_o) \end{bmatrix}. $$

Additionally, its covariance function is given by

$$ \mathrm{cov}(\zeta(x), \zeta(x') \mid \theta, \hat{\psi}, d) = k_1((x,\theta),(x',\theta)) + k_2(x,x') - v(x,\theta)^T V_d(\theta)^{-1} v(x',\theta) $$
$$ \qquad + \ (h(x,\theta) - H(\theta)^T V_d(\theta)^{-1} v(x,\theta))^T W(\theta) (h(x',\theta) - H(\theta)^T V_d(\theta)^{-1} v(x',\theta)). \qquad (2.15) $$

We can then obtain the predictive distribution of ζ(x) given d and ψ̂ by integrating θ out of the distribution of ζ(x) | θ, ψ̂, d with respect to the posterior distribution of θ given in (2.13). From (2.14) and (2.15), we obtain the predictive expectation and variance of ζ(·) evaluated at the inputs X_p as

$$ E[\zeta(X_p) \mid \hat{\psi}, d] = E_\theta\{E[\zeta(X_p) \mid \theta, \hat{\psi}, d]\}, \qquad (2.16) $$

and

$$ \mathrm{var}[\zeta(X_p) \mid \hat{\psi}, d] = E_\theta\{\mathrm{var}[\zeta(X_p) \mid \theta, \hat{\psi}, d]\} + \mathrm{var}_\theta\{E[\zeta(X_p) \mid \theta, \hat{\psi}, d]\}, \qquad (2.17) $$

where ζ(X_p) = [ζ(x_{n+1}) · · · ζ(x_{n+m})]^T.

Based on the posterior density (2.13) of the calibration parameter θ, we can draw Markov chain Monte Carlo (MCMC) samples of θ, so that the predictive expectation and variance (2.16) and (2.17) can be approximated as follows. First, we draw M samples {θ(1), · · · , θ(M)} from (2.13) using the Metropolis-Hastings algorithm [15] and plug these samples into (2.14) and (2.15). Next, we obtain the predictions P as the MCMC estimate of (2.16):

$$ P := \hat{E}[\zeta(X_p) \mid \hat{\psi}, d] = \frac{1}{M}\sum_{i=1}^{M} E[\zeta(X_p) \mid \theta(i), \hat{\psi}, d]. \qquad (2.18) $$

Similarly, the outermost expectation and variance on the right-hand side of (2.17) are estimated by the corresponding sample mean and sample variance with respect to θ. We then obtain the predictive variances V_p as

$$ V_p := \hat{\mathrm{var}}[\zeta(X_p) \mid \hat{\psi}, d] = \hat{E}_\theta\{\mathrm{var}[\zeta(X_p) \mid \theta, \hat{\psi}, d]\} + \hat{\mathrm{var}}_\theta\{E[\zeta(X_p) \mid \theta, \hat{\psi}, d]\}, \qquad (2.19) $$

where

$$ \hat{E}_\theta\{\mathrm{var}[\zeta(X_p) \mid \theta, \hat{\psi}, d]\} = \frac{1}{M}\sum_{i=1}^{M} \mathrm{var}[\zeta(X_p) \mid \theta(i), \hat{\psi}, d], $$
$$ \hat{\mathrm{var}}_\theta\{E[\zeta(X_p) \mid \theta, \hat{\psi}, d]\} = \frac{1}{M-1}\sum_{i=1}^{M} \{E[\zeta(X_p) \mid \theta(i), \hat{\psi}, d] - \hat{E}[\zeta(X_p) \mid \hat{\psi}, d]\}^2. $$

We construct the 95% credible intervals based on these predictions P and their predictive variances V_p.
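As a sketch of this sampling step, a random-walk Metropolis-Hastings kernel and the Monte Carlo forms of (2.18)-(2.19) might look as follows; log_post_theta, pred_mean, and pred_var are hypothetical wrappers around (2.13), (2.14), and (2.15), not functions from any package.

```r
# Sketch: random-walk Metropolis-Hastings for theta, then MC prediction.
mh_theta <- function(log_post_theta, th0, M, step = 0.1) {
  d <- length(th0); out <- matrix(NA_real_, M, d)
  th <- th0; lp <- log_post_theta(th)
  for (i in seq_len(M)) {
    prop <- th + rnorm(d, sd = step)            # symmetric proposal
    lp_prop <- log_post_theta(prop)
    if (log(runif(1)) < lp_prop - lp) { th <- prop; lp <- lp_prop }
    out[i, ] <- th
  }
  out
}
predict_qoi <- function(draws, pred_mean, pred_var) {
  means <- apply(draws, 1, pred_mean)   # m x M matrix of E[zeta | theta(i)]
  vars  <- apply(draws, 1, pred_var)    # m x M matrix of pointwise variances
  P  <- rowMeans(means)                           # MC estimate of (2.18)
  Vp <- rowMeans(vars) + apply(means, 1, var)     # MC estimate of (2.19)
  list(P = P, Vp = Vp,
       lower = P - 1.96 * sqrt(Vp), upper = P + 1.96 * sqrt(Vp))
}
```

The last two lines produce the 95% credible bands used in the figures that follow.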
2.4 Data Description

We consider two cases of data sets. In the first case, we use the G&R computation model of AAA expansion to generate synthetic observations in addition to producing computation model outputs. This simulated observation case validates our approach, since we know the realized true process, along with the parameters that generated it, for comparison. In the second case, we use real data, i.e., observations from a patient's CT scan images; in this real-world clinical application, the G&R computation model is used only to obtain computation model outputs. When implementing the Bayesian analysis, we standardize all the inputs x, θ and outputs y, z to stabilize the computation. In particular, for the rest of the paper we use θ = [θ1 θ2] to denote the standardized values and θ^org = [θ1^org θ2^org] to denote the original values.

2.4.1 Simulated Observation Cases

For the simulated observation case, we generate r(x, θ) from the G&R computation model. To validate our approach, we set patient-specific values of the calibration parameters (say θ0) for the synthesized true process, so that we can evaluate the estimated calibration parameters against them. Recall that for the damage function, we need to set θ1, θ2, α1, and α2 to produce the G&R computation model outputs. For the simulated observation case, we set θ1^org = 0.65, θ2^org = 6, α1 = 7.5 and α2 = 2. Thus, the calibration input for the true process is θ0^org = [0.65 6]. For the variable inputs x = [t s], we consider 8 equally spaced values of s from 0 to 15 cm and a time step of 5 days, which are fixed throughout the study. The time duration is 7 years, but several different choices of sampling times are considered. Note that when we implement Bayesian calibration, the values of the QoI (the radius with respect to the centerline) are standardized as well, so that the mean is 0 and the variance is 1.

Once we obtain a realization of r(x, θ0), we add model and observation errors to get the final observations. The model error δ(x) is realized from the model assumption (2.5) with β2 = [0.001 0.001], ω*_{x1} = 1, ω*_{x2} = 1, and σ2² ∈ {0.005, 0.001}. The standard deviation of the model error δ(x) at each height and time is then about 0.071 or 0.032, respectively. The observation error ε is generated from N(0, λ) with λ = 0.001, i.e., ε ~ N(0, 0.032²). Since the standard deviation of the standardized computation outputs r is 1, generating the simulated observations amounts to adding a model error process of 7.1% or 3.2% (with respect to r) and an observation error process of 3.2%. We chose these values for the model and observation errors to produce a data set that best illustrates the effects of different noise levels and sampling schemes.

We also need computation model outputs at various combinations of variable inputs and calibration inputs. For the calibration inputs, we consider θ^org = [θ1^org θ2^org] ∈ {0.5, 0.65, 0.8} × {2, 6, 10}, giving 9 combinations of θ1 and θ2 in total. For the variable inputs, s takes 8 equally spaced values from 0 to 15 cm. For the time t, we consider three different scenarios to see the effects of different time resolutions. With the two choices of σ2², there are a total of 6 different scenarios. Table 2.1 gives the label of each scenario, and further details are given in Table 2.2. With three cases of sampling time grids and two levels of σ2², we want to investigate the interaction between the computation model and Bayesian calibration. In Case 1, the computation model inputs have a sparse time grid, while those in Case 3 have a denser one. In Case 2, the computation model inputs do not cover the full time range we want to predict, while those in Cases 1 and 3 do.
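Putting the generation recipe above together, one simulated observation set can be produced as in the following sketch. It reuses the hypothetical h2, k2, and cov_block helpers from the earlier sketches, assumes the standardized G&R outputs r_true at the observation inputs have been precomputed by the solver, and uses mvrnorm from the MASS package to draw the Gaussian-process model error.

```r
# Sketch: synthesize observations z = r + delta + eps at inputs Xo (n x 2).
library(MASS)  # for mvrnorm

set.seed(1)
simulate_obs <- function(r_true, Xo, beta2, psi2, lambda) {
  m2 <- as.vector(t(apply(Xo, 1, h2)) %*% beta2)         # mean of delta
  K2 <- cov_block(Xo, Xo, function(a, b) k2(a, b, psi2)) # cov of delta
  delta <- mvrnorm(1, mu = m2, Sigma = K2)               # model error draw
  eps <- rnorm(nrow(Xo), 0, sqrt(lambda))                # observation error
  r_true + delta + eps
}
# e.g. beta2 = c(0.001, 0.001), psi2 = c(1, 1, 0.005), lambda = 0.001,
# matching the settings described above.
```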
We note that the variable inputs and calibration inputs for the proposed Bayesian calibration are standardized in the actual implementation, so we can assume the realized calibration inputs are centered at zero. We then use 0.3 as the prior mean of θ1 and θ2 (away from zero) to investigate the robustness of the prior distributions, and we use 0.1 for the prior variances of θ1 and θ2. The same prior setting for θ1 and θ2 is used later in the real observation case. Note that a normal prior can be used for θ1 even though θ1^org is constrained to the interval [0, 1]. As θ1 was standardized (mean 0, standard deviation 1), its range becomes [−1.3, 1.3] based on the parameter values we use. We set the prior mean at 0.3 and the prior variance at 0.1 (standard deviation 0.32). By the empirical rule, 99.7% of the samples from the prior lie in the interval [−0.66, 1.26], which is contained in [−1.3, 1.3]. The sensitivity of the posterior estimates of θ1 and θ2 to the prior choice is investigated and presented in Supplemental Fig. A.3 (in Supplementary Section Appendix 2), where we can see that the posterior distributions are robust against the subjective choice of prior distributions.

Table 2.1. Differences in Xc and σ2² for simulation cases

  sampling time in Xc    σ2² = 0.005    σ2² = 0.001
  every 2 years          Case 1a        Case 1b
  every 1.5 years        Case 2a        Case 2b/2c
  every 1 year           Case 3a        Case 3b

2.4.2 Real Observation Case

For the real observation case, we consider 4 longitudinal CT scan images of a patient, as shown in Fig. 2.1. This particular patient's CT scan examination spans a period of 3 years. For each image, we measured the corresponding radius (QoI) at each height on the centerline, as illustrated in Fig. 2.5, using the maximally-inscribed-sphere method developed in [40]. From a preliminary study of the damage shape with respect to the CT scan images, we selected α1 = 12.6 and α2 = 5 for (2.1) a priori. In addition, we selected a reasonable range of the calibration parameter θ a priori to produce computation model outputs comparable to the particular observations (Fig. 2.5). More specifically, as prior information to determine the range of the calibration parameter θ^org, we fit the diameter of the computation model outputs to the diameter calculated from the CT scan images using fminsearch in MATLAB (MathWorks, Natick, USA) and found the optimum value to be [θ1^org θ2^org] = [0.2 0.9]. From this information, we consider θ^org = [θ1^org θ2^org] ∈ {0.05, 0.2, 0.35} × {0.7, 0.90, 0.99}. For these 9 combinations of θ^org, we run the G&R model to produce the computation model outputs. More details for this real observation case are given in Table 2.2.

Table 2.2. Case details

                  Case 1a / 1b           Case 2a / 2b/2c               Case 3a / 3b                  Real Data
  Xc times        [0 2 4 6]              [0 1.5 3 4.5 6]               [0 1 2 3 4 5 6 7]             [1 2 3 4]
  Xc dimension    288 × 4                360 × 4                       576 × 4                       288 × 4
  Xo times        [0 0.5 1 1.5 2 2.5 3]  [0 0.5 1 1.5 2 2.5 3 3.5 4]   [0 0.5 1 1.5 2 2.5 3 3.5 4]   [1 2]
  Xo dimension    56 × 2                 72 × 2                        72 × 2                        16 × 2
  Xp times        [3.5 4 4.5 5 5.5 6]    [4.5 5 5.5 6 6.5 7]           [4.5 5 5.5 6 6.5 7]           [3 4]
  Xp dimension    48 × 2                 48 × 2                        48 × 2                        16 × 2
  y dimension     288 × 1                360 × 1                       576 × 1                       288 × 1
  z dimension     56 × 1                 72 × 1                        72 × 1                        16 × 1
  σ2²             0.005 / 0.001          0.005 / 0.001                 0.005 / 0.001                 0.01
  δ as % of r     7.2% / 3.2%            7.2% / 3.2%                   7.2% / 3.2%                   10%

2.5 Results

In this section, we present the results of the Bayesian calibration method on the simulated observations and the real observations described in Section 2.4. For detailed implementation steps, readers are referred to Section Appendix 1 in the supplementary document. For the simulated observations, we show three plots: prediction results with the true processes (Fig. 2.8), 95% credible bands with the true processes (Fig. 2.9), and relative errors for prediction (Fig. A.1). The relative errors R = [R_1 · · · R_m]^T over the height of the AAA are calculated as

$$ R_i = \frac{|P_i - T_i|}{T_i}, \quad \forall i \in \{1, \cdots, m\}, \qquad (2.20) $$

where T = [T_1 · · · T_m]^T is the realized true process for the simulated observation case described in Section 2.4.1.
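In code, (2.20) and a credible-band coverage check are one-liners; a minimal sketch, using the output of the hypothetical predict_qoi() from the earlier sketch:

```r
# Sketch: relative errors (2.20) and 95% band coverage for the simulated case.
relative_errors <- function(P, T) abs(P - T) / T
coverage <- function(T, pred) mean(T >= pred$lower & T <= pred$upper)
```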
For the real observations, the prediction and credible band graphs are given together in Fig. 2.11. Since only observations are available in this case, the relative errors for prediction are calculated by replacing {T_i} in (2.20) with the observations (i.e., noisy true values), as shown in Fig. A.2b.

2.5.1 Results from Simulated Observation Cases

From Figs. 2.8, 2.9, and A.1 of the simulation study, we observe the following findings. Prediction quality improves when more computation model outputs are used for calibration: across different sampling time resolutions, prediction quality monotonically improves from Case 1 (coarse resolution) to Case 3 (fine resolution). Smaller model errors (Cases 1b, 2b, and 3b) provide better prediction results; compared to Cases 1a, 2a, and 3a (left column of each plot), the prediction quality of Cases 1b, 2b, and 3b (right column of each plot) is better. As one would expect, prediction quality decreases in years for which neither computation model outputs nor observations are available for calibration. For example, at years 4.5, 5, and 5.5, the errors between the true processes and the predictions are more pronounced in Case 1a, since both computation model outputs and observations are unavailable in those years. In particular, predictions at years 6.5 and 7 for Case 2 are the worst in terms of relative errors; such results are reflected in wide credible bands as well. A possible reason is that calibration for Case 2 had no information at year 6.5 or later with which to predict at years 6.5 and 7; recall that its computation model outputs extend to year 6 and its observations to year 4. On the other hand, calibration for Case 1 has computation model outputs at year 6 with which to predict at years 4.5, 5, and 5.5. This result suggests that we can use computation model outputs at future times during calibration for better prediction, and in this regard the results from Case 3 are better than those from Cases 1 and 2. However, using more computation model outputs in the calibration process implies a higher computational burden, as shown in Table 2.4.

[Figure 2.8. Predictions of the true processes (radius vs. height) for panels (a) Case 1a, (b) Case 1b, (c) Case 2a, (d) Case 2b, (e) Case 3a, and (f) Case 3b. T[t] denotes the true values at year t, while P[t] denotes the predictions at year t. The solid lines denote the true values, while the dashed lines denote the predictions.]

From the Bayesian calibration, we obtain posterior samples of the calibration parameters as well as of the hyperparameters.

[Figure 2.9. 3D 95% credible band of predictions for panels (a) Case 1a through (f) Case 3b. The blue stars denote true values lying inside the credible band; the red marks denote true values lying outside the credible band.]
For the simulated observation cases, the true parameter values (the values used to generate the data) are known, so we can compare the performance of Bayesian calibration across the various simulation scenarios. The calibrated estimates of the standardized θ1 and θ2 for each case are provided in Table 2.3. Posterior densities of the standardized θ1 and θ2 are shown in Supplementary Fig. A.3, and the prior distributions and estimates of the hyperparameters are shown in Supplementary Tables A.1–A.8. Comparing the posteriors of θ1 and θ2 across all cases, we clearly notice that the posteriors of Case 3 have the sharpest peaks around the true values, and these peaks can be utilized as point estimators. As shown in Table 2.3, the posterior estimates of θ1 and θ2 in Case 3b are the closest to their true (standardized) values of 0 among all cases. This again illustrates that more information from computation model outputs gives more accurate estimates of the calibrated parameters; better estimation of the calibrated parameters is in turn likely to give better quality predictions.

[Figure 2.10. Predictions and credible bands for Case 2c. (a) Predictions of the true processes (radius vs. height): T[t] denotes the true values at year t, P[t] the predictions at year t; solid lines are true values, dashed lines predictions. (b) 3D 95% credible band of predictions: blue stars denote true values inside the credible band, red marks true values outside it.]

2.5.1.1 Robustness of prior selection

To investigate the impact of different priors on Bayesian calibration, we compare the results of Case 2b and Case 2c, where Case 2c is identical to Case 2b except that the prior variances of the calibration parameters are 10 times larger. The posteriors of θ1 and θ2 for Case 2c (Fig. A.4a in the supplementary document) cover the true values 0 and 0, but they have larger posterior variances than those from Case 2b due to the larger prior variances. We see that the predictions (Figs. 2.8d and 2.10a) and the 95% credible bands (Figs. 2.9d and 2.10b) are similar even though the prior variances are quite different; this can also be seen in the relative errors (Figs. A.1d and A.2a). This implies that our Bayesian calibration method is robust to such changes in priors. Therefore, we may use a diffuse prior that covers all possible values of the parameters θ, learning the statistical information from the background and the computation model. The priors for the real observation case can be chosen in a similar way.

2.5.1.2 Computation time

In this study, computation model outputs for calibration were generated separately; calibration and prediction were then implemented using the computation model outputs together with the observations. Our method requires computation model outputs for each set of calibration parameters, so we report the computation time needed to generate outputs per parameter set. The time to generate computation outputs by solving the G&R computation model is 10 minutes for one combination of the parameters (θ1, θ2), producing outputs at 8 height grid points (total height 15 cm) every 5 days over a total of 2750 days.
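Since each parameter combination is solved independently of the others, the output-generation step parallelizes trivially. The following sketch distributes the 9 combinations across cores; run_gr is a hypothetical wrapper around the MATLAB G&R solver, not an existing function.

```r
# Sketch: generate outputs for all 9 calibration-input combinations in parallel.
library(parallel)

theta_grid <- expand.grid(th1 = c(0.5, 0.65, 0.8), th2 = c(2, 6, 10))
# run_gr(th1, th2) is hypothetical: it would invoke the G&R solver and
# return the QoI on the (t, s) grid for that parameter combination.
outputs <- mclapply(seq_len(nrow(theta_grid)), function(i) {
  run_gr(theta_grid$th1[i], theta_grid$th2[i])
}, mc.cores = 4)
```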
The G&R computation has been coded and implemented using MATLAB R2012a on an Intel Core i7-3770 3.4 GHz processor with 12 GB of RAM. If we generate a large number of computation model outputs for densely sampled calibration parameter values, the accuracy of calibration and prediction will improve at the expense of computational cost. Note that this computation is naturally parallel, since we run the computation model on a given parameter vector and repeat the procedure to cover a defined range of parameter combinations. As the computation model becomes increasingly complicated, this time will also increase, but the total computation will still remain parallel.

In Table 2.4, we show the computation time for Bayesian calibration and prediction. Case 1, Case 2, and Case 3 took around 6.5, 12, and 30 hours, respectively. As expected, Case 3 takes the longest time while yielding the best prediction results. To provide more insight, we divide the total computational time into two parts in Table 2.4: one spent in optimization to find ψ1 and ψ2, and the other spent in computing estimates, predictions, and uncertainties via sampling. As can be seen in Table 2.4, the latter part takes more time, due to the sequential computation of the Bayesian analysis steps. All computations for the Bayesian analysis were implemented on a twelve-core 2.90 GHz CPU running Windows Server 2008 using R (R version 3.1.1, Copyright (C) 2014 The R Foundation for Statistical Computing). We used the R package "calibrator" [45] with appropriate modifications to adapt it to our application and to reduce the computation time significantly (e.g., 100 times faster). The results from Cases 1–3 illustrate that, if affordable, the best results are achieved when Xc has a dense time grid covering the whole targeted prediction range of interest.

Table 2.3. Posterior estimates of θ1 and θ2 (std denotes the standard deviation)

  Case   θ1^org (mean, std)   θ2^org (mean, std)   θ1 (mean, std)    θ2 (mean, std)
  1a     (0.678, 0.035)       (6.05, 0.105)        (0.225, 0.288)    (0.015, 0.032)
  1b     (0.668, 0.023)       (5.912, 0.147)       (0.149, 0.281)    (−0.027, 0.045)
  2a     (0.681, 0.035)       (6.072, 0.105)       (0.249, 0.079)    (0.022, 0.032)
  2b     (0.672, 0.024)       (6.013, 0.080)       (0.177, 0.195)    (0.004, 0.025)
  3a     (0.669, 0.016)       (5.997, 0.056)       (0.155, 0.127)    (−0.001, 0.017)
  3b     (0.660, 0.010)       (6.003, 0.040)       (0.083, 0.084)    (0.001, 0.012)
  2c     (0.665, 0.032)       (5.992, 0.115)       (0.122, 0.264)    (−0.003, 0.035)
  Real   (0.250, 0.096)       (0.858, 0.062)       (0.406, 0.784)    (−0.041, 0.515)

Table 2.4. Computation times for Bayesian calibration and prediction. Estimation denotes the time spent maximizing to estimate ψ1 and ψ2, while sampling denotes the time spent calculating predictions and credible intervals via sampling from the posteriors.

  Case   Estimation     Sampling         Case   Estimation     Sampling
  1a     1.177 hours    5.305 hours      1b     1.195 hours    5.160 hours
  2a     2.828 hours    8.785 hours      2b     2.796 hours    8.941 hours
  3a     7.314 hours    24.377 hours     3b     7.221 hours    24.240 hours
  2c     2.875 hours    9.044 hours      Real   8.237 mins     2.998 hours

2.5.2 Results from Real Observation Case

In this section, we discuss the calibration results from the real observations using CT images 1–4 shown in Fig. 2.5 (CT images taken at 0, 1.2, 2.3, and 3.2 years, respectively). To generate computation model outputs for calibration, we consider the inputs Xc for the computation model as follows. Eight heights uniformly ranging from 9.3 to 17.5 cm are chosen.
For the calibration inputs, θ1^org is chosen from {0.05, 0.2, 0.35} and θ2^org from {0.7, 0.90, 0.99}. 500 iterations are used for Stage 1 and 100 iterations for Stage 2. The computation time is 30 minutes, as shown in Table 2.4.

Table 2.3 shows the estimates of all calibration parameters. The prior means of θ1^org and θ2^org are 0.2 and 0.863, and their prior standard deviations are 0.150 and 0.148, respectively. The resulting posterior means of θ1^org and θ2^org from the Bayesian analysis are 0.250 and 0.858, with standard deviations of 0.096 and 0.062, respectively.

We withhold the observations for the last two years during the calibration procedure in order to validate our approach by comparing the predictions for those years (CT images 3 and 4, taken at 2.3 and 3.2 years, respectively). In contrast to the simulation study, we compare the results with the observations (noisy true values), since the true process is not available for the real observations. The difference between the true process and the observation (the measurement error) is estimated to be N(0, 0.01) from the real observation data used for Bayesian calibration, as shown in Table 2.2. There are some larger discrepancies in the lower part of the predictions at 3.2 years, reflected in the correspondingly larger relative errors in Fig. A.2b; most of the relative errors are less than 5%. Moreover, Figure 2.11a shows how closely the Bayesian calibration model predicts the observation (i.e., noisy true) values, and from Fig. 2.11b we find that only one observation value lies outside the credible bands. These results support the capability of our approach to predict AAA expansion using real data.

[Figure 2.11. Predictions and credible bands for the real observation case. (a) Predictions of the observation processes (radius vs. height): O[t] denotes the observation values at year t, P[t] the predictions at year t; the solid line denotes the observations, the dashed line the predictions. (b) 3D 95% credible band of predictions: blue stars denote observations lying inside the credible band; the red mark denotes an observation value lying outside the credible band.]

2.6 Discussion and Conclusion

From the prediction graphs in Figs. 2.8e and 2.8f of Case 3 (simulation study), we notice that the diameters of the proximal and distal ends of the true line at year 7 tend to be smaller than the previous ones. When the AAA is gradually expanding, both end regions can be axially compressed, which means the radii of the two ends often contract; the computation model successfully captures this phenomenon. Currently we are studying longitudinal images of abdominal aortic aneurysms registered with the vertebral column. We speculate that the renal vein and artery, the superior mesenteric artery, and the iliac bifurcation can serve as anchors (in both the longitudinal and circumferential directions at the superior and inferior boundaries) for the infrarenal AAA during expansion. Hence, it may be that the physical constraints of the tethering of those vessels provide a strong confinement, or anchor, at that region of the aorta. During AAA expansion, the volume of the AAA sac gradually increases while stretching mostly in the circumferential direction and slightly in the axial direction.
Hence, because of the AAA expansion and the confinement in the axial direction, the neck and distal part of the aorta (from the renal branches to the AAA sac) will be compressed axially. Using a 3D growth and remodeling simulation, Zeinali et al. (2012) [115] show the local change in the stress distribution, in which the stress of the sac increases while the neck's stress decreases. Recall that we estimated α1 and α2 a priori, before Bayesian calibration, since we assume that the peak location of the future AAA and its overall geometry do not change significantly from the previous scans. As illustrated in the real observation case, the AAA peak location and the aneurysm shape did not change significantly across the follow-up images, which supports our approach.

At present, implementing a full Bayesian analysis, especially for a large data case, is not practical, since Stage 2 (the process of estimating ψ2) in Fig. 2.7 is time-consuming even for one iteration. If we adopted a full Bayesian method, e.g., Gibbs sampling, we would need to update ψ2 in every iteration, and the computational time would quickly accumulate to an infeasible level. This is the main reason we use point estimators of ψ1 and ψ2 instead of a full Bayesian method. The case studies presented in this paper show the feasibility and good performance of our approach with point estimators.

The estimates of the hyperparameters in all cases are shown in Supplementary Tables A.1–A.8. Estimates of β11 (the coefficient for time t) are always positive, which means the radii of AAAs increase in time due to AAA expansion. Estimates of β12 (the coefficient for θ1) are also positive in all cases, which means the QoI, i.e., the radius of the AAA, increases as θ1 increases. This can be explained by the damage function (2.1): the QoI increases, while the elastin content decreases, as the amplitude of the damage function (e.g., θ1) increases. Estimates of β13 (the coefficient for θ2) are negative in the real observation case, while all β13's are positive in the simulation cases. This could be due to the fact that the θ2 values differ greatly between the real and simulated observations: θ2 takes values in {0.7, 0.9, 0.99} in the real observation case, while it takes values in {2, 6, 10} in the simulation cases (with 6 being the true value). Such largely different parameter values can give different local sensitivities of the QoI with respect to the parameters when the influence of the parameters is complex. On the other hand, the estimates of β21, β22 and σ2² are small in all cases. This implies that the computation model explains most of the linear and covariance structure of the true process.

The results from the simulation case study suggest that Bayesian calibration can be used to combine the computation model, the prior and uncertainty models, and real observations to predict the QoI at future times or at unobserved locations. The computation model output data provide a deterministic trend of the expansion process, which is modeled by a Gaussian process; computation model discrepancies obtained from real observations are modeled by another Gaussian process. When we have more information from the computation model (in the form of finer grids for the inputs Xc), we achieve lower prediction errors, and the posteriors of the parameters θ1 and θ2 concentrate more tightly around their true values, as illustrated in our simulation results.
We also find that the posteriors and predictions from our approach are robust to the selection of priors for θ1 and θ2. In the real observation case of one patient, the results of Bayesian calibration indicate that the predictions are reasonably good when compared with the withheld last two observations: most of the withheld observations match the predictions well and lie inside the 95% credible bands. The model and observation errors collectively capture the structure of the true process in a manner consistent with the Bayesian perspective. Considering these results, we conclude that the Bayesian calibration process is capable of predicting complex G&R AAA processes, and we provide evidence that this holds for a set of real longitudinal observations, where Bayesian calibration produces good predictions. However, our approach still needs to be validated on a large number of real observation cases to evaluate its performance and efficacy in a clinical sense. We believe that, given past CT images of a patient, Bayesian calibration can help guide the scheduling of future CT scans according to the predictions and their credible bands.

We used the 2D axisymmetric G&R model, ignoring the time evolution of the centerline, in order to simplify the formulation of the Bayesian calibration process and to study the effectiveness of Bayesian calibration with only two parameters. As future work, we plan to consider more complicated damage functional shapes with more parameters, as well as asymmetric AAA expansion in 3D space. Additionally, the linearity assumption on the mean functions of the Gaussian processes in (2.5) will be relaxed to be more flexible for better prediction performance. We also plan to investigate how to deal with high-dimensional calibration parameters when flexible and complex functions are used for the damage and the rate of mass production. Finally, we will investigate how to incorporate other available patient-specific data into the Bayesian calibration for a particular patient.

2.7 Data Accessibility

More information about the results is summarized in the supplementary document in the form of tables and figures. The real observation data set used for the real observation case in this paper (Fig. 2.5) is available on the websites of the corresponding authors (http://www.egr.msu.edu/~jchoi/files/papers/index.html and http://www.egr.msu.edu/~sbaek/Publications.html).

APPENDICES

Appendix 1: Programming steps for simulated observations

Figure 2.6 shows the whole process of the Bayesian analysis, which combines the information from the priors, the computer model, the statistical model, and the observations to calibrate the parameter θ and make predictions. More specifically, the programming steps for the simulation study are given as follows.

Step 0: Specify the statistical models for m1(x, θ) and m2(x) (see Section 2.2.4).
Step 1: Set the initial values, including the true values of ψ2 and β2, the normal priors for θ, and the lognormal priors for ψ1 and ψ2 (see the prior assumptions in Section 2.3).
Step 2: Determine the inputs Xc, Xo and Xp as described in Table 2.2.
Step 3: Simulate the full data d = [y^T z^T]^T using Xc, Xo and the true values of ψ2 and β2, as described in Section 2.4.
Step 4: Run Stage 1 with 500 iterations to obtain the estimate ψ̂1 (see Section 2.3).
Step 5: Run Stage 2 with 100 iterations to obtain the estimate ψ̂2 (see Section 2.3).
Step 6: Draw samples of θ from the posterior distribution p(θ | ψ̂, d) with the estimate ψ̂ = [ψ̂1^T, ψ̂2^T]^T plugged in (see Section 2.3).
Step 7: Plug the samples of θ, the estimate ψ̂, the full data d, and Xp into (2.18) and (2.19) to get the predictions and their corresponding variances (see Section 2.3).
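Tying the steps together, a skeleton driver might read as follows. Every function named here is one of the hypothetical helpers sketched earlier in this chapter, not the calibrator package API, and Steps 0-3 are assumed to have been carried out beforehand (e.g., with joint_moments() and simulate_obs()).

```r
# Sketch: driver mirroring Steps 0-7 with the earlier hypothetical helpers.
psi_hat <- two_stage(log_post_psi1,        # Step 4: maximize p(psi1 | y)
                     log_post_psi2,        # Step 5: maximize p(lambda, psi2 | d, psi1_hat)
                     init1 = rep(1, 5),    # (w_x1, w_x2, w_th1, w_th2, sigma1_sq)
                     init2 = c(0.01, 1, 1, 0.01))  # (lambda, w_x1*, w_x2*, sigma2_sq)
draws <- mh_theta(log_post_theta, th0 = c(0, 0), M = 5000)  # Step 6: sample (2.13)
pred  <- predict_qoi(draws, pred_mean, pred_var)            # Step 7: (2.18)-(2.19)
```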
Appendix 2: Other Results

Table A.1. Priors and estimates of hyperparameters for Case 1a

  ψ1                        ωx1      ωx2      ωθ1      ωθ2      σ1²
  Lognormal prior mean      1        1        1        1        1
  Lognormal prior variance  1.1      1.1      1.1      1.3      1.2
  Estimates from Stage 1    0.641    6.552    0.044    0.725    0.355

  ψ2                        λ        ωx1*     ωx2*     σ2²
  True values               0.001    1        1        0.005
  Lognormal prior mean      0.01     1        1        0.005
  Lognormal prior variance  0.1      1.1      1.1      0.2
  Estimates from Stage 2    0.0007   0.270    0.185    0.022

  β                         β10      β11      β12      β13      β21      β22
  Estimates                 0.008    0.626    0.096    0.290    0.016    −0.048

Table A.2. Priors and estimates of hyperparameters for Case 1b

  ψ1                        ωx1      ωx2      ωθ1      ωθ2      σ1²
  Lognormal prior mean      1        1        1        1        1
  Lognormal prior variance  1.1      1.1      1.1      1.3      1.2
  Estimates from Stage 1    0.703    6.406    0.046    0.702    0.290

  ψ2                        λ        ωx1*     ωx2*     σ2²
  True values               0.001    1        1        0.001
  Lognormal prior mean      0.01     1        1        0.001
  Lognormal prior variance  0.1      1.1      1.1      0.2
  Estimates from Stage 2    0.002    0.442    0.791    0.0007

  β                         β10      β11      β12      β13      β21      β22
  Estimates                 0.0008   0.628    0.096    0.291    0.024    0.017

Table A.3. Priors and estimates of hyperparameters for Case 2a

  ψ1                        ωx1      ωx2      ωθ1      ωθ2      σ1²
  Lognormal prior mean      1        1        1        1        1
  Lognormal prior variance  1.1      1.1      1.1      1.3      1.2
  Estimates from Stage 1    0.803    6.815    0.059    0.675    0.261

  ψ2                        λ        ωx1*     ωx2*     σ2²
  True values               0.001    1        1        0.005
  Lognormal prior mean      0.01     1        1        0.005
  Lognormal prior variance  0.1      1.1      1.1      0.2
  Estimates from Stage 2    0.001    0.384    0.272    0.122

  β                         β10      β11      β12      β13      β21      β22
  Estimates                 0.009    0.637    0.101    0.315    −0.086   −0.016

Table A.4. Priors and estimates of hyperparameters for Case 2b

  ψ1                        ωx1      ωx2      ωθ1      ωθ2      σ1²
  Lognormal prior mean      1        1        1        1        1
  Lognormal prior variance  1.1      1.1      1.1      1.3      1.2
  Estimates from Stage 1    0.700    6.836    0.054    0.990    0.304

  ψ2                        λ        ωx1*     ωx2*     σ2²
  True values               0.001    1        1        0.001
  Lognormal prior mean      0.01     1        1        0.001
  Lognormal prior variance  0.1      1.1      1.1      0.2
  Estimates from Stage 2    0.004    0.331    1.103    0.003

  β                         β10      β11      β12      β13      β21      β22
  Estimates                 0.030    0.638    0.102    0.313    −0.014   −0.009

Table A.5. Priors and estimates of hyperparameters for Case 2c

  ψ1                        ωx1      ωx2      ωθ1      ωθ2      σ1²
  Lognormal prior mean      1        1        1        1        1
  Lognormal prior variance  1.1      1.1      1.1      1.3      1.2
  Estimates from Stage 1    0.794    6.467    0.054    0.818    0.283

  ψ2                        λ        ωx1*     ωx2*     σ2²
  True values               0.001    1        1        0.001
  Lognormal prior mean      0.01     1        1        0.001
  Lognormal prior variance  0.1      1.1      1.1      0.2
  Estimates from Stage 2    0.001    0.353    2.925    0.008

  β                         β10      β11      β12      β13      β21      β22
  Estimates                 0.009    0.641    0.101    0.312    0.006    −0.003

Table A.6. Priors and estimates of hyperparameters for Case 3a

  ψ1                        ωx1      ωx2      ωθ1      ωθ2      σ1²
  Lognormal prior mean      1        1        1        1        1
  Lognormal prior variance  1.1      1.1      1.1      1.3      1.2
  Estimates from Stage 1    0.898    15.920   0.250    1.640    0.533

  ψ2                        λ        ωx1*     ωx2*     σ2²
  True values               0.001    1        1        0.005
  Lognormal prior mean      0.01     1        1        0.005
  Lognormal prior variance  0.1      1.1      1.1      0.2
  Estimates from Stage 2    0.003    0.210    0.054    0.032

  β                         β10      β11      β12      β13      β21      β22
  Estimates                 0.070    0.621    0.094    0.341    −0.030   0.004

Table A.7. Priors and estimates of hyperparameters for Case 3b

  ψ1                        ωx1      ωx2      ωθ1      ωθ2      σ1²
  Lognormal prior mean      1        1        1        1        1
  Lognormal prior variance  1.1      1.1      1.1      1.3      1.2
  Estimates from Stage 1    0.951    15.786   0.217    1.971    0.526

  ψ2                        λ        ωx1*     ωx2*     σ2²
  True values               0.001    1        1        0.001
  Lognormal prior mean      0.01     1        1        0.001
  Lognormal prior variance  0.1      1.1      1.1      0.2
  Estimates from Stage 2    0.001    0.637    1.705    0.0006

  β                         β10      β11      β12      β13      β21      β22
  Estimates                 0.066    0.616    0.094    0.345    −0.020   0.015

Table A.8. Priors and estimates of hyperparameters for real observations

  ψ1                        ωx1      ωx2      ωθ1      ωθ2      σ1²
  Lognormal prior mean      1        1        1        1        1
  Lognormal prior variance  1.1      1.1      1.1      1.3      1.2
  Estimates from Stage 1    0.242    1.702    0.082    0.008    0.729

  ψ2                        λ        ωx1*     ωx2*     σ2²
  Lognormal prior mean      0.01     1        1        0.1
  Lognormal prior variance  0.1      1.1      1.1      0.2
  Estimates from Stage 2    0.01     1        1        0.001

  β                         β10      β11      β12      β13      β21      β22
  Estimates                 0.182    0.537    0.163    −0.189   0.021    −0.058
[Figure A.1. Relative errors between predictions and true values, by year and height, for panels (a) Case 1a, (b) Case 1b, (c) Case 2a, (d) Case 2b, (e) Case 3a, and (f) Case 3b.]

[Figure A.2. Relative errors for Case 2c (a) and for the real observation case (b).]

[Figure A.3. Prior and posterior densities of θ1 and θ2 for panels (a) Case 1a, (b) Case 1b, (c) Case 2a, (d) Case 2b, (e) Case 3a, and (f) Case 3b.]

[Figure A.4. Prior and posterior densities of θ1 and θ2 for (a) Case 2c and (b) the real observation case.]

Chapter 3

Patient-Specific Prediction from Multi-subject Data

3.1 Introduction

The computer model is built on all 26 subjects, which together determine the biomedical system of AAA enlargement. The mechanism resembled by the computer model therefore reflects both within-subject and between-subject effects. Patient-specific Bayesian calibration, however, calibrates the unobserved parameters of the computer model using real observations from only one subject, which may skew the calibration results toward the mechanism of that specific patient. It is therefore of practical value to build a Bayesian calibration model for multi-subject data, so that the calibrated parameters account for the uncertainties in the real observations across all subjects. Bayesian calibration analysis of longitudinal imaging data is itself very challenging and interesting.
On the one hand, the dimension of imaging data for a single patient at a single time point is already very high, let alone that of multi-subject data collected at multiple time points, so any model built for such data carries a heavy computational burden. On the other hand, the variability of the real observations occurs both within and between subjects across time, and the calibration parameters are unknown inputs of the computer model proposed by the biomedical engineers. How the uncertainties in the real observations across all subjects affect the uncertainties of the calibration parameters is therefore an important and, to our knowledge, previously unaddressed problem in Bayesian analysis; mixing and separating so many uncertainties with different properties is challenging but worthwhile work. For these reasons, we model the Bayesian calibration for multi-subject data as follows, keeping mostly the same notation and symbols as the single-subject case of Chapter 2.

3.2 Calibration model

Suppose the total number of subjects is I. For the i-th subject, the real observation z_ij can be modeled as

$$ z_{ij} = r(x_{ij}, \theta_i) + \delta(x_{ij}) + \varepsilon_{ij}, \quad \forall i \in \{1, \cdots, I\},\ j = 1, \cdots, n_i, \qquad (3.1) $$

where n_i is the number of observations from CT scan images of subject i, r(·, ·) is the computer model, δ(·) is the model error, and ε_ij is the observation error. We assume that each ε_ij is independently distributed as N(0, λ), a normal (Gaussian) random variable with mean 0 and variance λ, and that the observational errors are independent both among and within subjects.

3.3 Statistical models

Let GP(m(·), k(·, ·)) be the Gaussian process with mean function m(·) and covariance function k(·, ·). We introduce the following Gaussian processes as prior beliefs for the G&R computation model and the model error:

$$ r(x_i, \theta_i) \sim GP(m_1(x_i, \theta_i), k_1(x_i, x_i')), \qquad \delta(x_i) \sim GP(m_2(x_i), k_2(x_i, x_i')). \qquad (3.2) $$

To specify the Gaussian process priors in (3.2) further, we introduce mean and covariance structures for both processes. For the mean structure, one can consider a linear combination of basis functions to approximate the general mean structure:

$$ m_1(x_i, \theta_i) = h_1(x_i, \theta_i)^T \beta_1 = \beta_{10} + \beta_{11} t_i + \beta_{12}\theta_{i1} + \beta_{13}\theta_{i2}, \qquad (3.3) $$

where h_1(x_i, θ_i) = [1 t_i θ_{i1} θ_{i2}]^T and β_1 = [β_{10} β_{11} β_{12} β_{13}]^T, and

$$ m_2(x_i) = h_2(x_i)^T \beta_2 = \beta_{21} t_i + \beta_{22} s_i, \qquad (3.4) $$

where h_2(x_i) = [t_i s_i]^T and β_2 = [β_{21} β_{22}]^T. These mean structures imply that the mean function of the G&R computation model is linear in time t_i and in the calibration parameters {θ_{i1}, θ_{i2}}, while the mean function of the model error is linear in time t_i and location s_i. β = [β_1^T β_2^T]^T are hyperparameters for the mean functions in the Bayesian context. For the covariance structure, we use the following exponential functions:

$$ k_1(x_i, x_i') = \sigma_1^2 \exp\{-(x_i - x_i')\Omega_x (x_i - x_i')^T\}\exp\{-(\theta_i - \theta_i')\Omega_\theta (\theta_i - \theta_i')^T\}, $$
$$ k_2(x_i, x_i') = \sigma_2^2 \exp\{-(x_i - x_i')\Omega_x^* (x_i - x_i')^T\}, $$

where Ω_x, Ω_θ, Ω_x* are diagonal matrices such that

$$ \Omega_x = \begin{bmatrix} \omega_{x1} & 0 \\ 0 & \omega_{x2} \end{bmatrix}, \quad \Omega_\theta = \begin{bmatrix} \omega_{\theta 1} & 0 \\ 0 & \omega_{\theta 2} \end{bmatrix}, \quad \Omega_x^* = \begin{bmatrix} \omega_{x1}^* & 0 \\ 0 & \omega_{x2}^* \end{bmatrix}, $$

so that the hyperparameters for the covariance functions are σ1², σ2² and the ωxj, ωθj, ωxj*. For the parameters controlling the covariances, we introduce ψ = [ψ1 ψ2], with

$$ \psi_1 = [\omega_{x1}\ \omega_{x2}\ \omega_{\theta 1}\ \omega_{\theta 2}\ \sigma_1^2], \qquad \psi_2 = [\omega_{x1}^*\ \omega_{x2}^*\ \sigma_2^2]. \qquad (3.5) $$

ψ1 is the set of hyperparameters related to the G&R computation model; ψ2 is the set of hyperparameters related to the model error.
3.4 Computer outputs

Let N_i be the number of computer outputs for subject i and y_i = [y_{i1} · · · y_{iN_i}]^T ∈ R^{N_i×1}. Given the j-th set of inputs for subject i, (x*_{ij}, a_{ij}), where a_{ij} denotes the calibration input used to generate the output, we have the corresponding computer output y_{ij} = r(x*_{ij}, a_{ij}). The corresponding variable input matrix is X_{i,c} = [(x*_{i1})^T, · · · , (x*_{iN_i})^T]^T ∈ R^{N_i×2}. We further define y = [y_1^T · · · y_I^T]^T ∈ R^{N×1} as the computer model output vector, with the final variable input matrix X_c = [(X_{1,c})^T, · · · , (X_{I,c})^T]^T ∈ R^{N×2}. We augment the variable inputs X_{i,c} with the parameter inputs a_i = [a_{i1} · · · a_{iN_i}] such that X_{i,c}(a_i) = [(x*_{i1}, a_{i1})^T, · · · , (x*_{iN_i}, a_{iN_i})^T]^T ∈ R^{N_i×4}. The final input matrix for the computer outputs is X_c(a) = [X_{1,c}(a_1)^T, · · · , X_{I,c}(a_I)^T]^T ∈ R^{N×4}. Then, the GP assumption for the computer model gives

$$ y_i \sim N(H_1(X_{i,c}(a_i))\beta_1, V_1(X_{i,c})), $$

where H_1(X_{i,c}(a_i)) denotes the matrix with rows h_1(x*_{i1}, a_{i1})^T, · · · , h_1(x*_{iN_i}, a_{iN_i})^T, and the (j, k) entry of V_1(X_{i,c}) for the i-th subject is k_1(x*_{ij}, x*_{ik}), for j, k ∈ {1, 2, · · · , N_i}. We denote the mean and covariance by μ_{y_i} = H_1(X_{i,c}(a_i))β_1 and V_{y_i} = V_1(X_{i,c}), respectively. Finally, the full computer output vector y = [y_1^T · · · y_I^T]^T ∈ R^{N×1} is distributed as

$$ y \sim N(\mu_y, V_y), \quad \mu_y = \begin{bmatrix} \mu_{y_1} \\ \vdots \\ \mu_{y_I} \end{bmatrix}, \quad V_y = \begin{bmatrix} V_{y_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & V_{y_I} \end{bmatrix}. $$

3.5 Observations

Let n_i be the number of observations for subject i, and let z_i = [z_{i1} · · · z_{in_i}]^T ∈ R^{n_i×1} be the set of observations corresponding to the variable input matrix X_{i,o} = [x_{i1}^T, · · · , x_{in_i}^T]^T ∈ R^{n_i×2}. Note that X_o = [X_{1,o}^T, · · · , X_{I,o}^T]^T ∈ R^{n×2} is not necessarily the same as the set of variable inputs for the computer model outputs. In general, the number of observations is smaller than that of the computer model outputs, since we can control the amount of the latter. To calibrate θ = [θ_1, · · · , θ_I] from the observations, we augment the variable inputs X_{i,o} with θ_i such that X_{i,o}(θ_i) = [(x_{i1}, θ_i)^T, · · · , (x_{in_i}, θ_i)^T]^T. Then, from the calibration model, we have

$$ z_i \sim N(H_1(X_{i,o}(\theta_i))\beta_1 + H_2(X_{i,o})\beta_2, \ \lambda I_{n_i} + V_1(X_{i,o}(\theta_i)) + V_2(X_{i,o})), $$

where H_1(X_{i,o}(θ_i)) denotes the matrix with rows h_1(x_{i1}, θ_i)^T, · · · , h_1(x_{in_i}, θ_i)^T, and H_2(X_{i,o}) the matrix with rows h_2(x_{i1})^T, · · · , h_2(x_{in_i})^T. V_1(X_{i,o}(θ_i)) is defined analogously to V_1(X_{i,c}), and the (j, k) entry of V_2(X_{i,o}) is k_2(x_{ij}, x_{ik}), for j, k ∈ {1, 2, · · · , n_i}. Denote μ_{z_i} = H_1(X_{i,o}(θ_i))β_1 + H_2(X_{i,o})β_2 and V_{z_i} = λ I_{n_i} + V_1(X_{i,o}(θ_i)) + V_2(X_{i,o}). The full observation vector z = [z_1^T · · · z_I^T]^T ∈ R^{n×1} is then distributed as

$$ z \sim N(\mu_z, V_z), \quad \mu_z = \begin{bmatrix} \mu_{z_1} \\ \vdots \\ \mu_{z_I} \end{bmatrix}, \quad V_z = \begin{bmatrix} V_{z_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & V_{z_I} \end{bmatrix}. $$

3.6 Full data

We then combine the computer model outputs and observations into d = [d_1^T · · · d_I^T]^T ∈ R^{(N+n)×1}, which we call the data vector, with

$$ d_i = \begin{bmatrix} y_i \\ z_i \end{bmatrix} \sim N(m_{d_i}(\theta_i), V_{d_i}), \qquad (3.6) $$

where

$$ m_{d_i}(\theta_i) := E(d_i \mid \theta_i, \beta, \psi) = H(\theta_i)\beta, \qquad H(\theta_i) = \begin{bmatrix} H_1(X_{i,c}(a_i)) & 0 \\ H_1(X_{i,o}(\theta_i)) & H_2(X_{i,o}) \end{bmatrix}, $$

and β = (β_1^T, β_2^T)^T. The full mean is m_d(θ) = [m_{d_1}(θ_1)^T, · · · , m_{d_I}(θ_I)^T]^T, and we denote H(θ) = [H(θ_1)^T, · · · , H(θ_I)^T]^T. We assume different subjects are independent, so the full covariance is

$$ V_d(\psi) = \begin{bmatrix} V_{d_1}(\psi) & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & V_{d_I}(\psi) \end{bmatrix}, $$
where

$$ V_{d_i}(\psi) := \mathrm{var}(d_i \mid \psi) = \begin{bmatrix} V_{y_i} & C_{y_i z_i}^T \\ C_{y_i z_i} & V_{z_i} \end{bmatrix}, $$

and C_{y_i z_i} ∈ R^{n_i×N_i} is the cross-covariance matrix whose (j, k) entry is k_1(x_{ij}, x*_{ik}), for j ∈ {1, 2, · · · , n_i} and k ∈ {1, 2, · · · , N_i}. Note that C_{y_i z_i} need not be a square matrix, since usually N_i ≠ n_i.

3.7 Bayesian analysis

3.7.1 Joint posterior distribution

We note that m_d(θ) also depends on β; we drop it from the notation to reduce complexity. We have the following full joint posterior distribution of (θ, β, ψ) given d:

$$ p(\theta, \beta, \psi \mid d) \propto p(\theta)p(\psi) f(d; m_d(\theta), V_d(\psi)) \propto \prod_{i=1}^{I} p(\theta_i)\; p(\psi)\,|V_d(\psi)|^{-\frac{1}{2}} \exp\left[-\frac{1}{2}\{(d - m_d(\theta))^T V_d(\psi)^{-1} (d - m_d(\theta))\}\right], \qquad (3.7) $$

where f(d; m, V) is the multivariate Gaussian density function with mean m and variance V. Clearly, m_d(θ) is a linear function of β, so we can find

$$ \beta \mid \theta, \psi, d \sim N(\hat{\beta}(\theta, \psi), W(\theta, \psi)), \qquad (3.8) $$

where

$$ \hat{\beta}(\theta,\psi) = W(\theta,\psi)H(\theta)^T V_d(\psi)^{-1} d, \qquad W(\theta,\psi) = (H(\theta)^T V_d(\psi)^{-1} H(\theta))^{-1}. $$

We can then integrate out β in (3.7) and get

$$ p(\theta, \psi \mid d) \propto p(\psi)\,|W(\theta,\psi)|^{\frac{1}{2}} \prod_{i=1}^{I} p(\theta_i)\,|V_{d_i}(\psi)|^{-\frac{1}{2}} \exp\left[-\frac{1}{2}\{(d_i - H(\theta_i)\hat{\beta}(\theta,\psi))^T V_{d_i}(\psi)^{-1}(d_i - H(\theta_i)\hat{\beta}(\theta,\psi))\}\right]. \qquad (3.9) $$

Because of independence across subjects, β̂ and W in (3.9) can be simplified as

$$ W(\theta,\psi) = \left(\sum_{i=1}^{I} H(\theta_i)^T V_{d_i}(\psi)^{-1} H(\theta_i)\right)^{-1}, \qquad \hat{\beta}(\theta,\psi) = W(\theta,\psi)\sum_{i=1}^{I} H(\theta_i)^T V_{d_i}(\psi)^{-1} d_i. $$

3.7.2 Full conditional of ψ1

We can estimate ψ1 by maximizing

$$ p(\psi_1 \mid y) \propto |V_1(X_c)|^{-1/2}\, p(\psi_1)\, |W_1(X_c(a))|^{1/2} \exp\left[-\frac{1}{2}(y - H_1(X_c(a))\hat{\beta}_1)^T V_1(X_c)^{-1}(y - H_1(X_c(a))\hat{\beta}_1)\right], \qquad (3.10) $$

where

$$ \hat{\beta}_1 = W_1(X_c(a)) H_1(X_c(a))^T V_1(X_c)^{-1} y, \qquad W_1(X_c(a)) = (H_1(X_c(a))^T V_1(X_c)^{-1} H_1(X_c(a)))^{-1}. $$

3.7.3 Full conditional of ψ2

We can estimate ψ2 by maximizing p(ψ2 | d, ψ1). To obtain this density, we first write

$$ p(\beta_2, \psi_2 \mid d, \psi_1) \propto p(\beta_2, \psi_2)\, p(z \mid y, \beta_2, \psi), $$

since y is independent of the second-stage hyperparameters. We cannot obtain the distribution of [z | y, β2, ψ] analytically, but we can start with [z | y, β2, ψ, θ], which is normally distributed:

$$ z_{ij} \mid \beta_1, \beta_2, \psi, \theta_i \sim N(h_1(x_{ij},\theta_i)^T\beta_1 + h_2(x_{ij})^T\beta_2,\ \lambda + k_1(x_{ij},x_{ij}) + k_2(x_{ij},x_{ij})), $$

with conditional mean

$$ E(z_{ij} \mid y_i, \beta_2, \psi, \theta_i, \beta_1) = h_2(x_{ij})^T\beta_2 + h_1(x_{ij},\theta_i)^T\beta_1 + t(x_{ij})^T V_1(X_{i,c})^{-1}(y_i - H_1(X_{i,c}(a_i))\beta_1), $$

where the k-th element of t(x_{ij}) is k_1(x_{ij}, x*_{ik}), for k = 1, . . . , N_i. Integrating over θ_i, we obtain an expression for the mean vector:

$$ E(z_{ij} \mid y_i, \beta_2, \psi) = \int E(z_{ij} \mid y_i, \beta_2, \psi, \theta_i)\,p(\theta_i)\,d\theta_i = h_2(x_{ij})^T\beta_2 + \left[\int h_1(x_{ij},\theta_i)^T p(\theta_i)\,d\theta_i\right]\hat{\beta}_1 + t(x_{ij})^T V_1(X_{i,c})^{-1}(y_i - H_1(X_{i,c}(a_i))\hat{\beta}_1). $$

We can likewise obtain expressions for its variance matrix.
The covariance matrix is V_{i,ψ2} = λ I_{n_i} + V_2(X_{i,o}) + C_i, where the (j, k) element of C_i is

$$ \int \mathrm{cov}(r(x_{ij},\theta_i), r(x_{ik},\theta_i) \mid y_i, \psi_1)\,p(\theta_i)\,d\theta_i = k_1(x_{ij},x_{ik}) - \mathrm{tr}\{V_1(X_{i,c})^{-1}\, t(x_{ij})\, t(x_{ik})^T\} $$
$$ \quad + \ \mathrm{tr}\Big\{W_1(X_{i,c}(a_i)) \int h_1(x_{ij},\theta_i)\, h_1(x_{ik},\theta_i)^T p(\theta_i)\,d\theta_i\Big\} - \mathrm{tr}\Big\{W_1(X_{i,c}(a_i))\, H_1(X_{i,c}(a_i))^T V_1(X_{i,c})^{-1}\, t(x_{ij}) \int h_1(x_{ik},\theta_i)^T p(\theta_i)\,d\theta_i\Big\} $$
$$ \quad - \ \mathrm{tr}\Big\{V_1(X_{i,c})^{-1} H_1(X_{i,c}(a_i))\, W_1(X_{i,c}(a_i)) \int h_1(x_{ij},\theta_i)\,p(\theta_i)\,d\theta_i\; t(x_{ik})^T\Big\} + \mathrm{tr}\{V_1(X_{i,c})^{-1} H_1(X_{i,c}(a_i))\, W_1(X_{i,c}(a_i))\, H_1(X_{i,c}(a_i))^T V_1(X_{i,c})^{-1}\, t(x_{ij})\, t(x_{ik})^T\}. $$

To see this, note that the joint covariance of (r(x_{ij}, θ_i), r(x_{ik}, θ_i), y_i) given (ψ1, θ_i) is

$$ \mathrm{var}\big(r(x_{ij},\theta_i), r(x_{ik},\theta_i), y_i \mid \psi_1, \theta_i\big) = \begin{bmatrix} k_1(x_{ij},x_{ij}) & k_1(x_{ij},x_{ik}) & t(x_{ij})^T \\ k_1(x_{ij},x_{ik}) & k_1(x_{ik},x_{ik}) & t(x_{ik})^T \\ t(x_{ij}) & t(x_{ik}) & V_1(X_{i,c}) \end{bmatrix}, $$

so that the conditional mean and covariance are

$$ E\big(r(x_{ij},\theta_i) \mid y_i, \psi_1, \theta_i\big) = h_1(x_{ij},\theta_i)^T\beta_1 + t(x_{ij})^T V_1(X_{i,c})^{-1}(y_i - H_1(X_{i,c}(a_i))\beta_1), $$
$$ \mathrm{cov}\big(r(x_{ij},\theta_i), r(x_{ik},\theta_i) \mid y_i, \psi_1, \theta_i\big) = k_1(x_{ij},x_{ik}) - t(x_{ij})^T V_1(X_{i,c})^{-1} t(x_{ik}) = k_1(x_{ij},x_{ik}) - \mathrm{tr}\{V_1(X_{i,c})^{-1}\, t(x_{ij})\, t(x_{ik})^T\}. $$

Plugging these two expressions into the law of total covariance yields C_i. We then approximate [z_i | y_i, β2, ψ] by a normal distribution with the above moments for the purpose of estimating ψ2. To simplify the notation, we write the mean vector as μ_{i,ψ2} = H_2(X_{i,o})β_2 + η̂(X_{i,o}), where η̂(X_{i,o}) is the vector with elements E(r(x_{ij}, θ_i) | y_i), j = 1, . . . , n_i. We can now integrate out β2 to obtain the approximation

$$ p(\psi_2 \mid d_i, \psi_1) \propto p(\psi_2)\,|V_{i,\psi_2}|^{-1/2}\,|W_{i,2}|^{1/2} \exp\left[-\frac{1}{2}(z_i - H_2(X_{i,o})\hat{\beta}_{i,2} - \hat{\eta}(X_{i,o}))^T V_{i,\psi_2}^{-1}(z_i - H_2(X_{i,o})\hat{\beta}_{i,2} - \hat{\eta}(X_{i,o}))\right], \qquad (3.11) $$

where

$$ \hat{\beta}_{i,2} = W_{i,2}\, H_2(X_{i,o})^T V_{i,\psi_2}^{-1}(z_i - \hat{\eta}(X_{i,o})), \qquad W_{i,2} = (H_2(X_{i,o})^T V_{i,\psi_2}^{-1} H_2(X_{i,o}))^{-1}. $$

The full mean is μ_{ψ2} = H_2(X_o)β_2 + η̂(X_o), where η̂(X_o) = [η̂(X_{1,o})^T, · · · , η̂(X_{I,o})^T]^T, and the full covariance is the block-diagonal matrix V_{ψ2} = diag(V_{1,ψ2}, · · · , V_{I,ψ2}), so that

$$ p(\psi_2 \mid d, \psi_1) \propto p(\psi_2)\,|V_{\psi_2}|^{-1/2}\,|W_2|^{1/2} \exp\left[-\frac{1}{2}(z - H_2(X_o)\hat{\beta}_2 - \hat{\eta}(X_o))^T V_{\psi_2}^{-1}(z - H_2(X_o)\hat{\beta}_2 - \hat{\eta}(X_o))\right], \qquad (3.12) $$

where

$$ \hat{\beta}_2 = W_2\, H_2(X_o)^T V_{\psi_2}^{-1}(z - \hat{\eta}(X_o)), \qquad W_2 = (H_2(X_o)^T V_{\psi_2}^{-1} H_2(X_o))^{-1}. $$

Consider the quadratic form

$$ (z - H_2(X_o)\hat{\beta}_2 - \hat{\eta}(X_o))^T V_{\psi_2}^{-1}(z - H_2(X_o)\hat{\beta}_2 - \hat{\eta}(X_o)). \qquad (3.13) $$

Denote u = z − η̂(X_o) and write V = V_{ψ2} and H_2 = H_2(X_o) for short. Then (3.13) can be simplified as

$$ [(I - H_2(H_2^T V^{-1} H_2)^{-1} H_2^T V^{-1})u]^T\, V^{-1}\, [(I - H_2(H_2^T V^{-1} H_2)^{-1} H_2^T V^{-1})u] = u^T [V^{-1} - V^{-1} H_2 (H_2^T V^{-1} H_2)^{-1} H_2^T V^{-1}] u = u^T R u, $$

where R = V^{-1} − V^{-1} H_2 (H_2^T V^{-1} H_2)^{-1} H_2^T V^{-1}. Since V_{ψ2}, W_2 and R are symmetric, we can decompose (3.12) into a product of patient-wise likelihoods.

3.7.4 Posterior distribution of calibration parameter θ

$$ p(\theta \mid \psi, d) \propto |W(\theta,\psi)|^{\frac{1}{2}} \prod_{i=1}^{I} p(\theta_i) \times \exp\left[-\frac{1}{2}\{(d - H(\theta)\hat{\beta}(\theta,\psi))^T V_d(\psi)^{-1}(d - H(\theta)\hat{\beta}(\theta,\psi))\}\right]. \qquad (3.14) $$

3.7.5 Predictive distribution of true process ζ(·)

Its mean function is given by

$$ E(\zeta(x) \mid \theta, \psi, d) = h(x,\theta)^T \hat{\beta}(\theta,\psi) + t(x)^T V_d(\psi)^{-1}(d - H(\theta)\hat{\beta}(\theta,\psi)), \qquad (3.15) $$

where

$$ h(x,\theta) = \begin{bmatrix} h_1(x,\theta) \\ h_2(x) \end{bmatrix}, \qquad t(x) = \begin{bmatrix} V_1(x, X_c) \\ V_1(x, X_o) + V_2(x, X_o) \end{bmatrix}. $$

Additionally, its covariance function is given by

$$ \mathrm{cov}(\zeta(x_i), \zeta(x_i') \mid \theta_i, \psi, d_i) = k_1(x_i, x_i') + k_2(x_i, x_i') - t(x_i)^T V_{d_i}(\psi)^{-1} t(x_i') $$
$$ \qquad + \ (h(x_i,\theta_i) - H(\theta_i)^T V_{d_i}(\psi)^{-1} t(x_i))^T W(\theta,\psi)\, (h(x_i',\theta_i) - H(\theta_i)^T V_{d_i}(\psi)^{-1} t(x_i')). \qquad (3.16) $$
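Because subjects are assumed independent, V_d(ψ) is block diagonal and the joint log-likelihood decomposes into per-subject terms. A minimal sketch under that assumption follows; md_i and Vd_i stand for hypothetical per-subject mean and covariance builders (e.g., adapted from the joint_moments() sketch of Chapter 2), and dmvnorm is from the mvtnorm package.

```r
# Sketch: multi-subject joint log-likelihood via the block structure.
loglik_multi <- function(d_list, theta_list, psi, md_i, Vd_i) {
  ll <- 0
  for (i in seq_along(d_list)) {
    m <- md_i(i, theta_list[[i]])    # mean of d_i given theta_i
    V <- Vd_i(i, psi)                # covariance of d_i given psi
    # Independence across subjects lets us simply add the subject terms.
    ll <- ll + mvtnorm::dmvnorm(d_list[[i]], mean = m, sigma = V, log = TRUE)
  }
  ll
}
```

Working subject-by-subject this way avoids ever forming the full (N + n) × (N + n) matrix V_d(ψ), which is what makes the multi-subject computation tractable.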
3.8 Data Analysis

Because readers are familiar with classical regression, it is natural to compare statistical methods against it. In terms of modeling, calibration is the reverse of regression: instead of predicting a future dependent variable from known explanatory variables, a known observation of the dependent variable is used to infer the corresponding explanatory variables. In our application, the observed dependent variable is the bio-geometrical shape of the enlarging AAA, while the corresponding explanatory variables are the calibration parameters. In terms of application, Bayesian calibration mimics the mechanisms of a physical system (here, a biomedical system) by calibrating the unobserved internal parameters so that the model outputs match the real observations as closely as possible, whereas in medical science, for example, a regression model is often used to identify significant covariates or treatments. Bayesian calibration integrates the computer model as an independent source of information and expertise alongside the real observations, while a regular regression usually has only one data source, the real observations.

Bayesian calibration thus combines two sources of data: computer model output and real observations. The latter is usually more expensive to collect than the former, so the computer data typically has much higher dimension than the real observations. Patient-specific prediction with this model is therefore more computationally intensive than a regular regression with a single data source, and the computation of the Bayesian calibration model for multi-subject data is more demanding still. We therefore present the data analysis in two parts: one for a small sample (two patients) and one for a larger sample (six patients).

3.8.1 Small sample

First, we analyze the real observations from two subjects. For each subject, the computer data covers the time span of the real data plus future inputs. We use the real data to calibrate the model at one time point, and the future inputs to make predictions at a second time point. We choose eight equally spaced points along the center line to describe the AAA shape. Priors and estimates of the hyperparameters are shown in Table 3.1. The estimates of ψ2 are close to the prior means, which equal the true values. As shown in Fig. 3.1, all the real observations are captured by the credible intervals, and the maximum relative error is 2.5%. The results in Fig. 3.2 are not as good as those in Fig. 3.1, but this is expected: as discussed before, the acceleration of AAA expansion sometimes cannot be completely captured by the computer model. Because the calibration parameter values of the two subjects are close, we use the same prior for both subject 1 and subject 2 (Figures 3.1d and 3.2d).

[Figure 3.1. Prediction results for subject 1. (a) Predictions of the observation processes (radius vs. height): the solid line denotes the observations, the dashed line the predictions. (b) 95% credible band of the predictions: blue stars denote observations lying inside the band; red marks denote observations lying outside it. (c) Relative errors between predictions and true values. (d) Prior and posterior densities of θ1 and θ2.]
3.8.2 Big sample

The design of the Bayesian calibration is as follows. For each subject, the computer data covers the time span of the real data plus future inputs. The real data are used to calibrate the model at the first two time points, while future inputs are used to make predictions at the last two time points; the computer data therefore cover four time points in total. Again, we choose eight equally spaced points along the center line to describe the AAA shape.

[Figure 3.2. Prediction results for subject 2. (a) Predictions of the observation processes (radius vs. height): the solid line denotes the observations, the dashed line the predictions. (b) 95% credible band of the predictions: blue stars denote observations lying inside the band; red marks denote observations lying outside it. (c) Relative errors between predictions and true values. (d) Prior and posterior densities of θ1 and θ2.]

With 6 patients there are 12 calibration parameters to infer, since each patient has two calibration parameters, and drawing samples from an unknown density over 12 parameters is not an easy task. As discussed above, Bayesian calibration iteratively updates the uncertainty distributions of the calibration parameters so that they are consistent with the observed data, so it is important that the model updates these distributions using the parameter information from all the subjects. We therefore assume all the subjects share the same calibration parameter values and distributions. From the computer simulations, we chose values of θ1 ranging from 0.5 to 0.8 and values of θ2 ranging from 1 to 9 as the calibration inputs, and used normal priors covering these ranges (see Figure 3.3). The posterior then determines a narrower range of values, updating those obtained from the computer simulations.

Table 3.1. Priors and estimates of hyperparameters.

  ψ1 (lognormal priors)     ω_x1     ω_x2     ω_θ1     ω_θ2     σ1²
  Prior mean                1        1        1        1        1
  Prior variance            1.1      1.1      1.1      1.3      1.2
  Estimates from Stage 1    0.186    1.738    0.0832   0.007    0.640

  ψ2 (lognormal priors)     λ        ω_x1     ω_x2     σ2²
  Prior mean                0.01     1        1        0.1
  Prior variance            0.1      1.1      1.1      0.2
  Estimates from Stage 2    0.01     1        1        0.001

  β estimates: β10 = 0.006, β11 = 0.418, β12 = 0.146, β13 = -0.176, β21 = 0.053, β22 = -0.016.

3.9 Conclusion

The results show that this individualized Bayesian calibration effectively integrates subject-level information (uncertainty) about the calibration parameters and updates their plausible ranges. These updated ranges and distributions of the calibration parameters improve the consistency between estimates and observations, whether researchers are generating simulations from computer code or predicting future observations.

[Figure 3.3. Prior and posterior results for the data consisting of 6 patients. (a) Prior and posterior densities of θ1. (b) Prior and posterior densities of θ2.]

The main computational difficulty lies in drawing samples of the calibration parameters from their posterior using Metropolis-Hastings algorithms. As equation (3.14) shows, the posterior density of the calibration parameters is not analytically tractable. Once a large number of subjects is incorporated into the model, the number of calibration parameters is high, and a high-dimensional Metropolis-Hastings sampler becomes hard to tune and slow to converge, so better computational strategies are needed. A random-walk Metropolis-Hastings step of the kind used here is sketched below.
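The following is a minimal base-R sketch of such a random-walk Metropolis-Hastings sampler. It assumes only that a function `log_post(theta)` evaluating the unnormalized log posterior of equation (3.14) is available (a hypothetical name, not an existing implementation); the proposal scale `step` would need tuning in practice.

```r
# Random-walk Metropolis-Hastings sketch for the calibration parameters.
# log_post() is a hypothetical function returning the unnormalized
# log posterior (3.14) at a parameter vector theta.
rw_metropolis <- function(log_post, theta0, n_iter = 5000, step = 0.05) {
  p <- length(theta0)
  draws <- matrix(NA_real_, n_iter, p)
  theta <- theta0
  lp <- log_post(theta)
  for (it in seq_len(n_iter)) {
    prop <- theta + rnorm(p, sd = step)   # symmetric Gaussian proposal
    lp_prop <- log_post(prop)
    if (log(runif(1)) < lp_prop - lp) {   # accept with MH probability
      theta <- prop
      lp <- lp_prop
    }
    draws[it, ] <- theta
  }
  draws
}
```

With 12 calibration parameters (six subjects), a single proposal scale is rarely adequate; adaptive or blocked updates are the usual remedies.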
Additionally, as introduced at the beginning of this dissertation, Latin hypercube sampling should be considered to improve the design of the computer experiments, especially in the Bayesian calibration model for multiple subjects, because the computational time increases significantly as more subjects are included in the model and its predictions. A Latin hypercube design for the two calibration inputs can be generated as sketched below.
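As an illustration, here is a minimal base-R sketch of a Latin hypercube design over the calibration ranges used above (θ1 in [0.5, 0.8], θ2 in [1, 9]); the function name `latin_hypercube` is ours, not from an existing package.

```r
# One stratified uniform draw per row interval, independently permuted
# across columns: a basic Latin hypercube design on the unit square.
latin_hypercube <- function(n, k) {
  sapply(seq_len(k), function(j) (sample(n) - runif(n)) / n)
}

set.seed(1)
U <- latin_hypercube(n = 20, k = 2)
design <- cbind(theta1 = 0.5 + 0.3 * U[, 1],  # map to [0.5, 0.8]
                theta2 = 1.0 + 8.0 * U[, 2])  # map to [1, 9]
```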
Chapter 4

Alzheimer's Disease Progression Analysis Using Multi-state Markov Model

4.1 Introduction

With the rapid aging of the world population, approximately 10 million people worldwide are affected by Alzheimer's Disease, a leading cause of death behind cardiovascular disease and cancer. Nearly 10% of people 65 years of age and older are affected. While the disease is most common in the elderly, it has been diagnosed in patients in their 40s and 50s. Alzheimer's Disease (AD) is a progressive, irreversible, degenerative brain disorder causing impaired memory, thinking and behavior [76]. These impairments are related to the death of brain cells and the breakdown of the connections between them. From the onset of symptoms, the disease usually progresses over a period of eight years, but it can last for up to 20 years. Three stages mark the advance of AD: early signs, mild cognitive impairment, and severe dementia. Generally speaking, the early signs of AD appear after the age of 60 and often include loss of recent memory, faulty judgment, and changes in personality. Individuals in the mild stage may forget how to do simple routine tasks such as dressing or bathing. In the final stages of the disease, patients are bedridden and frequently develop coexisting illnesses; most commonly, individuals with AD die from pneumonia. Although the risk of developing AD increases with age, AD is quite different from the normal aging process: AD is a disease, while normal aging is not. Without this disease, the human brain often can operate to the age of 100 and beyond.

Research on Alzheimer's Disease falls into three broad categories: causes and risk factors, diagnosis, and treatment. Although researchers cannot yet provide definitive answers about the causes of Alzheimer's Disease, they have made progress toward the truth in many ways. For example, genetic predisposition, abnormal protein deposits in the brain, and environmental factors are suspected to play a role in the development of the disease. To understand the pathology of the AD brain, scientists need to understand the structure and function of the aging nervous system; in particular, to find the causes and risk factors of AD, they should understand the loss of communication between neurons and the death of neurons in the AD brain. Research on diagnosis focuses on finding ways to identify markers (indicators) of dementias, developing and improving ways to test patients, determining causes and assessing risk factors, and improving methods for finding cases and sampling populations. As for treatment, researchers are working to slow AD's progression, delay its onset, or eventually prevent it altogether; they are also developing ways to improve patients' ability to function and to support caregivers of people with AD. For full descriptions, please refer to the Laboratory of Neuro Imaging website [76].

The data we use in this chapter to understand the structure of the brain is magnetic resonance imaging (MRI). MRI is currently the imaging modality of choice for assessing neurologic disorders because of its unique capacity to provide excellent anatomical detail in a safe setting. MRI is an imaging technique based on the principles of nuclear magnetic resonance; it detects proton signals from water molecules and produces high-quality images of the internal structure of the living brain. To distinguish between gray matter, white matter, and cerebrospinal fluid (CSF), certain protocols are designed to create anatomical images of the brain; as a result, MRI is able to differentiate between tissue types. By providing the contrast needed to distinguish them, MRI allows researchers to measure the sizes as well as various other properties of different parts of the brain [71]. Among these measurements, quantitative assessment of brain volumes, obtained through volumetric MRI, has been increasingly applied in recent years to a wide range of neurologic conditions thanks to advances in computational technology.

Typical anatomical images of the brain captured by MRI have a spatial resolution of approximately 1 cubic millimeter (mm³). The volume of an adult human brain is, on average, between 1,131,000 and 1,273,000 mm³, with substantial variation between individuals [1]. Although the spatial resolution of modern MRI protocols is very high, 1 mm³ of cortical gray matter can contain between 10,000 and 60,000 neurons, up to four times as many glial cells per neuron [84], as well as neuronal processes, blood vessels, intracortical myelin, and dendritic spines. Measures of brain volumes have been shown to be valid biomarkers of clinical state and progression, offering high reliability and robust inferences on the underlying disease-related mechanisms [42]. In general, atrophy rates in different brain structures of AD patients are related to deterioration of cognitive performance [34].

Over the years, the clinical progression of AD has been studied for subjects in different stages. Subjects with traditional amnestic MCI (mild cognitive impairment) have a higher annual rate of conversion to AD than NC (normal controls) [78].
In particular, MCI subjects with abnormalities in both CSF and the medial temporal lobe have a very high risk of progression to AD [8]; their hippocampal atrophy rate is on the order of 3.5% per year, compared to about 1% per year for MCI subjects not converting to AD [28]. However, in addition to structures of the medial temporal lobe, volumes of cortical regions in the parietal and lateral temporal lobes are also used to predict the likelihood of progression to AD in MCI subjects [64]. Moreover, an MRI score reflecting the degree of AD-like brain atrophy showed slightly higher predictive value for future cognitive impairment than CSF measures [105].

In longitudinal medical studies, patients are observed over time and covariate information is collected at several occasions. When individuals may experience several events, the analysis is often performed using multi-state models. A multi-state model (MSM) is a model for a continuous-time stochastic process allowing individuals to move among a finite number of states [70]. For example, survival analysis is in effect a two-state model, while the continuous-time Markov model is a special kind of multi-state model. In our study, each subject is scanned at each visit, so MRI data are collected longitudinally. The changing volumes of each brain region traced by these MRI scans can be used as covariate information to predict how AD individuals progress from early signs through mild cognitive impairment to severe dementia.

Commenges [17, 18] has reviewed the analysis of interval-censored observations from multi-state models. Sukkar et al. [100] used a Hidden Markov Model (HMM) to improve the assessment of Alzheimer's Disease progression, while Wang et al. [107] proposed a spatio-temporal methodology based on HMMs and applied it to four-dimensional structural brain MRI series of older individuals. Yu et al. [113] applied a multi-state Markov model to explore the effects of known risk factors on each of the possible transitions in the progression of Alzheimer's Disease. A number of other references [14, 35, 43, 93, 101, 118] pursue various research interests using different prediction techniques.

Nevertheless, before modeling AD progression, two things should be emphasized and understood carefully: measurement of progression and limitations of prediction. Gelb [37], from a clinician's perspective, reviewed questions that can be answered by the natural course of AD, specifically information regarding measures of functional impairment and how they change over time, whereas Barnes [4] pointed out the limitations of most AD risk prediction strategies and suggested that future techniques should simultaneously model the risk of mortality as well as the risk of AD over the full preclinical spectrum, and should consider the potential harm as well as the benefit of identifying and treating high-risk older patients.

In this chapter, we apply a multi-state Markov model to represent the progression of Alzheimer's Disease. We treat the clinical diagnosis as the response variable (state space), which consists of three levels (states): NC (normal control), MCI (mild cognitive impairment), and AD (Alzheimer's Disease). Meanwhile, we treat the screening time (the time when MRI scans are collected) as the timeline along which the Markov chain undergoes transitions from one state to another.
We further assume that the transition rates are independent of time (i.e., the Markov model is homogeneous), but that they depend on patient-related variables, such as MRI regional volumetric information, cognitive assessment information, and demographic information. More specifically, we incorporate these into the Markov model by taking the logarithm of a transition intensity to be a linear function of the relevant predictors. We then perform likelihood ratio tests (LRT) to compare which models fit the data better. Finally, we make predictions based on transition rates and transition probabilities using a supervised learning method, and we provide leave-one-out cross-validation results. As a whole, our contributions are as follows. First, for this data set, age is not recognized as a statistically significant risk factor in our Markov model. Second, different sources of information are integrated to describe the overall trend of the progression of Alzheimer's Disease. Third, transition rates and transition probabilities from the Markov model provide a potential direction for predicting the course of Alzheimer's Disease.

The remainder of this chapter is structured as follows: Section 4.2 introduces how we obtained and processed the data. Section 4.3 explains in detail the multi-state Markov model used for prediction, the likelihood ratio test (LRT) for comparison, and the survival analysis for interpretation. Section 4.4 presents four models: the model without any covariates, the model with the single covariate Age, the model with the single covariate ApoE4, and the model for prediction with all the known risk factors in the data. Section 4.5 summarizes the implications of our findings and provides future perspectives.

4.2 Data

ADNI is an ongoing, longitudinal, multi-center study designed to develop clinical, imaging, genetic, and biochemical biomarkers for the early detection and tracking of Alzheimer's Disease (AD). The ADNI study began in 2004 and included 400 subjects diagnosed with mild cognitive impairment (MCI), 200 subjects with early AD, and 200 elderly control subjects. This initial phase of the study is known as ADNI1. In 2009, ADNI1 was extended with ADNI GO, which assessed the existing ADNI1 cohort and added 200 participants identified as having early mild cognitive impairment (EMCI); the objective of this phase was to examine biomarkers in an earlier stage of disease. In 2011, as ADNI GO was ending, ADNI2 began. ADNI2 assesses participants from the ADNI1/ADNI GO cohort in addition to the following new participants: 150 elderly controls, 100 EMCI participants, 150 LMCI (late mild cognitive impairment) participants, and 150 mild AD patients.

We downloaded the "adnimerge" data from ADNI on 12/8/2014. The "adnimerge" data merges several of the key variables from various case report forms and biomarker lab summaries across the ADNI protocols (ADNI1, ADNI GO, and ADNI2). It is updated daily directly from the Alzheimer's Disease Cooperative Study (ADCS; http://adcs.org/) data system and posted to LONI. We separate out all 570 subjects in ADNI1 as our target data set. To reduce the impact of the unbalanced longitudinal design and to characterize how the response changes in time, we keep only the 456 subjects who have more than 4 observations and remove all the others; a sketch of this filtering step follows.
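A minimal sketch of this filtering step in R, assuming the merged table has been read into a data frame `adnimerge` with a subject identifier column `RID` (the column name follows the ADNI release; treat it as an assumption here):

```r
# Keep only subjects with more than 4 longitudinal records, as described
# above. Assumes a data frame `adnimerge` with subject identifier `RID`.
visits  <- table(adnimerge$RID)
keep_id <- names(visits)[visits > 4]
adni1   <- adnimerge[adnimerge$RID %in% keep_id, ]
length(unique(adni1$RID))  # 456 subjects in our extract
```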
Among these 456 subjects, 139, 105 and 66 stay in NC (normal control), MCI (mild cognitive impairment) and AD (Alzheimer's Disease) throughout, respectively, while 24, 8, 105 and 2 subjects convert from NC to MCI, from MCI to NC, from MCI to AD, and from AD to MCI, respectively. In addition, 3 subjects convert from NC to MCI to AD, 2 subjects oscillate between NC and MCI, and another 2 oscillate between MCI and AD. The cleaned dataset has 272 male and 184 female subjects. Each observation on every subject comprises 3 sources of information: longitudinal MRI regional volumetric information ("Ventricles", "Hippocampus", "WholeBrain", "Entorhinal", "Fusiform", "MidTemp", and "ICV" (intracranial volume)), cognitive assessment information such as "MMSE" (Mini Mental State Exam [21]) and "CDR-SB" (Clinical Dementia Rating-Sum of Boxes [12]), demographic information such as "Age", "Gender", "Race", "Education" and "Marital Status", and one pivotal genetic characteristic, "ApoE4". Up to 9 structural MRI scans are available per subject. The average MRI data-collection span among the 456 subjects throughout the ADNI1 study is about 3.5 years; the minimum is about 1.5 years and the maximum about 8 years. For patients diagnosed as NC or MCI at baseline (first scan), MRI scans are collected every half year within the first 2 years and every year thereafter. For patients diagnosed as AD at baseline, MRI scans are collected every half year within the first year and once in the second year, so we get 4 time points for subjects staying in AD the whole time.

Table 4.1. Distribution of age and MMSE in each diagnosis group.

             Age                                   MMSE
  Diagnosis  mean  median  sd    min  max          mean  median  sd    min  max
  NC         75    75      5.09  60   90           29    29      1.15  22   30
  NC-MCI     76    76      4.58  63   85           29    29      1.26  24   30
  MCI-NC     71    74      7.16  62   80           29    29      1.33  24   30
  MCI        75    75      7.18  58   87           28    28      2.14  18   30
  MCI-AD     73    73      7.09  55   88           24    25      4.09  0    30
  AD-MCI     83    83      0.11  83   83           26    26      2.10  22   29
  AD         74    74      7.18  56   88           22    23      4.42  5    30

Based on each patient's age at baseline, the 456 subjects fall into 4 age groups: 13 subjects in 50-60, 74 in 60-70, 275 in 70-80, and 94 in 80-90. In other words, almost 60.31% of the subjects are between 70 and 80. This percentage is consistent with the natural history timeline, in which most Alzheimer's Disease patients are diagnosed between the ages of 70 and 79 [54]. Distributions of age in each diagnosis group are given in Table 4.1, but we have not observed any significant differences in age distribution between diagnosis groups. As a matter of fact, the relationship between brain aging and Alzheimer's Disease (AD) is contentious: one view holds that AD results when brain aging surpasses a threshold, while the other postulates that AD is not a consequence of brain aging; for more details, refer to Swerdlow [102]. To describe and visualize all 456 subjects, we made the longitudinal trace plot (Figure 4.1) and the longitudinal interaction plot (Figure 4.2). As shown in Figure 4.1, the percentage of hippocampus volume to ICV decreases with age, as is well known.
[Figure 4.1. Individual plots of the percentage of hippocampus volume to ICV vs. age: (a) subjects within NC; (b) subjects within MCI; (c) subjects who ever transit from MCI to AD; (d) subjects within AD. In each panel, each black line segment with solid points denotes how one patient's hippocampus changes with age, with each solid point marking the age at a screening visit. The x-axis is the patient's real age; the y-axis is the percentage of hippocampus volume to ICV (intracranial volume).]

Moreover, the slope of the average decreasing trend is steeper for AD subjects than for NC and MCI subjects. As shown in Figures 4.2a and 4.2b, the paths of hippocampus volume and MMSE aggregated by age at baseline cannot be clearly separated between diagnosis groups, while the paths aggregated by screening time (months) are separated very well in Figures 4.2c and 4.2d. This implies that the natural aging process interacts with the progression of Alzheimer's Disease, so it is better to position the observations on the time axis by screening time, since we aim to detect how Alzheimer's Disease affects the transitions between different states of severity.

4.3 Methods

A multi-state Markov model is used to model the progression of diseases (see, e.g., Jackson et al., 2003 [53]); it describes disease progression through the transition probabilities between states and the mean sojourn time in each state. Specifically, we use a homogeneous model, assuming that a transition rate (also called a transition intensity), the rate of change of a transition probability, is independent of time. In other words, the model allows a transition probability that changes over time while the rate remains constant. Generally speaking, the transition rate here can be interpreted as the overall speed (or risk) of the progression of Alzheimer's Disease. Commenges et al. [19] proved that, for large sample sizes, the incidence (the number of new cases) in a time period Δt approximately equals the transition rate times Δt; if Δt is the time unit, for instance one year, the two coincide. The transition rate is therefore not just a concept in the statistical model but has a direct interpretation in clinical science.

Although the transition rates are assumed independent of time, the speed or risk of change plausibly depends on patient-related variables, such as MRI regional volumetric information, cognitive assessment information, and demographic information. We therefore incorporate these into the model by taking the logarithm of a transition intensity to be a linear function of the relevant predictors (a Cox proportional hazards form [67]).
[Figure 4.2. Longitudinal plots of the fraction of hippocampus volume to ICV and of MMSE in four diagnosis groups, aggregated by age at baseline (panels a, b) and by screening time in months (panels c, d). 'D11' denotes patients who stay within NC all the time; 'D22' patients who stay within MCI all the time; 'D23' patients who ever transit from MCI to AD; 'D33' patients who stay within AD all the time.]

For a detailed explanation of the multi-state Markov model, see Appendix 1 and Appendix 2.

When comparing longitudinal observations from different clinical groups at different aging stages, it is crucial to position the observations correctly on the time axis. This is not straightforward, since the disease appears at different ages and chronologically older brains may have greater structural integrity than younger ones affected by the pathology [66]. In our longitudinal study there are two timelines, screening time and age, so the way the two timelines enter our Markov model should be clearly described. We treat the screening time as the timeline along which the Markov chain undergoes transitions from one state to another, as the observations in the original data are aligned by screening time. Meanwhile, we include age as a covariate in the Markov model, so that the continuous effect of age on the progression of Alzheimer's Disease is fully identified.

To investigate which model fits the data better, we perform likelihood ratio tests (LRT). The LRT helps determine whether known risk factors increase the risk of progression from one state to another. It is a statistical hypothesis test whose statistic is the ratio of two likelihoods (or the -2 log likelihood ratio), where the likelihood in the numerator is from the null model and the likelihood in the denominator is from the alternative model. A larger likelihood ratio (or a smaller -2 log likelihood ratio) indicates that the data support the null model.

We can also describe results with a survival probability curve. Survival probability in the multi-state Markov model is the chance that a patient has not entered the final state (i.e., "survived" when the final state is death); note that the final state in this study is not death but the clinical diagnosis Alzheimer's Disease. Compared with the Kaplan-Meier (KM) curve [44], which generally models a process with two states and one possible transition from an alive state to a dead state, a fitted non-AD probability curve from the Markov model is more powerful and flexible in modeling longitudinal data with multiple states and any possible transitions among them.

4.4 Results

In this section, we explore several homogeneous continuous-time Markov models to fit the growth curve of the severity of dementia (clinical diagnosis) by including different sets of covariates in the transition rates. We first consider the model without any covariates, then models with relevant covariates. In particular, we incorporate real age, ApoE4, MMSE, CDR-SB, and regional volumetric variables such as "Ventricles", "Hippocampus", "WholeBrain", "Entorhinal", "Fusiform", "MidTemp", and "ICV". The data used here include all the subjects whose diagnosis is NC or MCI at baseline and who become MCI or AD in the end, and we treat the AD state as absorbing, meaning that once a subject enters the AD state, s/he never transits to other states. A minimal sketch of how such a model can be fitted is given below.
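For concreteness, the following is a minimal sketch of fitting this three-state model with the R package msm (Jackson, 2011 [16]), which we use for our analysis. The data frame `adni1` and its column names (`state` coded 1 = NC, 2 = MCI, 3 = AD, `years`, `RID`, `age`, `apoe4`) are our hypothetical naming, not the package's.

```r
library(msm)

# Allowed instantaneous transitions: NC <-> MCI and MCI -> AD; AD is
# absorbing. Positive off-diagonal entries of Q mark allowed moves and
# serve as initial values for the intensities.
Q <- rbind(c(0,   0.1, 0  ),
           c(0.1, 0,   0.1),
           c(0,   0,   0  ))

fit <- msm(state ~ years, subject = RID, data = adni1,
           qmatrix = Q, covariates = ~ age + apoe4)

qmatrix.msm(fit)         # estimated transition intensities
pmatrix.msm(fit, t = 2)  # estimated 2-year transition probabilities
sojourn.msm(fit)         # mean sojourn time in each transient state
```

By default `pmatrix.msm` evaluates the probabilities at the mean covariate values; covariate-specific summaries (e.g., for ApoE4 carriers) can be requested through its `covariates` argument.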
4.4.1 Model without covariates

A homogeneous Markov model assumes that the transition rate from state r to state r', q_{rr'}, is independent of time (Appendix 1). The model without any covariates serves as a baseline model that simply describes how the transition probability from the previous state to the next evolves between two screening times. In other words, the transition rates (the changing speed of severity progression) do not depend on other conditions but remain constant for each pair of transitions across the observation time. The estimates of the baseline intensities (q_{0,rr'}) and their 95% confidence intervals (CI) are given in Table 4.1. The transition rate is the rate of change of the transition probabilities with respect to time (equation (A.2) in Appendix 1). From Table 4.1, the transition rate from NC to MCI is about 2 percentage points higher than that from MCI to NC (0.0441 vs. 0.0269), and moving from MCI to AD (0.1968) is about 15 percentage points higher in rate than moving from NC to MCI (0.0441). The fitted survival probability plots in Figure 4.1 show that subjects in state NC have higher fitted survival probabilities (red solid line) than empirical survival probabilities (blue dashed line), while subjects in state MCI have lower ones. The empirical survival curve describes the mean trend of Alzheimer's Disease deterioration over time. In fact, more than half of the subjects never convert to MCI or AD, so the lowest value the empirical survival probability curve can reach is around 0.6 rather than the usual 0. Note that an empirical survival probability curve is calculated directly from the data without assuming any model, while the homogeneous continuous-time Markov model gives a smooth fitted survival probability curve that differs by the patient's state, as shown in Figure 4.1.

An estimated transition probability describes the chance of a subject moving from one state to another within a given time. We provide estimated transition probabilities at 2 years and 4 years in Table 4.1. Overall, compared to year 2, subjects at year 4 have higher transition probabilities from NC to MCI, from NC to AD, and from MCI to AD. In other words, as time passes, more and more subjects naturally convert into MCI and AD; in particular, we will observe more and more complete progression traces from NC to MCI and then to AD, although there are only 3 of them at present.

Table 4.1. Transition rates and transition probabilities (95% confidence intervals) for the model without covariates.

  Transition  Rates (95% CI)            Probabilities, Year 2 (95% CI)  Probabilities, Year 4 (95% CI)
  NC-NC                                 0.9176 (0.8836, 0.9417)         0.8447 (0.7794, 0.8908)
  NC-MCI      0.0441 (0.0306, 0.0635)   0.0679 (0.0478, 0.0970)         0.1058 (0.0739, 0.1477)
  NC-AD       0                         0.0146 (0.0100, 0.0221)         0.0495 (0.0337, 0.0719)
  MCI-NC      0.0269 (0.0159, 0.0454)   0.0414 (0.0250, 0.0716)         0.0645 (0.0383, 0.1071)
  MCI-MCI                               0.6410 (0.5858, 0.6886)         0.4136 (0.3494, 0.4745)
  MCI-AD      0.1968 (0.1629, 0.2379)   0.3176 (0.2720, 0.3670)         0.5218 (0.4546, 0.5856)
More precisely, the result indicates that the chance of staying in the same state is higher than that of moving to a following progression state within 2 years when a patient is in either NC or MCI (the probabilities of remaining in the same state, 0.9176 and 0.6410, exceed the other values). Once a subject is in the MCI state, there is a 32% chance that s/he will move to the AD state within 2 years. However, the chance of moving from MCI to AD within 4 years (52%) is higher than that of staying in MCI, which is reasonable to expect. The estimated mean sojourn times in Table 4.5 show that a subject stays in the NC state about 23 years, while a subject stays in the MCI state about 4.5 years. This implies that the progression of Alzheimer's Disease is a slowly changing process, especially for subjects in NC; this is also confirmed by the estimated transition rates in Table 4.1, where the transition rates at early states are less than 5%. But once a subject becomes MCI, the speed of conversion to AD is much faster (0.1968) and the mean sojourn time within MCI is much shorter (about 4.5 years).

4.4.2 Model with a single covariate

In this section, we investigate the effects of two main risk factors, age and apolipoprotein E e4 (ApoE4), by building each of them as a single covariate into two separate models. Details follow.

[Figure 4.1. Plot of survival curves (not entering the final state AD). The dashed blue line denotes the fitted survival curve from NC to AD; the solid red line denotes the fitted survival curve from MCI to AD; the dotted lines denote the 95% confidence band of the corresponding fitted curve.]

4.4.2.1 Age

The major risk factor for Alzheimer's Disease (AD) is age, with a sharp increase in incidence after 60 years (Kawas et al., 2000 [57]). Thus, the inherent link between aging and AD should be understood thoroughly. As noted above, the morphology observed in the brains of patients affected by AD is a combination of different biological processes, such as normal aging and the pathological matter loss specific to AD (Lorenzi et al. [66]). Aging might therefore be a contributing effect that accelerates the progression of Alzheimer's Disease and the transitions from mild status (MCI) to severe status (AD). These considerations motivate us to include age (the real age when MRI scans are collected) as a continuous covariate in our Markov model.

To get a brief view of the contributing effect of age, we plot the fitted transition rates against age for three transition groups. As shown in Figure 4.2, on average, transition rates from NC to MCI increase with age, consistent with the common understanding that brain function deteriorates more readily in older people. Meanwhile, however, transition rates from MCI to AD decrease with age, which actually contradicts the known facts.
The reason might lie in the limitations of cross-sectional and longitudinal study designs [32]. Cross-sectional studies may suffer from cohort effects and, potentially more seriously, from different recruitment biases across age groups, while longitudinal data may suffer from selective attrition and test-retest effects that can be larger than the change across time points (Salthouse, 2012 [90]). It is thus not surprising that, compared to the model without any covariates, the model with age is not significant (p-value 0.1733) according to the likelihood ratio test (Table 4.6).

4.4.2.2 ApoE4

The apolipoprotein E (APOE) e4 allele is the most prevalent genetic risk factor for Alzheimer's Disease (AD) (Corder et al., 1993 [20]; Saunders et al., 1993 [92]) and is present in roughly 20-25% of North Americans and Europeans (Gerdes et al., 1992 [39]).

[Figure 4.2. Fitted transition rates changing with age in different groups. The red solid line with circle marks denotes the transition rates from NC to MCI; the blue dashed line with triangle marks, from MCI to NC; the purple dotted line with plus marks, from MCI to AD. The x-axis denotes age; the y-axis denotes the transition rate.]

The presence of an e4 allele confers a significantly higher likelihood of developing AD [86]. Shi et al. [96] found significant hippocampal morphological deformation in APOE e4 carriers relative to non-carriers in the entire cohort as well as in the non-demented (pooled MCI and control) subjects. Spampinato et al. [97] found, in a longitudinal magnetic resonance imaging (MRI) study using voxel-based morphometry (VBM), that carriers of the APOE e4 allele with MCI developed atrophy in the hippocampus, insula, and temporal and parietal cortex before converting to AD, while structural changes underlying the conversion to dementia in non-carriers did not become apparent.

In our study, ApoE4 is a categorical variable taking values 0, 1, and 2. We group ApoE4 into two levels, ApoE4 negative (value 0) and ApoE4 positive (value 1 or 2), because only 7.51% of the subjects have ApoE4 equal to 2, too few for powerful comparisons against the other levels. We then include ApoE4 as a two-level categorical covariate in our Markov model. As expected from the model fit, ApoE4 carriers (ApoE4 positive) are more likely to transit from NC to MCI and from MCI to AD; the evidence is shown in Table 4.2, Table 4.3, Table 4.4, and Figure 4.3. In the transition rates table (Table 4.2), ApoE4 carriers (0.0906) have a transition rate from NC to MCI three times that of non-carriers (0.0299), and carriers (0.2611) have a transition rate from MCI to AD twice that of non-carriers (0.1338). In the table of transition probabilities at year 2 (Table 4.3), carriers (0.1247) have a more than twofold higher transition probability from NC to MCI than non-carriers (0.0495), and carriers (0.3973) have an almost twofold higher transition probability from MCI to AD than non-carriers (0.2291). Comparing the transition probability tables at year 4 (Table 4.4) and year 2 (Table 4.3), the same pattern holds between ApoE4 carriers and non-carriers.
Table 4.2. Transition rates (95% confidence intervals) for the model with covariate ApoE4.

  Transition  ApoE4 positive            ApoE4 negative
  NC-MCI      0.0906 (0.0535, 0.1527)   0.0299 (0.0180, 0.0495)
  NC-AD       0                         0
  MCI-NC      0.0277 (0.0132, 0.0582)   0.0265 (0.0126, 0.0556)
  MCI-AD      0.2611 (0.2067, 0.3297)   0.1338 (0.0969, 0.1848)

Table 4.3. Transition probabilities (95% confidence intervals) at year 2 for the model with covariate ApoE4.

  Transition  ApoE4 positive            ApoE4 negative
  NC-NC       0.8383 (0.7422, 0.9016)   0.9434 (0.9070, 0.9653)
  NC-MCI      0.1247 (0.0760, 0.1993)   0.0495 (0.0307, 0.0825)
  NC-AD       0.0369 (0.0220, 0.0615)   0.0071 (0.0039, 0.0125)
  MCI-NC      0.0382 (0.0182, 0.0747)   0.0439 (0.0214, 0.0900)
  MCI-MCI     0.5645 (0.4920, 0.6264)   0.7270 (0.6445, 0.7889)
  MCI-AD      0.3973 (0.3297, 0.4699)   0.2291 (0.1684, 0.3022)

Table 4.4. Transition probabilities (95% confidence intervals) at year 4 for the model with covariate ApoE4.

  Transition  ApoE4 positive            ApoE4 negative
  NC-NC       0.7076 (0.5558, 0.8161)   0.8922 (0.8312, 0.9331)
  NC-MCI      0.1750 (0.1101, 0.2610)   0.0827 (0.0505, 0.1278)
  NC-AD       0.1175 (0.0732, 0.1885)   0.0251 (0.0143, 0.0423)
  MCI-NC      0.0536 (0.0240, 0.1111)   0.0734 (0.0366, 0.1495)
  MCI-MCI     0.3235 (0.2411, 0.3969)   0.5307 (0.4331, 0.6196)
  MCI-AD      0.6229 (0.5423, 0.7087)   0.3959 (0.3069, 0.4989)

Table 4.5. Mean sojourn time (95% confidence intervals) for the model with covariate ApoE4.

  State  Model without covariates     ApoE4 positive             ApoE4 negative
  NC     22.6864 (15.7601, 32.6567)   11.0653 (6.5486, 18.6972)  33.5035 (20.1918, 55.5913)
  MCI    4.4695 (3.7402, 5.3411)      3.4632 (0.3939, 2.7712)    6.2389 (4.6385, 8.3914)

In both tables, transition probabilities from MCI to AD are higher than those from NC to MCI. In the mean sojourn time table (Table 4.5), ApoE4 carriers have a much shorter mean sojourn time than non-carriers, which means that ApoE4 accelerates the progression of Alzheimer's Disease. Compared to the model without any covariates, the model with ApoE4 is significant (p-value 0.0002) according to the likelihood ratio test (Table 4.6).

[Figure 4.3. Transition probabilities changing with time for the model with covariate ApoE4. The red solid line with circle marks describes transition probabilities from NC to MCI; the blue dashed line with triangle marks, from MCI to NC; the purple dotted line with plus marks, from MCI to AD. The x-axis denotes the screening time; the y-axis, the transition probability. The left panel shows ApoE4 carriers (positive); the right panel, non-carriers (negative).]

4.4.3 Model with multiple covariates

In this section, we make predictions of possible transitions based on transition probabilities and transition rates by including the known risk factors as covariates in the model. The covariates incorporated are real age, ApoE4, MMSE, CDR-SB, and regional volumetric variables such as "Ventricles", "Hippocampus", "WholeBrain", "Entorhinal", "Fusiform", "MidTemp", and "ICV". Note that we believe that real age contributes to prediction and thus should be included in the model, although it was not recognized as a significant risk factor in the previous section (p-value 0.1733 in the likelihood ratio test). Model comparisons of this kind can be carried out directly in msm, as sketched below.
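A minimal sketch of the likelihood ratio tests summarized in Table 4.6, assuming fitted msm objects `fit0` (no covariates), `fit_age`, and `fit_apoe4` built as in the earlier sketch (the object names are ours):

```r
# Likelihood ratio tests against the no-covariate baseline, as in
# Table 4.6; lrtest.msm() reports -2 log LR, degrees of freedom and
# p-values for each alternative model against the first (base) model.
lrtest.msm(fit0, fit_age)
lrtest.msm(fit0, fit_apoe4)

# Hazard ratios exp(beta) of each covariate for each allowed transition.
hazard.msm(fit_apoe4)
```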
Table 4.6. Likelihood ratio tests between different models.

  Baseline: model without covariates
              Age       ApoE4     Volumes   Scores     Full
  -2 log LR   4.9796    19.7598   69.1031   129.8185   166.6650
  df          3         3         18        6          30
  p-value     0.1733    0.0002    0         0          0

  Baseline: model with Volumes
              Age       ApoE4     Volumes   Scores     Full
  -2 log LR   -64.1235  -49.3432            60.7155    97.5619
  df          15        15                  12         12
  p-value     1         1                   0          0

4.4.3.1 Prediction and validation

As described in the previous sections, MCI subjects have a much higher risk of developing AD. We are therefore most interested in predicting transitions from MCI to AD (a two-state model). The data used here include only the subjects whose diagnosis is MCI at baseline and who become AD in the end, which means we do not predict the transition from NC to MCI; we still treat the AD state as absorbing. Cross-validation is performed by processing the testing set with the trained Markov model. More specifically, the general two-step training and testing procedure is as follows (a sketch is given below). All transition rates and probabilities here refer to the transition from MCI to AD, and logistic regression is used only as a tool for discriminating observations with lower or higher transition rates and probabilities into the MCI or AD state. First, the Markov model is trained on the training set, and we collect its outputs, the estimated transition rates and transition probabilities; a logistic regression of the diagnosis state (MCI or AD) is then trained on these estimated rates and probabilities. Second, plugging the testing set into the trained Markov model, we obtain new estimated transition rates and probabilities; plugging these into the trained logistic regression yields the predicted diagnosis state. Note that predictions are performed within each subject and are one-step-ahead: each subject's information at the previous scanning time is used to predict the diagnosis state at the next scanning time.

As is well known, correct registration and alignment of subjects on the timeline is crucial in longitudinal medical studies. To investigate how registration and alignment affect the prediction of Alzheimer's Disease progression, we prepared two data sets for cross-validation, Data1 and Data2. Data1 keeps all the observations before the AD state for each subject, so that all subjects are aligned to the baseline of scanning time and the numbers of replications (observations) across subjects are unbalanced. Data2 keeps only two observations before the AD state for each subject, so that all subjects are aligned to the last scanning time point (AD state) and the numbers of replications are balanced (3 observations in total per subject). We believe that, in Data2, subjects share more similarities in their structural volumetric change patterns over the two time points before reaching the AD state. But things are usually not as perfect as expected, because the two time points before the AD state cannot be fully aligned for all subjects: as prespecified by ADNI, for subjects diagnosed as NC or MCI at baseline (first scan), MRI scans are collected every half year within the first 2 years and every year thereafter. Thus it is quite possible that, within the two years before AD, one subject is observed every half year while another is observed every year.
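A minimal sketch of the two-step procedure above, assuming a fitted msm model on the training subjects and data frames `train` and `test` containing one row per (subject, visit) with the hypothetical columns `state01` (0 = MCI, 1 = AD at the next visit), `rate`, and `prob` holding the msm-estimated MCI-to-AD transition rate and one-step transition probability:

```r
# Step 1: logistic regression of the next-visit state on the two msm
# outputs (transition rate and transition probability from MCI to AD).
clf <- glm(state01 ~ rate + prob, family = binomial, data = train)

# Step 2: one-step-ahead prediction on the held-out subject's visits.
p_hat <- predict(clf, newdata = test, type = "response")
pred  <- ifelse(p_hat > 0.5, "AD", "MCI")

table(predicted = pred, truth = ifelse(test$state01 == 1, "AD", "MCI"))
```

For leave-one-out cross-validation, this pair of steps is simply repeated with each subject held out in turn.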
Furthermore, based on the analysis in the previous section, we know that subjects with higher transition rates and transition probabilities tend to deteriorate faster. This motivates us to divide the subjects in each data set (Data1 and Data2) into groups according to the scale of their transition rates and transition probabilities: Group1 (Table 4.7) is the higher-risk group and Group2 the lower-risk group. Prediction performance should differ between the two groups, just as they should be treated differently in clinical practice.

Table 4.7 presents all the leave-one-out cross-validation results for the different groups in the two data sets. Note that true positives here mean that a true AD is predicted as AD, while true negatives mean that a true MCI is predicted as MCI. Comparing the overall model based on Data1 with that based on Data2, the latter has higher sensitivity (0.5158 vs. 0.4000), lower specificity (0.6947 vs. 0.8930), and lower accuracy (0.6053 vs. 0.7313). This implies that more cases of true MCI or AD are predicted as AD (for Data2, both true positives and false positives increase); in other words, the model trained on Data2 is more inclined to transition to AD than the model trained on Data1. We see two reasons. The first is clear: on average there are more MCI-state observations in Data1 than in Data2, so the observed transitions from MCI to AD are relatively diluted in Data1, and the Markov model trained on Data1 consequently shrinks the transition rates and probabilities. The second is that, in Data2, each subject has only 3 observations and the MCI-state observations are not perfectly aligned on the timeline, as explained above; this in some sense amplifies the effect of misalignment, increasing the numbers of false positives and false negatives. Comparing Group1 with Group2 in either data set, Group1 has much higher sensitivity, specificity, precision and accuracy. This implies that transition rates and transition probabilities not only provide integrated information for prediction, but also offer a way to cluster individuals into groups sharing similar progression of brain functional impairment.

Table 4.7. Prediction on two data sets with different alignments in time. Data1 includes all the observations before AD for each subject, while Data2 includes only the 2 observations before AD. Each data set is divided into Group1 and Group2: Group1 in Data1 contains all patients whose transition probability ever exceeds 0.45 or whose transition rate exceeds 1.2; Group1 in Data2 contains all patients whose transition probability ever exceeds 0.6 or whose transition rate exceeds 1.2.
  Data1    Subjects  Sensitivity  Specificity  Precision  Accuracy  AUC
  Group1   51        0.5098       0.9008       0.6842     0.7849    0.8583
  Group2   54        0.4074       0.8404       0.5946     0.6824    0.6843
  Overall  105       0.4000       0.8930       0.6461     0.7313    0.7404

  Data2    Subjects  Sensitivity  Specificity  Precision  Accuracy  AUC
  Group1   59        0.7119       0.6610       0.6774     0.6864    0.7672
  Group2   36        0.5556       0.6389       0.6061     0.5972    0.5829
  Overall  95        0.5158       0.6947       0.6282     0.6053    0.6504

4.4.3.2 Impact of transition rates and transition probabilities

To get a clearer picture of how transition rates and transition probabilities work as integrated predictors of Alzheimer's Disease progression, we make predictions on the original data set. The training set here includes all subjects whose diagnosis is NC or MCI at baseline and who become MCI or AD in the end, with the AD state treated as absorbing. The prediction set is the most original one, which allows transitions from AD to AD. After training the Markov model on the training set, we obtain one-step-ahead predictions of the transition rates and transition probabilities (from NC to MCI and from MCI to AD) on the testing set. We then aggregate them by screening time (months) and true diagnosis state to obtain the longitudinal plots of predicted transition rates and probabilities in the different diagnosis groups (Figure 4.4). Note that we take the logarithm of the transition rates, since they span a wide range.

Comparing the predicted transition probabilities from NC to MCI (Figure 4.4b) with those from MCI to AD (Figure 4.4d), we observe that neither diagnosis group 'D23' nor 'D33' is separated well from the other groups in Figure 4.4b, which tells us that when predicting the transition from MCI to AD, we should use the transition intensities from MCI to AD rather than from NC to MCI. The reason for the difference is clear: transition probabilities from NC to MCI mostly reflect the overall risk of progressing from the NC state to the MCI state. In groups 'D23' and 'D33', the transition probabilities from NC to MCI eventually decrease over time, since most subjects in 'D23' and 'D33' are converting from MCI to AD rather than from NC to MCI. By contrast, the transition probabilities within the four groups in Figure 4.4d show a continuously increasing trend, since all diagnosis states eventually convert to the AD state (in the order NC to MCI to AD).

Comparing the predicted transition rates from MCI to AD (Figure 4.4c) with the predicted transition probabilities from MCI to AD (Figure 4.4d), we find that in both cases 'D23' and 'D33' are not separated well from each other, consistent with the fact that the diagnostic boundary between transitioning from MCI to AD and remaining in AD is not very sharp. However, all the other diagnosis groups (NC, MCI, AD) are separated very well, indicating that transition rates and transition probabilities can help discriminate transition states and thus predict the progression of brain functional impairment. Put differently, our Markov model is capable of integrating different sources of information into two simple indices that represent and predict the mechanism of Alzheimer's Disease progression.
One index is the transition rate, which reflects the overall speed of progression; the other is the transition probability, which gives the chance of a particular transition within a given time.

[Figure 4.4. Longitudinal plots of predicted transition rates and probabilities in different diagnosis groups: (a) predicted transition rates from NC to MCI by screening months; (b) predicted transition probabilities from NC to MCI; (c) predicted transition rates from MCI to AD; (d) predicted transition probabilities from MCI to AD. 'D11' denotes patients who stay within NC all the time; 'D22' patients who stay within MCI all the time; 'D23' patients who ever transit from MCI to AD; 'D33' patients who stay within AD all the time.]

4.5 Conclusion and discussion

When positioning the observations on the time axis, we found that aligning them by screening time rather than age yields more distinct pathways for differentiating the transition patterns between diagnosis groups. Technically, we treated the screening time as the timeline along which the Markov chain undergoes transitions from one state to another, while including age as a covariate in the Markov model so that the continuous effect of age on the progression of Alzheimer's Disease is fully identified.

Results from the model without covariates imply that the progression of Alzheimer's Disease is a slowly changing process, especially for subjects in NC. The transition speed from MCI to AD is much faster than that from NC to MCI or from NC to AD, which means MCI subjects face a higher risk of developing Alzheimer's Disease: the average sojourn time in MCI is 4.47 years, so after 4.47 years on average, MCI subjects transit to other states, most likely to AD. Compared with the Kaplan-Meier (KM) curve, which generally models a process with two states and one possible transition from an alive state to a dead state, a fitted non-AD probability curve from the Markov model is more powerful and flexible in modeling longitudinal data with multiple states and any possible transitions among them.

Compared to the model without any covariates, the model with age is not significant (p-value 0.1733), while the models with ApoE4 (p-value 0.0002), with volumes (p-value 0), with cognitive scores (p-value 0), and the full model (p-value 0) are all significant. Compared to the model with volumes, the models with age (p-value 1) and with ApoE4 (p-value 1) are not significant, while the model with cognitive testing scores (p-value 0) and the full model (p-value 0) are significant.

Besides the survival probability and the mean sojourn time, the Markov chain model gives us two main products that can be used for prediction: transition rates and transition probabilities.
First, we fitted the Markov model to obtain the corresponding transition rates and transition probabilities, and then used machine learning methods to build the connection between these two products and the transition states. From the leave-one-out cross-validation results, the highest AUC (area under the curve) we obtained is 0.8583. The key property of a continuous-time Markov chain is that the next state depends on the past only through the current state. Because of this property, the Markov chain is better suited to one-step prediction than to multi-step prediction. Moreover, the longitudinal data used here are clearly interval censored; we did not design a special scheme to model the interval censoring in this work. There are other limitations as well. On the one hand, the results and findings are based only on this particular data set. On the other hand, although ADNI1 is the largest available longitudinal study of this kind, we did not consider the data from ADNI-GO or ADNI2.

APPENDICES

Appendix 1 Multi-state Markov model

Let {S(t), t ≥ 0} be a multi-state Markov process with n different states, and let S be the set of states; for simplicity, assume S = {1, . . . , n}. The process is Markovian if for all t, t' > 0 and r, r' ∈ S,

P(S(t + t') = r' | S(t) = r, F_u, u < t) = P(S(t + t') = r' | S(t) = r),   (A.1)

where F_u is the observation history of the Markov process up to time u. That is, the distribution of the state at a future time t + t', given the history of the process up to the present time t, depends on that history only through the state at time t. Given the initial distribution of the process at time 0, a multi-state Markov process can be characterized by the matrix of transition probabilities, P(t, t + t') = (p_{rr'}(t, t + t')), where p_{rr'}(t, t + t') = P(S(t + t') = r' | S(t) = r). Note that each row of P(t, t + t') always sums to 1. Alternatively, we can introduce the matrix of transition intensities, Q(t) = (q_{rr'}(t)), with

q_{rr'}(t) = \lim_{t' \to 0} P(S(t + t') = r' | S(t) = r) / t'.   (A.2)

q_{rr'} can be interpreted as the instantaneous hazard of progression to stage r' from stage r (see, e.g., Jackson et al., 2003 [53]). If the process S(t) is assumed to be homogeneous in time, q_{rr'}(t) is independent of t. Equation (A.2) implies that the intensity matrix Q behaves like the derivative of the transition probability matrix P. Thus each row of Q sums to 0, so that q_{rr}(t) = -\sum_{r' \neq r} q_{rr'}(t) for each r ∈ S. From the Chapman-Kolmogorov equations, one can show that P(t) = \exp(tQ) for a homogeneous continuous-time Markov process (see, e.g., Whitt, 2012 [109]).

It is of interest to incorporate additional information about each subject into the model for AAA progression. This can be done by introducing a regression component into the intensity matrix Q. Let X(t) be the set of explanatory variables available at time t. We then model the intensity matrix Q by

q_{rr'}(t, X(t)) = q_{0,rr'} \exp(\beta_{rr'}^T X(t)),   (A.3)

where \beta_{rr'} is the set of regression parameters corresponding to the explanatory variables X(t) for r ≠ r'. Note that q_{rr'} must be positive for r ≠ r', while \beta_{rr'}^T X(t) need not be, so an exponential link between the two quantities is natural. The diagonal entries are specified via q_{rr}(t) = -\sum_{r' \neq r} q_{rr'}(t), and q_{0,rr'} can be interpreted as a baseline intensity that applies when X(t) does not affect the process.
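As a companion to (A.3), here is a brief Python sketch of how a covariate-adjusted intensity matrix could be assembled from baseline intensities and regression coefficients; the three-state layout, the single covariate, and all numbers are hypothetical.

import numpy as np

def intensity_matrix(Q0, B, x):
    # q_{rr'}(x) = q0_{rr'} * exp(b_{rr'} * x) for r != r', as in (A.3) with one
    # scalar covariate x; the diagonal is then set so each row sums to zero.
    Q = Q0 * np.exp(B * x)
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q

# Hypothetical baseline rates q0 and coefficients b; x could be centered age.
Q0 = np.array([[0.00, 0.15, 0.00],
               [0.02, 0.00, 0.23],
               [0.00, 0.00, 0.00]])
B  = np.array([[0.00, 0.03, 0.00],
               [0.01, 0.00, 0.05],
               [0.00, 0.00, 0.00]])
print(intensity_matrix(Q0, B, x=1.5))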
Note that the term 'homogeneous' in homogeneous Markov chains means that the transition intensities q_{rs} are independent of the time t. In the homogeneous Markov model, the sojourn time in each state r is exponentially distributed with mean -1/q_{rr}, and the probability that an individual in state r moves to state s is -q_{rs}/q_{rr}. For more details, please refer to [75].

Appendix 2 Specification for AAA progression

For AAA stages, we consider the following intensity matrix:

Q = \begin{pmatrix} -q_{12} & q_{12} & 0 & 0 \\ 0 & -q_{23} & q_{23} & 0 \\ 0 & q_{32} & -(q_{32} + q_{34}) & q_{34} \\ 0 & 0 & 0 & 0 \end{pmatrix}.   (A.4)

The structure in (A.4) represents a multi-state model. The model also assumes that the last stage is absorbing, i.e., the process cannot return to another stage once it enters the absorbing stage. Since an AAA that has been treated with surgery or has ruptured cannot move from state 4 (fatal) back to a previous state, the absorbing-stage assumption is reasonable for AAA data. We estimate q_{12}, q_{23}, q_{32} and q_{34} by maximum likelihood via the EM algorithm (Jackson et al., 2003 [53]). The EM algorithm requires initial values. The mean sojourn time, -1/q_{rr}, can be approximated by (total years spent in state r) / (total transitions out of state r), and initial values for q_{rr'} then follow from q_{rr} = -\sum_{r' \neq r} q_{rr'}. For the AAA data, the mean period in state 1 before moving to state 2 is about 2 years; thus, we take q_{11} = -0.5 and q_{12} = 0.5 as initial values. Jackson (2011) [16] developed the R package msm (available at www.r-project.org) for multi-state and hidden Markov models; we use this package for our analysis.
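To make the specification concrete, the small Python sketch below builds the intensity matrix in (A.4) from made-up rate values and reads off the implied mean sojourn times -1/q_{rr} discussed above.

import numpy as np

# Hypothetical rates for the four-state AAA model in (A.4); state 4 is absorbing.
q12, q23, q32, q34 = 0.5, 0.3, 0.1, 0.4

Q = np.array([[-q12,   q12,          0.0,  0.0],
              [ 0.0,  -q23,          q23,  0.0],
              [ 0.0,   q32, -(q32 + q34),  q34],
              [ 0.0,   0.0,          0.0,  0.0]])

# Mean sojourn time in each transient state r is -1/q_rr.
sojourn = -1.0 / np.diag(Q)[:3]
print(sojourn)  # state 1: 1/0.5 = 2.0 years, matching the initial values above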
Chapter 5 Conclusions, Discussion, and Directions for Future Research

5.1 Conclusions and discussion

Data analytics is playing an increasingly crucial role in this era of information explosion. Countless events, situations, systems and phenomena are waiting for statisticians to explore. Great effort and intelligence have been devoted to exploring the intractable sources of uncertainty, the complex structures hidden behind observations, the unknown interactions and dependencies across space and time, the ultra-sparse signals immersed in noise, the evolutionary paths of high-dimensional networks, and much more. In this dissertation, Bayesian calibration is used to explore the intractable sources of uncertainty and the complex structures hidden in the system of AAA enlargement, and a Markov model is used to explore the unknown interactions and dependencies across space and time in the brain images of patients with Alzheimer's Disease. Still, many research questions and aspects remain to be addressed in the future. Although Bayesian calibration has been widely applied to big and complex data from engineering, clinical sciences, climate change, physics, and other fields, theoretical results on convergence and consistency have not been fully developed yet. For brain imaging analysis, although we used a Markov chain to analyze the regional volumes obtained from MRI of patients with Alzheimer's Disease, further effort should be dedicated to other types of data, such as genetic data, PET scans, DTI and fMRI, and to the related research questions. Another important concept that has not been discussed is dimension reduction.

To tackle the high dimensionality, variability and complexity of big data, I mainly pursue two directions of statistical research toward innovative theories and methodologies with broad impact in data science: hierarchical modeling and dimension reduction (penalization). The future research therefore focuses on three aspects: consistency theory for Bayesian calibration, Bayesian variable selection, and dynamic networks with sparsity regularization.

5.2 Directions for future research

5.2.1 Bayesian consistency

In most cases, the true value of the calibration parameters cannot be measured or observed physically. Therefore, it is necessary to define the true parameter value and then provide a theoretical study showing that the estimate of the calibration parameter converges to the true value in probability. Tuo and Wu (2016) [103] provided a theoretical framework for calibration in computer models by considering simplifications such as dropping the prior. I first introduce Tuo and Wu's work, and then propose future directions of research.

The real observation and the computer model are related through

z^p(x) = r^s(x, \theta_0) + \delta(x) + e,   (5.1)

where \theta_0 is the true value of the calibration parameter and \delta(\cdot) is an unknown discrepancy function. If the physical system is deterministic, the observational error is neglected and we get

z^p(x) = r^s(x, \theta_0) + \delta(x),   (5.2)

where \theta_0 is the true value of the calibration parameter and \delta is the discrepancy between z^p and r^s(\cdot, \theta_0). Usually \delta is a nonzero function, because the computer code is built on assumptions or simplifications that do not match reality exactly. Define

\varepsilon(x, \theta) := z^p(x) - r^s(x, \theta).   (5.3)

The discrepancy function \delta is a realization of a Gaussian process with mean 0 and covariance function \sigma^2 \Phi, where \sigma^2 is an unknown parameter and the function \Phi is known. We adopt the L2 norm in defining the distance. The L2 projection of \theta is given by

\theta^* = \arg\min_{\theta \in \Theta} \| \varepsilon(\cdot, \theta) \|_{L_2(\Omega)}.   (5.4)

The L2 norm comes from the common use of the quadratic loss criterion: the expected quadratic loss between the prediction and the computer model given \theta is

\int_\Omega (z^p(x) - r^s(x, \theta))^2 \, dx = \| \varepsilon(\cdot, \theta) \|_{L_2(\Omega)}^2.

Because we do not know the true value of \theta, the authors define the true calibration parameter as the value \theta^* that minimizes the average predictive error in (5.4). Maximum likelihood estimation (MLE) is used to estimate (\theta, \sigma^2) instead of a Bayesian analysis. Under these assumptions, the functions \varepsilon(x_i, \cdot) are known for i = 1, . . . , n. It can then be shown that the log-likelihood function for (\theta, \sigma^2) is

l(\theta, \sigma^2; Y) = -\frac{n}{2} \log \sigma^2 - \frac{1}{2} \log |\Phi| - \frac{1}{2\sigma^2} \varepsilon(x, \theta)^T \Phi^{-1} \varepsilon(x, \theta),   (5.5)

where \varepsilon(x, \theta) = (\varepsilon(x_1, \theta), . . . , \varepsilon(x_n, \theta))^T and \Phi = (\Phi(x_i, x_j))_{ij}. The authors then discuss the consistency of the maximum likelihood estimate for the true parameter so defined.

I would like to generalize the above theory from the frequentist setup to the Bayesian setup. Following the same logic, I first need to define the true value. Define the true process as \zeta(\cdot) = r^s(\cdot, \theta) + \delta(\cdot). The conditional posterior predictive density of the true process is f(\zeta(x) | \theta, \hat{\psi}, d), while the posterior distribution of the parameter \theta is f(\theta | \hat{\psi}, d). I would like to show that when f(\zeta(x) | \theta, \hat{\psi}, d) is consistent for f(\zeta(x) | \theta_0, \hat{\psi}, d), then f(\theta | \hat{\psi}, d) is consistent for f(\theta_0 | \hat{\psi}, d).
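Before turning to consistency notions, here is a toy numerical illustration of the L2 projection in (5.4): a Python sketch that computes \theta^* for a one-dimensional example by minimizing a discretized L2 criterion. The model r^s, the "physical" response z^p, and the grid are all hypothetical.

import numpy as np
from scipy.optimize import minimize_scalar

def r_s(x, theta):
    return np.sin(theta * x)               # hypothetical computer model

def z_p(x):
    return np.sin(2.0 * x) + 0.05 * x**2   # hypothetical truth with discrepancy

x = np.linspace(0.0, 1.0, 200)             # discretization of Omega = [0, 1]

def l2_crit(theta):
    eps = z_p(x) - r_s(x, theta)           # eps(x, theta) as in (5.3)
    return np.mean(eps**2)                 # discretized L2 criterion from (5.4)

res = minimize_scalar(l2_crit, bounds=(0.0, 5.0), method="bounded")
print(res.x)  # theta*; close to 2 here because the discrepancy term is small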
Two popular methods for proving Bayesian consistency are based on the Kullback-Leibler divergence and on the Hellinger distance. Walker (2004) [106] derives sufficient conditions for both Hellinger and Kullback-Leibler consistency and proves an equivalence between them. The Hellinger distance is defined as

d_H(f, f_0) = \left( \int (\sqrt{f} - \sqrt{f_0})^2 \right)^{1/2} = \left( 2 \left( 1 - \int \sqrt{f f_0} \right) \right)^{1/2}.

For Hellinger consistency, the required result we are aiming for is

\Pi_n(\{f : d_H(f, f_0) > \epsilon\}) \to 0 \quad a.s. \ [F_0^\infty].

The Kullback-Leibler divergence between f and f_0 is d_K(f, f_0) = \int f_0 \log(f_0/f), and the Kullback-Leibler property is given by

\Pi(\{f : d_K(f, f_0) < \epsilon\}) > 0.

For more details on the consistency and convergence of posterior distributions, please refer to Ghosal (1997) [41].

5.2.2 Variable selection in Bayesian statistics

Penalization methods for high-dimensional problems, the flavor of the last decade, have enjoyed much attention, resulting in a rich theoretical literature establishing their optimality properties. There are also fast algorithms and compelling applied results underlining the success story of these methods. In the regression setup, it is common to view the negative log prior density as a penalty function. As a result, a Gaussian prior is equivalent to L2-penalty regularization, while a Laplacian prior is equivalent to L1-penalty regularization; similarly, the idea of a weighted mixture prior resembles that of elastic-net regularization. Model improvement typically comes from priors that place more probability around zero and have heavier tails over the nonzero region, because variable selection techniques aim to balance shrinking truly zero coefficients to be exactly zero against keeping truly nonzero coefficients away from zero. With these considerations, the horseshoe prior (Carvalho et al., 2010 [11]) comes into the picture, since its density is unbounded at zero and has Cauchy-like tails. Although exact zeros can be achieved with spike-and-slab priors (George and McCulloch, 1997 [38]), the computational burden is high, since sampling from the posterior of the binary indicator variables is not easy, especially in high dimensions. Several optimality results are available for the horseshoe and horseshoe+ priors (Carvalho et al., 2010 [11]; Datta et al., 2013 [26]; van der Pas et al., 2014 [104]). However, to the best of my knowledge, all of these results concern the estimation of the multivariate normal mean, and a satisfactory theory of uncertainty quantification for penalized estimators is not yet in place. I hope to be able to choose a default shrinkage prior that leads to optimality properties similar to those shown for L1 penalization and other approaches, but with the added advantage that the entire posterior distribution concentrates at the optimal rate, instead of just the point estimate.

In the following, I introduce Bayesian variable selection using the inverted beta distribution (the horseshoe prior is a special case). For the regression situation involving the observation of a dependent variable Y and a set of potential predictors X_1, . . . , X_p, we consider the canonical regression setup

Y | X, \beta, \sigma^2 \sim N_n(X\beta, \sigma^2 I),   (5.6)

where Y is n × 1, X = [X_1, . . . , X_p] is n × p, \beta = (\beta_1, . . . , \beta_p)', and \sigma^2 is a scalar. Both \beta and \sigma^2 are considered unknown. We describe the prior of \beta as a multivariate normal,

\pi(\beta | \sigma^2, \lambda) = N_p(0, \Upsilon_{(\sigma, \lambda)}).   (5.7)
The residual variance \sigma^2 is conveniently modeled as a realization from an inverse gamma prior,

\pi(\sigma^2) = IG(\nu/2, \nu\omega/2),   (5.8)

which is equivalent to \nu\omega/\sigma^2 \sim \chi^2_\nu. It might be desirable to have \omega decrease with p. We consider the special case in which \Upsilon_{(\sigma, \lambda)} in (5.7) is of the form

\pi(\beta | \sigma, \lambda) = N_p(0, \sigma^2 D(\lambda) R D(\lambda)),   (5.9)

where D(\lambda) is diagonal and R is a correlation matrix. We denote the i-th diagonal element of D^2(\lambda) by (D^2(\lambda))_{ii} = \lambda^2. Each component of \beta is modeled as a normal distribution,

\pi(\beta_i | \sigma^2, \lambda^2) = N(0, \lambda^2 \sigma^2).   (5.10)

The scale parameter \lambda^2 follows an inverted beta distribution:

p(\lambda^2 | a, b) = \frac{(\lambda^2)^{b-1} (1 + \lambda^2)^{-(a+b)}}{\mathrm{Beta}(a, b)},   (5.11)

where Beta(a, b) denotes the beta function and a and b are positive reals. The inverted beta density is plotted for different values of the parameters a and b in Figure 5.1. When b ≤ 1, the density is unbounded at the origin; when b > 1, the density is 0 at the origin. The behaviors (convergence rates) at the tail and near the origin change with the parameters a and b, which suggests that the inverted beta distribution can serve as a flexible and versatile prior for Bayesian variable selection: it suffices to model different levels of sparsity if a and b are carefully chosen. The horseshoe prior is just the special case a = 0.5 and b = 0.5.

[Figure 5.1. Inverted Beta distribution when parameters a and b take different values.]

The generalized beta prime distribution has probability density function

f(x; \alpha, \beta, p, q) = \frac{p \, (x/q)^{\alpha p - 1} \left( 1 + (x/q)^p \right)^{-(\alpha + \beta)}}{q \, \mathrm{Beta}(\alpha, \beta)},

where p > 0 controls the shape and q > 0 controls the scale. Shown below are the two modeling options I aim to pursue if we would like to use a global-local prior. The first puts the global parameter \tau at a level above the local parameters \lambda_i in the hierarchy: each regression coefficient \beta_i is controlled by a local parameter \lambda_i, while each \lambda_i is governed by the global parameter \tau:

Y | \beta, \sigma^2 \sim N_n(X\beta, \sigma^2 I),
\beta | \sigma^2, \lambda_i \sim N_p(0, \sigma^2 D(\lambda_i) R D(\lambda_i)),
\beta_i | \sigma^2, \lambda_i^2 \sim N(0, \lambda_i^2 \sigma^2),
\pi(\lambda_i | \tau, a_1, b_1) = \frac{(\lambda_i/\tau)^{a_1 - 1} (1 + \lambda_i/\tau)^{-(a_1 + b_1)}}{\tau \, \mathrm{Beta}(a_1, b_1)},
\pi(\tau | a_0, b_0) = \frac{\tau^{a_0 - 1} (1 + \tau)^{-(a_0 + b_0)}}{\mathrm{Beta}(a_0, b_0)}.   (5.12)

The second puts the global parameter \tau and the local parameters \lambda_i at the same level of the hierarchy, but uses their product to control the magnitude of each regression coefficient \beta_i:

Y | \beta, \sigma^2 \sim N_n(X\beta, \sigma^2 I),
\beta | \sigma^2, \lambda_i, \tau \sim N_p(0, \sigma^2 D(\lambda_i, \tau) R D(\lambda_i, \tau)),
\beta_i | \sigma^2, \lambda_i^2, \tau^2 \sim N(0, \lambda_i^2 \tau^2 \sigma^2),
\pi(\lambda_i | a_1, b_1) = \frac{\lambda_i^{a_1 - 1} (1 + \lambda_i)^{-(a_1 + b_1)}}{\mathrm{Beta}(a_1, b_1)},
\pi(\tau | a_0, b_0) = \frac{\tau^{a_0 - 1} (1 + \tau)^{-(a_0 + b_0)}}{\mathrm{Beta}(a_0, b_0)}.   (5.13)
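To illustrate the second hierarchy (5.13), the Python sketch below draws regression coefficients under the product global-local scale. It relies on the fact that the inverted beta density above coincides with scipy's betaprime distribution; the dimension and all shape values are illustrative.

import numpy as np
from scipy.stats import betaprime, norm

rng = np.random.default_rng(0)
p, sigma = 50, 1.0
a1, b1 = 0.5, 0.5   # local shape parameters (a horseshoe-like choice)
a0, b0 = 0.5, 0.5   # global shape parameters

# Per (5.13): lambda_i and tau follow inverted beta (beta prime) priors, and
# beta_i | lambda_i, tau, sigma ~ N(0, lambda_i^2 * tau^2 * sigma^2).
lam  = betaprime.rvs(a1, b1, size=p, random_state=rng)  # local scales
tau  = betaprime.rvs(a0, b0, random_state=rng)          # global scale
beta = norm.rvs(scale=lam * tau * sigma, random_state=rng)

print(np.sum(np.abs(beta) < 0.1), "of", p, "coefficients shrunk near zero")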
5.2.3 Bayesian Dynamic network

When applied to the brain, the term connectivity refers to several different and interrelated aspects of brain organization (Horwitz, 2003 [52]). A fundamental distinction is between structural connectivity, functional connectivity, and effective connectivity. Functional connectivity is the connectivity between brain regions that share functional properties. More specifically, it can be defined as the temporal correlation between spatially remote neurophysiological events, expressed as deviation from statistical independence across these events in distributed neuronal groups and areas. This applies to both resting-state and task-state studies.

Let G = (V, E) be a graph with vertex (node) set V and edge set E. A node u ∈ V represents an entity (the intensity of a voxel) and an edge (u, v) ∈ E represents a connectivity. The precision matrix (inverse covariance matrix) is a popular way to describe an undirected graph, since a zero entry encodes conditional independence between the corresponding two nodes. Friedman et al. (2008) [33] developed a simple algorithm for estimating sparse undirected graphical models through L1 (lasso) regularization.

For each patient, the data Y = (y_{t,v}) form a matrix in which row t (t ∈ {1, . . . , T}) indexes time, column v (v ∈ {1, . . . , V}) indexes a voxel, and the entry y_{t,v} is an intensity. Each column is time-ordered. The precision matrix estimated from Y should therefore be a function of time, since the data are both spatial and temporal. From the application's perspective, we obtain a time-varying undirected graph, called a dynamic network. As shown in Figure 5.2, the network topology at time t − 1 differs from that at time t. The topology includes several aspects: the organization of voxels into nodes, the number of nodes, and the connectivities between nodes. When considering network changes between two time points, we may consider changes in node intensity, changes in the number of nodes, and changes in network connectivities; together we call these topology transitions. For more details, see Zhou et al. (2010) [120] and Monti et al. (2014) [72].

[Figure 5.2: schematic of the brain network at time t − 1 and at time t.]

Figure 5.2. Group 1, Group 2 and Group 3 (big circles) denote the nodes of the brain network. The thick solid lines denote the connectivities between nodes; as we can see, the connectivity changes with time. Voxels (small circles) denote the actual observations.

Because the precision matrix is positive definite, modeling and penalizing the time-varying precision matrix directly is difficult. We have to decompose the precision matrix into parts that do not carry a positive-definiteness requirement. Suppose each precision matrix \Omega_t has an "LDL"-type decomposition \Omega_t = L D_t L', where D_t is a diagonal matrix and L is an upper triangular matrix; this form is unique by the Cholesky decomposition. We assume that the diagonal matrix D_t evolves with time, and we model the sparsity of the dynamic network by penalizing the Cholesky factor. A sparse Cholesky factor, however, does not guarantee a sparse symmetric positive-definite matrix; Rothman et al. (2010) [88] uncovered a subtle relationship between the sparsity of the Cholesky factor and the sparsity of the covariance matrix. When the connectivity representation L changes with time, it is more plausible that every \Omega_t can be decomposed exactly as L_t D_t D_t (L_t)'. The penalized graphical model likelihood, with the other hierarchies, then becomes

Q(L, D, \beta, \sigma) = \sum_{t=1}^{T} \left( \mathrm{tr}(S_t L_t D_t D_t (L_t)') - \log\det(L_t D_t D_t (L_t)') - \lambda_1 \|L_t\|_1 \right)
                       + \sum_{t=2}^{T} \left( \frac{1}{2(\sigma^t)^2} \sum_{i=1}^{p} \left( \log(d_i^t) - \beta_i^t \log(d_i^{t-1}) \right)^2 + \frac{p}{2} \log(\sigma^t) \right),

where d_i^t is the i-th diagonal entry of D_t.

BIBLIOGRAPHY

[1] Allen, J. S., Damasio, H., and Grabowski, T. J. Normal neuroanatomical variation in the human brain: An mri-volumetric study. American journal of physical anthropology 118, 4 (2002), 341–358.
[2] Aparicio, P., Mandaltsi, A., Boamah, J., Chen, H., Selimovic, A., Bratby, M., Uberoi, R., Ventikos, Y., and Watton, P.
Modelling the influence of endothelial heterogeneity on the progression of arterial disease: application to abdominal aortic aneurysm evolution. International Journal for Numerical Methods in Biomedical Engineering 30, 5 (2014), 563–586.
[3] Arzani, A., Les, A., Dalman, R., and Shadden, S. Effect of exercise on patient specific abdominal aortic aneurysm flow topology and mixing. International Journal for Numerical Methods in Biomedical Engineering 30, 2 (2014), 280–295.
[4] Barnes, D. E., and Lee, S. J. Predicting alzheimer's risk: why and how? Alzheimer's Research & Therapy 3, 6 (2011), 1–3.
[5] Berger, J. Statistical decision theory and bayesian inference. Springer-Verlag, New York, 1985.
[6] Bergerson, J., and Muehleisen, R. Bayesian large model calibration using simulation and measured data for improved predictions. SAE International Journal of Passenger Cars-Mechanical Systems 8, 2015-01-0481 (2015), 415–420.
[7] Biehler, J., Gee, M. W., and Wall, W. A. Towards efficient uncertainty quantification in complex and large-scale biomechanical problems based on a bayesian multifidelity scheme. Biomechanics and modeling in mechanobiology 14, 3 (2015), 489–513.
[8] Bouwman, F., Schoonenboom, S., van Der Flier, W., Van Elk, E., Kok, A., Barkhof, F., Blankenstein, M., and Scheltens, P. Csf biomarkers and medial temporal lobe atrophy predict dementia in mild cognitive impairment. Neurobiology of aging 28, 7 (2007), 1070–1074.
[9] Boyd, S., and Vandenberghe, L. Convex optimization. Cambridge university press, 2004.
[10] Breslow, N. Biostatistics and bayes. Statistical Science (1990), 269–284.
[11] Carvalho, C. M., Polson, N. G., and Scott, J. G. The horseshoe estimator for sparse signals. Biometrika 97, 2 (2010), 465–480.
[12] Cedarbaum, J. M., Jaros, M., Hernandez, C., Coley, N., Andrieu, S., Grundman, M., Vellas, B., Initiative, A. D. N., et al. Rationale for use of the clinical dementia rating sum of boxes as a primary outcome measure for alzheimers disease clinical trials. Alzheimer's & Dementia 9, 1 (2013), S45–S55.
[13] Chaikof, E. L., Brewster, D. C., Dalman, R. L., Makaroun, M. S., Illig, K. A., Sicard, G. A., Timaran, C. H., Upchurch, G. R., and Veith, F. J. The care of patients with an abdominal aortic aneurysm: the society for vascular surgery practice guidelines. Journal of Vascular Surgery 50, 4 (2009), S2–S49.
[14] Chen, Y., and Pham, T. D. Development of a brain mri-based hidden markov model for dementia recognition. Biomedical engineering online 12, Suppl 1 (2013), S2.
[15] Chib, S., and Greenberg, E. Understanding the Metropolis-Hastings Algorithm. American Statistician 49, 4 (1995), 327–335.
[16] Jackson, C. H. Multi-state models for panel data: The msm package for R. Journal of Statistical Software 38, 8 (2011), 1–28.
[17] Commenges, D. Multi-state models in epidemiology. Lifetime data analysis 5, 4 (1999), 315–327.
[18] Commenges, D. Inference for multi-state models from interval-censored data. Statistical methods in medical research 11, 2 (2002), 167–182.
[19] Commenges, D., Joly, P., Letenneur, L., and Dartigues, J.-F. Incidence and mortality of alzheimer's disease or dementia using an illness-death model. Statistics in medicine 23, 2 (2004), 199–210.
[20] Corder, E., Saunders, A., Strittmatter, W., Schmechel, D., Gaskell, P., Small, G., Roses, A. D., Haines, J., and Pericak-Vance, M. A. Gene dose of apolipoprotein e type 4 allele and the risk of alzheimer's disease in late onset families. Science 261, 5123 (1993), 921–923.
[21] Crane, P. K., Carle, A., Gibbons, L.
E., Insel, P., Mackin, R. S., Gross, A., Jones, R. N., Mukherjee, S., Curtis, S. M., Harvey, D., et al. Development and assessment of a composite score for memory in the alzheimers disease neuroimaging initiative (adni). Brain imaging and behavior 6, 4 (2012), 502–516.
[22] Currin, C., Mitchell, T., Morris, M., and Ylvisaker, D. Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments. Journal of the American Statistical Association 86, 416 (1991), 953–963.
[23] Cyron, C., and Humphrey, J. Vascular homeostasis and the concept of mechanobiological stability. International journal of engineering science 85 (2014), 203–223.
[24] Cyron, C., Wilson, J., and Humphrey, J. Mechanobiological stability: a new paradigm to understand the enlargement of aneurysms? Journal of The Royal Society Interface 11, 100 (2014), 20140680.
[25] Dahabreh, I., Chan, J., Earley, A., Moorthy, D., Avendano, E., Trikalinos, T., Balk, E., and Wong, J. Modeling and simulation in the context of health technology assessment: Review of existing guidance, future research needs, and validity assessment.
[26] Datta, J., Ghosh, J. K., et al. Asymptotic properties of bayes risk for the horseshoe prior. Bayesian Analysis 8, 1 (2013), 111–132.
[27] Dawkins, C., Srinivasan, T. N., and Whalley, J. Calibration. Handbook of econometrics 5 (2001), 3653–3703.
[28] Desikan, R., Fischl, B., Cabral, H., Kemper, T., Guttmann, C., Blacker, D., Hyman, B., Albert, M., and Killiany, R. Mri measures of temporoparietal regions show differential rates of atrophy during prodromal ad. Neurology 71, 11 (2008), 819–825.
[29] Dietterich, T. Overfitting and undercomputing in machine learning. ACM computing surveys (CSUR) 27, 3 (1995), 326–327.
[30] Dunsmore, I. A bayesian approach to calibration. Journal of the Royal Statistical Society. Series B (Methodological) 30, 2 (1968), 396–405.
[31] Erhart, P., Hyhlik-Dürr, A., Geisbüsch, P., Kotelis, D., Müller-Eschner, M., Gasser, T. C., von Tengg-Kobligk, H., and Böckler, D. Finite element analysis in asymptomatic, symptomatic, and ruptured abdominal aortic aneurysms: in search of new rupture risk predictors. European Journal of Vascular and Endovascular Surgery 49, 3 (2015), 239–245.
[32] Fjell, A. M., McEvoy, L., Holland, D., Dale, A. M., Walhovd, K. B., Initiative, A. D. N., et al. What is normal in normal aging? effects of aging, amyloid and alzheimer's disease on the cerebral cortex and the hippocampus. Progress in neurobiology 117 (2014), 20–40.
[33] Friedman, J., Hastie, T., and Tibshirani, R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 3 (2008), 432–441.
[34] Frisoni, G. B., Fox, N. C., Jack, C. R., Scheltens, P., and Thompson, P. M. The clinical use of structural mri in alzheimer disease. Nature Reviews Neurology 6, 2 (2010), 67–77.
[35] Gaser, C., Franke, K., Klöppel, S., Koutsouleris, N., Sauer, H., Initiative, A. D. N., et al. Brainage in mild cognitive impaired patients: predicting the conversion to alzheimers disease. PloS one 8, 6 (2013), e67346.
[36] Gasser, T. C. An irreversible constitutive model for fibrous soft biological tissue: a 3d microfiber approach with demonstrative application to abdominal aortic aneurysms. Acta biomaterialia 7, 6 (2011), 2457–2466.
[37] Gelb, D. J. Measurement of progression in alzheimer's disease: a clinician's perspective. Statistics in Medicine 19, 11-12 (2000), 1393–1400.
[38] George, E. I., and McCulloch, R. E. Approaches for bayesian variable selection.
Statistica sinica (1997), 339–373.
[39] Gerdes, L. U., Klausen, I. C., Sihm, I., Faergeman, O., and Vogler, G. Apolipoprotein e polymorphism in a danish population compared to findings in 45 other study populations around the world. Genetic epidemiology 9, 3 (1992), 155–167.
[40] Gharahi, H., Zambrano, B., Lim, C., Choi, J., Lee, W., and Baek, S. On growth measurements of abdominal aortic aneurysms using maximally inscribed sphere, 2015.
[41] Ghosal, S. A review of consistency and convergence of posterior distribution. In Varanashi Symposium in Bayesian Inference, Banaras Hindu University (1997).
[42] Giorgio, A., and De Stefano, N. Clinical use of brain volumetry. Journal of Magnetic Resonance Imaging 37, 1 (2013), 1–14.
[43] Godbey, M. W. The use of variable-bagging and the cross-validation selector in the prediction of alzheimers using the adni database.
[44] Goel, M., Khanna, P., and Kishore, J. Understanding survival analysis: Kaplan-Meier estimate. International journal of Ayurveda research 1, 4 (2010), 274.
[45] Hankin, R. K. Introducing BACCO, an R bundle for Bayesian Analysis of Computer Code Output. Journal of Statistical Software 14, 16 (2005), 1–21.
[46] Hansen, L. P., and Heckman, J. J. The empirical foundations of calibration. The Journal of Economic Perspectives 10, 1 (1996), 87–104.
[47] Hawkins-Daarud, A., Prudhomme, S., van der Zee, K. G., and Oden, J. T. Bayesian calibration, validation, and uncertainty quantification of diffuse interface models of tumor growth. Journal of mathematical biology 67, 6-7 (2013), 1457–1485.
[48] Heckerman, D., et al. A tutorial on learning with bayesian networks. Nato Asi Series D Behavioural And Social Sciences 89 (1998), 301–354.
[49] Higdon, D., Gattiker, J., Williams, B., and Rightley, M. Computer model calibration using high-dimensional output. Journal of the American Statistical Association 103, 482 (2008).
[50] Higdon, D., Kennedy, M., Cavendish, J. C., Cafeo, J. A., and Ryne, R. D. Combining field data and computer simulations for calibration and prediction. SIAM Journal on Scientific Computing 26, 2 (2004), 448–466.
[51] Holland, J. The reverend thomas bayes, frs (1702-61). Journal of the Royal Statistical Society. Series A (General) 125, 3 (1962), 451–461.
[52] Horwitz, B. The elusive concept of brain connectivity. Neuroimage 19, 2 (2003), 466–470.
[53] Jackson, C. H., Sharples, L. D., Thompson, S. G., Duffy, S. W., and Couto, E. Multistate markov models for disease progression with classification error. Journal of the Royal Statistical Society: Series D (The Statistician) 52, 2 (2003), 193–209.
[54] Jost, B. C., and Grossberg, G. T. The natural history of alzheimer's disease: A brain bank study. Journal of the American Geriatrics Society 43, 11 (1995), 1248–1255.
[55] Wilson, J. S., Baek, S., and Humphrey, J. Importance of initial aortic properties on the evolving regional anisotropy, stiffness and wall thickness of human abdominal aortic aneurysms. Journal of the Royal Society Interface 9, 74 (2012), 2047–2058.
[56] Kadane, J. B. Prime time for bayes. Controlled Clinical Trials 16, 5 (1995), 313–318.
[57] Kawas, C., Gray, S., Brookmeyer, R., Fozard, J., and Zonderman, A. Age-specific incidence rates of alzheimers disease the baltimore longitudinal study of aging. Neurology 54, 11 (2000), 2072–2077.
[58] Kennedy, M. C., and O'Hagan, A. Bayesian calibration of computer models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63, 3 (2001), 425–464.
[59] Klink, A., Hyafil, F., Rudd, J., Faries, P., Fuster, V., Mallat, Z., Meilhac, O., Mulder, W. J., Michel, J.-B., Ramirez, F., et al. Diagnostic and therapeutic strategies for small abdominal aortic aneurysms. Nature Reviews Cardiology 8, 6 (2011), 338–347.
[60] Kniemeyer, H., Kessler, T., Reber, P., Ris, H., Hakki, H., and Widmer, M. Treatment of ruptured abdominal aortic aneurysm, a permanent challenge or a waste of resources? prediction of outcome using a multi-organ-dysfunction score. European Journal of Vascular and Endovascular Surgery 19, 2 (2000), 190–196.
[61] Kwon, S., Burek, W., Dupay, A., Farsad, M., Baek, S., Park, E., and Lee, W. Interaction of expanding abdominal aortic aneurysm with surrounding tissue: Retrospective ct image studies. Journal of nature and science 1, 8 (2015), e150.
[62] Kwon, S., Rectenwald, J., and Baek, S. Intrasac pressure changes and vascular remodeling after endovascular repair of abdominal aortic aneurysms: review and biomechanical model simulation. Journal of biomechanical engineering 133, 1 (2011), 011011.
[63] Lederle, F. A., Wilson, S. E., Johnson, G. R., Reinke, D. B., Littooy, F. N., Acher, C. W., Ballard, D. J., Messina, L. M., Gordon, I. L., Chute, E. P., et al. Immediate repair compared with surveillance of small abdominal aortic aneurysms. New England Journal of Medicine 346, 19 (2002), 1437–1444.
[64] Likeman, M., Anderson, V. M., Stevens, J. M., Waldman, A. D., Godbolt, A. K., Frost, C., Rossor, M. N., and Fox, N. C. Visual assessment of atrophy on magnetic resonance imaging in the diagnosis of pathologically confirmed young-onset dementias. Archives of neurology 62, 9 (2005), 1410–1415.
[65] Lilford, R., and Braunholtz, D. For debate: The statistical basis of public policy: a paradigm shift is overdue. Bmj 313, 7057 (1996), 603–607.
[66] Lorenzi, M., Pennec, X., Frisoni, G. B., Ayache, N., Initiative, A. D. N., et al. Disentangling normal aging from alzheimer's disease in structural magnetic resonance images. Neurobiology of aging 36 (2015), S42–S52.
[67] Machin, D., Cheung, Y. B., and Parmar, M. K. Cox's proportional hazards model. Survival Analysis: A Practical Approach, Second Edition (2006), 121–153.
[68] Maier, A., Gee, M., Reeps, C., Pongratz, J., Eckstein, H.-H., and Wall, W. A comparison of diameter, wall stress, and rupture potential index for abdominal aortic aneurysm rupture risk prediction. Annals of biomedical engineering 38, 10 (2010), 3124–3134.
[69] McKay, M. D., Beckman, R. J., and Conover, W. J. Comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21, 2 (1979), 239–245.
[70] Meira-Machado, L. F., de Uña-Álvarez, J., Cadarso-Suárez, C., and Andersen, P. Multi-state models for the analysis of time-to-event data. Statistical methods in medical research (2008).
[71] Mills, K. L., and Tamnes, C. K. Methods and considerations for longitudinal structural brain imaging analysis across development. Developmental cognitive neuroscience 9 (2014), 172–190.
[72] Monti, R. P., Hellyer, P., Sharp, D., Leech, R., Anagnostopoulos, C., and Montana, G. Estimating time-varying brain connectivity networks from functional mri time series. NeuroImage 103 (2014), 427–443.
[73] Muehleisen, R. T., and Bergerson, J. Bayesian calibration-what, why and how.
[74] New, J., and Chandler, T. Evolutionary tuning of building models to monthly electrical consumption. ASHRAE Transactions 119 (2013), 89.
[75] Norris, J. R. Markov Chains. No. 7.
Cambridge University Press, 1998.
[76] Laboratory of Neuro Imaging. Alzheimer's disease.
[77] O'Hagan, A. Bayesian analysis of computer code outputs: a tutorial. Reliability Engineering & System Safety 91, 10 (2006), 1290–1300.
[78] Petersen, R. C., Smith, G. E., Waring, S. C., Ivnik, R. J., Tangalos, E. G., and Kokmen, E. Mild cognitive impairment: clinical characterization and outcome. Archives of neurology 56, 3 (1999), 303–308.
[79] Porth, C. Essentials of pathophysiology: Concepts of altered health states, 3rd ed. Lippincott Williams & Wilkins, 2010.
[80] Racine-Poon, A. A bayesian approach to nonlinear calibration problems. Journal of the American Statistical Association 83, 403 (1988), 650–656.
[81] Raftery, P., Keane, M., and O'Donnell, J. Calibrating whole building energy models: An evidence-based methodology. Energy and Buildings 43, 9 (2011), 2356–2364.
[82] Reddy, T. A., Maor, I., and Panjapornpon, C. Calibrating detailed building energy simulation programs with measured data, part i: General methodology (rp-1051). Hvac&R Research 13, 2 (2007), 221–241.
[83] Reeps, C., Maier, A., Pelisek, J., Härtl, F., Grabher-Meier, V., Wall, W. A., Essler, M., Eckstein, H.-H., and Gee, M. W. Measuring and modeling patient-specific distributions of material properties in abdominal aortic aneurysm wall. Biomechanics and Modeling in Mechanobiology 12, 4 (2013), 717–733.
[84] Ribeiro, P. F., Ventura-Antunes, L., Gabi, M., Mota, B., Grinberg, L. T., Farfel, J. M., Ferretti-Rebustini, R. E., Leite, R. E., Herculano-Houzel, S., et al. The human cerebral cortex is neither one nor many: neuronal distribution reveals two quantitatively different zones in the gray matter, three in the white matter, and explains local variations in cortical folding. Front. Neuroanat 7, 28 (2013), 10–3389.
[85] Riddle, M., and Muehleisen, R. T. A guide to bayesian calibration of building energy models. In Building Simulation Conference (2014).
[86] Risacher, S. L., Kim, S., Shen, L., Nho, K., Foroud, T., Green, R. C., Petersen, R. C., Jack Jr, C. R., Aisen, P. S., Koeppe, R. A., et al. The role of apolipoprotein e (apoe) genotype in early mild cognitive impairment (e-mci).
[87] Rizzo, R. J., McCarthy, W. J., Dixit, S. N., Lilly, M. P., Shively, V. P., Flinn, W. R., and Yao, J. S. Collagen types and matrix protein content in human abdominal aortic aneurysms. Journal of Vascular Surgery 10, 4 (1989), 365–373.
[88] Rothman, A. J., Levina, E., and Zhu, J. A new approach to cholesky-based covariance regularization in high dimensions. Biometrika 97, 3 (2010), 539–550.
[89] Sacks, J., Welch, W. J., Mitchell, T. J., and Wynn, H. P. Design and analysis of computer experiments. Statistical science (1989), 409–423.
[90] Salthouse, T. A. Robust cognitive change. Journal of the International Neuropsychological Society 18, 04 (2012), 749–756.
[91] Santner, T. J., Williams, B. J., and Notz, W. I. The design and analysis of computer experiments. Springer Science & Business Media, 2013.
[92] Saunders, A., Strittmatter, W., Schmechel, D., George-Hyslop, P. S., Pericak-Vance, M., Joo, S., Rosi, B., Gusella, J., Crapper-MacLachlan, D., Alberts, M., et al. Association of apolipoprotein e allele 4 with late-onset familial and sporadic alzheimer's disease. Neurology 43, 8 (1993), 1467–1467.
[93] Schmitter, D., Roche, A., Maréchal, B., Ribes, D., Abdulkadir, A., Bach-Cuadra, M., Daducci, A., Granziera, C., Klöppel, S., Maeder, P., et al. An evaluation of volume-based morphometry for prediction of mild cognitive impairment and alzheimer's disease.
NeuroImage: Clinical 7 (2015), 7–17.
[94] Seyedsalehi, S., Zhang, L., Choi, J., and Baek, S. Prior distributions of material parameters for bayesian calibration of growth and remodeling computational model of abdominal aortic wall. J Biomech Eng 137, 10 (2015), 101001.
[95] Sheidaei, A., Hunley, S., Zeinali-Davarani, S., Raguin, L., and Baek, S. Simulation of abdominal aortic aneurysm growth with updating hemodynamic loads using a realistic geometry. Medical engineering & physics 33, 1 (2011), 80–88.
[96] Shi, J., Lepore, N., Gutman, B. A., Thompson, P. M., Baxter, L. C., Caselli, R. J., and Wang, Y. Genetic influence of apolipoprotein e4 genotype on hippocampal morphometry: An n = 725 surface-based alzheimer's disease neuroimaging initiative study. Human brain mapping 35, 8 (2014), 3903–3918.
[97] Spampinato, M. V., Rumboldt, Z., Hosker, R. J., and Mintzer, J. E. Apolipoprotein e and gray matter volume loss in patients with mild cognitive impairment and alzheimer disease. Radiology 258, 3 (2011), 843–852.
[98] Spiegelhalter, D. J., Myles, J. P., Jones, D. R., and Abrams, K. R. Bayesian methods in health technology assessment: a review.
[99] Stephenson, G. Statistical analysis of computer models and the use of derivative information.
[100] Sukkar, R., Katz, E., Zhang, Y., Raunig, D., and Wyman, B. T. Disease progression modeling using hidden markov models. In Engineering in Medicine and Biology Society (EMBC), 2012 Annual International Conference of the IEEE (2012), IEEE, pp. 2845–2848.
[101] Sweeting, M., Farewell, V., and De Angelis, D. Multi-state markov models for disease progression in the presence of informative examination times: An application to hepatitis c. Statistics in medicine 29, 11 (2010), 1161–1174.
[102] Swerdlow, R. H. Brain aging, alzheimer's disease, and mitochondria. Biochimica et Biophysica Acta (BBA)-Molecular Basis of Disease 1812, 12 (2011), 1630–1639.
[103] Tuo, R., and Jeff Wu, C. A theoretical framework for calibration in computer models: parametrization, estimation and convergence properties. SIAM/ASA Journal on Uncertainty Quantification 4, 1 (2016), 767–795.
[104] Van Der Pas, S., Kleijn, B., Van Der Vaart, A., et al. The horseshoe estimator: Posterior concentration around nearly black vectors. Electronic Journal of Statistics 8, 2 (2014), 2585–2618.
[105] Vemuri, P., Wiste, H., Weigand, S., Shaw, L., Trojanowski, J., Weiner, M., Knopman, D., Petersen, R., Jack, C., Initiative, A. D. N., et al. Mri and csf biomarkers in normal, mci, and ad subjects predicting future clinical change. Neurology 73, 4 (2009), 294–301.
[106] Walker, S. New approaches to bayesian consistency. Annals of Statistics (2004), 2028–2043.
[107] Wang, Y., Resnick, S. M., and Davatzikos, C. Analysis of spatio-temporal brain imaging patterns by hidden markov models and serial mri images. Human brain mapping 35, 9 (2014), 4777–4794.
[108] Watton, P. N., Huang, H., and Ventikos, Y. Multi-scale modelling of vascular disease: Abdominal aortic aneurysm evolution. Computational Modeling in Tissue Engineering 7 (2013), 309–339.
[109] Whitt, W. Continuous-time markov chains. Lecture notes, www.columbia.edu/~ww2040 (2006).
[110] Williams, C. K., and Rasmussen, C. E. Gaussian processes for machine learning. MIT Press (2006).
[111] Xu, Y., Choi, J., Dass, S., and Maiti, T. Sequential Bayesian prediction and adaptive sampling algorithms for mobile sensor networks. IEEE Transactions on Automatic Control 57, 8 (2012), 2078–2084.
[112] Yamamoto, S., Miyazaki, M., Iwano, T., and Kitazawa, S.
Bayesian calibration of simultaneity in audiovisual temporal order judgments. PLoS ONE 7 (2012), 1–9.
[113] Yu, H.-m., Yang, S.-s., Gao, J.-w., Zhou, L.-y., Liang, R.-f., and Qu, C.-y. Multi-state markov model in outcome of mild cognitive impairments among community elderly residents in mainland china. International Psychogeriatrics 25, 05 (2013), 797–804.
[114] Zankl, A., Schumacher, H., Krumsdorf, U., Katus, H., Jahn, L., et al. Pathology, natural history and treatment of abdominal aortic aneurysms. Clinical Research in Cardiology 96, 3 (2007), 140–151.
[115] Zeinali-Davarani, S., and Baek, S. Medical image-based simulation of abdominal aortic aneurysm growth. Mechanics Research Communications 42 (2012), 107–117.
[116] Zeinali-Davarani, S., Raguin, L. G., Vorp, D. A., and Baek, S. Identification of in vivo material and geometric parameters of a human aorta: toward patient-specific modeling of abdominal aortic aneurysm. Biomechanics and modeling in mechanobiology 10, 5 (2011), 689–699.
[117] Zeinali-Davarani, S., Sheidaei, A., and Baek, S. A finite element model of stress-mediated vascular adaptation: application to abdominal aortic aneurysms. Computer methods in biomechanics and biomedical engineering 14, 9 (2011), 803–817.
[118] Zhang, D., Shen, D., Initiative, A. D. N., et al. Predicting future clinical changes of mci patients using longitudinal and multimodal biomarkers. PloS one 7, 3 (2012), e33182.
[119] Zhao, X., and Pelegri, A. A. A bayesian approach for characterization of soft tissue viscoelasticity in acoustic radiation force imaging. Int. J. Numer. Meth. Biomed. Engng. (2015).
[120] Zhou, S., Lafferty, J., and Wasserman, L. Time varying undirected graphs. Machine Learning 80, 2 (2010), 295–319.