SPATIO-TEMPORAL FIELD PREDICTION UNDER LOCALIZATION UNCERTAINTY FOR MOBILE SENSOR NETWORKS

By

Mahdi Jadaliha

A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Mechanical Engineering-Doctor of Philosophy

2013

ABSTRACT

SPATIO-TEMPORAL FIELD PREDICTION UNDER LOCALIZATION UNCERTAINTY FOR MOBILE SENSOR NETWORKS

By Mahdi Jadaliha

Recently, there has been a growing interest in wireless sensor networks due to advances in embedded system and network technologies. Their applications include, but are not limited to, environment monitoring, building comfort control, traffic control, manufacturing and plant automation, and surveillance systems. The conventional inverse problem approach based on physical transport models is computationally prohibitive for resource-constrained, multi-agent systems. In contrast, Gaussian process and Gaussian Markov models have been widely used to draw statistical inference from geostatistical and environmental data. However, the statistical models need to be carefully tailored so that they are practical and usable for mobile sensor networks with limited resources. Additionally, one needs to be careful in coping with the localization uncertainty resulting from low-cost mobile sensor networks. Thus, a fundamental problem in various applications is to correctly fuse the spatially collected data and estimate the process of interest under localization uncertainty. Motivated by the aforementioned issues, in this dissertation, we consider the problem of using mobile sensor networks to estimate and predict environmental fields modeled by spatio-temporal Gaussian processes and Gaussian Markov random fields in the presence of localization uncertainty. In the first part of this dissertation, we formulate Gaussian process regression with observations under localization uncertainty.
In our formulation, the effects of measurement noise, localization uncertainty, and prior distributions are all correctly incorporated in the posterior predictive statistics. The analytically intractable posterior predictive statistics are proposed to be approximated by two techniques, viz., Monte Carlo sampling and Laplace's method. In addition, the localization problem is studied in this part, where the position of the robot is estimated by maximum likelihood estimation (MLE) using vision data. We transform the high-dimensional vision data to a set of uncorrelated feature candidates. A multivariate GP regression with unknown hyperparameters is formulated to connect the set of selected features to their corresponding sampling positions. In order to decrease the computational load and increase the accuracy of localization, a feature reduction approach is developed: a subset of the features is selected to minimize the localization error using cross-validation methods. In the second part of the dissertation, we consider the problem of predicting a spatial (spatio-temporal) Gaussian Markov random field (GMRF) using sequential noisy observations collected by robotic sensors. The random field of interest is modeled by a GMRF instead of a Gaussian process. In this way, we propose iteratively updated predictive inferences. We derive the exact Bayesian solution to the problem of computing the predictive inference of the random field, taking into account uncertain hyperparameters, measurement noise, and uncertain localization from a fully Bayesian point of view. We show that the exact solution is not scalable as the number of observations increases. To cope with this exponentially increasing complexity, we propose scalable approximations to the exact solution with a controllable tradeoff between approximation error and complexity.
Finally, we derive an approximate Bayesian solution to the problem of simultaneous localization and predictive inference, taking into account observations, uncertain hyperparameters, measurement noise, the kinematics of robotic sensors, and uncertain localization.

ACKNOWLEDGMENTS

I would like to thank everybody who helped me to accomplish this dissertation. First of all, I would like to express my sincere appreciation and gratitude to my adviser, Dr. Jongeun Choi, who continuously encouraged and supported me. This thesis would not have been possible without his invaluable advice and effort to improve the work. In addition, I would like to thank all my committee members, Dr. Steven Shaw, Dr. Guoming Zhu, and Dr. Tapabrata Maiti, for their contributions and suggestions. Last but not least, I cannot thank enough my sister, my parents, and my wife for always being there for me. I would never have been able to accomplish what I have without them.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ALGORITHMS

Chapter 1 Preliminaries
1.1 Mathematical Notation
1.2 Physical Process Model
1.2.1 Gaussian process
1.2.2 Spatio-temporal Gaussian process
1.2.3 Gaussian Markov random field
1.3 Mobile Sensor Network
1.4 Gaussian Processes for Regression

Chapter 2 Gaussian process regression under localization uncertainty
2.1 Introduction
2.2 Preliminaries
2.2.1 Gaussian Process Regression
2.2.2 Monte Carlo and Importance Sampling
2.2.3 Laplace Approximations
2.3 The Problem Statement
2.4 The Monte Carlo method
2.4.1 Predictive mean
2.4.2 Predictive variance
2.5 Fully Exponential Laplace Approximations
2.5.1 Predictive mean
2.5.2 Predictive variance
2.5.3 Simple Laplace approximations
2.6 Error and Complexity Analysis
2.7 Simulation Results
2.8 Experimental Results
2.8.1 Experiment with simulated noisy sampling positions
2.8.2 Experiment with noisy sampling positions
2.9 Conclusion

Chapter 3 Position estimation using Gaussian process regression
3.1 Introduction
3.2 Image features
3.3 Gaussian process (GP) model
3.3.1 The ρ-th random field
3.4 Localization and feature selection
3.5 Experimental Results
3.5.1 Experimental setups
3.5.2 Learning of GP models in an empirical Bayes approach
3.5.3 Comparison study
3.6 Conclusion and future works

Chapter 4 Bayesian Prediction with Uncertain Localization
4.1 Introduction
4.2 Gaussian Process and Gaussian Markov Random Fields
4.3 Mobile Sensor Networks
4.4 Spatio-Temporal Field Model
4.5 Bayesian Predictive Inference
4.5.1 Uncertain hyperparameters and exact localization
4.5.2 Uncertain hyperparameters and localization
4.5.3 Complexity of Algorithms
4.6 Simulation Results
4.7 Experimental Results
4.7.1 The Static Experiment
4.7.2 The Dynamic Experiment
4.8 Conclusion

Chapter 5 Simultaneous Localization and Spatial Prediction
5.1 Introduction
5.2 Sequential Bayesian inference with a GMRF
5.2.1 Gaussian processes and Gaussian Markov random fields
5.2.2 Multiple robotic sensors
5.2.3 Kinematics of robotic vehicles
5.2.4 Problem formulation and its Bayesian predictive inference
5.3 Simulation Results
5.4 Conclusion

Chapter 6 Conclusion and Future Work
6.1 Conclusion
6.2 Future Work

BIBLIOGRAPHY

LIST OF TABLES

Table 2.1 Error and complexity analysis
Table 2.2 Simulation RMS error
Table 2.3 Experiment parameters
Table 2.4 Experimental RMS error w.r.t. Gaussian estimation
Table 3.1 Computational time
Table 3.2 The localization performance comparison

LIST OF FIGURES

Figure 2.1 For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation. The prediction results of applying Gaussian process regression on the true and noisy sampling positions are shown. The first, second, and third rows correspond to the prediction, prediction error variance, and squared empirical error (between the predicted and true fields) fields.
The first column shows the result of applying Gaussian process regression on the true sampling positions. The second and third columns show the results of applying Gaussian process regression on the noisy sampling positions with 0.1 and 0.4 noise covariance matrices, respectively. True and noisy sampling positions are shown in circles with agent indices in the second row.

Figure 2.2 A realization of the Gaussian process (ground truth).

Figure 2.3 Gaussian process regression using the true positions x. a) The predictive mean estimation. b) The predictive variance estimation, and the true sampling positions x in aquamarine crosses.

Figure 2.4 The results of the QDS 1 approximations using x̄. a) The predictive mean estimation. b) The predictive variance estimation, and x̄ shown with aquamarine crosses.

Figure 2.5 The results of the QDS 2 approximations using E(x|x̄). a) The predictive mean estimation. b) The predictive variance estimation, and E(x|x̄) shown with aquamarine crosses.

Figure 2.6 The results of the Monte Carlo approximations with 1000 samples using x̄. a) The predictive mean estimation. b) The predictive variance estimation, and x̄ shown with aquamarine crosses.

Figure 2.7 The results of the Laplace approximations using x̄. a) The predictive mean estimation. b) The predictive variance estimation, and x̄ shown with aquamarine crosses.

Figure 2.8 The results of the simple Laplace approximations using x̄. a) The predictive mean estimation. b) The predictive variance estimation, and x̂ shown with aquamarine crosses. x̂ is the estimate of the true sampling positions, computed as a by-product of both the fully exponential Laplace and simple Laplace approximations.

Figure 2.9 (a) The dye pumping system. (b) An example of normal dye concentration applied to the stream for the pheromone distribution estimation. (c) An example of the highly visible dye plume near the source location.

Figure 2.10 The results of Gaussian process regression using the true positions x. a) The posterior mean estimation. b) The posterior variance estimation, and x shown with aquamarine crosses.

Figure 2.11 The results of the QDS 1 approach using the noisy positions x̄. a) The predictive mean estimation. b) The predictive variance estimation, and x̄ shown with aquamarine crosses.

Figure 2.12 The results of the Monte Carlo method with 1000 samples using the noisy positions x̄. a) The predictive mean estimation. b) The predictive variance estimation, and x̄ shown with aquamarine crosses.

Figure 2.13 The results of the Laplace method using the noise-augmented positions x̄. a) The predictive mean estimation. b) The predictive variance estimation, and x̄ shown with aquamarine crosses.

Figure 2.14 The results of the simple Laplace method using the noise-augmented positions x̄. a) The predictive mean estimation. b) The predictive variance estimation, and x̂ shown with aquamarine crosses.

Figure 2.15 a) The developed robotic sensor. b) The experimental environment, a 12 × 6 meter outdoor swimming pool.

Figure 2.16 The results of the QDS 1 approach using the noisy positions x̄. a) The predictive mean estimation. b) The predictive variance estimation, and x̄ shown with aquamarine crosses.

Figure 2.17 The results of the Monte Carlo method with 1000 samples using the noisy positions x̄. a) The predictive mean estimation. b) The predictive variance estimation, and x̄ shown with aquamarine crosses.

Figure 2.18 The results of the Laplace approximations using x̄. a) The predictive mean estimation. b) The predictive variance estimation, and x̄ shown with aquamarine crosses.

Figure 2.19 The results of the simple Laplace method using the noise-augmented positions x̄. a) The predictive mean estimation. b) The predictive variance estimation, and x̂ shown with aquamarine crosses.

Figure 3.1 (a) and (b) show the wrapped omnidirectional image and the unwrapped panoramic image, respectively. (c) and (d) show the reduced-size gray scale image and the two-dimensional FFT magnitude plot, respectively.

Figure 3.2 (a) A Dot iPhone Panorama lens is used for Case 1. (b) A vision-based robot equipped with a 360 degree omnidirectional camera was built for Case 2.

Figure 3.3 Estimated hyperparameters for each individual scalar field for Case 1.

Figure 3.4 Estimated hyperparameters for each individual scalar field for Case 2.

Figure 3.5 Predictive mean function map for each individual feature zk for Case 1.

Figure 3.6 Prediction variance map for each individual feature zk for Case 1.

Figure 3.7 Predictive mean function map for each individual feature zk for Case 2.

Figure 3.8 Prediction variance map for each individual feature zk for Case 2.

Figure 4.1 Example of localization uncertainty for s[i]. The measured sampling location q̃[i] is indicated by a small red circle, which is the closest point in the discrete support to the measured sampling position in the continuous space.
The small red circle, along with the blue squares and the blue star, shows the possible locations of the true sampling point q[i] according to the prior distribution P(q[i] = s̊(j) | q̃[i]) with a compact support, as shown in the big red circle. The blue star indicates q[i], which is the closest point in the discrete support to the true sampling position in the continuous space.

Figure 4.2 Elements of the precision matrix Qθ related to a given location.

Figure 4.3 Agents 1 and 3 measure their exact sampling positions while agent 2 has a noisy sampling position measured at the star location. The four possible sampling positions for the true sampling position of agent 2 are 2a, 2b, 2c, and 2d with the prior probabilities 6/16, 2/16, 6/16, and 2/16, respectively.

Figure 4.4 (a) The realized spatio-temporal field at time t = 2 is the summation of (b) the smooth mean field and (c) the GMRF as described in (4.13).

Figure 4.5 The prediction results of Cases 1, 2, and 3 at time t = 2 are shown in the first, second, and third columns, respectively. The first, second, and third rows correspond to the prediction, prediction error variance, and squared empirical error fields between the prediction and true fields. True, noisy, and probable sampling positions are shown in circles, stars, and corners of squares, respectively.

Figure 4.6 (a) The RMS estimation error of βt vs. time and (b) the posterior probability of the true hyperparameter vector vs. time. The results for Cases 1, 2, and 3 are shown in blue circles, green stars, and red squares, respectively.

Figure 4.7 The mean square difference between Case 1 and Case 3 is shown for the different approximation orders m = 1, · · · , 5. On each box, the central mark is the median, and the upper and lower edges of the box are the 25th and 75th percentiles, respectively. The whiskers extend to the most extreme data points.

Figure 4.8 The number of observations with uncertain sampling positions in the three different cases is shown on the horizontal axis. The field prediction RMS errors for Case 1, Case 2, and Case 3 are shown with white, black, and hatched bars, respectively.

Figure 4.9 A vision-based robot built by the authors, equipped with a 360 degree omnidirectional camera.

Figure 4.10 The experimental setup. The measured position of the robot and 4 possible sampling positions are shown by a dark green star and light green squares, respectively. The spatial sites are marked with aqua blue dots on the ground. The panoramic view of the robot is pictured on the upper left-hand side of the figure.

Figure 4.11 The posterior probability of the hyperparameter vector at t = 2.

Figure 4.12 The prediction results of Cases 1, 2, and 3 at time t = 2 are shown in the first, second, and third columns, respectively. The first and second rows correspond to the predictions and the natural logarithm of the prediction error variance.

Figure 4.13 The time-varying field is shown for 4 time iterations. The first column shows the moving object in the field. The second column shows the sampling positions with blue dots and the positioning blind spot with black areas. The third, fourth, and fifth columns show the predicted field for Case 1, Case 3, and Case 4, respectively. The rows correspond to the time intervals t = 1, 2, 3, and 4.

Figure 5.1 The prediction results of Cases 1, 2, and 3 at time t = 100 are shown in the first, second, and third columns, respectively.
The first, second, and third rows correspond to the prediction, prediction error variance, and squared empirical error fields between the prediction and true fields. True, noisy, and probable sampling positions are shown in circles, stars, and dots, respectively. The x and y axes represent the 2-D location, and colors represent the value of the predicted quantity at each location.

Figure 5.2 The trajectories of the true, predicted, and noisy sampling positions of the robot are shown by red diamonds, blue dots, and green stars for time t ∈ {10, 11, · · · , 30}. The blue ellipsoids show confidence regions of about 68% for the estimated sampling positions.

LIST OF ALGORITHMS

Chapter 1

Preliminaries

1.1 Mathematical Notation

Standard notation will be used throughout the dissertation. Let R, R≥0, R>0, Z, Z≥0, and Z>0 denote, respectively, the sets of real, non-negative real, positive real, integer, non-negative integer, and positive integer numbers. In denotes the identity matrix of size n (I for an appropriate dimension). For column vectors va ∈ Ra, vb ∈ Rb, and vc ∈ Rc, col(va, vb, vc) := (vaT, vbT, vcT)T ∈ Ra+b+c stacks all vectors to create one column vector, and ‖va‖ denotes the Euclidean norm (or the vector 2-norm) of va. Let rowi(A) ∈ Rm and colj(A) ∈ Rn denote the i-th row and the j-th column of a matrix A ∈ Rn×m, respectively. |A| (or det(A)) denotes the determinant of a matrix A ∈ Rn×n, and tr(A) denotes the trace of a square matrix A. Let AT ∈ Rm×n be the transpose of a matrix A ∈ Rn×m. The positive definiteness and the positive semi-definiteness of a square matrix A are denoted by A ≻ 0 and A ⪰ 0, respectively. The symbol ⊗ denotes the Kronecker product. Let E, Var, and Cov denote the expectation operator, the variance operator, and the covariance operator, respectively.
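Much of this notation maps directly onto standard array operations. The numpy fragment below (illustrative only, not part of the formal development) mirrors col(·), the vector 2-norm, the determinant, the trace, and the Kronecker product ⊗:

```python
import numpy as np

# col(va, vb) stacks column vectors into one column vector in R^(a+b).
va, vb = np.array([1.0, 2.0]), np.array([3.0])
stacked = np.concatenate([va, vb])

A = np.array([[1.0, 2.0], [3.0, 4.0]])
norm_va = np.linalg.norm(va)   # Euclidean norm ||va||
tr_A = np.trace(A)             # tr(A)
det_A = np.linalg.det(A)       # |A|, the determinant
K = np.kron(np.eye(2), A)      # I_2 (x) A, a 4 x 4 block-diagonal matrix
```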
A random vector z ∈ Rq, which is distributed by a multivariate Gaussian distribution with mean µ ∈ Rq and covariance Σ ∈ Rq×q, is denoted by z ∼ N(µ, Σ).

We define the first and the second derivative operators on h : Rm → R with respect to a vector x ∈ Rm as follows:

\nabla h(x) = \frac{\partial h(x)}{\partial x} = \left( \frac{\partial h(x)}{\partial x_1}, \cdots, \frac{\partial h(x)}{\partial x_m} \right)^T,

\nabla^2 h(x) = \frac{\partial^2 h(x)}{\partial x \, \partial x^T} =
\begin{bmatrix}
\frac{\partial^2 h(x)}{\partial x_1 \partial x_1} & \cdots & \frac{\partial^2 h(x)}{\partial x_1 \partial x_m} \\
\vdots & \ddots & \vdots \\
\frac{\partial^2 h(x)}{\partial x_m \partial x_1} & \cdots & \frac{\partial^2 h(x)}{\partial x_m \partial x_m}
\end{bmatrix}.

Let ‖x‖ denote the standard Euclidean norm (2-norm) of a vector x. If there exist c and k ∈ R>0 such that the approximation error satisfies ‖x̂ − x‖ ≤ k|ε|, ∀ε < c, we say that the error between x̂ and x is of order O(ε) and also write x̂ − x = O(ε).

The relative complement of a set A in a set B is denoted by B \ A := B ∩ Ac, where Ac is the complement of A. For a set A ⊆ I, we define zA = {zi | i ∈ A}. Let −A denote the set I \ A. For given A = {c, d} and B = {1, 2}, the Cartesian product of two sets is defined as B × A = {(1, c), (1, d), (2, c), (2, d)}. Other notation will be explained in due course.

1.2 Physical Process Model

In this section, we review important notions of the Gaussian process, which will be used to model the physical phenomenon. In particular, we introduce a class of spatio-temporal Gaussian process models with anisotropic covariance functions. The properties of Gaussian Markov random fields (GMRFs) are also briefly reviewed.

1.2.1 Gaussian process

A Gaussian process can be thought of as a generalization of a Gaussian distribution over a finite vector space to a function space of infinite dimension. It is formally defined as follows [83, 84]:

Definition 1.2.1. A Gaussian process (GP) is a collection of random variables, any finite number of which have a consistent¹ joint Gaussian distribution.

A Gaussian process

z(x) \sim \mathcal{GP}\left( \mu(x), K(x, x'; \theta) \right) \quad (1.1)

is completely specified by its mean function µ(x) and covariance function K(x, x′; θ).
Although it need not be, we take the mean function to be zero for notational simplicity², i.e., µ(x) = 0. If the covariance function K(x, x′; θ) is invariant to translations in the input space, i.e., K(x, x′; θ) = K(x − x′; θ), we call it stationary. Furthermore, if the covariance function is a function of only the distance between the inputs, i.e., K(x, x′; θ) = K(‖x − x′‖; θ), then it is called isotropic. In practice, a parametric family of functions is used instead of fixing the covariance function [3]. One common choice of a stationary covariance function is

K(x, x'; \theta) = \sigma_f^2 \exp\left( -\sum_{\ell=1}^{b} \frac{(x_\ell - x'_\ell)^2}{2\sigma_\ell^2} \right), \quad (1.2)

where x_ℓ is the ℓ-th element of x ∈ Rb. From (1.2), it can easily be seen that the correlation between two inputs decreases as the distance between them increases. This decreasing rate depends on the choice of the kernel bandwidths {σℓ}. A very large kernel bandwidth means that the predictions have little bearing on the corresponding input, which is then said to be insignificant. σf² gives the overall vertical scale relative to the mean of the Gaussian process in the output space. These parameters play the role of hyperparameters since they correspond to the hyperparameters in neural networks and in the standard parametric model. Therefore, we define θ = (σf², σ1, · · · , σb)T ∈ Rb+1 as the hyperparameter vector.

¹ This is also known as the marginalization property [83]. It means simply that the random variables obey the usual rules of marginalization.
² This is not a drastic limitation since the mean of the posterior process is not confined to zero [84].

1.2.2 Spatio-temporal Gaussian process

In the first part of this dissertation, spatio-temporal Gaussian processes are of particular interest. Consider a spatio-temporal Gaussian process z(s, t) ∼ GP(µ(s, t), K(s, t, s′, t′; θ)), which is a special case of the Gaussian process defined in (1.1), where x = (sT, t)T ∈ Rι × R≥0.
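For concreteness, the stationary covariance function (1.2) can be evaluated in a few lines of numpy. The hyperparameter values below are assumptions chosen only to illustrate how correlation decays with distance:

```python
import numpy as np

def se_kernel(x, x_prime, sigma_f2, bandwidths):
    """Anisotropic squared exponential covariance K(x, x'; theta) of (1.2).

    sigma_f2 is the signal variance sigma_f^2; bandwidths holds the kernel
    bandwidths (sigma_1, ..., sigma_b), one per input dimension.
    """
    x = np.asarray(x, dtype=float)
    x_prime = np.asarray(x_prime, dtype=float)
    sigma = np.asarray(bandwidths, dtype=float)
    return sigma_f2 * np.exp(-np.sum((x - x_prime) ** 2 / (2.0 * sigma ** 2)))

# Correlation decays as inputs move apart, at a rate set by the bandwidths.
theta = dict(sigma_f2=1.0, bandwidths=[1.0, 0.5])
k_near = se_kernel([0.0, 0.0], [0.1, 0.1], **theta)
k_far = se_kernel([0.0, 0.0], [2.0, 2.0], **theta)
```

A very large bandwidth σℓ damps the ℓ-th term in the exponent toward zero, so that input becomes nearly insignificant, matching the discussion above.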
We consider the following generalized anisotropic covariance function K(x, x′; θ) with a hyperparameter vector θ := (σf², σ1, · · · , σι, σt)T ∈ Rι+2:

K(x, x'; \theta) = \sigma_f^2 \exp\left( -\sum_{\ell=1}^{\iota} \frac{(s_\ell - s'_\ell)^2}{2\sigma_\ell^2} \right) \exp\left( -\frac{(t - t')^2}{2\sigma_t^2} \right), \quad (1.3)

where s, s′ ∈ S ⊂ Rι and t, t′ ∈ R≥0. {σ1, · · · , σι} and σt are kernel bandwidths for space and time, respectively. (1.3) shows that points close in the measurement space and in time are strongly correlated and produce similar values. In reality, the larger the temporal distance between two measurements, the less correlated they become, which strongly supports our generalized covariance function in (1.3). This may also justify the truncation (or windowing) of the observed time series data to limit the size of the covariance matrix for reducing the computational cost. A spatially isotropic version of the covariance function in (1.3) has been used in [62].

1.2.3 Gaussian Markov random field

In this section, we introduce Gaussian Markov random fields (GMRFs). An undirected graph G = (V, E) is a tuple consisting of a set of vertices V := {1, · · · , n} and a set of edges E ⊂ V × V. The neighbors of i ∈ V in G are denoted by Ni := {j ∈ V | {i, j} ∈ E}. The GMRF is formally defined as follows [88].

Definition 1.2.2. (GMRF, [88, Definition 2.1]) A random vector z = (z[1], · · · , z[n])T ∈ Rn is called a GMRF with respect to a graph G = (V, E) with mean µ and precision matrix Q ≻ 0, if and only if its density has the form

P(z) = \frac{|Q|^{1/2}}{(2\pi)^{n/2}} \exp\left( -\frac{1}{2} (z - \mu)^T Q (z - \mu) \right),

and (Q)ij ≠ 0 ⇔ {i, j} ∈ E for all i ≠ j, where the precision matrix (or information matrix) Q = K−1 is the inverse of the covariance matrix K.

1.3 Mobile Sensor Network

In this section, we consider the sensor network formed by multiple mobile sensing agents and present the measurement model used throughout the thesis. Let N be the number of sensing agents distributed over the surveillance region S ⊂ Rι.
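To make Definition 1.2.2 concrete, the sketch below evaluates the GMRF log-density directly from a precision matrix. The 3-node chain graph and its precision matrix are assumed examples for illustration, not fields used in later chapters:

```python
import numpy as np

def gmrf_logpdf(z, mu, Q):
    """Log of the GMRF density in Definition 1.2.2:
    log P(z) = (1/2) log|Q| - (n/2) log(2*pi) - (1/2)(z - mu)^T Q (z - mu).
    """
    z = np.asarray(z, dtype=float)
    r = z - np.asarray(mu, dtype=float)
    sign, logdet = np.linalg.slogdet(Q)
    assert sign > 0.0, "the precision matrix Q must be positive definite"
    return 0.5 * logdet - 0.5 * len(z) * np.log(2.0 * np.pi) - 0.5 * r @ Q @ r

# A GMRF on a 3-node chain graph: (Q)ij is nonzero only for neighbors,
# so z[1] and z[3] are conditionally independent given z[2].
Q_chain = np.array([[2.0, -1.0, 0.0],
                    [-1.0, 2.0, -1.0],
                    [0.0, -1.0, 2.0]])
logp = gmrf_logpdf([0.1, -0.2, 0.3], np.zeros(3), Q_chain)
```

Working with Q rather than K = Q⁻¹ is what makes GMRFs attractive computationally: the sparsity pattern of Q is exactly the graph's edge set.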
The identity of each agent is indexed by I := {1, 2, · · · , N}. Assume that all agents are equipped with identical sensors and take noisy observations at time t ∈ Z>0. At time t, sensing agent i takes a noise-corrupted measurement y[i](t) at its current location s[i](t) ∈ S, i.e.,

y^{[i]}(t) = z(s^{[i]}(t), t) + \epsilon^{[i]}(t), \qquad \epsilon^{[i]}(t) \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2),

where the sensor noise ε[i] is considered to be an independent and identically distributed Gaussian random variable. σ² > 0 is the noise level, and we define the signal-to-noise ratio as γ = σf²/σ². Notice that when a static field is considered, we have z(s, t) = z(s).

For notational simplicity, we denote the collection of the positions of all N agents at time t by st, i.e.,

s_t := \left( s^{[1]}(t)^T, \cdots, s^{[N]}(t)^T \right)^T \in \mathcal{S}^N.

The collective measurement from all N mobile sensors at time t is denoted by

y_t := \left( y^{[1]}(t), \cdots, y^{[N]}(t) \right)^T \in \mathbb{R}^N.

The cumulative measurements from time t ∈ Z>0 to time t′ ∈ Z>0 are denoted by

y_{t:t'} := \left( y_t^T, \cdots, y_{t'}^T \right)^T \in \mathbb{R}^{(t'-t+1)N}.

1.4 Gaussian Processes for Regression

Suppose we have a data set D = {(x[i], y[i]) | i = 1, · · · , n} collected by mobile sensing agents, where x[i] denotes an input vector of dimension b and y[i] denotes a scalar value of the noise-corrupted output. The objective of probabilistic regression is to compute the predictive distribution of the function value z∗ := z(x∗) at some test input x∗. For notational simplicity, we define the collective input matrix x of dimension n × b as the aggregation of n input vectors (i.e., x := (x[1], · · · , x[n])T), and the outputs are collected in a vector y := (y[1], · · · , y[n])T. The corresponding vector of noise-free outputs is defined as z := (z(x[1]), · · · , z(x[n]))T. The advantage of the Gaussian process formulation is that the combination of the prior and noise models can be carried out exactly via matrix operations [111].
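The observation model of Section 1.3 is straightforward to simulate. In this sketch the scalar field, the agent positions, and the noise level are all assumptions chosen for illustration:

```python
import numpy as np

def take_measurements(field, positions, t, sigma2, rng):
    """Collective measurement y_t: y[i](t) = z(s[i](t), t) + eps[i](t),
    with eps[i](t) i.i.d. N(0, sigma2)."""
    z_vals = np.array([field(s, t) for s in positions])
    return z_vals + rng.normal(0.0, np.sqrt(sigma2), size=len(positions))

# An assumed static field z(s, t) = z(s), sampled by N = 3 agents.
field = lambda s, t: np.sin(s[0]) * np.cos(s[1])
s_t = [np.array([0.0, 0.0]), np.array([1.0, 2.0]), np.array([2.0, 1.0])]
y_t = take_measurements(field, s_t, t=1, sigma2=0.01, rng=np.random.default_rng(0))
```

With σf² = 1, the choice σ² = 0.01 above corresponds to a signal-to-noise ratio γ = σf²/σ² = 100.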
The idea of Gaussian process regression is to place a GP prior directly on the space of functions without parameterizing the function z(·), i.e., P(z|θ) = N(µ, K), where µ ∈ Rn is the mean vector obtained by µi = µ(x[i]), and K := Cov(z, z|θ) ∈ Rn×n is the covariance matrix obtained by (K)ij = K(x[i], x[j]; θ). Notice that the GP model and all expressions are always conditional on the corresponding inputs. In the following, we will neglect the explicit conditioning on the input matrix x.

The inference in the Gaussian process model is as follows. First, we assume a joint GP prior P(z, z∗|θ) over functions, i.e.,

P(z, z_*|\theta) = \mathcal{N}\left( \begin{bmatrix} \mu \\ \mu(x_*) \end{bmatrix}, \begin{bmatrix} K & k_* \\ k_*^T & K(x_*, x_*; \theta) \end{bmatrix} \right), \quad (1.4)

where k∗ := Cov(z, z∗|θ) ∈ Rn is the covariance between z and z∗ obtained by (k∗)i = K(x[i], x∗; θ). Then, the joint posterior is obtained using Bayes' rule, i.e.,

P(z, z_*|\theta, y) = \frac{P(y|z)\, P(z, z_*|\theta)}{P(y|\theta)},

where we have used P(y|z, z∗) = P(y|z). Finally, the desired predictive distribution P(z∗|θ, y) is obtained by marginalizing out the latent variables z, i.e.,

P(z_*|\theta, y) = \int P(z, z_*|\theta, y)\, dz = \frac{1}{P(y|\theta)} \int P(y|z)\, P(z, z_*|\theta)\, dz. \quad (1.5)

Since we have the joint Gaussian prior given in (1.4) and y|z ∼ N(z, σ²I), the integral in (1.5) can be evaluated in closed form, and the predictive distribution turns out to be Gaussian, i.e.,

z_*|\theta, y \sim \mathcal{N}\left( \mu_{z_*|\theta,y},\ \sigma^2_{z_*|\theta,y} \right), \quad (1.6)

where

\mu_{z_*|\theta,y} = \mu(x_*) + k_*^T (K + \sigma^2 I)^{-1} (y - \mu), \quad (1.7)

and

\sigma^2_{z_*|\theta,y} = K(x_*, x_*; \theta) - k_*^T (K + \sigma^2 I)^{-1} k_*. \quad (1.8)

For notational simplicity, we define the covariance matrix of the noisy observations as C := Cov(y, y|θ) = K + σ²I.

Chapter 2

Gaussian process regression under localization uncertainty

Abstract: In this chapter, we formulate Gaussian process regression with observations under localization uncertainty due to resource-constrained mobile sensor networks.
In our formulation, the effects of observations, measurement noise, localization uncertainty, and prior distributions are all correctly incorporated into the posterior predictive statistics. The analytically intractable posterior predictive statistics are approximated by two techniques, viz., Monte Carlo sampling and Laplace's method. These approximation techniques have been carefully tailored to our problems, and their approximation error and complexity are analyzed. A simulation study demonstrates that the proposed approaches perform significantly better than approaches that do not properly account for localization uncertainty. Finally, we have applied the proposed approaches to experimentally collected real data from a dye concentration field over a section of a river and a temperature field of an outdoor swimming pool, in order to provide proof-of-concept tests and evaluate the proposed schemes in real situations. In both simulation and experimental results, the proposed methods outperform the quick-and-dirty solutions often used in practice.

2.1 Introduction

Recently, there has been a growing interest in wireless sensor networks due to advanced embedded network technology. Their applications include, but are not limited to, environment monitoring, building comfort control, traffic control, manufacturing and plant automation, and surveillance systems [17, 25]. Mobility in a sensor network can increase its sensing coverage both in space and time, as well as its robustness against uncertainties in environments. Mobile sensor networks have been increasingly exploited to collect spatio-temporal data from the environment [10, 13, 64]. Gaussian process regression (or Kriging in geostatistics) has been widely used to draw statistical inference from geostatistical and environmental data [14, 84]. Gaussian process modeling enables us to predict physical values, such as temperature or harmful algal bloom biomass, at any point and time with a predicted uncertainty level.
For example, near-optimal static sensor placements with a mutual information criterion in Gaussian processes were proposed in [56, 57]. A distributed Kriged Kalman filter for spatial estimation based on mobile sensor networks was developed in [13]. Centralized and distributed navigation strategies for mobile sensor networks to move in order to reduce prediction error variances at points of interest were developed in [118]. Localization in sensor networks is a fundamental problem in various applications, since the spatially collected data must be correctly fused to estimate the process of interest. However, obtaining precise localization of robotic networks under limited resources is very challenging [37, 38]. Among the localization schemes, the range-based approach [50, 78] provides higher precision than the range-free approach, which can be more cost-effective. The global positioning system (GPS) has become one of the major absolute positioning systems in robotics and mobile sensor networks. However, most affordable GPS units update their measurements slowly and have a resolution worse than one meter. A GPS is often augmented by an inertial navigation system (INS) for better resolution [102]. In practice, resource-constrained sensor network systems are prone to large localization uncertainty. Most previous works on Gaussian process regression for mobile sensor networks [14, 57, 113, 118] have assumed that the exact sampling positions are available, which is not practical in real situations. Therefore, motivated by the aforementioned issues, we consider the correct (Bayesian) integration of uncertainties in sampling positions and measurement noise for Gaussian process regression, along with its computation error and complexity analysis for sensor network applications. The overall picture of our work is similar to the one in [74], in which an extended Kalman filter (EKF) was used to incorporate robot localization uncertainty and field parameter uncertainty.
However, [74] relies on a parametric model, namely a radial basis function network model with an EKF, while our motivation is to use a more flexible non-parametric approach, viz., Gaussian process regression taking into account localization uncertainty in a Bayesian framework. Gaussian process regression in [29] integrates uncertainty in the test input position for multiple-step-ahead time series forecasting. In [29], however, uncertainty was not considered in the sampling positions of the training data (or observations). The effect of localization uncertainty is potentially significant. For example, Fig. 2.1 shows the effect of noisy sampling positions on the results of Gaussian process regression. Note that adding noise to the sampling positions considerably increases the empirical RMS error, as shown in the third row of Fig. 2.1. A Gaussian approximation to the intractable posterior predictive statistics obtained in [29] has been utilized for predictive control with Gaussian models [54, 73] and Gaussian process dynamic programming [18]. In general, the length of the training data is much longer

Figure 2.1: For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation. The prediction results of applying Gaussian process regression on the true and noisy sampling positions are shown. The first, second, and third rows correspond to the prediction, prediction error variance, and squared empirical error (between predicted and true fields) fields. The first column shows the result of applying Gaussian process regression on the true sampling positions.
Second and third columns show the results of applying Gaussian process regression on the noisy sampling positions with 0.1 and 0.4 noise covariance matrices, respectively. True and noisy sampling positions are shown as circles with agent indices in the second row.

than that of the test point in sensor network applications; therefore, our problem may involve a high-dimensional uncertainty vector for the sampling positions. Gaussian process prediction with localization uncertainty can be obtained as a posterior predictive distribution using Bayes' rule. The main difficulty is that this distribution has no closed-form analytic solution and has to be approximated either through Monte Carlo sampling [75] or through other approximation techniques such as variational inference [108]. As an important analytical approximation technique, Laplace's method has been known to be useful in many such situations [100, 101]. Different Laplace approximation techniques have been analyzed in terms of approximation error and computational complexity [68, 69, 100, 101]. The contribution of this chapter is as follows. First, we formulate Gaussian process regression with observations under localization uncertainty due to resource-constrained (possibly mobile) sensor networks. Next, approximations are obtained for the analytically intractable predictive mean and predictive variance using Monte Carlo sampling and Laplace's method. These approximation methods have been carefully tailored to our problems. In particular, a modified version of the moment generating function (MGF) approximation [101] has been developed as a part of Laplace's method to reduce the computational complexity. In addition, we have analyzed and compared the approximation error and complexity so that one can choose a tradeoff between the performance requirements and constrained resources for a particular sensor network application.
Another important contribution has been to provide proof-of-concept tests and evaluate the proposed schemes in real situations. We have applied the proposed approaches to experimentally collected real data from a dye concentration field over a section of a river and a temperature field of an outdoor swimming pool. The remainder of this chapter is organized as follows. In Section 2.2, we briefly review Gaussian process regression, Monte Carlo sampling, and Laplace's method. In Section 2.3, Gaussian process prediction in the presence of localization uncertainty is formulated for a proposed sampling scheme as a Bayesian inference problem. We first present the Monte Carlo estimators for the posterior predictive mean and variance in Section 2.4. In Section 2.5, using Laplace's method, we provide approximations for the posterior predictive statistics. Section 2.6 compares the computational cost and approximation accuracy of the proposed estimators based on Monte Carlo sampling and Laplace's method. Finally, the simulation and experimental results are provided in Sections 2.7 and 2.8, respectively.

2.2 Preliminaries

In this section, we review the spatio-temporal Gaussian process, Monte Carlo sampling, and Laplace's method, which will be used throughout the chapter.

2.2.1 Gaussian Process Regression

A Gaussian process defines a distribution over a space of functions and is completely specified by its mean function and covariance function. Let $x \in \mathcal{X} := \mathcal{S} \times \mathcal{T} \subset \mathbb{R}^b$ denote the index vector, where $x := [s^T\ t]^T$ contains the sampling position $s \in \mathcal{S} \subset \mathbb{R}^{\iota}$ and the sampling time $t \in \mathcal{T} \subset \mathbb{R}_{\geq 0}$. A Gaussian process $z(x) \in \mathbb{R}$ is formally defined as below.

Definition 2.2.1. A Gaussian process [84] is a collection of random variables, any finite number of which have a joint Gaussian distribution.

For notational simplicity, we consider a zero-mean Gaussian process¹ $z(x) \sim \mathcal{GP}(0, \mathcal{K}(x, x')) \in \mathbb{R}$.
For example, one may consider a covariance function defined as

$$\mathcal{K}(x, x') = \sigma_f^2 \exp\left( -\frac{\|x - x'\|^2}{2\sigma_0^2} \right), \qquad (2.1)$$

where $x, x' \in \mathbb{R}^b$.

¹A Gaussian process with a nonzero mean can be treated by a change of variables. Even without a change of variables, this is not a drastic limitation, since the mean of the posterior process is not confined to zero [84].

In general, the mean and the covariance functions of a Gaussian process can be estimated a priori by maximizing the likelihood function [113]. Suppose we have $N$ noise-corrupted observations without localization error, $\mathcal{D} = \{(x^{[i]}, y^{[i]}) \mid i = 1, \cdots, N\}$. Assume that $y^{[i]} = z^{[i]} + \epsilon^{[i]} \in \mathbb{R}$, where $\epsilon^{[i]}$ is an independent and identically distributed (i.i.d.) white Gaussian noise with variance $\sigma^2$. $x$ is defined by $x = \mathrm{col}(x^{[1]}, x^{[2]}, \cdots, x^{[N]}) \in \mathbb{R}^{bN}$. The collections of the realizations $z = [z^{[1]} \cdots z^{[N]}]^T \in \mathbb{R}^N$ and the observations $y = [y^{[1]} \cdots y^{[N]}]^T \in \mathbb{R}^N$ have the Gaussian distributions $z \sim \mathcal{N}(0, K(x))$ and $y \sim \mathcal{N}(0, K(x) + \sigma^2 I)$, where $K(x) \in \mathbb{R}^{N \times N}$ is the covariance matrix of $z$ obtained by $(K(x))_{ij} = \mathcal{K}(x^{[i]}, x^{[j]})$, and $I$ is the identity matrix of appropriate size. We can predict the value $z_*$ of the Gaussian process at a point $x_*$ as [84]

$$z_*|\mathcal{D} \sim \mathcal{N}\left(\mu_*(x), \sigma_*^2(x)\right). \qquad (2.2)$$

In (2.2), the predictive mean is

$$\mu_*(x) = \mathrm{E}(z_*|\mathcal{D}) = k(x)^T (K(x) + \sigma^2 I)^{-1} y, \qquad (2.3)$$

and the predictive variance is given by

$$\sigma_*^2(x) = \mathrm{Var}(z_*|\mathcal{D}) = \sigma_f^2 - k(x)^T (K(x) + \sigma^2 I)^{-1} k(x), \qquad (2.4)$$

where $k(x) \in \mathbb{R}^N$ is the covariance between $z$ and $z_*$ obtained by $k_j(x) = \mathcal{K}(x^{[j]}, x_*)$, and $\sigma_f^2 = \mathcal{K}(x_*, x_*) \in \mathbb{R}$ is the variance at $x_*$. (2.3) and (2.4) follow from the fact that

$$\mathrm{col}(z_*, y)\,|\,x_*, x \sim \mathcal{N}\left(0, \begin{bmatrix} \sigma_f^2 & k(x)^T \\ k(x) & K(x) + \sigma^2 I \end{bmatrix}\right). \qquad (2.5)$$

Note that the predictive mean in (2.3) and its prediction error variance in (2.4) require the inversion of the covariance matrix whose size depends on the number of observations $N$.
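As a concrete illustration, the predictive equations (2.3) and (2.4) take only a few lines to implement. The following sketch uses the covariance function (2.1); the data values and hyperparameters below are hypothetical, chosen only for the example.

```python
import numpy as np

def sq_exp_kernel(A, B, sigma_f=1.0, sigma_0=1.0):
    """Squared-exponential covariance (2.1) between row-stacked inputs A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma_f ** 2 * np.exp(-d2 / (2.0 * sigma_0 ** 2))

def gp_predict(x, y, x_star, sigma=0.1, sigma_f=1.0, sigma_0=1.0):
    """Predictive mean (2.3) and variance (2.4) at the test input x_star."""
    K = sq_exp_kernel(x, x, sigma_f, sigma_0)
    k = sq_exp_kernel(x, x_star[None, :], sigma_f, sigma_0)[:, 0]
    C = K + sigma ** 2 * np.eye(len(y))             # C = K + sigma^2 I
    mu = k @ np.linalg.solve(C, y)                  # Eq. (2.3)
    var = sigma_f ** 2 - k @ np.linalg.solve(C, k)  # Eq. (2.4)
    return mu, var

x = np.array([[0.0], [1.0], [2.0]])                 # hypothetical sampling positions
y = np.array([0.5, -0.2, 0.3])                      # hypothetical observations
mu, var = gp_predict(x, y, np.array([0.0]), sigma=1e-4)
```

With near-zero measurement noise the predictor interpolates the data, so the mean at a training input recovers its observation and the variance there collapses toward zero.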
Hence, a drawback of Gaussian process regression is that its computational complexity and memory requirement increase as more measurements are collected, making the method prohibitive for agents with limited memory and computing power. To overcome this increase in complexity, a number of approximation methods for Gaussian process regression have been proposed. In particular, the sparse greedy approximation method [95], the Nyström method [112], the informative vector machine [58], the likelihood approximation [93], and the Bayesian committee machine [104] have been shown to be effective for many problems. It has also been proposed that spatio-temporal Gaussian process regression can be applied to truncated observations, including only measurements near the target position and time of interest, for agents with limited resources [118]. To justify prediction based on only the most recent observations, a similar argument has been made in [6] in the sense that data from the remote past do not change the predictors significantly under exponentially decaying correlation functions. In this chapter, we also assume that at each iteration the mobile sensor network only needs to fuse a fixed number of truncated observations, which are near the target points of interest, to limit the computational resources.

2.2.2 Monte Carlo and Importance Sampling

In what follows, we briefly review Monte Carlo and importance sampling based on [2]. The idea of Monte Carlo simulation is to draw an i.i.d. set of random samples $\{x^{(i)}\}_{i=1}^r$ from a target density $\mathrm{P}(x)$, where $x \in \mathcal{X}^N = \mathcal{R}$. These $r$ random samples are used to approximate $\mathrm{P}(x)$ by an empirical point-mass function $\mathrm{P}_r(x) = \frac{1}{r}\sum_{i=1}^r \delta(x^{(i)} - x)$, where $\delta(\cdot)$ denotes the Dirac delta. Consequently, the integral $I(f)$ can be approximated by a tractable sum $I_r(f)$ that converges as follows:

$$I_r(f) = \frac{1}{r}\sum_{i=1}^r f(x^{(i)}) \overset{\text{a.s.}}{\underset{r\to\infty}{\longrightarrow}} I(f) = \int_{\mathcal{R}} f(x)\,\mathrm{P}(x)\,dx.$$

$I_r(f)$ is an unbiased estimator.
In addition, it almost surely converges to $I(f)$ asymptotically, which can be proved by the strong law of large numbers. Considering $f: \mathcal{R} \to \mathbb{R}$ for simplicity, if $f(\cdot)$ satisfies $\sigma_e^2 = \mathrm{E}(f^2(x)) - I^2(f) < \infty$, then the variance of the estimator $I_r(f)$ converges to $\frac{\sigma_e^2}{r}$ as $r$ increases. In particular, the central limit theorem provides convergence in distribution as follows: $\sqrt{r}\left(I_r(f) - I(f)\right) \underset{r\to\infty}{\Longrightarrow} \mathcal{N}(0, \sigma_e^2)$, where $\Rightarrow$ denotes convergence in distribution.

Importance sampling is a special case of Monte Carlo implementation, in which random samples are generated from an available distribution rather than from the distribution of interest. Let us introduce an arbitrary importance proposal distribution $\rho(x)$ whose support includes the support of $\mathrm{P}(x)$. We can then rewrite $I(f)$ as $I(f) = \int_{\mathcal{R}} f(x)\,\omega(x)\,\rho(x)\,dx$, where $\omega(x) = \frac{\mathrm{P}(x)}{\rho(x)}$ is known as the importance weight. Simulating $r$ i.i.d. samples $\{x^{(i)}\}_{i=1}^r$ according to $\rho(x)$ and evaluating $\omega(x^{(i)})$, a possible Monte Carlo estimate of $I(f)$ is given by $\hat{I}_r(f) = \sum_{i=1}^r f(x^{(i)})\,\omega(x^{(i)}) \big/ \sum_{i=1}^r \omega(x^{(i)})$. Under weak assumptions, the strong law of large numbers applies, yielding $\hat{I}_r(f) \overset{\text{a.s.}}{\underset{r\to\infty}{\longrightarrow}} I(f)$. This integration method can also be interpreted as a sampling method where the posterior density $\mathrm{P}(x)$ is approximated by $\hat{\mathrm{P}}_r(x) = \sum_{i=1}^r \omega(x^{(i)})\,\delta(x - x^{(i)}) \big/ \sum_{i=1}^r \omega(x^{(i)})$, and $\hat{I}_r(f)$ is the integration of $f(x)$ with respect to the empirical measure $\hat{\mathrm{P}}_r(x)$. In this chapter, we shall compute the ratio of two integrals in the form of

$$G(f) = \frac{\int_{\mathcal{R}} f(x)\,\omega(x)\,\rho(x)\,dx}{\int_{\mathcal{R}} \omega(x)\,\rho(x)\,dx}, \qquad (2.6)$$

where $\rho(x)$, $f(x)$, and $\omega(x)$ are defined on a space $\mathcal{R}$. To compute $G(f)$ in (2.6), we use the following approximation, as proposed in [28]:

$$\hat{G}_r(f) = \frac{\sum_{i=1}^r f(x^{(i)})\,\omega(x^{(i)})}{\sum_{i=1}^r \omega(x^{(i)})}, \qquad (2.7)$$

where $\{x^{(i)} \mid i = 1, \cdots, r\}$ is a sequence of i.i.d. random vectors drawn from the distribution $\rho(x)$.
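A minimal numeric sketch of the self-normalized estimator (2.7): the target here is a standard normal, the proposal is $\mathcal{N}(0, 2^2)$, and the weights are the unnormalized ratio $\mathrm{P}(x)/\rho(x)$; all of these choices are illustrative assumptions. The quantity estimated is $\mathrm{E}(x^2) = 1$ under the target.

```python
import numpy as np

rng = np.random.default_rng(0)

def snis(f, weight, proposal_sampler, r=200_000):
    """Self-normalized importance sampling, Eq. (2.7):
    sum_i f(x_i) w(x_i) / sum_i w(x_i), with x_i drawn i.i.d. from rho."""
    xs = proposal_sampler(r)
    w = weight(xs)
    return np.sum(f(xs) * w) / np.sum(w)

# Target P(x) ∝ exp(-x^2/2); proposal rho = N(0, 2^2) with density ∝ exp(-x^2/8).
weight = lambda x: np.exp(-x ** 2 / 2.0 + x ** 2 / 8.0)   # unnormalized P/rho
est = snis(lambda x: x ** 2, weight, lambda r: rng.normal(0.0, 2.0, r))
```

Note that the normalizing constants cancel in the ratio, which is why unnormalized weights suffice; this is the same property exploited later in this chapter, where $\mathrm{P}(y|x)$ plays the role of the weight.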
In the following theorem, which can be shown to be a special case of the results from [28], we state the convergence properties of the approximation in (2.7) as functions of $r$, the number of random samples.

Theorem 2.2.2. (Theorems 1 and 2 from [28]) Consider the approximation $\hat{G}_r(f)$ given in (2.7) to the ratio $G(f)$ given in (2.6). If $\omega(x)\rho(x)$ is proportional to a proper probability density function defined on $\mathcal{R}$, and $G(f)$, $\int_{\mathcal{R}} \omega(x)\rho(x)\,dx$, and $\int_{\mathcal{R}} f^2(x)\,\omega^2(x)\,\rho(x)\,dx$ are finite, we have

$$\hat{G}_r(f) \overset{\text{a.s.}}{\underset{r\to\infty}{\longrightarrow}} G(f), \qquad \text{and} \qquad \sqrt{r}\left(\hat{G}_r(f) - G(f)\right) \underset{r\to\infty}{\Longrightarrow} \mathcal{N}(0, \sigma_g^2),$$

where

$$\sigma_g^2 = \int_{\mathcal{R}} \left(f(x) - G(f)\right)^2 \omega^2(x)\,\rho(x)\,dx.$$

Proof. It is a straightforward application of the results from Theorems 1 and 2 in [28] to (2.6) and (2.7).

2.2.3 Laplace Approximations

The Laplace method is a family of asymptotic approximations of an integral of a function, i.e., $\int_{\mathcal{R}} f(x)\,dx$, where $x \in \mathcal{R} \subset \mathbb{R}^m$ and $m = N \times b$. Let the function $f(x)$ be of the form $f(x) = e^{-nh(x)}$, where $n \in \mathbb{Z}_{>0}$ and $h: \mathcal{R} \to \mathbb{R}$ is a twice continuously differentiable real function on $\mathcal{R}$. Let $\hat{x}$ denote the exact mode of $-h$, i.e.,

$$\hat{x} = \arg\max_{x \in \mathcal{R}}\ {-h(x)}.$$

Then Laplace's method produces the approximation [100]:

$$\int_{\mathcal{R}} f(x)\,dx = \left(\frac{2\pi}{n}\right)^{\frac{m}{2}} |\phi(\hat{x})|^{\frac{1}{2}}\, e^{-nh(\hat{x})} + O(n^{-1}), \qquad (2.8)$$

where $\phi(\hat{x}) = [\nabla^2 h(\hat{x})]^{-1}$. The Laplace approximation in (2.8) produces reasonable results as long as $-h$ is unimodal or at least dominated by a single mode. In practice, it might be difficult to find the exact mode of $-h$. The concept of an asymptotic mode is introduced to gauge the approximation error when the exact mode is not used [68].

Definition 2.2.3. $\breve{x}$ is called an asymptotic mode of order $O(n^{-k})$ for $-h$ if $\breve{x} - \hat{x} \to 0$ as $n \to \infty$ and $\nabla h(\breve{x}) = O(n^{-k})$.

Suppose that $\breve{x}$ is an asymptotic mode of order $O(n^{-1})$ for $-h$ and $\{h, \breve{x}\}$ satisfies the regularity conditions.
Then it follows that we have

$$\int_{\mathcal{R}} f(x)\,dx = \left(\frac{2\pi}{n}\right)^{\frac{m}{2}} |\phi(\breve{x})|^{\frac{1}{2}}\, e^{-nh(\breve{x})}\, C_h(\breve{x}) + O(n^{-1}), \qquad (2.9)$$

where $C_h(\breve{x})$ is given by

$$C_h(\breve{x}) = e^{\frac{n}{2}\,\nabla h(\breve{x})^T \phi(\breve{x})\, \nabla h(\breve{x})}.$$

A more precise form, with an asymptotic mode of order $O(n^{-2})$, is computed for an approximation of order $O(n^{-3})$ in [69]. In many Bayesian inference applications, as in our problem, we need to compute the ratio of two integrals. To this end, fully exponential Laplace approximations have been developed by [100] to compute Laplace approximations of the ratio of two integrals, i.e.,

$$M = \frac{\int_{\mathcal{R}} e^{-nu(x)}\,dx}{\int_{\mathcal{R}} e^{-nh(x)}\,dx}. \qquad (2.10)$$

If each of $-u$ and $-h$ has a dominant peak at its maximum, then Laplace's method may be applied directly to the numerator and denominator of (2.10) separately. If the regularity conditions are satisfied, using (2.8) for the denominator approximation and (2.9) for the numerator approximation, Miyata [68] obtained the following approximation and its error order:

$$\hat{M} = \frac{\left|\nabla^2 h(\hat{x})\right|^{\frac{1}{2}}\, e^{nh(\hat{x})}}{\left|\nabla^2 u(\breve{x})\right|^{\frac{1}{2}}\, e^{nu(\breve{x})}} \times C_u(\breve{x}), \qquad \hat{M} = M + O(n^{-2}), \qquad (2.11)$$

where $\hat{x}$ is the exact mode of $-h$, $\breve{x}$ is the asymptotic mode of $-u$, and

$$C_u(\breve{x}) = e^{\frac{n}{2}\,\nabla u(\breve{x})^T [\nabla^2 u(\breve{x})]^{-1}\, \nabla u(\breve{x})}, \qquad \breve{x} = \hat{x} - [\nabla^2 u(\hat{x})]^{-1}\, \nabla u(\hat{x}). \qquad (2.12)$$

Laplace approximations typically have an error of order $O(n^{-1})$, as shown in (2.8) and (2.9). On the other hand, fully exponential Laplace approximations for the ratio form yield an error of order $O(n^{-2})$, as shown in (2.11), since the error terms of order $O(n^{-1})$ in the numerator and denominator cancel each other [100]. The fully exponential Laplace approximation presented in (2.11) is limited to positive functions. Hence, Tierney et al. [101] proposed a second-order approximation to the expectation of a general function $g(x)$ (not necessarily positive) by applying the fully exponential method to approximate $M(\tau) = \mathrm{E}(e^{\tau g(x)})$ and then differentiating the approximated $M(\tau)$.
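As a quick numerical check of the basic Laplace approximation (2.8) in one dimension ($m = 1$): with the illustrative choice $h(x) = \cosh(x) - 1$, the mode of $-h$ is $\hat{x} = 0$ and $\nabla^2 h(0) = 1$, so the approximation is $\sqrt{2\pi/n}$, and its relative error shrinks like $O(n^{-1})$ (the leading correction here is $1/(8n)$).

```python
import numpy as np

def laplace_integral(h, hess, x_hat, n, m=1):
    """Laplace approximation (2.8) of the integral of exp(-n h(x)) about the
    mode x_hat: (2*pi/n)^{m/2} |phi|^{1/2} exp(-n h(x_hat)),
    with phi = [second derivative of h at x_hat]^{-1} (scalar case)."""
    phi = 1.0 / hess(x_hat)
    return (2.0 * np.pi / n) ** (m / 2) * np.sqrt(abs(phi)) * np.exp(-n * h(x_hat))

h = lambda x: np.cosh(x) - 1.0          # illustrative h with mode at 0
n = 50
approx = laplace_integral(h, lambda x: np.cosh(x), 0.0, n)

# Dense quadrature as ground truth (the integrand is negligible beyond |x| = 3).
xs = np.linspace(-3.0, 3.0, 200_001)
exact = np.exp(-n * h(xs)).sum() * (xs[1] - xs[0])
rel_err = abs(approx / exact - 1.0)
```

At $n = 50$ the relative error is a fraction of a percent, consistent with the $O(n^{-1})$ rate claimed for (2.8).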
Consider $M(\tau)$ defined as follows:

$$M(\tau) = \frac{\int_{\mathcal{R}} e^{\tau g(x)}\, e^{-nh(x)}\,dx}{\int_{\mathcal{R}} e^{-nh(x)}\,dx} = \frac{\int_{\mathcal{R}} e^{-nu(x)}\,dx}{\int_{\mathcal{R}} e^{-nh(x)}\,dx},$$

where $u(x) = -\frac{\tau}{n}g(x) + h(x)$; here $e^{\tau g(x)}$ is positive, while $g(x)$ may be positive or negative. In particular, $\frac{d}{d\tau}\hat{M}(\tau)$ evaluated at $\tau = 0$ yields a second-order approximation to $\mathrm{E}(g(x))$, with the following error order:

$$\mathrm{E}(g(x)) = \frac{d}{d\tau}M(\tau)\Big|_{\tau=0} = \frac{d}{d\tau}\hat{M}(\tau)\Big|_{\tau=0} + O(n^{-2}). \qquad (2.13)$$

This method, called the moment generating function (MGF) Laplace approximation, gives a standard-form approximation using the exact mode of $-h(x)$ [101]. Miyata [68, 69] extended the MGF method to one that does not require computing the exact mode of $-h(x)$. Let $\hat{x}$ be an asymptotic mode of order $O(n^{-1})$ for $-h(x)$, and suppose that $\{h, \hat{x}\}$ satisfies the regularity conditions for the asymptotic-mode Laplace method. By Theorem 5 in [68], the approximation of $\mathrm{E}(g(x))$ and its error order are given as

$$\hat{\mathrm{E}}(g(x)) = g(\hat{x}) + \frac{1}{2n}\,\mathrm{tr}\left(\nabla^2 g(\hat{x})\,\phi(\hat{x})\right) - \frac{1}{2n}\sum_{i,j,k=1}^{m} \frac{\partial^3 h(\hat{x})}{\partial x_i\,\partial x_j\,\partial x_k}\left(\nabla g(\hat{x})^T \phi_i\right)\phi_{jk} - \nabla g(\hat{x})^T \phi(\hat{x})\,\nabla h(\hat{x}), \qquad (2.14)$$

$$\hat{\mathrm{E}}(g(x)) = \mathrm{E}(g(x)) + O(n^{-2}),$$

where $\phi_{ij}$ denotes the $i$-th row, $j$-th column element, and $\phi_i$ the $i$-th column, of the matrix $\phi(\hat{x})$. Furthermore, if $\hat{x}$ is the exact mode of $-h(x)$, the approximation takes a simpler form, because the terms that include $\nabla h(\hat{x})$ vanish:

$$\hat{\mathrm{E}}(g(x)) = g(\hat{x}) + \frac{1}{2n}\,\mathrm{tr}\left(\nabla^2 g(\hat{x})\,\phi(\hat{x})\right) - \frac{1}{2n}\sum_{i,j,k=1}^{m} \frac{\partial^3 h(\hat{x})}{\partial x_i\,\partial x_j\,\partial x_k}\left(\nabla g(\hat{x})^T \phi_i\right)\phi_{jk}. \qquad (2.15)$$

2.3 The Problem Statement

In practice, $\mathcal{D}$ (data with perfect localization) is not available due to localization uncertainty; instead, the exact sampling points are replaced with noise-corrupted sampling points. To average out measurement and localization noises, in this chapter we propose a sampling scheme in which multiple measurements are taken repeatedly at a set of sampling points of a sensor network.
For robotic sensors or mobile sensor networks, this sampling strategy can be efficient and inexpensive, since the large energy consumption is usually due to the mobility of the sensor network. Let $N$ sensing agents be indexed by $\mathcal{J} = \{1, \cdots, N\}$. From the proposed sampling scheme, we assume that each agent takes multiple data pairs $\{(\bar{x}^{[i]}, y^{[i]}) \mid i \in \mathcal{I}\}$, indexed by $\mathcal{I} = \{1, \cdots, n\}$, at a set of sampling points of the sensor network $\{x^{[j]} \mid j \in \mathcal{J}\}$. We then define the map $\psi: \mathcal{I} \to \mathcal{J}$ that takes the index of a data pair in $\mathcal{I}$ as the input and returns the index of the sensor that produced the data pair as the output. Consider the following realizations using the sampling scheme and the notation just introduced:

$$\bar{x}^{[i]} = x^{(\psi(i))} + v^{[i]} \in \mathbb{R}^b, \qquad y^{[i]} = z^{(\psi(i))} + \epsilon^{[i]} \in \mathbb{R}, \qquad \forall i \in \mathcal{I},$$

where $\epsilon^{[i]}$ is an i.i.d. white Gaussian noise with zero mean and variance $\sigma^2$, i.e., $\epsilon^{[i]} \sim \mathcal{N}(0, \sigma^2)$, and $v^{[i]}$ is a localization error which has a multivariate normal distribution with zero mean and covariance matrix $\Sigma_v \in \mathbb{R}^{b \times b}$, i.e., $v^{[i]} \sim \mathcal{N}(0, \Sigma_v)$. For instance, the distribution of the localization error may result from the fusion of GPS and INS measurements [102], or from landmark observations and the robot's kinematics [24]. To simplify the notation, $\mathcal{D}$ is introduced to denote the data with measurement noise and localization error as follows:

$$\mathcal{D} = \left\{(\bar{x}^{[i]}, y^{[i]}) \mid i \in \mathcal{I}\right\}. \qquad (2.16)$$

We also define the collective sampling point vectors with and without localization uncertainty, and the cumulative localization noise vector, respectively, by

$$x = \mathrm{col}(x^{[1]}, x^{[2]}, \cdots, x^{[N]}) \in \mathbb{R}^{bN}, \qquad \bar{x} = \mathrm{col}(\bar{x}^{[1]}, \bar{x}^{[2]}, \cdots, \bar{x}^{[n]}) \in \mathbb{R}^{bn}, \qquad v = \mathrm{col}(v^{[1]}, v^{[2]}, \cdots, v^{[n]}) \in \mathbb{R}^{bn}. \qquad (2.17)$$

From the proposed sampling scheme, to average out the measurement and localization uncertainties, the number of measurements $n$ can be increased without increasing the number of sensors $N$, and consequently without increasing the dimension of $x \in \mathbb{R}^{bN}$.
Hence, this approach may be efficient for the resource-constrained (mobile) sensor network at the cost of taking more measurements. Using the collective sampling point vectors in (2.17), we have the following relationship:

$$\bar{x} = Lx + v, \qquad (2.18)$$

where $L = \bar{L} \otimes I_b \in \mathbb{R}^{bn \times bN}$, with $\bar{L} \in \mathbb{R}^{n \times N}$ and $\bar{L}_{ij} = 1$ if $\psi(i) = j$ and $\bar{L}_{ij} = 0$ otherwise. The conditional probabilities $\mathrm{P}(\bar{x}|x)$ and $\mathrm{P}(x|\bar{x})$ can be expressed as

$$\mathrm{P}(\bar{x}|x) = \frac{1}{|2\pi\bar{\Sigma}_v|^{\frac{1}{2}}}\, e^{-\frac{1}{2}(\bar{x}-Lx)^T \bar{\Sigma}_v^{-1}(\bar{x}-Lx)}, \qquad \mathrm{P}(x|\bar{x}) = \frac{\mathrm{P}(\bar{x}|x)\,\mathrm{P}(x)}{\int_{\mathcal{R}} \mathrm{P}(\bar{x}|x)\,\mathrm{P}(x)\,dx},$$

where $\bar{\Sigma}_v = I_n \otimes \Sigma_v \in \mathbb{R}^{bn \times bn}$ is the collective localization-noise covariance. From a Bayesian perspective, we can treat $x$ as a random vector and incorporate a prior distribution on it. For example, if we assign a prior distribution $x \sim \mathcal{N}(0, \Sigma_x)$, then we have

$$\mathrm{P}(x|\bar{x}) = \frac{1}{|2\pi\mathring{\Sigma}|^{\frac{1}{2}}}\, e^{-\frac{1}{2}(x - H\bar{x})^T \mathring{\Sigma}^{-1}(x - H\bar{x})}, \qquad (2.19)$$

where $H = \mathring{\Sigma}L^T\bar{\Sigma}_v^{-1}$ and $\mathring{\Sigma}^{-1} = \Sigma_x^{-1} + L^T\bar{\Sigma}_v^{-1}L$.

Evaluating posterior predictive statistics such as the density, the mean, and the variance is of critical importance for sensor network applications. Therefore, given the data $\mathcal{D}$ in (2.16), our goal is to compute the posterior predictive statistics. In particular, we focus on the following two quantities.

• The predictive mean estimator (PME) is given by

$$\mathrm{E}(z_*|\mathcal{D}) = \int_{\mathcal{Z}} z_*\,\mathrm{P}(z_*|\mathcal{D})\,dz_* = \frac{\int_{\mathcal{Z}}\int_{\mathcal{R}} z_*\,\mathrm{P}(z_*|x,y)\,\mathrm{P}(y|x)\,\mathrm{P}(x|\bar{x})\,dx\,dz_*}{\int_{\mathcal{R}} \mathrm{P}(y|x)\,\mathrm{P}(x|\bar{x})\,dx} = \frac{\int_{\mathcal{R}} \mu_*(x)\,\mathrm{P}(y|x)\,\mathrm{P}(x|\bar{x})\,dx}{\int_{\mathcal{R}} \mathrm{P}(y|x)\,\mathrm{P}(x|\bar{x})\,dx}, \qquad (2.20)$$

where $\mu_*(x)$ is given by (2.3).

• The predictive variance estimator (PVE) is obtained similarly. From the equation

$$\mathrm{E}(z_*^2|\mathcal{D}) = \frac{\int_{\mathcal{R}} \left(\int_{\mathcal{Z}} z_*^2\,\mathrm{P}(z_*|x,y)\,dz_*\right)\mathrm{P}(y|x)\,\mathrm{P}(x|\bar{x})\,dx}{\int_{\mathcal{R}} \mathrm{P}(y|x)\,\mathrm{P}(x|\bar{x})\,dx},$$

where $\int_{\mathcal{Z}} z_*^2\,\mathrm{P}(z_*|x,y)\,dz_* = \sigma_*^2(x) + \mu_*^2(x)$, we obtain $\mathrm{Var}(z_*|\mathcal{D})$ as

$$\mathrm{Var}(z_*|\mathcal{D}) = \frac{\int_{\mathcal{R}} \left(\sigma_*^2(x) + \mu_*^2(x)\right)\mathrm{P}(y|x)\,\mathrm{P}(x|\bar{x})\,dx}{\int_{\mathcal{R}} \mathrm{P}(y|x)\,\mathrm{P}(x|\bar{x})\,dx} - \mathrm{E}^2(z_*|\mathcal{D}), \qquad (2.21)$$

where $\sigma_*^2(x)$ is given by (2.4).

The main challenge in our problems is that there are no closed-form formulas for the posterior predictive statistics listed in (2.20) and (2.21).
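The structure of (2.19) and (2.20) can be made concrete with a small numeric sketch. Everything below is hypothetical (one-dimensional positions, $N = n = 2$, $\psi$ the identity so $L = I$): positions are drawn from the Gaussian posterior (2.19), each draw is weighted by the marginal likelihood $\mathrm{P}(y|x)$, and the weighted average of the predictive means (2.3) approximates the ratio of integrals in (2.20).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1-D setting with N = n = 2 and psi the identity, so L = I.
kern = lambda a, b: 2.0 * np.exp(-np.subtract.outer(a, b) ** 2 / 2.0)
y = np.array([1.0, -0.5]); sigma2 = 0.01        # observations, noise variance
x_bar = np.array([0.0, 2.0])                    # measured (noisy) positions
Sigma_x = 25.0 * np.eye(2)                      # prior covariance of x
Sigma_v = 0.1 * np.eye(2)                       # localization-noise covariance

# Position posterior (2.19): x | x_bar ~ N(H x_bar, Sigma_ring), with L = I.
Sigma_ring = np.linalg.inv(np.linalg.inv(Sigma_x) + np.linalg.inv(Sigma_v))
H = Sigma_ring @ np.linalg.inv(Sigma_v)
chol = np.linalg.cholesky(Sigma_ring)

def mu_star(x, x_star=1.0):
    """GP predictive mean (2.3) for the sampling-position vector x."""
    C = kern(x, x) + sigma2 * np.eye(2)
    return kern(np.atleast_1d(x_star), x)[0] @ np.linalg.solve(C, y)

def lik(x):
    """Marginal likelihood P(y|x) up to a constant: y ~ N(0, K(x) + sigma^2 I)."""
    C = kern(x, x) + sigma2 * np.eye(2)
    return np.exp(-0.5 * y @ np.linalg.solve(C, y)) / np.sqrt(np.linalg.det(C))

xs = H @ x_bar + (chol @ rng.standard_normal((2, 5000))).T  # draws from (2.19)
w = np.array([lik(x) for x in xs])
pme = np.sum(w * np.array([mu_star(x) for x in xs])) / np.sum(w)
```

This is exactly the self-normalized ratio form (2.6)–(2.7) with $\rho = \mathrm{P}(x|\bar{x})$, $\omega = \mathrm{P}(y|x)$, and $f = \mu_*$.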
Therefore, in this chapter, approximation techniques are carefully applied to obtain approximate solutions, and the tradeoffs between computational complexity and precision are investigated for sensor networks with limited resources. From (2.19), one might be tempted to use the best estimate of $x$, e.g., the conditional expectation $\mathrm{E}(x|\bar{x})$ of $x$ given the measured locations $\bar{x}$, directly in Gaussian process regression. Comparisons between this type of quick-and-dirty solution and the proposed Bayesian approaches are evaluated and discussed with simulated and experimental data in Sections 2.7 and 2.8, respectively.

2.4 The Monte Carlo Method

In this section, we propose importance sampling to compute the posterior predictive statistics. We also summarize the convergence results of the MC estimators based on Theorem 2.2.2.

2.4.1 Predictive mean

The predictive mean estimator is given by (2.20). Using importance sampling to approximate (2.20) leads to

$$\hat{\mathrm{E}}(z_*|\mathcal{D}) = \frac{\sum_{i=1}^{r} \mathrm{P}(y|x^{(i)})\,\mu_*(x^{(i)})}{\sum_{i=1}^{r} \mathrm{P}(y|x^{(i)})}. \qquad (2.22)$$

Theorem 2.2.2 and (2.22) lead to

$$\hat{\mathrm{E}}(z_*|\mathcal{D}) \overset{\text{a.s.}}{\underset{r\to\infty}{\longrightarrow}} \mathrm{E}(z_*|\mathcal{D}), \qquad \hat{\mathrm{E}}(z_*|\mathcal{D}) - \mathrm{E}(z_*|\mathcal{D}) \underset{r\to\infty}{\Longrightarrow} \mathcal{N}\!\left(0, \frac{\sigma_m^2}{r}\right),$$

where $\sigma_m^2 = \int_{\mathcal{R}} \left(\mu_*(x) - \mathrm{E}(z_*|\mathcal{D})\right)^2 \mathrm{P}^2(y|x)\,\mathrm{P}(x|\bar{x})\,dx$, $\mu_*(x)$ is calculated from (2.3), and $x^{(i)}$ is sampled from $\mathrm{P}(x|\bar{x})$ given by (2.19).

2.4.2 Predictive variance

For the prediction error variance given by (2.21), we have $\mathrm{Var}(z_*|\mathcal{D}) = \mathrm{E}(\sigma_*^2(x)\,|\,\mathcal{D}) + \mathrm{Var}(\mu_*(x)\,|\,\mathcal{D})$. Thus, we propose the following estimator:

$$\widehat{\mathrm{Var}}(z_*|\mathcal{D}) = \frac{\sum_{i=1}^{r} \mathrm{P}(y|x^{(i)})\,\sigma_*^2(x^{(i)})}{\sum_{i=1}^{r} \mathrm{P}(y|x^{(i)})} + \frac{\sum_{i=1}^{r} \mathrm{P}(y|x^{(i)})\left(\mu_*(x^{(i)}) - \hat{\mathrm{E}}(z_*|\mathcal{D})\right)^2}{\sum_{i=1}^{r} \mathrm{P}(y|x^{(i)})}, \qquad (2.23)$$

where $\sigma_*^2(x^{(i)}) = \mathrm{Var}(z_*|x^{(i)}, y)$ and $\mu_*(x^{(i)}) = \mathrm{E}(z_*|x^{(i)}, y)$ are obtained from formulas similar to (2.4) and (2.3), and $\hat{\mathrm{E}}(z_*|\mathcal{D})$ is given by (2.22). Applying Theorem 2.2.2 with $f(x) = \sigma_*^2(x) + (\mu_*(x) - \hat{\mathrm{E}}(z_*|\mathcal{D}))^2$, $\omega(x) = \mathrm{P}(y|x)$, and $\rho(x) = \mathrm{P}(x|\bar{x})$, we obtain the following results.
$$\widehat{\mathrm{Var}}(z_*|\mathcal{D}) \overset{\text{a.s.}}{\underset{r\to\infty}{\longrightarrow}} \overline{\mathrm{Var}}(z_*|\mathcal{D}),$$

where

$$\overline{\mathrm{Var}}(z_*|\mathcal{D}) = \mathrm{E}(z_*^2|\mathcal{D}) - 2\,\mathrm{E}(z_*|\mathcal{D})\,\hat{\mathrm{E}}(z_*|\mathcal{D}) + \hat{\mathrm{E}}^2(z_*|\mathcal{D}),$$

$$\widehat{\mathrm{Var}}(z_*|\mathcal{D}) - \overline{\mathrm{Var}}(z_*|\mathcal{D}) \underset{r\to\infty}{\Longrightarrow} \mathcal{N}\left(0, \sigma_z^2\, r^{-1}\right),$$

and

$$\sigma_z^2 = \int_{\mathcal{R}} \left(\sigma_*^2(x) + (\mu_*(x) - \hat{\mathrm{E}}(z_*|\mathcal{D}))^2 - \overline{\mathrm{Var}}(z_*|\mathcal{D})\right)^2 \mathrm{P}^2(y|x)\,\mathrm{P}(x|\bar{x})\,dx.$$

Remark 2.4.1. Since $\mathrm{E}(z_*|\mathcal{D})$ is not available in (2.23), $\hat{\mathrm{E}}(z_*|\mathcal{D})$ is used instead. Consequently, the convergence results for $\widehat{\mathrm{Var}}(z_*|\mathcal{D})$ are given with respect to $\overline{\mathrm{Var}}(z_*|\mathcal{D})$, which converges to $\mathrm{Var}(z_*|\mathcal{D})$ as $r \to \infty$ by definition.

2.5 Fully Exponential Laplace Approximations

In this section, we propose fully exponential Laplace approximations to compute the posterior predictive statistics. In the process of applying the Laplace approximations, we also obtain an estimate of the sampling points given $\mathcal{D}$ as a by-product. From the observations $\mathcal{D}$ given by (2.16), we can update the estimates of the sampling points $x$. To this end, we use the maximum a posteriori (MAP) estimate of $x$, given by

$$\hat{x}_{\mathrm{MAP}} = \arg\max_{x\in\mathcal{R}}\ \mathrm{P}(y|x)\,\mathrm{P}(x|\bar{x}). \qquad (2.24)$$

2.5.1 Predictive mean

The predictive mean estimator, given by (2.20), can be approximated using the MGF method (2.14). To compute $\mathrm{E}(z_*|\mathcal{D})$, let $g(x) = \mu_*(x)$. Using

$$h(x) = -\frac{1}{n}\ln\left(\mathrm{P}(y|x)\,\mathrm{P}(x|\bar{x})\right), \qquad (2.25)$$

the predictive mean estimate using the MGF method (MGF-PME), $\hat{\mathrm{E}}(z_*|\mathcal{D})$, and its error order are given as

$$\hat{\mathrm{E}}(z_*|\mathcal{D}) = \hat{\mathrm{E}}(g(x)), \qquad \hat{\mathrm{E}}(z_*|\mathcal{D}) = \mathrm{E}(z_*|\mathcal{D}) + O(n^{-2}), \qquad (2.26)$$

where $\hat{\mathrm{E}}(g(x))$ is given in (2.14) or (2.15). However, the MGF-PME given by (2.26) requires the computation of the third derivative of $h$, which increases the complexity of the algorithm. In this chapter, another MGF method is developed that avoids the third derivative of $h$. To approximate the derivative of $M(\cdot)$ at a point $\tau$ in (2.13), we utilize a three-point estimation, i.e., the slope of a nearby secant line through the points $(\tau - \zeta, M(\tau - \zeta))$ and $(\tau + \zeta, M(\tau + \zeta))$. Approximating the derivative in (2.13) with the three-point estimation, we avoid the third derivative in (2.14) or (2.15). We summarize our results in the following theorem.
Theorem 2.5.1. Let $\hat{x}$ be the exact mode of $-h(x)$. The three-point predictive mean estimator (TP-PME) and its order of the error are given by

$$\hat{\mathrm{E}}(z_*|\mathcal{D}) = \frac{n^{3/4}}{2}\left|\nabla^2 h(\hat{x})\right|^{\frac{1}{2}} e^{nh(\hat{x})} \times \left( \left|\nabla^2\mathring{b}(\hat{x}_{\mathring{b}})\right|^{-\frac{1}{2}} C_{\mathring{b}}(\hat{x}_{\mathring{b}})\, e^{-n\mathring{b}(\hat{x}_{\mathring{b}})} - \left|\nabla^2\mathring{p}(\hat{x}_{\mathring{p}})\right|^{-\frac{1}{2}} C_{\mathring{p}}(\hat{x}_{\mathring{p}})\, e^{-n\mathring{p}(\hat{x}_{\mathring{p}})} \right), \qquad (2.27)$$

$$\hat{\mathrm{E}}(z_*|\mathcal{D}) = \mathrm{E}(z_*|\mathcal{D}) + O(n^{-3/2}),$$

where $h(x)$ is given by (2.25), and we have used the following definitions:

$$\mathring{b}(x) = h(x) - n^{-7/4}\mu_*(x), \qquad \mathring{p}(x) = h(x) + n^{-7/4}\mu_*(x),$$
$$\hat{x}_{\mathring{b}} = \hat{x} - [\nabla^2\mathring{b}(\hat{x})]^{-1}\nabla\mathring{b}(\hat{x}), \qquad \hat{x}_{\mathring{p}} = \hat{x} - [\nabla^2\mathring{p}(\hat{x})]^{-1}\nabla\mathring{p}(\hat{x}),$$
$$C_{\mathring{b}}(\hat{x}_{\mathring{b}}) = e^{\frac{n}{2}\nabla\mathring{b}(\hat{x}_{\mathring{b}})^T[\nabla^2\mathring{b}(\hat{x}_{\mathring{b}})]^{-1}\nabla\mathring{b}(\hat{x}_{\mathring{b}})}, \qquad C_{\mathring{p}}(\hat{x}_{\mathring{p}}) = e^{\frac{n}{2}\nabla\mathring{p}(\hat{x}_{\mathring{p}})^T[\nabla^2\mathring{p}(\hat{x}_{\mathring{p}})]^{-1}\nabla\mathring{p}(\hat{x}_{\mathring{p}})}.$$

Proof. First, we use the three-point method to approximate the derivative:

$$\frac{d}{d\tau}M(\tau)\Big|_{\tau=0} = \frac{M(\zeta) - M(-\zeta)}{2\zeta} + O(\zeta^2).$$

Plugging $\hat{M}(\zeta) = M(\zeta) + O(n^{-2})$ into the above equation and using (2.13), we obtain

$$\mathrm{E}(z_*|\mathcal{D}) = \frac{\hat{M}(\zeta) - \hat{M}(-\zeta)}{2\zeta} + O(n^{-2})\,O(\zeta^{-1}) + O(\zeta^2). \qquad (2.28)$$

By selecting $\zeta = n^{-3/4}$, we recover the order of the error $O(n^{-2})\,O(\zeta^{-1}) + O(\zeta^2) = O(n^{-3/2})$. Computing the estimates $\hat{M}(\zeta)$ and $\hat{M}(-\zeta)$ in (2.28) using (2.11) yields (2.27).

Remark 2.5.2. The complexity of either (2.14) or (2.15) is $O(n^4)$, while the complexity of (2.27) is $O(n^3)$. In return, the error of (2.15) is of order $O(n^{-2})$, while the error of (2.27) is of order $O(n^{-3/2})$.

2.5.2 Predictive variance

We now apply Laplace's method to approximate the prediction error variance in a similar way. The prediction error variance is given by (2.21). In this case, $h(x)$ is given by (2.25) and $u(x) = -\frac{1}{n}\left(\ln(\sigma_*^2(x) + \mu_*^2(x)) + \ln(\mathrm{P}(y|x)\,\mathrm{P}(x|\bar{x}))\right)$. Applying (2.11) to this case, the approximation of $\mathrm{Var}(z_*|\mathcal{D})$ and its order of the error are given by

$$\widehat{\mathrm{Var}}(z_*|\mathcal{D}) = \frac{\left|\nabla^2 h(\hat{x})\right|^{\frac{1}{2}} e^{nh(\hat{x})}}{\left|\nabla^2 u(\breve{x})\right|^{\frac{1}{2}} e^{nu(\breve{x})}}\, C_u(\breve{x}) - \hat{\mathrm{E}}^2(z_*|\mathcal{D}), \qquad (2.29)$$

$$\widehat{\mathrm{Var}}(z_*|\mathcal{D}) = \mathrm{Var}(z_*|\mathcal{D}) + O(n^{-3/2}),$$

where $\hat{x}$ is the exact mode of $-h$ and $\breve{x}$ is the asymptotic mode of $-u$.
$C_u(\breve{x})$ and $\breve{x}$ are given by (2.12), and $\hat{\mathrm{E}}(z_*|\mathcal{D})$ is given by (2.27).

2.5.3 Simple Laplace approximations

To minimize the computational complexity, one may prefer a simpler approximation. In this chapter, we propose such a simple approximation at the cost of precision, which is summarized in the following theorem.

Theorem 2.5.3. Let $\hat{x}$ be an asymptotic mode of order $O(n^{-1})$ for $-h$ given by (2.25), and assume that $\{h, \hat{x}\}$ satisfies the regularity conditions. Consider the following approximations for $\mathrm{E}(z_*|\mathcal{D})$ and $\mathrm{Var}(z_*|\mathcal{D})$:

$$\hat{\mathrm{E}}(z_*|\mathcal{D}) = k^T(\hat{x})\,(K(\hat{x}) + \sigma^2 I)^{-1}\,y, \qquad (2.30)$$

$$\widehat{\mathrm{Var}}(z_*|\mathcal{D}) = \sigma_f^2 - k^T(\hat{x})\,(K(\hat{x}) + \sigma^2 I)^{-1}\,k(\hat{x}), \qquad (2.31)$$

where $K(\hat{x})$ and $k(\hat{x})$ are the covariance matrices as in (2.3), but evaluated at $\hat{x}$. We then have the following orders of errors:

$$\hat{\mathrm{E}}(z_*|\mathcal{D}) = \mathrm{E}(z_*|\mathcal{D}) + O(n^{-1}), \qquad \widehat{\mathrm{Var}}(z_*|\mathcal{D}) = \mathrm{Var}(z_*|\mathcal{D}) + O(n^{-1}).$$

Proof. The approximation for $\mathrm{E}(z_*|\mathcal{D})$ given by (2.30) is the first term of (2.14), neglecting the higher-order terms. The second and third terms in (2.14) contain a factor $n^{-1}$. Since $\hat{x}$ is an asymptotic mode of order $O(n^{-1})$ for $-h$, by Definition 2.2.3 the last term of (2.14), which contains $\nabla h(\hat{x})$, is $O(n^{-1})$. Hence, without these higher-order terms, we obtain a simpler approximation of order $O(n^{-1})$. The approximation for $\mathrm{Var}(z_*|\mathcal{D})$ given by (2.31) can be obtained by approximating (2.21) with the first term of the MGF approximation. Let $g(x) = \sigma_*^2(x) + \mu_*^2(x)$; then $\mathrm{Var}(z_*|\mathcal{D}) = \mathrm{E}(g(x)|\mathcal{D}) - \mathrm{E}^2(z_*|\mathcal{D})$. Using the first term of the MGF approximation, $\mathrm{E}(g(x)|\mathcal{D}) = g(\hat{x}) + O(n^{-1}) = \sigma_*^2(\hat{x}) + \mu_*^2(\hat{x}) + O(n^{-1})$, and using $\mathrm{E}^2(z_*|\mathcal{D}) = \mu_*^2(\hat{x}) + O(n^{-1})$, we obtain $\widehat{\mathrm{Var}}(z_*|\mathcal{D}) = \sigma_*^2(\hat{x}) + O(n^{-1})$. The procedure for showing the order of the approximation error is the same as that shown for the approximation in (2.30).

Remark 2.5.4.
Note that the simple Laplace predictive mean and variance estimators in (2.30) and (2.31) take the same forms as the original predictive mean and variance without the localization error, respectively given in (2.3) and (2.4), evaluated at the MAP estimator of $x$. As we previously mentioned, $\hat{x}$ is the mode of $-h$ given by (2.25) and is the MAP estimator of $x$, i.e., $\hat{x} = x_{MAP}$ as defined in (2.24). Therefore, the difference between the simple Laplace approximations and a quick-and-dirty solution in which the measured location vector $\bar{x}$ is used is that the simple Laplace approximations use $\hat{x}$ instead of $\bar{x}$.

In applying Laplace's method, using the one-step Newton gradient method to compute asymptotic modes, e.g., $\breve{x}$ required in (2.29) or $\hat{x}_{\mathring{b}}$ and $\hat{x}_{\mathring{p}}$ required in (2.27), may not satisfy the regularity conditions. In this case, one needs to continue the Newton gradient optimization until the regularity conditions are satisfied.

2.6 Error and Complexity Analysis

In this section, we analyze the order of the error and the computational complexity for the proposed approximation methods, which are summarized in Table 2.1. A tradeoff between the approximation error and complexity can be chosen taking into account the performance requirements and constrained resources for a specific sensor network application. For Laplace's method, the order of the error ranges from $O(n^{-1})$ to $O(n^{-2})$ at the cost of complexity from $O(n^3)$ to $O(n^4)$, as shown in Table 2.1.

For the Monte Carlo estimators, we introduce $O(\sigma_r)$, the probabilistic error order, which implies that the estimation error converges to $\mathcal{N}(0, \sigma_r^2)$ in distribution as the number of random samples $r$ increases. Therefore, it is not appropriate to compare the error bounds between Monte Carlo and Laplace's method exactly. Monte Carlo algorithms are used for multivariate integration of dimension $b \times N$.
The probabilistic error order of Monte Carlo algorithms that use $r$ sample evaluations is of order $r^{-1/2}$ for a given $n$. We may assume that the order of the error for Monte Carlo methods does not depend on the number of measurements $n$ for large $n$ [28]. With this assumption, the number of samples needed for Monte Carlo algorithms to reduce the initial probabilistic error by $n^{-2}$ is of order $r = n^4$. The complexity of the Monte Carlo methods, for the problems investigated in this chapter, is $O(n^3 \times r)$. To achieve the probabilistic error order $O(n^{-2})$, the complexity of Monte Carlo methods thus has to be $O(n^7)$. If we want to keep the error order at the level of $O(n^{-2})$ for both Laplace's and Monte Carlo methods, the Monte Carlo methods are considerably more expensive than Laplace's method.

Table 2.1: Error and complexity analysis

Estimator           | Method       | Error                     | Complexity
MC-PME in (2.22)    | Monte Carlo  | $O(\sigma_m/\sqrt{r})$    | $O(n^3 \times r)$
MGF-PME in (2.14)   | Laplace MGF  | $O(n^{-2})$               | $O(n^4)$
TP-PME in (2.27)    | Laplace MGF  | $O(n^{-3/2})$             | $O(n^3)$
S-PME in (2.30)     | Laplace MGF  | $O(n^{-1})$               | $O(n^3)$

2.7 Simulation Results

In this section, we provide simulation results to evaluate the performance of the different estimation methods. To this end, a realization of a Gaussian process that serves as the ground truth is shown in Fig. 2.2. The Gaussian process is generated with $\sigma_f = \sqrt{2}$ and $\sigma_0 = \sqrt{2}$ in (2.1). The measurement noise and the sampling position uncertainty variance are $\sigma = 0.01$ and $\Sigma_v = 0.1 \times I$, respectively. $N = 20$ and $n = 40$ imply that the robot takes measurements twice at each of the 20 sampling positions. In Figs. 2.2-2.7, the predicted fields and predicted error variance fields are shown with color bars. The results from Gaussian process regression using the noiseless positions $x$ and the noisy measurements $y$ are shown in Fig. 2.3.

Figure 2.2: A realization of the Gaussian process (ground truth).
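The structure of the Monte Carlo predictive mean estimator can be made concrete with a minimal sketch. This is an illustration only: the one-dimensional field, the squared-exponential kernel, and sampling the positions from $\mathcal{N}(\bar{x}, \Sigma_v)$ rather than from the full position posterior are all simplifying assumptions of ours, not the chapter's exact estimator (2.22).

```python
import numpy as np

rng = np.random.default_rng(1)

def gp_mean(x, y, x_star, sigma=0.1, sigma_f=np.sqrt(2), sigma_0=np.sqrt(2)):
    # plug-in GP predictive mean as in (2.3) for given sampling positions x
    d2 = (x[:, None] - x[None, :]) ** 2
    K = sigma_f**2 * np.exp(-d2 / (2 * sigma_0**2)) + sigma**2 * np.eye(len(x))
    k = sigma_f**2 * np.exp(-(x - x_star) ** 2 / (2 * sigma_0**2))
    return k @ np.linalg.solve(K, y)

x_bar = np.array([0.0, 1.0, 2.0, 3.0])   # measured (noisy) sampling positions
y = np.array([0.2, 0.9, 1.1, 0.4])       # field measurements
Sigma_v = 0.1                            # position-noise variance (scalar here)

# Monte Carlo predictive mean: average the plug-in prediction over r draws of
# the true positions; each draw perturbs x_bar by the position noise
r = 1000
draws = [gp_mean(x_bar + np.sqrt(Sigma_v) * rng.standard_normal(4), y, 1.5)
         for _ in range(r)]
mc_pme = np.mean(draws)
```

Each of the $r$ draws costs one $O(n^3)$ GP solve, which is where the $O(n^3 \times r)$ complexity in Table 2.1 comes from.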
To compare with typical quick-and-dirty solutions (QDS) for dealing with the noisy locations $\bar{x}$ in practice, we define two approaches: QDS 1 and QDS 2. QDS 1 applies the Gaussian process regression given by (2.3) and (2.4) while simply ignoring the noise in the locations, i.e., taking $\bar{x}$ as the true positions:
$$
\mu(\bar{x}) = k^T(\bar{x})\,(K(\bar{x}) + \sigma^2 I)^{-1}\, y. \quad (2.32)
$$
When the measurements are taken repeatedly as suggested in Section 2.3, QDS 1 can be improved. In this regard, QDS 2 uses the conditional expectation of the sampling positions $x$ given $\bar{x}$ as in (2.19) and the least squares solution of $z$ for given $y$, which are plugged into the Gaussian process regression:
$$
\mu(\bar{H}\bar{x}) = k^T(\bar{H}\bar{x})\,(K(\bar{H}\bar{x}) + \sigma^2 I)^{-1}\big[(L^T L)^{-1} L^T y\big], \quad (2.33)
$$
where $\bar{H}$ is from (2.19) and $L$ is from (2.18). QDS 2 can be an improved version of QDS 1 when there are many repeated measurements for a set of fixed sampling positions. Figs. 2.4 and 2.5 show the results of applying QDS 1 and QDS 2 on $\bar{x}$ and $y$, respectively.

Table 2.2: Simulation RMS error

Estimator & Method                       | Figure   | RMS error
Gaussian process regression in (2.3)     | Fig. 2.3 | 0.1281
QDS 1 in (2.32)                          | Fig. 2.4 | 0.9126
QDS 2 in (2.33)                          | Fig. 2.5 | 0.7374
Monte Carlo method in (2.22)             | Fig. 2.6 | 0.3403
Laplace's method in (2.27)               | Fig. 2.7 | 0.3320
Simple Laplace method in (2.30)          | Fig. 2.8 | 0.3503

The averaging in QDS 2 helps generate smoother predictions, and it shows an improvement with respect to QDS 1. The results of the Monte Carlo method with $r = 1000$ samples are shown in Fig. 2.6. The results of Laplace's method are shown in Fig. 2.7, and they look very similar to those of the Monte Carlo method in Fig. 2.6. Figs. 2.4-2.7 clearly show that both the Monte Carlo and Laplace's methods outperform QDS 1 and 2 with respect to the true field in Fig. 2.3.
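QDS 1 in (2.32) and the simple Laplace estimator (2.30) share the same plug-in form; they differ only in which position estimate ($\bar{x}$ versus $\hat{x}$) is substituted into the standard GP predictor. A minimal sketch of that plug-in computation, assuming a squared-exponential kernel; all function names, variable names, and numeric values below are our illustrative choices.

```python
import numpy as np

def sq_exp_kernel(A, B, sigma_f=1.0, sigma_0=1.0):
    # squared-exponential covariance in the spirit of (2.1)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma_f**2 * np.exp(-d2 / (2 * sigma_0**2))

def plugin_gp_predict(x_est, y, x_star, sigma=0.1, sigma_f=1.0, sigma_0=1.0):
    """Plug-in GP prediction of the form (2.30)-(2.31) / (2.32).

    x_est : (n, d) position estimates (x_bar for QDS 1, x_hat for simple Laplace),
    y : (n,) measurements, x_star : (d,) prediction location.
    """
    K = sq_exp_kernel(x_est, x_est, sigma_f, sigma_0) + sigma**2 * np.eye(len(y))
    k = sq_exp_kernel(x_est, x_star[None, :], sigma_f, sigma_0).ravel()
    mean = k @ np.linalg.solve(K, y)                 # as in (2.30)/(2.32)
    var = sigma_f**2 - k @ np.linalg.solve(K, k)     # as in (2.31)
    return mean, var

x_est = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.5, 1.0, 0.2])
mean, var = plugin_gp_predict(x_est, y, np.array([1.0]))
```

With the small noise level used here, prediction at a training input stays close to the corresponding measurement, and the predictive variance is nonnegative by construction.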
To numerically quantify the performance of each approach, we have computed the root mean square (RMS) error between the predicted and true fields for all methods, which is summarized in Table 2.2. This RMS error analysis is possible since we know the true realization of the Gaussian process exactly in this simulation study. As expected, Gaussian process regression with the true locations performs best. The proposed approaches, i.e., the Monte Carlo and Laplace's methods, outperform QDS 1 and 2 in terms of RMS error as well.

Figure 2.3: Gaussian process regression using the true positions $x$. a) The predictive mean estimation. b) The predictive variance estimation, and the true sampling positions $x$ in aquamarine crosses.

2.8 Experimental Results

2.8.1 Experiment with simulated noisy sampling positions

In this section, we apply the proposed prediction algorithms to a real experimental data set, provided by the authors of [49], to model the concentration of 7α, 12α, 24-trihydroxy-5α-cholan-3-one 24-sulfate (3kPZS), a synthesized sea lamprey (Petromyzon marinus) mating pheromone, in the Ocqueoc River, MI, USA. The sea lamprey is an ecologically damaging invasive fish of the Laurentian Great Lakes [94], and a sea lamprey management program has been established [11]. A recent study by Johnson et al. [49] showed that synthesized 3kPZS, a component of the male mating pheromone, when released into a stream to reach concentrations of $10^{-14}$-$10^{-10}$ M (molar, or mol/L), triggers robust upstream movement in ovulated females, drawing ≈ 50% into baited traps.

Figure 2.4: The results of the QDS 1 approximations using $\bar{x}$. a) The predictive mean estimation. b) The predictive variance estimation, and $\bar{x}$ shown with aquamarine crosses.
The ability to predict the 3kPZS concentration at any location and time with a predicted uncertainty level would allow for fine-scale evaluations of hypothesized sea lamprey chemo-orientation mechanisms such as odor-conditioned rheotaxis [49]. Here, 3kPZS was added to the stream to produce a pulsing excitation to sea lampreys by applying 3kPZS at two-minute intervals [48]. To describe the 3kPZS concentration throughout the experimental stream, rhodamine dye (Turner Designs, Rhodamine WT, Sunnyvale, CA, USA) was applied at the pheromone release point (or source location) to reach a final in-stream concentration of 1.0 µg/L (used to measure the concentration of the 3kPZS). The same pulsing strategy was used when 3kPZS was applied. The dye and 3kPZS pumping systems are shown in Fig. 2.9-(a). An example of the highly visible dye plume near the source location is shown in Fig. 2.9-(c). To quantify dye concentrations in the stream, water samples were collected along transects occurring every 5 m from the sea lamprey release cage (0 m) to the application location (73 m upstream of the release cage), as shown in Fig. 2.9-(b). Three sampling positions were fixed along each transect at 1/4, 1/2 and 3/4 of the channel width from the left bank. Water was collected from the three sampling sites along a transect simultaneously (three samplers) every 10 s over a 2-minute interval.

Figure 2.5: The results of the QDS 2 approximations using $E(x|\bar{x})$. a) The predictive mean estimation. b) The predictive variance estimation, and $E(x|\bar{x})$ shown with aquamarine crosses.

Figure 2.6: The results of the Monte Carlo approximations with 1000 samples using $\bar{x}$. a) The predictive mean estimation. b) The predictive variance estimation, and $\bar{x}$ shown with aquamarine crosses.

Further, a series of samples over 2 min was collected
exactly 1 meter downstream of the dye application location. Water samples were collected in 5 ml glass vials; subsequently, the fluorescence intensity of each sample, measured at 556 nm, was determined in a luminescence spectrometer (Perkin Elmer LS 55, Downers Grove, IL, USA), and the rhodamine concentration was estimated using a standard curve ($R^2 = 0.9998$). The objective here is to predict the spatio-temporal field of the dye concentration. In fact, the sampling positions are exactly known from the experiment. Therefore, we intentionally inject noise into the sampling positions to evaluate the prediction algorithms proposed in this chapter. Before applying the proposed algorithms, the hyperparameters such as $\sigma_f$ and $\sigma_0$ were identified from the experimental data by a maximum a posteriori probability (MAP) estimator as described in [113]. On the other hand, the value for $\sigma$ was set to $\sigma = 5\times10^{-3}$ µg/L according to the noise level from the data sheet of the sensor.

Figure 2.7: The results of the Laplace approximations using $\bar{x}$. a) The predictive mean estimation. b) The predictive variance estimation, and $\bar{x}$ shown with aquamarine crosses.

We consider an anisotropic covariance function [113] to deal with the case in which the correlations along different directions in $x = [x_1, x_2, x_3]^T \in \mathbb{R}^3$ are different. Next, we performed a change of variables such that, in the transformed space and time, the covariance function given in (2.1) could be used for the proposed approaches. For the case of this experiment, the spatial correlation along the river flow direction is different from the correlation perpendicular to the river flow direction. We consider the

Figure 2.8: The results of the simple Laplace approximations using $\bar{x}$. a) The predictive mean estimation.
b) The predictive variance estimation, and $\hat{x}$ shown with aquamarine crosses. $\hat{x}$ is the estimate of the true sampling positions, computed as a by-product of both the fully exponential Laplace and simple Laplace approximations.

following covariance function in the units of the experimental data:
$$
K(x, x') = \sigma_f^2 \exp\left(-\sum_{j=1}^{2} \frac{(x_j - x'_j)^2}{2\sigma_j^2}\right). \quad (2.34)
$$
Using this covariance function in (2.34) and computing the likelihood function with the true value of $x$, the hyperparameter vector $\theta := [\sigma_f, \sigma_1, \sigma_2]$ can be computed by the MAP estimator as follows:
$$
\theta_{MAP} = \arg\max_{\theta \in \Theta} P(y|\theta)\, P(\theta). \quad (2.35)
$$

Figure 2.9: (a) The dye pumping system. (b) An example of normal dye concentration applied to the stream for the pheromone distribution estimation. (c) An example of the highly visible dye plume near the source location.

Using the optimization in (2.35), we obtained $\sigma_f = \sqrt{0.3}$ µg/L, $\sigma_1 = 1.6$ m, and $\sigma_2 = 10.7$ m. Here $x_1$ and $x_2$ are the coordinates of the vertical and horizontal (flow direction) axes of Fig. 2.10. As we expect, we have $\sigma_1 < \sigma_2$, i.e., the field is more correlated along the river flow direction than perpendicular to it. Next, after finding the MAP estimates of the hyperparameters $\sigma_1$ and $\sigma_2$, we change the scale of each component of $x$ such that we have the same values of the hyperparameters in the transformed coordinates. The new coordinates are given by $\{x_i^{new} = \frac{\sigma_0}{\sigma_i} x_i\}$, and we recover
$$
K(x, x') = \sigma_f^2 \exp\left(-\sum_{i=1}^{b} \frac{\left(\frac{\sigma_0}{\sigma_i} x_i - \frac{\sigma_0}{\sigma_i} x'_i\right)^2}{2\sigma_0^2}\right),
$$
as in (2.1). Note that scaling $x_i$ with $\frac{\sigma_0}{\sigma_i}$ also transforms the covariance matrix $\Sigma_v$ into the normalized space.

Table 2.3: Experiment parameters

Description                       | Parameter              | Value
Number of sensors                 | $N$                    | 15
Number of measurements            | $n$                    | 15
The variability at a fixed point  | $\sigma_f$             | $\sqrt{0.3}$
Bandwidth (vertical)              | $\sigma_1$             | 1.6
Bandwidth (horizontal)            | $\sigma_2$             | 10.7
Noise covariance matrix           | $\Sigma_v$             | $\mathrm{diag}(4,\ 0.089)$
Measurement noise level           | $\sigma$               | 0.005

Table 2.3 shows the parameter values which were estimated and used for the experimental data. Fig.
2.10 shows the predicted field obtained by applying Gaussian process regression to the true positions $x$. To illustrate the advantage of the proposed methods over the QDS approach in dealing with noisy localization, we show the results of applying the QDS 1, Monte Carlo, Laplace, and simple Laplace approximations in Figs. 2.11, 2.12, 2.13, and 2.14, respectively. Since there is only one measurement per position, QDS 2 is the same as QDS 1. As can be seen in these figures, the peaks of the fields predicted by the Monte Carlo and Laplace's methods with noisy $\bar{x}$ are closer to the peak predicted with the true $x$ than that of the QDS approach.

Figure 2.10: The results of Gaussian process regression using the true $x$. a) The posterior mean estimation. b) The posterior variance estimation, and $x$ shown with aquamarine crosses.

Figure 2.11: The results of the QDS 1 approach using the noisy positions $\bar{x}$. a) The predictive mean estimation. b) The predictive variance estimation, and $\bar{x}$ shown with aquamarine crosses.

Figure 2.12: The results of the Monte Carlo method with 1000 samples using the noisy positions $\bar{x}$. a) The predictive mean estimation. b) The predictive variance estimation, and $\bar{x}$ shown with aquamarine crosses.

Figure 2.13: The results of the Laplace method using the noisy positions $\bar{x}$. a) The predictive mean estimation. b) The predictive variance estimation, and $\bar{x}$ shown with aquamarine crosses.

Indeed, QDS 1 has failed in Fig.
2.11 to produce a good estimation of the true field by neglecting the uncertainty in the positions, while the Monte Carlo, Laplace, and simple Laplace methods proposed in this chapter produce good estimations in Figs. 2.12, 2.13, and 2.14, respectively. Gaussian regression using the true $x$ is our best estimate of the true field (Fig. 2.10). Table 2.4 shows the RMS errors of the different estimation approaches using the noisy positions (i.e., $\bar{x}$) with respect to Gaussian regression using the true $x$. As can be seen in Table 2.4, the proposed approaches outperform QDS 1 in terms of the RMS error.

Figure 2.14: The results of the simple Laplace method using the noisy positions $\bar{x}$. a) The predictive mean estimation. b) The predictive variance estimation, and $\hat{x}$ shown with aquamarine crosses.

Table 2.4: Experimental RMS error w.r.t. the Gaussian estimation

Estimator & Method                | Figure    | RMS error
QDS 1 in (2.32)                   | Fig. 2.11 | 0.3955
Monte Carlo method in (2.22)      | Fig. 2.12 | 0.1176
Laplace's method in (2.27)        | Fig. 2.13 | 0.1305
Simple Laplace method in (2.30)   | Fig. 2.14 | 0.1252

2.8.2 Experiment with noisy sampling positions

In Section 2.8.1, we used the true and simulated noisy sampling positions to compare the performance of the proposed approaches. In this section, we present another set of experimental results under real localization errors. The experimental results were obtained using a single aquatic surface robot in an outdoor swimming pool, as shown in Fig. 2.15. The robot is capable of monitoring the water temperature in an autonomous manner while also being remotely supervised by a central station. To measure the location of the robot, we used an Xsens MTi-G sensory unit [1] (shown in the center of Fig. 2.15-(a)), which consists of an accelerometer, a gyroscope, a magnetometer, and a Global Positioning System (GPS) unit.
A Kalman filter implemented inside this sensory unit by the manufacturer produces the localization of the robot with an accuracy of about one meter. More information about this experiment can be found in [39]. In this experiment, we selected and identified the system parameters based on a priori knowledge about the process and the sensor noise characteristics. Therefore, the model hyperparameters $\sigma_0 = 2$, $\Sigma_v = I$, $\sigma = 0.005$, and $\sigma_f^2 = 0.3$ are known. The numbers of sampling positions and sampled measurements are $N = n = 10$. To validate our approaches, we controlled the temperature field by turning a hot water pump on and off. The hot water outlet locations are shown in Fig. 2.15-(b). We turned on the hot water pump for a while. After the pump was turned off and 6 minutes had passed, the robot collected 10 measurements at 10-second intervals. The estimated temperature and its error variance fields obtained by applying the QDS 1, Monte Carlo, fully exponential Laplace, and simple Laplace methods are shown in Figs. 2.16, 2.17, 2.18, and 2.19, respectively. For each method, the estimated temperature and its error variance fields are shown in subfigures (a) and (b), respectively.

Figure 2.15: a) The developed robotic sensor. b) The experimental environment: a 12 × 6 meter outdoor swimming pool.

As can be seen in Figs. 2.17, 2.18, and 2.19, the results from the Monte Carlo, fully exponential Laplace, and simple Laplace methods match well.

Figure 2.16: The results of the QDS 1 approach using the noisy positions $\bar{x}$. a) The predictive mean estimation. b) The predictive variance estimation, and $\bar{x}$ shown with aquamarine crosses.
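The change of variables used in Section 2.8.1, $x_i^{new} = (\sigma_0/\sigma_i)\, x_i$, which reduces the anisotropic covariance (2.34) to the isotropic form (2.1), can be verified numerically. A sketch using the identified bandwidths from Table 2.3; the choice $\sigma_0 = 1$ and the test points are our illustrative assumptions.

```python
import numpy as np

sigma_f, sigma_1, sigma_2, sigma_0 = np.sqrt(0.3), 1.6, 10.7, 1.0

def k_aniso(x, xp):
    # anisotropic covariance (2.34) with per-axis bandwidths sigma_1, sigma_2
    return sigma_f**2 * np.exp(-((x[0] - xp[0])**2 / (2 * sigma_1**2)
                                 + (x[1] - xp[1])**2 / (2 * sigma_2**2)))

def k_iso(x, xp):
    # isotropic covariance as in (2.1) with common bandwidth sigma_0
    return sigma_f**2 * np.exp(-((x - xp) ** 2).sum() / (2 * sigma_0**2))

def rescale(x):
    # change of variables x_i_new = (sigma_0 / sigma_i) * x_i
    return np.array([sigma_0 / sigma_1 * x[0], sigma_0 / sigma_2 * x[1]])

x, xp = np.array([0.8, 12.0]), np.array([2.1, 30.0])
# k_aniso(x, xp) equals k_iso(rescale(x), rescale(xp)) identically
```

The identity holds term by term in the exponent, since $((\sigma_0/\sigma_i)(x_i - x'_i))^2 / (2\sigma_0^2) = (x_i - x'_i)^2 / (2\sigma_i^2)$.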
2.9 Conclusion

We have formulated Gaussian process regression with observations under localization uncertainty due to (possibly mobile) sensor networks with limited resources. The effects of measurement noise, localization uncertainty, and prior distributions have all been correctly incorporated into the posterior predictive statistics in a Bayesian approach. We have reviewed Monte Carlo sampling and Laplace's method, which have been applied to compute the analytically intractable posterior predictive statistics of Gaussian processes with localization uncertainty. The approximation error and complexity of all the proposed approaches have been analyzed. In particular, we have provided tradeoffs between the error and complexity of Laplace approximations of different degrees such that one can choose a tradeoff taking into account the performance requirements and the computational complexity due to the resource-constrained sensor network. The simulation study demonstrated that the proposed approaches perform much better than approaches that do not properly consider the localization uncertainty. Finally, we applied the proposed approaches to the experimentally collected real data to provide proof-of-concept tests and an evaluation of the proposed schemes in practice. In both the simulation and experimental results, the proposed methods outperformed the quick-and-dirty solutions often used in practice.

Figure 2.17: The results of the Monte Carlo method with 1000 samples using the noisy positions $\bar{x}$. a) The predictive mean estimation. b) The predictive variance estimation, and $\bar{x}$ shown with aquamarine crosses.

Figure 2.18: The results of the Laplace approximations using $\bar{x}$. a) The predictive mean estimation. b) The predictive variance estimation, and $\bar{x}$ shown with aquamarine crosses.

Figure 2.19: The results of the simple Laplace method using the noisy positions $\bar{x}$. a) The predictive mean estimation. b) The predictive variance estimation, and $\hat{x}$ shown with aquamarine crosses.

Chapter 3

Position estimation using Gaussian process regression

Abstract: This chapter presents a position estimation method for a robot using an omnidirectional camera. We present an approach to build a map from optimally selected visual features using Gaussian process (GP) regression. Optimal feature selection for regression is challenging in general. In this chapter, we apply a cross-validation technique to select features in order to improve the quality of the position estimation. The position of the robot is estimated by maximum likelihood estimation (MLE). The collection of selected features over a surveillance region is modeled by a multivariate GP with unknown hyperparameters. The hyperparameters are identified through the learning process by an MLE and are then used for prediction in an empirical Bayes approach. The excellent results of the proposed algorithm are illustrated by an experimental study under two different real-world scenarios, with and without uncertainty in the sampling positions.

3.1 Introduction

Localization of a mobile robot relative to its environment using vision information has received extensive attention over the past few decades from the robotics and computer vision communities [5, 20]. Vision-based robot positioning involves two steps. The first step involves learning some properties of the vision data (features) with respect to the spatial position where the observation is made, the so-called mapping.
The second step is to find the best match for the new spatial position corresponding to the newly observed features, the so-called matching. The mapping from these visual features to the domain of the associated spatial position is highly nonlinear and sensitive to the type of selected features. In most cases, it is very difficult to derive the map analytically. The features shall vary as much as possible over the spatial domain while varying as little as possible, for a given position, over the disturbances. For example, they should be insensitive to changes in illumination and partial obstruction. Motivated by the aforementioned situations, we consider the problem of selecting features from the original feature set in order to improve the localization performance of a robot. The central assumption when using a feature selection technique is that the original feature set contains many redundant or irrelevant features. To facilitate further discussion, let us consider a configuration where the input vector $X$ is the robot position and the output feature vector $Y$ is the collection of extracted features from the vision data. We first build a feature map $F$ at a robot location $X$ such that $F(X) = Y$. In order to reduce the position estimation error, the ideal subset is defined as follows:
$$
\hat{Y}_{opt} = \arg\min_{\hat{Y}} \big\| X - F^{-1}(\hat{Y}) \big\|^2,
$$
where $\hat{Y}$ is a vector that consists of the selected entries of the original vector $Y$. However, with a high cardinality of the original feature set, the optimal solution relies on combinatorial optimization, which is not feasible. On the other hand, using mutual information criteria, $F$ and $F^{-1}$ could be chosen as follows:
$$
F(X) = \arg\max_{Y} I(X, Y), \qquad F^{-1}(Y) = \arg\max_{X} I(X, Y),
$$
where
$$
I(X, Y) = \sum P(X, Y) \log \frac{P(X, Y)}{P(X)\, P(Y)}
$$
is the mutual information of $X$ and $Y$. Note that in the case where $P(X)$ and $P(Y)$ are constant, $F(X)$ is equivalent to the maximized log-likelihood function. Guo et al.
[33] show that by using mutual information one can achieve a recognition rate higher than 90% while using just 0.61% of the feature space for a classification problem. The only issue with mutual information is its computational complexity; to avoid it, one has to avoid higher-order dependencies among variables [81]. In order to make fast and precise estimation, most of the existing localization algorithms extract a small set of important features from the robotic sensor measurements. The features used in different approaches for robotic localization range over C.1: artificial markers such as color tags [45] and barcodes (which need to be installed) [92]; C.2: geometric features such as straight wall segments and corners [46]; and C.3: natural features such as light and color histograms [4]. Most of the landmark-based localization algorithms are classified in C.1 and C.2. It is shown in [87] that autonomous navigation is possible for outdoor environments with the use of a single camera and natural landmarks. In a similar attempt, [22] addressed the challenging problem of indoor place recognition from wearable video recording devices. The localization methods which rely on artificial markers (or static landmarks) have three disadvantages: lack of flexibility, lack of optimality, and lack of autonomy. A method is described in [99] that enables robots to learn landmarks for localization; artificial neural networks are used for extracting the landmarks. However, localization methods which rely on dynamic landmarks [99] have disadvantages such as lack of stability. Furthermore, there are reasons to avoid the geometric model as well, even when a geometric model does exist.
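The mutual information criterion $I(X, Y)$ introduced above can be evaluated directly from a discretized joint distribution; a small self-contained sketch (the toy joint tables below are our assumptions, not data from the chapter).

```python
import numpy as np

def mutual_information(joint):
    # I(X, Y) = sum over the table of P(x, y) * log( P(x, y) / (P(x) P(y)) )
    joint = joint / joint.sum()                 # normalize to a probability table
    px = joint.sum(axis=1, keepdims=True)       # marginal P(x)
    py = joint.sum(axis=0, keepdims=True)       # marginal P(y)
    mask = joint > 0                            # 0 * log 0 is treated as 0
    return float((joint[mask] * np.log(joint[mask] / (px @ py)[mask])).sum())

# a feature perfectly tied to the position carries maximal information,
# while an unrelated feature carries none
dependent = np.eye(4)         # X determines Y: I = log(4)
independent = np.ones((4, 4)) # X and Y unrelated: I = 0
```

A feature selection scheme of the kind sketched in the text would rank candidate features by such scores and keep the highest-scoring subset.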
Such cases may include: 1) the difficulty of reliably extracting sparse, stable features using geometric models, 2) the ability to use all sensory data directly rather than a relatively small amount of abstracted discrete information obtained from feature extraction algorithms, and 3) the high computational and storage costs of dealing with dense geometric features. In contrast to the localization problem with artificial markers or popular geometric models, there is a growing number of practical scenarios in which global statistical information is used instead. Some works illustrate localization using various spatially distributed (continuous) signals, such as distributed wireless Ethernet signal strength [27] or multidimensional magnetic fields [105]. In [110], a neural network is used to learn the implicit relationship between the pose displacements of a 6-DOF robot and the observed variations in global descriptors of the image, such as geometric moments and Fourier descriptors. In similar studies, gradient orientation histograms [55] and low-dimensional representations of the vision data [103] are used to localize mobile robots. In [80], an algorithm is developed for navigating a mobile robot using a visual potential, computed from the image appearance sequence captured by a camera mounted on the robot. A method for recognizing scene categories by comparing histograms of local features is presented in [59]. Without explicit object models, by using global cues as indirect evidence about the presence of an object, the authors consistently achieve an improvement over an orderless image representation [59]. The recent research efforts that are closely related to our problem are summarized as follows. The location of a set of image features from new observations is inferred by comparing the new features with the calculated map [7, 66, 67]. In [109], a neural network is used to learn the mapping between image features and robot movements.
Similarly, there exists an effort to automatically find the transformation that maximizes the mutual information between two random variables [107]. Using Gaussian process (GP) regression, the authors of [7, 91] present effective approaches to build a map from a sparse set of noisy observations taken from known locations using an omnidirectional camera. While the selection of visual features for such applications determines the ultimate performance of the algorithms, this topic has not been investigated to date. Therefore, building on Brook's approach [7], our work expands on the feature extraction and selection in order to improve the quality of localization. A Bayesian point of view is taken to make the map using a GP framework. The contributions of the chapter are as follows. This chapter provides a position estimation method for a robot using an omnidirectional camera. We present an approach to build a map from optimally selected visual features using GP regression. First, we describe how we can extract robust properties from vision data captured by an omnidirectional camera (Section 3.2). In particular, we describe how the fast Fourier transform (FFT) is applied to the panoramic image to calculate a set of image properties. We then transform the high-dimensional vision data to a set of uncorrelated feature candidates. A multivariate GP regression with unknown hyperparameters is formulated to connect the set of selected features to their corresponding sampling positions (Section 3.3). An empirical Bayes method using a point estimate is used to predict the feature map. Next, a feature reduction approach is developed using a cross-validation method such that an optimal subset of the features is selected to minimize the root mean square error (RMSE) (Section 3.4). The effectiveness of the proposed algorithms is illustrated by excellent experimental results under two different setups (Section 3.5).
Additionally, some comparison results are presented under different image features and feature selection methods.

Figure 3.1: (a) and (b) show the wrapped omnidirectional image and the unwrapped panoramic image, respectively. (c) and (d) show the reduced-size gray-scale image and the two-dimensional FFT magnitude plot, respectively.

3.2 Image features

Conventional video cameras with projective lenses have restricted fields of view. By adding mirrors of different shapes, 360° panoramic views can be achieved in a single image [26]. In this chapter, to make localization insensitive to the heading angle, a spherical mirror is used to capture a 360° view of the environment of a robot. Before an omnidirectional image is processed, it is first unwrapped. When it comes from the camera, the image is a nonlinear mapping of a 360° panoramic view onto a donut shape. Recovering the panoramic view from the wrapped view requires the reverse mapping of pixels from the wrapped view onto a panoramic view [19, 66]. Figs. 3.1-(a) and 3.1-(b) show the wrapped omnidirectional image and the unwrapped panoramic image, respectively. The fast Fourier transform (FFT) is applied to the panoramic image to calculate a set of image properties $y$. For a square image of size $N \times N$, the two-dimensional FFT is given by
$$
F^{[i]}(\rho, l) = \sum_{a=0}^{N-1} \sum_{b=0}^{N-1} f^{[i]}(a, b)\, e^{-j2\pi\left(\rho \frac{b}{N} + l \frac{a}{N}\right)},
$$
where $f^{[i]}$ is the $i$-th two-dimensional realized image and $j$ is the imaginary unit. In order to use the FFT, we convert the panoramic color images to gray-scale $128 \times 128$ pixel images, i.e., $[f(a, b)]$. Figs. 3.1-(c) and (d) show the reduced-size gray-scale image and its two-dimensional FFT magnitude plot, respectively. The Fourier transform is an important image processing tool which is used to decompose an image into its periodic components. The output of the transformation represents the image in the Fourier (frequency) domain, while the input image is in the spatial domain.
In image processing, often only the magnitude of the Fourier-transformed image is utilized, as it contains most of the information about the geometric structure of the spatial-domain image [76]. The Fourier transform is used in a wide range of applications such as image analysis, image filtering, image reconstruction, and image compression. In addition, localization methods are proposed in [66, 110] in which every image is represented by a family of components in the Fourier domain. We briefly highlight three properties obtained by applying the Fourier transform to the omnidirectional images. First, the magnitude of the Fourier components can be represented by a continuous function of the robot position. Next, the phase angles of the Fourier components are related to the heading of the robot. Finally, by representing an image with a small number of frequency components in the Fourier domain, data compression can be achieved. In [66], it was shown that the first 15 components of the FFT carry enough information to correctly match a pair of images. In an indoor environment, most of the objects stand vertically. In this planar navigation case, it is reasonable to assume that features in the horizontal direction are more important than those in the vertical direction. For example, the borders between walls, doors, and windows in the horizontal direction will be observed by this planar robot. As expected, it can be seen from the spectrum in Fig. 3.1-(d) that almost all the information is contained near the center, in the low part of the frequency range. It is thus reasonable to expect that the low-frequency components provide the best basis for the localization process. Therefore, the image properties are computed by

y_\rho^{[i]} = \left| F^{[i]}(\rho, 0) \right| = \left| \sum_{a=0}^{127} \sum_{b=0}^{127} f^{[i]}(a, b)\, e^{-j 2\pi \rho b / 128} \right|, \quad (3.1)

where |\cdot| denotes the absolute value of a complex number. Equation (3.1) can be interpreted as the one-dimensional FFT of the image averaged over the vertical direction.
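As a concrete illustration, the property computation in (3.1) reduces to a one-dimensional FFT of the vertically summed image. A minimal sketch in Python is given below; the function name, the number of retained properties, and the exact index conventions are our own illustrative choices, not the original implementation:

```python
import numpy as np

def image_properties(img, num_props=16):
    """Compute FFT-based image properties in the spirit of (3.1).

    img: 2-D grayscale image of shape (N, N), e.g. 128 x 128.
    Returns |F(rho, 0)| for rho = 0..num_props-1, i.e. the magnitude
    of the 1-D FFT of the image summed over the vertical direction.
    """
    img = np.asarray(img, dtype=float)
    # Summing over the vertical direction (rows) leaves a 1-D signal
    # along the horizontal direction; its FFT gives F(rho, 0).
    column_sum = img.sum(axis=0)
    spectrum = np.abs(np.fft.fft(column_sum))
    return spectrum[:num_props]
```

Note that the magnitude spectrum is invariant to horizontal circular shifts of the panoramic image, which is consistent with the heading-insensitivity motivation above.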
In addition to the FFT coefficients, the histogram of the image and wavelet coefficients could be used as image properties [59]. An image histogram is a type of histogram that acts as a graphical representation of the tonal distribution in a digital image. The number of distinct tonal bins in the histogram is specified by the number of properties; the histogram then gives the number of pixels in each tonal bin. A discrete wavelet transform (DWT) is a wavelet transform for which the wavelets are discretely sampled. Wavelet coefficients are also widely used by appearance-based place recognition methods [7, 103]. In Section 3.5, we experimentally evaluate our approach with features obtained using the FFT, histogram, and discrete wavelet transforms.

3.3 Gaussian process (GP) model

We propose a multivariate GP as a model for the collection of image features. A GP defines a distribution over a space of functions, and it is completely specified by its mean function and covariance function. We denote by y_\rho^{[i]} := y_\rho(s^{[i]}) \in \mathbb{R} the i-th realization of the \rho-th image property, and by s^{[i]} \in \mathcal{S} the associated position where the realization occurs. Here \mathcal{S} denotes the surveillance region, which is a compact set. The accumulative image properties then form a random vector defined by y = (y_1^T, \cdots, y_m^T)^T \in \mathbb{R}^{nm}, where y_\rho = (y_\rho^{[1]}, \cdots, y_\rho^{[n]})^T \in \mathbb{R}^n contains the n realizations of the \rho-th image property. We assume that the accumulative image properties can be modeled by a multivariate GP, i.e., y \sim \mathcal{GP}(\Gamma, \Lambda), where \Gamma : \mathcal{S}^n \to \mathbb{R}^{mn} and \Lambda : \mathcal{S}^n \to \mathbb{R}^{mn \times mn} are the mean function and the covariance function, respectively. However, the size and multivariate nature of the data lead to computational challenges in implementing the framework. For models with multivariate output, a common practice is to specify a separable covariance structure for the GP for efficient computation.
For example, Higdon [36] calibrated a GP simulator with high-dimensional multivariate output, using principal components to reduce the dimensionality. Following such model reduction techniques, we transform the vector y to a vector z such that its elements \{z_\rho \mid \rho \in \Omega_m\}, where \Omega_m = \{1, \cdots, m\}, are i.i.d. The statistics of y can be computed from the learning dataset:

\mu_y = \frac{1}{n} \sum_{i=1}^{n} y^{[i]}, \qquad \Sigma_y = \frac{1}{n-1} \sum_{i=1}^{n} (y^{[i]} - \mu_y)(y^{[i]} - \mu_y)^T.

The singular value decomposition (SVD) of \Sigma_y is a factorization of the form \Sigma_y = U S U^T, where U is a real unitary matrix and S is a diagonal matrix with nonnegative real numbers on the diagonal. In summary, the transformation is performed by the following formula:

z^{[i]} = S^{-1/2} U^T (y^{[i]} - \mu_y). \quad (3.2)

From now on, we assume that the transformation given by (3.2) has been applied to the visual data. Hence, we have the zero-mean multivariate GP z(s) \sim \mathcal{GP}(0, K(s, s')), which consists of multiple scalar GPs that are independent of each other.

3.3.1 The \rho-th random field

In this subsection, we consider only the \rho-th random field (visual feature); the other scalar random fields can be treated in the same way. The collection of the n realized values of the \rho-th random field is denoted by z_\rho := (z_\rho^{[1]}, \cdots, z_\rho^{[n]})^T \in \mathbb{R}^n, where z_\rho^{[i]} := z_\rho(s^{[i]}) is the i-th realization of the \rho-th random field and s^{[i]} = (s_1^{[i]}, s_2^{[i]}) \in \mathcal{S} \subset \mathbb{R}^2 is the associated position where the realization occurs. We then have z_\rho \sim \mathcal{N}(0, \Sigma_\rho), where \Sigma_\rho \in \mathbb{R}^{n \times n} is the covariance matrix. The i, j-th element of \Sigma_\rho is defined as \Sigma_\rho^{[ij]} = \mathrm{Cov}(z_\rho^{[i]}, z_\rho^{[j]}). In this chapter, we consider the squared exponential covariance function [85] defined as

\Sigma_\rho^{[ij]} = \sigma_{f,\rho}^2 \exp\left(-\frac{1}{2} \sum_{\ell=1}^{2} \frac{(s_\ell^{[i]} - s_\ell^{[j]})^2}{\sigma_{\ell,\rho}^2}\right). \quad (3.3)

In general, the mean and covariance functions of a GP can be estimated a priori by maximizing the likelihood function [114]. The prior distribution of z_\rho is given by \mathcal{N}(0, \Sigma_\rho).
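The decorrelating transformation (3.2) can be sketched as follows; `whitening_transform` is a hypothetical helper name, and the small floor on the singular values is a numerical safeguard that the original formulation does not require:

```python
import numpy as np

def whitening_transform(Y, eps=1e-12):
    """Learn the transformation (3.2) from a learning set Y of shape
    (n, m): n realizations of m image properties.

    Returns (mu, W) with W = S^{-1/2} U^T, so that z = W (y - mu)
    has identity sample covariance.
    """
    Y = np.asarray(Y, dtype=float)
    mu = Y.mean(axis=0)                      # sample mean of the properties
    Sigma = np.cov(Y, rowvar=False)          # (m, m) sample covariance
    U, S, _ = np.linalg.svd(Sigma)           # Sigma = U S U^T (symmetric PSD)
    W = np.diag(1.0 / np.sqrt(np.maximum(S, eps))) @ U.T
    return mu, W
```

Applying z = W (y − μ_y) to every realization yields features whose sample covariance is the identity, which is what justifies modeling the m scalar fields as independent GPs in the sequel.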
A noise-corrupted measurement \tilde{z}_\rho^{[i]} at its corresponding location s^{[i]} is defined as follows:

\tilde{z}_\rho^{[i]} = z_\rho^{[i]} + \epsilon_\rho^{[i]}, \quad (3.4)

where the measurement errors \{\epsilon_\rho^{[i]}\} are assumed to be independent and identically distributed (i.i.d.) Gaussian white noise, i.e., \epsilon_\rho^{[i]} \sim \mathcal{N}(0, \sigma_{\epsilon,\rho}^2). Thus, we have \tilde{z}_\rho \sim \mathcal{N}(0, R_\rho), where R_\rho = \Sigma_\rho + \sigma_{\epsilon,\rho}^2 I. The log-likelihood function is defined by

\mathcal{L}_{\theta,\rho} = -\frac{1}{2} \tilde{z}_\rho^T R_\rho^{-1} \tilde{z}_\rho - \frac{1}{2} \log |R_\rho| - \frac{n}{2} \log 2\pi, \quad (3.5)

where n is the size of \tilde{z}_\rho. The hyperparameter vector of the \rho-th random field is defined as \theta_\rho = (\sigma_{f,\rho}, \sigma_{\epsilon,\rho}, \sigma_{1,\rho}, \sigma_{2,\rho})^T \in \mathbb{R}^4_{>0}. Using the likelihood function in (3.5), the hyperparameter vector can be computed by the ML estimator

\bar{\theta}_\rho = \arg\max_{\theta} \mathcal{L}_{\theta,\rho}, \quad (3.6)

which is plugged into the prediction in an empirical Bayes way. All parameters are learned simultaneously. If no prior information is given, then the maximum a posteriori probability (MAP) estimator is equal to the ML estimator [114]. In a GP, every finite collection of random variables has a multivariate normal distribution. Consider a realized value z_{\rho*} of the \rho-th random field taken at the associated location s_*. The probability distribution P(z_{\rho*} \mid s_*, s, \tilde{z}_\rho) is a normal distribution with the following mean and variance:

\mu_\rho(s_*) = C_\rho R_\rho^{-1} \tilde{z}_\rho, \qquad \sigma_\rho^2(s_*) = \sigma_{f,\rho}^2 - C_\rho R_\rho^{-1} C_\rho^T, \quad (3.7)

where the covariance C_\rho := \mathrm{Cov}(z_{\rho*}, z_\rho) \in \mathbb{R}^{1 \times n} is defined similarly to (3.3). In order to estimate the location s_* using the MAP estimator, we need to compute P(s_* \mid \tilde{z}_{\rho*}, s, \tilde{z}_\rho), where the noisy observation \tilde{z}_{\rho*} is the sum of the realized value z_{\rho*} of the random field and a noise process:

P(s_* \mid \tilde{z}_{\rho*}, s, \tilde{z}_\rho) = \frac{P(\tilde{z}_{\rho*} \mid s_*, s, \tilde{z}_\rho)\, P(s_* \mid s, \tilde{z}_\rho)}{P(\tilde{z}_{\rho*} \mid s, \tilde{z}_\rho)}. \quad (3.8)

The MAP estimator given the collection of observations \tilde{z}_\rho is a mode of the posterior distribution:

\bar{s}_\rho = \arg\max_{s_* \in \mathcal{S}} P(s_* \mid \tilde{z}_{\rho*}, s, \tilde{z}_\rho). \quad (3.9)
If P(s_* \mid s, \tilde{z}_\rho) and P(\tilde{z}_{\rho*} \mid s, \tilde{z}_\rho) are uniform, then the MAP estimator is equal to the ML estimator, given by

\bar{s}_\rho = \arg\max_{s_* \in \mathcal{S}} \mathcal{L}_\rho(s_*), \quad (3.10)

where the \rho-th log-likelihood function \mathcal{L}_\rho(s_*) is defined as follows:

\mathcal{L}_\rho(s_*) = -\frac{1}{2}\left( \frac{(\tilde{z}_{\rho*} - \mu_\rho(s_*))^2}{\sigma_{\epsilon,\rho}^2 + \sigma_\rho^2(s_*)} + \log\left(\sigma_{\epsilon,\rho}^2 + \sigma_\rho^2(s_*)\right) + \log 2\pi \right). \quad (3.11)

3.4 Localization and feature selection

Let \Omega be the collection of indices associated with the multiple scalar random fields (of the multivariate GP). Provided that all scalar random fields (of the multivariate GP) are independent of each other, we obtain a computationally efficient ML estimate of the location given the observations of all scalar random fields \{\tilde{z}_{\rho*} \mid \rho \in \Omega\} as follows:

\bar{s}_\Omega = \arg\max_{s_* \in \mathcal{S}} \sum_{\rho \in \Omega} \mathcal{L}_\rho(s_*), \quad (3.12)

where \mathcal{L}_\rho(s_*) is the \rho-th log-likelihood function given in (3.11). In this chapter, a hold-out cross-validation technique [52] is used for the model selection. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice. To this end, we divide the dataset into two segments: one used to learn or train the GP model and the other used to validate the model. The RMSE is used to measure the performance of the GP models. It is defined by

\mathrm{RMSE}(\Omega) = \sqrt{\frac{1}{n_c} \sum_{i=1}^{n_c} \left\| s_c^{[i]} - \bar{s}_\Omega^{[i]} \right\|^2}, \quad (3.13)

where \|\cdot\| is the Euclidean norm of a vector. In the case that \Omega = \emptyset, we define

\mathrm{RMSE}(\emptyset) = \sqrt{\frac{1}{n_c} \sum_{i=1}^{n_c} \left\| s_c^{[i]} - \mathrm{median}(s_c) \right\|^2},

where \mathrm{median}(\cdot) is the median of a random vector. Assume that \Omega_m = \{1, \cdots, m\} is the set of all features. Dupuis et al. [23] reported that backward sequential selection outperforms forward sequential selection.
Thus, we use a backward sequential selection algorithm as follows:

\Omega_{\ell-1} = \Omega_\ell - \arg\min_{\rho \in \Omega_\ell} \mathrm{RMSE}(\Omega_\ell - \rho), \quad \forall \ell \in \Omega_m, \quad (3.14)

where \Omega_\ell - \rho := \{p \mid p \in \Omega_\ell,\ p \neq \rho\}. Finally, a subset of features is selected as follows:

\Omega_{\mathrm{opt}} = \arg\min_{\Omega \in \{\Omega_1, \cdots, \Omega_m\}} \mathrm{RMSE}(\Omega). \quad (3.15)

The optimum subset \Omega_{\mathrm{opt}} has the minimum RMSE among \{\Omega_1, \cdots, \Omega_m\}. The mapping and matching steps of the proposed approaches in this chapter are summarized in Algorithm 1 and Algorithm 2, respectively.

Algorithm 1 Learning maps from a sparse set of panoramic images observed at known locations
Input: (1) a training dataset comprising a set of panoramic images captured from known spatial sites.
Output: (1) a linear transformation from image properties to uncorrelated visual features; (2) the estimated hyperparameters and the estimated mean and variance functions of each independent visual feature.
1: extract the image properties y^{[i]} of the available learning dataset using (3.1)
2: use the SVD to form a set of uncorrelated visual features z^{[i]} using (3.2)
3: for each independent visual feature, estimate the hyperparameters using (3.6)
4: compute the mean and variance functions of each independent feature using (3.7)
5: choose the optimal subset of visual features using (3.15), eliminating the visual features that are worthless for the localization goal

Algorithm 2 Localization predictive inference using the learned map of visual features
Input: (1) a linear transformation from image properties to uncorrelated visual features; (2) the estimated hyperparameters and the estimated mean and variance functions of the selected visual features.
Output: (1) the position of newly captured images.
1: capture a new image and obtain its image properties y using (3.1)
2: compute the selected visual features z using (3.2)
3: compute the likelihood function of the selected features \rho \in \Omega_{\mathrm{opt}} over the possible sampling positions using (3.11)
4: determine the estimated position \bar{s}_{\Omega_{\mathrm{opt}}} using (3.12)

3.5 Experimental Results

In this section, we demonstrate the effectiveness of the proposed localization algorithms using experiments. We report results on two different datasets, Case 1 and Case 2, obtained from two experimental setups.

3.5.1 Experimental setups

In Case 1, the Kogeto panorama lens was used to capture 360-degree images. In total, 207 pairs of exact sampling positions, recorded manually, and corresponding panoramic images, captured on a regular lattice (23 × 9 square feet), were collected. In Case 2, a vision-based robotic sensor was built by the authors to validate a proof of concept for the proposed methods. The wireless mobile robot is equipped with two motorized wheels, a micro-controller, a 360 degree omnidirectional camera, a wireless receiver, and a transmitter. The omnidirectional camera was built from an inexpensive camera (Ai-Ball Treck) and a spherical mirror. The 360 degree images of the environment around the robot produced by the omnidirectional camera were streamed via 802.11b/g Wi-Fi (see Fig. 3.2). In Case 2, we collected 825 pairs of noisy sampling positions, recorded automatically by image processing software (using an exterior camera), and corresponding captured panoramic images. The noise power of the image processing software is 1 square foot, and the samples were taken on a 10 × 26 square feet surface. Figs. 3.2-(a) and 3.2-(b) show the two different hardware setups for Case 1 and Case 2, respectively. The datasets are divided into 80% learning, 10% cross-validation, and 10% testing subsets. The learning subset is used to estimate the hyperparameters, mean functions, and variance functions of the GP models for the scalar fields.
The cross-validation subset is used to select the best features in order to minimize the localization estimation error. After training, the testing subset is used to evaluate the algorithm's performance.

3.5.2 Learning of GP models in an empirical Bayes approach

As illustrative examples, we apply the proposed algorithm to both datasets with 32 computed visual features. Figs. 3.3 and 3.4 show the estimated hyperparameters (k \in \Omega_{32}) for Case 1 and Case 2, respectively. The signal variance \sigma_{f,k}, the spatial bandwidths \sigma_{1,k} and \sigma_{2,k}, and the noise variance \sigma_{\epsilon,k} are estimated for each independent feature individually. Thus, 32 × 4 = 128 hyperparameters have been estimated in total for each experimental setup. In Case 1, the predictive mean and variance of each visual feature \{z_k \mid k \in \Omega_{32}\} are shown in Figs. 3.5 and 3.6, respectively. The predictive mean and variance for Case 2 are shown similarly in Figs. 3.7 and 3.8, respectively. The running time using Matlab R2009b on a PC (3.2 GHz Intel i7 processor) is summarized in Table 3.1 for Case 2 (similar results are observed for Case 1). Localization for each test point takes 0.2 sec, which is fast enough for a real-time implementation.

Figure 3.2: (a) The Dot iPhone panorama lens used for Case 1. (b) A vision-based robot equipped with a 360 degree omnidirectional camera, built for Case 2.

Figure 3.3: Estimated hyperparameters for each individual scalar field for Case 1.

Figure 3.4: Estimated hyperparameters for each individual scalar field for Case 2.

3.5.3 Comparison study

We compare the performance of the proposed approach under different image features, as summarized in Table 3.2.
In this comparison study, three types of appearance-based features are considered: FFT [110], histogram [34], and wavelet [7]. Additionally, a comparison between the proposed cross-validation (CV) method and traditional PCA for feature elimination is investigated. Finally, we compare the localization results for different numbers of features. Table 3.2 shows the dataset (column 1), the appearance-based feature type (column 2), the total number of features (column 3), the optimum number of features determined by the proposed approach (column 4), and the localization RMSE obtained using the total number of features (column 5), the sixteen features selected by PCA (column 6), the sixteen features selected by CV (column 7), and the optimum number of features (column 8).

Figure 3.5: Predictive mean function map for each individual feature z_k for Case 1.
Table 3.1: Computational time

Procedure                    Number of observations   Time (sec)
Pre-processing               825                      35
Feature extraction           825                      2
Learning hyperparameters     659                      286
Learning map and variances   659                      51
Feature selection            83                       21
Localization                 83                       10

Table 3.2: The localization performance comparison

                       number of features        RMSE using subset of features
dataset   feature type   total   optimum     All    PCA 16   CV 16   CV opt
Case 1    FFT              8       7         4.98     -        -      0.63
          FFT             16      13         4.98     -        -      0.53
          FFT             32      10         4.98    4.98     0.24    0.22
          FFT             64      26         4.98    4.98     0.31    0.26
          Histogram        8       5         4.12     -        -      2.10
          Histogram       16      10         3.85     -        -      1.57
          Histogram       32      13         1.44    1.43     1.22    1.18
          Histogram       64      32         1.46    1.56     1.15    1.13
          Wavelet         78      32         6.69    4.43     4.03    2.49
Case 2    FFT              8       8         2.21     -        -      2.21
          FFT             16      11         2.00     -        -      1.58
          FFT             32      21         1.52    2.42     1.37    1.36
          FFT             64      21         1.96    2.54     1.24    1.19
          Histogram        8       5         2.88     -        -      2.38
          Histogram       16       9         2.60     -        -      1.77
          Histogram       32      19         2.03    1.54     1.59    1.47
          Histogram       64      32         2.08    2.23     1.53    1.40
          Wavelet         78      27         5.96    7.94     3.82    3.52

Figure 3.6: Prediction variance map for each individual feature z_k for Case 1.

Note that the sampling positions of Case 1 were recorded exactly, while those of Case 2 were noisy. As expected, the results of Case 1 outperform those of Case 2. This could be due to the fact that noisy sampling positions lead to poorly estimated hyperparameters and predictions of the GP.
Furthermore, the FFT appearance-based features show more vulnerability to sampling position noise. The RMSE results of FFT-based localization are slightly better than those of histogram-based localization and much better than those of wavelet-based localization [7].

Figure 3.7: Predictive mean function map for each individual feature z_k for Case 2.

Figure 3.8: Prediction variance map for each individual feature z_k for Case 2.

Besides the type and total number of features, the CV has
a better performance than PCA. In general, increased cardinality of the original feature set leads to better performance in the 5th column, possibly due to the increased information. Similar trends are observed in the 7th and 8th columns of Table 3.2. Most importantly, we conclude that eliminating some of the features helps to improve the localization error. Therefore, the RMSE of the proposed approach in this chapter is much lower than that of the other methods.

3.6 Conclusion and future works

This chapter has presented a novel approach to using vision data for mobile robot localization. The predictive statistics of the vision data are learned in advance and used to estimate the position of a mobile robot, equipped only with an omnidirectional camera, in a GPS-denied environment. A multivariate GP model with unknown hyperparameters is used to model a collection of selected visual features. The experimental study shows excellent positioning results within a reasonable computational time under two different experimental setups, with and without uncertainty in the sampling positions. A key limitation of the current approach arises from the fact that, after the initial training phase, learning is discontinued. In particular, if the environment changes, it is desirable that the localization routines adapt to the changes in the environment. Thus, a future research direction is to develop a localization scheme that is adaptive to changes in the environment.

Chapter 4

Bayesian Prediction with Uncertain Localization

Abstract: In this chapter, we consider the problem of predicting a spatio-temporal random field using sequential noisy observations collected by robotic sensors. The spatio-temporal field of interest is modeled by a sum of a time-varying mean function and a Gaussian Markov random field (GMRF) with unknown hyperparameters.
We first derive the exact Bayesian solution to the problem of computing the predictive inference of the random field, taking into account observations, uncertain hyperparameters, measurement noise, and uncertain localization from a fully Bayesian point of view. We show that the exact solution under uncertain localization is not scalable as the number of observations increases. To cope with this exponentially increasing complexity, and to be usable for mobile sensor networks with limited resources, we propose a scalable approximation with a controllable tradeoff between approximation error and complexity relative to the exact solution. The effectiveness of the proposed algorithms is demonstrated by simulation and experimental results.

4.1 Introduction

In recent years, there has been increasing exploitation of mobile robotic sensors in environmental monitoring [10, 12, 62, 64, 117]. Gaussian processes (or Gaussian random fields), defined by mean and covariance functions over a continuum space [14, 84], have frequently been used in mobile sensor networks to statistically model physical phenomena such as harmful algal blooms, pH, and temperature, e.g., [30, 57, 119]. The significant computational complexity of Gaussian process regression, due to the growing number of observations (and hence the size of the covariance matrix), has been tackled in different ways. In [119], the conditions under which near-optimal prediction can be achieved using truncated observations are analyzed, when the covariance function is known a priori. In [120], a new efficient and scalable inference algorithm for a class of Gaussian processes that builds on a Gaussian Markov random field (GMRF) is developed for known hyperparameters. On the other hand, unknown hyperparameters in the covariance function can be estimated by a maximum likelihood (ML) estimator or a maximum a posteriori (MAP) estimator and then used for the prediction [114].
However, the point estimate itself needs to be identified using a certain amount of measurements, and it does not fully incorporate the uncertainty of the estimated hyperparameters into the prediction in a Bayesian perspective. The advantage of a fully Bayesian approach is the capability of incorporating various uncertainties in the model parameters and measurement processes into the prediction [3]. However, the solution often requires an approximation technique such as Markov chain Monte Carlo (MCMC), Laplace approximation, or variational Bayes methods, which still require a high level of computational complexity [3]. In [115, 117], a sequential Bayesian prediction algorithm and its distributed version are designed to deal with uncertain bandwidths by using a compactly supported kernel and selecting a subset of the collected measurements. Sequential fully Bayesian prediction algorithms for a GMRF with unknown hyperparameters are reported in [116]. Inexpensive wireless/mobile sensor networks [17, 79] are widespread at the cost of precision in localization. Due to their growing usage, there are many practical opportunities where continuously sampled measurements need to be fused for sensors with localization uncertainty. Theoretically correct yet efficient (or scalable) inference algorithms need to be developed to meet such demands. Related works involving uncertain localization in our context are as follows. Gaussian process regression has been used for map building and localization in many practical applications. In [7], Gaussian process regression is used to model geo-referenced sensor measurements (obtained from a camera). After training, an ML estimator is used to find the best match for the locations of newly sampled measurements. However, the training step must be done in advance for a new environment, using a dataset that includes noisy measurements and the exact sampling positions [7].
In [51, 105], Gaussian process regression is used to implement simultaneous localization and mapping (SLAM) using a magnetic field, and its feasibility is shown experimentally. In [77, 106], the problem of using laser range-finder data to probabilistically classify a robot's environment is investigated. They provide a continuous representation of the robot's surroundings by employing a Gaussian process. In [65], a Gaussian process model is presented for training on input points corrupted by independent and identically distributed (i.i.d.) Gaussian noise. To make the computation tractable, a local linear expansion about each sampling position is used for the mean function. The work in [65] assumes that all hyperparameters are trained offline a priori. In [44], the problem of Gaussian process regression with uncertain localization and known hyperparameters is formulated and solved. A key limitation of such an approach with fixed hyperparameters arises from the fact that, after the initial training phase, learning is discontinued. In general, if the environment changes, it is desirable that the localization algorithm adapts to the changes on the fly. A fully Bayesian approach that treats the hyperparameters as random variables can address this issue, at the price of increased computational complexity. The novelty of our work as compared to the previous ones in [115-117] is to fully consider the uncertainty in the sampling positions, along with other uncertainties such as hyperparameters and observation noise, in a fully Bayesian manner. To the best of our knowledge, fully Bayesian prediction algorithms for spatio-temporal random fields that can take into account uncertain localization are scarce to date. Hence, this chapter aims to develop such inference algorithms for robotic sensor networks in practical situations.
With the continuous improvement of computational power in embedded systems, it is important to develop theoretically correct and flexible fully Bayesian approaches to cope with such practical problems. The contributions of the chapter are as follows. First, we model a physical spatio-temporal random field as a GMRF with uncertain hyperparameters and formulate the prediction problems with and without localization uncertainty. Next, we derive the exact Bayesian solution to the problem of computing the predictive inference of the random field, taking into account uncertain hyperparameters, measurement noise, and uncertain localization from a fully Bayesian point of view. We show that the exact solution under uncertain localization is not scalable as the number of observations increases. To cope with this increasing complexity, we propose a scalable approximation with a controllable tradeoff between approximation error and complexity relative to the exact solution. The effectiveness of the proposed algorithms is demonstrated by simulation and experimental results. In particular, experimental results are illustrated in both static and dynamic environments. A preliminary version of this work, without the proofs of the theorems and the experimental results, has been reported in [41]. The chapter is organized as follows. In Section 4.2, we explain how a GMRF can be viewed as a sparse and discretized version of a Gaussian process. In Sections 4.3 and 4.4, we introduce a spatio-temporal field model based on a GMRF and the mobile sensor network. In Section 4.5.1, we present a fully Bayesian inference approach to estimate the spatio-temporal field. The Bayesian prediction algorithm is extended to uncertain sampling positions in Section 4.5.2. The complexity of the proposed algorithms is discussed in Section 4.5.3. We demonstrate the effectiveness through a simulation study in Section 4.6. Finally, we evaluate our approach on a real experimental setup in Section 4.7.
4.2 Gaussian Processes and Gaussian Markov Random Fields

Recently, there have been efforts to fit a computationally efficient GMRF on a discrete lattice to a Gaussian random field on a continuum space [15, 35, 90, 116]. It has been demonstrated that GMRFs with small neighborhoods can approximate Gaussian fields surprisingly well [90]. Such an approximating GMRF and its regression are very attractive for resource-constrained mobile sensor networks due to their computational efficiency and scalability [60], as compared to the standard Gaussian process and its regression. Fast kriging of large datasets using a GMRF as an approximation of a Gaussian random field has been proposed in [35]. We now briefly review a GMRF as a discretized Gaussian process on a lattice. Consider a zero-mean Gaussian process z(s) \sim \mathcal{GP}(0, K(s, s')), where K is the covariance function defined on a continuum space \mathcal{S}. We discretize the compact domain \mathcal{S} := [0, \max(s_1)] \times [0, \max(s_2)] into n spatial sites \mathcal{Q} := \{\mathring{s}^{(1)}, \cdots, \mathring{s}^{(n)}\} \subset \mathbb{R}^2, where n = h \max(s_1) \times h \max(s_2) and h is chosen such that n \in \mathbb{Z}_{>0}. Note that n \to \infty as h \to \infty. The collection of realized values of the random field on \mathcal{Q} is denoted by z := (z^{(1)}, \cdots, z^{(n)})^T \in \mathbb{R}^n, where z^{(i)} := z(\mathring{s}^{(i)}). The prior distribution of z is given by z \sim \mathcal{N}(0, K), and so we have

P(z) \propto \exp\left(-\frac{1}{2} z^T K^{-1} z\right), \quad (4.1)

where K \in \mathbb{R}^{n \times n} is the covariance matrix whose i, j-th element is defined as (K)_{ij} = \mathrm{Cov}(z^{(i)}, z^{(j)}) = K(\mathring{s}^{(i)}, \mathring{s}^{(j)}). The prior distribution of z can be written with a precision matrix Q_0 = K^{-1}, i.e., z \sim \mathcal{N}(0, Q_0^{-1}). This can be viewed as a discretized version of the Gaussian process (i.e., a GMRF) with precision matrix Q_0 on \mathcal{Q}. Note that the Q_0 of this GMRF is not sparse. However, a sparse version \hat{Q}_0 with a local neighborhood that can represent the original Gaussian process can be found, for example, by making \hat{Q}_0 close to Q_0 in some norm [15, 35, 90].
This approximate GMRF will be computationally efficient due to the sparsity of \hat{Q}_0. For our main problems, we will use a GMRF with a sparse precision matrix that represents a Gaussian process precisely (see Section 4.4). In this section, we assume that we take N noisy measurements y = (y^{[1]}, \cdots, y^{[N]})^T \in \mathbb{R}^N at the corresponding sampling locations s = (s^{[1]T}, \cdots, s^{[N]T})^T \in \mathcal{S}^N. Here, the measurement model is given by

y^{[i]} := y(s^{[i]}) = z(s^{[i]}) + \epsilon^{[i]}, \quad \forall i = 1, \cdots, N, \quad (4.2)

where \epsilon^{[i]} \sim \mathcal{N}(0, \sigma_\epsilon^2) is the measurement noise, assumed to be independent and identically distributed (i.i.d.). Using Gaussian process regression, the posterior distribution of z \in \mathbb{R}^n is given by

z \mid s, y \sim \mathcal{N}(\mu, \Sigma). \quad (4.3)

The predictive mean \mu \in \mathbb{R}^n and covariance matrix \Sigma \in \mathbb{R}^{n \times n} can be obtained by

\mu = k^T C^{-1} y, \qquad \Sigma = K - k^T C^{-1} k, \quad (4.4)

where the covariance matrices are defined as k := \mathrm{Cov}(y, z) \in \mathbb{R}^{N \times n}, C := \mathrm{Cov}(y, y) \in \mathbb{R}^{N \times N}, and K := \mathrm{Cov}(z, z) \in \mathbb{R}^{n \times n}. In practice, resource-constrained robotic sensors are prone to large localization uncertainty. Most previous works on Gaussian process regression for mobile sensor networks [8, 9, 14, 57] have assumed that the exact sampling positions are available, which is not practical in real situations. The effect of localization uncertainty is potentially significant. For example, Fig. 2.1 shows the effect of noisy sampling positions on the results of Gaussian process regression. Note that adding noise to the sampling positions considerably increases the empirical RMS error, as shown in the third row of Fig. 2.1. This motivates our work in this chapter. The pose of a robot can be estimated by fusing different sensory information to produce an estimate along with its estimation error statistics. In general, Bayes filters such as extended and unscented Kalman filters [47, 96], particle filters [70], and Monte Carlo techniques [98] are used for such localization.
Nevertheless, obtaining precise localization of robotic networks under limited resources is very challenging [37, 38]. Throughout the chapter, we assume that the positions of robotic sensors are estimated by a standard technique that produces position estimates along with estimation error statistics. A discussion regarding a specific localization technique is beyond the scope of this chapter.

Under this assumption, a localization algorithm provides the prior distribution of the sampling location s^[i] as P(s^[i] | s̃^[i]), possibly with a compact support in S. The predictive distribution of z given the measured locations s̃ = (s̃^[1]T, ..., s̃^[N]T)^T is thus given by

P(z | s̃, y) = ∫_{s∈S^N} P(z | s, y) P(s | s̃, y) ds,   (4.5)

where P(z | s, y) is obtained from (4.3). However, the predictive distribution in (4.5) does not have a closed-form solution and needs to be computed either by MCMC methods or by approximation techniques [43].

Now we consider a discretized version of the Gaussian process, i.e., a GMRF with precision matrix Q_0 on Q. Since the sampling points of Gaussian process regression are not necessarily on Q, we use the nearest grid point of a given sampling point s in S:

q^[i] = arg min_{q∈Q} ‖s^[i] − q‖.

The sampling positions for the GMRF are then exactly on the lattice, i.e., q^[i] ∈ Q. The posterior distribution of z ∈ R^n on Q, given the measurements y ∈ R^N and the sampling positions q = (q^[1]T, ..., q^[N]T)^T ∈ Q^N, is then obtained by

z | q, y ~ N(Q^{−1} b, Q^{−1}),   (4.6)

where Q = Q_0 + H P^{−1} H^T and b = H P^{−1} y, with P = σ^2 I ∈ R^{N×N} and H ∈ R^{n×N} defined as

(H)_{ij} = 1 if s̊^(i) = q^[j], and 0 otherwise.   (4.7)

We consider again localization uncertainty for this GMRF. Let the measured noisy location q̃^[i] be the nearest grid point to the measured noisy sampling point s̃^[i] of the Gaussian process.
Now we obtain a set of discretized probabilities on Q induced by the continuous prior distribution defined on S. The discrete prior distribution of the sampling location q^[i] is given by

P(q^[i] = s̊^(j) | s̃^[i]) = ∫_{s∈V_j} P(s | s̃^[i]) ds,   (4.8)

where P(s | s̃^[i]) is the continuous prior as in Gaussian process regression and V_j is the Voronoi cell of the j-th grid point s̊^(j), given by

V_j := {s ∈ S | ‖s − s̊^(j)‖ ≤ ‖s − s̊^(i)‖, ∀i ≠ j}.

The predictive distribution of z given y and q̃ is thus given by

P(z | q̃, y) = Σ_{q∈Q^N} P(z | q, y) P(q | q̃, y),   (4.9)

where P(z | q, y) is obtained from (4.6) and the summation is over all possible locations in Q.

Figure 4.1: Example of localization uncertainty for s^[i], for two resolutions (a) h_1 and (b) h_2. The measured sampling location q̃^[i] is indicated by a small red circle, which is the closest point in the discrete support to the measured sampling position in the continuous space. The small red circle, the blue squares, and the blue star show the possible locations of the true sampling point q^[i] according to the prior distribution P(q^[i] = s̊^(j) | s̃^[i]), whose compact support is shown as the big red circle. The blue star indicates q^[i], the closest point in the discrete support to the true sampling position in the continuous space.

Fig. 4.1 shows two examples of using this approximation approach, with h_1 > h_2, to convert a continuous space into a discrete one. As h → ∞, q̃ → s̃ and the standard Gaussian process regression in a continuum space is recovered from the prediction using the GMRF in a discretized space.

4.3 Mobile Sensor Networks

Suppose that the sampling time t ∈ Z_{>0} is discrete. Let z_t := (z_t^(1), ..., z_t^(n))^T ∈ R^n be the corresponding values of the scalar field at the n spatial sites at time t. Consider N spatially distributed mobile sensing agents, indexed by j ∈ J := {1, ..., N}, sampling at time t ∈ Z_{>0}.
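The chapter leaves the cell integral in (4.8) abstract. On a regular lattice the Voronoi cells are squares, so one practical way to obtain the discrete prior is Monte Carlo: draw from the continuous position prior and count which cell (i.e., nearest grid point) each draw falls in. A minimal sketch, with all sizes and noise levels chosen only for illustration:

```python
import numpy as np

def discrete_position_prior(grid, s_tilde, cov, n_samples=20000, seed=0):
    """Approximate P(q = grid[j] | s_tilde) of (4.8) by Monte Carlo:
    draw from the continuous Gaussian prior P(s | s_tilde) and count the
    draws falling in each Voronoi cell, which on a lattice is simply the
    set of points whose nearest grid point is grid[j]."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(s_tilde, cov, size=n_samples)
    d2 = ((draws[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(d2.argmin(axis=1), minlength=len(grid))
    return counts / n_samples

# 5 x 5 lattice with unit spacing; noisy measured position s_tilde.
gx, gy = np.meshgrid(np.arange(5.0), np.arange(5.0))
grid = np.column_stack([gx.ravel(), gy.ravel()])
p = discrete_position_prior(grid, s_tilde=np.array([2.2, 1.8]),
                            cov=0.25 * np.eye(2))
```

The resulting vector p sums to one by construction, and the mode falls on the grid point nearest the measured position, matching the behavior described for Fig. 4.1.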
At time t, agent j takes a noise-corrupted measurement at its current location q_t^[j] = s̊^(i) ∈ Q, i.e.,

y_t^[j] = z_t^(i) + ε_t^[j],   ε_t^[j] ~ N(0, σ^2) i.i.d.,   (4.10)

where the measurement errors {ε_t^[j]} are assumed to be i.i.d. The measurement noise level σ^2 > 0 is assumed to be known. For notational simplicity, we denote all agents' locations at time t by q_t = (q_t^[1]T, ..., q_t^[N]T)^T ∈ Q^N and the observations made by all agents at time t by y_t = (y_t^[1], ..., y_t^[N])^T ∈ R^N. Furthermore, we denote the collection of agents' locations and the collective observations from time 1 to t by q_{1:t} = (q_1^T, ..., q_t^T)^T ∈ Q^{Nt} and y_{1:t} = (y_1^T, ..., y_t^T)^T ∈ R^{Nt}, respectively. In addition, let us define z_t = (z_t^(1), ..., z_t^(n))^T ∈ R^n on Q, and ε_t = (ε_t^[1], ..., ε_t^[N])^T ∈ R^N. We then have the following collective notation:

y_t = H_t^T z_t + ε_t,   (4.11)

where H_τ ∈ R^{n×N} is defined by

(H_τ)_{ij} = 1 if s̊^(i) = q_τ^[j], and 0 otherwise.   (4.12)

4.4 Spatio-Temporal Field Model

The value of the scalar field at s̊^(i), z_t^(i), is modeled as the sum of a time-varying mean function and a GMRF:

z_t^(i) = λ_t^(i) + η_t^(i),   ∀i ∈ {1, ..., n}, t ∈ Z_{>0}.   (4.13)

Here the mean function λ_t^(i) : Q × Z_{>0} → R is defined as

λ_t^(i) = f(s̊^(i))^T β_t,   (4.14)

where f(s̊^(i)) = (f_1(s̊^(i)), ..., f_p(s̊^(i)))^T ∈ R^p is a known regression function and β_t = (β_t^[1], ..., β_t^[p])^T ∈ R^p is an unknown vector of regression coefficients. The time evolution of β_t ∈ R^p is modeled by a linear dynamical system given by

β_{t+1} = A_t β_t + B_t ω_t,   (4.15)

where ω_t ~ N(0, W), β_0 ~ N(μ_{β_0}, Σ_{β_0}), and A_t and B_t are known system parameters.
In addition, we consider a zero-mean GMRF [88], η_t = (η_t^(1), ..., η_t^(n))^T ∈ R^n, whose covariance matrix is given by

E(η_t η_k^T | θ) = Q_θ^{−1} δ(t − k),   (4.16)

where δ(·) is the Kronecker delta defined by

δ(k) = 1 if k = 0, and 0 otherwise,   (4.17)

and the inverse covariance matrix (or precision matrix) Q_θ ∈ R^{n×n} is a function of the hyperparameter vector θ.

There are different parameterizations of the GMRF (i.e., of the precision matrix Q_θ) [88]. Our Bayesian approach does not depend on the choice of parameterization for the precision matrix. However, for a concrete and useful exposition, we describe the specific parameterization used in this chapter. The precision matrix is parameterized through the full conditionals as follows. Let η be a GMRF on a regular two-dimensional lattice. The associated Gaussian full conditional mean is

E(η_t^(i) | η_t^(−i), θ) = −(1/(Q_θ)_{ii}) Σ_{j≠i} (Q_θ)_{ij} η_t^(j),   (4.18)

where (Q_θ)_{ij} is the element in the i-th row and j-th column of Q_θ. Here, η_t^(−i) is the collection of η_t values everywhere except at s̊^(i). The hyperparameter vector is defined as θ = (κ, α)^T ∈ R^2_{>0}, where α = a − 4. Fig. 4.2 graphically shows the values of (κ^{−1}Q_θ)_{ij} for one point along with its neighbors. The value of (κ^{−1}Q_θ)_{ii} is 4 + a^2, as denoted at the center node of the graph in Fig. 4.2. The value of (κ^{−1}Q_θ)_{ij} is −2a if j is one of the four closest neighbors of i in the vector 1-norm sense, as illustrated by the graph in Fig. 4.2. The value of (Q_θ)_{ij} is zero if j is not one of the twelve closest neighbors of i (i.e., the twelve neighbors whose 1-norm distance to the i-th location is less than or equal to 2). Equation (4.18) states that the conditional expectation of η_t^(i), given the value of η_t everywhere else (i.e., η_t^(−i)), is determined by the values of η_t at only the twelve closest neighbors (see more details in [63]).
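The stencil above can be assembled as a sparse matrix. The sketch below builds Q_θ on a small n_1 × n_2 lattice as κ (aI − A)^T (aI − A), where A is the 4-neighbor adjacency; this factored form reproduces the stencil values 4 + a^2 (center), −2a (four nearest neighbors), 2 (diagonal neighbors), and 1 (two-step axial neighbors), and makes the positive definiteness for α = a − 4 > 0 explicit via diagonal dominance of aI − A. The periodic boundary and the overall κ scale are assumptions of this sketch; the chapter's actual boundary handling and scaling convention may differ.

```python
import numpy as np
import scipy.sparse as sp

def gmrf_precision(n1, n2, alpha, kappa):
    """Sparse GMRF precision on an n1 x n2 lattice (periodic boundary,
    an assumption for this sketch).  With a = alpha + 4 and A the
    4-neighbor adjacency, Q = kappa * (a I - A)^T (a I - A) reproduces
    the chapter's stencil up to the kappa scale: center 4 + a^2, nearest
    neighbors -2a, diagonals 2, two-step axial neighbors 1."""
    def ring(n):  # cyclic 1-D adjacency
        e = np.ones(n - 1)
        R = sp.diags([e, e], [1, -1], shape=(n, n), format="lil")
        R[0, n - 1] = R[n - 1, 0] = 1
        return sp.csr_matrix(R)
    A = sp.kron(sp.eye(n1), ring(n2)) + sp.kron(ring(n1), sp.eye(n2))
    D = (alpha + 4) * sp.eye(n1 * n2) - A
    return kappa * (D.T @ D)

Q = gmrf_precision(8, 8, alpha=0.01, kappa=10.0)
```

Because aI − A is strictly diagonally dominant for α > 0, Q is positive definite, consistent with the remark about α below (4.19).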
The resulting GMRF accurately represents a Gaussian random field with the Matérn covariance function, as shown in [63]:

G(r) = σ_f^2 (2^{1−ρ}/Γ(ρ)) (√(2ρ) r/ℓ)^ρ K_ρ(√(2ρ) r/ℓ),   (4.19)

where K_ρ(·) is a modified Bessel function [84], with order ρ = 1, bandwidth ℓ = √2/(h√α), and vertical scale σ_f^2 = 1/(4πακ).

Figure 4.2: Elements of the precision matrix Q_θ related to a given location (center value 4 + a^2; the four nearest neighbors −2a; the four diagonal neighbors 2; the four two-step axial neighbors 1).

The hyperparameter α > 0 guarantees the positive definiteness of the precision matrix Q_θ. In the case where α = 0, the resulting GMRF is a second-order polynomial intrinsic GMRF [88, 89]. From the model presented in (4.13), (4.15), and (4.16), the distribution of z_t given β_t and θ is

z_t | β_t, θ ~ N(F_s β_t, Q_θ^{−1}),   (4.20)

where F_s := (f(s̊^(1)), ..., f(s̊^(n)))^T ∈ R^{n×p}. In other words, z_t | β_t, θ ~ N(F_s β_t, K_θ) ∈ R^n is a Gaussian random vector with non-zero mean. Here, the covariance matrix K_θ is defined as the inverse of the precision matrix (i.e., K_θ = Q_θ^{−1}). Note that the precision matrix is positive definite and invertible, and (K_θ)_{ij} = Cov(z_t^(i), z_t^(j)), where (K_θ)_{ij} is the (i, j)-th element of the covariance matrix.

For simplicity, let us define B_t = {β_t, q_t, y_t, θ}. Using Gaussian process regression, the posterior distribution of z_t | B_t ∈ R^n is given by

μ_{z_t|B_t} = F_s β_t + K_θ H_t (H_t^T K_θ H_t + σ^2 I)^{−1} (y_t − H_t^T F_s β_t),
Σ_{z_t|B_t} = K_θ − K_θ H_t (H_t^T K_θ H_t + σ^2 I)^{−1} H_t^T K_θ.   (4.21)

The basic idea behind the model introduced in (4.13), (4.15), and (4.16) stems from the space-time Kalman filter model proposed in [16]. The advantage of this spatio-temporal model with known hyperparameters is that inferences can be made in a recursive manner as the number of observations increases. The zero-mean Gaussian process represents a spatial structure under the assumption that the difference between the parametric mean function and the dynamical environmental process is governed by a relatively large time scale.
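For reference, the Matérn covariance (4.19) with order ρ = 1 can be evaluated directly; the default bandwidth and vertical scale below are taken from the simulation in Section 4.6 (α = 0.01, κ = 10, grid resolution h = 1), and are illustrative only.

```python
import numpy as np
from scipy.special import kv, gamma

def matern(r, rho=1.0, ell=14.14, sigma_f2=0.8):
    """Matern covariance G(r) of (4.19).  kv is the modified Bessel
    function of the second kind; the r -> 0 limit of G is sigma_f2."""
    x = np.sqrt(2 * rho) * np.asarray(r, dtype=float) / ell
    xs = np.where(x > 0, x, 1.0)  # placeholder avoids kv singularity at 0
    val = sigma_f2 * (2 ** (1 - rho) / gamma(rho)) * xs ** rho * kv(rho, xs)
    return np.where(x > 0, val, sigma_f2)
```

For ρ = 1 the small-argument limit x K_1(x) → 1 recovers the vertical scale σ_f^2 at r = 0, and G(r) decays monotonically with distance, as a covariance function should.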
This formulation in turn makes the optimal prediction recursive in time. In this chapter, however, uncertainties in the precision matrix and the sampling positions are treated in a fully Bayesian manner. In addition, in contrast to [12, 16, 53], a GMRF with a sparse precision matrix is used to increase the computational efficiency.

4.5 Bayesian Predictive Inference

4.5.1 Uncertain hyperparameters and exact localization

In this section, we consider the problem of predicting a spatio-temporal random field using successive noisy measurements sampled by a mobile sensor network. For a known covariance function, the prediction can be shown to be recursive [12] based on Gaussian process regression. The uncertainty in θ of a GMRF has been considered, and sequential prediction algorithms have been derived, in [116]; however, only a static field was considered, i.e., μ_t = μ_0. In this section, we use a Bayesian approach to make predictive inferences of the spatio-temporal random field z_t ∈ R^n for the case with uncertain hyperparameters and exact localization. To this end, we use the following assumptions A.1-A.5.

A.1 The spatio-temporal random field is generated by (4.13), (4.15), and (4.16).
A.2 The precision matrix Q_θ is a given function of an uncertain hyperparameter vector θ.
A.3 The noisy measurements {y_t}, as in (4.11), are continuously collected by robotic sensors over time t.
A.4 The sampling positions {q_t} are measured precisely by robotic sensors at each time t.
A.5 The prior distribution of the hyperparameter vector θ is discrete with support Θ = {θ^(1), ..., θ^(L)}.

Remark 4.5.1. A.1 and A.2 stem from the discretization of a Gaussian process as described in Section 4.2. From the model in A.1, the zero-mean GMRF represents a spatial structure under the assumption that the difference between the parametric mean function and the dynamical environmental process is governed by a relatively large time scale.
However, this assumption could be rather strong for cases where the data are better fitted by temporally correlated GMRFs. A.3 is a standard assumption on the measured observations [53]. A.4 is rather strong, but it will be relaxed to A.6 in Section 4.5.2 in order to deal with localization uncertainty. A.5 comes from the discretization of the hyperparameter vector, which replaces an integration with a summation over possible hyperparameters.

For notational simplicity, we denote the full latent field of dimension n + p by x_t = (z_t^T, β_t^T)^T. Let us define D_{k:r} := {P_{k−1}, q_{k:r}, y_{k:r}}, where

P_k = {μ_{x_k|D_{1:k}}, Σ_{x_k|D_{1:k}}} ∪ {P(θ|D_{1:k}) | θ ∈ Θ},

and P_0 is assumed to be known. We formulate the first problem as follows.

Problem 4.5.2. Consider assumptions A.1-A.5. Our problem is to find the predictive distribution, mean, and variance of x_t conditional on D_{1:t}.

In what follows, we summarize the intermediate steps needed to obtain the solution to Problem 4.5.2.

Lemma 4.5.3. Under assumptions A.1 and A.2, the predictive distribution of x_t conditional on the hyperparameter vector θ and the measurements D_{1:t−1} is Gaussian, with mean and precision matrix

μ_{x_t|θ,D_{1:t−1}} = [ F_s μ_{β_t|θ,D_{1:t−1}} ; μ_{β_t|θ,D_{1:t−1}} ],

Q_{x_t|θ,D_{1:t−1}} = [ Q_θ , −Q_θ F_s ; −F_s^T Q_θ , F_s^T Q_θ F_s + Σ_{β_t|θ,D_{1:t−1}}^{−1} ],   (4.22)

where μ_{β_t|θ,D_{1:t−1}} = A_t μ_{β_{t−1}|θ,D_{1:t−1}} denotes the expectation of β_t conditional on θ and D_{1:t−1}, and Σ_{β_t|θ,D_{1:t−1}} = A_t Σ_{β_{t−1}|θ,D_{1:t−1}} A_t^T + B_t W B_t^T denotes the associated estimation error covariance matrix.

For a given hyperparameter vector θ, (4.22) provides the optimal prediction of the spatio-temporal field at time t using data up to time t − 1. The following lemma is used to compute the posterior distribution of θ recursively.

Lemma 4.5.4.
Under assumptions A.3 and A.4, the posterior distribution of the hyperparameter vector θ can be obtained recursively via

P(θ|D_{1:t}) ∝ P(y_t|θ, D_{1:t−1}, q_t) P(θ|D_{1:t−1}),   (4.23)

where the distribution of y_t given {θ, D_{1:t−1}, q_t} is Gaussian with mean and variance

μ_{y_t|θ,D_{1:t−1},q_t} = M_t^T μ_{x_t|θ,D_{1:t−1}},
Σ_{y_t|θ,D_{1:t−1},q_t} = M_t^T Σ_{x_t|θ,D_{1:t−1}} M_t + σ^2 I,   (4.24)

where M_t^T = [H_t^T 0] ∈ R^{N×(n+p)}.

Proof. The posterior distribution of θ given in (4.23) is computed by applying Bayes' rule to P(θ|y_t, D_{1:t−1}). The predictive statistics of y_t|θ, D_{1:t−1} follow directly from (4.11). Note that P(θ|D_{1:t−1}) is equal to P(θ|D_{1:t−1}, q_t).

Lemma 4.5.5. Under assumptions A.1-A.4, the full conditional distribution of x_t for a given hyperparameter vector and data up to time t is x_t|θ, D_{1:t} ~ N(μ_{x_t|θ,D_{1:t}}, Q_{x_t|θ,D_{1:t}}^{−1}), where

Q_{x_t|θ,D_{1:t}} = Q_{x_t|θ,D_{1:t−1}} + σ^{−2} M_t M_t^T,
μ_{x_t|θ,D_{1:t}} = μ_{x_t|θ,D_{1:t−1}} + σ^{−2} Q_{x_t|θ,D_{1:t}}^{−1} M_t (y_t − M_t^T μ_{x_t|θ,D_{1:t−1}}).   (4.25)

Remark 4.5.6. In order to update the prediction error covariance matrix Q_{x_t|θ,D_{1:t}}^{−1} directly, without explicitly inverting the precision matrix, the Woodbury lemma can be used to reduce the computational load as follows:

Q_{x_t|θ,D_{1:t}}^{−1} = Q_{x_t|θ,D_{1:t−1}}^{−1} − Q_{x_t|θ,D_{1:t−1}}^{−1} M_t (σ^2 I + M_t^T Q_{x_t|θ,D_{1:t−1}}^{−1} M_t)^{−1} M_t^T Q_{x_t|θ,D_{1:t−1}}^{−1},   (4.26)

where Q_{x_t|θ,D_{1:t−1}}^{−1} can be computed by blockwise inversion using (4.22):

Q_{x_t|θ,D_{1:t−1}}^{−1} = [ Q_θ^{−1} + F_s Σ_{β_t|θ,D_{1:t−1}} F_s^T , F_s Σ_{β_t|θ,D_{1:t−1}} ; Σ_{β_t|θ,D_{1:t−1}}^T F_s^T , Σ_{β_t|θ,D_{1:t−1}} ].

The blockwise inversion needs to be updated only through Σ_{β_t|θ,D_{1:t−1}}.

The following theorem explicitly shows how the results of Lemma 4.5.4 and Lemma 4.5.5 lead to the predictive statistics of x_t under assumptions A.1-A.5, which gives the solution to Problem 4.5.2.

Theorem 4.5.7.
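One step of Lemma 4.5.4 together with the information-form measurement update (4.25) can be sketched numerically as follows. The toy precision matrices, dimensions, and the two-element hyperparameter support are illustrative assumptions, not the chapter's field model; the point is the shape of the recursion, i.e., reweighting P(θ|D_{1:t}) by the Gaussian evidence of (4.24) while updating (μ, Q) per θ.

```python
import numpy as np

def measurement_update(mu_pred, Q_pred, M, y, sigma2):
    """Information-form update of (4.25):
    Q+ = Q- + sigma^-2 M M^T,
    mu+ = mu- + sigma^-2 Q+^{-1} M (y - M^T mu-)."""
    Q_post = Q_pred + (M @ M.T) / sigma2
    mu_post = mu_pred + np.linalg.solve(Q_post, M @ (y - M.T @ mu_pred)) / sigma2
    return mu_post, Q_post

def log_evidence(mu_pred, Q_pred, M, y, sigma2):
    """log P(y_t | theta, D_{1:t-1}, q_t) from (4.24), with
    Sigma_x = Q_pred^{-1}."""
    S = M.T @ np.linalg.solve(Q_pred, M) + sigma2 * np.eye(len(y))
    r = y - M.T @ mu_pred
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (logdet + r @ np.linalg.solve(S, r) + len(y) * np.log(2 * np.pi))

# One Bayes step of (4.23) over a toy discrete support Theta.
rng = np.random.default_rng(1)
n, N, sigma2 = 6, 2, 0.1
M = np.zeros((n, N)); M[1, 0] = M[4, 1] = 1.0   # selects the observed sites
y = rng.standard_normal(N)
prior = {0.5: 0.5, 2.0: 0.5}                    # precision scale -> prior weight
logw = np.array([np.log(p) + log_evidence(np.zeros(n), s * np.eye(n), M, y, sigma2)
                 for s, p in prior.items()])
w = np.exp(logw - logw.max()); w /= w.sum()     # normalized P(theta | D_{1:t})
mu_post, Q_post = measurement_update(np.zeros(n), np.eye(n), M, y, sigma2)
```

Working in log space and normalizing after subtracting the maximum keeps the weight update numerically stable as the recursion proceeds.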
Under assumption A.5, the predictive distribution of x_t|D_{1:t} is given by

P(x_t|D_{1:t}) = Σ_{θ∈Θ} P(x_t|θ, D_{1:t}) P(θ|D_{1:t}),   (4.27)

where P(θ|D_{1:t}) and P(x_t|θ, D_{1:t}) are given by Lemmas 4.5.4 and 4.5.5, respectively. The predictive mean and variance follow as

μ_{x_t|D_{1:t}} = Σ_{θ∈Θ} μ_{x_t|θ,D_{1:t}} P(θ|D_{1:t}),
Σ_{x_t|D_{1:t}} = Σ_{θ∈Θ} [ Σ_{x_t|θ,D_{1:t}} + (μ_{x_t|θ,D_{1:t}} − μ_{x_t|D_{1:t}})(μ_{x_t|θ,D_{1:t}} − μ_{x_t|D_{1:t}})^T ] P(θ|D_{1:t}).   (4.28)

Proof. The predictive mean and variance are obtained by marginalizing over the conditional distribution of θ given D_{1:t}. The marginal mean and variance satisfy E(Y) = E(E(Y|X)) and Var(Y) = E(Var(Y|X)) + Var(E(Y|X)). Setting Y := x_t|D_{1:t} and X := θ|D_{1:t} completes the proof of Theorem 4.5.7.

Remark 4.5.8. The optimal prediction of the spatio-temporal field x_t|θ, D_{1:t−1} using the predictive statistics of x_{t−1}|θ, D_{1:t−1} is provided by Lemma 4.5.3. Lemma 4.5.5 provides the optimal estimator of x_t|θ, D_{1:t} using the predictive statistics of x_t|θ, D_{1:t−1} given by Lemma 4.5.3. Using Lemma 4.5.3 and Lemma 4.5.5 sequentially, we can update the predictive statistics of x_t|θ, D_{1:t} for known hyperparameters. Lemma 4.5.4 gives the posterior distribution of θ based on the measured data. Finally, Theorem 4.5.7 provides the optimal Bayesian prediction of the spatio-temporal random field with a time-varying mean function and uncertain hyperparameters, by marginalizing over the conditional distribution of θ given D_{1:t}.

Remark 4.5.9. Under assumption A.5, i.e., a discrete prior on the hyperparameter vector θ with support Θ = {θ^(1), ..., θ^(L)}, the predictive distribution of x_t = (z_t^T, β_t^T)^T (or the predictive mean and variance) conditional on D_{1:t} can be computed explicitly and recursively, in constant time as time t (or the number of observations) increases.
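The moment-matching in (4.28) is exactly the law of total expectation and total variance applied to a finite Gaussian mixture, and the same formulas reappear in (4.31), (4.33), and (4.39). A small self-contained sketch:

```python
import numpy as np

def mixture_moments(mus, Sigmas, weights):
    """Predictive mean/covariance of a finite Gaussian mixture, as in
    (4.28): mean is the weighted average of component means; covariance
    is the weighted average of component covariances plus the spread of
    the component means (law of total variance)."""
    mus = np.asarray(mus); Sigmas = np.asarray(Sigmas)
    w = np.asarray(weights)
    mu = (w[:, None] * mus).sum(axis=0)
    d = mus - mu
    Sigma = (w[:, None, None] * (Sigmas + d[:, :, None] * d[:, None, :])).sum(axis=0)
    return mu, Sigma
```

For two equally weighted scalar components N(0, 1) and N(2, 1), the mixture mean is 1 and the mixture variance is 2: the within-component variance 1 plus the between-component spread 1.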
Algorithm 3 Sequential Bayesian predictive inference
Initialization:
1: initialize F_s
2: for each θ ∈ Θ, initialize Q_θ and compute Q_θ^{−1}
At time t ∈ Z_{>0}, do:
1: obtain new observations y_t collected at the current locations q_t
2: find the map M_t from q_t to the spatial sites Q, and compute the radial basis values F_{q_t} at q_t
3: for θ ∈ Θ do
4:   predict μ_{x_t|θ,D_{1:t−1}} and Q_{x_t|θ,D_{1:t−1}} using measurements up to time t − 1, given by (4.22)
5:   compute μ_{x_t|θ,D_{1:t}} and Q_{x_t|θ,D_{1:t}} given by (4.25)
6:   compute μ_{y_t|θ,D_{1:t−1},q_t} and Σ_{y_t|θ,D_{1:t−1},q_t} given by (4.24)
7:   calculate P(θ|D_{1:t}) given by (4.23)
8: end for
9: compute the predictive mean and variance using (4.28)

For a continuous prior distribution P(θ), the summations in (4.27) and (4.28) would need to be replaced by integrations with respect to θ, which must be approximated. The proposed solution to the formulated problem is summarized in Algorithm 3.

4.5.2 Uncertain hyperparameters and localization

In the previous section, we assumed that the localization data q_{1:t} are exactly known. In practice, however, the positions of sensor networks cannot be measured without noise. Instead, for example, there may be several probable possibilities inferred from the measured position. In this section, the method proposed in the previous section is extended to uncertain localization data. To the best of our knowledge, this is the first attempt to obtain predictive statistics for a case where uncertainties in both hyperparameters and localization are simultaneously included from a fully Bayesian perspective. In what follows, we present our idea as a problem statement and the corresponding solution. In order to take the uncertainty in the sampling positions into account, we replace assumption A.4 with the following assumption A.6.
A.6 The prior distribution P(q_t = q̊_t^(i)) is discrete with support q̊_t^(i) ∈ Ω(t), where

Ω(t) = {(s̊^(k_1), ..., s̊^(k_N)) | (k_1, ..., k_N) ∈ L(t)},

which is given at time t along with the corresponding measurement y_t. Here, γ(t) = |L(t)| denotes the number of probable possibilities for q_t.

A straightforward consequence of assumption A.6 is that the prior distribution P(q_{k:r}) is discrete with support Ω(k:r) := Π_{g=k}^{r} Ω(g). In addition, L(k:r) := Π_{g=k}^{r} L(g) is the index set of the support Ω(k:r), and γ(k:r) := Π_{g=k}^{r} γ(g) is the number of probable possibilities for q_{k:r}. Now we state the problem as follows.

Problem 4.5.10. Consider assumptions A.1, A.2, A.3, A.5, and A.6. Our problem is to find the predictive distribution, mean, and variance of x_t conditional on the prior P_0 and the measurements y_{1:t}.

For the sake of conciseness, let us define R_{r:k} := {P_{r−1}, y_{r:k}}. We then have R_{r:k} ⊂ D_{r:k}, where we recall that D_{r:k} := {P_{r−1}, q_{r:k}, y_{r:k}}. To solve Problem 4.5.10, we first seek a way to compute the posterior distribution of q_t, as summarized in the following lemma.

Lemma 4.5.11. Consider P(y_t|θ, D_{1:t−1}^(n), q̊_t^(k)) given by (4.24) and P(θ|D_{1:t−1}^(n)) given by (4.23), where n ∈ L(1:t−1), k ∈ L(t), and D_{1:t−1}^(n) := {P_0, q̊_{1:t−1}^(n), y_{1:t−1}}. Under assumption A.5, we have

P(y_t|D_{1:t−1}^(n), q̊_t^(k)) = Σ_{θ∈Θ} P(y_t|θ, D_{1:t−1}^(n), q̊_t^(k)) P(θ|D_{1:t−1}^(n)).

Under assumption A.6, the posterior distribution of q_{1:t} = (q̊_{1:t−1}^(n), q̊_t^(k)) can be obtained recursively via

P(q̊_{1:t−1}^(n), q̊_t^(k) | R_{1:t}) ∝ P(q̊_{1:t−1}^(n) | R_{1:t−1}) P(y_t|D_{1:t−1}^(n), q̊_t^(k)) P(q̊_t^(k)).   (4.29)

Proof. The proof is a straightforward result of applying Bayes' rule.

We now give the exact solution to Problem 4.5.10 as follows.

Theorem 4.5.12.
Consider the predictive distribution P(x_t|D_{1:t}^(i)) given by Theorem 4.5.7 and the posterior P(q̊_{1:t}^(i) | R_{1:t}) given by Lemma 4.5.11, where q̊_{1:t}^(i) ∈ Ω(1:t) and D_{1:t}^(i) = {P_0, q̊_{1:t}^(i), y_{1:t}}. Under assumption A.6, the predictive distribution of x_t|R_{1:t} can be obtained as follows:

P(x_t|R_{1:t}) = Σ_{i∈L(1:t)} P(x_t|D_{1:t}^(i)) P(q̊_{1:t}^(i) | R_{1:t}).   (4.30)

Consequently, the predictive mean and variance are given by

μ_{x_t|R_{1:t}} = Σ_{i∈L(1:t)} μ_{x_t|D_{1:t}^(i)} P(q̊_{1:t}^(i) | R_{1:t}),
Σ_{x_t|R_{1:t}} = Σ_{i∈L(1:t)} [ Σ_{x_t|D_{1:t}^(i)} + (μ_{x_t|D_{1:t}^(i)} − μ_{x_t|R_{1:t}})(μ_{x_t|D_{1:t}^(i)} − μ_{x_t|R_{1:t}})^T ] P(q̊_{1:t}^(i) | R_{1:t}).   (4.31)

Proof. The proof is similar to that of Theorem 4.5.7. The predictive mean and variance are obtained by marginalizing over the conditional distribution of q_{1:t}. The marginal mean and variance satisfy E(Y) = E(E(Y|X)) and Var(Y) = E(Var(Y|X)) + Var(E(Y|X)). Setting Y := x_t|R_{1:t} and X := q_{1:t}|R_{1:t} proves Theorem 4.5.12.

Remark 4.5.13. The complexity of the proposed algorithm in Theorem 4.5.12 is proportional to the number of possibilities for q_{1:t}. The result of Lemma 4.5.11 enables us to compute these probable possibilities recursively. However, the number of non-zero probable combinations grows exponentially with time: the number of possibilities for q_{1:t} is the product of the number of possibilities for q_{1:t−1} and the number of possibilities for q_t. This growing computational complexity makes the exact solution to Problem 4.5.10, given in Theorem 4.5.12, prohibitive for sensor networks with limited resources.

In what follows, we propose an approximation to the exact solution given in Theorem 4.5.12, with a controllable tradeoff between approximation error and complexity, by including an option of ignoring uncertainties in past position data. This approximation includes, as a special case, the exact solution with the maximal and original complexity.
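The exponential growth in Remark 4.5.13 can be seen directly: the support of q_{1:t} is the Cartesian product of the per-step supports, so γ(1:t) multiplies at every step. A toy sketch with an assumed constant γ(t) = 4 possibilities per step:

```python
import numpy as np
from itertools import product

# gamma(t) possibilities per step; the joint support of q_{1:t} is the
# Cartesian product of the per-step supports, so its size gamma(1:t) is
# the running product of the per-step counts (Remark 4.5.13).
per_step = [4, 4, 4, 4, 4]                 # toy gamma(t) for t = 1..5
support_sizes = np.cumprod(per_step)       # gamma(1:t) for t = 1..5

# Explicit enumeration of the joint support for the first three steps:
joint = list(product(*[range(g) for g in per_step[:3]]))
```

Already at t = 5 the exact mixture in (4.30) has 1024 components, which is why the windowed approximation of Section 4.5.2 (and the m = 1 case of Corollary 4.5.17) is needed on resource-constrained platforms.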
The idea is based on the fact that the estimation of x_t is more susceptible to uncertainties in recently sampled positions than to those in old ones. To formulate our idea clearly, we first present the following results.

Lemma 4.5.14. Using the prior distribution of x_{t−m} and the measured data y_{t−m+1:t}, where m ∈ Z_{>0}, the posterior distribution of q_{t−m+1:t} can be obtained recursively via

P(q̊_{t−m+1:t−1}^(j), q̊_t^(k) | R_{t−m+1:t}) ∝ P(q̊_{t−m+1:t−1}^(j) | R_{t−m+1:t−1}) P(y_t|D_{t−m+1:t−1}^(j), q̊_t^(k)) P(q̊_t^(k)),   (4.32)

where j ∈ L(t−m+1 : t−1) and k ∈ L(t).

Proof. The proof is a straightforward result of applying Bayes' rule.

Theorem 4.5.15. Consider μ_{x_t|D_{t−m+1:t}^(h)} and Σ_{x_t|D_{t−m+1:t}^(h)} computed by Theorem 4.5.7. Under assumption A.6, the predictive statistics of x_t|R_{t−m+1:t} are as follows:

μ_{x_t|R_{t−m+1:t}} = Σ_{h∈L(t−m+1:t)} μ_{x_t|D_{t−m+1:t}^(h)} P(q̊_{t−m+1:t}^(h) | R_{t−m+1:t}),
Σ_{x_t|R_{t−m+1:t}} = Σ_{h∈L(t−m+1:t)} [ Σ_{x_t|D_{t−m+1:t}^(h)} + (μ_{x_t|D_{t−m+1:t}^(h)} − μ_{x_t|R_{t−m+1:t}})(μ_{x_t|D_{t−m+1:t}^(h)} − μ_{x_t|R_{t−m+1:t}})^T ] P(q̊_{t−m+1:t}^(h) | R_{t−m+1:t}).   (4.33)

Proof. The proof is similar to that of Theorem 4.5.12.

To implement approximations to the predictive statistics of x_t|R_{1:t} given by Theorem 4.5.12, we consider the following conditions.

C.1 For 1 ≤ m ≤ t, we have

P(x_t|R_{1:t}) ≈ P(x_t|R_{t−m+1:t}).   (4.34)

C.2 For 1 ≤ m ≤ t, P_t can be approximated by

P_t ≈ {μ_{x_t|R_{t−m+1:t}}, Σ_{x_t|R_{t−m+1:t}}} ∪ {P(θ|R_{t−m+1:t}) | θ ∈ Θ}.   (4.35)

Under conditions C.1 and C.2, it is natural to propose the following approximations:

μ_{x_t|R_{1:t}} ≈ μ_{x_t|R_{t−m+1:t}},   Σ_{x_t|R_{1:t}} ≈ Σ_{x_t|R_{t−m+1:t}}.   (4.36)

Remark 4.5.16. In Theorem 4.5.15, the predictive statistics of x_t|D_{t−m+1:t}^(h) are obtained from Algorithm 3, given in Section 4.5.1. The only difference is that we start from time t−m+1 instead of time 1, with P_{t−m} instead of P_0. Note that without condition C.2 we could not use Algorithm 3 to calculate the statistics of x_t|D_{t−m+1:t}^(h).
The proposed approximation for the case of uncertain localization in (4.36) is quite different from a mere truncation of old data, in the sense that past measurements still affect the current estimation through the approximately updated prior information using (4.35). Note that we update the prior information from P_{t−m} to P_t with the cumulative data collected from time t−m+1 up to time t, which is different from using truncated observations alone. The proposed approximation for the formulated problem is summarized in Algorithm 4. To further understand the nature of the proposed approximation, consider the following two extreme special cases.

Corollary 4.5.17. As a special case of Theorem 4.5.15 with m = 1, the posterior distribution of q_t can be obtained via

P(q̊_t^(k) | P_{t−1}, y_t) ∝ P(y_t | P_{t−1}, q̊_t^(k)) P(q̊_t^(k)),   (4.37)

where k ∈ L(t). The predictive distribution of x_t|R_{1:t} can then be approximated, in constant time as time t increases, in a sequential way:

P(x_t|R_{1:t}) ≈ P(x_t | P_{t−1}, y_t),

where

P(x_t | P_{t−1}, y_t) = Σ_{k∈L(t)} P(x_t | P_{t−1}, q̊_t^(k), y_t) P(q̊_t^(k) | P_{t−1}, y_t),   (4.38)

and the posterior P(q̊_t^(k) | P_{t−1}, y_t) is given by (4.37). Consequently, the predictive mean μ_{x_t|R_{1:t}} and variance Σ_{x_t|R_{1:t}} can be approximated by μ_{x_t|P_{t−1},y_t} and Σ_{x_t|P_{t−1},y_t}, respectively, i.e.,

μ_{x_t|P_{t−1},y_t} = Σ_{k∈L(t)} μ_{x_t|P_{t−1},q̊_t^(k),y_t} P(q̊_t^(k) | P_{t−1}, y_t),
Σ_{x_t|P_{t−1},y_t} = Σ_{k∈L(t)} [ Σ_{x_t|P_{t−1},q̊_t^(k),y_t} + (μ_{x_t|P_{t−1},q̊_t^(k),y_t} − μ_{x_t|P_{t−1},y_t})(μ_{x_t|P_{t−1},q̊_t^(k),y_t} − μ_{x_t|P_{t−1},y_t})^T ] P(q̊_t^(k) | P_{t−1}, y_t).   (4.39)

Corollary 4.5.18. For the other special case, with m = t, Theorem 4.5.15 reduces to Theorem 4.5.12.

Remark 4.5.19. For a fixed m ∈ Z_{>0}, Algorithm 4 is scalable as time t (or the number of observations) increases. In our approach, the level of approximation can be controlled by the user by selecting a tradeoff between approximation error and complexity (i.e., by choosing a value for m).
The most simplistic approximation is obtained by choosing m = 1, as in Corollary 4.5.17. The original exact solution, with maximal complexity, is recovered by selecting m = t, as shown in Corollary 4.5.18.

Algorithm 4 Sequential Bayesian predictive inference approximation with uncertain localization
At time t ∈ Z_{>0}, do:
1: obtain new observations y_t along with the probabilities for the locations, P(q_t)
2: for q̊_{t−m+1:t}^(h) ∈ Ω(t−m+1 : t) do
3:   predict μ_{x_t|D_{t−m+1:t}^(h)}, Σ_{x_t|D_{t−m+1:t}^(h)}, and P(y_t|D_{t−m+1:t−1}^(j), q̊_t^(k)) using Algorithm 3
4:   compute P(q̊_{t−m+1:t}^(h) | R_{t−m+1:t}) using Lemma 4.5.14
5: end for
6: compute μ_{x_t|R_{t−m+1:t}} and Σ_{x_t|R_{t−m+1:t}} using Theorem 4.5.15
7: use the following approximations to update the estimates:
   μ_{x_t|R_{1:t}} ≈ μ_{x_t|R_{t−m+1:t}},   Σ_{x_t|R_{1:t}} ≈ Σ_{x_t|R_{t−m+1:t}},
   P_t ≈ {μ_{x_t|R_{t−m+1:t}}, Σ_{x_t|R_{t−m+1:t}}} ∪ {P(θ|R_{t−m+1:t}) | θ ∈ Θ}

4.5.3 Complexity of the Algorithms

In this section, we discuss complexity aspects of the proposed algorithms. For a fixed number of radial basis functions (i.e., p) and a fixed number of spatial sites (i.e., n), the computational complexity of Algorithm 3 is dominated by (4.25). The complexity of Algorithm 3 in each time step is O(LN^2), where L is the number of possible hyperparameter vectors and N is the number of agents. The complexity of Algorithm 4 at time t is O(γ(t−m+1 : t)) times the complexity of Algorithm 3 over m time steps. Hence, the complexity of Algorithm 4 at time t is O(γ(t−m+1 : t) L N^2 m).

The numbers of spatial sites and radial basis functions affect the complexity of Algorithm 3 as well. The complexity is O(n^3) with respect to the number of spatial sites, due to the matrix inversion. Thus, using finer grids increases the number of spatial sites n, and the complexity increases cubically. For fixed L, n, and N, the complexity of Algorithm 3 with respect to p is O(p^3).
4.6 Simulation Results

In this section, we demonstrate the effectiveness of the proposed sequential Bayesian inference algorithms using a numerical experiment. Consider a scenario in which mobile sensing agents can only move on a regular lattice, denoted by G, shown with circles in Fig. 4.3. This reachable set of the robotic sensors is a subset of the spatial sites (i.e., G ⊂ Q). Note that Q is a finer grid graph on which the GMRF is realized. The spatial sites are shown as intersections of dashed lines in Fig. 4.3.

Assume that some agents measure their true positions exactly while others only know a finite set of probable positions with associated probabilities. Fig. 4.3 shows an illustrative configuration in which agents 1 and 3 measure their sampling positions exactly while agent 2 knows only a set of uncertain possibilities (shown as 2a, 2b, 2c, and 2d) for a given measured sampling position (a solid star). The prior probability that the true sampling position of agent 2 is located at 2a is proportional to the rectangular area between the star and 2d, i.e., 2/16. The other prior probabilities, for 2b, 2c, and 2d, are defined in the same way, which assigns a high probability to a node whose location is close to the observed sampling location. This way of assigning priors guarantees that the prior probabilities sum to one. In this simulation, we allow two agents to take measurements at the same location and time. Repeatedly sampled positions are denoted by large circles in Fig. 4.5-(g). In this illustrative example, we use the spatio-temporal field introduced in Section 4.4.

Figure 4.3: Agents 1 and 3 measure their exact sampling positions while agent 2 has a noisy sampling position measured at the star location. The four possible sampling positions for the true sampling position of agent 2 are 2a, 2b, 2c, and 2d, with prior probabilities 2/16, 6/16, 2/16, and 6/16, respectively.
The spatial sites in Q consist of 51 × 51 grid points, i.e., n = 2601, uniformly distributed over the surveillance region [−25, 25] × [−25, 25]. The time-varying mean function λ_t consists of ten radial basis functions (i.e., p = 10), defined as follows:

f_j(q^[i]) = exp( −‖q^[i] − ξ_j‖^2 / (2σ_j^2) ),   j ∈ {1, ..., p},   (4.40)

where σ_j is the bandwidth and ξ_j is the center location of the j-th radial basis function. The first radial basis function has an infinite bandwidth (i.e., σ_1 = ∞) to represent the average of the field, and the others have bandwidth σ_j = 15. The centers of the radial basis functions are {(0, 0)} ∪ ({−15, 0, 15} × {−15, 0, 15}). The prior distribution of β_0 is chosen to be β_0 ~ N(0, 5I). The time evolution of β_t is modeled by (4.15), where the state matrix A_t and the input matrix B_t are given by 0.95 I_{10×10} and 0.5 I_{10×10}, respectively. The input disturbance covariance is known to be W = I_{10×10}.

Figure 4.4: (a) The realized spatio-temporal field at time t = 2 is the summation of (b) the smooth mean field and (c) the GMRF, as described in (4.13).

We choose the precision matrix Q_θ with hyperparameters α = 0.01, equivalent to a bandwidth ℓ = √2/√α ≈ 14.14, and κ = 10, equivalent to σ_f^2 = 1/(4πακ) ≈ 0.8. The measurement noise variance σ^2 = 0.1 is assumed to be known. The numerically simulated field z_t from our model in (4.13) is shown in Fig. 4.4-(a). From the same realization, the mean function λ_t and the GMRF η_t, as in (4.13), are shown in Fig. 4.4-(b) and Fig. 4.4-(c), respectively.

Ten mobile sensing agents, i.e., N = 10, take measurements at times t ∈ {1, 2, ..., 10}, where only three agents are assumed to have uncertain sampling positions. In Figs. 4.5-(g), (h), and (i), the true, noisy, and probable sampling positions are shown as circles, stars, and corners of squares, respectively. The prediction results are summarized for three methods of prediction as follows.

• Case 1: Figs.
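The radial basis construction in (4.40), including the constant feature obtained from the infinite bandwidth, can be sketched directly; the center layout and bandwidths below mirror the values stated in the text.

```python
import numpy as np

def rbf_features(q, centers, bandwidths):
    """Radial basis values f_j(q) of (4.40).  An infinite bandwidth
    yields exp(0) = 1 for every q, i.e., the constant feature that
    represents the average of the field."""
    q = np.atleast_2d(q)
    d2 = ((q[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * np.asarray(bandwidths) ** 2))

# p = 10 features: one constant plus a 3 x 3 grid of centers, as in the text.
centers = np.array([(0.0, 0.0)]
                   + [(x, y) for x in (-15.0, 0.0, 15.0) for y in (-15.0, 0.0, 15.0)])
bandwidths = np.array([np.inf] + [15.0] * 9)
F = rbf_features(np.array([[0.0, 0.0], [10.0, -5.0]]), centers, bandwidths)
```

Each row of F is the vector f(q)^T of (4.14) at one query location, so stacking the rows over all spatial sites yields the regression matrix F_s used in (4.20).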
4.5-(a), (d), and (g) show the prediction, prediction error variance, and squared (empirical) error fields obtained using Algorithm 3 with the exact sampling positions. With the true sampling positions, the best prediction quality is expected for this case.

• Case 2: Figs. 4.5-(b), (e), and (h) show the resulting fields obtained by naively applying Algorithm 3 to the measured sampling positions, including the noisy locations. The results clearly illustrate that naively applying Algorithm 3 to noisy sampling positions can distort the prediction significantly, as shown in Figs. 4.5-(b) and (h).

• Case 3: Figs. 4.5-(c), (f), and (i) show the resulting fields obtained by applying Algorithm 4 with m = 1. In particular, we use the result from Corollary 4.5.17 (i.e., m = 1) to evaluate the effectiveness of the proposed approximation in marginalizing the uncertainty in the sampling positions. The resulting prediction quality is much improved compared to Case 2 and is even comparable to the result of Case 1.

Cases 1, 2, and 3 will also be applied to realistic experimental data in Section 4.7. Fig. 4.6-(a) shows the RMS errors of the estimated β_t over time for Cases 1, 2, and 3. The RMS error is defined as

RMS(t) = ( (1/p) Σ_{i=1}^{p} (μ_{β_t}^[i] − β_t^[i])^2 )^{1/2}.   (4.41)

Fig. 4.6-(a) clearly illustrates that the RMS errors of Cases 1 and 3 are similar and are lower than that of Case 2 for all time.

Now we examine the convergence of the hyperparameters. The prior distribution of the hyperparameter vector θ is discrete with support

Θ = {(κ, α), (0.1κ, α), (10κ, α), (κ, 0.1α), (κ, 10α)},

with corresponding uniform probabilities {0.2, 0.2, 0.2, 0.2, 0.2}. As shown in Fig. 4.6-(b), the updated posterior probability of the true hyperparameter vector (κ, α) converges to one for Case 1 (blue circles) and Case 3 (red squares) as the number of observations increases. However, Case 2 (green stars) does not converge to the true hyperparameters.
Figure 4.5: The prediction results of Cases 1, 2, and 3 at time t = 2 are shown in the first, second, and third columns, respectively. The first, second, and third rows correspond to the prediction, prediction error variance, and squared empirical error fields between the prediction and true fields. True, noisy, and probable sampling positions are shown as circles, stars, and corners of squares, respectively.

Figure 4.6: (a) The RMS estimation error of β_t vs. time and (b) the posterior probability of the true hyperparameter vector vs. time. The results for Cases 1, 2, and 3 are shown as blue circles, green stars, and red squares, respectively. Note that the results in Figs. 4.4 and 4.5 are obtained at time t = 2. The corresponding values for t = 2 in Fig. 4.6 are highlighted by a gray area.

Fig. 4.7 shows the controllable tradeoff between approximation error and complexity. The complexity of Algorithm 4 increases exponentially with respect to m. The predicted field is compared between Case 1, for which the best prediction quality is expected, and Case 3. Clearly, by increasing m, the mean square difference between Case 1 and Case 3 decreases. However, this improvement costs exponentially increasing computational load. Fig. 4.8 shows the effect of an increasing number of observations with uncertain sampling positions on the three cases. Here we assume that we have seven observations with known true sampling positions plus a few observations with uncertain sampling positions. Clearly, the RMS error decreases in Case 1 and Case 3 by adding new observations with true and uncertain sampling positions, respectively. On the other hand, as shown in Fig. 4.8, adding new observations with noisy sampling positions could increase the RMS error for Case 2.

Figure 4.7: The mean square difference between Case 1 and Case 3 is shown for the different approximation orders m = 1, · · · , 5. On each box, the central mark is the median, and the upper and lower edges of the box are the 25th and 75th percentiles, respectively. The whiskers extend to the most extreme data points.

In this example, the fixed running time using Matlab R2009b (MathWorks) on a PC (2.4 GHz Intel Core 2 Duo processor) is about 3 minutes for the proposed approximation (m = 1), which is fast enough for real-world implementation. In addition, the quality of prediction by Algorithm 4 with a finite m can be as good as that of Algorithm 3 with the true sampling positions.

4.7 Experimental Results

Similarly to [82], a vision-based robotic sensor was built by the authors to validate the proof of concept for the proposed methods. The wireless mobile robot is equipped with two motorized wheels, a micro-controller, a 360-degree omnidirectional camera, a motion sensor, and a wireless receiver and transmitter. The omnidirectional camera is homemade from an inexpensive Wi-Fi remote CCD camera (Ai-Ball Treck®) and a globe mirror. The vision images of the 360-degree environment around the robot produced by the omnidirectional camera are streamed via an 802.11 b/g Wi-Fi interface to a laptop for image processing (see Fig. 4.9).

Figure 4.8: The number of observations with uncertain sampling positions in the three different cases is shown on the horizontal axis. The field prediction RMS errors for Case 1, Case 2, and Case 3 are shown with white, black, and hatched bars, respectively.

In this section, we apply the proposed prediction algorithms to real experimental data. Fig.
4.10 shows our experimental setup, in which a redness intensity field is sampled via the images captured by the CCD camera on top of the mobile robot. The CCD camera captures 360-degree images. The redness intensity is computed by simply averaging the red component of the RGB picture. Noisy measurements are sampled at random sampling positions by our robot. The position of the robot is measured by image processing software built by the authors, and the true sampling positions are obtained manually. The objective is to predict the redness intensity field using the scalar field model proposed in Section 4.4. Note that each image carries a great deal of information; however, we only use the redness of each image as the scalar value of interest. In the future, we plan to extend this experiment to multivariate random fields with multiple features from each image.

Figure 4.9: A vision-based robot built by the authors, equipped with a 360-degree omnidirectional camera.

Figure 4.10: The experimental setup. The measured position of the robot and 4 possible sampling positions are shown as a dark green star and light green squares, respectively. The spatial sites are marked with aqua blue dots on the ground. The panoramic view of the robot is pictured in the upper left-hand side of the figure.

Here we consider two different scenarios. First, we consider a static field, and then a time-varying field with a moving person in the surveillance region.

4.7.1 The Static Experiment

In this study, the spatial sites in Q are considered to be 10 × 26 grid points, i.e., n = 260. The grid points are shown in Fig. 4.10 with aqua-blue dots. The mean function µ_t consists of only one radial basis function that keeps a moving average of the field. Note that this basis function can model the changes in the brightness of the images caused by slow changes in the environment lighting.
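The redness measurement described above reduces each captured RGB image to a single scalar. A minimal sketch of that reduction (ours, not the thesis code; pixel values assumed normalized to [0, 1]):

```python
import numpy as np

def redness_intensity(rgb):
    """Return the mean of the red channel of an H x W x 3 RGB image.

    This plays the role of the scalar measurement taken at the robot's
    current sampling position.
    """
    rgb = np.asarray(rgb, dtype=float)
    return float(rgb[..., 0].mean())

# A synthetic 4 x 4 image: pure red on the left half, black on the right.
img = np.zeros((4, 4, 3))
img[:, :2, 0] = 1.0
print(redness_intensity(img))  # -> 0.5
```

Any other scalar image statistic (e.g., mean brightness) could be substituted without changing the field model, since the GMRF framework only sees the resulting scalar observations.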
The center of the radial basis function is (5, 13), and its bandwidth σ_1 is ∞. The prior distribution of the hyperparameter vector θ is chosen to be discrete with a support Θ = {25, 50, 100, 200, 400} × {0.1, 0.2, 0.4, 0.8, 1.6} and the associated uniform probabilities. The measurement noise variance σ^2 = 0.01 is estimated. To demonstrate the usefulness of our model in (4.13), (4.15), and (4.16), and of our prediction algorithm, the mobile robot takes eighty samples, i.e., N = 80, where only nine sampling positions are uncertain, with four possibilities each. Thus there are 4^9 possible combinations for this set of sampling positions. After two sets of observations, the resulting posterior probabilities of the hyperparameters for Case 1 are shown in Fig. 4.11. Fig. 4.11 shows that the estimated hyperparameters converge to κ = 100 and α = 0.8, which is equivalent to ℓ = √2/√α ≈ 1.58 and σ_f = 1/√(4πακ) ≈ 0.0315. Similarly to Section 4.6, the prediction and prediction error variance are computed for Cases 1, 2, and 3. However, since this is a realistic experiment, the ground truth of the field of interest is not available, and so we are not able to compute the empirical error. The prediction and the prediction error variance using the true sampling positions (Case 1) are shown in Figs. 4.12-(a) and (d), respectively.

Figure 4.11: The posterior probability of the hyperparameter vector at t = 2.

Figs. 4.12-(b) and (e) show the resulting fields obtained by applying Algorithm 3 naively to the measured sampling positions, including noisy locations (Case 2). Figs. 4.12-(c) and (f) show the results of applying Algorithm 4 with m = 1 (Case 3). The results confirm that the quality of the prediction in Case 3 is not much compromised as compared to Case 1, and demonstrate the capability of our proposed algorithm to deal with uncertain sampling positions.

4.7.2 The Dynamic Experiment

In this scenario, the spatial sites are the same as in the previous experiment.
The mean function µ_t consists of twenty-nine radial basis functions. The centers of the radial basis functions are {(5, 13)} ∪ {1, 4, 7, 10} × {1, 5, 9, 13, 17, 21, 25}. The time evolution of β_t is modeled by (4.15), where the state matrix A_t and the input matrix B_t are given by I and 0.05I, respectively. The first radial basis function has an infinite bandwidth (i.e., σ_1 = ∞) to represent the average of the field, and the others have a bandwidth equal to σ_j = 4.

Figure 4.12: The prediction results of Cases 1, 2, and 3 at time t = 2 are shown in the first, second, and third columns, respectively. The first and second rows correspond to the predictions and the natural logarithm of the prediction error variance.

In this experiment, a person moves in the surveillance region while the robot is collecting observations. The robot collects 57, 40, 31, and 45 observations corresponding to times t = 1, 2, 3, and 4, respectively. The first column in Fig. 4.13 shows the positions of the moving robot and person in the domain. As mentioned, the position of the robot is measured with a fixed camera and the help of image processing software. Sometimes, the robot is not visible to the positioning system, since the moving person blocks the visual contact between the robot and the positioning system. The black areas in the second column of Fig. 4.13 show the blind spots of the positioning system. Note that other positioning systems, such as GPS, also have blind spots, e.g., GPS-denied areas. Therefore, when the robot moves into a blind spot, its position cannot be determined precisely. In this case, we assign different probabilities to multiple sampling positions. True sampling positions at each time step are shown with blue dots in the second column of Fig.
4.13, while only the blue dots in the white area are measured through the positioning system; the positions in the blind spots are recorded manually for comparison purposes. The prediction results of Case 1 and Case 3 are compared with a trivial method of prediction defined as follows.

• Case 4: The fifth column of Fig. 4.13 shows the resulting fields obtained by applying Algorithm 3 to only the observed sampling positions. Here, all the observations whose sampling positions are uncertain are discarded. In particular, 12, 3, 2, and 4 observations have been discarded for times t = 1, 2, 3, and 4, respectively.

The predicted fields using the true sampling positions (Case 1) and the uncertain sampling positions (Case 3) are shown in the third and fourth columns of Fig. 4.13, respectively. Since in the fully automated experiment the true sampling positions in the blind spots are not available, we used the algorithm proposed in this chapter to deal with uncertain sampling positions in the blind spot areas. The predicted field obtained by simply discarding uncertain sampling positions is shown in the fifth column of Fig. 4.13. Clearly, the result obtained for Case 3 with m = 1 is comparable to the result for Case 1, which has the best prediction quality expected among all cases. Thus, the experimental results demonstrate the effectiveness of the algorithm proposed in this chapter.

Figure 4.13: The time-varying field is shown for 4 time iterations. The first column shows the moving object in the field. The second column shows the sampling positions with blue dots and the positioning blind spots with black areas.
The third, fourth, and fifth columns show the predicted fields for Case 1, Case 3, and Case 4, respectively. The rows correspond to the time steps t = 1, 2, 3, and 4.

4.8 Conclusion

We have tackled the problem of predicting a spatio-temporal field using successive noisy measurements obtained by robotic sensors, some of which have uncertain localization. We modeled the spatio-temporal field of interest by a GMRF and designed sequential prediction algorithms for computing the exact and approximated predictive inference from a Bayesian point of view. The most important contribution is that the computation times for Algorithm 3 and Algorithm 4 with a finite m at each time step do not grow as the number of measurements increases. Two different experimental results provide a solid proof of concept for the proposed scheme. The proposed algorithm will be useful for robotics applications such as environmental monitoring by autonomous aquatic robots [40], decentralized environmental modeling by mobile sensor networks [64], and gradient climbing of environmental fields [10, 62], with or without localization uncertainty. Furthermore, our approach can provide a fully Bayesian way to automatically tune the hyperparameters in Gaussian process-based SLAM [7, 51, 77, 105], at the cost of increased computational complexity.

Chapter 5

Simultaneous Localization and Spatial Prediction

Abstract: This chapter investigates a fully Bayesian way to solve the simultaneous localization and spatial prediction (SLAP) problem using a Gaussian Markov random field (GMRF) model. The objective is to simultaneously localize robotic sensors and predict a spatial field of interest using sequentially obtained noisy observations collected by the robotic sensors. The set of observations consists of the observed uncertain poses of the robotic sensing vehicles and noisy measurements of a spatial field. To be flexible, the spatial field of interest is modeled by a GMRF with uncertain hyperparameters.
We derive an approximate Bayesian solution to the problem of computing the predictive inferences of the GMRF and the localization, taking into account observations, uncertain hyperparameters, measurement noise, the kinematics of the robotic sensors, and uncertain localization. The effectiveness of the proposed algorithm is illustrated by simulation results.

5.1 Introduction

The simultaneous localization and mapping (SLAM) problem must be solved for a robot to explore an unknown environment under localization uncertainty [21]. The variations of the SLAM problem are surveyed and categorized from different perspectives in [97]. In general, most SLAM problems have strong geometric models [21, 32, 61, 71, 72, 97]. For example, a robot learns the locations of landmarks while localizing itself using triangulation algorithms. Such geometric models can be classified into two groups, viz., a sparse set of features that can be individually identified, often used in Kalman filtering methods [21], and a dense representation such as an occupancy grid, often used in particle filtering methods [31]. In contrast to the SLAM problem with popular geometric models, there is a growing number of practical scenarios in which no such geometric model exists. Consider localization using various spatially distributed (continuous) signals, such as distributed wireless Ethernet signal strength [27] or multi-dimensional magnetic fields [105]. Underwater autonomous gliders for ocean sampling cannot find the usual geometric models from measurements of environmental variables such as pH, salinity, and temperature [62]. Furthermore, there are reasons to avoid the geometric model even when one does exist.
Such cases may include: 1) the difficulty of reliably extracting sparse, stable features; 2) the ability to use all sensory data directly rather than a relatively small amount of abstracted discrete information obtained from feature extraction algorithms; and 3) the high computational and storage costs of dealing with dense features. Motivated by the aforementioned situations, in this chapter, we consider scenarios without geometric models and tackle the problem of simultaneous localization and prediction (SLAP) of a spatial field.

Nonparametric modeling and prediction techniques for random fields have been exploited for mobile robotic sensors [10, 12, 62, 64, 117]. Random fields such as Gaussian processes and Gaussian Markov random fields (GMRFs) [14, 84] have frequently been used by mobile sensor networks to statistically model physical phenomena such as harmful algal blooms, pH, salinity, temperature, and wireless signal strength, e.g., [30, 57, 119]. The recent research efforts that are closely related to our problem are summarized as follows. In [43], the authors formulated Gaussian process regression under uncertain localization. In [42], the authors used a GMRF with uncertain hyperparameters and tackled the problem of predicting the random field under localization uncertainty. However, the kinematics or dynamics of the sensor vehicles were not incorporated in [42, 43]. In [7], Gaussian process regression was used to model geo-referenced sensor measurements (obtained from a camera). After training with data consisting of noisy measurements and their exact sampling positions, a maximum likelihood estimator was used to find the best match for the location of each newly sampled measurement. However, this was not SLAM, since the training step has to be performed a priori for a given environment [7]. In [51, 105], Gaussian process regression was also used to implement SLAM using a magnetic field, and the feasibility of such approaches was shown experimentally.
The work in [77] used laser range-finder data to probabilistically classify the robot's environment into regions of occupancy. It provides a continuous representation of the robot's surroundings by employing a Gaussian process. In [27], the so-called WiFi-SLAM problem was solved using a Gaussian process latent variable model (GP-LVM). However, accurately known training data and independence across the dimensions and instantiations of the data were assumed in [27], which may not be practical. To the best of our knowledge, most work related to our SLAP problem did not address uncertainties in the hyperparameters of the Gaussian process in a fully Bayesian way. In most of the previous work, the hyperparameters of the model were estimated offline a priori.

In this chapter, we formulate the simultaneous localization and spatial prediction (SLAP) problem, in order to simultaneously localize robotic sensors and predict a spatial random field of interest using sequentially obtained noisy observations collected by the robotic sensors. The set of observations consists of the observed uncertain poses of the robotic sensing vehicles and noisy measurements of a spatial field. To be flexible, the spatial field of interest is modeled by a GMRF with uncertain hyperparameters. We then derive an approximate Bayesian solution to the problem of computing the predictive inferences of the GMRF and the localization, taking into account observations, uncertain hyperparameters, measurement noise, the kinematics of the robotic sensors, and uncertain localization. The effectiveness of the proposed algorithm is illustrated by simulation results.

5.2 Sequential Bayesian inference with a GMRF

In this section, we define a GMRF model in detail, formulate the problem, and provide its solution.

5.2.1 Gaussian processes and Gaussian Markov random fields

In this section, we briefly introduce a GMRF as a discretized Gaussian process on a lattice.
Consider a Gaussian process: z ∼ GP(µ, K), where µ is the mean vector and K ∈ R^{n×n} is the covariance matrix. We discretize the compact domain S := [0, max(s_1)] × [0, max(s_2)] into n spatial sites Q := { s̊^{(1)}, · · · , s̊^{(n)} }, where n = h max(s_1) × h max(s_2), and h is chosen such that n ∈ Z_{>0}. Note that n → ∞ as h → ∞. The collection of realized values of the random field on Q is denoted by z := (z^{(1)}, · · · , z^{(n)})^T ∈ R^n, where z^{(i)} := z(s̊^{(i)}). The prior distribution of z is given by N(µ, K). We then have

P(z) ∝ \exp\left( -\frac{1}{2} (z - µ)^T K^{-1} (z - µ) \right). (5.1)

The (i, j)-th element of K is defined as (K)_{ij} = Cov(z^{(i)}, z^{(j)}). The prior distribution of z can equivalently be written via a precision matrix Q = K^{-1}, i.e., z ∼ N(µ, Q^{-1}). This can be viewed as a discretized version of the Gaussian process (or a GMRF) with a precision matrix Q on Q. Note that the Q of this GMRF is not sparse. However, a sparse version of Q, i.e., a Q̂ with local neighborhoods that can represent the original Gaussian process, can be found, for example, by making Q̂ close to Q in some norm [15, 35, 90]. This approximate GMRF will be computationally efficient due to the sparsity of Q̂. In our simulation study, we will use a GMRF with a sparse precision matrix that represents a Gaussian process precisely, as shown in [42, 63]. However, any parameterization of µ_θ and Q_θ, where θ is the hyperparameter vector, can be used.

5.2.2 Multiple robotic sensors

Consider N spatially distributed robots with sensors indexed by j ∈ J := {1, · · · , N}, sampling at times t ∈ Z_{>0}. Suppose that the sampling time t ∈ Z_{>0} is discrete. Recall that the surveillance region is discretized as a lattice that consists of n spatial sites, whose set is denoted by Q. Let the n spatial sites in Q be indexed by I := {1, · · · , n}, and let z := col(z^{(1)}, · · · , z^{(n)}) ∈ R^n be the corresponding static values of the scalar field at the n spatial sites.
We denote all robots' locations at time t by q_t = col(q_t^{[1]}, · · · , q_t^{[N]}) ∈ Q^N, the observations made by all robots at time t by y_t = col(y_t^{[1]}, · · · , y_t^{[N]}) ∈ R^N, and the observed states of all robots at time t by s̃_t = col(s̃_t^{[1]}, · · · , s̃_t^{[N]}) ∈ S^N. Here s̃_t and y_t are noisy observations of s_t and z, respectively. At time t, robot j takes a noise-corrupted measurement at its current location q_t^{[j]} = s̊^{(i)} ∈ Q, ∀j ∈ J, i ∈ I, viz.,

y_t^{[j]} = z^{(i)} + ε_t^{[j]}, (5.2)

where the measurement errors {ε_t^{[j]}} are assumed to be independent and identically distributed (i.i.d.) Gaussian white noise, i.e., ε_t^{[j]} ~ i.i.d. N(0, σ^2). The measurement noise level σ^2 > 0 is assumed to be known, and we define ε_t := col(ε_t^{[1]}, · · · , ε_t^{[N]}) ∈ R^N. In addition, at time t, robot j takes a noisy observation of its own vehicle position,

s̃_t^{[j]} = q_t^{[j]} + e_t^{[j]}, (5.3)

where the observation errors {e_t^{[j]}} are distributed as e_t^{[j]} ~ i.i.d. N(0, σ_e^2 I). The observation noise level σ_e^2 > 0 is assumed to be known, and we define e_t := col(e_t^{[1]}, · · · , e_t^{[N]}) ∈ R^{2N}. Our models can be represented in the concise collective notation

y_t = H_t^T z + ε_t, s̃_t = L_t q_t + e_t, (5.4)

where L_t is the observation matrix for the vehicle states, and H_τ ∈ R^{n×N} is defined by (H_τ)_{ij} = 1 if q_τ^{[j]} = s̊^{(i)}, and 0 otherwise.

5.2.3 Kinematics of robotic vehicles

In this section, we introduce a specific model for the motion of the robotic vehicles. Each robotic sensor is modeled by a nonholonomic, differentially driven vehicle in a two-dimensional domain, i.e., S ⊂ R^2. In this case, an equation of motion for robot i [86] may be given by

\begin{pmatrix} \dot{s}_{1,t}^{[i]} \\ \dot{s}_{2,t}^{[i]} \end{pmatrix} = \begin{pmatrix} u_t^{[i]} \cos ψ_t^{[i]} \\ u_t^{[i]} \sin ψ_t^{[i]} \end{pmatrix} + β_t^{[i]}, (5.5)

where s_{1,t}^{[i]}, s_{2,t}^{[i]}, ψ_t^{[i]}, u_t^{[i]}, and β_t^{[i]} denote the inertial position, the orientation, the linear speed, and the system noise of robot i at time t, respectively.
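The collective measurement model in (5.4) can be illustrated with a small sketch (our own, not the dissertation's code), where H_t is applied as H_t^T z so that the dimensions are consistent with H_t ∈ R^{n×N}:

```python
import numpy as np

def selection_matrix(robot_sites, n):
    """Build H_t in R^{n x N}: (H_t)_{ij} = 1 iff robot j sits at site i."""
    H = np.zeros((n, len(robot_sites)))
    for j, i in enumerate(robot_sites):
        H[i, j] = 1.0
    return H

rng = np.random.default_rng(0)
n, sigma = 6, 0.1
z = rng.normal(size=n)                    # field values at the n lattice sites
H = selection_matrix([2, 5], n)           # two robots sampling at sites 2 and 5
y = H.T @ z + sigma * rng.normal(size=2)  # noisy observations, as in (5.2)/(5.4)
```

Here H_t simply picks out the field values at the occupied lattice sites; with sigma = 0, y would equal (z^{(3)}, z^{(6)})^T in the document's one-based indexing.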
In this case, the kinematics of the vehicle network can be further described as follows:

s_{t+1} = s_t + F_t u_t + v_t, (5.6)

where u_t is a known control input and v_t is i.i.d. white noise drawn from a known normal distribution N(0, Σ_{v_t}). The evolution of the location of the robot can be detailed further as follows:

q_{t+1} = Q(q_t + F_t u_t + v_t) = q_t + F_t u_t + w_t, (5.7)

where Q : S → Q is the nearest-neighbor-rule quantizer that takes an input and returns its projected value on the lattice. Here v_t is the process noise and w_t absorbs both the process noise and the quantization error between the continuous and discretized states, i.e., w_t = Q(q_t + F_t u_t + v_t) − (q_t + F_t u_t). As the cardinality of Q increases, we have that w_t → v_t. A special case of (5.7) arises when F_t u_t is controlled and w_t is chosen such that the next location q_{t+1} is on a grid point; in this case, we have v_t = w_t. Assuming that {ψ_t^{[i]} | ∀i ∈ J} in (5.5) can be measured precisely, F_t ∈ R^{2N×N} in (5.6) is obtained as the block-diagonal matrix

F_t = Δt · blkdiag\left( \begin{pmatrix} \cos ψ_t^{[1]} \\ \sin ψ_t^{[1]} \end{pmatrix}, · · · , \begin{pmatrix} \cos ψ_t^{[N]} \\ \sin ψ_t^{[N]} \end{pmatrix} \right),

where Δt = 1 is the sampling time. We denote the collections of cumulative robot locations, cumulative observations, and cumulative control inputs from time 1 to time t by s̃_{1:t} := col(s̃_1, · · · , s̃_t) ∈ S^{Nt}, y_{1:t} := col(y_1, · · · , y_t) ∈ R^{Nt}, and u_{1:t} := col(u_1, · · · , u_t) ∈ R^{Nt}, respectively.

5.2.4 Problem formulation and its Bayesian predictive inference

In this section, we formulate the SLAP problem and provide its Bayesian solution. To be precise, we present the following assumptions A.1-A.5 for the problem formulation.

A.1 The scalar random field z is generated by a GMRF model, which is given by z ∼ N(µ_θ, Q_θ^{-1}), where µ_θ and Q_θ are given functions of a hyperparameter vector θ.
A.2 The noisy measurements {y_t} and the noisy sampling positions {s̃_t}, as in (5.4), are collected by the robotic sensors at times t = 1, 2, · · · .

A.3 The control input u_t is a known deterministic vector at each time t.

A.4 The prior distribution of the hyperparameter vector θ is discrete with a support Θ = {θ^{(1)}, · · · , θ^{(L)}}.

A.5 The prior distribution P(q_t = q̊_t^{(i)}) is discrete with a support q̊_t^{(i)} ∈ Ω(t), where Ω(t) = { (s̊^{(k_1)}, · · · , s̊^{(k_N)}) | (k_1, · · · , k_N) ∈ L(t) }, which is given at time t. Here, γ(t) = |L(t)| denotes the number of probable possibilities for q_t.

Problem 5.2.1. Consider assumptions A.1-A.5. Our problem is to simultaneously find the predictive distributions, means, and variances for both z and q_t conditional on D_t := {y_{1:t}, s̃_{1:t}}.

The solution to Problem 5.2.1 is derived as follows. The distribution of the GMRF is given by P(z | θ, D_{t−1}) = N(µ_{z|θ,D_{t−1}}, Σ_{z|θ,D_{t−1}}). Recall that the evolution of q_t is given by (5.7) and that the input u_t is a known deterministic vector at time t. Therefore, P(q_t | D_{t−1}) can be updated by the Gaussian approximation of P(q_{t−1} | D_{t−1}):

P(q_t | D_{t−1}) ∝ \exp\left( -\frac{1}{2} (q_t − µ_{q_{t−1}|D_{t−1}} − F_{t−1} u_{t−1})^T (Σ_{q_{t−1}|D_{t−1}} + Σ_{w_{t−1}})^{-1} (q_t − µ_{q_{t−1}|D_{t−1}} − F_{t−1} u_{t−1}) \right). (5.8)

Similarly, P(y_t | θ, D_{t−1}, q_t) is updated by the Gaussian approximation of P(z | θ, D_{t−1}) as follows:

P(y_t | θ, D_{t−1}, q_t) ≈ N( H_t^T µ_{z|θ,D_{t−1}}, Σ_{ε_t} + H_t^T Σ_{z|θ,D_{t−1}} H_t ). (5.9)

Remark 5.2.2. For the sake of reducing memory usage and complexity, the distributions of q_t | D_{t−1} and y_t | θ, D_{t−1}, q_t are approximated by normal distributions.

The joint distribution of z, q_t, θ | D_{t−1} is obtained as follows:

P(z, q_t, θ | D_{t−1}) = P(z | θ, q_t, D_{t−1}) P(θ | q_t, D_{t−1}) P(q_t | D_{t−1}). (5.10)

The observation model is given by (5.4); thus the probabilities of the observed data are P(y_t | z, q_t) = N(H_t^T z, Σ_{ε_t}) and P(s̃_t | q_t) = N(L_t q_t, Σ_{e_t}).
The measured random variables have the following conditional joint distribution:

P(y_t, s̃_t | z, q_t, θ, D_{t−1}) = P(y_t | z, θ, q_t, D_{t−1}) P(s̃_t | q_t). (5.11)

From Bayes' rule, the posterior joint distribution of the scalar field values, the sampling positions, and the hyperparameter vector is given as follows:

P(z, q_t, θ | D_t) = \frac{ P(y_t, s̃_t | z, q_t, θ, D_{t−1}) P(z, q_t, θ | D_{t−1}) }{ P(y_t, s̃_t | D_{t−1}) }. (5.12)

In addition, P(q_t, θ | D_t) = ∫ P(z, q_t, θ | D_t) dz is given as follows:

∫ P(z, q_t, θ | D_t) dz = \frac{ P(s̃_t | q_t) P(θ | q_t, D_{t−1}) P(q_t | D_{t−1}) }{ P(y_t, s̃_t | D_{t−1}) } × ∫ P(z | θ, q_t, D_{t−1}) P(y_t | z, θ, q_t, D_{t−1}) dz, (5.13)

where ∫ P(z | θ, q_t, D_{t−1}) P(y_t | z, θ, q_t, D_{t−1}) dz = P(y_t | q_t, θ, D_{t−1}), and P(y_t | q_t, θ, D_{t−1}) is given by (5.9).

Remark 5.2.3. From Bayes' rule, P(θ | q_t, D_{t−1}) is given by P(θ, q_t | D_{t−1}) / P(q_t | D_{t−1}). We can compute P(θ, q_t | D_{t−1}) for all the possible combinations of q_t in the previous iteration using (5.13). However, for the sake of reducing the computational cost, we approximate P(θ | q_t, D_{t−1}) by P(θ | D_{t−1}). Therefore, we have

P(q_t, θ | D_t) ≈ \frac{ P(s̃_t | q_t) P(θ | D_{t−1}) P(q_t | D_{t−1}) P(y_t | q_t, θ, D_{t−1}) }{ P(y_t, s̃_t | D_{t−1}) }.

Marginalizing out the uncertainties over the possible q_t and θ, we obtain

P(z, θ | D_t) = \sum_{q_t ∈ Ω(t)} P(z, q_t, θ | D_t), \quad P(z | D_t) = \sum_{θ ∈ Θ} P(z, θ | D_t). (5.14)

Our estimates of q_t and θ can be corrected using the measured data up to time t as follows:

P(q_t | D_t) = \sum_{θ ∈ Θ} P(q_t, θ | D_t), \quad P(θ | D_t) = \sum_{q_t ∈ Ω(t)} P(q_t, θ | D_t). (5.15)

The predictive probability and the mean value of z | θ, D_t are obtained as follows:

P(z | θ, D_t) = \frac{ P(z, θ | D_t) }{ P(θ | D_t) }, \quad µ_{z|θ,D_t} = \frac{1}{ P(θ | D_t) } \sum_{q_t ∈ Ω(t)} µ_{z|q_t,θ,D_t} P(q_t, θ | D_t). (5.16)

The predictive covariance matrix of z | θ, D_t can be obtained using the law of total variance, Σ_{z|θ,D_t} = E[ Σ_{z|q_t,θ,D_t} ] + Cov[ µ_{z|q_t,θ,D_t} ], where E and Cov are computed over the random variable q_t. These terms are obtained as follows.
E[ Σ_{z|q_t,θ,D_t} ] = \sum_{q_t ∈ Ω(t)} Σ_{z|q_t,θ,D_t} P(q_t | θ, D_t),

Cov[ µ_{z|q_t,θ,D_t} ] = \sum_{q_t ∈ Ω(t)} ( µ_{z|q_t,θ,D_t} − µ_{z|θ,D_t} ) ( µ_{z|q_t,θ,D_t} − µ_{z|θ,D_t} )^T P(q_t | θ, D_t), (5.17)

where the predictive mean and covariance of z | q_t, θ, D_t are calculated using Gaussian process regression as follows:

µ_{z|q_t,θ,D_t} = µ_{z|θ,D_{t−1}} + Σ_{z|θ,D_{t−1}} H_{q_t} Σ^{-1}_{y_t|θ,D_{t−1},q_t} ( y_t − µ_{y_t|θ,D_{t−1},q_t} ),

Σ_{z|q_t,θ,D_t} = Σ_{z|θ,D_{t−1}} − Σ_{z|θ,D_{t−1}} H_{q_t} Σ^{-1}_{y_t|θ,D_{t−1},q_t} H_{q_t}^T Σ_{z|θ,D_{t−1}}. (5.18)

Finally, the first and second moments of q_t | D_t are obtained as follows:

µ_{q_t|D_t} = \sum_{q_t ∈ Ω(t)} q_t P(q_t | D_t), \quad Σ_{q_t|D_t} = \sum_{q_t ∈ Ω(t)} ( q_t − µ_{q_t|D_t} )^2 P(q_t | D_t).

5.3 Simulation Results

In this section, we demonstrate the effectiveness of the proposed sequential Bayesian inference algorithm using a numerical experiment. Consider a robot moving in a discretized surveillance region Q. The spatial sites in Q consist of 31 × 31 grid points, i.e., n = 961, uniformly distributed over the surveillance region S := [−15, 15] × [−15, 15]. In this illustrative example, we realize the spatial field developed in [42], which is a GMRF wrapped around a torus structure. Thus the top edge (respectively, the left edge) and the bottom edge (respectively, the right edge) are neighbors of each other. The parameters of the model in [42] are selected as follows. The mean vector µ_θ is chosen to be zero, and the precision matrix Q_θ is chosen with hyperparameters α = 0.1, equivalent to a bandwidth ℓ = √2/√α ≈ 4.47, and κ = 50, equivalent to σ_f^2 = 1/(4πακ) ≈ 0.016. The prior distribution of the hyperparameter vector θ is discrete with a support Θ = {(κ, α), (0.1κ, α), (10κ, α), (κ, 0.1α), (κ, 10α)}, along with the corresponding uniform probabilities {0.2, 0.2, 0.2, 0.2, 0.2}. The measurement noise variance in (5.2) is given by σ^2 = 0.1. A robot takes measurements at times t ∈ {1, 2, · · · , 100} with localization uncertainty. In Figs.
5.1-(d), (e), and (f), true, noisy, and probable sampling positions are shown as circles, stars, and dots, respectively, at time t = 100. In this simulation, the standard deviation of the noise in the observed sampling positions is given by σ_e = 10 in (5.3). The probable sampling positions that form the support Ω(t) are selected within the confidence region of Pr( ‖ s̊_t^{(i)} − q̃_t^{[j]} ‖ ≤ σ_e ). The results of the simultaneous localization and spatial prediction are summarized for three methods as follows.

• Case 1: Figs. 5.1-(a), (d), and (g) show the prediction, prediction error variance, and squared (empirical) error fields using the exact sampling positions. With the true sampling positions, the best prediction quality is expected for this case.

• Case 2: Figs. 5.1-(b), (e), and (h) show the results obtained by using the sampled noisy positions. The results clearly illustrate that naively applying GMRF regression to noisy sampling positions can distort the prediction at a significant level. Fig. 5.1-(h) shows that the squared error in this case is considerably higher than that of Case 1.

• Case 3: Figs. 5.1-(c), (f), and (i) show the results obtained by applying the proposed approach of Section 5.2.4. The resulting prediction quality is much improved as compared to Case 2 and is even comparable to that of Case 1.

The averaged squared errors over time and space, using the true sampling positions (Case 1), the noisy sampling positions (Case 2), and the uncertain sampling positions (Case 3), are 0.0837 × 10^{−3}, 0.1664 × 10^{−3}, and 0.0989 × 10^{−3}, respectively. This shows the effectiveness of our solution to Problem 5.2.1. The true positions of the robot for times t ∈ T := {10, 11, · · · , 30} are shown in Fig. 5.2 by red diamonds and lines. The estimated sampling positions of the robot, E(q_t | D_t) for t ∈ T, are shown as blue dots with estimated confidence regions. Fig.
5.2 clearly shows that the proposed approach in this chapter significantly reduces the localization uncertainty as compared to the noise level of the sampled positions (denoted by green stars). In this example, the fixed running time using Matlab R2009b (MathWorks) on a PC (3.2 GHz Intel i7 Processor) is about 40 seconds for each time iteration, which is fast enough for real-world implementation.

5.4 Conclusion

In this chapter, we provide an approximate Bayesian solution to the problem of simultaneous localization and spatial prediction (SLAP), taking into account the kinematics of robots and uncertainties in the precision matrix, the sampling positions, and the measurements of a GMRF in a fully Bayesian manner. In contrast to [42], the kinematics of the robotic vehicles are integrated into the inference algorithm.

Figure 5.1: The prediction results of Cases 1, 2, and 3 at time t = 100 are shown in the first, second, and third columns, respectively. The first, second, and third rows correspond to the prediction, prediction error variance, and squared empirical error fields between the prediction and true fields. True, noisy, and probable sampling positions are shown in circles, stars, and dots, respectively. The x and y axes represent the 2-D location, and colors represent the value of the desired quantity at each location.

The simulation results show that the proposed approach successfully estimates the sampling positions and predicts the spatial field along with their prediction error variances, in a fixed computational time.
The simulation study suggests that the complexity of the proposed scalable inference algorithm is affordable for a robot to operate in real-world situations.

Figure 5.2: The trajectories of the true, predicted, and noisy sampling positions of the robot are shown by red diamonds, blue dots, and green stars for time t ∈ {10, 11, · · · , 30}. The blue ellipsoids show the confidence regions of about 68% for the estimated sampling positions.

Chapter 6

Conclusion and Future Work

In this chapter, we briefly summarize the key contributions presented in this dissertation and propose some directions for future work.

6.1 Conclusion

In Chapter 2, we have formulated Gaussian process regression with observations under localization uncertainty. Effects of measurement noise, localization uncertainty, and prior distributions have all been correctly incorporated in the posterior predictive statistics in a Bayesian approach. We have reviewed Monte Carlo sampling and Laplace's method, which have been applied to compute the analytically intractable posterior predictive statistics of Gaussian processes with localization uncertainty. The approximation error and complexity of all the proposed approaches have been analyzed. In particular, we have provided tradeoffs between the error and complexity of Laplace approximations of different degrees, such that one can choose a tradeoff taking into account the performance requirements and the computational complexity imposed by the resource-constrained sensor network. The simulation study demonstrated that the proposed approaches perform much better than approaches that do not properly account for the localization uncertainty.

In Chapter 3, the multivariate Gaussian process model is utilized to exploit vision data for mobile robot localization.
The predictive statistics of the vision data are learned in advance and used to estimate the position of a mobile robot equipped only with an omnidirectional camera. For the sake of flexibility, the multivariate GP model with unknown hyperparameters is used to model the vision data. The experiments show good positioning results within a reasonable computational time.

In Chapter 4, the problem of predicting a spatio-temporal field using successive noisy measurements is tackled. We modeled the spatio-temporal field of interest as a GMRF and designed sequential prediction algorithms for computing the exact and approximated predictive inference from a Bayesian point of view. The most important contribution is that the computation times for the proposed algorithms do not grow as the number of measurements increases.

In Chapter 5, we provide an approximate Bayesian solution to the problem of simultaneous localization and spatial prediction (SLAP), taking into account the kinematics of robots and uncertainties in the precision matrix, the sampling positions, and the measurements of a GMRF in a fully Bayesian manner. In contrast to Chapter 4, the kinematics of the robotic vehicles are integrated into the inference algorithm. The simulation results show that the proposed approach successfully estimates the sampling positions and predicts the spatial field along with their prediction error variances, in a fixed computational time. The simulation study suggests that the complexity of the proposed scalable inference algorithm is affordable for a robot to operate in real-world situations.

6.2 Future Work

In the long term, we plan to expand our current work and explore the following directions.

Although optimal sampling was not pursued in this dissertation, it will be interesting to consider optimal sampling strategies for mobile sensing agents subject to complicated vehicle dynamics.
The distributed implementation of the proposed algorithms is practically important, especially for a sensor network with limited computational power. To this end, we are interested in investigating efficient implementations and emerging technologies such as GPU computing to speed up the current algorithms for real-time applications.

Expanding the work on spatial modeling using GMRFs to deal with a general spatio-temporal process is an important direction to follow in the future.

Implementing the developed algorithms in more challenging and more realistic experiments will play a key role in expanding the scope of applications of the methods proposed in this dissertation.

BIBLIOGRAPHY

[1] Xsens company, MTi-G, miniature AHRS with integrated GPS, 2009.
[2] C. Andrieu, N. De Freitas, A. Doucet, and M.I. Jordan. An introduction to MCMC for machine learning. Machine Learning, 50(1):5–43, 2003.
[3] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, New York, 2006.
[4] P. Blaer and P. Allen. Topological mobile robot localization using fast vision techniques. In Proceedings of the IEEE International Conference on Robotics and Automation, volume 1, pages 1031–1036, 2002.
[5] Francisco Bonin-Font, Alberto Ortiz, and Gabriel Oliver. Visual navigation for mobile robots: A survey. Journal of Intelligent and Robotic Systems, 53(3):263–296, 2008.
[6] A. Brix and P.J. Diggle. Spatiotemporal prediction for log-Gaussian Cox processes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(4):823–841, 2001.
[7] A. Brooks, A. Makarenko, and B. Upcroft. Gaussian process models for indoor and outdoor sensor-centric robot localization. IEEE Transactions on Robotics, 24(6):1341–1351, 2008.
[8] J. Choi, J. Lee, and S. Oh. Biologically-inspired navigation strategies for swarm intelligence using spatial Gaussian processes.
In Proceedings of the 17th International Federation of Automatic Control World Congress, pages 593–598, July 2008.
[9] J. Choi, J. Lee, and S. Oh. Swarm intelligence for achieving the global maximum using spatio-temporal Gaussian processes. In Proceedings of the 27th American Control Conference, pages 135–140, June 2008.
[10] J. Choi, S. Oh, and R. Horowitz. Distributed learning and cooperative control for multi-agent systems. Automatica, 45(12):2802–2814, December 2009.
[11] G.C. Christie and C.I. Goddard. Sea Lamprey International Symposium (SLIS II): advances in the integrated management of sea lamprey in the Great Lakes. Journal of Great Lakes Research, 29:1–14, 2003.
[12] J. Cortés. Distributed Kriged Kalman filter for spatial estimation. IEEE Transactions on Automatic Control, 54(12):2816–2827, December 2009.
[13] J. Cortés. Distributed Kriged Kalman filter for spatial estimation. IEEE Transactions on Automatic Control, 54(12):2816–2827, 2010.
[14] N. Cressie. Kriging nonstationary data. Journal of the American Statistical Association, 81(395):625–634, September 1986.
[15] N. Cressie and N. Verzelen. Conditional-mean least-squares fitting of Gaussian Markov random fields to Gaussian fields. Computational Statistics & Data Analysis, 52(5):2794–2807, January 2008.
[16] N. Cressie and C.K. Wikle. Space–time Kalman filter. Encyclopedia of Environmetrics, 4:2045–2049, 2002.
[17] D. Culler, D. Estrin, and M. Srivastava. Guest editors' introduction: overview of sensor networks. Computer, 37(8):41–49, August 2004.
[18] M.P. Deisenroth, C.E. Rasmussen, and J. Peters. Gaussian process dynamic programming. Neurocomputing, 72(7-9):1508–1524, 2009.
[19] Cédric Demonceaux, Pascal Vasseur, and Yohan Fougerolle. Central catadioptric image processing with geodesic metric. Image and Vision Computing, 29(12):840–849, 2011.
[20] Guilherme N. DeSouza and Avinash C. Kak. Vision for mobile robot navigation: A survey.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(2):237–267, 2002.
[21] M.W.M.G. Dissanayake, P. Newman, S. Clark, H.F. Durrant-Whyte, and M. Csorba. A solution to the simultaneous localization and map building (SLAM) problem. IEEE Transactions on Robotics and Automation, 17(3):229–241, 2001.
[22] Vladislavs Dovgalecs, Rémi Mégret, and Yannick Berthoumieu. Multiple feature fusion based on co-training approach and time regularization for place classification in wearable video. Advances in Multimedia, 1, 2013.
[23] Y. Dupuis, X. Savatier, and P. Vasseur. Feature subset selection applied to model-free gait recognition. Image and Vision Computing, 31(8):580–591, 2013.
[24] H. Durrant-Whyte and T. Bailey. Simultaneous localisation and mapping (SLAM): Part I the essential algorithms. Robotics and Automation Magazine, 13(2):99–110, 2006.
[25] D. Estrin, D. Culler, K. Pister, and G. Sukhatme. Connecting the physical world with pervasive networks. Pervasive Computing, 1(1):59–69, 2002.
[26] João Carlos Aparício Fernandes and José Alberto B Campos Neves. Using conical and spherical mirrors with conventional cameras for 360 panorama views in a single image. In Proceedings of the IEEE International Conference on Mechatronics, pages 157–160, 2006.
[27] B. Ferris, D. Fox, and N. Lawrence. WiFi-SLAM using Gaussian process latent variable models. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, pages 2480–2485, 2007.
[28] J. Geweke. Bayesian inference in econometric models using Monte Carlo integration. Econometrica: Journal of the Econometric Society, 57(6):1317–1339, 1989.
[29] A. Girard, C.E. Rasmussen, J.Q. Candela, and R. Murray-Smith. Gaussian process priors with uncertain inputs-application to multiple-step ahead time series forecasting. Advances in Neural Information Processing Systems, pages 545–552, 2003.
[30] R. Graham and J. Cortés. Cooperative adaptive sampling of random fields with partially known covariance. International Journal of Robust and Nonlinear Control, 22(5):504–534, March 2012.
[31] G. Grisetti, C. Stachniss, and W. Burgard. Improving grid-based SLAM with Rao-Blackwellized particle filters by adaptive proposals and selective resampling. In Proceedings of the 2005 IEEE International Conference on Robotics and Automation, pages 2432–2437. IEEE, 2005.
[32] J.E. Guivant and E.M. Nebot. Optimization of the simultaneous localization and map-building algorithm for real-time implementation. IEEE Transactions on Robotics and Automation, 17(3):242–257, 2001.
[33] Baofeng Guo and Mark S Nixon. Gait feature subset selection by mutual information. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 39(1):36–46, 2009.
[34] Efstathios Hadjidemetriou, Michael D. Grossberg, and Shree K. Nayar. Multiresolution histograms and their use for recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(7):831–847, 2004.
[35] L. Hartman and O. Hössjer. Fast kriging of large data sets with Gaussian Markov random fields. Computational Statistics & Data Analysis, 52(5):2331–2349, January 2008.
[36] Dave Higdon, James Gattiker, Brian Williams, and Maria Rightley. Computer model calibration using high-dimensional output. Journal of the American Statistical Association, 103(482):570–583, 2008.
[37] KC Ho. Alleviating sensor position error in source localization using calibration emitters at inaccurate locations. IEEE Transactions on Signal Processing, 58(1):67–83, 2010.
[38] L. Hu and D. Evans. Localization for mobile sensor networks. In Proceedings of the 10th Annual International Conference on Mobile Computing and Networking, pages 45–57. ACM, 2004.
[39] M. Jadaliha and J. Choi. Environmental monitoring using autonomous aquatic robots: Sampling algorithms and experiments. IEEE Transactions on Control Systems Technology, in press, 2012.
[40] M.
Jadaliha and J. Choi. Environmental monitoring using autonomous aquatic robots: Sampling algorithms and experiments. IEEE Transactions on Control Systems Technology, 21(3):899–905, 2013.
[41] M. Jadaliha, Y. Xu, and J. Choi. Efficient Spatial Prediction Using Gaussian Markov Random Fields under Uncertain Localization. In Proceedings of the ASME Dynamic Systems and Control Conference, October 17–19, 2012, Ft. Lauderdale, FL, USA, pages 1–6, 2012.
[42] M. Jadaliha, Y. Xu, and J. Choi. Efficient Spatial Prediction Using Gaussian Markov Random Fields Under Uncertain Localization. In Proceedings of the ASME Dynamic Systems and Control Conference, October 2012.
[43] M. Jadaliha, Y. Xu, and J. Choi. Gaussian process regression using Laplace approximations under localization uncertainty. In Proceedings of the American Control Conference, June 2012.
[44] Mahdi Jadaliha, Yunfei Xu, Jongeun Choi, N Johnson, and Weiming Li. Gaussian process regression for sensor networks under localization uncertainty. IEEE Transactions on Signal Processing, 61(2):223–237, 2013.
[45] Gijeong Jang, Sungho Lee, and Inso Kweon. Color landmark based self-localization for indoor mobile robots. In Proceedings of the IEEE International Conference on Robotics and Automation, volume 1, pages 1037–1042, 2002.
[46] Woo Yeon Jeong and Kyoung Mu Lee. Visual SLAM with line and corner features. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2570–2575. IEEE, 2006.
[47] L. Jetto, S. Longhi, and G. Venturini. Development and experimental validation of an adaptive extended Kalman filter for the localization of mobile robots. IEEE Transactions on Robotics and Automation, 15(2):219–229, April 1999.
[48] N.S. Johnson, M.A. Luehring, M.J. Siefkes, and W. Li. Mating pheromone reception and induced behavior in ovulating female sea lampreys. North American Journal of Fisheries Management, 26(1):88–96, 2006.
[49] N.S. Johnson, S.S. Yun, H.T. Thompson, C.O.
Brant, and W. Li. A synthesized pheromone induces upstream movement in female sea lamprey and summons them into traps. Proceedings of the National Academy of Sciences, 106(4):1021, 2009.
[50] R. Karlsson and F. Gustafsson. Bayesian surface and underwater navigation. IEEE Transactions on Signal Processing, 54(11):4204–4213, 2006.
[51] A. Kemppainen, J. Haverinen, I. Vallivaara, and J. Röning. Near-optimal SLAM exploration in Gaussian processes. In Proceedings of the IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, pages 7–13, September 2010.
[52] Ji-Hyun Kim. Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap. Computational Statistics & Data Analysis, 53(11):3735–3745, 2009.
[53] J. Ko and D. Fox. GP-BayesFilters: Bayesian filtering using Gaussian process prediction and observation models. Autonomous Robots, 27(1):75–90, 2009.
[54] J. Kocijan, R. Murray-Smith, C.E. Rasmussen, and B. Likar. Predictive control with Gaussian process models. In Proceedings of EUROCON 2003, volume 1. IEEE, 2003.
[55] J. Kosecka, Liang Zhou, P. Barber, and Z. Duric. Qualitative image based localization in indoors environments. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, pages II-3–II-8, 2003.
[56] A. Krause, C. Guestrin, A. Gupta, and J. Kleinberg. Near-optimal sensor placements: maximizing information while minimizing communication cost. In Proceedings of the 5th International Conference on Information Processing in Sensor Networks, pages 2–10. ACM, 2006.
[57] A. Krause, A. Singh, and C. Guestrin. Near-optimal sensor placements in Gaussian processes: theory, efficient algorithms and empirical studies. The Journal of Machine Learning Research, 9:235–284, 2008.
[58] N. Lawrence, M. Seeger, and R. Herbrich. Fast sparse Gaussian process methods: The informative vector machine.
Advances in Neural Information Processing Systems, pages 625–632, 2003.
[59] Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, pages 2169–2178, 2006.
[60] J. Le Ny and G. J. Pappas. On trajectory optimization for active sensing in Gaussian process models. In Proceedings of the 48th IEEE Conference on Decision and Control, pages 6286–6292, December 2009.
[61] J.J. Leonard and H.F. Durrant-Whyte. Simultaneous map building and localization for an autonomous mobile robot. In Proceedings of Intelligent Robots and Systems (IROS), pages 1442–1447. IEEE, 1991.
[62] N.E. Leonard, D.A. Paley, F. Lekien, R. Sepulchre, D.M. Fratantoni, and R.E. Davis. Collective motion, sensor networks, and ocean sampling. Proceedings of the IEEE, 95(1):48–74, 2007.
[63] F. Lindgren, H. Rue, and J. Lindström. An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. Journal of the Royal Statistical Society: Series B, 73(4):423–498, September 2011.
[64] K. M. Lynch, I. B. Schwartz, P. Yang, and R. A. Freeman. Decentralized environmental modeling by mobile sensor networks. IEEE Transactions on Robotics, 24(3):710–724, June 2008.
[65] A. McHutchon and C.E. Rasmussen. Gaussian process training with input noise. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Granada, Spain, pages 1–9, December 2011.
[66] Emanuele Menegatti, Takeshi Maeda, and Hiroshi Ishiguro. Image-based memory for robot navigation using properties of omnidirectional images. Robotics and Autonomous Systems, 47(4):251–267, 2004.
[67] Emanuele Menegatti, Mauro Zoccarato, Enrico Pagello, and Hiroshi Ishiguro. Image-based Monte Carlo localisation with omnidirectional images.
Robotics and Autonomous Systems, 48(1):17–30, 2004.
[68] Y. Miyata. Fully exponential Laplace approximations using asymptotic modes. Journal of the American Statistical Association, 99(468):1037–1049, 2004.
[69] Y. Miyata. Laplace approximations to means and variances with asymptotic modes. Journal of Statistical Planning and Inference, 140(2):382–392, 2010.
[70] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit. FastSLAM: A factored solution to the simultaneous localization and mapping problem. In Proceedings of the National Conference on Artificial Intelligence, pages 593–598. AAAI, 2002.
[71] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit. FastSLAM 2.0: An improved particle filtering algorithm for simultaneous localization and mapping that provably converges. In Proceedings of the International Joint Conference on Artificial Intelligence, volume 18, pages 1151–1156, 2003.
[72] K. Murphy. Bayesian map learning in dynamic environments. Advances in Neural Information Processing Systems (NIPS), 12:1015–1021, 1999.
[73] R. Murray-Smith, D. Sbarbaro, C.E. Rasmussen, and A. Girard. Adaptive, cautious, predictive control with Gaussian process priors. In Proceedings of the 13th IFAC Symposium on System Identification, Rotterdam, Netherlands, pages 1195–1200, 2003.
[74] M.F. Mysorewala, D.O. Popa, and F.L. Lewis. Multi-scale adaptive sampling with mobile agents for mapping of forest fires. Journal of Intelligent and Robotic Systems, 54(4):535–565, 2009.
[75] D. Nash and M. Hannah. Using Monte-Carlo simulations and Bayesian Networks to quantify and demonstrate the impact of fertiliser best management practices. Environmental Modelling & Software, 26(9):1079–1088, 2011.
[76] Mark Nixon and Alberto S Aguado. Feature Extraction & Image Processing. Academic Press, 2008.
[77] S. O'Callaghan, F.T. Ramos, and H. Durrant-Whyte. Contextual occupancy maps using Gaussian processes. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 1054–1060, May 2009.
[78] P.
Oguz-Ekim, JP Gomes, J. Xavier, and P. Oliveira. Robust localization of nodes and time-recursive tracking in sensor networks using noisy range measurements. IEEE Transactions on Signal Processing, 59(99):3930–3942, 2011.
[79] S. Oh, L. Schenato, P. Chen, and S. Sastry. Tracking and coordination of multiple agents using sensor networks: System design, algorithms and experiments. Proceedings of the IEEE, 95(1):234–254, January 2007.
[80] Naoya Ohnishi and Atsushi Imiya. Appearance-based navigation and homing for autonomous mobile robot. Image and Vision Computing, 31(6):511–532, 2013.
[81] Hanchuan Peng, Fuhui Long, and Chris Ding. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8):1226–1238, 2005.
[82] Christian Plagemann, Cyrill Stachniss, Jürgen Hess, Felix Endres, and Nathan Franklin. A nonparametric learning approach to range sensing from omnidirectional vision. Robotics and Autonomous Systems, 58(6):762–772, 2010.
[83] J. Quiñonero-Candela and C. E. Rasmussen. A unifying view of sparse approximate Gaussian process regression. The Journal of Machine Learning Research, 6:1939–1959, 2005.
[84] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, Cambridge, Massachusetts, London, England, 2006.
[85] Carl Edward Rasmussen. Gaussian Processes for Machine Learning. MIT Press, 2006.
[86] W. Ren. Consensus strategies for cooperative control of vehicle formations. Control Theory & Applications, IET, 1(2):505–512, 2007.
[87] Eric Royer, Maxime Lhuillier, Michel Dhome, and Jean-Marc Lavest. Monocular vision for mobile robot localization and autonomous navigation. International Journal of Computer Vision, 74(3):237–260, 2007.
[88] H. Rue and L. Held. Gaussian Markov Random Fields: Theory and Applications. Chapman & Hall, 2005.
[89] H. Rue, S. Martino, and N. Chopin.
Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2):319–392, April 2009.
[90] H. Rue and H. Tjelmeland. Fitting Gaussian Markov random fields to Gaussian fields. Scandinavian Journal of Statistics, 29(1):31–49, March 2002.
[91] Timo Schairer, Benjamin Huhle, Philipp Vorst, Andreas Schilling, and Wolfgang Straßer. Visual mapping with uncertainty for correspondence-free localization using Gaussian process regression. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4229–4235, 2011.
[92] D. Scharstein and A.J. Briggs. Real-time recognition of self-similar landmarks. Image and Vision Computing, 19(11):763–772, 2001.
[93] Matthias Seeger. Bayesian Gaussian process models: PAC-Bayesian generalisation error bounds and sparse approximations. PhD thesis, School of Informatics, University of Edinburgh, 2003.
[94] BR Smith and JJ Tibbles. Sea lamprey (Petromyzon marinus) in Lakes Huron, Michigan, and Superior: history of invasion and control, 1936–78. Canadian Journal of Fisheries and Aquatic Sciences, 37(11):1780–1801, 1980.
[95] A. J. Smola and P. L. Bartlett. Sparse greedy Gaussian process regression. In Proceedings of the Advances in Neural Information Processing Systems 13, pages 619–625, 2001.
[96] M. St-Pierre and D. Gingras. Comparison between the unscented Kalman filter and the extended Kalman filter for the position estimation module of an integrated navigation information system. In Proceedings of the IEEE Intelligent Vehicles Symposium, pages 831–835. IEEE, June 2004.
[97] S. Thrun. Simultaneous localization and mapping. Robotics and Cognitive Approaches to Spatial Mapping, pages 13–41, 2008.
[98] S. Thrun, D. Fox, W. Burgard, and F. Dellaert. Robust Monte Carlo localization for mobile robots. Artificial Intelligence, 128(1):99–141, May 2001.
[99] Sebastian Thrun.
Bayesian landmark learning for mobile robot localization. Machine Learning, 33(1):41–76, 1998.
[100] L. Tierney and J.B. Kadane. Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association, 81(393):82–86, 1986.
[101] L. Tierney, R.E. Kass, and J.B. Kadane. Fully exponential Laplace approximations to expectations and variances of nonpositive functions. Journal of the American Statistical Association, 84(407):710–716, 1989.
[102] D.H. Titterton and J.L. Weston. Strapdown Inertial Navigation Technology. Peter Peregrinus Ltd, 2004.
[103] A. Torralba, K.P. Murphy, W.T. Freeman, and M.A. Rubin. Context-based vision system for place and object recognition. In Proceedings of the IEEE International Conference on Computer Vision, pages 273–280, 2003.
[104] Volker Tresp. A Bayesian committee machine. Neural Computation, 12(11):2719–2741, 2000.
[105] I. Vallivaara, J. Haverinen, A. Kemppainen, and J. Röning. Simultaneous localization and mapping using ambient magnetic field. In Proceedings of the IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, pages 14–19, September 2010.
[106] Shrihari Vasudevan. Data fusion with Gaussian processes. Robotics and Autonomous Systems, 60(12):1528–1544, 2012.
[107] N. Vlassis, R. Bunschoten, and B. Krose. Learning task-relevant features from robot data. In Proceedings of the IEEE International Conference on Robotics and Automation, volume 1, pages 499–504, 2001.
[108] M.J. Wainwright and M.I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1–305, 2008.
[109] Gordon Wells and Carme Torras. Assessing image features for vision-based robot positioning. Journal of Intelligent and Robotic Systems, 30(1):95–118, 2001.
[110] Gordon Wells, Christophe Venaille, and Carme Torras. Vision-based robot positioning using neural networks.
Image and Vision Computing, 14(10):715–732, 1996.
[111] C. K. I. Williams and C. E. Rasmussen. Gaussian processes for regression. Advances in Neural Information Processing Systems, 8:514–520, 1996.
[112] Christopher Williams and Matthias Seeger. Using the Nyström method to speed up kernel machines. In Proceedings of the Advances in Neural Information Processing Systems 13, pages 682–688. MIT Press, 2001.
[113] Y. Xu and J. Choi. Adaptive sampling for learning Gaussian processes using mobile sensor networks. Sensors, 11(3):3051–3066, 2011.
[114] Y. Xu and J. Choi. Adaptive sampling for learning Gaussian processes using mobile sensor networks. Sensors, 11(3):3051–3066, March 2011.
[115] Y. Xu, J. Choi, S. Dass, and T. Maiti. Bayesian prediction and adaptive sampling algorithms for mobile sensor networks. In Proceedings of the 2011 American Control Conference, pages 4195–4200, June 2011.
[116] Y. Xu, J. Choi, S. Dass, and T. Maiti. Efficient Bayesian spatial prediction with mobile sensor networks using Gaussian Markov random fields. Submitted to Automatica, 2012.
[117] Y. Xu, J. Choi, S. Dass, and T. Maiti. Sequential Bayesian prediction and adaptive sampling algorithms for mobile sensor networks. IEEE Transactions on Automatic Control, 57(8):2078–2084, 2012.
[118] Y. Xu, J. Choi, and S. Oh. Mobile sensor network coordination using Gaussian processes with truncated observations. IEEE Transactions on Robotics, 27(6):1118–1131, 2011.
[119] Y. Xu, J. Choi, and S. Oh. Mobile sensor network navigation using Gaussian processes with truncated observations. IEEE Transactions on Robotics, 27(6):1118–1131, December 2011.
[120] Yunfei Xu and Jongeun Choi. Spatial prediction with mobile sensor networks using Gaussian processes with built-in Gaussian Markov random fields. Automatica, 48(8):1735–1740, 2012.